
Self-Organized Hierarchical Methods for Time Series Forecasting

Fuad M. Abinader Jr.1, Alynne C.S. de Queiroz, Daniel W. Honda


Universidade Federal do Rio Grande do Norte, Natal/RN, Brazil
e-mail: fabinader@gmail.com, alynnesaraiva@hotmail.com, danielwhonda@yahoo.com.br

Abstract - Time series forecasting with Artificial Neural Networks (ANN), in particular with self-organized maps (SOM), has been explored in the literature with good results. One good strategy for improving the computational cost and specialization of SOMs in general is to construct them via hierarchical structures. This work presents four different heuristics for constructing hierarchical SOMs for time series prediction, evaluating their computational cost and forecast precision and providing insight into future enhancements.

Keywords - Time Series Forecasting; Self-Organized Maps; Hierarchical SOM

I. INTRODUCTION

The large amount of time series data available to governments and companies requires automatic methods of information extraction to aid planning and decision-making processes, generally involving a compromise between observed data, random phenomena and related parameters [1]. In the last 20 years there has been increasing interest in the use of Artificial Neural Networks (ANN) for time series analysis and prediction [2]. Works in the literature usually focus on Multi-Layer Perceptrons (MLP) and Radial Basis Function (RBF) networks; however, a different class of ANN named Self-Organizing Maps (SOMs), although recommended for vector quantization and clustering and thus seemingly unfit for this task, has lately been reported in the literature as an efficient method for time series forecasting that produces, as a side effect, knowledge about the time series itself (e.g. SOM neuron prototypes, U-map with hits, etc.) [3]. Hierarchical methods for constructing SOMs have been advocated as a means of improving training time and prototype specialization; the first example is the one proposed by Luttrell [4]. This paper further extends this concept by applying four different heuristics to construct hierarchical SOMs aimed at forecasting time series values, evaluating their performance regarding forecast precision and computational cost in terms of space and time. The rest of the paper is organized as follows: in Section II we review the application of SOMs to time series forecasting, as well as the construction of hierarchical SOMs; in Section III we propose four different heuristics for time series prediction, whose performance results are presented in Section IV; finally, Section V concludes by providing relevant insights on future works and enhancements.

1 Supported by the RH Doutorado FAPEAM Ph.D. scholarship program (Edital n. 020/2010)

II. LITERATURE REVIEW

In this section we review the application of Self-Organized Maps (SOMs) to time series forecasting, as well as the use of hierarchical structures for SOM construction to improve computational cost and specialization, which constitute the basis for this work.

A. Self-Organized Map (SOM) for Time Series Forecasting

When applying ANN to time series forecasting, the problem is usually treated as function approximation, and as such most works in the literature apply supervised learning methods like MLP or RBF for global and/or local approximation, respectively. Each sample in the training data, called a regression vector, is composed of the n last values of the time series (where n represents the regression order), and the target value used during the training phase is the future value following that regression vector. Examples of such approaches can be seen in [5] and [6].

Self-Organizing Maps (SOMs) are a special type of ANN architecture that performs input vector quantization and dimensionality reduction at the same time, while trying to maintain topological relationships and allowing easy clustering and data visualization, all using unsupervised learning [7]. Applying SOMs to time series forecasting is at first counter-intuitive, as vector quantization seems far removed from a function approximation problem; however, as presented in [8], with a special architecture and algorithm named Vector-Quantized Temporal Associative Memory (VQTAM) it is not only possible to forecast time series but also to extract useful information for time series analysis.

In VQTAM, the input vector at time step t, x(t), is such that x(t) = [x^out(t); x^in(t)], where x^in(t) = [x(t-1); x(t-2); ...; x(t-p+1)] denotes the regression vector (i.e. past samples) and x^out(t) denotes the one-step desired prediction value used for training the network. The weight vector of the i-th neuron, w_i(t), is such that w_i(t) = [w_i^out(t); w_i^in(t)], where w_i^in(t) = [w_i(t-1); w_i(t-2); ...; w_i(t-p+1)] denotes the portion of the weight (prototype) vector that stores information about the inputs and w_i^out(t) denotes the output of the desired mapping. During training, the best matching unit (BMU) i*(t) (i.e. the winning neuron) is defined solely from the x^in(t) regression vector, such that

i^*(t) = \arg\min_{i \in N} \left\| \mathbf{x}^{in}(t) - \mathbf{w}_i^{in}(t) \right\|    (1)

, while the weight update procedure is performed using the entire vector x(t) = [x^out(t); x^in(t)], as in the regular SOM. After the VQTAM network is trained, forecasting the time series value for a given regression vector x^in(t) consists of again determining the winning neuron i*(t) using definition (1) above, and returning that neuron's w^out value as the quantized forecast.
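To make the VQTAM mechanics concrete, the following minimal Python sketch (function and variable names are ours; the paper describes no code) trains a 1-D VQTAM on a scalar series and produces a one-step forecast. The BMU search uses only the w^in part, as in Equation (1), while the update adjusts the full prototype [w^out; w^in]:

import numpy as np

def train_vqtam(series, p=5, n_neurons=20, n_epochs=20, lr0=0.5, seed=0):
    # Build training pairs: x_in(t) = p past samples (most recent first),
    # x_out(t) = the next value of the series.
    series = np.asarray(series, dtype=float)
    X_in = np.array([series[t - p:t][::-1] for t in range(p, len(series))])
    X_out = series[p:]
    rng = np.random.default_rng(seed)
    W_in = rng.standard_normal((n_neurons, p)) * 0.1
    W_out = rng.standard_normal(n_neurons) * 0.1
    total, step = n_epochs * len(X_in), 0
    for _ in range(n_epochs):
        for x_in, x_out in zip(X_in, X_out):
            frac = step / total
            lr = lr0 * (0.01 / lr0) ** frac                  # decaying learning rate
            sigma = (n_neurons / 2) * (1 / n_neurons) ** frac  # shrinking radius
            # Equation (1): BMU chosen from the w_in part only.
            bmu = np.argmin(np.linalg.norm(W_in - x_in, axis=1))
            # Gaussian neighborhood on a 1-D neuron grid.
            h = np.exp(-((np.arange(n_neurons) - bmu) ** 2) / (2 * sigma ** 2))
            # Regular SOM update, but over the FULL vector [x_out; x_in].
            W_in += lr * h[:, None] * (x_in - W_in)
            W_out += lr * h * (x_out - W_out)
            step += 1
    return W_in, W_out

def vqtam_forecast(W_in, W_out, x_in):
    # One-step forecast: BMU from x_in alone, return its w_out.
    return W_out[np.argmin(np.linalg.norm(W_in - x_in, axis=1))]

A call such as vqtam_forecast(W_in, W_out, series[-1:-p-1:-1]) then yields the quantized one-step prediction for the most recent window.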

As seen above, the VQTAM model provides a one-step-ahead forecast, and can be used as a starting point for different forecasting models [3]. One example is Double Vector Quantization (DVQ), which uses two VQTAMs (one for the regression vectors themselves and another for the differences between consecutive regression vectors) for multiple-step-ahead forecasting via an incremental Monte Carlo simulation process [9]. Lehtokangas et al. proposed the Linear Auto-Regressive model from clusters of Data vectors (LARD) [10], based on the SOM, whose advantage is better identification of consecutive temporal dependencies. The Local Linear Mapping (LLM) method, on the other hand, divides the time series model by constructing a different AR sub-model for each sub-domain of samples falling under each winning neuron [11]. A middle ground between LARD and LLM is the K-SOM method, which combines the vector quantization of VQTAM with local linear AR models built during the training process [12].

B. Hierarchical Self-Organized Map (HSOM)

In his seminal work, Luttrell introduced the Hierarchical Self-Organized Map (HSOM) model [4], composed of a tree of maps with hierarchical relationships, whose main advantages are speeding up learning and enabling further specialization of prototypes. The HSOM construction procedure works as follows: at the start, a SOM is associated with the root node; once this SOM has stabilized (i.e. the learning rate has converged), a new child node is associated with each SOM prototype. For each child node, if it is determined that further specialization is needed, a new SOM is trained with the portion of the input associated with the corresponding prototype in the father node's SOM. If a node is a leaf (i.e. has no children), it contains no SOM further specializing that region of the input space. Figure 1 shows what an HSOM with 2 layers and SOM map size 2x3 would look like.

Figure 1 - HSOM structure after training

In the original HSOM proposal the number of layers and the map sizes are fixed, i.e. there are no growing dynamics during network training, so the generated tree is fully balanced. Although this eases implementation, it is not efficient: there may be input space regions that need no specialization yet are specialized, or that need even further specialization but do not receive it because the algorithm has reached the layer limit. To overcome this, the Dynamic Hierarchical Self-Organized Map (DHSOM) was proposed, introducing heuristic algorithms for tree expansion that allow unbalanced trees to be formed, with additional layers for denser input sub-domains (i.e. more input samples associated with a given SOM prototype) [13]. Figure 2 shows what a DHSOM structure would look like.

Figure 2 - DHSOM structure after training

In DHSOM, there are three criteria for deciding whether a SOM prototype should be further specialized (i.e. whether the tree should be expanded), as sketched after this list:

• Activity: the prototype is active if the number of patterns (i.e. samples) it quantizes is superior to the mean number of patterns over all prototypes of the map it belongs to;
• Representativeness: the prototype is representative when its quantization error is superior to the mean quantization error of the map it belongs to;
• Maturity: a prototype is mature when it represents a minimal fraction of the initial pattern set (i.e. training samples); this works only as a stop criterion, disabling further expansion when the fraction of patterns represented by the prototype is lower than a pre-defined threshold.

Costa [14] defines the size of the child map as a function of the percentage of patterns it represents, relative to the total of patterns represented by its father's prototype (Equation 2), where Nc represents the number of patterns associated with the new child map, Np the number of patterns associated with the current parent map, a growing constant (defined as 0.3) controls the growth, and Mp and Mc represent, respectively, the sizes of the parent and child maps.
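The three tests can be sketched compactly as below, under assumptions of ours (per-prototype pattern counts and quantization errors as inputs). Note that the exact functional form of Equation 2 is not recoverable from this copy of the paper, so the child-size rule is only a placeholder encoding the prose: child size grows with the fraction Nc/Np, scaled by the 0.3 growing constant.

import numpy as np

def should_expand(counts, qerrs, n_total, maturity_frac=0.01):
    # counts[i]: patterns quantized by prototype i; qerrs[i]: its quantization error.
    counts, qerrs = np.asarray(counts), np.asarray(qerrs)
    active = counts > counts.mean()               # Activity criterion
    representative = qerrs > qerrs.mean()         # Representativeness criterion
    mature = counts / n_total >= maturity_frac    # Maturity (stop) criterion
    return active & representative & mature       # expand exactly these prototypes

def child_map_size(n_child, n_parent, parent_size, growth=0.3):
    # Placeholder for Costa's Equation 2 (exact form lost): the child map
    # size scales with the fraction of the parent's patterns it inherits.
    return max(2, round(growth * (n_child / n_parent) * parent_size))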

III. TIME SERIES FORECAST WITH HIERARCHICAL SOM

In Section II we presented the VQTAM model for time series forecasting with SOM, and introduced hierarchical SOM structures as a means of speeding up training while enabling further specialization. In this section we present our proposal of constructing the VQTAM model over a hierarchical structure much like HSOM and DHSOM, using four different heuristic algorithms to define the number of layers and the SOM map sizes.

A. Hierarchical VQTAM (HVQTAM) with Constant Number of Layers and Map Size

In the Hierarchical VQTAM (HVQTAM) approach, we prepare the training set using the regression vectors X = [x1^out, x1^in; x2^out, x2^in; ...; xn^out, xn^in] of size n as in regular VQTAM; the hierarchy is then built by recursively training one VQTAM per node, with a constant map size and a fixed maximum number of layers. Figure 3 illustrates graphically the regression vectors associated with each prototype in HVQTAM; there it can be seen that the prototypes in child node 3 are specializations of the 2nd prototype of parent node 1. Performing time series predictions in HVQTAM consists of reproducing the steps presented in Table I.

Figure 3 - HVQTAM illustration


TABLE I. HVQTAM FORECASTING ALGORITHM

1. Define the current node i_cur as the root node;
2. Present the regression vector x^in(t) to the VQTAM map m_cur associated with the current node i_cur, finding the BMU win_j using Equation 1;
3. If the node i_j associated with the BMU win_j is a leaf node, return the value in w_j^out as the time series forecast; otherwise, define the node i_j associated with win_j as the current node i_cur and return to step 2.

As can be seen, forecasting time series values with HVQTAM is as simple as navigating the tree structure to find the leaf node whose BMU win is most similar to the regression vector x^in, and returning the value in the corresponding w^out.

B. Semi-Dynamic Hierarchical VQTAM (SD-HVQTAM) with Variable Number of Layers and Constant Map Size

In this approach, the VQTAM map size used by each node is still constant, as in HVQTAM, but we no longer set the maximum number of layers l; instead, we use a simple heuristic that expands a node (i.e. trains a new VQTAM and adds child nodes) whenever a minimum number of training patterns is found in the j-th partition Xj of Xchildren. The training algorithm for SD-HVQTAM is therefore a simple modification of the HVQTAM algorithm, which no longer bounds the maximum number of layers but instead checks at each step whether the minimal number of training patterns is present in the j-th partition Xj of Xchildren. Performing time series forecasting in SD-HVQTAM consists of reproducing the same steps presented in Table I for HVQTAM, as the hierarchical structure is basically the same, differing only in the tree configuration, which may be unbalanced if there is a higher concentration of training patterns in a given region of the input domain.

C. Dynamic Hierarchical VQTAM (D-HVQTAM) with Variable Number of Layers and Map Size determined using Optimal Bin Size heuristics

This approach (D-HVQTAM v1 in Section IV) is fully dynamic, in the sense that, differently from SD-HVQTAM, the map size of each node in the hierarchical structure is defined dynamically, based on the data within each j-th partition Xj of Xchildren. For each node to be expanded, an algorithm named Optimal Data-Based Binning for Histograms [15] is applied to determine the optimal VQTAM map size for that partition, namely the optimal number of bins if we were to create a histogram of the data within Xj. This algorithm derives, using Bayesian theory, the a posteriori probability of different numbers of bins under a piecewise-constant density model of the data in Xj, so the solution (the optimal number of bins) is a balance between the maximum likelihood within each bin (which increases with the number of bins) and the a posteriori probability (which decreases with it). The training algorithm for D-HVQTAM is an adaptation of the training algorithm for SD-HVQTAM, and forecasting in D-HVQTAM again consists of reproducing the steps presented in Table I; a sketch of the binning rule follows below.
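A minimal sketch of Knuth's binning rule for one-dimensional data (function and variable names are ours; the paper applies the rule to the patterns of a partition Xj): it scans candidate bin counts and keeps the one maximizing Knuth's log-posterior.

import numpy as np
from scipy.special import gammaln

def knuth_optimal_bins(data, max_bins=100):
    data = np.asarray(data, dtype=float)
    n = data.size
    best_m, best_logp = 1, -np.inf
    for m in range(1, max_bins + 1):
        counts, _ = np.histogram(data, bins=m)
        # Knuth's log-posterior for a piecewise-constant density with m bins.
        logp = (n * np.log(m)
                + gammaln(m / 2.0) - m * gammaln(0.5)
                - gammaln(n + m / 2.0)
                + gammaln(counts + 0.5).sum())
        if logp > best_logp:
            best_m, best_logp = m, logp
    return best_m  # used as the VQTAM map size for the partition

In D-HVQTAM v1 this bin count plays the role of the node's map size; for the actuator series of Section IV it yielded 28 prototypes at the root.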

D. Dynamic Hierarchical VQTAM (D-HVQTAM) with Variable Number of Layers and Map Size determined using Parent-Child Pattern Ratio heuristics

In this last approach (D-HVQTAM v2 in Section IV), we start from the previous D-HVQTAM and substitute the Optimal Data-Based Binning for Histograms heuristic on the child nodes with the heuristic defined by Costa [14], which sets the size of the child map as a function of the percentage of patterns it represents relative to the total represented by its father's prototype, as defined in Equation 2. The training algorithm for this variation of D-HVQTAM with Costa's heuristic for the child map size is therefore an adaptation of the training algorithm of the previous D-HVQTAM using the optimal bin number. Forecasting in this D-HVQTAM variant consists of reproducing the same steps presented in Table I.

IV. PERFORMANCE OF HIERARCHICAL SOM METHODS

The main motivations for substituting a regular SOM with a hierarchical SOM are speed-ups in the training process and the possibility of further specializing each prototype. Therefore, in order to verify the performance gains of hierarchical VQTAM methods over regular VQTAM for time series forecasting, we conducted a performance evaluation study whose results are reported in this section.

A. Performance Evaluation Methodology

In order to evaluate the four heuristics proposed in Section III, we used the same time series adopted by Barreto [8], representing oil pressure measurements in a hydraulic actuator. The performance metrics used in this study include execution time, memory space occupied by the hierarchical structure, number of rules (i.e. VQTAM prototypes associated with each leaf node) and the Normalized Root Mean Squared Error (NRMSE), proposed by Barreto for the evaluation of non-hierarchical VQTAM methods [3], which measures the difference between the real and forecasted time series values and, in its standard form, is given by Equation 3 below:

NRMSE = \sqrt{ \frac{\sum_{t=1}^{N} \left( x(t) - \hat{x}(t) \right)^2}{N \hat{\sigma}_x^2} }    (3)

, where x(t) is the real value, \hat{x}(t) the forecasted value and \hat{\sigma}_x^2 the sample variance of the series.
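Equation 3 in code form (the standard NRMSE; Barreto's exact normalization may differ slightly):

import numpy as np

def nrmse(y_true, y_pred):
    # Root mean squared error normalized by the series' standard deviation.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)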

We implemented the four heuristic algorithms for hierarchical VQTAM construction and evaluated their performance against conventional VQTAM in a MatLab simulation using the SOM Toolbox v2.0 from the Helsinki University of Technology [16]. Since MatLab simulations are not as efficient in terms of time and memory space as, e.g., conventional C++ software, we normalized the time and space cost metrics reported here so as to make them as platform-independent as possible. For the evaluation of the four different heuristics, we used the parameters specified in Table II.
TABLE II. SIMULATION PARAMETERS

Hierarchical Method | Name                          | Value
HVQTAM              | Map size                      | 4x1
HVQTAM              | Maximum no. of layers         | -
SD-HVQTAM           | Map size                      | 4x1
SD-HVQTAM           | Minimal samples per prototype | 50
D-HVQTAM v1         | Minimal samples per prototype | 50
D-HVQTAM v2         | Growing constant              | 0.3
D-HVQTAM v2         | Maturity criterion            | 1%

B. Performance Evaluation Results

In Figure 4 we present the one-step forecasted values of a conventional VQTAM structure (in blue) plotted against the real values (in red). Using the SOM Toolbox's own capability of determining the optimal SOM map size, i.e. using Principal Component Analysis (PCA) to determine the ratio between the eigenvalues of the two most relevant components/eigenvectors, the trained SOM had a map size of 65x5 prototypes. It can be visually observed that, although in general VQTAM provides a good approximation of the time series values, there are both regions where the forecasted value is far from the real value (e.g. on the lower valleys) and regions where it oscillates around the real value (e.g. on the higher peaks), indicating that the prototype specialization is far from optimal, leaving room for improvement.

Figure 4 - Forecast using conventional VQTAM
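The PCA sizing idea can be sketched as follows, under simplifications of ours (this is not the SOM Toolbox's exact routine): the total number of units is split into a 2-D grid whose side ratio follows the square root of the ratio of the two largest covariance eigenvalues.

import numpy as np

def pca_grid_shape(X, n_units):
    # X: (n_samples, n_features) training matrix; n_units: total prototypes.
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    ratio = np.sqrt(eigvals[0] / eigvals[1])   # side-length ratio, as in the text
    rows = max(1, int(round(np.sqrt(n_units * ratio))))
    cols = max(1, int(round(n_units / rows)))
    return rows, cols

For strongly elongated data clouds this produces narrow grids; e.g., with 325 units and an eigenvalue ratio near 169 it returns a 65x5 shape like the one reported above.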

Figure 5 shows the forecast for the same validation data using an HVQTAM trained with the same training data. It can be seen that although the output of the 1st layer (in blue) of HVQTAM is far from the real values (in black), the output of the last layer, where the leaf nodes are (in red; 64 patterns in total), visually resembles the output of the conventional VQTAM (325 patterns in total) while using 5 times fewer patterns. This shows not only that additional layers increase forecast precision but also that hierarchical methods for time series like this are more efficient than conventional methods.

Figure 5 - Forecast using HVQTAM

The tree configuration for this HVQTAM, presented in Figure 6, is balanced, as expected due to the fixed map size and maximum number of layers.

Figure 6 - Tree for HVQTAM (height = 4)

Figure 7 shows the forecast for the same validation data using an SD-HVQTAM trained with the same training data. Again, the results for the last layer present better forecast precision than the output of the first layer; but differently from HVQTAM, the SD-HVQTAM results present both less oscillation around the real values and closer forecasts in regions with less dense time series values. This happens because, differently from HVQTAM, SD-HVQTAM does not preclude a different number of layers per child node (i.e. an unbalanced tree), so its SOM maps could grow dynamically, and more uniformly where there were more samples, leading to more precise forecasting.



Figure 7 - Forecast using SD-HVQTAM

The unbalanced tree resulting from this semi-dynamic behavior (recall that the map size is still fixed in SD-HVQTAM) can be seen in Figure 8.
Figure 8 - Tree for SD-HVQTAM (height = 5)

Figure 9 shows the forecast for the same validation data using a D-HVQTAM with map sizes determined by the optimal bin number, trained with the same training data. Here, differently from the two previous hierarchical methods, the forecast of the first layer already presents a good approximation of the real values, which can be explained by the use of the Optimal Data-Based Binning for Histograms algorithm: it determines an initial map size (28 prototypes in total) that already provides a near-uniform distribution of regression vectors among the SOM prototypes. It can also be noticed that the gap between the first-layer and last-layer forecasts is smaller than in the previous methods.

Figure 9 - Forecast using D-HVQTAM v1

Since the VQTAM at the first layer of D-HVQTAM v1 already produced a good forecast approximation, the need for further specialization is smaller, which becomes apparent when analyzing the tree structure in Figure 10: not only is it balanced (despite this not being enforced by the algorithm), but the map sizes of the child nodes are also small (no more than 10 prototypes per VQTAM).

Figure 10 - Tree for D-HVQTAM v1 (height = 2)

Figure 11 shows the forecast for the same validation data using a D-HVQTAM v2 trained with the same training data, with the map size of the root VQTAM determined by the optimal bin number and the map sizes of the child nodes determined as a ratio of the parent map size. As can be seen, the precision of the two D-HVQTAM methods is visually almost indistinguishable.

Figure 11 - Forecast using D-HVQTAM v2

It can be noticed in Figure 12 that the tree is now not only unbalanced: while there are child nodes that were not expanded, there are also nodes under which two additional layers were created. This can be attributed mainly to the different criteria used to determine the map sizes of the root node and of its children, which leads to sub-optimal map sizes in certain regions with less dense time series data.

Figure 12 - Tree for D-HVQTAM v2 (height = 3)


Numerical results for this evaluation are summarized in Table III, presenting the normalized errors, time and space costs, as well as the number of rules generated.
TABLE III. PERFORMANCE RESULTS

Method      | Norm. Time | Norm. Memory | # of rules | NRMSE (1st layer) | NRMSE (last layer)
VQTAM       | 2.24       | 2.53         | 325        | 0.85%             | -
HVQTAM      | 1.71       | 4.61         | 199        | 23.49%            | 1.22%
SD-HVQTAM   | 1.16       | 3.2          | 152        | 23.56%            | 1.57%
D-HVQTAM v1 | 1.55       | 1.65         | 164        | 4.2%              | 1.45%
D-HVQTAM v2 | 1          | 1            | 128        | 4.3%              | 2.02%

Analyzing the numerical results of Table III, it can be clearly seen that the VQTAM method provides forecasts that are numerically more precise than those of the hierarchical methods, at the cost of larger training time and memory, which can altogether be explained by the larger number of rules generated. However, as we are using hierarchical structures precisely to optimize computational costs and further specialize prototypes, this numerical result does not necessarily represent uniform behavior, as previously discussed for Figure 5. HVQTAM, SD-HVQTAM and D-HVQTAM v1 present similar NRMSE results at the last layer, while the best first-layer results are provided by both versions of D-HVQTAM, which means that, in order to obtain forecast approximations similar to those presented by the first layer of either D-HVQTAM, the static and semi-dynamic methods need to specialize all the way down to the last layer. Although the last-layer NRMSE results of D-HVQTAM v2 are not the best, they are quite close even to the original VQTAM results, with the advantage of smaller computational costs in both time and memory; in light of these results, we advocate that D-HVQTAM v2 is the hierarchical method presenting the best compromise between efficiency, specialization and computational cost.

V. CONCLUSION

The performance evaluation indicated that time series forecasting using hierarchical structures of self-organized maps presents reasonable vector quantization characteristics, resulting in good results in terms of forecast precision and computational cost. Further research directions are the automatic extraction of knowledge from the hierarchical structure (e.g. by using the U-map on the last layers), as well as enabling on-line updates of the hierarchical structure, which become possible in this hierarchical configuration since an update on a given input domain region, represented by a tree branch, does not necessarily imply updates on the whole tree. Also, since the most determinant factor for these satisfactory results was the use of heuristics for determining optimal tree parameters such as the number of children (i.e. the map size) and the number of layers, we indicate as promising future work the further enhancement of such heuristics, which could enable even higher performance improvements in terms of forecasting error without compromising computational efficiency.

REFERENCES

[1] A. K. Palit and D. Popovic, Computational Intelligence in Time Series Forecasting - Theory and Engineering Applications, 1st ed. Springer, 2005, vol. I.
[2] A. Weigend and N. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past. Westview Press, 1993.
[3] G. A. Barreto, "Time Series Prediction with the Self-Organizing Map: A Review," Perspectives of Neural-Symbolic Integration, vol. 77, pp. 135-158, 2007.

[4] S. P. Luttrell, "Hierarchical self-organizing networks," in First IEE International Conference on Artificial Neural Networks, London, 1989, pp. 2-6.
[5] G. Zhang, B. E. Patuwo, and M. Y. Hu, "Forecasting with artificial neural networks: The state of the art," International Journal of Forecasting, vol. 14, no. 1, pp. 35-62, March 1998.
[6] R. J. Frank, N. Davey, and S. P. Hunt, "Time Series Prediction and Neural Networks," Journal of Intelligent & Robotic Systems, vol. 31, pp. 91-103, 2001.
[7] T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, September 1990.
[8] G. A. Barreto and A. F. R. Araujo, "Identification and control of dynamical systems using the self-organizing map," IEEE Transactions on Neural Networks, vol. 15, no. 5, pp. 1244-1259, 2004.
[9] G. Simon et al., "Time Series Forecasting: obtaining long term trends with self-organizing maps," Pattern Recognition Letters, vol. 26, 2005.
[10] M. Lehtokangas, J. Saarinen, K. Kaski, and P. Huuhtanen, "A network of autoregressive processing units for time series modeling," Applied Mathematics and Computation, 1996.
[11] J. Walker et al., "Non-linear prediction with self-organizing maps," in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 90), vol. 1, 1990.
[12] G. A. Barreto, J. C. M. Mota, L. G. M. Souza, and R. A. Frota, "Nonstationary time series prediction using local models based on competitive neural networks," Lecture Notes in Computer Science, vol. 3029, pp. 1146-1155, 2004.
[13] J. M. Barbalho, Algoritmo SOM com Estrutura Hierárquica e Dinâmica Aplicado à Compressão de Imagens, 2002.
[14] J. A. F. Costa, "Classificação Automática e Análise de Dados por Redes Neurais Auto-Organizáveis," Ph.D. thesis, 1999.
[15] K. H. Knuth, "Optimal Data-Based Binning for Histograms," arXiv preprint, 2006.
[16] Helsinki University of Technology. (2011, July) SOM Toolbox v2.0. [Online]. http://www.cis.hut.fi/somtoolbox/
