
Forecasting Stock Price by SVMs Regression

Yukun Bao1, Yansheng Lu2, and Jinlong Zhang3


1 Department of Management Science & Information Systems, School of Management,
Huazhong University of Science and Technology, Wuhan 430074, China
yukunbao@mail.hust.edu.cn
2 College of Computer Science, Huazhong University of Science and Technology, Wuhan
430074, China
yslu@mail.hust.edu.cn
3 Department of Management Science & Information Systems, School of Management,
Huazhong University of Science and Technology, Wuhan 430074, China
jlzhang@mail.hust.edu.cn

Abstract. Forecasting stock price is one of the fascinating issues of stock
market research. Accurately forecasting stock price, which forms the basis
for financial investment decision making, is probably the biggest challenge
for the capital investment industry, which makes it a widely researched area.
Time series models and neural networks have commonly been used for predicting
stock prices. This paper deals with the application of a novel neural network
technique, support vector machines (SVMs) regression, to forecasting stock
price. The objective of this paper is to examine the feasibility of SVMs
regression in forecasting stock price. A data set from the Shanghai stock
market in China is used in the experiment to test the validity of SVMs
regression. The experiment shows SVMs regression to be a valuable method for
forecasting stock price.
Keywords: Stock price forecasts; SVMs regression; Machine learning

1 Introduction

Stock price forecasts are valuable for investors, as they point out investment
opportunities. Toward this goal, research efforts have been made to find
superior forecasting methods. Financial time series forecasting is regarded as one of
the most challenging applications of modern time series forecasting. Numerous
studies have found that univariate time series models, such as Box–Jenkins ARIMA
models, are as accurate as more expensive linear regression or vector autoregressive models
[1,2,3]. The success of linear models, however, is conditional upon the underlying
data generating process being linear and not being random. One view in financial
economics is that market prices are random and that past prices cannot be used as a
guide for the price behavior in the future. Chaos theory, however, suggests that a
seemingly random process may in fact have been generated by a deterministic
function that is not random [4,5]. In such a case, ARIMA methods are no longer a
useful tool for estimation and forecasting. Research efforts have therefore turned to
new methods, one of which is the study of neural networks.
Neural networks have been successfully used for modeling financial time series [6,7].
In particular, several researchers report modest, positive results in predicting
market prices using neural networks [8,9,10], though not by using price and volume
histories alone, and none uses technical analysis pattern heuristics. Neural networks
are universal function approximators that can map any non-linear function without a
priori assumptions about the properties of the data [11]. Unlike traditional statistical
models, neural networks are data-driven, non-parametric weak models, and they let
"the data speak for themselves". Consequently, neural networks are less susceptible to
the problem of model misspecification compared to most parametric models.
Neural networks are also more noise tolerant, having the ability to learn complex
systems with incomplete and corrupted data. In addition, they are more flexible,
having the capability to learn dynamic systems through a retraining process using new
data patterns. So neural networks are more powerful in describing the dynamics of
financial time series in comparison to traditional statistical models [12,13,14,15].
Recently, a novel neural network algorithm, called support vector machines (SVMs),
was developed by Vapnik and his co-workers [16]. Unlike most of the traditional
neural network models that implement the empirical risk minimization principle,
SVMs implement the structural risk minimization principle which seeks to minimize
an upper bound of the generalization error rather than minimize the training error.
This induction principle is based on the fact that the generalization error is bounded
by the sum of the training error and a confidence interval term that depends on the
Vapnik–Chervonenkis (VC) dimension. Based on this principle, SVMs achieve an
optimum network structure by striking a right balance between the empirical error and
the VC-confidence interval. This eventually results in better generalization
performance than other neural network models. Another merit of SVMs is that the
training of SVMs is equivalent to solving a linearly constrained quadratic
programming problem. This means that the solution of SVMs is unique, optimal and free
of local minima, unlike the training of other networks, which requires non-linear
optimization and thus runs the danger of getting stuck in a local minimum. Originally,
SVMs were developed for pattern recognition problems [17]. However, with the
introduction of Vapnik's ε-insensitive loss function, SVMs have been extended to
solve non-linear regression estimation problems and they have been shown to exhibit
excellent performance [18,19].
This paper consists of five sections. Section 2 presents the principles of SVMs
regression and the general procedure for applying it. Using an example from the
stock market in China, the detailed procedures involving data set selection, data
preprocessing and scaling, kernel function selection and so on are presented in
Section 3. Section 4 discusses the experimental results, followed by the conclusions
drawn from this study and directions for further research in the last section.

2 SVMs Regression Theory

Given a set of data points $G = \{(x_i, d_i)\}_{i=1}^{n}$ ($x_i$ is the input vector, $d_i$ is the desired value and $n$ is the total number of data patterns), SVMs approximate the function using the following:

$$y = f(x) = w\phi(x) + b \qquad (1)$$

where $\phi(x)$ is the high dimensional feature space which is non-linearly mapped from the input space $x$. The coefficients $w$ and $b$ are estimated by minimizing

$$R_{SVMs}(C) = C\,\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}(d_i, y_i) + \frac{1}{2}\|w\|^{2}, \qquad (2)$$

$$L_{\varepsilon}(d, y) = \begin{cases} |d - y| - \varepsilon, & |d - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
In the regularized risk function given by Eq. (2), the first term $C\,\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}(d_i, y_i)$ is
the empirical error (risk), measured by the $\varepsilon$-insensitive loss function given
by Eq. (3). This loss function provides the advantage of enabling one to use sparse
data points to represent the decision function given by Eq. (1). The second term
$\frac{1}{2}\|w\|^{2}$, on the other hand, is the regularization term. $C$ is referred to as the regularization
constant and it determines the trade-off between the empirical risk and the
regularization term. Increasing the value of $C$ increases the relative importance of
the empirical risk with respect to the regularization term. $\varepsilon$ is called the tube
size and it is equivalent to the approximation accuracy placed on the training data
points. Both $C$ and $\varepsilon$ are user-prescribed parameters.
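To make Eq. (3) concrete, the following is a minimal sketch (in Python with NumPy, not part of the original paper) of the $\varepsilon$-insensitive loss: residuals smaller than $\varepsilon$ cost nothing, while larger residuals are penalized linearly.

```python
import numpy as np

def eps_insensitive_loss(d, y, eps):
    """epsilon-insensitive loss of Eq. (3): zero inside the eps-tube,
    linear (|d - y| - eps) outside of it."""
    err = np.abs(np.asarray(d, dtype=float) - np.asarray(y, dtype=float))
    return np.where(err >= eps, err - eps, 0.0)

# Example: with eps = 0.1, a residual of 0.05 costs nothing and 0.30 costs 0.20.
print(eps_insensitive_loss([1.0, 1.0], [1.05, 1.30], eps=0.1))
```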
To obtain the estimations of $w$ and $b$, Eq. (2) is transformed into the primal problem given by Eq. (4) by introducing the positive slack variables $\xi_i$ and $\xi_i^{*}$ as follows:

$$\text{Minimize} \quad R_{SVMs}(w, \xi^{(*)}) = C\sum_{i=1}^{n}(\xi_i + \xi_i^{*}) + \frac{1}{2}\|w\|^{2}$$

$$\text{subject to} \quad d_i - w\phi(x_i) - b \le \varepsilon + \xi_i, \qquad w\phi(x_i) + b - d_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i, \xi_i^{*} \ge 0 \qquad (4)$$
Finally, by introducing Lagrange multipliers and exploiting the optimality constraints,
the decision function given by Eq. (1) has the following explicit form [18]:
$$f(x, a_i, a_i^{*}) = \sum_{i=1}^{n}(a_i - a_i^{*})\,K(x, x_i) + b \qquad (5)$$

In Eq. (5), $a_i$ and $a_i^{*}$ are the so-called Lagrange multipliers. They satisfy the
equalities $a_i \cdot a_i^{*} = 0$, $a_i \ge 0$ and $a_i^{*} \ge 0$, where $i = 1, 2, \ldots, n$, and are obtained by
maximizing the dual function of Eq. (4), which has the following form:
$$R(a_i, a_i^{*}) = \sum_{i=1}^{n} d_i (a_i - a_i^{*}) - \varepsilon \sum_{i=1}^{n}(a_i + a_i^{*}) - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(a_i - a_i^{*})(a_j - a_j^{*})\,K(x_i, x_j) \qquad (6)$$

with the constraints

$$\sum_{i=1}^{n}(a_i - a_i^{*}) = 0, \qquad 0 \le a_i \le C, \qquad 0 \le a_i^{*} \le C, \qquad i = 1, 2, \ldots, n.$$
Based on the Karush–Kuhn–Tucker (KKT) conditions of quadratic programming,
only a certain number of coefficients $(a_i - a_i^{*})$ in Eq. (5) will assume non-zero values.
The data points associated with them have approximation errors equal to or larger
than $\varepsilon$ and are referred to as support vectors. These are the data points lying on or
outside the $\varepsilon$-bound of the decision function. According to Eq. (5), it is evident that
support vectors are the only elements of the data points that are used in determining
the decision function, as the coefficients $(a_i - a_i^{*})$ of the other data points are all equal to
zero. Generally, the larger the $\varepsilon$, the fewer the number of support vectors and thus
the sparser the representation of the solution. However, a larger $\varepsilon$ can also depreciate
the approximation accuracy placed on the training points. In this sense, $\varepsilon$ is a trade-off
between the sparseness of the representation and closeness to the data.
$K(x_i, x_j)$ is defined as the kernel function. The value of the kernel is equal to the
inner product of the two vectors $x_i$ and $x_j$ in the feature space $\phi(x_i)$ and $\phi(x_j)$, that is,
$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. The elegance of using the kernel function is that one can deal
with feature spaces of arbitrary dimensionality without having to compute the map
$\phi(x)$ explicitly. Any function satisfying Mercer's condition [16] can be used as the
kernel function. Typical examples of kernel functions are as follows:

Linear: $K(x_i, x_j) = x_i^{T} x_j$.

Polynomial: $K(x_i, x_j) = (\gamma x_i^{T} x_j + r)^{d}, \; \gamma > 0$.

Radial basis function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^{2}), \; \gamma > 0$.

Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^{T} x_j + r)$.

Here, $\gamma$, $r$ and $d$ are kernel parameters. The kernel parameter should be carefully
chosen as it implicitly defines the structure of the high dimensional feature space
$\phi(x)$ and thus controls the complexity of the final solution.
From the implementation point of view, training SVMs is equivalent to solving a
linearly constrained quadratic programming (QP) problem in which the number of
variables is twice the number of training data points.

Generally speaking, SVMs regression for forecasting follows these procedures:

1. Transform the data to the format of an SVM and conduct simple scaling on the data;
2. Choose the kernel function;
3. Use cross-validation to find the best parameters C and γ;
4. Use the best parameters C and γ to train on the whole training set;
5. Test.
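As an illustration of these steps, the following is a minimal sketch in Python using scikit-learn; the paper does not specify an implementation, so the library, variable names and parameter values below are assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder feature matrix X (EMA15, RDP-5, RDP-10, RDP-15, RDP-20)
# and target y (RDP+5); in practice these come from Section 3.2.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)

# Step 1: simple scaling of the inputs.
X_scaled = StandardScaler().fit_transform(X)

# Steps 2-4: RBF kernel; C, gamma and epsilon would normally be chosen by the
# cross-validated grid search of Section 3.5 (the values here are placeholders).
model = SVR(kernel="rbf", C=8.0, gamma=0.125, epsilon=0.1)
model.fit(X_scaled, y)

# Step 5: test on held-out (scaled) data.
y_pred = model.predict(X_scaled[:5])
```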

3 Forecasting Stock Price

3.1 Data Sets

We select the daily closing prices of Haier, a well-known corporation listed on the
Shanghai Stock Exchange, between April 15, 2003 and November 25, 2003. Haier is a
famous electrical appliance manufacturer in China, and the electrical appliance market
in China was in a stable situation during this period; that is to say, there were no great
fluctuations in market volume, market competition was fair and the market profit rate
was at a reasonable level. During that time, the stock market in China was calm and no
special political events affected the stock except SARS, which did not affect the
electrical appliance market. So we chose this stock as the sample for our experiment.
There are 140 data points in total, obtained from www.stockstar.com. The first 100
data points of the series were used as the training set and the remaining 40 data points
as the testing set.

3.2 Data Preprocessing and Scaling

In order to enhance the forecasting ability of the model, we transform the original closing
prices into the relative difference in percentage of price (RDP) [20]. As mentioned by
Thomason [20], there are four advantages in applying this transformation. The most
prominent advantage is that the distribution of the transformed data will become more
symmetrical and will follow more closely a normal distribution. This modification to
the data distribution will improve the predictive power of the neural network.
The input variables are determined from four lagged RDP values based on 5-day
periods (RDP-5, RDP-10, RDP-15 and RDP-20) and one transformed closing price
(EMA15) which is obtained by subtracting a 15-day exponential moving average
from the closing price. The optimal length of the moving average is not critical, but it
should be longer than the forecasting horizon of 5 days [20]. EMA15 is used to
maintain as much information as contained in the original closing price as possible,
since the application of the RDP transform to the original closing price may remove
some useful information. The output variable RDP+5 is obtained by first smoothing
the closing price with a 3-day exponential moving average because the application of
a smoothing transform to the dependent variable generally enhances the prediction
performance of the SVMs. The calculations for all the indicators are shown in
Table 1.

Table 1. Input and output variables

Indicator         Calculation
Input variables
  EMA15           p(i) − EMA15(i)
  RDP-5           (p(i) − p(i−5)) / p(i−5) × 100
  RDP-10          (p(i) − p(i−10)) / p(i−10) × 100
  RDP-15          (p(i) − p(i−15)) / p(i−15) × 100
  RDP-20          (p(i) − p(i−20)) / p(i−20) × 100
Output variable
  RDP+5           (p(i+5) − p(i)) / p(i) × 100, where p(i) = EMA3(i)
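A minimal sketch of these transformations in Python with pandas is given below; it is not part of the original paper, and the column names and the use of pandas' exponentially weighted mean for the moving averages are assumptions.

```python
import pandas as pd

def build_features(close: pd.Series) -> pd.DataFrame:
    """Compute the Table 1 indicators from a series of daily closing prices."""
    df = pd.DataFrame(index=close.index)
    # EMA15 input: closing price minus its 15-day exponential moving average.
    df["EMA15"] = close - close.ewm(span=15, adjust=False).mean()
    # Lagged relative differences in percentage of price.
    for k in (5, 10, 15, 20):
        df[f"RDP-{k}"] = close.pct_change(periods=k) * 100
    # Output: 5-day-ahead RDP computed on the 3-day EMA-smoothed closing price.
    smoothed = close.ewm(span=3, adjust=False).mean()
    df["RDP+5"] = (smoothed.shift(-5) - smoothed) / smoothed * 100
    return df.dropna()
```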

3.3 Performance Criteria

The prediction performance is evaluated using the following statistical metrics,
namely, the normalized mean squared error (NMSE), mean absolute error (MAE),
directional symmetry (DS) and weighted directional symmetry (WDS). NMSE and
MAE are the measures of the deviation between the actual and predicted values. The
smaller the values of NMSE and MAE, the closer are the predicted time series values
to the actual values. DS provides an indication of the correctness of the predicted
direction of RDP+5 given in the form of percentages (a large value suggests a better
predictor). The weighted directional symmetry measures both the magnitude of the
prediction error and the direction. It penalizes the errors related to incorrectly
predicted direction and rewards those associated with correctly predicted direction.
The smaller the value of WDS, the better is the forecasting performance in terms of
both the magnitude and direction.
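The paper does not state the formulas for these metrics explicitly; the sketch below implements commonly used definitions of NMSE, MAE and DS (the exact weighting scheme for WDS varies between authors and is therefore omitted):

```python
import numpy as np

def nmse(actual, predicted):
    """Normalized mean squared error: MSE divided by the variance of the actual series."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean((actual - predicted) ** 2) / np.var(actual)

def mae(actual, predicted):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

def directional_symmetry(actual, predicted):
    """Percentage of periods in which the predicted change has the same sign as the actual change."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    same_direction = (np.diff(actual) * np.diff(predicted)) >= 0
    return 100.0 * np.mean(same_direction)
```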

3.4 Kernel Function Selection

We use the RBF kernel as the kernel function. The RBF kernel nonlinearly maps
samples into a higher dimensional space, so, unlike the linear kernel, it can handle the
case where the relation between the target values and the attributes is nonlinear.
Furthermore, the linear kernel is a special case of the RBF kernel, as [21] shows that
the linear kernel with a penalty parameter $\tilde{C}$ has the same performance as the RBF
kernel with some parameters $(C, \gamma)$. In addition, the sigmoid kernel behaves like the
RBF kernel for certain parameters [22].
The second reason is the number of hyper-parameters, which influences the
complexity of model selection. The polynomial kernel has more hyper-parameters
than the RBF kernel.

Finally, the RBF kernel has fewer numerical difficulties. One key point is that
$0 < K_{ij} \le 1$, in contrast to polynomial kernels, whose values may go to infinity
($\gamma x_i^{T} x_j + r > 1$) or to zero ($\gamma x_i^{T} x_j + r < 1$) when the degree is large.

3.5 Cross-Validation and Grid-Search

There are two parameters to set when using the RBF kernel: $C$ and $\gamma$. It is not known
beforehand which $C$ and $\gamma$ are best for a given problem; consequently, some kind of
model selection (parameter search) must be done. The goal is to identify good $(C, \gamma)$
so that the model can accurately predict unknown data (i.e., testing data). Note that
it may not be useful to achieve high training accuracy (i.e., a model that accurately
predicts training data whose target values are already known). Therefore, a common
approach is to separate the training data into two parts, one of which is treated as
unknown when training the model; the prediction accuracy on this held-out part then
more precisely reflects the performance on unseen data. An improved version of this
procedure is cross-validation.
We use a grid search on $C$ and $\gamma$ with cross-validation. Basically, pairs of $(C, \gamma)$ are
tried and the one with the best cross-validation accuracy is picked. We found that
trying exponentially growing sequences of $C$ and $\gamma$ is a practical method to identify
good parameters (for example, $C = 2^{-5}, 2^{-3}, \ldots, 2^{15}$ and $\gamma = 2^{-15}, 2^{-13}, \ldots, 2^{3}$), as sketched below.
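A minimal sketch of this grid search with scikit-learn's GridSearchCV follows; the tooling is an assumption, and X_scaled and y stand for the scaled inputs and targets of Section 3.2 (replaced here by placeholder data so the snippet runs on its own):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_scaled, y = rng.normal(size=(100, 5)), rng.normal(size=100)  # placeholder data

# Exponentially growing parameter grids, as suggested above.
param_grid = {
    "C": 2.0 ** np.arange(-5, 16, 2),       # 2^-5, 2^-3, ..., 2^15
    "gamma": 2.0 ** np.arange(-15, 4, 2),   # 2^-15, 2^-13, ..., 2^3
}

search = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),          # epsilon is a placeholder value
    param_grid,
    cv=5,                                    # 5-fold cross-validation (assumed)
    scoring="neg_mean_squared_error",
)
search.fit(X_scaled, y)
print(search.best_params_)
```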

4 Results and Discussion

In Fig. 1 the horizontal axis shows the trading days of the test set and the vertical axis
shows the stock price. It can be seen that the actual stock price falls from above the
predicted price to below it on the 2nd trading day, and in the following four trading
days the actual price falls further; this indicates a selling point on the 2nd day. The
actual stock price then rises from below the predicted price to above it on the 6th
trading day, and in the following two trading days it stays above the closing price of
the 6th trading day; this indicates a buying point on the 6th day. The same reasoning
can be applied to the following days. The buying and selling points derived in this
way from Fig. 1 illustrate the investment value of the forecasts.
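To make the crossing rule described above concrete, the following sketch (our illustration, not part of the paper) marks each test day as a buy or sell signal depending on whether the actual price crosses above or below the predicted price:

```python
import numpy as np

def crossing_signals(actual, predicted):
    """Return +1 (buy) where the actual price crosses from below the predicted
    price to above it, -1 (sell) where it crosses from above to below, else 0."""
    diff = np.sign(np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float))
    signals = np.zeros(len(diff), dtype=int)
    for t in range(1, len(diff)):
        if diff[t - 1] <= 0 < diff[t]:
            signals[t] = 1       # crossed from below to above: buy
        elif diff[t - 1] >= 0 > diff[t]:
            signals[t] = -1      # crossed from above to below: sell
    return signals
```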

5 Conclusion

The use of SVMs in forecasting stock price is studied in this paper. The study
concludes that SVMs provide a promising alternative for financial time series
forecasting. The strengths of SVMs regression come from the following points:
1) the use of the structural risk minimization (SRM) principle; 2) few controlling
parameters; 3) a globally unique solution derived from a quadratic programming problem.

Further research should address highly volatile stock markets, since large fluctuations
in the data may affect the performance of this method. Another direction for further
research is the use of prior knowledge in training the sample and in determining the
function parameters.

Fig. 1. Comparison between actual and predicted Haier’s stock price

References

[1] D.A. Bessler and J.A. Brandt. Forecasting livestock prices with individual and composite
methods. Applied Economics, 13, 513-522, 1981.
[2] K.S. Harris and R.M. Leuthold. A comparison of alternative forecasting techniques for
livestock prices: A case study. North Central J. Agricultural Economics, 7, 40-50, 1985.
[3] J.H. Dorfman and C.S. Mcintosh. Results of a price forecasting competition. American J.
Agricultural Economics, 72, 804-808, 1990.
[4] S.C. Blank. Chaos in future markets? A nonlinear dynamic analysis. J. Futures Markets,
11, 711-728, 1991.
[5] J. Chavas and M.T. Holt. Market instability and nonlinear dynamics. American J.
Agricultural Economics, 75, 113-120, 1993.
[6] Hall JW. Adaptive selection of U.S. stocks with neural nets. In: GJ Deboeck (Ed.),
Trading on the edge: neural, genetic, and fuzzy systems for chaotic financial markets.
New York:Wiley, 1994.
[7] Yaser SAM, Atiya AF. Introduction to financial forecasting. Applied Intelligence, 6, 205-
213, 1996.

[8] G. Grudnitski, L. Osburn, Forecasting S&P and gold futures prices: an application of
neural networks, The Journal of Futures Markets, 13 (6) 631–643, 1993.
[9] S. Kim, S. Chun, Graded forecasting using an array of bipolar predictions: application of
probabilistic neural networks to a stock market index, International Journal of Forecasting
14 (3), 323–337, 1998.
[10] E. Saad, D. Prokhorov, D. Wunsch, Comparative study of stock trend prediction using
time delay, recurrent and probabilistic neural networks, IEEE Transactions on Neural
Networks, 9 (6), 1456-1470, 1998.
[11] Cheng W, Wanger L, Lin CH. Forecasting the 30-year US treasury bond with a system of
neural networks. Journal of Computational Intelligence in Finance 1996;4:10–6.
[12] Sharda R, Patil RB. A connectionist approach to time series prediction: an empirical test.
In: Trippi, RR, Turban, E, (Eds.), Neural Networks in Finance and Investing, Chicago:
Probus Publishing Co., 1994, 451–64.
[13] Haykin S. Neural networks: a comprehensive foundation. Englewood Cliffs, NJ: Prentice
Hall, 1999.
[14] Zhang GQ, Michael YH. Neural network forecasting of the British Pound/US Dollar
exchange rate. Omega 1998;26(4):495–506.
[15] Kaastra I, Milton SB. Forecasting futures trading volume using neural networks. The
Journal of Futures Markets 1995;15(8):853–970.
[16] Vapnik VN. The nature of statistical learning theory. New York: Springer, 1995.
[17] Schmidt M. Identifying speaker with support vector networks. Interface ‘96 Proceedings,
Sydney, 1996.
[18] Muller KR, Smola A, Scholkopf B. Predicting time series with support vector machines.
Proceedings of International Conference on Artificial Neural Networks, Lausanne,
Switzerland, 1997, p 999
[19] Vapnik VN, Golowich SE, Smola AJ. Support vector method for function approximation,
regression estimation, and signal processing. Advances in Neural Information Processing
Systems 1996;9:281-287.
[20] Thomason M. The practitioner methods and tool. Journal of Computational Intelligence
in Finance 1999;7(3):36–45.
[21] Keerthi, S.S. and C.-J. Lin. Asymptotic behaviors of support vector machines with
Gaussian kernel. Neural Computation, 15(7), 1667–1689, 2003.
[22] Lin, H.-T. and C.-J. Lin. A study on sigmoid kernels for SVM and the training of non-
PSD kernels by SMO-type methods. Technical report, Department of Computer Science
and Information Engineering, National Taiwan University. Available at
http://www.csie.ntu.edu.tw/~cjlin/papers
