Beruflich Dokumente
Kultur Dokumente
22
International Journal of Computer Applications (0975 – 8887)
Volume 41– No.3, March 2012
i i 1
l
and i *
i 1
l
that maximize the objective function. It is to be noted that we do not require function to compute
f (x) which is one of the advantages of using the kernel.
l l
1 l l
Q(i ,i ) y i (i i ) (i i ) ( )( j j ) K ( xi , x j ) Advantages of SVM
* * * * *
i i
i 1 i 1 2 i 1 j 1 The support vector machine has the following specific
subjected to the following conditions: advantages over other machine learning methods for supervised
learning:
l training a support vector machine involves
( i ) 0
*
(1) i
optimization of a convex function with linear
i 1 constraint. This problem has a unique global
minimum which in turn overcome strucking to local
0 i C
minima observed in neural network.
the constructed model has an explicit dependence
(2)
0 i C only on the support vectors, which reduces the
*
computational cost.
for i = 1,2,…, l it scales relatively well to high dimensional data and
the trade-off between complexity and empirical error can be
where C is a user specified constant and K: X X R is the controlled explicitly by the appropriate choices of C and .
Mercer Kernal defined by: 3. RELATED WORK
In the past, most prediction models were based on conventional
K ( x, z) ( x).( z) (4) statistical methods, like time-series and multivariate analysis. In
recent years, however, machine learning methodologies,
including the artificial neural networks, genetic algorithms
The solution of the Primal yields (GA) and fuzzy technologies, have been used for predictions.
n
f ( x) ( i i )(( xi ) ( x)) b
*
i 1
International Journal of Computer Applications (0975 – 8887)
Volume 41– No.3, March 2012
MEDIAN
INDEX
MEAN
HIGH
SD
Input Variables
EMA15 p(i) − EMA15(i)
S&P 6312.45 2524.2 4627.85 4713.8 913.018 RDP-5 (p(i) − p(i − 5)) / p(i − 5) -100
CNX RDP-10 (p(i) − p(i − 10)) / p(i − 10) -100
NIFTY RDP-15 (p(i) − p(i − 15)) / p(i − 15) -100
BANK 13268.7 3339.7 7662.5 7350.45 2236.99 RDP-20 (p(i) − p(i − 20)) / p(i − 20) -100
NIFTY Output Variables
S&P 5502.6 1966.8 3794.94 3837.0 816.909
CNX 500
RDP+5
( p(i 5) −
p(i )) / p(i) -100
CNX 6260.66 2006.7 3611.63 3499.385 806.942
INFRA p(i ) = EMA3 (i)
CNX 100 6280.45 2388.6 4509.59 4545.95 947.771
EMAn(i) is the n-day exponential moving average of the ith day
and p(i) is the closing price of the ith day.
The data collected consists of daily previous closing price, open Since outliers may make it difficult or time-consuming to arrive
price, high price, low price, traded volume, and traded value. at an effective solution for the SVMs, RDP values beyond the
The daily closing prices are used as the data sets. Table 1 shows limits of ±2 standard deviations are selected as outliers. They
high price, low price, mean, median and standard deviation of are replaced with the closest marginal values. Another pre-
the five futures prices collected for our experiment. processing technique used in this study is data scaling. All the
4.2 Preprocessing data points are scaled into the range of [− 0.9; 0.9] as the data
Choosing a suitable forecasting horizon is the first step in points include both positive values and negative values. All of
financial forecasting. From the trading aspect, the forecasting the five data sets are partitioned into three parts according to the
horizon should be sufficiently long so that the over-trading time sequence.
resulting in excessive transaction costs could be avoided. From 4.3 Performance Criteria
the prediction aspect, the forecasting horizon should be short The prediction performance is evaluated using the following
enough as the persistence of financial time series is of limited statistical metrics, namely, the normalized mean squared error
duration. As suggested by Thomason [15] a forecasting horizon (NMSE), mean absolute error (MAE) and directional symmetry
of five days is a suitable choice for the daily data. As the precise (DS) [14]. The definitions of these criteria can be found in
values of the daily prices is often not as meaningful to trading Table 3.
as its relative magnitude, and also the high-frequency NMSE and MAE are the measures of the deviation between the
components in financial data are often more difficult to actual and predicted values. The smaller the values of NMSE
successfully model, the original closing price is transformed and MAE, the closer are the predicted time series values to the
into a five-day relative difference in percentage of price (RDP). actual values. DS provides an indication of the correctness of
The input variables are determined from four lagged RDP the predicted direction of RDP+5 given in the form of
values based on five-day periods (RDP-5, RDP-10, RDP-15, percentages (a large value suggests a better predictor).
and RDP-20) and one transformed closing price which is Table 3 - Performance Metrics & Their Calculation
obtained by subtracting a 15-day exponential moving average
(EMA15) from the closing price. The subtraction is performed
Metrics Calculation
to eliminate the trend in price as the maximum value and the ^
NMSE 1 n
minimum value is in the ratio of about 2: 1 in all the five data
2n
( y yi ) i
2
sets. The optimal length of the moving day is not critical, but it i 1
( y y) 2
2
EMA15 is used to maintain as much of the information n 1 i1 i
contained in the original closing price as possible since the
n
application of the RDP transform to the original closing price
y yi
may remove some useful information. The output variable
i 1
RDP+5 is obtained by first smoothing the closing price with a n
MAE 1 ^
three-day exponential moving average, because the application
of a smoothing transform to the dependent variable generally n
i 1
yi y i
[16]. n
d
i 1
i
1 y y y y ^ ^
i 1
0
di
i
i i 1
0
otherwise
24
International Journal of Computer Applications (0975 – 8887)
Volume 41– No.3, March 2012
^
N is the total number of data patterns. y and y represent the
actual and predicted output value
26