Neural Nets
Leonard Aye
1994
FTSE Trend Forecasting Using Neural Networks

Contents
Project Summary
    Project Aim
Introduction
    Report layout
    Software package
Neural Networks
    Brief introduction
Input Data Manipulations
    Introduction
    Input data manipulations
        General manipulations
        Indexes manipulations
        Interest rates
        Exchange rates
        Economic data
        Futures
    Complete list of input data sets
Output Data Selection
    Number of predictive items in the neural net
        Short term prediction
        Long term prediction
    Selection of predictive item
        Selection of FTSE difference
Network Tuning
    Hidden nodes
    Learning rate (0.0–1.0)
    Momentum (0.0–0.9)
    Learning Threshold (0.0–3.0)
    Number of presentations
    Presentation type
    Minimum and maximum values
Results
    Results evaluation methods
        Trading — Short term prediction
        Forecasting — Long term prediction
    Preliminary results
        Result sheets
        Test 1 — FTSE +1D MOM prediction
        Test 2 — FTSE +2D MOM (residual) prediction
        Test 3 — FTSE +65D MOM prediction
    Reduction of input data sets
        Numerical Analysis
        Direct experimentation
Conclusions
    Highlights
    Conclusions
        Input data manipulation
        Output data selection
        Network Tuning
        Analysis methods
Appendices
    Appendix A — Back-propagation algorithm
    Appendix B — FTSE Analyses
Appendix A
    Back-propagation algorithm
Appendix B
    Momentum (Close to Close Difference)
    Returns
    Percentage Change of Momentum (PCM)
    Moving Average (MAV)
Copyright © Len Aye 1994
PROJECT SUMMARY
Project Aim
This report is the result of three months' work at a stockbroker firm based in London. The
aim of the project is to add value to existing business areas by making predictions about
future levels of Index values using Neural Networks. The FTSE index was chosen as the
initial target for the estimation.
The assumption is that, except in the case of unexpected shocks, e.g. the invasion of
Kuwait, the likely future levels for the market are largely contained in the data available
to participants in the market today.
So vast is the amount of that data that turning it into usable information is a difficult task.
The function of the neural network is to help discriminate between the data, identifying
what is significant, and to discover patterns in the data which enable it to make estimates
about the future. The intention is not that the neural network should stand alone but that
it will be used to complement the existing methods.
From the Technical Analysis perspective the required time scale for the FTSE estimation
is 3 months with a predictive accuracy of ±1.5%. For trading purposes, a 1 or 2 day
estimate is required with an accuracy for large moves (greater than ±0.75%) of 0.5%, but
with an overriding requirement of getting the direction of movement correct.
The task of performing financial predictions, or any other analysis, using neural nets
involves 4 major steps: input data selection, output data selection, network tuning and
analysing results from the network.
The purpose of the report is to describe our initial findings in these four areas, namely to
establish:
• the most promising data sets that could be used as indicators for FTSE prediction;
• the appropriate output parameters that could be predicted most accurately by
NeuroShell;
• the parameters in NeuroShell that are most likely to affect the overall accuracy of the
results, and the methods used in tuning these parameters; and
• the appropriate methods for analysing the results.
INTRODUCTION
The task of accurately predicting the future value of the FTSE 100 Share Index, either one
day or a few months ahead, is by no means easy. In the past, and even now, statistical
tools have been used and have proved successful, up to a point, in predicting such
financial indicators.
However, a different class of computerised tools is now becoming available which can be
used alongside the statistical methods for predicting data consisting of non-linear
patterns. This new class of tools is called Neural Networks (or neural nets); they
originated in the fields of psychology and cognitive science and later crossed over to
computing.
The idea of neural nets was first investigated in the 1940s, and only recently have
practical, off-the-shelf tools become available. Neural nets have been applied to such
diverse fields as classification: speech, image and hand-written character recognition,
medical screening, geo-demographic analysis; control of complex non-linear plants such
as engines and chemical processes; data fusion: medical diagnosis, sales forecasting,
credit/loan risk analysis; and of course, prediction: financial systems and exchange rate
forecasting.
Report layout
This report is a summary of work carried out during the first 6 months of the project. In
order to understand the results of our experiments it is necessary that the reader has
some basic understanding of neural nets. Hence, the section ‘Neural Networks’ briefly
explains the idea behind them; readers already familiar with the subject may skip that
section.
As stated earlier, the task of performing financial predictions, or any other analysis, using
neural nets involves 4 major steps: input data selection, output data selection, network
tuning and analysing the results from the network, hence the main body of the report is
broken down into 4 sections to reflect these 4 steps1.
The section ‘Input Data Manipulations’ shows the data sets that were acquired and how
they were manipulated so that they can be used as inputs to the neural network.
The next step is to decide what we want the network to produce as outputs, i.e. the items
to be predicted. This is not as obvious as one might expect. The section ‘Output Data
Selection’ details the various parameters that were tested for their suitability as
predictive items for the FTSE index.
Once we had established the items to be used as inputs and outputs we then trained the
network. The section ‘Network Tuning’ describes the parameters involved in training a
network (within the confines of the NeuroShell package) and how they were tuned.
After the tests were carried out the results were analysed, and the ‘Results’ section
highlights the observations from the tests.
The last section, ‘Conclusions and future plans’, presents our findings and observations
from each of the previous sections and our plans for the next 6 months of the project.
Readers who are not technically inclined may skip to this last section for a condensed
summary of the report.
1 The reader should be aware that the optimal data sets or parameters required for each step are not
obtained in isolation from the other steps but in parallel, through iterative experimentation.
Software package
The package that we have used for all the experiments is called NeuroShell®2 and in this
report we use the term ‘neural net’ when the context applies to neural networks in general
and ‘NeuroShell’ when the context refers to the particulars of the package.
2 NeuroShell™ is a trademark of Ward Systems Group, Inc., 245 West Patrick Street, Frederick,
Maryland 21701, USA. Tel: (+1) 301 662-7950.
NEURAL NETWORKS
Brief introduction
Neural networks are typically composed of interconnected “units”, and each connection
is associated with a modifiable weight3. Each unit converts the pattern of incoming
activities that it receives into a single outgoing activity that it broadcasts to other units. It
performs this conversion in two stages. First, it multiplies each incoming activity by the
weight on the connection and adds together all these weighted inputs to get a quantity
called the total input. Second, a unit uses an input-output function that transforms the
total input into the outgoing activity (see Figure 2.1 below).
Figure 2.1 — A single unit: each incoming activity is multiplied by the weight on its
connection, the weighted inputs are summed, and an output function transforms the total
input into the outgoing activity.
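The two-stage conversion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of NeuroShell; the sigmoid is assumed here as the output function, one common choice for back-propagation networks.

```python
import math

def unit_output(activities, weights):
    """One unit: sum each incoming activity times its connection
    weight (the total input), then apply a sigmoid input-output
    function to produce the outgoing activity."""
    total_input = sum(a * w for a, w in zip(activities, weights))
    return 1.0 / (1.0 + math.exp(-total_input))

# three incoming activities and their connection weights
print(unit_output([2.0, 0.5, 1.5], [0.1, 0.4, -0.2]))  # ≈ 0.525
```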
To make a neural network that performs some specific task, the weights on the
connections and how the units are connected to each other must be set appropriately. The
connections determine whether it is possible for one unit to influence another. The
weights specify the strengths of the influence.
The most common type of neural network consists of three layers of units: a layer of input
units is connected to a layer of “hidden” units, which is in turn connected to a layer of
output units. The activity of the input units represents the raw information that is fed into
the network. The activity of each hidden unit is determined by the activities of the input
units and the weights on the connections between the input and hidden units. Similarly,
the behaviour of the output units depends on the activity of the hidden units and the
weights between the hidden and output units (see Figure 2.2). The number of hidden
layers in a network depends very much on the problem to be solved using the network.
3 Hinton, G. E. (1992), How Neural Networks Learn from Experience, Scientific American, September
1992, pp 105-109.
Figure 2.2 — A common three layer neural network: input units (I1–I3) feed hidden units
(H1–H5), which in turn feed output units (O1, O2).
To train a network, the input patterns are presented to it and the actual activity of the
output units is compared with the desired activity. The error, defined as the square of the
difference between the actual and desired activities, is calculated, and the weight of each
connection is then changed so as to reduce the error. This process is repeated until the
network classifies, or recognises, every input pattern correctly.
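The training cycle just described (present a pattern, measure the squared error, adjust the weights to reduce it) can be sketched for a single sigmoid unit. This is an illustrative sketch under our own assumptions, not the NeuroShell implementation; the full back-propagation algorithm repeats the same idea layer by layer (see Appendix A).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(patterns, targets, lr=0.5, epochs=2000):
    """Present each pattern, compare actual and desired activity,
    and nudge each weight downhill on the squared error."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(epochs):
        for x, desired in zip(patterns, targets):
            actual = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # derivative of (actual - desired)^2 through the sigmoid
            delta = (actual - desired) * actual * (1.0 - actual)
            weights = [w - lr * delta * xi for w, xi in zip(weights, x)]
            bias -= lr * delta
    return weights, bias

# learn a simple OR-like classification of input patterns
xs = [[0, 0], [0, 1], [1, 0], [1, 1]]
ys = [0, 1, 1, 1]
w, b = train_unit(xs, ys)
preds = [round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)) for x in xs]
print(preds)
```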
INPUT DATA MANIPULATIONS
Introduction
Of the numerous financial data sets at our disposal we have chosen the following as
suitable indicators for the prediction of the FTSE. These data sets are classified into their
relative groups, as follows:
Indexes
FTSE 100
FTSE Eurotrack 100
Dow Jones
DAX
NIKKEI
CAC 40
Interest rates
Exchange rates
US $ – £ Sterling
French Franc – £ Sterling
Japanese Yen ¥ – £ Sterling
German Marks DM – £ Sterling
Economic data
Futures trading
The list above shows our initial list of financial and economic indicators that we have
decided to use as predictive variables. The data sets as they stand in their raw form
contain historical information that is not directly apparent in the data, and by calculating
their derivatives (e.g. moving averages, etc.) this hidden information or patterns can be
brought to the surface and made more explicit, and consequently be recognised by the
neural network.
The sections below describe how the raw data were analysed and the types of derivatives
calculated.
General manipulations
The following adjustments were applied to all the data sets.
1–Spikes in data
When calculating the derivatives of a data set—index values in particular—we need a
way of handling sudden rises or falls of large magnitude, e.g. when the stock market
crashed the FTSE dropped by over 250 points. Because of this crash all the derivatives
that were calculated (i.e. moving averages, differences, rates of change, etc.) have large
spikes in them. Since our interest is in the direction of movements, and less so in the
absolute size of movements above a given level, we can reduce the size of these spikes
without losing information.
Another reason for dealing with spikes is that the precision of the NeuroShell output is
determined by the range of the minimum and maximum values set for a particular data
series. The NeuroShell manual suggests that when dealing with spikes the minimum and
maximum values should be set tightly around the majority of the data set.
Hence, from the graphical analysis of the data series in Excel, we decided that 4 standard
deviations of the data series would be suitable for use as the minimum and maximum
values.
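As a sketch of this spike treatment (our own illustration, not NeuroShell code), clipping a derived series at 4 standard deviations of its mean might be expressed as:

```python
import statistics

def clip_spikes(series, n_sd=4.0):
    """Limit a series to mean +/- n_sd standard deviations so that
    isolated spikes (e.g. the 1987 crash) do not dominate the
    min/max range given to NeuroShell."""
    mean = statistics.mean(series)
    sd = statistics.pstdev(series)
    lo, hi = mean - n_sd * sd, mean + n_sd * sd
    return [min(max(x, lo), hi) for x in series]

# a mostly quiet series with one large spike at the end
spiky = [0.0] * 50 + [1.0, -1.0] * 25 + [250.0]
clipped = clip_spikes(spiky)
```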
2–Historical data
The back-propagation algorithm is suitable for the majority of problems, where the
training data are discrete or independent of each other. However, the algorithm does not
handle temporal or historical data well4. To overcome this limitation, we used
momentums (differences) of the indexes between today and some periods in the past as
representatives of the ‘historical’ information in the data.
The following table shows the various index differences that we wish to calculate and
use as inputs to the neural network.
4 There are other algorithms such as recurrent algorithms, which can handle time-series data.
However, the current version of NeuroShell does not provide this feature.
Multiples of 5 are chosen to avoid week-day effects, which may be particularly great in
the UK because of its settlement accounting system.
These will provide some history which the neural network would not otherwise get.
However, it is unlikely that all of these are significant and part of the neural net’s job is to
discriminate between them.
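A minimal sketch of such a momentum (difference) series, assuming a simple list of daily closing values:

```python
def momentum(series, lag):
    """n-period momentum: today's value minus the value `lag`
    trading days ago; undefined (None) for the first `lag` entries.
    Lags in multiples of 5 avoid week-day effects, as noted above."""
    return [None] * lag + [series[i] - series[i - lag]
                           for i in range(lag, len(series))]

ftse = [2500, 2510, 2495, 2520, 2530, 2540]
print(momentum(ftse, 5))  # 1-week momentum → [None, None, None, None, None, 40]
```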
3–Levels
From our early experience of NeuroShell we have found that the network cannot predict
values that are outside the range of its learning set. This particular problem is not limited
to NeuroShell alone but is a limitation with the neural nets in general. For this reason any
data that has levels (or trends) must be transformed into one that does not contain levels.
More importantly, we cannot use NeuroShell to predict the real value of FTSE. The two
methods described below can be used to eliminate the trend in the data.
Differences
This is simply a normalisation of the raw data and can be done in many ways; the
simplest method of removing the levels is to calculate the difference between the current
value and the value some periods ago. In predicting future values, it is obviously
sufficient to calculate the difference from today.
As daily differences are a function of the market level, this series too will have widening
bounds. However, this is a second order effect and unlikely to be significant over the
periods of 2 days or 3 months currently being considered for prediction.
Trend removals
This method instead approximates the underlying trend using linear regression, removes
the trend from the raw data series, and uses only the residual series as an input to the
network. This is difficult because of the number of data series, each with its own trend,
which will not be independent. At present, it may be safer not to adjust for trend.
Significant figures
All data series are calculated from their raw values and chopped at 4 standard deviations,
individually for each series. After this, the effect of rounding is examined:
• up to the next multiple of 0.2, so that values are xxx.2, xxx.4, xxx.6, ...;
• to the nearest 0.5, so that values are xxx.0, xxx.5, xxx.0, ....
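The two rounding schemes can be sketched as follows (an illustration; step values other than 0.2 and 0.5 are equally possible):

```python
import math

def round_up_to(x, step=0.2):
    """Round up to the next multiple of `step` (e.g. 0.2).
    The inner round() guards against float noise when x is
    already an exact multiple of step."""
    return math.ceil(round(x / step, 9)) * step

def round_nearest(x, step=0.5):
    """Round to the nearest multiple of `step` (e.g. 0.5)."""
    return round(x / step) * step

print(round_up_to(123.31))    # next multiple of 0.2
print(round_nearest(123.31))  # nearest multiple of 0.5
```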
Indexes manipulations
The following applies to index data sets only. When dealing with index data we should be
aware of the following points:
• Raw Index data will not be used as input because of level problems; at the same
time we must never lose sight of the actual Index values.
• To calculate the Index value it is sufficient to calculate the expected difference from
today’s Index value.
• No inputs should be used that are expected to have a trend, because the neural
network does not predict at all well outside its learning experience (although
differences are acceptable).
• It is acceptable to underestimate very large changes, as these are generally exceptions
that are not expected to be within the normal patterns previously seen — i.e. the
neural network should not be expected to anticipate a large ‘shock’ to the market but
might be expected to predict reasonably the aftermath of a shock given it has seen a
few before.
• All derived data series, mainly differences, should be limited to 4 standard deviations
of the original data set, and rounded to the same accuracy as the original data.
• All data should be rounded to an acceptable degree of accuracy. Nothing need be
more accurate than 0.01%, e.g. 0.01 × 2500/100 = 0.25 in FTSE terms. For the FTSE,
clearly 0.02% (±0.5) is acceptable.
• History information about the data must somehow be made available to the network.
• The use of raw data depends very much on the required output type, i.e. short- or
long-term predictions. For short-term predictions, e.g. 1 day (and 2 days, as a check
on the 1-day prediction), we could use the raw FTSE data for calculating the
derivatives. In contrast, average values of the Index, which reduce the noise and
daily fluctuations in the Index, could be used for the long-term prediction, e.g. 65
days or 3 months.
• All predicted Index values, either short- or long-term, will be differences from
today’s value only.
These particular indicators were chosen on the basis of Robin Griffiths’s experience.
They are widely used in the market (which to some extent must make them self-fulfilling)
and he has found them the most valuable of the huge range available (e.g. from the
Reuters RT handbook).
Trend removal
While it is expected that any trend in the FTSE data is exponential rather than linear
(because the rise should be related to the growth of money values, with re-investment),
we should nevertheless test this assumption.
Linear
Assume there is a trend,
FTSE = m (time) + const + error.
Exponential
Assume there is a trend
log(FTSE) = m (time) + const + error.
Inverse
Assume there is a trend,
1/FTSE = m (time) + const + error.
We should pick the best solution not on the basis of the individual Σ(error)² terms above,
but on the equivalent calculation for the series converted back into FTSE values, i.e. it is
always calculated as Σ(FTSE − est(FTSE))².
We then just take the best of these three, subtract it from the original data set and use the
resulting values (FTSE residual) as the items to be estimated by the neural network5. The
following graph shows the trends in the raw FTSE.
It can be observed from the graph that the trends of the FTSE are slightly offset by the
large peak in 1987, i.e. the trend lines sit above what one would consider an optimum
trend. It can also be observed that in both of the graphs the linear trend fits the data better
than the exponential, and we use this linear trend to calculate the FTSE residual values,
which can then be used as the items to be estimated.
[Graph: Underlying trends of raw FTSE, January 1985 to January 1993. The FTSE 100
(approximately 1000 to 3000 over the period) is plotted with the fitted linear (y = mx + c)
and exponential (y = c*m^x) trend lines.]
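The three candidate trends and the selection rule above (minimum Σ(FTSE − est(FTSE))² in FTSE terms) can be sketched in Python. This is an illustrative sketch only; the Excel LINEST/LOGEST fits mentioned in the footnote are replaced here by a small ordinary-least-squares helper.

```python
import math

def fit_line(ts, ys):
    """Ordinary least squares for y = m*t + c."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    m = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    return m, my - m * mt

def best_trend(ftse):
    """Fit linear, exponential and inverse trends, convert each
    estimate back into FTSE terms, pick the winner by sum of
    squared errors, and return it with the residual series."""
    ts = list(range(len(ftse)))
    fits = {}
    m, c = fit_line(ts, ftse)                         # FTSE = m*t + c
    fits["linear"] = [m * t + c for t in ts]
    m, c = fit_line(ts, [math.log(y) for y in ftse])  # log(FTSE) = m*t + c
    fits["exponential"] = [math.exp(m * t + c) for t in ts]
    m, c = fit_line(ts, [1.0 / y for y in ftse])      # 1/FTSE = m*t + c
    fits["inverse"] = [1.0 / (m * t + c) for t in ts]
    sse = {name: sum((y - e) ** 2 for y, e in zip(ftse, est))
           for name, est in fits.items()}
    best = min(sse, key=sse.get)
    residual = [y - e for y, e in zip(ftse, fits[best])]
    return best, residual

best, residual = best_trend([1000.0 + 10.0 * t for t in range(20)])
print(best)
```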
Seasonality
This should be tackled only after the trend has been removed. We should remove long
term seasonality (1 year) first; only then should we see whether there are any remaining
cycles that might be removed, hopefully by looking at the graphs.
It is difficult to decide on the best method for calculating seasonals without knowing the
nature of the trends described above and looking at the resulting graphs to see whether
the seasonal variations are likely to remain constant or rise with the increasing trend, and
to what extent. However, it would probably be reasonable to start with the assumption
that seasonals are a constant ratio to the trend.
5 Microsoft Excel provides built-in functions for calculating the straight line and exponential curves that
best fit the given series of values.
For the linear trend, the gradient m and constant c can be obtained from the function LINEST(values)
which returns an array that describes the line.
With these values a straight line is then constructed using arbitrary x values ranging from 0 to n,
the number of data points in the series.
For the exponential trend, the gradient m and constant c can be obtained from the function
LOGEST(values) which returns an array that describes the curve, and the gradient and constant are
obtained as described above.
An exponentially weighted moving average (EMA) of the FTSE residual could also be
applied here.
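A sketch of such an EMA, using the standard recursion in which recent values receive exponentially greater weight (the smoothing constant `alpha` here is illustrative, not a value chosen in the project):

```python
def ema(series, alpha=0.1):
    """Exponentially weighted moving average:
        ema[t] = alpha * x[t] + (1 - alpha) * ema[t-1],
    seeded with the first value. Weights decay exponentially
    into the past, smoothing the residual series."""
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out
```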
It is well known in the market that the FTSE behaves in a seasonal pattern, i.e. one that
repeats over a certain time period. For example, the value of the FTSE rises around the
beginning of each year (see figure below). The question is how to incorporate this
information as an input to the neural network.
Our first attempt was to use another input which simply consists of a series of numbers
representing the days in a year. For example, day 1 is always the first Monday in the
second week of a new year. We use this input together with the FTSE derivatives to
indicate the seasonal change of the FTSE. In order that this new information is of use to
the network, the data have to be presented on a rotational instead of a random basis. So
far, we have not removed any seasonal information from the data but have simply
provided the neural network with an additional indicator that seasonal variations exist in
the data.
[Graph: FTSE values for 1985–1992 overlaid by calendar month (approximately 1000 to
2800), illustrating the seasonal rise around the beginning of each year.]
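The day-number indicator described above might be generated as follows. This is a hypothetical sketch: for simplicity it numbers the supplied trading dates within each calendar year, rather than counting from the first Monday of the second week as in the report.

```python
import datetime

def day_numbers(dates):
    """Number the supplied trading dates within each calendar year,
    restarting at 1 every January, so the network receives an
    explicit cue for where in the seasonal cycle each pattern sits."""
    numbers, year, n = [], None, 0
    for d in dates:
        if d.year != year:
            year, n = d.year, 0
        n += 1
        numbers.append(n)
    return numbers

dates = [datetime.date(1993, 12, 30), datetime.date(1993, 12, 31),
         datetime.date(1994, 1, 3), datetime.date(1994, 1, 4)]
print(day_numbers(dates))  # restarts at the new year
```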
Other Indexes
The two manipulation methods below were applied to the following indexes: Dow Jones,
DAX, Nikkei, and CAC 40.
Trend replacements
The following two differences are used as the indicators of the Index without the trend:
• Index − FTSE
• FTSE − (Index / £ exchange rate).
Historical data
This is done by calculating the difference between today’s index and the index n periods
ago; we used the 1 day, 1 month, 3 month and 12 month differences of the following
derivatives:
• Index
• Index − FTSE
• FTSE − (Index / £ exchange rate)
• RSI 14 days
• RSI 9 days.
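The two trend-replacement differences can be sketched as below, applied element-wise to a foreign index quoted against sterling (the values and the `usd_per_gbp` series are illustrative, not actual market data):

```python
def index_derivatives(index, ftse, ex_rate):
    """The two trend-replacement series described above:
        diff1 = Index - FTSE
        diff2 = FTSE - Index / exchange_rate
    computed element-wise over aligned daily series."""
    diff1 = [i - f for i, f in zip(index, ftse)]
    diff2 = [f - i / r for i, f, r in zip(index, ftse, ex_rate)]
    return diff1, diff2

# illustrative Dow Jones vs FTSE values and a $/£ rate
dow = [3700.0, 3720.0]
ftse = [3100.0, 3120.0]
usd_per_gbp = [1.48, 1.50]
d1, d2 = index_derivatives(dow, ftse, usd_per_gbp)
```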
Interest rates
The following shows the data manipulation carried out for UK interest rates, but it is
equally applicable to other nations’ rates.
Historical data
The following table shows the historical data that are expected to be important and
obtained using differences.
3 M Interbank rates
This is an additional factor used for Interbank rates and the following table shows the
differences that we wish to calculate.
Exchange rates
For the exchange rates, 1 day and 1 month (20 day) differences were calculated as
derivatives. These derivatives are used together with the raw values of the exchange
rates because, although exchange rates vary, they are normally bounded within certain
ranges.
                                   as is   1 day   1 month
US $/£ exchange rate                 ✓       ✓       ✓
French Franc/£ exchange rate         ✓       ✓       ✓
Japanese ¥/£ exchange rate           ✓       ✓       ✓
German Mark DM/£ exchange rate       ✓       ✓       ✓
Economic data
Here, only the 12 month percentage change is calculated, for use as a replacement for the
actual values, for the following economic data:
Futures
Currently, we have not yet used the Futures data extensively.
Acronym      Meaning
UK           United Kingdom
US           United States
GE           Germany
FR           France
JP           Japan
DJ           Dow Jones
NIK          Nikkei
BR           Base rate
3MIB         3 month interbank
INF          Inflation
>7Y BY       Bond yield, more than 7 years
20YR. GILT   20 year Gilt yield
30Y BY       30 year Bond yield
1day MOM | FTSE-(DAX/EX.RATE) | CAC 1D MOM | US 30Y BY 12M MOM | FR >7YBY-INF
2 day MOM | DAX 1D MOM | CAC 20D MOM | US BR-3MIB 1D MOM | FR 3MIB 1D MOM
1 week MOM | DAX 20D MOM | CAC 65D MOM | US BR-3MIB 20D MOM | FR 3MIB 20D MOM
25day MOM | DAX 65D MOM | CAC-FTSE 1D MOM | US 3MIB-30YBY 1D MOM | FR >7Y BY 1D MOM
50day MOM | DAX 12M MOM | CAC-FTSE 20D MOM | US 3MIB-30YBY 20D MOM | FR >7Y BY 20D MOM
65 days MOM | FTSE-DAX 1D MOM | CAC-FTSE 65D MOM | US 3MIB-30YBY 12M MOM | FR >7Y BY 12M MOM
1 year MOM | FTSE-DAX 20D MOM | FTSE-(CAC/EX.RATE) 1D MOM | US 30Y-INF 1D MOM | FR 3MIB->7YBY 1D MOM
% change of MOM over 10 days | FTSE-DAX 3M MOM | FTSE-(CAC/EX.RATE) 20D MOM | US 30Y-INF 20D MOM | FR 3MIB->7YBY 20D MOM
% change of MOM over 25 days | FTSE-DAX 12M MOM | FTSE-(CAC/EX.RATE) 65D MOM | US BR-INF 1D MOM | FR >7YBY-INF 1D MOM
% change of MOM over 50 days | Day numbers | CAC RSI 9D 1D MOM | US BR-INF 20D MOM | FR >7YBY-INF 20D MOM
Close-2day MAV | FT-DAXEX 1D MOM | CAC RSI 9D 20D MOM | JP BR-3MIB | UK-US 3MIB
Close-5 day MAV | FT-DAXEX 20D MOM | CAC RSI 9D 65D MOM | JP 3MIB-10Y BY | UK-GE 3MIB
Close-25 day MAV | FT-DAXEX 65D MOM | CAC RSI 14D 1D MOM | JP 10Y BY-INF | UK-FR 3MIB
Close-50 day MAV | FT-DAXEX 12M MOM | CAC RSI 14D 20D MOM | JP BR-INF | UK-JP 3MIB
3 day ROC | DAX RSI 9D 1D MOM | CAC RSI 14D 65D MOM | JP BR 1D MOM | US-GE 3MIB
5 day ROC | DAX RSI 9D 20D MOM | UK BR-3M IB | JP BR 20D MOM | GE-FR 3MIB
25 day ROC | DAX RSI 9D 65D MOM | UK 3MIB-20YR.GILT | JP 3MIB 1D MOM | GE-JP 3MIB
50 day ROC | DAX RSI 9D 12M MOM | UK 20YR.GILT-INFLATION | JP 3MIB 20D MOM | US$/£ EX. RATE
MACD | DAX RSI 14D 1D MOM | UK BR-INFLATION | JP 10Y BY 1D MOM | US$/£ 1D MOM
RSI 9 day | DAX RSI 14D 20D MOM | UK BR 1D MOM | JP 10Y BY 20D MOM | US$/£ 20D MOM
RSI 14 days | DAX RSI 14D 65D MOM | UK BR 20D MOM | JP 10Y BY 12M MOM | FRANCS/£ EX. RATE
Zero cl-cl vol | DAX RSI 14D 12M MOM | UK 3MIB 1D MOM | JP BR-3MIB 1D MOM | FR/£ 1D MOM
DJ-FTSE | NIKKEI-FT | UK 3MIB 20D MOM | JP BR-3MIB 20D MOM | FR/£ 20D MOM
FTSE-(DJ/EX.RATE) | FT-(NIKKEI/EX.RATE) | UK 20YR.GILT 1D MOM | JP 3MIB-10YBY 1D MOM | MARKS/£ EX. RATE
DJ 1D MOM | NIK 1D MOM | UK 20YR.GILT 20D MOM | JP 3MIB-10YBY 20D MOM | MARKS/£ 1D MOM
DJ 20D MOM | NIK 20D MOM | UK 20YR.GILT 12M MOM | JP 3MIB-10YBY 12M MOM | MARKS/£ 20D MOM
DJ 65D MOM | NIK 65D MOM | UK BR-3M IB 1D MOM | JP 10YBY-INF 1D MOM | YEN/£ EX. RATE
DJ 12M MOM | NIK 12M MOM | UK BR-3M IB 20D MOM | JP 10YBY-INF 20D MOM | YEN/£ 1D MOM
DJ-FTSE 1D MOM | NIK-FT 1D MOM | UK 3MIB-20YR.GILT 1D MOM | JP BR-INF 1D MOM | YEN/£ 20D MOM
DJ-FTSE 20D MOM | NIK-FT 20D MOM | UK 3MIB-20YR.GILT 20D MOM | JP BR-INF 20D MOM | UK GDP 12M % CHANGE
DJ-FTSE 65D MOM | NIK-FT 65D MOM | UK 3MIB-20YR.GILT 12M MOM | GE BR-3M IB | UK M. SUPPLY 12M % CHANGE
DJ-FTSE 12M MOM | NIK-FT 12M MOM | UK 20YR.GILT-INF 1D MOM | GE 3MIB-10YR BY | UK INF 12M % CHANGE
FT-DJEX 1D MOM | FT-NIKEX 1D MOM | UK 20YR.GILT-INF 20D MOM | GE BR 1D MOM | US GDP 12M % CHANGE
FT-DJEX 20D MOM | FT-NIKEX 20D MOM | UK BR-INF 1D MOM | GE BR 20D MOM | US M. SUPPLY 12M % CHANGE
FT-DJEX 65D MOM | FT-NIKEX 65D MOM | UK BR-INF 20D MOM | GE 3M IB 1D MOM | US INF 12M % CHANGE
FT-DJEX 12M MOM | FT-NIKEX 12M MOM | US BR-3M IB | GE 3M IB 20D MOM | FR GDP 12M % CHANGE
DJ RSI9D 1D MOM | NIK RSI 9D 1D MOM | US 3MIB-30Y BY | GE 10YR BY 1D MOM | FR M. SUPPLY 12M % CHANGE
DJ RSI9D 20D MOM | NIK RSI 9D 20D MOM | US 30Y BY-INF | GE 10YR BY 20D MOM | FR INF 12M % CHANGE
DJ RSI9D 65D MOM | NIK RSI 9D 65D MOM | US BR-INF | GE 10YR BY 12M MOM | GE GDP 12M % CHANGE
DJ RSI9D 12M MOM | NIK RSI 9D 12M MOM | US BR 1D MOM | GE BR-3M IB 1D MOM | GE M. SUPPLY 12M % CHANGE
DJ RSI14D 1D MOM | NIK RSI14D 1D MOM | US BR 20D MOM | GE BR-3M IB 20D MOM | JP GDP 12M % CHANGE
DJ RSI14D 20D MOM | NIK RSI14D 20D MOM | US 3M IB 1D MOM | GE 3MIB-10YR BY 1D MOM | JP M. SUPPLY 12M % CHANGE
DJ RSI14D 65D MOM | NIK RSI14D 65D MOM | US 3M IB 20D MOM | GE 3MIB-10YR BY 20D MOM | JP INF 12M % CHANGE
DJ RSI14D 12M MOM | CAC-FTSE | US 30Y BY 1D MOM | GE 3MIB-10YR BY 12M MOM |
FTSE-DAX | FTSE-(CAC/EX.RATE) | US 30Y BY 20D MOM | FR 3MIB->7YBY |
Note: The first 22 data sets (1 day MOM to Zero cl-cl vol.) are the derivatives of the FTSE Index.
However, it is generally accepted that a neural network which has more than one output performs less well than separate networks each having a single output. This is particularly true of NeuroShell, which uses a least squares minimisation technique to decide how to apportion its weight adjustments amongst several outputs. This means that the accuracy of each output is sacrificed in order to minimise the total error of all the outputs. With this in mind, we will have to build two networks to predict the short term and long term FTSE indexes separately. The question that then arises is what type of data will be suitable for each of the two networks.
Hence, we believe that the types of input data for both network will be similar in many
aspects, but the weightings will be different.
Initially we expected the returns (percentage change) of FTSE to prove useful as the predictive item. However, the tests carried out by Grashoff showed that the use of returns was not as successful as expected. This may be because price moves always come in discrete units (1p, 2p, etc.), so using absolute differences provides the neural net with more repetition. When returns are used, a change of 10p in price is an input of different value depending on the underlying price level at the start of the period. Returns therefore provide a more continuous set of values to present to the neural net, and also compensate for level. However, any benefit appears to be offset by the vastly increased number of distinct values. Some work was done on rounding the returns to a small number of significant figures, and while this gave some improvement, the end result was less accurate prediction than that achieved by straight differences.
The same observation was made for the use of differences of logarithms of FTSE. Hence, all our tests now use the FTSE difference (the value of FTSE n days ahead relative to today’s value) as the predictive item from the neural net.
In earlier tests we used the 2 day moving average (MAV), defined as (P_{t−1} + P_t) / 2, as the factor to be predicted, since the daily differences contain too much noise for the neural net. It is in the nature of the market that daily movements are generally overdone and that some correction occurs the following day. However, the 2 day moving average suffers from the problem that its average point in time is around midday rather than end of day: the prices of today and yesterday are recorded at close of business, so the average of the two days sits around midday.
A better average might then be what we have called the 3 day weighted average, that is

(P_{t−1} + 2P_t + P_{t+1}) / 4
This is centred at the close of business as required, but has the problem that it requires tomorrow’s value. Using it would therefore involve estimating at least two days forward.
As the graph below shows the 2 day moving average differs from the actual FTSE value
by less than 0.5% normally.
[Figure: FTSE/MAV(FTSE, 2D) × 100 − 100, plotted March 1991 – October 1992; values lie between about −2.5% and +3%]
However, with 3 day weighted average the difference in relation to the real FTSE value is
around 0.25%, half that of the 2 day moving average (see graph below).
It is unlikely that any wider moving average would be of value because of the incidence of ‘special’ events. Such an average would have to be treated with care because it includes future information.
Perhaps, if (P_{t−1} + 2P_t + P_{t+1}) / 4 is assigned to day t+1, this problem disappears. If it were assigned as a difference to P_t, the value would be

P_t − (P_{t−1} + 2P_t + P_{t+1}) / 4 = (2P_t − P_{t−1} − P_{t+1}) / 4

which is perhaps a measure of how much yesterday’s value was an over or under estimate of some ‘true’ underlying value for the index.
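The two averages discussed above, and the residual derived from them, can be sketched as follows (function names are our own):

```python
def mav2(p, t):
    # 2 day moving average: (P[t-1] + P[t]) / 2, centred around midday
    return (p[t - 1] + p[t]) / 2

def wav3(p, t):
    # 3 day weighted average: (P[t-1] + 2*P[t] + P[t+1]) / 4,
    # centred at close of business but requiring tomorrow's value
    return (p[t - 1] + 2 * p[t] + p[t + 1]) / 4

def residual(p, t):
    # P[t] - wav3 = (2*P[t] - P[t-1] - P[t+1]) / 4: how far the day's
    # close sits above or below the smoothed 'true' level
    return p[t] - wav3(p, t)
```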
Initial results from tests using the 3 day MAV showed that the overall percentage error is
2.8% (or 1.78% and 4% for the first and second half of the test set) and therefore use of
this measure was discontinued for the moment.
NETWORK TUNING
After 6 months intensive use of NeuroShell we have made the following observations
with regards to the package.
Hidden nodes
The number of hidden nodes suitable for a particular application is still an inexact
science. NeuroShell provides a simple tool, which in itself is a network, called
HIDNODES which can be used to determine the number of hidden nodes required for a
particular problem. HIDNODES expects three inputs (number of input nodes, number of
output nodes and a figure representing the complexity of the patterns in the sample data
set) and produces as output the number of hidden nodes to use.
This tool, though useful, does not guarantee that the number of hidden nodes it suggests will work for the problem, since it requires the user to provide the network with a subjective figure (from 0 to 10, where 0 means not very complex and 10 very complex). Depending on this figure, the suggested number of hidden nodes can vary by tens (as shown in the following table).
Table 6.1 — Suggested number of hidden nodes in relation to the complexity of the problem
As an alternative, a good rule of thumb for deciding the number of hidden nodes required is that the total number of weights in a network should be much less than the total number of patterns in the sample set and the number of output nodes. This avoids the problem of overfitting, i.e. the network memorising instead of generalising from the given input sample data, which results in the network producing very good results on the sample data set but doing very poorly on other data sets. Using this rule of thumb, we decided to use 25 hidden nodes. We arrived at this figure as follows:
In a 3-layer, fully connected network, each input node is connected to all the hidden nodes and similarly each output node is connected to all the hidden nodes, as shown below.
O1 O2
H1 H2 H3 H4 H5
I = input node
H = hidden node
O = output node
I1 I2 I3
Total connections, Tc = NH (NI + NO)
where NH = number of hidden nodes
NI = number of input nodes
NO = number of output nodes.
In the data sets that we used in the experiment, the total number of patterns, or cases in NeuroShell terminology, is around 1500 (approximately 5½ years’ worth of data), the total number of input nodes is around 40, and there is usually only one output in the networks.
Note that this is by no means a definitive or strict rule. We used this formula to give us an initial value and found it to be of value.
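The rule of thumb above can be checked with a one-line calculation, using the figures quoted in the text (around 40 inputs, 1 output, roughly 1500 patterns):

```python
def total_connections(n_hidden, n_input, n_output):
    # Tc = NH * (NI + NO) for a 3-layer, fully connected network
    return n_hidden * (n_input + n_output)

# With 25 hidden nodes, 40 inputs and 1 output the network carries
# 25 * 41 = 1025 weights, comfortably below the ~1500 training patterns.
tc = total_connections(25, 40, 1)
```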
Threshold
The value of 0.4 was found to give good predictions (together with the value of Momentum, see below) and is occasionally reduced to 0.2 in some tests once the network has learned for some time and the accumulated errors have stopped reducing.
Momentum (0.0–0.9)
The term momentum in NeuroShell is different from that used in financial analysis where
it is used to mean the difference between the index value today and the value some
periods ago. In NeuroShell, momentum (µ) is a factor which determines the proportion of
the last weight change which is added to the new weight change.
In tests, it was found that the value of 0.6 (together with the value of 0.4 for the Learning
rate) produced the best results.
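The weight-change rule described above can be sketched as follows; the defaults mirror the 0.4 and 0.6 values reported in the text, while the function name and signature are our own illustration, not NeuroShell’s API:

```python
def weight_update(grad, prev_delta, learning_rate=0.4, momentum=0.6):
    """New weight change: a gradient-descent step plus a fraction
    (the momentum) of the previous weight change."""
    return -learning_rate * grad + momentum * prev_delta
```

The momentum term smooths successive updates: when consecutive gradients point the same way the steps accumulate, and when they alternate sign the oscillation is damped.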
We have found that this value has some indirect effect on the learning accuracy of the
network, and the value of 0.0001 was found to produce networks that have learned
accurately.
Number of presentations
This is the number of times the data sets are presented to the network on a case by case
basis, where a case consists of all the financial indicators, e.g. FTSE, Dow Jones, etc., on
a particular day. Once the input case is seen by the network it produces an output and
compares it with the expected value and makes any necessary adjustments to the weights
in the nodes.
The performance of a network relates closely to the number of times a particular case is seen by the network: the more times a case is seen by the neural net, the better the output for that case will be. However, care should be taken not to over-present (i.e. over-train) the network by presenting the cases in the learning set more often than necessary. It is true that the network’s predictions get better as the number of presentations increases, but this is only true for the learning set.
A network that has been trained too well on the learning set is normally useless for predicting any events or values from data it has not seen before. This is the classic case of the network ‘memorising’ the learning set and hence being unable to generalise to any data that lies outside it.
Again, deciding how many presentations will produce an adequately learned, well-generalising network is still an inexact science.
Presentation type
There are two ways in which the input data can be presented to the network: random and rotation.
Random
In this method, the patterns, or cases, from the sample set are presented to the network in random order. The advantage of this method is that learning is usually quicker than with the rotation method. However, if the number of cases in the sample set is sufficiently large, ensuring that every case is presented to the network at least once takes a great deal longer; otherwise, learning takes place only on the randomly chosen patterns, and some cases may never be seen by the network.
Rotate
As the name suggests the network learns by reading the data from the sample set, one day
at a time, in sequential and rotational order (from top to bottom of the files, and back to
top again). This method is useful for learning and predicting events which contain
historical information, and also ensures that all of the patterns in the sample set are seen
by the network. As the FTSE prediction involves the use of historical data this method of
data presentation was used most often and produced better results than the random
presentation.
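The difference between the two presentation orders can be made concrete with a short sketch (the helper names are our own):

```python
import random

def rotate_order(n_cases, n_presentations):
    # Sequential, wrap-around order: every case is guaranteed to be
    # seen once per full pass through the sample set.
    return [i % n_cases for i in range(n_presentations)]

def random_order(n_cases, n_presentations, seed=0):
    # Random order: if n_presentations is small relative to n_cases,
    # some cases may never be drawn at all.
    rnd = random.Random(seed)
    return [rnd.randrange(n_cases) for _ in range(n_presentations)]
```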
The effect of the latter condition can be seen in one of the tests (FT10ST01), where the values of the test set fall outside the range of the sample set (+100 to −300). The graph below shows that for the values below −300 in the latter part of the test set the neural net is hopeless at predicting outside the known range: the percentage error was close to 8% in that region, compared with 1.3% for the first half of the test set.
[Figure: actual FTSE +2 days vs predicted, October 1991 – October 1992; values from +50 down to −500. NOTE: File = FT10ST01, Presentations = 3.4M, Threshold = 0.4, Momentum = 0.6]
Figure 6.2 — Limitation of neural net to predict values outside the known range
RESULTS
Results evaluation methods
Analysis methods
Trading requires accurate prediction of price movements over a one/two day period. Any
longer and although the prediction may come right, intermediate adverse values could
break position limits. Even with one day predictions, intra-day values could hurt but this
is less likely.
It is required to predict the one day, two day and three day closing values. Accurate one
day predictions would be the ideal. Experience suggests that there are many
circumstances where a move in one direction is reversed, at least partially, the following
day and hence our expectation is that a prediction of the two day value will be more
reliable. If the prediction for the day following our target day (+1 or +2) is in the same
direction as the prediction for the target day then our expectation that it will be fulfilled
should be greater. Hence the need for three day prediction, to add confidence to the two
day estimate.
It is important in trading to accept that on some days we make profits and on some days losses; we do not expect to be right all the time, but we have to restrict the losses and expect the cumulative profit to grow steadily. The maximum cumulative downside must also be acceptable.
In trading using the neural network, we therefore need to make decisions to be long,
neutral or short as this is a system for aiding positioning. We don’t want to make
mistakes, but we don’t want to miss opportunities.
All predictions are for differences from today’s level; not for absolute values with which
the neural network cannot cope.
Direction only
Numeric and graphical analysis should be used.
Straight count
Compares the actual and predicted direction of the FTSE for each case in the test set, with output as follows:
Assign numeric values, e.g. success=1; failure=0, to each test case, and the overall result
is computed by finding the average number of successful cases, e.g. 125 from 250 implies
a 50% success rate.
In addition, the result is subdivided into two halves to give a feel for the effect of distance
from the learning set. (This is because the training set and the test set are derived from the
same source data file, chronologically ordered. This means that the first case in the test
set follows immediately from that of the last case in the learning set.)
The overall total should be split to indicate whether success is better on –ve or +ve
predictions and –ve or +ve moves.
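The straight count described above might be sketched as follows, treating each move as a signed difference (the helper names are our own):

```python
def straight_count(actual, predicted):
    """Directional accuracy: success=1 when the actual and predicted
    moves share a sign, failure=0 otherwise; returns the success rate."""
    scores = [1 if a * p > 0 else 0 for a, p in zip(actual, predicted)]
    return sum(scores) / len(scores)

def split_halves(actual, predicted):
    # Subdivide the test set to gauge the effect of distance
    # from the learning set.
    mid = len(actual) // 2
    return (straight_count(actual[:mid], predicted[:mid]),
            straight_count(actual[mid:], predicted[mid:]))
```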
Magnitude
Look at the magnitude of predicted changes compared with the last known value of the
index at that date and separate out moves of less than x% absolute where x is probably
0.5% or 1%.
success if |PC| > |xI| AND (PC/AC > 0), where PC = predicted change, AC = actual change, I = Index value
This could be amended to include as failures small predicted moves which turned out to
be large.
Again, this analysis should be split into first half, second half and all of test set. Also, the
total should be split to show if success is better for +ve or –ve predictions and for +ve or
–ve moves.
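A sketch of this success criterion, with PC and AC taken as the predicted and actual changes and x defaulting to the 0.5% mentioned above (the function name is our own):

```python
def magnitude_success(pc, ac, index_value, x=0.005):
    """Success when the predicted change PC exceeds x of the index level
    in magnitude AND agrees in direction with the actual change AC."""
    return abs(pc) > abs(x * index_value) and pc * ac > 0
```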
Quantitative analysis
Initial quantitative analysis carried out on the output results is on the basis of percentage
errors. The percentage error (PE) is calculated as follows:
PE = (PI − AI) / AI × 100

where PI is the predicted index value and AI the actual index value. The following statistics are then computed:
i) Average PE, (1/n) Σ PE(t)
ii) Std. Dev. of PE
iii) Average absolute PE
iv) Std. Dev. of absolute PE
v) Max. PE
vi) Min. PE.
The above analyses were carried out for first half, second half, and whole of both the test
set and the sample set.
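The percentage-error statistics above can be sketched with the standard library (the function name and the dictionary keys are our own):

```python
from statistics import mean, stdev

def pe_stats(predicted, actual):
    """Percentage error PE = (PI - AI) / AI * 100 per case, plus the
    summary statistics listed in the text."""
    pe = [(p - a) / a * 100 for p, a in zip(predicted, actual)]
    abs_pe = [abs(e) for e in pe]
    return {
        "avg_pe": mean(pe),
        "std_pe": stdev(pe),
        "avg_abs_pe": mean(abs_pe),
        "std_abs_pe": stdev(abs_pe),
        "max_pe": max(pe),
        "min_pe": min(pe),
    }
```

Running it on the first half, second half and whole of a test set reproduces the split analysis described above.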
Accurate 3 month predictions would be the ideal. However, as these Index values are being predicted as differences, which can be subject to significant variation in the specific daily values at the start and end, some averaging is desirable.
Estimation of the two day average as a proxy for the FTSE was considered a good
compromise. Tests showed that it generally varied by less than 0.5% from the FTSE
value itself, and except on exceptional days the variation was within 1% bounds. As a
measure of noise, the volatility of the 2 day MAV (moving average) was 11.7%
compared to 16% for the FTSE.
The tests for short term values were applied to these longer term estimates as well. The results needed to be presented initially as success in predicting the 2 day MAV, but also in terms of predicting the FTSE itself.
Preliminary results
In our earlier experiments with NeuroShell, the following tests produced the most promising results. Of these three types of test, the latter two gave the best results and are shown on the following pages.
Result sheets
The result sheets on the following pages are of three varieties:
Test Record Sheet—contains the summary of the conditions, input data sets and their
contributions, accuracy of prediction and any other information that is relevant to the
test
Line Graph—showing the actual and the predicted outputs plotted over time (usually
from October 1991 to October 1992). A 100% accurate prediction means that the
actual and the predicted graphs will be identical.
Scattered Graph—compares the values of the actual and the predicted outputs. Again, a 100% accurate prediction means that the scattered values will lie along the y=x line.
Although the overall direction accuracy is 80%, the average percentage error is 1.38%, above the limit of acceptance. However, if we look only at the first half of the test, i.e. the first 6 months from the last available data used in training the network, we can see that the average error is 0.33%. This shows that the neural net produced better predictions for events in the near future than for those more than 6 months away.
Figure 7.1 shows the actual and predicted values of FTSE 2 day momentum plotted over
October 1991- October 1992, and figure 7.2 shows the comparison of the actual and
predicted values.
Figure 7.3 shows the actual and predicted values of FTSE 65 day momentum plotted over
October 1991- October 1992, and Figure 7.4 shows the comparison of the actual and
predicted values.
Numerical Analysis
To minimise the number of input parameters required, correlation analysis was first carried out on the derivatives of the FTSE index. Any data set which has a high correlation with other data sets is of little value as an input to the neural net, because it contains no information that is not already found in the correlated data sets. The table below shows the results of the correlation analysis. From these results we can remove those data sets which are highly correlated (> 85%) with more than one other.
FTSE 100 | 1 day Returns | 1day MOM | 2 day MOM | 1 week MOM | 25day MOM | 50day MOM | 65 day MOM | 1 year MOM | % change of MOM over 10 days | % change of MOM over 25 days | % change of MOM over 50 days
FTSE 100 1
1 day Returns 1
1day MOM 0.98 1
2 day MOM 1
1 week MOM 1
25day MOM 1
50day MOM 1
65 days MOM 0.87 1
1 year MOM 1
% change of MOM over 10 days 1
% change of MOM over 25 days 0.98 1
% change of MOM over 50 days 0.98 0.86 1
Close-2day MAV | Close-5 day MAV | Close-25 day MAV | Close-50 day MAV | 3 day ROC | 5 day ROC | 25 day ROC | 50 day ROC | MACD | RSI 9 day | RSI 14 days | Zero cl-cl- vol.
Close-2day MAV 1
Close-5 day MAV 0.85 1
Close-25 day MAV 1
Close-50 day MAV 0.88 1
3 day ROC 0.92 1
5 day ROC 0.9 1
25 day ROC 0.88 0.92 1
50 day ROC 1
MACD 0.85 1
RSI 9 day 1
RSI 14 days 0.86 0.97 1
Zero cl-cl- vol. 1
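The pruning step described above might be sketched as follows, with a hand-rolled Pearson correlation; the 0.85 threshold and the “more than one other” rule come from the text, while the function names and the greedy keep-order are our own:

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length series
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def prune_correlated(datasets, threshold=0.85):
    """Drop any series that is highly correlated (> threshold) with more
    than one of the series kept so far; returns the kept names."""
    kept = {}
    for name, series in datasets.items():
        hits = sum(1 for s in kept.values()
                   if abs(pearson(series, s)) > threshold)
        if hits <= 1:
            kept[name] = series
    return list(kept)
```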
Direct experimentation
Due to the limitations of NeuroShell as well as the large number of input data sets that we
have generated a total of 3 tests had to be devised to test for the suitability of the inputs,
namely:
In all of the tests the +2 day momentum of the FTSE 2 day MAV was used as the
predictive item.
From the graphs we have extracted the extreme cases, i.e. the inputs which made the most and least significant contributions.
Observations
The majority of the Nikkei index derivatives fall in the medium to low significance region, whereas the DAX, Dow Jones and CAC index derivatives made significant contributions. This is not surprising, and at the same time confirms that the Japanese market plays a less influential role in the movements of FTSE. It is safe to state that we can remove most Nikkei index derivatives from future tests, since they contribute little.
The 1D MOMs for many items are of low significance and could be excluded, perhaps in favour of 2D MOMs.
The Day Numbers, representing seasonality, are of higher significance than might have been anticipated.
Table 7.3 — Most and least significant contributions of interest rates derivatives
Observations
Again, the 1D MOMs, the Japanese indicators and inflation seem the least significant. The 2D MOMs seem to have significant value, as do the 12 month MOMs.
The continued significance of both the 9 and 14 day FTSE RSIs suggests that we should try this derivative for other items of data.
Table 7.4 — Most and least significant contributions of exchange rate and GDP derivatives
Observations
Surprisingly, the Yen/£ exchange rate comes out as significant. Perhaps something from Japan has to be!
FTSE, Exchange rate and economic data derivatives (65 days prediction)
When the previous test was rerun to predict the 65 day, instead of the 2 day, momentum, we obtained slightly different results, as shown in graphs FT14TD04.CFT and FT14TD04.CFO. The table below shows the most and least significant contributors.
Observations
It can be seen that the 12 month % changes of the inflation and GDP indicators are the prominent factors in the longer term prediction, compared with the short-term (2 day) prediction.
CONCLUSIONS
Highlights
At this stage of the project we are, to some extent, still trying to understand the major factors involved in training a network as much as concentrating our efforts on producing networks which can predict with high accuracy. However, in the process of trying to understand these factors we have also produced some networks which gave high accuracy in their predictions, namely:
• Predicting the 2 day MAV of FTSE 65 days (3 months) ahead gave good results, e.g. an overall direction accuracy of 79%.
• Predicting the residual of FTSE (i.e. with an approximated linear trend removed from the raw values) 2 days ahead also gave good results, with an overall directional accuracy of 80% and an average percentage error of 1.4%. In particular, the average error for the first part of the test set was as low as 0.33%.
• The predictions on the first half of the test sets are better than those on the latter part of the test sets.
Conclusions
• It also appeared that the Japanese market indicators did not play a major role in the tests. We will have to carry out further tests to see whether all of the Japanese inputs can be removed without loss of predictive accuracy.
• Predicting FTSE 2 day momentum, without trend, produced acceptable results; we need to carry out further tests to see if this is also true for long-term predictions.
Network Tuning
In terms of using the NeuroShell package, we have made the following observations:
• A network with more than one output consistently failed to converge (minimise errors) on the training set, and hence produced poor predictions.
• A total of 25 hidden nodes was found to be satisfactory in most of the tests, when the number of input nodes is between 14 and 45. We have not done extensive tests on networks with a larger number of input nodes.
• The values for Threshold and Momentum which consistently gave good results were found to be 0.4 and 0.6 respectively, again when the number of input nodes is between 14 and 45.
• Reducing the values of Threshold and Momentum by 50% after the network has been trained for some time (around 2M presentations) did not improve the overall predicted results.
• The back-propagation algorithm used for learning the past experience of the market cannot handle time-series data. The ability to handle time-series data is of great importance for financial prediction, since the network needs to learn the market’s past behaviour.
• The current method of preparing input data sets, especially the calculation of derivatives, in a spreadsheet is time consuming, laborious and, most importantly, error prone. A typical test normally takes from half to a full working day for preparation and validation of the data.
• NeuroShell allows networks with only a single hidden layer. With more hidden layers a network is able to recognise a larger number of market scenarios and predict with greater accuracy.
• There are no security measures to protect the network from inexperienced users: the network parameters can easily be altered. This is dangerous because a network can only predict accurately as long as its parameters remain unchanged; a novice user tinkering with the parameters could render a well-trained network next to useless.
Analysis methods
The results of directional analysis are somewhat misleading. The figure below shows a result from one of the tests (FT10ST01). It can be seen from the graph that the accuracy of direction appears better in the first half than in the second half of the test set.
[Figure: actual vs predicted values for test FT10ST01, October 1991 – October 1992; values from +50 down to −500]
However, the directional analysis showed that the directional accuracy is better in the second half than in the first. The reason for this is as follows. During the first half of the test set the actual values fluctuate within a small range around the 0 line, so any small error in the prediction pushes the predicted value below or above the 0 line. Since directional accuracy only credits those predicted values which are in the same region (above or below the 0 line) as the actual values, the overall directional accuracy in the first half is lower than in the second half of the set, where, even though the differences between the actual and predicted values are large, both happened to be in the same region. The implication is that the directional accuracy of the predicted results is dependent upon the relative position of the 0 line.
A better assessment of the results was needed, so for each case we use the percentage
error and the standard deviation of the errors between the actual and predicted
outputs. This is done for both the learning set and the test set for comparison: a
network that does very well on the learning set (over 90% accuracy) but poorly on the
test set may simply be memorising instead of generalising. By analysing both sets we
hope to determine the right amount of learning (i.e. the number of presentations of
the cases) required to give good results on the test set and on any data the network
has not seen before.
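The two assessment measures described above can be sketched as follows. This is an illustrative Python sketch, not the software actually used in the project; the function names, the sample values, and the convention that values >= 0 count as "above" the 0 line are our assumptions.

```python
def directional_accuracy(actual, predicted):
    # Fraction of cases where the prediction falls on the same side of the
    # 0 line as the actual value (>= 0 counted as "above", an assumption).
    same = sum((a >= 0) == (p >= 0) for a, p in zip(actual, predicted))
    return same / len(actual)

def error_stats(actual, predicted):
    # Mean absolute percentage error and standard deviation of the errors,
    # the two measures used to compare the learning and test sets.
    errors = [p - a for a, p in zip(actual, predicted)]
    mean = sum(errors) / len(errors)
    std = (sum((e - mean) ** 2 for e in errors) / (len(errors) - 1)) ** 0.5
    pct = [100 * abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    return sum(pct) / len(pct), std
```

Note how `directional_accuracy` ignores the size of the error entirely, which is exactly why it flatters the second half of the test set.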
APPENDICES
APPENDIX A
Back-propagation algorithm
The following is an extract from a paper by Camp7. A network learns by successive
repetitions of a problem, making smaller errors with each iteration. The most commonly
used function for the error is the sum of the squared errors of the output units:
E = (1/2) ∑i (yi − di)²
The value di is the desired output of unit i, and yi is its actual output, where yi is the
sigmoid function 1/(1 + e−x). To minimise the error, take the derivative of the error
with respect to wij, the weight between units i and j:
∂E/∂wij = yi yj (1 − yj) βj
The error can then be calculated directly from the links going into the output units. For
hidden units, however, the derivative depends on values calculated at all the layers that
come after it. That is, the value β must be back-propagated through the network to
calculate the derivatives.
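A minimal numerical sketch of this gradient for a single weight into an output unit follows; the activation values, the learning rate, and the variable names are assumed for illustration and are not from the paper. For an output unit the back-propagated error term reduces to (yi − di).

```python
import math

def sigmoid(x):
    # Sigmoid activation from the text: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical single connection: hidden activation y_j feeding output unit i.
y_j, w_ij, d_i = 0.6, 0.4, 1.0

y_i = sigmoid(w_ij * y_j)          # actual output of unit i

# For E = 1/2 * (y_i - d_i)^2 with a sigmoid output, the chain rule gives
# dE/dw_ij = (y_i - d_i) * y_i * (1 - y_i) * y_j, where (y_i - d_i) plays
# the role of the back-propagated error term.
grad = (y_i - d_i) * y_i * (1.0 - y_i) * y_j

# One gradient-descent step with an assumed learning rate of 0.5.
w_new = w_ij - 0.5 * grad
```

Since y_i is below the target d_i here, the gradient is negative and the step increases the weight, as expected.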
7 Drew van Camp, "Neurons for Computers", Scientific American, September 1992, pp. 125-127.
APPENDIX B
FTSE Index analyses
The following is the complete set of analysis that we have performed on the FTSE Index.
The analyses that were performed are as follows:
• Momentum (close to close difference)
• Returns
• Percentage Change of momentum
• Moving Averages
• Rate of Change
• Moving Average Convergence-Divergence
• Relative Strength Index
• Zero close-close volatility
The formulae below, described using Microsoft Excel notation, assume that the
worksheet is set up as follows (the FTSE Index in column A, the derived measure in
column B):

Row   FTSE INDEX   Derivative
 1    1234.5
 2    1345.6
 3    1456.7
 4    1567.8
 5    1678.9
 6    1789.0
 7    1890.1
 8    1901.2
 9    2012.3
10    2123.4
11    2234.5
Momentum
Description
This is a measure of the difference between today's index and that of a previous day,
usually over 1, 2, 5, 25 and 50 days. For use in NeuroShell this measure is preferred
to the absolute value of the index.
Formula
Momx = v - v[n]
where n= 1 to 260 days.
Excel formula
Momx=Ax-A(x-n)
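As a sketch of the same calculation in Python (the function name is ours), mirroring the Excel formula above:

```python
def momentum(v, n):
    # Mom_x = v_x - v_{x-n}, mirroring the Excel formula Ax - A(x-n);
    # output starts at the first index x with n days of history behind it.
    return [v[x] - v[x - n] for x in range(n, len(v))]
```
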
Returns
Description
This is the logarithmic (continuously compounded) return of the index over a given
number of days, expressed as a percentage.
Formula
Returns = ln(v / v[n]) ∗ 100
Excel formula
Returnsx= LN(Ax/A(x-n))*100
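The same measure as a Python sketch (function name assumed):

```python
import math

def log_returns(v, n=1):
    # Returns_x = ln(v_x / v_{x-n}) * 100, as in the Excel formula above.
    return [math.log(v[x] / v[x - n]) * 100 for x in range(n, len(v))]
```
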
Percentage Change of Momentum
Description
This is a measure of the percentage change of momentum between today's index and
that of a previous day, usually over 10, 25 and 50 days.
Formula
PCM = ((v − v[n]) / v) ∗ 100
where n = 10, 25 and 50 days.
Excel formula
PCMx = (Ax-A(x-n))/Ax*100
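As a Python sketch of the formula (function name assumed):

```python
def pcm(v, n):
    # PCM_x = (v_x - v_{x-n}) / v_x * 100; note the denominator is
    # today's value, matching the Excel formula above.
    return [(v[x] - v[x - n]) / v[x] * 100 for x in range(n, len(v))]
```
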
Moving Averages
Description
This is a measure based on the arithmetic mean of the index. Of the various moving
average measures this is the simplest, often known as the Simple Moving Average
(SMA). A moving average smooths out fluctuations in values and may help to indicate
trends in the market. A shorter moving average (i.e. when n is small) is more sensitive
to changes and results in less smoothing than a longer moving average.
Normal usage is to compare the value of the ROC with the raw index data. A
divergence between the ROC and the price, followed by a break in the trend, indicates
a signal to buy or sell.
Formula
MAV = v[0] − (1/n) ∗ ∑(i=0..n−1) v[i]
Excel formula
MAVx = Ax - SUM(A(x-n+1):Ax)/n
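The same measure (today's value minus its n-day simple moving average) as a Python sketch, with the function name assumed:

```python
def mav(v, n):
    # MAV_x = v_x - (1/n) * sum of the n most recent values ending at x,
    # i.e. the deviation of today's value from its n-day SMA.
    return [v[x] - sum(v[x - n + 1 : x + 1]) / n for x in range(n - 1, len(v))]
```
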
Rate of Change
Description
The rate of change measures how fast the momentum of the index is changing.
Formula
ROC = (v / v[n]) × 100
Excel formula
ROCx= (Ax/A(x-n))*100
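As a Python sketch (function name assumed):

```python
def roc(v, n):
    # ROC_x = v_x / v_{x-n} * 100; a value above 100 means the index
    # is higher than it was n days ago.
    return [v[x] / v[x - n] * 100 for x in range(n, len(v))]
```
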
Moving Average Convergence-Divergence
Description
This is an indicator of overbought and oversold signals in the market. This measure is
obtained by working out the difference of the two exponential moving averages of short
and long periods. When the difference in value is greater than the exponential moving
average of the difference, it can be a signal to buy. Conversely, when the difference in
value is less than the exponential moving average of the difference, it can be a signal to
buy.
In addition, if the MACD lines are too far above or below the zero line, they could
indicate an overbought or oversold situation respectively.
Formula
w= EMA(v,sf2) - EMA(v,sf1)
MACD = EMA(w,sf3)
where EMA(v,sf) is defined by:
EMA(v,sf)x = sf ∗ vx + (1 − sf) ∗ EMA(v,sf)x−1, with EMA(v,sf)1 = v1
where sf, sf1, sf2, sf3 = smoothing factor (0.0-1.0), and sf2 > sf1
Excel formula
Here, the EMA for long and short periods are calculated first, in two different columns,
e.g.
EMA1(sf1) = v1 (for the first value)
EMAx(sf1) = 0.02∗vx + 0.98∗EMAx−1(sf1)
EMA1(sf2) = v1 (for the first value)
EMAx(sf2) = 0.05∗vx + 0.95∗EMAx−1(sf2)
MACDx = EMAx(sf3) = 0.1 × [EMAx(sf2) − EMAx(sf1)] + 0.9 × MACDx−1
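The EMA recursion and the MACD construction can be sketched in Python as follows; the default smoothing factors follow the worked Excel example (0.02, 0.05, 0.1), and the function names are ours:

```python
def ema(values, sf):
    # EMA recursion from the text: the first output equals the first value,
    # then EMA_x = sf * v_x + (1 - sf) * EMA_{x-1}.
    out = [values[0]]
    for v in values[1:]:
        out.append(sf * v + (1 - sf) * out[-1])
    return out

def macd(values, sf1=0.02, sf2=0.05, sf3=0.1):
    # MACD = EMA(EMA(v, sf2) - EMA(v, sf1), sf3), with sf2 > sf1 so the
    # first term is the shorter-period (more responsive) average.
    diff = [s - l for s, l in zip(ema(values, sf2), ema(values, sf1))]
    return ema(diff, sf3)
```
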
Relative Strength Index
Description
This is an indicator of trend reversals in the market, and is preferred over the momentum
indicators.
Formula
If MEMA(u,n) = MEMA(d,n) = 0
then
RSI(v,n) = 50
else
RSI(v,n) = 100 × MEMA(u,n) / [MEMA(u,n) + MEMA(d,n)]
where v = close
u = max(v − v[1], 0)
d = max(v[1] − v, 0).
MEMA(v,n), the modified exponential moving average with smoothing factor 1/n, is
given by:
MEMA(v,n)x = [vx + (n − 1) × MEMA(v,n)x−1] / n
Excel formula
Again, MEMA for the two cases, MEMA(u,n) and MEMA(d,n), are calculated first.
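The whole RSI calculation can be sketched in Python; the function names are ours, and the MEMA form (smoothing factor 1/n, the standard Wilder-style modified EMA) is an assumption, as the report's own MEMA definition is not reproduced here.

```python
def mema(values, n):
    # Modified EMA with smoothing factor 1/n (assumed Wilder-style form):
    # MEMA_x = (v_x + (n - 1) * MEMA_{x-1}) / n, seeded with the first value.
    out = [values[0]]
    for v in values[1:]:
        out.append((v + (n - 1) * out[-1]) / n)
    return out

def rsi(v, n):
    # u = upward moves, d = downward moves, both floored at zero.
    u = [max(v[x] - v[x - 1], 0) for x in range(1, len(v))]
    d = [max(v[x - 1] - v[x], 0) for x in range(1, len(v))]
    up, down = mema(u, n)[-1], mema(d, n)[-1]
    if up + down == 0:
        return 50.0          # no net movement either way
    return 100.0 * up / (up + down)
```

A steadily rising series gives an RSI of 100, a steadily falling one 0, and a flat one 50, matching the three branches of the formula.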
Zero close-close volatility
Description
This is an estimate of volatility in the market and the major assumption here is that the
underlying distribution has a zero trend.
Formula
y = ln(close/close[1])
t = time (in years) until end of period
ZCCV = 100 ∗ √[(1/n) ∗ ∑(i=0..n−1) y[i]² / (t[i] − t[i+1])]
     = 100 ∗ √[(1/(n−1)) ∗ ∑(i=1..n) yi²] ∗ √256
Excel formula
ZCCVx = 100*STDEV(Ax:Ax-n)*SQRT(256)
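A Python sketch of the annualised form (function name assumed):

```python
import math

def zccv(v, n):
    # Zero close-close volatility over the last n log returns, annualised
    # with 256 trading days, under the stated zero-trend assumption.
    y = [math.log(v[x] / v[x - 1]) for x in range(1, len(v))][-n:]
    var = sum(r * r for r in y) / (n - 1)   # zero-mean "variance"
    return 100.0 * math.sqrt(var) * math.sqrt(256)
```

Note that Excel's STDEV subtracts the sample mean, whereas this sketch uses the zero-mean form of the formula; under the zero-trend assumption the two agree closely.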