
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 6, NOVEMBER 1997

Neural Networks in Financial Engineering: A Study in Methodology

Apostolos-Paul N. Refenes, A. Neil Burgess, and Yves Bentz

(Invited Paper)

Abstract—Neural networks have shown considerable successes in modeling financial data series. However, a major weakness of neural modeling is the lack of established procedures for performing tests for misspecified models, and tests of statistical significance for the various parameters that have been estimated. This is a serious disadvantage in applications where there is a strong culture for testing not only the predictive power of a model or the sensitivity of the dependent variable to changes in the inputs, but also the statistical significance of the finding at a specified level of confidence. Rarely is this more important than in the case of financial engineering, where the data generating processes are dominantly stochastic and only partially deterministic. Partly a tutorial, partly a review, this paper describes a collection of typical applications in options pricing, cointegration, the term structure of interest rates, and models of investor behavior which highlight these weaknesses and propose and evaluate a number of solutions. We describe a number of alternative ways to deal with the problem of variable selection, show how to use model misspecification tests, deploy a novel way based on cointegration to deal with the problem of nonstationarity, and generally describe approaches to predictive neural modeling which are more in tune with the requirements for modeling financial data series.

Index Terms—Cointegration, computational finance, financial engineering, model identification, model selection, neural networks, variable selection, volatility, yield curve.

Manuscript received August 10, 1996; revised August 10, 1997. This work was supported in part by the ESRC under the ROPA program, Research Grant R022 250 057, Barclays, Citibank, Hermes Investment Management, the Mars group, and Sabre Fund Management. The methodology and applications described in this paper were developed over a period of two years at the Decision Technology Centre, London Business School.
The authors are with the London Business School, Regents Park, London NW1 4SA, U.K.
Publisher Item Identifier S 1045-9227(97)07972-1.

I. ACTIVE ASSET MANAGEMENT, NEURAL NETWORKS, AND RISK

THE ultimate goal of any investment strategy is to maximize returns with the minimum risk. In the framework of modern portfolio management theory, this is achieved by constructing a portfolio of investments which is weighted in a way that seeks to achieve the required balance of maximum return and minimum risk. The construction of such an optimal portfolio clearly requires a priori estimates of asset returns and risk. Traditionally, it used to be accepted that returns are random and that the best prediction for tomorrow's return is today's return. Over a longer period, expected returns were calculated by averaging historical returns. Any deviation from this ("naive") prediction was considered as unpredictable noise, and so asset risks were estimated by the standard deviation of historical returns.

Subsequently, portfolio theory suggested that the efficient frontier [51] is obtained by solving for the weights which maximize a utility of the following form:

$$U = \sum_i w_i\, E(R_i) \;-\; \lambda \sum_i \sum_j w_i w_j\, \rho_{ij}\,\sigma_i\,\sigma_j \qquad (1)$$

where $\lambda$ is a risk-aversion coefficient. According to (1) the portfolio's expected return is determined by the expected returns $E(R_i)$ of the individual securities in the portfolio and the proportion $w_i$ of each security represented in the portfolio. The expected risk of the portfolio is determined by three factors: the proportion $w_i$ of each security represented in the portfolio, the standard deviation $\sigma_i$ of each security from its expected return, and the correlation $\rho_{ij}$ between these deviations for each pair of securities in the portfolio. (The term $\rho_{ij}\sigma_i\sigma_j$ is commonly referred to as the covariance.)
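As a concrete illustration of (1), the following minimal sketch evaluates the mean-variance utility for a small portfolio. The three-security inputs and the risk-aversion value `lam` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Illustrative inputs (assumed, not from the paper): three securities.
w = np.array([0.5, 0.3, 0.2])          # portfolio proportions w_i (sum to one)
mu = np.array([0.08, 0.05, 0.11])      # expected returns E(R_i)
sigma = np.array([0.15, 0.10, 0.25])   # standard deviations sigma_i
rho = np.array([[1.0, 0.3, 0.2],       # correlations rho_ij between the deviations
                [0.3, 1.0, 0.1],
                [0.2, 0.1, 1.0]])
lam = 3.0                              # risk-aversion coefficient (assumed)

cov = rho * np.outer(sigma, sigma)     # rho_ij * sigma_i * sigma_j, i.e., the covariances
expected_return = w @ mu               # sum_i w_i E(R_i)
portfolio_variance = w @ cov @ w       # sum_i sum_j w_i w_j rho_ij sigma_i sigma_j

utility = expected_return - lam * portfolio_variance   # the utility in (1)
print(f"E(R_p) = {expected_return:.4f}, variance = {portfolio_variance:.4f}, U = {utility:.4f}")
```

Maximizing this quantity over the weights, subject to the weights summing to one, traces out the efficient frontier referred to above.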
This traditional assumption was founded upon the theory of market efficiency, which stated simply implies that all public information on future price movements for a tradable asset has already been incorporated in its current price, and that therefore it is not possible to earn economic profits by trading on this information set. In statistical terms, this implies the so-called "random walk" model, whereby the expectation for the next period is the current value. The empirical finance literature up to the 1970's universally reinforced this view for all actively traded capital markets, by testing and failing to refute the random walk hypothesis on daily, weekly, and monthly data. Yet this posed a serious dilemma, a gulf between theory and practice, as traders did continue to make profits in the short term. If they were just lucky, their luck seemed to show no signs of running out.

By the end of the 1980's theory had matured to provide a more comfortable fit with trading realities. In the first place it was recognized that the conventional tests of the random walk hypothesis were very "weak," in the sense that the evidence would have to be very strong to reject this null hypothesis. Typically, period-by-period changes were tested for zero mean and white noise. Minor departures from randomness would not be significant in these tests; yet it only takes minor departures to offer real trading opportunities. From the perspective of scientific method, it is remarkable that the EMH should have gained such empirical support based upon a testing methodology that started by assuming it is true, and then adopted tests which would rarely have the power to refute it!
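The "weak" tests referred to above reduce to checking that period-by-period changes have zero mean and are serially uncorrelated. The sketch below applies such a test to a simulated return series containing a small AR(1) component; the sample size and the coefficient are assumptions chosen to illustrate how modest departures from randomness can escape rejection.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated daily "returns" with a small AR(1) component (assumed parameters).
n, phi = 250, 0.1
eps = rng.normal(0.0, 0.01, n)
r = np.empty(n)
r[0] = eps[0]
for t in range(1, n):
    r[t] = phi * r[t - 1] + eps[t]

# 1) Zero-mean test (one-sample t-test).
t_stat, p_mean = stats.ttest_1samp(r, 0.0)

# 2) White-noise check via the lag-1 autocorrelation and its asymptotic standard error 1/sqrt(n).
rho1 = np.corrcoef(r[:-1], r[1:])[0, 1]
z = rho1 * np.sqrt(n)
p_acf = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"zero-mean p-value = {p_mean:.3f}, lag-1 autocorrelation = {rho1:.3f}, p-value = {p_acf:.3f}")
# A small predictable component (phi = 0.1) frequently fails to be rejected at the 5% level,
# even though it can be economically meaningful.
```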

Econometric tests introduced during the 1980's specified a more general model for the time-series behavior of asset returns, involving autoregressive and other terms, such that the random walk would be a special case if the first-order autoregressive coefficient were equal to one, and all others were zero. Thus a more general structural model was proposed for which the random walk is a special case. It turned out that under this model-based estimation procedure, it was possible to reject the random walk special-case hypothesis for almost all of the major capital market series. Not only is this turn-around more satisfactory in providing results which close the gap between statistical conclusions and practical observation, it also demonstrated the methodological need to propose a general model first, before concluding that a time series has no structure.

Finance theory has now matured to the position whereby markets can still be considered efficient in the more sophisticated way of representing the expectations, risk attitudes, and economic actions of many agents, yet still have a deterministic component to their price movements relating to fundamental factors. Thus we now have the so-called "multifactor" capital asset pricing model [70] and arbitrage pricing theory [65] which attempt to explain asset returns as a weighted combination of the assets' exposure to different factors, as shown in (2)

$$R_i = \alpha_i + \sum_k \beta_{ik} F_k + \varepsilon_i \qquad (2)$$

where $R_i$ is the return of asset $i$, $F_k$ are the determinant factors, $\beta_{ik}$ the exposure of asset $i$ to factor $k$, $\alpha_i$ is the expected abnormal return from the asset, and $\varepsilon_i$ is the nonpredictable part of the return, i.e., the error of the model.

Hence, the "naive" estimate (or unconditional expectation) of asset returns is replaced by a more general estimate conditioned on the values of (fundamental or market) factors $F_k$. Accepting the random walk hypothesis is now the default case which will be accepted should it turn out that none of the factors under consideration is statistically significant. In the more general case there is no reason why the structured model in (2) should be limited to include only linear structures with noninteracting independent variables and Gaussian distributions. In terms of inviting the question of how general a model should be proposed in the first place, this focus on empirical model building allows us to consider the use of neural-network technology as a consistent, if extreme, example of this new approach. By proposing the most general of modeling frameworks, it is also providing a stronger "test" for market efficiency conclusions, albeit with tests that are not based upon statistical hypothesis protocols, but on accuracy and performance metrics.

However, this seemingly minor departure from the efficient markets hypothesis (EMH) has major implications on the way in which we manage risk and reward. It also induces stringent requirements on the design and validation of predictive modeling methodologies, particularly so on neural networks (NN's). To appreciate the implications of this apparently minor departure from EMH, let us formulate portfolio management theory in a more general framework in which the mean-variance optimization is a special case. The general case is a simple extension to the utility in (1), whereby

$$U = \sum_i w_i\, E(R_i \mid \mathbf{F};\theta) \;-\; \lambda \sum_i \sum_j w_i w_j\, \rho_{ij}(\theta)\,\sigma_i(\theta)\,\sigma_j(\theta) \qquad (3)$$

where $E(R_i \mid \mathbf{F};\theta)$ represents the expected return for security $i$ conditioned on its exposure to a vector of factors $\mathbf{F}$, and $\theta$ defines the exact nature of the model by indexing a class of structured models or predictors. For example, for the random walk $E(R_i \mid \mathbf{F};\theta)$ reduces to the unconditional (historical) expectation, and for a multifactor CAPM model it takes the form of the structured model in (2). $\sigma_i(\theta)$ measures the deviation of each security $i$ from its expected value, i.e., the standard error of model $\theta$ for each security in the portfolio. For example, for the random walk this is given by the standard deviation, i.e., $\sigma_i(\theta) = \sigma_i$, but for the more general case the prediction risk is

$$\sigma_i(\theta) = \sqrt{E\big[(R_i - E(R_i \mid \mathbf{F};\theta))^2\big]}.$$

According to (3), the expected return of our portfolio is determined by two factors: 1) the returns of the individual securities in the portfolio, whose expectation for the next period is no longer the historical average but is given by $E(R_i \mid \mathbf{F};\theta)$, and 2) the proportion of each security represented in the portfolio. The expected risk of the portfolio is determined by three factors: 1) the proportion of each security represented in the portfolio; 2) the deviation of each security from its predicted return (i.e., the standard error of model $\theta$ for each security in the portfolio); and 3) the correlation between the prediction errors for each pair of securities in the portfolio.
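To make the step from (2) to (3) concrete, the following hedged sketch estimates the structured model (2) by ordinary least squares on simulated factor data and uses the fitted values as the conditional return forecasts $E(R_i \mid \mathbf{F};\theta)$, with the residual root mean square error playing the role of the prediction risk $\sigma_i(\theta)$. All numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data standing in for one asset's returns and factor realizations (assumption).
T, K = 250, 3
F = rng.normal(size=(T, K))                          # factor realizations F_k
beta_true = np.array([0.6, -0.3, 0.2])
r = 0.001 + F @ beta_true + rng.normal(0, 0.02, T)   # returns generated according to (2)

# OLS estimate of alpha_i and beta_ik in r = alpha + sum_k beta_k F_k + eps.
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

# Conditional return forecast E(R | F) used in place of the historical average in (3),
# and the model standard error used in place of the unconditional standard deviation.
r_hat = X @ coef
prediction_risk = np.sqrt(np.mean((r - r_hat) ** 2))

print("alpha:", round(alpha_hat, 4), "betas:", np.round(beta_hat, 3))
print("prediction risk (model standard error):", round(prediction_risk, 4))
```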

Clearly, the expected value of the portfolio [as defined in (3)] assumes that our expectations for the individual returns are accurate. However, this is never the case, and the actual value of the portfolio depends on the accuracy of the individual predictors. To illustrate the effects (benefits and risks) of prediction accuracy on actual portfolio value, let us consider a simple portfolio composed of two securities. In order to separate the effects of prediction accuracy from the effects that are due to covariances between prediction errors, we make one of the securities risk-free. The other asset we simulate as a hypothetical security whose returns are randomly generated (100 independent observations, zero mean, standard deviation one). For each point in the series, we then generate a set of predictors (with increasing predictive power) and for each predictor we compute the actual value of the portfolio. Fig. 1 shows the value of this portfolio as a function of prediction accuracy. The prediction accuracy, plotted on the x axis, is in fact the correlation between actual and predicted returns for the hypothetical security. The y axis plots the actual value of the portfolio.

Fig. 1. "It is not actually terribly difficult to make money in the securities markets."1 Actual return/risk of a simple portfolio of two assets as a function of prediction accuracy ($\rho$). The x axis shows the correlation between actual and predicted returns. The actual return/risk of the mean-variance portfolio (correlation zero) is positioned at the origin.

Fig. 2. "It is actually terribly difficult to make money in the securities markets."2 Actual return/risk of a simple portfolio of two assets as a function of prediction accuracy. The x axis shows the correlation between predicted and actual. The return/risk of the mean-variance portfolio ($\rho = 0$) is shown by the horizontal line.

1 Allegedly attributable to Peter Baring, two months prior to the collapse of Barings Bank.
2 Nick Leeson, "Rogue Trader," (1996) immediately after the collapse of Barings.

The return/risk for the mean-variance optimization (correlation zero) is positioned at the origin. As we move from the "naive" predictor (with $\rho = 0$) toward the theoretically perfect predictor (with $\rho = 1$), the actual return increases, but not uniformly. It is much steeper in the initial stages but it tails off for predictors with higher values of $\rho$. Most of the payoff is obtained by predictors which can explain between 15–25% of the variability in the securities' returns. In other words, it only requires minor improvement upon the random walk hypothesis to gain significant improvement in returns.

In view of this fact, it is seemingly remarkable that the EMH should have gained such wide acceptance and have survived for so long. However, to appreciate the forces which are helping to keep EMH in widespread use, one does not need to look further than understanding the risks which can arise if the predictive models are of poor quality. Fig. 2 shows how portfolio value decreases as a function of prediction accuracy. The x axis shows the correlation between actual and predicted for each pair of securities in the portfolio. The theoretically perfect predictor (with $\rho = 1$) is depicted on the right-hand side (RHS) of the axis. The worst-case predictor (with $\rho = -1$) is depicted on the extreme left-hand side (LHS) of the axis. The "naive" predictor (with $\rho = 0$) corresponds to the random walk model.

It is clear from Fig. 2 that in terms of risk/reward the random walk model is a rather "efficient" predictor, despite its "naive" nature. But it is also clear from Fig. 1 that it only requires minor improvements upon the random walk hypothesis to gain significant improvement in returns.
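The two-asset experiment behind Figs. 1 and 2 can be reproduced in outline as follows. The paper does not state the allocation rule used to turn predictions into positions, so the sketch assumes a simple one (hold the risky asset when the prediction is positive, otherwise hold the risk-free asset); the correlation grid and random seed are also assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100
r = rng.normal(0.0, 1.0, n)              # hypothetical security: zero mean, unit standard deviation

def portfolio_stats(target_corr):
    """Build a predictor with (approximately) the requested correlation to r and trade on its sign."""
    noise = rng.normal(0.0, 1.0, n)
    pred = target_corr * r + np.sqrt(max(1.0 - target_corr ** 2, 0.0)) * noise
    position = (pred > 0).astype(float)  # assumed allocation rule: risky asset iff prediction > 0
    realized = position * r              # the risk-free leg contributes zero excess return
    return realized.mean(), realized.std()

for rho in (-1.0, -0.5, 0.0, 0.25, 0.5, 1.0):
    mean_ret, risk = portfolio_stats(rho)
    ratio = mean_ret / risk if risk > 0 else float("nan")
    print(f"corr = {rho:+.2f}  return = {mean_ret:+.3f}  risk = {risk:.3f}  return/risk = {ratio:+.3f}")
# Sweeping the correlation traces out curves analogous to Figs. 1 and 2
# (portfolio return/risk as a function of prediction accuracy).
```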

It is also clear that if any predictive modeling methodology is to become as widely accepted as the random walk model, it must be accompanied by robust procedures for testing the validity of the estimated models. In terms of extending the EMH to provide a more comfortable fit with trading realities in the more sophisticated way of representing the nonlinearities in the expectations, risk attitudes, and economic actions of many agents participating in the market, NN's can be seen as a consistent example of multifactor CAPM and APT models whereby asset returns can be explained as a nonlinear combination of the assets' exposure to different factors

$$R_i = f(\mathbf{F}; \theta) + \varepsilon_i \qquad (4)$$

where $f$ is an unknown function, the inputs $\mathbf{F}$ are drawn independently with an unknown stationary probability density function, $\theta$ is a vector of free parameters which determine the structure of the model, and $\varepsilon_i$ is an independent random variable with a known (or assumed) distribution. The learning or regression problem is to find an estimate $\hat{f}$ of $f$ given the dataset $D$ from a class of predictors or models indexed by $\theta$.
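As an illustration of (4), the sketch below fits a small single-hidden-layer network to simulated returns that depend nonlinearly (through an interaction term) on two factors and compares its out-of-sample fit with a linear regression. scikit-learn's MLPRegressor is used for brevity; the data-generating function and the network size are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

# Simulated factor exposures and returns with an interaction term (assumed form of f).
T = 600
F = rng.normal(size=(T, 2))
r = 0.5 * F[:, 0] - 0.2 * F[:, 1] + 0.8 * F[:, 0] * F[:, 1] + rng.normal(0, 0.5, T)

F_train, F_test = F[:400], F[400:]
r_train, r_test = r[:400], r[400:]

linear = LinearRegression().fit(F_train, r_train)
mlp = MLPRegressor(hidden_layer_sizes=(5,), activation="logistic", alpha=1e-3,
                   max_iter=5000, random_state=0).fit(F_train, r_train)

print("linear R^2 (out-of-sample):", round(linear.score(F_test, r_test), 3))
print("MLP    R^2 (out-of-sample):", round(mlp.score(F_test, r_test), 3))
```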
The novelty about NN's lies in the ability to model nonlinear processes with few (if any) a priori assumptions about the specific functional form of $f$. This is particularly useful in financial engineering applications where much is assumed and little is actually known about the nature of the processes determining asset prices. Neural networks are a relatively recent development in the field of nonparametric estimation. Well-studied and frequently used members of the family include nearest-neighbors regression and kernel smoothers (e.g., [39]), projection pursuit [31], alternating conditional expectations (ACE's), or average derivative estimation (ADE) [78], classification and regression trees (CART's) [11], etc. Because of their universal approximation properties, NN's provide an elegant formalism for unifying all these paradigms. However, much of the application development with neural networks has been done on an ad hoc basis without due consideration for dealing with the requirements which are specific to financial data. These requirements include: 1) testing the statistical significance of the input variables; 2) testing for misspecified models; 3) dealing with nonstationary data; 4) handling leverages in the datasets; and 5) generally formulating the problem in a way which is more amenable to predictive modeling.

In this paper we describe a collection of applications in options pricing, cointegration, the term structure of interest rates, and models of investor behavior which highlight these weaknesses and propose and evaluate a number of solutions. We describe a number of ways for principled variable selection, including a stepwise procedure building upon the Box–Jenkins methodology [9], analysis of variance, and regularization. We show how model misspecification tests can be used to ensure against models which make systematic errors. We describe how the principle of cointegration can be used to deal with nonstationary data, and generally describe ways of formulating the problem in a manner that makes predictive modeling more likely to succeed.

In Section II, we use the problem of high-frequency volatility forecasting to demonstrate a stepwise modeling strategy which builds upon classical linear techniques for model identification and variable selection. The analysis identifies some important linear and nonlinear characteristics of the data with strong implications for the market maker. A strong linear mean-reversion component of intraday implied volatility is reported. A second component, induced by the bid/ask bounce of volatility quotes, is also present. Changes in the underlying asset have a significant nonlinear effect on implied volatility quotes over the following trading hour. The evolution of the strike price also has a strong influence on volatility changes over the next trading hour. Finally, a volatility smile is reported, indicating how implied volatility innovations are related to maturity effects.

In Section III, we use the principle of cointegration to deal with nonstationary data and to demonstrate how partial F-tests can be used to perform significance tests on parts of an NN, and particularly variable selection. The approach is applied to the nontrivial problem of explaining the cointegration residuals among European equity derivative indexes in the context of exchange rate volatility, interest rate volatility, etc.

In Section IV, we analyze the problem of modeling the yield curve to demonstrate how the careful application of financial economics theory and regularization methods can be used to deal with both the problem of nonstationarity and with variable selection. A factor analysis indicates that the bulk of changes in Eurodollar futures are accounted for by unpredictable parallel shifts in the yield curve which closely follow a random walk. However, the second and third factors correspond roughly to a tilt and a flex in the yield curve, and these show evidence of a degree of predictability in the form of mean-reversion. We construct portfolios of Eurodollar futures which are immunized against the first two factors but exposed to the third, and compare linear and nonlinear techniques to model the expected return on these portfolios. The approach is best described as attempting to identify combinations of assets which offer opportunities for statistical arbitrage.

In Section V, we take a metamodeling approach and argue that even if security price fluctuations are largely unpredictable, it is possible that investor behavior may not be. Within this framework, we construct a metamodel of investor behavior using a dataset composed of financial data on 90 French stocks drawn from the SBF250 index. Based on the predictions of the metamodel, we show how it is possible to actively manage investment strategies rather than the underlying assets in order to produce excess returns.

II. VOLATILITY FORECASTING AND VARIABLE SELECTION

A. Overview

Of all the inputs required in option valuation, volatility is the most difficult for traders to understand. At the same time, as any experienced trader will attest, volatility often plays the most important role. Changes in our assumptions about volatility can have a dramatic effect on our evaluation of an option, and the manner in which the market assesses volatility can have an equally dramatic effect on an option's price. The two basic approaches to measuring volatility [15] are either to compute the realized volatility over the recent past from historical price data, or to calculate the "implied volatility" from current option prices by solving the pricing model for the volatility that equates the model and market prices. It has become widely recognized in the academic finance literature that the implied volatility is the "market's" volatility forecast, and that it is a better estimate than historical volatility [47], [17], [75], [4], [73], [55]. Other approaches to forecasting volatility have been available in the past years using formal time-varying volatility models, fitting models of the ARCH family to past data and computing the expected value of future variance [79]. Again, it has been found that volatility forecasts estimated from call and put option prices with exogenous variables contain incremental information relative to standard ARCH specifications for conditional volatility which only use information in past returns. The two approaches are not mutually exclusive. Subsequently, it would be desirable to develop nonlinear models of implied volatility which take advantage not only of the time structure in a (univariate) series (of volatility) but also make use of exogenous variables, i.e., information contained in other potentially informative variables that have been reported to have an influence on volatility, such as trading effects, maturity effects, spreads, etc.

The literature suggests several potential advantages that NN's have over statistical methods for this type of modeling, but one of the important weaknesses of NN's is that they are as yet not supported by the rich collection of model diagnostics which are an integral part of statistical procedures. We utilize a stepwise modeling procedure which builds upon classical linear techniques for model identification and variable selection. Within this framework, we investigate the relationship between changes in implied volatility and various exogenous variables suggested in the literature. Using European index Ibex35 option data, we investigate the construction of reliable estimators of implied volatility by a simple stepwise procedure consisting of two phases.

In the first phase the objective is to construct a well-specified model of implied volatility which captures both the time dependency of implied volatility and the most significant linear dependencies on the exogenous variables. In the second phase, the objective is to provide incremental value by extending the model to capture any residual nonlinear dependencies which may still be significant.

We show that with conservative use of complexity penalty terms and model cross validation, this simple stepwise forward modeling strategy has been successful in capturing both the main linear influences and some important nonlinear dependencies in high-frequency models of implied volatility. Changes in intraday implied volatility are dominated by a mean-reverting time-dependency component which is primarily a linear effect. The NN model is able to capture this effect and also some significant nonlinear characteristics of the data with important implications for the market maker. Of particular significance is the relationship between implied volatility and the effects of the strike, maturity, and changes in spot prices of the underlying market. For example, large movements (either positive or negative) in the price of the underlying index induce a U-shaped response on the quoted volatility, while a volatility smile is reported near expiration dates.

B. Experimental Setup—Empirical Properties of Volatility

We examine intraday movements of the Ibex35 implied volatility series obtained from short-maturity, close-to-the-money call options during a six-month sample period: November 92 through to April 93. Intraday historical data are available on the Spanish Financial Futures Exchange option contract on Ibex35. The Ibex35 index contains the 35 most liquid stocks that trade in the Spanish Stock Exchange through its CATS system. This dataset has been made available by the research department of MEFF, providing high-quality and precise real-time information from electronically recorded trades. Options on Ibex35 are European-style options, have a monthly expiration date, and at every point in time the three closer correlative contracts are quoted (i.e., in March 1993, there will be quotes for the end of March, April, and May contracts). The measure of the implied volatility is obtained by solving the pricing model [8] for the volatility that equates the model and market prices using the Newton–Raphson method.
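A hedged sketch of that implied-volatility calculation: solve a European-call pricing formula for the volatility that equates model and market prices by Newton-Raphson, using the vega as the derivative. Black-Scholes is used here purely as an illustrative pricing model, and the option inputs are placeholders rather than MEFF data.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(S, K, T, r, sigma):
    """Black-Scholes price of a European call (assumed pricing model for this sketch)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def vega(S, K, T, r, sigma):
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return S * math.sqrt(T) * math.exp(-0.5 * d1 ** 2) / math.sqrt(2.0 * math.pi)

def implied_vol(price, S, K, T, r, sigma0=0.2, tol=1e-8, max_iter=50):
    """Newton-Raphson: find sigma such that the model price equals the market price."""
    sigma = sigma0
    for _ in range(max_iter):
        diff = call_price(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            return sigma
        sigma -= diff / vega(S, K, T, r, sigma)
    return sigma

# Placeholder inputs: a near-the-money call with one month to expiry (assumed values).
print(round(implied_vol(price=75.0, S=3200.0, K=3200.0, T=1 / 12, r=0.10), 4))
```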
Our sampling interval is 60 min and the prediction interval one (60-min) period ahead. The calculated volatility is the implied volatility of the option price which is nearest to the next time border. We define the change in implied volatility as

$$\Delta\sigma_t = \sigma_t - \sigma_{t-1} \qquad (5)$$

where $\Delta\sigma_t$ denotes the hourly change of implied volatility at time $t$. So $\Delta\sigma_{12:00}$ is the difference between the implied volatility immediately past the time border 12:00:00 and the implied volatility immediately past the time border 11:00:00.

Our empirical analysis of the implied volatility uses volatility changes as defined in (5), rather than the levels as quoted (in basis points) at MEFF. Two considerations motivate this. First, the academic and practitioner interest lies in understanding the changes or innovations to expected volatility: how changes in expected volatility influence changes in security valuation. Second, the series of Ibex35 implied volatility levels appears to be a near random walk. Implied volatility levels have a first-order autocorrelation of 90%, indicating that although a unit root can be rejected for the series, such high autocorrelation may affect inference in finite samples.

We select our universe of exogenous variables (see summary in Table I) on the basis of their availability to the market maker at a specific point of time and the variable's relevance to explaining implied volatility as reported in the literature. For example, [27] and [29] found that volatility is larger when the exchange is open than when it is closed, while [30] suggested that volatility is to some extent caused by trading itself. It would therefore seem desirable to make use of an exogenous variable to account for trading effects on volatility. The high-frequency data available in this study facilitate the use of two such variables: volume and velocity. Volume is the number of contracts traded in an hour and velocity is the number of trades per hour.

TABLE I: SUMMARY OF ALL VARIABLES

Day and Lewis [20] found that option prices reflect increases in the volatility of the underlying stock indexes at both quarterly and nonquarterly expiration dates. The behavior of the implied volatility for options spanning the expiration date is consistent with an unexpected component to the increase in volatility nearby expiration dates. It would be desirable to account for this maturity effect by making use of an exogenous dummy variable which is encoded as one when there are four or fewer days to expiration and zero otherwise. A separate variable, time to maturity, is introduced for the following reason: volatility is calculated for call options close to the money with short time to maturity (less than three months) and the quotation system provides the three closer correlative contracts. This means that the obtained implied volatility series has been derived from different maturity contracts. Although we expect that the implied volatilities obtained from different maturity contracts will be significantly similar (due to the fact that we are dealing with short-maturity contracts), we shall nevertheless introduce a time-to-maturity variable.

One possible explanatory factor for future changes in implied volatility that is often described in the literature is the relative bid-ask spread or any shifts in it. In a market with zero transaction costs, changes in the price of an option are caused by the arrival of new information regarding the distributions of returns on the option or by innovations in the process governing the option value. In other words, if the dealers incur costs in executing a transaction, they require compensation. One part of the compensation includes the bid-ask spread. On the other hand, the market maker will widen the bid-ask spread if the probability increases that he is dealing with better informed traders [18]. The variable average spread is used to encode this information as the sum of the spreads in all trades taking place within an hour divided by the number of trades in the hour.

Some recent studies of S&P index options show that options with low strikes have significantly higher implied volatilities than those with high strikes [34]. Derman and Kani [21] showed that the average of at-the-money call and put implied volatilities falls as the strike level increases. Out-of-the-money puts trade at higher implied volatilities than out-of-the-money calls. This asymmetry is commonly called the volatility "skew." They attempt to extend the Black–Scholes model to account for the volatility smile, replacing the constant volatility term with a local volatility function deduced from observed market prices. Pozo [58] has observed U-shaped patterns occurring in the Ibex35 options during various subperiods between 1991 and 1993. It is not the task of this study to look for volatility smile patterns in the Ibex35 data, but in order to account for a possible volatility smile it would be desirable to make use of a measure of the degree to which short-maturity call options are in-, at-, or out-of-the money. We use the variable moneyness as such a measure. This is calculated as the average ratio of the spot/strike price for every option traded within the hour.

It has also been argued that volatility is expected to be higher on certain days than others, as well as at certain times within the day. While this problem cannot be totally eliminated, it is helpful to include a weekend or day effect in our models and confirm or refute the presence of negative correlation in the time series. Three variables are used to capture these effects: day, weekend, and time effect. Day effect is a variable set to (1, 2, ..., 5) for each week day. The weekend effect is set to one for Fridays and Mondays and zero elsewhere. Likewise, a dummy time-of-trade variable is incorporated as an input into the models: trades registered in the first hour of the day are coded one, the rest are coded zero (i.e., overnight new information can affect the behavior of the market at the opening hour).

The remaining variables in Table I are easy to encode. Change in spot is a measure of changes in the spot price of the underlying asset (futures on Ibex35) at the end of every hour. So, at 12:00:00 the value of the variable change of spot will be the difference of the closing price at (or just before) 12:00:00 minus the closing price at (or just before) 11:00:00. Historic volatility is the standard deviation of past index Ibex35 returns. The horizon over which historic volatility is computed is related to the time-to-maturity of traded options. When the majority of options traded have a time to maturity longer than 15 days, a historical volatility measure is computed with a sample horizon of 25 days of index returns. Otherwise the sample horizon is 15 days. The combination of these two measures appears to have the highest correlation coefficient

with the implied volatility series. Finally, the interest rate is the current yield of the T-bill whose maturity most closely matches the option expiration.
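Most of the variables in Table I are simple hourly transformations of the raw trade records. The sketch below shows one possible encoding with pandas; the layout of the trade data, the column names, and the assumed 10:00 opening hour are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)

# Placeholder trade records (layout assumed): one row per option trade.
idx = pd.date_range("1993-03-01 10:00", periods=300, freq="3min")
trades = pd.DataFrame({
    "contracts": rng.integers(1, 10, len(idx)),
    "spread": rng.uniform(0.5, 2.0, len(idx)),
    "spot": 3200 + rng.normal(0, 10, len(idx)).cumsum(),
    "strike": 3200.0,
    "days_to_expiry": 12,
}, index=idx)
trades["moneyness_ratio"] = trades["spot"] / trades["strike"]

hourly = trades.resample("1h")
exog = pd.DataFrame({
    "volume": hourly["contracts"].sum(),             # contracts traded in the hour
    "velocity": hourly["contracts"].count(),         # trades per hour
    "average_spread": hourly["spread"].mean(),       # sum of spreads / number of trades
    "moneyness": hourly["moneyness_ratio"].mean(),   # average spot/strike ratio
    "maturity_effect": (hourly["days_to_expiry"].last() <= 4).astype(int),
    "change_in_spot": hourly["spot"].last().diff(),  # hourly change in the underlying
})
exog["day_effect"] = exog.index.dayofweek + 1                           # 1 = Monday ... 5 = Friday
exog["weekend_effect"] = exog.index.dayofweek.isin([0, 4]).astype(int)  # Mondays and Fridays
exog["time_of_trade"] = (exog.index.hour == 10).astype(int)             # assumed opening hour
print(exog.head())
```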
The objective here is to identify the subset of variables which are most significant in explaining implied volatility variations from the universe of the 11 variables outlined above, also augmented by any significant time-dependency terms. This requires a procedure for searching a relatively large space of candidate variables and selecting those which: 1) give the best explanation in terms of $R^2$; 2) have the strongest effect; and 3) are statistically the most significant. The modeling strategy is described in the next section.

C. Modeling Strategy

Given the hypothesis that the data generating process for implied volatility is inherently a nonlinear process (as the option pricing model would suggest), it would seem desirable to attempt to estimate the unknown "true" model with a nonparametric method which can capture nonlinear effects, such as an NN. However, a major weakness of NN's is the lack of an established procedure for performing tests of significance for input variables. Many techniques and algorithms have been proposed to "measure" the "effect" of different parts of an NN model, usually for use in one of three cases: either input variables (in the case of methods devoted to "sensitivity analysis"), weights (in the case of "pruning" algorithms) or, more rarely, hidden units. Typically these methods are not supported by a coherent statistical argument but are instead justified on the practical grounds of being better than nothing. In the cases where statistical arguments are presented, the algorithms are often very computationally and conceptually complex (for instance requiring the calculation of the Hessian matrix for the network, and/or assuming nonnegative eigenvalues) and always prone to producing misleading results due to overfitting.

This section describes a stepwise procedure for neural model identification which builds upon classical linear techniques for model identification and variable selection. We adopt a conservative approach whereby complexity is introduced only if the resultant model provides significant incremental value, and the additional parameters are statistically significant. The procedure consists of two phases. In the first phase the objective is to construct a well-specified model of the dependent variable (i.e., implied volatility) which captures both the time dependency of implied volatility and the most significant linear dependencies on the exogenous variables. In the second phase, the objective is to provide incremental value by extending the model to capture any residual nonlinear dependencies which may still be significant.

Phase 1:

1) Identifying time structure: Using univariate time series analysis and the Box–Jenkins identification procedure, investigate the time structure of the output variable and build a reliable estimator of implied volatility of the form

$$\Delta\sigma_t = \sum_{i=1}^{p} \phi_i\,\Delta\sigma_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\,\varepsilon_{t-j}. \qquad (6)$$

Arguably, if the ARMA(p, q) orders are "well specified," by definition we accept the hypothesis that the residuals $\varepsilon_t$ are white noise disturbances. In practice, however, due to weaknesses in randomness tests it is often observed that the addition of exogenous variables may also produce "well specified" models with better performance.

2) Identifying exogenous influences: Using stepwise multiple linear regression analysis, identify the predictive power of each exogenous variable and build a parsimonious model, incorporating any error correcting and/or autoregressive terms as suggested in Step 1), e.g.,

$$\Delta\sigma_t = \beta_0 + \sum_{i=1}^{p} \phi_i\,\Delta\sigma_{t-i} + \sum_{j=1}^{q} \theta_j\,\varepsilon_{t-j} + \sum_{k=1}^{m} \beta_k X_{k,t} + \varepsilon_t \qquad (7)$$

where $X_{1}, \ldots, X_{m}$ are the most significant exogenous variables.

3) Identifying nonlinear dependencies: Using multivariate NN analysis with the variables and/or error correcting/autoregressive terms identified in Step 2), construct a well-specified neural model of implied volatility

$$\Delta\sigma_t = g(\mathbf{A}_t, \mathbf{M}_t, \mathbf{X}_t; \mathbf{w}) + \varepsilon_t \qquad (8)$$

with $\mathbf{w}$ the network weights and the vectors $\mathbf{A}_t$, $\mathbf{M}_t$, and $\mathbf{X}_t$ denoting the significant autoregressive terms, moving average terms, and exogenous variables, respectively, as identified in (7). In this paper the number of hidden units is chosen by simple cross validation, but more sophisticated methods can be used.

Clearly, this simple procedure is rather limiting to the neural-network approach for two reasons. First, the predictive variables have already been selected in Step 2) in a way that best suits the linear regression. It is therefore probable that the selected variables only explain the linear part of the relationship. Nevertheless, even in these restrictive conditions it is possible that a nonlinear estimator can provide incremental value to the model, perhaps arising from the interaction between the exogenous variables and/or the time dependencies. Furthermore, to the extent that one of the main criticisms of NN methods is their lack of statistical explainability, working from a platform of statistical insight and looking for incremental value over a conventionally understood set of input variables, we can view this restriction as being the price for potentially improved credibility. Second, it is possible that variables which could have higher explanatory power in the nonlinear sense may have been rejected by the linear criterion in the first phase. This hypothesis needs to be verified in a second pass.

The second phase of the model identification procedure verifies whether incremental value can be achieved by including additional variables in the model which, although rejected by the linearity criterion in the first phase, may still have significant explanatory power in the nonlinear sense. The second phase is an iterative procedure analogous to forward variable selection with regression analysis.

Phase 2:

1) Forward model estimation: For each variable $X_{k+1}$ not already included in the input vector of the model at the current iteration, construct a new well-specified model incorporating all the old variables plus each $X_{k+1}$, as follows:

$$\Delta\sigma_t = g(\mathbf{A}_t, \mathbf{M}_t, \mathbf{X}_t, X_{k+1,t}; \mathbf{w}) + \varepsilon_t. \qquad (9)$$

This will produce as many new models as there are "unused" variables. The starting input variable vector comprises only those significant variables that were identified in (7) and used in (9).

2) Variable significance estimation: Compute a complexity-adjusted payoff measure to evaluate the change in model performance that would result if variable $X_{k+1}$ were added to the model. The payoff is computed by

$$P_{k+1} = \rho\big(\Delta\sigma, \Delta\hat{\sigma}_{k+1}\big) - C(n, d). \qquad (10)$$

The first term in (10) measures the correlation between the observed values of the dependent variable (implied volatility) and the predictions of the model which uses an additional variable $X_{k+1}$. The second term in (10) penalizes the model for the complexity introduced by the additional degrees of freedom, $d$, given the sample size $n$. There are several ways in which we can measure payoff against complexity in an NN model. These will be explored in Section II-G.

3) Model extension: If during the current iteration there exists no model whose performance is greater than that of the previous iteration, the procedure is terminated. Our baseline metric, $\rho_{\mathrm{base}}$, is defined as the correlation between observed and predicted by the multivariate neural model which uses only the variables selected by the linear analysis [see (8)]. If there is at least one model which outperforms the baseline metric, we proceed to construct a well-specified model in which the input vector is extended to include all the old variables plus one (or more) new variables selected on the basis of the highest payoff $P_{k+1}$. The procedure is repeated from Step 1).
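A compact sketch of the Phase 2 loop is given below. Since the exact complexity penalty in (10) is not reproduced here, the sketch assumes a simple penalty of d/n (additional parameters over validation-sample size), which is of the same order as the roughly 5% hurdle discussed in Section II-G; the model class, the data, and the stopping details are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def val_correlation(X_tr, y_tr, X_val, y_val, cols, hidden=5):
    """Fit a small network on the chosen columns and return the validation correlation."""
    net = MLPRegressor(hidden_layer_sizes=(hidden,), activation="logistic",
                       alpha=1e-3, max_iter=3000, random_state=0)
    net.fit(X_tr[:, cols], y_tr)
    return np.corrcoef(y_val, net.predict(X_val[:, cols]))[0, 1]

def forward_select(X_tr, y_tr, X_val, y_val, base_cols, candidates, hidden=5):
    selected, remaining = list(base_cols), list(candidates)
    baseline = val_correlation(X_tr, y_tr, X_val, y_val, selected, hidden)
    while remaining:
        # Payoff as in (10): validation correlation minus an assumed complexity penalty d/n.
        payoffs = {c: val_correlation(X_tr, y_tr, X_val, y_val, selected + [c], hidden)
                      - hidden / len(y_val)
                   for c in remaining}
        best = max(payoffs, key=payoffs.get)
        if payoffs[best] <= baseline:      # no candidate beats the previous iteration: stop
            break
        selected.append(best)
        remaining.remove(best)
        baseline = val_correlation(X_tr, y_tr, X_val, y_val, selected, hidden)
    return selected

# Placeholder data: eight candidate regressors, three of which actually matter (assumption).
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 8))
y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] ** 2 + rng.normal(0.0, 0.5, 500)
kept = forward_select(X[:400], y[:400], X[400:], y[400:],
                      base_cols=[0], candidates=[1, 2, 3, 4, 5, 6, 7])
print("selected columns:", kept)
```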
The procedure described here is only one of several alternative ways in which forward selection can be controlled. Ideally, it would be desirable to consider not only a single candidate variable in each step but (at least) pairwise combinations of variables. This is particularly important if the candidate variables are believed to be interacting in some way, cf. the eXclusive-OR problem, where none of the individual inputs has any explanatory power but it is only when both of them are put together that the solution is achievable. We select the simple version of the forward search in order to reduce the computational requirements. Nevertheless, in order to account for potential interaction effects we shall use a "short-cut" whereby we construct a single model with the entire universe of variables at the end of the procedure. If the performance of this model (in the validation set) is better than the baseline model (in unadjusted terms), we shall take this as an indication of significant interaction.

Clearly, it is possible to perform the search entirely in the opposite (backward) direction. Our main reason for preferring forward selection is that it is difficult to obtain an accurate estimate of the effective degrees of freedom in a neural model for the complexity penalty term. Although several methods are described in the literature (e.g., [53] among others), they require the Hessian to be positive definite, which in practice is difficult to achieve and very expensive to compute. As we are taking a cautious approach to avoid overfitting, our estimate of $d$ will always be upper bounded by the number of additional parameters that a variable introduces to the model.

Another optimization to the procedure described in Phase 2 above would be to control the selection of candidate variables on the basis of the (linear) correlation of each candidate variable with the residuals of the current model or, more effectively in the nonlinear case, to use analysis of variance. Such optimizations are analyzed with synthetic data under controlled simulation elsewhere (see, for example, [14]) and also in Section III of this paper.

Having described the basic steps in our stepwise modeling strategy, we return our attention to its application on the problem of modeling implied volatility changes.

D. Identifying Time Dependencies—Time Series Analysis

Let us begin by considering the univariate properties of the hourly changes in volatility. Summary statistics are calculated for the output variable over the entire sample and for three different subperiods (to highlight any nonstationarity or drift effects). Each subsample is a correlative window and contains 200 observations. Table II shows the subsample periods and summarizes the properties of hourly changes in implied volatility. The mean hourly change ranges from 0.011% in the third subsample to 0.0134% in the second period. The mean volatility change over the entire sample is 0.28%. Apparently, over the six-month period, market volatility did not drift in one direction or another. The standard deviation of the hourly volatility changes is also fairly stable, ranging from 0.1985 in the second subsample to 0.2422 in the third.

Table II also provides the autocorrelation structure of the hourly volatility changes for one through six lags. Like the mean and standard deviation, the autocorrelation structure is quite stable for the three subsamples. The first-order autocorrelations, for example, range from -0.507 to -0.435 and the second-order from 0.102 to 0.017, revealing a negative correlation. This degree of correlation is higher than the autocorrelation reported in [40] for S&P100 options. Clearly we encounter a strong mean-reverting phenomenon in our Ibex35 data. Big movements are followed by smaller changes but with opposite sign. This behavior could be induced by different causes. Some primary causes for Ibex35 are described in [61] and also in [58]. A similar behavior is encountered by Jacquillat et al. [44] on the Paris Stock Exchange, and [64] attributes the behavior to the bid-ask bounce.

TABLE II: STATISTICAL PROPERTIES OF HOURLY Ibex35 MARKET CHANGES OF IMPLIED VOLATILITY

TABLE III: UNIVARIATE ESTIMATION OF HOURLY CHANGES OF IMPLIED VOLATILITY

Our task is to identify a parsimonious representation of the generating process for this data set. If the autocorrelations have a cutoff point, that is, if they are zero for all lags greater than some small number, and the partial autocorrelations taper off for growing lags, an MA representation is suggested. In our case, see Table II, the autocorrelations have a cutoff point after the first lag, and the partial autocorrelations are reduced in magnitude slowly. We identify an MA of order one for every subsample period. The MA(1) estimation results are shown in Table III.
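The identification step can be reproduced with standard time-series tooling. The sketch below computes the ACF and PACF and fits an MA(1) with statsmodels on a simulated series; the negative MA coefficient is an assumption chosen to mimic the mean reversion reported in Tables II and III, since the original MEFF data are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(5)

# Simulated hourly volatility changes: MA(1) with a negative coefficient (assumption).
n, theta = 600, -0.7
eps = rng.normal(0, 0.2, n + 1)
dvol = eps[1:] + theta * eps[:-1]

print("ACF  (lags 1-3):", np.round(acf(dvol, nlags=3)[1:], 3))
print("PACF (lags 1-3):", np.round(pacf(dvol, nlags=3)[1:], 3))

# An ACF that cuts off after lag one while the PACF decays slowly suggests an MA(1).
ma1 = ARIMA(dvol, order=(0, 0, 1)).fit()
print(ma1.summary())
```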
In every subsample period, the coefficient of the MA(1) is large and negative, ranging from -0.7213 in subsample 1 to -0.6865 in subsample 3. These coefficients are all significant, with t-statistics above 10.00. They are also consistent with the moving-average identification reported in Table II, confirming the strong mean-reversion effect of the data.

This simple linear model is able to explain a large proportion of the variability in changes of implied volatility. Adjusted R-squared values range from 34.97% for the entire sample to 36.54% for the first subsample. To check the overall acceptability of the residual autocorrelations (white noise), the portmanteau Q statistic is often used. If the ARMA orders are well specified, for an ARMA(p, q) process the statistic Q is approximately $\chi^2$ distributed with $m - p - q$ degrees of freedom, where $m$ is the number of lags tested. In Table III, the hypothesis of white-noise disturbances can be accepted for the residuals. Nevertheless, a number of researchers have detected intertemporal relations between expected volatility and market information. Let us now turn our attention more explicitly to these relations.

E. Identifying Exogenous Influences

We are interested in assessing the degree to which the exogenous variables in Table I have a significant influence in explaining changes in implied volatility, and whether by using

these variables we obtain incremental value over and above the time-series model. We use the regression model proposed in the second step of our modeling strategy [see (7)]. To account for the strong MA(1) component in the regression equation we incorporate an autoregressive element in the multivariate model. If we reverse the MA(1) model

$$\Delta\sigma_t = \varepsilon_t + \theta\,\varepsilon_{t-1}$$

we obtain

$$\varepsilon_t = \Delta\sigma_t - \theta\,\Delta\sigma_{t-1} + \theta^2\,\Delta\sigma_{t-2} - \theta^3\,\Delta\sigma_{t-3} + \cdots$$

that is, when $|\theta| < 1$,

$$\Delta\sigma_t = \theta\,\Delta\sigma_{t-1} - \theta^2\,\Delta\sigma_{t-2} + \theta^3\,\Delta\sigma_{t-3} - \cdots + \varepsilon_t. \qquad (11)$$

The acf and pacf in Table II suggest that the first three lags are the only ones that make a significant contribution to the explanation of changes in implied volatility. After lag 3 for each subsample there is a significant cutoff in the pacf. We shall therefore use three autoregressive variables in our input set, $\Delta\sigma_{t-1}$, $\Delta\sigma_{t-2}$, and $\Delta\sigma_{t-3}$, to take into account the infinite autoregressive order of (11).

Our regression analysis is based on backward stepwise variable selection. With backward selection, we begin with a model that contains all of the candidate variables and eliminate them one at a time. We check at each stage to verify that previously removed variables are still not significant. We reenter variables into the model if they become significant when other variables are removed. We run regression estimations for all three subsamples.
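A hedged sketch of backward elimination of this kind, using OLS p-values as the removal criterion (the paper does not state its exact thresholds, so the 5% level is an assumption, and the re-entry check is omitted for brevity). The design matrix combines lagged volatility changes with exogenous candidates, as in (7); the data are simulated placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_stepwise(y, X, alpha=0.05):
    """Drop the least significant regressor until all remaining p-values are below alpha."""
    cols = list(X.columns)
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:
            return model, cols
        cols.remove(worst)
    return None, []

# Placeholder dataset: lagged changes plus a few exogenous candidates (assumed relationships).
rng = np.random.default_rng(6)
n = 500
data = pd.DataFrame({
    "dvol_lag1": rng.normal(size=n),
    "dvol_lag2": rng.normal(size=n),
    "dvol_lag3": rng.normal(size=n),
    "moneyness": rng.normal(size=n),
    "change_in_spot": rng.normal(size=n),
    "volume": rng.normal(size=n),
})
y = (-0.5 * data["dvol_lag1"] - 0.25 * data["dvol_lag2"]
     + 0.3 * data["moneyness"] - 0.2 * data["change_in_spot"]
     + rng.normal(0, 0.5, n))

model, kept = backward_stepwise(y, data)
print("retained variables:", kept)
```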
TABLE IV: ESTIMATION RESULTS FOR MLR EQUATIONS ON HOURLY CHANGES IN IMPLIED VOLATILITY

Table IV gives a summary of the results. Overall, the addition of exogenous variables gives a small but significant improvement in model fit (i.e., 40.4% vs. 34.9% in terms of adjusted R-squared).

The results are largely consistent with those obtained in Table III, and confirm the strong negative linear relation between changes in implied volatility and its lagged values. The coefficients for lags one, two, and three are negative and significant, decreasing in magnitude. This pattern is present in every subsample period, dominating the explanation of the output variable. While the autoregressive terms represent this negative relationship, new variables (moneyness, change in spot, and maturity effect) appear in the equation, adding extra value. Note the stability of the estimation results among the subsample periods. The coefficient for moneyness is always positive and significant, while the change-in-spot variable has a negative linear relation with the output variable. The slope of the dummy variable, maturity effect, is also significant. (Volume, which is not shown in the table, also appears as significant, but only in the second subsample and with a weak contribution to the model: a t-statistic of 1.831.)

All the estimations present a high R-squared value. Evidence in Table IV indicates that there is an improvement in explaining the variability of the output variable that ranges from 12.8% for the third subsample to 2% in the first subsample. To test normality of the residuals we examine the skewness and kurtosis values. A skewness of zero indicates that the data are symmetrically distributed. Positive values of skewness suggest that the upper tail of the curve is longer than the lower tail, while negative values suggest that the lower tail is longer. To assess the significance of the skewness value we use its standardized value. We can reject the hypothesis of normality of the residuals at the 0.05 significance level for every subsample. For a normal distribution, the kurtosis coefficient is zero. When the coefficient is less than zero the curve is flat, and when it is greater than zero the curve is either very steep at the center or has relatively long tails. Table IV

shows that the residuals are far from being Gaussian. They represent curves with relatively long tails for each subsample. In summary, the normality assumption does not hold, but the residuals do not present serial correlation (Durbin–Watson statistics range from 1.950 in subsample three to 2.060 in subsample one).

Having analyzed the time structure of the implied volatility series, and shown that a small but significant incremental value can be obtained with the use of additional variables, and particularly moneyness, changes in spot, volume, and maturity, the next step in the modeling procedure is to investigate whether nonlinear analysis using NN's can add further incremental value.

F. Nonlinear Dependencies and Variable Interactions

We start our nonlinear analysis with the third step of phase one of our modeling strategy, whereby we use a multivariate neural model with the variables and autoregressive terms identified in Step 2) to construct a well-specified neural model of implied volatility changes. We then verify whether incremental value can be obtained by including additional variables in the model through a stepwise forward procedure (phase 2 in our modeling strategy). To control model selection we make use of a simple statistical technique based on cross validation. Cross validation requires that the available dataset is divided into two parts: a training set, used for determining the values of the weights and biases, and a validation set, used for deciding when to stop training. In the experiments reported here we allocate the earlier available timespan data consisting of 400 observations to the training set and the following 100 observations to the validation set. The final 50 observations will be used as a (one-off) ex-ante test set to compare all methods. We use the common backpropagation algorithm for estimating the network weights with one layer of (up to five) hidden units, a logistic transfer function, linear outputs, a weight decay term (set to 0.001), and a momentum term (set to 0.1).
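A minimal sketch of this training protocol, written as an explicit loop so that the validation-based stopping rule is visible: 400 training and 100 validation observations in chronological order, a single hidden layer with a logistic activation, weight decay, momentum, and restoration of the weights at the validation minimum. The data and the patience of 50 epochs are placeholder assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)

# Placeholder series: 550 observations, split chronologically as in the text
# (400 training, 100 validation, 50 held back as the ex-ante test set).
X = rng.normal(size=(550, 6))
y = -0.5 * X[:, 0] + 0.3 * X[:, 1] * X[:, 2] + rng.normal(0, 0.3, 550)
X_tr, y_tr = X[:400], y[:400]
X_val, y_val = X[400:500], y[400:500]
X_test, y_test = X[500:], y[500:]

net = MLPRegressor(hidden_layer_sizes=(5,), activation="logistic", solver="sgd",
                   alpha=0.001, momentum=0.1, learning_rate_init=0.01, random_state=0)

best_rmse, best_weights, wait = np.inf, None, 0
for epoch in range(2000):
    net.partial_fit(X_tr, y_tr)                       # one backpropagation pass over the training set
    rmse = np.sqrt(np.mean((net.predict(X_val) - y_val) ** 2))
    if rmse < best_rmse - 1e-6:
        best_rmse, wait = rmse, 0
        best_weights = ([w.copy() for w in net.coefs_],
                        [b.copy() for b in net.intercepts_])
    else:
        wait += 1
    if wait >= 50:                                     # stop when validation error stops improving
        break

net.coefs_, net.intercepts_ = best_weights             # restore the weights at the validation minimum
test_rmse = np.sqrt(np.mean((net.predict(X_test) - y_test) ** 2))
print(f"validation RMSE = {best_rmse:.4f}, ex-ante test RMSE = {test_rmse:.4f}")
```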
TABLE V: LEARNING RESULTS FOR DIFFERENT TOPOLOGIES AND TRAINING TIME

Clearly, since the predictive variables have already been selected according to a linear criterion (see Table III), any incremental value obtained in this step will be primarily attributable to possible interactions between the exogenous variables and/or the time dependencies. A summary of the results is shown in Table V for four different neural models with two, three, four, and five hidden units, respectively. Three sets of performance measures are shown for the validation sample (100 observations) and the training sample (400 observations). For each sample the table shows the root mean square error (RMSE), the number of iterations at which the minimal error was obtained (Converg. It.), the correlation between estimated and observed ($\rho$), and the percentage of correctly predicted directional changes (Poccid).

The performance results are stable and similar for the different topologies (except for Net3). The correlations between estimated and observed in the cross-validation set range from 0.6465 for Net2 to 0.6941 for Net1, all of which are better than the in-sample correlation for multiple linear regression (i.e., 0.635, see Table IV). RMSE values range from 0.07515 in Net1 to 0.09184 in Net3. Correlation measures are around 0.65, except for Net3 where the correlation is 0.47.

The first and simplest neural model (Net1), with two hidden units, does not appear to be able to fit the data well. Its in-sample RMSE remains constant at 0.1063 throughout training, with a correlation between actual and predicted of 0.6578, which is only marginally better than the multiple linear regression (i.e., 0.635, see Table IV). Also, its performance (in terms of correlation) in the validation set is suspiciously higher than that in the training set.

The third neural model (Net3), with four hidden units, appears to have been trapped in a local minimum, with a mean square error similar to the first network and an in-sample correlation between actual and predicted of 0.41, which is worse than the multiple linear regression.

In principle, models with the characteristics of Net1 and Net3 exhibit all the signs of a misspecified model. Indeed, a test for serial correlation on the residuals of both these models shows that their residuals are autocorrelated to nearly the same level as the original implied volatility series, with one significant lag. This can also be seen in their D.W. statistics. Although we cannot formally test the D.W. for these two nonlinear models, we shall reject both models as misspecified. Misspecified models can occur with NN's for the classical reasons, i.e., omitting important variables or fitting a low-order polynomial to a dataset which has been generated by a high-order nonlinear process. Being unable to escape from a local minimum (as for example with Net3) or not having a sufficiently high number of free parameters (or a combination of the two) can also lead to a badly specified model.
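The misspecification check applied to Net1 and Net3 can be written down directly: take the residuals of a fitted model and test them for serial correlation, for example with the Durbin-Watson statistic and a Ljung-Box test. The sketch below runs both on placeholder residual series; as the text notes, Durbin-Watson critical values are not formally valid for a nonlinear model, so the statistic serves only as an informal diagnostic.

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)

# Placeholder residual series: one well specified (white noise), one misspecified
# (a leftover MA(1) component, as observed for Net1 and Net3).
eps = rng.normal(0, 0.1, 501)
resid_good = eps[1:]
resid_bad = eps[1:] - 0.6 * eps[:-1]

for name, resid in [("well specified", resid_good), ("misspecified", resid_bad)]:
    dw = durbin_watson(resid)
    lb = acorr_ljungbox(resid, lags=[6])
    print(f"{name}: DW = {dw:.2f}, Ljung-Box Q(6) p-value = {lb['lb_pvalue'].iloc[0]:.4f}")
# A DW near two and a large Ljung-Box p-value are consistent with white-noise residuals;
# a DW far from two together with a tiny p-value signals remaining serial correlation.
```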

TABLE VI TABLE VII


IN-SAMPLE PERFORMANCE ALL DATA FORWARD VARIABLE SELECTION—FIRST FORWARD PASS

For both the remaining models, Net2 and Net4, with three and five hidden units, respectively, we can accept the null hypothesis that the residuals are white-noise disturbances. Net4 appears to give a better fit for the in-sample data, but not overly so; this can also be seen in the validation figures. Thus, we select the model provided by Net4 for further experiments.
Table VI gives a summary of the overall results so far. Clearly, the multivariate neural model has provided significant incremental value in terms of explaining changes in implied volatility over the MLR and Box–Jenkins models. As we shall see in Section II-H, this incremental value persists in the ex-ante test set and also in the economic evaluation of the models. But before evaluating the models on the ex-ante test set, let us examine whether further incremental value can be achieved by including additional variables which, although rejected by the linear criterion in the first phase of our modeling methodology, may nevertheless have significant explanatory power in the nonlinear sense.

G. Nonlinear Variable Selection
In this, the second, phase of our model identification procedure we attempt to identify any residual nonlinear dependencies that may be present, through a process of forward variable selection.
The first step in this procedure is to compute a complexity-adjusted payoff measure which evaluates the change in model error that would result if an additional variable were added to the model. The payoff (which is analogous to the Akaike information criterion) penalizes the gain in performance by a complexity term which depends upon the sample size and the additional degrees of freedom. Recall that it is computed by (10), where the first term is the correlation between observed and estimated values (in the cross-validation set) given by a model which includes all previous variables plus an additional variable. The second term in the equation is the complexity penalty term, which depends on the sample size and the additional number of free parameters. So, for example, if we have a network with four hidden units, the addition of a new variable will use a maximum of four new parameters.
With our validation sample size fixed at 100 and a maximum number of potential degrees of freedom of five (i.e., the new connections from the additional variable to all five hidden units), the complexity penalty term grows almost linearly with the number of added parameters. In other words, we shall only include a new variable if it contributes to a performance gain of 5% or more. This is a relatively strong criterion for variable entry, particularly when compared to forward regression analysis, where we are generally prepared to allow variables to enter the model with a relatively low t-statistic. However, by using the t-statistic we are effectively testing the hypothesis of variable significance at a specified level of confidence. With neural models it is not straightforward to test the same hypothesis, as it involves making strong and potentially invalid assumptions (see, for example, the seminal work of [76] or [62] for variable significance testing, and [3] for omitted nonlinearity). An alternative criterion is goodness of fit but, given that NN's can fit the data arbitrarily closely and the validation sample is relatively small, it is desirable to use a stricter criterion for variable entry into the model.
The results of the first pass are summarized in Table VII. From the remaining eight variables, it appears that the addition of volume, day-effect, velocity, average spread, and maturity can produce models with better correlation between observed and estimated values in the cross-validation period. However, when adjusted for additional complexity, none of these variables appears to add significant incremental value over and above the benchmark multivariate neural model.
The results shown in Table VII essentially preclude any further analysis for the stepwise forward procedure. With perhaps less restrictive penalties for the additional complexity it might have been desirable to include volume and day effect, but our conservative use of extra complexity precludes that. Instead of running the risk of overfitting the data, we choose to verify this result with a variable significance estimation procedure which operates in the opposite, i.e., backward, fashion.
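As an illustration only (the exact form of the payoff in (10) is not reproduced here), the forward-selection step can be sketched as follows, assuming a simple AIC-like penalty of k/n, which is consistent with the 5% entry threshold quoted above for k = 5 new weights and a validation sample of n = 100; the variable names and correlation values are hypothetical.

    # Sketch of the complexity-adjusted payoff used for forward variable
    # selection; the k/n penalty is an assumption, not the authors' exact (10).
    def payoff(rho_with, rho_without, k=5, n=100):
        return (rho_with - rho_without) - k / n

    def forward_step(benchmark_rho, candidate_rhos, k=5, n=100):
        # candidate_rhos: candidate variable name -> validation correlation of
        # the model refitted with that variable added
        gains = {name: payoff(rho, benchmark_rho, k, n)
                 for name, rho in candidate_rhos.items()}
        best = max(gains, key=gains.get)
        return best if gains[best] > 0 else None    # None: no variable enters

    # e.g. forward_step(0.59, {'volume': 0.62, 'day_effect': 0.61}) -> None,
    # since neither candidate improves validation correlation by more than 5%.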
TABLE VIII
LEARNING RESULTS FOR DIFFERENT TOPOLOGIES AND TRAINING TIME. ALL DATA

TABLE IX
STATISTIC BASED ON RMSE AND CORRELATION

Backward variable selection operates in the opposite direction. It is similar to the sensitivity-based pruning introduced by Moody and Utans [53], whereby we fit a model with all available variables (using cross validation) and attempt to determine the most significant variables by computing the change in error that would result if a given variable were removed from the network. By setting each of the input variables to its mean value (one at a time), we compute a measure which assesses the overall contribution to prediction accuracy due to that particular variable. We use two criteria for prediction accuracy: 1) the RMSE between observed and estimated values in the validation set and 2) the correlation between observed and estimated values (also in the validation set). The corresponding statistics are given by (26) and (27), and each lies between zero and one. Note that in computing these measures no retraining of the network is required. By ranking the importance of all variables using a nonlinear criterion, the objective of this test is simply to verify or refute the hypothesis that no significant variable has been left out, without having to incur the cost of iteratively reestimating models with all possible combinations of variables. This test differs from Moody and Utans [53] in that the sensitivity (contribution) of each variable is computed only for the validation set and in that it uses the ratio, rather than the difference, of the two error measures.
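A minimal sketch of this backward test is given below; the specific ratio statistics of (26) and (27) are not reproduced, so the RMSE and correlation ratios used here are one plausible reading of the text, and net, X_val, y_val are placeholders for the fitted network and validation data.

    # Backward sensitivity test: replace each input, in turn, by its mean and
    # re-evaluate the already-trained network on the validation set.
    import numpy as np

    def sensitivity_ranking(net, X_val, y_val):
        def rmse(y, yhat):
            return np.sqrt(np.mean((y - yhat) ** 2))
        base_pred = net.predict(X_val)
        base_rmse = rmse(y_val, base_pred)
        base_corr = np.corrcoef(y_val, base_pred)[0, 1]
        scores = {}
        for k in range(X_val.shape[1]):
            X_k = X_val.copy()
            X_k[:, k] = X_val[:, k].mean()            # "remove" variable k
            pred_k = net.predict(X_k)
            scores[k] = (base_rmse / rmse(y_val, pred_k),               # RMSE ratio
                         np.corrcoef(y_val, pred_k)[0, 1] / base_corr)  # corr. ratio
        return scores   # ratios near one suggest variable k contributes little

No retraining is needed, which is what makes this ranking cheap compared with refitting a model for every subset of variables.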
The results are shown in Tables VIII and IX for the different neural models and the different variables, respectively. Table VIII shows the performance measures obtained for each network on the cross-validation and training (in-sample) sets. The performance results are similar for the different topologies (except for NET1—note its D.W. statistic).
Overall, including all 14 variables in the model gives worse results than both the multiple linear regression with the six most significant variables and the multivariate neural model with the same six variables. The RMSE measure in the validation set ranges from 0.0833 for NET3 to 0.0966 for NET1. The correlation between the pair of vectors (observed, estimated) is around 0.59, except for NET1 where the correlation is below 0.40 (again note also its D.W. statistic). Again, Poccid shows very stable behavior around 75% for all the networks trained, except for NET1 where Poccid is a low 55%.
The objective of this test, however, is to verify that no significant variable has been omitted from the nonlinear analysis. Table IX gives a ranking of all 14 variables according to both versions of the metric. The results broadly confirm the information that we had extracted from the univariate and multivariate linear analysis. The contribution of the autoregressive variables, Vol.lag(1), Vol.lag(2), and Vol.lag(3), is evident, confirming the strong mean reversion effect in the data. These variables always appear in the top part of the table for both versions of the metric. The maturity effect, moneyness, and changes in spot also appear in the top half of the table. However, some new variables appear at the top of the table (historical volatility, day effect, and volume), but this is not consistent. Historical volatility is at the bottom of the table under the RMSE measure, while volume and day effect are inconsistent in that their importance varies with the subsampling period.
It also appears from Table IX that the contribution of these two variables becomes important when they are both included in the model, rather than through their individual contributions (as the forward analysis has shown), indicating an interaction effect between volume and day of the week. With hindsight it might have been desirable to continue the forward selection process beyond the first pass. Ultimately there is a price to be paid for being (perhaps too) conservative with the complexity penalty term in the payoff criterion, a price which we are prepared to accept.
So far we have studied the in-sample and/or validation-set performance of the models, and have concluded that a simple step-wise modeling strategy can capture both the main linear and nonlinear features of the data. In addition, we have shown that there is significant incremental value to be obtained from nonlinear dependencies and interaction between the variables (see Table V). The ultimate goal, however, is to predict future values of the time series. In the next section we compare the ex-ante forecasting performance of Box–Jenkins univariate predictors, forecasts from standard MLR, and nonlinear neural models.
TABLE X
OUT-OF-SAMPLE PERFORMANCE. ALL DATA

H. Ex-Ante Evaluation of the Models
The ex-ante test set consists of 50 observations held out of the sample to test the validity of our findings. The three models are evaluated both in terms of their prediction accuracy on this test set and in terms of their economic performance.
The prediction accuracy of each model is evaluated by single-step-ahead prediction. At the close of each trading hour, when the values of all independent variables are known, we make a forecast for the change in implied volatility over the next trading hour. The forecasts of each model are then compared with the actual changes. Two measures are used to evaluate forecasting accuracy: the correlation between actual and predicted values over the unseen 50 observations, and Poccid, the percentage of correctly predicted directional changes. The results for each model are shown in Table X. Recall that the Box–Jenkins and MLR models have been estimated over the entire sample of 500 observations, whereas the neural model has used only the first 400 of these observations directly for training.
The results, both in terms of correlation and in terms of directional changes, are consistent with the expectations from the in-sample period (and, in the case of the neural models, with the validation period). The neural model yields correlation values around 0.63, against 0.55 for the MLR model and 0.50 for the univariate model. The improvement obtained by the neural model is significant. Starting from the Box–Jenkins model, which uses the least information available (i.e., the implied volatility series alone), it is possible to add extra value by introducing exogenous influences via new inputs: first in a straightforward linear model and then by considering possible nonlinear interactions among the variables.
Financial data series are very often nonstationary. One test that is often used to safeguard against spurious models is to divide the training data into a number of subsamples. A separate model is then reestimated for each subsample and a test for stationarity is performed on the parameters of each model. If the separate models are radically different, or their performance on the ex-ante test set is radically different, then there is a risk that the estimated models have captured some temporary and perhaps unrepeatable relationship rather than a relationship which is invariant through time. The results shown in Table X are consistent with those obtained by applying the modeling strategy to each of the three subsampling periods, as shown in Table II. The analysis is described in [61] and safely concludes that all models give acceptably stable performance.
The economic evaluation of the models involves the use of a simple "trading strategy" for a market participant using the predictions of the neural, MLR, and time-series models to purchase or sell delta-hedged, short-maturity call options (between ten and 45 days to maturity) in the Ibex35 option market. This requires the purchase/sale of at-the-money call options on Ibex35 in every hourly period. The option positions are held until the end of the hour, at which point another position is established based on the direction forecasted by the models. In a general sense, the strategy will profit when the forecasted direction is the same as the true direction taken by the market and when the true implied volatility quotes remain rather volatile or, in other words, when the magnitude of the movement of true volatility is large. We conduct the evaluation in two stages. In the first stage we ignore transaction costs and assume that trades are available at the end of every hourly period. The number of trades is set to 50. The profit/loss figures are based on hourly investments of 100 option contracts and do not incorporate reinvested profits. Clearly, this is a simplification of real market conditions.
The cumulative profit curves associated with each model's forecasts (NN, MLR, and Box–Jenkins) show a steady profitability for each model (see Fig. 3). Though cumulative profits have a slight drawdown near the end of the forecast period for all three models, the trends for the remainder of the period are strong and consistent. The cumulative profit curve for the MA(1) model predictions presents a considerable number of peaks and troughs. Nonetheless, the strategy using the forecasts from the MA(1) model otherwise earns consistent and positive profits. The NN predictions are more profitable than the MLR or univariate predictions. Univariate forecasts present the worst performance overall. The ability to profit consistently from the trading strategy based on any of the model predictions may be due to the strong mean reversion effect reported in earlier sections. In that sense, these profits are not riskless. Their riskiness must also be considered in evaluating their significance.
Table XII provides a measure of the risk-adjusted returns of the strategy. The mean hourly profit is reported, along with the t-statistic for the null hypothesis that the sample mean is zero. Since the divisor of the computation is the sample standard deviation divided by the square root of the sample size, the t-statistic (other than a scaling factor) bears a correspondence to the reward/risk tradeoff of each model.
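The computation can be sketched as follows; the hourly profit series is a placeholder and the figures in Table XII are not reproduced here.

    # t-statistic of the mean hourly profit against a zero-mean null; up to the
    # sqrt(n) scaling it is the per-period reward/risk (Sharpe-type) ratio.
    import numpy as np

    def profit_t_statistic(hourly_pnl):
        pnl = np.asarray(hourly_pnl, dtype=float)
        n = pnl.size
        reward_risk = pnl.mean() / pnl.std(ddof=1)
        return reward_risk * np.sqrt(n), reward_risk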
TABLE XII
HOURLY PROFITS FOR Ibex35 OPTION TRADING STRATEGY

Fig. 3. Profit and loss of different models under a simplistic trading strategy.

Fig. 4. Profit and loss of different models, adjusted for transaction costs.

Table XII indicates that the profits based on the forecasts of all models are impressively large. The reward/risk ratios range from 1.99 for the MA(1) model to 4.00 for the NN system. This evidence supports the notion that model forecasts are able to give economic value to the market participant. However, to demonstrate that the strategy can benefit from the forecasting models, it is also required that we include the effect of transaction costs. This is shown in Fig. 4; the last two columns of Table XII also indicate the influence of bid/ask transaction costs on trading strategy profits. As the results indicate, the introduction of transaction costs (at 0.5%) virtually eliminates the profitability of the linear models, but not that of the neural forecasts. The t-statistic, or reward/risk ratio, now ranges from 0.233 for the univariate model to 2.06 for the NN model. Clearly, trading the forecasts of the neural model appears to have economic significance, in contrast with the MLR and univariate models.
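A minimal sketch of the cost-adjusted evaluation is given below; the forecast and realized volatility-change series, the notional, and the flat 0.5% cost charge are all placeholders rather than the exact accounting used for Fig. 4.

    # Cumulative P&L of trading the forecasted direction of implied volatility,
    # with a crude proportional transaction-cost charge per hourly trade.
    import numpy as np

    def directional_pnl(forecast_dvol, realized_dvol, notional=1.0, cost_rate=0.005):
        position = np.sign(forecast_dvol)          # +1 buy volatility, -1 sell
        gross = position * realized_dvol * notional
        net = gross - cost_rate * notional         # flat cost charged each period
        return np.cumsum(net)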
So far we have described a step-wise model selection procedure for NN's and evaluated its use in forecasting implied volatility changes. We have shown that this simple procedure is successful in capturing the main linear and nonlinear dependencies of implied volatility on various exogenous variables described in the literature, and that in the case of Ibex-35 options neural models give significant incremental value both in terms of forecasting accuracy and also in economic terms. However, it has been argued that NN's are "black boxes" which are difficult to analyze and understand, and hence competitive rather than synergetic with theory formulation. In the next section, we use sensitivity analysis to obtain some insight into the nature of the relationship that has been captured by the neural model between innovations in implied volatility and its determinant factors.

I. Sensitivity Analysis
Fig. 5 shows graphs of the response functions of the output with respect to the inputs. The graphs refer to the trained NET4. Recall that the variables Vol.lag1, Vol.lag2, and Vol.lag3 are those with the greatest influence on the change in implied volatility, and they should represent a mean-reverting effect. Fig. 5 clearly confirms this effect.
Fig. 6 indicates how the dependent variable, volatility changes, moves with the maturity-effect variable while holding the remaining inputs constant at their most typical value (the median). The neural model activates only two different points in the maturity-effect domain, i.e., zero and one. This result is not surprising since this is a dummy variable (0,1). What the model does is to sensitise the output variable near these two points. Note that the influence on volatility changes is minimal at the intermediate values of the interval (0,1).
The model succeeds in recognizing and differentiating the effect of the dummy in the data, separating the changes in volatility that occurred on days where the maturity of the contract is equal to or less than three days from those realized when the option has a maturity longer than three trading days. The eventual relationship between these two variables is not entirely clear, since the extreme values, zero and one, appear to compensate each other.
Fig. 7 depicts how the dependent variable varies as we change another input variable in our model, the change in spot. Large changes in spot would typically induce similarly large changes in implied volatility. Conversely, when changes in spot are relatively small, the derivatives with respect to this variable are almost flat. This behavior of volatility changes with respect to changes in spot is what we would expect, but it is undetectable with linear methods.
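The response functions in Figs. 5–8 can be generated along the following lines; net and X are placeholders for the fitted network (NET4) and the matrix of input variables.

    # Sweep one input across its observed range while pinning the remaining
    # inputs to their medians, and record the network output (cf. Figs. 5-8).
    import numpy as np

    def response_function(net, X, var_index, n_points=50):
        grid = np.linspace(X[:, var_index].min(), X[:, var_index].max(), n_points)
        probe = np.tile(np.median(X, axis=0), (n_points, 1))
        probe[:, var_index] = grid
        return grid, net.predict(probe)   # x-y pairs to plot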
Fig. 8 shows how changes in volatility move with the moneyness variable. Hull and White [43] and Stein and Stein [72] have shown that it is rational for implied volatility to vary with the strike price when the asset volatility is believed to be stochastic. When special assumptions are made, their equations and calculations show that a plot of theoretical implied volatilities against the strike displays a smile: the function has a U-shape and the minimum implied volatility occurs at a strike near the forward price of the underlying asset.
Some authors have reported smile effects in different data sets [69], [71], [32]. Pozo [58] has argued that a U-shaped pattern occurred for Ibex35 options during various subperiods between 1991 and 1993. It is not the task of this study to look for volatility smile patterns in the Ibex35 data. However, Fig. 8 gives a measure of the way in which volatility changes are influenced by the situation of the strike. In other words, it indicates how the volatility smile changes through time with changes in the strike. When moneyness is close to zero, i.e., the strike is close to the spot price, the influence of this input variable is minimal. Conversely, when moneyness moves away from zero the influence on the output becomes larger, although this relation appears to be asymmetric. For in-the-money options, moneyness seems to influence the changes in volatility to a higher degree than for out-of-the-money options. Again, this result has important economic implications for market participants. The neural model has succeeded in capturing the importance of this variable and in doing so provides valuable information.
We have described a step-wise model selection procedure for NN's and evaluated its use in forecasting implied volatility changes. We have shown that this simple procedure is successful in capturing the main linear and nonlinear dependencies of implied volatility on various exogenous variables described in the literature, and that in the case of Ibex-35 options neural models give significant incremental value both in terms of forecasting accuracy and also in economic terms.
The modeling strategy is based on an integrative approach, combining information in a hierarchical building process. We departed from the least available information set, given by the univariate series itself, using the well-known Box–Jenkins time series analysis. From this bottom line, we incorporated exogenous influences, first making use of a linear econometric tool (MLR) and then implementing a more complex nonlinear model using NN's. This strategy intends to overcome the lack of a systematic model-building process for neural models, and aims to make neurotechnology a more understandable and useful tool for financial economists.
Besides the problem of variable selection, the construction of reliable neural estimators for financial data entails two further problems: dealing with nonstationary data and handling influential observations or leveraged data. In this section we addressed the problem of nonstationarity by differencing the implied volatility series. In the next section we use the concept of conditional cointegration as a framework for handling nonstationary data, which has the added benefit of enhancing statistical inference. Within this framework, we describe an alternative way of performing principled variable selection and address the issue of robustness to influential observations.

III. MODELING COINTEGRATION IN INTERNATIONAL EQUITY INDEX FUTURES

A. Overview
Recent years have witnessed a growing dissatisfaction with the stationarity and ergodicity assumptions upon which the bulk of time-series theory has been founded. Although these assumptions may be reasonable for many time series in the natural sciences, they are rather too restrictive for economic and financial data series, most of which appear to have first and second unconditional moments that are far from being time invariant.
Fig. 5. Mean reversion effects in volatility.

A time series can be nonstationary in an unlimited number of ways, but one particular class of nonstationary process has monopolized the interest of econometricians, namely that of integrated (i.e., stochastically trending) processes, primarily due to Granger and Newbold [35]. They advocated the use of differencing as a way of dealing with nonstationarity. As a result, a vast literature on testing for integrated behavior, as well as on statistical inference in the presence of integrated variables, has appeared and keeps growing steadily.
Fig. 6. Volatility changes as a function of the maturity effect.

Fig. 7. Volatility changes as a function of changes in the underlying spot.

Fig. 8. Volatility changes as a function of moneyness.

However, when modeling only in terms of differenced variables, considerable long-run information (which might be present in the levels of the variables) is lost and statistical inference may be impeded. This is particularly important in finance, where theory suggests the existence of long-run equilibrium relationships among variables. Although short-run deviations from the equilibrium point are most likely, due for example to random shocks or indeed to deterministic but unknown factors, these deviations tend to be bounded through the actions of a variety of agents which act as stabilizing mechanisms, bringing the system back to equilibrium.
Granger [36]–[38] and Engle [25] developed what can be regarded as the statistical counterpart of these ideas: the concept of cointegration.
Cointegration allows individual time series to be stationary in first differences (but not in levels), while some linear combinations of the series are stationary in levels. By interpreting such a linear combination as a long-run relationship or an "attractor" of the system, cointegration implies that deviations from this attractor are stationary, even though the series themselves have infinite variance. Cointegration provides a way of alleviating the inefficiencies caused by the disuse of long-run information (available in the levels of the variables) while considerably facilitating statistical inference. The need to make use of information about long-run relationships among the levels of variables has long been recognized by the advocates of error correction mechanisms (ECM's) [66], [19], [42], which attempt to incorporate both short-run dynamics and long-run information through the use of error-correction terms in linear regression. There is, moreover, an important relationship between error correction mechanisms and cointegration. Granger [37] showed that if a set of variables is cointegrated, then it has an ECM representation and, conversely, an ECM always produces a set of variables that are cointegrated. This means that for variables that move stochastically together over time the ECM provides a linear representation which is capable of adequately representing the time-series properties of the data set.
Such models of cointegration have been very useful in expanding our understanding of financial time series; nevertheless many empirical "anomalies" have remained unexplainable. This may be due to the linear nature of the models and the fact that short-run deviations from the equilibrium point are attributed to random shocks. It is entirely plausible that short-run deviations may be (at least partially) attributable to other economic factors. For example, if the short-term influence of other factors can disturb the longer term cointegration, we would hope to be able to model the strength of the cointegration as a function of other events occurring in the financial markets. For instance, it is entirely plausible to suggest that a rapid increase in oil prices would tend to decouple two cointegrated markets, especially if one belonged to an oil-producing nation and the other to an oil consumer. Unfortunately, traditional linear models fail to capture this type of behavior because the sensitivity of the model to a given factor is constant and reflected by a single coefficient. In a traditional regression model this coefficient comes to represent the average strength of the relationship between asset returns and a given factor. Subsequently, it would be desirable to develop nonlinear models of cointegration in which short-run deviations from the equilibrium point are conditioned on other factors which may affect the strength of the cointegration in nonlinear ways.
In this section we introduce the concept of "conditional cointegration" and show how this provides both a framework for dealing with nonstationarity and a means of enhancing statistical inference. The methodology consists of four stages. In the first stage we verify the presence of cointegration in the linear sense. In the second stage we examine whether incremental value can be provided by extending the model to capture any dependencies of the cointegration on other factors, both in the linear and the nonlinear sense. In the third stage we apply a nonlinear variable selection methodology to identify the most significant of the (unknown) factors which influence the strength of the cointegration. Finally, in the fourth stage we address the issue of robustness to outliers and influential observations and show that combining a median estimator with the less robust mean-squared-error estimator, using a simple trading rule, further enhances the out-of-sample risk/return performance of the system.

B. Experimental Setup—Identifying Cointegration
We examine cointegration among international equity futures indexes mostly drawn from the European markets. The idea behind cointegration is quite simple: there are markets and/or economic variables which in the long run share a common stochastic trend but from which there may be temporary divergences. Such markets are called cointegrated if the residuals of the regression of one variable on another are stationary (mean-reverting). The hypothesis is that these residuals, which represent temporary discrepancies in the relative values of the variables, are due to factors outside the cointegration and that, in the short run, current events may take priority over the longer term cointegration.
To demonstrate how cointegration works, let us consider the FTSE futures index and its relationship with a basket of other indexes, comprising the U.S. S&P, German Dax, French Cac, Dutch Eoe, and Swiss SMI. The data are daily closing prices for all the indexes from 6 June 1988 to 17 November 1993.
The procedure for identifying cointegration is as follows: firstly, we regress the level of the FTSE on the levels of the other indexes. The coefficients of the regression represent the amounts of each of the other indexes which should be held in a portfolio in order to obtain, on average, the same performance as the FTSE. Saying that the indexes are cointegrated is equivalent to saying that the markets have an idea of a "fair" level of the FTSE compared to the other indexes and that, if the FTSE rises above or falls below this level, it will tend to move back toward it; this mean reversion usually takes place over a longish time period (weeks or months). In a statistical sense we can test for a stable relationship over time by testing the residuals of this regression for stationarity. This is a straightforward procedure using the well-known "unit root test" [22].
Fig. 9 shows the levels of the FTSE plotted together with the levels of a basket of DAX, CAC, EoE, S&P, and SMI. The proportion of each of the other indexes is determined by its beta in the regression

FTSE_t = \beta_0 + \beta_1 DAX_t + \beta_2 CAC_t + \beta_3 EoE_t + \beta_4 S\&P_t + \beta_5 SMI_t + \varepsilon_t     (28)

where \varepsilon_t is random noise.
It is easy to see that the FTSE is periodically under/over valued. This becomes clearer when we plot the residuals of the regression, which show a clear mean-reversion (see Fig. 10).
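The mechanics of this check can be sketched as follows; the column names are placeholders for the daily closing-price series, and note that a full Engle–Granger test would use cointegration-specific critical values rather than the ordinary Dickey–Fuller ones reported by the routine below.

    # Cointegrating regression of the FTSE level on the basket levels, followed
    # by a unit-root (augmented Dickey-Fuller) test on the residuals.
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller

    def cointegration_check(prices):
        # prices: DataFrame with columns 'FTSE', 'DAX', 'CAC', 'EOE', 'SP', 'SMI'
        X = sm.add_constant(prices[['DAX', 'CAC', 'EOE', 'SP', 'SMI']])
        fit = sm.OLS(prices['FTSE'], X).fit()
        residuals = fit.resid                       # the "mispricing" series
        adf_stat, p_value = adfuller(residuals)[:2]
        return fit.params, residuals, adf_stat, p_value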
Fig. 9. Cointegration between the FTSE and a portfolio of international indexes—in level terms, the six indexes are individually stochastically trending
together over time but in the long run the system is brought back to equilibrium.

Fig. 10. Residuals from the cointegrating regression.

We test the residuals for stationarity using the unit-root test. We find that the critical value for the 99% confidence level of the stationarity test is 3.5 and that the actual value which we obtain is 5.2. This value is higher than the critical value and hence shows that the cointegration effect is statistically significant.
If cointegration were the only factor at work, then any small discrepancies in prices would rapidly be eliminated by the mean-reversion effect. In fact this would then be indistinguishable from a conventional "no arbitrage" situation in which price discrepancies can be exploited to generate a riskless profit. However, this is clearly not the case in this instance, where significant mispricings can occur and can persist for long periods of time. This reflects the fact that economic, financial, political, and industry-specific factors will influence the relative prices of the different markets. The cointegration hypothesis suggests that in spite of these factors the markets will move together in the long term and that a "mispricing" will cause upward price pressure on the undervalued asset and downward pressure on the overvalued asset, tending to push the prices of the two assets together.
The key difference between cointegration and no-arbitrage relationships is that cointegration is a statistical rather than a guaranteed relationship; the fact that prices have moved together in the past can suggest that they will continue to do so in the future but cannot guarantee that this will in fact happen. Because of this, cointegration does not offer a riskless profit and is perhaps best considered as statistical arbitrage, to reflect the similarity to "normal" arbitrage while acknowledging the presence of uncertainty or risk.
If the short-term influence of other factors can disturb the longer term cointegration, we would hope to be able to model the strength of the cointegration as a function of other events occurring in the financial markets. Table XIII describes a set of candidate variables which might explain short-run deviations from the equilibrium point (i.e., the way in which the residuals fluctuate).
For instance, it is entirely plausible to suggest that a rapid change in oil prices would tend to decouple two cointegrated markets, especially if one belonged to an oil-producing nation and the other to an oil consumer. Unfortunately, traditional linear models fail to capture this type of behavior because the sensitivity of the model to a given factor is constant
TABLE XIII
SUMMARY OF CANDIDATE VARIABLES

and reflected by a single coefficient. In a traditional regression model this coefficient comes to represent the average strength of the relationship between asset returns and a given factor.
However, this situation changes when we consider NN models. The powerful universal approximation abilities of NN's allow them to represent arbitrary nonlinear functions. An important corollary of this is that the sensitivity (or partial derivative) of the model with respect to a given factor can vary arbitrarily as a function of the other inputs to the model. In the case of cointegration, this would allow us to model a relationship which is strongly cointegrated under certain circumstances, weakly cointegrated under others, and perhaps even negatively cointegrated under yet others.
This notion of conditional cointegration becomes particularly relevant when we consider the increasing efficiency of the financial markets. Cointegration effects strong enough to prevail under all market conditions would be relatively easy to detect using linear techniques, and hence would be unlikely to generate excess profits for very long. On the other hand, more subtle conditional cointegration effects are less amenable to linear techniques and hence are less likely to have been eliminated by market activity. For instance, two assets might exhibit an average cointegration which generates insufficient profits, but this might be a composite of a strong (profitable) cointegration under certain circumstances and a weak (or even negative) cointegration under other circumstances. NN's and other nonlinear methodologies provide a tool for identifying and exploiting this type of relationship. From a technical perspective, the concept of conditional cointegration provides an economically and financially feasible framework for applying NN's to financial modeling and forecasting applications.

C. Modeling Strategy
The modeling strategy described here is applicable to markets and instruments where finance theory and/or market dynamics may recognize long-run equilibrium relationships. The objective of the procedure is to 1) verify the presence of cointegration; 2) identify exogenous factors influencing the short-run dynamics of the cointegration; 3) identify the most significant of those factors; and 4) construct a reliable estimator which is robust to influential observations and leverage data.
There are many ways of dealing with these issues. It is, for example, entirely acceptable to use the modeling strategy described in the previous section of this paper to identify time structure in the cointegrating residuals, exogenous influences, and nonlinear dependencies. However, with this method the criteria for variable selection and hypothesis testing are to a large extent model-dependent. It is often desirable for such criteria to be independent of the actual model. Although not always possible, this is particularly true for models which are susceptible to overfitting. In this section we shall deploy a slightly different procedure for the purpose of illustrating an alternative, model-independent, way of performing variable selection and hypothesis testing. The procedure consists of the following steps.
1) Identifying cointegration: Using multiple linear regression, regress the level of the dependent variable against the levels of the independent variables and test the residuals for stationarity

y_t = \beta_0 + \beta_1 x_{1,t} + \cdots + \beta_k x_{k,t} + \varepsilon_t     (29)

\varepsilon_t = \rho \varepsilon_{t-1} + u_t     (30)

The test for stationarity involves constructing the auxiliary regression in (30) and rejecting the hypothesis that the residuals contain a unit root (\rho = 1). This simple method of identifying cointegration may not be appropriate in all cases (for example, when there is no obvious causal relationship it can produce biased coefficients); nevertheless, since it is one of the simplest available, we retain its use. For more sophisticated tests of identifying cointegration see, for example, [56] and [46].
2) Identifying exogenous influences: The objective of this test is to verify whether short-run deviations from the equilibrium point can be attributed to a vector of exogenous variables such as, for example, those described in Table XIII. Although this can be achieved by using the approach described in Section II of this paper, in this step we shall deploy a slightly different procedure.
We start by constructing the following well-specified models of the cointegrating residuals: a multivariate linear model (31) and its nonlinear, neural-network counterpart (32).
To test whether incremental value can be obtained from the multivariate linear and neural models we evaluate their performance on an unseen dataset. For each estimator we test the hypothesis that its performance is significantly better than that of the auxiliary regression in (30). We also repeat the test between the two estimators in (31) and (32). For any two estimators, this is done using the statistic

z = (\mu_1 - \mu_2) / \sqrt{\sigma_1^2 + \sigma_2^2}     (33)

where \mu_1 and \mu_2 denote the respective mean performance and \sigma_1 and \sigma_2 the standard errors of the relevant estimators. The mean performance can be formulated in terms of the correlation between observed and estimated values in the test sample. It can also be formulated in terms of economic performance, should the two estimators be used in the context of the same trading strategy. The hypothesis is formulated in terms of H0: \mu_1 = \mu_2, in which case the z statistic should follow a standard normal distribution, against the alternative H1: \mu_1 \neq \mu_2.
3) Identifying nonlinear dependencies and interactions: The use of an ANOVA-derived F-test of conditional expectations is suggested as an alternative means of preliminary variable selection for NN's. Unlike other model-independent methods such as correlation analysis, it is capable of identifying nonlinearities, either directly, in the relationship between individual explanatory variables and the dependent variable, or indirectly, in the form of interaction effects. Using this approach we refine the models down to a small number of variables. These models both perform better than the original, overspecified, models and also make sense from an economic viewpoint. Again, the nonlinear model performs substantially better out-of-sample than the equivalent linear model.
4) Median learning and robustness: The use of a mean-absolute-deviation cost function leads to an estimator of the conditional median of the dependent variable and as such is more robust than the conventional estimators of the mean which arise from the use of MSE/OLS. In particular, estimators of mean and median will diverge most noticeably under the influence of extreme observations, and such divergence can be taken as an indicator of unreliable predictions. Using a simple rule for combining forecasts, whereby we take a position only if the estimators of mean and median agree on the sign of the predicted return, we show that the hybrid system produces very similar out-of-sample returns (approx. 46% over two years) but with substantially reduced risk; the Sharpe ratio of the combined system (representing the risk/return profile) is 2.42, compared to the 1.78 and 1.72 of the individual neural models.
The modeling strategy described above is not necessarily the only procedure or indeed the most appropriate modeling strategy. It is, for example, entirely acceptable to use the strategy described in the previous section of this paper in Steps 2) and 3) of the procedure. The two methods are not mutually exclusive, but they differ primarily in the way in which variables are selected for entry into the model. The F-test selects variables in a model-independent way and in many respects it is preferable. However, when testing for interaction effects in a large number of dimensions, it requires relatively large samples which are not always available.
Having applied the first step of our modeling procedure (see Section III-B) and satisfied the cointegration criterion, let us turn our attention to verifying that the cointegration residuals reflect the existence of other factors which affect the markets in the short run.

D. Identifying Exogenous Influences
The cointegration residuals reflect the existence of other factors which affect the markets in the short run. Changes in the cointegration residual reflect changes in the relative return of the FTSE compared to the basket of international indexes [the right-hand side of (28)]. The evaluation of the nonlinear and linear models will be based on a simple trading strategy which takes long/short positions on the two sides of (28) based on a prediction of whether the return of the FTSE will be higher or lower than that of the basket portfolio over the subsequent ten-day forecast period.
There is a slight complication to the modeling procedure because the nature of the cointegration regression in itself induces a slight element of mean-reversion into the residuals (the in-sample residuals of a regression are, by construction, unbiased). Also, for implementation purposes, the cointegrating relationship should only be estimated using past data. The solution to both of these problems is to use a moving-window regression for the cointegrating relationship and then to generate the (out-of-sample) residuals by applying this model to future data. In our case we use a window of 200 points to estimate the coefficients of the regression [the \beta's in (28)] and reestimate the relationship every 100 points.
The initial step is to verify the existence of the underlying cointegrating relationship itself. Rather than performing a traditional "in-sample" test such as those discussed above, our focus is to test the predictive information contained in the cointegrating residual, as indicated by the generalization ability of the models in the out-of-sample period. We test a linear model, given by (34), in which the relative return over the forecast period is related to the current cointegrating residual. Note that modeling the relative price changes on the left-hand side of (34) is equivalent to modeling changes in the residuals, except during periods where the cointegrating regression is reestimated.
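The generation of out-of-sample residuals with the 200-observation rolling window can be sketched as follows; the array layout and the handling of any leftover observations at the end of the sample are assumptions of the sketch.

    # Estimate the betas of (28) on a trailing window of 200 observations and
    # apply them, unchanged, to the next 100 observations before re-estimating.
    import numpy as np
    import statsmodels.api as sm

    def rolling_residuals(y, X, window=200, step=100):
        out = np.full(len(y), np.nan)
        for start in range(0, len(y) - window - step + 1, step):
            ins = slice(start, start + window)
            oos = slice(start + window, start + window + step)
            fit = sm.OLS(y[ins], sm.add_constant(X[ins])).fit()
            Xo = sm.add_constant(X[oos], has_constant='add')
            out[oos] = y[oos] - Xo.dot(fit.params)   # out-of-sample mispricing
        return out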
Fig. 11. Cumulative profit/loss for the initial linear and nonlinear models.

TABLE XIV
OUT-OF-SAMPLE PERFORMANCE OF UNIVARIATE MODELS

We then go on to test for direct nonlinearities by building the model which is the nonlinear generalization of (34), given by (35).
The parameters in (34) and the function in (35) were estimated by gradient descent in the parameter space using a feedforward NN. A moving-window modeling approach was deployed, using the first 300 observations for training, the next 100 for cross validation/early stopping, and a further 100 for out-of-sample testing. The "window" is then shifted forward 100 observations and the process repeated until a total of 500 out-of-sample predictions were available. As the model was built using overlapping daily observations, this is approximately equivalent to a two-year out-of-sample period.
Fig. 11 shows the cumulative returns of the two models in the out-of-sample period of 500 observations (transaction costs of eight basis points are included, which is consistent with the model being executed through the futures markets).
Selected performance metrics for the two models are shown in Table XIV. "Correlation" refers to the standard correlation coefficient of predicted and actual returns, "Correct" represents the proportion of predictions with the same sign as the actual return, "Return" is the profit generated by a simple long–short trading strategy over the two-year period, and "Sharpe Ratio" is the (annualized) Sharpe ratio of the equity curve, which indicates the risk/return profile.
The first thing to notice is that both models show a highly significant predictive correlation between actual and forecasted relative returns. This is also reflected in the positive equity produced by each of the two models. Secondly, the overall similarity of the two models, combined with the slightly better performance of the linear model, suggests that there are no significant nonlinearities in this direct relationship and that the slight degradation in the performance of the nonlinear model is due to overfitting. For this reason we will use only the performance of the linear model as a benchmark for the subsequent stages.
The next step was to examine multivariate models. Starting with the set of candidate variables listed in Table XIII, we consider the following two models that might explain the relative returns on the two assets: first, the linear model given by (36); second, to account for the possibility of 1) unknown nonlinear effects between these factors and the residuals and 2) possible nonlinearities accruing from the interaction between these factors, a nonparametric nonlinear model with the same independent variables, of the form given by (37).
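The walk-forward scheme described above can be sketched as follows; the model construction and data splits are placeholders, and early stopping on the validation slice is only indicated rather than implemented.

    # Moving-window modeling: 300 observations for training, 100 for cross
    # validation/early stopping, 100 out-of-sample; then shift by 100.
    def walk_forward(X, y, build_model, train=300, val=100, test=100):
        # build_model: callable returning a fresh scikit-learn-style regressor
        predictions = []
        start = 0
        while start + train + val + test <= len(y):
            tr = slice(start, start + train)
            va = slice(start + train, start + train + val)   # early-stopping set
            te = slice(start + train + val, start + train + val + test)
            model = build_model()
            model.fit(X[tr], y[tr])      # a real run would monitor X[va], y[va]
            predictions.extend(model.predict(X[te]))
            start += test                # shift the window forward 100 points
        return predictions               # approx. 500 out-of-sample predictions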
Fig. 12. Cumulative profit/loss for the multivariate linear and nonlinear models.

TABLE XV
OUT-OF-SAMPLE PERFORMANCE OF MULTIVARIATE MODELS

TABLE XVI
STATISTICAL AND ECONOMIC SIGNIFICANCE OF RESULTS

As before, the function in (37) and the parameter vector were estimated by gradient descent in the parameter space using a feedforward NN, and the same moving-window modeling approach is retained.
Fig. 12 shows the cumulative returns of the two multivariate models. The performance statistics of these two models are summarized in Table XV.
The results in Table XV indicate that both multivariate models return a profit during the out-of-sample period. Let us now test the hypothesis that the results are significantly different for the different models. We do this by performing z-tests to compare both the correlation and the economic performance of each pair of models; for instance, in comparing the predictive correlation of the multivariate models we use the statistic in (38), which has the same form as (33). The results are summarized in Table XVI.
On the whole, the performance of the models is broadly similar, as indicated by the relatively high p-values. The multivariate linear model is not significantly different from the basic model in terms of correlation and actually under-performs in terms of trading performance. Thus any small predictive information which might be contained in the additional variables, in a linear sense, is offset by overfitting. This is consistent with the view that the markets should be efficient with respect to commonly available data applied within a conventional modeling framework. On the other hand, the NN model, while only significantly better in the correlation sense at around an 80% level of confidence, does improve on the other two models from a trading perspective, with a confidence level of 90% against the basic model and 99% against the multivariate linear model.
The much improved performance of the NN model over the linear model suggests the presence of significant nonlinearities, either in the cointegration itself or in the relationships between the FTSE and the other explanatory variables. Nevertheless, it was our belief that many of the variables were either redundant and/or insignificant and that their presence was more likely to cause problems due to overfitting. The next phase of the study was therefore to perform a variable selection process in order to obtain a parsimonious model of the cointegration.
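A sketch of this comparison is given below; it assumes the statistic in (33)/(38) is the standard normal ratio of the performance difference to its combined standard error, and the performance figures themselves are placeholders.

    # z-test of H0: the two models have equal mean performance.
    import math
    from scipy.stats import norm

    def compare_models(mu1, se1, mu2, se2):
        z = (mu1 - mu2) / math.sqrt(se1 ** 2 + se2 ** 2)
        p_one_sided = 1.0 - norm.cdf(z)      # H1: model 1 outperforms model 2
        return z, p_one_sided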
E. A Methodology for Nonlinear Variable Selection
Within a linear framework there are a variety of statistical procedures which assist in the stages of model identification, construction, and verification. The model identification stage of these techniques is typically based on correlation analysis. For time-series modeling, for instance, there is the well-known and established Box–Jenkins framework [9], in which examination of the autocorrelation function (ACF) allows the modeller to identify both the order and the type (autoregressive, moving average, or mixed) of the model. Similarly, for multivariate modeling the most commonly used methodology is that of stepwise regression [74], which is likewise based upon the concepts of correlation and covariance.
Currently there is a lack of an established nonlinear model identification procedure. When performing time-series modeling using NN's it is still the norm to use correlation analysis as a preliminary variable selection technique. However, correlation analysis will in many cases fail to identify the existence of significant nonlinear relationships. This might cause variables to be erroneously left out of later stages of modeling and result in poorer models than would be the case had a more powerful identification procedure been used.
Both linear regression models and NN's are commonly fitted by minimizing mean squared error. They provide an estimate of the mean value of the output variable given the current values of the input variables, i.e., a conditional expectation. This insight allows us to realize why measures of dependence which have been developed for linear models are not always suitable for nonlinear models.
In a broad sense, a linear relationship exists if and only if the conditional expectation either consistently increases or consistently decreases as we increase the value of the independent variable. A nonlinear relationship, however, only requires that the conditional expectation varies as we increase the independent variable. The class of functions which would thus be missed by a linear measure but identified by a suitable nonlinear one includes symmetric functions, periodic functions (e.g., a sinewave), and many others besides.
We propose a measure of the degree to which the conditional expectation of the dependent variable varies with a given set of independent variables, but which imposes no condition that this variation should be of a particular form.
1) Analysis of Variance or ANOVA: The ANOVA technique is a standard statistical technique, usually used for the analysis of categorical independent variables, which divides a sample into groups and then tests to see if the differences between the groups are statistically significant. It does this by comparing the variability within the individual groups to the variability between the different groups. First, the total variability within the groups is calculated by

SSW = \sum_i ( y_i - \bar{y}_{g(i)} )^2     (39)

where g(i) is the group to which the ith observation belongs. With G groups we lose G degrees of freedom in estimating the group means, so we can estimate the variance of the data by SSW/(N - G). Second, the variability between the groups is calculated by

SSB = \sum_g n_g ( \bar{y}_g - \bar{y} )^2     (40)

where n_g is the number of observations in group g. Here we lose one degree of freedom for using the sample mean. Thus we can also use SSB to estimate the true variance of the data, by dividing by G - 1.
If the different groups represent samples from the same underlying distribution, then SSW and SSB are simply dependent on the underlying true variance. Adjusting for degrees of freedom, each can be used as an estimate of the variance, and the ratio given in (41) follows an F distribution with (G - 1, N - G) degrees of freedom

F = [ SSB/(G - 1) ] / [ SSW/(N - G) ]     (41)

Under the null hypothesis that all groups are drawn from the same distribution, the ratio will be below the 10% critical value nine times out of ten and below the 1% critical value 99 times out of 100, etc. However, if a pattern exists then the between-group variation will be increased. This will cause a higher F-ratio and, if the pattern is sufficiently strong, lead to the rejection of the null hypothesis.
2) Testing a Single Variable: Thus we can perform an ANOVA test to establish whether the variation in the conditional expectation of the dependent variable, given different values of the independent variable, is statistically significant.
The first step is to choose the number of groups G; typically this would be in the range three to ten. Following this, each observation is allocated to the appropriate group by dividing the continuous range of the original variable into nonoverlapping regions. For a normally distributed variable, using boundary values which correspond to equally spaced fractions (1/G, 2/G, ...) of the cumulative normal distribution will cause the number in each group to be approximately the same.
For example, let x be normally distributed with mean ten and standard deviation five, and let G = 4; the standard normal values corresponding to the 1/4, 2/4, and 3/4 quantiles are -0.675, zero, and 0.675. These correspond to x values of 6.625, ten, and 13.375. Group 1 consists of all those observations for which x < 6.625; group 2 is those for which 6.625 <= x < 10; group 3 those where 10 <= x < 13.375; and group 4 those where x >= 13.375.
The mean value of the dependent variable within each group can then be computed. Under the null hypothesis that the independent variable contains no useful information about the dependent variable, the F-ratio in (41) follows an F distribution with (G - 1, N - G) degrees of freedom.
3) Testing Sets of Two or More Independent Variables: We can also test sets of variables simultaneously. For instance, with two variables the variation between the groups can be broken down into one component due to the first variable, one component due to the second variable, and a third component due to the interaction between the two, i.e.,

SSB = SSB_1 + SSB_2 + SSB_{1 \times 2}     (42)

with a corresponding decomposition of the degrees of freedom, (G_1 - 1) + (G_2 - 1) + (G_1 - 1)(G_2 - 1). Thus we can use this approach to test directly both for positive interactions, where the two variables together contain more information than separately, and for negative interactions, where some of the information is redundant.
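A compact sketch of the single-variable version of this test is given below; empirical quantile boundaries are used in place of the normal-distribution boundaries described above (an assumption that preserves the roughly equal group sizes), and x, y are placeholder arrays.

    # ANOVA-based test of conditional expectations: cut x into G groups and
    # compare between-group to within-group variation of y, per (39)-(41).
    import numpy as np
    from scipy.stats import f as f_dist

    def anova_f_test(x, y, n_groups=4):
        x, y = np.asarray(x, float), np.asarray(y, float)
        edges = np.quantile(x, np.linspace(0, 1, n_groups + 1)[1:-1])
        groups = np.digitize(x, edges)                 # group label 0..G-1
        grand_mean = y.mean()
        ssw = sum(((y[groups == g] - y[groups == g].mean()) ** 2).sum()
                  for g in range(n_groups))
        ssb = sum((groups == g).sum() * (y[groups == g].mean() - grand_mean) ** 2
                  for g in range(n_groups))
        dfb, dfw = n_groups - 1, len(y) - n_groups
        f_ratio = (ssb / dfb) / (ssw / dfw)
        return f_ratio, 1.0 - f_dist.cdf(f_ratio, dfb, dfw)   # F-ratio, p-value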
Fig. 13. Cumulative profit/loss for parsimonious models.

TABLE XVII
PERFORMANCE OF PARSIMONIOUS MODELS

F. Ex-Ante Evaluation of the Models
From the original set of 12 variables in Table XIII we chose to retain those variables which are consistently the most significant (i.e., those with the largest F-statistic, but also those that retain their significance when we divide the training set into two consecutive parts). According to these criteria the selected variables, at the 99% level of confidence, are the following:
1) the cointegration residual (F-ratio 5.94);
2) the change in oil price (F-ratio 3.87);
3) the change in the sterling index (F-ratio 4.62);
4) the volatility in interest rates (F-ratio 2.76 used alone, but 8.47 allowing for a two-way combination with the cointegration residual).
We then repeated the modeling process using only these four selected variables. Fig. 13 shows the cumulative returns of the resulting linear and NN models for the out-of-sample period. As can be seen in Fig. 13, there is a marked improvement in the net returns for both models (see also Table XVII). This is reflected in the improved correlations and correctness but also, and most importantly, in the Sharpe ratios of the two models. This suggests that the original models were overspecified and suffered from the problem of overfitting, which is particularly dangerous in high-dimensional, highly noisy data. Intuitively, the reduced models make economic sense because they suggest that uncertainties about short-term interest rates tend to influence the cointegration effect (most likely by distracting the attention of participants from longer term issues such as cointegration). The presence of recent changes in the oil price in the model is perhaps linked to the fact that the U.K. is a net oil producer, whereas the indexes comprising the cointegrating portfolio primarily represent countries which are net consumers of oil. Similarly, the changes in the Sterling index reflect changes in the average strength of the U.K. currency against foreign currencies, and it is not surprising that under certain conditions this might influence the relative performance of the FTSE against international equity markets.
Comparing the performance of the network against the linear model we find that the appropriate z-statistics are, for correlation, 1.02 (p-value 0.15) and, for economic performance, 1.39 (p-value 0.082). While these improvements are not proof of nonlinearities in the statistical sense (i.e., the p-values are too high to reject the null hypothesis of equal performance), they represent confidence levels of over 80 and 90%, respectively, that the nonlinear model has captured interaction effects which are ignored by the linear model.

G. Combining Standard and Robust Estimators
An important issue when using NN's to model noisy data such as asset returns is that of robustness to influential observations. Influential observations are those (groups of) training vectors which, although a relatively small proportion of the sample, have the potential to dominate the characteristics of the fitted function. A main reason for this is the use of the mean-squared-error cost function in training NN's. The effect of using the MSE cost function is to cause the network to learn the conditional expectation of the dependent variable. It may be possible to improve the robustness properties of our trading system by using an NN trained with a mean absolute deviation (MAD) cost function. As opposed to the mean, this causes the NN to learn the conditional median of a function. In contrast to the mean, any given point can only exert a limited influence on an estimate of the median. Thus, moving any single observation to infinity will not drag the median with it, and even changing the sign of an extreme observation would only be expected to affect the median very slightly. More detailed intuitive and theoretical treatments of this and related issues are included in [74] and [12].
In practice, it is unlikely that we would want to largely ignore magnitude information and rely solely on a prediction
Fig. 14. Equity curve for combined strategy.
In practice, it is unlikely that we would want to largely ignore magnitude information and rely solely on a prediction of the median. In general we might choose to use the two estimates in conjunction, assuming that under normal circumstances our estimate of the mean provides useful information concerning both direction and magnitude but that we can use our estimate of the median to indicate abnormal circumstances. For instance, if the estimates of the mean and median are substantially different from each other, this suggests that a small number of influential observations are dominating the model at the particular query point. This suggests that we should attach less reliance to the network predictions and consequently either downweight our position or stay out of the market completely. This is the approach which we shall adopt here.
In the first step we train a network using the MAD cost function and the four previously selected variables. Notably, the out-of-sample performance of this model is roughly comparable to that of the NN model trained using the MSE cost function. This is not surprising because, except in the presence of extreme observations, the mean and median are likely to be similar and hence we would expect the two models to be broadly the same.
The next step is to combine the two models using a simple trading strategy, namely that a position in the market is only taken when the two models are in agreement. If the two models produce predictions which are opposite in sign then the system takes a neutral position (i.e., stays out of the market). The equity curve for the combined system is shown in Fig. 14. The performance statistics for both the median network and the combined system are shown in Table XVIII.
In practice, the model combination strategy causes us to take 379 positions (i.e., we are active in the market almost exactly three quarters of the time). Of these trades 66% are correct, which is an improvement over either of the individual networks. Although the overall profit made is very similar to that of the individual networks, it is achieved with fewer trades and a much improved risk/return profile (as shown in the Sharpe ratio). If we adjust the trading strategy for the combined system in such a way as to make the average market exposure match that of the previous models (i.e., allowing for being out of the market roughly one-quarter of the time) the Sharpe ratio improves from 2.07 to 2.42, representing almost a 50% improvement on the best individual models and over a 150% improvement on the benchmark cointegration model given by (34). The t-statistic for the trading performance of the final system, versus that of the basic cointegration model, is 2.85, which is significant at a confidence level of over 99.5%. On this evidence at least, it seems that combining the predictions of mean and median does provide useful information about the reliability of the predictions, which can be translated into improved trading performance.
We have described the concept of conditional cointegration and showed how it can be used to deal with nonstationary data and to enhance statistical inference. We have shown that nonlinear methods can explain short-run deviations from the equilibrium point better than their linear counterparts. The key factor contributing to this success is attributable to careful problem formulation. This involves the "intelligent use" of financial economics theory on market dynamics so that the problem can be formulated in a way which makes the task of predictive modeling more likely to succeed. In other words, it is much easier to model the dynamics of a (stationary) mispricing than it is to model the dynamics of price fluctuations. In the next section we take this approach one step further and show how such mispricings can be detected in the term structure of interest rates and how they can be exploited, particularly when we have no a priori knowledge of the exogenous variables involved. This approach is complementary to the variable selection approaches described in the previous two sections and it is particularly useful in cases where there is a high degree of colinearity between the independent variables.
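The combination rule used in this section can be sketched in a few lines: take a position only when the mean (MSE-trained) and median (MAD-trained) forecasts agree in sign, otherwise stay neutral. This is an illustrative sketch with placeholder data; the annualization constant and function names are our own assumptions.

```python
import numpy as np

def combined_positions(pred_mean, pred_median):
    """+1 (long) or -1 (short) only when both forecasts agree in sign, else 0 (neutral)."""
    agree = np.sign(pred_mean) == np.sign(pred_median)
    return np.where(agree, np.sign(pred_mean), 0.0)

def equity_curve(positions, returns):
    """Cumulative P&L of acting on each period's position over that period's return."""
    return np.cumsum(positions * returns)

def sharpe_ratio(pnl_increments, periods_per_year=52):
    mu, sd = pnl_increments.mean(), pnl_increments.std(ddof=1)
    return np.sqrt(periods_per_year) * mu / sd

# Example with random placeholders for the two model outputs and realized returns.
rng = np.random.default_rng(1)
pred_mean, pred_median, realized = rng.normal(size=(3, 200))
pos = combined_positions(pred_mean, pred_median)
print(equity_curve(pos, realized)[-1], sharpe_ratio(pos * realized))
```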
TABLE XVIII. PERFORMANCE OF MEDIAN AND COMBINED MODELS
IV. YIELD CURVE ARBITRAGE IN EURODOLLAR FUTURES
A. Overview
To maximize the return on a portfolio of bonds with different maturities, it is critical to understand how the price differences between bonds of different maturities (i.e., the so-called yield curve) change over time and what factors drive those changes. Recent research using principal components analysis demonstrates that 90-95% of the yield curve movements are explained by two uncorrelated factors. The first factor accounts for approximately 90% of the variability, the second factor approximately 5% [49]. Furthermore, these two factors have been identified as, approximately, (unpredictable) parallel shifts and changes in the slope of the yield curve which closely follow a random walk. Parallel shifts are largely a function of the long rate, while changes in slope are a function of the spread between the long and short rates. By showing that the correlation between the long rate and spread is very close to zero, [67] indicated that they are nearly equivalent to the two "unknown" factors postulated by the principal components analysis.
These findings are now applied by major financial institutions in every day trading. It is therefore reasonable to assume that any arbitrage opportunities arising from discrepancies in how accurately securities prices reflect yield curve changes resulting from these two factors are quickly and efficiently "traded away." Thus, the yield curve is essentially arbitrage-free with respect to parallel shifts and changes in slope. However, the residual yield curve movements arising from "unexplained" factors may not be equally well understood by market participants, and there may still exist arbitrage opportunities here. It has been suggested that a third factor is identifiable, and that this "unexplained" factor, responsible for some 2-5% of yield curve movements, is in large part attributable to volatility. Furthermore, several authors suggest that volatility may be mean reverting. If, therefore, one could construct a portfolio which immunizes against the first two factors, the behavior over time of the third (unknown) factor could be isolated and analyzed. Moreover, if this factor is indeed mean-reverting one would expect the value of this "immune" portfolio to revert through time. If this reversion is consistent or predictable it would in theory be possible to earn excess returns through arbitrage.3
3 Moreover, if this factor is indeed a proxy for volatility, as it has been argued, we should be able to extract information about volatility from the yield curve which can be used in a variety of ways (e.g., pricing options) beyond the scope of this paper.
It is therefore desirable to identify the variables which drive changes in this unknown factor and to construct predictive models of its behavior. Should these changes be attributable to a vector of exogenous variables, this would suggest the use of a methodology similar to that employed in the previous two sections of this paper. But there is a slight complication in the modeling procedure. In our case it is not clear which, if any, exogenous factors might influence an asset as abstract as the "third factor of the Eurodollar yield curve." We are therefore confined to using a purely time series approach. However, past experience in a variety of financial time-series applications has shown that simply using lagged values of the price as informative variables may lead to models with unstable performance. Another option is to explore the predictive ability of "technical indicators" such as "moving
averages" and "oscillators," but since most technical indicators are themselves parameterized, this creates the possibility of "overfitting" the technical indicators themselves by choosing parameter values which are predictive during the training period but which fail to generalize to subsequent data. Instead, we propose an approach whereby we generate a limited set of "technical indicators" (which are unavoidably highly correlated) and then apply a variable selection methodology which explicitly removes the correlations within this dataset. The approach is in many respects complementary to those described in the previous sections and is particularly useful in cases where there is a high degree of colinearity between candidate variables and/or where limited sample sizes prohibit the use of analysis of variance.
In this section we describe a study of statistical yield curve arbitrage among Eurodollar futures. A factor analysis confirms that the bulk of changes in Eurodollar futures are accounted for by unpredictable parallel shifts and tilts in the yield curve which closely follow a random walk. However, the third factor corresponds roughly to a flex in the yield curve and shows evidence of a degree of predictability in the form of mean-reversion. We construct portfolios of Eurodollar futures which are immunized against the first two factors but exposed to the third and use both linear and nonlinear techniques to model the expected return on these portfolios. The third factor is found to be predictable by nonlinear, but not by linear, techniques.
Using daily Eurodollar futures closing prices between 1987 and 1994, we construct a portfolio which is immunized against parallel shifts and rotations of the yield curve. There are weak indications that the value of this portfolio is mean-reverting, which if correct gives rise to arbitrage opportunities. We use an NN to model the changes in the portfolio value on a weekly basis. A significant risk-free return can be obtained by actively resetting the portfolio at the predicted turning points.
B. Experimental Setup—Term Structure of Interest Rates and the Yield Curve
Today's value of a portfolio of risk-free interest rate securities is determined by the market interest rates for each relevant point in the future. For example, the value of a portfolio comprising a five-, ten-, and 15-year zero-coupon bond is a function of today's observed five-, ten-, and 15-year spot rates of interest. The "spot" rate is defined as the return demanded by the market for an investment of a given maturity.
The spot rates depend on current (short-term) spot rates as well as anticipated future spot rates, which in turn are a function of market risk premia for different factors and interest rate volatility. If a portfolio contains multiple securities, its value will be a function of the spot rates for all maturities included in the portfolio, or the term structure of interest rates. A common way of referring to the term structure is the yield curve, which plots spot rates as a function of time to maturity. Thus, when a portfolio contains most or all maturities traded in the market, its value will be a function of the yield curve.
Fig. 15(a) shows some examples of different maturities of contracts for the Eurodollar futures. Overall our data consists of daily high, low, open, and close prices for Eurodollar futures over the time period August 1987 to July 1994, giving 1760 daily observations in all. The problem of discontinuities is avoided by effectively ignoring any data which crosses a changeover point (by setting the "return" to zero for any such period); this is a harsher approach than simply matching the levels of the spliced series and was intended to reflect the possibility that not only the price itself but also the dynamics of the price changes might be discontinuous across the different contracts. The size of each contract is $1 million and at expiry the futures are settled for a cash amount based on the London interbank offer rate (LIBOR) for dollar time deposits of a duration of three months. The prices are quoted in the form of 100 minus the annualized yield, e.g., a price of 91.35 is equivalent to an annualized yield of 8.65%. Each basis point move in the quoted value is equivalent to $25. The margin requirements are $500, or $250 for a spread (long in one future, short in an equivalent future of a different maturity). The small margin requirements allow for very high gearing ($1 million/$500 = 2000) and hence the key target of any trading strategy is not profitability per se but rather consistency and smoothness of the equity curve.
The three different curves shown in Fig. 15(a) depict daily closing prices for the shortest maturity (90 days) contract (close 1), a medium maturity contract (close 2; 450 days) and the longest maturity contract (close 3; 720 days). Note the high correlation in daily movements. This reflects the similarity of the underlying assets: a long or short position in any of these futures is equivalent to a deposit or loan for a given duration and it would be surprising for the prices to diverge greatly given that the loan durations differ only by three-month increments. It is this property which induces us to search for "statistical arbitrage" opportunities because, although in the short term price anomalies could occur, we would generally expect them to be rectified in due course. This rectification is not guaranteed and is thus a statistical (or risky) proposition rather than a traditional (or riskless) arbitrage condition. Another way of viewing this is that the assets are cointegrated [38] in some way.
Fig. 15(b) shows four examples of the yield curve; the yield curve is constructed at each point in time by joining the prices of all contracts with term to maturity ranging from less than three months at the short end to three years at the long end. Considering cross-sectional "snapshots" of the yield curve on four different dates we obtain some clues as to the likely structure of the relationships between different contracts. The yield curve can be flat, upward-sloping, downward-sloping, and may also have a "hump" at some maturities. Samples of this are clearly shown in Fig. 15(b). The yield curve on 12 June 1988 (marked *) is generally flat. On 13 July 1987 the yield curve is upward sloping, etc. The most striking fact about the curves is simply that they are smooth curves rather than jagged lines. Contracts tend to have a yield which is similar to that of neighboring contracts. Comparing the yield curves on the different dates we note that they differ primarily in level but also somewhat in slope; some of the curves are very smooth while others exhibit some kinks and curves. Most of the curves are upward sloping, reflecting the fact that longer term loans are more risky and hence under normal circumstances require a higher premium.
Fig. 15. Sample futures contracts and yield curves. (a) The closing prices of three contracts of different maturities. (b) Different types of yield curve shapes.
Fig. 16. Scree plot for principal components of changes in the yield curve.
In order to better understand the structure of the relationship between different maturity contracts, let us conduct a factor analysis on changes in the yield curve using the method of principal components. The scree plot in Fig. 16 shows the percentage of the total variability which can be attributed to each factor.
The first factor accounts for almost 90% of the total variability. The second factor accounts for just under 8% of the variance. Together, these two factors account for almost 98% of the total variance and thus conceptually our results are largely consistent with theoretical two-factor bond-pricing models such as those in [67] and [41]. However, our analysis also suggests the presence of a third factor which accounts for almost exactly 2% of the total variance. Analysis of the factor loadings (eigenvectors), see Fig. 17, shows that the first factor represents a parallel shift in the curve. The second factor corresponds to a tilt in the yield curve. The loadings for the third factor suggest that it relates to a flex or bend in the yield curve.
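The factor analysis described above can be reproduced with a standard principal components decomposition of the daily yield-curve changes. The sketch below uses an eigendecomposition of the covariance matrix of the 12 contract price changes; the data array is a placeholder standing in for the actual futures series, and all names are ours.

```python
import numpy as np

def yield_curve_pca(price_changes):
    """PCA of daily changes across the 12 Eurodollar contracts.

    price_changes: array of shape (n_days, 12), one column per contract.
    Returns the fraction of variance explained by each factor and the
    loadings (eigenvectors).
    """
    cov = np.cov(price_changes, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return explained, eigvecs

# Placeholder data standing in for the 1760 daily observations used in the paper.
rng = np.random.default_rng(2)
shifts = rng.normal(size=(1760, 1))               # common "parallel shift" component
changes = shifts + 0.1 * rng.normal(size=(1760, 12))
explained, loadings = yield_curve_pca(changes)
print(explained[:3])          # shares of variance for the shift, tilt, and flex factors
print(loadings[:, 2])         # loadings of the third ("flex") factor
```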
Fig. 17. Factor loadings (eigenvectors) for first three principal components.
The results of this factor analysis of Eurodollar futures bear some resemblance to those of Litterman and Scheinkman [49], in which three similar factors were found to be primarily responsible for the price changes of different maturities of U.S. government bonds.
Our expectation is that, due to widespread awareness of the theoretical two-factor models, the markets are likely to be efficient with respect to the first and second of these factors, but that some predictable inefficiencies (such as mean-reversion) might exist with respect to the third factor. This can be tested with a number of tests for "omitted" nonlinearity. Barnett et al. [3] compare a number of such tests, one of which uses the approximating ability of NN's. The modeling strategy we describe in the next section, attempting to capture the dynamics of this third factor, can itself form the basis of such a test.
Fig. 18. Immunization to parallel shifts (a) and tilts (b) of the yield curve with a simple butterfly portfolio. If the yield curve shifts up by an amount x, the portfolio appreciates by 2/3 x on the outside positions but simultaneously depreciates by 2/3 x on the center contract. A similar effect is achieved for the tilt.
C. Modeling Strategy
The purpose of our strategy is to identify opportunities for statistical arbitrage in the sense of predictable price corrections which occur to eliminate short-term anomalies or deviations from long-term relationships. Following the results of our factor analysis we wish to investigate whether portfolios of bonds constructed specifically to be exposed to a given factor would exhibit a degree of predictability, perhaps in the form of mean-reversion effects. In particular, we expect the markets to be efficient with respect to the first two factors and hence we concentrate our efforts upon modeling the third factor.
The first step in our modeling procedure is to isolate the effects of the first two factors. In other words, using the 12 available contracts c_1, ..., c_12 we wish to construct a portfolio whose value we denote by P and weight with weights w_i in such a way that its sensitivity to parallel shifts and tilts of the yield curve is set to zero, i.e.,
P_t = sum_{i=1..12} w_i c_i(t)
and find weights such that
dP/dPCA_1 = 0 and dP/dPCA_2 = 0.   (41)
Since the two principal components PCA_1 and PCA_2 are linear combinations of the 12 contracts this is easy to do. In practice, however, we do not wish to turn over a portfolio of 12 contracts. Due to the high degree of colinearity between adjacent contracts, the same effect can be achieved by a much smaller portfolio with three contracts: at the short, middle, and long end of the yield curve (see, for example, [26]). To avoid liquidity effects at the long end of the yield curve and volatility effects at the short end we decided to only consider c_2, c_5, and c_8. Thus, if we construct a portfolio
P_t = w_2 c_2(t) + w_5 c_5(t) + w_8 c_8(t)   (42)
with w_2 = 1/3, w_5 = -2/3, and w_8 = 1/3 we achieve the desired effect. This is illustrated in Fig. 18. By holding two outside positions in equal sizes and twice the opposite position on the center contract we simultaneously immunize against both parallel shifts and tilts of the yield curve.
Any fluctuations in the value of this portfolio depend solely on changes in the third factor. Fig. 19 depicts the value of the portfolio over time.
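The immunization step in (41) and (42) can be sketched as a small linear-algebra exercise: pick three contracts and find the weight direction whose exposure to the first two principal components is zero. The null-space formulation below is our own illustration, and the loadings are idealized placeholders; the contract choice c2, c5, c8 follows Fig. 19.

```python
import numpy as np

def butterfly_weights(loadings, legs=(1, 4, 7)):
    """Weights on three contracts with zero exposure to the first two PCs.

    loadings: (n_contracts, >=2) eigenvector matrix from the yield-curve PCA;
    legs: indices of the chosen contracts (c2, c5, c8 in the paper's notation).
    """
    A = loadings[list(legs), :2].T          # 2 x 3 exposures to PC1 (shift) and PC2 (tilt)
    _, _, vt = np.linalg.svd(A)
    w = vt[-1]                              # null-space direction: zero shift and tilt exposure
    return w / np.abs(w).sum()              # scale gross exposure to one

# Placeholder loadings: PC1 ~ parallel shift, PC2 ~ tilt across the 12 maturities.
maturities = np.arange(12)
pc1 = np.ones(12) / np.sqrt(12)
pc2 = (maturities - maturities.mean())
pc2 = pc2 / np.linalg.norm(pc2)
loadings = np.column_stack([pc1, pc2])

w = butterfly_weights(loadings)
# The recovered weights have the same 1 : -2 : 1 shape as the 1/3, -2/3, 1/3 butterfly in (42).
print(dict(zip(["c2", "c5", "c8"], np.round(w, 3))))
```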
Fig. 19. Value (over time) of butterfly portfolio (1/3 long on contract c2; 2/3 short on c5; 1/3 long on c8).
Fig. 20. Value of butterfly portfolio, adjusted for discontinuities.
The value of the butterfly portfolio appears to be stationary, supporting our original hypothesis that the third factor might exhibit mean reversion. If this was truly the case then it would be possible to generate excess profits using a simple long-short strategy (buy the portfolio if the current value is below average, sell the portfolio when the current value is above average). However, it turns out that the mean-reversion is simply an artifact of the periodic expiry, and consequent shifting along, of the contracts. Adjusting for discontinuities we obtain the price series shown in Fig. 20 (the series has been inverted so that it is appreciating rather than depreciating in value).
Clearly, the adjusted series is not mean-reverting, but it does appear to exhibit some structured behavior, notably in the persistence of both upward and downward trends.
The second step in our modeling strategy is to attempt to model the dynamics of this price series and to compare neural and linear regression models; however, first it is necessary to identify suitable input variables. Because it is not clear which, if any, exogenous factors might influence an asset as abstract as the "third factor of the Eurodollar yield curve" we are confined to taking a purely time series modeling approach. However, past experience in a variety of financial time-series applications has shown that simply using lagged values of the price as informative variables may lead to models with unstable performance. A better option is to exploit the predictive ability of "technical indicators" such as "moving averages" and "oscillators." However, most technical indicators are parametric; this instantly creates the possibility of "overfitting" the technical indicators themselves by choosing parameter values which are predictive during the training period but which fail to generalize to subsequent data. Instead, we shall adopt the approach of generating a limited set of candidate variables and then applying a variable selection methodology. This provides a set of 16 candidate variables, all of which are derived from the portfolio price. The candidate input "variables" are shown in Table XVIII.
The next step in our modeling procedure is to construct a reliable estimator of weekly changes in portfolio value using the variables in Table XVIII. If we denote the portfolio value at time t as P_t then our target variable is simply P_{t+5} - P_t.
It would appear that there are at least two possible approaches to the variable selection problem, as described in the previous sections. However, due to the high degree of correlation and colinearity within the data set in Table XVIII, those techniques are not ideal in this case. The approach we shall follow is one which explicitly removes the correlations within the data set, namely principal components analysis. The scree plot for the principal components analysis of the candidate variables is shown in Fig. 21.
The PCA indicates that much of the information in the 16 variables is actually redundant and that 98% of the total variability within the candidate inputs can be represented by the first six principal components. By simply transforming the data and using the first six principal components as inputs to the models we reduce both colinearity and model complexity while losing very little information. In fact, by observing the values of the principal components over time (see Fig. 22), we note that the first PC is clearly nonstationary and has captured the overall drift of the portfolio.
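A sketch of this input-construction step is given below: project the standardized candidate indicators onto their principal components, then discard components that behave like drifting (nonstationary) series. The stationarity check used here (a lag-1 autocorrelation threshold) is only an illustrative proxy, not the diagnostic used by the authors, and the candidate data are placeholders for the 16 variables of Table XVIII.

```python
import numpy as np

def pc_inputs(candidates, n_keep=6):
    """Project standardized candidate indicators onto their leading PCs."""
    Z = (candidates - candidates.mean(axis=0)) / candidates.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return Z @ eigvecs[:, order[:n_keep]]          # scores of the first n_keep PCs

def looks_nonstationary(series, threshold=0.98):
    """Crude proxy: very high lag-1 autocorrelation suggests a drifting component."""
    x = series - series.mean()
    return (x[:-1] @ x[1:]) / (x @ x) > threshold

# Placeholder for the 16 technical/time-series candidates in Table XVIII.
rng = np.random.default_rng(3)
candidates = rng.normal(size=(352, 16))
candidates[:, 0] += np.cumsum(rng.normal(size=352))   # inject a drifting component

pcs = pc_inputs(candidates, n_keep=6)
keep = [j for j in range(pcs.shape[1]) if not looks_nonstationary(pcs[:, j])]
inputs = pcs[:, keep]                                  # stationary PCs used as model inputs
print(keep, inputs.shape)
```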
Fig. 21. Scree plot for principal components of all 16 candidate variables.
TABLE XVIII. CANDIDATE "VARIABLES" FOR ESTIMATING P_{t+5} - P_t
Interestingly, the other five PC's are all stationary and appear to be derived from higher frequency components of the portfolio dynamics. On the basis of this we chose only to include the stationary PC's (numbers 2-6) in our predictive models of the third factor in Eurodollar futures. In the next section we use regression and neural analysis to model the weekly changes in this factor.
D. Modeling the Third Factor in the Yield Curve
In this section we construct both linear and neural regression models using 252 weeks of data for training and the final 100 weeks as a testing period. The results for the regression model insample are shown in Table XIX.
TABLE XIX. LINEAR REGRESSION RESULTS (INSAMPLE)
Fig. 22. The charts show the time series which represent the first six principal components of the 16 original variables (a mixture of technical and time series variables).
These results indicate that the linear model is weak if not nonexistent; Factor 4, with a t-value of 1.71, is (just) significant at the 90% level while the other variables are not at all significant.
Out-of-sample the linear model behaves as might be expected from the statistics above: it predicts the correct direction in 53 out of 100 weeks and the magnitude of the correctly predicted moves is four basis points more than the magnitude of incorrectly predicted moves. The total volatility of the portfolio is 614 basis points so the predictive ability of the linear system is clearly insignificant.
To test the hypothesis that changes in this third "unknown" factor might be partially deterministic, but in a nonlinear sense, we use a standard multilayer perceptron. The model uses tanh activation functions and shortcut connections from input to output layer; the principle being that the shortcut connections learn the linear component of the input-output relationship while the indirect connections (through the hidden layer) learn the nonlinear component of the input-output relationship. The number of hidden units is specified a priori as two; this gives a network with a total of 18 parameters (roughly three times the representational capacity of the linear model) but still satisfying the heuristic condition that we have more than ten times as many training examples as network parameters (e.g., [33]).
Due to the relatively small size of the NN its ability to overfit is severely limited and hence we simply train the network to the point where the error is no longer decreasing. In fact, this occurs quite quickly, in around 200 epochs of batch learning. Then we employ the technique described in [13] to perform a statistical analysis of the estimated neural model. The insample results are shown in Table XX.
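The architecture described above (two tanh hidden units plus direct input-to-output shortcut connections) can be sketched as follows. The training loop, initialization, and learning rate are illustrative assumptions of ours; the paper's exact training procedure and the statistical analysis of [13] are not reproduced here.

```python
import numpy as np

class ShortcutMLP:
    """Output = linear shortcut part (x . w_lin + b) plus a small tanh hidden layer,
    so the direct connections capture the linear component and the hidden units
    capture the nonlinear component of the input-output relationship."""

    def __init__(self, n_in, n_hidden=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W_hid = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b_hid = np.zeros(n_hidden)
        self.w_out = rng.normal(scale=0.1, size=n_hidden)
        self.w_lin = np.zeros(n_in)          # shortcut (input-to-output) weights
        self.b = 0.0

    def predict(self, X):
        h = np.tanh(X @ self.W_hid + self.b_hid)
        return X @ self.w_lin + h @ self.w_out + self.b

    def fit(self, X, y, epochs=200, lr=0.05):
        for _ in range(epochs):              # batch gradient descent on MSE
            h = np.tanh(X @ self.W_hid + self.b_hid)
            err = (X @ self.w_lin + h @ self.w_out + self.b) - y
            g = 2 * err / len(y)
            self.w_lin -= lr * X.T @ g
            self.b -= lr * g.sum()
            self.w_out -= lr * h.T @ g
            dh = np.outer(g, self.w_out) * (1 - h**2)
            self.W_hid -= lr * X.T @ dh
            self.b_hid -= lr * dh.sum(axis=0)
        return self

# Five stationary principal components as inputs, weekly change in portfolio value as target.
rng = np.random.default_rng(4)
X, y = rng.normal(size=(252, 5)), rng.normal(size=252)
model = ShortcutMLP(n_in=5).fit(X, y)
print(model.predict(X[:3]))
```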
TABLE XX. NEURAL-NETWORK RESULTS (INSAMPLE)
The model has a predictive correlation of almost 0.4 with an adjusted R-squared of over 8%; the overall F-value for the model (with the full set of 18 potential parameters) is significant at the 99% level of confidence (NB: this is without making any adjustments for the effective number of parameters which, if anything, would increase the significance of the result). The Durbin-Watson statistic is not significantly different from two, indicating that there is no problem with autocorrelated residuals. Within the model, factors 2, 3, and 6 are significant at over 95%; factor 4 presumably still has a linear effect but does not make any use of the additional parameters and hence is insignificant overall; only factor 5 appears to lack any predictive ability. Clearly the dynamics of the third factor in the yield curve appear to be predictable by nonlinear, but not by linear, techniques.
While the above results are significant in a statistical sense, it is the out-of-sample test which is critical because it indicates whether the results are stable or whether they are invalidated by nonstationarities in the underlying process. The network's performance out-of-sample is very promising: it predicts the correct direction in 60 out of 100 weeks and the magnitude of the correctly predicted moves is 192 basis points more than the magnitude of incorrectly predicted moves. The total volatility of the portfolio is 614 basis points so the predictive ability of the NN is sufficiently large to be significant.
E. Economic Ex-Ante Evaluation of the Results
In order to evaluate the performance of the NN from a financial rather than a statistical perspective it is necessary to analyze the properties of the equity curve which the system generates and to compare this to the buy and hold performance. The performance should also be evaluated with respect to the other factors involved, such as transactions costs and gearing effects.
Fig. 23 shows the equity curves for the network and the buy and hold strategy. The buy and hold strategy is to simply maintain a long position in the butterfly portfolio (short contract 2, long 2*contract 5, short contract 8). In contrast, the network strategy alternates between long and short positions in the butterfly portfolio according to the direction of the price change predicted by the network.
The network clearly outperforms the buy and hold strategy. During the out-of-sample period (roughly July 1992 to July 1994) the network continues to be profitable, making almost 200 basis points, while the performance of the buy and hold is almost flat. The added value of the NN is even more apparent if we look at the chart of out-performance over time [see Fig. 23(b)]. In fact the ratio of average weekly return to standard deviation of returns is 0.270 in-sample and 0.255 out-of-sample; testing against the null hypothesis that the true average return is zero we find t-values of 4.32 and 2.55, respectively. Thus both in-sample and, more importantly, out-of-sample performance is significantly better than zero at the 99% level of confidence.
The results so far are interesting from a theoretical and modeling perspective; however, the true test lies in whether they can be used in a practical setting to "beat the market": do they provide further evidence against the EMH? This requires us to calculate the returns which the system can generate in financial terms, taking into account transactions costs and an appropriate level of gearing. It is only really meaningful to measure the performance during the out-of-sample period. This consists of 100 weeks of data from July 1992 to July 1994.
First, let us review the various costs and returns involved. The transaction cost is 1 pound sterling per contract, which we approximate in dollar terms as $1.50. The value of a price change in a contract is $25 per basis point. The largest drawdown during the insample period is approximately 100 basis points or $2500. We will assume an initial account of $5000 as this would require a drawdown twice as large as the historical maximum in order to wipe out the account.
During the out-of-sample period the system makes total profits of 192 basis points = $4800; the number of transactions is 32 caused by changing predictions plus 7*2 caused by switching contracts, 46 in all; each transaction includes buying or selling four contracts (the constituents of the butterfly portfolio), giving a total cost of 46 * 4 * $1.50 = $276. Thus during the out-of-sample period the system makes profits of $4800, net of $276 costs, giving net profits of $4524 on an account of $5000 in a period of just under two years. This equates to a rate of return of approximately 47% per year. Note, however, that this figure is sensitive (both upwards and downwards) to the level of gearing which is assumed.
In summary, the factor analysis of the Eurodollar yield curve from August 1987 to July 1994 indicates that the first two factors, which can be viewed as a shift and a tilt of the curve, respectively, jointly account for just under 98% of the total price variability. This is consistent with the two-factor bond-pricing models in the theoretical literature. However, a third factor, which the factor loadings identify as a flex of the yield curve, is also indicated. This third factor represents almost exactly 2% of the total price variability.
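The cash-flow arithmetic above can be checked with a few lines; the figures ($25 per basis point, $1.50 per contract per transaction, four contracts per transaction, a $5000 account, roughly 100 weeks out-of-sample) are those quoted in the text, while the variable names are ours.

```python
# Out-of-sample economics of the butterfly strategy, as quoted in the text.
BP_VALUE = 25.0           # dollars per basis point per contract position
COST_PER_CONTRACT = 1.50  # transaction cost in dollars
CONTRACTS_PER_TRADE = 4   # the constituents of the butterfly portfolio
ACCOUNT = 5000.0          # assumed initial account size in dollars

gross_bp = 192                              # basis points earned out of sample
transactions = 32 + 7 * 2                   # prediction changes plus contract rollovers
gross = gross_bp * BP_VALUE                 # 192 * $25 = $4800
costs = transactions * CONTRACTS_PER_TRADE * COST_PER_CONTRACT   # 46 * 4 * $1.50 = $276
net = gross - costs                         # $4524
annual_return = net / ACCOUNT / (100 / 52)  # ~100 weeks of out-of-sample data
print(gross, costs, net, round(annual_return, 2))   # 4800.0 276.0 4524.0 ~0.47
```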
Fig. 23. (a) Equity curves for network and holding strategies. The RHS of the vertical line (i.e., the last 100 weeks) constitutes the out-of-sample period. (b) Relative outperformance.
Assuming that the markets are efficient with respect to the previously known first two factors, we attempted to model changes in the third factor. A butterfly portfolio was constructed which is immunized against shifts and tilts in the yield curve but exposed to the third flex factor. A time series modeling approach was adopted using technical indicators, such as simple oscillators and moving averages, as informative variables. In order to avoid over-fitting the parameters of the technical indicators while also avoiding excessive intercorrelations of the variables, a variety of indicators were generated and subjected to principal components analysis. The first six factors were found to account for 98% of the variability in the original set of 16 variables. Of these the first factor represented the underlying trend of the portfolio and, being clearly nonstationary, was excluded from the subsequent modeling procedure. The second through sixth principal components were used to build both linear and nonlinear predictive models of price changes in the butterfly portfolio.
The linear model was found to be of marginal statistical significance at best and this was reflected in an out-of-sample test with results which were no better than random. The NN model, however, was found to be significant overall with no less than three variables being significant at the 95% level of confidence. The out-of-sample test showed the NN to generate consistent profits which were statistically significant at the 99% level. Translated into cash terms, and allowing for transactions costs, the out-of-sample performance was found to result in an average annual return of 47% per year, at a fairly conservative level of gearing. This is sufficient evidence to conclude that the third factor of the Eurodollar yield curve appears to be predictable by nonlinear, but not by linear, techniques.
So far we have used a modeling approach which attempts to develop estimators for expected returns in the context of (3). We have argued that careful problem formulation is in many cases a key to success. In the next section we take this approach one step further and argue that even if security price fluctuations are largely unpredictable, it is possible that investor behavior may not be. By using a simple model of investment strategy performance and switching in and out of a simple investment style we show that it is possible to produce excess returns.
V. META-MODELS OF INVESTOR BEHAVIOR
A. Overview
The target of traditional predictive modeling approaches has been price fluctuations on "real" assets (e.g., stocks, bonds, derivatives, etc.). These assets can be purchased or sold and their prices can be observed on the market. The motivation behind the "metamodeling" we describe in this section is that if asset prices can not be forecasted, there might nevertheless be a way to predict the performance of particular investment (or fund management) styles and strategies. The basic idea behind "metamodeling" is that an investment strategy (or a trading rule) can be dealt with in much the same way as an asset because it can be seen as a synthetic asset. The return of this synthetic asset is the profit and loss made by the strategy when applied to an underlying asset or a portfolio of (underlying) assets. Subsequently, metamodeling handles a class of objects (i.e., investment strategies) which are abstracted from the "real" assets. The returns of these strategies are not directly observable in the markets, but have to be computed by the modeller and correspond to the equity that would reach a virtual investor if he were using the strategy to take a position in the underlying "real" assets.
The concept of analyzing investment strategy performance is not entirely new. It has been used in one form or another by several researchers and practitioners. Active investment style management, which is a specific case of metamodeling, has become increasingly popular among practitioners (see, for example, [23]). Of particular relevance to our approach is the work of Palmer et al. [54] and Refenes and Azema [60]. Using a large scale simulation in an evolutionary framework, Palmer et al. describe a model of a stockmarket in which independent adaptive agents can buy and sell stock on a central market. The overall market behavior is therefore an emerging property of the agents' behavior. The simulation starts with a predefined set of investment strategies in which investors make their decisions on the basis of technical and fundamental "analysis." Through an evolutionary process of mutation (i.e., random alteration of the decision rules) and crossover (i.e., creation of new investors by systematic combination of existing ones) the study attempts to emulate the way in which markets evolve as a direct result of investor behavior. The analysis also identifies the type of variables that successful investors are observing in order to generate excess return. Typical variables which enter the decision process are changes in the stock's fundamental value (measured by an appropriately discounted series of expected future dividends), a technical indicator signifying mean-reversion in prices (measured by an oscillator), and a technical indicator signifying trends in prices of the underlying (measured by a moving average crossover).
Refenes and Azema [60] describe an approach for dynamically selecting trading strategies from a predefined set of trend-following and mean-reversion based trading strategies. The key idea is to predict (on the basis of past performance) which of two alternative strategies is likely to perform better in the current context. The expected returns of the two strategies are conditioned on their relative prediction error. If the error in one of the two strategies is decreasing while the error in the other is increasing, the metamodel switches strategies. In many respects this approach is analogous to a hidden Markov model, but it does not make use of other potentially informative variables that could explain the performance of a particular strategy. The approaches of Palmer et al. [54], [60] and our own are not mutually exclusive. Subsequently it would be desirable to develop nonlinear models of investor behavior which take advantage not only of the time structure (e.g., cyclicity) and historical performance of a trading strategy but also make use of exogenous variables. It is entirely plausible that the expected returns of a particular investment style may be conditioned not only on a single measure of fundamental value as suggested by [54] or historical performance alone as suggested by Refenes and Azema [60] but also on the current economic environment (e.g., interest rates) as well as market expectations (e.g., estimated growth of earnings per share), or they may vary according to the country or sector in which the investment style is applied.
In this section we take these approaches one step further by combining results from [54] and [60] and introducing other variables suggested in the literature. We utilize a stepwise procedure of modeling investment strategy returns which builds upon these results and also the results described in earlier sections on model identification and variable selection. Within this framework, we construct a metamodel of investor behavior using a dataset composed of financial data on 90 French stocks drawn from the SBF250 index. We compare two metamodels of investment style performance: a linear regression and a neural model. Both models provide small but consistent incremental value, with the NN outperforming its linear counterpart by a factor of two. This suggests the presence of nonlinearities, probably induced by the interactions between some of the technical, economic, and financial variables, which are undetectable by standard econometric analysis.
B. Experimental Setup—Modeling Investor Behavior
One of the dominant themes in the academic literature since the 1960's has been the concept of an efficient capital market [27]. According to the EMH, security prices fully reflect all available information and therefore can not be forecasted. However, if asset prices are unpredictable, investor behavior might not be. In other words, if asset prices can not be forecasted, there might nevertheless be a way to model the performance of fund management styles and investment strategies. In that context, metamodeling is useful because it models an element of investor behavior.
From a more practical point of view, metamodeling provides a method of quantifying the conditions under which a certain investment strategy should be used. Let us consider for example a commonly used investment strategy based on screening, whereby the professional fund manager dynamically maintains a portfolio of "value stocks," i.e., composed of say the 10% of stocks having the lowest price/earning (PE) ratio. The investment strategy can be seen as a synthetic asset that has a dynamic composition but a single and constant characteristic (e.g., "low PE"). In contrast, a stock with a low PE ratio has a fixed composition but no constant and single characteristic. It encompasses a collection of aspects: for example, Air Liquide has a large capitalization, it is also a blue chip stock, has low volatility, high PE, average dividend yield. Modeling such a stock in isolation would give very little information about the behavior of high/low PE stocks, but modeling the performance of the fund manager who does have a constant
Fig. 24. Asset price and P&L of a trend following strategy. The P&L can be seen as a cyclicity indicator: Negative values correspond to sideways movements, positive values to trending periods.
characteristic might be more predictable. It is entirely plausible that the market may reward low PE stocks under certain economic conditions but not under others. Variations in the performance of such a manager may therefore be attributable to factors which are exogenous to his decision making. If it were possible to develop a predictive model of the performance of the investment style, then excess returns could be made by buying the portfolio recommended by the strategy when the prediction is positive, and selling the portfolio when its expected return is negative.
In this section, we describe a methodology for developing such predictive models. The modeling procedure consists of two steps. The objective in the first step is to define a universe of commonly used investment strategies and to quantify their expected returns. Since these returns are not directly observable in the market we need to compute them from historical data; they correspond to the equity that would reach a notional investor if he were using the strategy(ies) to take a position on the underlying assets. This is easily done by simulating the notional investor. The second step consists of building a metamodel of these expected returns which are conditioned on a number of variables which are exogenous to the investment style, and selecting those variables that are the most significant (in the statistical sense) in explaining variations of the investment styles' performance. By treating these investment styles as synthetic assets it is easy to see how they can be used in the framework of active portfolio management as described in (3). However, since the number of possible investment styles is practically unlimited and the purpose of this section is to describe the methodology, we shall focus our study on the second step of the procedure. To facilitate clarity we shall, without loss of generality, restrict our analysis to a single (and rather simple) investment style, whereby the notional fund manager invests in a set of underlying assets by utilizing a cyclicity indicator. There are several indicators that can be used to measure cyclicity, such as the variance ratio [50], drift-stationary tests inspired by the unit root test [22], and various measures based on spectral analysis. These measures suffer from two limitations. First, the amount of data required to measure cyclicity is prohibitively large. Second, these measures are sometimes only remotely related to the equity that would reach an investor with a trading strategy based upon them.
In this study we shall adopt a simple measure of cyclicity which is directly related to the equity that would reach an investor if he were using the spread between two moving averages (i.e., a moving average crossover) to trade the underlying (see [5] for a rationalization). To do so he uses a simple decision rule whereby, given a time series of prices p_t on the underlying asset, he computes two moving averages for the t-th point. If the short moving average intersects the long moving average from below he purchases the underlying and, conversely, if the short moving average intersects from above he (short) sells the underlying.
Fig. 24 shows the prices on the underlying asset over 300 days (single line), as well as the weekly returns (i.e., profit and loss) of the notional investor (shaded area). This simple investment style performs quite well when the underlying asset is in a state of trend but rather poorly when the underlying is moving sideways. Notably, there are persistent periods of trending followed by persistent periods of mean reversion. This is reflected in the cyclicity of the manager's expected return (shaded area).
In an earlier paper, [60] described a technique for alternating between investment strategies as the market changes from a trending into an oscillatory behavior. The key idea is analogous to a hidden Markov process, whereby the underlying asset is modeled as the composite of two distinct states. The decision to switch over strategies is made on the basis of monitoring the error (i.e., profit/loss) that each state model is producing and choosing the one in which the error is decreasing. There are two main disadvantages of the approach: first, the measures of profitability may not be sufficiently responsive. For example, the underlying might have entered a trending period for some time already, while the measure might still indicate some mean reversion due to the delays in constructing the averages. Essentially the approach is one of error tracking rather than predictive modeling.
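A sketch of the trend-following synthetic asset: compute short and long moving averages of the (relative) price, go long when the short average is above the long one and short otherwise, and mark the position to market. The three- and 12-month lengths follow the setup described later in this section; the data and function names are placeholders of ours, not the paper's code.

```python
import numpy as np

def moving_average(prices, length):
    """Simple trailing moving average; undefined for the first `length`-1 points."""
    kernel = np.ones(length) / length
    ma = np.convolve(prices, kernel, mode="full")[: len(prices)]
    ma[: length - 1] = np.nan
    return ma

def signal_shifted(signal):
    """Act on last period's signal (avoid look-ahead)."""
    out = np.roll(signal, 1)
    out[0] = 0.0
    return out

def crossover_pnl(prices, short=3, long=12):
    """Marked-to-market monthly P&L of a moving-average crossover strategy."""
    ma_s, ma_l = moving_average(prices, short), moving_average(prices, long)
    signal = np.where(ma_s > ma_l, 1.0, -1.0)     # +1 long, -1 short
    signal[np.isnan(ma_l)] = 0.0                  # no position until both MAs exist
    returns = np.diff(prices, prepend=prices[0])
    return signal_shifted(signal) * returns

rng = np.random.default_rng(5)
relative_price = np.cumsum(rng.normal(size=120))  # ten years of monthly relative prices
pnl = crossover_pnl(relative_price)
print(pnl.sum(), pnl[-12:].sum())                 # total and trailing 12-month P&L
```

The cumulative (shaded-area) series produced by such a simulation is exactly the cyclicity indicator that the metamodel attempts to explain.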
The second reason is more important and relates to the inability of the approach to make use of exogenous variables that might be useful in explaining variations in the performance of the investment style. For example, it is entirely plausible that trending periods are induced by extreme moves in interest rates, while sideways moves may be induced by frequent revisions of earnings expectations for the underlying by financial analysts. It is therefore desirable to develop predictive models of investment style performance which make use of such exogenous variables.
We examine monthly variations in the performance of the investment style applied to a universe of 90 French stocks drawn from the SBF250 stock index. The dataset covers the period from January 1988 to January 1995. We select our universe of exogenous variables on the basis of their availability to the investor at a specific point in time and the variable's relevance to explaining variations in the investor's performance.4 We start with a set of 16 variables (see Table XXI) that could potentially have an influence in explaining variations in a fund manager's performance. We then select those that are significant in explaining variations in the performance of this particular investment style through a simple procedure described in Section V-C.
4 Some of the variables are computed from data provided by Datastream, others are computed from data provided by I/B/E/S (Institutional Brokers Estimate System).
Among others, Elton et al. [24] found that the performance of a stock might be influenced by its growth characteristics. High growth stocks might move in trends more often than low growth ones. The growth characteristics of the underlying might in turn influence the performance of the investment style. These are captured by five variables (see Table XXI). The EPS growth rate is the growth rate based on a least-squares calculation of moving four-quarter actual earnings over the latest 20 quarters. The stability of actual EPS (i.e., the stability in detrended earnings) is likely to be a key factor and it is computed as the mean absolute difference between actual EPS and the trend line in EPS growth (previous variable) for the past 20 quarters. A large increase or decrease in the market's expectations for earnings growth might trigger a trend. This effect is captured by two variables: the current year's expected percentage growth in EPS, i.e., [E(EPS(t+1))/EPS(t)] * 100, based on the analysts' estimates for this year's earnings, where EPS(t) is the actual earnings in the last reported fiscal year. Likewise, the one-year expectation in percentage growth of EPS, i.e., [E(EPS(t+2))/E(EPS(t+1))] * 100, is also based on analysts' expectations.
It is possible that extreme movements in a stock's expensiveness might provide useful information in triggering trending periods. Several authors have linked the performance of a stock to financial ratios. For example, Fama and French [28] found that the book/market value of equity ratio and firm size, measured by the market value of equity, emerged as variables that have strong relations with the cross section of average stock returns during the 1963-1990 period. Five variables are used to measure expensiveness: earnings yield, both in absolute terms and relative to the market as a whole, the interaction between earnings yield and earnings growth as a single variable earnings_yield * earnings_growth, and the commonly used ratios cash_flow/price, dividend_yield, and price/book_value.
To capture any nonlinear effects that are due to the size of a stock, its market capitalization is used as an additional variable. Finally, the following two variables are used to capture any mean-reversion effects that frequent revisions of expected earnings might induce: the percentage of EPS estimates raised and the percentage of EPS estimates lowered over the past four months. They are computed from data provided by I/B/E/S.
It has also been argued that the expected returns of a particular investment strategy might be conditioned on the current economic environment, or they may vary according to the country or sector in which the investment style is applied [23]. To capture any effects due to the economic environment we use two variables: the yield on the ten-year French government bond is used to capture long interest rate effects while the three-month interest rate is used to capture any dependencies on the short rate. We use four dummy variables to specify the sector to which a stock belongs: cyclical is a dummy variable encoded as one for stocks in the chemical, building materials, autos and equipment, and basic industries and encoded as zero elsewhere; noncyclical is similarly encoded as one for stocks in the food, household products, retailer and utility industries and zero elsewhere; growth is encoded as one for stocks in the cosmetics, drugs, electronics, entertainment, hotels and restaurants, computers, and publishing industries; and financial is encoded as one for financial institutions and zero elsewhere. To account for country effects we use the leading indicator for France as published by the OECD over the past two years.
It is, however, possible that further exogenous factors may be at work, such as for example financial gearing, exposure to international competition, etc., which are not accounted for in the set of variables above. Indeed, the debate as to which variables have the strongest influence on explaining stock returns is still raging in the finance literature (see, for example, [2], [6], and [28], among others). In our case it is not feasible to include all possible variables in our model purely for reasons of data availability, but to account for any residual effects we shall use lagged values of the target variable. Past cyclicity is simply the current value of the strategy's P&L and lagged past cyclicity is its lagged value. We shall also include the relative difference between two moving averages. A large value (in absolute terms) for this difference means that the stock price has moved strongly in one direction. It is possible that this might trigger trending periods.
The dependent variable is the (marked-to-market) profit an investor would make by using a trend following strategy on the relative price of each stock for the 12 coming months. The crossover of two MA's constitutes the buy or sell signal. A long position is taken in the stock after a buy signal, a short position is taken after a sell signal. The lengths of the MA's are three and 12 months, respectively. All variable values are expressed in prices which are relative-to-market. The reason for this is that the procedure is evaluated against a benchmark, i.e., the market. Besides, positions on the market as a whole can easily be taken through futures contracts. Finally, much of the noise in individual stock prices is due to the market itself. Taking relative prices filters this noise out.
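As a small sketch of how the sector dummies and relative-to-market prices described above might be assembled, the snippet below is purely illustrative; the sector lists and names are our placeholders rather than the paper's exact classification or data layout.

```python
import numpy as np

CYCLICAL = {"chemicals", "building materials", "autos", "basic industries"}
NONCYCLICAL = {"food", "household products", "retail", "utilities"}
GROWTH = {"cosmetics", "drugs", "electronics", "entertainment", "hotels", "computers", "publishing"}
FINANCIAL = {"banks", "insurance"}

def sector_dummies(sector):
    """Four 0/1 indicators per stock, as described in the text."""
    return {
        "cyclical": int(sector in CYCLICAL),
        "noncyclical": int(sector in NONCYCLICAL),
        "growth": int(sector in GROWTH),
        "financial": int(sector in FINANCIAL),
    }

def relative_price(stock_prices, index_prices):
    """Price relative to the market, filtering out market-wide noise."""
    return np.asarray(stock_prices) / np.asarray(index_prices)

print(sector_dummies("drugs"))
print(relative_price([100, 103, 101], [1000, 1020, 1015]))
```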
TABLE XXI. SUMMARY OF ALL VARIABLES
The fact that the performance of an investment style is not constant over time has two possible explanations. The first is randomness. After all, the strategy's performance is a result of the fluctuations in the underlying stock prices. If these fluctuations are random, there is no reason why investment performance should not be random. However, it is also possible that under certain conditions stock prices are more likely to be in a certain state than in the alternative one. For instance, it is entirely plausible that stocks of companies that are highly exposed to international competition move in trends when the exchange rate between their domestic currency and the dollar is extreme. Another example is provided by companies whose earnings prospects are difficult to forecast and are frequently revised by financial analysts. Their stock prices are more likely to move sideways. This notion that cyclicity in the performance of an investment strategy is influenced by exogenous variables, rather than merely caused by the randomness of the stock price, suggests that a model of a strategy's performance could be built. The existence of such a model would then reject the first explanation, i.e., that investment strategy performance is a random series.

Unfortunately, traditional linear models are only weak tests of this hypothesis. Failure to find any linear relationship between cyclicity and potential explanatory variables does not imply that there is no relationship at all. Trending periods might well be triggered by a combination of factors. These interaction effects cannot be explored with a linear model unless they are explicitly taken into account by a composite variable. However, such explicit modeling is only possible if strong a priori knowledge is available. Often in financial forecasting this is not the case. Besides, an exhaustive search over interactions is practically impossible, especially when the number of potential explanatory variables is large. NN models do not suffer from these insufficiencies. Their universal approximation abilities allow them to model nonlinear functions, in particular conditional ("if") relationships. In other words, the influence of some variables in the model can vary arbitrarily and can itself be a function of other variables in the model. Therefore, NN models constitute far stronger tests for rejecting relationships than linear models, but due to overfitting they can easily be misleading in finding relationships that reflect a temporary market anomaly.

Furthermore, this experimental setup gives us the opportunity to test several prevailing beliefs among some investors. For instance, we investigate whether constantly using a trend-following strategy is profitable. We also test whether past performance of a strategy is related to future performance. In this framework of market efficiency, the concept of conditional cyclicity becomes particularly relevant. Constant stock cyclicity would be relatively easy to detect by traditional methods, and therefore would be rapidly discounted in the market. In contrast, conditional cyclicity is more difficult to model and therefore might provide an economically feasible framework for applying nonlinear methods such as NN's to financial modeling.

C. Modeling Strategy

Our modeling strategy consists of three basic steps. The first step is essentially a data preparation step for the independent variable. The second step attempts to identify time structure in investment strategy returns, while in the third step we attempt to obtain incremental value by testing the hypothesis that these returns are conditioned on exogenous variables.

1) Simulating investment strategies: Unlike the prices of real assets (e.g., stocks, bonds, currencies, derivatives, commodities, etc.), returns from an investment strategy are not directly observable on the markets and have to be calculated by a simulation procedure. In order to generate the time series for the profit and loss of the strategy, it is necessary to use historical data to simulate the investment strategy and obtain its marked-to-market returns

  r_t = f(investment_style, asset)   (43)

where f is a function of the particular investment style applied to a particular (group of) underlying asset(s). In the case that the return series is explicitly quoted, this step can be omitted by using publicly available data directly. In the study presented here, for each stock in the index, we need to generate the series of returns that would reach the investor had he applied the investment style to each stock. The procedure for doing so is given in Section V-D.

2) Identifying time structure in investment strategy returns: Using univariate time series analysis, investigate the time structure of the returns and identify significant lags

  E[r_{t+1}] = a_0 + a_1 r_t + ... + a_p r_{t-p}   (44)
  E[r_{t+1}] = g(r_t, r_{t-1}, ..., r_{t-p}; w).   (45)

In principle, the Box–Jenkins identification procedure can be used to identify the time structure for the linear model. In practice, however, because the returns are often serially correlated, one has to be cautious when fitting ARIMA models. In Section V-E, we describe a simple ad hoc procedure for overcoming this difficulty.

3) Identifying exogenous influences: Construct well-specified models of conditional cyclicity for each of the investment styles/instances under consideration

  E[r_{t+1}] = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t}   (46)
  E[r_{t+1}] = g(X_t; w)   (47)

where X_t = (x_{1,t}, ..., x_{k,t}) is the vector of exogenous variables defined in Table XXI and w is the vector of network parameters. In principle, this can be done using any of the variable selection methodologies described in the previous sections, but in this application the significant exogenous variables are selected using a combination of linear and nonlinear tests, as described in Section V-F, to overcome some special difficulties arising from serially correlated data.

The estimates of expected returns given by the metamodels in (46) and (47) can be used in the context of (3) in the obvious way. For simplicity, however, in this section we shall use a simple asset allocation strategy whereby we purchase the investment strategy if our estimate of the future return is positive and sell the strategy otherwise.

Because some of the variable distributions have heavy tails and consequent outliers, the estimated parameters might be strongly influenced by a small number of observations. There are many ways to deal with such influential observations, including the robust estimators described in Section III. In this section, however, we shall use a procedure based on ranking as an alternative. Therefore, we also rank the data in each month and estimate the models on this ranked data. At each point in time, instead of using the value of the strategy return itself, we use it to rank the underlying assets with respect to the performance of the investment strategy. We then regress this rank against the similarly computed ranks of the underlying assets with respect to the independent variables, i.e.,

  Rank(r_{t+1}) = h(Rank(x_{1,t}), ..., Rank(x_{k,t}))   (48)

where x_i is the i-th exogenous variable. The variables are selected if they are significant in both the rank-based and the nonranked models. We apply both a multiple regression as well as a neural model and compare the results to investigate nonlinearities in the relationship between cyclicity and its explanatory variables.
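As an illustration of the rank-based formulation in (48), the sketch below implements the monthly cross-sectional ranking and a least-squares fit on the ranks in Python. It is only a sketch of the procedure described above: the data layout (months by stocks), the variable names, and the use of ordinary least squares on the ranks are our assumptions, not the authors' original implementation.

import numpy as np
import pandas as pd

def monthly_rank_regression(pl_next, exog):
    """Cross-sectional rank regression of eq. (48), estimated month by month.

    pl_next : DataFrame (months x stocks) of next-period strategy returns/P&L.
    exog    : dict mapping variable name -> DataFrame (months x stocks).
    Returns a DataFrame of monthly coefficient estimates.
    """
    rows = {}
    for month in pl_next.index:
        # Rank the dependent and the explanatory variables across stocks.
        data = pd.DataFrame({name: df.loc[month].rank() for name, df in exog.items()})
        data["y"] = pl_next.loc[month].rank()
        data = data.dropna()
        # Ordinary least squares on the ranks, with an intercept.
        X = np.column_stack([np.ones(len(data)), data.drop(columns="y").to_numpy()])
        beta, *_ = np.linalg.lstsq(X, data["y"].to_numpy(), rcond=None)
        rows[month] = pd.Series(beta, index=["const"] + list(exog))
    return pd.DataFrame(rows).T

A variable would then be retained only if its coefficient is significant both here and in the corresponding regression on the raw (unranked) data.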

D. Simulating Investment Strategies

In order to generate the time series of the profit and loss of the strategy, a number of steps are necessary. First, we generate the sell/buy signals recommended by the strategy for each asset at each point in time. These signals, s_t, are defined by

  s_t = +1  if MA_m(t) > MA_n(t)
  s_t = -1  if MA_m(t) < MA_n(t)   (49)
  s_t =  0  if MA_m(t) = MA_n(t)

where MA_n(t) = (1/n)(P_t + P_{t-1} + ... + P_{t-n+1}), P_t being the price of the asset at time t and n the length of the moving average in months; m and n denote the lengths of the shorter and longer moving averages, respectively. From this we calculate the marked-to-market returns from the strategy

  r_t = s_{t-1} (P_t - P_{t-1}) / P_{t-1}.   (50)

Finally, we calculate the profit and loss (P&L) of the strategy over the last 12 months

  P&L_t = r_t + r_{t-1} + ... + r_{t-11}.   (51)
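The simulation in (49)-(51) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the monthly price table layout and the moving-average lengths m and n are placeholders.

import numpy as np
import pandas as pd

def simulate_ma_strategy(prices, m=3, n=12, window=12):
    """Trend-following strategy of eqs. (49)-(51).

    prices : DataFrame of monthly prices, one column per stock.
    m, n   : lengths (in months) of the short and long moving averages.
    window : horizon over which the P&L is accumulated (12 months here).
    """
    ma_short = prices.rolling(m).mean()
    ma_long = prices.rolling(n).mean()

    # Eq. (49): +1 / -1 / 0 signal from the sign of the moving-average spread.
    signals = np.sign(ma_short - ma_long)

    # Eq. (50): marked-to-market return, holding last month's signal over this month.
    strategy_returns = signals.shift(1) * prices.pct_change()

    # Eq. (51): profit and loss accumulated over the last `window` months.
    pl = strategy_returns.rolling(window).sum()
    return signals, strategy_returns, pl

The series pl is the quantity that the metamodels of Sections V-E and V-F then try to predict.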
In the first step of our modeling procedure we investigate whether the constant use of this strategy produces excess returns. Indeed, if we apply this simple trend-following strategy to a cross section of stocks drawn from the SBF250 it yields a small but significant excess return. The annual average excess profit after transaction costs (1% per transaction) is 2.4%. The standard error for this average is 1.1%. The average return is therefore different from zero at a 90.2% significance level over this period. The objective of our metamodeling strategy is to improve on this performance by switching in and out of the investment strategy based on the predictions of our models. We start by building a simple univariate metamodel of the performance of this investment strategy.

E. Identifying Time Structure in Investment Strategy Returns

In principle, the Box–Jenkins methodology can be used to estimate the time structure of the investment strategy returns. In practice, however, because the P&L series are serially correlated by construction, one has to be cautious when fitting ARIMA models. The problem arises from the fact that, although we have 4800 observations, due to the 12-month overlap the actual number of independent observations is at best 400. Since we have 90 underlying assets, effectively there are fewer than five independent observations per stock. This is not suitable for traditional ARIMA fitting. Instead, we shall use cross-sectional data and regress the future performance of the strategy on only a limited number of lags, chosen for practical reasons not to exceed two. Since, due to the overlaps, we cannot use the in-sample R^2 and t-statistics to evaluate the predictive ability of the models and/or the significance of the parameters, we chose a simple alternative based on out-of-sample testing. (It is actually possible to use these metrics, but they would have to be adjusted for the overlap, which requires the extent of the overlap to be clear.) The models

  E[P&L_{t+12}] = a_0 + a_1 P&L_t + a_2 P&L_{t-1}
  E[P&L_{t+12}] = g(P&L_t, P&L_{t-1}; w)   (52)

with g being a function approximated by neural learning and w the vector of parameters (i.e., the network weights), are estimated on 80% of the dataset and are then tested on the remaining 20%. For the neural model, a further validation set (10% of the in-sample observations, randomly chosen) is used to control the complexity of the model through early stopping.
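A minimal version of this estimation and out-of-sample test is sketched below with scikit-learn. The 80/20 split, the small network, and the 10% validation fraction used for early stopping follow the description above, but the function and parameter names are illustrative assumptions rather than the authors' setup.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def evaluate_metamodels(X_lags, y_future, seed=0):
    """Linear and neural metamodels of future strategy P&L, as in eq. (52).

    X_lags   : array (n_obs, n_lags) of lagged P&L values.
    y_future : array (n_obs,) of the P&L realized over the following 12 months.
    """
    split = int(0.8 * len(y_future))              # 80% estimation, 20% out-of-sample
    X_tr, X_te = X_lags[:split], X_lags[split:]
    y_tr, y_te = y_future[:split], y_future[split:]

    linear = LinearRegression().fit(X_tr, y_tr)

    # Small MLP; 10% of the estimation data is held out for early stopping.
    net = MLPRegressor(hidden_layer_sizes=(4,), early_stopping=True,
                       validation_fraction=0.1, max_iter=2000,
                       random_state=seed).fit(X_tr, y_tr)

    # Out-of-sample correlation between actual and predicted returns.
    return {name: np.corrcoef(model.predict(X_te), y_te)[0, 1]
            for name, model in [("regression", linear), ("neural", net)]}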
The out-of-sample correlations between actual and predicted returns are 0.02 and 0.003 for the regression and neural models, respectively. The frequency of change in sign is on average once every two years for both models. Therefore, the additional profit obtained by switching the strategy on and off on the basis of its predicted performance is 1.37% and 0.94% for the regression and neural model, respectively. The funding costs for switching in and out of the strategy make this active strategy management approach less profitable than the passive (i.e., buy-and-hold) investment strategy. Neither of the two models, therefore, appears to be profitable on its own. However, it is entirely plausible that the expected returns of this investment strategy may be conditioned not only on historical performance alone, but also on exogenous influences attributable to the current economic and market environment. In the next section we attempt to verify this possibility.

TABLE XXII: F- AND t-VALUES FOR VARIABLES APPLIED TO RANKED AND UNRANKED DATA

F. Identifying Exogenous Influences

We are interested in assessing the degree to which the exogenous variables in Table XXI have a significant influence in explaining investment strategy returns, and whether by using these variables we obtain incremental value over and above the passive (unconditional buy-and-hold) strategy and the time series model. Of the possible methods of variable selection described in earlier sections of this paper, ANOVA appears to be the most suitable in this application. The main reason for this is that our interest lies in identifying the conditions under which to purchase or sell the investment strategy. This is likely to occur due to multiplicative effects (i.e., interactions) between these variables rather than simple additive contributions.

The F-statistics for the 11 most significant variables are shown in Table XXII. According to the F-statistics, three variables seem to have the strongest effect on cyclicity: the dividend yield (relative to the market), the estimated growth in earnings for the current year (fiscal year zero to fiscal year one), as well as the spread between two moving averages. The t-statistics also show linear or additive effects. Because ranked data are not normally but uniformly distributed, the significance levels of the t-values are different from those for normally distributed data (see the F-values). Given the large number of data points (4800), even small t-values are significant (a value of three is significant at a 99% level). However, if we take into account the overlap in the data (and adjust the degrees of freedom for it), most variables are unlikely to pass the significance test at a confidence level higher than 80%. As far as direct dependency is concerned, only those variables with an F higher than ten are influential. However, because some of these variables interact with each other, it is desirable to retain them in the model. These nonlinear interactions are illustrated using sensitivity analysis (Section V-I), but first let us examine whether incremental value can be obtained (both in the statistical and the economic sense) from these metamodels.
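As an illustration of how such an ANOVA with interaction terms can be computed, the sketch below uses statsmodels on a stacked panel of stock-month observations. The formula, the choice of two variables (a hypothetical div_yield and eps_growth), and the tercile binning of the continuous variables are our assumptions for the example, not the authors' exact specification.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def interaction_anova(df):
    """Two-way ANOVA with interaction on ranked strategy P&L.

    df : DataFrame with columns 'pl', 'div_yield', 'eps_growth'
         (one row per stock-month observation).
    """
    data = pd.DataFrame({
        "pl_rank": df["pl"].rank(),
        # Bin the continuous explanatory variables into terciles so that main
        # effects and their interaction can be tested as ANOVA factors.
        "dy": pd.qcut(df["div_yield"], 3, labels=["low", "mid", "high"]),
        "eg": pd.qcut(df["eps_growth"], 3, labels=["low", "mid", "high"]),
    })
    fit = smf.ols("pl_rank ~ C(dy) * C(eg)", data=data).fit()
    return sm.stats.anova_lm(fit, typ=2)   # F-values for main effects and interaction

The F-value on the interaction term is what distinguishes a multiplicative (conditional) effect from a purely additive one.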
G. Statistical Evaluation

Because the data contain both time series and cross sections of stocks, we split the datasets in two ways. First, in order to test how well the models generalize to unseen periods, a first dataset is created by splitting the database according to time, i.e., the period January 1989 to December 1992 is used to calibrate the regression, whereas January 1993 to December 1994 is used for testing. Second, in order to test how well the models generalize to unseen stocks, a second dataset is created by splitting the database according to stock names, i.e., the first 72 stocks (in alphabetical order) are used for calibration, whereas the next 18 are used for testing. For the NN models, a small cross-validation set (10% of the estimation set, randomly chosen) is again used for early stopping. Table XXIII shows the performance of the models in terms of the percentage of variance explained.
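The two splits amount to simple bookkeeping on a stacked (stock, month) panel; a sketch is given below, with the column names date and stock assumed for illustration.

def split_by_time(panel):
    """Calibration on 1989-1992, testing on 1993-1994 (unseen periods)."""
    years = panel["date"].dt.year
    calib = panel[(years >= 1989) & (years <= 1992)]
    test = panel[(years >= 1993) & (years <= 1994)]
    return calib, test

def split_by_stock(panel, stocks):
    """Calibration on the first 72 names, testing on the remaining 18 (unseen stocks)."""
    calib = panel[panel["stock"].isin(stocks[:72])]
    test = panel[panel["stock"].isin(stocks[72:])]
    return calib, test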
TABLE XXIII: STATISTICAL EVALUATION OF LINEAR REGRESSION AND NEURAL MODELS

The NN model seems to outperform the linear regression model, both on unseen stocks and on unseen periods, suggesting the presence of nonlinear relationships in the dataset. There is a notable difference between the estimation (in-sample) and out-of-sample performance for both models, suggesting nonstationarity. This difference is even more pronounced for the NN, which is expected since it uses a smaller estimation set.

The incremental value provided by the metamodels is small in statistical terms. However, the expected gain from the strategy does not need to be large in order to ensure excess returns. The strategy need only provide a positive expectation net of funding (transaction costs). If we apply the strategy a large number of times (or, alternatively, to a large number of assets) the return should be positive and the risk low. In the next section we evaluate the economic performance of the metamodels.
H. Economic Results from Linear and Nonlinear Models

The statistical performance of the models is of interest when analyzing the nature of the relationships between the different variables. However, it does not provide a measure of how useful the model is in terms of prediction. Some models can have very good explanatory power but appear to be unable to generate profits. Transaction costs are often the reason for this, as well as outliers in the data.

Two asset allocation rules will be applied to the models' predictions. The first rule is based on the sign of the metamodels' prediction, i.e., if the expected return is positive we buy the strategy, and if the expectation is negative we sell the strategy, in equal amounts. The second rule consists of weighting the position in proportion to the magnitude of the expected return. The average sizes of the positions taken with the two rules being different, it is important to adjust (i.e., scale down) individual positions so that the aggregate economic performances of the two rules can be compared fairly.

Fig. 25 plots the cumulative excess returns for the two models with both allocation rules. The passive (i.e., buy-and-hold) strategy is also shown (solid line). Transaction costs of 1% are taken into account.

Fig. 25. Economic performances for the different models with fixed and proportional positions. Transaction costs of 1% are used.

As shown in Fig. 25, trading rules based on positions that are proportional to the predictions of the models significantly outperform the trading rules based on fixed-size positions. This suggests that the models have picked up the amplitude of the profits and losses of the MA strategy, as well as their signs.
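The comparison of the two rules can be sketched as follows; the scaling of the proportional positions to the same average absolute size as the fixed positions, and the 1% cost charged on changes in position, are our reading of the procedure described above rather than the authors' exact implementation.

import numpy as np
import pandas as pd

def economic_performance(pred, strategy_pl, cost=0.01):
    """Cumulative excess return from switching the strategy on the metamodel signal.

    pred        : Series of metamodel predictions of future strategy P&L.
    strategy_pl : Series of realized strategy P&L over the same horizon.
    """
    fixed = np.sign(pred)              # rule 1: buy/sell the strategy in equal amounts
    prop = pred / pred.abs().mean()    # rule 2: proportional positions, scaled so that
                                       # the average absolute position matches rule 1

    curves = {}
    for name, pos in [("fixed", fixed), ("proportional", prop)]:
        gross = pos.shift(1) * strategy_pl        # hold last period's position
        costs = cost * pos.diff().abs()           # transaction costs on position changes
        curves[name] = (gross - costs).cumsum()
    return pd.DataFrame(curves)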
I. Sensitivity Analysis

The difference in performance between the linear and nonlinear models suggests that some nonlinearities exist in the dataset. However, the F-statistics in Table XXII show only minor nonlinear dependencies. These nonlinearities are probably due to interactions and conditional relationships. To illustrate this point, let us analyze the relationship between the expected cyclicity and the estimated growth in earnings per share (provided by I/B/E/S and measured relative to the market). This relationship is plotted in Fig. 26 for two different market capitalizations: the bold line represents market capitalizations that are 50% higher than the market average, while the dotted line corresponds to capitalizations that are 50% smaller than the market average. Because the estimated function is multidimensional, we can only plot a cross section of it. The relationship in Fig. 26 is obtained by setting all the other variables to their mean values.

Fig. 26. Conditional relationship between cyclicity and the growth in EPS, relative to the market, for two different market capitalizations. The other factors are set to their mean values.

As shown in Fig. 26, the effect of estimated growth in EPS is different for large and small capitalization stocks. For large stocks, trends in relative prices tend to occur when EPS growth is extreme. On the other hand, when earnings growth is in line with the market's earnings growth, prices tend to oscillate. For small stocks, only small or negative earnings growth engenders trending (probably downward) prices.

Unfortunately, because of the large number of variables, it is difficult to visualize or analyze exhaustively all the interaction effects. Many relationships are probably conditional on more than one other variable.
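Cross sections such as Fig. 26 can be generated from any fitted metamodel by sweeping one input while holding the others at their mean values; the sketch below assumes a scikit-learn-style predict method and illustrative column positions.

import numpy as np

def conditional_cross_section(model, X, var_idx, cond_idx, cond_value, n_points=50):
    """Model response to one variable, with the other inputs held at their means.

    model      : fitted estimator exposing predict(X).
    X          : array (n_obs, n_vars) of the estimation data.
    var_idx    : column swept along the x-axis (e.g., relative EPS growth).
    cond_idx   : column fixed at a chosen level (e.g., relative market capitalization).
    cond_value : level of the conditioning variable (e.g., 1.5 for 50% above the market).
    """
    grid = np.linspace(X[:, var_idx].min(), X[:, var_idx].max(), n_points)
    base = np.tile(X.mean(axis=0), (n_points, 1))   # all inputs at their mean values
    base[:, var_idx] = grid
    base[:, cond_idx] = cond_value
    return grid, model.predict(base)

Calling this twice, with cond_value set 50% above and 50% below the average capitalization, would reproduce the two curves of Fig. 26 for a given fitted model.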
VI. CONCLUSIONS AND FURTHER RESEARCH

We described a collection of neural applications in options pricing, cointegration, the term structure of interest rates, and models of investor behavior which highlight some key methodological weaknesses of NN's, including: testing the statistical significance of the input variables, testing for misspecified models, dealing with nonstationary data, handling leverages in the datasets, and generally formulating the problem in a way which is more amenable to predictive modeling.

We proposed and evaluated a number of solutions. We described a number of ways for principled variable selection, including 1) a stepwise procedure building upon the Box–Jenkins methodology; 2) analysis of variance; and 3) regularization. We showed how model misspecification tests can be used to ensure against models which make systematic errors. We described how the principle of cointegration can be used to deal with nonstationary data, and we generally described ways of formulating the problem in a manner that makes predictive modeling more likely to succeed.

The problems and solutions presented and discussed in this paper represent only the tip of an iceberg. Some of the solutions, although effective in practice, still require more rigorous statistical foundations. For example, the stepwise procedure in Section II is still somewhat ad hoc. Ideally, one would wish to bypass the Box–Jenkins phase altogether, but this is not possible without having a distribution for the partial derivatives of the function in (9), or at least for the statistic in (10). The ANOVA methodology, on the other hand, is model independent and does not assume causality. Many other research issues, particularly model misspecification and distribution theories, remain unexplored, but they are essential if NN's are to become commonplace in financial econometrics.

ACKNOWLEDGMENT

The authors would like to thank D. Bunn for motivating this research and for his continuous technical contributions at various stages in the development of the methodologies. F. Miranda-Gonzales carried out the initial statistical analysis of implied volatility. The authors would also like to thank J.-F. De Laulanié from Société Générale Asset Management for his useful comments regarding the metamodels for asset selection.
Apostolos-Paul N. Refenes received the B.Sc. degree in mathematics from the University of North London in 1984 and the Ph.D. degree in computing from the University of Reading, U.K., in 1987.
He is Associate Professor of Decision Science and Director of the Computational Finance Program at London Business School. His current research interests in neural networks and nonparametric statistics include model identification, variable selection, tests for neural model misspecification, and estimation procedures. Applied work includes factor models for equity investment, dynamic risk management, nonlinear cointegration, tactical asset allocation, exchange risk management, etc.

A. Neil Burgess received the B.Sc. degree in computer science from Warwick University, U.K., in 1989. He is currently pursuing the Ph.D. degree on the subject of modeling nonstationary systems using computational intelligence techniques.
Subsequently, he worked at the Thorn-EMI Central Research Laboratories, applying computational intelligence techniques to a range of problems in musical analysis, marketing, signal processing, and financial forecasting. Since September 1993, he has been a Research Fellow in the Decision Technology Center at London Business School, where he has published more than 20 papers on the use of advanced techniques for financial modeling. His research interests include neural networks, genetic algorithms, nonparametric statistics, and cointegration.

Yves Bentz received the M.Sc. degree in physics from Marseilles National School of Physics, France, in 1991. Subsequently, he received the M.A. degree from Marseilles Business School, France, in 1993. He is currently pursuing the Ph.D. degree at London Business School on the identification of conditional factor sensitivities in the area of investment management.
He is currently a Researcher in the department of Investment Strategy at Société Générale Asset Management, Paris, investigating the applications of advanced modeling techniques to equity investment management. His present interests include factor models based on adaptive and intelligent systems such as the Kalman filter and neural networks.
