
Optimal Probabilistic Predictions

of Financial Returns
Dimitrios D. Thomakos

and

Tao Wang

November 21, 2007


Preliminary

Abstract
We examine the relative optimality of sign predictions for financial returns, extending the
work of Christoffersen and Diebold (2006) on volatility dynamics and sign predictability. We
show that there is a more general decomposition of financial returns than that implied by the
sign decomposition and which depends on the choice of the threshold that defines direction.
We then show that the choice of the threshold matters and that a threshold of zero (leading to
sign predictions) is not necessarily optimal. We provide explicit conditions that allow for the
choice of a threshold that has maximum responsiveness to changes in volatility dynamics and
thus leads to optimal probabilistic predictions. Finally, we connect the evolution of volatility
to probabilistic predictions and show that the volatility ratio is the crucial variable in this
context. Our work strengthens the arguments in favor of accurate volatility measurement and
prediction, as volatility dynamics are integrated into the optimal threshold. We provide an
empirical illustration of our findings using monthly returns and realized volatility for the
S&P500 index.

An earlier version of the paper was presented at the Workshop Series of the Rimini Center for Economic

Analysis, Italy. We would like to thank the workshop participants and especially John Maheu, for useful comments
and suggestions. All remaining errors are ours. Computations were performed by the authors using R.

Department of Economics, University of Peloponnese, Greece, and Senior Fellow, Rimini Center for Economic
Analysis, Italy. Email: thomakos@uop.gr

Department of Economics, Queens College, City University of New York, USA. Email: tao.wang@qc.edu

Electronic copy available at: http://ssrn.com/abstract=1032944

1 Introduction

In the context of active risk management, portfolio choice and asset allocation, there is a very
strong interest in making accurate directional (probabilistic) predictions of financial returns.
Integral components of such strategies are accurate predictions of expected returns and return
volatilities. There is a vast literature surrounding these two topics, but relatively few studies deal with the problem of directional and probabilistic predictions. In this paper
we provide a number of novel results that relate to the relative optimality of such predictions
and show that, for an appropriately defined measure of optimality, there is an optimal choice
of a threshold value for returns that is different from zero and that depends on the underlying
return volatility. Sign predictions are thus non-optimal and one can do potentially better by
looking at other types of return decompositions.
Our work mainly extends the results of Christoffersen and Diebold (2006) on sign prediction.
In their paper they examine the effects of volatility dynamics and show that there is a strong
link between volatility dependence and sign dependence when expected returns are non-zero.
Since expected returns at high frequencies are close to zero, sign dependence can be mostly found
at intermediate return horizons. In this paper we examine the relationship between volatility
dependence and probability forecasts for returns exceeding a threshold value ct+1|t , which nests
the work of Christoffersen and Diebold (2006). Since there is no need for the assumption of
nonzero expected returns, our approach is more flexible with regard to the forecasting horizon.
Moreover, our analysis suggests that there is an optimal value of the threshold, in the sense that
it maximizes the responsiveness of the probability forecast to changes in volatility dynamics. In
doing so we introduce a generalized sign decomposition and show how this decomposition can
justify the use of a variety of threshold values when making probabilistic predictions.
We then discuss the dynamics of the optimal threshold value and connect volatility evolution
to probabilistic predictions in a specific fashion. In doing so we find that the ratio of volatilities
plays an important role and uncover a positive relationship between the conditional probability
forecast for returns exceeding a given value and conditional asset return volatility. Standard
asset pricing models predict a positive equilibrium relationship between conditional first and
second moments (Merton 1973, Ferson and Harvey 1991). A positive relation is consistent
with a deterministic model where volatility dynamics drive return dynamics, or with a stochastic
model where conditional volatility dynamics drive the conditional mean while both the unconditional
mean and volatility could be constant. In empirical studies, however, the
relationship is inconclusive. French et al. (1987), Campbell and Hentschel (1992), and Ghysels
et al. (2005) found a positive relationship between return and volatilities while Campbell (1987),


Lettau and Ludvigson (2005), and Brandt and Kang (2004) found a negative relationship between
return and asset return volatilities. Our results suggest that when conditional volatility increases,
the conditional probability of returns exceeding a given value increases. Therefore, when
risk increases, higher returns become more likely, but there is no certainty that high
returns will actually occur.
Finally, a brief list of recent related literature from the technical perspective of probabilistic
predictions includes: Anatolyev and Gospodinov (2007), who consider joint modeling of signs
and absolute returns; Chung and Hong (2006) and Hong and Chung (2003), who consider testing
for directional predictability given a known arbitrary threshold; Taylor (2005) and Engle and
Manganelli (2004), who consider quantile predictions; Fleming et al. (2001, 2003), who consider
volatility timing and dynamic portfolio adjustment. Other references that deal with the economic
content of probabilistic and directional predictions include Thomakos, Wang and Wu (2007),
who consider asset rotation strategies based on sign predictions; Granger and Pesaran (1999)
and Leitch and Tanner (1991), who claim that directional measures are potentially appropriate
in the context of economic welfare maximization.
The rest of the paper is organized as follows. In section 2 we set up the mathematical
framework and present our generalized sign decomposition that motivates the use of different
threshold values for defining direction of returns. In section 3 we show why sign predictions need
not be optimal, obtain an explicit expression for the optimal threshold value that maximizes
predictive response and show how volatility evolution is related to probabilistic predictions via the
volatility ratio. In section 4 we discuss the empirical implications of our findings and estimation
issues. In section 5 we illustrate our results using monthly returns and realized volatility for the
S&P500 index. We offer some concluding remarks in section 6.

2 A Generalized Sign Decomposition for Returns


Let $R_{t+1}$ denote a return series with conditional mean $\mu_{t+1|t} \stackrel{def}{=} E[R_{t+1}|\Omega_t]$ and volatility $\sigma^2_{t+1|t} \stackrel{def}{=} Var[R_{t+1}|\Omega_t]$, $\Omega_t$ being the information set on and before period $t$. All subsequent analysis is conditional on $\Omega_t$, which we omit for clarity of exposition. Given the standard representation for the returns $R_{t+1} = \mu_{t+1|t} + \sigma_{t+1|t} Z_{t+1}$, we define the (conditional) standardized returns by:

$Z_{t+1|t} \equiv Z_{t+1} \stackrel{def}{=} (R_{t+1} - \mu_{t+1|t})/\sigma_{t+1|t}$    (1)

and denote possible values for standardized returns by $z_{t+1}$. In what follows, and as in Christoffersen and Diebold (2006), we will call the ratio $I_t \stackrel{def}{=} \mu_{t+1|t}/\sigma_{t+1|t}$ the information ratio. We

assume that the standardized return series $Z_{t+1}$ consists of independent and identically distributed (i.i.d.) random variables and denote by $F(\cdot)$ and $f(\cdot)$ the cumulative distribution and probability density function of the standardized returns. The i.i.d. assumption can be relaxed without much additional cost but we will use it as a first approximation. We can also make the empirically viable assumption that the density function $f(\cdot)$ is symmetric, but our results hold for asymmetric distributions as well. Our main focus is on making conditional probabilistic predictions about future returns of the form:

$p(c_{t+1|t}) \stackrel{def}{=} P[R_{t+1} > c_{t+1|t}] = P[Z_{t+1} > z_{t+1}] = 1 - F(z_{t+1})$    (2)

where we define $z_{t+1} \stackrel{def}{=} (c_{t+1|t} - \mu_{t+1|t})/\sigma_{t+1|t}$. An immediate relationship that we can obtain is between $c_{t+1|t}$ and the quantiles of the standardized returns distribution. Suppose that $p(c_{t+1|t}) = 1 - p^*$, for some value $p^* \in [0, 1]$, and note that $z_{t+1} = F^{-1}(p^*) \equiv Q(p^*)$ is equal to the $p^*$ quantile of the distribution. Solving for $c_{t+1|t}$ we obtain that:

$c_{t+1|t} = \mu_{t+1|t} + Q(p^*)\sigma_{t+1|t}$    (3)

and, therefore, the threshold for making probabilistic predictions is a function of both the mean and the volatility of the returns, and its exact value depends on the choice of the quantile of the standardized returns distribution. While this looks similar to the Value-at-Risk (VaR) framework, we stress that the situation here is different since neither the probability $p^*$ (and thus $Q(p^*)$) nor the threshold value $c_{t+1|t}$ is known.
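As a quick illustration of equation (3), the threshold implied by a chosen quantile can be computed directly. A minimal Python sketch under normality (the paper's own computations were done in R); the mean and volatility values below are illustrative, not from the paper:

```python
from statistics import NormalDist

def threshold(p_star, mu, sigma):
    """Equation (3): c_{t+1|t} = mu_{t+1|t} + Q(p*) sigma_{t+1|t} under normality."""
    q = NormalDist().inv_cdf(p_star)   # Q(p*) = F^{-1}(p*)
    return mu + q * sigma

# Hypothetical monthly conditional mean 0.5% and volatility 4%.
mu, sigma = 0.005, 0.04
print(threshold(0.5, mu, sigma))   # p* = 0.5 gives Q = 0, so c equals the mean: 0.005
# The sign threshold c = 0 corresponds to the particular choice p* = F(-mu/sigma):
print(threshold(NormalDist().cdf(-mu / sigma), mu, sigma))   # ~0.0
```

The point of the sketch is that $c_{t+1|t} = 0$ is just one point on a continuum of admissible thresholds indexed by $p^*$.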
To motivate the use of a more general threshold value, different from zero, consider the following quotations from Christoffersen and Diebold (2006) and Hong and Chung (2003) respectively:
Other explorations may also prove interesting. One example is generation of probability forecasts for future returns exceeding any given value ct+1|t or percentile ...
and
To sum up, (i) when the market is not efficient (i.e., there exists serial dependence
in the conditional mean), the direction of returns with any threshold $c_{t+1|t}$ is generally
predictable using past returns. (ii) When the market is efficient but there exists
serial dependence in such higher order conditional moments as skewness and kurtosis,
the direction of returns with any threshold $c_{t+1|t}$ is also predictable using $\Omega_t$. (iii)
When the market is efficient and serial dependence is completely characterized by
volatility clustering, the direction of returns is predictable using $\Omega_t$ except for threshold
$c_{t+1|t} = \mu$. As long as $c_{t+1|t} \neq \mu$, volatility clustering is a driving force for directional
predictability.

We see that there is clearly an interest in probabilistic predictions beyond the sign of the returns
and also that the conditions that allow for directional predictability relate to the efficiency of
the market and the underlying moments of the return distribution, with particular emphasis on
return volatility. A number of interesting questions arise:

- What should be the value of the threshold $c_{t+1|t}$?
- If higher moment dynamics are the source of directional predictability, is there a threshold that somehow optimizes the response of probabilistic predictions with respect to these moment dynamics?
- How should we look at these moment dynamics?
Christoffersen and Diebold (2006) partly motivate sign predictability using the sign decomposition of returns. If the threshold value $c_{t+1|t}$ is equal to zero then we can always write:

$R_{t+1} = sign(R_{t+1}) \cdot |R_{t+1}| = [I(R_{t+1} > 0) - I(R_{t+1} \le 0)] \cdot |R_{t+1}|$    (4)

where $sign(x)$ is the sign function and $I(A)$ is the indicator function. The signs of the returns can be predictable (see the quotations) and the literature generally agrees that absolute values are also predictable. Even if the components of this decomposition are assumed predictable, their product may not be (and usually is not) predictable.
The above decomposition can easily be generalized so as to account for threshold values other
than zero and thus to motivate more general probabilistic predictions and a discussion for the
optimal choice of $c_{t+1|t}$. To this end, note that for any threshold value and for any two values $r^{(1)}_{t+1}$ and $r^{(2)}_{t+1}$ for the returns we can write:

$R_{t+1} \equiv I(R_{t+1} > c_{t+1|t})\, r^{(1)}_{t+1} + I(R_{t+1} \le c_{t+1|t})\, r^{(2)}_{t+1}$    (5)

Doing some algebra with the above formulation we can express the returns as:

$R_{t+1} = S(c_{t+1|t}) \cdot |R_{t+1}|$    (6)

where $S(c_{t+1|t})$ is a generalized sign function that is defined as a function of both the threshold value and its sign as well as the sign of the returns. Specifically, $S(c_{t+1|t})$ can be broken into two components, similarly to the sign function of equation (4), as:

$S(c_{t+1|t}) \stackrel{def}{=} S_1(c_{t+1|t}) + S_2(c_{t+1|t})$    (7)

where the components $S_i(c_{t+1|t})$, $i = 1, 2$, are given by:

$S_1(c_{t+1|t}) \stackrel{def}{=} I(R_{t+1} > c_{t+1|t}) \left[ I(c_{t+1|t} > 0) + I(c_{t+1|t} \le 0)\, sign(R_{t+1}) \right]$
$S_2(c_{t+1|t}) \stackrel{def}{=} -I(R_{t+1} \le c_{t+1|t}) \left[ I(c_{t+1|t} \le 0) - I(c_{t+1|t} > 0)\, sign(R_{t+1}) \right]$    (8)
The above decomposition motivates us to search for a threshold value for ct+1|t that has some
optimality characteristics.
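As a numerical check, the identity $R_{t+1} = S(c_{t+1|t})|R_{t+1}|$ can be verified directly from the components in equation (8). A small Python sketch (the paper's computations were performed in R; the placement of the signs in equation (8) follows the reconstruction above):

```python
import random

def sgn(x):
    """Sign function: +1, 0 or -1."""
    return (x > 0) - (x < 0)

def S(r, c):
    """Generalized sign S(c) = S1(c) + S2(c) from equations (7)-(8)."""
    s1 = (r > c) * ((c > 0) + (c <= 0) * sgn(r))
    s2 = -(r <= c) * ((c <= 0) - (c > 0) * sgn(r))
    return s1 + s2

# Check equation (6), R = S(c)|R|, over random return/threshold pairs.
random.seed(0)
for _ in range(10_000):
    r, c = random.gauss(0, 1), random.gauss(0, 1)
    assert abs(S(r, c) * abs(r) - r) < 1e-12
print("decomposition verified")
```

Setting $c = 0$ collapses $S(c)$ to the ordinary sign function of equation (4).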

3 Optimal Probabilistic Predictions

3.1 Maximizing Predictive Response

Christoffersen and Diebold (2006) analyze sign predictability using the response function of
the probability of positive returns. This function is simply the derivative of the probability in
equation (2) with respect to changes in volatility. Here, we generalize this response function to
account for changes in the conditional mean as well as the conditional variance and define the
matrix response function as follows:
$R(c_{t+1|t}) \stackrel{def}{=} \frac{\partial p(c_{t+1|t})}{\partial \theta_{t+1|t}} = -f(z_{t+1})\, \frac{\partial z_{t+1}}{\partial \theta_{t+1|t}}$    (9)

where $\theta_{t+1|t} \stackrel{def}{=} (\mu_{t+1|t}, \sigma_{t+1|t})'$. Together with the cross-partial derivative, this yields a $(2 \times 2)$ array that contains the responses of the probabilistic prediction of future returns to changes in the value of the conditional mean and the volatility. Straightforward calculations give us the two partial derivatives and one cross-partial derivative as:
$\frac{\partial p(c_{t+1|t})}{\partial \mu_{t+1|t}} = \frac{f(z_{t+1})}{\sigma_{t+1|t}}$    (10)

for changes in the mean;


$\frac{\partial p(c_{t+1|t})}{\partial \sigma_{t+1|t}} = f(z_{t+1})\, \frac{z_{t+1}}{\sigma_{t+1|t}} = z_{t+1}\, \frac{\partial p(c_{t+1|t})}{\partial \mu_{t+1|t}}$    (11)

for changes in the volatility;


$\frac{\partial^2 p(c_{t+1|t})}{\partial \mu_{t+1|t}\, \partial \sigma_{t+1|t}} = \frac{f(z_{t+1})\,(z_{t+1}^2 - 1)}{\sigma_{t+1|t}^2}$    (12)

for changes in both. We immediately notice that the response with respect to volatility is a function of the response with respect to the mean, but not vice versa, and its sign depends on the choice of the threshold value relative to the conditional mean. It is also straightforward to see how the probabilistic prediction changes as the cut-off value $c_{t+1|t}$ alone changes. We obtain:
$\frac{\partial p(c_{t+1|t})}{\partial c_{t+1|t}} = -\frac{f(z_{t+1})}{\sigma_{t+1|t}}$    (13)

Figure 1: Implied threshold values ct+1|t and responsiveness - N (0, 1) and t(2) distributions

so that the effect of a change in the cut-off value on the probability is negative and inversely proportional to volatility.
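A short numerical check of where the volatility response of equation (11) peaks under a standard normal density; a sketch using a grid search rather than the analytical first-order condition $f'(z)z + f(z) = 0$:

```python
import math

def phi(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def response_sigma(z, sigma=1.0):
    """Equation (11): d p(c)/d sigma = f(z) * z / sigma."""
    return phi(z) * z / sigma

# The magnitude of the response peaks at z = +/-1, i.e. at thresholds
# c = mu +/- sigma, not at the sign threshold.
grid = [i / 1000 for i in range(-4000, 4001)]
z_star = max(grid, key=lambda z: abs(response_sigma(z)))
print(round(abs(z_star), 3))   # 1.0
```

For the standard normal, $f'(z) = -z f(z)$, so the maximizing condition reduces to $f(z)(1 - z^2) = 0$, confirming $z = \pm 1$.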


The easiest way to explore the effects of the choice of the threshold on the response function
is graphically through examples of distributions for the standardized returns. We consider both
symmetric and asymmetric distributions, to see how the value of the threshold changes based on
the degree of asymmetry of the underlying distribution, and concentrate on volatility dynamics.
In Figure 1 we plot the response function of equation (11) for various values of $z_{t+1}$ that correspond to different choices of $c_{t+1|t}$. We use the standard normal and Student's $t(2)$ distributions for this graph. We can see that:

- The response is symmetric around $c_{t+1|t} = \mu_{t+1|t}$, nonlinear and a function of the kurtosis of the underlying density.
- The maximum responsiveness is obtained for $z_{t+1} = \pm 1$, which corresponds to optimal values of $c^*_{t+1|t} = \mu_{t+1|t} \pm \sigma_{t+1|t}$; therefore:
  - For a given symmetric distribution, like the standard normal or the t-distribution, the required quantile is given by $Q(p^*) = \pm 1$.
  - The optimal threshold value is not zero and sign predictions are, by this definition of optimality, suboptimal.
  - Sign prediction maximizes responsiveness when the information ratio is close to unity, but only if $\mu_{t+1|t} \neq 0$.

Figure 2: Probability as a function of the threshold - N (0, 1) and t(2) distributions

A choice of the threshold equal to the conditional mean (the point of symmetry of the
response function) is clearly sub-optimal.
Leptokurtic distributions are likely to have less responsiveness around the value of the
optimal threshold but retain more of their responsiveness for values away from the threshold
(when compared to the standard normal distribution). In the case of underlying normality
of standardized returns it becomes more important to use the optimal threshold.
In Figure 2 we plot the probabilities of equation (2), for various values of $z_{t+1}$, and mark the ones that correspond to the optimal threshold values of $\pm 1$ for the two symmetric distributions of Figure 1. These probabilities are given by $1 - F(\pm 1)$ and are equal to 16% and 84% for the $N(0,1)$ distribution and 21% and 79% for the $t(2)$ distribution, for $+1$ and $-1$ respectively.
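These numbers are easy to reproduce since both CDFs have closed forms; for the $t(2)$ distribution, $F(x) = 1/2 + x/(2\sqrt{2 + x^2})$. A Python sketch:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def t2_cdf(x):
    """Closed-form CDF of the Student t distribution with 2 degrees of freedom."""
    return 0.5 * (1 + x / math.sqrt(2 + x * x))

# Exceedance probabilities 1 - F(z) at the optimal thresholds z = +1 and z = -1.
for name, F in (("N(0,1)", norm_cdf), ("t(2)", t2_cdf)):
    print(name, round(100 * (1 - F(1))), round(100 * (1 - F(-1))))
# N(0,1) 16 84
# t(2) 21 79
```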
Similarly, in Figure 3 we plot the response function of equation (11) for two asymmetric
distributions, the skew Generalized Error (GED) and skew t distributions. We can supplement
our comments above with the following:
- Deviations from symmetry alter the previous optimal threshold - the corresponding quantiles can differ from unity.
- The choice of the threshold representation matters (i.e. whether one adds or subtracts $Q(p^*)\sigma_{t+1|t}$ from the conditional mean):

Figure 3: Implied threshold values ct+1|t and responsiveness - Skew GED and t distributions

  - The choice of $c_{t+1|t} = \mu_{t+1|t} - Q(p^*)\sigma_{t+1|t}$ leads to larger responsiveness, but it deteriorates faster as we move away from the optimal value.
  - The choice of $c_{t+1|t} = \mu_{t+1|t} + Q(p^*)\sigma_{t+1|t}$ leads to smaller responsiveness, but is more robust to deviations from the optimal value.
- Under an asymmetric distribution it is more likely that sign prediction will maximize responsiveness when $I_{t+1}$ is close to $Q(p^*)$, but again only if $\mu_{t+1|t} \neq 0$!
A question that arises here is whether the above observations on the optimal predictive response of returns to probability dynamics are possibly an artifact of the function chosen to represent optimality, namely the response function of equation (11). We now illustrate that our findings should be independent of the measure used to gauge optimality. To this end, let $Y_{t+1}(c_{t+1|t}) \equiv Y_{t+1} \stackrel{def}{=} I(R_{t+1} > c_{t+1|t})$ be the actual outcome of a probabilistic prediction, a discrete binary random variable with conditional mean given by $p(c_{t+1|t})$. We can use a standard measure, such as the mean squared error, for $Y_{t+1}$ to examine the effects of the choice of the threshold. The mean squared error for $Y_{t+1}$ is given by:

$MSE(c_{t+1|t}) \stackrel{def}{=} p(c_{t+1|t}) \left[ 1 - p(c_{t+1|t}) \right]$    (14)

and it is well known that it attains its maximum value when $p(c_{t+1|t}) = 0.5$, irrespective of the choice of the density. Therefore, the MSE of directional predictions depends on the choice of the threshold and, as the next figure illustrates, sign predictions are good performers only for a

Figure 4: The effect of the threshold and information ratio on the MSE - N(0,1) distribution

certain range of values for the information ratio. However, these values may not be empirically plausible. We illustrate the above in Figure 4, where we plot the MSE (based on the standard normal density) and mark the MSE values based on the optimal threshold as defined before and on two different values for the information ratio (10% above and below unity). First, it is clearly futile to perform sign predictions when the conditional mean is zero. However, even when the conditional mean is not zero, it is only on rare occasions that we will have available an information ratio high enough to beat the optimal threshold in MSE terms. In Figure 4 we see that making a sign prediction with an information ratio of 1.10 yields a 12.7% decrease in MSE, compared to the optimal threshold; with an information ratio of 0.90 the MSE increase of a sign prediction, compared to the optimal threshold, is 11.4%. What is crucial here is the plausibility of having information ratios that exceed unity and how often that happens. If most of the time the information ratio is below unity then using sign predictions is not going to be very productive. However, even with medium term returns we cannot know when and if we are going to get high information ratios: such a case is illustrated by our empirical application in section 5.
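The Figure 4 comparison can be sketched as follows, under the standard normal density; the exact percentages depend on the optimum used, so the figures produced below are indicative rather than a reproduction of those quoted in the text:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def mse(p):
    """Equation (14): MSE of a binary outcome with success probability p."""
    return p * (1 - p)

# MSE at the optimal threshold z = 1: probability 1 - F(1), for any information ratio.
mse_opt = mse(1 - norm_cdf(1))

# MSE of a sign prediction (c = 0): z = -IR, so p = 1 - F(-IR) = F(IR).
for ir in (1.10, 0.90):
    mse_sign = mse(norm_cdf(ir))
    print(f"IR = {ir}: sign-prediction MSE differs from optimal by "
          f"{100 * (mse_sign / mse_opt - 1):.1f}%")
```

The sign prediction beats the optimal threshold only when the information ratio is sufficiently above unity, which is the point of the discussion above.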

3.2 Volatility/Threshold Evolution and Probabilistic Response

When we consider the optimal threshold $c^*_{t+1|t}$ and make the connection with the generic formula $c_{t+1|t} = \mu_{t+1|t} + Q(p^*)\sigma_{t+1|t}$, we see that the optimal threshold value corresponds to a particular quantile of the standardized returns distribution, that is $Q(p^*)$. This distribution is, of course, unknown and it would have to be fitted from historical observations. Then the probability associated with $c^*_{t+1|t}$, namely $p(c^*_{t+1|t})$, can be computed. However, if the assumption of i.i.d. standardized returns is
valid then this probability would be identical for all time periods. What might be more interesting
is to see if there is a connection between the evolution of the optimal threshold value and the
associated probabilistic predictions - the optimal threshold is, after all, dynamic. For example,
we can think that in a practical application one will not be interested only about the probability
of future returns exceeding a particular threshold that depends on future volatility: one will also
be interested in the probability of future returns exceeding the realized threshold that depends
on today's volatility. To make this specific, define $c_{t|t} \stackrel{def}{=} \mu_{t|t} + \hat{Q}\sigma_{t|t}$ as the realized threshold, with $\hat{Q} \equiv \hat{Q}(p^*)$ being estimated from historical observations. The associated probabilistic prediction would then correspond to:

$p(c_{t|t}) \stackrel{def}{=} P[R_{t+1} > c_{t|t}] = 1 - F(q_{t+1|t})$    (15)

where $q_{t+1|t} \stackrel{def}{=} [(\mu_{t|t} - \mu_{t+1|t}) + \hat{Q}\sigma_{t|t}]/\sigma_{t+1|t}$ is essentially the ratio of current to future volatilities, adjusted by the difference between expected and realized returns. How is the above probabilistic prediction affected by volatility dynamics? Straightforward computations yield the following response:
$\frac{\partial p(c_{t|t})}{\partial \sigma_{t+1|t}} = f(q_{t+1|t})\, \frac{(\mu_{t|t} - \mu_{t+1|t}) + \hat{Q}\sigma_{t|t}}{\sigma_{t+1|t}^2}$    (16)

As before we are interested to see at what levels of future volatility we get the largest response in the probabilistic prediction. Normalizing current volatility to $\sigma_{t|t} = 1$, taking $\hat{Q} = 1$ and assuming a slowly varying conditional mean $\mu_{t+1|t} \approx \mu_{t|t}$, we can see where this happens by plotting the response function of equation (16) against a range of values for $\sigma_{t+1|t}$. We do this in Figure 5 for the two symmetric distributions we used in Figures 1 and 2, the $N(0,1)$ and the $t(2)$. We first note that maximum responsiveness is obtained when future volatility is falling, by 30% for the $N(0,1)$ distribution ($\sigma_{t+1|t} = 0.71$) and by 50% for the $t(2)$ distribution ($\sigma_{t+1|t} = 0.5$). These values define a useful range where volatility changes can increase the predictive response of the probability in equation (15). As volatility rises above or falls below these values the responsiveness starts decreasing. For example, for the $N(0,1)$ distribution the maximum responsiveness at $\sigma_{t+1|t} = 0.71$ is 0.29, while for $\sigma_{t+1|t} = 1$ it is 0.25 and for $\sigma_{t+1|t} = 1.30$ it is 0.17; we see that there is a substantial drop in responsiveness as volatility increases by 30%, compared to the case when volatility decreases by the same percentage.
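Under the normalization just described ($\sigma_{t|t} = 1$, $\hat{Q} = 1$, slowly varying mean), the maxima at $\sigma_{t+1|t} \approx 0.71$ and $0.5$ can be recovered numerically. A Python sketch of equation (16):

```python
import math

def norm_pdf(q):
    """Standard normal density."""
    return math.exp(-0.5 * q * q) / math.sqrt(2 * math.pi)

def t2_pdf(q):
    """Density of the Student t distribution with 2 degrees of freedom."""
    return 1 / (2 * math.sqrt(2) * (1 + q * q / 2) ** 1.5)

def response(sigma_next, pdf, sigma_now=1.0, q_hat=1.0):
    """Equation (16) with mu_{t+1|t} = mu_{t|t}: f(q) * Qhat*sigma_{t|t} / sigma^2."""
    q = q_hat * sigma_now / sigma_next
    return pdf(q) * q_hat * sigma_now / sigma_next ** 2

grid = [i / 1000 for i in range(100, 3001)]   # sigma_{t+1|t} from 0.1 to 3.0
for name, pdf in (("N(0,1)", norm_pdf), ("t(2)", t2_pdf)):
    s_star = max(grid, key=lambda s: response(s, pdf))
    print(name, round(s_star, 2), round(response(s_star, pdf), 2))
# N(0,1) 0.71 0.29
# t(2) 0.5 0.27
```

For the normal case the maximizer is exactly $1/\sqrt{2} \approx 0.71$, which follows from differentiating $f(1/\sigma)/\sigma^2$ with respect to $\sigma$.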
In Figure 6 we plot the corresponding probabilities for exceeding the realized threshold based
on equation (15). Here we can clearly see a risk-return trade-off relationship. As volatility

Figure 5: The response function using the realized threshold - N (0, 1) and t(2) distributions

Figure 6: Probability of exceeding the realized threshold ct|t - N (0, 1) and t(2) distributions


Figure 7: Effect of an increase in $\mu_{t+1|t}$ on the probability of exceeding the realized threshold $c_{t|t}$ - N(0,1) distribution

increases, the probability of exceeding the realized threshold increases - for the $N(0,1)$ distribution the relevant probabilities for $\sigma_{t+1|t} = \{0.71, 1, 1.30\}$ are 8%, 16% and 22% respectively. Note that the probability cannot, by construction, exceed 50%.
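These probabilities follow directly from equation (15) under the stated normalization; a quick check in Python:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Probability of exceeding the realized threshold c_{t|t} = Qhat * sigma_{t|t}
# with mu = 0, Qhat = 1, sigma_{t|t} = 1: p = 1 - F(sigma_{t|t} / sigma_{t+1|t}).
for sigma_next in (0.71, 1.0, 1.30):
    p = 1 - norm_cdf(1.0 / sigma_next)
    print(sigma_next, f"{100 * p:.0f}%")
# 0.71 8%
# 1.0 16%
# 1.3 22%
```

Since $q_{t+1|t} > 0$ whenever the threshold exceeds the conditional mean, $p = 1 - F(q_{t+1|t}) < 0.5$, which is the 50% bound noted above.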
Finally, we note that we can also examine the effect that a change in the expected return has on $p(c_{t|t})$. Unlike the case of future volatility, here the sign of the corresponding derivative is unambiguous and positive, as we have that:

$\frac{\partial p(c_{t|t})}{\partial \mu_{t+1|t}} = \frac{f(q_{t+1|t})}{\sigma_{t+1|t}}$    (17)

which implies that higher expected returns will always increase the probability of exceeding the realized threshold, irrespective of the path of future volatility. In Figure 7 we plot the probability of equation (15) against a range of values for $\mu_{t+1|t}$. We normalize $\mu_{t|t} = 0$, take $\hat{Q} = 1$ and use the $N(0,1)$ distribution. In each panel of the plot we present the probabilities for a range of values for $\mu_{t+1|t}$ from -30% to +30%, for a specific value of $\sigma_{t|t}$ and for three alternative values $\sigma_{t+1|t} = \lambda\sigma_{t|t}$, for $\lambda = 0.7, 1, 1.3$. In this way we can see the effect of rising and falling future volatility with respect to current volatility. Note that now $q_{t+1|t} = (\sigma_{t|t} - \mu_{t+1|t})/\sigma_{t+1|t}$ and the lines for the three probabilities will intersect at the point where $\mu_{t+1|t} = \sigma_{t|t}$.
We see that the main characteristic in all four panels of Figure 7 is that the lines corresponding to higher future volatility (blue) are always on top of the other two lines (red and green) until the point where they intersect. Again we have that higher future volatility is associated with a higher probability for future returns to exceed the realized threshold. However, once the expected return $\mu_{t+1|t}$ exceeds $\sigma_{t|t}$ the probability is inversely related to future volatility - this can be seen clearly in the top panels of the figure. We can also observe that the higher $\sigma_{t|t}$ is, the lower is the probability of future returns exceeding the realized threshold, and volatility dynamics matter less in this case. This can be seen by comparing the upper left with the lower right panel: in the upper left panel we have a much larger distance between the lines that correspond to volatility dynamics (blue and green), where volatility either rises or falls, and the line where future volatility stays the same as its realized value; this distance is clearly reduced when we consider the lower right panel of the figure, where current volatility is eight times larger than the current volatility in the upper left panel.

4 Empirical Implications, Estimation and Prediction

Our previous discussion has a number of interesting empirical implications. It provides us with a multi-step approach for working with probabilistic predictions. Let us assume that we have a time series of return observations and a corresponding one for volatility (e.g. any realized volatility measure) and denote them by $\{R_t, V_t^2\}_{t=1}^T$ respectively.

1. Test for serial correlation in the returns and fit an appropriate model $\hat{\mu}_{t+1|t}$ for the conditional mean - in most cases $\hat{\mu}_{t+1|t} = \hat{\mu}$, the global mean, which is frequently zero. We assume a constant conditional mean for simplicity in what follows.

2. Compute the standardized returns $\hat{Z}_t \stackrel{def}{=} (R_t - \hat{\mu})/V_t$ and fit an appropriate distribution to them. Plot the response function of equation (11) to locate the optimal quantile $\hat{Q}$.

3. Based on the distributional approximation compute the probability that future returns will exceed the optimal threshold $c^*_{t+1|t}$. Note that a forecast of future volatility is not needed for this: the required probability is given by $1 - F(\hat{Q})$; for example, in the $N(0,1)$ or any symmetric distribution this will be given by $1 - F(1)$. This probability can be computed using the underlying theoretical distribution or using the sample empirical distribution, see below.

4. Compute the binary indicator variable $Y_t \stackrel{def}{=} I(\hat{Z}_t > \hat{Q})$, then use any available statistic that can test for serial dependence (and hence predictability) of the discrete series $Y_t$.

5. For making dynamic directional predictions using the realized threshold $c_{t|t}$ compute the probability $1 - F(q_{t+1|t})$, where $q_{t+1|t} \stackrel{def}{=} \hat{Q}V_t/V_{t+1}$ for $t = 1, \dots, T-1$, for the in-sample observations. For out-of-sample observations a forecast $\hat{V}_{T+1|T}$ of volatility is required, in which case the probability will be computed using $\hat{q}_{T+1|T} \stackrel{def}{=} \hat{Q}V_T/\hat{V}_{T+1|T}$.
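The five steps can be sketched end-to-end. The following Python mock-up uses simulated returns and volatilities in place of real data (the paper's computations used R), and fixes $\hat{Q} = 1$ as under approximate normality:

```python
import math
import random
import statistics

random.seed(42)

# Simulated stand-ins for the data: returns R_t and a volatility series V_t
# (in practice a realized-volatility measure). Purely illustrative.
T = 500
V = [0.04 * math.exp(random.gauss(0, 0.25)) for _ in range(T)]
R = [random.gauss(0.005, v) for v in V]

# Step 1: constant conditional mean (the global mean).
mu_hat = statistics.fmean(R)

# Step 2: standardized returns Z_t = (R_t - mu_hat) / V_t; under an
# (approximately) normal fit the optimal quantile is Q_hat = 1.
Z = [(r - mu_hat) / v for r, v in zip(R, V)]
Q_hat = 1.0

# Step 3: probability of exceeding the optimal threshold, 1 - F(Q_hat),
# estimated here with the empirical distribution of the Z_t.
p_opt = sum(z > Q_hat for z in Z) / T

# Step 4: binary indicator series Y_t = I(Z_t > Q_hat) for dependence tests.
Y = [int(z > Q_hat) for z in Z]

# Step 5: in-sample dynamic probabilities 1 - F(q_{t+1|t}), q = Q_hat*V_t/V_{t+1}.
norm_cdf = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
p_dyn = [1 - norm_cdf(Q_hat * V[t] / V[t + 1]) for t in range(T - 1)]

print(round(p_opt, 2), round(statistics.fmean(p_dyn), 2))
```

With (approximately) standard normal standardized returns, `p_opt` should be close to $1 - F(1) \approx 16\%$, while the dynamic probabilities vary with the volatility ratio.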

4.1 Estimation and Prediction of Probabilities

There are various methods by which we can estimate the required probabilities, either $p(c^*_{t+1|t})$ or $p(c_{t|t})$, including the empirical distribution function of the returns, the empirical distribution function of standardized returns, or a parametric/expansion-based approximation. All these have been used in previous work by Christoffersen and Diebold (2006) and Christoffersen et al. (2007).
The key ingredients in the application of these methods are appropriate historical volatility and
volatility forecasts. In the discussion that follows we assume that these volatility measures are
available and concentrate on methods for the estimation and prediction of probabilities.

4.1.1 Estimating the Empirical Distribution Function

The empirical distribution function of the returns $\hat{F}_R(r)$ is nothing more than the sample proportion of observations less than or equal to a chosen evaluation value $r$ and is defined as:

$\hat{F}_R(r) \stackrel{def}{=} T^{-1} \sum_{t=1}^{T} I(R_t \le r)$    (18)

This is a relatively crude estimator, although it enjoys many optimal properties as an estimator of the underlying cumulative distribution function $F(r)$. Notice that this estimator does not use information about historical volatility, only about current or predicted volatility depending on the choice of the cut-off value. In addition, the standard errors for this estimator are complicated by the possible dependence in the returns.¹ A way to introduce both historical volatility and volatility forecasts and to work in an i.i.d. framework is to consider, as in Christoffersen and Diebold (2006), Christoffersen et al. (2007) and our previous discussion, realized standardized returns $\hat{Z}_t$. The standardization is performed using historical volatility while the chosen evaluation value $z$ can depend on either current or predicted volatility. The relevant estimator now becomes:
$\hat{F}_Z(z) \stackrel{def}{=} T^{-1} \sum_{t=1}^{T} I(\hat{Z}_t \le z)$    (19)

For the i.i.d. case it is well known that the empirical distribution function is the nonparametric maximum likelihood estimator of the distribution function and is unbiased, consistent and
¹ If the returns are dependent then estimation of the standard errors for $\hat{F}_R(r)$ would require estimation of the joint empirical distribution function, say $\hat{F}_k(r, r) \stackrel{def}{=} T^{-1} \sum_{t=1}^{T} I(R_t \le r \wedge R_{t+k} \le r)$.

asymptotically normally distributed as:


i
h
T FbZ (z) F (z) N (0, Z2 (z))

(20)

def

where Z2 (z) = F (z) [1 F (z)] is the variance of the estimator.
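The empirical distribution function of equations (18)-(19) and the plug-in standard error implied by equation (20) can be sketched as follows; the sample values below are illustrative, not the paper's S&P500 series.

```python
# Illustrative sketch: empirical distribution function and its plug-in
# standard error sqrt(F(z)(1 - F(z))/T) from equation (20).
import math

def edf(sample, z):
    """F_hat(z) = T^{-1} * #{t : sample[t] <= z}, cf. equations (18)-(19)."""
    T = len(sample)
    return sum(1 for x in sample if x <= z) / T

def edf_stderr(sample, z):
    """Plug-in standard error of the EDF at z."""
    T = len(sample)
    F = edf(sample, z)
    return math.sqrt(F * (1.0 - F) / T)

z_values = [-1.2, -0.3, 0.4, 0.9, 1.7, -0.8, 0.1, 2.2, -1.9, 0.6]
print(edf(z_values, 0.0))                      # 0.4
print(round(edf_stderr(z_values, 0.0), 4))     # 0.1549
```

In practice the sample would be the realized standardized returns and the standard error would be the sample counterpart of equation (20).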


An easy improvement of the above estimator $\hat{F}_Z(z)$ can be effected by taking advantage of the possible symmetry of the distribution of standardized returns. Using a symmetrized estimator $\hat{F}_S(z)$, see Shuster (1973) and Modarres (2002), for standardized returns one can obtain a gain in efficiency, essentially translated into smaller standard errors, relative to the empirical distribution function $\hat{F}_Z(z)$. The generic form of the symmetrized estimator is given by:

$$\hat{F}_S(z) \stackrel{\mathrm{def}}{=} 0.5\left[\hat{F}_Z(z) + 1 - \hat{F}_Z(-z)\right] \tag{21}$$

and Modarres (2002) shows that there are explicit expressions based on the counts of observations and on whether z > 0 or z < 0. Again, in an i.i.d. framework the symmetrized estimator is unbiased, consistent and asymptotically normally distributed as:

$$\sqrt{T}\left[\hat{F}_S(z) - F(z)\right] \Rightarrow N\!\left(0, \sigma_S^2(z)\right) \tag{22}$$

where $\sigma_S^2(z) \stackrel{\mathrm{def}}{=} 0.5\, F(-|z|)\left[1 - 2F(-|z|)\right]$ is the variance of the estimator.
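The symmetrized estimator of equation (21) can be sketched as below; ties at exactly -z are ignored for simplicity and the data are again illustrative.

```python
# Illustrative sketch of the symmetrized EDF of equation (21),
# which pools information from both tails of a symmetric distribution.
def edf(sample, z):
    """Empirical distribution function, cf. equation (19)."""
    return sum(1 for x in sample if x <= z) / len(sample)

def symmetrized_edf(sample, z):
    """F_S(z) = 0.5 * [F_Z(z) + 1 - F_Z(-z)], cf. equation (21)."""
    return 0.5 * (edf(sample, z) + 1.0 - edf(sample, -z))

z_values = [-1.2, -0.3, 0.4, 0.9, 1.7, -0.8, 0.1, 2.2, -1.9, 0.6]
print(edf(z_values, 1.0))               # 0.8
print(symmetrized_edf(z_values, 1.0))   # 0.8
```

For symmetric data the two estimates agree in expectation, but the symmetrized version has the smaller variance given in equation (22).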


Another straightforward way of improving the estimator $\hat{F}_Z(z)$ of the distribution function is to smooth it, akin to the smoothing that leads from a histogram to a density estimator. This smoothing can be done in a variety of ways, the most popular being kernel methods. Here, however, we do not adopt kernel methods but a different approach that is based on direct smoothing of the empirical distribution function. This approach, based on the use of Bernstein polynomials as smoothing coefficients, has a number of advantages including simplicity of computation and optimal asymptotic properties that hold both for independent and dependent observations. We provide a brief description of this smoothed estimator below; for further details see Babu, Canty and Chaubey (2002). The application of Bernstein polynomials requires a rescaling of the observations to have support on the unit interval. Suitable transformations, of the form $\phi(z) \mapsto w$, for standardized returns can be based on either a compact support of the form [a, b] for some constants a < b or the whole real line $(-\infty, +\infty)$ and are given by:

$$\phi(z) = \frac{z - a}{b - a} \qquad \text{or} \qquad \phi(z) = 0.5 + \frac{1}{\pi}\tan^{-1} z \tag{23}$$

Using either of the above transformations for the observations, $\phi(\hat{Z}_t) \mapsto W_t$ and $\phi(z) \mapsto w$, and letting $\hat{F}_W(w) \stackrel{\mathrm{def}}{=} T^{-1}\sum_{t=1}^{T} I(W_t \leq w)$, the smoothed estimator is given by:

$$\hat{F}_B(z) \equiv \hat{F}_B(w) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{m} \hat{F}_W(k/m)\, b_k(m, w) \tag{24}$$

where the weights $b_k(m, w)$, $k = 0, \dots, m$, are given by the Bernstein polynomials:

$$b_k(m, w) \stackrel{\mathrm{def}}{=} \binom{m}{k} w^k (1 - w)^{m-k} \tag{25}$$

and m is the smoothing coefficient. Note that the estimator $\hat{F}_B(z)$ is a polynomial and thus has all its derivatives. The choice of m is based on asymptotic considerations (strong consistency and proximity of the smooth estimator to the empirical distribution function) and it can be shown that it obeys the lower bound $m \geq T^{2/3}$ for both independent and dependent observations. However, Babu, Canty and Chaubey (2002) do not provide explicit expressions for the mean and the variance of their estimator. It is straightforward to derive them and we do so in the appendix for the i.i.d. case. Finally, note that all variances given in this section can be consistently estimated by their sample counterparts, that is by the estimators of the empirical distribution function.
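The construction in equations (23)-(25) can be sketched as follows; the arctan map to the unit interval and the toy data are illustrative choices, not the paper's settings.

```python
# Illustrative sketch of the Bernstein-polynomial smoothed CDF,
# equations (23)-(25).
import math

def edf(sample, w):
    return sum(1 for x in sample if x <= w) / len(sample)

def to_unit(z):
    """Map the real line to (0, 1): second transformation in equation (23)."""
    return 0.5 + math.atan(z) / math.pi

def bernstein_cdf(sample_z, z, m):
    """Smoothed estimator of equation (24) with Bernstein weights (25)."""
    w = to_unit(z)
    sample_w = [to_unit(x) for x in sample_z]
    total = 0.0
    for k in range(m + 1):
        bk = math.comb(m, k) * w**k * (1 - w)**(m - k)   # equation (25)
        total += edf(sample_w, k / m) * bk
    return total

z_values = [-1.2, -0.3, 0.4, 0.9, 1.7, -0.8, 0.1, 2.2, -1.9, 0.6]
print(round(bernstein_cdf(z_values, 0.0, 20), 3))  # smoothed F(0)
```

Since Bernstein smoothing of a monotone function is monotone, the resulting estimator is itself a proper distribution function on the transformed scale.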

4.1.2 Other Methods for Estimating Probabilities

There is a variety of other methods that one can potentially employ for the estimation of probabilities. These can be expansion-based, parametric models (e.g. logit-type models) or non-parametric models. The main distinction of these methods versus the use of the empirical distribution function is that they explicitly account for the presence of conditioning variables in the information set. Here we motivate the use of logit-type models and also review the case of non-parametric estimation of a conditional distribution function, as being closely related to the estimation of the (unconditional) empirical distribution function discussed in the previous sections. Assume that the information set depends, in addition to the returns, on a single conditioning variable, say $X_t$; that is, it contains $\{R_t, X_t, R_{t-1}, X_{t-1}, \dots\}$. We are interested in making a prediction while explicitly taking into account the evolution of $X_t$. Therefore, we would like to have an estimate of the conditional distribution function $F(z \mid X_t = x)$.
Consider the probability given in equation (15), that of future returns exceeding the realized threshold. For the case we are examining here, where the conditional mean is constant, we can consider the following expansion around a fixed value (say the optimal value $\hat{Q} = 1$):

$$p(\hat{c}_{t|t}) \approx 1 - f(1)(q_{t+1|t} - 1) - f'(1)(q_{t+1|t} - 1)^2 \equiv \beta_0 + \beta_1 q_{t+1|t} + \beta_2 q_{t+1|t}^2 \tag{26}$$

where $\beta_0 \stackrel{\mathrm{def}}{=} 1 + f(1) - f'(1)$, $\beta_1 \stackrel{\mathrm{def}}{=} -f(1) + 2f'(1)$ and $\beta_2 \stackrel{\mathrm{def}}{=} -f'(1)$. For the binary variable $Y_{t+1} \stackrel{\mathrm{def}}{=} I(R_{t+1} > \hat{c}_{t|t})$ we have that $E[Y_{t+1} \mid \Omega_t] = p(\hat{c}_{t|t})$, where $\Omega_t$ denotes the information set, and therefore one can use a logit-type model to compute the relevant probability using the volatility ratio and its square as explanatory variables.
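A minimal sketch of such a logit-type model, with [1, q, q²] as regressors, is given below. The toy exceedance data and the simple gradient-ascent fit are our own illustrative choices, not the paper's estimation method.

```python
# Illustrative sketch of the logit-type model motivated by expansion (26):
# regress the binary exceedance indicator Y on the volatility ratio q and q^2.
import math

def fit_logit(qs, ys, lr=0.1, iters=3000):
    """Logistic regression of y on (1, q, q^2) via gradient ascent."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for q, y in zip(qs, ys):
            x = (1.0, q, q * q)
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for j in range(3):
                grad[j] += (y - p) * x[j]
        beta = [b + lr * g / len(qs) for b, g in zip(beta, grad)]
    return beta

def predict(beta, q):
    s = beta[0] + beta[1] * q + beta[2] * q * q
    return 1.0 / (1.0 + math.exp(-s))

# Toy data: exceedances become less likely as the volatility ratio rises.
qs = [0.5, 0.7, 0.8, 0.9, 1.0, 1.1, 1.3, 1.5, 1.8, 2.0]
ys = [1,   1,   1,   1,   0,   1,   0,   0,   0,   0]
beta = fit_logit(qs, ys)
print(predict(beta, 0.6) > predict(beta, 1.6))  # True: probability falls with q
```

In applied work one would of course use a standard maximum-likelihood logit routine; the point here is only the choice of regressors suggested by equation (26).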
For non-parametric estimation, an appropriate kernel-based estimator is the weighted Nadaraya-Watson estimator, see for example Cai (2002) and references therein, that is given by:

$$\hat{F}_Z(z \mid X_t = x) \stackrel{\mathrm{def}}{=} \frac{\sum_{t=1}^{T} p_t(x) K_h(x - X_t) I(\hat{Z}_t \leq z)}{\sum_{t=1}^{T} p_t(x) K_h(x - X_t)} \tag{27}$$

where $K_h(x) \stackrel{\mathrm{def}}{=} K(x/h)/h$ is a normalized kernel function and h > 0 is the smoothing parameter
(bandwidth). The optimal weights $p_t(x)$ are functions of the evaluation value x and are time-varying. Following Cai (2002), they can be uniquely defined as solutions to the following empirical likelihood maximization problem:

$$\{p_t(x)\}_{t=1}^{T} \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{p_t(x)} \sum_{t=1}^{T} \log p_t(x) \tag{28}$$

subject to the constraints $\sum_{t=1}^{T} p_t(x) = 1$ and $\sum_{t=1}^{T} (x - X_t)\, p_t(x) K_h(x - X_t) = 0$. The optimal weights are functions of the Lagrange multiplier $\lambda$ used in solving the maximization problem and are given by:

$$p_t(x) = T^{-1}\left\{1 + \lambda (x - X_t) K_h(x - X_t)\right\}^{-1} \tag{29}$$

while the Lagrange multiplier itself is obtained as the solution to the following (nonlinear) maximization problem:

$$\lambda \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{\lambda}\, (Th)^{-1} \sum_{t=1}^{T} \log\left\{1 + \lambda (x - X_t) K_h(x - X_t)\right\} \tag{30}$$

The resulting estimator $\hat{F}_Z(z \mid X_t = x)$ has several performance advantages: it has automatically good behavior at the boundaries, it is guaranteed to lie between 0 and 1 and it is monotone in z. The bandwidth selection can be based on any of the generalized cross-validation procedures, including the nonparametric Akaike information criterion (AIC) proposed in Cai and Tiwari (2000).
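The estimator of equations (27)-(30) can be sketched as follows. The Gaussian kernel and the bisection solver for the multiplier λ are our own illustrative choices; the linked data are made up.

```python
# Illustrative sketch of the weighted Nadaraya-Watson conditional CDF,
# equations (27)-(30), with empirical-likelihood weights.
import math

def kern(u, h):
    """Normalized Gaussian kernel K_h(u) = K(u/h)/h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def el_lambda(x, X, h):
    """Solve sum_t u_t/(1 + lambda*u_t) = 0 with u_t = (x - X_t) K_h(x - X_t)."""
    u = [(x - xt) * kern(x - xt, h) for xt in X]
    lo = (-1.0 / max(u) + 1e-9) if max(u) > 0 else -1e6
    hi = (-1.0 / min(u) - 1e-9) if min(u) < 0 else 1e6
    def g(lam):
        return sum(ut / (1.0 + lam * ut) for ut in u)
    for _ in range(200):               # g is decreasing: simple bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def wnw_cdf(z, x, Z, X, h):
    """F_hat(z | X_t = x) as in equation (27) with weights (29)."""
    lam = el_lambda(x, X, h)
    num = den = 0.0
    for zt, xt in zip(Z, X):
        ut = (x - xt) * kern(x - xt, h)
        pt = 1.0 / (len(X) * (1.0 + lam * ut))   # equation (29)
        w = pt * kern(x - xt, h)
        den += w
        if zt <= z:
            num += w
    return num / den

X = [i * 0.1 for i in range(30)]
Z = [0.5 * x - 1.0 for x in X]       # a deterministic link, illustration only
print(round(wnw_cdf(0.2, 1.0, Z, X, 0.3), 2))
```

Because the weights are positive for any λ in the admissible range, the sketch inherits the properties noted above: the estimate lies in [0, 1] and is monotone in z.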

Empirical Illustration

In this section we give a brief illustration of the empirical application of the results presented
above. We use monthly data for the S&P500 from December 1969 to May 2007, for a total
of T = 448 observations. We calculate monthly returns $R_t$ and monthly realized volatility $V_t^2 \stackrel{\mathrm{def}}{=} \sum_{i=1}^{m} R_{t,i}^2$, where $R_{t,i}$ is the daily return of the i-th day in month t. See Andersen et al. (2007) for a review of methods on estimation and forecasting with realized volatility measures.

Figure 8: Monthly data on the S&P500 (monthly returns, realized volatility, realized standard deviation and standardized returns)
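The realized volatility computation above can be sketched directly; the daily returns below are made up, not the paper's data.

```python
# Illustrative sketch of monthly realized volatility: V_t^2 = sum_i R_{t,i}^2
# over the daily returns R_{t,i} of month t.
import math

def realized_vol(daily_returns_by_month):
    """Return the list of monthly realized standard deviations V_t."""
    return [math.sqrt(sum(r * r for r in month))
            for month in daily_returns_by_month]

months = [
    [0.01, -0.004, 0.002, 0.007, -0.012],
    [-0.02, 0.015, 0.003, -0.001, 0.008],
]
V = realized_vol(months)
print([round(v, 4) for v in V])   # [0.0177, 0.0264]
```

The standardized returns are then obtained by dividing each demeaned monthly return by the corresponding $V_t$.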


In Figures 8 and 9 we present the data and their autocorrelation functions (ACF). There
is no significance presence of autocorrelation in the returns while the presence of dynamics is
evident for the volatility series. As noted in the previous section, in the absence of conditional
bt = (Rt
mean dynamics we compute the standardized returns as Z
b)/Vt with
b being the
unconditional mean. Descriptive statistics are provided in Table 1 for the three series Rt , Vt and
bt . The statistics for the standardized returns support the possibility of an underlying symmetric
Z
marginal distribution compatible with normality. In the table we present the Cramer-Von Misses
test for normality but the results from two additional tests, (Kolmogorov-Smirnov and ShapiroWilk) that are not reported here, agree with it. Therefore we proceed as if the underlying
distribution of the standardized returns is Gaussian.
The absence of autocorrelation coupled with normality suggests that the standardized returns
may be i.i.d. (although there is some serial correlation in the squares of the series) - there are
however more powerful and specialized tests for checking the i.i.d. hypothesis, see for example
Hong and White (2005). Here we compute a simple test of independence versus a 1st order
bt > Q)
b for various
Markov alternative as in Christoffersen (1998) based on binary series Yt = I(Z
b Specifically we consider the three binary series Yt = I(Z
bt > 0), Yt = I(Z
bt > 1) and
values of Q.
Yt = I(Zbt < 1). The results in Table 2 are interesting as they show that the only value that
Figure 9: Autocorrelations for the data in Figure 8

splits the standardized returns space into a possibly predictable binary sequence is $\hat{Q} = 1$. Note the absence of symmetry in these results, which makes them potentially useful: looking for excess positive returns is predictable while looking for excess negative returns (with the symmetric threshold $\hat{Q} = -1$) is not. In the rest of our computations we focus on the case of $\hat{Q} = 1$.
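The likelihood-ratio independence test used above can be sketched as follows, in the spirit of Christoffersen (1998); the statistic is asymptotically chi-square with one degree of freedom under independence, and the binary series below is illustrative.

```python
# Illustrative sketch of the independence test against a first-order
# Markov alternative (Christoffersen, 1998) for a binary series Y_t.
import math

def christoffersen_independence(y):
    """LR statistic comparing an i.i.d. Bernoulli fit to a Markov fit."""
    n = [[0, 0], [0, 0]]                   # transition counts n[i][j]
    for prev, cur in zip(y[:-1], y[1:]):
        n[prev][cur] += 1
    n0 = n[0][0] + n[0][1]
    n1 = n[1][0] + n[1][1]
    pi01 = n[0][1] / n0 if n0 else 0.0     # P(1 | previous 0)
    pi11 = n[1][1] / n1 if n1 else 0.0     # P(1 | previous 1)
    pi = (n[0][1] + n[1][1]) / (n0 + n1)   # unconditional P(1)
    def ll(p, zeros, ones):                # guarded Bernoulli log-likelihood
        out = 0.0
        if zeros:
            out += zeros * math.log(1.0 - p)
        if ones:
            out += ones * math.log(p)
        return out
    l1 = ll(pi01, n[0][0], n[0][1]) + ll(pi11, n[1][0], n[1][1])
    l0 = ll(pi, n[0][0] + n[1][0], n[0][1] + n[1][1])
    return 2.0 * (l1 - l0)

y_alt = [0, 1] * 20                        # strongly alternating series
print(christoffersen_independence(y_alt) > 3.84)  # True: rejects independence
```

Applied to $Y_t = I(\hat{Z}_t > \hat{Q})$, small p-values indicate that the binary sequence carries exploitable first-order dependence.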
We now look into the time-varying probability of exceeding the realized threshold $\hat{c}_{t|t} = \hat{\mu} + V_t$, the probability now being dependent on the volatility ratio $q_{t+1|t} = V_t / V_{t+1}$. In Figure 10 we plot $q_{t+1|t}$, its autocorrelation function and the corresponding probability of exceeding it, $1 - F(q_{t+1|t})$, based on the normal approximation. We see that the volatility ratio exhibits negative and significant first order serial correlation (over -0.4) and is therefore possibly predictable. The probability declines as the ratio increases; note that a higher ratio corresponds to declining volatility, and therefore as volatility increases so does the probability of future returns exceeding the realized threshold.
We next perform a small forecasting exercise. We use a rolling window of 300 observations and compute true out-of-sample volatility predictions $\hat{V}_{t+1|t}$ based on a simple autoregressive model of the form:

$$\log V_{t+1} = \beta_0 + \sum_{j=1}^{p} \beta_j \log V_{t+1-j} + e_t \tag{31}$$
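A minimal sketch of fitting such a log-volatility autoregression by ordinary least squares is given below (an AR(1) for brevity; the volatility series is made up).

```python
# Illustrative sketch of the log-volatility autoregression in equation (31),
# fit by OLS as an AR(1) and used for a one-step volatility forecast.
import math

def fit_ar1(x):
    """OLS fit of x[t] = b0 + b1 * x[t-1] + e[t]; returns (b0, b1)."""
    y = x[1:]
    z = x[:-1]
    n = len(y)
    zbar = sum(z) / n
    ybar = sum(y) / n
    b1 = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
          / sum((zi - zbar) ** 2 for zi in z))
    b0 = ybar - b1 * zbar
    return b0, b1

vols = [0.04, 0.05, 0.045, 0.06, 0.055, 0.05, 0.07, 0.065, 0.06, 0.05]
logv = [math.log(v) for v in vols]
b0, b1 = fit_ar1(logv)
forecast = math.exp(b0 + b1 * logv[-1])   # one-step volatility prediction
print(0.0 < forecast < 1.0)               # True
```

Modeling the logarithm guarantees a positive volatility forecast after exponentiation.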

with the order of the autoregression selected by the AIC. We use both the actual and predicted volatility ratios $q_{t+1|t}$ and $\hat{q}_{t+1|t}$ to compute empirical probabilities of exceeding the optimal and realized thresholds. We follow the methods presented in the previous section: we compute probabilities based on the smoothed empirical distribution function using Bernstein
polynomials, as in equation (24), and conditional probabilities, as in equation (26); we use the volatility as the conditioning variable in the latter case.

Figure 10: Volatility ratio, its autocorrelation function and the probability of exceeding $q_{t+1|t}$

Figure 11: Predicted probabilities of exceeding optimal and realized thresholds

Figure 12: Autocorrelation of probabilities of exceeding optimal threshold
The results are presented in Figures 11, 12, 13 and 14. In Figure 11 we present the computed probabilities of exceeding the optimal and realized thresholds, both conditionally and unconditionally. The two left panels have the probabilities of exceeding the optimal threshold and we note that they exhibit a similar shape and high persistence around the theoretical value of 16%. The range of values is higher for the conditional probability, ranging from about 17% to 21%. The strong persistence can be seen in Figure 12, where we plot the ACF of the series in the left panel of Figure 11.

The two right panels of Figure 11 have the probabilities of exceeding the realized threshold, using both $q_{t+1|t}$ and $\hat{q}_{t+1|t}$ for comparison. Their appearance is very different from the corresponding probabilities using the optimal threshold, and we note that they exhibit lower persistence and signs of cyclicality (especially when $q_{t+1|t}$ is used). We can see both their autocorrelation and cyclical features in Figure 13, where we plot the ACF of these series for the case where we use $q_{t+1|t}$; the results for $\hat{q}_{t+1|t}$ are similar.
Figure 13: Autocorrelation of probabilities of exceeding realized threshold

Finally, in Figure 14 we present scatterplots that relate the actual and predicted volatility ratios $q_{t+1|t}$ and $\hat{q}_{t+1|t}$ to the probability of exceeding the realized threshold. These plots are similar to the lower panel of Figure 10, with the exception that now the probabilities are based on the empirical distribution and not on the normal approximation. The red lines in the figures

correspond to a polynomial fit of the form:

$$p(\hat{c}_{t|t}) = \gamma_0 + \gamma_1 q_{t+1|t} + \gamma_2 q_{t+1|t}^2 + e_t \tag{32}$$

and similarly for $\hat{q}_{t+1|t}$. The plots in all panels of Figure 14 are very similar to each other and to the lower panel of Figure 10. All show how higher volatility increases the probability of exceeding the realized threshold, irrespective of whether we use the actual volatility ratio $q_{t+1|t}$ or the predicted one $\hat{q}_{t+1|t}$. The fit of the polynomial in equation (32) above is over 90% and is therefore quite suggestive of the use of the volatility ratio as a regressor in making probabilistic predictions using a variety of different methods, e.g. a logit regression with dependent variable one of the $Y_t$ series that we encountered above.
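The quadratic fit of equation (32) can be sketched with a small normal-equations solve; the probability/ratio pairs below are illustrative, not the paper's estimates.

```python
# Illustrative sketch of equation (32): least-squares regression of the
# exceedance probability on the volatility ratio and its square.
def fit_quadratic(q, p):
    """Fit p = g0 + g1*q + g2*q^2 via the normal equations."""
    X = [(1.0, qi, qi * qi) for qi in q]
    A = [[sum(xi[r] * xi[c] for xi in X) for c in range(3)] for r in range(3)]
    b = [sum(xi[r] * pi for xi, pi in zip(X, p)) for r in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    g = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                    # back substitution
        g[r] = (b[r] - sum(A[r][c] * g[c] for c in range(r + 1, 3))) / A[r][r]
    return g

q = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
p = [0.30, 0.25, 0.20, 0.15, 0.12, 0.09, 0.07, 0.06]
g0, g1, g2 = fit_quadratic(q, p)
pred = g0 + g1 * 1.0 + g2 * 1.0            # fitted probability at q = 1
print(abs(pred - 0.20) < 0.05)             # True: the fit tracks the data
```

The fitted curve declines in the ratio, mirroring the pattern in Figure 14: lower future volatility (higher ratio) lowers the exceedance probability.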

Concluding Remarks

The main result of this paper suggests that there is a potentially optimal choice for the threshold
value that defines the direction of the returns and can be used in making probabilistic predictions. This choice of the threshold does not coincide with zero and therefore sign predictions
are sub-optimal in most cases, especially when the information ratio is low. We also show that
the evolution of this optimal threshold is tied to the evolution of volatility, in particular to the
ratio of successive volatilities. We examine in some detail the issues that surround the choice
of the threshold and its evolution under a variety of distributional assumptions for the standardized returns, and find a positive risk-return trade-off in this probabilistic context as well: the higher future volatility is, the higher the probability of future returns exceeding the realized optimal threshold that is defined through current volatility. Therefore, volatility dynamics play an important role in our context and successful predictions of future volatility are crucial in understanding the probability of obtaining higher future returns.

Figure 14: Probabilities of exceeding realized threshold as functions of the volatility ratio


References

1. Anatolyev, A. and N. Gospodinov, 2007, Modeling Financial Return Dynamics by Decomposition, Working Paper w0095, Center for Economic and Financial Research (CEFIR).
2. Andersen, T.G., T. Bollerslev and F.X. Diebold, 2007, Parametric and nonparametric volatility measurement, in L.P. Hansen and Y. Ait-Sahalia, eds., Handbook of Financial Econometrics, forthcoming, Amsterdam: North-Holland.
3. Babu, G.J., A. Canty and Y. Chaubey, 2002, Application of Bernstein polynomials for smooth estimation of a distribution and density function, Journal of Statistical Planning and Inference, vol. 105, pp. 377-392.
4. Bollerslev, T., R.Y. Chou and K.F. Kroner, 1992, ARCH modeling in finance: A selective review of the theory and empirical evidence, Journal of Econometrics, 52, 5-59.
5. Brandt, M.W. and Q. Kang, 2004, On the relationship between the conditional mean and volatility of stock returns: a latent VAR approach, Journal of Financial Economics, 72, 217-257.
6. Cai, Z., 2002, Regression Quantiles for Time Series, Econometric Theory, vol. 18, pp. 169-192.
7. Cai, Z. and R.C. Tiwari, 2000, Application of a local linear autoregressive model to BOD time series, Environmetrics, vol. 11, pp. 341-350.
8. Campbell, J., 1987, Stock returns and the term structure, Journal of Financial Economics, 18, 373-399.
9. Campbell, J. and L. Hentschel, 1992, No news is good news: an asymmetric model of changing volatility in stock returns, Journal of Financial Economics, 31, 281-318.
10. Christoffersen, P.F., 1998, Evaluating Interval Forecasts, International Economic Review, vol. 39, pp. 841-862.
11. Christoffersen, P.F. and F.X. Diebold, 2006, Financial asset returns, direction-of-change forecasting, and volatility dynamics, Management Science, 52 (8), 1273-1287.
12. Christoffersen, P.F., Diebold, F.X., Mariano, R.S., Tay, A.S. and Tse, Y.K., 2007, Direction-of-Change Forecasts Based on Conditional Variance, Skewness and Kurtosis Dynamics: International Evidence, Journal of Financial Forecasting, forthcoming.
13. Chung, J. and Y. Hong, 2006, Model-Free Evaluation of Directional Predictability in Foreign Exchange Markets, Journal of Applied Econometrics, vol. 22, pp. 855-889.
14. Engle, R.F. and S. Manganelli, 2004, CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles, Journal of Business & Economic Statistics, vol. 22, pp. 367-381.
15. Ferson, W.E. and C.R. Harvey, 1991, The variation of economic risk premiums, Journal of Political Economy, 99, 385-415.
16. Fleming, J., C. Kirby and B. Ostdiek, 2001, The economic value of volatility timing, Journal of Finance, 56, 329-352.
17. French, K.R., W. Schwert and R.F. Stambaugh, 1987, Expected stock returns and volatility, Journal of Financial Economics, 19, 3-29.
18. Ghysels, E., A. Harvey and E. Renault, 1996, Stochastic volatility, in G.S. Maddala and C.R. Rao, eds., Statistical Methods in Finance, Handbook of Statistics, Vol. 14, North-Holland, Amsterdam, The Netherlands, 119-191.
19. Ghysels, E., P. Santa-Clara and R. Valkanov, 2005, There is a risk-return tradeoff after all, Journal of Financial Economics, 76, 509-548.
20. Granger, C.W.J. and H. Pesaran, 1999, Economic and Statistical Measures of Forecast Accuracy, Journal of Forecasting, vol. 19, pp. 537-560.
21. Hong, Y. and J. Chung, 2003, Are the directions of stock price changes predictable? Statistical theory and evidence, manuscript, Cornell University.
22. Hong, Y.M. and H. White, 2005, Asymptotic Distribution Theory for an Entropy-Based Measure of Serial Dependence, Econometrica, vol. 73, pp. 837-902.
23. Leitch, G. and J.E. Tanner, 1991, Economic Forecast Evaluation: Profits versus the Conventional Error Measures, American Economic Review, vol. 81, pp. 580-590.
24. Lettau, M. and S. Ludvigson, 2005, Measuring and modeling variation in the risk-return tradeoff, in Y. Ait-Sahalia and L.P. Hansen, eds., Handbook of Financial Econometrics, North-Holland, Amsterdam, The Netherlands.
25. Merton, R.C., 1973, An intertemporal capital asset pricing model, Econometrica, 41, 867-887.
26. Modarres, R., 2002, Efficient Nonparametric Estimation of a Distribution Function, Computational Statistics & Data Analysis, vol. 39, pp. 75-95.
27. Mood, A.M., Graybill, F.A. and D.C. Boes, 1974, Introduction to the Theory of Statistics, New York: McGraw-Hill.
28. Shuster, E.F., 1973, On the Goodness-of-Fit Problem for Continuous Symmetric Distributions, Journal of the American Statistical Association, vol. 68, pp. 713-715.
29. Thomakos, D.D., Wang, T. and J. Wu, 2007, Market Timing and Capital Rotation, Mathematical and Computer Modelling, vol. 46, pp. 278-291.

27

Appendix

We give explicit expressions for the mean and variance of the smoothed estimator of the distribution function given in equation (24) of the main text. First, define $F_B(w)$ as:

$$F_B(w) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{m} F(k/m)\, b_k(m, w) \tag{33}$$

and note that $F_B(w) \to F(z)$ as $m \to \infty$, by the properties of the distribution function and Bernstein polynomials. Then, we can express the difference between $\hat{F}_B(w)$ and $F_B(w)$, say $D_B(w) \stackrel{\mathrm{def}}{=} \hat{F}_B(w) - F_B(w)$, as:

$$D_B(w) = \sum_{k=0}^{m} \hat{D}_B(k/m)\, b_k(m, w) \tag{34}$$

where $\hat{D}_B(k/m) \stackrel{\mathrm{def}}{=} \hat{F}_W(k/m) - F(k/m)$. By the unbiasedness of the empirical distribution function we have $E[\hat{D}_B(k/m)] = 0$, and therefore we immediately get that $E[D_B(w)] = 0$. Using the large sample properties of the empirical distribution function we also get that $\lim_{m \to \infty} E[\hat{F}_B(w)] = F(z)$. That is, the estimator is biased in finite samples but asymptotically unbiased.

To find the variance of the estimator notice that we already know the variance of $\hat{D}_B(k/m)$ from equation (20): it is given by $T^{-1} F(k/m)\left[1 - F(k/m)\right]$. Since we can show that the covariance terms are $\mathrm{Cov}[\hat{D}_B(k/m), \hat{D}_B(j/m)] = T^{-1} F(k/m)\left[1 - F(j/m)\right]$ for $j \geq k$, see Mood, Graybill and Boes (1974, ch. 11), we immediately obtain that:

$$\mathrm{Var}[D_B(w)] = T^{-1} \sum_{k=0}^{m} F(k/m)\left[1 - F(k/m)\right] b_k^2(m, w) + 2T^{-1} \sum_{k=0}^{m-1} \sum_{j=k+1}^{m} F(k/m)\left[1 - F(j/m)\right] b_k(m, w)\, b_j(m, w) \tag{35}$$

Finally, although we do not have exact results about the rate of convergence, we should have that $\hat{F}_B(w)$ is asymptotically normally distributed as:

$$\sqrt{T}\, D_B(w) \Rightarrow N\!\left(0, \sigma_B^2(w)\right) \tag{36}$$

where $\sigma_B^2(w) \stackrel{\mathrm{def}}{=} T\, \mathrm{Var}[D_B(w)]$.

Table 1. Descriptive Statistics for the S&P500 Monthly Returns

                    R_t        V_t        Z_t (std.)
Mean                0.01       0.04       0.05
RMSE                0.04       0.02       1.01
Skewness           -0.37       4.08       0.06
(s.e.)              0.12       0.12       0.12
Kurtosis            5.04      38.96       2.49
(s.e.)              0.23       0.23       0.23
CvM-test            0.20       3.24       0.07
(p-value)           0.00       0.00       0.27
LB-test            15.02     647.60      14.06
(p-value)           0.78       0.00       0.83
LB²-test           26.88      63.65      38.91
(p-value)           0.14       0.00       0.01

Notes:
1. R_t, V_t and Z_t denote the monthly returns, monthly realized volatility and standardized returns respectively.
2. Mean, RMSE, Skewness and Kurtosis correspond to the usual sample moments; (s.e.) denotes standard error.
3. CvM, LB and LB² correspond to the Cramér-von Mises statistic for normality and the Ljung-Box test for autocorrelation applied to the original and the squared series respectively; (p-value) denotes the p-value of the test.

Table 2. Tests of Independence vs. a 1st Order Markov Alternative

               Sample Proportion    P-value of test
Z_t > 0        51.33%               0.75
Z_t > 1        19.19%               0.04
Z_t < -1       17.41%               0.22

Notes:
1. The test is applied to the binary series defined by the inequality in the first column of the table.
