Some key words: High dimension; Intra-day volatility; Realized covariance; Extreme eigenvalues; Portfolio allocation.
1. INTRODUCTION
With the easily obtainable intra-day trading data nowadays, financial market analysts and academic researchers enjoy more accurate return or volatility matrix estimation through the substantial increase in sample size. Yet, with respect to the integrated covariance matrix estimation
for asset returns, there are several well-known challenges using such intra-day price data. For
instance, when tick-by-tick price data are used, the contamination by market microstructure noise
(Aït-Sahalia et al., 2005; Asparouhova et al., 2013) can hugely bias the realized covariance matrix. Non-synchronous trading times present another challenge when there is more than one
asset to consider.
To present a further challenge, it is well-documented that with independent and identically distributed random vectors, random matrix theories imply that there are biased extreme eigenvalues
for the corresponding sample covariance matrix when the dimension of the random vectors p
has the same order as the sample size n, i.e., $p/n \to c > 0$ for some constant c > 0. See for
instance Bai & Silverstein (2010) for more details. This suggests that the realized covariance
matrix, which is essentially a sample covariance matrix when all covolatilities are constants and
all log prices have zero drift with equally-spaced observation times (see the diffusion process
for the log price defined in (1) for more details), can have biased extreme eigenvalues under the
high dimensional setting $p/n \to c > 0$. The resulting detrimental effects to risk estimation or
portfolio allocation are thoroughly demonstrated in Bai et al. (2009) when inter-day price data is
used.
To rectify this bias problem, many researchers focused on regularized estimation of covariance
or precision matrices with special structures. These go from banded (Bickel & Levina, 2008b)
or sparse covariance matrix (Bickel & Levina, 2008a; Cai & Zhou, 2012; Lam & Fan, 2009;
Rothman et al., 2008), sparse precision matrix (Friedman et al., 2008; Meinshausen & Bühlmann,
2006), sparse modified Cholesky factor (Pourahmadi, 2007), to a spiked covariance matrix from
a factor model (Fan et al., 2008, 2011), or combinations of these (Fan et al., 2013).
C. LAM AND C. HU
Recently, Ledoit & Wolf (2012) proposed a nonlinear shrinkage formula for shrinking the extreme eigenvalues in a sample covariance matrix without assuming a particular structure of the
true covariance matrix. The method is generalized in Ledoit & Wolf (2014) for portfolio allocation with remarkable results. However, such a nonlinear shrinkage formula is only applicable
to the independent and identically distributed random vector setting. It is not applicable to intraday price data since the volatility within a trading day is highly variable, so that asset returns at
different time periods, albeit independent theoretically, are not identically distributed.
Lam (2016) proves that by splitting the data into two independent portions of certain sizes, one
can achieve the same nonlinear shrinkage asymptotically without the need to evaluate a shrinkage
formula as in Ledoit & Wolf (2012), which can be computationally expensive. At the same time,
such a data splitting approach can be generalized to adapt to different data settings. In this paper,
we modify the method proposed in Lam (2016) to achieve nonlinear shrinkage of eigenvalues
in the realized covariance matrix using intra-day price data. We use the same assumption as in
Zheng & Li (2011) (see Assumption 1 in Section 2 and the details therein) to overcome the
difficulty of time-varying volatilities for all underlying stocks. Ultimately, our method produces
a positive definite integrated covariance matrix asymptotically almost surely with shrinkage of
eigenvalues achieved nonlinearly, while local integrated covolatilities are adapted and estimated
accurately. Our method is fast since it involves only eigen-decompositions of matrices of size
$p \times p$, which is not computationally expensive when p is of order of hundreds. This is usually
the typical order for p in the case of portfolio allocation.
The rest of the paper is organized as follows. We first present the framework for the data
together with the notations and the main assumptions to be used in Section 2. Our method of
estimation is detailed in Section 2.1, while Section 3 presents all related theories. Simulation
results are given in Section 4, and a real data example of portfolio allocation is presented in
Section 4.2. All proofs are presented in the supplementary materials accompanying this paper.
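Let $X_t = (X_t^{(1)}, \ldots, X_t^{(p)})^T$ be a $p$-dimensional log-price process following the
diffusion process
\[ dX_t = \mu_t\,dt + \Theta_t\,dW_t, \quad t \in [0, 1], \tag{1} \]
where $\mu_t$ is the drift, $\Theta_t$ is a $p \times p$ matrix called the (instantaneous) covolatility process, and
$W_t = (W_t^{(1)}, \ldots, W_t^{(p)})^T$ is a $p$-dimensional standard Brownian motion. We want to estimate
the integrated covariance matrix
\[ \Sigma_p = \int_0^1 \Theta_t \Theta_t^T\,dt. \tag{2} \]
Given observations at times $\tau_{n,\ell}$, $\ell = 0, 1, \ldots, n$, the realized covariance matrix is
\[ \Sigma_p^{RCV} = \sum_{\ell=1}^{n} \Delta X_\ell \Delta X_\ell^T, \quad \text{where } \Delta X_\ell := X_{\tau_{n,\ell}} - X_{\tau_{n,\ell-1}}. \tag{3} \]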
Jacod & Protter (1998) show that as n goes to infinity, the above estimator converges weakly to
the true one defined in (2). Hence the realized covariance matrix is one of the most frequently
used estimators for the integrated covariance matrix.
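As a small illustration of the realized covariance matrix in (3), the following Python sketch sums the outer products of log-price increments. The function name `realized_covariance` and the toy prices are hypothetical, and the input layout (one row per observation time) is an assumption made for this example only.

```python
import numpy as np

def realized_covariance(log_prices):
    """Realized covariance from an (n + 1) x p array of log-prices.

    Row l holds the p log-prices at the l-th observation time
    (layout assumed for this sketch).
    """
    log_prices = np.asarray(log_prices, dtype=float)
    increments = np.diff(log_prices, axis=0)   # n x p array of increments
    return increments.T @ increments           # sum of outer products over l

# Hypothetical toy data: two assets observed at three times.
prices = np.array([[0.0, 0.0],
                   [0.1, -0.2],
                   [0.3, -0.1]])
rcv = realized_covariance(prices)
```

The result is symmetric and positive semi-definite by construction, but its rank cannot exceed the number of increments n.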
While the intra-day volatility can change hugely within a short time period, it is not unreasonable to assume that the correlation of any two price processes stays constant within such a
period, say within a trading day. Following Zheng & Li (2011), for j = 1, . . . , p, write
\[ dX_t^{(j)} = \mu_t^{(j)}\,dt + \sigma_t^{(j)}\,dZ_t^{(j)}, \tag{4} \]
where $\mu_t^{(j)}$, $\sigma_t^{(j)}$ are assumed to be càdlàg over [0, 1], and the $Z_t^{(j)}$'s are one dimensional standard
Brownian motions. Both the $\sigma_t^{(j)}$'s and the $Z_t^{(j)}$'s are related to $\Theta_t$ and $W_t$ in (1). We assume
further, defining $\langle X, Y \rangle_t$ to be the quadratic covariation between the processes X and Y :
Assumption 1. [Statement not recoverable from the extracted text.]
The rest of the assumptions in this paper can be found in Section 3. We present this assumption
first since, following Proposition 4 in Zheng & Li (2011), the log-price process $X_t$ defined in (1)
satisfying Assumption 1 is such that there exist a càdlàg process $(\gamma_t)_{t \in [0,1]}$ and a $p \times p$ matrix
$\Lambda$ satisfying $\mathrm{tr}(\Lambda\Lambda^T) = p$ such that
\[ \Theta_t = \gamma_t \Lambda. \tag{5} \]
The nonlinear shrinkage estimator described in the next section is based on this property.
2.1. Nonlinear shrinkage estimator
When the dimension p is large relative to the sample size n, even for a sample covariance matrix constructed from independent and identically distributed random vectors, its extreme eigenvalues will be severely biased from the true ones (see chapter 5.2 of Bai & Silverstein (2010) for
example). While various assumptions have been made on the true integrated covariance matrix
like sparsity (Wang & Zou, 2010) or having a factor structure (Tao et al., 2011), in this paper we
follow Ledoit & Wolf (2012) and introduce nonlinear shrinkage for regularization, which does
not need a particular structural assumption on the true integrated covariance matrix itself.
However, since intra-day covariance can vary hugely within a short time period, the $\Delta X_\ell$'s
defined in (3) are not identically distributed, and hence we cannot directly apply the nonlinear
shrinkage formula in Ledoit & Wolf (2012) to the realized covariance matrix in (3). Instead, we
use the data splitting idea for nonlinear shrinkage of eigenvalues in Lam (2016), and modify the
method to accommodate the intra-day volatility change based on (5), which is a condition derived
from Assumption 1 as proved in Zheng & Li (2011).
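To this end, observe that by (5), the integrated covariance matrix in (2) can be written as
$\Sigma_p = \int_0^1 \gamma_t^2\,dt \cdot \Lambda\Lambda^T$. Zheng & Li (2011) proposed a so-called time-variation adjusted realized
covariance matrix, defined as
\[ \check\Sigma_p := \frac{\mathrm{tr}(\Sigma_p^{RCV})}{p}\,\breve\Sigma, \quad \text{where } \breve\Sigma := \frac{p}{n}\sum_{\ell=1}^{n} \frac{\Delta X_\ell \Delta X_\ell^T}{\|\Delta X_\ell\|^2}, \tag{6} \]
and $\|\cdot\|$ denotes the norm of a vector. They demonstrate that $\check\Sigma_p$ is a good estimator for $\Sigma_p$ by
showing that $\mathrm{tr}(\Sigma_p^{RCV})/p$ is a good estimator for $\int_0^1 \gamma_t^2\,dt$, while $\breve\Sigma$ is good for $\Sigma = \Lambda\Lambda^T$. Here $\breve\Sigma$
plays the role of a sample covariance matrix for estimating $\Sigma$. Hence if $p/n \to c > 0$, then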
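where $\overset{d}{=}$ stands for equal in distribution, and the $Z_\ell$'s are independent random vectors each with
$Z_\ell \sim N(0, I_p)$. Then
\[ \breve\Sigma \overset{d}{=} \frac{1}{n}\sum_{\ell=1}^{n} \frac{\Sigma^{1/2} Z_\ell Z_\ell^T \Sigma^{1/2}}{Z_\ell^T \Sigma Z_\ell / p} = \Sigma^{1/2}\Big(\frac{1}{n}\sum_{\ell=1}^{n} \frac{Z_\ell Z_\ell^T}{Z_\ell^T \Sigma Z_\ell / p}\Big)\Sigma^{1/2}. \]
We can actually show that $Z_\ell^T \Sigma Z_\ell / p$ goes to 1 almost surely, leaving the above being the sample
covariance matrix constructed from the $Z_\ell$'s sandwiched by $\Sigma^{1/2}$.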
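Following Lam (2016), since the $\Delta X_\ell$'s are independent following model (1), we split the
data $X = (\Delta X_1, \ldots, \Delta X_n)$ into two independent parts, say $X = (X_1, X_2)$, with $X_i$
having size $p \times n_i$ for $i = 1, 2$, such that $n = n_1 + n_2$. Define
\[ \tilde\Sigma_i = \frac{p}{n_i}\sum_{\ell \in I_i} \frac{\Delta X_\ell \Delta X_\ell^T}{\|\Delta X_\ell\|^2}, \]
where $I_i = \{\ell : \Delta X_\ell \in X_i\}$. Carrying out an eigen-analysis on $\tilde\Sigma_1$, suppose $\tilde\Sigma_1 = P_1 D_1 P_1^T$.
Then we introduce our estimator as
\[ \hat\Sigma_p := \frac{\mathrm{tr}(\Sigma_p^{RCV})}{p}\,\hat\Sigma, \quad \text{where } \hat\Sigma := P_1\,\mathrm{diag}(P_1^T \tilde\Sigma_2 P_1)\,P_1^T, \tag{7} \]
with $\mathrm{diag}(\cdot)$ setting all non-diagonal elements of a matrix to 0. The estimator $\hat\Sigma$ above belongs to
a class of rotation-equivariant estimators $\Sigma(D) = P_1 D P_1^T$, where $D$ is a diagonal matrix, and $P_1$
is the matrix containing all the eigenvectors of $\tilde\Sigma_1$. The choice of $D = \mathrm{diag}(P_1^T \tilde\Sigma_2 P_1)$ comes
from solving
\[ \min_D \big\| P_1 D P_1^T - \tilde\Sigma_2 \big\|_F. \]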
Assumption 3. The observation times $\tau_{n,\ell}$'s are independent of the log-price $X_t$, and there
exists a constant $C > 0$ such that for all positive integers $n$,
\[ \max_{1 \le \ell \le n} n(\tau_{n,\ell} - \tau_{n,\ell-1}) \le C. \]
We set $\mu_t = 0$ in Assumption 2 for the ease of proofs and presentation. If $\mu_t$ is slowly varying
locally, the results to be presented are still valid at the expense of longer and more complex
proofs. The uniform bounds on the eigenvalues of $\Theta_t \Theta_t^T$ are needed so that the individual volatility
process for each $X_t^{(i)}$ is bounded uniformly. Also, $\int_0^1 \gamma_t^2\,dt > 0$ uniformly, and finally, $\|\Sigma_p\| = O(1)$
uniformly as a result, which are all needed for our results to hold. These assumptions
essentially treat $\gamma_t$ as non-random. Extension to $\gamma_t$ being stochastic can follow the lines of Zheng
& Li (2011), but we keep it non-random for the ease of presentation and proofs as well.
THEOREM 1. Let all the assumptions in Lemma 1 hold. Then as $p, n \to \infty$ such that $p/n \to c > 0$,
$\hat\Sigma_p$ defined in (7) is almost surely positive definite.
This is an important result since $\Sigma_p$ is always assumed to be positive definite, and we want our
estimator to be so too. This is certainly not the case for a sample covariance matrix when p > n,
and is still not the case for $\check\Sigma_p$ defined in (6) by Zheng & Li (2011), which is demonstrated in
our simulation results in Section 4.
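The following Python sketch illustrates the construction in (7) and the positive definiteness discussed above on a toy data set with p > n. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the split size `n1 = 30` and the simulated increments are all hypothetical, and the first `n1` increments are simply taken as the first part of the split.

```python
import numpy as np

def self_normalized_cov(dX):
    """(p / m) * sum_l dX_l dX_l^T / ||dX_l||^2 for an m x p array of increments."""
    m, p = dX.shape
    U = dX / np.linalg.norm(dX, axis=1, keepdims=True)  # rows scaled to unit length
    return (p / m) * (U.T @ U)

def split_shrinkage_estimator(dX, n1):
    """Split-sample estimator in the spirit of (7); dX is n x p, first n1 rows form part 1."""
    n, p = dX.shape
    rcv = dX.T @ dX                               # realized covariance as in (3)
    sigma1 = self_normalized_cov(dX[:n1])
    sigma2 = self_normalized_cov(dX[n1:])
    _, P1 = np.linalg.eigh(sigma1)                # eigenvectors from the first part
    d = np.sum(P1 * (sigma2 @ P1), axis=0)        # diagonal of P1^T sigma2 P1
    return (np.trace(rcv) / p) * ((P1 * d) @ P1.T)

# Hypothetical toy data with p > n and a volatility level that changes halfway.
rng = np.random.default_rng(0)
n, p = 40, 60
gamma = np.where(np.arange(n) < n // 2, 0.02, 0.005)
dX = gamma[:, None] * rng.standard_normal((n, p))

estimate = split_shrinkage_estimator(dX, n1=30)
rcv = dX.T @ dX
min_eig_estimate = np.linalg.eigvalsh(estimate).min()   # smallest eigenvalue of the estimate
rank_rcv = np.linalg.matrix_rank(rcv)                   # cannot exceed n, so rcv is singular
```

Each diagonal entry of `d` is a sum of squared projections of the second-part unit vectors onto an eigenvector from the first part, which is why the smallest eigenvalue stays away from zero even though the realized covariance matrix has rank at most n.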
Remark 2. Both Lemma 1 and Theorem 1 require
n2 1 pn2 < . Following Lam
(2016), we set $n_2 = a n^{1/2}$ where $a$ is a constant, so that when $p/n \to c > 0$, the condition is
satisfied. See Section 3.1 for more details on how to find $n_2$ in finite samples.
To present the rest of the results, we introduce a benchmark estimator for comparisons. This
estimator is called the ideal estimator, defined by
\[ \hat\Sigma_{Ideal} = \int_0^1 \gamma_t^2\,dt \cdot P\,\mathrm{diag}(P^T \Sigma P)\,P^T. \tag{8} \]
This is similar to the proposed estimator defined in (7), except that the estimator $\mathrm{tr}(\Sigma_p^{RCV})/p$ is
replaced by the population counterpart $\int_0^1 \gamma_t^2\,dt$, while $\tilde\Sigma_2$ is replaced by the population counterpart
$\Sigma$. Also, $P_1$ is replaced by $P$, which is the matrix containing all orthonormal eigenvectors
for the covariance-type matrix $\breve\Sigma$ defined in (6) using all data points. In line with Ledoit & Wolf
(2012) and Lam (2016), this estimator utilizes all data points for calculating the eigenmatrix $P$,
and it assumes the knowledge of $\Sigma$ and $\int_0^1 \gamma_t^2\,dt$. With this, we define the efficiency loss of any
estimator $\hat\Sigma$ as
\[ EL(\Sigma_p, \hat\Sigma) := 1 - \frac{L(\Sigma_p, \hat\Sigma_{Ideal})}{L(\Sigma_p, \hat\Sigma)}, \tag{9} \]
where $L(\Sigma_p, \hat\Sigma)$ is a loss function for estimating $\Sigma_p$ by $\hat\Sigma$. We consider the Frobenius loss
\[ L(\Sigma_p, \hat\Sigma) = \big\| \hat\Sigma - \Sigma_p \big\|_F^2, \tag{10} \]
and the inverse Stein's loss function in this paper,
\[ L(\Sigma_p, \hat\Sigma) = \mathrm{tr}(\Sigma_p \hat\Sigma^{-1}) - \log\det(\Sigma_p \hat\Sigma^{-1}) - p. \tag{11} \]
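The loss functions in (10) and (11) and the efficiency loss in (9) translate directly into code. The Python sketch below is illustrative only; the function names are hypothetical, and the tiny diagonal matrices are chosen so the values can be checked by hand.

```python
import numpy as np

def frobenius_loss(sigma, sigma_hat):
    """Squared Frobenius norm of sigma_hat - sigma, as in (10)."""
    return np.linalg.norm(sigma_hat - sigma, ord="fro") ** 2

def inverse_stein_loss(sigma, sigma_hat):
    """tr(sigma sigma_hat^{-1}) - log det(sigma sigma_hat^{-1}) - p, as in (11)."""
    p = sigma.shape[0]
    m = sigma @ np.linalg.inv(sigma_hat)      # requires a non-singular sigma_hat
    _, logdet = np.linalg.slogdet(m)          # log|det m|; det m > 0 for positive definite inputs
    return np.trace(m) - logdet - p

def efficiency_loss(sigma, sigma_hat, sigma_ideal, loss):
    """1 - L(sigma, sigma_ideal) / L(sigma, sigma_hat), as in (9)."""
    return 1.0 - loss(sigma, sigma_ideal) / loss(sigma, sigma_hat)

# Hand-checkable toy case (hypothetical matrices).
sigma = np.eye(2)
sigma_hat = 2.0 * np.eye(2)
sigma_ideal = 1.5 * np.eye(2)
fro = frobenius_loss(sigma, sigma_hat)
stein = inverse_stein_loss(sigma, sigma_hat)
el = efficiency_loss(sigma, sigma_hat, sigma_ideal, frobenius_loss)
```

The inverse Stein's loss needs the inverse of the estimator, so it is only finite for non-singular estimators.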
Within the class of rotation-equivariant estimators $\Sigma(D) = PDP^T$, the Frobenius loss is minimized
exactly at $\hat\Sigma_{Ideal}$, while similar to Proposition 2 in Lam (2016), $\hat\Sigma_{Ideal}$ also minimizes the inverse
Stein's loss within such a class of estimators. Hence it is intuitive that our estimator $\hat\Sigma_p$, being
also rotation-equivariant but not utilizing all data points in calculating the eigenmatrix, will be
less efficient in the sense that $EL(\Sigma_p, \hat\Sigma_p) > 0$. It turns out that asymptotically, $\hat\Sigma_p$ performs as
well as $\hat\Sigma_{Ideal}$, as shown in the following theorem.
THEOREM 2. Let all the assumptions in Lemma 1 hold. Then as $p, n \to \infty$ such that $p/n \to c > 0$,
we have $EL(\Sigma_p, \hat\Sigma_p) \overset{a.s.}{\to} 0$ with respect to both the Frobenius and the inverse Stein's loss
functions, as long as $p^{-1} L(\Sigma_p, \hat\Sigma_{Ideal})$ does not tend to 0 almost surely.
The requirement that $p^{-1} L(\Sigma_p, \hat\Sigma_{Ideal})$ does not go to 0 almost surely eliminates the case
$\Sigma_p = \int_0^1 \gamma_t^2\,dt \cdot I_p$, when both the loss functions will attain 0 for the ideal estimator. Our estimator
will still do a good job in such a case since $\mathrm{tr}(\Sigma_p^{RCV})/p$ will still be a good estimator for
$\int_0^1 \gamma_t^2\,dt$ by the proof of Theorem 1, while $\hat\Sigma$ can still do a fine job when permutation of the data
is allowed as demonstrated in the simulation results in Lam (2016). Improvement by averaging
and permutation will be described in Section 3.1.
3.1. Practical Implementation
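Following Assumption 1, $\Delta X_\ell \Delta X_\ell^T / \|\Delta X_\ell\|^2$ is independent of $\gamma_t$ and is similar to a data
point in constructing a sample covariance matrix, and these terms are independent of one another for different
$\ell$; see Remark 1 in Section 2.1. This observation permits us to permute the data beforehand.
Say at the $j$th permutation, we form a data matrix $X^{(j)} = (X_1^{(j)}, X_2^{(j)})$, with $X_i^{(j)}$ having
size $p \times n_i$ for $i = 1, 2$, such that $n = n_1 + n_2$. Then we construct
\[ \tilde\Sigma_i^{(j)} = \frac{p}{n_i}\sum_{\ell \in I_i^{(j)}} \frac{\Delta X_\ell \Delta X_\ell^T}{\|\Delta X_\ell\|^2}, \tag{12} \]
where $I_i^{(j)} = \{\ell : \Delta X_\ell \in X_i^{(j)}\}$, and perform eigen-analysis on $\tilde\Sigma_1^{(j)}$, say $\tilde\Sigma_1^{(j)} = P_1^{(j)} D_1^{(j)} P_1^{(j)T}$.
Then we can form the $j$th estimator as
\[ \hat\Sigma_p^{(j)} := \frac{\mathrm{tr}(\Sigma_p^{RCV})}{p}\,\hat\Sigma^{(j)}, \quad \text{where } \hat\Sigma^{(j)} := P_1^{(j)}\,\mathrm{diag}(P_1^{(j)T} \tilde\Sigma_2^{(j)} P_1^{(j)})\,P_1^{(j)T}. \tag{13} \]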
If we perform $M$ permutations and get $M$ estimators as above, we can define the averaged
estimator as
\[ \hat\Sigma_{p,M} := \frac{1}{M}\sum_{j=1}^{M} \hat\Sigma_p^{(j)}. \tag{14} \]
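The permutation and averaging steps in (12) to (14) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the names `averaged_estimator` and `_self_normalized_cov`, the default `M = 50`, the seed, and the toy data are hypothetical, and a fixed split location `n1` is supplied by the caller rather than selected from the data.

```python
import numpy as np

def _self_normalized_cov(dX):
    """(p / m) * sum_l dX_l dX_l^T / ||dX_l||^2 for an m x p array of increments."""
    m, p = dX.shape
    U = dX / np.linalg.norm(dX, axis=1, keepdims=True)
    return (p / m) * (U.T @ U)

def averaged_estimator(dX, n1, M=50, seed=0):
    """Average of M permuted split-sample estimators, in the spirit of (14)."""
    rng = np.random.default_rng(seed)
    n, p = dX.shape
    scale = np.trace(dX.T @ dX) / p               # tr(RCV) / p, common to all permutations
    total = np.zeros((p, p))
    for _ in range(M):
        perm = rng.permutation(n)                 # the j-th permutation of the increments
        first, second = dX[perm[:n1]], dX[perm[n1:]]
        _, P1 = np.linalg.eigh(_self_normalized_cov(first))
        d = np.sum(P1 * (_self_normalized_cov(second) @ P1), axis=0)
        total += (P1 * d) @ P1.T                  # P1 diag(P1^T sigma2 P1) P1^T
    return scale * total / M

# Hypothetical toy data.
rng = np.random.default_rng(1)
dX = 0.01 * rng.standard_normal((50, 20))
avg = averaged_estimator(dX, n1=35, M=10)
```

Since each permuted estimator has the same trace as the realized covariance matrix, so does their average.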
Note that in all $M$ estimators, we are only using one split location, $n_1$, for the data, instead of
using several of them and then averaging the results, similar to the grand average estimator in Abadir
et al. (2010). To find the best split location empirically, we minimize the following function:
\[ g(m) = \Big\| \frac{1}{M}\sum_{j=1}^{M} \big(\hat\Sigma_p^{(j)} - \tilde\Sigma_2^{(j)}\big) \Big\|_F^2. \tag{15} \]
4. EMPIRICAL RESULTS
We carry out simulation studies to compare the performances of our estimator in (7), the time
variation-adjusted realized covariance matrix in (6) and the realized covariance matrix in (3)
by comparing their Frobenius and inverse Stein's losses defined in (10) and (11) respectively.
Then in Section 4.1, we consider a trading exercise using simulated market data and compare
the risks associated with the minimum variance portfolios constructed using these three different
estimators. Finally, in Section 4.2, we consider real data from the New York Stock Exchange.
Consider two different scenarios for the diffusion process $\{X_t\}$ defined in (1), with $\mu_t = 0$
and $\Theta_t = \gamma_t \Lambda$ as in (5). One has $\gamma_t$ being piecewise constant, the other has $\gamma_t$ being continuous,
detailed as follows:
Design I: Piecewise constants. We take $\gamma_t$ to be
\[ \gamma_t = \begin{cases} 0.0007, & t \in [0, 1/4) \cup [3/4, 1], \\ 0.0001, & t \in [1/4, 3/4). \end{cases} \]
Design II: Continuous path. We take $\gamma_t$ to be
We assume $\Lambda = I_p$ and the observation times are taken to be equidistant, where $\tau_{n,\ell} = \ell/n$, $\ell = 1, \ldots, n$.
We generate $\{X_t\}$ using model (1) and get $n = 200$ discrete observations, and consider
$p = 100, 200$. For each design and each $(n, p)$ combination, we repeat the simulations 1000 times,
and compare the mean Frobenius and inverse Stein's losses for our proposed estimator, the
time variation-adjusted realized covariance matrix and the realized covariance matrix.
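As an illustration of how increments under Design I could be generated, the Python sketch below draws the n increments directly, assuming zero drift, $\Lambda = I_p$ and equidistant observation times, so that each increment is Gaussian with covariance $(\gamma^2/n) I_p$ on its interval. The function names are hypothetical, the levels 0.0007 and 0.0001 are used exactly as printed in Design I, and the shortcut of evaluating $\gamma_t$ at the left end point is only valid because the break points 1/4 and 3/4 fall on the observation grid when n = 200.

```python
import numpy as np

def gamma_design_one(t):
    """Piecewise-constant gamma_t of Design I, values as printed in the text."""
    t = np.asarray(t, dtype=float)
    middle = (t >= 0.25) & (t < 0.75)
    return np.where(middle, 0.0001, 0.0007)

def simulate_design_one(n, p, rng):
    """n x p array of increments, assuming mu_t = 0, Lambda = I_p, tau_l = l / n."""
    left = np.arange(n) / n                      # left end points of the n intervals
    sd = gamma_design_one(left) / np.sqrt(n)     # sqrt of the integral of gamma_t^2 over each interval
    return sd[:, None] * rng.standard_normal((n, p))

dX = simulate_design_one(200, 100, np.random.default_rng(1))
```

Rows 50 to 149 of `dX` correspond to the low-volatility middle half of the day.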
Table 1 presents the simulation results. It is clear that overall, our proposed estimator performs
the best. In particular, since the realized covariance or the time variation-adjusted realized covariance matrices are singular when p = 200, their inverses do not exist. In contrast, our proposed
estimator is always non-singular and stable even in this case, which is in line with Theorem 1.
4.1. A market trading exercise
As an application in finance, we simulate market trading data in this section and construct minimum variance portfolios using the three different estimators compared in the previous section.
Given an integrated covariance matrix $\Sigma_p$, the minimum variance portfolio solves
\[ \min_{w:\, w^T 1_p = 1} w^T \Sigma_p w, \]
where $1_p$ is the vector of $p$ ones, and the solution is
\[ w_{opt} = \frac{\Sigma_p^{-1} 1_p}{1_p^T \Sigma_p^{-1} 1_p}. \tag{16} \]
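The minimum variance weights have the familiar closed form $\Sigma_p^{-1} 1_p / (1_p^T \Sigma_p^{-1} 1_p)$, which the following Python sketch evaluates with a linear solve instead of an explicit inverse. The function name and the 2 x 2 example are hypothetical and chosen so the answer can be verified by hand.

```python
import numpy as np

def min_variance_weights(sigma):
    """Weights minimizing w^T sigma w subject to the weights summing to one."""
    ones = np.ones(sigma.shape[0])
    x = np.linalg.solve(sigma, ones)   # sigma^{-1} 1_p without forming the inverse
    return x / (ones @ x)

# Hypothetical example: two uncorrelated assets with variances 1 and 4.
sigma = np.diag([1.0, 4.0])
w = min_variance_weights(sigma)
risk = w @ sigma @ w
```

Here the less volatile asset receives weight 0.8 and the portfolio variance is 0.8, which equals $1 / (1_p^T \Sigma_p^{-1} 1_p)$.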
For the price data, following Barndorff-Nielsen et al. (2011) and Fan et al. (2012), we simulate
$p = 100$ stock prices for 200 days using $X_t^{o(i)} = X_t^{(i)} + \epsilon_t^{(i)}$, where $X_t^{(i)}$ is the underlying
log-price, and $\epsilon_t^{(i)}$ models the market microstructure noise, with the $\epsilon_t^{(i)} \sim N(0, 0.0005^2)$
assumed to be independent of each other. The underlying log-price $X_t^{(i)}$ is generated by the
stochastic volatility model. For $i = 1, \ldots, 100$,
\[ dX_t^{(i)} = \mu^{(i)}\,dt + \rho^{(i)} \sigma_t^{(i)}\,dB_t^{(i)} + \sqrt{1 - (\rho^{(i)})^2}\,\sigma_t^{(i)}\,dW_t + \nu^{(i)}\,dZ_t, \]
where $\{W_t\}$, $\{Z_t\}$ and the $\{B_t^{(i)}\}$'s are all independent standard Brownian motions. The process
$\{Z_t\}$ plays the role of a pervasive factor, which is usually the market factor in asset returns.
Proposed
p = 100
p = 200
.13(.02)
.55(.17)
p = 100
p = 200
.17(.014)
88(15)
Design II
Proposed
p = 100
p = 200
.29(.03)
.38(.03)
p = 100
p = 200
.54(.1)
88(16)
Time variation-adjusted
Frobenius loss
2.8(.04)
69(31)
Inverse Stein's loss
5.63(.058)
-
Realized covariance
Time variation-adjusted
Frobenius loss
6.32(.09)
13(10)
Inverse Stein's loss
693(31)
-
Realized covariance
3.6(.06)
1564(63)
7.08(.08)
-
7.55(.1)
15(20)
1232(53.8)
-
Table 1. Mean and standard deviation (in brackets) of losses for different methods. All values
reported in this table are multiplied by 1000. Upper table: results for Design I. Lower table:
results for Design II. For p = 200, the time variation-adjusted and realized covariance matrices
are always singular, and hence their inverse Stein's losses are infinite.
Theoretical risk: .735
                           Actual risk    Perceived risk
Proposed                   .922           .753
Time variation-adjusted    3.918          3.869
Realized covariance        4.115          4.034
Table 2. Mean of theoretical risk $R(w_{opt})$, actual risk $R(\hat w_{opt})$ and perceived risk $\hat R(\hat w_{opt})$.
covariance matrix over the 5-day investment period. The second one is the actual risk
$R(\hat w_{opt}) = \hat w_{opt}^T \Sigma_p \hat w_{opt}$, where $\hat w_{opt}$ is calculated using different integrated covariance matrix estimators.
Finally the perceived risk is defined by $\hat R(\hat w_{opt}) = \hat w_{opt}^T \hat\Sigma_p \hat w_{opt}$.
We can see from Table 2 that our method has the best performance among all three different
methods, and has the risk closest to the theoretical one. In particular, our method has the smallest
actual risk, which is the most relevant risk in practice.
4.2. Portfolio allocation on NYSE data
We consider p = 45 stocks from the New York Stock Exchange from January 1 of 2013 to December 31 of 2013 (245 trading days). We choose the stocks from mid-cap energy sector stocks.
We downloaded all the trades of these stocks from Wharton Research Data Services (WRDS,
https://wrds-web.wharton.upenn.edu/). The raw data are of high frequency nature. As mentioned
before, the stocks have non-synchronous trading times and all the log-prices are contaminated by
market microstructure noise.
Like the market trading exercise in Section 4.1, we consider trades in 15-minute intervals on
every trading day from 9:30 to 16:00, with each log-price being the observed one from a trade
right before a 15-minute interval ends. This results in a total of 6732 observations over the 245
trading days. Hence on average there are around 27 observations per day.
We consider two settings. For the first one, we consider 20-day training windows and re-evaluate portfolio weights every 5 days. The other setting uses 5-day training windows and re-evaluates portfolio weights every day. We use the annualized out-of-sample standard deviation $\hat\sigma$,
together with the annualized portfolio return $\hat\mu$ and the Sharpe ratio $\hat\mu/\hat\sigma$
to gauge the performance of each method. For 20-day training windows and a 5-day re-evaluation period,
$\hat\mu$ and $\hat\sigma$ are defined by
\[ \hat\mu = 52 \times \frac{1}{45}\sum_{i=5}^{49} w_i^T r_i, \quad \hat\sigma = \sqrt{52} \times \Big\{\frac{1}{45}\sum_{i=5}^{49} (w_i^T r_i - \hat\mu)^2\Big\}^{1/2}. \]
We use the annualized out-of-sample standard deviation since we do not know the true underlying
integrated covariance matrix, and hence the actual risk cannot be calculated. For 5-day training
windows with daily re-evaluation of portfolio weights, $\hat\mu$ and $\hat\sigma$ are defined by
\[ \hat\mu = 252 \times \frac{1}{240}\sum_{i=6}^{245} w_i^T r_i, \quad \hat\sigma = \sqrt{252} \times \Big\{\frac{1}{240}\sum_{i=6}^{245} (w_i^T r_i - \hat\mu)^2\Big\}^{1/2}. \]
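Performance measures of this kind can be computed from a series of realized out-of-sample portfolio returns. The Python sketch below is a hedged illustration only: the function name and the two toy returns are hypothetical, and it assumes the conventional recipe of scaling the per-period mean by the number of periods per year, scaling the per-period standard deviation (divisor equal to the number of periods, centred at the per-period sample mean) by its square root, and taking their ratio as the Sharpe ratio.

```python
import numpy as np

def annualized_performance(period_returns, periods_per_year):
    """Annualized mean, standard deviation and Sharpe ratio of realized returns.

    Assumes conventional annualization: mean * K and sd * sqrt(K), with the sd
    centred at the per-period mean (an assumption of this sketch).
    """
    r = np.asarray(period_returns, dtype=float)
    mu = periods_per_year * r.mean()
    sigma = np.sqrt(periods_per_year) * r.std()   # population sd, divisor len(r)
    return mu, sigma, mu / sigma

# Hypothetical weekly portfolio returns w_i^T r_i.
mu_hat, sigma_hat, sharpe = annualized_performance([0.01, 0.03], periods_per_year=52)
```

With 5-day re-evaluation one would pass 52 periods per year, and with daily re-evaluation 252.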
REFERENCES
Abadir, K. M., Distaso, W. & Žikeš, F. (2010). Model-free estimation of large variance matrices. The Rimini
Centre for Economic Analysis, WP 10-17.
Aït-Sahalia, Y., Mykland, P. A. & Zhang, L. (2005). How often to sample a continuous-time process in the
presence of market microstructure noise. Review of Financial Studies 18, 351–416.
Andersen, T., Bollerslev, T., Diebold, F. & Labys, P. (2001). The distribution of realized exchange rate volatility.
Journal of the American Statistical Association 96, 42–55.
Asparouhova, E., Bessembinder, H. & Kalcheva, I. (2013). Noisy prices and inference regarding returns.
The Journal of Finance 68, 665–714.
Bai, Z., Liu, H. & Wong, W.-K. (2009). Enhancement of the applicability of Markowitz's
portfolio optimization by utilizing random matrix theory. Mathematical Finance 19,
639–667.
Bai, Z. & Silverstein, J. (2010). Spectral Analysis of Large Dimensional Random Matrices. New York: Springer
Series in Statistics, 2nd ed.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. & Shephard, N. (2011). Multivariate realised kernels:
Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous
trading. Journal of Econometrics 162, 149–169.
Bickel, P. J. & Levina, E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36, 2577–2604.
Bickel, P. J. & Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36, 199–227.
Cai, T. T. & Zhou, H. H. (2012). Optimal rates of convergence for sparse covariance matrix estimation. The Annals
of Statistics 40, 2389–2420.
Fan, J., Fan, Y. & Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of
Econometrics 147, 186–197.
Fan, J., Li, Y. & Yu, K. (2012). Vast volatility matrix estimation using high-frequency data for portfolio selection.
Journal of the American Statistical Association 107, 412–428.
Fan, J., Liao, Y. & Mincheva, M. (2011). High-dimensional covariance matrix estimation in approximate factor
models. The Annals of Statistics 39, 3320–3356.
Fan, J., Liao, Y. & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal
complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 603–680.
Friedman, J., Hastie, T. & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical
lasso. Biostatistics 9, 432–441.
Jacod, J. & Protter, P. (1998). Asymptotic error distributions for the Euler method for stochastic differential
equations. Ann. Probab. 26, 267–307.
Lam, C. (2016). Nonparametric eigenvalue-regularized precision or covariance matrix estimator. Ann. Statist. To
appear.
Lam, C. & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist.
37, 4254–4278.
Ledoit, O. & Wolf, M. (2012). Nonlinear shrinkage estimation of large-dimensional covariance matrices. The
Annals of Statistics 40, 1024–1060.
Ledoit, O. & Wolf, M. (2014). Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz
meets Goldilocks. ECON - Working Papers 137, Department of Economics - University of Zurich.