Heteroskedasticity and Autocorrelation Consistent Standard Errors

NBER Summer Institute Minicourse: What's New in Econometrics: Time Series
Lecture 9

July 16, 2008
Lecture 9, July 21, 2008
Outline
1. What are HAC SEs and why are they needed?
2. Parametric and Nonparametric Estimators
3. Some Estimation Issues (psd, lag choice, etc.)
4. Inconsistent Estimators
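1. What are HAC SEs and why are they needed?

Linear Regression: $y_t = x_t'\beta + e_t$

$$\hat\beta = \Big(\sum_{t=1}^{T} x_t x_t'\Big)^{-1}\Big(\sum_{t=1}^{T} x_t y_t\Big) = \beta + \Big(\sum_{t=1}^{T} x_t x_t'\Big)^{-1}\Big(\sum_{t=1}^{T} x_t e_t\Big)$$

$$\sqrt{T}(\hat\beta - \beta) = \Big(\frac{1}{T}\sum_{t=1}^{T} x_t x_t'\Big)^{-1}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T} x_t e_t\Big) = S_{XX}^{-1} A_T$$

$S_{XX} \xrightarrow{p} \Sigma_{XX} = E(x_t x_t')$, $A_T \xrightarrow{d} N(0, \Omega)$, where $\Omega = \lim_{T\to\infty} \mathrm{Var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T} x_t e_t\Big)$.

Thus $\hat\beta \overset{a}{\sim} N\big(\beta, \tfrac{1}{T} V\big)$, where $V = S_{XX}^{-1}\,\Omega\,S_{XX}^{-1}$.

HAC problem: $\hat\Omega$ = ???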
(Same problem arises in IV regression, GMM, ...)

Notation:

$$\Omega = \lim_{T\to\infty} \mathrm{Var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T} x_t e_t\Big)$$

Let $w_t = x_t e_t$ (assume this is a scalar for convenience).

A feasible estimator of $\Omega$ must use $\hat w_t = x_t \hat e_t$. The estimator is $\hat\Omega(\{\hat w_t\})$, but most of our discussion uses $\hat\Omega(\{w_t\})$ for convenience.
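An expression for $\Omega$: Suppose $w_t$ is covariance stationary with autocovariances $\gamma_j$. (Also, for notational convenience, assume $w_t$ is a scalar.) Then

$$\mathrm{var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T} w_t\Big) = \frac{1}{T}\,\mathrm{var}(w_1 + w_2 + \dots + w_T)$$
$$= \frac{1}{T}\big\{T\gamma_0 + (T-1)(\gamma_1 + \gamma_{-1}) + (T-2)(\gamma_2 + \gamma_{-2}) + \dots + 1\cdot(\gamma_{T-1} + \gamma_{1-T})\big\}$$
$$= \sum_{j=-T+1}^{T-1} \gamma_j \;-\; \frac{1}{T}\sum_{j=1}^{T-1} j\,(\gamma_j + \gamma_{-j})$$

If the autocovariances are "1-summable", so that $\sum_j |j|\,|\gamma_j| < \infty$, then

$$\mathrm{var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T} w_t\Big) \to \Omega = \sum_{j=-\infty}^{\infty} \gamma_j .$$

Recall that the spectrum of w at frequency $\omega$ is $S(\omega) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty} \gamma_j e^{-i\omega j}$, so that $\Omega = 2\pi S(0)$. $\Omega$ is called the "long-run variance" of w.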
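As an illustrative check of the expression above, the following sketch uses an AR(1) process, $w_t = \phi w_{t-1} + e_t$ with $|\phi| < 1$ (an assumption made here only for concreteness; the function names are hypothetical). It compares the finite-T variance $\sum_{|j|<T}(1-|j|/T)\gamma_j$ with its limit $\sum_j \gamma_j = \sigma^2/(1-\phi)^2$.

```python
def ar1_autocov(phi, sigma2, j):
    """Autocovariance gamma_j of w_t = phi*w_{t-1} + e_t with Var(e_t) = sigma2."""
    return sigma2 * phi ** abs(j) / (1.0 - phi ** 2)


def var_scaled_sum(phi, sigma2, T):
    """Exact Var(T^{-1/2} * sum_t w_t) = sum_{|j|<T} (1 - |j|/T) * gamma_j."""
    return sum((1.0 - abs(j) / T) * ar1_autocov(phi, sigma2, j)
               for j in range(-T + 1, T))


def long_run_variance(phi, sigma2):
    """Limit as T grows: sum over all j of gamma_j = sigma2 / (1 - phi)^2."""
    return sigma2 / (1.0 - phi) ** 2


omega = long_run_variance(0.5, 1.0)
finite = [var_scaled_sum(0.5, 1.0, T) for T in (10, 100, 1000)]
```

In this example the finite-T values approach the long-run variance from below, because the correction term $\frac{1}{T}\sum j(\gamma_j + \gamma_{-j})$ is positive when $\phi > 0$.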

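II. Estimators

(a) Parametric Estimators, $w_t \sim$ ARMA

$$\phi(L) w_t = \theta(L)\varepsilon_t, \qquad S(\omega) = \frac{\sigma_\varepsilon^2}{2\pi}\,\frac{\theta(e^{i\omega})\,\theta(e^{-i\omega})}{\phi(e^{i\omega})\,\phi(e^{-i\omega})}$$

$$\Omega = 2\pi S(0) = \sigma_\varepsilon^2\,\frac{\theta(1)^2}{\phi(1)^2} = \sigma_\varepsilon^2\,\frac{(1 - \theta_1 - \theta_2 - \dots - \theta_q)^2}{(1 - \phi_1 - \phi_2 - \dots - \phi_p)^2}$$

$$\hat\Omega = \hat\sigma_\varepsilon^2\,\frac{(1 - \hat\theta_1 - \hat\theta_2 - \dots - \hat\theta_q)^2}{(1 - \hat\phi_1 - \hat\phi_2 - \dots - \hat\phi_p)^2}$$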
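A minimal sketch of the parametric formula, assuming the sign convention $\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$ and $\theta(L) = 1 - \theta_1 L - \dots - \theta_q L^q$ (the function name and argument layout are hypothetical). The coefficients and innovation variance would come from a fitted ARMA model; here they are simply passed in.

```python
def arma_long_run_variance(phi, theta, sigma2):
    """Long-run variance sigma2 * theta(1)^2 / phi(1)^2 of an ARMA(p, q).

    phi   : list of AR coefficients [phi_1, ..., phi_p]
    theta : list of MA coefficients [theta_1, ..., theta_q]
    sigma2: innovation variance
    """
    phi_at_1 = 1.0 - sum(phi)      # phi(1) = 1 - phi_1 - ... - phi_p
    theta_at_1 = 1.0 - sum(theta)  # theta(1) = 1 - theta_1 - ... - theta_q
    return sigma2 * theta_at_1 ** 2 / phi_at_1 ** 2


# AR(1) with phi_1 = 0.5: Omega = 1 / (1 - 0.5)^2
omega_ar1 = arma_long_run_variance([0.5], [], 1.0)
# MA(1) written as w_t = e_t + 0.5*e_{t-1}, i.e. theta_1 = -0.5 in this convention
omega_ma1 = arma_long_run_variance([], [-0.5], 1.0)
```

Because the estimate is a ratio of squares times a variance, it is nonnegative by construction.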
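Jargon: "VAR-HAC" is a version of this. Suppose $w_t$ is a vector and the estimated VAR using $w_t$ is

$$w_t = \hat\Phi_1 w_{t-1} + \dots + \hat\Phi_p w_{t-p} + \hat\varepsilon_t, \quad\text{where } \hat\Sigma_\varepsilon = \frac{1}{T}\sum_{t=1}^{T} \hat\varepsilon_t \hat\varepsilon_t' .$$

Then

$$\hat\Omega = (I - \hat\Phi_1 - \dots - \hat\Phi_p)^{-1}\,\hat\Sigma_\varepsilon\,(I - \hat\Phi_1 - \dots - \hat\Phi_p)^{-1\prime}$$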
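The VAR-HAC recipe can be sketched with NumPy as follows. This is an illustration under stated assumptions, not a reference implementation: the VAR is fitted by OLS without an intercept (so `w` is taken to be mean zero), the residual covariance divides by the number of usable observations T - p rather than T, and the function name `var_hac` is hypothetical.

```python
import numpy as np


def var_hac(w, p=1):
    """VAR-HAC long-run variance for a (T x n) array w, VAR(p) fitted by OLS."""
    w = np.asarray(w, dtype=float)
    T, n = w.shape
    Y = w[p:]                                                  # w_t, t = p..T-1
    X = np.hstack([w[p - k:T - k] for k in range(1, p + 1)])   # lags 1..p
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)                  # rows stack Phi_k'
    resid = Y - X @ B
    sigma_eps = resid.T @ resid / resid.shape[0]
    phi_sum = sum(B[k * n:(k + 1) * n].T for k in range(p))    # Phi_1 + ... + Phi_p
    a_inv = np.linalg.inv(np.eye(n) - phi_sum)
    return a_inv @ sigma_eps @ a_inv.T


# Simulated scalar AR(1) with phi = 0.5 and unit innovations (true Omega = 4).
rng = np.random.default_rng(0)
T, phi = 20000, 0.5
e = rng.standard_normal(T)
w = np.zeros(T)
for t in range(1, T):
    w[t] = phi * w[t - 1] + e[t]
omega_hat = var_hac(w.reshape(-1, 1), p=1)
```

The sandwich form `a_inv @ sigma_eps @ a_inv.T` is positive semidefinite whenever the residual covariance is, which is the practical appeal of the parametric approach.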
(b) Nonparametric Estimators

$$\hat\Omega = \sum_{j=-m}^{m} K_j \hat\gamma_j, \quad\text{with } \hat\gamma_j = \frac{1}{T}\sum_{t=j+1}^{T} w_t w_{t-j} = \hat\gamma_{-j} \ \ (j \ge 0)$$
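The estimator above can be sketched in a few lines. The sketch assumes a scalar, mean-zero series and symmetric weights ($K_{-j} = K_j$), so only $K_0, \dots, K_m$ are passed in; the function names are hypothetical.

```python
def sample_autocov(w, j):
    """gamma_hat_j = (1/T) * sum_{t=j+1}^{T} w_t * w_{t-j}, for j >= 0."""
    T = len(w)
    return sum(w[t] * w[t - j] for t in range(j, T)) / T


def kernel_lrv(w, K):
    """Omega_hat = sum_{j=-m}^{m} K_j * gamma_hat_j with K = [K_0, ..., K_m].

    Uses gamma_hat_{-j} = gamma_hat_j and K_{-j} = K_j (assumed symmetric).
    """
    m = len(K) - 1
    return K[0] * sample_autocov(w, 0) + 2.0 * sum(
        K[j] * sample_autocov(w, j) for j in range(1, m + 1))


w = [1.0, 2.0, 3.0]
omega_hat = kernel_lrv(w, [1.0, 1.0, 1.0])  # unit weights on all T-1 lags
```

With unit weights on every available lag and no demeaning, the sum collapses to $(\sum_t w_t)^2 / T$, which gives a quick sanity check of the bookkeeping.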
III. Issues

(a) $\hat\Omega \ge 0$?
(b) Number of lags (m) in the nonparametric estimator or order of the ARMA in the parametric estimator
(c) Form of the $K_j$ weights
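(a) $\hat\Omega \ge 0$?

$$\hat\Omega = \hat\sigma_\varepsilon^2\,\frac{(1 - \hat\theta_1 - \hat\theta_2 - \dots - \hat\theta_q)^2}{(1 - \hat\phi_1 - \hat\phi_2 - \dots - \hat\phi_p)^2} \quad\text{(yes)}$$

$$\hat\Omega = \sum_{j=-m}^{m} K_j \hat\gamma_j \quad\text{(not necessarily)}$$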
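MA(1) Example: $\Omega = (\gamma_{-1} + \gamma_0 + \gamma_1)$ and $\hat\Omega = (\hat\gamma_{-1} + \hat\gamma_0 + \hat\gamma_1)$.

$|\gamma_1| \le \gamma_0/2$, but $|\hat\gamma_1| > \hat\gamma_0/2$ is possible.

Newey-West (1987): Use $K_j = 1 - \frac{|j|}{m+1}$ (Bartlett Kernel)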
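A small numerical illustration of the MA(1) point, using a made-up alternating series (an assumption chosen only because the arithmetic is easy to follow; the function names are hypothetical). With unit weights on lags -1, 0, 1 the estimate is negative, while the Bartlett weights $K_j = 1 - |j|/(m+1)$ with m = 1 give a positive value.

```python
def sample_autocov(w, j):
    """gamma_hat_j = (1/T) * sum_{t=j+1}^{T} w_t * w_{t-j} (w has mean zero here)."""
    T = len(w)
    return sum(w[t] * w[t - j] for t in range(j, T)) / T


def lrv(w, m, weight):
    """Omega_hat = K_0*gamma_hat_0 + 2 * sum_{j=1}^{m} K_j * gamma_hat_j."""
    return weight(0, m) * sample_autocov(w, 0) + 2.0 * sum(
        weight(j, m) * sample_autocov(w, j) for j in range(1, m + 1))


def unit_weights(j, m):
    """K_j = 1 for |j| <= m."""
    return 1.0 if abs(j) <= m else 0.0


def bartlett(j, m):
    """K_j = 1 - |j|/(m+1)."""
    return 1.0 - abs(j) / (m + 1.0)


w = [1.0, -1.0, 1.0, -1.0]             # gamma_hat_0 = 1, gamma_hat_1 = -0.75
omega_unit = lrv(w, 1, unit_weights)   # 1 + 2*(-0.75)
omega_bartlett = lrv(w, 1, bartlett)   # 1 + 2*0.5*(-0.75)
```

Here $|\hat\gamma_1| = 0.75 > \hat\gamma_0/2 = 0.5$, which is exactly the situation the example warns about.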
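An alternative expression for $\hat\Omega = \sum_{j=-m}^{m} K_j \hat\gamma_j$

Let $W = (w_1\ w_2\ \dots\ w_T)'$ where $W \sim (0, \Sigma_{WW})$, $U = HW$, $U \sim (0, \Sigma_{UU})$ with $\Sigma_{UU} = H\Sigma_{WW}H'$. Choose H so that $\Sigma_{UU} = D$, a diagonal matrix.

A useful result: with W covariance stationary ($\Sigma_{WW}$ is "special"), a particular H matrix yields (approximately) $\Sigma_{UU} = D$ (diagonal). (The u's are the coefficients from the discrete Fourier transform of the w's.)

Variable                      Variance ($D_{ii}$)
$u_1$                         $2\pi S(0)$
$u_2$ and $u_3$               $2\pi S(1\cdot 2\pi/T)$
$u_4$ and $u_5$               $2\pi S(2\cdot 2\pi/T)$
$u_6$ and $u_7$               $2\pi S(3\cdot 2\pi/T)$
...                           ...
$u_{T-1}$ and $u_T$           $2\pi S([(T-1)/2]\cdot 2\pi/T)$

(T odd, similar for T even)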

Some algebra shows that

$$\hat\Omega = \sum_{j=-m}^{m} K_j \hat\gamma_j = k_0 u_1^2 + \sum_{j=1}^{(T-1)/2} k_j\,\frac{u_{2j}^2 + u_{2j+1}^2}{2}$$

where the k-weights are functions (Fourier transforms) of the K-weights. The algebraic details are unimportant for our purposes but, for those interested, they are sketched on the next slide.

Jargon: the term $\frac{u_{2j}^2 + u_{2j+1}^2}{2}$ is called the jth periodogram ordinate. Because $E(u_{2j}^2) = E(u_{2j+1}^2) = 2\pi S\big(\frac{2\pi j}{T}\big)$, the jth periodogram ordinate is an estimate of ($2\pi$ times) the spectrum at frequency $2\pi j/T$.

The important point: Evidently, $\hat\Omega \ge 0$ requires $k_j \ge 0$ for all j.

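The algebra linking the data w, the u's, the periodogram ordinates and the sample covariances parallels the expressions for the spectrum presented in lecture 1. (Ref: Fuller (1976), or Priestley (1981), or Brockwell and Davis (1991).) In particular, let

$$c_j = \frac{1}{\sqrt{T}}\sum_{t=1}^{T} w_t e^{-i\omega_j t}, \quad\text{where } \omega_j = 2\pi j/T .$$

Then

$$\frac{u_{2j}^2 + u_{2j+1}^2}{2} = |c_j|^2 = \sum_{p=-T+1}^{T-1} \hat\gamma_p \cos(p\,\omega_j), \quad\text{where } \hat\gamma_p = \frac{1}{T}\sum_{t=p+1}^{T} (w_t - \bar w)(w_{t-p} - \bar w) .$$

Thus $\frac{u_{2j}^2 + u_{2j+1}^2}{2} = |c_j|^2$ is a natural estimator of the spectrum (scaled by $2\pi$, matching $E(u_{2j}^2) = 2\pi S(\omega_j)$) at frequency $\omega_j$.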
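The link between the periodogram ordinate and the sample autocovariances can be checked numerically. The sketch below assumes the normalization $c_j = T^{-1/2}\sum_t w_t e^{-i\omega_j t}$ (our assumption; other texts divide by $\sqrt{2\pi T}$, which rescales everything by $1/2\pi$), Fourier frequencies $j = 1, \dots, (T-1)/2$, and demeaned autocovariances. The function names and the data are hypothetical.

```python
import cmath
import math


def periodogram_ordinate(w, j):
    """|c_j|^2 with c_j = T^{-1/2} * sum_t w_t * exp(-i*omega_j*t), omega_j = 2*pi*j/T."""
    T = len(w)
    omega = 2.0 * math.pi * j / T
    c = sum(w[t - 1] * cmath.exp(-1j * omega * t) for t in range(1, T + 1))
    return abs(c / math.sqrt(T)) ** 2


def autocov_cosine_sum(w, j):
    """sum_{p=-T+1}^{T-1} gamma_hat_p * cos(p*omega_j) with demeaned gamma_hat_p."""
    T = len(w)
    wbar = sum(w) / T
    d = [x - wbar for x in w]
    omega = 2.0 * math.pi * j / T
    total = 0.0
    for p in range(-T + 1, T):
        lag = abs(p)
        gamma_p = sum(d[t] * d[t - lag] for t in range(lag, T)) / T
        total += gamma_p * math.cos(p * omega)
    return total


w = [1.0, 3.0, -2.0, 0.5, 4.0]
pairs = [(periodogram_ordinate(w, j), autocov_cosine_sum(w, j)) for j in (1, 2)]
```

The two numbers in each pair agree up to rounding; demeaning does not matter at these frequencies because $\sum_t e^{-i\omega_j t} = 0$ for $j \ne 0$.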
$$\hat\Omega = \sum_{j=-m}^{m} K_j \hat\gamma_j = k_0 u_1^2 + \sum_{j=1}^{(T-1)/2} k_j\,\frac{u_{2j}^2 + u_{2j+1}^2}{2}$$

is then a weighted average (with weights k) of the estimated spectra at different frequencies. Because the goal is to estimate at $\omega = 0$, the weights should concentrate around this frequency as T gets large.

Jargon: Kernel (K-weights, sometimes k-weights), Lag Window (K-weights; RATS follows this notation and calls these "lwindow" in the code), Spectral Window (k-weights)
Truncated Kernel: $K_j = 1(|j| \le m)$
[Figure: Spectral Weight (k), m = 12, T = 250]

Newey-West (Bartlett Kernel): $K_j = 1 - \frac{|j|}{m+1}$
[Figure: Spectral Weight (k), m = 12, T = 250]
Implementing Estimator (lag order, kernel choice and so forth)

Parametric Estimators: AR/VAR approximations: Berk (1974); Ng and Perron (2001), lag lengths in ADF tests. Related, but different issues.

Shorter lags: Larger Bias, Smaller Variance
Longer lags: Smaller Bias, Larger Variance

Practical advice: Choose lags a little longer than you might otherwise (rationale given below).
How should m be chosen?

Andrews (1991) and Newey and West (1994): minimize $\mathrm{mse}(\hat\Omega)$

MSE = Variance + Bias$^2$

$$\hat\Omega = k_0 u_1^2 + \sum_{j=1}^{(T-1)/2} k_j\,\frac{u_{2j}^2 + u_{2j+1}^2}{2}$$

Variance: Spread the weight out over many of the squared u's: make the spectral weights flat.

Bias: $E(\hat\Omega)$ depends on the values $E\Big[\frac{u_{2j}^2 + u_{2j+1}^2}{2}\Big] \propto S(j\,2\pi/T)$. So the bias depends on how flat the spectrum is around $\omega = 0$. The flatter it is, the smaller the bias. The more curved it is, the larger the bias.
[Figure: Spectral Window ($k_j$) for N-W/Bartlett Lag Window ($K_j = 1 - \frac{|j|}{m+1}$), T = 250. Panels: m = 5, m = 10, m = 20, m = 50]
MSE minimizing values for $w_t = \phi w_{t-1} + e_t$, Bartlett Kernel (note: the higher is $\phi$, the more serial correlation, the more curved the spectrum around $\omega = 0$, and the more bias for a given value of m):

$$m^* = 1.1447 \cdot 4^{1/3} (\phi^2)^{1/3}\, T^{1/3} = 1.82\, \phi^{2/3}\, T^{1/3}$$

$\phi$      $m^*$               T = 100    T = 400    T = 1000
0.00     0                      0          0          0
0.25     0.72 $T^{1/3}$         3          5          7
0.50     1.15 $T^{1/3}$         5          8         11
0.75     1.50 $T^{1/3}$         6         11         15
0.90     1.70 $T^{1/3}$         7         12         17
0.95     1.76 $T^{1/3}$         8         12         17
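The lag rule as written on the slide is easy to evaluate directly. The sketch below codes $m^* = 1.1447\,(4\phi^2)^{1/3}\,T^{1/3}$ exactly as stated there; the function name is hypothetical, and truncating to an integer is our assumption about how the tabulated lag counts were obtained (the table appears to be built from the rounded coefficients, so entries that sit on an integer boundary can differ by one).

```python
import math


def bartlett_mstar(phi, T):
    """Slide's MSE-minimizing Bartlett lag rule for an AR(1) with coefficient phi."""
    return 1.1447 * (4.0 * phi ** 2) ** (1.0 / 3.0) * T ** (1.0 / 3.0)


# Coefficient on T^(1/3) for phi = 0.25, and integer lag counts for two table cells.
coef_025 = bartlett_mstar(0.25, 1.0)
lags_025_T100 = math.floor(bartlett_mstar(0.25, 100))
lags_050_T400 = math.floor(bartlett_mstar(0.50, 400))
```

Because $m^*$ grows only at rate $T^{1/3}$, quadrupling the sample from 100 to 400 raises the lag count by a factor of about 1.6, not 4.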
Kernel Choice: Some Kernels (lag windows)
Does Kernel Choice Matter?
In theory, yes. (QS is optimal.)
In practice (for psd kernels), not really.
Does lag length matter? Yes, quite a bit.

$y_t = \beta + w_t$, $w_t = \phi w_{t-1} + e_t$, T = 250. Rejection rate of a 2-sided 10% test:

Estimator               $\phi$ = 0.00    0.25    0.50    0.75    0.90    0.95
$m^*$                        0.10    0.13    0.15    0.21    0.33    0.46
$2m^*$                       0.10    0.13    0.14    0.18    0.25    0.37
$\hat m^*$                   0.10    0.13    0.15    0.21    0.33    0.46
AR                           0.11    0.11    0.11    0.12    0.15    0.20
$\hat m^*$ (PW)              0.11    0.11    0.11    0.12    0.15    0.19
(More simulations: den Haan and Levin (1997), available on den Haan's web page, and other papers listed throughout these slides.)
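Why use more lags than MSE optimal? (Sun, Phillips, and Jin (2008))

Intuition: $z \sim N(0, \sigma^2)$, $\hat\sigma^2$ an estimator of $\sigma^2$ (assumed independent of z). Write $g(s) = \Pr(z^2 < s\,c)$. Then

$$\Pr\Big(\frac{z^2}{\hat\sigma^2} < c\Big) = \Pr(z^2 < \hat\sigma^2 c) = E\big(1(z^2 < \hat\sigma^2 c)\big) = E\big(g(\hat\sigma^2)\big)$$
$$\approx E\big(g(\sigma^2)\big) + E(\hat\sigma^2 - \sigma^2)\,g'(\sigma^2) + \tfrac{1}{2}E\big((\hat\sigma^2 - \sigma^2)^2\big)\,g''(\sigma^2)$$
$$= F_{\chi^2_1}(c) + \mathrm{Bias}(\hat\sigma^2)\,g' + \tfrac{1}{2}\mathrm{MSE}(\hat\sigma^2)\,g''$$

MSE = Bias$^2$ + Variance. This formula involves both Bias and MSE, so bias is penalized more heavily here than under the MSE criterion alone. Thus m should be bigger.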
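IV. Making m really big

m = bT where b is fixed. (Kiefer, Vogelsang and Bunzel (2000), Kiefer and Vogelsang (2002), Kiefer and Vogelsang (2005))

Bartlett, m = (T - 1):

$$\hat\Omega(\hat w) = \sum_{j=-T+1}^{T-1}\Big(1 - \frac{|j|}{T}\Big)\hat\gamma_j, \qquad \hat\gamma_j = \frac{1}{T}\sum_{t=j+1}^{T} \hat w_t \hat w_{t-j} = \hat\gamma_{-j},$$

and where $\hat w_t = w_t - \bar w$.

KV (2002) show that some rearranging yields:

$$\hat\Omega(\hat w) = 2T^{-1}\sum_{t=1}^{T}\Big(T^{-1/2}\sum_{j=1}^{t} \hat w_j\Big)^2$$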
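The rearrangement attributed to KV (2002) is an exact algebraic identity, so it can be verified on any data set. The sketch below computes the full-bandwidth Bartlett estimator both ways for a short made-up series; the function names and data are hypothetical.

```python
import math


def demean(w):
    """Return w_t - wbar."""
    wbar = sum(w) / len(w)
    return [x - wbar for x in w]


def bartlett_full_bandwidth(w):
    """sum_{|j|<T} (1 - |j|/T) * gamma_hat_j, using demeaned data."""
    d = demean(w)
    T = len(d)

    def gamma(j):
        return sum(d[t] * d[t - j] for t in range(j, T)) / T

    return gamma(0) + 2.0 * sum((1.0 - j / T) * gamma(j) for j in range(1, T))


def partial_sum_form(w):
    """(2/T) * sum_t (T^{-1/2} * S_t)^2, with S_t the partial sums of demeaned data."""
    d = demean(w)
    T = len(d)
    total, running = 0.0, 0.0
    for x in d:
        running += x
        total += (running / math.sqrt(T)) ** 2
    return 2.0 * total / T


w = [0.3, -1.2, 2.5, 0.7, -0.4, 1.1]
lhs = bartlett_full_bandwidth(w)
rhs = partial_sum_form(w)
```

The partial-sum form makes two things visible at once: the estimate is a sum of squares (hence nonnegative), and it depends on the data only through scaled partial sums, which is what drives the limit theory that follows.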
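But, with W a standard Wiener process,

$$T^{-1/2}\sum_{j=1}^{[sT]} w_j \Rightarrow \Omega^{1/2} W(s), \qquad T^{-1/2}\sum_{j=1}^{[sT]} \hat w_j \Rightarrow \Omega^{1/2}\big(W(s) - sW(1)\big),$$

and

$$T^{-1/2}\sum_{j=1}^{[\cdot T]} \hat w_j \Rightarrow \xi(\cdot), \quad\text{where } \xi(s) = \Omega^{1/2}\big(W(s) - sW(1)\big).$$

Thus

$$\hat\Omega(\hat w) = 2T^{-1}\sum_{t=1}^{T}\Big(T^{-1/2}\sum_{j=1}^{t} \hat w_j\Big)^2 \xrightarrow{d} 2\,\Omega\int_0^1 \big(W(s) - sW(1)\big)^2\,ds$$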
So that $\hat\Omega(\hat w)$ is a "robust", but inconsistent, estimator of $\Omega$. Distributions of t-statistics (or other statistics using this $\hat\Omega$) will not be asymptotically normal, but the large-sample distributions are easily tabulated (see KV (2002)).
Table of t critical values
F-critical values .. See KVB (DIVIDE BY 2 !)
Power loss (From KVB)
Size Control (Finite Sample AR(1))
(a) Advantages ...
(i) Somewhat better size control (in Monte Carlos and in theory (Jansson (2004))).
(ii) More robust to serial correlation (Müller (2007)). Müller proposes estimators that yield t-statistics with t-distributions.

(b) Disadvantages ...
(i) Wider confidence intervals ... In the one-dimensional regression problem, take a value of $\beta_0$ chosen so that $\beta_0$ is contained in the (true, not size-distorted) 90% HAC confidence interval with probability 0.10. This value of $\beta_0$ is contained in the KVB 90% CI with probability 0.23. (Based on an asymptotic calculation.)
(ii) Robustness to volatility breaks ... No ....
KVB Cousins: (Some related work)

(a) Müller (2007): Just focus on low frequency observations. How many do you have?

Periodogram ordinates at $\frac{2\pi j}{T}$, j = 1, 2, ..., (T-1)/2.

Let p denote a low frequency cutoff: frequencies with periods greater than p are $\omega \le \frac{2\pi}{p}$. Solving for the number of periodogram ordinates with periods $\ge p$: $\frac{2\pi j}{T} \le \frac{2\pi}{p}$, so $j \le \frac{T}{p}$.

60 years of data, p = 6 years, number of ordinates = 10. (Does not depend on sampling frequency.) Müller: do inference based on these observations. Appropriately weighted, these yield t-distributions for t-statistics.
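The count of low-frequency ordinates can be reproduced with a short sketch. Both T and the cutoff p are measured in sampling periods here (an assumption of this illustration; the function name is hypothetical), so the same 60-year span with a 6-year cutoff is entered as (60, 6) for annual data, (240, 24) for quarterly data and (720, 72) for monthly data.

```python
def low_frequency_ordinates(T, p):
    """Count j in 1..(T-1)//2 whose period T/j is at least p, i.e. j <= T/p."""
    return sum(1 for j in range(1, (T - 1) // 2 + 1) if T >= p * j)


annual = low_frequency_ordinates(60, 6)
quarterly = low_frequency_ordinates(240, 24)
monthly = low_frequency_ordinates(720, 72)
```

All three calls return the same count, illustrating that sampling more often over the same span does not add low-frequency information.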
(b) Panel Data: Hansen (2007). Clustered SEs (over T) when T is large and n is small ($T \to \infty$ and n fixed). This too yields t-statistics being distributed t (with n-1 df) and appropriately constructed F-stats having a Hotelling t-squared distribution.

(c) Ibragimov and Müller (2007). Allows heterogeneity.
HAC Bottom line(s)

(1) Parametric Estimators are easy and tend to perform reasonably well. In some applications a parametric model is natural (MA(q) model for forecasting q+1 periods ahead). In other circumstances VAR-HAC is sensible (den Haan and Levin (1997)). Think about changes in volatility.

(2) Why are you estimating $\Omega$?

(a) Optimal Weighting matrix in GMM: Minimum MSE $\hat\Omega$ seems like a good idea. (Analytic results on this?)

(b) Inference: Use more lags than you otherwise would think you should in Newey-West (or other nonparametric estimators). Worried? Use KVB estimators (and their critical values for tests) or Müller (2007) versions (with t or F critical values).
