Further Issues in Using OLS with Time Series Data
EMET 8002
Lecture 2
July 24, 2009
Further Issues in Using OLS with
Time Series Data
„ Recall that in the last lecture we covered six assumptions, TS.1 through TS.6, under which OLS has exactly the same desirable finite-sample properties as in cross-sectional analysis

„ Furthermore, TS.1 through TS.6 allow us to carry out inference in exactly the same way as well

„ As we will see, models with lagged dependent variables necessarily violate TS.3, and thus we need to relax this assumption
Stationary and Weakly Dependent
Time Series
„ A stationary process is one whose probability
distributions are stable over time.

„ Formally, the stochastic process {xt: t=1,2,…} is stationary if, for every collection of time indices 1≤t1≤t2≤…≤tm, the joint distribution of (xt1, xt2, …, xtm) is the same as the joint distribution of (xt1+h, xt2+h, …, xtm+h) for all integers h≥1
Stationary and Weakly Dependent
Time Series
„ In other words, the sequence {xt: t=1,2,…} is identically distributed. Moreover, stationarity requires that the nature of any correlation between adjacent terms is the same across all time periods.

„ A stochastic process that is not stationary is said to be a nonstationary process

„ Since stationarity is a feature of the underlying process and not of a particular realization, it can be very difficult to test whether the collected data come from a stationary series
Stationary and Weakly Dependent
Time Series
„ Example:
„ Is a stochastic process with a linear time trend, such as xt=β0+β1t+et, stationary? (Hint: its mean, E(xt)=β0+β1t, changes with t.)
Stationary and Weakly Dependent
Time Series
„ The stochastic process {xt: t=1,2,…} with a finite second moment [E(xt2)<∞] is covariance
stationary if
1. E(xt) is constant,
2. var(xt) is constant, and
3. for any t, h≥1, cov(xt,xt+h) depends only on h and not on t
Stationary and Weakly Dependent
Time Series
„ Relationship between stationary and covariance
stationary:
„ If a stationary process has a finite second moment
then it must be covariance stationary, but the converse
is not necessarily true
„ Stationarity is a stronger requirement than covariance
stationarity and is thus sometimes referred to as strict
stationarity
Stationary and Weakly Dependent
Time Series
„ Why do we care about stationarity?

„ It simplifies the statements of the Law of Large Numbers and the Central Limit Theorem, which are required for the asymptotic analysis of estimators

„ More practically, if we want to examine the relationship between two or more variables over time using regression analysis, we need to assume some sort of stability over time. Otherwise, if we allow the relationship between the variables to change every period, how could we ever hope to learn much about how a change in one variable affects the other variable?
Stationary and Weakly Dependent
Time Series
„ Have we already made assumptions like this?
Stationary and Weakly Dependent
Time Series
„ Stationarity describes the joint distribution of a
process as it moves through time. A very different
concept is that of weak dependence

„ A stationary time series process {xt: t=1,2,…} is said to be weakly dependent if xt and xt+h are “almost independent” as h increases without bound

„ E.g., a covariance stationary process is weakly dependent if the correlation between xt and xt+h goes to 0 “sufficiently quickly” as h goes to ∞
Stationary and Weakly Dependent
Time Series
„ In other words, as the variables get farther apart in
time, the correlation between them becomes smaller
and smaller

„ Weak dependence is important for regression analysis because:
„ It replaces the assumption of random sampling in implying that the Law of Large Numbers and the Central Limit Theorem hold
„ The most well known Central Limit Theorem for time series processes requires stationarity and some form of weak dependence
Stationary and Weakly Dependent
Time Series
„ Examples of weakly dependent time series:
„ An independent, identically distributed sequence – by definition xt and xt+h are independent for all h≥1, regardless of how big h is
Stationary and Weakly Dependent
Time Series
„ Consider a moving average process of order one
[MA(1)]
xt = et + α1et−1, t = 1, 2, …
„ where et is i.i.d. with zero mean and variance σe2
„ cov(xt,xt-1)=α1σe2
„ var(xt)=(1+ α12) σe2
„ cov(xt,xt-2)=0
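
„ As an informal check, we can simulate an MA(1) and compare the sample correlations with these formulas. A minimal Stata sketch, not from the original example; the value α1 = 0.5 and the seed are illustrative:

* simulate x_t = e_t + 0.5*e_{t-1} with e_t i.i.d. N(0,1)
clear
set seed 8002
set obs 1000
gen t = _n
tsset t
gen e = rnormal()
gen x = e + 0.5*L.e
* implied corr(x_t,x_{t-1}) = 0.5/(1+0.25) = 0.4; corr at lag 2 should be near 0
corr x L.x L2.x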
Stationary and Weakly Dependent
Time Series
„ Consider an autoregressive process of order one
[AR(1)]
yt = ρ1yt−1 + et, t = 1, 2, …

„ where et is i.i.d. with mean 0 and variance σe2

„ The crucial assumption for an AR(1) process to be weakly dependent is that |ρ1|<1. Then we say that yt is a stable AR(1) process
Stationary and Weakly Dependent
Time Series
„ To see that a stable AR(1) process is asymptotically uncorrelated (i.e., weakly dependent), it is useful to assume that the process is covariance stationary, i.e.,
1. E(yt) is constant,
2. var(yt) is constant, and
3. for any t, h≥1, cov(yt, yt+h) depends only on h and not on t
„ Notice that E(yt)=ρ1E(yt-1)+E(et)=ρ1E(yt-1). Since covariance stationarity implies E(yt)=E(yt-1), this can only hold, for ρ1≠1, if E(yt)=0.
Stationary and Weakly Dependent
Time Series
„ Next, var(yt)=ρ12var(yt-1)+var(et). Since we have assumed covariance stationarity, var(yt)=var(yt-1)=σy2, which we can easily solve for:
σy2 = σe2/(1−ρ12)

„ We are now ready to solve for the covariance between yt and yt+h for h≥1. Start with repeated substitution:
yt+h = ρ1yt+h−1 + et+h
= ρ1(ρ1yt+h−2 + et+h−1) + et+h = …
= ρ1h yt + ρ1h−1 et+1 + … + ρ1 et+h−1 + et+h
Stationary and Weakly Dependent
Time Series
„ We can now solve for the covariance of yt and yt+h:

cov(yt, yt+h) = E[(yt − E(yt))(yt+h − E(yt+h))]
= E[yt yt+h]
= E[ρ1h yt2 + ρ1h−1 yt et+1 + … + ρ1 yt et+h−1 + yt et+h]
= ρ1h σy2

„ where the cross terms have expectation zero because each et+j (j≥1) is independent of yt, and E(yt2)=σy2 since E(yt)=0
Stationary and Weakly Dependent
Time Series
„ Finally, because σy is the standard deviation of both yt and yt+h, corr(yt, yt+h)=ρ1h

„ Thus, although yt and yt+h are correlated for any h≥1, this correlation gets smaller and smaller for larger values of h since we’ve assumed that |ρ1|<1

„ Thus, a stable AR(1) process is weakly dependent
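
„ The decay corr(yt, yt+h)=ρ1h is easy to see in a simulation. A hedged Stata sketch; ρ1 = 0.8 and all other specifics are illustrative:

* simulate a stable AR(1): y_t = 0.8*y_{t-1} + e_t
clear
set seed 8002
set obs 1000
gen t = _n
tsset t
gen e = rnormal()
gen y = e in 1
replace y = 0.8*y[_n-1] + e in 2/1000
* theory: correlations at lags 1, 5, 20 are about 0.8, 0.33, and 0.01
corr y L1.y L5.y L20.y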


Asymptotic Properties of OLS
„ In the previous lecture, we saw some examples of
time series processes that do not satisfy the
assumptions TS.1 through TS.6. In such cases, we
must appeal to large sample properties of OLS, much
like we do for cross-sectional analysis.

„ We will introduce a new set of assumptions that are very similar to those from the last lecture
Asymptotic Properties of OLS
„ Assumption TS.1’: (Linearity and weak
dependence) We assume the model is exactly as in
TS.1, but we now add the assumption that {(xt,yt):
t=1,2,…} is stationary and weakly dependent. In
particular the Law of Large Numbers and the Central
Limit Theorem can be applied to sample averages.

„ Thus, the model is still linear:
yt = β0 + β1xt1 + … + βkxtk + ut
„ but the model can now include lagged dependent variables on the right-hand side
Asymptotic Properties of OLS
„ The important extra restriction in TS.1’ as compared
to TS.1 is the inclusion of the weak dependence
requirement. This is by no means an innocuous
assumption

„ Assumption TS.2’: (No perfect collinearity) Same as TS.2
Asymptotic Properties of OLS
„ Assumption TS.3’: (zero conditional mean) The
explanatory variables xt=(xt1,xt2,…,xtk) are
contemporaneously exogenous: E(ut|xt)=0

„ This is a much weaker condition than TS.3 since it puts no restrictions on how ut is related to the explanatory variables in other time periods

„ By stationarity, if contemporaneous exogeneity holds for one period, it holds for them all
Asymptotic Properties of OLS
„ Theorem 11.1: (Consistency of OLS): Under TS.1’
through TS.3’ the OLS estimators are consistent:
plim β̂j = βj, j = 0, 1, …, k

„ Key differences with Theorem 10.1:
„ We conclude consistency, not necessarily unbiasedness
„ We have weakened the sense in which the explanatory variables must be exogenous, but weak dependence is required
Asymptotic Properties of OLS
„ Static model example:
yt = β0 + β1zt1 + β2zt2 + ut

„ Under weak dependence, the sufficient condition for consistency is E(ut|zt1,zt2)=0

„ Importantly, TS.3’ does not rule out correlation between, say, ut−1 and zt1. This sort of correlation could arise if zt1 is related to past yt−1, such as:
zt1 = δ0 + δ1yt−1 + vt
Asymptotic Properties of OLS
„ An AR(1) model clearly violates TS.3, and thus we
have to appeal to large sample properties of OLS
yt = β0 + β1yt−1 + ut
„ where E(ut|yt-1,yt-2,yt-3,…)=0

„ Notice that the strict exogeneity assumption required by TS.3 does not hold. TS.3 requires that ut be uncorrelated with all realizations of the right-hand side variable (y0,y1,y2,…,yn-1). Since ut is uncorrelated with yt-1, cov(yt,ut)=var(ut)>0, so ut must be correlated with yt; and yt appears on the right-hand side of the period t+1 equation.
Asymptotic Properties of OLS
„ Thus, a model with a lagged dependent variable
cannot satisfy TS.3
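
„ In practice, an AR(1) is estimated by regressing yt on its own lag. A minimal sketch, assuming the simulated stable AR(1) series y from the earlier sketch is still in memory and tsset:

* OLS with a lagged dependent variable: consistent under TS.1'-TS.3', but not unbiased
reg y L.y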

„ Assumption TS.4’: (homoskedasticity) The errors are contemporaneously homoskedastic: var(ut|xt)=σ2

„ Assumption TS.5’: (no serial correlation) For all t≠s, E(utus|xt,xs)=0
Asymptotic Properties of OLS
„ In TS.4’ notice that we only condition on the
explanatory variables at time t, as opposed to the
explanatory variables for all time periods as in TS.4

„ Similarly, in TS.5’ we condition only on the explanatory variables from time periods t and s. This condition is a little difficult to interpret, so we often just think about whether or not ut and us are uncorrelated for all s≠t, without conditioning on xt and xs
Asymptotic Properties of OLS
„ Serial correlation is often a problem in static and
finite distributed lag regression models

„ Importantly, however, it is not a problem in AR(1) models
Asymptotic Properties of OLS
„ Theorem 11.2: (asymptotic normality of OLS)
Under TS.1’ through TS.5’, the OLS estimators are
asymptotically normally distributed. Further, the
usual OLS standard errors, t statistics, F statistics and
LM statistics are asymptotically valid

„ Thus, even if the generally stronger assumptions introduced in the previous lecture do not hold, OLS is still consistent and the usual inference procedures are still valid
Highly Persistent Time Series
„ In the simple AR(1) model, the assumption |ρ1|<1 is
crucial for the series to be weakly dependent

„ Many economic time series are better characterized by an AR(1) model with ρ1=1

„ In this case we can write yt=yt-1+et, t=1,2,…, where we again assume that et is i.i.d. with mean 0 and variance σe2. We further assume that the initial value, y0, is independent of et for all t≥1
Highly Persistent Time Series
„ This process is described as a random walk

„ Using repeated substitution, we can show that E(yt)=E(et)+E(et-1)+…+E(e1)+E(y0)=E(y0)

„ Thus, the expected value of a random walk does not depend on t

„ By contrast, the variance changes over time: var(yt)=var(et)+var(et-1)+…+var(e1)=σe2t
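
„ A short simulation makes the growing variance visible. A hedged Stata sketch; all specifics are illustrative:

* random walk: y_t = e_1 + ... + e_t with y_0 = 0
clear
set seed 8002
set obs 1000
gen t = _n
tsset t
gen e = rnormal()
gen y = sum(e)
* the series wanders: var(y_t) = t here, since sigma_e^2 = 1
tsline y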
Highly Persistent Time Series
„ A random walk process displays highly persistent
behavior because the value of y today is important
for determining the value of y in the very distant
future: E(yt+h|yt)=yt

„ By comparison, with a stable AR(1) process: E(yt+h|yt)=ρ1hyt
Highly Persistent Time Series
„ We can also show that a random walk process is not covariance stationary, since corr(yt, yt+h)=[t/(t+h)]1/2 depends on t and not just on h

„ Generally, it is not easy to look at a time series and determine whether it is a random walk. We will study formal tests for random walks later, but for now we will discuss them informally
Highly Persistent Time Series
„ It is important to distinguish between highly persistent time
series and trending time series

„ A series can be trending, but not be highly persistent. For example, yt=β0+β1t+et is clearly trending, but it is not considered to be highly persistent because E(yt+h|yt)=E(β0+β1(t+h)+et+h)=β0+β1(t+h). The current value does not in any way influence the expected value of the series h periods in the future

„ Series such as inflation are considered to be highly persistent, but they have no clear upward or downward trend over time
Highly Persistent Time Series
„ It is also possible though for the series to be both
trending and highly persistent. As an example,
consider a random walk with drift model:
yt = α0 + yt−1 + et, t = 1, 2, …
„ where et and y0 satisfy the same properties as in the random walk model and α0 is called the drift term

„ We can show that the process follows a linear time trend by using repeated substitution:
yt = α0t + et + et−1 + … + e1 + y0
Highly Persistent Time Series
„ Thus E(yt)=α0t+E(y0); if we assume y0=0, then E(yt)=α0t, a clear time trend

„ Using similar reasoning as for the random walk model, we can also show that E(yt+h|yt)=α0h+yt, which exhibits persistence
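
„ A random walk with drift can be simulated the same way. A minimal sketch; the drift α0 = 2 and the other specifics are illustrative:

* random walk with drift: y_t = 2*t + e_1 + ... + e_t, with y_0 = 0
clear
set seed 8002
set obs 200
gen t = _n
tsset t
gen e = rnormal()
gen y = 2*t + sum(e)
* a clear linear trend plus persistent deviations around it
tsline y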
Transformations on Highly Persistent
Time Series
„ Using time series that are highly persistent can lead to very misleading results if the Central Limit Theorem assumptions are violated.

„ Weakly dependent series are said to be integrated of order zero, I(0). Practically, this means nothing really needs to be done to these series before using them in regression analysis.

„ Unit root processes, such as the random walk with or without drift, are said to be integrated of order one, I(1). This means that the first difference of the series is weakly dependent (and often stationary)
Transformations on Highly Persistent
Time Series
„ The concept of an I(1) process is easiest to
understand using a random walk process:
Δyt = yt − yt−1 = et

„ Thus, the differenced series {Δyt: t=2,3,4,…} is actually an i.i.d. process

„ Hence, if we suspect that series are I(1), then we often first-difference them in order to use them in regression analysis
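
„ For a simulated random walk, differencing recovers the underlying i.i.d. errors exactly. A hedged sketch with illustrative specifics:

* first-difference an I(1) series
clear
set seed 8002
set obs 1000
gen t = _n
tsset t
gen e = rnormal()
gen y = sum(e)
gen dy = D.y
* here dy equals e_t, so its first order autocorrelation should be near 0
corr dy L.dy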
Transformations on Highly Persistent
Time Series
„ Differencing a time series before using it in a
regression has an additional benefit: it removes any
linear time trend.

„ Suppose yt=β0+β1t+vt. Then Δyt=β1+Δvt, and so E(Δyt)=β1

„ Thus, instead of including a time trend in the regression, we can also first-difference the data
Deciding Whether a Time Series is
I(1)
„ Determining whether or not a series is I(1) versus
I(0) can be quite difficult. There are formal tests that
can be used for this purpose, but we won’t cover
these until later.

„ For now, we’ll cover some introductory methods

„ A very simple tool is motivated by the AR(1) model:
„ Recall, if |ρ1|<1 then the process is I(0)
„ However, if ρ1=1 then the process is I(1)
Deciding Whether a Time Series is
I(1)
„ Earlier, we showed that if an AR(1) process is stable,
then ρ1=corr(yt,yt-1). Therefore, we can estimate ρ1
from the sample correlation. This sample correlation
coefficient is called the first order autocorrelation
of {yt}. We denote it by ρ̂1

„ This sample coefficient is a consistent estimator of ρ1, but it is not unbiased
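
„ Two essentially equivalent ways to obtain ρ̂1 in Stata, as a sketch; this assumes a tsset series y in memory, e.g. the AR(1) simulated earlier:

* the first order autocorrelation as a sample correlation...
corr y L.y
* ...or as the slope coefficient from regressing y on its own lag
reg y L.y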
Deciding Whether a Time Series is
I(1)
„ We can use the value of the first order
autocorrelation coefficient to decide whether the
process is I(1) or I(0)

„ We can never know for sure whether ρ1<1 since we can only estimate it. Thus, ideally we would construct a confidence interval and see if it excludes 1. However, this turns out to be very complicated, which is why we will leave it for later.
Deciding Whether a Time Series is
I(1)
„ Earlier, we estimated a linear regression model for
the impact of personal exemptions (pe) on the
general fertility rate (gfr).

„ The first order autocorrelations for gfr and pe are 0.977 and 0.964. These are very high and thus suggest that the series are I(1). Hence, the previous analysis is likely flawed.
Deciding Whether a Time Series is
I(1)
„ If the series in question has an obvious upward or downward trend, then it makes more sense to obtain the first order autocorrelation coefficient after detrending; otherwise the autocorrelation tends to be overestimated
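
„ A sketch of detrending before computing ρ̂1; this assumes a tsset series y and a time variable t in memory, and the residual name is illustrative:

* regress out a linear trend, then compute the autocorrelation of the residuals
reg y t
predict yres, resid
corr yres L.yres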
Dynamically Complete Models and
the Absence of Serial Correlation
„ When regression models are dynamically
complete then TS.5’ (no serial correlation) must be
satisfied.

„ Consider the general model:
yt = β0 + β1xt1 + … + βkxtk + ut

„ Generally, {ut} will be serially correlated. However, if we assume:
E(ut|xt, yt−1, xt−1, …) = 0
Dynamically Complete Models and
the Absence of Serial Correlation
„ Then we can write:
E(yt|xt, yt−1, xt−1, …) = E(yt|xt)

„ In other words, we have included enough lags so that further lags of y and the explanatory variables do not matter for explaining yt. When this condition holds, we have a dynamically complete model.
Dynamically Complete Models and
the Absence of Serial Correlation
„ The condition
E(ut|xt, yt−1, xt−1, …) = 0
is equivalent to
E(ut|xt, ut−1, xt−1, ut−2, …) = 0

„ We can now show that dynamic completeness implies no serial correlation. For concreteness, take s<t. Then, by the Law of Iterated Expectations:
Dynamically Complete Models and
the Absence of Serial Correlation
E(utus|xt, xs) = E[E(utus|xt, xs, us)|xt, xs]
= E[us E(ut|xt, xs, us)|xt, xs]
= E[us · (0)|xt, xs]
= 0

„ Thus, Assumption TS.5’ holds!

„ This is why you will often hear the suggestion “Add some more lags to remove the serial correlation”
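
„ A sketch of this advice in Stata; the variable names y and x are hypothetical and the data are assumed to be tsset in memory:

* add further lags and test whether they matter
reg y x L.y L.x L2.y L2.x
test L2.y L2.x
* failing to reject suggests the model without the second lags may be dynamically complete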
Dynamically Complete Models and
the Absence of Serial Correlation
„ Should we always try to specify a dynamically
complete model?
„ As we will see later in the course, for forecasting
reasons, yes we should
„ In other cases, sometimes we really are interested in a
static model (such as a Phillips curve) in which case we
would not want to add additional lags. The presence of
serial correlation does not mean the model is
misspecified. We will see in the next lecture how to
correct for serial correlation in such models
The Homoskedasticity Assumption for
Time Series Models
„ Assumption TS.4’ looks a lot like the homoskedasticity assumption for cross-sectional regression models. However, since xt can contain lagged y as well as lagged explanatory variables, we need to elaborate on this assumption slightly

„ In the simple static model yt=β0+β1zt+ut, Assumption TS.4’ requires that var(ut|zt)=σ2

„ Hence, even though E(yt|zt) is a linear function of zt, var(yt|zt) is constant.
The Homoskedasticity Assumption for
Time Series Models
„ Generally, whatever variables appear in the model
we must assume that the variance of yt given these
explanatory variables is constant

„ If the model includes lagged values of y or of the explanatory variables, then we are explicitly ruling out dynamic forms of heteroskedasticity

„ We will explore this issue further in the next lecture
Computer Exercise C11.2
„ In Example 11.7, define the growth in the hourly wage and in output per hour as the change in the natural log: ghrwage=Δlog(hrwage) and goutphr=Δlog(outphr). Consider the following model:
ghrwaget = β0 + β1goutphrt + β2goutphrt−1 + ut
1. Estimate the equation using the data in EARNS.RAW
and report the results in standard form. Is the
lagged value of goutphr statistically significant?
Computer Exercise C11.2
„ reg ghrwage goutphr goutph_1
„ The t statistic on the lag is 2.76 (p = 0.009), so the lag is statistically significant
ghrwage     Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
goutphr     .7283641    .1672223    4.36    0.000    .3892217   1.067507
goutph_1    .4576351    .1656126    2.76    0.009    .1217571    .793513
_cons      -.010425     .0045439   -2.29    0.028   -.0196404  -.0012096
Computer Exercise C11.2
„ Test H0:β1+β2=1 versus H1:β1+β2≠1

„ test goutphr+goutph_1=1
„ F(1,36) = 0.84
„ P-value = 0.3660
„ Thus, we cannot reject H0, as the p-value is well above the usual significance levels of 10 or 5%
Computer Exercise C11.2
„ Does goutphrt-2 need to be in the model? Explain.

„ reg ghrwage goutphr goutph_1 goutph_2

ghrwage     Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
goutphr     .7464273    .1615037    4.62    0.000    .4182123   1.074642
goutph_1    .3740461    .1665312    2.25    0.031    .0356139   .7124783
goutph_2    .0653486    .1597067    0.41    0.685   -.2592144   .3899116
_cons      -.0112838    .0048409   -2.33    0.026   -.0211217  -.0014458

„ The t statistic on goutph_2 is only 0.41 (p = 0.685), so the second lag does not need to be in the model
Computer Exercise C11.4
„ Use the data in PHILLIPS.RAW for this exercise, but only
through 1996.

1. In Example 11.5, we assumed that the natural rate of unemployment is constant. An alternative form of the expectations augmented Phillips curve allows the natural rate of unemployment to depend on past levels of unemployment. In the simplest case, the natural rate at time t equals unemt−1. If we assume adaptive expectations, we obtain a Phillips curve where inflation and unemployment are in first differences:
Δinf = β0 + β1Δunem + u
Estimate this model, report the results, and discuss the sign, size, and statistical significance of β̂1.
Computer Exercise C11.4
„ reg cinf cunem if year<=1996
„ The coefficient on Δunem suggests an inflation-unemployment tradeoff. The t statistic is -2.68, implying the coefficient is statistically significant.
„ Moreover, the coefficient is large in magnitude and is not statistically different from -1 (p-value=0.618)

cinf     Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
cunem   -.8421708    .3142509   -2.68    0.010   -1.474725  -.2096165
_cons   -.0781776    .3484621   -0.22    0.823   -.7795954   .6232401
Computer Exercise C11.4
„ Which model fits the data better, (11.19) or the
model from part (i)? Explain.
„ The model in 11.19 is Δinf regressed on unem

„ Based on the R-squared values (0.135 for (i) versus 0.108 for (11.19)), the model from part (i) explains Δinf better. It explains about 3 percentage points more of the variation in Δinf.
Problem 11.2
„ Let {et: t = −1, 0, 1, …} be a sequence of independent, identically distributed random variables with mean zero and variance 1. Define a stochastic process by
xt = et − (1/2)et−1 + (1/2)et−2, t = 1, 2, …

1. Find E(xt) and var(xt). Do either depend on t?
2. Show that corr(xt,xt+1) = −1/2 and corr(xt,xt+2) = 1/3.
3. What is corr(xt,xt+h) for h>2?
4. Is {xt} an asymptotically uncorrelated process?
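
„ Parts 2 and 3 can be checked informally by simulation. A hedged Stata sketch; the sample size and seed are illustrative:

* simulate x_t = e_t - (1/2)e_{t-1} + (1/2)e_{t-2}
clear
set seed 8002
set obs 5000
gen t = _n
tsset t
gen e = rnormal()
gen x = e - 0.5*L.e + 0.5*L2.e
* sample correlations at lags 1, 2, 3 should be near -1/2, 1/3, and 0
corr x L.x L2.x L3.x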
Problem 11.4
„ Let {yt:t=1,2,…} follow a random walk with y0=0.
Show that corr(yt,yt+h)=[t/(t+h)]1/2.
Problem 11.8
„ Suppose that the equation
yt = α + δt + β1xt1 + … + βkxtk + ut
satisfies the sequential exogeneity assumption in equation (11.40).
1. Suppose you difference the model. Why does applying OLS to the differenced equation not generally result in consistent estimators?
