ECONOMETRICS
A TEACHING MATERIAL FOR DISTANCE
STUDENTS MAJORING IN ECONOMICS
Module II
Prepared By:
Bedru Babulo
Seid Hassen
Department of Economics
Faculty of Business and Economics
Mekelle University
Econometrics: Module-II
Bedru B. and Seid H
August, 2005
Mekelle
Econometrics
Module II
Module II of the course is a continuation of Module I. The first module presented the first
three chapters (the introductory chapter, the simple classical regression model, and the multiple
regression model) in fairly detailed treatment. In the two chapters of Module I on
'Classical Linear Regression Models', students were introduced to the basic logic, concepts,
assumptions, estimation methods, and interpretation of the classical linear regression models
and their applications in economic science.
The ordinary least squares (OLS) estimation method discussed in Module I possesses the desirable
properties of estimators provided that the basic classical assumptions are satisfied. But in many
real-world instances, the classical assumptions of linear regression models may be violated.
Therefore, Module II pays due attention to violations of these assumptions, their consequences,
and the remedial measures. Specifically, autocorrelation, heteroscedasticity, and multicollinearity
problems will be given much focus.
Besides the discussions on ‘violations of classical assumptions’, three more chapters viz.
Regression on Dummy Variables; Dynamic Econometric Models; and an Introduction to
Simultaneous Equation Models are also included in Module-II.
Chapter Four
4.0 Introduction
In both the simple and multiple regression models, we made important
assumptions about the distribution of $Y_t$ and the random error term $u_t$. We
assumed that $u_t$ was a random variable with mean zero and $\operatorname{var}(u_t) = \sigma^2$, and that
the errors corresponding to different observations are uncorrelated, $\operatorname{cov}(u_t, u_s) = 0$
(for $t \neq s$), and in multiple regression we assumed there is no perfect correlation
between the independent variables.
Now, we address the following 'what if' questions in this chapter. What if the
error variance is not constant over all observations? What if the different errors
are correlated? What if the explanatory variables are correlated? We need to ask
whether and when such violations of the basic classical assumptions are likely to
occur. What types of data are likely to lead to heteroscedasticity (different error
variances)? What types of data are likely to lead to autocorrelation (correlated
errors)? What types of data are likely to lead to multicollinearity? What are the
consequences of such violations for the least squares estimators? How do we detect the
presence of autocorrelation, heteroscedasticity, or multicollinearity? What are the
remedial measures? How do we build an alternative model and an alternative set
of assumptions when these violations exist? Do we need to develop new
estimation procedures to tackle the problems? In the subsequent sections (4.1, 4.2,
and 4.3), we attempt to answer such questions.
4.1 Heteroscedasticity
In panel (a) $\sigma_u^2$ seems to increase with X. In panel (b) the error variance appears
greater in X's middle range, tapering off toward the extremes. Finally, in panel
(c), the variance of the error term is greater for low values of X, declining and
leveling off rapidly as X increases.
The pattern of heteroscedasticity would depend on the signs and values of the
coefficients of the relationship $\sigma_{u_i}^2 = f(X_i)$, but the $u_i$'s are not observable. As such,
in applied research we make convenient assumptions that heteroscedasticity is of
the forms:
i. $\sigma_{u_i}^2 = K^2 X_i^2$
ii. $\sigma_{u_i}^2 = K^2 X_i$
iii. $\sigma_{u_i}^2 = K / X_i$, etc.
$$E(UU') = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix} \tag{3.10}$$
where $\sigma_i^2 = E(U_i^2)$. In other words, the variance-covariance matrix in the present case
is a diagonal matrix with unequal elements on the diagonal.
4.1.4 Examples of Heteroscedastic functions
At low levels of income, the average consumption is low, and variation below
this level is less possible; consumption cannot fall too far below because this
might mean starvation. On the other hand, it cannot rise too far above because
money income does not permit it. Such constraints may not be found at higher
income levels. Thus, consumption patterns are more regular at lower income
levels than at higher levels. This implies that at high incomes the variance of the $u$'s will be
high, while at low incomes the variance of the $u$'s will be small. The assumption of constant
variance of the $u$'s therefore does not hold when estimating the consumption
function from a cross-section of family budgets.
ii. Production Function: Suppose we are required to estimate the production
function $X = f(K, L)$ of the sugar industry from a cross-section random sample of
firms of the industry. The disturbance term in the production function would stand
for many factors other than the inputs labor (L) and capital (K) considered in the
production function, such as entrepreneurship, technological differences, selling and
purchasing procedures, and differences in organization. These factors,
which are not considered explicitly in the production function, show
considerably more variance in large firms than in small ones. This leads to a breakdown
of our assumption of homogeneity of the variance terms.
It should be noted that the problem of heteroscedasticity is likely to be more
common in cross-sectional data than in time-series data. In cross-sectional data one deals
with members of a population at a given point in time, such as individual consumers or their
families, firms, or industries. These members may be of different sizes, such as small,
medium or large firms, or low, medium or high income. In time-series data, on the
other hand, the variables tend to be of similar orders of magnitude because one
generally collects data for the same entity over a period of time.
$$E(\hat\beta) = \beta + \sum k_i E(u_i) = \beta$$
i.e., the least square estimators are unbiased even under the condition of
heteroscedasticity. It is because we do not make use of assumption of
homoscedasticity here.
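2. Variance of OLS coefficients will be incorrect
Under homoscedasticity, $\operatorname{var}(\hat\beta) = \sigma^2 \sum k_i^2 = \dfrac{\sigma^2}{\sum x_i^2}$, but under heteroscedasticity
the error variance $\sigma_{u_i}^2$ is no longer a constant: it changes with the value of X
and hence cannot be taken out of the summation.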
3. OLS estimators shall be inefficient: in other words, the OLS estimators do not
have the smallest variance in the class of unbiased estimators and, therefore, they
are not efficient in either small or large samples. Under the heteroscedastic
assumption, therefore:
$$\operatorname{var}(\hat\beta) = \sum k_i^2 \operatorname{var}(Y_i) = \sum \left(\frac{x_i}{\sum x_i^2}\right)^2 \operatorname{var}(Y_i) = \frac{\sum x_i^2 \sigma_{u_i}^2}{\left(\sum x_i^2\right)^2} \tag{3.11}$$
Under homoscedasticity,
$$\operatorname{var}(\hat\beta) = \frac{\sigma^2}{\sum x_i^2} \tag{3.12}$$
These two variances are different. This implies that, under heteroscedasticity,
although the OLS estimator is unbiased, it is inefficient. Its
variance is larger than necessary.
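To see the consequence of using (3.12) instead of (3.11), let us assume that:
$$\sigma_{u_i}^2 = k_i \sigma^2$$
where $k_i$ here denotes a weight attached to the variance of observation $i$ (not the OLS weight). Then:
$$\operatorname{var}(\hat\beta) = \frac{\sum k_i x_i^2 \sigma^2}{\left(\sum x_i^2\right)^2} = \frac{\sigma^2}{\sum x_i^2} \cdot \frac{\sum k_i x_i^2}{\sum x_i^2}$$
$$\operatorname{var}(\hat\beta)_{\text{hetero}} = \operatorname{var}(\hat\beta)_{\text{homo}} \cdot \frac{\sum k_i x_i^2}{\sum x_i^2} \tag{3.13}$$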
That is to say, if $x_i^2$ and $k_i$ are positively correlated, so that the second
term of (3.13) is greater than 1, then $\operatorname{var}(\hat\beta)$ under heteroscedasticity will be
greater than its variance under homoscedasticity. As a result, the true standard error
of $\hat\beta$ shall be underestimated if the homoscedastic formula is used. As such, the t-value
associated with it will be overestimated, which might lead to the conclusion that in a specific
case at hand $\hat\beta$ is statistically significant (which in fact may not be true). Moreover, if we
proceed with our model under the false belief of homoscedasticity of the error variance, our
inference and prediction about the population coefficients would be incorrect.
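The comparison between (3.11) and (3.12) can be illustrated numerically. The sketch below is not part of the original module: the function name `ols_slope_variances`, the X values and the assumed variance pattern are hypothetical, and the "naive" figure simply plugs the average error variance into the homoscedastic formula as a stand-in for what (3.12) would report.

```python
def ols_slope_variances(x, sigma2_i):
    """Return (true variance from 3.11, naive variance from 3.12) of the OLS slope.

    x        : values of the explanatory variable
    sigma2_i : assumed error variance of each observation (hypothetical input)
    """
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]              # deviations x_i = X_i - mean(X)
    sxx = sum(d * d for d in dev)               # sum of squared deviations
    # (3.11): sum(x_i^2 * sigma_i^2) / (sum x_i^2)^2
    true_var = sum(d * d * s for d, s in zip(dev, sigma2_i)) / sxx ** 2
    # (3.12) with the average error variance standing in for sigma^2 (an assumption)
    naive_var = (sum(sigma2_i) / n) / sxx
    return true_var, naive_var


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sigma2_i = [xi ** 2 for xi in x]                # assumed: error variance grows with X squared
true_var, naive_var = ols_slope_variances(x, sigma2_i)
print(true_var, naive_var)                      # the true variance is the larger of the two
```

When the error variance rises with the squared deviations of X, the homoscedastic formula understates the variance of the slope, which is the situation described above.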
and the mean value of Y, i.e. $\hat{Y}$, or with $X_i$. In the figure below $e_i^2$ are plotted
linear relationship, whereas (d) and (e) indicate a quadratic relationship between $e_i^2$ and
$\hat{Y}_i$.
against $H_1: \beta \neq 0$
If $\beta$ turns out to be statistically significant, it would suggest that heteroscedasticity
is present in the data. If it turns out to be insignificant, we may accept the
assumption of homoscedasticity. The Park test is thus a two-stage test procedure:
in the first stage, we run the OLS regression disregarding the heteroscedasticity
question. We obtain $e_i$ from this regression, and then in the second stage we run
the regression in equation (3.15) above.
Example: Suppose that from a sample of size n = 100 we estimate the relation
between compensation and productivity.
$$Y_i = 1992.342 + 0.2329 X_i + e_i \tag{3.16}$$
SE = (936.479) (0.0998)
t = (2.1275) (2.333), $R^2 = 0.4375$
The results reveal that the estimated slope coefficient is significant at the 5% level of
significance on the basis of a one-tail t-test. The equation shows that as labour
productivity increases by, say, one birr, labour compensation on average increases
by about 23 cents.
The residuals obtained from regression (3.16) were regressed on $X_i$ as suggested
by equation (3.15), giving the following result.
$$\ln e_i^2 = 35.817 - 2.8099 \ln X_i + v_i \tag{3.17}$$
SE = (38.319) (4.216)
t = (0.934) (-0.667), $R^2 = 0.0595$
The above result reveals that the slope coefficient is statistically insignificant,
implying there is no statistically significant relationship between the two variables.
Following the Park test, one may conclude that there is no heteroscedasticity in the
error variance. Although empirically appealing, the Park test has some problems.
Goldfeld and Quandt have argued that the error term $v_i$ entering into
$\ln e_i^2 = \alpha + \beta \ln X_i + v_i$ may not satisfy the OLS assumptions and may itself be
heteroscedastic. Nonetheless, as a strictly exploratory method, one may use the Park
test.
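The two stages of the Park test can be sketched in a few lines of code. This is only an illustration under stated assumptions: the helper names `simple_ols`, `park_second_stage` and `park_test` are hypothetical, the demonstration residuals are made up, and the second stage requires every $X_i$ to be positive and every residual to be non-zero so that the logarithms exist.

```python
import math


def simple_ols(x, y):
    """OLS of y on x with an intercept: returns (intercept, slope, se_slope, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)        # residual variance
    return intercept, slope, math.sqrt(s2 / sxx), resid


def park_second_stage(x, resid):
    """Stage 2: regress ln(e_i^2) on ln(X_i); returns (beta_hat, t_ratio)."""
    ln_x = [math.log(xi) for xi in x]               # requires X_i > 0
    ln_e2 = [math.log(e * e) for e in resid]        # requires e_i != 0
    _, beta, se_beta, _ = simple_ols(ln_x, ln_e2)
    return beta, beta / se_beta


def park_test(x, y):
    """Stage 1 (OLS of Y on X) followed by stage 2 on the stage-1 residuals."""
    _, _, _, resid = simple_ols(x, y)
    return park_second_stage(x, resid)


# Made-up residuals whose spread grows roughly in proportion to X
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
e_demo = [1, -4, 3, -8, 5, -12, 7, -16]
beta_hat, t_ratio = park_second_stage(x_demo, e_demo)
print(beta_hat, t_ratio)
```

A large t-ratio on the slope of the second-stage regression, as in this made-up case, is what the test reads as evidence of heteroscedasticity; in example (3.17) the t-ratio is small, so the opposite conclusion is drawn.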
b. Glejser test:
The Glejser test is similar in spirit to the Park test. After obtaining the residuals $e_i$
from the OLS regression, Glejser suggests regressing the absolute value of $e_i$ on
the $X_i$ variable that is thought to be closely associated with $\sigma_i^2$. In his experiments,
Glejser used the following functional forms:
$$|e_i| = \alpha + \beta X_i + v_i, \qquad |e_i| = \alpha + \beta \sqrt{X_i} + v_i$$
$$|e_i| = \alpha + \beta \frac{1}{X_i} + v_i, \qquad |e_i| = \alpha + \beta \frac{1}{\sqrt{X_i}} + v_i$$
$$|e_i| = \sqrt{\alpha + \beta X_i} + v_i, \qquad |e_i| = \sqrt{\alpha + \beta X_i^2} + v_i$$
where $v_i$ is the error term.
Goldfeld and Quandt point out that the error term $v_i$ has some problems in that its
expected value is non-zero, it is serially correlated and, ironically, it is
heteroscedastic. An additional difficulty with the Glejser method is that models such as
the last two are non-linear in the parameters and therefore cannot be estimated with the usual OLS
procedure. Glejser has found that for large samples the first four preceding
models give generally satisfactory results in detecting heteroscedasticity. As a
practical matter, therefore, the Glejser technique may be used for large samples
and may be used in small samples strictly as a qualitative device to learn something
about heteroscedasticity.
c. Goldfeld-Quandt test
This popular method is applicable if one assumes that the heteroscedastic variance
$\sigma_i^2$ is positively related to one of the explanatory variables in the regression
$$Y_i = \alpha + \beta X_i + U_i$$
If $\sigma_i^2$ is indeed positively related to $X_i$, then $\sigma_i^2$ would be larger the larger
the value of $X_i$. If that turns out to be the case, heteroscedasticity is most likely to be
present in the model. To test this explicitly, Goldfeld and Quandt suggest the
following steps:
Step 1: Order or rank the observations according to the values of $X_i$, beginning
with the lowest X value.
Step 2: Omit c central observations, where c is specified a priori, and divide the
remaining (n - c) observations into two groups, each of $\frac{n-c}{2}$ observations.
Step 3: Fit separate OLS regressions to the first $\frac{n-c}{2}$ observations and the last
$\frac{n-c}{2}$ observations, and obtain the respective residual sums of squares $RSS_1$ and
$RSS_2$, $RSS_1$ representing the RSS from the regression corresponding to the smaller $X_i$
values (the small variance group) and $RSS_2$ that from the larger $X_i$ values (the
large variance group). These RSS each have
$$\frac{n-c}{2} - K \quad \text{or} \quad \frac{n-c-2K}{2} \ df,$$
where K is the number of parameters
to be estimated, including the intercept term, and df is the degrees of freedom.
Step 4: Compute
$$\lambda = \frac{RSS_2/df}{RSS_1/df}$$
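Steps 1 to 4 above can be sketched as follows for the two-variable model. The function names `rss_simple_ols` and `goldfeld_quandt`, the default K = 2 and the ten data points are assumptions made for illustration only; the resulting ratio would still have to be compared with a critical F value taken from tables.

```python
def rss_simple_ols(x, y):
    """Residual sum of squares from an OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, c, k=2):
    """Return (ratio, df) where ratio = (RSS2/df) / (RSS1/df)."""
    pairs = sorted(zip(x, y), key=lambda p: p[0])   # Step 1: order by X
    n = len(pairs)
    m = (n - c) // 2                                # Step 2: size of each group
    low, high = pairs[:m], pairs[n - m:]            # c central observations dropped
    rss1 = rss_simple_ols([p[0] for p in low], [p[1] for p in low])     # Step 3
    rss2 = rss_simple_ols([p[0] for p in high], [p[1] for p in high])
    df = m - k                                      # (n - c)/2 - K
    return (rss2 / df) / (rss1 / df), df            # Step 4


# Made-up data: the scatter around the line is wider for large X
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 2, 5, 5, 6, 9, 6, 7, 12]
ratio, df = goldfeld_quandt(x, y, c=2)
print(ratio, df)
```

Because both groups have the same degrees of freedom, the ratio reduces to $RSS_2/RSS_1$; the degrees of freedom are kept in the code only to mirror the formula in Step 4.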
$$Y_i = 3.4094 + 0.6968 X_i + e_i$$
SE = (8.7049) (0.0744), $R^2 = 0.8887$
$RSS_1 = 377.17$, df = 11
$RSS_2 = 1536.8$, df = 11
From these results we obtain:
$$\lambda = \frac{RSS_2/df}{RSS_1/df} = \frac{1536.8/11}{377.17/11} = 4.07$$
The critical F-value for 11 numerator and 11 denominator df at the 5% level is
2.82. Since the estimated F (= $\lambda$) value exceeds the critical value, we may
conclude that there is heteroscedasticity in the error variance. However, if the level
of significance is fixed at 1%, we may not reject the assumption of
homoscedasticity (why?). Note that the p-value of the observed $\lambda$ is 0.014.
There are also other tests of heteroscedasticity, such as Spearman's rank correlation test,
the Breusch-Pagan-Godfrey test and White's general heteroscedasticity test. But at this
introductory level the above tests are enough.
If we apply OLS to the above, it will result in inefficient parameter estimates, since
$\operatorname{var}(u_i)$ is not constant.
The remedial measure is transforming the above model so that the transformed
model satisfies all the assumptions of the classical regression model including
homoscedasticity. Applying OLS to the transformed variables is known as the
method of Generalized Least Squares (GLS). In short GLS is OLS on the
transformed variables that satisfy the standard least squares assumptions. The
estimators thus obtained are known as GLS estimators, and it is these estimators
that are BLUE.
4.1.8.1 The Method of Generalized (Weighted) Least Squares
Assume that our original model is: $Y_i = \alpha + \beta X_i + U_i$, where $u_i$ satisfies all the
assumptions except that $u_i$ is heteroscedastic:
$$E(u_i^2) = \sigma_i^2 = f(k_i)$$
If we apply OLS to the above model, the estimators are no longer BLUE. To make
them BLUE we have to transform the above model.
Let us assume the following types of heteroscedastic structures, under two
conditions: heteroscedasticity when the population variance $\sigma_i^2$ is known and
when it is not known.
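We can now apply OLS to the above model. The transformed parameters are
BLUE, because all the assumptions, including homoscedasticity, are satisfied for
(3.1).
$$\frac{u_i}{\sigma_i} = \frac{1}{\sigma_i}(Y_i - \alpha - \beta X_i)$$
$$\sum \left(\frac{u_i}{\sigma_i}\right)^2 = \sum \frac{1}{\sigma_i^2}(Y_i - \alpha - \beta X_i)^2, \quad \text{Let } w_i = \frac{1}{\sigma_i^2}$$
$$\sum w_i \hat{u}_i^2 = \sum w_i (Y_i - \hat\alpha - \hat\beta X_i)^2$$
The method of GLS (WLS) minimizes the weighted residual sum of squares $\sum w_i \hat{u}_i^2$:
$$\frac{\partial \sum w_i \hat{u}_i^2}{\partial \hat\alpha} = -2 \sum w_i (Y_i - \hat\alpha - \hat\beta X_i) = 0$$
$$\sum w_i (Y_i - \hat\alpha - \hat\beta X_i) = 0$$
$$\hat\alpha = \frac{\sum w_i Y_i}{\sum w_i} - \hat\beta \frac{\sum w_i X_i}{\sum w_i} = \bar{Y}^* - \hat\beta \bar{X}^*$$
$$\hat\beta = \frac{\sum w_i X_i Y_i - \bar{X}^* \bar{Y}^* \sum w_i}{\sum w_i X_i^2 - \bar{X}^{*2} \sum w_i} = \frac{\sum w_i x^* y^*}{\sum w_i x^{*2}}$$
where $\bar{X}^*$ and $\bar{Y}^*$ are the weighted means and $x^* = X_i - \bar{X}^*$, $y^* = Y_i - \bar{Y}^*$.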
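The weighted least squares formulas for $\hat\alpha$ and $\hat\beta$ translate directly into code. The sketch below assumes the error variances $\sigma_i^2$ are known; the function name `wls_simple` and the sample numbers are hypothetical.

```python
def wls_simple(x, y, sigma2):
    """Weighted least squares for Y = alpha + beta*X + u with known var(u_i) = sigma2[i]."""
    w = [1.0 / s for s in sigma2]                   # w_i = 1 / sigma_i^2
    sw = sum(w)
    x_star = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of X
    y_star = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of Y
    num = sum(wi * (xi - x_star) * (yi - y_star) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - x_star) ** 2 for wi, xi in zip(w, x))
    beta = num / den
    alpha = y_star - beta * x_star
    return alpha, beta


# Made-up data lying exactly on Y = 1 + 2X, with variance assumed proportional to X^2
alpha_hat, beta_hat = wls_simple([1, 2, 3, 4], [3, 5, 7, 9], [1, 4, 9, 16])
print(alpha_hat, beta_hat)
```

With equal weights the same function reproduces the ordinary least squares estimates, which is a quick way to check an implementation.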
The transformed model (dividing through by $X_i$) is:
$$\frac{Y_i}{X_i} = \frac{\alpha}{X_i} + \frac{\beta X_i}{X_i} + \frac{U_i}{X_i} = \frac{\alpha}{X_i} + \beta + \frac{U_i}{X_i}$$
$$E\left(\frac{u_i}{X_i}\right)^2 = \frac{1}{X_i^2} E(u_i^2) = \frac{K^2 X_i^2}{X_i^2} = K^2 = \text{constant}$$
which proves that the new random term in the model has a finite constant variance
($= K^2$). We can, therefore, apply OLS to the transformed version of the model,
$\dfrac{Y_i}{X_i} = \dfrac{\alpha}{X_i} + \beta + \dfrac{U_i}{X_i}$. Note that in this transformation the position of the coefficients has
changed: the parameter of the variable $\dfrac{1}{X_i}$ in the transformed model is the
constant intercept of the original model, while the constant term of the
transformed model is the parameter of the explanatory variable X in the original
model. Therefore, to get back to the original model, we shall have to multiply the
estimated regression by $X_i$.
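Case b. Suppose the heteroscedasticity is of the form: $E(u_i^2) = \sigma_i^2 = k^2 X_i$
The transforming variable is $\sqrt{X_i}$.
The transformed model is:
$$\frac{Y_i}{\sqrt{X_i}} = \frac{\alpha}{\sqrt{X_i}} + \frac{\beta X_i}{\sqrt{X_i}} + \frac{U_i}{\sqrt{X_i}} = \frac{\alpha}{\sqrt{X_i}} + \beta \sqrt{X_i} + \frac{U_i}{\sqrt{X_i}}$$
$$E\left(\frac{u_i}{\sqrt{X_i}}\right)^2 = \frac{1}{X_i} E(U_i^2) = \frac{1}{X_i} k^2 X_i = k^2$$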
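$$\frac{Y_i}{\alpha + \beta X_i} = \frac{\alpha}{\alpha + \beta X_i} + \frac{\beta X_i}{\alpha + \beta X_i} + \frac{U_i}{\alpha + \beta X_i} \tag{i}$$
$$E\left(\frac{U_i}{\alpha + \beta X_i}\right)^2 = \frac{1}{(\alpha + \beta X_i)^2} E(u_i^2) = \frac{K^2 [E(Y_i)]^2}{[E(Y_i)]^2} = K^2$$
The transformed model described in (i) above is, however, not operational in this
case, because the values of $\alpha$ and $\beta$ are not known. But since we can obtain
$\hat{Y} = \hat\alpha + \hat\beta X_i$, the transformation can be made through the following two steps.
1st: we run the usual OLS regression disregarding the heteroscedasticity problem
in the data and obtain $\hat{Y}$. Using the estimated $\hat{Y}$, we transform the model as
follows:
$$\frac{Y_i}{\hat{Y}} = \alpha \frac{1}{\hat{Y}} + \beta \frac{X_i}{\hat{Y}} + \frac{U_i}{\hat{Y}}$$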
It should, therefore, be clear that in order to adopt the necessary corrective
measure (which is through transformation of the original data in such a way as to
obtain a form in which the transformed disturbance term possesses constant
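On transforming the original model we obtain:
$$\frac{Y_i}{X_i} = \frac{\alpha}{X_i} + \beta + \frac{U_i}{X_i}$$
$$\hat\beta = \overline{\left(\frac{Y}{X}\right)} - \hat\alpha \, \overline{\left(\frac{1}{X}\right)}$$
$$\operatorname{var}(\hat\beta) = \frac{\sigma^2 \sum \left(\frac{1}{X_i}\right)^2}{n \sum \left(\frac{1}{X_i} - \overline{\left(\frac{1}{X}\right)}\right)^2} = \frac{K^2 \sum \left(\frac{1}{X_i}\right)^2}{n \sum \left(\frac{1}{X_i} - \overline{\left(\frac{1}{X}\right)}\right)^2}$$
since $\operatorname{var}(\hat\alpha) = \dfrac{\sigma_u^2 \sum X_i^2}{n \sum x^2}$ in OLS.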
4.2 Autocorrelation
4.2.1 The nature of Autocorrelation
If the above assumption is not satisfied, that is, if the value of U in any particular
period is correlated with its own preceding value(s), we say there is
autocorrelation of the random variables. Hence, autocorrelation is defined as a
‘correlation’ between members of series of observations ordered in time or space.
[Figure: patterns of the disturbance term U plotted against time t, panels (a) to (e)]
Figures (a) to (d) above show systematic patterns among the U's, indicating
autocorrelation: figure (a) shows a cyclical pattern, figures (b) and (c) suggest an upward
and a downward linear trend, and (d) indicates a quadratic trend in the disturbance terms.
Figure (e) indicates no systematic pattern, supporting the non-autocorrelation assumption of the
classical linear regression model.
Figures (f) and (g) above similarly indicate positive and negative
autocorrelation respectively, while (h) indicates no autocorrelation.
In general, if the disturbance terms follow a systematic pattern as in (f) and (g), there
is autocorrelation or serial correlation, and if there is no systematic pattern, this
indicates no autocorrelation.
There are several reasons why serial correlation or autocorrelation arises. Some of these are:
a. Cyclical fluctuations
Time series such as GNP, price index, production, employment and unemployment
exhibit business cycles. Starting at the bottom of a recession, when economic
recovery starts, most of these series move upward. In this upswing, the value of a
series at one point in time is greater than its previous value. Thus, there is a
momentum built into them, and it continues until something happens (e.g. an
increase in interest rates or taxes) to slow them down. Therefore, in regressions involving
time series data, successive observations are likely to be interdependent.
b. Specification bias
Let us see one by one how the above specification biases cause autocorrelation.
income, $x_3$ = price of pork and $t$ = time. Now, suppose we run the following
regression in lieu of (3.21):
$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + V_t \tag{3.22}$$
Now, if equation 3.21 is the 'correct' model or true relation, running equation
3.22 is tantamount to letting $V_t = \beta_3 x_{3t} + U_t$. And to the extent that the price of
pork affects the consumption of beef, the error or disturbance term V will reflect
a systematic pattern, thus creating autocorrelation. A simple test of this would be
to run both equation 3.21 and equation 3.22 and see whether the autocorrelation, if
any, observed in equation 3.22 disappears when equation 3.21 is run. The actual
mechanics of detecting autocorrelation will be discussed later.
ii. Incorrect functional form: This is also one source of the autocorrelation of
error term. Suppose the ‘true’ or correct model in a cost-output study is as
follows.
As the figure shows, between points A and B the linear marginal cost curve
will consistently overestimate the true marginal cost, whereas outside these
points it will consistently underestimate the true marginal cost. This result is
to be expected because the disturbance term $V_i$ is, in fact, equal to the omitted
$(\text{output})^2$ term plus $u_i$, and hence will catch the systematic effect of the $(\text{output})^2$ term
on the marginal cost. In this case, $V_i$ will reflect autocorrelation because of
the use of an incorrect functional form.
iii. Neglecting lagged terms from the model: If the dependent variable of a
certain regression model is affected by the lagged value of itself or of the
explanatory variable, and this lagged value is not included in the model, the error term of the
incorrect model will reflect a systematic pattern, which indicates
autocorrelation in the model. Suppose the correct model for consumption
expenditure is:
$$C_t = \alpha + \beta_1 y_t + \beta_2 y_{t-1} + U_t \tag{3.25}$$
but again for some reason we incorrectly regress:
$$C_t = \alpha + \beta_1 y_t + V_t \tag{3.26}$$
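$$E(UU') = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \sigma^2 I_n$$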
3 0 0 1 12 0 3 12 14
0 5 0 1 1 1 1 1 1
2 2
2 2
0 0 3 1 2 1
1 4 2 2
1 1
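$$\hat\rho = \frac{\sum_{t=2}^{n} u_t u_{t-1}}{\sum_{t=2}^{n} u_{t-1}^2} \tag{3.31}$$
$$\hat\rho = \frac{\sum_{t=2}^{n} u_t u_{t-1}}{\sum_{t=2}^{n} u_{t-1}^2} = \frac{\sum_{t=2}^{n} u_t u_{t-1}}{\sqrt{\sum_{t=2}^{n} u_{t-1}^2}\,\sqrt{\sum_{t=2}^{n} u_{t-1}^2}} \approx \frac{\sum_{t=2}^{n} u_t u_{t-1}}{\sqrt{\sum_{t=2}^{n} u_t^2}\,\sqrt{\sum_{t=2}^{n} u_{t-1}^2}} = r_{u_t u_{t-1}} \quad \text{(Why?)} \tag{3.32}$$
$$-1 \leq \hat\rho \leq 1 \quad \text{since} \quad -1 \leq r \leq 1 \tag{3.33}$$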
This proves the statement “we can treat autocorrelation in the same way as
correlation in general”. From our statistics background we know that:
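$$U_t = f(U_{t-1}) = \rho U_{t-1} + v_t$$
$$U_{t-1} = f(U_{t-2}) = \rho U_{t-2} + v_{t-1}$$
$$U_{t-2} = f(U_{t-3}) = \rho U_{t-3} + v_{t-2}$$
$$U_{t-r} = f(U_{t-(r+1)}) = \rho U_{t-(r+1)} + v_{t-r}$$
We make use of the above relations to perform continuous substitutions in
$U_t = \rho U_{t-1} + v_t$ as follows.
$$U_t = \rho U_{t-1} + v_t$$
$$= \rho(\rho U_{t-2} + v_{t-1}) + v_t, \quad \text{since } U_{t-1} = \rho U_{t-2} + v_{t-1}$$
$$= \rho^2 U_{t-2} + \rho v_{t-1} + v_t$$
$$= \rho^2(\rho U_{t-3} + v_{t-2}) + (\rho v_{t-1} + v_t)$$
$$U_t = \rho^3 U_{t-3} + \rho^2 v_{t-2} + \rho v_{t-1} + v_t$$
In this way, if we continue the substitution process for r periods (assuming that r is
very large), we shall obtain:
$$U_t = v_t + \rho v_{t-1} + \rho^2 v_{t-2} + \rho^3 v_{t-3} + \cdots \tag{3.35}$$
because $\rho^r \to 0$ as $r$ grows, since $|\rho| < 1$. Hence
$$u_t = \sum_{r=0}^{\infty} \rho^r v_{t-r} \tag{3.36}$$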
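Now, using this value of $u_t$, let us compute its mean, variance and covariance.
1. To obtain the mean
$$E(U_t) = E\left(\sum_{r=0}^{\infty} \rho^r v_{t-r}\right) = \sum_{r=0}^{\infty} \rho^r E(v_{t-r}) = 0, \quad \text{since } E(v_{t-r}) = 0 \tag{3.37}$$
In other words, we found that the mean of the autocorrelated U's turns out to be zero.
2. To obtain the variance
Because the $v$'s are uncorrelated with each other, the cross-product terms vanish and, with $\operatorname{var}(v_{t-r}) = E(v_{t-r}^2) = \sigma_v^2$,
$$\operatorname{var}(U_t) = E(U_t^2) = \sum_{r=0}^{\infty} \rho^{2r} \sigma_v^2 = \sigma_v^2 (1 + \rho^2 + \rho^4 + \rho^6 + \cdots) = \sigma_v^2 \cdot \frac{1}{1 - \rho^2}$$
$$\operatorname{var}(U_t) = \frac{\sigma_v^2}{1 - \rho^2}, \quad \text{since } |\rho| < 1 \tag{3.38}$$
Thus, the variance of the autocorrelated $u_t$ is $\dfrac{\sigma_v^2}{1 - \rho^2}$, which is a constant value.
From the above, the variance of $U_t$ depends on the nature of the variance of $V_t$. If
$V_t$ is homoscedastic, $U_t$ is homoscedastic, and if $V_t$ is heteroscedastic,
$U_t$ is heteroscedastic.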
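3. To obtain the covariance:
$$\operatorname{cov}(U_t, U_{t-1}) = E(U_t U_{t-1}) = \rho \sigma_v^2 + \rho^3 \sigma_v^2 + \cdots$$
$$= \rho \sigma_v^2 (1 + \rho^2 + \rho^4 + \cdots)$$
$$= \frac{\rho \sigma_v^2}{1 - \rho^2}, \quad \text{since } |\rho| < 1 \tag{3.40}$$
$$\operatorname{cov}(U_t, U_{t-1}) = \frac{\rho \sigma_v^2}{1 - \rho^2} = \rho \sigma_u^2 \tag{3.41}$$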
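The closed forms for the variance and the first-order covariance can be checked numerically by summing the series term by term. The sketch below is only an illustration: the function name `ar1_moments`, the truncation at 200 terms and the values $\rho = 0.5$, $\sigma_v^2 = 3$ are assumptions chosen for the check.

```python
def ar1_moments(rho, sigma2_v, terms=200):
    """Approximate var(U_t) and cov(U_t, U_{t-1}) for U_t = sum_r rho^r * v_{t-r}.

    The infinite sums are truncated after `terms` terms, which is harmless
    when |rho| < 1 because the terms shrink geometrically.
    """
    # var(U_t) = sigma_v^2 * (1 + rho^2 + rho^4 + ...)
    var_u = sum(rho ** (2 * r) * sigma2_v for r in range(terms))
    # The shock v_{t-1-r} has weight rho^(r+1) in U_t and rho^r in U_{t-1}
    cov_1 = sum(rho ** (r + 1) * rho ** r * sigma2_v for r in range(terms))
    return var_u, cov_1


rho, sigma2_v = 0.5, 3.0
var_u, cov_1 = ar1_moments(rho, sigma2_v)
print(var_u, sigma2_v / (1 - rho ** 2))            # both close to 4
print(cov_1, rho * sigma2_v / (1 - rho ** 2))      # both close to 2
```

The second line of output also confirms the relation $\operatorname{cov}(U_t, U_{t-1}) = \rho \sigma_u^2$ in (3.41), since $0.5 \times 4 = 2$.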
$$U_t \sim N\left(0, \frac{\sigma_v^2}{1 - \rho^2}\right) \quad \text{and} \quad E(U_t U_{t-r}) \neq 0 \tag{3.44}$$
We have seen that the ordinary least squares technique is based on basic assumptions.
Some of the basic assumptions are with respect to the mean, variance and covariance
of the disturbance term. Naturally, therefore, if these assumptions do not hold, on
whatever account, the estimators derived by the OLS procedure may not be
efficient. Now, we are in a position to examine the effect of autocorrelation on
OLS estimators. The following are the effects on the estimators if the OLS method is applied
in the presence of autocorrelation in the given data.
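We know that: $\hat\beta = \beta + \sum k_i u_i$
The variance of the estimate $\hat\beta$ in the simple regression model will be biased downwards
(i.e. underestimated) when the u's are autocorrelated. It can be shown as follows.
We know that: $\hat\beta = \beta + \sum k_i u_i$, so that $\hat\beta - \beta = \sum k_i u_i$
$$\operatorname{Var}(\hat\beta) = E(\hat\beta - \beta)^2 = E\left(\sum k_i u_i\right)^2$$
$$= E(k_1 u_1 + k_2 u_2 + \cdots + k_n u_n)^2 = E(k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2 + 2 k_1 k_2 u_1 u_2 + \cdots + 2 k_{n-1} k_n u_{n-1} u_n)$$
$$= E\left(\sum k_i^2 u_i^2 + 2 \sum_{i<j} k_i k_j u_i u_j\right)$$
$$= \sum k_i^2 E(u_i^2) + 2 \sum_{i<j} k_i k_j E(u_i u_j)$$
If $E(u_i u_j) = 0$, which means there is no autocorrelation, the last term disappears,
so that:
$$\operatorname{var}(\hat\beta) = \sigma_u^2 \sum k_i^2 = \frac{\sigma_u^2}{\sum x^2}$$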
If the explanatory variable X of the model is random, the covariance of its
successive values is zero ($\sum x_i x_j = 0$); under such circumstances the bias in $\operatorname{var}(\hat\beta)$
will not be serious even though u is autocorrelated.
4. Wrong testing procedure will make wrong prediction and inference about the
characteristics of the population.
There are two methods that are commonly used to detect the existence or absence
of autocorrelation in the disturbance terms. These are:
1. Graphic method
Dear distance student, recall from section 3.2.2 that autocorrelation can be
presented in graphs in two ways. Detection of autocorrelation using graphs will be
based on these two ways.
a. Apply OLS to the given data, whether it is autocorrelated or not, and obtain
the error terms. Plot $e_t$ horizontally and $e_{t+1}$ vertically, i.e. plot the
following observations: $(e_1, e_2), (e_2, e_3), (e_3, e_4), \ldots, (e_{n-1}, e_n)$. If on
plotting it is found that most of the points fall in quadrants I and III, as
shown in fig (a) below, we say that the given data is autocorrelated and the
type of autocorrelation is positive autocorrelation. If most of the points fall
in quadrants II and IV, as shown in fig (b) below, the autocorrelation is said
to be negative. But if the points are scattered equally in all the quadrants, as
shown in fig (c) below, then we say there is no autocorrelation in the given
data.
This method is called formal because the test is based on the formal testing
procedures you have seen in your statistics course. It is based on either the z-test,
t-test, F-test or $\chi^2$ test. If a test applies any of the above, it is called a formal testing
method. Different econometricians and statisticians suggest different types of
testing methods. But the testing methods most frequently and widely used by
researchers are the following.
A. Run test: Before going into the detailed analysis of this method, let us define what
a run is in this context. Run is the number of positive and negative signs of the
Under the null hypothesis that successive outcomes (here, residuals) are
independent, and assuming that $n_1 > 10$ and $n_2 > 10$, the number of runs is
distributed (asymptotically) normally with:
$$\text{Mean: } E(k) = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$
$$\text{Variance: } \sigma_k^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$
Since k = 5, it clearly falls outside this interval. Therefore we can reject the
hypothesis that the observed sequence of residuals is random (i.e. independent)
with 95% confidence.
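The run test arithmetic can be sketched as below. The function name `runs_test` and the short residual series are hypothetical; the series is far too short for the normal approximation (which assumes $n_1 > 10$ and $n_2 > 10$) and is used only to show how k, the mean and the variance are computed. The multiplier 1.96 gives the usual 95% interval.

```python
import math


def runs_test(residuals):
    """Return (k, mean, variance, 95% interval) for the run test on residual signs."""
    signs = [e > 0 for e in residuals if e != 0]    # True for +, False for -; zeros dropped
    n1 = sum(signs)                                 # number of positive residuals
    n2 = len(signs) - n1                            # number of negative residuals
    # A new run starts whenever the sign changes
    k = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    half_width = 1.96 * math.sqrt(var)
    return k, mean, var, (mean - half_width, mean + half_width)


# Made-up residuals: three positive values followed by three negative values
k, mean_k, var_k, interval = runs_test([0.4, 1.2, 0.3, -0.8, -0.1, -1.5])
print(k, mean_k, var_k, interval)
```

If the observed number of runs k lies outside the interval, the hypothesis of randomness is rejected, as in the example above where k = 5.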
B. The Durbin-Watson d test: The most celebrated test for detecting serial
correlation is the one developed by the statisticians Durbin and Watson. It is
popularly known as the Durbin-Watson d statistic, which is defined as:
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \tag{3.47}$$
1. The regression model includes an intercept term. If such a term is not present,
as in the case of regression through the origin, it is essential to rerun the
regression including the intercept term to obtain the RSS.
4. The regression model does not include lagged values of the dependent variable Y
as one of the explanatory variables. Thus, the test is inapplicable
to models of the following type:
$$y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + \gamma y_{t-1} + U_t$$
where $y_{t-1}$ is the one-period lagged value of y. Such models are known as
autoregressive models. If the d-test is applied mistakenly, the value of d in such
cases will often be around 2, which is the value of d in the absence of first
order autocorrelation. Durbin developed the so-called h-statistic to test
serial correlation in such autoregressive models.
In using the Durbin-Watson test, it is, therefore, important to note that it cannot be
applied in violation of any of the above five assumptions.
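Dear distance student, from equation 3.47 the value of d is
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = \frac{\sum e_t^2 + \sum e_{t-1}^2 - 2 \sum e_t e_{t-1}}{\sum e_t^2}$$
However, for large samples $\sum_{t=2}^{n} e_t^2 \approx \sum_{t=2}^{n} e_{t-1}^2$, because in both cases one
observation is lost. Thus
$$d \approx 2\left(1 - \frac{\sum e_t e_{t-1}}{\sum e_t^2}\right)$$
but, as defined earlier, $\hat\rho = \dfrac{\sum e_t e_{t-1}}{\sum e_t^2}$, so that
$$d \approx 2(1 - \hat\rho)$$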
$$\text{if } \hat\rho = 0,\ d \approx 2; \qquad \text{if } \hat\rho = 1,\ d \approx 0; \qquad \text{if } \hat\rho = -1,\ d \approx 4$$
Thus we obtain two important conclusions
However, because the exact sampling distribution of d is not known, there is no
unique critical value of the d statistic; instead there are ranges of values
within which we can either accept or reject the null hypothesis. We have a lower bound $d_L$
and an upper bound $d_U$ of the critical values of d to accept or reject the null hypothesis.
For the two-tailed Durbin Watson test, we have set five regions to the values of d
as depicted in the figure below.
The mechanics of the D.W test are as follows, assuming that the assumptions
underlying the test are fulfilled.
- Obtain the computed value of d using the formula given in equation 3.47.
- For the given sample size and given number of explanatory variables, find
out the critical $d_L$ and $d_U$ values.
- Compare the computed value of d with $d_L$, $d_U$, $(4 - d_L)$ and $(4 - d_U)$.
$(4 - d_L) = 4 - 1.37 = 2.63$
$(4 - d_U) = 4 - 1.5 = 2.50$
Since d is less than $d_L$, we reject the null hypothesis of no autocorrelation.
Example 2. Consider the model $Y_t = \alpha + \beta X_t + U_t$ with the following observations on X and Y:
X: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y: 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11
Solution:
1. Regress Y on X, i.e. $Y_t = \alpha + \beta X_t + U_t$:
$$\hat\beta = \frac{\sum xy}{\sum x^2} = \frac{255}{280} = 0.91$$
$$\hat{Y} = -0.29 + 0.91 X$$
$$d = \frac{\sum (e_t - e_{t-1})^2}{\sum e_t^2} = \frac{60.213}{41.767} = 1.442$$
$(4 - d_U) = 2.64$
$d_U < d < 4 - d_U$, i.e. $1.364 < 1.442 < 2.64$, with $d^* = 1.442$.
Since $d^*$ lies between $d_U$ and $4 - d_U$, we accept $H_0$. This implies that the data are not autocorrelated.
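The figures in Example 2 can be reproduced with a short script. The function name `durbin_watson` is hypothetical, and the script uses the unrounded slope, so the numerator of d differs slightly from the 60.213 obtained with the rounded value 0.91; the resulting d agrees with the text to three decimal places.

```python
def durbin_watson(e):
    """Durbin-Watson d statistic (equation 3.47) for a list of residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den


X = list(range(1, 16))
Y = [2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))   # 255
sxx = sum((xi - x_bar) ** 2 for xi in X)                         # 280
beta_hat = sxy / sxx                      # about 0.91
alpha_hat = y_bar - beta_hat * x_bar      # about -0.29
resid = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(X, Y)]
d = durbin_watson(resid)                  # about 1.44
print(beta_hat, alpha_hat, d)
```

The computed d is then compared with the tabulated bounds exactly as in the solution above.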
Although the D.W test is extremely popular, the d test has one great drawback in that
In many situations, however, it has been found that the upper limit $d_U$ is
approximately the true significance limit. Thus, the modified DW test is based on
$d_U$: in case the estimated d value lies in the inconclusive zone, one can use the
following modified d test procedure. Given the level of significance $\alpha$; if
Since in the presence of serial correlation the OLS estimators are inefficient, it is
essential to seek remedial measures. The remedy, however, depends on what
knowledge one has about the nature of interdependence among the disturbances.
This means the remedy depends on whether the coefficient of autocorrelation is
known or not known.
and $U_t = \rho U_{t-1} + V_t$, $|\rho| < 1$ (3.50)
$Y_{t-1} = \alpha + \beta X_{t-1} + U_{t-1}$ (3.51)
Let: $Y_t^* = Y_t - \rho Y_{t-1}$
$a = \alpha(1 - \rho)$
$X_t^* = X_t - \rho X_{t-1}$
$$Y_t^* = a + \beta X_t^* + v_t \tag{3.54}$$
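It may be noted that in transforming Equation (3.49) into (3.54) one observation
shall be lost because of lagging and subtracting in (3.52). We can apply OLS to the
transformed relation in (3.54) to obtain $\hat{a}$ and $\hat\beta$ for our two parameters $a$ and $\beta$.
$\hat\alpha = \dfrac{\hat{a}}{1 - \rho}$, and it can be shown that
$$\operatorname{var}(\hat\alpha) = \left(\frac{1}{1 - \rho}\right)^2 \operatorname{var}(\hat{a})$$
$$\operatorname{var}(\hat{a}) = \frac{\sigma_v^2 \sum X_t^{*2}}{n \sum (X_t^* - \bar{X}^*)^2}, \qquad \operatorname{var}(\hat\beta) = \frac{\sigma_v^2}{\sum (X_t^* - \bar{X}^*)^2}$$
where $\sigma_v^2$ is the variance of the disturbance $v_t$ of the transformed model.
Estimators obtained from the transformed model (3.54) are efficient only if our sample size is large, so
that the loss of one observation becomes negligible.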
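The transformation with a known $\rho$ can be sketched as follows. The function names `quasi_difference` and `gls_known_rho`, the value $\rho = 0.5$ and the data are hypothetical; the data are constructed to lie exactly on $Y = 2 + 3X$ so that the recovered coefficients are easy to verify.

```python
def quasi_difference(series, rho):
    """Return z*_t = z_t - rho * z_{t-1} for t = 2..n (one observation is lost)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


def gls_known_rho(x, y, rho):
    """OLS on the transformed model Y* = a + beta X* + v; returns (alpha_hat, beta_hat)."""
    xs, ys = quasi_difference(x, rho), quasi_difference(y, rho)
    m = len(xs)
    xs_bar, ys_bar = sum(xs) / m, sum(ys) / m
    sxx = sum((xi - xs_bar) ** 2 for xi in xs)
    sxy = sum((xi - xs_bar) * (yi - ys_bar) for xi, yi in zip(xs, ys))
    beta_hat = sxy / sxx
    a_hat = ys_bar - beta_hat * xs_bar      # estimate of a = alpha * (1 - rho)
    alpha_hat = a_hat / (1 - rho)           # back out the original intercept
    return alpha_hat, beta_hat


# Made-up data lying exactly on Y = 2 + 3X
x = [1, 2, 4, 7, 11]
y = [5, 8, 14, 23, 35]
alpha_hat, beta_hat = gls_known_rho(x, y, rho=0.5)
print(alpha_hat, beta_hat)
```

Note that the slope of the transformed regression is the slope of the original model, while the intercept has to be divided by $(1 - \rho)$ to recover $\alpha$.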
When $\rho$ is not known, it has to be estimated. We describe below the methods through which the
coefficient of autocorrelation can be estimated.
Many times an investigator makes some reasonable guess about the value of the
autoregressive coefficient by using his knowledge or intuition about the
relationship under study. Many researchers usually assume that $\rho = 1$ or $-1$.
$$(Y_t - Y_{t-1}) = \beta (X_t - X_{t-1}) + V_t, \quad \text{where } V_t = U_t - U_{t-1}$$
Note that the constant term is suppressed in the above. $\hat\beta$ is obtained by
merely taking the first differences of the variables and fitting a line that passes through
the origin. Suppose that one assumes $\rho = -1$ instead of $\rho = 1$, i.e. the case of
perfect negative autocorrelation. In such a case, the transformed model becomes:
$$Y_t + Y_{t-1} = 2\alpha + \beta (X_t + X_{t-1}) + v_t$$
or
$$\frac{Y_t + Y_{t-1}}{2} = \alpha + \beta \frac{(X_t + X_{t-1})}{2} + \frac{v_t}{2}$$
This model is then called a two-period moving average regression model, because
actually we are regressing the value of one moving average, $\dfrac{Y_t + Y_{t-1}}{2}$, on another,
$\dfrac{X_t + X_{t-1}}{2}$.
This method of first differences is quite popular in applied research for its
simplicity. But the method rests on the assumption that there is either perfect
positive or perfect negative autocorrelation in the data.
$d \approx 2(1 - \hat{\rho})$
of certain data. Given the d-value we can estimate $\rho$ from this: $\hat{\rho} \approx 1 - \dfrac{1}{2} d$
As already pointed out, $\hat{\rho}$ will not be accurate if the sample size is small. The
above relationship is true only for large samples. For small samples, Theil and
Nagar have suggested the following relation:
$\hat{\rho} = \dfrac{n^2 (1 - d/2) + k^2}{n^2 - k^2}$ ………………………………………………..(3.55)
We estimate $\hat{\rho}$ from the above relation. With the estimated $\hat{\rho}$, we transform the
original data and then apply OLS to the model.
$(Y_t - \hat{\rho} Y_{t-1}) = \alpha (1 - \hat{\rho}) + \beta (X_t - \hat{\rho} X_{t-1}) + (U_t - \hat{\rho} U_{t-1})$ ……………......…(3.57)
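The two ways of obtaining an estimate of the autocorrelation coefficient from the d-statistic can be compared numerically. The following Python sketch is only an illustration: the values of d, n and k are hypothetical, and k is assumed to count the estimated coefficients including the intercept.

```python
def rho_from_d(d):
    """Large-sample approximation: d is about 2(1 - rho), so rho is about 1 - d/2."""
    return 1 - d / 2


def rho_theil_nagar(d, n, k):
    """Small-sample relation of Theil and Nagar (3.55)."""
    return (n ** 2 * (1 - d / 2) + k ** 2) / (n ** 2 - k ** 2)


d, n, k = 1.2, 20, 2  # hypothetical values for illustration
print(round(rho_from_d(d), 4))
print(round(rho_theil_nagar(d, n, k), 4))
```

As n grows, the Theil-Nagar value approaches the large-sample approximation, since the $k^2$ terms become negligible relative to $n^2$.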
method. However, one can follow an alternative approach: at each step of the
iteration, apply the Durbin-Watson d-statistic to the residuals to test for
autocorrelation, and continue until autocorrelation is no longer detected or until
successive estimates of $\rho$ do not differ substantially from one another.
Method IV: Durbin’s two-stage method: Assuming the first order autoregressive
scheme, Durbin suggests a two-stage procedure for resolving the serial correlation
problem. The steps under this method are:
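Given $Y_t = \alpha + \beta X_t + u_t$ -----------------------------------(3.59)
$u_t = \rho u_{t-1} + v_t$
$Y_t = \alpha^* + \rho Y_{t-1} + \beta X_t - \rho \beta X_{t-1} + v_t$, where $\alpha^* = \alpha (1 - \rho)$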
4.3 Multicollinearity
4.3.1 The nature of Multicollinearity
Originally, multicollinearity meant the existence of a “perfect”, or exact, linear
relationship among some or all explanatory variables of a regression model. For a
k-variable regression involving the explanatory variables $x_1, x_2, \ldots, x_k$, an exact
linear relationship is said to exist if the following condition is satisfied:
$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k = 0$ (1)
where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are constants such that not all of them are simultaneously
zero.
Today, however, the term multicollinearity is used in a broader sense to include the
case of perfect multicollinearity, as shown by (1), as well as the case where the x-
variables are inter-correlated but not perfectly so, as follows:
$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k + v_i = 0$ (2)
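Substituting $x_2 = \lambda x_1$ (perfect collinearity) into the OLS formula for $\hat{\beta}_1$:
$\hat{\beta}_1 = \dfrac{\lambda^2 \sum x_1 y \sum x_1^2 - \lambda^2 \sum x_1 y \sum x_1^2}{\lambda^2 (\sum x_1^2)^2 - \lambda^2 (\sum x_1^2)^2} = \dfrac{0}{0} \Rightarrow$ indeterminate.
Applying the same procedure, we obtain a similar result (indeterminate value) for
$\hat{\beta}_2$. Likewise, from our discussion of the multiple regression model, the variance of $\hat{\beta}_1$
is given by: $\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2 \sum x_2^2}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2}$
Substituting $x_2 = \lambda x_1$:
$\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2 \lambda^2 \sum x_1^2}{0} \Rightarrow$ infinite.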
These are the consequences of perfect multicollinearity. One may raise the
question on consequences of less than perfect correlation. In cases of near or high
multicollinearity, one is likely to encounter the following consequences.
determinate.
This proves that if we have less than perfect multicollinearity the OLS coefficients
are determinate.
The implication of the indeterminacy of the regression coefficients in the case of perfect
multicollinearity is that it is not possible to observe the separate influence of
$x_1$ and $x_2$. But such an extreme case is not very frequent in practical applications.
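$\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2 \sum x_2^2 \cdot \dfrac{1}{\sum x_2^2}}{\left[\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2\right] \cdot \dfrac{1}{\sum x_2^2}}$
$= \dfrac{\sigma^2}{\sum x_1^2 - \dfrac{(\sum x_1 x_2)^2}{\sum x_2^2}}$
$= \dfrac{\sigma^2}{\sum x_1^2 \left[1 - \dfrac{(\sum x_1 x_2)^2}{\sum x_1^2 \sum x_2^2}\right]} = \dfrac{\sigma^2}{\sum x_1^2 (1 - r_{12}^2)}$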
As $r_{12}$ increases toward one, the covariance of the two estimators increases in
absolute value. The speed with which the variances and covariance increase can be
seen with the variance-inflating factor (VIF), which is defined as:
$VIF = \dfrac{1}{1 - r_{12}^2}$
This shows that the variances of $\hat{\beta}_1$ and $\hat{\beta}_2$ are directly proportional to the VIF.
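How quickly the variance-inflating factor grows as $r_{12}$ approaches one can be seen numerically. The Python sketch below is a minimal illustration for the two-regressor case; the function names and the values chosen for $\sigma^2$ and $\sum x_1^2$ are hypothetical.

```python
def vif(r12):
    """Variance-inflating factor for two regressors with correlation r12."""
    return 1.0 / (1.0 - r12 ** 2)


def var_beta1(sigma2, sum_x1_sq, r12):
    """var(beta1_hat) = (sigma^2 / sum of x1^2) * VIF."""
    return sigma2 / sum_x1_sq * vif(r12)


for r12 in (0.0, 0.5, 0.9, 0.99):
    print(r12, round(vif(r12), 2), round(var_beta1(1.0, 10.0, r12), 4))
```

With no collinearity the factor equals one, so the variance is not inflated at all; the inflation becomes severe only when $r_{12}$ is close to one.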
4. Because of the large variances of the estimators, which mean large standard
errors, the confidence intervals tend to be much wider, leading to the acceptance of
the “zero null hypothesis” (i.e. the true population coefficient is zero) more readily.
5. Because of the large standard errors of the estimators, the computed t-ratios will be
very small, so that one or more of the coefficients tend to be statistically
insignificant when tested individually.
6. Although the t-ratio of one or more of the coefficients is very small (which
makes the coefficients statistically insignificant individually), $R^2$, the overall
measure of goodness of fit, can be very high.
Example: if $y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + v_i$
In cases of high collinearity, it is possible to find that one or more of the partial
slope coefficients are individually statistically insignificant on the basis of the t-test.
But the $R^2$ in such situations may be very high, say in excess of 0.9. In such a case, on
the basis of the F test one can convincingly reject the hypothesis that
$\beta_1 = \beta_2 = \cdots = \beta_k = 0$. Indeed, this is one of the signals of multicollinearity:
insignificant t-values but a high overall $R^2$.
7. The OLS estimators and their standard errors can be sensitive to small changes
in the data.
Dear Readers! These are the major consequences of near or high multicollinearity.
If you have any comments or suggestions, you are welcome!
ii. A high $r_{x_i x_j}$ is only a sufficient (adequate) but not a necessary condition
for the existence of multicollinearity, because multicollinearity can also exist even
if the correlation coefficient is low.
However, the combination of all these criteria should help the detection of
multicollinearity.
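4.3.4.2 The Farrar-Glauber test - They use three statistics for testing
multicollinearity. These are chi-square, F-ratio and t-ratio. This test may be
outlined in three steps.
A. Computation of $\chi^2$ to test orthogonality: two variables are called orthogonal
if $r_{x_i x_j} = 0$. For three explanatory variables, the standardized determinant is:
$\begin{vmatrix}
\dfrac{\sum x_1^2}{\sqrt{(\sum x_1^2)^2}} & \dfrac{\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}} & \dfrac{\sum x_1 x_3}{\sqrt{\sum x_1^2 \sum x_3^2}} \\
\dfrac{\sum x_2 x_1}{\sqrt{\sum x_2^2 \sum x_1^2}} & \dfrac{\sum x_2^2}{\sqrt{(\sum x_2^2)^2}} & \dfrac{\sum x_2 x_3}{\sqrt{\sum x_2^2 \sum x_3^2}} \\
\dfrac{\sum x_3 x_1}{\sqrt{\sum x_3^2 \sum x_1^2}} & \dfrac{\sum x_3 x_2}{\sqrt{\sum x_3^2 \sum x_2^2}} & \dfrac{\sum x_3^2}{\sqrt{(\sum x_3^2)^2}}
\end{vmatrix}
=
\begin{vmatrix}
1 & r_{12} & r_{13} \\
r_{12} & 1 & r_{23} \\
r_{13} & r_{23} & 1
\end{vmatrix}$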
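The role of the standardized determinant can be illustrated with a small Python sketch for three explanatory variables. The test statistic itself is not shown in this part of the text; the sketch assumes the commonly quoted Farrar-Glauber form $\chi^2 = -[n - 1 - \frac{1}{6}(2k + 5)] \ln(\text{determinant})$ with $k(k-1)/2$ degrees of freedom, and the correlation values and sample size are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def farrar_glauber_chi2(r12, r13, r23, n):
    """Return (standardized determinant, chi-square statistic, degrees of freedom)."""
    k = 3  # number of explanatory variables in this sketch
    corr = [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]
    det = det3(corr)
    chi2 = -(n - 1 - (2 * k + 5) / 6) * math.log(det)
    return det, chi2, k * (k - 1) // 2


# Orthogonal regressors: determinant equals one and the statistic is zero.
print(farrar_glauber_chi2(0.0, 0.0, 0.0, n=20))
# Strongly correlated x1 and x2: determinant falls toward zero, statistic rises.
print(farrar_glauber_chi2(0.9, 0.0, 0.0, n=20))
```

A determinant of one corresponds to orthogonal regressors, while a determinant near zero signals near-perfect multicollinearity; the computed statistic would then be compared with the tabulated chi-square value.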
$H_1: r_{x_i x_j . x_1, x_2, x_3, \ldots, x_k} \neq 0$
$t^* = \dfrac{r_{x_i x_j . x_1, x_2, x_3, \ldots, x_k} \sqrt{n - k}}{\sqrt{1 - r^2_{x_i x_j . x_1, x_2, x_3, \ldots, x_k}}}$ (How?)
If $t^* > t$ (tabulated), $H_0$ is rejected.
If $t^* < t$ (tabulated), $H_0$ is accepted; that is, we accept that $X_i$ and $X_j$ are not the cause of
multicollinearity (since $r_{x_i x_j}$ is not significant).
In addition, using these values we can derive the condition index (CI), defined as
$CI = \sqrt{\dfrac{\text{max. eigenvalue}}{\text{min. eigenvalue}}} = \sqrt{k}$
as $TOL_j = \dfrac{1}{VIF_j} = (1 - R_j^2)$
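These two measures are simple to compute once the eigenvalues and the auxiliary $R_j^2$ are available. The Python sketch below assumes that k denotes the ratio of the largest to the smallest eigenvalue; the eigenvalues and the $R_j^2$ value are hypothetical.

```python
import math


def condition_number(eigenvalues):
    """k = maximum eigenvalue / minimum eigenvalue."""
    return max(eigenvalues) / min(eigenvalues)


def condition_index(eigenvalues):
    """CI = square root of the condition number k."""
    return math.sqrt(condition_number(eigenvalues))


def tolerance(r_squared_j):
    """TOL_j = 1 - R_j^2, where R_j^2 comes from regressing x_j on the other regressors."""
    return 1 - r_squared_j


def vif_from_tolerance(tol):
    """VIF_j is the reciprocal of the tolerance."""
    return 1 / tol


eigenvalues = [3.2, 0.7, 0.002]  # hypothetical eigenvalues
print(round(condition_number(eigenvalues), 1), round(condition_index(eigenvalues), 1))
print(round(tolerance(0.9), 2), round(vif_from_tolerance(tolerance(0.9)), 2))
```

A tolerance close to zero (equivalently, a large VIF) indicates that $x_j$ is almost an exact linear combination of the other regressors.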
seen earlier, $\mathrm{var}(\hat{\beta}_i) = \dfrac{\sigma^2}{\sum x_i^2} (VIF)$ depends on three factors: $\sigma^2$, $\sum x_i^2$ and VIF. A
high VIF can be counterbalanced by a low $\sigma^2$ or a high $\sum x_i^2$. To put it differently, a
high VIF is neither necessary nor sufficient to get high variances and high standard
errors. Therefore, high multicollinearity, as measured by a high VIF, may not
necessarily cause high standard errors.
4.3.5 Remedial measures
It is more difficult to deal with models indicating the existence of multicollinearity
than detecting the problem of multicollinearity. Different remedial measures have
been suggested by econometricians; depending on the severity of the problem,
availability of other sources of data and the importance of the variables, which are
found to be multicollinear in the model.
Some suggest that minor degree of multicollinearity can be tolerated although one
should be a bit careful while interpreting the model under such conditions. Others
suggest removing the variables that show multicollinearity if it is not important in
the model. But, by doing so, the desired characteristics of the model may then get
affected. However, the following corrective procedures have been suggested if the
problem of multicollinearity is found to be serious.
1. Increase the size of the sample: it is suggested that multicollinearity may be
avoided or reduced if the size of the sample is increased, because the covariances
among the parameter estimates are inversely related to the sample size. But we
should remember that this will be true when intercorrelation happens to exist only
in the sample but not in the population of the variables. If the variables are
collinear in the population, the procedure of increasing the size of the sample will
not help to reduce multicollinearity.
Applying OLS method: $\hat{\beta}_1 = \dfrac{\sum x_1 (y - \hat{\beta}_2^* x_2)}{\sum x_1^2} = \dfrac{\sum x_1 y - \hat{\beta}_2^* \sum x_1 x_2}{\sum x_1^2}$
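The use of a prior estimate can be sketched in Python as follows. The sketch assumes a two-regressor model in deviation form, $y = \beta_1 x_1 + \beta_2 x_2 + u$, in which $\beta_2$ is fixed at an outside value $\hat{\beta}_2^*$; the function name and the data are hypothetical.

```python
def beta1_given_beta2(x1, x2, y, beta2_star):
    """OLS slope of (y - beta2_star * x2) on x1, with all variables in deviation form."""
    numerator = sum(a * (c - beta2_star * b) for a, b, c in zip(x1, x2, y))
    denominator = sum(a * a for a in x1)
    return numerator / denominator


# Made-up deviations from the mean; x1 and x2 move closely together.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.1, 0.9, 2.0]
y = [1.5 * a + 0.8 * b for a, b in zip(x1, x2)]  # disturbance-free for clarity
beta1_hat = beta1_given_beta2(x1, x2, y, beta2_star=0.8)
print(round(beta1_hat, 6))
```

Since the prior value 0.8 equals the coefficient used to generate the data, the sketch recovers 1.5 for the remaining coefficient; an inaccurate prior value would carry over into the estimate.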
$Q^* = A^* + \alpha L^* + \beta K^* + U$
The asterisk indicates logs of the variables. Suppose it is observed that K and L
move together so closely that it is difficult to separate the effect of changing
quantities of labor inputs on output from the effect of variation in the use of
capital. Again, let us assume that, on the basis of information from some other
source, we have solid evidence that the present industry is characterized by
constant returns to scale. This implies that $\alpha + \beta = 1$. We can therefore, on the basis
of this information, substitute $\alpha = (1 - \beta)$ in the transformed function. On
combining the results, the relationship becomes:
$(Q^* - L^*) = A^* + \beta (K^* - L^*) + U$
$\hat{D}_t = \hat{\beta}_1 P_t + \hat{\beta}_2^* Y_t + \hat{u}_t$
where $\hat{\beta}_1$ is derived from the time series data and $\hat{\beta}_2^*$ is obtained by using the
cross-section data. By following the pooling technique, we have skirted the
multicollinearity between income and price.
The methods described above are not sure ways of getting rid of the problem of
multicollinearity. Which of these rules works in practice will depend on the nature
of the data under investigation and the severity of the multicollinearity problem.
Chapter Five
Model (5.01) may enable us to find out whether sex makes any difference in a
college professor’s salary, assuming, of course, that all other variables such as age,
degree attained, and years of experience are held constant. Assuming that the
disturbances satisfy the usual assumptions of the classical linear regression
model, we obtain from (5.01):
Mean salary of female college professor: $E(Y_i \mid D_i = 0) = \alpha$ -------(5.02)
Mean salary of male college professor: $E(Y_i \mid D_i = 1) = \alpha + \beta$
that is, the intercept term $\alpha$ gives the mean salary of female college professors
and the slope coefficient $\beta$ tells by how much the mean salary of a male college
professor differs from the mean salary of his female counterpart, $\alpha + \beta$ reflecting
the mean salary of the male college professor. A test of the null hypothesis that
there is no sex discrimination ($H_0: \beta = 0$) can be easily made by running
regression (5.01) in the usual manner and finding out whether, on the basis of the t
test, the estimated $\beta$ is statistically significant.
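That the intercept and slope of a regression on a single dummy reproduce the two group means can be checked with a short Python sketch. It assumes the model $Y_i = \alpha + \beta D_i + u_i$ with $D_i = 1$ for male and 0 for female professors; the salary figures are made up for illustration.

```python
def ols_simple(x, y):
    """OLS intercept and slope for y = a + b*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical salaries (in thousands); D = 0 for female, D = 1 for male.
dummy = [0, 0, 0, 0, 1, 1, 1, 1]
salary = [18.0, 19.0, 17.0, 18.0, 21.0, 22.0, 20.0, 21.0]

alpha_hat, beta_hat = ols_simple(dummy, salary)
mean_female = sum(s for d, s in zip(dummy, salary) if d == 0) / dummy.count(0)
mean_male = sum(s for d, s in zip(dummy, salary) if d == 1) / dummy.count(1)
print(alpha_hat, beta_hat, mean_female, mean_male)
```

The estimated intercept equals the female mean and the estimated slope equals the difference between the male and female means, which is what (5.02) states in terms of expectations.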
5.2 Regression on one quantitative variable and one qualitative variable with
two classes
If the assumption of common slopes is valid, a test of the hypothesis that the two
regressions (5.04) and (5.05) have the same intercept (i.e., there is no sex
discrimination) can be made easily by running the regression (5.03) and noting the
statistical significance of the estimated $\alpha_2$ on the basis of the traditional t test. If
the t test shows that $\hat{\alpha}_2$ is statistically significant, we reject the null hypothesis
that the male and female college professors’ levels of mean annual salary are the
same.
Before proceeding further, note the following features of the dummy variable
regression model considered previously.
has two categories, and hence we introduced only a single dummy variable.
If this rule is not followed, we shall fall into what might be called the
dummy variable trap, that is, the situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and
female, is arbitrary in the sense that in our example we could have assigned
D=1 for female and D=0 for male.
3. The group, category, or classification that is assigned the value of 0 is often
referred to as the base, benchmark, control, comparison, reference, or
omitted category. It is the base in the sense that comparisons are made with
that category.
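The dummy variable trap mentioned in the first feature can be made concrete with a small Python sketch. It builds the cross-product matrix X'X for a regression with an intercept and checks its determinant; the six observations and the helper names are hypothetical.

```python
def cross_products(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


const = [1, 1, 1, 1, 1, 1]
male = [1, 0, 1, 1, 0, 0]
female = [1 - d for d in male]  # male + female equals the constant column

print(det(cross_products([const, male, female])))  # two dummies plus intercept
print(det(cross_products([const, male])))          # one dummy plus intercept
```

With both dummies and an intercept the determinant of X'X is zero, so the normal equations have no unique solution; dropping one dummy makes the determinant positive and the coefficients determinate.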
5.3 Regression on one quantitative variable and one qualitative variable with
more than two classes
Suppose that, on the basis of the cross-sectional data, we want to regress the
annual expenditure on health care by an individual on the income and education of
the individual. Since the variable education is qualitative in nature, suppose we
consider three mutually exclusive levels of education: less than high school, high
school, and college. Now, unlike the previous case, we have more than two
categories of the qualitative variable education. Therefore, following the rule that
the number of dummies be one less than the number of categories of the variable,
we should introduce two dummies to take care of the three levels of education.
Assuming that the three educational groups have a common slope but different
intercepts in the regression of annual expenditure on health care on annual income,
we can use the following model:
= 0 otherwise
$D_3 = 1$ if college education
= 0 otherwise
Note that in the preceding assignment of the dummy variables we are arbitrarily
treating the “less than high school education” category as the base category.
Therefore, the intercept $\alpha_1$ will reflect the intercept for this category. The
differential intercepts $\alpha_2$ and $\alpha_3$ tell by how much the intercepts of the other two
categories differ from the intercept of the base category, which can be readily
checked as follows: Assuming $E(u_i) = 0$, we obtain from (5.06)
$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$
$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$
which are, respectively, the mean health care expenditure functions for the three
levels of education, namely, less than high school, high school, and college.
Geometrically, the situation is shown in fig 5.2 (for illustrative purposes it is
assumed that $\alpha_3 > \alpha_2$).
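The coding of the two dummies and the resulting parallel expenditure functions can be sketched in Python. The sketch assumes the model $Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$, with $D_2 = 1$ for high school education; the coefficient values and the income level are hypothetical.

```python
def education_dummies(level):
    """Return (D2, D3); 'less than high school' is the base category."""
    return (1 if level == "high school" else 0, 1 if level == "college" else 0)


def mean_expenditure(level, income, a1, a2, a3, beta):
    """Mean health care expenditure for a given education level and income."""
    d2, d3 = education_dummies(level)
    return a1 + a2 * d2 + a3 * d3 + beta * income


coefficients = dict(a1=100.0, a2=40.0, a3=90.0, beta=0.25)  # hypothetical values
for level in ("less than high school", "high school", "college"):
    print(level, mean_expenditure(level, 400.0, **coefficients))
```

At any common income the three functions differ only by the differential intercepts, here 40 and 90 relative to the base category, while the slope is the same for all three groups.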
= 0 otherwise
$D_3 = 1$ if white
= 0 otherwise
Notice that each of the two qualitative variables, sex and color, has two categories
and hence needs one dummy variable for each. Note also that the omitted, or base,
category now is “black female professor.”
Assuming E (u i ) 0 , we can obtain the following regression from (5.07)
Once again, it is assumed that the preceding regressions differ only in the intercept
coefficient but not in the slope coefficient $\beta$.
An OLS estimation of (5.07) will enable us to test a variety of hypotheses. Thus,
if $\alpha_3$ is statistically significant, it will mean that color does affect a professor’s
salary. Similarly, if $\alpha_2$ is statistically significant, it will mean that sex also affects
a professor’s salary. If both these differential intercepts are statistically
significant, it would mean that sex as well as color is an important determinant of
professors’ salaries.
From the preceding discussion it follows that we can extend our model to include
more than one quantitative variable and more than two qualitative variables. The
only precaution to be taken is that the number of dummies for each qualitative
variable should be one less than the number of categories of that variable.
whether two (or more) regressions are different, where the difference may be in
the intercepts or the slopes or both.
$D_2 = 1$ if female
= 0 if male
$D_3 = 1$ if college graduate
= 0 otherwise
Implicit in this model is the assumption that the differential effect of the sex
dummy D2 is constant across the two levels of education and the differential
effect of the education dummy D3 is also constant across the two sexes. That is,
if, say, the mean expenditure on clothing is higher for females than males this is so
whether they are college graduates or not. Likewise, if, say, college graduates on
the average spend more on clothing than non-college graduates, this is so whether
they are females or males.
which shows that the mean clothing expenditure of graduate females is different
(by $\alpha_4$) from the mean clothing expenditure of females or college graduates. If
$\alpha_2$, $\alpha_3$, and $\alpha_4$ are all positive, the average clothing expenditure of females is
higher (than the base category, which here is male nongraduate), but it is much
more so if the females also happen to be graduates. Similarly, the average
expenditure on clothing by a college graduate tends to be higher than the base
category but much more so if the graduate happens to be a female. This shows
how the interaction dummy modifies the effect of the two attributes considered
individually. Whether the coefficient of the interaction dummy is statistically
significant can be tested by the usual t test. If it turns out to be significant, the
simultaneous presence of the two attributes will attenuate or reinforce the
individual effects of these attributes. Needless to say, omitting a significant
interaction term incorrectly will lead to a specification bias.
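The way the interaction dummy modifies the two individual effects can be illustrated with a short Python sketch. It assumes the model $Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$, where X is income; the coefficient values are hypothetical.

```python
def mean_clothing(d2, d3, income, a1, a2, a3, a4, beta):
    """Mean clothing expenditure with sex (d2), education (d3) and their interaction."""
    return a1 + a2 * d2 + a3 * d3 + a4 * (d2 * d3) + beta * income


coefficients = dict(a1=100.0, a2=20.0, a3=30.0, a4=15.0, beta=0.5)  # hypothetical
income = 200.0

base = mean_clothing(0, 0, income, **coefficients)             # male non-graduate
female_only = mean_clothing(1, 0, income, **coefficients)      # female non-graduate
graduate_only = mean_clothing(0, 1, income, **coefficients)    # male graduate
female_graduate = mean_clothing(1, 1, income, **coefficients)  # female graduate
print(base, female_only, graduate_only, female_graduate)
```

The female graduate exceeds the base category by $\alpha_2 + \alpha_3 + \alpha_4$ rather than by $\alpha_2 + \alpha_3$; the extra term is the contribution of the interaction dummy.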
series. Important economic time series, such as the consumer price index, the
wholesale price index, and the index of industrial production, are usually published in
seasonally adjusted form.
It pays commissions based on sales in such manner that up to a certain level, the
target, or threshold, level X*, there is one (stochastic) commission structure and
beyond that level another. (Note: Besides sales, other factors affect sales
commission. Assume that these other factors are represented by the stochastic
disturbance term.) More specifically, it is assumed that sales commission increases
linearly with sales until the threshold level X*, after which it also increases
linearly with sales but at a much steeper rate. Thus, we have a piece-wise linear
regression consisting of two linear pieces or segments, which are labeled I and II
in fig. 5.3, and the commission function changes its slope at the threshold value.
Given the data on commission, sales, and the value of the threshold level X*, the
technique of dummy variables can be used to estimate the (differing) slopes of the
two segments of the piecewise linear regression shown in fig. 5.3. We proceed as
follows:
$Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i$ ------------------------------------(5.11)
where $Y_i$ = sales commission
$X_i$ = volume of sales generated by the sales person
$X^*$ = threshold value of sales, also known as a knot (known in advance)
$D_i = 1$ if $X_i > X^*$
= 0 if $X_i < X^*$
Assuming $E(u_i) = 0$, we see at once that
$E(Y_i \mid D_i = 0, X_i, X^*) = \alpha_1 + \beta_1 X_i$ ---------------------------------------(5.12)
which gives the mean sales commission up to the target level X* and
$E(Y_i \mid D_i = 1, X_i, X^*) = \alpha_1 - \beta_2 X^* + (\beta_1 + \beta_2) X_i$ ----------------------(5.13)
which gives the mean sales commission beyond the target level X*.
Thus, $\beta_1$ gives the slope of the regression line in segment I, and $\beta_1 + \beta_2$ gives the
slope of the regression line in segment II of the piecewise linear regression shown
in fig 5.3. A test of the hypothesis that there is no break in the regression at the
threshold value X* can be conducted easily by noting the statistical significance of
the estimated differential slope coefficient $\hat{\beta}_2$.
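How the dummy variable produces the change of slope at the knot can be sketched in Python. The sketch assumes the form $Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i$; the coefficient values and the threshold are hypothetical.

```python
def piecewise_regressors(sales, knot):
    """Return the two regressors: X and (X - X*) * D, with D = 1 when X exceeds the knot."""
    d = 1 if sales > knot else 0
    return sales, (sales - knot) * d


def mean_commission(sales, knot, alpha1, beta1, beta2):
    """Mean sales commission implied by the piecewise linear regression."""
    x, excess = piecewise_regressors(sales, knot)
    return alpha1 + beta1 * x + beta2 * excess


params = dict(knot=100.0, alpha1=10.0, beta1=0.5, beta2=0.25)  # hypothetical values
below = mean_commission(80.0, **params)
at_knot = mean_commission(100.0, **params)
above = mean_commission(120.0, **params)
print(below, at_knot, above)
print((at_knot - below) / 20.0, (above - at_knot) / 20.0)  # slopes of segments I and II
```

The two segments join at the threshold without a jump, and the slope rises from $\beta_1$ to $\beta_1 + \beta_2$ beyond it.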
Summary:
1. Dummy variables taking values of 1 and 0 (or their linear transforms) are a
means of introducing qualitative regressors in regression analysis.
2. Dummy variables are a data-classifying device in that they divide a sample
into various subgroups based on qualities or attributes (sex, marital status,
race, religion, etc.) and implicitly allow one to run individual regressions
for each subgroup. If there are differences in the response of the regress
renovation, 0 otherwise.
$D_3$ = type of theater: 1 if outdoor, 0 if indoor
$D_4$ = parking: 1 if provided, 0 otherwise
Chapter Six
6.1 Introduction
While considering the standard regression model, we did not pay attention to the
timing of the explanatory variable(s) on the dependent variable. The standard
linear regression implies that change in one of the explanatory variables causes a
change in the dependent variable during the same time period and during that
period alone. But in economics, such a specification is rarely appropriate. In economic
phenomena, a cause often produces its effect only after a lapse of time;
this lapse of time (between a cause and its effect) is called a lag. Therefore, realistic
is a distributed lag model of the consumption function. This means that the value of
consumption expenditure ($C_t$) at any given time depends on the current and
past values of disposable income ($Y_t$). The general form of a distributed lag
model (with only lagged exogenous variables) is written as:
$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_s X_{t-s} + U_t$
The number of lags, s, may be either finite or infinite. But generally it is assumed
to be finite. The coefficient $\beta_0$ is known as the short run, or impact, multiplier
because it gives the change in the mean value of Y following a unit change in X in the
same time period t. If the change in X is maintained at the same level thereafter,
then $(\beta_0 + \beta_1)$ gives the change in the (mean value of) Y in the next period,
$(\beta_0 + \beta_1 + \beta_2)$ in the following period, and so on. These partial sums are called interim, or
intermediate, multipliers. Finally, after s periods we obtain
$\sum_{i=0}^{s} \beta_i = \beta_0 + \beta_1 + \beta_2 + \cdots + \beta_s$
which is known as the long run, distributed-lag multiplier, provided the sum exists.
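The impact, interim and long-run multipliers are simply partial sums of the lag coefficients, as the following Python sketch shows. The coefficient values are hypothetical and the function names are illustrative only.

```python
from itertools import accumulate


def interim_multipliers(betas):
    """Partial sums beta_0, beta_0 + beta_1, ... of the lag coefficients."""
    return list(accumulate(betas))


def long_run_multiplier(betas):
    """Sum of all lag coefficients of a finite distributed-lag model."""
    return sum(betas)


betas = [0.5, 0.25, 0.125]  # hypothetical beta_0, beta_1, beta_2
print(interim_multipliers(betas))
print(long_run_multiplier(betas))
```

The first partial sum is the impact multiplier and the last one coincides with the long-run multiplier.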
placed funds in long term saving accounts for fixed durations such as one
year, three years or seven years are essentially “locked” in even though
money market conditions may be such that higher yields are available.
In (6.01) the length of the lag, that is, how far back into the past we want to go,
hasn’t been defined. Such a model is called an infinite (lag) model, whereas a
model with a specified lag length is called a finite (lag) distributed-lag model.
How do we estimate $\alpha$ and the $\beta$'s in (6.01)? We may adopt two approaches:
I. Ad hoc estimation of distributed-lag models
II. A priori restrictions on the $\beta$'s, assuming that the $\beta$'s follow some
systematic pattern.
Proponents of this approach chose the second regression as the “best” one because
in the last two equations the sign of $X_{t-2}$ was not stable and in the last equation the
sign of $X_{t-3}$ was negative, which may be difficult to interpret economically.
Although seemingly straight forward, ad hoc estimation suffers from many
drawbacks, such as the following:
a. There is no guide as to what is the maximum lag length
b. As one estimates successive lags, there are fewer degrees of freedom
left, making statistical inference somewhat shaky
c. More importantly, in economic time series data, successive values
(lags) tend to be highly correlated; hence multicollinearity rears its
ugly head.
d. The sequential search for the lengths of lags opens the researcher to
the charge of data mining.
In view of the preceding problems, the ad hoc estimation procedure has very little
to recommend it. Some prior or theoretical considerations must be brought to bear upon
the various $\beta$'s if we are to make headway with the estimation problem.
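$U \sim N(0, \sigma^2)$
$E(u_i u_j) = 0$
$E(u_i x_i) = 0$
According to Koyck: $\beta_i = \lambda^i \beta_0$
$\beta_1 = \lambda \beta_0$, $\beta_2 = \lambda^2 \beta_0$
Let $\alpha^* = \alpha (1 - \lambda)$, $V_t = U_t - \lambda U_{t-1}$
$Y_t = \alpha^* + \beta_0 X_t + \lambda Y_{t-1} + V_t$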
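The geometric decline of the lag weights under the Koyck scheme can be sketched in Python. The sketch assumes $0 < \lambda < 1$, in which case the weights sum to $\beta_0 / (1 - \lambda)$ by the geometric series; the parameter values are hypothetical.

```python
def koyck_weights(beta0, lam, n_lags):
    """Lag coefficients beta_i = beta0 * lam**i for i = 0..n_lags."""
    return [beta0 * lam ** i for i in range(n_lags + 1)]


def long_run_multiplier(beta0, lam):
    """Sum of the infinite geometric series of lag weights (requires 0 < lam < 1)."""
    return beta0 / (1 - lam)


def recover_alpha(alpha_star, lam):
    """Recover the original intercept from alpha* = alpha * (1 - lam)."""
    return alpha_star / (1 - lam)


print(koyck_weights(0.5, 0.5, 3))
print(long_run_multiplier(0.5, 0.5))
print(recover_alpha(2.0, 0.5))
```

Once the transformed equation has been estimated, the coefficient of lagged Y gives $\lambda$, and the remaining parameters of the original model follow from it as above.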
a. Our original model was a distributed lag model, but the transformed model is
an autoregressive model because $Y_{t-1}$ appears as one of the explanatory
variables. The Koyck transformation, therefore, also helps to convert a
distributed lag model into an autoregressive model.
b. In the new formulation the error term $V_t = U_t - \lambda U_{t-1}$ is found to be
autocorrelated despite the fact that the disturbance term of the original model is
non-autocorrelated. This can be seen as follows:
$E(V_t V_{t-1}) = E[(U_t - \lambda U_{t-1})(U_{t-1} - \lambda U_{t-2})]$
$= -\lambda E(U_{t-1}^2)$
$= -\lambda \sigma_u^2 \neq 0$
c. The lagged variable $Y_{t-1}$ is also not independent of the error term $V_t$, i.e.
$E(V_t Y_{t-1}) \neq 0$. This is because $Y_t$ is directly dependent on $V_t$ and, similarly,
$Y_{t-1}$ on $V_{t-1}$. But since $V_t$ and $V_{t-1}$ are not independent, $Y_{t-1}$ will
obviously be related to $V_t$.
Due to these two problems, applying OLS to the Koyck transformation of the distributed lag model
will give rise to biased and inconsistent estimates. In addition to these estimation
problems, the Koyck hypothesis is quite restrictive in the sense that it assumes that
the impact of past periods declines successively in a specific way. But the following
patterns are also possible.
1. 0 1 2 3
2. 0 1 2 3
3. 0 1 2 3
$Y_t = \beta_0 + \beta_1 X_t^* + u_t$ -----------------------------------------------------(i)
What equation (ii) implies is that “economic agents will adapt their expectations
in the light of past experience and that in particular they will learn from their
mistakes.” More specifically, (ii) states that expectations are revised each period
by a fraction $\gamma$ of the gap between the current value of the variable and its
previous expected value. Thus, for our model this would mean that expectations
about interest rates are revised each period by a fraction $\gamma$ of the discrepancy
between the rate of interest observed in the current period and what its anticipated
value had been in the previous period. Another way of stating this would be to
write (ii) as: $X_t^* = \gamma X_t + (1 - \gamma) X_{t-1}^*$ -------------------------------------------------(iii)
which shows that the expected value of the rate of interest at time t is a weighted
average of the actual value of the interest rate at time t and its value expected in
the previous period, with weights of $\gamma$ and $1 - \gamma$, respectively. If $\gamma = 1$,
$X_t^* = X_t$, meaning that expectations are realized immediately and fully, that is, in
the same time period. If, on the other hand, $\gamma = 0$, $X_t^* = X_{t-1}^*$, meaning that
expectations are static, that is, “conditions prevailing today will be maintained in
all subsequent periods. Expected future values then become identified with
current values.” Substituting (iii) into (i), we obtain
$Y_t = \beta_0 + \beta_1 [\gamma X_t + (1 - \gamma) X_{t-1}^*] + u_t$ -----(iv)
Now, lag equation (i) by one period, multiply it by $1 - \gamma$, and subtract the product
from (iv). After simple algebraic manipulations, we obtain:
$Y_t = \gamma \beta_0 + \gamma \beta_1 X_t + (1 - \gamma) Y_{t-1} + u_t - (1 - \gamma) u_{t-1}$
$= \gamma \beta_0 + \gamma \beta_1 X_t + (1 - \gamma) Y_{t-1} + v_t$ -----(v)
where $v_t = u_t - (1 - \gamma) u_{t-1}$.
Let us note the difference between (i) and (v). In the former, $\beta_1$ measures the
average response of Y to a unit change in X*, the equilibrium or long-run value of
X. In (v), on the other hand, $\gamma \beta_1$ measures the average response of Y to a unit
change in the actual or observed value of X. These responses will not be the same
unless, of course, $\gamma = 1$, that is, the current and long-run values of X are the same.
In practice, we first estimate (v). Once an estimate of $\gamma$ is obtained from the
coefficient of lagged Y, we can easily compute $\beta_1$ by simply dividing the
coefficient of $X_t$ ($= \gamma \beta_1$) by $\gamma$.
Note that, like the Koyck model, the adaptive expectations model is autoregressive
and its error term is similar to the Koyck error term.
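The recovery of the structural parameters from the estimated autoregressive equation can be sketched in Python. The sketch assumes the estimating equation has the form $Y_t = \gamma \beta_0 + \gamma \beta_1 X_t + (1 - \gamma) Y_{t-1} + v_t$, with $\gamma$ denoting the expectation coefficient; the coefficient values passed in are hypothetical.

```python
def recover_adaptive_expectations(intercept, coef_x, coef_lagged_y):
    """Back out gamma, beta0 and beta1 from the coefficients of the estimated equation."""
    gamma = 1 - coef_lagged_y  # coefficient of lagged Y equals 1 - gamma
    return {
        "gamma": gamma,
        "beta0": intercept / gamma,  # intercept equals gamma * beta0
        "beta1": coef_x / gamma,     # coefficient of X equals gamma * beta1
    }


estimates = recover_adaptive_expectations(intercept=2.0, coef_x=0.25, coef_lagged_y=0.5)
print(estimates)
```

The division is only meaningful when the coefficient of lagged Y differs from one, that is, when $\gamma$ is not zero.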
produce a given output under the given state of technology, rate of interest, etc.
For simplicity assume that this desired level of capital $Y_t^*$ is a linear function of
output X as follows:
$Y_t^* = \beta_0 + \beta_1 X_t + u_t$ ------------------------------------------------(1)
Since the desired level of capital is not directly observable, Nerlove postulates the
following hypothesis, known as the partial adjustment, or stock adjustment,
hypothesis:
$Y_t - Y_{t-1} = \delta (Y_t^* - Y_{t-1})$ --------------------------------------------(2)
Equation (2) postulates that the actual change in capital stock (investment) in any
given time period t is some fraction $\delta$ of the desired change for that period. If
$\delta = 1$, it means that the actual stock of capital is equal to the desired stock; that is,
the actual stock adjusts to the desired stock instantaneously (in the same period).
However, if $\delta = 0$, it means that nothing changes, since the actual stock at time t is the
same as that observed in the previous time period. Typically, $\delta$ is expected to lie
between these extremes since adjustment to the desired stock of capital is likely to
be incomplete because of rigidity, inertia, contractual obligations, etc.; hence the
name partial adjustment model. Note that the adjustment mechanism (2)
alternatively can be written as:
$Y_t = \delta Y_t^* + (1 - \delta) Y_{t-1}$ -------------------------------------------------(4)
showing that the observed capital stock at time t is a weighted average of the
desired capital stock at that time and the capital stock existing in the previous time
period, $\delta$ and $(1 - \delta)$ being the weights. Now substitution of (1) into (4) gives:
$Y_t = \delta (\beta_0 + \beta_1 X_t + u_t) + (1 - \delta) Y_{t-1}$
$= \delta \beta_0 + \delta \beta_1 X_t + (1 - \delta) Y_{t-1} + \delta u_t$
The partial adjustment model resembles both the Koyck and adaptive expectation
models in that it is autoregressive. But it has a much simpler disturbance term: the
original disturbance term $u_t$ multiplied by a constant $\delta$. But bear in mind that
although similar in appearance, the adaptive expectation and partial adjustment
models are conceptually very different. The former is based on uncertainty (about
the future course of prices, interest rates, etc.), whereas the latter is due to
technical or institutional rigidities, inertia, cost of change, etc. However, both of
these models are theoretically much sounder than the Koyck model.
The important point to keep in mind is that since Koyck, adaptive expectations,
and stock adjustment models – apart from the difference in the appearance of the
error term – yield the same final estimating model, one must be extremely careful
in telling the reader which model the researcher is using and why. Thus,
researchers must specify the theoretical underpinning of their model.
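The partial adjustment mechanism itself can be traced numerically. The Python sketch below assumes the weighted-average form $Y_t = \delta Y_t^* + (1 - \delta) Y_{t-1}$ with a desired stock that stays constant over time; the symbol $\delta$ for the adjustment coefficient and all numerical values are assumptions made for illustration.

```python
def adjust(y_previous, y_desired, delta):
    """One period of partial adjustment toward the desired stock."""
    return delta * y_desired + (1 - delta) * y_previous


def adjustment_path(y_start, y_desired, delta, periods):
    """Actual stock over time when the desired stock is held fixed."""
    path = [y_start]
    for _ in range(periods):
        path.append(adjust(path[-1], y_desired, delta))
    return path


print(adjustment_path(0.0, 100.0, 0.5, 4))   # partial adjustment
print(adjustment_path(0.0, 100.0, 1.0, 2))   # instantaneous adjustment
print(adjustment_path(0.0, 100.0, 0.0, 2))   # no adjustment at all
```

With an adjustment coefficient between zero and one, the remaining gap between actual and desired stock shrinks by the factor $(1 - \delta)$ each period.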
of the form of lag scheme. This is because this method does not hypothesize any
form of lag beforehand.
This model assumes that any pattern of lag scheme among the $\beta$'s can be described
by a polynomial. This idea is based on a theorem in mathematics known as
Weierstrass’s theorem, which states that under general conditions a curve may be
approximated by a polynomial whose degree is one more than the number of
turning points in the curve. Suppose that the $\beta$'s in a given distributed lag model
are expected to decrease first, then increase and again decrease. With two turning
points, a third-degree polynomial $\beta_i = a_0 + a_1 i + a_2 i^2 + a_3 i^3$ can be used.
We are now in a position to obtain all the $\beta$'s by setting i equal to the value of the
subscript of the particular coefficient.
$\beta_0 = a_0$
$\beta_1 = a_0 + a_1 + a_2 + a_3$
$\beta_2 = a_0 + 2a_1 + 4a_2 + 8a_3$
$\beta_3 = a_0 + 3a_1 + 9a_2 + 27a_3$
$\beta_k = a_0 + k a_1 + k^2 a_2 + k^3 a_3$
$Y_t = \alpha_0 + a_0 \sum_{i=0}^{k} X_{t-i} + a_1 \sum_{i=0}^{k} i X_{t-i} + a_2 \sum_{i=0}^{k} i^2 X_{t-i} + a_3 \sum_{i=0}^{k} i^3 X_{t-i} + U_t$
$Y_t = \alpha_0 + a_0 w_0 + a_1 w_1 + a_2 w_2 + a_3 w_3 + U_t$
where: $w_0 = \sum_{i=0}^{k} X_{t-i}$, $w_1 = \sum_{i=0}^{k} i X_{t-i}$, and so on -----------------------------------------------(c)
This is the final form (or transformed form) of the Almon lag model. We can now
apply OLS to estimate $\hat{\alpha}_0$, $\hat{a}_0$, $\hat{a}_1$, $\hat{a}_2$, and $\hat{a}_3$, and from them obtain the $\beta$'s of the original
form. Note that $U_t$ remains in its original form.
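The construction of the transformed regressors and the recovery of the lag coefficients can be sketched in Python. The sketch assumes that the sums run over i = 0, 1, ..., k so that the current-period term is included, and that $w_j = \sum i^j X_{t-i}$; the function names and the data are hypothetical.

```python
def almon_regressors(x, k, degree):
    """Rows [w_0, ..., w_degree] for each usable t, with w_j = sum over i of i**j * x[t - i]."""
    rows = []
    for t in range(k, len(x)):
        rows.append(
            [sum((i ** j) * x[t - i] for i in range(k + 1)) for j in range(degree + 1)]
        )
    return rows


def betas_from_a(a, k):
    """Lag coefficients beta_i = a_0 + a_1*i + a_2*i**2 + ... for i = 0..k."""
    return [sum(a_j * i ** j for j, a_j in enumerate(a)) for i in range(k + 1)]


x = [1, 2, 3, 4, 5]  # made-up series
print(almon_regressors(x, k=2, degree=2))
print(betas_from_a([1, 1, 1, 1], k=3))
```

The first k observations are lost in forming the sums, and only the polynomial coefficients, rather than every lag coefficient separately, have to be estimated by OLS.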
Chapter Seven
7.1 Introduction
In all the previous chapters, we have been focusing exclusively
on the problems and estimation of single equation regression models. In such
models, a dependent variable is expressed as a linear function of one or more
explanatory variables. The cause-and-effect relationship in such models between
the dependent and independent variables is unidirectional. That is, the explanatory
variables are the cause and the dependent variable is the effect. But there are
situations where such one-way or unidirectional causation in the function is not
meaningful. This occurs if, for instance, Y (dependent variable) is not only
function of X’s (explanatory variables) but also all or some of the X’s are, in turn,
determined by Y. There is, therefore, a two-way flow of influence between Y and
(some of) the X’s which in turn makes the distinction between dependent and
independent variables a little doubtful. Under such circumstances, we need to
consider more than one regression equation, one for each interdependent
variable, to understand the multi-directional flow of influence among the
variables. This is precisely what is done in simultaneous equation models.
The bias arising from applying an estimation procedure that treats each equation
of a simultaneous equations model as though it were a single-equation model is
known as simultaneity bias or simultaneous equation bias. To avoid this bias we
use other methods of estimation, such as Indirect Least Squares (ILS), Two-Stage
Least Squares (2SLS), Three-Stage Least Squares (3SLS), Maximum Likelihood
methods and the Method of Instrumental Variables (IV).
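$$Y = \alpha_0 + \alpha_1 X + U$$
$$X = \beta_0 + \beta_1 Y + \beta_2 Z + V \qquad (10)$$
Suppose that the following assumptions hold.
$$E(U) = 0, \quad E(V) = 0$$
$$E(U^2) = \sigma_u^2, \quad E(V^2) = \sigma_v^2$$
$$E(U_i U_j) = 0, \quad E(V_i V_j) = 0 \ (i \neq j), \quad \text{also } E(U_i V_i) = 0$$
Substituting the first equation into the second and solving for X gives the
reduced form of X:
$$X = \frac{\beta_0 + \alpha_0\beta_1}{1 - \alpha_1\beta_1} + \frac{\beta_2}{1 - \alpha_1\beta_1} Z + \frac{\beta_1 U + V}{1 - \alpha_1\beta_1} \qquad (11)$$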
Applying OLS to the first equation of the above structural model will result in a
biased estimator because $\operatorname{cov}(X_i, U_i) = E(X_i U_i) \neq 0$. Let us now prove this.
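$$
\begin{aligned}
\operatorname{cov}(X, U) &= E\{[X - E(X)][U - E(U)]\} = E\{[X - E(X)]\,U\} \\
&= E\left\{\left[\frac{\beta_0 + \alpha_0\beta_1}{1 - \alpha_1\beta_1} + \frac{\beta_2}{1 - \alpha_1\beta_1} Z + \frac{\beta_1 U + V}{1 - \alpha_1\beta_1} - \frac{\beta_0 + \alpha_0\beta_1}{1 - \alpha_1\beta_1} - \frac{\beta_2}{1 - \alpha_1\beta_1} Z\right] U\right\} \\
&= E\left[\frac{U}{1 - \alpha_1\beta_1}(\beta_1 U + V)\right] \\
&= \frac{1}{1 - \alpha_1\beta_1}\, E(\beta_1 U^2 + UV) \\
&= \frac{\beta_1}{1 - \alpha_1\beta_1}\, E(U^2) = \frac{\beta_1 \sigma_u^2}{1 - \alpha_1\beta_1} \neq 0, \quad \text{since } E(UV) = 0
\end{aligned}
$$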
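$$E(\hat{\alpha}_1) = \alpha_1 + E\left(\frac{\sum xU}{\sum x^2}\right)$$
We have already proved that $E(XU) \neq 0$, which is the same as $E(xU) \neq 0$ (x being
X measured in deviations from its mean). Consequently, $E(\hat{\alpha}_1) \neq \alpha_1$; that is, $\hat{\alpha}_1$
will be biased by an amount equivalent to $E\left(\frac{\sum xU}{\sum x^2}\right)$.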
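A small simulation makes the bias visible. This is only a sketch under stated assumptions: the parameter values are hypothetical, Z is drawn at random rather than held fixed, and both disturbances have unit variance. It generates data from the reduced form (11), applies OLS to the first structural equation, and compares the sample covariance of X and U with the theoretical value derived above.

```python
import random

random.seed(12345)

# Hypothetical structural parameters, chosen only for illustration.
alpha0, alpha1 = 1.0, 0.5            # Y = alpha0 + alpha1*X + U
beta0, beta1, beta2 = 2.0, 0.5, 1.0  # X = beta0 + beta1*Y + beta2*Z + V
n = 50_000

xs, ys, us = [], [], []
for _ in range(n):
    z = random.gauss(0, 1)
    u = random.gauss(0, 1)  # sigma_u^2 = 1
    v = random.gauss(0, 1)  # sigma_v^2 = 1
    # Reduced form (11) for X, then the first structural equation for Y.
    x = (beta0 + alpha0 * beta1 + beta2 * z + beta1 * u + v) / (1 - alpha1 * beta1)
    y = alpha0 + alpha1 * x + u
    xs.append(x)
    ys.append(y)
    us.append(u)


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar, u_bar = mean(xs), mean(ys), mean(us)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
cov_xu = sum((x - x_bar) * (u - u_bar) for x, u in zip(xs, us)) / n

ols_slope = s_xy / s_xx                          # OLS estimate of alpha1
theory_cov = beta1 * 1.0 / (1 - alpha1 * beta1)  # beta1*sigma_u^2/(1 - alpha1*beta1)

print("true alpha1:", alpha1, "OLS estimate:", ols_slope)
print("sample cov(X, U):", cov_xu, "theoretical:", theory_cov)
```

With these values the OLS slope settles noticeably above the true value of 0.5 however large the sample is, which is the simultaneity bias described in the text.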
$$Q^s = \beta_0 + \beta_1 P + \beta_2 R + U_2 \qquad (15)$$
Here P and Q are endogenous variables and Y and R are exogenous variables.
Structural models
A structural model describes the complete structure of the relationships among the
economic variables. The structural equations of the model may be expressed in
terms of endogenous variables, exogenous variables and disturbances (random
variables). The parameters of a structural model express the direct effect of each
explanatory variable on the dependent variable. Variables not appearing explicitly
in a function may have an indirect effect on its dependent variable, which is taken
into account by the simultaneous solution of the system. For instance, a change in
consumption affects investment indirectly, and this effect is not considered in the
consumption function. The effect of consumption on investment cannot be measured
directly by any structural parameter, but is measured indirectly by considering the
system as a whole.
Example: The following simple Keynesian model of income determination can
be considered as a structural model.
$$C = \alpha + \beta Y + U \qquad (16)$$
$$Y = C + Z \qquad (17)$$
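Since C and Y are the endogenous variables and Z is the only exogenous variable,
we have to express C and Y in terms of Z. To do this, substitute Y = C + Z into
equation (16).
$$
\begin{aligned}
C &= \alpha + \beta(C + Z) + U \\
C &= \alpha + \beta C + \beta Z + U \\
C - \beta C &= \alpha + \beta Z + U \\
C(1 - \beta) &= \alpha + \beta Z + U \\
C &= \frac{\alpha}{1 - \beta} + \frac{\beta}{1 - \beta} Z + \frac{U}{1 - \beta} \qquad (18)
\end{aligned}
$$
Substituting (18) into (17) gives, in the same way,
$$Y = \frac{\alpha}{1 - \beta} + \frac{1}{1 - \beta} Z + \frac{U}{1 - \beta} \qquad (19)$$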
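Equations (18) and (19) are called the reduced form of the above structural
model. We can write this more formally as:

Structural form equations        Reduced form equations
$C = \alpha + \beta Y + U$       $C = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta} Z + \frac{U}{1-\beta}$
$Y = C + Z$                      $Y = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta} Z + \frac{U}{1-\beta}$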
Parameters of the reduced form measure the total effect (direct and indirect) of a
change in the exogenous variables on the endogenous variable. For instance, in the
above reduced form equation (18), $\frac{\beta}{1-\beta}$ measures the total effect of a unit
change in Z on consumption: it is $\beta$, the direct effect, times $\frac{1}{1-\beta}$, the
indirect effect.
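The relation between the structural and the reduced form can be verified with a few lines of arithmetic. The sketch below assumes illustrative values for the intercept and the marginal propensity to consume (they are not taken from the text) and checks that the reduced-form solution satisfies both structural equations and that a unit change in Z moves C by β/(1-β).

```python
import math


def reduced_form(alpha, beta, z, u=0.0):
    """Solve C = alpha + beta*Y + U and Y = C + Z for (C, Y), as in (18) and (19)."""
    c = alpha / (1 - beta) + beta / (1 - beta) * z + u / (1 - beta)
    y = alpha / (1 - beta) + 1 / (1 - beta) * z + u / (1 - beta)
    return c, y


# Hypothetical values, chosen only for illustration.
alpha, beta = 10.0, 0.8
c0, y0 = reduced_form(alpha, beta, z=50.0)
c1, y1 = reduced_form(alpha, beta, z=51.0)

# The reduced-form solution satisfies both structural equations.
structural_ok = (math.isclose(c0, alpha + beta * y0)
                 and math.isclose(y0, c0 + 50.0))

# Total effect of a unit change in Z on C and on Y.
total_effect_on_c = c1 - c0  # beta / (1 - beta)
total_effect_on_y = y1 - y0  # 1 / (1 - beta)

print(c0, y0, structural_ok)
print(total_effect_on_c, total_effect_on_y)
```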
In the above illustration, as usual, the X's and Y's are exogenous and endogenous
variables respectively. The disturbance terms satisfy the following assumption:
$$E(U_1 U_2) = E(U_1 U_3) = E(U_2 U_3) = 0$$
This is the most crucial assumption defining the recursive model. If it does not
hold, the above system is no longer recursive and OLS is no longer valid. The
first equation of the above system contains only exogenous variables on the
right-hand side. Since, by assumption, the exogenous variables are independent
of $U_1$, the first equation satisfies the critical assumption of the OLS
procedure. Hence OLS can be applied straightforwardly to this equation.
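The second equation contains the endogenous variable $Y_1$ as an explanatory
variable, but $Y_1$ is independent of $U_2$ and can therefore be treated as
exogenous as far as this equation is concerned. Hence OLS can be applied to this
equation. A similar argument extends to the third equation because $Y_1$ and
$Y_2$ are independent of $U_3$. In this way, in a recursive system OLS can be
applied to each equation separately.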
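$$
\begin{aligned}
Y_1 &= \alpha_1 + \alpha_2 X_2 + \alpha_3 X_3 + U_1 \\
Y_2 &= \alpha_4 + \beta_1 Y_1 + \alpha_5 X_4 + U_2 \\
Y_3 &= \alpha_6 + \beta_2 Y_2 + \alpha_7 X_5 + U_3
\end{aligned}
$$
In the first equation, there are only exogenous variables on the right-hand side,
and they are assumed to be independent of $U_1$. In the second equation, the causal
relation between $Y_1$ and $Y_2$ runs in one direction only. Also, $Y_1$ is independent
of $U_2$ and can be treated just like an exogenous variable in that equation.
Moving the endogenous variables to the left-hand side, the system can be written as:
$$
\begin{aligned}
Y_1 &= \alpha_1 + \alpha_2 X_2 + \alpha_3 X_3 + U_1 \\
-\beta_1 Y_1 + Y_2 &= \alpha_4 + \alpha_5 X_4 + U_2 \\
-\beta_2 Y_2 + Y_3 &= \alpha_6 + \alpha_7 X_5 + U_3
\end{aligned}
$$
We can again rewrite this in matrix form as follows: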
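$$
\begin{bmatrix} 1 & 0 & 0 \\ -\beta_1 & 1 & 0 \\ 0 & -\beta_2 & 1 \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 & 0 & 0 \\ \alpha_4 & 0 & 0 & \alpha_5 & 0 \\ \alpha_6 & 0 & 0 & 0 & \alpha_7 \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \end{bmatrix}
+
\begin{bmatrix} U_1 \\ U_2 \\ U_3 \end{bmatrix}
$$
The first matrix is the coefficient matrix of the endogenous variables and the
second is the coefficient matrix of the exogenous variables. The first column of
the second matrix holds the intercepts, so $X_1$ is the constant term (equal to 1).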
The coefficient matrix of the endogenous variables is thus a triangular one; hence
recursive models are also called triangular models.
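The claim that OLS can be applied equation by equation in a recursive system can be illustrated by simulation. The sketch below is a hypothetical example: the coefficient values are invented, the disturbances are drawn independently of one another (the key assumption above), and `ols` is a small helper written for this illustration that solves the normal equations directly.

```python
import random


def ols(rows, y):
    """OLS coefficients for regressor rows (each row includes the constant 1.0)."""
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(2005)
n = 5000
eq1_rows, eq2_rows, eq3_rows = [], [], []
y1s, y2s, y3s = [], [], []
for _ in range(n):
    x2, x3, x4, x5 = (random.gauss(0, 1) for _ in range(4))
    u1, u2, u3 = (random.gauss(0, 1) for _ in range(3))  # mutually independent
    # Hypothetical recursive system with the triangular structure of the text.
    y1 = 1.0 + 2.0 * x2 - 1.0 * x3 + u1
    y2 = 0.5 + 0.8 * y1 + 1.5 * x4 + u2
    y3 = -1.0 + 0.6 * y2 + 0.7 * x5 + u3
    eq1_rows.append([1.0, x2, x3])
    eq2_rows.append([1.0, y1, x4])
    eq3_rows.append([1.0, y2, x5])
    y1s.append(y1)
    y2s.append(y2)
    y3s.append(y3)

est1 = ols(eq1_rows, y1s)  # true values: 1.0, 2.0, -1.0
est2 = ols(eq2_rows, y2s)  # true values: 0.5, 0.8, 1.5
est3 = ols(eq3_rows, y3s)  # true values: -1.0, 0.6, 0.7
print(est1, est2, est3)
```

If the disturbances were correlated across equations, the estimates for the second and third equations would no longer settle near the true values, because Y1 and Y2 would then be correlated with U2 and U3.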
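$$W = -\frac{\beta_0}{\beta_1} + \frac{1}{\beta_1} P - \frac{1}{\beta_1} V \qquad \text{(iii)}$$
where the wage equation (i) is $W = \alpha_0 + \alpha_1 P + \alpha_2 E + U$ and the price
equation (ii) is $P = \beta_0 + \beta_1 W + V$.
Now, suppose A and B are any two constants. Let us multiply equation (i) by A,
multiply equation (ii), in its rearranged form (iii), by B and then add the two
equations. This gives
$$(A + B)W = A\alpha_0 - B\frac{\beta_0}{\beta_1} + \left(A\alpha_1 + \frac{B}{\beta_1}\right)P + A\alpha_2 E + AU - \frac{B}{\beta_1}V$$
or
$$W = \frac{A\alpha_0 - B\beta_0/\beta_1}{A + B} + \frac{A\alpha_1 + B/\beta_1}{A + B}\,P + \frac{A\alpha_2}{A + B}\,E + \frac{AU - (B/\beta_1)V}{A + B} \qquad \text{(iv)}$$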
Equation (iv) is what is known as a linear combination of (i) and (ii). The point
about equation (iv) is that it is of the same statistical form as the wage equation (i).
That is, it has the form:
W = constant + (constant)P + (constant)E + disturbance
Moreover, since A and B can take any values we like, our wage-price model
generates an infinite number of equations such as (iv), all statistically
indistinguishable from the wage equation (i). Hence, if we apply OLS or any other
technique to data on W, P and E in an attempt to estimate the wage equation, we
cannot know whether we are actually estimating (i) rather than one of the
infinite number of possibilities given by (iv). Equation (i) is said to be
unidentified, and consequently there is no way in which unbiased or even
consistent estimators of its parameters may be obtained.
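The indistinguishability argument can be checked numerically. The sketch below assumes a wage equation of the form W = a0 + a1 P + a2 E + U and a price equation of the form P = b0 + b1 W + V; these symbols and all numerical values are hypothetical and serve only as an illustration. It generates values of W and P that satisfy both equations and confirms that the same data also satisfy the combined equation for several arbitrary choices of A and B.

```python
import math

# Hypothetical parameter values; the symbols are illustrative assumptions.
a0, a1, a2 = 2.0, 0.6, -0.3  # wage equation:  W = a0 + a1*P + a2*E + U
b0, b1 = 1.0, 0.5            # price equation: P = b0 + b1*W + V


def solve_model(e, u, v):
    """Solve the two structural equations for (W, P) given E, U and V."""
    w = (a0 + a1 * b0 + a2 * e + u + a1 * v) / (1 - a1 * b1)
    p = b0 + b1 * w + v
    return w, p


def combined_rhs(A, B, p, e, u, v):
    """Right-hand side of the linear combination for weights A and B."""
    const = (A * a0 - B * b0 / b1) / (A + B)
    coef_p = (A * a1 + B / b1) / (A + B)
    coef_e = A * a2 / (A + B)
    disturbance = (A * u - B * v / b1) / (A + B)
    return const + coef_p * p + coef_e * e + disturbance


observations = [(5.0, 0.2, -0.1), (7.5, -0.4, 0.3), (6.0, 0.0, 0.5)]  # (E, U, V)
weights = [(1.0, 0.0), (1.0, 1.0), (2.0, -0.5), (0.3, 4.0)]           # (A, B)

all_hold = all(
    math.isclose(w, combined_rhs(A, B, p, e, u, v), rel_tol=1e-9, abs_tol=1e-9)
    for (e, u, v) in observations
    for (w, p) in [solve_model(e, u, v)]
    for (A, B) in weights
)
print(all_hold)
```

The weights (A, B) = (1, 0) reproduce the wage equation itself; every other pair gives a different set of "constants" that fits the same observations equally well.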
Notice that, in contrast, price equation (ii) cannot be confused with the linear
combination (iv), because it is a relationship involving W and P only and does not,
like (iv), contain the variable E. The price equation (ii) is therefore said to be
identified, and in principle it is possible to obtain consistent estimates of its
parameters. A function (an equation) belonging to a system of simultaneous
equations is identified if it has a unique statistical form, i.e. if there is no other
equation in the system, or formed by algebraic manipulations of the other
Identification problems do not arise only in two-equation models. Using the
above procedure, we can check identification easily if a simultaneous equation
model has two or three equations. However, for a model with 'n' equations such
a procedure is very cumbersome. In general, for any number of equations in a
simultaneous equation model, there are two conditions that need to be satisfied
for an equation to be identified. In the following section we will see these
formal conditions for identification.
In applying the identification rules we should either ignore the constant term, or, if
we want to retain it, we must include in the set of variables a dummy variable (say
X0) which would always take on the value 1. Either convention leads to the same
$$(K - M) \geq (G - 1)$$
that is, [number of excluded variables] ≥ [total number of equations - 1]
For example, if a system contains 10 equations with 15 variables, ten endogenous
and five exogenous, an equation containing 11 variables is not identified, while
another containing 5 variables is identified.
a. For the first equation we have
G = 10, K = 15, M = 11
Order condition:
(K - M) ≥ (G - 1)
(15 - 11) < (10 - 1), i.e. 4 < 9; that is, the order condition is not satisfied.
b. For the second equation we have
G = 10, K = 15, M = 5
Order condition:
(K - M) ≥ (G - 1)
(15 - 5) > (10 - 1), i.e. 10 > 9; that is, the order condition is satisfied.
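The counting rule is easy to mechanize. The helper below is a minimal sketch (the function name and the wording of its return values are choices made for this illustration); it reproduces the two cases of the example above.

```python
def order_condition(K, M, G):
    """Compare (K - M) with (G - 1) for one equation of a simultaneous system.

    K: total number of variables in the model (endogenous and exogenous)
    M: number of variables appearing in the equation being examined
    G: number of equations (= number of endogenous variables)
    """
    excluded = K - M
    if excluded < G - 1:
        return "not satisfied"
    if excluded == G - 1:
        return "satisfied with equality"
    return "satisfied with inequality"


first = order_condition(K=15, M=11, G=10)   # equation containing 11 variables
second = order_condition(K=15, M=5, G=10)   # equation containing 5 variables
print(first, "|", second)
```

Since the order condition is necessary but not sufficient, a result of "satisfied" still has to be confirmed by the rank condition.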
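$$
\begin{aligned}
y_1 &= 3y_2 - 2x_1 + x_2 + u_1 \\
y_2 &= y_3 + x_3 + u_2 \\
y_3 &= y_1 - y_2 - 2x_3 + u_3
\end{aligned}
$$
where the y's are the endogenous variables and the x's are the predetermined
variables. This model may be rewritten in the form
$$
\begin{aligned}
-y_1 + 3y_2 + 0y_3 - 2x_1 + x_2 + 0x_3 + u_1 &= 0 \\
0y_1 - y_2 + y_3 + 0x_1 + 0x_2 + x_3 + u_2 &= 0 \\
y_1 - y_2 - y_3 + 0x_1 + 0x_2 - 2x_3 + u_3 &= 0
\end{aligned}
$$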
Ignoring the random disturbances, the table of the parameters of the model is as follows:

                          Variables
Equations        Y1    Y2    Y3    X1    X2    X3
1st equation     -1     3     0    -2     1     0
2nd equation      0    -1     1     0     0     1
3rd equation      1    -1    -1     0     0    -2
Secondly. Strike out the row of coefficients of the equation which is being
examined for identification. For example, if we want to examine the identifiability
of the second equation of the model we strike out the second row of the table of
coefficients.
Thirdly. Strike out the columns in which a non-zero coefficient of the equation
being examined appears. By deleting the relevant row and columns we are left
with the coefficients of variables not included in the particular equation, but
contained in the other equations of the model. For example, if we are examining
the second equation of the system for identification, we strike out the second,
third and sixth columns of the above table, thus obtaining the following table:

                 Y1    X1    X2
1st equation     -1    -2     1
3rd equation      1     0     0
Fourthly. Form the determinant(s) of order (G-1) and examine their value. If at
least one of these determinants is non-zero, the equation is identified. If all the
determinants of order (G-1) are zero, the equation is underidentified.
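In the above example, exploring the identifiability of the second structural
equation, we have three determinants of order (G-1) = 3-1 = 2. They are:
$$
\Delta_1 = \begin{vmatrix} -1 & -2 \\ 1 & 0 \end{vmatrix} \neq 0, \qquad
\Delta_2 = \begin{vmatrix} -2 & 1 \\ 0 & 0 \end{vmatrix} = 0, \qquad
\Delta_3 = \begin{vmatrix} -1 & 1 \\ 1 & 0 \end{vmatrix} \neq 0
$$
(the symbol $\Delta$ stands for 'determinant'). We see that we can form two non-zero
determinants of order G-1 = 3-1 = 2; hence the second equation of our system is
identified.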
Fifthly. To see whether the equation is exactly identified or overidentified we use
the order condition $(K - M) \geq (G - 1)$. With this criterion, if the equality sign is
satisfied, that is, if $(K - M) = (G - 1)$, the equation is exactly identified. If the
inequality sign holds, that is, if $(K - M) > (G - 1)$, the equation is overidentified.
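The five steps can be sketched in code. The example below is an illustration under one stated substitution: instead of forming every determinant of order G-1 by hand, it computes the rank of the table of excluded coefficients, which is at least G-1 exactly when at least one such determinant is non-zero. The function names are invented for this sketch, and the coefficient table is the one from the three-equation example above.

```python
from fractions import Fraction


def rank_of(rows):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * p for x, p in zip(m[r], m[rank])]
        rank += 1
    return rank


def identification_status(table, eq):
    """Apply the rank condition, then the order condition, to equation `eq`."""
    G = len(table)                               # number of equations
    K = len(table[0])                            # total number of variables
    M = sum(1 for c in table[eq] if c != 0)      # variables in this equation
    # Strike out row `eq` and every column with a non-zero entry in that row.
    excluded_cols = [j for j, c in enumerate(table[eq]) if c == 0]
    sub = [[row[j] for j in excluded_cols]
           for i, row in enumerate(table) if i != eq]
    if rank_of(sub) < G - 1:
        return "not identified"
    return "exactly identified" if K - M == G - 1 else "overidentified"


# Columns: Y1, Y2, Y3, X1, X2, X3 (table of structural parameters above).
table = [
    [-1, 3, 0, -2, 1, 0],
    [0, -1, 1, 0, 0, 1],
    [1, -1, -1, 0, 0, -2],
]
statuses = [identification_status(table, eq) for eq in range(3)]
print(statuses)
```

For the second equation the excluded columns are Y1, X1 and X2, the same table obtained in the third step, and the result agrees with the conclusion reached from the determinants.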
$$D = a_0 + a_1 P_1 + a_2 P_2 + a_3 Y + a_4 t + u$$
$$S = b_0 + b_1 P_1 + b_2 P_2 + b_3 C + b_4 t + w$$
$$D = S$$
Where: D = quantity demanded
S = quantity supplied
P1 = price of the given commodity
P2 = price of other commodities
Y = income
C = costs (index of prices of factors of production)
t= time trend. In the demand function it stands for ‘tastes’; in the supply function it stands for
‘technology’.
The above model is mathematically complete in the sense that it contains three
equations in three endogenous variables, D, S and P1. The remaining variables, Y,
P2, C and t, are exogenous. Suppose we want to identify the supply function. We
apply the two criteria for identification:
1. Order condition: (K - M) ≥ (G - 1)
In our example we have: K = 7, M = 5, G = 3
Therefore, (K - M) = (G - 1), or (7 - 5) = (3 - 1) = 2
Consequently the second equation satisfies the first condition for identification.
2. Rank condition
The table of the coefficients of the structural model is as follows.

                             Variables
Equations        D     P1    P2    Y     t     S     C
1st equation    -1     a1    a2    a3    a4    0     0
2nd equation     0     b1    b2    0     b4   -1     b3
3rd equation     1     0     0     0     0    -1     0
Following the procedure explained earlier, we strike out the second row and the
second, third, fifth, sixth and seventh columns. Thus we are left with the table of
the coefficients of excluded variables:

Complete table of structural parameters
  -1    a1    a2    a3    a4    0     0
   0    b1    b2    0     b4   -1     b3
   1    0     0     0     0    -1     0

Table of parameters of variables excluded from the second equation (columns D and Y)
  -1    a3
   1    0

From this table we can form only one determinant of order
(G-1) = (3-1) = 2:
$$\Delta = \begin{vmatrix} -1 & a_3 \\ 1 & 0 \end{vmatrix} = (-1)(0) - (1)(a_3) = -a_3$$
The value of the determinant is non-zero, provided that $a_3 \neq 0$.
We see that both the order and rank conditions are satisfied. Hence the second
equation of the model is identified. Furthermore, we see that in the order
condition the equality holds: (7-5) = (3-1) = 2. Consequently the second
structural equation is exactly identified.
Example 2. Assume the following simple version of the Keynesian model of
income determination.
Consumption function: $C_t = a_0 + a_1 Y_t + a_2 T_t + u$
Investment function: $I_t = b_0 + b_1 Y_{t-1} + v$
Taxation function: $T_t = c_0 + c_1 Y_t + w$
Definition: $Y_t = C_t + I_t + G_t$
This model is mathematically complete in the sense that it contains as many
equations as endogenous variables. There are four endogenous variables, C, I, T
and Y, and two predetermined variables, lagged income ($Y_{t-1}$) and government
expenditure (G).
A. The first equation (consumption function) is not identified
1. Order condition: (K - M) ≥ (G - 1)
There are six variables in the model (K = 6) and four equations (G = 4). The
consumption function contains three variables (M = 3).
(K - M) = 3 and (G - 1) = 3
Thus (K - M) = (G - 1), which shows that the order condition for identification is
satisfied.
2. Rank condition
The table of structural coefficients is as follows

                            Variables
Equations        C     Y     T     I     Yt-1   G
1st equation    -1     a1    a2    0     0      0
2nd equation     0     0     0    -1     b1     0
3rd equation     0     c1   -1     0     0      0
4th equation     1    -1     0     1     0      1
We strike out the first row and the first three columns of the table and thus obtain
the table of coefficients of excluded variables.

Complete table of structural parameters      Table of coefficients of excluded variables
  -1    a1    a2    0     0     0
   0    0     0    -1     b1    0              -1    b1    0
   0    c1   -1     0     0     0               0    0     0
   1   -1     0     1     0     1               1    0     1
We evaluate the determinant of this table. Clearly the value of this determinant is
zero, since the second row contains only zeros. Consequently we cannot form any
nonzero determinant of order 3(=G-1). The rank condition is violated. Hence we
conclude that the consumption function is not identified, despite the satisfaction of
the order criterion.
B. The investment function is overidentified
1. Order condition
The investment function includes two variables. Hence
K-M = 6-2
Clearly (K-M) > (G-1), given that G-1=3. The order condition is fulfilled.
2. Rank condition
Deleting the second row and the fourth and fifth columns of the structural
coefficients table we obtain:

Complete table of structural parameters      Table of coefficients of excluded variables
  -1    a1    a2    0     0     0              -1    a1    a2    0
   0    0     0    -1     b1    0
   0    c1   -1     0     0     0               0    c1   -1     0
   1   -1     0     1     0     1               1   -1     0     1
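The value of the first 3x3 determinant of the parameters of excluded variables is
$$
\Delta_1 = \begin{vmatrix} -1 & a_1 & a_2 \\ 0 & c_1 & -1 \\ 1 & -1 & 0 \end{vmatrix}
= -1\begin{vmatrix} c_1 & -1 \\ -1 & 0 \end{vmatrix}
- a_1\begin{vmatrix} 0 & -1 \\ 1 & 0 \end{vmatrix}
+ a_2\begin{vmatrix} 0 & c_1 \\ 1 & -1 \end{vmatrix}
= 1 - a_1 - a_2 c_1 \neq 0
$$
(provided $a_1 + a_2 c_1 \neq 1$)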
The rank condition is satisfied since we can construct at least one non-zero
determinant of order 3 = (G-1).
Applying the counting rule (K - M) ≥ (G - 1) we see that the inequality sign holds:
4 > 3; hence the investment function is overidentified.
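The determinant used for the investment function can be verified numerically. The sketch below plugs hypothetical parameter values (not taken from the text) into the first three columns of the table of excluded variables and compares the cofactor expansion with the closed form 1 - a1 - a2 c1.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def excluded_block(a1, a2, c1):
    """Columns C, Y, T of the excluded-variables table for the investment function."""
    return [[-1, a1, a2],
            [0, c1, -1],
            [1, -1, 0]]


# Hypothetical parameter values, chosen only for illustration.
a1, a2, c1 = 0.75, -0.5, 0.25
value = det3(excluded_block(a1, a2, c1))
closed_form = 1 - a1 - a2 * c1

# On the knife-edge a1 + a2*c1 = 1 this particular determinant vanishes.
knife_edge = det3(excluded_block(0.5, 2.0, 0.25))

print(value, closed_form, knife_edge)
```

A zero value on the knife-edge concerns this determinant only; the rank condition asks whether any determinant of order G-1 formed from the excluded-variables table is non-zero.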
structural model. The rank condition here refers to the value of the determinant
formed from some of the reduced form parameters, the π's.
$$(K - M) \geq (G - 1)$$
that is, [total number of excluded variables] ≥ [number of equations less one]
where K, M and G have the same meaning as before:
K= total number of variables, endogenous and exogenous, in the entire
model
M= number of variables, endogenous and exogenous, in any particular
equation
G= number of structural equations=number of all endogenous variables in
the model
If (K-M) = (G-1), the equation is exactly identified, provided that the rank
condition set out below is also satisfied. If (K-M)>(G-1), the equation is
overidentified, while if (K-M)<(G-1), the equation is underidentified, under the
same proviso.
2.Rank condition as applied to the reduced form
$$y_2 = b_{23} y_3 + \gamma_{23} x_3 + u_2$$
$$y_3 = b_{31} y_1 + b_{32} y_2 + \gamma_{33} x_3 + u_3$$
This model is complete in the sense that it contains three equations in three
endogenous variables. The model contains altogether six variables, three
endogenous ( y1 , y 2 , y 3 ) and three exogenous ( x1 , x 2 , x3 ).
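The reduced form of the model is obtained by solving the original equations for
the endogenous variables in terms of the exogenous variables. The reduced form
in the above example is:
$$
\begin{aligned}
y_1 &= \pi_{11} x_1 + \pi_{12} x_2 + \pi_{13} x_3 + v_1 \\
y_2 &= \pi_{21} x_1 + \pi_{22} x_2 + \pi_{23} x_3 + v_2 \\
y_3 &= \pi_{31} x_1 + \pi_{32} x_2 + \pi_{33} x_3 + v_3
\end{aligned}
$$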
                        x1     x2     x3
1st equation:   y1      π11    π12    π13
2nd equation:   y2      π21    π22    π23
3rd equation:   y3      π31    π32    π33
Strike out the rows corresponding to endogenous variables excluded from the
particular equation being examined for identifiability. Also strike out all the
columns referring to exogenous variables included in the structural form of the
particular equation.
After these deletions we are left with the reduced form coefficients of exogenous
variables excluded (absent) from the structural equation. For example, assume that
we are investigating the identifiability of the second equation. The π's relevant
to the identification procedure are found by striking out the first row (since y1
does not appear in the second equation) and the third column (since x3 is included
in this equation):

  π21    π22
  π31    π32

Thirdly. Examine the order of the determinants of the π's of excluded exogenous
variables and evaluate them. If the order of the largest non-zero determinant is
G*-1, where G* is the number of endogenous variables included in the equation
being examined, the equation is identified. Otherwise the equation is not identified.
Major References