
Gauss–Markov theorem

Not to be confused with the Gauss–Markov process.

"BLUE" redirects here. For the queue management algorithm, see Blue (queue management algorithm).
In statistics, the Gauss–Markov theorem, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear regression model in which the errors have expectation zero, are uncorrelated, and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator. Here "best" means giving the lowest variance of the estimate, as compared to other unbiased, linear estimators. The errors do not need to be normal, nor do they need to be independent and identically distributed (only uncorrelated with mean zero and homoscedastic with finite variance). The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator (which also drops linearity) or ridge regression.

1 Statement

Suppose we have, in matrix notation,

    y = X\beta + \varepsilon, \qquad y, \varepsilon \in \mathbb{R}^n,\; \beta \in \mathbb{R}^K,\; X \in \mathbb{R}^{n \times K},

expanding to,

    y_i = \sum_{j=1}^{K} \beta_j X_{ij} + \varepsilon_i, \qquad i = 1, 2, \ldots, n

where the β_j are non-random but unobservable parameters, the X_ij are non-random and observable (called the "explanatory variables"), the ε_i are random, and so the y_i are random. The random variables ε_i are called the "disturbance", "noise" or simply "error" (this will be contrasted with "residual" later in the article; see errors and residuals in statistics). Note that to include a constant in the model above, one can choose to introduce the constant as a variable β_{K+1} with a newly introduced last column of X being unity, i.e., X_{i(K+1)} = 1 for all i.

The Gauss–Markov assumptions are

    E(\varepsilon_i) = 0,

    V(\varepsilon_i) = \sigma^2 < \infty

(i.e., all disturbances have the same variance; that is "homoscedasticity"), and

    \operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0 \quad \text{for } i \neq j,

that is, the error terms are uncorrelated.

A linear estimator of β_j is a linear combination

    \widehat{\beta}_j = c_{1j} y_1 + \cdots + c_{nj} y_n

in which the coefficients c_ij are not allowed to depend on the underlying coefficients β_j, since those are not observable, but are allowed to depend on the values X_ij, since these data are observable. (The dependence of the coefficients on each X_ij is typically nonlinear; the estimator is linear in each y_i and hence in each random ε, which is why this is "linear" regression.) The estimator is said to be unbiased if and only if

    E(\widehat{\beta}_j) = \beta_j

regardless of the values of X_ij. Now, let \sum_{j=1}^{K} \lambda_j \beta_j be some linear combination of the coefficients. Then the mean squared error of the corresponding estimation is

    E\left[\left(\sum_{j=1}^{K} \lambda_j (\widehat{\beta}_j - \beta_j)\right)^2\right];

i.e., it is the expectation of the square of the weighted sum (across parameters) of the differences between the estimators and the corresponding parameters to be estimated. (Since we are considering the case in which all the parameter estimates are unbiased, this mean squared error is the same as the variance of the linear combination.) The best linear unbiased estimator (BLUE) of the vector β of parameters β_j is the one with the smallest mean squared error for every vector λ of linear-combination parameters. This is equivalent to the condition that

    V(\widetilde{\beta}) - V(\widehat{\beta})

is a positive semi-definite matrix for every other linear unbiased estimator β̃.

The ordinary least squares estimator (OLS) is the function

    \widehat{\beta} = (X'X)^{-1}X'y

of y and X (where X' denotes the transpose of X) that minimizes the sum of squares of residuals (misprediction amounts):

    \sum_{i=1}^{n} (y_i - \widehat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - \sum_{j=1}^{K} \widehat{\beta}_j X_{ij}\right)^2.

The theorem now states that the OLS estimator is a BLUE. The main idea of the proof is that the least-squares estimator is uncorrelated with every linear unbiased estimator of zero, i.e., with every linear combination a_1 y_1 + \cdots + a_n y_n whose coefficients do not depend upon the unobservable β but whose expected value is always zero.
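
As a concrete illustration, the closed-form estimator translates directly into Python with NumPy. This is a minimal sketch on simulated data; the design matrix, the coefficient vector and the noise level are arbitrary choices for demonstration, not part of the theorem:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated design: n observations, K regressors (first column is the constant).
    n, K = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
    beta = np.array([2.0, -1.0, 0.5])      # the unobservable "true" coefficients
    eps = rng.normal(size=n)               # mean-zero, homoscedastic, uncorrelated
    y = X @ beta + eps

    # OLS estimator beta_hat = (X'X)^{-1} X'y; lstsq evaluates the same
    # expression in a numerically stable way.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)                        # close to [2.0, -1.0, 0.5]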

2 Proof

Let β̃ = Cy be another linear estimator of β and let C be given by (X'X)^{-1}X' + D, where D is a K × n nonzero matrix. As we are restricting attention to unbiased estimators, minimum mean squared error implies minimum variance. The goal is therefore to show that such an estimator has a variance no smaller than that of β̂, the OLS estimator.

The expectation of β̃ is:

    E(Cy) = E\left(((X'X)^{-1}X' + D)(X\beta + \varepsilon)\right)
          = ((X'X)^{-1}X' + D)X\beta + ((X'X)^{-1}X' + D)\underbrace{E(\varepsilon)}_{=0}
          = (X'X)^{-1}X'X\beta + DX\beta
          = (I_K + DX)\beta.

Therefore, β̃ is unbiased if and only if DX = 0.

The variance of β̃ is

    V(\widetilde{\beta}) = V(Cy) = C\,V(y)\,C' = \sigma^2 CC'
        = \sigma^2 ((X'X)^{-1}X' + D)(X(X'X)^{-1} + D')
        = \sigma^2 \left((X'X)^{-1}X'X(X'X)^{-1} + (X'X)^{-1}X'D' + DX(X'X)^{-1} + DD'\right)
        = \sigma^2 (X'X)^{-1} + \sigma^2 (X'X)^{-1}\underbrace{(DX)'}_{=0} + \sigma^2 \underbrace{DX}_{=0}(X'X)^{-1} + \sigma^2 DD'
        = \underbrace{\sigma^2 (X'X)^{-1}}_{V(\widehat{\beta})} + \sigma^2 DD'.

Since DD' is a positive semidefinite matrix, V(β̃) exceeds V(β̂) by a positive semidefinite matrix.
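
The variance ordering V(β̃) = V(β̂) + σ²DD' can be checked by simulation. The sketch below is an illustration under assumed inputs, not part of the proof: it builds a competing linear unbiased estimator Cy by projecting a random D onto the space where DX = 0, then compares sampled variances over repeated error draws with X held fixed:

    import numpy as np

    rng = np.random.default_rng(1)
    n, K = 50, 2
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta = np.array([1.0, 3.0])

    ols_mat = np.linalg.inv(X.T @ X) @ X.T      # OLS weighting matrix
    P = X @ ols_mat                             # projection onto col(X)

    # Any nonzero D with DX = 0 yields another linear unbiased estimator Cy:
    # project a random K x n matrix onto the orthogonal complement of col(X).
    D = rng.normal(size=(K, n)) @ (np.eye(n) - P)
    C = ols_mat + D
    assert np.allclose(D @ X, 0.0)              # the unbiasedness condition

    # Repeated samples with X fixed; each row of Y is one draw of X beta + eps.
    Y = X @ beta + rng.normal(size=(20000, n))
    ols_est = Y @ ols_mat.T                     # rows: (X'X)^{-1} X' y
    alt_est = Y @ C.T                           # rows: C y
    print(ols_est.var(axis=0))                  # componentwise variance of OLS
    print(alt_est.var(axis=0))                  # larger, per the theorem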

3 Remarks on the proof

As it has been stated before, the condition that V(β̃) − V(β̂) is a positive semi-definite matrix is equivalent to the property that the best linear unbiased estimator of l'β is l'β̂ (best in the sense that it has minimum variance). To see this, let l'β̃ be another linear unbiased estimator of l'β. Then

    V(l'\widetilde{\beta}) = l' V(\widetilde{\beta}) l
        = \sigma^2 l'(X'X)^{-1} l + l'DD'l
        = V(l'\widehat{\beta}) + (D'l)'(D'l)
        = V(l'\widehat{\beta}) + \|D'l\|^2
        \geq V(l'\widehat{\beta}).

Therefore, V(l'β̂) ≤ V(l'β̃).

Moreover, suppose that the equality holds (V(l'β̂) = V(l'β̃)). This happens if and only if D'l = 0. Remembering that, from the proof above, we have β̃ = ((X'X)^{-1}X' + D)y, then

    l'\widetilde{\beta} = l'(X'X)^{-1}X'y + l'Dy = l'\widehat{\beta} + \underbrace{(D'l)'}_{=0}y = l'\widehat{\beta}.

This proves that the equality holds if and only if l'β̃ = l'β̂, which gives the uniqueness of the OLS estimator as a BLUE.
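
The "main idea" noted at the end of the Statement section, that the least-squares estimator is uncorrelated with every linear unbiased estimator of zero, can also be made concrete. In the sketch below (an arbitrary simulated design, chosen for illustration), a'y is a linear unbiased estimator of zero because X'a = 0, and cov(β̂, a'y) = σ²(X'X)^{-1}X'a therefore vanishes for any σ²:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 40
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    P = X @ np.linalg.inv(X.T @ X) @ X.T        # projection onto col(X)

    # a'y is a linear unbiased estimator of zero whenever X'a = 0:
    # then E[a'y] = a'X beta = 0 for every beta.
    a = (np.eye(n) - P) @ rng.normal(size=n)
    print(X.T @ a)                              # numerically zero

    # cov(beta_hat, a'y) = sigma^2 (X'X)^{-1} X'a, hence zero as well.
    print(np.linalg.inv(X.T @ X) @ X.T @ a)     # numerically zero vector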
4 Generalized least squares estimators

The generalized least squares (GLS) estimator, developed by Aitken,[1] extends the Gauss–Markov theorem to the case where the error vector has a non-scalar covariance matrix.[2] The Aitken estimator is also a BLUE.
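
As a sketch of what the Aitken estimator looks like in code (the function and its inputs are illustrative assumptions; Omega must be a known, positive-definite error covariance matrix):

    import numpy as np

    def gls(X, y, Omega):
        """GLS estimator (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.

        With Omega = sigma^2 * I this reduces to ordinary least squares;
        a non-scalar Omega weights observations by the inverse error
        covariance, which is what restores the BLUE property.
        """
        Oinv = np.linalg.inv(Omega)
        return np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)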

5 Gauss–Markov theorem as stated in econometrics

In most treatments of OLS, the regressors in the design matrix X are assumed to be fixed in repeated samples. This assumption is considered inappropriate for a predominantly nonexperimental science like econometrics.[3] Instead, the assumptions of the Gauss–Markov theorem are stated conditional on X.

5.1 Linearity

The dependent variable is assumed to be a linear function of the variables specified in the model. The specification must be linear in its parameters. This does not mean that there must be a linear relationship between the independent and dependent variables. The independent variables can take non-linear forms as long as the parameters are linear. The equation y = β_0 + β_1 x^2 qualifies as linear, while y = β_0 + β_1^2 x can be transformed to be linear by replacing β_1^2 with another parameter, say γ. An equation with a parameter dependent on an independent variable does not qualify as linear; for example, y = β_0 + β_1(x)·x, where β_1(x) is a function of x.

Data transformations are often used to convert an equation into a linear form. For example, the Cobb–Douglas function, often used in economics, is nonlinear:

    Y = A L^{\alpha} K^{1-\alpha} e^{\varepsilon}

But it can be expressed in linear form by taking the natural logarithm of both sides:[4]

    \ln Y = \ln A + \alpha \ln L + (1-\alpha) \ln K + \varepsilon = \beta_0 + \beta_1 \ln L + \beta_2 \ln K + \varepsilon

This assumption also covers specification issues: it assumes that the proper functional form has been selected and there are no omitted variables.
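
The log transformation is easy to verify numerically. Below is a hypothetical sketch with made-up data (A, α and the noise scale are arbitrary): after taking logs, the model is linear in its parameters and OLS applies.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    L = rng.uniform(1, 10, size=n)               # labor input (made-up data)
    Kap = rng.uniform(1, 10, size=n)             # capital input
    alpha, A = 0.7, 1.5
    Y = A * L**alpha * Kap**(1 - alpha) * np.exp(rng.normal(scale=0.1, size=n))

    # ln Y = ln A + alpha ln L + (1 - alpha) ln K + eps: linear in the parameters.
    Xlog = np.column_stack([np.ones(n), np.log(L), np.log(Kap)])
    b, *_ = np.linalg.lstsq(Xlog, np.log(Y), rcond=None)
    print(np.exp(b[0]), b[1], b[2])              # approximately A, alpha, 1 - alpha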


5.2 Strict exogeneity

For all n observations, the expectation, conditional on the regressors, of the error term is zero:[5]

    E[\varepsilon_i \mid X] = E[\varepsilon_i \mid x_1, \ldots, x_n] = 0,

where x_i = [x_{i1}\; x_{i2}\; \ldots\; x_{ik}]^{\mathsf T} is the data vector of regressors for the ith observation, and consequently X = [x_1^{\mathsf T}\; x_2^{\mathsf T}\; \ldots\; x_n^{\mathsf T}]^{\mathsf T} is the data matrix or design matrix.

Geometrically, this assumption implies that x_i and ε_i are orthogonal to each other, so that their inner product (i.e., their cross moment) is zero:

    E[x_j \cdot \varepsilon_i] =
    \begin{bmatrix}
      E[x_{j1} \varepsilon_i] \\
      E[x_{j2} \varepsilon_i] \\
      \vdots \\
      E[x_{jk} \varepsilon_i]
    \end{bmatrix} = 0 \qquad \text{for all } i, j \in \{1, \ldots, n\}.

This assumption is violated if the explanatory variables are stochastic, for instance when they are measured with error, or are endogenous.[6] Endogeneity can be the result of simultaneity, where causality flows back and forth between both the dependent and independent variable. Instrumental variable techniques are commonly used to address this problem.
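
A small simulation can show the consequence of violating this assumption. In the sketch below (a hypothetical errors-in-variables setup, not from the article), the regressor is observed with measurement error, so the composite error term is correlated with it and OLS is biased toward zero:

    import numpy as np

    rng = np.random.default_rng(3)
    n, beta = 100_000, 2.0
    x_true = rng.normal(size=n)
    y = beta * x_true + rng.normal(size=n)

    # Observe the regressor with measurement error: x = x_true + u.
    x_obs = x_true + rng.normal(scale=1.0, size=n)

    # The composite error beta*(x_true - x_obs) + eps is correlated with
    # x_obs, so strict exogeneity fails and OLS suffers attenuation bias.
    slope = np.polyfit(x_obs, y, 1)[0]
    print(slope)   # about beta * Var(x_true)/(Var(x_true)+Var(u)) = 1.0, not 2.0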

5.3 Full rank

The sample data matrix X must be non-singular, i.e. it must have full rank:

    \operatorname{rank}(X) = k.

Otherwise X'X is not invertible and the OLS estimator cannot be computed.

A violation of this assumption is perfect multicollinearity, i.e. some explanatory variables are linearly dependent. One scenario in which this will occur is called the "dummy variable trap", when a base dummy variable is not omitted, resulting in perfect correlation between the dummy variables and the constant term.[7]

Multicollinearity (as long as it is not "perfect") can be present, resulting in a less efficient, but still unbiased, estimate. The estimates will be less precise and highly sensitive to particular sets of data.[8] Multicollinearity can be detected from the condition number or the variance inflation factor, among other tests.
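
Screening for this numerically is straightforward; here is a minimal sketch using NumPy's condition number and matrix rank (the data are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=1e-6, size=n)     # nearly a copy of x1
    X = np.column_stack([np.ones(n), x1, x2])

    # A huge condition number signals near-perfect multicollinearity:
    # X'X is close to singular and OLS estimates become unstable.
    print(np.linalg.cond(X))

    # The dummy variable trap is the exact version: a column that is a
    # linear combination of the others makes rank(X) < k.
    X_trap = np.column_stack([np.ones(n), x1 > 0, x1 <= 0])  # dummies sum to 1
    print(np.linalg.matrix_rank(X_trap))         # 2, not 3: rank deficient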
5.4 Spherical errors

The outer product of the error vector must be spherical:

    E[\varepsilon \varepsilon^{\mathsf T} \mid X] = \operatorname{Var}[\varepsilon \mid X] =
    \begin{bmatrix}
      \sigma^2 & 0        & \cdots & 0 \\
      0        & \sigma^2 & \cdots & 0 \\
      \vdots   & \vdots   & \ddots & \vdots \\
      0        & 0        & \cdots & \sigma^2
    \end{bmatrix} = \sigma^2 I, \qquad \text{with } \sigma^2 > 0.

This implies the error term has uniform variance σ² (homoscedasticity) and no serial dependence.[9] If this assumption is violated, OLS is still unbiased, but inefficient. The term "spherical errors" describes the multivariate normal distribution: if Var[ε | X] = σ²I in the multivariate normal density, then the equation f(ε) = c is the formula for a ball centered at μ with radius σ in n-dimensional space.[10]

Heteroskedasticity occurs when the amount of error is correlated with an independent variable. For example, in a regression on food expenditure and income, the error is correlated with income: low-income people generally spend a similar amount on food, while high-income people may spend a very large amount or as little as low-income people do. Heteroskedasticity can also be caused by changes in measurement practices. For example, as statistical offices improve their data, measurement error decreases, so the error term declines over time.

This assumption is violated when there is autocorrelation. Autocorrelation can be visualized on a data plot when a given observation is more likely to lie above a fitted line if adjacent observations also lie above the fitted regression line. Autocorrelation is common in time series data, where a data series may experience "inertia",[11] for instance when a dependent variable takes a while to fully absorb a shock. Spatial autocorrelation can also occur: geographic areas are likely to have similar errors. Autocorrelation may be the result of misspecification, such as choosing the wrong functional form. In these cases, correcting the specification is one possible way to deal with autocorrelation.

In the presence of non-spherical errors, the generalized least squares estimator can be shown to be BLUE.[12]
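
The food-expenditure pattern can be simulated; in the sketch below (illustrative parameter values only), the error scale grows with income, so the errors are no longer spherical, yet the OLS slope remains unbiased:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 5000
    income = rng.uniform(1, 10, size=n)

    # Error spread grows with income: heteroskedastic, non-spherical errors.
    eps = rng.normal(scale=0.2 * income)
    food = 1.0 + 0.3 * income + eps

    slope = np.polyfit(income, food, 1)[0]
    print(slope)   # still close to 0.3: OLS remains unbiased, just inefficient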

6 See also

- Independent and identically distributed random variables
- Linear regression
- Measurement uncertainty

6.1 Other unbiased statistics

- Best linear unbiased prediction (BLUP)
- Minimum-variance unbiased estimator (MVUE)

7 Notes

[1] Aitken, A. C. (1935). "On Least Squares and Linear Combinations of Observations". Proceedings of the Royal Society of Edinburgh 55: 42–48.
[2] Huang, David S. (1970). Regression and Econometric Methods. New York: John Wiley & Sons. pp. 127–147. ISBN 0-471-41754-8.
[3] Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 13. ISBN 0-691-01018-8.
[4] Kennedy 2003, p. 110.
[5] Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 7. ISBN 0-691-01018-8.
[6] Johnston, John (1972). Econometric Methods (Second ed.). New York: McGraw-Hill. pp. 267–291. ISBN 0-07-032679-7.
[7] Wooldridge, Jeffrey (2012). Introductory Econometrics (Fifth international ed.). South-Western. p. 220. ISBN 978-1-111-53439-4.
[8] Johnston, John (1972). Econometric Methods (Second ed.). New York: McGraw-Hill. pp. 159–168. ISBN 0-07-032679-7.
[9] Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 10. ISBN 0-691-01018-8.
[10] Greene 2012, p. 23, note.
[11] Greene 2010, p. 22.
[12] Kennedy 2003, p. 135.

8 References

- Plackett, R. L. (1950). "Some Theorems in Least Squares". Biometrika 37 (1–2): 149–157. doi:10.1093/biomet/37.1-2.149. JSTOR 2332158. MR 36980.
- Greene, William H. (2012). Econometric Analysis (7th ed.). Prentice Hall.

Use of BLUE in physics:

- L. Lyons, D. Gibaut, P. Clifford (1998). "How to combine correlated estimates of a single physical quantity". Nucl. Instr. and Meth. A270: 110.
- L. Lyons, A. J. Martin, D. H. Saxon (1990). "On the determination of the b lifetime by combining the results of different experiments". Phys. Rev. D41: 982–985. doi:10.1103/physrevd.41.982.

9 External links

- Earliest Known Uses of Some of the Words of Mathematics: G (brief history and explanation of the name)
- Proof of the Gauss–Markov theorem for multiple linear regression (makes use of matrix algebra)
- A Proof of the Gauss–Markov theorem using geometry

