Beruflich Dokumente
Kultur Dokumente
Now we write
yt 1 , t=1,2,...,T
2 x2t 3 x3t ... k xkt ut
Where is x1? It is the constant term. In fact the constant term is usually represented by a
column of ones of length T:
1
1
x1
1 (which we called before).
1 is the coefficient attached to the constant term
y is T 1
X is T k
is k 1
u is T 1
y1 1 x21
u1
y 1 x
u
22
1
2
2
yT 1 x2T
uT
T 1
T2
21
T1
u
T
u ' u u1 u2 uT
uT
2 ( X X ) 1 X y
s2
2
t
T 2
u' u
2
s
Tk
y 1 2 x2 3 x3 u
and the following data have been calculated from the original Xs.
2.0 35
30
. 10
.
.
( X ' X ) 1 35
. 10
. 65
. ,( X ' y) 2.2 , u' u 10.96
10
0.6
. 65
. 4.3
Calculate the coefficient estimates and their standard errors.
To calculate
the coefficients, just multiply the matrix by the vector to obtain
1
X'X
X'y
RSS 10.96
2
s
0.91
Tk
15 3
183
.
320
.
0.91
s2 ( X ' X ) 1 0.91( X ' X ) 1 320
.
0.91 594
.
0.91 594
.
393
.
Var ( 1 ) 183
.
SE ( 1 ) 135
.
Var ( 2 ) 0.91 SE ( 2 ) 0.96
Var ( ) 3.93
SE ( ) 198
.
3
We write:
1.35 0.96
1.98
The F-test:
Restricted and Unrestricted Regressions
Example
The general regression is
yt = 1 + 2x2t + 3x3t + 4x4t + ut
(1)
We want to test the restriction that 3+4 = 1 (we have some hypothesis
from theory which suggests that this would be an interesting hypothesis to
study). The unrestricted regression is (1) above, but what is the restricted
regression?
yt = 1 + 2x2t + 3x3t + 4x4t + ut s.t. 3+4 = 1
We substitute the restriction (3+4 = 1) into the regression so that it is
automatically imposed on the data.
3+4 = 1 4 = 1- 3
test statistic
RRSS URSS T k
URSS
m
The F-Distribution
The test statistic follows the F-distribution, which has 2 d.f.
parameters.
The value of the degrees of freedom parameters are m and (T-k)
respectively (the order of the d.f. parameters is important).
The appropriate critical value will be in column m, row (T-k).
The F-distribution has only positive values and is not symmetrical. We
therefore only reject the null if the test statistic > critical F-value.
We cannot test using this framework hypotheses which are not linear
or which are multiplicative, e.g.
H0: 2 3 = 2 or H0: 2 2 = 1
cannot be tested.
2 0.5
test stat
SE ( 2 )
F-test Example
Data Mining
Data mining is searching many series for statistical relationships
without theoretical justification.
For example, suppose we generate one dependent variable and twenty
explanatory variables completely randomly and independently of each
other.
If we regress the dependent variable separately on each independent
variable, on average one slope coefficient will be significant at 5%.
If data mining occurs, the true significance level will be greater than
the nominal significance level.
TSS yt y
We can split the TSS into two parts, the part which we have explained (known as
the explained sum of squares, ESS) and the part which we did not explain using
the model (the RSS).
Defining R2
That is,
TSS =
ESS
+ RSS
2
2
2
u
t
t
t
t
ESS
TSS
But since TSS = ESS + RSS, we can also write
R2
1
TSS
TSS
TSS
R2 must always lie between zero and one. To understand this, consider two
extremes
RSS = TSS i.e.
ESS = 0 so
R2 = ESS/TSS = 0
ESS = TSS i.e.
RSS = 0 so
R2 = ESS/TSS = 1
R2
yt
yt
xt
xt
Adjusted R2
(1 R 2 )
T k
R 2 1
A Regression Example:
Hedonic House Pricing Models
Hedonic models are used to value real assets, especially housing, and view the asset as
representing a bundle of characteristics.
Des Rosiers and Thrialt (1996) consider the effect of various amenities on rental values
for buildings and apartments 5 sub-markets in the Quebec area of Canada.
The rental value in Canadian Dollars per month (the dependent variable) is a function of 9
to 14 variables (depending on the area under consideration). The paper employs 1990
data, and for the Quebec City region, there are 13,378 observations, and the 12
explanatory variables are:
LnAGE
- log of the apparent age of the property
NBROOMS - number of bedrooms
AREABYRM - area per room (in square metres)
ELEVATOR - a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT - a dummy variable = 1 if the unit is located in a basement; 0 otherwise
Coefficient
282.21
-53.10
48.47
3.97
88.51
-15.90
7.17
73.76
-16.99
5.84
-4.27
-10.04
0.29
t-ratio
56.09
-59.71
104.81
29.99
45.04
-11.32
7.07
31.25
-7.62
4.60
-38.88
-5.97
5.98
Notes: Adjusted R2 = 0.65l; regression F-statistic = 2082.27. Source: Des Rosiers and
Thrialt
(1996). Reprinted with permission of the American Real Estate Society.
Model 1 : yt 1 2 x2t ut
Model 2 : yt 1 2 x3t vt
We could use R2 or adjusted R2, but what if the number of explanatory
variables were different across the 2 models?
An alternative approach is an encompassing test, based on examination
of the hybrid model:
Model 3 : yt 1 2 x2t 3 x3t wt