Types of Data:
Cross sectional
Time series
Pooled cross sections (independent cross sections, usually at different times)
Panel / longitudinal data (dependent cross sections, uses same individuals at different times)
Ceteris paribus: how does y change when x changes and all other factors are held equal?
Chpt 2
SST=SSE+SSR (total = explained + residual sum of squares)
R2=SSE/SST=1-SSR/SST
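A minimal sketch of the decomposition, assuming simulated data and numpy only (the variable names and the data-generating numbers are made up for illustration). SSE here is the explained sum of squares and SSR the residual sum of squares, as in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)    # simulated data, illustration only

X = np.column_stack([np.ones(n), x])       # intercept plus one regressor
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
resid = y - y_hat

sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)      # explained sum of squares
ssr = np.sum(resid ** 2)                   # residual sum of squares
r2 = sse / sst

print(np.isclose(sst, sse + ssr), r2)
```

The identity relies on the regression including an intercept; without one, SST = SSE + SSR need not hold.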
Gauss-Markov Assumptions:
Chpt 3
SLR 5: homoskedasticity, Var(u|x) = sigma squared. Needed in conjunction with SLR 1-4 for the
estimator of the error variance to be unbiased, i.e. for its expected value to equal the true variance.
MLR 1-4 needed for unbiasedness of OLS. Adding MLR 5 (homoskedasticity) gives the Gauss-Markov
assumptions, under which OLS estimators are best linear unbiased estimators (BLUE).
Chpt 4
Omitted variable bias is worse than adding unnecessary variables: irrelevant variables leave OLS
unbiased but can inflate variances through collinearity. An omitted variable must be uncorrelated
with all included regressors (or have no effect on y) to avoid omitted variable bias.
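A small sketch of where the bias comes from, assuming simulated data with two regressors (all names and numbers are illustrative). In any sample, the slope from the short regression of y on x1 equals the long-regression slope on x1 plus the long-regression slope on x2 times the slope from regressing x2 on x1.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)              # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x1, x2]), y)   # includes x2
b_short = ols(np.column_stack([ones, x1]), y)      # omits x2
delta = ols(np.column_stack([ones, x1]), x2)       # x2 regressed on x1

implied = b_long[1] + b_long[2] * delta[1]         # long slope + bias term
print(b_short[1], implied)
```

If x2 were uncorrelated with x1, delta[1] would be close to zero and the short regression slope would be close to the long one.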
MLR 6: normality of error terms. u is independent of the explanatory variables and u~N(0,sigma squared)
Chpt 5
Chpt 6
Exact percentage change in y when log(y) is the dependent variable: 100*[exp(B1*change in x1)-1]
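For a model with log(y) as the dependent variable, 100*B1 only approximates the percentage change in y for a one-unit change in x1; the exact figure is 100*[exp(B1*dx) - 1]. A minimal sketch comparing the two (the function names are hypothetical, chosen for this example):

```python
import math

def approx_pct_change(beta, dx=1.0):
    """Approximate % change in y: 100 * beta * dx."""
    return 100.0 * beta * dx

def exact_pct_change(beta, dx=1.0):
    """Exact % change in y: 100 * (exp(beta * dx) - 1)."""
    return 100.0 * (math.exp(beta * dx) - 1.0)

for beta in (0.01, 0.05, 0.30):
    print(beta, approx_pct_change(beta), exact_pct_change(beta))
```

The gap between the two grows with the size of the coefficient, so the exact form matters most for large estimated effects.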
Chpt 7
A linear probability model is a linear regression where y is a dummy variable that has value 0 or 1.
It is always heteroskedastic, except when all slope coefficients are zero.
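The reason for the heteroskedasticity can be written in one line. Assuming the linear probability model holds, y given x is a Bernoulli variable with success probability p(x), so its variance depends on x:

```latex
\operatorname{Var}(y \mid x) = p(x)\,\bigl[1 - p(x)\bigr],
\qquad p(x) = P(y = 1 \mid x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```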
Chpt 8
Heteroskedasticity-robust standard errors are used to compensate. t and F tests based on these
standard errors are then valid in large samples.
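A sketch of how robust standard errors can be computed by hand, assuming the simplest White (HC0) form with no small-sample correction and numpy only. The function name and the simulated data are illustrative; statistical packages usually apply a degrees-of-freedom adjustment on top of this.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """Return OLS coefficients, usual SEs and White (HC0) robust SEs."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat

    # Usual formula, valid only under homoskedasticity: sigma^2 (X'X)^-1
    sigma2_hat = resid @ resid / (n - k)
    se_usual = np.sqrt(np.diag(sigma2_hat * XtX_inv))

    # Sandwich formula: (X'X)^-1 [X' diag(resid^2) X] (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta_hat, se_usual, se_robust

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * x     # error spread grows with x
X = np.column_stack([np.ones(n), x])

beta_hat, se_usual, se_robust = ols_with_robust_se(X, y)
t_robust = beta_hat / se_robust                # robust t statistics for H0: coefficient = 0
print(beta_hat, se_usual, se_robust, t_robust)
```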
Breusch-Pagan test for heteroskedasticity: regress the squared residuals on the explanatory variables
and test their joint significance (F test, or LM statistic n*R2). Null hypothesis is homoskedasticity.
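White test for heteroskedasticity: regress the squared residuals on the explanatory variables, their
squares and their cross products. Tests for a broader class of heteroskedasticity but uses more
degrees of freedom. The special form of the White test saves degrees of freedom by regressing the
squared residuals on the fitted values and the square of the fitted values only.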
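A minimal sketch of both tests in LM form (LM = n times the R-squared of the auxiliary regression), assuming simulated data with a single regressor and numpy only. The p-values use closed forms for the chi-square distribution with 1 and 2 degrees of freedom, which applies only to this one-regressor setup; with more regressors a chi-square routine from a statistics library would be needed.

```python
import math
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS regression of y on X (X includes a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * x     # heteroskedastic by construction
X = np.column_stack([np.ones(n), x])

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
u2 = (y - y_hat) ** 2                          # squared OLS residuals

# Breusch-Pagan: squared residuals on the explanatory variable (1 df here)
lm_bp = n * r_squared(X, u2)
p_bp = math.erfc(math.sqrt(lm_bp / 2.0))       # upper tail of chi-square(1)

# Fitted-value form: squared residuals on y_hat and y_hat^2 (2 df)
Z = np.column_stack([np.ones(n), y_hat, y_hat ** 2])
lm_white = n * r_squared(Z, u2)
p_white = math.exp(-lm_white / 2.0)            # upper tail of chi-square(2)

print(lm_bp, p_bp, lm_white, p_white)
```

A small p-value rejects the null hypothesis of homoskedasticity.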
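Weighted least squares is used when there is heteroskedasticity and its form is known. If
Var(u|x) is sigma squared multiplied by some function h(x), where h(x)>0 because variances are
positive, then divide every term of the regression (including the intercept) by the square root of
h(x). Interpret the coefficients as in the original model. If the original model satisfies MLR 1-4
and h(x) is correct, the transformed model satisfies MLR 1-5, so WLS is BLUE. If MLR 6 also holds,
the t and F statistics have exact distributions.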
As a special case, h(x)=1/n, where n is the number of people in the group that each observation
averages over (not the sample size). Used when data points are averages. Then WLS is multiplying
each observation of the original model by the square root of n to remove heteroskedasticity.
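A sketch of the WLS transformation, assuming the variance function is known to be h(x) = x and using simulated data (both are assumptions made for this example). It also checks that transforming the data gives the same coefficients as the weighted normal equations with weights 1/h(x).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
h = x                                          # assumed known: Var(u|x) = sigma^2 * x
y = 2.0 + 1.5 * x + rng.normal(size=n) * np.sqrt(h)
X = np.column_stack([np.ones(n), x])

# Divide every column, including the intercept column, by sqrt(h)
scale = 1.0 / np.sqrt(h)
X_star = X * scale[:, None]
y_star = y * scale
beta_wls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# Equivalent form: solve (X'WX) b = X'Wy with weights W = 1/h
W = 1.0 / h
beta_check = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y * W))

print(beta_wls, beta_check)
```

The coefficients keep their original interpretation: beta_wls[1] is still the effect of x on y, not of the transformed regressor.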
Estimated / Feasible GLS is used when h(x) is unknown. Regress the log of the squared OLS residuals
on the explanatory variables and obtain the fitted values, g. Estimate h(x) as exp(g). Rerun the
regression using weights 1/h(x), or equivalently transform each variable (including the intercept)
by dividing by the square root of h(x).
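The FGLS steps above can be sketched as follows, assuming simulated data in which the true variance is exp(x) and using numpy only (names and numbers are illustrative).

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(5)
n = 2000
x = rng.uniform(0.0, 3.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * np.exp(0.5 * x)   # Var(u|x) = exp(x)
X = np.column_stack([np.ones(n), x])

# Step 1: OLS residuals from the original model
resid = y - X @ ols(X, y)

# Step 2: regress log(resid^2) on the explanatory variables, keep fitted values g
gamma_hat = ols(X, np.log(resid ** 2))
g_hat = X @ gamma_hat

# Step 3: estimated variance function, positive by construction
h_hat = np.exp(g_hat)

# Step 4: WLS with weights 1/h_hat, i.e. divide every column by sqrt(h_hat)
scale = 1.0 / np.sqrt(h_hat)
beta_fgls = ols(X * scale[:, None], y * scale)

print(gamma_hat, beta_fgls)
```

Using the exponential guarantees that the estimated weights are positive, which a linear model for the squared residuals would not.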
If OLS and WLS produce very different estimates then another assumption (probably the zero
conditional mean assumption, ZCM) is wrong.
Chpt 9
To test if misspecification is due to omission of logs, create a general model that contains both
specifications as special cases and then use an F test on the added terms. Works for any two
non-nested models.
RESET test: add the square and cube of the fitted values to the regression and test them jointly
with an F test. Null hypothesis is no functional form misspecification, i.e. the coefficients on
the added terms are equal to zero. However, a rejection does not show the source of the
misspecification.
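A minimal sketch of the RESET F statistic, assuming simulated data where the true model is quadratic but a linear model is fitted (names and numbers are illustrative). It stops at the F statistic; turning it into a p-value needs the F distribution with 2 and n-k-3 degrees of freedom from a statistics library.

```python
import numpy as np

def fit_ssr(X, y):
    """Return the residual sum of squares and the fitted values from OLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    resid = y - fitted
    return resid @ resid, fitted

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(0.0, 4.0, size=n)
y = 1.0 + x + 0.5 * x ** 2 + rng.normal(size=n)   # true model is quadratic

# Restricted model: the (misspecified) linear regression
X_r = np.column_stack([np.ones(n), x])
ssr_r, y_hat = fit_ssr(X_r, y)

# Unrestricted model: add squared and cubed fitted values
X_ur = np.column_stack([X_r, y_hat ** 2, y_hat ** 3])
ssr_ur, _ = fit_ssr(X_ur, y)

q = 2                                  # number of added terms
df = n - X_ur.shape[1]                 # residual degrees of freedom
f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
print(f_stat)
```

A large F statistic rejects the null of correct functional form, but as noted it does not say which term is missing.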
Proxy variables are used when data cannot be obtained on a variable. A lagged value of the
dependent variable can be used as a proxy.
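Measurement error in the response variable is of little concern as long as it is uncorrelated with
the explanatory variables: the estimators are still unbiased and consistent, though less precise.
Classical measurement error in an explanatory variable causes attenuation bias: the coefficient is
biased towards zero, so it is smaller in magnitude than the true value. ZCM is violated in that
case because the mismeasured regressor is correlated with the error term.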
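A small simulation sketch of attenuation bias, assuming classical measurement error in a single regressor with the same variance as the true regressor (all numbers are illustrative). Under these assumptions the slope should shrink by the factor Var(x*) / (Var(x*) + Var(e)), here one half.

```python
import numpy as np

def slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

rng = np.random.default_rng(7)
n = 5000
beta = 2.0
x_true = rng.normal(size=n)                    # true regressor, variance 1
x_obs = x_true + rng.normal(size=n)            # observed with error, error variance 1
y = 1.0 + beta * x_true + rng.normal(size=n)

b_true = slope(x_true, y)                      # uses the correctly measured regressor
b_obs = slope(x_obs, y)                        # uses the mismeasured regressor

attenuation = 1.0 / (1.0 + 1.0)                # Var(x*) / (Var(x*) + Var(e))
print(b_true, b_obs, beta * attenuation)
```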
Complete case analysis: Stata drops data rows where any of the variables used is missing. If the
data are missing completely at random there is no bias.
Exogenous sample selection (selection based on x) does not cause bias, while endogenous sample
selection (selection based on y) does.
Chpt 10
LRP (long run propensity) is the combined effect of the current value and all lags of x on y after
a permanent change in x.
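Written out for a finite distributed lag model, assuming q lags of a single explanatory variable x (the symbols alpha_0 and delta_j are notation chosen for this sketch), the LRP is the sum of the coefficients on the current and lagged values:

```latex
y_t = \alpha_0 + \delta_0 x_t + \delta_1 x_{t-1} + \dots + \delta_q x_{t-q} + u_t,
\qquad
\mathrm{LRP} = \delta_0 + \delta_1 + \dots + \delta_q
```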
TS3: ZCM (strict exogeneity if u is uncorrelated with x in all time periods, contemporaneous
exogeneity if only with current x)
TS4: Homoskedasticity
TS5 no autocorrelation
Seasonality and time trends are tested with F tests on the seasonal dummies and trend terms; null
hypothesis is that the coefficients are zero.
Chpt 11
Difference in differences estimator (DiD): the coefficient on the interaction between the
treatment group dummy and the post-period dummy.
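A sketch of the DiD estimator, assuming two independent cross sections with a treatment dummy and a post-period dummy (simulated data, illustrative names). The coefficient on the product of the two dummies equals the difference in differences of the four group means.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400
treat = rng.integers(0, 2, size=n)             # 1 = treatment group
post = rng.integers(0, 2, size=n)              # 1 = observed after the policy change
effect = 1.5                                   # true treatment effect in the simulation
y = 2.0 + 0.5 * treat + 1.0 * post + effect * treat * post + rng.normal(size=n)

# Regression with both dummies and their interaction
X = np.column_stack([np.ones(n), treat, post, treat * post])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
did_regression = beta_hat[3]                   # coefficient on treat * post

def cell_mean(t, p):
    """Mean of y in the group with treat == t and post == p."""
    return y[(treat == t) & (post == p)].mean()

# Same number from the four group means
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(did_regression, did_means)
```

The regression form is convenient because it delivers a standard error for the estimate and allows further controls to be added.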
a_i is the unobserved heterogeneity that is time invariant; it can be removed from longitudinal
(panel) data by differencing across time periods.
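A simulation sketch of why differencing helps, assuming a two-period panel in which the unobserved effect a_i is correlated with x (all names and numbers are illustrative). Pooled OLS picks up a_i through x, while the first-differenced regression removes it.

```python
import numpy as np

def slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

rng = np.random.default_rng(9)
n = 2000                                       # individuals, each observed twice
beta = 1.0
a = rng.normal(size=n)                         # unobserved, time-invariant a_i
x1 = a + rng.normal(size=n)                    # x is correlated with a_i
x2 = a + rng.normal(size=n)
y1 = beta * x1 + a + rng.normal(size=n)
y2 = 0.3 + beta * x2 + a + rng.normal(size=n)  # 0.3 is a period-2 intercept shift

# Pooled OLS ignores a_i, so the slope is biased
b_pooled = slope(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

# First differences: a_i cancels in y2 - y1 and x2 - x1
b_fd = slope(x2 - x1, y2 - y1)
print(b_pooled, b_fd)
```

The intercept in the differenced regression absorbs the period shift, so only variation within individuals identifies the slope.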
Chpt 12