
Linear Regression with 1 regressor (CHAPTER 4)

Aim: estimate the causal effect on Y of a unit change in X.
Slope: the expected change in Y for a unit change in X; the population regression line is E[Y|X] = b0 + b1*X.
Method: minimize the sum of squared errors, i.e. the average squared difference between actual Yi and predicted Yi (OLS). u = the error, which contains omitted factors that influence Y and are not captured by X, and also measurement error in Y.
If var(u|X=x) is constant – that is, if the conditional distribution of u given X does not depend on X – then u is homoskedastic; otherwise it is heteroskedastic.
b0 and b1 are population parameters; the hats are the estimates, and we pick the hats so that the sum of squared errors is minimized.
If the 3 least-squares assumptions hold and u is homoskedastic, then beta1hat has the smallest variance among all linear estimators (Gauss-Markov Theorem).
Interpretation: one more unit of X on average has a beta1 effect on Y.
If u is homoskedastic and u ~ N(0, sigma^2), then beta0hat and beta1hat are normally distributed for all n, and the t-stat has a t distribution with n-2 degrees of freedom.

Measures of fit:
1) R^2 – the fraction of the variance of Y explained by X, between [0,1]: R^2 = ESS/TSS = sum(yhat_i - ybar)^2 / sum(y_i - ybar)^2.
2) SER – the magnitude of a typical regression residual in the units of Y; it measures the spread of the distribution of u (divides by n-2).
3) RMSE – the same as SER but divides by n instead of n-2.
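A minimal numpy sketch of these fit measures, assuming yhat already holds the OLS fitted values (function and variable names are illustrative, not from the notes):

import numpy as np

def fit_measures(y, yhat, k=1):
    # residuals, total and explained sums of squares
    u = y - yhat
    tss = np.sum((y - y.mean()) ** 2)
    ess = np.sum((yhat - y.mean()) ** 2)
    n = len(y)
    r2 = ess / tss                                 # fraction of var(Y) explained
    ser = np.sqrt(np.sum(u ** 2) / (n - k - 1))    # typical residual, units of Y (n-2 when k=1)
    rmse = np.sqrt(np.sum(u ** 2) / n)             # same idea but divides by n
    return r2, ser, rmse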
Assumptions on sampling:
1) E(u|Xi = xi) = 0 – implies beta1hat is unbiased; the conditional distribution of u given X has mean 0. RESULT: E(betahat) = beta and var(betahat) shrinks like 1/n.
2) (Xi, Yi) are i.i.d. – true if the sample is a simple random sample; problematic when we have panel data.
3) E(X^4) < infinity, i.e. outliers are rare – OLS can be sensitive to outliers.

beta1hat = Sxy / Sx^2 (sample covariance of X and Y over the sample variance of X) estimates the object of interest (the causal effect of X on Y). The sampling distribution of beta1hat is normal when n is large, and the estimator converges to the population parameter in the limit, i.e. it is consistent. The larger the variance of X, the smaller the variance of beta1hat.
For betahat to approach beta for forecasting we need assumptions (2, 3); for a causal interpretation we also need (1).
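A short numpy sketch of these formulas on simulated data (all names and true coefficients are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)        # true beta0 = 2, beta1 = 3

sxy = np.cov(x, y, ddof=1)[0, 1]              # sample covariance of X and Y
sx2 = np.var(x, ddof=1)                       # sample variance of X
beta1_hat = sxy / sx2                         # beta1hat = Sxy / Sx^2
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta1_hat, beta0_hat)                   # close to 3 and 2 for large n (consistency)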
1 Regressor: Hypothesis testing and CI (CHAPTER 5)

The sampling distribution of beta1hat is approximately normal when n is large.
Objective: test hypotheses such as H0: beta1 = 0, one- or two-sided.
Method: compute the t-stat, t = (beta1hat - beta1_0) / SE(beta1hat), get the p-value, and reject or fail to reject. Reject at the 5% significance level if |t| > 1.96. A 95% confidence interval is beta1hat +/- 1.96*SE(beta1hat).
Binary variables interpretation: when X is a binary (dummy) regressor, beta1 is the difference in the population means of Y between the two groups.
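A hedged sketch of this t-test with a heteroskedasticity-robust standard error (scipy is used only for the normal tail probability; the function name is illustrative):

import numpy as np
from scipy import stats

def t_test_beta1(x, y, beta1_null=0.0):
    n = len(y)
    xd = x - x.mean()
    beta1_hat = (xd @ (y - y.mean())) / (xd @ xd)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    u = y - beta0_hat - beta1_hat * x
    # heteroskedasticity-robust variance of beta1hat
    var_b1 = ((xd ** 2 * u ** 2).sum() / (n - 2)) / ((xd ** 2).mean() ** 2) / n
    se_b1 = np.sqrt(var_b1)
    t = (beta1_hat - beta1_null) / se_b1
    p = 2 * stats.norm.sf(abs(t))              # two-sided p-value, large-n normal approximation
    ci = (beta1_hat - 1.96 * se_b1, beta1_hat + 1.96 * se_b1)
    return t, p, ci                            # reject at 5% if |t| > 1.96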
Linear Regression with multiple regressors (CHAPTER 6)

OVB: the error u arises because of factors, or variables, that influence Y but are not included in the regression function. There are always omitted variables.
2 conditions for OVB, where Z is the omitted variable:
1) Z is a determinant of Y
2) Z is correlated with the regressor X, corr(Z, X) != 0
Then the OLS estimator in the regression omitting that variable is biased and inconsistent; beta1hat will not approach beta1 even when n is large. Direction of the bias = direction of corr(X, u).
The causal effect is defined to be the effect measured in an ideal randomized controlled experiment – the expected change in Y from a unit change in X when the value of X is assigned at random.
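A small simulation sketch of OVB under these two conditions (Z affects Y and is correlated with X; all names and coefficients are invented for illustration):

import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)                    # condition 2: corr(X, Z) != 0
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)    # condition 1: Z determines Y

X_short = np.column_stack([np.ones(n), x])          # regression omitting Z
X_long = np.column_stack([np.ones(n), x, z])        # regression including Z
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]
b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]
print(b_short[1])   # biased upward (corr(X, u) > 0); does not approach 2 as n grows
print(b_long[1])    # close to the true causal effect 2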
Interpretation of beta: the change in Y for a unit change in X1, holding all other Xi constant. beta0 = the predicted value of Y when all Xi = 0.
Adjusted R^2: 1 - ((n-1)/(n-k-1))*SSR/TSS – penalizes the addition of regressors, but converges to R^2 when n is large. A high R^2 shows predictive power, not a causal effect.
Assumptions on sampling: 1, 2, 3 as before, plus
4) there is no perfect multicollinearity – that is, no regressor is an exact linear function of the other regressors.
Control Variables: a control variable W is a variable that is correlated with, and controls for, an omitted causal factor in the regression of Y on X, but which itself does not necessarily have a causal effect on Y. An effective control variable is one which, when included in the regression, makes the error term uncorrelated with the variable of interest. Shown via conditional mean independence, E[u|X, W] = E[u|W], using the law of iterated expectations E[u|W] = E[E[u|X, W]|W].
Dummy Var trap: suppose you have a set of multiple binary (dummy) variables which are mutually exclusive and exhaustive – that is, there are multiple categories and every observation falls in one and only one category (Freshmen, Sophomores, Juniors, Seniors, Other). If you include all these dummy variables and a constant, you will have perfect multicollinearity – this is the dummy variable trap. SOLUTION: omit one group or omit b0.
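A minimal sketch of avoiding the trap by omitting one category (pure numpy; the category labels just echo the example above):

import numpy as np

classes = np.array(["Freshman", "Sophomore", "Junior", "Senior", "Other"])
status = np.random.default_rng(2).choice(classes, size=10)

# All five dummies plus a constant would be perfectly multicollinear
# (the dummies sum to the constant column), so omit one group ("Other");
# that group becomes the baseline absorbed by the intercept.
levels = ["Freshman", "Sophomore", "Junior", "Senior"]
dummies = np.column_stack([(status == c).astype(float) for c in levels])
X = np.column_stack([np.ones(len(status)), dummies])   # full column rank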

Multiple Regressors: Hypothesis testing and CI (CHAPTER 7)

The idea is all the same as chapter 5, but for a joint hypothesis the one-coefficient-at-a-time t-test is wrong. We use the F-test for joint hypotheses. Note: as the t-stats get larger, F gets larger; when t1 and t2 are independent, F = 0.5(t1^2 + t2^2). F approaches chi^2_q / q as n approaches infinity, where q = # of restrictions on the betas. The p-value is the tail probability of the chi^2_q / q distribution beyond the F-statistic actually computed. For small n, use the F distribution, as it is more conservative than the chi^2 in rejecting the null.
Restricted vs unrestricted regression: compare the R^2 under H0 and under H1 – the restricted regression imposes H0, e.g. b1 = b2 = 0. Assuming homoskedastic rather than heteroskedastic errors made it more likely to reject H0 (via example).
Testing b1 = b2 – there are 2 methods: 1) re-arrange the regressors so that the restriction becomes a restriction on a single coefficient in an equivalent regression (e.g. regress Y on X1 and W = X1 + X2; the coefficient on X1 is then b1 - b2, and the restriction is that it equals 0); 2) test it directly in Stata.
Confidence sets based on the F-stat: an ellipse for 2 coefficients. There are cases where each coefficient is individually accepted by its t-test but the pair is jointly rejected by the F-test.
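A hedged numpy/scipy sketch of the homoskedasticity-only F-test computed from the restricted vs unrestricted R^2 (a robust F would instead need the sandwich covariance; data and names are simulated/illustrative):

import numpy as np
from scipy import stats

def r2(y, X):
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)

X_u = np.column_stack([np.ones(n), x1, x2])   # unrestricted regression
X_r = np.ones((n, 1))                         # restricted: H0 imposes b1 = b2 = 0
q, k_u = 2, 2                                 # q restrictions, k_u regressors
R2_u, R2_r = r2(y, X_u), r2(y, X_r)
F = ((R2_u - R2_r) / q) / ((1 - R2_u) / (n - k_u - 1))
p = stats.f.sf(F, q, n - k_u - 1)             # tail probability beyond the computed F
print(F, p)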
Nonlinear Regression Function (Chapter 8)

If the relation between Y and X is nonlinear: (1) the effect on Y of a change in X depends on the value of X – that is, the marginal effect of X is not constant; (2) a linear regression is mis-specified: the functional form is wrong; (3) the estimator of the effect on Y of X is biased: in general it isn't even right on average; (4) the solution is to estimate a regression function that is nonlinear in X.

Polynomial case: the regression can still give the predicted change – just compute the change in X through the fitted polynomial. [KEY] To interpret the estimated regression function: (1) plot predicted values as a function of x; (2) compute the predicted ΔY/ΔX at different values of x.
Log cases: the last (log-log) case is just an elasticity; note that you cannot compare R^2 across cases with different dependent variables (Y vs ln Y). Non-linear least squares is used when the parameter beta enters the regression equation non-linearly – use Stata.
Interactions between regressors: Binary-Binary, Continuous-Binary, Continuous-Continuous interaction terms.
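A brief sketch of point (2) for a quadratic fit – the predicted change in Y from a unit change in X, evaluated at different values of x (simulated data, names illustrative):

import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x - 0.1 * x ** 2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, x ** 2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

def predicted_change(x0, dx=1.0):
    # Delta Y for a change from x0 to x0 + dx under the fitted quadratic
    f = lambda v: b0 + b1 * v + b2 * v ** 2
    return f(x0 + dx) - f(x0)

print(predicted_change(2.0), predicted_change(8.0))   # the marginal effect depends on x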

Regression with Panel Data (Chapter 10)

Notation: i = entity (e.g. state), t = time.
Panel data lets us: 1) control for factors that vary across entities/states but not over time; 2) control for factors that vary over time but not across entities; 3) control for the unobserved or unmeasured. It also lets us eliminate omitted variable bias that is constant over time within a given state, because the regression works in changes over time – e.g. any change in the fatality rate from 1982 to 1988 cannot be caused by Zi, because Zi (by assumption) does not change between 1982 and 1988.
State (entity) fixed effect – some unobserved effect for, say, CA that is constant over time. Time fixed effect – some unobserved effect for, say, 1999 that is constant across all states.
We can look at the difference ("before and after") equation: (1) the new error term, (ui1988 - ui1982), is uncorrelated with either BeerTaxi1988 or BeerTaxi1982; (2) this difference equation can be estimated by OLS, even though Zi isn't observed; (3) the omitted variable Zi doesn't change, so it cannot be a determinant of the change in Y; (4) this differenced regression doesn't have an intercept – it was eliminated by the subtraction step.
Fixed effects can be estimated with "n-1 binary regressors" (entity dummies) or with entity-demeaned data; time fixed effects are handled analogously.
Assumptions on sampling: 1, 2, 3, 4 as before, where now:
1) uit has mean zero, given the entity fixed effect and the entire history of the X's for that entity.
2) This is satisfied if entities are randomly sampled from their population by simple random sampling. This does not require observations to be i.i.d. over time for the same entity – that would be unrealistic. Whether a state has a high beer tax this year is a good predictor of (correlated with) whether it will have a high beer tax next year. Similarly, the error term for an entity in one year is plausibly correlated with its value in the next year, that is, corr(uit, ui,t+1) is often plausibly nonzero.
Autocorrelation: correlation over time – this brings out the need for clustered standard errors!
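A hedged numpy sketch of the entity-demeaned fixed-effects estimator on a simulated balanced panel (clustered standard errors would be layered on top, e.g. with Stata's vce(cluster) option; all names are illustrative):

import numpy as np

rng = np.random.default_rng(5)
n_entities, T = 50, 7
alpha = rng.normal(scale=2.0, size=n_entities)               # entity fixed effects Z_i
ids = np.repeat(np.arange(n_entities), T)
x = 0.5 * alpha[ids] + rng.normal(size=n_entities * T)        # X correlated with the fixed effect
y = 1.5 * x + alpha[ids] + rng.normal(size=n_entities * T)    # true beta = 1.5

def demean_by_entity(v, ids):
    # subtract each entity's own time average (entity demeaning)
    means = np.array([v[ids == i].mean() for i in np.unique(ids)])
    return v - means[ids]

x_dm = demean_by_entity(x, ids)
y_dm = demean_by_entity(y, ids)
beta_fe = (x_dm @ y_dm) / (x_dm @ x_dm)       # within estimator, close to 1.5
beta_pooled = np.polyfit(x, y, 1)[0]          # pooled OLS, biased by the omitted alpha_i
print(beta_fe, beta_pooled)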
