Problem Set 1
Please include Stata do-file code and output for all exercises. Please submit the description of your results as
a PDF, printed or electronic. * indicates bonus points, but it is still in your best interest to solve them.
(a) Open a blank Stata dataset and set the number of observations to 100.
Please refer to the original output file if you find any discrepancies; I may have made errors while
copy-pasting.
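For (a), the dataset can be set up with:
clear
set obs 100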
g i = _n
(c) Generate another variable α which has value 0.5 for all 100 observations.
g alpha = 0.5
(d) Generate εᵢ as a random normal variable with mean 0 and standard deviation 1.
g epsilon = rnormal()
g x = runiform()
yᵢ = α + β·xᵢ + εᵢ
g beta = 0.3
g y = alpha + beta * x + epsilon
reg y x
i. Test H0: β = 0.
test x, mtest
(Test output for H0: β = 0: F statistic with (df, 98) degrees of freedom and the unadjusted p-value; see the original output file.)
ii. Test H0: β = 0.6.
(Test output for H0: β = 0.6: F statistic with (df, 98) degrees of freedom and the unadjusted p-value; see the original output file.)
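A sketch of the command for the second test:
test x = 0.6, mtest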
iii. Report the p-values for testing both hypotheses and interpret the results.
The p-value for the test of H0: β = 0 is 0.7710, which means the coefficient is not statistically
significant at the 95% confidence level. The p-value for the test of H0: β = 0.6 is 0.0596; we still
fail to reject the null at the 95% confidence level, though only marginally.
The corresponding p-values give the probability of obtaining an estimate of β (from other possible
samples) at least as extreme as the one obtained from the sample at hand, given that H0 is true.
Formally, for a two-sided test of H0: β = β₀,
p-value = P( |β̂ − β₀| ≥ |β̂_obs − β₀| | H0 is true ),
where β̂_obs is the estimate obtained from the sample at hand and |·| denotes the absolute value.
(h) Generate vᵢ as a random normal variable with mean 0 and standard deviation 1. Generate qᵢ as
qᵢ = xᵢ + 0.5·xᵢ² + vᵢ
What is the correlation between q and x? Generate the outcome variable zᵢ, with β = 0.3 and γ = 0.2:
zᵢ = α + β·xᵢ + γ·qᵢ + εᵢ
g v = rnormal()
g q = x + 0.5*x^2 + v
correlate q x
. correlate q x
(obs=100)

             |      q        x
-------------+------------------
           q |  1.0000
           x |  0.3329   1.0000
g gamma = 0.2
g z = alpha + beta*x + gamma*q + epsilon
reg z x q
. reg z x q
(Regression output and the associated test output (F statistic with (df, 97) degrees of freedom, unadjusted p-values) omitted; see the original output file.)
The true model is given by the following (I suppress subscripts for ease):
y = α + β·x + γ·q + ε
Because q = x + 0.5·x² + v, we can rewrite everything as follows:
y = α + β·x + γ·(x + 0.5·x² + v) + ε        (1)
If instead of the true model (1) we estimate the following "wrong model" (2), then the coefficients
will be biased:
y = α + β·x + ε        (2)
If we know the true model (1), we can compute the degree of bias in the β coefficient by substituting
the true model into the OLS formula, which gives (3):
β̂ = cov(x, y) / var(x) = cov(x, α + βx + γx + 0.5·γx² + γv + ε) / var(x)
   = β + γ + 0.5·γ · cov(x, x²) / var(x)        (3)
We know that x is uniformly distributed on [0, 1], therefore var(x) = 1/12.
Now let us find cov(x, x²):
cov(x, x²) = E(x³) − E(x)·E(x²) = ∫₀¹ x³ dx − 0.5 · ∫₀¹ x² dx
           = (x⁴/4)|₀¹ − 0.5·(x³/3)|₀¹ = 1/4 − 1/6 = 1/12
In the end, we can even calculate the exact value of the bias, because we know all the parameters
(note that cov(x, x²)/var(x) = 1 in this case):
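Plugging in the known parameters (β = 0.3, γ = 0.2, and cov(x, x²)/var(x) = 1), the short regression converges to
β̂ → β + γ + 0.5·γ = 0.3 + 0.2 + 0.1 = 0.6,
so the omitted-variable bias equals 0.3.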
(Regression output with robust standard errors omitted; see the original output file.)
We can see that the standard errors obtained from the different estimation techniques are so close
that they do not change the statistical inference at the 95% confidence level. This shows that
assuming spherical disturbances is quite a valid assumption here, and statistical inference can be
based on the traditional standard errors. Under the standard assumptions the error variance is
constant and all off-diagonal elements of the variance-covariance matrix are zero; in other words,
the errors are i.i.d. With White's estimator we depart from the usual i.i.d. assumption and allow
the error variance to change across observations; the procedure relies on a weighting matrix, and
provided that weighting matrix is correct we obtain valid standard errors. In the third case, we are
basically taking repeated samples from the sample at hand and using the resulting empirical
distribution of the estimates to compute the standard errors.
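A sketch of the commands being compared here (assuming, based on the commands shown later in this problem, that the outcome is y and the regressor is treatment):
reg y treatment                    // traditional (spherical) standard errors
reg y treatment, vce(robust)       // White heteroskedasticity-robust standard errors
reg y treatment, vce(jackknife)    // resampling-based (jackknife) standard errors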
(Regression output with robust standard errors, and with jackknife standard errors, omitted; see the original output file.)
(h) Compare the results from these two methods to the results from (e).
When I clustered the standard errors at the village level, I assumed that the unobservable qualities
of people in the same village are correlated, while those of people living in different villages are
not. Clustered standard errors may or may not be larger than the usual error estimates; it depends
on the structure of the intra-cluster correlation. If the correlation is positive, clustering will
produce larger standard errors. As we can see, both versions of clustering, the usual one and the
jackknife, give almost the same results and the statistical inference stays the same. This means
the asymptotic assumptions we rely on when estimating the standard errors in the usual way are
probably correct. For example, we assume the variance estimator follows a chi-square distribution,
but with resampling we may find something different from the assumed distribution.
(i) Now estimate a regression of another outcome – w on treatment and use your proposed
methods from (g). Discuss the results and compare them to (e) and (g).
(Regression output for w on treatment with cluster-robust and with jackknife standard errors omitted; see the original output file.)

Summary of the estimates (standard errors in parentheses):

reg y treatment, vce(jackknife, cluster(id_village))
    -.3316979*    .0363805
    (.3966937)    (.2587612)

reg w treatment, vce(cluster id_village)
    .6807807*     .035184
    (.3373106)    (.2115615)

reg w treatment, vce(jackknife, cluster(id_village))
    .6807807*     .035184
    (.3493701)    (.2144746)
(j) * Describe a procedure that accounts for multiple testing with two outcomes y and w (for a
hint you can refer to Casey, Glennerster, and Miguel “Reshaping Institutions: Evidence on
Aid Impacts Using a Preanalysis Plan”).
When we test several hypotheses together, the probability of rejecting a null hypothesis by chance
increases. For example, if we test 100 independent hypotheses at the 95% confidence level, the
expected number of false rejections is 5, and the probability of rejecting at least one true
hypothesis by chance equals 1 − 0.95¹⁰⁰. We want to decrease the probability of committing Type I
errors in multiple comparisons. To that end, we have to control the family-wise error rate (FWER),
the probability of rejecting at least one true null hypothesis; under independence,
FWER = 1 − (1 − α)ᵐ
There are several suggested corrections that decrease the FWER, such as the Bonferroni and Šidák
corrections (Wikipedia), as well as various methodologies based on resampling techniques for
dealing with multiple-comparison errors and p-value adjustments.
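As a concrete sketch of one such adjustment, assuming the variable names y, w, treatment, and id_village used earlier in this problem, the two outcome regressions can be combined with suest and a Bonferroni (or Šidák) adjustment applied to the two treatment hypotheses:
reg y treatment
estimates store m_y
reg w treatment
estimates store m_w
suest m_y m_w, vce(cluster id_village)
* adjusted p-values for the two treatment coefficients
test ([m_y_mean]treatment) ([m_w_mean]treatment), mtest(bonferroni)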
(a) Given the longitude and latitude variables x and y, construct x2, y2, xy, x3, y3, x2y, xy2.
clear all
use "mitaData.dta"
g x2 = x^2
g y2 = y^2
g xy = x*y
g x3 = x^3
g y3 = y^3
g x2y = x^2*y
g xy2 = x*y^2
(b) Regress the log equivalent household consumption (2001) (lhhequiv) on mita (pothuan_mita), all
polynomial terms in (a), elevation (elv_sh), mean slope, infants, children, adults, and boundary
segment fixed effects (bfe4_1, bfe4_2, bfe4_3). Cluster the standard errors by district. Run the
regression in 3 ways: first, for observations where the distance to the mita boundary (d_bnd) is
less than 100 km, next when it is less than 75 km, and lastly when it is less than 50 km.
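A sketch of the first of the three regressions (the other two only change the d_bnd restriction to 75 and 50 km); logcons is assumed to be the consumption variable used in the output below, and the linear terms x and y are included alongside the constructed polynomial terms:
reg logcons pothuan_mita x y x2 y2 xy x3 y3 x2y xy2 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd < 100, vce(cluster district)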
(Regression output, with standard errors clustered by district, for the three samples d_bnd < 100 km, d_bnd < 75 km, and d_bnd < 50 km, omitted; see the original output file.)
i. What is the coefficient on mita? Are the results significant at the 5% level? Interpret
them.
(c) Run the same regressions as in (b), but instead of polynomial terms in longitude and latitude,
use a cubic polynomial in distance to Potosi (dpot). That is, include the first, second and
third powers of this variable in the regressions. Again, cluster the standard errors by district
and run the regression in 3 ways: first, for observations where the distance to the mita boundary
(d_bnd) is less than 100 km, next when it is less than 75 km, and lastly when it is less than 50 km.
g dpot2 = dpot^2
g dpot3 = dpot^3
reg logcons pothuan_mita dpot dpot2 dpot3 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd<100, vce(cluster district)
(Regression output with district-clustered standard errors omitted; see the original output file.)
reg logcons pothuan_mita dpot dpot2 dpot3 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd<75, vce(cluster district)
(Regression output with district-clustered standard errors omitted; see the original output file.)
reg logcons pothuan_mita dpot dpot2 dpot3 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd<50, vce(cluster district)
(Regression output with district-clustered standard errors omitted; see the original output file.)
i. What is the coefficient on mita? Are the results significant at the 5% level? Interpret them.
(d) Is there any difference between the coefficients on mita in (b) and (c) as well as their
significances? If yes, why might there be a difference? If no, why should we expect the same results?
I do not know the exact meaning of the variables, so I cannot offer an economic interpretation.
However, this kind of change in the coefficients and standard errors may be due to overfitting,
multicollinearity, omitted-variable bias, or wrong model specification. In this specific case, I
think overfitting is highly likely. When we add 9 polynomial terms, they capture much of the
variation and some other variables become insignificant. Overfitting is the most plausible
explanation because the coefficients and standard errors are not far from each other, and in the
first case we reject the null hypothesis only at the very margin of the 10% level. Moreover, we
know that we have to increase the sample size when we have more parameters to estimate; here the
sample size is fixed and the only difference is that we are fitting a bigger model in the first
case. Given this, and the fact that the parameter and standard-error estimates on mita are not far
from each other across specifications, I conclude that overfitting is most probably the explanation.
Problem 4 (30 points)
Use firmTax.dta for the following exercise. Small firms in country C are enjoying a simplified tax bracket,
with corporate tax dropping from 20% to 5% if their operating revenues are lower than 6,420,000 in local
currency. The research department of the Ministry of Economic Development (MED) claims that small
and more efficient firms largely benefit from the simplified tax bracket. They provide RD estimates to
support their claims. You have the data on operating revenues, gross profit, gross profit margin, net profit
margin, cost to revenue ratio.
(a) Can you reconstruct the argument of the MED using Regression Discontinuity design?
In Stata 12 it is not possible to use rdrobust and other relevant commands. Since I have Stata 12,
I used a feasible alternative approach.
I begin with a graphical analysis following Lee, Moretti, and Butler (2004), hereinafter referred
to as LMB (2004), and I benefited from their Stata code. To carry out the graphical analysis for
each variable of interest, I use a program (based on the LMB (2004) Stata code) so as not to redo
everything again and again. The program is called "qrafik" and the whole program can be seen in
the Stata do-file. I normalized the running variable (opre) so that the threshold is set at
exactly 1 (one); companies with a normalized value above 1 do not get the treatment and the ones
below 1 do. The reason for this normalization (division by 6,420,000) is to get nice graphs. I
then compute the mean of the variables of interest (in our case, net profit margin (prma), gross
profit margin (grma), cost-to-revenue ratio (cost_share), and gross profit (gros)) within 0.01-wide
intervals of the normalized threshold variable. Moreover, I fit a 4th-order polynomial to the data,
with a 95% confidence interval for the fitted values, following LMB (2004). The following graphs
are generated:
Graph 1. Discontinuity around the threshold of 1. This is the graph of the mean net profit margin
of firms, drawn within 0.01-wide intervals of the threshold variable.
Graph 2. The mean gross profit margin drawn within 0.01-wide intervals of the threshold variable.
We see some outliers, but in general a discontinuity around the threshold is present.
Graph 3. The graph of the mean cost-to-revenue ratio drawn within 0.01-wide intervals of the
threshold variable.
Graph 4. The graph of mean gross profit drawn within 0.01-wide intervals of the threshold variable.
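For reference, a minimal sketch of the binning-and-plotting approach used to produce these graphs, for one outcome (the actual program "qrafik" is in the do-file; opre and prma are the assumed variable names):
gen double opre_n  = opre / 6420000                // normalize so the cutoff is at 1
gen byte   treated = opre_n < 1                    // firms below the cutoff get the 5% rate
gen double bin     = floor(opre_n / 0.01) * 0.01   // 0.01-wide bins of the running variable
egen double prma_bin = mean(prma), by(bin)         // bin means of the outcome
gen double r  = opre_n - 1                         // re-centered running variable
gen double r2 = r^2
gen double r3 = r^3
gen double r4 = r^4
regress prma i.treated##(c.r c.r2 c.r3 c.r4)       // 4th-order polynomial on each side of the cutoff
predict double fit
predict double sefit, stdp
gen double ci_lo = fit - 1.96*sefit                // 95% confidence band for the fitted values
gen double ci_hi = fit + 1.96*sefit
twoway (scatter prma_bin bin, msize(small)) ///
       (line fit ci_lo ci_hi opre_n, sort), xline(1)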
Further, to construct the RDD argument for the MED, I assume the following:
• Assignment to the tax-bracket treatment is a deterministic function of a running variable, namely
operating revenues;
• The probability of assignment jumps from 0 to 1 at the cut-off (sharp RDD);
• The conditional expectation of the potential outcomes is continuous in the neighborhood of the
cutoff. In economic terms, firms cannot manipulate their operating revenues to fall below or above
the threshold. To generalize, I am assuming that firms just below and just above the cutoff are
comparable in their characteristics.
(b) What is the estimated treatment effect of simplified tax bracket on the variables of interest?
Assume for now that MED made correct assumptions about the RDD design.
Again, I benefited here from the LMB (2004) Stata program and estimated the effect of the simplified
tax bracket on the variables of interest.
(Estimation results omitted here; see the original output file.)
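A minimal sketch of how such an estimate can be obtained for one outcome, using the treated and r, r2, r3, r4 variables defined in the earlier sketch (the RD effect is the coefficient on the treatment dummy at the cutoff):
regress prma i.treated##(c.r c.r2 c.r3 c.r4), vce(robust)
* the coefficient on 1.treated is the estimated jump in the outcome at the cutoff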
(c) Do you think that the continuity of the expectation–a standard RD assumption–is likely to
hold in this setting? Please provide an intuitive economic argument, along with the formal
test from operating revenues.
In economic terms, firms can precisely manipulate their operating revenues to get into the
beneficial tax bracket, which completely undermines the RDD setup. This kind of manipulation
should show up as a discontinuity in the density of the running variable, which is operating
revenues in this case. The details can be found in McCrary (2006). I now perform McCrary's formal
test of possible manipulation, making use of his example Stata code ("Codes for Manipulation of
the Running Variable," n.d.). We can see the discontinuity in the density (as if more mass is
accumulated below the threshold, where firms benefit from lower taxes), and, as noted below, the
log difference in heights is statistically significant even at the 1% level of significance. The
results are as follows:
(Figure: McCrary density-test plot of the normalized running variable around the cutoff at 1.)
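For reference, a sketch of how the test is invoked, assuming McCrary's DCdensity.ado is installed and opre_n is the running variable normalized so that the cutoff is at 1:
DCdensity opre_n, breakpoint(1) generate(Xj Yj r0 fhat se_fhat)
* reports the log difference in the density heights at the cutoff, with its
* standard error, and plots the estimated density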
(d) * Please derive and estimate corporate tax elasticity around the cutoff.
I am using two approaches because I had some confusion about the question asked. In the hint we are
asked to find the responsiveness of the variables to tax-rate changes, whereas tax elasticity
usually refers to the response of the variables of interest to a change in taxes.
First, I tackle the response of the variables to tax-rate changes, assuming the RDD assumptions
hold. Firms above the threshold pay a 20% tax on their operating revenues and firms below the
threshold pay 5%. I created a new variable called ltr (log tax rate), which takes the value log(20)
if a firm is not treated and log(5) if it is treated, and then I estimate the following regression:
log(yᵢ) = α + β·log(tax rateᵢ) + εᵢ
I use each of the given variables as the dependent variable; the estimated β coefficients are the
tax-rate elasticities of the respective variables. The regression results are reported in the
output file.
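A sketch of this specification for one outcome (using the treated and opre_n variables from the earlier sketches and restricting attention to a window around the cutoff):
gen double ltr   = cond(treated, log(5), log(20))   // log tax rate
gen double lprma = log(prma)                         // log outcome (non-positive values drop out)
regress lprma ltr if inrange(opre_n, 0.9, 1.1), vce(robust)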
Moreover, I computed the corporate tax (plus interest) paid by each firm and then estimated how an
increase in the tax affects the different variables. The key assumption here is that taxes are
lump-sum. The results are in the output file.
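A sketch of the tax variable (ignoring the interest component and applying the rate to operating revenues, as stated above):
gen double tax_paid = cond(treated, 0.05, 0.20) * opre
gen double ltax     = log(tax_paid)
* each outcome can then be regressed on ltax around the cutoff, as above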
Furthermore, an interesting fact is that, as the tax rate increases around the cutoff, gross profits
and operating revenues increase while margins decline. This suggests that less efficient firms
choose to stay below the cutoff, while more viable companies dare to stay above it and pay the
higher tax.