Problem Set 1
Please include Stata do-file code and output for all exercises. Please submit the description of your results as
a PDF, printed or electronic. * indicates bonus points, but it is still in your best interest to solve them.
(a) Open a blank Stata dataset and set the number of observations to 100.
Please refer to the original output file if you find any discrepancies; I may have made errors while
copy-pasting.
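For (a), the dataset can be set up with:
clear
set obs 100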
g i = _n
(c) Generate another variable α which has value 0.5 for all 100 observations.
g alpha = 0.5
(d) Generate εᵢ as a random normal variable with mean 0 and standard deviation 1.
g epsilon = rnormal()
g x = runiform()
yᵢ = α + β·xᵢ + εᵢ
g beta = 0.3
g y = alpha + beta * x + epsilon
reg y x
i. Test H0: β = 0.
test x, mtest
(Test output for H0: β = 0: F statistic with (df, 98) degrees of freedom and the unadjusted p-value; see the original output file.)
ii. Test H0: β = 0.6.
(Test output for H0: β = 0.6: F statistic with (df, 98) degrees of freedom and the unadjusted p-value; see the original output file.)
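A sketch of the command for the second test:
test x = 0.6, mtest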
iii. Report the p-values for testing both hypotheses and interpret the results.
The p-value for the test of H0: β = 0 is 0.7710, which means the coefficient is not statistically
significant at the 95% confidence level. The p-value for the test of H0: β = 0.6 is 0.0596; we still
fail to reject the null at the 95% confidence level, though only marginally.
The corresponding p-values give the probability of obtaining an estimate of β (from other possible
samples) at least as extreme as the one obtained from the sample at hand, given that H0 is true.
Formally, for a two-sided test of H0: β = β₀,
p-value = P( |β̂ − β₀| ≥ |β̂_obs − β₀| | H0 is true ),
where β̂_obs is the estimate obtained from the sample at hand and |·| denotes the absolute value.
(h) Generate vᵢ as a random normal variable with mean 0 and standard deviation 1. Generate qᵢ as
qᵢ = xᵢ + 0.5·xᵢ² + vᵢ
What is the correlation between q and x? Generate the outcome variable zᵢ, with β = 0.3 and γ = 0.2:
zᵢ = α + β·xᵢ + γ·qᵢ + εᵢ
g v = rnormal()
g q = x + 0.5*x^2 + v
correlate q x
. correlate q x
(obs=100)

             |      q        x
-------------+------------------
           q |  1.0000
           x |  0.3329   1.0000
g gamma = 0.2
g z = alpha + beta*x + gamma*q + epsilon
reg z x q
. reg z x q
(Regression output and the associated test output (F statistic with (df, 97) degrees of freedom, unadjusted p-values) omitted; see the original output file.)
The true model is given by the following (I suppress subscripts for ease):
y = α + β·x + γ·q + ε
Because q = x + 0.5·x² + v, we can rewrite everything as follows:
y = α + β·x + γ·(x + 0.5·x² + v) + ε        (1)
If instead of the true model (1) we estimate the following "wrong model" (2), then the coefficients
will be biased:
y = α + β·x + ε        (2)
If we know the true model (1), we can compute the degree of bias in the β coefficient by substituting
the true model into the OLS formula, which gives (3):
β̂ = cov(x, y) / var(x) = cov(x, α + βx + γx + 0.5·γx² + γv + ε) / var(x)
   = β + γ + 0.5·γ · cov(x, x²) / var(x)        (3)
We know that x is uniformly distributed on [0, 1], therefore var(x) = 1/12.
Now let us find cov(x, x²):
cov(x, x²) = E(x³) − E(x)·E(x²) = ∫₀¹ x³ dx − 0.5 · ∫₀¹ x² dx
           = (x⁴/4)|₀¹ − 0.5·(x³/3)|₀¹ = 1/4 − 1/6 = 1/12
In the end, we can even calculate the exact value of the bias, because we know all the parameters
(note that cov(x, x²)/var(x) = 1 in this case):
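Plugging in the known parameters (β = 0.3, γ = 0.2, and cov(x, x²)/var(x) = 1), the short regression converges to
β̂ → β + γ + 0.5·γ = 0.3 + 0.2 + 0.1 = 0.6,
so the omitted-variable bias equals 0.3.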
(Regression output with robust standard errors omitted; see the original output file.)
We can see that the standard errors obtained from the different estimation techniques are so close
that they do not change the statistical inference at the 95% confidence level. This shows that
assuming spherical disturbances is quite a valid assumption here, and statistical inference can be
based on the traditional standard errors. Under the standard assumptions the error variance is
constant and all off-diagonal elements of the variance-covariance matrix are zero; in other words,
the errors are i.i.d. With White's estimator we depart from the usual i.i.d. assumption and allow
the error variance to change across observations; the procedure relies on a weighting matrix, and
provided that weighting matrix is correct we obtain valid standard errors. In the third case, we are
basically taking repeated samples from the sample at hand and using the resulting empirical
distribution of the estimates to compute the standard errors.
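A sketch of the commands being compared here (assuming, based on the commands shown later in this problem, that the outcome is y and the regressor is treatment):
reg y treatment                    // traditional (spherical) standard errors
reg y treatment, vce(robust)       // White heteroskedasticity-robust standard errors
reg y treatment, vce(jackknife)    // resampling-based (jackknife) standard errors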
(Regression output with robust standard errors, and with jackknife standard errors, omitted; see the original output file.)
(h) Compare the results from these two methods to the results from (e).
When I clustered the standard errors at the village level, I assumed that the unobservable qualities
of people in the same village are correlated, while those of people living in different villages are
not. Clustered standard errors may or may not be larger than the usual error estimates; it depends
on the structure of the intra-cluster correlation. If the correlation is positive, clustering will
produce larger standard errors. As we can see, both versions of clustering, the usual one and the
jackknife, give almost the same results and the statistical inference stays the same. This means
the asymptotic assumptions we rely on when estimating the standard errors in the usual way are
probably correct. For example, we assume the variance estimator follows a chi-square distribution,
but with resampling we may find something different from the assumed distribution.
(i) Now estimate a regression of another outcome – w on treatment and use your proposed
methods from (g). Discuss the results and compare them to (e) and (g).
(Regression output for w on treatment with cluster-robust and with jackknife standard errors omitted; see the original output file.)

Summary of the estimates (standard errors in parentheses):

reg y treatment, vce(jackknife, cluster(id_village))
    -.3316979*    .0363805
    (.3966937)    (.2587612)

reg w treatment, vce(cluster id_village)
    .6807807*     .035184
    (.3373106)    (.2115615)

reg w treatment, vce(jackknife, cluster(id_village))
    .6807807*     .035184
    (.3493701)    (.2144746)
(j) * Describe a procedure that accounts for multiple testing with two outcomes y and w (for a
hint you can refer to Casey, Glennerster, and Miguel “Reshaping Institutions: Evidence on
Aid Impacts Using a Preanalysis Plan”).
When we test several hypotheses together, the probability of rejecting a null hypothesis by chance
increases. For example, if we test 100 independent hypotheses at the 95% confidence level, the
expected number of false rejections is 5, and the probability of rejecting at least one true
hypothesis by chance equals 1 − 0.95¹⁰⁰. We want to decrease the probability of committing Type I
errors in multiple comparisons. To that end, we have to control the family-wise error rate (FWER),
the probability of rejecting at least one true null hypothesis; under independence,
FWER = 1 − (1 − α)ᵐ
There are several suggested corrections that decrease the FWER, such as the Bonferroni and Šidák
corrections (Wikipedia), as well as various methodologies based on resampling techniques for
dealing with multiple-comparison errors and p-value adjustments.
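As a concrete sketch of one such adjustment, assuming the variable names y, w, treatment, and id_village used earlier in this problem, the two outcome regressions can be combined with suest and a Bonferroni (or Šidák) adjustment applied to the two treatment hypotheses:
reg y treatment
estimates store m_y
reg w treatment
estimates store m_w
suest m_y m_w, vce(cluster id_village)
* adjusted p-values for the two treatment coefficients
test ([m_y_mean]treatment) ([m_w_mean]treatment), mtest(bonferroni)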
(a) Given the longitude and latitude variables x and y, construct x2, y2, xy, x3, y3, x2y, xy2.
clear all
use "mitaData.dta"
g x2 = x^2
g y2 = y^2
g xy = x*y
g x3 = x^3
g y3 = y^3
g x2y = x^2*y
g xy2 = x*y^2
(b) Regress the log equivalent household consumption (2001) (lhhequiv) on mita (pothuan_mita), all
polynomial terms in (a), elevation (elv_sh), mean slope, infants, children, adults, and boundary
segment fixed effects (bfe4_1, bfe4_2, bfe4_3). Cluster the standard errors by district. Run the
regression in 3 ways: first, for observations where the distance to the mita boundary (d_bnd) is
less than 100 km, next when it is less than 75 km, and lastly when it is less than 50 km.
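A sketch of the first of the three regressions (the other two only change the d_bnd restriction to 75 and 50 km); logcons is assumed to be the consumption variable used in the output below, and the linear terms x and y are included alongside the constructed polynomial terms:
reg logcons pothuan_mita x y x2 y2 xy x3 y3 x2y xy2 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd < 100, vce(cluster district)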
(Regression output, with standard errors clustered by district, for the three samples d_bnd < 100 km, d_bnd < 75 km, and d_bnd < 50 km, omitted; see the original output file.)
i. What is the coefficient on mita? Are the results significant at the 5% level? Interpret
them.
(c) Run the same regressions as in (b), but instead of polynomial terms in longitude and latitude,
use a cubic polynomial in distance to Potosi (dpot). That is, include the first, second and
third powers of this variable in the regressions. Again, cluster the standard errors by district
and run the regression in 3 ways: first, for observations where the distance to the mita boundary
(d_bnd) is less than 100 km, next when it is less than 75 km, and lastly when it is less than 50 km.
g dpot2 = dpot^2
g dpot3 = dpot^3
reg logcons pothuan_mita dpot dpot2 dpot3 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd<100, vce(cluster district)
(Regression output with district-clustered standard errors omitted; see the original output file.)
reg logcons pothuan_mita dpot dpot2 dpot3 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd<75, vce(cluster district)
(Regression output with district-clustered standard errors omitted; see the original output file.)
reg logcons pothuan_mita dpot dpot2 dpot3 elv_sh slope infants children adults ///
    bfe4_1 bfe4_2 bfe4_3 if d_bnd<50, vce(cluster district)
(Regression output with district-clustered standard errors omitted; see the original output file.)
i. What is the coefficient on mita? Are the results significant at the 5% level? Interpret them.
(d) Is there any difference between the coefficients on mita in (b) and (c) as well as their
significances? If yes, why might there be a difference? If no, why should we expect the same results?
I do not know the exact meaning of the variables, so I cannot offer an economic interpretation.
However, this kind of change in the coefficients and standard errors may be due to overfitting,
multicollinearity, omitted-variable bias, or wrong model specification. In this specific case, I
think overfitting is highly likely. When we add 9 polynomial terms, they capture much of the
variation and some other variables become insignificant. Overfitting is the most plausible
explanation because the coefficients and standard errors are not far from each other, and in the
first case we reject the null hypothesis only at the very margin of the 10% level. Moreover, we
know that we have to increase the sample size when we have more parameters to estimate; here the
sample size is fixed and the only difference is that we are fitting a bigger model in the first
case. Given this, and the fact that the parameter and standard-error estimates on mita are not far
from each other across specifications, I conclude that overfitting is most probably the explanation.
Problem 4 (30 points)
Use firmTax.dta for the following exercise. Small firms in country C are enjoying a simplified tax bracket,
with corporate tax dropping from 20% to 5% if their operating revenues are lower than 6,420,000 in local
currency. The research department of the Ministry of Economic Development (MED) claims that small
and more efficient firms largely benefit from the simplified tax bracket. They provide RD estimates to
support their claims. You have the data on operating revenues, gross profit, gross profit margin, net profit
margin, cost to revenue ratio.
(a) Can you reconstruct the argument of the MED using Regression Discontinuity design?
In Stata 12 it is not possible to use rdrobust and other relevant commands. Since I have Stata 12,
I used a feasible alternative approach.
I begin with a graphical analysis following Lee, Moretti, and Butler (2004), hereinafter referred
to as LMB (2004), and I benefited from their Stata code. To carry out the graphical analysis for
each variable of interest, I use a program (based on the LMB (2004) Stata code) so as not to redo
everything again and again. The program is called "qrafik" and the whole program can be seen in
the Stata do-file. I normalized the running variable (opre) so that the threshold is set at
exactly 1 (one); companies with a normalized value above 1 do not get the treatment and the ones
below 1 do. The reason for this normalization (division by 6,420,000) is to get nice graphs. I
then compute the mean of the variables of interest (in our case, net profit margin (prma), gross
profit margin (grma), cost-to-revenue ratio (cost_share), and gross profit (gros)) within 0.01-wide
intervals of the normalized threshold variable. Moreover, I fit a 4th-order polynomial to the data,
with a 95% confidence interval for the fitted values, following LMB (2004). The following graphs
are generated:
Graph 1. Discontinuity around the threshold of 1. This is the graph of the mean net profit margin
of firms, drawn within 0.01-wide intervals of the threshold variable.
Graph 2. The mean gross profit margin drawn within 0.01-wide intervals of the threshold variable.
We see some outliers, but in general a discontinuity around the threshold is present.
Graph 3. The graph of the mean cost-to-revenue ratio drawn within 0.01-wide intervals of the
threshold variable.
Graph 4. The graph of mean gross profit drawn within 0.01-wide intervals of the threshold variable.
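For reference, a minimal sketch of the binning-and-plotting approach used to produce these graphs, for one outcome (the actual program "qrafik" is in the do-file; opre and prma are the assumed variable names):
gen double opre_n  = opre / 6420000                // normalize so the cutoff is at 1
gen byte   treated = opre_n < 1                    // firms below the cutoff get the 5% rate
gen double bin     = floor(opre_n / 0.01) * 0.01   // 0.01-wide bins of the running variable
egen double prma_bin = mean(prma), by(bin)         // bin means of the outcome
gen double r  = opre_n - 1                         // re-centered running variable
gen double r2 = r^2
gen double r3 = r^3
gen double r4 = r^4
regress prma i.treated##(c.r c.r2 c.r3 c.r4)       // 4th-order polynomial on each side of the cutoff
predict double fit
predict double sefit, stdp
gen double ci_lo = fit - 1.96*sefit                // 95% confidence band for the fitted values
gen double ci_hi = fit + 1.96*sefit
twoway (scatter prma_bin bin, msize(small)) ///
       (line fit ci_lo ci_hi opre_n, sort), xline(1)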
Further, to construct the RDD argument for the MED, I assume the following:
• Assignment to the tax-bracket treatment is a deterministic function of a running variable, namely
operating revenues;
• The probability of assignment jumps from 0 to 1 at the cut-off (sharp RDD);
• The conditional expectation of the potential outcomes is continuous in the neighborhood of the
cutoff. In economic terms, firms cannot manipulate their operating revenues to fall below or above
the threshold. To generalize, I am assuming that firms just below and just above the cutoff are
comparable in their characteristics.
(b) What is the estimated treatment effect of simplified tax bracket on the variables of interest?
Assume for now that MED made correct assumptions about the RDD design.
Again, I benefited here from the LMB (2004) Stata program and estimated the effect of the simplified
tax bracket on the variables of interest.
(Estimation results omitted here; see the original output file.)
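A minimal sketch of how such an estimate can be obtained for one outcome, using the treated and r, r2, r3, r4 variables defined in the earlier sketch (the RD effect is the coefficient on the treatment dummy at the cutoff):
regress prma i.treated##(c.r c.r2 c.r3 c.r4), vce(robust)
* the coefficient on 1.treated is the estimated jump in the outcome at the cutoff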
(c) Do you think that the continuity of the expectation–a standard RD assumption–is likely to
hold in this setting? Please provide an intuitive economic argument, along with the formal
test from operating revenues.
In economic terms, firms can precisely manipulate their operating revenues to get into the
beneficial tax bracket, which completely undermines the RDD setup. This kind of manipulation
should show up as a discontinuity in the density of the running variable, which is operating
revenues in this case. The details can be found in McCrary (2006). I now perform McCrary's formal
test of possible manipulation, making use of his example Stata code ("Codes for Manipulation of
the Running Variable," n.d.). We can see the discontinuity in the density (as if more mass is
accumulated below the threshold, where firms benefit from lower taxes), and, as noted below, the
log difference in heights is statistically significant even at the 1% level of significance. The
results are as follows:
(Figure: McCrary density-test plot of the normalized running variable around the cutoff at 1.)
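For reference, a sketch of how the test is invoked, assuming McCrary's DCdensity.ado is installed and opre_n is the running variable normalized so that the cutoff is at 1:
DCdensity opre_n, breakpoint(1) generate(Xj Yj r0 fhat se_fhat)
* reports the log difference in the density heights at the cutoff, with its
* standard error, and plots the estimated density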
(d) * Please derive and estimate corporate tax elasticity around the cutoff.
I am using two approaches because I had some confusion about the question asked. In the hint we are
asked to find the responsiveness of the variables to tax-rate changes, whereas tax elasticity
usually refers to the response of the variables of interest to a change in taxes.
First, I tackle the response of the variables to tax-rate changes, assuming the RDD assumptions
hold. Firms above the threshold pay a 20% tax on their operating revenues and firms below the
threshold pay 5%. I created a new variable called ltr (log tax rate), which takes the value log(20)
if a firm is not treated and log(5) if it is treated, and then I estimate the following regression:
log(yᵢ) = α + β·log(tax rateᵢ) + εᵢ
I use each of the given variables as the dependent variable; the estimated β coefficients are the
tax-rate elasticities of the respective variables. The regression results are reported in the
output file.
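A sketch of this specification for one outcome (using the treated and opre_n variables from the earlier sketches and restricting attention to a window around the cutoff):
gen double ltr   = cond(treated, log(5), log(20))   // log tax rate
gen double lprma = log(prma)                         // log outcome (non-positive values drop out)
regress lprma ltr if inrange(opre_n, 0.9, 1.1), vce(robust)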
Moreover, I computed the corporate tax (plus interest) paid by each firm and then estimated how an
increase in the tax affects the different variables. The key assumption here is that taxes are
lump-sum. The results are in the output file.
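A sketch of the tax variable (ignoring the interest component and applying the rate to operating revenues, as stated above):
gen double tax_paid = cond(treated, 0.05, 0.20) * opre
gen double ltax     = log(tax_paid)
* each outcome can then be regressed on ltax around the cutoff, as above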
Furthermore, an interesting fact is that, as the tax rate increases around the cutoff, gross profits
and operating revenues increase while margins decline. This suggests that less efficient firms
choose to stay below the cutoff, while more viable companies dare to stay above it and pay the
higher tax.