
Graphs in regression discontinuity design in "Stata" or "R" - Cross Validated


Graphs in regression discontinuity design in Stata or R


Lee and Lemieux (2009, p. 31) recommend that researchers present graphs when carrying out a regression discontinuity
design (RDD) analysis. They suggest the following procedure:
"...for some bandwidth $h$, and for some number of bins $K_0$ and $K_1$ to the left and right of the cutoff value,
respectively, the idea is to construct bins $(b_k, b_{k+1}]$, for $k = 1, \ldots, K = K_0 + K_1$, where
$b_k = c - (K_0 - k + 1) \cdot h$."
Here $c$ is the cutoff point (threshold value) of the assignment variable and $h$ is the bandwidth (window width).

"...then compare the mean outcomes just to the left and right of the cutoff point..."
"...in all cases, we also show the fitted values from a quartic regression model estimated separately on each side of
the cutoff point..." (p. 34 of the same paper)
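As a worked illustration of the bin edges $b_k = c - (K_0 - k + 1) \cdot h$, take $c = 0$, $h = 0.05$ and $K_0 = K_1 = 2$. These values are assumed purely for illustration and do not come from the paper.

```latex
\begin{aligned}
b_1 &= 0 - (2 - 1 + 1)(0.05) = -0.10, \\
b_2 &= 0 - (2 - 2 + 1)(0.05) = -0.05, \\
b_3 &= 0 - (2 - 3 + 1)(0.05) = 0, \\
b_4 &= 0 - (2 - 4 + 1)(0.05) = 0.05, \\
b_5 &= 0 - (2 - 5 + 1)(0.05) = 0.10.
\end{aligned}
```

This gives the four bins $(-0.10, -0.05]$, $(-0.05, 0]$, $(0, 0.05]$ and $(0.05, 0.10]$. The cutoff is itself a bin edge ($b_{K_0+1} = c$), so no bin mixes observations from both sides.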
My question is how to program this procedure in Stata or R to plot the outcome variable against the assignment
variable (with confidence intervals) for a sharp RDD. A sample example in Stata is mentioned here and here (replace
rd with rd_obs), and a sample example in R is here. However, I think neither of these implements step 1 (the
binning). Note that both show the raw data along with the fitted lines in the plots.
[Figure: sample graph without confidence intervals (Lee and Lemieux, 2009)]

Thank you in advance.


Tags: r, regression, data-visualization, stata

asked Dec 5 '12 at 13:04 by Metrics; edited Feb 15 '13 at 22:38

In response to your flag, a good way to revive your question is to edit it and offer a bounty: This will
bump your question and get more people interested in it. If you feel this question might be better
served on Stack Overflow, let us know and we can migrate it for you. chl Feb 9 '13 at 20:06
I would like this to be migrated to Stack Overflow.

Metrics Feb 12 '13 at 12:10

Unfortunately, this question is too old to be migrated to Stack Overflow. I believe it belongs on
Cross Validated but if you want to ask on Stack Overflow (putting emphasis on the programming
aspect and providing a minimal reproducible example), let me know and I will close it here. chl
Feb 14 '13 at 10:29

http://stats.stackexchange.com/questions/45184/graphs-in-regression-discontinuity-design-in-stata-or-r


Thanks chl for the bounty

Metrics Feb 15 '13 at 2:54

You should use cmogram. It does everything you need. Yan Song Apr 7 '13 at 19:49

1 Answer
Is this much different from fitting two local polynomials of degree 2, one below the threshold and one above,
with smoothing at $K_i$ points? Here's an example with Stata:

    use votex // the election-spending data that comes with rd
    tw ///
        (scatter lne d, mcolor(gs10) msize(tiny)) ///
        (lpolyci lne d if d<0, bw(0.05) deg(2) n(100) fcolor(none)) ///
        (lpolyci lne d if d>=0, bw(0.05) deg(2) n(100) fcolor(none)), ///
        xline(0) legend(off)

Alternatively, you can just save the lpoly smoothed values and standard errors as
variables instead of using twoway . Below x is the bin, s is the smoothed mean, se is
the standard error, and ul and ll are the upper and lower limits of the 95%
Confidence Interval for the smoothed outcome.
    lpoly lne d if d<0, bw(0.05) deg(2) n(100) gen(x0 s0) ci se(se0)
    lpoly lne d if d>=0, bw(0.05) deg(2) n(100) gen(x1 s1) ci se(se1)
    /* Get the 95% CIs (1.96 is the normal critical value) */
    forvalues v=0/1 {
        gen ul`v' = s`v' + 1.96*se`v'
        gen ll`v' = s`v' - 1.96*se`v'
    }
    tw ///
        (line ul0 ll0 s0 x0, lcolor(blue blue blue) lpattern(dash dash solid)) ///
        (line ul1 ll1 s1 x1, lcolor(red red red) lpattern(dash dash solid)), ///
        legend(off)

As you can see, the lines in the first plot are the same as in the second.
answered Feb 14 '13 at 20:28 by Dimitriy V. Masterov; edited Feb 15 '13 at 22:06

@Dimitry: +1 for the solution. However, I would like to have the mean value for each bin (please run
the stata example above) rather than the scatter plot showing raw values. CI is great. Metrics
Feb 15 '13 at 2:54

I am not quite sure what you mean. I added code showing how you get the smoothed means in
each bin by hand. If that's not what you are looking for, please explain what you have in mind in
more detail. As far as I can tell, these graphs usually show the raw data and the smoothed means.
Dimitriy V. Masterov Feb 15 '13 at 21:34
To quote Lee and Lemieux (2009, p. 31): "A standard way of graphing the data is to divide the
assignment variable (d here) into a number of bins, making sure there are two separate bins on
each side of the cutoff point (to avoid having treated and untreated observations mixed together in
the same bin). Then, the average value of the outcome variable can be computed for each bin and
graphed against the mid-points of the bins". So, if there are 50 bins, we will have only 25 data
points on each side of the cutoff, not all the raw data (e.g., Graph 6(b) of the reference, updated in
the question). Metrics Feb 15 '13 at 22:39
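The binned-means graph described in that quote could be sketched in Stata roughly as follows. This is only one possible reading of the procedure, not code from the thread: the data are simulated stand-ins that reuse the variable names lne and d, and the bin width of 0.05, the seed and the coefficients are assumptions. The bins here are closed on the left so that d >= 0 counts as treated, as in the answer's code.

```stata
* Simulated stand-in data (assumption); on the real example, start from "use votex" instead
clear
set seed 12345
set obs 2000
gen d   = runiform() - 0.5                              // assignment variable, cutoff at 0
gen lne = 1 + 0.8*d + 0.3*(d >= 0) + rnormal(0, 0.2)    // outcome with a jump at the cutoff

local c = 0       // cutoff
local h = 0.05    // bin width (assumed)

* Bin index: c is a bin edge, so no bin mixes d < c with d >= c
gen bin = floor((d - `c') / `h')
gen mid = `c' + (bin + 0.5) * `h'                       // bin mid-point

* One row per bin: mean, standard error of the mean, and cell count
collapse (mean) ybar=lne (semean) se=lne (count) n=lne, by(bin mid)
gen ul = ybar + 1.96*se
gen ll = ybar - 1.96*se

* Bin means with 95% confidence bars, plotted against the bin mid-points
twoway (rcap ul ll mid) (scatter ybar mid), xline(`c') legend(off)
```

After collapse the dataset holds one row per bin, so the plot shows only the bin means rather than the raw observations.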

Now it's clear! I agree on the kernel. But are you certain it's now not degree 0? That would
correspond to equally-weighted mean smoothing. Dimitriy V. Masterov Feb 15 '13 at 22:51

I believe that corresponds to lpoly with a regular kernel and a degree 0 polynomial
Dimitriy V. Masterov Feb 19 '13 at 4:35
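A degree-0 fit of this kind might look like the sketch below. It assumes that equally-weighted mean smoothing corresponds to lpoly's rectangle kernel, and it uses simulated stand-in data with the variable names lne and d; the bandwidth, grid size, seed and coefficients are illustrative assumptions. Unlike disjoint bins, the smoothing windows are centred on an evenly spaced grid and need not tile the range exactly, so the result is close to, but not identical to, bin means.

```stata
* Simulated stand-in data (assumption); on the real example, start from "use votex" instead
clear
set seed 12345
set obs 2000
gen d   = runiform() - 0.5
gen lne = 1 + 0.8*d + 0.3*(d >= 0) + rnormal(0, 0.2)

* Degree 0 with a rectangle kernel: each smoothed value is the unweighted mean
* of the observations within bwidth() of the grid point
lpoly lne d if d<0,  kernel(rectangle) degree(0) bwidth(0.025) n(20) gen(x0 m0) nograph
lpoly lne d if d>=0, kernel(rectangle) degree(0) bwidth(0.025) n(20) gen(x1 m1) nograph

twoway (scatter m0 x0) (scatter m1 x1), xline(0) legend(off)
```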
