The Toolbox3

Identifying impacts: the toolbox
By Jonas Månsson* & Marie Gartell**
* Linnaeus University & Swedish National Audit Office
** Swedish National Audit Office
Summary of principles of policy evaluation
Evaluation questions
1. Have goals been achieved?
2. Is goal fulfilment due to the project under
examination?
3. Is the value of the observed impacts larger than the
cost of achieving them?
4. (Is there something within the activity that has influenced the results?)
Evaluation type
1. Monitoring
2. Impact evaluation
3. Efficiency evaluation (Cost-benefit analysis, CBA)
4. (Process evaluation)
Impact evaluation
Evaluation question: Have the goals been achieved, and are they an effect of the project?
Principles
Activity A -> Result A
Activity B -> Result B
Measure: Impact = Result A - Result B
By comparing the results of activity A with the results of activity B, it is possible to talk about the impact of A relative to B.
Case 1
Result A = impact + other factors that influence outcome (O)
Result B = other factors that influence outcome (O)
Result A - Result B = impact + O - O = impact
Case 2
Result A = impact + selection + O
Result B = O
Result A - Result B = impact + selection + O - O = impact + selection
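The two cases can be checked with a few lines of arithmetic. This is only a sketch: the function name and all numbers are hypothetical and chosen for illustration, not taken from any audit.

```python
def naive_comparison(impact, other_factors, selection=0.0):
    """Return Result A - Result B for the two cases above."""
    result_a = impact + selection + other_factors  # group exposed to activity A
    result_b = other_factors                       # comparison group (activity B)
    return result_a - result_b


# Hypothetical values, for illustration only.
case_1 = naive_comparison(impact=2.0, other_factors=5.0)
case_2 = naive_comparison(impact=2.0, other_factors=5.0, selection=1.5)

print(case_1)  # Case 1: the difference equals the impact
print(case_2)  # Case 2: the difference equals impact + selection
```

In Case 2 the simple comparison overstates the impact by exactly the selection term, which is why the rest of the toolbox is about identification.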
How can we handle identification?
Assignment of treated and untreated should be exogenous:
P(T) = P(UT)
The goal of the methods in the toolbox is to make assignment exogenous, or to make it believable that non-exogenous assignment has been taken care of
2016-04-14
Evaluation designs
Non-experimental
  Goal achievement evaluations / monitoring will not provide impacts without further assumptions (nothing else happened / ceteris paribus)
Experimental design
  Random control trials (RCT)
  Lab experiment
  Field experiment
  Natural experiment
Quasi-experimental design
  Regression based
    DiD
    RD
  Matching
    Characteristic matching
    CEM
    PSM
  Synth
Learning objective
The objective of the toolbox lecture is to briefly present designs and methods that can be used, and to connect them to certain situations.
Teams should be able to narrow their choice of method to one or two.
Presentation
Method
Question to be answered
(sometimes illustration)
Papers that can be used as reference
Used in SNAO
Papers are convenience sampled and
should be seen as illustrations
Experimental designs
Random control trials (social experiments)
Assignment to a program is done at random before the program starts
All individuals in the target population have the same initial probability of being assigned
Individual characteristics do not matter for assignment, so there is no selection problem
Impact is determined by simply comparing the mean outcome for the treated and the untreated
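A minimal simulation of this logic, assuming a hypothetical outcome model in which an individual characteristic (here called ability) affects the outcome but not the assignment. All names and parameter values are illustrative assumptions.

```python
import random
import statistics


def rct_impact(outcomes_treated, outcomes_untreated):
    """Impact estimate under random assignment: difference in mean outcomes."""
    return statistics.mean(outcomes_treated) - statistics.mean(outcomes_untreated)


def simulate_rct(n=10000, true_impact=2.0, seed=1):
    """Simulate a trial where a coin flip, not ability, decides assignment."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)      # individual characteristic
        assigned = rng.random() < 0.5  # same probability for everyone
        outcome = 10 + 3 * ability + (true_impact if assigned else 0.0) + rng.gauss(0, 1)
        (treated if assigned else untreated).append(outcome)
    return rct_impact(treated, untreated)


estimate = simulate_rct()
print(round(estimate, 2))  # close to the true impact used in the simulation
```

Because assignment is independent of ability, the two groups have the same expected ability and the difference in means recovers the impact up to sampling noise.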
Advantages with RCT
High internal validity: measures what is supposed to be measured
If done correctly, low costs
Disadvantages - problems
Often small-scale experiments, i.e. low external validity
If something goes wrong, e.g. systematic dropout, the internal validity might be low
Ethical issues
It is unlikely that performance auditors enter the process before the intervention.
But a recommendation could be that
Papers
Bennmarker, H., Grönqvist, E. & Öckert, B. (2013). Effects of contracting out employment services: Evidence from a randomized experiment, Journal of Public Economics, vol. 98(C), pages 68-84.
Hägglund, P. (2005). Natural and Classical Experiments in Swedish Labour Market Policy, Ph.D. dissertation, Stockholm University.
Natural experiment
A natural experiment is a shock to a system that is not related to selection (individual characteristics) and that clearly determines a state before and after
Examples
Meghir & Palme (AER) used variation in the timing of the school reforms in the 1960s to look at the impact of the school reform
Månsson and Dahlander (2009) use the hurricane Gudrun to study differences in attitude formation towards guest workers
Månsson & Delander (2006) use differences in rule interpretation by civil servants
Natural experiments
Principles:
The shock to a system creates random variation independent of individual characteristics
What is a shock?
Practical solutions (school example)
Budget restrictions/allocation: a) All should get it, but some get it later, b) Money runs out
Managers that refuse to implement, interpret rules wrongly, or are too alert
Problem
Papers
Månsson & Delander (2007), Forensic evaluation: A strategy for and results of an impact evaluation of a universal labor market program: The Swedish Activity Guarantee, Wissenschaftszentrum Berlin für Sozialforschung (WZB), Discussion paper 2007-102, https://www.wzb.eu/www2000/alt/ab/pdf/dp_sp_i_2007-102.pdf
Månsson & Dahlander (2009), Social interaction impact on attitudes: Native Swedes' attitudes towards labour immigrants and guest workers after hurricane Gudrun, Baltic Economic Journal, 11(1): 51-64
Meghir, C. & Palme, M. (2005), Educational
Field experiment
Used to investigate decision making, e.g. discrimination
The respondent reacts to a fake situation
Can keep a lot of factors constant and let the variable of interest vary, e.g. gender, age, ethnicity
Field experiment
Advantages
High internal validity
Disadvantages
Ethical issues: is it OK to fool someone?
Is it OK that the parliament's audit organisation fools civil servants to see if they discriminate? I think so!
Generally low external validity, since it is costly to administer.
Papers
Ahmed, A. & Hammarstedt, M. (2008), Discrimination in the rental housing market: A field experiment on the Internet, Journal of Urban Economics, 64(2), p. 362-372.
Carlsson, M. & Rooth, D.-O. (2007). Evidence of ethnic discrimination in the Swedish labor market using experimental data. Labour Economics 14(4), p. 716-729.
Carlsson, M. & Rooth, D.-O. (2012). Revealing Taste-Based Discrimination in Hiring: A Correspondence Testing Experiment with Geographic Variation. Applied Economics Letters 19, p. 1861-1864.
Eriksson, S. & Rooth, D.-O. (2014). Do Employers Use Unemployment as a Sorting Criterion When Hiring? Evidence from a Field Experiment. American Economic Review 104(3), p. 1014-1039.
Widerstedt, B. & Månsson, J. (2016) A Warm Welcome?
Laboratory experiment
Used to investigate behaviours (behavioural economics and applied psychology)
A situation is created in a laboratory to mimic a real-world situation
E.g. a meeting (not possible with field experiments)
Can be used to study normally unobservable behaviour
E.g. discrimination with respect to looks and languages/dialects
Laboratory experiment
Advantages
High internal validity
Disadvantages
Low external validity
Costly
In earlier studies (Kahneman, Nobel Prize 2002), with few exceptions, students were used, which after some delay resulted in massive criticism because of the extremely low external validity. Today work is done on more relevant populations = lab in field
Papers
Ahmed, A. (2010) What is in a surname? The role of ethnicity in economic decision making. Applied Economics, 42(21), 2715-2723.
Arai, M., Gartell, M., Rödin, M. and Özcan, G. (2016), Stereotypes of appearance, non-cognitive characteristics and labor market chances. Conference paper, EALE 2016. (find conference and download paper)
Rödin & Özcan (2011), Is It How You Look or Speak That Matters? - An Experimental Study Exploring the
Quasi-experimental design
Common:
Impact evaluations have not been planned/designed before program start
Selection into a program is not random, i.e. the probability (P) of assignment is a function of characteristics (x), i.e. P(x)
A simple comparison of outcomes (y) will be a function of the selection, y(P(x))
Regression techniques
Matching techniques
Combinations of matching and regressions
The objective of all methods is to eliminate effects relating to selection
Controlling for heterogeneity
            Before   After
Treated     A        B
Untreated   C        D
B-A = effect for treated, D-C = effect for untreated (UT)
C-A = initial difference = selection effect
Impact = (B-A) - (D-C) = (B-D) + (C-A)
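The two ways of writing the impact give the same number, which a short worked example can confirm. The four cell means below are hypothetical and serve only as an illustration.

```python
def diff_in_diff(a, b, c, d):
    """Impact = (B - A) - (D - C), using the cell labels from the table."""
    effect_treated = b - a    # change over time for the treated
    effect_untreated = d - c  # change over time for the untreated
    return effect_treated - effect_untreated


# Hypothetical mean outcomes: A, B = treated before/after; C, D = untreated before/after.
a, b, c, d = 10.0, 16.0, 8.0, 11.0

impact = diff_in_diff(a, b, c, d)
alternative = (b - d) + (c - a)  # after-difference plus the initial-difference term (C - A)

print(impact, alternative)  # both expressions give the same value
```

The second form shows what a simple after-comparison (B - D) misses: the initial difference between the groups has to be taken out as well.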
Difference in Difference
[Figure: difference-in-difference (DiD) diagram]
In practice
D1 = 1 if Treated, D1 = 0 if Untreated
D2 = 1 if After, D2 = 0 if Before
D3 = D1 x D2; its coefficient is the Difference in Difference estimate
$$Y_i = \alpha + \beta_1 D1_i + \beta_2 D2_i + \beta_3 D3_i + \gamma X_i + \varepsilon_i$$
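A sketch of the dummy-variable regression, written with the standard library only so that it is self-contained. It leaves out the X term, and the eight observations are hypothetical. The small `ols` helper solves the normal equations directly; in practice a statistics package would be used.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Hypothetical data: (D1 = treated, D2 = after, outcome).
observations = [
    (0, 0, 7.0), (0, 0, 9.0),    # untreated, before
    (1, 0, 9.0), (1, 0, 11.0),   # treated, before
    (0, 1, 10.0), (0, 1, 12.0),  # untreated, after
    (1, 1, 15.0), (1, 1, 17.0),  # treated, after
]
rows = [[1.0, d1, d2, d1 * d2] for d1, d2, _ in observations]  # constant, D1, D2, D3
y = [outcome for _, _, outcome in observations]

b0, b1, b2, b3 = ols(rows, y)
print(round(b3, 6))  # coefficient on D3 = D1 x D2, the DiD estimate
```

With only the three dummies in the model, the coefficient on D3 equals the difference in differences of the four group means, here (16 - 10) - (11 - 8).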
Advantages
Handles large samples = external validity
Intuitive
Disadvantages
Does the regression take care of selection problems? Parallel paths pre-treatment
Papers
Avdic, D. and Gartell, M. (2015) "The study pace among college students before and after a student aid reform: some Swedish results", Labour Economics, vol. 33, pp. 26-40, Working paper 2011:12
(Audit report) SNAO (2014) Effekter av förändrade regler för deltidsarbetslösa (Impacts of changed rules for part-time unemployed), Riksrevisionen 2014:5, Stockholm.
Regression discontinuity
T = 1 if S > c, T = 0 if S < c
[Figure: regression discontinuity plot]
Regression discontinuity
Identification requires that the reform has a sharp cut-off.
By comparing observations lying close to either side of the threshold, it is possible to estimate the local average treatment effect when randomization was not possible.
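The local comparison can be sketched as a difference in mean outcomes within a narrow window around the cut-off. The data, the cut-off and the bandwidths are hypothetical, and a plain comparison of means is used here as the simplest possible illustration; applied work usually fits a local regression on each side instead.

```python
import statistics


def rd_local_difference(scores, outcomes, cutoff, bandwidth):
    """Mean outcome just above the cut-off minus mean outcome just below it."""
    above = [y for s, y in zip(scores, outcomes) if cutoff < s <= cutoff + bandwidth]
    below = [y for s, y in zip(scores, outcomes) if cutoff - bandwidth <= s < cutoff]
    if not above or not below:
        raise ValueError("no observations on one side of the cut-off")
    return statistics.mean(above) - statistics.mean(below)


# Hypothetical data: the outcome rises smoothly with S and jumps by 2 above c = 50.
scores = list(range(100))
outcomes = [0.05 * s + (2.0 if s > 50 else 0.0) for s in scores]

wide = rd_local_difference(scores, outcomes, cutoff=50, bandwidth=5)
narrow = rd_local_difference(scores, outcomes, cutoff=50, bandwidth=1)
print(round(wide, 2), round(narrow, 2))
```

The wide window picks up part of the underlying slope in addition to the jump, while the narrow one gets closer to it. This is one reason the estimate is local and sensitive to how the relationship around the threshold is modelled.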
RD example
Evaluation of the daddy month in Sweden.
Implemented 1 Jan.
Before = Dec, After = Jan.
School class size = above 30 students =>
Advantages
Handles a lot of information = high external validity
Is assumed to produce high internal validity
Disadvantages
Fuzzy cut-off point => contamination => low internal validity
Fuzzy cut-off = supposed to be implemented 1/1, but some did it earlier and some later.
Sensitive to the functional form used. Sometimes non-linearity is mistaken for discontinuities.
Papers
Fredriksson, P., Oosterbeek, H. and Öckert, B. (2013). "Long-term effects of class size", Quarterly Journal of Economics, 128(1), 249-285.
Pettersson-Lidbom, P. (2007), Do Parties Matter for Economic Outcomes? A Regression-Discontinuity Approach, Journal of the European Economic Association, Volume 6, Issue 5.
Matching approaches
Problem: We have data before and after, or only after, but we know who is treated and untreated and we know the selection mechanisms
Solution: Find individuals/firms/organisations that, with respect to selection, are as close as possible to a treated unit.
Logic: If we can find untreated units that would have the same initial probability to
How is it possible to find a comparison group?
Matching techniques
Early studies
For each treated individual, find an individual within the target group with the same important characteristics, and use these statistical twins as the control group.
Statistical twins, one-to-one matching
Problems: As the number of important characteristics increases, the number of comparison possibilities increases rapidly
Example:
Match on gender (2), education (3), immigrant background (2), age (6 intervals)
= 2*3*2*6 = 72 different comparisons.
The curse of dimensionality
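The count in the example is just a product over the number of categories, which makes it easy to see how quickly it grows. The extra "region" characteristic below is a hypothetical addition for illustration.

```python
from math import prod

# Number of categories per matching characteristic, as in the example above.
levels = {"gender": 2, "education": 3, "immigrant background": 2, "age interval": 6}
cells = prod(levels.values())

# Hypothetical: adding one more characteristic with 5 categories.
levels_with_region = {**levels, "region": 5}
cells_with_region = prod(levels_with_region.values())

print(cells, cells_with_region)
```

Every added characteristic multiplies the number of cells, so fewer and fewer treated individuals have an exact statistical twin.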
Propensity score
Rosenbaum & Rubin (1983), partly also Heckman & Robb (1985)
The basic idea of propensity score matching:
If the selection process is observable and known, it is possible to estimate the model P(assigned(x))
Run the model and compute the predicted probability of being assigned
For each individual in the treatment group, the goal is to find one (or several) untreated individuals that have the same probability of being assigned
Result:
Two groups (treated and untreated) with the same (not 1/N) probability of being assigned, i.e. constructing the same preconditions as for an RCT.
Matching takes place in only one dimension
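A sketch of the matching step only, assuming the propensity scores have already been estimated. The scores, outcomes, caliper value and function names are all hypothetical. It uses nearest-neighbour matching with replacement, which is one common variant and not necessarily the one used in the audits.

```python
def match_on_score(treated, untreated, caliper=0.05):
    """Pair each treated unit with the untreated unit closest in propensity score.

    Units are (propensity_score, outcome) tuples. Treated units without an
    untreated unit within the caliper are left unmatched.
    """
    pairs = []
    for p_t, y_t in treated:
        p_u, y_u = min(untreated, key=lambda unit: abs(unit[0] - p_t))
        if abs(p_u - p_t) <= caliper:
            pairs.append((y_t, y_u))
    return pairs


def mean_difference(pairs):
    """Average outcome difference between treated units and their matches."""
    return sum(y_t - y_u for y_t, y_u in pairs) / len(pairs)


# Hypothetical (score, outcome) data.
treated = [(0.80, 12.0), (0.60, 10.0), (0.95, 15.0)]
untreated = [(0.78, 9.0), (0.62, 8.0), (0.30, 5.0)]

pairs = match_on_score(treated, untreated)
print(len(pairs), mean_difference(pairs))
```

The treated unit with score 0.95 has no untreated unit nearby and drops out, so the estimate only speaks for the treated units that could be matched.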
Propensity score (cont.)
The common probability support problem
A comparison is only possible for those individuals where at least one match is identified
Example:
In the audit of start-up grants (SUG) we tried to search for matches among those not attending any program (no treatment). For our 15 000 treated we ended up with matches for only 1250, i.e. less than 10 per cent.
(Gerfin and Lechner (2000) reject 14 percent, Frölich et al. (2000) reject 27 percent)
Conclusion: Individuals that entered the SUG were too unlike those that had never entered a program => propensity score does not work
Advantages
1. Gives the evaluator a possibility to replicate an RCT
2. Research (Dehejia and Wahba, 1998; 1999) has shown that the propensity score matching method can replicate the results of an RCT (high internal validity)
3. Since PSM does not have any limitations on data (other than accessibility), large samples can be evaluated (high external validity)
Disadvantages
Evaluators have to put a lot of effort into investigating how the selection process works and how to model it
Needs extremely good access to data
Heterogeneity between treated and potential controls => few observations on common support => evaluation results for only a few
Propensity score matching
When will it work OK:
Relatively homogeneous target group
Common goals that are well defined
Clear intervention logic
Problem
If there are factors that affect selection and outcome that are not observable in the data, and no good proxy variable exists => biased estimates and standard errors
Papers
Caliendo, M. and Kopeinig, S. (2005). Some Practical Guidance for the Implementation of Propensity Score Matching, IZA DP No. 1588, http://ftp.iza.org/dp1588.pdf
Rosenbaum, P. R. (2002): Observational Studies. Springer, New York. (book)
Rubin, D. (1974): "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies," Journal of Educational Psychology, 66, 688-701.
Rosenbaum, P., and D. Rubin (1983): "The Central Role of the Propensity Score in
Audit reports
Riksrevisionen (2008) Stöd till start av näringsverksamhet, Ett framgångsrikt program, RiR 2008:24, Riksrevisionen.
Behrenz, L., Delander, L. & Månsson, J. (2016) Is starting a business a sustainable way out of unemployment? Treatment effects of the Swedish start-up subsidy, submitted to Journal of Labour Research
Riksrevisionen (2010) Arbetspraktik, RiR 2010:05, Riksrevisionen
Månsson & Lundin (2016) When outcome definition determines the result in impact evaluations: an illustration using the Swedish work-practice programme, Evidence and Policy,
Coarsened Exact Matching (CEM)
Problem: Sometimes the targeted groups are not homogeneous, the intervention logic is unclear and the goal is a little less distinct.
PSM does not work that well, since a lot of separate regressions are needed
Idea: Go back to the 70s and match on characteristics that might influence selection and outcome. It works today because of good computer capacity
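The idea of coarsening and then matching exactly can be sketched as follows. The characteristics, the 10-year age bands and the five units are hypothetical, and the sketch stops at forming the matched strata; it does not compute weights or an impact estimate.

```python
from collections import defaultdict


def coarsen(unit):
    """Map a unit to a coarse stratum (hypothetical choice: 10-year age bands)."""
    age_band = unit["age"] // 10 * 10
    return (unit["gender"], unit["education"], age_band)


def matched_strata(units):
    """Group units by coarsened characteristics; keep strata with both groups."""
    strata = defaultdict(lambda: {"treated": [], "untreated": []})
    for unit in units:
        group = "treated" if unit["treated"] else "untreated"
        strata[coarsen(unit)][group].append(unit)
    return {key: s for key, s in strata.items() if s["treated"] and s["untreated"]}


# Hypothetical units.
units = [
    {"gender": "F", "education": "secondary", "age": 34, "treated": True},
    {"gender": "F", "education": "secondary", "age": 38, "treated": False},
    {"gender": "M", "education": "tertiary", "age": 52, "treated": True},
    {"gender": "M", "education": "tertiary", "age": 41, "treated": False},
    {"gender": "M", "education": "primary", "age": 27, "treated": False},
]

matched = matched_strata(units)
print(list(matched))  # strata that contain both treated and untreated units
```

Only the stratum with both a treated and an untreated unit survives; coarser bands keep more units but make the twins less alike, which is the trade-off the evaluator has to choose.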
Advantages
Free of functional form assumptions (non-parametric)
Manages heterogeneity, e.g. gender, education etc.
Disadvantage:
Data intensive
No solution might
Papers
Blackwell, M., Iacus, S., King, G. & Porro, G. (2009) cem: Coarsened exact matching in Stata, The Stata Journal, 9(4): 524-546
Månsson & Delander (2016): Mentoring as a way of integrating refugees on the labour market: Evidence from a Swedish pilot, WEAI conference, Singapore 2016
Audit reports
Riksrevisionen 2014:05 Deltidsarbetslöshet
Combinations between matching and regressions
Conditional diff in diff
Combine matching and diff in diff
Treated and untreated are matched on before data
Diff in diff is applied to the matched data
The idea is that the regression should take away remaining differences. Normally the variables after and treated are non-significant, but the interaction (DiD) is significant
Paper
Widerstedt, B. & Månsson, J. (2015), Can business counselling help SMEs grow? Evidence from the Swedish business development grant programme, Journal of Small Business and Enterprise Development, 22(4): 652-665
Synthetic control groups
All have received the same treatment, e.g. reduction in VAT => no counterfactual exists by definition
SCG = construct a hypothetical/synthetic control group
Compare over time by e.g. DiD
Example: Falkenhall et al. (2016), reduction in the VAT for Swedish restaurants
Construct a synthetic restaurant industry by using data from other industries
The method is completely data driven and does not rely on any subjective choice
[Figure: before and after comparison with the SCG]
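A deliberately simplified sketch of the weighting idea, assuming only two donor industries and a grid search over a single weight. Real applications use many donors and a constrained optimiser, so this illustrates the principle rather than the method of the cited papers. All series are hypothetical.

```python
def synthetic_weight(treated_pre, donor1_pre, donor2_pre, steps=100):
    """Weight w in [0, 1] on donor 1 (1 - w on donor 2) that best fits the
    treated unit's pre-reform outcomes in the least-squares sense."""
    best_w, best_error = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        error = sum(
            (t - (w * d1 + (1 - w) * d2)) ** 2
            for t, d1, d2 in zip(treated_pre, donor1_pre, donor2_pre)
        )
        if error < best_error:
            best_w, best_error = w, error
    return best_w


# Hypothetical pre-reform series for the treated industry and two donors.
treated_pre = [14.0, 16.0, 18.0]
donor1_pre = [10.0, 12.0, 14.0]
donor2_pre = [20.0, 22.0, 24.0]

w = synthetic_weight(treated_pre, donor1_pre, donor2_pre)

# Hypothetical post-reform outcomes.
treated_post, donor1_post, donor2_post = 17.0, 16.0, 26.0
synthetic_post = w * donor1_post + (1 - w) * donor2_post
gap = treated_post - synthetic_post  # estimated impact of the reform

print(round(w, 2), round(gap, 2))
```

The weights are chosen from pre-reform data alone, which is what makes the approach data driven; the post-reform gap between the treated industry and its synthetic counterpart is then read as the impact.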
Advantage
Sometimes it is the only option for closing in on impacts of large national reforms
Disadvantage
Will never be perfect
Extremely computer intensive = takes a long time
Papers
Abadie, A., Diamond, A. & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program, Journal of the American Statistical Association, 105(490): 493-505
Abadie, A. & Gardeazabal, J. (2003). The Economic Costs of Conflict: A Case Study of the Basque Country, American Economic Review, 93(1): 113-132
Ando, M. (2014). Dreams of urbanization: Quantitative case studies on the local impacts of nuclear power facilities using the synthetic control method, Journal of Urban Economics, 85: 68-85
Falkenhall, Månsson and Tano (2016) The impact of the VAT reduction in Swedish Restaurant sector: A synthetic control group approach, Scottish
Check list and key questions
Do we have treated and untreated groups?
No -> Impact not possible
Yes
Why are the treated treated?
Random -> (e.g. some variation in implementation = natural experiment) => compare outcome
Selection?
Can we observe (in data) characteristics that are key elements in selecting treated and that might affect the outcome?
No -> Impact not possible
Yes -> Quasi-experimental methods
Do we have data both before and after?
Yes -> regression (DiD or Conditional DiD)
No -> matching methods (weak identification most times)
Is there a common goal and is the selection quite homogeneous?
Yes => Propensity score
No => matching on characteristics
Do prices on impacts exist or can they be constructed?
No -> Efficiency not possible
Yes -> Efficiency evaluation (Value for Money in an impact context)
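The check list can be read as a small decision function. The sketch below is one possible reading of the nesting of the questions (in particular, the question about common goals is treated as a follow-up to the matching branch); the function name, parameter names and return strings are hypothetical.

```python
def suggest_design(has_untreated_group, assignment_random, selection_observable,
                   has_before_and_after, homogeneous_with_common_goal):
    """Walk through the check list questions and return a suggested approach."""
    if not has_untreated_group:
        return "impact evaluation not possible"
    if assignment_random:
        return "compare outcomes (natural experiment)"
    if not selection_observable:
        return "impact evaluation not possible"
    if has_before_and_after:
        return "regression: DiD or conditional DiD"
    if homogeneous_with_common_goal:
        return "propensity score matching"
    return "matching on characteristics"


def efficiency_possible(prices_available):
    """Efficiency evaluation requires prices on impacts."""
    return "efficiency evaluation" if prices_available else "efficiency not possible"


# Hypothetical case: selection on observed characteristics, data only after.
print(suggest_design(True, False, True, False, True))
```

The answers narrow the choice to one or two methods; the check list does not replace a judgement about whether the identifying assumptions are believable in the specific audit.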
Summary
To identify impacts a relevant control group is needed
Identification, identification, identification.
Randomising is the most secure way; however, sometimes it is neither possible nor suitable
Look for variation in the implementation = natural experiment
Quasi-experimental methods are almost always data intensive
The family tree of evaluation methods
[Figure: family tree of evaluation methods; legible labels include experimental, quasi-experimental, synthetic, diff-in-diff and conditional difference in difference]
