
Identifying impacts: the toolbox

By Jonas Månsson* & Marie Gartell**
* Linnaeus University & Swedish National Audit Office
** Swedish National Audit Office

Summary of principles of policy evaluation

Evaluation questions
1. Have goals been achieved?
2. Is goal fulfilment due to the project under examination?
3. Is the value of the observed impacts larger than the cost of achieving them?
4. (Is there something within the activity that has influenced the results?)

Evaluation type
1. Monitoring
2. Impact evaluation
3. Efficiency evaluation (cost-benefit analysis, CBA)
4. (Process evaluation)

Impact evaluation
Evaluation question: Have the goals been achieved, and is that an effect of the project?

Principles
Activity A -> Result A
Activity B -> Result B
Measure: Impact = Result A - Result B
By comparing the results of activity A with the results of activity B, it is possible to talk about the impact of A relative to B.

Case 1
Result A = impact + other factors that influence outcome (O)
Result B = other factors that influence outcome (O)
Result A - Result B = impact + O - O = impact

Case 2
Result A = impact + selection + O
Result B = O
Result A - Result B = impact + selection + O - O = impact + selection
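
The two cases can be checked with simple arithmetic. The sketch below is only an illustration: the function name `naive_difference` and all numbers are assumptions, not taken from the slides.

```python
def naive_difference(impact, selection, other):
    """Result A minus Result B when only group A takes part in the activity."""
    result_a = impact + selection + other  # treated group
    result_b = other                       # comparison group
    return result_a - result_b

# Case 1: no selection, so the comparison recovers the impact.
case_1 = naive_difference(impact=2.0, selection=0.0, other=5.0)

# Case 2: selection is mixed into the comparison and inflates it.
case_2 = naive_difference(impact=2.0, selection=1.5, other=5.0)

print(case_1, case_2)
```

In case 2 the simple comparison cannot separate the impact from the selection term, which is why the rest of the toolbox focuses on identification.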
How can we handle

Identification
Assignment of treated and untreated should be exogenous: P(T) = P(UT)
The goal of the methods in the toolbox is to create exogenous assignment, or to make it believable that non-exogenous assignment has been taken care of.
2016-04-14

Evaluation designs
Non-experimental
Goal achievement evaluations / monitoring will not provide impact estimates without further assumptions (nothing else happened / ceteris paribus)
Experimental design
Randomised controlled trials (RCT)
Lab experiment
Field experiment
Natural experiment
Quasi-experimental design
Regression based: DiD, RD
Matching: characteristic matching, CEM, PSM
Synthetic control groups (Synth)

Learning objective
The objective of the toolbox lecture is to briefly present designs and methods that can be used and to connect them to certain situations.
Teams should be able to narrow their choice of method to one or two.

Presentation
Method
Question to be answered
(sometimes illustration)
Papers that can be used as reference
Used in SNAO
Papers are convenience sampled and should be seen as illustrations

Experimental designs

Randomised controlled trials (social experiments)
Assignment to a program is done at random before the program starts.
All individuals in the target population have the same initial probability of being assigned. Individual characteristics do not matter for assignment: no selection problem.
Impact is determined by simply comparing the mean outcome for the treated and the untreated.
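
A small simulation can illustrate why random assignment makes the simple comparison of means work. Everything here is made up for illustration: the "ability" characteristic, the outcome equation and the true impact of 2.0 are assumptions, not results from any evaluation.

```python
import random
from statistics import mean

rng = random.Random(42)   # fixed seed so the sketch is repeatable
TRUE_IMPACT = 2.0         # hypothetical program effect

treated, untreated = [], []
for _ in range(10_000):
    ability = rng.gauss(0, 1)        # individual characteristic
    assigned = rng.random() < 0.5    # coin flip, independent of ability
    outcome = 10 + 3 * ability + rng.gauss(0, 1)
    if assigned:
        treated.append(outcome + TRUE_IMPACT)
    else:
        untreated.append(outcome)

# With random assignment the difference in means estimates the impact.
estimate = mean(treated) - mean(untreated)
print(round(estimate, 2))
```

Because the coin flip ignores ability, both groups have roughly the same ability on average, so the difference in means should land close to the assumed impact.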

Advantages with RCT
High internal validity: measures what is supposed to be measured
If done correctly, low costs

Disadvantages and problems
Often small-scale experiments, i.e. low external validity
If something goes wrong, e.g. systematic dropout, the internal validity might be low
Ethical issues
It is unlikely that performance auditors enter the process before the intervention.


But a recommendation could be that

Papers
Bennmarker, H., Grönqvist, E. & Öckert, B. (2013). "Effects of contracting out employment services: Evidence from a randomized experiment," Journal of Public Economics, 98, 68-84.
Hägglund, P. (2005). Natural and Classical Experiments in Swedish Labour Market Policy, Ph.D. dissertation, Stockholm University.

Natural experiment
A natural experiment is a shock in a system that is not related to selection (individual characteristics) and that in a clear way determines a state before and after.
Examples
Palme & Meghir (AER) used variation in the timing of the school reforms in the 1960s to look at the impact of this school reform.
Månsson and Dahlander (2009) use the hurricane Gudrun to study differences in attitude formation towards guest workers.
Månsson & Delander (2006) use differences in rule interpretation by civil servants.

Natural experiments
Principles:
The shock to a system creates a random variation independent of individual characteristics.
What is a shock?
Practical solutions (school example)
Budget restrictions/allocation: a) all should get, but some get later; b) money runs out
Managers that refuse to implement, interpret rules wrongly, or are too alert
Problem

Papers
Månsson & Delander (2007), Forensic evaluation: A strategy for and results of an impact evaluation of a universal labor market program: The Swedish Activity Guarantee, Wissenschaftszentrum Berlin für Sozialforschung (WZB), Discussion paper 2007-102, https://www.wzb.eu/www2000/alt/ab/pdf/dp_sp_i_2007-102.pdf
Månsson & Dahlander (2009), Social interaction impact on attitudes: Native Swedes' attitudes towards labour immigrants and guest workers after hurricane Gudrun, Baltic Economic Journal, 11(1): 51-64
Meghir, C. & Palme, M. (2005), Educational

Field experiment
Used to investigate decision making, e.g. discrimination
The respondent reacts to a fake situation
Can keep a lot of factors constant and let the variable of interest vary, e.g. gender, age, ethnicity


Field experiment
Advantages
High internal validity

Disadvantages
Ethical issues: is it O.K. to fool someone?
Is it O.K. that the parliament's audit organisation fools civil servants to see if they discriminate? I think so!!
Generally low external validity since it is costly to administer.

Papers
Ahmed, A. & Hammarstedt, M. (2008). Discrimination in the rental housing market: A field experiment on the Internet. Journal of Urban Economics 64(2), p. 362-372.
Carlsson, M. & Rooth, D.-O. (2007). Evidence of ethnic discrimination in the Swedish labor market using experimental data. Labour Economics 14(4), p. 716-729.
Carlsson, M. & Rooth, D.-O. (2012). Revealing Taste-Based Discrimination in Hiring: A Correspondence Testing Experiment with Geographic Variation. Applied Economics Letters 19, p. 1861-1864.
Eriksson, S. & Rooth, D.-O. (2014). Do Employers Use Unemployment as a Sorting Criterion When Hiring? Evidence from a Field Experiment. American Economic Review 104(3), p. 1014-1039.
Widerstedt, B. & Månsson, J. (2016) A Warm Welcome?

Laboratory experiment
Used to investigate behaviours (behavioural economics and applied psychology)
A situation is created to mimic a real-world situation in a laboratory
E.g. a meeting (not possible with field experiments)
Can be used to study normally unobservable behaviour
E.g. discrimination with respect to looks and languages/dialects


Laboratory experiment
Advantages
High internal validity

Disadvantages
Low external validity
Costly
In earlier studies (Kahneman, Nobel Prize 2002), with few exceptions, students were used, which after some delay resulted in massive criticism because of the extremely low external validity. Today work is done on more relevant populations = lab in field

Papers
Ahmed, A. (2010) What is in a surname? The role of ethnicity in economic decision making. Applied Economics, 42(21), 2715-2723.
Arai, M., Gartell, M., Rödin, M. and Özcan, G. (2016), Stereotypes of appearance, non-cognitive characteristics and labor market chances. Conference paper EALE 2016. (find conference and download paper)
Rödin & Özcan (2011), Is It How You Look or Speak That Matters? - An Experimental Study Exploring the

Quasi-experimental design

Quasi-experimental design
Common:
Impact evaluation has not been planned/designed before program start
Selection into a program is not random, i.e. the probability (P) of assignment is a function of characteristics (x), i.e. P(x)
Simple comparison of outcomes (y) will be a function of the selection, y(P(x))

Regression techniques
Matching techniques
Combinations of matching and regressions
The objective of all methods is to eliminate effects relating to selection

Regression techniques
Controlling for heterogeneity

          Treated   Untreated
Before    A         C
After     B         D

B-A = effect for treated, D-C = effect for untreated (UT)
C-A = initial difference = selection effect
Impact = (B-A) - (D-C) = (B-D) + (C-A)
Difference in Difference
[Figure: difference-in-difference diagram with points A and B]

Regression techniques
In practice
D1 = 1 if Treated, D1 = 0 if Untreated
D2 = 1 if After, D2 = 0 if Before
D3 = D1 x D2; its coefficient is the Difference in Difference estimate

$Y_i = \alpha + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_3 + \gamma X_i + \varepsilon_i$
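
Without control variables, the coefficient on D3 equals the difference in difference of the four cell means, (B-A) - (D-C). The sketch below computes it directly; the data rows and the function name `did_estimate` are made up for illustration.

```python
from statistics import mean

def did_estimate(rows):
    """rows: list of (d1, d2, y) with d1 = treated dummy, d2 = after dummy."""
    cell = {
        (d1, d2): mean(y for a, b, y in rows if (a, b) == (d1, d2))
        for d1 in (0, 1) for d2 in (0, 1)
    }
    A, B = cell[(1, 0)], cell[(1, 1)]   # treated: before, after
    C, D = cell[(0, 0)], cell[(0, 1)]   # untreated: before, after
    return (B - A) - (D - C)

rows = [
    (1, 0, 10.0), (1, 0, 12.0),   # treated, before  -> A = 11
    (1, 1, 16.0), (1, 1, 18.0),   # treated, after   -> B = 17
    (0, 0, 8.0), (0, 0, 10.0),    # untreated, before -> C = 9
    (0, 1, 11.0), (0, 1, 13.0),   # untreated, after  -> D = 12
]
impact = did_estimate(rows)
print(impact)
```

In this toy data the treated group improves by 6 and the untreated group by 3, so the estimate attributes 3 to the treatment, provided both groups would otherwise have followed parallel paths.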

Advantages
Handle large samples = external validity
Intuitive

Disadvantages
Does the regression take care of selection problems? It assumes parallel paths pre-treatment.

Papers
Avdic, D. and Gartell, M. (2015) "The study pace among college students before and after a student aid reform: some Swedish results", Labour Economics, vol. 33, pp. 26-40, Working paper 2011:12
(Audit Report) SNAO (2014) Effekter av förändrade regler för deltidsarbetslösa (Impacts of changed rules for part-time unemployed), Riksrevisionen 2014:5, Stockholm.

Regression discontinuity
[Figure: regression discontinuity plot with cut-off c]
T = 1 if S > c
T = 0 if S < c

Regression discontinuity
Identification relies on the reform having a sharp cut-off.
By comparing observations lying closely on either side of the threshold, it is possible to estimate the local average treatment effect when randomization was not possible.
RD example
Evaluation of the daddy month in Sweden. Implemented 1 Jan. Before = Dec, After = Jan.
School class size = above 30 students =>
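
One common way to compare observations close to the threshold is to fit a straight line on each side and measure the jump at the cut-off. The sketch below does this with made-up, noise-free data; the bandwidth, the helper names `fit_line` and `sharp_rd`, and the built-in jump of 2.0 are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def sharp_rd(scores, outcomes, cutoff, bandwidth):
    """Jump in the fitted outcome at the cut-off, using points within the bandwidth."""
    pairs = list(zip(scores, outcomes))
    below = [(s - cutoff, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in pairs if cutoff < s <= cutoff + bandwidth]
    a_below, _ = fit_line(*zip(*below))   # fitted value just below the cut-off
    a_above, _ = fit_line(*zip(*above))   # fitted value just above the cut-off
    return a_above - a_below

# Hypothetical data: the outcome rises smoothly in S and jumps by 2.0 at S = 5.
scores = [(i + 0.5) / 10 for i in range(100)]
outcomes = [1 + 0.5 * s + (2.0 if s > 5 else 0.0) for s in scores]

jump = sharp_rd(scores, outcomes, cutoff=5, bandwidth=2)
print(round(jump, 6))
```

Fitting a line on each side, rather than comparing raw means, keeps the smooth trend in S from being counted as part of the jump. With real data the result still depends on the bandwidth and on the functional form chosen.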

Advantages
Handle a lot of information = high external validity
Is assumed to produce high internal validity

Disadvantages
Fuzzy cut-off point => contamination => low internal validity
Fuzzy cut-off = supposed to be implemented 1/1, but some did it earlier and some later.
Sensitive to the functional form used. Sometimes non-linearity is mistaken for discontinuities.

Papers
Fredriksson, P., Oosterbeek, H. and Öckert, B. (2013). "Long-term effects of class size", Quarterly Journal of Economics, 128(1), 249-285.
Pettersson-Lidbom, P. (2007), Do Parties Matter for Economic Outcomes? A Regression-Discontinuity Approach, Journal of the European Economic Association, Volume 6, Issue 5.

Matching approaches

Problem: We have data before and after, or only after, but we know who is treated and untreated, and we know the selection mechanisms.
Solution: Find individuals/firms/organisations that, with respect to selection, are as close as possible to a treated unit.
Logic: If we can find untreated units that would have the same initial probability to

How is it possible to find a comparison group?
Matching techniques

Early studies
For each individual that is treated, find an individual with the same important characteristics within the target group and use these statistical twins as the control group.
Statistical twins, one-to-one matching
Problem: If the number of important characteristics increases, the number of comparison possibilities increases rapidly.

Example:
Match on gender (2), education (3), immigrant background (2), age (6 intervals) = 2*3*2*6 = 72 different comparisons.
The curse of dimensionality
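
The count in the example is simply the product of the number of categories per characteristic. The short sketch below reproduces it and shows how one more variable multiplies the number of cells; the extra "region" variable with 21 categories is a hypothetical addition, not part of the slide.

```python
from math import prod

# Number of categories per matching characteristic, as in the example.
levels = {"gender": 2, "education": 3, "immigrant background": 2, "age interval": 6}
cells = prod(levels.values())
print(cells)

# Hypothetical extra characteristic: every added variable multiplies the cells.
levels["region"] = 21
cells_with_region = prod(levels.values())
print(cells_with_region)
```

Each cell needs at least one treated and one untreated unit for a one-to-one match, so the data requirement grows just as fast as the number of cells.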

Propensity score
Rosenbaum & Rubin (1983), partly also Heckman & Robb (1985)

The basic idea of propensity score matching:
If the selection process is observable and known, it is possible to estimate the model P(assigned(x))
Run the model and compute the predicted probability of being assigned
For each individual in the treatment group, the goal is to find one (or several) untreated individuals that have the same probability of being assigned

Result:
Two groups (treated and untreated) with the same (not 1/N) probability of being assigned, i.e. constructing the same preconditions as for an RCT.
Matching takes place in only one dimension

Propensity score (cont.)

The common probability support problem
A comparison is only possible for those individuals where at least one match is identified.
Example:
In the audit of start-up grants (SUG) we tried to search for matches among those not attending any program (no treatment). For our 15 000 treated we ended up with matches for only 1250, i.e. less than 10 per cent.
(Gerfin and Lechner (2000) reject 14 percent, Frölich et al. (2000) reject 27 percent)
Conclusion: Individuals that entered the SUG were too unlike those that had never entered a program => propensity score does not work
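
The matching step and the common support problem can be sketched in a few lines. The sketch assumes the propensity scores have already been estimated (for example with a logit model) and uses nearest-neighbour matching with replacement. The caliper of 0.05, the function name `match_on_score` and all data are illustrative assumptions, not the procedure used in the audit.

```python
def match_on_score(treated, untreated, caliper):
    """Nearest-neighbour matching (with replacement) on a propensity score.

    treated, untreated: lists of (score, outcome) pairs.
    Returns (mean outcome gap over matched treated units, number matched).
    """
    gaps = []
    for score_t, y_t in treated:
        score_u, y_u = min(untreated, key=lambda u: abs(u[0] - score_t))
        if abs(score_u - score_t) <= caliper:   # close enough: on common support
            gaps.append(y_t - y_u)
    effect = sum(gaps) / len(gaps) if gaps else None
    return effect, len(gaps)

# Hypothetical (score, outcome) pairs.
treated = [(0.30, 12.0), (0.50, 15.0), (0.90, 20.0)]
untreated = [(0.28, 10.0), (0.52, 12.0), (0.10, 8.0)]

effect, n_matched = match_on_score(treated, untreated, caliper=0.05)
share_matched = n_matched / len(treated)
print(effect, n_matched, share_matched)
```

The treated unit with score 0.90 has no untreated unit nearby and is dropped, so the estimate only speaks for the treated units that found a match. This is the same mechanism, on a tiny scale, as the SUG example above.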

Advantages
1. Gives the evaluator a possibility to replicate an RCT
2. Research (Dehejia and Wahba, 1998; 1999) has shown that the propensity score matching method can replicate the results of an RCT (high internal validity)
3. Since PSM does not have any limitations on data (other than accessibility), large samples can be evaluated (high external validity)

Disadvantages
Evaluators have to put a lot of effort into investigating how the selection process works and how to model it
Needs extremely good access to data
Heterogeneity between treated and potential controls => few observations on common support => evaluation results for only a few

Propensity score matching

When will it work O.K.:
Relatively homogeneous target group
Common goals that are well defined
Clear intervention logic

Problem
If there are factors that affect selection and outcome that are not observable in the data and no good proxy variable exists => biased estimates and standard errors

Papers
Caliendo, M. and Kopeinig, S. (2005). Some Practical Guidance for the Implementation of Propensity Score Matching, IZA DP No. 1588, http://ftp.iza.org/dp1588.pdf
Rosenbaum, P. R. (2002): Observational Studies. Springer, New York. (book)
Rubin, D. (1974): "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies," Journal of Educational Psychology, 66, 688-701.
Rosenbaum, P., and D. Rubin (1983): "The Central Role of the Propensity Score in

Audit reports
Riksrevisionen (2008) Stöd till start av näringsverksamhet, Ett framgångsrikt program, RiR 2008:24, Riksrevisionen.
Behrenz, L., Delander, L. & Månsson, J. (2016) Is starting business a sustainable way out of unemployment? Treatment effects of the Swedish start-up subsidy, submitted to Journal of Labour Research
Riksrevisionen (2010) Arbetspraktik, RiR 2010:05, Riksrevisionen
Månsson & Lundin (2016) When outcome definition determines the result in impact evaluations: an illustration using the Swedish work-practice programme, Evidence and Policy,

Coarsened Exact Matching (CEM)

Problem: Sometimes the groups that are targeted are not homogeneous, the intervention logic is unclear and the goal is a little less distinct.
PSM does not work that well since a lot of separate regressions are needed
Idea: Go back to the 70s and match on characteristics that might influence selection and outcome. It works today because of good computer capacity
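
The idea of coarsening and then matching exactly can be sketched as follows. This is a simplified illustration and not the `cem` Stata command cited in the papers: the 10-year age bands, the weighting by number of treated units, the function name `cem_effect` and the data are all assumptions.

```python
from collections import defaultdict
from statistics import mean

def cem_effect(units, age_band=10):
    """units: list of (treated, gender, age, outcome); age is coarsened into bands."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for treated, gender, age, outcome in units:
        strata[(gender, age // age_band)][treated].append(outcome)

    total, n_treated = 0.0, 0
    for groups in strata.values():
        if groups[0] and groups[1]:            # keep strata containing both groups
            gap = mean(groups[1]) - mean(groups[0])
            total += gap * len(groups[1])      # weight by number of treated units
            n_treated += len(groups[1])
    effect = total / n_treated if n_treated else None
    return effect, n_treated

# Hypothetical units: (treated, gender, age, outcome).
units = [
    (1, "F", 23, 10.0), (0, "F", 27, 8.0),
    (1, "M", 41, 12.0), (0, "M", 45, 9.0), (0, "M", 48, 11.0),
    (1, "M", 62, 15.0),                      # no untreated man in his sixties
    (0, "F", 35, 7.0),                       # no treated woman in her thirties
]
effect, n_matched = cem_effect(units)
print(effect, n_matched)
```

Units in strata without a counterpart are dropped, so finer coarsening gives closer matches but fewer of them.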

Advantages
Free of functional form assumptions (non-parametric)
Manage heterogeneity, e.g. gender, education etc.

Disadvantages:
Data intensive
No solution might

Papers
Blackwell, M., Iacus, S., King, G. & Porro, G. (2009) cem: Coarsened exact matching in Stata, The Stata Journal, 9(4): 524-546
Månsson & Delander (2016): Mentoring as a way of integrating refugees on the labour market: Evidence from a Swedish pilot, WEAI conference Singapore 2016

Audit reports
Riksrevisionen 2014:05 Deltidsarbetslöshet

Combinations of matching and regressions

Conditional diff in diff

Combine matching and diff in diff
Treated and untreated are matched on before data
Diff in diff is applied to the matched data
The idea is that the regression should take away remaining differences. Normally the variables "after" and "treated" are non-significant but the interaction (DiD) is.

Paper
Widerstedt, B. & Månsson, J. (2015), Can business counselling help SMEs grow? Evidence from the Swedish business development grant programme, Journal of Small Business and Enterprise Development, 22(4): 652-665

Synthetic control groups

All have received the same treatment, e.g. reduction in VAT => no counterfactual exists by definition
SCG = construct a hypothetical/synthetic control group
Compare over time by e.g. DiD
Example: Falkenhall et al. (2016)

Example
Reduction in the VAT for Swedish restaurants
Construct a synthetic restaurant industry by using data from other industries
The method is completely data driven and does not rely on any subjective choice
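
A toy version of the idea: choose weights on other industries so that the weighted series tracks the treated industry before the reform, then read off the gap afterwards. Real applications use many donor industries and a constrained optimiser. The two-donor grid search below is only a stand-in, and the industry names and index values are invented.

```python
def synthetic(w, donor_1, donor_2):
    """Weighted combination of two donor series (weights w and 1 - w)."""
    return [w * a + (1 - w) * b for a, b in zip(donor_1, donor_2)]

def fit_weight(target_pre, donor_1_pre, donor_2_pre, steps=100):
    """Grid search for the weight that best reproduces the pre-reform series."""
    def pre_error(w):
        synth = synthetic(w, donor_1_pre, donor_2_pre)
        return sum((t - s) ** 2 for t, s in zip(target_pre, synth))
    return min((k / steps for k in range(steps + 1)), key=pre_error)

# Hypothetical index series before and after the reform.
restaurants_pre, restaurants_post = [85.0, 87.5, 90.0], [95.5, 98.0]
retail_pre, retail_post = [100.0, 104.0, 108.0], [112.0, 116.0]
hotels_pre, hotels_post = [80.0, 82.0, 84.0], [86.0, 88.0]

w = fit_weight(restaurants_pre, retail_pre, hotels_pre)
synthetic_post = synthetic(w, retail_post, hotels_post)
gaps = [actual - synth for actual, synth in zip(restaurants_post, synthetic_post)]
print(w, gaps)
```

The weights are picked from pre-reform data only, which is the sense in which the method is data driven. The post-reform gap is then read as the impact, under the assumption that the donor industries were not themselves affected by the reform.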

[Figure: before/after comparison with the SCG]

Advantage
Sometimes it is the only option to close in on impacts of large national reforms

Disadvantage
Will never be perfect
Extremely computer intensive = takes a long time

Papers
Abadie, A., Diamond, A. & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program, Journal of the American Statistical Association, 105(490): 493-505
Abadie, A. & Gardeazabal, J. (2003). The Economic Costs of Conflict: A Case Study of the Basque Country, American Economic Review, 93(1): 113-132
Ando, M. (2014). Dreams of urbanization: Quantitative case studies on the local impacts of nuclear power facilities using the synthetic control method, Journal of Urban Economics, 85: 68-85
Falkenhall, Månsson and Tano (2016) The impact of the VAT reduction in Swedish Restaurant sector: A synthetic control group approach, Scottish

Check list and key questions

Do we have treated and untreated groups?
No -> Impact not possible
Yes

Why are the treated treated?
Random -> (e.g. some variation in implementation = natural experiment) => compare outcome
Selection?

Can we observe (in data) characteristics that are key elements in selecting the treated and that might affect the outcome?
No -> Impact not possible
Yes -> Quasi-experimental methods

Do we have data both before and after?
Yes -> regression (DiD or Conditional DiD)
No -> matching methods (weak identification most times)

Is there a common goal and is the selection quite homogeneous?
Yes -> Propensity score
No -> matching on characteristics

Do prices on impacts exist or can they be constructed?
No -> Efficiency not possible
Yes -> Efficiency evaluation (Value for Money in an impact context)
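
The impact part of the check list can be written down as a small decision function. This is one possible reading of the list: it assumes the question about homogeneity is only asked on the matching branch, and the function and argument names are made up.

```python
def choose_design(has_untreated, random_assignment, selection_observed,
                  before_and_after, homogeneous_with_common_goal):
    """Walk through the check list and return a suggested evaluation design."""
    if not has_untreated:
        return "impact not possible"
    if random_assignment:
        return "compare outcomes (experiment / natural experiment)"
    if not selection_observed:
        return "impact not possible"
    if before_and_after:
        return "regression: DiD or conditional DiD"
    if homogeneous_with_common_goal:
        return "propensity score matching"
    return "matching on characteristics"

# Example: selection is observed in data, but only after-data exist
# and the target group is heterogeneous.
suggestion = choose_design(
    has_untreated=True,
    random_assignment=False,
    selection_observed=True,
    before_and_after=False,
    homogeneous_with_common_goal=False,
)
print(suggestion)
```

The function only narrows the choice. Whether the chosen design actually identifies the impact still has to be argued case by case.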

Summary
To identify impacts a relevant control group is needed
Identification, identification, identification.
Randomising is the most secure way; however, sometimes it is neither possible nor suitable
Look for variation in the implementation = natural experiment
Quasi-experimental methods are almost always data intensive

The family tree of evaluation methods
[Figure: family tree of evaluation methods; legible labels include experimental, quasi-experimental, synthetic, diff-in-diff and conditional difference in difference]