
Evaluating an Evaluation

David Yanagizawa-Drott
Assistant Professor of Public Policy
Evidence for Policy Design, Harvard Kennedy School
August 28, 2012

Lecture Overview

How to evaluate an evaluation
  Internal validity
    What type of evaluation?
    What is used as the counterfactual?
    What are the implicit assumptions?
  External validity

Evaluating an Evaluation
Many impact evaluations are not designed rigorously enough to estimate true program impact. Knowing a program's causal impact is essential for deciding whether, and how much, to invest in it. As consumers of evaluations, we must be careful about the evidence we accept.

Evaluating an Evaluation

Internal validity
  How certain are we about the impact estimates that we measure?
  Are our impact estimates biased?

External validity
  Can the results be applied to a different population and setting?

Internal Validity: Ask Yourself

1. What type of evaluation is being presented?
2. What is used to estimate the counterfactual in this type of evaluation?
3. Are the necessary assumptions met to produce a valid impact estimate?

What Type of Evaluation?


Simple Differences
Pre-Post
Regression Discontinuity
Matching
Difference-in-Differences
Randomized Evaluation (the gold standard)

Evaluation Example
Intervention: Suppose ADB would like to know the impact of this workshop on all of you

Outcome of interest: Knowledge of impact evaluation methods measured by a test

Simple Difference Evaluation


Suppose you are assessing an evaluation that used Simple Differences. The comparison group:
Individuals who did not participate in this workshop, but participated in a World Bank workshop on agricultural technology adoption held in this hotel. They are given the same test on impact evaluation methods.

Simple Difference Evaluation

Estimated Impact: 60 percentage-point increase

Ask Yourself

1. What type of evaluation?
   Simple Difference.

2. What represents the counterfactual?
   The outcome of interest for the comparison group after the intervention.

3. What assumptions must hold?
   Participants and the comparison group are similar except for program participation.
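
To make the mechanics concrete, here is a minimal sketch in Python of how a Simple Difference estimate is computed; the test scores are made up for illustration and are not from the lecture. The comparison group's post-workshop mean serves directly as the counterfactual.

```python
import numpy as np

# Hypothetical post-workshop test scores (percent correct); numbers are illustrative only.
workshop_scores = np.array([85, 90, 75, 95, 80])    # participants in this workshop
comparison_scores = np.array([25, 30, 20, 35, 15])  # attendees of the other workshop

# Simple Difference: the comparison group's mean stands in for the counterfactual.
estimated_impact = workshop_scores.mean() - comparison_scores.mean()
print(f"Estimated impact: {estimated_impact:.0f} percentage points")  # 60
```

The estimate is only as good as the assumption above: any pre-existing gap between the two groups is folded into the 60 points.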

Threats to Internal Validity


Selection
The average person participating in the program differs from the average person not participating, even without the training. This difference may account for differences in outcomes between the intervention and comparison groups.

Simple Difference Evaluation

You also have the following information about the comparison group:
1. Both the intervention and comparison group had a high proportion of women.
2. The comparison group is staying mostly on lower floors of the hotel.
3. The comparison group is less likely to have experience conducting impact evaluations.
4. The comparison group also includes policymakers from a variety of Asian countries.

Which of the following might be a problem for the evaluation's internal validity?
1. Similarly high proportion of women
2. Staying on lower floors
3. Less evaluation experience
4. Also Asian policymakers

Pre-Post Evaluation
Instead of a Simple Difference study, suppose you are assessing an evaluation that used the Pre-Post method. The comparison group:
Test results of participants right before the workshop

Pre-Post Evaluation

Estimated Impact: 30 percentage-point increase

Ask Yourself

1. What type of evaluation?
   Pre-Post Evaluation.

2. What represents the counterfactual?
   The same individuals before the intervention.

3. What assumptions must hold?
   No other factor contributed to any change in the measured outcome over time.
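
As a minimal sketch (again with illustrative numbers, not the lecture's data), the Pre-Post estimate is simply the average change for the same individuals:

```python
import numpy as np

# Hypothetical scores for the same participants before and after the workshop (illustrative).
pre_scores = np.array([50, 60, 55, 45, 65])
post_scores = np.array([80, 90, 85, 75, 95])

# Pre-Post: each participant's own pre-intervention score is the counterfactual.
estimated_impact = (post_scores - pre_scores).mean()
print(f"Estimated impact: {estimated_impact:.0f} percentage points")  # 30
```

Anything else that raised scores over the same period is attributed to the workshop by this method.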

Threats to Internal Validity


External Shocks
An unrelated program, policy, or other shock that occurs over the same time period and affects the outcome of interest.

Trends
The change in outcome reflects processes that occur simply over time and that are unrelated to the program.

Evaluation Example
Intervention: Suppose the agriculture ministry has implemented a large-scale, six-month training program on how to use new hybrid seeds.

Outcome of interest: Farm yields

Difference-in-Differences
Now suppose you are assessing an evaluation that used D-in-D. The comparison group:
Villages that did not participate in the training program
Their yields are measured before and after the training
Comparison of change in yields across the two types of villages

Difference-in-Differences
Estimated Impact: 2 tons per hectare

(Figure: yield trends for participating vs. non-participating villages)

Ask Yourself

1. What type of evaluation?
   Difference-in-Differences.

2. What represents the counterfactual?
   The trend in the outcome over the same period in the comparison group.

3. What assumptions must hold?
   Parallel trends: outcomes in both groups would have followed the same trend in the absence of the program.
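
A minimal sketch of the D-in-D computation, using hypothetical village means rather than the lecture's data: the comparison group's change over time is subtracted from the participants' change.

```python
# Hypothetical mean yields in tons per hectare, before and after the training (illustrative).
treated_pre, treated_post = 2.0, 4.5   # participating villages
control_pre, control_post = 2.5, 3.0   # non-participating villages

# D-in-D: the comparison group's change over time estimates the counterfactual trend.
counterfactual_change = control_post - control_pre               # 0.5
estimated_impact = (treated_post - treated_pre) - counterfactual_change
print(f"Estimated impact: {estimated_impact:.1f} tons per hectare")  # 2.0
```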

Difference-in-Differences

Suppose the program targeted villages that had been experiencing declining yields in previous years.

Is the fact that villages with declining yields were targeted a threat to internal validity?
1. Yes
2. No
3. Don't know

Threats to Internal Validity


Differential Shocks
An unrelated program, policy, or other shock that affects outcomes in either the intervention or comparison group over the same period of time, but not both.

Different Trends
Underlying differences in trends between the two groups.

Randomized Evaluation
Now suppose you are assessing an evaluation that is a randomized experiment. The comparison group:
Villages that were randomly selected for the control group, and did not receive the training

Randomized Evaluation

Estimated Impact: 3 tons per hectare

Randomized Evaluation

1. What type of evaluation?
   Randomization.

2. What represents the counterfactual?
   The participants assigned to the control group.

3. What assumptions must hold?
   Randomization worked; the two groups are identical on average on observed and unobserved factors.
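
With random assignment, the estimate is again a simple difference in means, but now the comparison group is valid by construction. A sketch with simulated yields (all numbers and sample sizes are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical post-program yields (tons per hectare) in randomly assigned villages.
treatment_yields = rng.normal(loc=6.0, scale=1.0, size=100)
control_yields = rng.normal(loc=3.0, scale=1.0, size=100)

# Under randomization, the difference in means is an unbiased estimate of program impact.
estimated_impact = treatment_yields.mean() - control_yields.mean()
t_stat, p_value = stats.ttest_ind(treatment_yields, control_yields)
print(f"Estimated impact: {estimated_impact:.1f} tons per hectare (p = {p_value:.3f})")
```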

Threats to Internal Validity


Randomization Failure
The randomization protocol was not properly followed (a balance check, sketched below, is one way to test this).

Insufficient Sample Size
The sample is not large enough to ensure valid results.

Selective Attrition
Attrition is related to being assigned to the program and is unaccounted for in the analysis.
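
One common way to probe the first of these threats is a balance check: compare baseline characteristics across the two groups before the program starts. A minimal sketch with simulated baseline yields (illustrative only, not part of the lecture):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical baseline (pre-program) yields for the two randomly assigned groups.
treatment_baseline = rng.normal(loc=2.5, scale=0.8, size=100)
control_baseline = rng.normal(loc=2.5, scale=0.8, size=100)

# If randomization worked, baseline means should be statistically indistinguishable.
diff = treatment_baseline.mean() - control_baseline.mean()
t_stat, p_value = stats.ttest_ind(treatment_baseline, control_baseline)
print(f"Baseline difference: {diff:.2f} tons per hectare (p = {p_value:.2f})")
```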

Evaluating an Evaluation

Making bigger assumptions (which are less likely to hold) increases the probability of incorrectly estimating the impact.

Always assess the assumptions an evaluation requires to judge whether they are likely to hold.

Across impact evaluation methods, the assumptions are most easily assessed, and most likely to hold, in a randomized evaluation.

Evaluation Example
Intervention: The government decides to offer a subsidy for voluntary health insurance to people in rural areas. All households are eligible.

Outcome of interest: Out-of-pocket health spending

Evaluation method: Two years into the program, an evaluation compares spending for households that signed up for the program to spending for households that didn't sign up.

Estimated Impact: 25% reduction in spending

What type of evaluation is this?


A. Pre-Post
B. Simple Difference
C. Matching
D. Difference-in-Differences
E. Regression Discontinuity
F. Randomization

Is it likely that the evaluation produced a credible estimate of the program's impact?
A. Yes
B. No
C. Don't know

The estimate is unlikely to capture the true program effect. Selection: relatively sick people are more likely to sign up for the program. It could also be that poorer people demand health insurance more. The comparison group is therefore not valid.
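
To see how selection can distort a simple difference, here is a small, purely hypothetical simulation (all parameters are invented for illustration): insurance truly cuts spending by 25%, but sicker households, who spend more to begin with, are more likely to enroll, so the naive comparison understates the effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical setup: sicker households are more likely to sign up for insurance.
sickness = rng.uniform(0, 1, n)                        # underlying health need
enrolled = rng.uniform(0, 1, n) < 0.2 + 0.6 * sickness

# Out-of-pocket spending rises with sickness; insurance truly reduces it by 25%.
spending_without_insurance = 100 * (1 + 2 * sickness)
spending = np.where(enrolled, 0.75 * spending_without_insurance, spending_without_insurance)

# Naive simple difference mixes the insurance effect with selection on sickness.
naive_estimate = spending[enrolled].mean() / spending[~enrolled].mean() - 1
print(f"Naive estimate: {naive_estimate:+.0%} (true effect: -25%)")
```

Selection could also push the bias the other way (for example, if poorer households that spend less are the ones who enroll); without a valid comparison group, the direction and size of the bias are unknown.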

Lecture Overview

How to evaluate an evaluation
  Internal validity
    What type of evaluation?
    What is used as the counterfactual?
    What are the implicit assumptions?
  External validity

External Validity

Internal validity
  How certain are we about the impact estimates that we measure?
  Are our impact estimates biased?

External validity
  Can the results be applied to a different population and setting?

External Validity

Population
  Is the evaluation sample representative of the population of interest?

Setting
  Do the conditions that allow the program to be effective in one setting exist in another setting?

Scale
  Can it be replicated at a large (national) scale with the same impact?

Take-Aways

Some impact evaluation methods are generally considered more reliable than others.

Some methods, such as Simple Difference and Pre-Post, can easily lead to false conclusions about program impact.

Always identify and test the necessary assumptions in order to assess the strength of an impact evaluation.

When evaluating an evaluation, first assess the internal validity, then the external validity.
