
Evaluating an Evaluation

David Yanagizawa-Drott
Assistant Professor of Public Policy
Evidence for Policy Design, Harvard Kennedy School
August 28, 2012

Lecture Overview

How to evaluate an evaluation
  Internal validity
    What type of evaluation?
    What is used as the counterfactual?
    What are the implicit assumptions?
  External validity

Evaluating an Evaluation
Many impact evaluations are not designed rigorously enough to estimate true program impact. Knowing a program's causal impact is essential for deciding whether, and how much, to invest in it. As consumers of evaluations, we must be careful about the evidence we accept.

Evaluating an Evaluation

Internal validity
  How certain are we about the impact estimates that we measure?
  Are our impact estimates biased?

External validity
  Can the results be applied to a different population and setting?

Internal Validity: Ask Yourself

1. What type of evaluation is being presented?
2. What is used to estimate the counterfactual in this type of evaluation?
3. Are the necessary assumptions met to produce a valid impact estimate?

What Type of Evaluation?


Simple Differences
Pre-Post
Regression Discontinuity
Matching
Difference-in-Differences
Randomized Evaluation (the gold standard)

Evaluation Example
Intervention: Suppose ADB would like to know the impact of this workshop on all of you

Outcome of interest: Knowledge of impact evaluation methods measured by a test

Simple Difference Evaluation


Suppose you are assessing an evaluation that used Simple Differences. The comparison group:
Individuals who did not participate in this workshop, but participated in a World Bank workshop on agricultural technology adoption held in this hotel. They are given the same test on impact evaluation methods.

Simple Difference Evaluation

Estimated Impact: 60 percentage-point increase

Ask Yourself

1. What type of evaluation?
   Simple Difference.

2. What represents the counterfactual?
   The outcome of interest for the comparison group after the intervention.

3. What assumptions must hold?
   Participants and the comparison group are similar except for program participation.
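
To make the mechanics concrete, here is a minimal sketch in Python of how a Simple Difference estimate is computed; the test scores are made up for illustration and are not from the lecture. The comparison group's post-workshop mean serves directly as the counterfactual.

```python
import numpy as np

# Hypothetical post-workshop test scores (percent correct); numbers are illustrative only.
workshop_scores = np.array([85, 90, 75, 95, 80])    # participants in this workshop
comparison_scores = np.array([25, 30, 20, 35, 15])  # attendees of the other workshop

# Simple Difference: the comparison group's mean stands in for the counterfactual.
estimated_impact = workshop_scores.mean() - comparison_scores.mean()
print(f"Estimated impact: {estimated_impact:.0f} percentage points")  # 60
```

The estimate is only as good as the assumption above: any pre-existing gap between the two groups is folded into the 60 points.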

Threats to Internal Validity


Selection
The average person participating in the program differs from the average person not participating, even without the training. This difference may account for differences in outcomes between the intervention and comparison groups.

Simple Difference Evaluation

You also have the following information about the comparison group:
1. Both the intervention and comparison group had a high proportion of women.
2. The comparison group is staying mostly on lower floors of the hotel.
3. The comparison group is less likely to have experience conducting impact evaluations.
4. The comparison group also includes policymakers from a variety of Asian countries.

Which of the following might be a problem for the evaluation's internal validity?
1. Similarly high proportion of women
2. Staying on lower floors
3. Less evaluation experience
4. Also Asian policymakers

Pre-Post Evaluation
Instead of a Simple Difference study, suppose you are assessing an evaluation that used the Pre-Post method. The comparison group:
Test results of participants right before the workshop

Pre-Post Evaluation

Estimated Impact: 30 percentage-point increase

Ask Yourself

1. What type of evaluation?
   Pre-Post Evaluation.

2. What represents the counterfactual?
   The same individuals before the intervention.

3. What assumptions must hold?
   No other factor contributed to any change in the measured outcome over time.
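
As a minimal sketch (again with illustrative numbers, not the lecture's data), the Pre-Post estimate is simply the average change for the same individuals:

```python
import numpy as np

# Hypothetical scores for the same participants before and after the workshop (illustrative).
pre_scores = np.array([50, 60, 55, 45, 65])
post_scores = np.array([80, 90, 85, 75, 95])

# Pre-Post: each participant's own pre-intervention score is the counterfactual.
estimated_impact = (post_scores - pre_scores).mean()
print(f"Estimated impact: {estimated_impact:.0f} percentage points")  # 30
```

Anything else that raised scores over the same period is attributed to the workshop by this method.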

Threats to Internal Validity


External Shocks
An unrelated program, policy, or other shock that occurs over the same time period and affects the outcome of interest.

Trends
The change in outcome reflects processes that occur simply over time and that are unrelated to the program.

Evaluation Example
Intervention: Suppose the agriculture ministry has implemented a large-scale, six-month training program on how to use new hybrid seeds.

Outcome of interest: Farm yields

Difference-in-Differences
Now suppose you are assessing an evaluation that used D-in-D. The comparison group:
Villages that did not participate in the training program
Their yields are measured before and after the training
Comparison of change in yields across the two types of villages

Difference-in-Differences
Estimated Impact: 2 tons per hectare

(Figure: yield trends for participating vs. non-participating villages)

Ask Yourself

1. What type of evaluation?
   Difference-in-Differences.

2. What represents the counterfactual?
   The trend in the outcome over the same period in the comparison group.

3. What assumptions must hold?
   Parallel trends: outcomes in both groups would have followed the same trend in the absence of the program.
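
A minimal sketch of the D-in-D computation, using hypothetical village means rather than the lecture's data: the comparison group's change over time is subtracted from the participants' change.

```python
# Hypothetical mean yields in tons per hectare, before and after the training (illustrative).
treated_pre, treated_post = 2.0, 4.5   # participating villages
control_pre, control_post = 2.5, 3.0   # non-participating villages

# D-in-D: the comparison group's change over time estimates the counterfactual trend.
counterfactual_change = control_post - control_pre               # 0.5
estimated_impact = (treated_post - treated_pre) - counterfactual_change
print(f"Estimated impact: {estimated_impact:.1f} tons per hectare")  # 2.0
```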

Difference-in-Differences

Suppose the program targeted villages that had been experiencing declining yields in previous years.

Is the fact that villages with declining yields were targeted a threat to internal validity?
1. Yes
2. No
3. Don't know

Threats to Internal Validity


Differential Shocks
An unrelated program, policy, or other shock that affects outcomes in either the intervention or comparison group over the same period of time, but not both.

Different Trends
Underlying differences in trends between the two groups.

Randomized Evaluation
Now suppose you are assessing an evaluation that is a randomized experiment. The comparison group:
Villages that were randomly selected for the control group, and did not receive the training

Randomized Evaluation

Estimated Impact: 3 tons per hectare

Randomized Evaluation

1. What type of evaluation?
   Randomization.

2. What represents the counterfactual?
   The participants assigned to the control group.

3. What assumptions must hold?
   Randomization worked; the two groups are identical on average on observed and unobserved factors.
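
With random assignment, the estimate is again a simple difference in means, but now the comparison group is valid by construction. A sketch with simulated yields (all numbers and sample sizes are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical post-program yields (tons per hectare) in randomly assigned villages.
treatment_yields = rng.normal(loc=6.0, scale=1.0, size=100)
control_yields = rng.normal(loc=3.0, scale=1.0, size=100)

# Under randomization, the difference in means is an unbiased estimate of program impact.
estimated_impact = treatment_yields.mean() - control_yields.mean()
t_stat, p_value = stats.ttest_ind(treatment_yields, control_yields)
print(f"Estimated impact: {estimated_impact:.1f} tons per hectare (p = {p_value:.3f})")
```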

Threats to Internal Validity


Randomization Failure
The randomization protocol was not properly followed (a balance check, sketched below, is one way to test this).

Insufficient Sample Size
The sample is not large enough to ensure valid results.

Selective Attrition
Attrition is related to being assigned to the program and is unaccounted for in the analysis.
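
One common way to probe the first of these threats is a balance check: compare baseline characteristics across the two groups before the program starts. A minimal sketch with simulated baseline yields (illustrative only, not part of the lecture):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical baseline (pre-program) yields for the two randomly assigned groups.
treatment_baseline = rng.normal(loc=2.5, scale=0.8, size=100)
control_baseline = rng.normal(loc=2.5, scale=0.8, size=100)

# If randomization worked, baseline means should be statistically indistinguishable.
diff = treatment_baseline.mean() - control_baseline.mean()
t_stat, p_value = stats.ttest_ind(treatment_baseline, control_baseline)
print(f"Baseline difference: {diff:.2f} tons per hectare (p = {p_value:.2f})")
```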

Evaluating an Evaluation

Making bigger assumptions (which are less likely to hold) increases the probability of incorrectly estimating the impact.

Always assess the assumptions an evaluation requires to judge whether they are likely to hold.

Across impact evaluation methods, the assumptions are most easily assessed, and most likely to hold, in a randomized evaluation.

Evaluation Example
Intervention: The government decides to offer a subsidy for voluntary health insurance to people in rural areas. All households are eligible.

Outcome of interest: Out-of-pocket health spending

Evaluation method: Two years into the program, an evaluation compares spending for households that signed up for the program to spending for households that didn't sign up.

Estimated Impact: 25% reduction in spending

What type of evaluation is this?


A. Pre-Post
B. Simple Difference
C. Matching
D. Difference-in-Differences
E. Regression Discontinuity
F. Randomization

Is it likely that the evaluation produced a credible estimate of the program's impact?
A. Yes
B. No
C. Don't know

The estimate is unlikely to capture the true program effect. Selection: relatively sick people are more likely to sign up for the program. It could also be that poorer people demand health insurance more. The comparison group is therefore not valid.
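
To see how selection can distort a simple difference, here is a small, purely hypothetical simulation (all parameters are invented for illustration): insurance truly cuts spending by 25%, but sicker households, who spend more to begin with, are more likely to enroll, so the naive comparison understates the effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical setup: sicker households are more likely to sign up for insurance.
sickness = rng.uniform(0, 1, n)                        # underlying health need
enrolled = rng.uniform(0, 1, n) < 0.2 + 0.6 * sickness

# Out-of-pocket spending rises with sickness; insurance truly reduces it by 25%.
spending_without_insurance = 100 * (1 + 2 * sickness)
spending = np.where(enrolled, 0.75 * spending_without_insurance, spending_without_insurance)

# Naive simple difference mixes the insurance effect with selection on sickness.
naive_estimate = spending[enrolled].mean() / spending[~enrolled].mean() - 1
print(f"Naive estimate: {naive_estimate:+.0%} (true effect: -25%)")
```

Selection could also push the bias the other way (for example, if poorer households that spend less are the ones who enroll); without a valid comparison group, the direction and size of the bias are unknown.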

Lecture Overview

How to evaluate an evaluation
  Internal validity
    What type of evaluation?
    What is used as the counterfactual?
    What are the implicit assumptions?
  External validity

External Validity

Internal validity
  How certain are we about the impact estimates that we measure?
  Are our impact estimates biased?

External validity
  Can the results be applied to a different population and setting?

External Validity

Population
  Is the evaluation sample representative of the population of interest?

Setting
  Do the conditions that allow the program to be effective in one setting exist in another setting?

Scale
  Can it be replicated at a large (national) scale with the same impact?

Take-Aways

Some impact evaluation methods are generally considered more reliable than others.

Some methods, such as Simple Difference and Pre-Post, can easily lead to false conclusions about program impact.

Always identify and test the necessary assumptions in order to assess the strength of an impact evaluation.

When evaluating an evaluation, first assess the internal validity, then the external validity.
