
Cyrus Samii, Columbia University

Practical Sampling for Impact Evaluations
Introduction
How do we construct a sample to credibly detect a
meaningful effect?
Which populations or groups are we interested in and where do
we find them?
How many people/firms/units should be interviewed/observed
from that population?
How does this affect the evaluation budget?

Warning!
Goal of presentation is not to make you a sampling expert
Goal is also not to give you a headache.
Rather an overview: How do sampling features affect what it is
possible to learn from an impact evaluation?
Outline

1. Sampling frame
What populations or groups are we interested in?
How do we find them?

2. Sample size
Why it is so important: confidence in results
Determinants of appropriate sample size
Further issues
Examples

3. Budgets
Sampling frame
Who are we interested in?
a) All SMEs?
b) All formal SMEs?
c) All formal SMEs in a particular sector?
d) All formal SMEs in a particular sector in a particular region?

Need to keep in mind external validity


Can findings from population (c) inform appropriate programs to help
informal firms in a different sector?
Can findings from population (d) inform national policy?

But should also keep in mind feasibility and what you want to
learn
Might not be possible or desirable to pilot a very broadly defined
program or policy
Sampling frame:
Finding the units we're interested in
Depends on size and type of experiment
Lottery among applicants
Example: BDS program among informal firms in a particular area
Can use treatment and comparison units from applicant pool
If not feasible (50,000 get the treatment), need to draw a sample to
measure impact
Policy change
Example: A change in business registration rules in randomly selected
districts
To measure impact on profits, cannot sample all informal businesses in
treatment and comparison districts.
Will need to draw a sample of firms within districts.
Required information before sampling
Complete listing of all units of observation available for sampling in each area or group
Tricky for units like informal firms, but there are techniques to overcome this
Sample size and confidence

Start with a simpler question than program impact
Say we wanted to know the average annual
profits of an SME in Dakar.
Option 1: We go out and track down 5 business
owners and take the average of their responses.
Option 2: We track down 1,000 business owners and
average their responses.

Which average is likely to be closer to the true average?
Sample size and confidence:

Profits               Number of firms    Number of firms
                      (sample of 5)      (sample of 1,000)
$0 - $1,000                  1                  70
$1,001 - $5,000              2                 150
$5,001 - $10,000             1                 650
$10,001 - $15,000            0                 125
$15,001 +                    1                   5
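The 5-versus-1,000 comparison above can be checked with a small simulation. This is only a sketch: the population of firm profits is hypothetical (a lognormal distribution with made-up parameters, not Dakar survey data), and the function name `spread_of_sample_means` is introduced here for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 firms with right-skewed annual profits.
# The lognormal shape and its parameters are assumptions, not survey data.
population = [random.lognormvariate(8.5, 0.8) for _ in range(100_000)]
true_mean = statistics.fmean(population)

def spread_of_sample_means(n, draws=500):
    """Standard deviation of the sample average across repeated samples of size n."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

spread_5 = spread_of_sample_means(5)
spread_1000 = spread_of_sample_means(1000)

print(f"true average profit:            {true_mean:,.0f}")
print(f"typical miss with 5 firms:      {spread_5:,.0f}")
print(f"typical miss with 1,000 firms:  {spread_1000:,.0f}")
```

The spread of the average shrinks roughly with the square root of the sample size, so the 1,000-firm average should land far closer to the truth than the 5-firm average.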
Sample size and confidence
Similarly, when determining program impact
Need many observations to say with confidence whether
average outcome of treatment group is higher/lower than
in comparison group

What do I mean by confidence?


Minimizing statistical error

Types of errors
Type 1 error: You say there is a program impact when
there really isn't one.
Type 2 error: There really is a program impact but you
cannot detect it.
Sample size and confidence
Type 1 error: Find program impact when there's none
Error can be minimized after data collection, during statistical analysis
Need to adjust the significance levels of impact estimates (e.g. 99% or
95% confidence intervals)

Type 2 error: Cannot see that there really is a program impact


In jargon: statistical test has low power
Error must be minimized before data collection
Best method of doing this: ensuring you have a large enough sample

Whole point of an impact evaluation is to learn something


Ex ante: We don't know how large the impact of this program is
Low powered ex-post: This program might have increased firms'
profits by 50% but we cannot distinguish a 50% increase from an
increase of zero with any confidence
Calculating sample size

There's actually a formula. Don't get scared.

$$N = \frac{2\sigma^2 \left(z_{\alpha/2} + z_{\beta}\right)^2}{D^2}\,\bigl[1 + \rho\,(H - 1)\bigr]$$

Main things to be aware of:


1. Detectable effect size
2. Probability of type 1 and 2 errors
3. Variance of outcome(s)
4. Units (firms, banks) per treated area
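The four ingredients listed above can be combined in a few lines of code. This is a sketch under stated assumptions, not necessarily the exact formula on the slide: it uses a common two-group form, n per group = 2·sd²·(z_alpha/2 + z_power)² / effect², inflated by a design effect 1 + icc·(units_per_area − 1) when whole areas are assigned to treatment. The function name, the parameter names, and the intra-cluster correlation `icc` are illustrative choices.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect, sd, alpha=0.05, power=0.80,
                          icc=0.0, units_per_area=1):
    """Units needed in each group to detect `effect` (assumed textbook formula).

    effect          smallest difference in means worth detecting
    sd              standard deviation of the outcome
    alpha           probability of a type 1 error (two-sided test)
    power           1 minus the probability of a type 2 error
    icc             assumed correlation of outcomes within a treated area
    units_per_area  units (firms, banks) sampled per treated area
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n_simple = 2 * sd**2 * (z_alpha + z_power) ** 2 / effect**2
    design_effect = 1 + icc * (units_per_area - 1)
    return ceil(n_simple * design_effect)

# Illustrative inputs: detect a 5-unit change in an outcome with SD 50.
n_individual = sample_size_per_group(effect=5, sd=50)
n_clustered = sample_size_per_group(effect=5, sd=50, icc=0.05, units_per_area=20)
print(n_individual, n_clustered)
```

With these inputs, assigning treatment by area rather than by firm nearly doubles the required sample, because firms in the same area tend to resemble each other.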
Calculating sample size

Detectable effect size


Smallest effect you want to be able to distinguish from zero
A 30% increase in sales, a 25% decrease in bribes paid

Larger samples: easier to detect smaller effects

Do female and male entrepreneurs work similar hours?


Claim: On average, women work 40 hours/week, men work 44
hours/week
If statistic came from sample of 10 women & 10 men
Hard to say if they are different
Would be easier to say they are different if women work 30 hours/week and men
work 80 hours/week
But if statistic came from sample of 500 women and 500 men
More likely that they truly are different
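The hours example can be made concrete with a quick test of the difference in means. This is a rough sketch: the standard deviation of weekly hours (15) is an assumption, since the slide gives none, and the normal approximation is crude for groups of 10 (a t-test would be more appropriate there).

```python
from math import sqrt
from statistics import NormalDist

ASSUMED_SD = 15.0  # hypothetical SD of weekly hours; not given in the slides

def two_sided_p_value(mean_a, mean_b, n_per_group, sd=ASSUMED_SD):
    """Approximate p-value for a difference in means between two equal-sized groups."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = abs(mean_a - mean_b) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

p_small_sample = two_sided_p_value(40, 44, n_per_group=10)   # 10 women, 10 men
p_big_gap = two_sided_p_value(30, 80, n_per_group=10)        # same sample, huge gap
p_large_sample = two_sided_p_value(40, 44, n_per_group=500)  # 500 women, 500 men

print(f"40 vs 44 hours, n=10 each:  p = {p_small_sample:.3f}")
print(f"30 vs 80 hours, n=10 each:  p = {p_big_gap:.2g}")
print(f"40 vs 44 hours, n=500 each: p = {p_large_sample:.2g}")
```

Under these assumptions the 4-hour gap cannot be told apart from zero with 10 people per group, but it can with 500 per group; only a very large gap shows up in the small sample.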
Calculating sample size

How do you choose the detectable effect size?
Smallest effect that would prompt a policy
response
Smallest effect that would allow you to say that a
program was not a failure
"This program significantly increased sales by 40%."
Great, let's think about how we can scale this up.
"This program significantly increased sales by 10%."
Great... uh... wait: we spent all of that money and it only increased
sales by that much?
Calculating sample size

Type 1 and Type 2 errors


Type 1
Significance level of estimates usually set to 1% or 5%
1% or 5% probability that there is no effect but we think
we found one
Type 2
Power usually set to 80% or 90%
20% or 10% probability that there is an effect but we
cannot detect it
Larger samples: higher power
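How quickly power rises with sample size can be sketched as follows. The inputs (a true effect of 5 on an outcome with standard deviation 50) are illustrative assumptions, and the calculation uses a normal approximation for a two-sided test at the 5% level, ignoring the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def power(n_per_group, effect, sd, alpha=0.05):
    """Approximate probability of detecting a true `effect` with n units per group."""
    standard_error = sd * sqrt(2 / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect / standard_error - z_alpha)

# Illustrative inputs, not taken from a real survey.
for n in (100, 400, 800, 1570, 3000):
    print(f"n per group = {n:>5}: power = {power(n, effect=5, sd=50):.2f}")
```

With these inputs, about 1,570 units per group are needed to reach the conventional 80% power; a sample of 100 per group would miss the same effect most of the time.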
Calculating sample size

Variance of outcomes
Less underlying variance: easier to detect a
difference, so can have a lower sample size
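A short worked relation shows why variance matters so much. It assumes the usual result that the required sample size n is proportional to the outcome variance σ² and inversely proportional to the squared detectable effect D²; the symbols are introduced here for illustration.

```latex
\[
n \;\propto\; \frac{\sigma^2}{D^2}
\qquad\Longrightarrow\qquad
\frac{n(\sigma/2)}{n(\sigma)} \;=\; \frac{(\sigma/2)^2}{\sigma^2} \;=\; \frac{1}{4}
\]
```

Under that assumption, halving the standard deviation of the outcome cuts the required sample to a quarter, while doubling it requires four times as many units.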
Calculating sample size

Variance of outcomes
How do we know this before we decide our
sample size and collect our data?
Ideal pre-existing data often... non-existent
Can use pre-existing data from a similar
population
Example: Enterprise Surveys, labor force surveys

Makes this a bit of guesswork, not an exact science
Further issues

1. Multiple treatment arms


2. Group-disaggregated results
3. Take-up
4. Data quality
Further issues
Multiple treatment arms
Straightforward to compare each treatment separately to
the comparison group
To compare treatment groups requires very large samples
Especially if treatments very similar, differences between the
treatment groups would be smaller
In effect, it's like fixing a very small detectable effect size

Group-disaggregated results
Are effects different for men and women? For different
sectors?
If genders/sectors expected to react in a similar way, then
estimating differences in treatment impact also requires
very large samples
Who is taller?
Detecting smaller differences is harder
Further issues

Group-disaggregated results
To ensure balance across treatment and comparison
groups, good to divide sample into strata before
assigning treatment

Strata
Sub-populations
Common strata: geography, gender, sector, initial
values of outcome variable
Treatment assignment (or sampling) occurs within
these groups
Why do we need strata?

Geography example
[Figure: map of units by region; legend marks treatment (T) and comparison (C)]
Why do we need strata?

What's the impact in a particular region?
Sometimes hard to say with any confidence
Why do we need strata?

Random assignment to treatment within geographical units
Within each unit, half will be treatment, half will be
comparison.
Similar logic for gender, industry, firm size,
etc.
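Assignment within strata can be sketched in a few lines. This is a minimal illustration, not a field protocol: the firm list and region names are made up, the function name `stratified_assignment` is hypothetical, and an even split within each stratum is assumed.

```python
import random
from collections import defaultdict

def stratified_assignment(units, stratum_of, seed=0):
    """Randomly assign half of each stratum to treatment (T), the rest to comparison (C)."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced and audited
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)

    assignment = {}
    for stratum, members in sorted(by_stratum.items()):
        rng.shuffle(members)  # random order within the stratum
        half = len(members) // 2
        for position, unit in enumerate(members):
            assignment[unit] = "T" if position < half else "C"
    return assignment

# Hypothetical firms tagged with a region (the stratum).
regions = ["north"] * 10 + ["south"] * 6 + ["coast"] * 4
firms = [(f"firm{i:03d}", region) for i, region in enumerate(regions)]

assignment = stratified_assignment(firms, stratum_of=lambda firm: firm[1])

for region in ("north", "south", "coast"):
    treated = sum(1 for firm in firms if firm[1] == region and assignment[firm] == "T")
    total = regions.count(region)
    print(f"{region}: {treated} treatment, {total - treated} comparison")
```

Because the lottery is run separately in each region, every region is guaranteed to contain both treatment and comparison firms, which a single pooled lottery cannot promise.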
Further issues

Take-up
Low take-up increases detectable effect size
Can only find an effect if it is really large
Effectively decreases sample size

Example: Offering matching grants to SMEs for BDS services
Offer to 5,000 firms
Only 50 participate
Probably can only say there is an effect on sales with
confidence if they become Fortune 500 companies
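The arithmetic behind the matching-grants example can be sketched as follows. It assumes that outcomes are compared between everyone offered the grant and the comparison group, that firms which decline the offer are unaffected, and that required sample size scales with one over the squared detectable effect; the function names are illustrative.

```python
def diluted_effect(effect_on_participants, offered, participated):
    """Average effect across all firms offered the program, if non-participants are unaffected."""
    take_up = participated / offered
    return take_up * effect_on_participants

def sample_size_inflation(offered, participated):
    """How many times larger the sample must be, relative to 100% take-up."""
    take_up = participated / offered
    return 1 / take_up**2

# Figures from the slide: 5,000 firms offered the grant, 50 participate.
# The doubling of sales (a 100% increase) among participants is a made-up illustration.
average_effect = diluted_effect(1.00, offered=5000, participated=50)
inflation = sample_size_inflation(offered=5000, participated=50)

print(f"average effect across offered firms: {average_effect:.1%}")
print(f"sample size multiplier: about {inflation:,.0f}x")
```

Even if participating firms doubled their sales, the average across all offered firms would move by only about 1%, which is why only an enormous effect could be detected.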
Further issues

Data quality
Poor data quality effectively increases required
sample size
Missing observations
Increased noise
Can be partly addressed with field coordinator on
the ground monitoring data collection
Example from Ghana
Calculations can be made in many statistical packages, e.g. Stata, Optimal Design (OD)
Experiment in Ghana designed to increase the profits of microenterprise firms
Baseline profits: 50 cedi per month.
Profits data typically noisy, so a coefficient of variation >1 common.

Example STATA code to detect 10% increase in profits:


sampsi 50 55, p(0.8) pre(1) post(1) r1(0.5) sd1(50) sd2(50)
Having both a baseline and endline decreases required sample size (pre and post)

Results
10% increase (from 50 to 55): 1,178 firms in each group
20% increase (from 50 to 60): 295 firms in each group.
50% increase (from 50 to 75): 48 firms in each group (But this effect size not realistic)

What if take-up is only 50%?


Offer business training that increases profits by 20%, but only half the firms do it.
Mean for treated group = 0.5*50 + 0.5*60 = 55
Equivalent to detecting a 10% increase with 100% take-up: need 1,178 in each group instead of 295 in
each group
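The figures on this slide can be approximately reproduced outside Stata. The sketch below is not a reimplementation of `sampsi`: it assumes a two-sided 5% test, the common two-group formula, and an ANCOVA-style reduction factor of (1 − r²) for having one baseline and one follow-up measure correlated at r = 0.5. Under those assumptions it happens to return the same numbers as the slide.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(mean_control, mean_treated, sd, alpha=0.05, power=0.80, r=0.0):
    """Firms needed per group; r is the assumed baseline-endline correlation."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_endline_only = 2 * sd**2 * z**2 / (mean_treated - mean_control) ** 2
    # Assumed ANCOVA-style gain from controlling for the baseline value.
    return ceil(n_endline_only * (1 - r**2))

# Baseline profits of 50 cedi per month, SD of 50 (coefficient of variation of 1).
results = {target: n_per_group(50, target, sd=50, r=0.5) for target in (55, 60, 75)}
for target, n in results.items():
    print(f"50 -> {target}: {n} firms in each group")

# 50% take-up of a training that raises profits by 20% for those who attend.
diluted_mean = 0.5 * 50 + 0.5 * 60
n_with_half_take_up = n_per_group(50, diluted_mean, sd=50, r=0.5)
print(f"50% take-up: treated-group mean {diluted_mean:.0f}, "
      f"{n_with_half_take_up} firms in each group")
```

Halving take-up halves the detectable difference, and because sample size scales with one over the squared difference, the requirement roughly quadruples.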
Budgets

What is required?

Data collection
Survey firm
Data entry

Field coordinator to ensure treatment follows randomization
protocol and to monitor data collection

Data analysis
Budgets
How much will all of this cost?
Huge range. Often depends on
Length of survey
Ease of finding respondents
Spatial dispersion of respondents
Security issues
Formal vs informal firms
Required human capital of enumerator
Et cetera.
Firm-level survey data: $40-350/firm
Household survey data: $40+/household
Field coordinator: $10,000-$40,000/year
Depends on whether you can find a local hire
Administrative data: Usually free
Sometimes has limited outcomes, can miss most of the informal sector
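A back-of-envelope budget range follows from the unit costs above. This is only a sketch: the design (1,178 firms in each of two groups, surveyed at baseline and endline, with one year of field coordination) is an assumed illustration, and data entry, analysis, and other costs are left out.

```python
def survey_budget(firms_per_group, groups, rounds,
                  cost_per_firm, coordinator_per_year, years):
    """Survey interviews plus field coordination, in USD; other costs are excluded."""
    interviews = firms_per_group * groups * rounds
    return interviews * cost_per_firm + coordinator_per_year * years

# Assumed design: 1,178 firms per group, 2 groups, baseline + endline, 1 year.
low = survey_budget(1178, groups=2, rounds=2, cost_per_firm=40,
                    coordinator_per_year=10_000, years=1)
high = survey_budget(1178, groups=2, rounds=2, cost_per_firm=350,
                     coordinator_per_year=40_000, years=1)

print(f"low estimate:  ${low:,}")
print(f"high estimate: ${high:,}")
```

The gap between the two estimates is driven almost entirely by the per-firm survey cost, which is why survey length and the ease of finding respondents matter so much for the budget.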
Summing up
The sample size of your impact evaluation will
determine how much you can learn from your
experiment
Some judgment and guesswork in calculations but
important to spend time on them
If sample size is too low: waste of time and money
because you will not be able to detect a non-zero impact
with any confidence
If little effort put into sample design and data collection:
See above.

Questions?
