OF SAMPLE SURVEYS
Hukum Chandra
ICAR-Indian Agricultural Statistics Research Institute,
New Delhi
Email: hchandra@iasri.res.in
About you
What
Objectives
To
Statistical Preliminaries
Definition of
Survey
Census
Sample Survey
Sample Survey Theory
Target population
Survey population
Sampling frame
Notation
Finite population parameter
One way of obtaining the required information is to collect data for
each and every unit belonging to the population. This procedure is
termed complete enumeration (census).
The effort, money and time required to carry out a complete
enumeration to obtain the different types of data will generally be
extremely large.
However, if the information is required for each and every unit in the
domain of study, a complete enumeration is clearly necessary.
But there are many situations where only summary figures are required
for the domain of study as a whole or for groups of units.
What is sampling?
Questionnaire
Diary
Physical measurements
Examples:
Sampling pasta from a pan
Sampling apples from a market stall
Population
A population consists of the complete set of all observations
of interest
It is necessary to identify what does and what does not
belong to the population
All households in India in 2000
All women aged 15-49 in India in 2000
All businesses in Delhi in 2014 with more than
1000 employees
All 15 year olds in India in 2011
Sample
Sampling
Definitions
Element: An element is a unit about which we require information. For example, a
field growing a particular crop is an element for collecting information on the yield of
that crop.
Population: The complete set of all observations of interest.
It is the totality of elements under consideration on which inference is required.
Thus, all fields growing a particular crop in a region constitute a population.
Sampling units
A group of elements constitutes a sampling unit
Elements belonging to different sampling units are non-overlapping
A sampling unit may have one or more than one element
Sampling units are identifiable, convenient and relatively inexpensive to
observe
For example, it is convenient to select households for collecting data on milk
produced by animals rather than contacting the elements directly
Definitions
Sampling frame
An
sample is a subset where units are chosen with the help of probabilities
(Sampling).
Sampling Error
Non-Sampling Error
Censuses
(all members of the population of interest are
studied)
Census
Sample Survey
Sample Design
Convenience sampling:
Extremely cheap and quick, but very large bias
Purposive (quota) sampling:
Cheaper and quicker than random sampling, but
potential for availability/willingness bias even after
weighting
Random (probability) sampling:
More expensive/slower; will have nonresponse bias
(because of people refusing to take part)
With a good response rate, it should have significantly
less bias than a quota sample
Quota Sampling
Quota categories are specified and
replicable; but interviewer preference
typically rules on how to fulfil quotas
Inference based on subjective judgement
Prone to severe availability and
willingness bias; weighting is essential
but bias can remain
Confidence intervals cannot be
calculated
Cheaper and quicker
Therefore,
In case this procedure is repeated until n distinct units are selected, with all
repetitions ignored, it is called simple random sampling without
replacement (WOR).
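The selection rule above can be sketched in a few lines of Python. This is only an illustration, assuming the population units are labelled 1 to N; the function name `srswor` and the example sizes are hypothetical.

```python
import random

def srswor(population_size, n, seed=None):
    """Draw labels 1..N with equal probability, ignoring repetitions,
    until n distinct units have been selected."""
    if n > population_size:
        raise ValueError("n cannot exceed the population size")
    rng = random.Random(seed)
    selected = []
    seen = set()
    while len(selected) < n:
        unit = rng.randint(1, population_size)  # one equal-probability draw
        if unit not in seen:                    # a repetition is ignored
            seen.add(unit)
            selected.append(unit)
    return selected

# Hypothetical example: 10 distinct units out of a population of 100
sample = srswor(100, 10, seed=1)
print(sample)
```

In practice `random.sample(range(1, N + 1), n)` gives the same kind of sample directly; the loop is written out only to mirror the description.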
Advantages:
Easy to understand
Used as yardstick for assessing efficiency of
complex samples
Disadvantages:
Can be time consuming to implement
Can be costly
Statistically not the most efficient method of
sampling (e.g. stratification can be used to improve
efficiency)
Systematic Sampling
Advantages
Easy to understand
Quick and easy to implement
Arranging the frame in stratified order will create
implicit stratification
Disadvantages
Periodicity:
Stratified Sampling
[Figure: the whole sampling frame (size N) divided into four strata (North, South, East, West) of sizes N1, N2, N3 and N4]
Stratified Sample
Can be
Proportionate (same sampling fraction for each stratum)
Disproportionate (different sampling fractions); this means
differential probabilities of selection
e.g. small subgroups are often selected with a higher
sampling fraction than the rest of the population to
ensure a larger number of them in the final sample and so
facilitate analysis
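A small sketch of the difference between the two kinds of allocation. The stratum sizes, the total sample size and the disproportionate allocation below are made-up numbers for illustration only, and the function names are hypothetical.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate n so that every stratum has the same sampling fraction n/N."""
    N = sum(stratum_sizes.values())
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}

def sampling_fractions(stratum_sizes, allocation):
    """Sampling fraction f_h = n_h / N_h for each stratum."""
    return {h: allocation[h] / stratum_sizes[h] for h in stratum_sizes}

# Hypothetical frame of N = 10000 units in four regional strata
sizes = {"North": 4000, "South": 3000, "East": 2000, "West": 1000}

prop = proportionate_allocation(sizes, 500)
prop_f = sampling_fractions(sizes, prop)

# Disproportionate: equal sample sizes, so the small West stratum is over-sampled
disprop = {"North": 125, "South": 125, "East": 125, "West": 125}
disprop_f = sampling_fractions(sizes, disprop)

print(prop, prop_f)
print(disprop, disprop_f)
```

With the proportionate design every unit has the same selection probability; with the disproportionate design the probabilities differ by stratum, so weights are needed at the analysis stage.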
Cluster sampling
Multi-stage sampling
Successive Sampling
Multiphase Sampling
Stratification I
Outline
What is stratification?
Implicit and explicit stratification
Systematic sampling
Implementation of stratification
Some examples of stratification
Review
Random Sampling
$f_h = n_h / N_h$
Systematic Sampling
Recall session 1
Involves sampling at a fixed interval down a list
If the list is ordered in some meaningful way, this has the
effect of stratification
Advantage of being easy to implement
Procedure: calculate the required interval (K = N/n), then
generate a random start R (a random number between 1
and K). The sampled units are then the Rth, (R+K)th,
(R+2K)th, etc. units on the list.
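The procedure can be sketched as follows. This is a minimal illustration that assumes N is an exact multiple of n, so that the interval K is a whole number; the function name is hypothetical.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return the positions (1-based) of a systematic sample of n units
    from a list of N units. Assumes n divides N exactly."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    K = N // n                      # sampling interval K = N/n
    rng = random.Random(seed)
    R = rng.randint(1, K)           # random start between 1 and K
    return [R + j * K for j in range(n)]  # R, R+K, R+2K, ...

positions = systematic_sample(100, 10, seed=3)
print(positions)
```

If the list is sorted by a stratifying variable before selection, the same code yields the implicit stratification described above.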
Stratum Construction
etc)
Income / occupation (e.g. socio-economic group, social
class)
Stage 2
Within each sector, addresses are in postcode order,
and selected systematically. This provides some
geographical stratification.
fh
$$p = \sum_{h=1}^{H} p_h \, \frac{N_h}{\sum_{h=1}^{H} N_h}$$
$$p = p_1 \, \frac{N_1}{N_1 + N_2} + p_2 \, \frac{N_2}{N_1 + N_2}$$
$$\mathrm{DEFF} = \frac{SE^2_{STRAT}}{SE^2_{SRS}}$$
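Variance of a mean:
$$\mathrm{var}(\bar{x}) = \sum_{h=1}^{H} \frac{N_h^2 s_h^2}{N^2 n_h}$$
Variance of a proportion:
$$\mathrm{var}(p) = \sum_{h=1}^{H} \frac{N_h^2 p_h (1 - p_h)}{N^2 n_h}$$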
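A minimal sketch of these two variance formulas (without a finite population correction). The function names and the two-stratum numbers are assumptions made for illustration.

```python
def var_stratified_mean(N_h, n_h, s2_h):
    """Sum over strata of N_h^2 * s_h^2 / (N^2 * n_h)."""
    N = sum(N_h)
    return sum(Nh**2 * s2 / (N**2 * nh) for Nh, nh, s2 in zip(N_h, n_h, s2_h))

def var_stratified_proportion(N_h, n_h, p_h):
    """Same formula with s_h^2 replaced by p_h * (1 - p_h)."""
    return var_stratified_mean(N_h, n_h, [p * (1 - p) for p in p_h])

# Hypothetical example: two equal strata, 50 units sampled from each
v_mean = var_stratified_mean([500, 500], [50, 50], [4.0, 4.0])
v_prop = var_stratified_proportion([500, 500], [50, 50], [0.5, 0.5])
print(v_mean, v_prop)
```

With equal strata, equal allocation and equal stratum variances, the result equals s^2/n (here 4/100), as expected for a proportionate design.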
Recall session 1
Imposing quotas has a similar effect to stratification, namely to reduce sampling variance
But quota sampling also has an inherent bias towards more
accessible and more willing population members
This may manifest itself as a bias in the survey
measures
Thus, quota sample estimates could have relatively high
precision, but be biased and therefore have low
accuracy (high mean squared error) (session 3)
Stratification II
Outline of session
Variable sampling fractions
Motivations
Optimal allocation
Design effects
$f_h = n_h / N_h$
$w_i = N_h / n_h \quad \text{for } i \in h$
Use of weights
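A small sketch of how design weights w_i = N_h / n_h (the inverse of the sampling fraction) might be applied when estimating a mean. The stratum labels, sizes and observations are invented for illustration, and the function names are hypothetical.

```python
def design_weights(N_h, n_h):
    """Weight for every sampled unit in stratum h: w = N_h / n_h."""
    return {h: N_h[h] / n_h[h] for h in N_h}

def weighted_mean(values, strata, weights):
    """Weighted mean: sum of w_i * y_i divided by sum of w_i."""
    num = sum(weights[h] * y for y, h in zip(values, strata))
    den = sum(weights[h] for h in strata)
    return num / den

# Hypothetical data: stratum B is three times larger but has the same sample size
N_h = {"A": 100, "B": 300}
n_h = {"A": 2, "B": 2}
w = design_weights(N_h, n_h)

values = [10, 20, 30, 50]
strata = ["A", "A", "B", "B"]
estimate = weighted_mean(values, strata, w)
print(w, estimate)
```

The unweighted sample mean here would be 27.5; the weighted estimate gives stratum B its proper share of the population.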
Examples
1. A national survey where estimates are also required for
each of the component countries/regions
E.g. survey of the UK, but estimates for Scotland, Wales and NI
are also needed separately
Then a larger sampling fraction might be used in Wales and
Scotland compared to England.
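$$\mathrm{var}(\bar{x}) = \sum_{h=1}^{H} \frac{N_h^2 s_h^2}{N^2 n_h} \left(1 - \frac{n_h}{N_h}\right) \qquad (6.1)$$
$$\mathrm{var}(p) = \sum_{h=1}^{H} \frac{N_h^2 p_h (1 - p_h)}{N^2 n_h} \left(1 - \frac{n_h}{N_h}\right) \qquad (6.2)$$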
The expression $\left(1 - \frac{n_h}{N_h}\right)$
$$\mathrm{var}(\bar{x}) = \sum_{h=1}^{H} \frac{N_h^2 s_h^2}{N^2 n_h} \qquad (6.3)$$
Variance of a proportion:
$$\mathrm{var}(p) = \sum_{h=1}^{H} \frac{N_h^2 p_h (1 - p_h)}{N^2 n_h} \qquad (6.4)$$
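Under proportionate allocation ($n_h / n = N_h / N$) these reduce to:
$$\mathrm{var}(\bar{x}) = \sum_{h=1}^{H} \frac{n_h s_h^2}{n^2} \qquad (6.5)$$
For a proportion:
$$\mathrm{var}(p) = \sum_{h=1}^{H} \frac{n_h p_h (1 - p_h)}{n^2} \qquad (6.6)$$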
Example (cont)
$$\mathrm{var}(\bar{x}) = \frac{2 N^2 S_2^2}{4 N^2 n_1} + \frac{N^2 S_2^2}{4 N^2 n_2} = \frac{S_2^2}{2 n_1} + \frac{S_2^2}{4 n_2}$$
Example (cont)
Now, consider two alternative sample designs:
a.) Proportional allocation
i.e. where $n_h / n = N_h / N$
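It follows:
For a.) Substitute $n_1 = n_2 = n/2$:
$$\mathrm{var}(\bar{x}) = \frac{S_2^2}{n} + \frac{S_2^2}{2n} = 1.5 \, \frac{S_2^2}{n}$$
For b.):
$$\mathrm{var}(\bar{x}) = \frac{S_2^2}{1.16 n} + \frac{S_2^2}{1.68 n} = 1.457 \, \frac{S_2^2}{n}$$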
Example (cont)
$$\mathrm{DEFF}_{VSF} = \frac{SE^2_{VSF}}{SE^2_{SRS}} = \frac{1.457}{1.5} = 0.97$$
$$\mathrm{DEFT}_{VSF} = \sqrt{\mathrm{DEFF}_{VSF}} \approx 0.98$$
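A quick numerical check of this worked example. It uses the expression var = S2^2/(2 n1) + S2^2/(4 n2) from the example, with variances expressed in units of S2^2/n. The allocation for the second design (n1 = 0.58n, n2 = 0.42n) is read off the denominators 1.16n and 1.68n and is therefore an inference, not something stated explicitly on the slide.

```python
def var_mean(share_1, share_2):
    """Variance of the mean in units of S2^2 / n, where share_h = n_h / n,
    using var = S2^2 / (2 * n1) + S2^2 / (4 * n2)."""
    return 1 / (2 * share_1) + 1 / (4 * share_2)

var_a = var_mean(0.5, 0.5)     # design a.): n1 = n2 = n/2
var_b = var_mean(0.58, 0.42)   # second design: shares inferred from 1.16n and 1.68n
deff = var_b / var_a

print(var_a, round(var_b, 3), round(deff, 2))
```

The ratio below 1 shows the small gain in precision from the second allocation in this example.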
Optimal Allocation
$$\frac{n_h}{N_h} \propto \frac{S_h}{\sqrt{C_h}}$$
where Ch is the unit cost of data collection for a unit in
stratum h.
If data collection costs do not vary between strata, this
simplifies to:
$n_h / N_h \propto S_h$
nh / N h K
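A sketch of the allocation rule, assuming the standard form in which the sampling fraction n_h/N_h is proportional to S_h divided by the square root of C_h, and assuming a fixed total sample size n (rather than a fixed budget). The function name and the example values are hypothetical.

```python
import math

def optimal_allocation(N_h, S_h, n, C_h=None):
    """Allocate a total sample of n so that n_h is proportional to
    N_h * S_h / sqrt(C_h). With equal unit costs this gives n_h/N_h
    proportional to S_h."""
    if C_h is None:
        C_h = [1.0] * len(N_h)          # equal unit costs in every stratum
    scores = [N * S / math.sqrt(C) for N, S, C in zip(N_h, S_h, C_h)]
    total = sum(scores)
    return [n * score / total for score in scores]

# Hypothetical example: equal stratum sizes, stratum 1 variance twice stratum 2
alloc = optimal_allocation([500, 500], [math.sqrt(2), 1.0], 100)
print([round(a, 2) for a in alloc])
```

The stratum with the larger standard deviation receives the larger share of the sample; rounding to whole units is left to the user.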
Example:
Again suppose H = 2, and N1 = N2.
But now suppose that stratum variances are
equal, i.e. $S_1^2 = S_2^2 = S^2$
Example (cont)
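For design a) ($n_1 = n_2 = n/2$):
$$\mathrm{var}(\bar{x}) = \frac{(N/2)^2 S^2}{N^2 (n/2)} + \frac{(N/2)^2 S^2}{N^2 (n/2)} = \frac{S^2}{n}$$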
Example (cont)
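For design b) ($n_1 = n/3$, $n_2 = 2n/3$):
$$\mathrm{var}(\bar{x}) = \frac{(N/2)^2 S^2}{N^2 (n/3)} + \frac{(N/2)^2 S^2}{N^2 (2n/3)} = \frac{3 S^2}{4n} + \frac{3 S^2}{8n} = \frac{9 S^2}{8n}$$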
It follows:
$$\mathrm{DEFF}_{VSF} = \frac{SE^2_{VSF}}{SE^2_{SRS}} = \frac{9 S^2 / 8n}{S^2 / n} = 9/8 = 1.125$$
Example (cont)
This means:
The sampling variance under design b) is 9/8 (=1.125)
times that under design a).
By allocating disproportionately, we have lost precision
(in the case of equal stratum variances)!
$$n_{eff} = \frac{\left(\sum_h n_h w_h\right)^2}{\sum_h n_h w_h^2}, \qquad \mathrm{DEFF}_{VSF} = \frac{n}{n_{eff}}$$
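A sketch of the design effect due to variable sampling fractions, assuming the usual weighting formula n_eff = (sum n_h w_h)^2 / sum n_h w_h^2 and DEFF_VSF = n / n_eff. The sample sizes are hypothetical; they are chosen so that stratum 1 has half the sampling fraction of stratum 2, and hence twice the weight.

```python
def effective_sample_size(n_h, w_h):
    """n_eff = (sum of n_h * w_h)^2 / (sum of n_h * w_h^2)."""
    num = sum(nh * w for nh, w in zip(n_h, w_h)) ** 2
    den = sum(nh * w * w for nh, w in zip(n_h, w_h))
    return num / den

def deff_vsf(n_h, w_h):
    """Design effect from unequal weights: DEFF_VSF = n / n_eff."""
    return sum(n_h) / effective_sample_size(n_h, w_h)

# Hypothetical example: n1 = 100 with weight 2, n2 = 200 with weight 1
n_eff = effective_sample_size([100, 200], [2, 1])
deff = deff_vsf([100, 200], [2, 1])
print(round(n_eff, 1), deff)
```

This reproduces the 9/8 = 1.125 obtained in the equal-variance example, and equal weights give a design effect of exactly 1.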
[Figure: DEFF_VSF (vertical axis) plotted against n1/n (horizontal axis), with curves for w2 = 2, w2 = 4 and w2 = 10]
Multi-Stage Sampling
Outline of session
Examples:
business survey :
PSUs might be companies
SSUs might be workplaces
Elements might be employees
Constraint
Implication
Efficient fieldwork
Training/ briefing/
learning costs
Selection Probabilities
Selection Options
PSU j   Size N_j   P(j) = C x N_j     P(i|j) = D/N_j   P(i) = P(j) x P(i|j) = C x D
1         1000     3 x 1000/10000     25/1000          75/10000
2          900     3 x  900/10000     25/900           75/10000
3          800     3 x  800/10000     25/800           75/10000
4         1200     3 x 1200/10000     25/1200          75/10000
5         1500     3 x 1500/10000     25/1500          75/10000
6         1300     3 x 1300/10000     25/1300          75/10000
7         1100     3 x 1100/10000     25/1100          75/10000
8          500     3 x  500/10000     25/500           75/10000
9         1000     3 x 1000/10000     25/1000          75/10000
10         700     3 x  700/10000     25/700           75/10000
Total    10000
C = 3/10000, D = 25
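The arithmetic in this table can be checked with exact fractions. The sketch assumes, as the table values suggest, that 3 PSUs are selected with probability proportional to size and that a fixed take of D = 25 elements is then drawn within each selected PSU.

```python
from fractions import Fraction

sizes = [1000, 900, 800, 1200, 1500, 1300, 1100, 500, 1000, 700]
total = sum(sizes)               # 10000
C = Fraction(3, total)           # first-stage constant: P(j) = C * N_j
D = 25                           # fixed number of elements taken per selected PSU

p_psu = [C * N_j for N_j in sizes]               # P(j)
p_within = [Fraction(D, N_j) for N_j in sizes]   # P(i | j)
p_overall = [pj * pij for pj, pij in zip(p_psu, p_within)]  # P(i) = C * D

print(p_overall[0], set(p_overall))
```

Every element has the same overall selection probability (75/10000), even though the PSU and within-PSU probabilities both vary with PSU size.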
PSU   Size    Cum. size   Selection
1     1000     1000
2      900     1900
3      800     2700
4     1200     3900
5     1500     5400
6     1300     6700
7     1100     7800
8      500     8300
9     1000     9300
10     700    10000
[* marks the selected PSUs in the original; the row positions of the markers are not recoverable]
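One common way to use a cumulative-size column like this is systematic selection with probability proportional to size. The sketch below is an assumption about the method, not a reconstruction of the slide: it takes a random start within the interval (total size divided by the number of PSUs to select) and picks the PSU whose cumulative size first reaches each selection point.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n_select, seed=None):
    """Select PSUs (1-based numbers) systematically with probability
    proportional to size, using the cumulative sizes."""
    cum = list(accumulate(sizes))
    interval = cum[-1] / n_select
    rng = random.Random(seed)
    start = rng.random() * interval            # random start in [0, interval)
    points = [start + k * interval for k in range(n_select)]
    # PSU j is selected when cum[j-1] < point <= cum[j]
    return [min(bisect_left(cum, pt), len(cum) - 1) + 1 for pt in points]

sizes = [1000, 900, 800, 1200, 1500, 1300, 1100, 500, 1000, 700]
selected = pps_systematic(sizes, 3, seed=0)
print(selected)
```

Because every PSU here is smaller than the interval (about 3333), no PSU can be selected twice; a PSU larger than the interval would need separate treatment.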
$$\mathrm{var}(X) = \frac{1}{6} \sum_{i=1}^{6} (x_i - 2)^2 = 4/6 = 2/3$$
Example (cont)
a) divide the population into 3 clusters: (1,1), (2,2) and (3,3).
Then there is no variance within clusters (homogeneous
clusters). But the variance between the cluster means is:
var(X_B) = [(1-2)^2 + (2-2)^2 + (3-2)^2] / 3 = 2/3.
This implies that the sampling variance is greater than 0, since
we get different estimates of the mean depending on
which cluster is sampled.
Example (cont)
b) divide the population into 2 clusters: (1,2,3) and (1,2,3).
There is no variance between cluster means. But the variance
within each cluster is:
var(X_W) = 2 * [[(1-2)^2 + (2-2)^2 + (3-2)^2] / 3] / 2 = 2/3
The sampling variance is 0 since there is no
variability in sample means.
With design a), all the variance is between clusters: clusters are perfectly homogeneous.
With design b), clusters are as heterogeneous as the
population as a whole, so cluster sampling would not
cause a loss in precision.
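The two clusterings can be compared numerically. This sketch uses the population (1, 1, 2, 2, 3, 3) from the example and population (divide-by-N) variances; the function name is hypothetical.

```python
from statistics import mean, pvariance

population = [1, 1, 2, 2, 3, 3]
design_a = [(1, 1), (2, 2), (3, 3)]   # homogeneous clusters
design_b = [(1, 2, 3), (1, 2, 3)]     # clusters as varied as the population

def between_within(clusters):
    """Return (variance between cluster means, average variance within clusters)."""
    cluster_means = [mean(c) for c in clusters]
    between = pvariance(cluster_means)
    within = mean(pvariance(c) for c in clusters)
    return between, within

total = pvariance(population)
between_a, within_a = between_within(design_a)
between_b, within_b = between_within(design_b)
print(total, (between_a, within_a), (between_b, within_b))
```

In both designs the between and within components add up to the total variance of 2/3; what differs is how much of it sits between clusters, which is the part that drives the sampling variance when one whole cluster is sampled.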
Examples:
people within postcode sectors,
pupils within schools,
students within classes
employees within firms.
Intra-Cluster Correlation
$$\mathrm{DEFF}_{CL} = 1 + (b - 1)\rho$$
where b is the sample size per cluster (in practice b may
vary slightly, in which case the mean cluster size provides an
adequate approximation), and $\rho$ (roh) is the intra-cluster correlation.
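A minimal sketch of this formula, with DEFT taken as the square root of DEFF. The function names are hypothetical; the example values (roh = 0.070 with b = 16.6, and the same roh with b = 10) are the household size figures from the table of design factors in this session.

```python
import math

def deff_cluster(b, roh):
    """Design effect due to clustering: 1 + (b - 1) * roh."""
    return 1 + (b - 1) * roh

def deft_cluster(b, roh):
    """Design factor: the square root of the design effect."""
    return math.sqrt(deff_cluster(b, roh))

deft_actual = deft_cluster(16.6, 0.070)   # mean cluster sample size of 16.6
deft_b10 = deft_cluster(10, 0.070)        # same roh, smaller clusters
print(round(deft_actual, 2), round(deft_b10, 2))
```

Reducing the sample size per cluster lowers the design factor for a given roh, at the cost of needing more clusters for the same total sample size.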
$\bar{x} \pm 1.96 \times SE \times \mathrm{DEFT}_{CL}$
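A short sketch of this interval, assuming SE is the standard error computed as if the sample were a simple random sample, so that multiplying by DEFT inflates it for the clustering. The function name and the numbers are illustrative only.

```python
def clustered_ci(mean, se_srs, deft, z=1.96):
    """95% interval: mean +/- z * SE * DEFT, with SE assumed to be
    the simple-random-sampling standard error."""
    half_width = z * se_srs * deft
    return mean - half_width, mean + half_width

# Hypothetical values: mean 50, SRS standard error 2, DEFT 1.5
low, high = clustered_ci(50.0, 2.0, 1.5)
print(round(low, 2), round(high, 2))
```

Ignoring the design factor here would give an interval that is too narrow by a factor of 1.5.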
Variable          roh     b      DEFT   DEFT if b=10
Household size    0.070   16.6   1.45   1.28
Owner-occupier    0.231   16.5   2.14   1.75
Has telephone     0.102   16.5   1.61   1.38
Asian             0.334    8.3   1.86   1.53
Roman Catholic    0.037   16.4   1.25   1.15
(name lost)       0.021    8.4   1.08   1.03
(name lost)       0.044    8.3   1.15   1.08
(name lost)       0.021    8.2   1.07   1.04
Note
$\rho$ (roh) is low for attitudinal variables, so design effects are
small (DEFT small). But it is large for variables related to
ethnicity and housing type.