1. Overview of Sampling Schemes (Hukum Chandra)

VARIOUS ELEMENTARY CONCEPTS OF SAMPLE SURVEYS
Hukum Chandra
ICAR-Indian Agricultural Statistics Research Institute,
New Delhi
Email: hchandra@iasri.res.in
About you
What experiences (if any) do you have of survey sampling?
Objectives
To introduce various sampling schemes
Statistical Preliminaries
Definition of
Survey
Census
Sample Survey
Sample Survey Theory
Target population
Survey population
Sampling frame
Notation
Finite population parameter
Complete Enumeration (Census)
One way of obtaining the required information is to collect the data for each and every unit belonging to the population; this procedure of obtaining information is termed complete enumeration (census).
The effort, money and time required for carrying out a complete enumeration to obtain the different types of data will generally be extremely large.
However, if the information is required for each and every unit in the domain of study, a complete enumeration is clearly necessary.
An example of such a situation is the preparation of a voter list for election purposes.
But there are many situations where only summary figures are required for the domain of study as a whole or for groups of units.
Need for Sampling
An effective alternative to a complete enumeration is a sample survey, in which only some of the units, selected from the population, are surveyed and inferences are drawn about the population on the basis of the sample.
In certain investigations, it may be essential to use specialized equipment or highly trained field staff for data collection, making it almost impossible to carry out such investigations by complete enumeration.
If a sample survey is carried out according to certain specified statistical principles, it is possible not only to estimate the value of the characteristic for the population, but also to get a valid estimate of the sampling error of the estimate.
What is sampling?
Sampling proceeds in several stages:
Define the scope and objectives of the study, including
Population to be studied (identify the population of interest)
General information to collect
Choose tools and techniques for making observations, e.g.
Questionnaire
Diary
Physical measurements
Select (sample) some members of the population (units)
Study the sample (gather data on the sample)
Draw inferences about the population (analyse the data and make inferences)
Examples:
Sampling pasta from a pan
Sampling apples from a market stall
Population
The population consists of the complete set of all observations of interest.
It is necessary to identify what does and what does not belong to the population, e.g.
All households in India in 2000
All women aged 15-49 in India in 2000
All businesses in Delhi in 2014 with more than 1000 employees
All 15-year-olds in India in 2011
Populations and samples
[Figure: a sample drawn from a population]
Sampling
The process of obtaining a sample from the population is referred to as sampling.
Definitions
Element: An element is a unit about which we require information. For example, a field growing a particular crop is an element for collecting information on the yield of the crop.
Population: The complete set of all observations of interest.
It is the totality of elements under consideration on which inference is required.
Thus, all fields growing a particular crop in a region constitute a population.
Sampling units
A group of elements constitutes a sampling unit
Elements belonging to different sampling units are non-overlapping
A sampling unit may have one or more than one element
Sampling units should be identifiable, and convenient as well as relatively inexpensive to observe
For example, it is convenient to select households for collecting data on milk produced by animals rather than contacting the elements (the animals) directly
Definitions
Sampling frame
An exhaustive list of all the sampling units constitutes a sampling frame.
An example of a sampling frame may be a list of cultivators' fields growing a particular crop, or of households containing animals, in a region.
Sample: A subset of the population.
A part of the population selected from a sampling frame for the purpose of making inferences about the population is called a sample.
For example, a subset of the cultivators' fields may be selected to estimate the yield of a crop in a region.
A random sample is a subset in which the units are chosen by a random mechanism, with known probabilities of selection.
Sampling Error
The error which arises due to the use of a sample to estimate the population parameters
Whatever method of sampling is used, there will always be a difference between the population value and its corresponding estimate
This error is unavoidable in every sampling scheme.
A sample with the smallest sampling error will be considered a good representative of the population.
This error can be reduced by increasing the size of the sample
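As a small illustration of how sampling error shrinks as the sample grows, the sketch below evaluates the standard error of the sample mean under simple random sampling without replacement, using the textbook formula $V(\bar{y}) = (1 - n/N) S^2 / n$, where $S^2$ is the population variance with divisor N - 1. This formula is the usual SRSWOR result rather than something stated on these slides, and the 20 yield values are invented for illustration.

```python
import math
import statistics


def srswor_se_of_mean(population, n):
    """Standard error of the sample mean under SRS without replacement.

    Uses V(ybar) = (1 - n/N) * S^2 / n, where S^2 is the population
    variance computed with divisor N - 1.
    """
    N = len(population)
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and N")
    S2 = statistics.variance(population)  # divisor N - 1
    return math.sqrt((1 - n / N) * S2 / n)


# Hypothetical population of 20 field yields (illustrative values only)
yields = [12, 15, 9, 22, 18, 11, 14, 20, 16, 13,
          17, 10, 19, 21, 8, 15, 14, 18, 12, 16]

for n in (2, 5, 10, 20):
    print(n, round(srswor_se_of_mean(yields, n), 3))
```

The standard error falls steadily as n increases and reaches zero when n = N, which is the census case: no sampling error, though non-sampling error can remain.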
Non-Sampling Error
Besides sampling error, the sample estimate may be subject to other errors, which arise due to failure to measure some of the units in the selected sample, observational errors, or errors introduced in editing, coding and tabulating the results
Census results are free from sampling error, but they may still suffer from non-sampling error
Non-sampling error is likely to increase with increase in sample size, while sampling error decreases with increase in sample size
Alternatives to Sample Surveys
Analysis of administrative records (administrative data)
(for example Health Authority data, crime records held by the Home Office or police, School Authority data, tax records, etc.)
Censuses
(all members of the population of interest are studied)
Sample Surveys vs Admin Data
Administrative data may not focus on the same population (as the one of interest)
May not contain all required information
Based on definitions devised for administrative purposes
May have incomplete coverage, be out of date, inaccurate, etc.
Surveys can adopt desired definitions, collect desired data, etc.
Sample Versus Census
Which is better? A census and a sample survey can be compared on:
Cost
Speed
Practicality and feasibility
Data quality
Detail (e.g. questionnaire)
Ability to analyse small subsets
Timeliness
Sampling error
Inference to population
From Population to Sample
Population parameter (e.g. a population mean, such as average household income, or a population proportion, such as the infant mortality rate) based on population data
refers to a summary value of a variable in the population
Draw a random sample from the population
Based on the sample data, calculate a statistic (e.g. sample mean, sample proportion), also referred to as an estimator
refers to a summary value of a variable based on the sample
From Population to Sample
Estimator: An estimator is a statistic obtained by a specified procedure for estimating a population parameter
The estimator is a random variable and its value differs from sample to sample
Estimate: The particular value which the estimator takes for a given sample is known as an estimate
Example
Population parameter: population mean income, denoted $\mu$
Sample statistic: mean income in the sample, denoted $\bar{x}$
The sample statistic may be used as an estimate of the population parameter: $\hat{\mu} = \bar{x}$
Types of Samples: Different Sample Designs
Sample Design
A sample design is a plan, determined before any data are actually collected, for obtaining a sample from a given population.
Non-Probability versus Probability Samples

Non-probability sampling:
1. Convenience sampling
A sample selected because of its ease of access to sample members

Non-probability sampling:
2. Purposive sampling
A sample selected using a deliberate subjective choice in order to produce a sample which the researcher judges to be representative in some sense
Example: a quota sample
Represent the major characteristics of the population by sampling a proportional amount of each. You have to decide on which specific characteristics to base your quotas

Probability sampling
A sample that is selected by a random mechanism, where each member of the population has a known and non-zero probability of being in the sample (selection probability)
When choosing a random sample, it is important that the surveyor does not choose the sample personally. It has been repeatedly shown that the human investigator is not a satisfactory instrument for making random selections.
Pros and Cons
Convenience sampling:
Extremely cheap and quick, but potentially very large bias
Purposive (Quota) sampling:
Cheaper and quicker than random sampling, but potential for availability/willingness bias even after weighting
Random (probability) sampling:
More expensive/slower; will have nonresponse bias (because of people refusing to take part)
If the response rate is good, it should have significantly less bias than a quota sample
Probability vs Quota samples

Probability Sampling
Method of selection is specified, objective and replicable
Inference to population based on mathematics
Protects (to some extent) against availability and willingness bias
Precision of estimates can be estimated
More expensive, requires more resources
Depending on nonresponse rate, likely to suffer less overall bias
Quota Sampling
Quota categories are specified and replicable, but interviewer preference typically rules how quotas are fulfilled
Inference based on subjective judgement
Prone to severe availability and willingness bias; weighting is essential but bias can remain
Confidence intervals cannot be calculated
Cheaper and quicker
Assessing a Sample Design
Virtually all surveys that are taken seriously by social scientists and policy makers use some form of probability sampling
One way to ruin an otherwise well-conceived survey is to use a convenience sample rather than one which is based on a probability design
Types of Probability Samples: An Overview
Probability sampling methods

1. Simple random sampling (SRS)
Units are chosen at random using a random number table, computer-generated random numbers, lottery balls, etc.
Probably the easiest way of obtaining a random sample
With replacement: each selected element is replaced in the selection frame, so one unit could be selected several times
Without replacement: a selected element is not returned to the frame, so each unit can be selected at most once
Simple Random Sampling (SRS)

This is the simplest and most basic method of sampling, in which the sample is drawn unit by unit, with equal probability of selection for each unit at each draw.
Therefore, it is a method of selection of n units out of a population of size N giving equal probability to all units, or
A sampling procedure in which all possible combinations of n units that may be formed from the population of N units have the same probability of selection.
Simple Random Sampling (SRS)

For selecting a simple random sample in practice, units from the population are drawn one by one
If a unit is selected, its observation recorded, and the unit returned to the population before the next draw is made, with this procedure repeated n times, the procedure is known as simple random sampling with replacement (WR)
In such a selection procedure, there is a possibility of one or more population units getting selected more than once
If this procedure is repeated until n distinct units are selected and all repetitions are ignored, it is called simple random sampling without replacement (WOR)
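The two draw-by-draw procedures above can be sketched with Python's standard `random` module. The function names `srswr` and `srswor` are hypothetical helpers for this illustration; `srswor` deliberately follows the slide's description (keep drawing with equal probability and ignore repeats) rather than calling `random.sample`, which does the same job in one call.

```python
import random


def srswr(units, n, rng):
    """SRS with replacement: n independent draws, each unit equally likely.

    A unit may appear more than once in the result.
    """
    return [rng.choice(units) for _ in range(n)]


def srswor(units, n, rng):
    """SRS without replacement, as described above: keep drawing with equal
    probability and ignore repeats until n distinct units are selected."""
    if n > len(units):
        raise ValueError("cannot select more distinct units than exist")
    selected = []
    seen = set()
    while len(selected) < n:
        u = rng.choice(units)
        if u not in seen:  # repetitions are ignored
            seen.add(u)
            selected.append(u)
    return selected


rng = random.Random(2024)  # fixed seed so the example is reproducible
population = list(range(1, 101))  # units labelled 1..N with N = 100
print(srswr(population, 10, rng))
print(srswor(population, 10, rng))
```

The with-replacement result may contain repeated labels; the without-replacement result never does.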
Simple random sampling
Advantages:
Easy to understand
Used as yardstick for assessing efficiency of
complex samples
Disadvantages:
Can be time consuming to implement
Can be costly
Statistically not the most efficient method of sampling (e.g. stratification can improve efficiency)
Probability sampling methods (cont.)
2. Systematic Sampling
A random start followed by successive application of the sampling interval
Example: Systematic Sampling

[Figure: list of population units numbered 1 to 100]
Determine the number of units: N = 100
Determine the sample size you want: n = 20
The interval size is therefore K = N/n = 100/20 = 5 (sample one fifth)
Select at random an integer from 1 to K: e.g. 4 is chosen
Then select every K-th unit, i.e. units 4, 9, 14, ..., 99
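The worked example above can be reproduced with a short sketch. The helper `systematic_sample` is hypothetical and assumes, as in the example, that K = N/n is a whole number; fractional intervals need a different method.

```python
import random


def systematic_sample(N, n, start=None, rng=None):
    """Linear systematic sample of unit labels 1..N with integer interval K = N/n.

    If `start` is None, a random start R is drawn from 1..K.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes K = N/n is an integer")
    K = N // n
    if start is None:
        start = (rng if rng is not None else random.Random()).randint(1, K)
    if not 1 <= start <= K:
        raise ValueError("random start must lie between 1 and K")
    return list(range(start, N + 1, K))


# The slide's example: N = 100, n = 20, K = 5, random start 4
print(systematic_sample(100, 20, start=4))  # units 4, 9, 14, ..., 99
```

Only the start is random; once it is fixed, the whole sample follows, which is why the ordering of the list matters so much.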
Systematic Sampling
Special methods are needed for systematic selection with a fractional interval (i.e. when K = N/n is not an integer)
The list from which to sample should ideally be randomly ordered
Systematic Sample
Disadvantage: periodicity in the population list, e.g. the sampling interval coincides with a periodic interval of the list
Example: suppose you select the 1st, 11th, 21st, etc. element, but the list is arranged so that the 1st is a man, the 2nd his wife, the 3rd a man, the 4th his wife, etc.
We would obtain a sample consisting only of males, whereas the whole population is made up of males and females
Such periodicity may be easily avoided
Another way to solve this problem is to use stratification
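The husband-and-wife example can be simulated directly. The frame below is a hypothetical list of 50 couples, and `every_kth` is an illustrative helper. Shuffling the list first is one simple remedy, in line with the advice that the list should ideally be randomly ordered.

```python
import random


def every_kth(labels, start, K):
    """Systematic selection: 1-based positions start, start + K, start + 2K, ..."""
    return labels[start - 1::K]


# Hypothetical frame of 50 couples listed as: man, his wife, man, his wife, ...
frame = ["man", "woman"] * 50  # N = 100 entries

sample = every_kth(frame, start=1, K=10)  # 1st, 11th, 21st, ... elements
print(sample.count("man"), sample.count("woman"))  # 10 0: only men are selected

# Randomly ordering the list first breaks the periodicity
shuffled = frame[:]
random.Random(7).shuffle(shuffled)
print(every_kth(shuffled, start=1, K=10))
```

Because the interval (10) is a multiple of the list's period (2), every selected position has the same parity, so a start of 2 would equally give only women.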
Systematic Sampling
In the other sampling methods, the units (whether elements or clusters) are selected with the help of random numbers
But a method of sampling in which only the first unit is selected with the help of a random number, while the rest of the units are selected according to a predetermined pattern, is known as systematic sampling
Very useful in
forest surveys for estimating the volume of timber
fisheries surveys for estimating the total catch of fish
milk yield surveys for estimating the lactation yield
Systematic Sampling
Advantages
Easy to understand
Quick and easy to implement
Arranging the frame in stratified order will create implicit stratification
Disadvantages
Periodicity: if a periodic ordering of the units goes unnoticed, this may result in an unusual sample

3. Stratified Sampling
If we have information about the composition of a population, we may be able to improve on, e.g., simple random sampling by using stratification
Units are aggregated (grouped) into different non-overlapping subgroups, called strata
Then a certain number of units are randomly selected from each stratum
Example: Stratified Sample
If a surveyor wants to find the most popular TV programmes, it would be advisable to first divide the population into 3 strata (men, women and children)
then select a random sample from each of the strata
Care must be taken to ensure that the strata are non-overlapping, i.e. there is no element falling into more than one category.
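The TV-programme example can be sketched as an independent simple random sample without replacement within each stratum. The population sizes (60 men, 70 women, 40 children) and the per-stratum sample sizes are hypothetical, and `stratified_srs` is an illustrative helper, not a library function.

```python
import random
from collections import defaultdict


def stratified_srs(units, stratum_of, n_h, rng):
    """Draw an independent SRS (without replacement) within each stratum.

    units      -- list of population units
    stratum_of -- function returning the stratum label of a unit
    n_h        -- dict mapping stratum label -> sample size in that stratum
    """
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)  # each unit goes to exactly one stratum
    return {h: rng.sample(strata[h], size) for h, size in n_h.items()}


# Hypothetical population for the TV-programme example: (person id, group)
people = ([(i, "men") for i in range(60)]
          + [(60 + i, "women") for i in range(70)]
          + [(130 + i, "children") for i in range(40)])

sample = stratified_srs(people, lambda p: p[1],
                        {"men": 6, "women": 7, "children": 4}, random.Random(3))
for group, chosen in sample.items():
    print(group, sorted(pid for pid, _ in chosen))
```

Here every stratum is sampled at the same 10% rate; because each person has exactly one group label, the strata cannot overlap.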
Stratified Sampling
The basic idea in this sampling is to divide a heterogeneous population into sub-populations, usually known as strata
If the strata are internally homogeneous, a precise estimate of any stratum mean can be obtained based on a sample from that stratum
By combining such estimates, a precise estimate for the whole population can be obtained
This sampling provides a better cross-section of the population than simple random sampling
For example, in a survey for income estimation, the whole population can be divided into three strata: low-income, medium-income and high-income
Stratified Sampling
It may also simplify the organization of the field work.
Geographical proximity is sometimes taken as the basis of stratification.
The assumption here is that geographically contiguous areas are often more alike than areas that are far apart.
Administrative convenience may also dictate the basis on which the stratification is made
Auxiliary information may be taken as the basis of stratification
Stratified Sampling
In stratified sampling, the variance of the estimator consists of only the within-strata variation
Thus, in general, the larger the number of strata into which a population is divided, the higher the precision
For estimating the variance within strata, there should be a minimum of 2 units in each stratum
The larger the number of strata, the higher will be the cost of the survey
So, depending on administrative convenience, the cost of the survey and the variability of the characteristic under study in the area, a decision on the number of strata will have to be arrived at
Example: Stratified Sample

[Figure: the whole sampling frame (size N) is separated by region into 4 strata, North (N1), South (N2), East (N3) and West (N4); a random sub-sample is drawn from each stratum, e.g. n1 of the N1 units in the first stratum]
Stratified Sample
Can be
Proportionate (same sampling fraction for each stratum)
Disproportionate (different sampling fractions), which means differential probabilities of selection
e.g. small subgroups are often selected with a higher sampling fraction than the rest of the population, to ensure a larger number of them in the final sample and so facilitate analysis
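Proportionate allocation sets each stratum sample size to $n_h = n N_h / N$, so that every stratum has the same sampling fraction n/N. The sketch below assumes one particular rounding rule (round down, then give leftover units to the largest fractional parts) so that the sizes always add up to n. The stratum sizes are hypothetical.

```python
import math


def proportional_allocation(N_h, n):
    """Proportionate allocation: n_h is (approximately) n * N_h / N,
    so every stratum gets the same sampling fraction n/N.

    Fractional allocations are rounded down and the remaining units are
    given to the strata with the largest fractional parts, so sum(n_h) == n.
    """
    N = sum(N_h)
    exact = [n * Nh / N for Nh in N_h]
    alloc = [math.floor(x) for x in exact]
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(N_h)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


N_h = [400, 300, 200, 100]  # hypothetical stratum sizes (N = 1000)
n_h = proportional_allocation(N_h, 50)
print(n_h)  # [20, 15, 10, 5]
print([nh / Nh for nh, Nh in zip(n_h, N_h)])  # equal sampling fractions
```

A disproportionate design would simply replace these sizes with chosen ones, e.g. a larger n_h for a small stratum of special interest.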
Proportionate Stratified Sample

Advantages
Guards against the more unusual samples that can be chosen by random chance
If stratifiers are related to the variables in your survey, stratification can reduce standard errors
Disadvantages
Stratification information has to be available
Disproportionate Stratified Sample

Advantages
Allows one to over-sample small groups so that a good statistical comparison can be made
Also used where the goal is to achieve an optimum allocation between variance and cost
Disadvantages
Estimates of the total population need to be derived using weighting (see later sessions)
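One standard way to trade variance against cost, assumed here because the slides do not give the formula, is optimum allocation with $n_h \propto N_h S_h / \sqrt{c_h}$, where $S_h$ is the stratum standard deviation and $c_h$ the cost per unit in stratum h. With equal costs this reduces to Neyman allocation. The sketch assumes the $S_h$ are known (in practice they must be guessed from earlier surveys), does not cap $n_h$ at $N_h$, and uses invented inputs.

```python
import math


def optimum_allocation(N_h, S_h, n, c_h=None):
    """Allocate n across strata with n_h proportional to N_h * S_h / sqrt(c_h).

    With equal costs (c_h = None) this is Neyman allocation.
    Rounding: floor, then give leftover units to the largest fractional parts.
    """
    if c_h is None:
        c_h = [1.0] * len(N_h)
    scores = [N * S / math.sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    total = sum(scores)
    exact = [n * s / total for s in scores]
    alloc = [math.floor(x) for x in exact]
    leftover = n - sum(alloc)
    order = sorted(range(len(exact)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical: two strata of equal size, the second three times as variable
print(optimum_allocation([100, 100], [1.0, 3.0], 40))  # [10, 30]
# Same variability, but a unit in stratum 2 costs 4 times as much to survey
print(optimum_allocation([100, 100], [2.0, 2.0], 30, [1.0, 4.0]))  # [20, 10]
```

More variable strata receive more of the sample and more expensive strata receive less, which is exactly the disproportionate sampling that later requires weighting.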

4. Cluster sampling
A cluster is a naturally occurring unit, like a county, country or state
Sampling units are selected as part of a cluster of units
The difference from stratified sampling is that the starting point is a naturally occurring cluster, not a grouping constructed by the researcher as in stratified sampling; also, only a sample of the clusters is selected, whereas every stratum is sampled.
Cluster sampling
A sampling procedure presupposes division of the population into a finite number of distinct and identifiable units, called the sampling units.
The smallest units into which the population can be divided are called the elements of the population, and groups of elements are called clusters
A cluster may be a class of students or the cultivators' fields in a village
When the sampling unit is a cluster, the procedure of sampling is called cluster sampling
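A minimal sketch of one-stage cluster sampling, assuming clusters are selected by simple random sampling without replacement: every element of a selected cluster is enumerated, so no list of elements is needed for the unselected clusters. The village frame and the helper name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, m, rng):
    """One-stage cluster sampling: select m clusters by SRS without
    replacement and include every element of each selected cluster.

    clusters -- dict mapping cluster id -> list of its elements
    """
    chosen = rng.sample(sorted(clusters), m)
    return {c: list(clusters[c]) for c in chosen}


# Hypothetical frame: 8 villages (clusters), each listing its cultivators' fields
villages = {
    f"village_{v}": [f"v{v}_field_{k}" for k in range(1, v + 4)]
    for v in range(1, 9)
}
sample = cluster_sample(villages, 3, random.Random(5))
for village, fields in sample.items():
    print(village, len(fields), "fields enumerated")
```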
Cluster sampling
For many types of population, a list of elements is not available; therefore, the use of an element as the sampling unit is not feasible.
Cluster sampling can be used in such cases.
For example, in a city a list of all the houses may be available, but a list of persons rarely is; and lists of farms are not available, but lists of villages or of the enumeration districts prepared for the census are.
Cluster sampling is, therefore, widely practised in sample surveys.
Cluster sampling
For a given number of sampling units, cluster sampling is more convenient and less costly than simple random sampling, due to savings in travel time, identification, contacts, etc.
Cluster sampling is generally less efficient than simple random sampling, due to the tendency of the units in a cluster to be similar
In most practical situations, the loss in efficiency may be balanced by the reduction in cost, and the efficiency per unit cost may be higher in cluster sampling than in simple random sampling
Cluster sampling
Clearly, the size of the cluster will influence the efficiency of sampling
In general, the smaller the cluster, the more accurate the estimate of the population characteristic will usually be for a given number of elements in the sample
The optimum cluster is one which would estimate the characteristic under study with the smallest standard error for a given proportion of the population sampled or, more generally, for a given cost.

5. Multi-stage sampling
Large units are selected first, and then smaller units within the selected larger units are selected (results in clustering)

5. Multi-stage sampling
One of the main considerations in adopting cluster sampling is the reduction of travel cost
However, this method restricts the spread of the sample over the population, which results in increasing the variance of the estimator
In order to increase the efficiency of the estimator for a given cost, it is natural to think of further sampling the clusters and selecting a larger number of clusters, so as to increase the spread of the sample over the population.
Sampling which consists of first selecting clusters and then selecting a specified number of elements from each selected cluster is known as two-stage sampling (sub-sampling)
Multi-stage sampling
Clusters are generally termed first-stage units (fsus) or primary stage units (psus)
The elements within clusters, or ultimate observational units, are termed second-stage units (ssus) or ultimate stage units (usus).
This procedure can be easily generalized to give rise to multi-stage sampling
It can be expected to be (i) more efficient than simple random sampling and less efficient than cluster sampling from the point of view of operational convenience and cost
(ii) less efficient than simple random sampling and more efficient than cluster sampling from the point of view of variability
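Two-stage sampling can be sketched as two nested simple random samples without replacement: first the psus, then a fixed number of ssus within each selected psu. The frame of 10 villages with 20 households each and the helper `two_stage_sample` are hypothetical.

```python
import random


def two_stage_sample(psus, m, k, rng):
    """Two-stage sampling: SRSWOR of m primary stage units (clusters), then an
    SRSWOR of k second-stage units within each selected cluster.

    psus -- dict mapping psu id -> list of its second-stage units
    """
    first_stage = rng.sample(sorted(psus), m)
    return {psu: rng.sample(psus[psu], k) for psu in first_stage}


# Hypothetical frame: 10 villages (psus) with 20 households (ssus) each
frame = {f"village_{v}": [f"hh_{v}_{j}" for j in range(1, 21)] for v in range(1, 11)}
sample = two_stage_sample(frame, m=4, k=5, rng=random.Random(8))
for village, households in sample.items():
    print(village, households)
```

The 20 sampled households are spread over 4 villages instead of coming from a single whole village, which is the compromise between cluster sampling and SRS described above.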
Multi-Stage Cluster Sampling

Advantages
Huge cost savings if the survey is carried out with face-to-face interviews
Useful when no frame is available for the final sampling unit
Disadvantages
To the extent that clusters are homogeneous with respect to the survey variables you are studying, this may result in larger standard errors (less precise estimates)
Successive Sampling
Surveys are often repeated on many occasions (over years or seasons) to estimate the same characteristics at different points of time.
The information collected on previous occasions can be used to study the change, or the total value over occasions, of the character, and also to study the average value for the most recent occasion
For example, in a milk yield survey, we are interested in
1. The average milk yield for the current season
2. The change in milk yield between two different seasons
3. The total milk production for the year
Successive Sampling
The successive method of sampling consists of selecting sample units on different occasions such that some units are common with the samples selected on previous occasions
If the objective is to estimate the change, then it is better to retain the same sample from occasion to occasion
For populations where the basic objective is to study the total, it is better to select a fresh sample for every occasion
If the objective is to estimate the average value for the most recent occasion, then retain a part of the sample over occasions
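The three retention strategies above can be expressed with one hypothetical helper: keep a chosen number of units from the previous sample (the matched part) and draw the rest afresh from outside it. This is a sketch of sample selection only; the frame of 200 farms is invented, and the estimators that combine matched and fresh parts are not shown.

```python
import random


def next_occasion_sample(previous, population, n_matched, rng):
    """Sample for the next occasion under partial replacement.

    Retains n_matched units from the previous sample (matched part) and
    draws the remaining units afresh from units not in the previous sample.
    n_matched = len(previous) keeps the same sample; n_matched = 0 gives
    an entirely fresh sample.
    """
    n = len(previous)
    matched = rng.sample(previous, n_matched)
    previous_set = set(previous)
    outside = [u for u in population if u not in previous_set]
    fresh = rng.sample(outside, n - n_matched)
    return matched + fresh


rng = random.Random(21)
population = list(range(1, 201))  # hypothetical frame of 200 farms
occasion1 = rng.sample(population, 20)  # first-season sample
occasion2 = next_occasion_sample(occasion1, population, n_matched=8, rng=rng)
print(len(set(occasion1) & set(occasion2)), "units in common")  # 8 units in common
```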
Multiphase Sampling
It is well known that prior information on an auxiliary variable can be used to enhance the precision of the estimator.
Ratio, product and regression estimators require knowledge of the population mean or total of the auxiliary variable x.
When such information is lacking, it is sometimes less expensive to select a large sample on which the auxiliary variable alone is observed.
The purpose is to furnish a good estimate of the population mean of x
Multiphase Sampling
Subsequently, a subsample from the initial sample is selected for observing the variable of interest.
For example, consider the problem of estimating the total production of cow milk in a certain region. Here the village is taken as the sampling unit, and the number of milch cows in all the villages of the region may not be available
The investigator could then decide to take a large initial sample of villages and collect information on the number of milch cows in the sample villages
This information is used to build up an estimate of the total number of milch cows in the region
A subsample of villages is selected from the first-phase sample to observe the study variable, viz., cow milk yield in the village
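One common way to combine the two phases in the milk example, assumed here since the slides do not name an estimator, is the double-sampling ratio estimator $\hat{\bar{Y}} = (\bar{y}_2 / \bar{x}_2)\,\bar{x}_1$. Here $\bar{x}_1$ is the mean number of milch cows in the first-phase sample, and $\bar{x}_2$, $\bar{y}_2$ are the means of cows and milk yield in the subsample. All numbers, including the number of villages, are hypothetical.

```python
import statistics


def two_phase_ratio_mean(x_first_phase, x_second_phase, y_second_phase):
    """Double-sampling ratio estimator of the mean of y per unit:
    (ybar_2 / xbar_2) * xbar_1, where xbar_1 comes from the large
    first-phase sample and xbar_2, ybar_2 from the second-phase subsample."""
    xbar1 = statistics.fmean(x_first_phase)
    xbar2 = statistics.fmean(x_second_phase)
    ybar2 = statistics.fmean(y_second_phase)
    return ybar2 / xbar2 * xbar1


# Hypothetical numbers: milch cows counted in 8 first-phase villages ...
cows_phase1 = [10, 20, 30, 40, 15, 25, 35, 25]
# ... and cows plus daily milk yield (litres) in a subsample of 3 of them
cows_phase2 = [10, 30, 20]
milk_phase2 = [50, 150, 100]

mean_milk = two_phase_ratio_mean(cows_phase1, cows_phase2, milk_phase2)
N_villages = 400  # hypothetical number of villages in the region
print(mean_milk, N_villages * mean_milk)  # 125.0 50000.0
```

The subsample's yield per cow (5 litres) is scaled up by the better-estimated first-phase mean of 25 cows per village.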

6. Probability Proportional to Size (PPS)
Units are sampled in two or more stages, with probabilities proportional to their size (a clever solution to ensure equal-sized fieldwork assignments while maintaining equal overall probabilities of selection)
Sampling with Varying Probability

Under certain circumstances, selection of units with unequal
probabilities provides more efficient estimators than equal
probability sampling, and this type of sampling is known as
unequal or varying probability sampling
The units are selected with probability proportional to a given
measure of size (pps) where the size measure is the value of
an auxiliary variable x
This sampling scheme is termed as probability proportional
to size (pps) sampling
In pps sampling, the units may be selected with or without
replacement.
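A minimal sketch of PPS selection with replacement, using the cumulative total method: each draw picks a random integer between 1 and the total size X and selects the unit whose cumulative-size range contains it. The sketch pairs this with the Hansen-Hurwitz estimator of a total, the mean of $y_i / p_i$ over draws with $p_i = x_i / X$. Both are standard textbook choices assumed here rather than taken from the slides. The sizes and yields are hypothetical. Python's `random.choices(range(len(sizes)), weights=sizes, k=n)` would give the same kind of with-replacement draws.

```python
import bisect
import itertools
import random


def unit_for_draw(cumulative, r):
    """Cumulative total method: return the index of the unit whose range of
    cumulative size totals contains the random integer r (1 <= r <= X)."""
    return bisect.bisect_left(cumulative, r)


def pps_wr(sizes, n, rng):
    """Select n units with replacement, each draw with probability x_i / X."""
    cumulative = list(itertools.accumulate(sizes))
    X = cumulative[-1]
    return [unit_for_draw(cumulative, rng.randint(1, X)) for _ in range(n)]


def hansen_hurwitz_total(draws, y, sizes):
    """Estimate the population total of y as the mean of y_i / p_i over the
    draws, where p_i = x_i / X (written as y_i * X / x_i)."""
    X = sum(sizes)
    return sum(y[i] * X / sizes[i] for i in draws) / len(draws)


# Hypothetical villages: size measure x = number of milch cows
sizes = [3, 5, 2]  # cumulative totals 3, 8, 10
milk = [6, 10, 4]  # study variable y (here exactly 2 * x)
draws = pps_wr(sizes, 4, random.Random(1))
print(draws, hansen_hurwitz_total(draws, milk, sizes))
```

Because y is exactly proportional to x in this toy example, every draw gives y_i / p_i = 20 and the estimate equals the true total whatever units are drawn. This is the intuition for why PPS pays off when size is closely related to the study variable.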
Use of Auxiliary Information

In sampling theory, if auxiliary information related to the character under study is available on all the population units,
then it may be advantageous to make use of this additional information in survey sampling
One way of using this additional information is in sample selection, with unequal probabilities of selection of units
The knowledge of auxiliary information may also be exploited at the estimation stage: the estimator can be developed in such a way that it makes use of this additional information
Use of Auxiliary Information (contd)

Examples of such estimators are the ratio estimator, difference estimator, regression estimator and generalized difference estimators
Obviously, it is assumed that the auxiliary information is available on all
the sampling units
Another way the auxiliary information can be used is at the stage of
planning of survey. An example of this is the stratification of the
population units by making use of the auxiliary information
Stratification I
Outline
What is stratification?
Implicit and explicit stratification
Systematic sampling
Implementation of stratification
Some examples of stratification
Review
Note: in simple random sampling all units have the same probability of selection (the probabilities are known and positive)
But in general, random sampling does not need to be based on equal sampling probabilities (however, they need to be known and all positive), e.g. some units may have a higher probability of selection
Random Sampling
We sometimes sample with unequal probabilities
Think of the population as being divided into H subsets (h = 1, ..., H), with Nh units in the hth subset.
If we sample separately from each subset, then we call the subsets sampling strata. If we sample nh units from stratum h, then the sampling fraction (selection probability) in that stratum is nh/Nh:
$f_h = \frac{n_h}{N_h}$
What is Stratified Sampling?
Stratified sampling involves sorting (stratifying) the sampling frame prior to selection
Implicit stratification involves sampling systematically from an ordered (stratified) list
Explicit stratification involves sorting the population list (frame) into distinct strata and then sampling independently from each stratum
It is possible (and often desirable) to combine explicit and implicit stratification, i.e. to stratify implicitly within explicit strata
Why Stratified Sampling?
The primary reason for stratification is that it ensures (unlike SRS) that the sample proportion from any particular stratum equals the population proportion (with proportionate allocation).
It will increase precision if the strata are correlated with the survey measures (smaller standard errors and narrower confidence intervals)
With proportionate allocation it essentially cannot do statistical harm (estimates are not less precise than under SRS)
This is true of both explicit and implicit stratification.
A secondary motivation for stratification is to permit the use of variable sampling fractions.
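The claim that stratification fixes the sample share of each stratum can be checked for implicit stratification. The sketch sorts a hypothetical frame of 40 rural and 60 urban areas by the stratifier and then takes a systematic sample with K = 10. Whatever the random start, exactly 4 rural and 6 urban areas are selected. The helper name and the frame are illustrative assumptions.

```python
import random


def implicit_stratified_sample(frame, stratifier, n, rng):
    """Implicit stratification: sort the frame by the stratifier, then take
    a systematic sample with integer interval K = N/n and random start R."""
    ordered = sorted(frame, key=stratifier)  # sorting = implicit stratification
    N = len(ordered)
    if N % n != 0:
        raise ValueError("this sketch assumes K = N/n is an integer")
    K = N // n
    R = rng.randint(1, K)
    return ordered[R - 1::K]


# Hypothetical frame of 100 areas, 40 rural and 60 urban, in mixed order
frame = [(f"area_{i}", "rural" if i % 5 < 2 else "urban") for i in range(100)]

for seed in range(3):
    sample = implicit_stratified_sample(frame, lambda a: a[1], 10, random.Random(seed))
    rural = sum(1 for _, kind in sample if kind == "rural")
    print(f"seed {seed}: {rural} rural, {len(sample) - rural} urban")  # always 4 and 6
```

An unsorted SRS of 10 areas could easily contain 2 or 7 rural areas; sorting removes that source of variation.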
Systematic Sampling
Recall session 1
Involves sampling at a fixed interval down a list
If the list is ordered in some meaningful way, this has the
effect of stratification
Advantage of being easy to implement
Procedure: calculate the required interval (K=N/n), then
generate a random start (R) (random number between 1
and K). The sampled units are then the Rth, (R+K)th,
(R+2K)th etc units on the list.
Systematic Sampling (2)
K = N/n, where N is the total number of units on the list and n the desired sample size.
R is a random number between 1 and K.
Note that K need not be an integer. E.g. if the desired n is 500 and N = 10,679, using the exact interval K = 10,679/500 ≈ 21.36 will give exactly n = 500, but rounding to K = 21 will give n = 508 or 509 (depending on the random start).
Do not use K = 21 and then stop once 500 are sampled: this is biased! (Go through the whole list and keep all 508 or 509 sampled cases.)
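One common way to use a fractional interval, assumed here since the slides do not specify the method, is to draw a real-valued start R in (0, K] and select units ceil(R + jK) for j = 0, ..., n - 1. Because R + (n - 1)K never exceeds nK = N, this always yields exactly n distinct units. The second helper counts what happens with the rounded interval K = 21 when the whole list is used. Both function names are hypothetical.

```python
import math
import random


def systematic_fractional(N, n, rng):
    """Systematic sample of n units from 1..N with interval K = N/n (may be
    fractional): draw R uniformly in (0, K] and take ceil(R + j*K), j = 0..n-1."""
    K = N / n
    R = K * (1.0 - rng.random())  # rng.random() is in [0, 1), so R is in (0, K]
    # min() guards against a floating-point round-up just above N
    return [min(N, math.ceil(R + j * K)) for j in range(n)]


def count_integer_interval(N, K, R):
    """Units selected when going down the whole list with integer interval K
    from random start R: R, R + K, R + 2K, ... <= N."""
    return len(range(R, N + 1, K))


s = systematic_fractional(10_679, 500, random.Random(42))
print(len(s), min(s), max(s))
print(count_integer_interval(10_679, 21, 1), count_integer_interval(10_679, 21, 21))  # 509 508
```

With K = 21, starts 1 to 11 give 509 units and starts 12 to 21 give 508, since 10,679 = 21 x 508 + 11.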
Stratum Construction
Choose factors so that strata are homogeneous
If strata are correlated with survey measures, precision increases
Strata examples: e.g. regions
We can sometimes estimate the precision achievable with different choices
Choice of number of strata:
More strata, more precision
But variance estimation is more difficult
And administration and sampling (and weighting) may be more complex
Stratum Construction (2)
Cross factors with few categories rather than using many categories for one factor
For example: stratify by region and by poor and rich areas
When using a continuous factor (e.g. tax payments, proportion of households with attribute A, etc.), choose the stratum boundaries carefully (i.e. define sensible categories and cut-off points)
Stratum Construction (3)
Choose stratifiers such that they are correlated with a range of variables
For example, national household surveys tend to choose stratifiers that are related to
Area characteristics (e.g. rural, urban, population density, etc.)
Income/occupation (e.g. socio-economic group, social class)
It is common to use 3-4 stratification variables hierarchically (see later example)
Example of Stratification: A General Population Survey
The Health Survey for England (DH)
Stage 1:
Postcode Sectors stratified by:
14 Regional Health Authorities (1st-level explicit strata)
Proportion of adults with limiting long-term illness, in three
bands (2nd-level explicit strata)
Proportion of households with non-manual head, in two
bands (3rd-level explicit strata)
Proportion of households with no car, in two bands (4th-level
explicit strata)
Proportion "non-white" (5th-level stratification: implicit)
Example of Stratification: A General Population Survey (2)
720 sectors were sampled systematically
Stage 2
Within each sector, addresses are in postcode order,
and selected systematically. This provides some
geographical stratification.
Example of Stratification: A Special Population Survey
Survey of Recipients of Job Seekers Allowance (DSS)
Stage 1
Postal sectors were stratified by region and number
of recipients
200 sectors were selected with probability
proportional to number of recipients
Example of Stratification: A Special Population Survey (2)
Stage 2
Recipients were stratified by sex (2 bands) x claim type
(4 bands) x length of continuous unemployment prior to
current claim (implicit)
25 recipients were selected systematically from each
sampled sector
Stratified sample: some notation
Dividing the frame into distinct strata and then sampling independently from each stratum results in:
H strata (or groups), stratum h = 1, ..., H
In each stratum h there are Nh units (at population level)
An independent sample of nh units is then selected from each stratum h
The sampling fraction (selection probability) in the stratum is:
$f_h = \frac{n_h}{N_h}$
Estimator Under Stratification (Example)
We have 2 strata (e.g. north and south GB)
Proportion of people 18+ years old in GB who use the internet: P
Estimator p:
$p = \sum_{h=1}^{H} p_h \frac{N_h}{\sum_{h=1}^{H} N_h}$
With H = 2:
$p = p_1 \frac{N_1}{N_1 + N_2} + p_2 \frac{N_2}{N_1 + N_2}$
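The two-stratum formula above is a population-share-weighted average of the stratum proportions. The figures below (north 600 and south 400, in thousands of adults, with sample proportions 0.5 and 0.25) are hypothetical.

```python
def stratified_proportion(N_h, p_h):
    """Stratified estimator of a population proportion:
    p = sum over strata of p_h * N_h / N, with N = sum of N_h."""
    N = sum(N_h)
    return sum(ph * Nh for ph, Nh in zip(p_h, N_h)) / N


# Hypothetical figures: adults (in thousands) in the north and south strata,
# and the sample proportion of internet users observed in each stratum
N_h = [600, 400]
p_h = [0.5, 0.25]
print(stratified_proportion(N_h, p_h))  # 0.4
```

The weights are the population shares N_h/N, not the sample shares, so the estimate stays valid even when the strata were sampled at different rates.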
DEFF under Stratified Sample
The increase in precision under a stratified sample can be estimated using the DEFF (design effect):
$\mathrm{DEFF} = \frac{SE^2_{STRAT}}{SE^2_{SRS}}$
The numerator is the variance under the stratified design
The denominator is the variance under SRS
How can $SE^2_{STRAT}$ be calculated?
Variance under Stratification
Variance of a mean:
$\mathrm{var}(\bar{x}) = \sum_{h=1}^{H} \frac{N_h^2}{N^2} \frac{s_h^2}{n_h}$
... and for a proportion:
$\mathrm{var}(p) = \sum_{h=1}^{H} \frac{N_h^2}{N^2} \frac{p_h (1 - p_h)}{n_h}$
Variance under Stratification (2)
where
h is the stratum
$s_h^2$ is the sample variance in stratum h (estimated from the sample)
$N_h$ is the population size in stratum h
$n_h$ is the sample size in stratum h
N is the total population size ($N = N_1 + N_2 + \dots + N_H$)
n is the total sample size ($n = n_1 + n_2 + \dots + n_H$)
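The two variance formulas and the DEFF can be computed directly. A minimal Python sketch, assuming the fpc is ignored (as in the formulas above) and, as an illustrative assumption, taking the SRS variance as p(1 - p)/n evaluated at the stratified estimate p; the function names and survey figures are hypothetical:

```python
import math


def var_strat_mean(pop_sizes, sample_sizes, sample_vars):
    """Estimated variance of a stratified mean: sum of (N_h/N)^2 * s_h^2 / n_h."""
    total = sum(pop_sizes)
    return sum(
        (pop_h / total) ** 2 * s2_h / n_h
        for pop_h, n_h, s2_h in zip(pop_sizes, sample_sizes, sample_vars)
    )


def var_strat_prop(pop_sizes, sample_sizes, props):
    """Same formula with s_h^2 replaced by p_h * (1 - p_h)."""
    return var_strat_mean(pop_sizes, sample_sizes, [p * (1 - p) for p in props])


def deff_strat_prop(pop_sizes, sample_sizes, props):
    """DEFF = var_STRAT / var_SRS, taking var_SRS = p(1 - p)/n (an assumption)."""
    total = sum(pop_sizes)
    n = sum(sample_sizes)
    p = sum(pop_h / total * p_h for pop_h, p_h in zip(pop_sizes, props))
    return var_strat_prop(pop_sizes, sample_sizes, props) / (p * (1 - p) / n)


pops, samples, props = [600, 400], [60, 40], [0.8, 0.7]  # hypothetical survey
print("SE_STRAT =", math.sqrt(var_strat_prop(pops, samples, props)))
print("DEFF =", deff_strat_prop(pops, samples, props))
```

With these figures the stratified variance is 0.0018 against 0.76 x 0.24 / 100 = 0.001824 under SRS, so the DEFF is slightly below 1: a modest gain, consistent with the typically modest correlation between strata and survey variables.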
Practical Limitations to Stratification
Often only possible at PSU level (PSU = primary sampling unit, e.g. postcode sector, school) rather than at individual level, e.g. in household surveys
Correlation between strata and survey variables is typically modest
Depends on what information is available on the sampling frame
Multi-purpose nature of surveys: optimal stratification for one estimate may produce no benefit for another
Typically there is a lack of information about stratum variances
Comparisons between Stratification and Quota Sampling
Recall session 1
Imposing quotas has a similar effect to stratification, namely to reduce sampling variance
But quota sampling also has an inherent bias towards more accessible and more willing population members
This may manifest itself as a bias in the survey measures
Thus, quota sample estimates could have relatively high precision but be biased, and therefore have low accuracy (high mean squared error) (session 3)
Stratification II
Outline of session
Variable Sampling Fractions
Motivations
Optimal allocation
Design effects
Variable Sampling Fraction (VSF)
We sometimes sample with unequal probabilities
Think of the population as being divided into H subsets (h = 1, ..., H), with $N_h$ units in the hth subset.
If we sample separately from each subset, then we call the subsets sampling strata. If we sample $n_h$ units from stratum h, then the sampling fraction (selection probability) in that stratum is:
$$f_h = \frac{n_h}{N_h}$$
Variable Sampling Fraction (VSF)
For unbiased estimation, each sampled unit i must be assigned a weight in inverse proportion to its selection probability.
This is usually referred to as the sampling weight or design weight: $w_i$
An example of such a weight in the case of stratified sampling would be:
$$w_i = \frac{N_h}{n_h} \quad \text{for } i \in h,$$
i.e. if sample unit i belongs to stratum h
Use of weights
So when certain types of units have been selected with different selection probabilities (oversampling), the sample weights need to be taken into account in estimation
Corrective weighting is needed to get design-unbiased estimates
If the weights are ignored, the sample estimates will generally be biased
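To see why weighting matters, here is a minimal Python sketch with a hypothetical oversampled stratum: stratum B (N = 100) gets the same sample size as stratum A (N = 900). The data and function names are illustrative assumptions; each unit receives the design weight N_h/n_h:

```python
def design_weights(pop_sizes, sample_sizes):
    """Design weight w_h = N_h / n_h for every sampled unit in stratum h."""
    return [pop_h / n_h for pop_h, n_h in zip(pop_sizes, sample_sizes)]


def weighted_proportion(samples, weights):
    """samples[h] is a list of 0/1 outcomes from stratum h.

    Divides by the sum of weights, which here equals N.
    """
    num = sum(w * sum(ys) for ys, w in zip(samples, weights))
    den = sum(w * len(ys) for ys, w in zip(samples, weights))
    return num / den


# Hypothetical data: 30 units from each stratum
sample_a = [1] * 6 + [0] * 24   # stratum A (N = 900): 6 "yes"
sample_b = [1] * 15 + [0] * 15  # stratum B (N = 100): 15 "yes"
w = design_weights([900, 100], [30, 30])
unweighted = (sum(sample_a) + sum(sample_b)) / 60
weighted = weighted_proportion([sample_a, sample_b], w)
print(f"unweighted = {unweighted:.2f}, weighted = {weighted:.2f}")
```

The unweighted proportion (0.35) overstates the population value because the oversampled stratum B has a higher rate; the weighted estimate (0.23) equals 0.9 x 0.2 + 0.1 x 0.5, the population-share combination of the stratum rates.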
Motivations for VSF
1. To increase the sample size of small groups (i.e. to get acceptable confidence intervals for estimates based on those groups)
2. Because the frame / selection method gives us no choice
3. To increase precision of estimates by oversampling more variable strata
Examples
1. A national survey where estimates are also required for each of the component countries/regions
E.g. a survey of the UK, where estimates for Scotland, Wales and NI are also needed separately
Then a larger sampling fraction might be used in Wales and Scotland than in England.
2. Sampling minority ethnic groups:
a high proportion of the minority ethnic population lives within a relatively small proportion of areas
Oversampling such (ethnically dense) areas will increase achieved sample sizes while reducing survey costs.
Use of Variable Sampling Fractions
Now we want to investigate further the effects of using variable sampling fractions
We have seen that we need to use weights
We want to investigate under which circumstances precision in survey estimates is increased, and when it is reduced, by using VSFs
Or, in other words: what is the effect of oversampling on the precision of estimates?
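Standard Errors for Stratified Sampling
We introduced a formula for the variance in the last session.
In general, for a mean it is:
$$\widehat{\mathrm{var}}(\bar{x}) = \sum_{h=1}^{H} \frac{N_h^2}{N^2} \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h} \qquad (6.1)$$
And for a proportion:
$$\widehat{\mathrm{var}}(p) = \sum_{h=1}^{H} \frac{N_h^2}{N^2} \left(1 - \frac{n_h}{N_h}\right) \frac{p_h(1-p_h)}{n_h} \qquad (6.2)$$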
The finite population correction
The expression
$$1 - \frac{n_h}{N_h}$$
is referred to as the finite population correction (fpc)
This term is only important if $n_h/N_h$ is not close to 0
Usually $n_h/N_h$ is very close to 0 (since the population is very large, even if the sample is quite large), and the finite population correction can be ignored
Remember (standard error):
$$SE(\bar{x}) = \sqrt{\widehat{\mathrm{var}}(\bar{x})}$$
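The size of the fpc's effect is easy to check numerically. A minimal Python sketch of expression (6.1) with an optional fpc; the function name and the stratum sizes and variances are hypothetical, chosen so that one case has a large sampling fraction (n_h/N_h = 0.2) and the other a tiny one:

```python
import math


def var_strat_mean(pop_sizes, sample_sizes, sample_vars, use_fpc=True):
    """Expression (6.1); with use_fpc=False the (1 - n_h/N_h) factor is dropped."""
    total = sum(pop_sizes)
    var = 0.0
    for pop_h, n_h, s2_h in zip(pop_sizes, sample_sizes, sample_vars):
        fpc = (1 - n_h / pop_h) if use_fpc else 1.0
        var += (pop_h / total) ** 2 * fpc * s2_h / n_h
    return var


# Large sampling fraction: n_h / N_h = 100 / 500 = 0.2, so the fpc matters
with_fpc = var_strat_mean([500, 500], [100, 100], [4.0, 2.0])
without_fpc = var_strat_mean([500, 500], [100, 100], [4.0, 2.0], use_fpc=False)
# Tiny sampling fraction: n_h / N_h = 0.0002, so the fpc is negligible
big = var_strat_mean([500_000, 500_000], [100, 100], [4.0, 2.0])

print("SE with fpc:", math.sqrt(with_fpc), "without:", math.sqrt(without_fpc))
print("ratio of variances, large population:", big / without_fpc)
```

With a 20% sampling fraction the fpc cuts the variance by a fifth (0.012 against 0.015); with a sampling fraction of 0.0002 the variance ratio is 0.9998, which is why the fpc is usually ignored.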
Variance under Stratification
If we ignore the finite population correction (for every stratum) we can simplify this to:
Variance of a mean:
$$\widehat{\mathrm{var}}(\bar{x}) = \sum_{h=1}^{H} \frac{N_h^2}{N^2} \frac{s_h^2}{n_h} \qquad (6.3)$$
Variance of a proportion:
$$\widehat{\mathrm{var}}(p) = \sum_{h=1}^{H} \frac{N_h^2}{N^2} \frac{p_h(1-p_h)}{n_h} \qquad (6.4)$$
In addition to the simplification of the variance estimation formulae for a mean and a proportion when we ignore the finite population correction (fpc), we note:
Differences between strata do not contribute to the variance. So we should construct strata to be as homogeneous (small $s_h^2$) as possible
Note that in the special case where we use the same sampling fraction in each stratum, each of the variance formulae simplifies further.
We can substitute n/N in place of $n_h/N_h$, and $n_h/n$ in place of $N_h/N$. (6.3) and (6.4) then become:
For a mean:
$$\widehat{\mathrm{var}}(\bar{x}) = \frac{1}{n^2} \sum_{h=1}^{H} n_h s_h^2 \qquad (6.5)$$
For a proportion:
$$\widehat{\mathrm{var}}(p) = \frac{1}{n^2} \sum_{h=1}^{H} n_h p_h(1-p_h) \qquad (6.6)$$
We will look more at (6.5) and (6.6) later.
First, we will concentrate on Variable Sampling Fractions. In the presence of VSFs, we need formulae (6.3) and (6.4), ignoring the fpc.
Example: Over-Sampling More Variable Strata
Sometimes we can identify strata that have high population variances (large $S_h^2$). Over-sampling these strata will tend to increase the precision of the survey estimates (reduce standard errors).
We can only do this if we have advance estimates of stratum variances.
Example to illustrate this:
Suppose H = 2 and $N_1 = N_2$ (= N/2).
Suppose we know (or estimate) that
$$S_1^2 = 2 S_2^2$$
Example (cont)
Then we can substitute into expression (6.3) (ignoring the fpc and using the population variances rather than the estimated variances) and we get:
$$\mathrm{var}(\bar{x}) = \frac{N^2}{4} \cdot \frac{2S_2^2}{N^2 n_1} + \frac{N^2}{4} \cdot \frac{S_2^2}{N^2 n_2} = \frac{S_2^2}{2n_1} + \frac{S_2^2}{4n_2}$$
Example (cont)
Now, consider two alternative sample designs:
a.) Proportional allocation, i.e. where
$$\frac{n_h}{n} = \frac{N_h}{N}$$
b.) A higher sampling fraction in stratum 1, i.e. $n_1$ larger than $n_2$
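It follows:
For a.) Substitute $n_1 = n_2 = n/2$:
$$\mathrm{var}(\bar{x}) = \frac{S_2^2}{n} + \frac{S_2^2}{2n} = 1.5\,\frac{S_2^2}{n}$$
For b.) Substitute e.g. $n_1 = 0.58n$; $n_2 = 0.42n$:
$$\mathrm{var}(\bar{x}) = \frac{S_2^2}{1.16n} + \frac{S_2^2}{1.68n} \approx 1.457\,\frac{S_2^2}{n}$$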
Example (cont)
So, the sampling variance is slightly smaller under design b)
It is smaller by a ratio of 1.457/1.5, i.e. 0.97
This is the design effect due to over-sampling the more variable stratum (VSF):
$$\mathrm{DEFF}_{VSF} = \frac{SE^2_{VSF}}{SE^2_{SRS}} = \frac{1.457}{1.5} \approx 0.97$$
$$\mathrm{DEFT}_{VSF} = \sqrt{\mathrm{DEFF}_{VSF}} \approx 0.98$$
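The arithmetic of designs a) and b) can be reproduced with expression (6.3). A minimal Python sketch, assuming population variances with $S_2^2 = 1$ for convenience (it cancels in the ratio) and an arbitrary total n; the function name is hypothetical:

```python
import math


def var_mean_pop(shares, pop_vars, sample_sizes):
    """Expression (6.3) with population variances S_h^2 and no fpc.

    shares are W_h = N_h / N.
    """
    return sum(W ** 2 * S2 / n_h for W, S2, n_h in zip(shares, pop_vars, sample_sizes))


n = 1000                  # any total sample size; it cancels in the ratio
shares = [0.5, 0.5]       # N_1 = N_2 = N/2
pop_vars = [2.0, 1.0]     # S_1^2 = 2 S_2^2, with S_2^2 = 1

var_a = var_mean_pop(shares, pop_vars, [n / 2, n / 2])        # design a)
var_b = var_mean_pop(shares, pop_vars, [0.58 * n, 0.42 * n])  # design b)

deff = var_b / var_a
print(f"n*var a) = {n * var_a:.3f}, n*var b) = {n * var_b:.3f}")
print(f"DEFF_VSF = {deff:.3f}, DEFT_VSF = {math.sqrt(deff):.3f}")
```

To three decimals the unrounded ratio is about 0.972 and its square root about 0.986; the DEFT of 0.98 quoted above is the square root of the rounded DEFF of 0.97.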
Example (cont)
This example illustrates how precision can be increased by the use of Variable Sampling Fractions (in the case of oversampling strata with high stratum variances)!
This approach is quite common for repeated business and agriculture surveys, but rare for household surveys.
Note
We have seen in case b.) that a higher sampling fraction in one stratum led to increased precision
Therefore it is important to consider which allocation to strata will maximise survey precision (assuming the stratum variances are not equal)
Optimal Allocation
In general, the optimum allocation rule is to set:
$$\frac{n_h}{N_h} \propto \frac{S_h}{\sqrt{C_h}}$$
where $C_h$ is the unit cost of data collection for a unit in stratum h.
If data collection costs do not vary between strata, this simplifies to:
$$\frac{n_h}{N_h} \propto S_h$$
If stratum variances are also equal, it further simplifies to a constant K:
$$\frac{n_h}{N_h} = K$$
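A minimal Python sketch of the allocation rule, assuming the stratum standard deviations $S_h$ and unit costs $C_h$ are known in advance. It scales the shape $n_h \propto N_h S_h / \sqrt{C_h}$ to a chosen total n; when costs differ, the total would normally be set by the budget instead. The function name and inputs are hypothetical, and rounding to whole units is left out:

```python
import math


def optimal_allocation(n, pop_sizes, stddevs, unit_costs=None):
    """Split n so that n_h is proportional to N_h * S_h / sqrt(C_h).

    Returns unrounded stratum sample sizes.
    """
    if unit_costs is None:
        unit_costs = [1.0] * len(pop_sizes)  # equal costs by default
    scores = [N * S / math.sqrt(C) for N, S, C in zip(pop_sizes, stddevs, unit_costs)]
    total = sum(scores)
    return [n * s / total for s in scores]


# Over-sampling example: N_1 = N_2 and S_1^2 = 2 S_2^2, i.e. S_1 = sqrt(2) * S_2
print(optimal_allocation(100, [500, 500], [math.sqrt(2), 1.0]))
# Equal variances and costs: reduces to proportional allocation (60, 40)
print(optimal_allocation(100, [600, 400], [1.0, 1.0]))
```

For the over-sampling example the rule gives a split of roughly 0.586n : 0.414n, so the 0.58n / 0.42n split used in design b) is close to optimal.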
Optimal Allocation (cont)
The last case demonstrates that an equal probability selection method is optimal when variances and data collection costs are equal in all strata (other things being equal).
Example: VSFs with Equal Stratum Variances
Example:
Again suppose H = 2, and $N_1 = N_2$.
But now suppose that the stratum variances are equal, i.e.
$$S_1^2 = S_2^2 = S^2$$
Again consider two different sampling schemes:
a.) Proportional allocation: $n_h/n = N_h/N$
b.) The sampling fraction in stratum 1 is twice that in stratum 2, i.e. $n_1 = 2n/3$; $n_2 = n/3$.
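Example (cont)
Then, with design a), we find (from expression 6.3, again ignoring the fpc):
$$\mathrm{var}(\bar{x}) = \frac{(N/2)^2}{N^2} \cdot \frac{S^2}{n/2} + \frac{(N/2)^2}{N^2} \cdot \frac{S^2}{n/2} = \frac{S^2}{2n} + \frac{S^2}{2n} = \frac{S^2}{n}$$
(Note: this is the formula for the variance of a mean under SRS, ignoring the fpc!)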
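Example (cont)
With design b), we find:
$$\mathrm{var}(\bar{x}) = \frac{(N/2)^2}{N^2} \cdot \frac{S^2}{2n/3} + \frac{(N/2)^2}{N^2} \cdot \frac{S^2}{n/3} = \frac{3S^2}{8n} + \frac{3S^2}{4n} = \frac{9S^2}{8n}$$
It follows:
$$\mathrm{DEFF}_{VSF} = \frac{SE^2_{VSF}}{SE^2_{SRS}} = \frac{9S^2/8n}{S^2/n} = \frac{9}{8} = 1.125$$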
Example (cont)
This means:
The sampling variance under design b) is 9/8 (= 1.125) times that under design a).
By allocating disproportionately, we have lost precision (in the case of equal stratum variances)!
In general, precision will be lost whenever variable sampling fractions are used if the stratum variances do not vary (much).
The level of precision loss depends on the range of the weights used
Design Effects due to VSFs
If we can assume stratum variances to be equal, there is an alternative and often-used way to estimate the effect of VSFs on sampling variance.
Expression 6.1 can be used to derive an expression for the effective sample size:
$$n_{eff}^{VSF} = \frac{\left(\sum_h n_h w_h\right)^2}{\sum_h n_h w_h^2}$$
where $n_h$ is the sample size in stratum h and $w_h$ is the weight given to each case in stratum h. (Remember that $w_h$ will be proportional to $N_h/n_h$.)
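As a check, the effective-sample-size formula reproduces the 9/8 design effect of design b) in the equal-variance example. A minimal Python sketch; the function name and the total n = 900 are illustrative assumptions, and the relative weights 1 and 2 follow from $w_h \propto N_h/n_h$ with $N_1 = N_2$, $n_1 = 2n/3$, $n_2 = n/3$:

```python
def effective_sample_size(sample_sizes, weights):
    """n_eff = (sum n_h w_h)^2 / sum n_h w_h^2; only relative weights matter."""
    num = sum(n_h * w_h for n_h, w_h in zip(sample_sizes, weights)) ** 2
    den = sum(n_h * w_h ** 2 for n_h, w_h in zip(sample_sizes, weights))
    return num / den


n = 900
sizes = [2 * n / 3, n / 3]   # design b): n_1 = 2n/3, n_2 = n/3
weights = [1.0, 2.0]         # proportional to N_h / n_h when N_1 = N_2
n_eff = effective_sample_size(sizes, weights)
print("n_eff =", n_eff, "DEFF_VSF = n / n_eff =", n / n_eff)
```

Here n_eff = 8n/9 = 800, so DEFF_VSF = n/n_eff = 9/8, matching the direct calculation. Multiplying all weights by a constant leaves n_eff unchanged, which is why relative weights suffice at the design stage.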
Design Effects due to VSFs (cont)
Note that this expression only takes into account the effect of VSFs on the effective sample size, not the effect of any other aspect of the design.
The formula for $n_{eff}^{VSF}$ can be used at the design stage to predict the impact on precision of alternative allocations to strata!
Design Effects due to VSFs (cont)
In general, it will be found that:
a larger range of sampling fractions (weights) results in a smaller $n_{eff}$ (i.e. greater loss of precision)
over-sampling a large subgroup results in greater loss of precision than over-sampling a small subgroup
when the main aim is to produce estimates for subgroups, equal sample sizes per subgroup will be an efficient design
when the main aim is to produce estimates for the total population, equal sampling fractions will be efficient.
Graphical illustration of neff
The following graph illustrates the effect of oversampling on survey precision for a sample with 2 strata (H = 2)
The graph shows the relationship between the proportion of the sample in stratum 1 ($n_1/n$, x-axis) and the consequent loss of precision, as measured by the design effect (y-axis).
The three lines relate to three oversampling rates and the corresponding relative weights that need to be used: 2:1, 4:1 and 10:1 (with $w_1 = 1$ in all cases).
(2:1 means that stratum 1 is oversampled by a factor of 2)
[Figure: $\mathrm{DEFF}_{VSF}$ (y-axis) against $n_1/n$ (x-axis) for relative weights $w_2 = 2$, $w_2 = 4$ and $w_2 = 10$]
Graphical illustration of neff
The graph illustrates the two points made earlier:
a larger range of sampling fractions (weights) results in a smaller $n_{eff}$ (i.e. greater loss of precision)
over-sampling a large subgroup results in greater loss of precision than over-sampling a small subgroup
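The curves in the graph can be regenerated from the effective-sample-size formula. With H = 2, $w_1 = 1$ and $p = n_1/n$, $\mathrm{DEFF}_{VSF} = n/n_{eff} = (p w_1^2 + (1-p) w_2^2) / (p w_1 + (1-p) w_2)^2$. A minimal Python sketch (function name hypothetical) that tabulates this on a grid and reports where each curve peaks:

```python
def deff_vsf_two_strata(share1, w2, w1=1.0):
    """DEFF_VSF = n / n_eff for two strata, as a function of p = n_1 / n."""
    p, q = share1, 1.0 - share1
    return (p * w1 ** 2 + q * w2 ** 2) / (p * w1 + q * w2) ** 2


grid = [i / 100 for i in range(101)]  # n_1 / n from 0 to 1 in steps of 0.01
for w2 in (2, 4, 10):
    values = [deff_vsf_two_strata(p, w2) for p in grid]
    peak = max(range(len(grid)), key=values.__getitem__)
    print(f"w2={w2}: max DEFF ~ {values[peak]:.3f} at n1/n ~ {grid[peak]:.2f}")
```

Each curve equals 1 at both ends (only one stratum sampled) and peaks at $n_1/n = w_2/(1+w_2)$ with value $(1+w_2)^2/(4w_2)$: 1.125, 1.5625 and 3.025 for $w_2$ = 2, 4 and 10. The first of these is exactly design b) of the equal-variance example.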
Multi-Stage Sampling
Outline of session
What is multi-stage / cluster sampling
Motivations for multi-stage sampling
Choice of sampling units and sample sizes at each stage
Selection probabilities and weighting
Probability Proportional to Size (PPS) sampling
Design effects due to clustering
What is Multi-Stage Sampling?
The units in the population are arranged hierarchically
A 3-stage design would entail:
Primary sampling units (PSUs)
Secondary sampling units (SSUs)
Sample elements
Every element must be assigned uniquely to one SSU, and every SSU uniquely to one PSU
What is Multi-Stage Sampling?
Stage 1: select a sample of PSUs
Stage 2: select a sample of SSUs within each selected PSU
Stage 3: select a sample of elements within each selected SSU
Note that there could be any number of stages: 2, 3 or 4 are common
Examples:
general population survey:
PSUs might be postcode sectors
SSUs might be households
Elements might be persons
business survey:
PSUs might be companies
SSUs might be workplaces
Elements might be employees
Why Multi-Stage Sampling?
No frame of elements is available, but a frame of PSUs is (examples: a national sample of school pupils, where schools could be PSUs; a US face-to-face survey where counties are PSUs)
Cost of data collection (example: a general population sample involving face-to-face interviewing)
Access to elements may only be possible via gatekeepers (examples: students, employees, trainees)
Data quality (example: in the case of face-to-face interviewing, fieldwork can be better supervised if it is clustered)
Design Choices (clustering):
Example: Field interviewing
Constraint                                   Implication
Tight fieldwork periods                      Small workload per interviewer
Completion depends on slowest interviewer    Equal interviewer workloads
Efficient fieldwork                          Each workload in a small area
Training/briefing/learning costs             Large workload per interviewer
Design Choices (clustering):
Some General Points:
Larger clusters will generally result in larger design effects due to clustering (see later)
But larger clusters will also generally result in larger cost savings (e.g. field interviewers, gatekeepers)
It is necessary to make an appropriate compromise, i.e. one where the cost saving outweighs the loss in precision, to produce higher overall accuracy per unit cost
(remember the key aim of sample design: minimising costs, maximising accuracy)
Selection Probabilities: Principle
With multi-stage sampling, the selection probability of each element is the product of the (conditional) selection probabilities at each stage
e.g. the probability of sampling unit i in SSU j in PSU k is
$$\Pr_{ijk} = \Pr(k) \times \Pr(j \mid k) \times \Pr(i \mid j, k)$$
So it is important to control and record the selection probabilities at each stage.
Selection Probabilities
Other things being equal, it is desirable to keep selection probabilities equal for all elements (remember stratification: otherwise there is a loss in precision).
If selection probabilities are not equal, we will need to weight each sampled element ijk by
$$w_{ijk} = 1/\Pr_{ijk}$$
for unbiased estimation.
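The product rule and the inverse-probability weight can be written in a few lines. A minimal Python sketch using exact fractions; the function names and the 3-stage probabilities (1/10, 1/5, 1/4) are hypothetical:

```python
from fractions import Fraction
from math import prod


def overall_probability(stage_probs):
    """Multiply the conditional selection probabilities of each stage."""
    return prod(stage_probs, start=Fraction(1))


def design_weight(stage_probs):
    """w = 1 / Pr(selection), used for unbiased (weighted) estimation."""
    return 1 / overall_probability(stage_probs)


# Hypothetical 3-stage design: Pr(k) = 1/10, Pr(j|k) = 1/5, Pr(i|j,k) = 1/4
stages = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 4)]
print(overall_probability(stages), design_weight(stages))
```

Each sampled element then has probability 1/200 and weight 200, i.e. it "represents" 200 population elements. Using `Fraction` keeps the probabilities exact, which makes it easy to check whether a design is epsem.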
Selection Options
With multi-stage sampling, there are many ways to achieve equal selection probabilities.
(epsem design = equal probability of selection method = self-weighting design)
In the (rare) case of equal-size PSUs and 2-stage sampling, we can easily select PSUs (j) and elements (i) with equal probability.
Example: Design (0):
Pr(j) = 1/3 and Pr(i|j) = 1/2, so the overall probability is Pr(i) = 1/3 x 1/2 = 1/6 for all i.
Selection Options
In many types of sampling situations, equal-size PSUs are rare. In the case of unequal-size PSUs we are left with 3 alternative designs:
1. select PSUs with equal probabilities and then a fixed number of elements within each; this gives unequal selection probabilities (not an epsem design)
2. select PSUs with equal probabilities and then a variable number of elements within each, to give equal overall selection probabilities
3. select PSUs with PPS (probability proportional to size), then a fixed number of elements within each
Selection Options
Design 1) is undesirable because it will generally cause a loss in precision compared with an epsem design; as a non-epsem design it also requires weighting
Design 2) avoids this problem, but causes practical problems. The number of elements sampled per PSU will vary in proportion to the population size of the PSU. Elements in one PSU typically form one interviewer workload, so this is undesirable.
Also, with design 2) the sample size is not fixed in advance: it is a random variable. Very undesirable!
Selection Options
Design 3) overcomes all these problems, but it depends on the availability of a reasonably accurate measure of the number of elements in each PSU (and in each SSU, for a 3-stage design).
Note: when accurate counts of the number of elements within each PSU are not available, it may be possible to get a reasonably good estimate of the measure of size and to proceed with PPS sampling
This design is discussed further below:
Probability Proportional to Size (PPS) Selection
Example: a 2-stage design
Set Pr(j) proportional to $N_j$ (the number of elements in the population in PSU j), i.e. PPS sampling.
So $\Pr(j) = C N_j$.
We then select the same number of elements, D, from each sampled PSU, so $\Pr(i \mid j) = D/N_j$.
Then
$$\Pr(i) = \Pr(j) \times \Pr(i \mid j) = C N_j \times \frac{D}{N_j} = CD,$$
which is the same for every element
Implementation of a PPS Design
We do not need to calculate the selection probabilities at each stage in order to make the selection.
We need only create a cumulative total of PSU sizes down the list of PSUs (e.g. ending at 10,000) and then sample systematically down that list of totals, selecting each PSU whose cumulative range contains a selection point
Implementation of a PPS Design
Example: selection of 3 PSUs from 10 with PPS and 25 units from each selected PSU, so that n = 75
Pr(j) is the probability of selecting the PSU
Pr(i|j) is the probability of selecting each unit, given that its PSU has been selected, and
Pr(i) is the overall probability of selecting each unit.
It can be seen that each of the 10,000 units in the population has the same selection probability:
Example of a PPS Design
C = 3/10000, D = 25
PSU    Size (Nj)   Pr(j) = C*Nj    Pr(i|j) = D/Nj   Pr(i) = Pr(j) x Pr(i|j) = C*D
1      1000        3x1000/10000    25/1000          75/10000
2       900        3x900/10000     25/900           75/10000
3       800        3x800/10000     25/800           75/10000
4      1200        3x1200/10000    25/1200          75/10000
5      1500        3x1500/10000    25/1500          75/10000
6      1300        3x1300/10000    25/1300          75/10000
7      1100        3x1100/10000    25/1100          75/10000
8       500        3x500/10000     25/500           75/10000
9      1000        3x1000/10000    25/1000          75/10000
10      700        3x700/10000     25/700           75/10000
Total 10000
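The table can be verified mechanically. A minimal Python sketch, assuming (as in the table) C = m/N with m = 3 PSUs to be selected and D = 25 elements per selected PSU; exact fractions avoid rounding, and they print in lowest terms, so 75/10000 appears as 3/400:

```python
from fractions import Fraction

sizes = [1000, 900, 800, 1200, 1500, 1300, 1100, 500, 1000, 700]  # N_j
m, D = 3, 25                      # PSUs to select, elements per selected PSU
N = sum(sizes)                    # 10,000 elements in total
C = Fraction(m, N)                # so that Pr(j) = C * N_j

rows = []
for j, size_j in enumerate(sizes, start=1):
    pr_j = C * size_j                         # PPS first-stage probability
    pr_i_given_j = Fraction(D, size_j)        # fixed take of D elements
    rows.append((j, size_j, pr_j, pr_i_given_j, pr_j * pr_i_given_j))

for row in rows:
    print(*row)

# Every element has the same overall probability C * D = 75/10000 (epsem)
assert all(r[4] == C * D for r in rows)
```

The largest first-stage probability is 3 x 1500/10000 = 0.45, so no PSU here is large enough to be a certainty selection.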
Example of a PPS Design (cont)
We would select the sample of PSUs as follows:
N = 10,000 and n = 3 (PSUs).
To select systematically (see session: Stratification I), K = N/n ≈ 3333, and R needs to be a random number between 1 and 3333. Suppose we happen to generate R = 1,050.
Then we sample the PSUs that contain elements 1050, (1050 + 3333) and (1050 + 2 x 3333), i.e. elements 1050, 4383 and 7716, which lie in PSUs 2, 5 and 7:
Example of a PPS Design (cont)
PSU    Size    Cum. size   Selection
1      1000     1000
2       900     1900       *
3       800     2700
4      1200     3900
5      1500     5400       *
6      1300     6700
7      1100     7800       *
8       500     8300
9      1000     9300
10      700    10000
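The cumulative-total method takes only a few lines of code. A minimal Python sketch, assuming the interval K = N // n (3333 here, as in the text) and a given random start R; the function name is hypothetical. In practice R would be drawn with `random.randint(1, K)`:

```python
import bisect
from itertools import accumulate


def pps_systematic(sizes, n_psus, start):
    """Systematic PPS selection down the cumulative totals of PSU sizes.

    start is the random start R (1 <= R <= K), with K = N // n_psus.
    Returns 1-based PSU numbers; a PSU larger than K could appear twice.
    """
    cum = list(accumulate(sizes))          # 1000, 1900, 2700, ...
    interval = cum[-1] // n_psus           # K = 10000 // 3 = 3333
    points = [start + k * interval for k in range(n_psus)]
    # PSU j covers elements cum[j-1] + 1 .. cum[j]; bisect_left finds that j
    return [bisect.bisect_left(cum, p) + 1 for p in points]


sizes = [1000, 900, 800, 1200, 1500, 1300, 1100, 500, 1000, 700]
print(pps_systematic(sizes, 3, start=1050))
```

With R = 1050 the selection points are 1050, 4383 and 7716, giving PSUs 2, 5 and 7 as in the table. Because selection works on the cumulative totals directly, larger PSUs cover longer ranges and are hit more often, which is exactly probability proportional to size.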
Some Limitations of PPS Sampling of PSUs
We might have only imperfect estimates of the number of elements in each PSU (the size measure)
We could then adjust the sample size within each PSU to keep overall probabilities equal, or we might simply weight by 1/Pr(i)
The sampling interval might be smaller than the number of elements in some PSUs. (This will only happen if the sampling fraction of PSUs is large and/or the size of PSUs is highly variable.) Those PSUs will be certain to be sampled, and could be sampled more than once.
Some Limitations of PPS Sampling of PSUs
We might place these PSUs in a separate stratum and include them with certainty. We might also increase their sample size of elements, to keep overall probabilities equal, or we might weight instead.
Design Effects due to Clustering
Clustering tends to increase sampling variance (but this is partly offset by the fact that a larger sample size can be obtained for any given cost).
This is because units within a cluster tend to be more homogeneous than units in the population as a whole.
Clustering therefore tends to have the opposite effect to stratification.
Example of Homogeneity of Clusters
Let us consider the following example to illustrate the effect of clustering:
Population of 6 people, with values: 1, 1, 2, 2, 3, 3.
Population mean = 12/6 = 2
Population variance:
$$\mathrm{var}(X) = \frac{1}{6} \sum_{i=1}^{6} (x_i - 2)^2 = \frac{4}{6} = \frac{2}{3}$$
Example (cont)
a) divide the population into 3 clusters: (1,1), (2,2) and (3,3).
Then there is no variance within clusters (homogeneous clusters). But the variance between the cluster means is:
$$\mathrm{var}(X_B) = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}$$
This implies that the sampling variance is greater than 0, since we get different estimates of the mean depending on which cluster is sampled.
Example (cont)
b) divide the population into 2 clusters: (1,2,3) and (1,2,3).
There is no variance between the cluster means. But the variance within each cluster is:
$$\mathrm{var}(X_W) = \frac{1}{2} \left( 2 \times \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} \right) = \frac{2}{3}$$
The sampling variance is 0, since there is no variability in the sample means.
With design a), all the variance is between clusters: the clusters are perfectly homogeneous.
With design b), the clusters are as heterogeneous as the population as a whole, so cluster sampling would not cause a loss in precision.
Example (cont)
If we sample one cluster (and then include all its elements), design a) has a sampling variance of 2/3; design b) has a sampling variance of 0.
This illustrates the general point that the sampling variance will be greater if clusters are relatively homogeneous (as in a))
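The two sampling variances can be confirmed by enumerating every possible one-cluster sample. A minimal Python sketch, assuming one cluster is drawn with equal probability and all of its elements are observed; the function name is hypothetical. With equal-size clusters the cluster means average to the population mean, so the sampling variance is the population variance of the cluster means:

```python
from statistics import mean, pvariance


def sampling_variance_one_cluster(clusters):
    """Variance of the sample mean when one cluster is drawn at random
    (equal probability) and all its elements are observed."""
    cluster_means = [mean(c) for c in clusters]
    return pvariance(cluster_means)


population = [1, 1, 2, 2, 3, 3]
design_a = [(1, 1), (2, 2), (3, 3)]    # homogeneous clusters
design_b = [(1, 2, 3), (1, 2, 3)]      # heterogeneous clusters
print("population variance:", pvariance(population))
print("design a):", sampling_variance_one_cluster(design_a))
print("design b):", sampling_variance_one_cluster(design_b))
```

Design a) gives 2/3 (every possible sample mean differs) and design b) gives 0 (every possible sample mean equals 2).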
Design Effects due to Clustering (cont)
Typically, the sorts of units that we tend to use as PSUs are relatively homogeneous, so in practice clustering nearly always results in a design effect due to clustering which is greater than one.
Examples:
people within postcode sectors,
pupils within schools,
students within classes,
employees within firms.
Intra-Cluster Correlation
The design effect due to clustering is
$$\mathrm{DEFF}_{CL} = 1 + (b - 1)\rho$$
where b is the sample size per cluster (in practice b may vary slightly, in which case the mean cluster size provides an adequate approximation), and $\rho$ (roh) is the intra-cluster correlation.
$\rho = 0$: randomly sorted clusters
$\rho = 1$: perfectly homogeneous clusters
Intra-Cluster Correlation (cont)
Note that $\rho$ is a population characteristic relating to the chosen definition of PSU, but the sample design should involve a careful choice of b.
Examples of possible values:
b = 10: if $\rho = 0$ then $\mathrm{DEFF}_{CL} = 1$
b = 10: if $\rho = 1$ then $\mathrm{DEFF}_{CL} = 10$
more realistically, b = 10: if $\rho = 0.05$ then $\mathrm{DEFF}_{CL} = 1.45$.
Inflation due to clustering
Reminder: the square root of DEFF is DEFT
$\mathrm{DEFT}_{CL}$ inflates confidence intervals of the mean (or proportion) as follows:
$$\bar{x} \pm 1.96 \times SE \times \mathrm{DEFT}_{CL}$$
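A minimal Python sketch of the inflated interval, assuming SE is the standard error computed as if the sample were SRS (our reading of the formula) and that $\mathrm{DEFT}_{CL} = \sqrt{1 + (b-1)\rho}$; the function name and the figures (mean 50, SE 2, b = 10, $\rho$ = 0.05) are hypothetical:

```python
import math


def clustered_ci(estimate, se_srs, b, rho, z=1.96):
    """Interval estimate ± z * SE * DEFT_CL, with DEFT_CL = sqrt(1 + (b-1) * rho)."""
    deft = math.sqrt(1 + (b - 1) * rho)
    half_width = z * se_srs * deft
    return estimate - half_width, estimate + half_width


low, high = clustered_ci(50.0, 2.0, b=10, rho=0.05)
print(f"clustered 95% CI: ({low:.2f}, {high:.2f})")
print("unclustered 95% CI:", clustered_ci(50.0, 2.0, b=1, rho=0.05))
```

With b = 1 (no clustering) DEFT is 1 and the interval is the usual 50 ± 3.92; with b = 10 and $\rho$ = 0.05 it widens by a factor of about 1.20, to roughly 50 ± 4.72.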
Example of Intra-Cluster Correlations
From the British Social Attitudes Survey:
Variable                   ρ      b      DEFT   DEFT if b=10
Household size             0.070  16.6   1.45   1.28
Owner-occupier             0.231  16.5   2.14   1.75
Has telephone              0.102  16.5   1.61   1.38
Asian                      0.334   8.3   1.86   1.53
Roman Catholic             0.037  16.4   1.25   1.15
Not racially prejudiced    0.021   8.4   1.08   1.03
Extra-marital sex wrong    0.044   8.3   1.15   1.08
Dodging VAT is OK          0.021   8.2   1.07   1.04
Example of Intra-Cluster Correlations
Note
$\rho$ is low for attitudinal variables, so design effects are small (DEFT small), but it is large for variables related to ethnicity and housing type.
Thus, the most effective degree of clustering might be greater for an attitude survey (fewer, larger clusters) than for a housing survey.