
Understanding People

Sample Matching
Representative Sampling from Internet Panels
A white paper on the advantages of the sample matching methodology by Douglas Rivers,
Ph.D., founder, President and CEO of YouGovPolimetrix, Inc. and Professor of Political
Science at Stanford University.
285 hamilton avenue suite 200 palo alto ca 94301 T 650.462.8000 F 650.462.8422 www.polimetrix.com

Introduction

Sample matching is a new methodology for the selection of study samples
from pools of opt-in respondents. This methodology addresses the primary
substantive and technical issues of how large, but unrepresentative,
panels can be used to construct representative study samples for
particular target populations. The procedure uses a listing or
enumeration of the population that can be obtained from large scale
consumer and voter databases that have been developed in recent years.
The existence of such data has not been exploited in previous Internet
research. On both a theoretical and a practical level, this approach
substantially improves upon existing weighting procedures. As
validation, we show how this procedure performed in predicting the
outcome of the 2005 California special election.

1. The Web Sampling Problem
Most samples today, whether for phone or the Internet, do not
approximate random samples. In the case of phone surveys, where random
digit dialing (RDD) or random selection from a list is used to select
respondents, typical response rates for media polls or market research
surveys are in the range of 20 percent. As a result, sample selection is
primarily determined by who chooses to respond, not the random selection
mechanism.

In the case of web surveys, most Internet panels do not claim to be
randomly selected. Panel members are recruited by a variety of means
(banner ads, email lists, promotions, and offers) and those who opt in
become the pool of respondents available for sample selection.

A few Internet panels, such as NetRatings and Knowledge Networks, do use
random selection. NetRatings uses RDD to recruit a panel of Internet
users who allow their Web traffic to be monitored. Knowledge Networks
uses RDD to recruit a panel of both existing Internet users and
non-users. Those without home Internet access are provided with an
inexpensive device that allows them to be interviewed on the Internet.
However, both NetRatings and Knowledge Networks have struggled with low
response rates, high costs, and limitations imposed by small panel size.

Sample quality is largely a function of two factors: population coverage
and selection bias. Population coverage refers to the proportion of the
target population that is reachable, while selection bias refers to the
willingness of reachable respondents to complete an interview. It would
be nonsensical, for example, to use an opt-in Internet panel for a study
of non-Internet users, since the panel lacks coverage of that
population. On the other hand, even if a population can be reached by
RDD, sample quality will still be poor if patterns of respondent
cooperation cause selection bias.

1.1 Population Coverage
In the early days of Internet surveys, the primary sampling problem was
the Digital Divide. Internet usage was concentrated in more affluent and
better educated segments of the population, while racial minorities, the
elderly, and women were substantially underrepresented among Internet
users. Today, nearly three quarters of the adult population has access
to the Internet, either at home, work, or school, so that most of the
population is, at least in principle, reachable by the Internet. Usage
rates are lower for African Americans, Latinos, persons with a high
school education or less, and the elderly, but none of these groups is
excluded altogether.

Figure 1.1: Race and Internet Access

Figure 1.1 provides data on Internet access by race, as measured by the
Current Population Survey. Internet usage has grown at about the same
rate in all racial groups. The effect of this growth, however, has been
to substantially reduce (though not eliminate) the degree to which
minority groups are underrepresented among Internet users. In 1997, for
example, whites were more than twice as likely to have Internet access
as blacks and Hispanics. By 2003, whites were only about a third more
likely to have Internet access than blacks. Similar patterns can be
found in other groups. The Digital Divide has diminished substantially
and will largely disappear in the next decade, as the Internet becomes
the vehicle for the delivery of home entertainment and communications
services. Even today, Internet coverage is adequate for most types of
research. The problem is not coverage (who can be reached on the
Internet) but sample selection.

1.2 Selection Bias
Most Internet surveys are not conducted using a random sample of
Internet users. Instead, access panels have been developed from which
samples are selected for individual studies. The properties of these
panels vary depending upon how they were recruited. In this section, we
compare selection biases in Internet surveys with selection biases in
phone surveys.

Different types of people have different propensities for participation
in survey research. These propensities lead to under-representation of
certain groups in both Internet panels and RDD phone samples. In fact,
the degree of under-representation of these groups (except for the
elderly, discussed in more detail below) is not much different in an
opt-in Internet panel than in an unweighted RDD phone sample. Table 1.1
shows the proportion of several difficult to reach groups in national
media polls conducted by one of the national television networks during
2004.

Table 1.1: Unweighted Sample Composition of a National Media Poll

                 Census    Avg. of 11 Surveys    Implied Weight
Blacks           11.0%     7.9%                  1.4
Hispanics        12.4%     4.8%                  2.6
Aged 18-24       12.3%     6.4%                  1.9
HS or less       46.6%     32.7%                 1.4
Postgraduate     8.7%      17.2%                 0.5
Never Married    23.8%     16.2%                 1.5

Table 1.2: Composition of Opt-in Web Panel

                 Web Panel    Internet Users    Census
Blacks           4.3%         9.3%              11.0%
Hispanics        3.3%         7.2%              12.4%
Aged 18-24       8.7%         16.0%             12.3%
Postgraduate     23.3%        14.7%             8.7%
Married          60.4%        55.3%             54.3%
Male             58.8%        48.7%             48.9%

The conclusion to be drawn from these data is not that opt-in Web panels
are representative of any particular population. This is demonstrably
false: people who opt in for taking Web surveys have different
demographics than either the population of all Internet users or the
population of all adults. But the same is true for RDD telephone
samples. In both cases, an appropriate methodology is required to
produce usable samples for individual studies. We will discuss various
solutions to this problem in sections 2 and 3 below.

1.3 The Elderly on the Internet
The Internet is often viewed as a venue for the young. Among the
elderly, there tend to be fewer Internet users and a larger proportion
who express no interest in having Internet access. While both statements
are true, a lesser known fact is that elderly Internet users are much
more likely to participate in web surveys. Therefore, most Internet
panels have an excess of elderly participants, not a shortage.

Of course, the relevant question is not whether a panel has too many or
too few elderly, but whether its elderly participants are representative
or atypical of the elderly population. The evidence suggests that
elderly web survey participants are somewhat different (more affluent
and knowledgeable about technology) but, after controlling for these
factors, similar to elderly phone respondents. The problem of sampling
the elderly using an opt-in Internet panel provides a good illustration
of the issues that a valid sample selection procedure must deal with.
There are usually some characteristics associated with sample selection
that need to be identified to correct sample biases. In many years of
experience with phone surveys, these factors have, for the most part,
been identified and reasonably satisfactory measures developed for
handling them.

1.4 Problems with Phone Samples
The quality of phone samples, however, has been deteriorating for a
variety of reasons. First, cell phones have replaced land lines,
especially among younger age groups. (Over 25 percent of those between
the ages of 18 and 29 are not reachable on land lines.) Because of
regulations on outbound calls to cell phones, this population is no
longer reachable in an RDD phone sample. Phone coverage, which as
recently as five years ago was in excess of 96 percent of the adult
population, now appears to be under 90 percent and will continue to
fall.

Caller ID and answering machines make it harder to contact respondents
as well. In a short field period, it is practically impossible to
contact more than half of the working numbers in an RDD sample. This
pushes overall response rates to well under 50 percent.

Finally, declining cooperation for all types of surveys (including
in-person interviews) has reduced the completion rate among contacted
respondents. The overall response rates are so low that few survey
organizations publish them for phone studies. To some degree, the
growing acceptance of opt-in Internet samples just reflects a
realization that most phone samples are opt-in samples too.

2. Current Practice for Selection and Weighting

2.1 Quota Sampling
By far the most common method for sample selection in consumer market
research is quota sampling. In quota sampling, one defines a set of
groups (e.g., men, women, 18-29 year olds, 30-64 year olds, 65+, etc.)
and specifies how many respondents should be recruited for each group.
Recruitment is then done on an ad hoc basis and any respondents in
excess of the specified quota are turned away.

Needless to say, quota sampling has no basis in sampling theory, since
the survey researcher has almost complete discretion in the selection of
respondents within the cells. In practice, the hard-to-fill quotas are
the last to be filled and often end up
being highly unrepresentative. For example, many phone surveys use
explicit or implicit quotas for gender, since men are more difficult to
reach by phone than women. Different devices, such as asking for a male
respondent first and then, if none are available, accepting a female
respondent, are employed to balance phone samples. The resulting samples
are often very unrepresentative of men, since the available men are less
likely to be employed and often older. Some media organizations have
tried to address this problem by asking first for the youngest male at
home and, if unavailable, then asking for the oldest female. These
procedures also do not produce accurate age distributions within gender
groups.

Quota sampling is a relic of the 1930s and should not be employed in the
twenty-first century. It is, unfortunately, the standard sampling
procedure for most web surveys.

2.2 Raking
For samples that have already been selected, the most popular method of
weighting is the method of raking, also known as rim-weighting, first
proposed by W. E. Deming during the 1940s. In raking, the sample
marginals are forced to match the known population marginals (from a
census or other source) by an iterative procedure. The primary advantage
of raking is that it does not require the joint distribution of the
variables to be known. It has a number of serious disadvantages. First,
if the population marginals are skewed, the iterative weighting
procedure often does not converge. Second, it generally does not find
the correct weighting for combinations of variables. It can be shown
that the implied joint distribution maximizes the entropy over a certain
class of distributions. Since the weighting variables are often expected
to be highly inter-correlated (e.g., race, education, and income), this
is undesirable behavior. Third, and perhaps most important, raking
yields unstable and unreliable estimates when the number of variables
used to weight the sample is large. Which variables are used for
weighting can often have serious implications for survey estimates. The
reliability of these estimates then becomes a subjective judgment about
which variables to use in weighting.

2.3 Cell Weighting
An alternative to raking is cell weighting, where the population is
divided into a set of mutually exclusive and exhaustive categories (or
cells). The sample is then weighted by the ratio of the population
fraction in each cell to the corresponding sample fraction. This is
sometimes called post-stratification. It differs from the usual type of
stratification in that the sample observations in each cell are not a
sample from the corresponding sub-population because of non-response.
The procedure is valid if an ignorability assumption, similar to that
described below, holds: the survey measurements need to be conditionally
independent of non-response given the variables used for
post-stratification.

There are two primary deficiencies of cell weighting. First, if the
weights are large, the estimates can be highly inefficient and unstable.
It is common practice to trim the weights (so, for example, weights are
constrained to lie between, say, 1/2 and 2), but with current phone and
Internet samples, larger weights are often needed to deal with
differential nonresponse. Second, usually the cross-classification of
only a few variables is available, so cell weighting is only applicable
with a small number of variables and categories. This means that the
range of nonresponse problems that can be remedied with cell weighting
is limited.

3. Sample Selection by Matching

3.1 Description of Sample Matching Methodology
Sample matching is a newly developed methodology for selection of
representative samples from non-randomly selected pools of respondents.
It is ideally suited for Web access panels, but could also be used
for other types of surveys, such as phone surveys.

Sample matching starts with an enumeration of the target population. In
other contexts, this is known as the sampling frame, though, unlike
conventional sampling, the sample is not drawn from the frame. For a
study of registered voters, the target population is the set of
registered voters, who are enumerated (with some exceptions) in the
registered voter list. For general population studies, the target
population is all adults, as enumerated (again with some exceptions) in
consumer databases maintained by commercial vendors such as Acxiom,
Experian, and InfoUSA. The development of comprehensive consumer and
voter databases is a relatively recent phenomenon that has important
implications for survey sampling.

Sample selection using the matching methodology is a two-stage process.
First, a random sample is drawn from the target population. We call this
sample the target sample. Details on how the target sample is drawn are
provided below, but the essential idea is that this sample is a true
probability sample and thus representative of the frame from which it
was drawn.

Ideally, we would interview the respondents in the target sample and
conventional sampling theory would describe the properties of the
sample. However, we have no economical way of contacting most members of
the target sample: they have not provided their email addresses to us,
many do not have listed phone numbers, and those who do have listed
numbers may not agree to be interviewed. Therefore, we do not attempt to
interview members of the target sample. Instead, for each member of the
target sample, we select one or more matching members from our pool of
opt-in respondents. This is called the matched sample. Matching is
accomplished using a large set of variables that are available in
consumer and voter databases for both the target population and the
opt-in panel.

The purpose of matching is to find an available respondent who is as
similar as possible to the selected member of the target sample. The
result is a sample of respondents who have the same measured
characteristics as the target sample. Under certain conditions,
described below, the matched sample will have similar properties to a
true random sample. That is, the matched sample mimics the
characteristics of the target sample. It is, as far as we can tell,
representative of the target population (because it is similar to the
target sample).

3.2 Selection of the Target Sample
In explaining the sample matching methodology, it may be helpful to
think of the target sample as a simple random sample (SRS) from the
target population. However, the efficiency of the procedure can be
improved by using stratified sampling in place of simple random
sampling. SRS is generally less efficient than stratified sampling
because the size of population subgroups varies in the target sample.

With stratified sampling, we partition the population into a set of
categories (or strata) that are believed to be more homogeneous than the
overall population. For example, we might divide the population into
race, age, and gender categories. The cross classification of these
three attributes divides the overall population into a set of mutually
exclusive and exhaustive groups or strata. Then an SRS is drawn from
each category and the combined set of respondents constitutes a
stratified sample. If the number of respondents selected in each stratum
is proportional to its frequency in the target population, then the
sample is self-representing and requires no additional weighting. At
YouGovPolimetrix, we usually stratify on race, gender, and age. For
political studies, we also stratify on party registration and region.
For other types of studies, custom strata can be developed.

3.3 The Distance Function
When choosing the matched sample, it
is necessary to find the closest matching respondent in the panel of
opt-ins to each member of the target sample. Various types of matching
could be employed: exact matching, propensity score matching, and
proximity matching. Exact matching is impossible if the set of
characteristics used for matching is large and, even for a small set of
characteristics, requires a very large panel (to find an exact match).
Propensity score matching has the disadvantage of requiring estimation
of the propensity score. Either a propensity score needs to be estimated
for each individual study, so the procedure is automatic, or a single
propensity score must be estimated for all studies. If large numbers of
variables are used the estimated propensity scores can become unstable
and lead to poor samples.

At YouGovPolimetrix, we employ a proximity matching method. For each
variable used for matching, we define a distance function, d(x,y), which
describes how close the values x and y are on a particular attribute.
For numerical characteristics, such as age, years of schooling,
latitude, longitude, income, etc., the distance function is usually just
the absolute value of the difference |x - y|, though, occasionally, we
use the square of the distance to penalize large discrepancies.

The overall distance between a member of the target sample and a member
of the panel is a weighted sum of the individual distance functions on
each attribute. The weights can be adjusted for each study based upon
which variables are thought to be important for that study, though, for
the most part, we have not found the matching procedure to be sensitive
to small adjustments of the weights. A large weight, on the other hand,
forces the algorithm toward an exact match on that dimension.

3.4 Non-response Adjustments
Not all respondents in a matched sample will respond to a survey
invitation. At Polimetrix, we use two procedures to deal with
non-response: multiple matching and re-matching. Instead of selecting a
single match for each member of the target sample, we select multiple
matches. The number of matches is based on an estimated response
probability using a hazard model to estimate the probability that a
panelist responds by the end of the survey field period. The total
number of panelists matched to each member of the target sample is
determined by matching panelists until the expected number of responses
is greater than or equal to one.

Second, we use a second round of matching when respondents begin an
interview. Though the expected number of respondents who arrive for each
target sample element is approximately one, randomness in response
patterns will mean that some target sample elements are matched more
than once and some none at all. The best matching respondent is assigned
to the matching target element if that element has not already been
matched. Otherwise, the responding panelist is compared to the target
sample elements across all open studies and assigned to the closest
matching respondent using a priority assignment algorithm. This
minimizes the number of respondents who are turned away (because a match
has already been found) and ensures the most accurate matches possible.

3.5 Statistical Theory
The intuition behind sample matching is clear: if respondents who are
similar on a large number of characteristics tend to be similar on other
items for which we lack data, then substituting one for the other should
have little impact upon the sample. Can this intuition be made rigorous?
The answer is yes, as we describe below.

The theoretical conditions that guarantee the validity of sample
matching are quite technical, but their content is easily understood.
There are three main assumptions:

Assumption 1: Ignorability
Panel participation is assumed to be ignorable with respect to the
variables measured by survey conditional upon the variables used for
matching. What this means is that if we examined panel participants and
non-participants who have exactly the same values of the matching
variables, then on average there would be no difference between how
these sets of respondents answered the survey. This does not imply that
panel participants and nonparticipants are identical, but only that the
differences are captured by the variables used for matching. Since the
set of data used for matching is quite extensive, this is, in most
cases, a plausible assumption.

Assumption 2: Smoothness
The expected value of the survey items given the variables used for
matching is a "smooth" function. Smoothness is a technical term meaning
that the function is continuously differentiable with bounded first
derivative. In practice, this means that the expected value function
doesn't have any kinks or jumps.

Assumption 3: Common Support
The variables used for matching need to have a distribution that covers
the same range of values for panelists and non-panelists. More
precisely, the probability distribution of the matching variables must
be bounded away from zero for panelists on the range of values (known as
the "support") taken by the non-panelists. In practice, this excludes
attempts to match on variables for which there are no possible matches
within the panel. For instance, it would be impossible to match on
computer usage because there are no panelists without some experience
using computers.

Under Assumptions 1-3, it can be shown that if the panel is sufficiently
large, then the matched sample provides consistent estimates for survey
measurements. The sampling variances will depend upon how close the
matches are if the number of variables used for matching is large, but
Monte Carlo evidence indicates that these adjustments are usually small.
The key issues for an application are whether the variables used for
matching are adequate controls for panel participation effects and, if
they are, whether the panel is large enough to permit close matches.
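
The proximity matching of section 3.3 and the multiple matching of
section 3.4 can be illustrated with a small sketch. It is not the
YouGovPolimetrix implementation: the attribute names, the weights, and
the per-panelist response probabilities are hypothetical (the paper
estimates response probabilities with a hazard model, which is replaced
here by a fixed number per panelist), only numerical attributes are
handled, and one target member is matched at a time.

```python
def attribute_distance(x, y, squared=False):
    """Distance on one numerical attribute: |x - y|, or its square."""
    diff = abs(x - y)
    return diff * diff if squared else diff


def overall_distance(target, panelist, weights):
    """Weighted sum of the per-attribute distances."""
    return sum(
        weight * attribute_distance(target[name], panelist[name])
        for name, weight in weights.items()
    )


def select_matches(target, panel, weights):
    """Take the closest panelists until the expected number of
    responses reaches one (or the panel is exhausted)."""
    ranked = sorted(panel, key=lambda p: overall_distance(target, p, weights))
    chosen, expected_responses = [], 0.0
    for panelist in ranked:
        chosen.append(panelist)
        # Hypothetical stand-in for a hazard-model estimate.
        expected_responses += panelist["p_respond"]
        if expected_responses >= 1.0:
            break
    return chosen


# Hypothetical target sample member and opt-in panel.
target = {"age": 40, "years_school": 12}
panel = [
    {"id": "a", "age": 42, "years_school": 12, "p_respond": 0.6},
    {"id": "b", "age": 25, "years_school": 16, "p_respond": 0.9},
    {"id": "c", "age": 39, "years_school": 14, "p_respond": 0.5},
]
weights = {"age": 1.0, "years_school": 2.0}  # illustrative weights

matches = select_matches(target, panel, weights)
print([p["id"] for p in matches])
```

Raising the weight on one attribute makes any mismatch on it dominate
the sum, which is how a large weight pushes the selection toward an
exact match on that dimension.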

Table 3: Survey Accuracy in 2005 California Special Election

               Polimetrix Final Survey        Election Outcome
Proposition    Yes     No      Undecided      Outcome    Error
73             43%     54%     2%             47.4%      -3.1%
74             45%     52%     3%             45.1%      1.3%
75             48%     49%     3%             46.7%      2.8%
76             40%     56%     3%             38.0%      3.7%
77             41%     52%     6%             40.6%      3.5%
78             33%     55%     13%            41.5%      -4.0%
79             38%     46%     16%            39.0%      6.2%
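
The paper does not state how the Error column of Table 3 is computed.
As a hedged reading, the values appear consistent with taking the Yes
share among decided respondents, Yes / (Yes + No), and subtracting the
Outcome figure, assuming Outcome is the Yes percentage of the vote. The
sketch below checks that assumption against the table; the names are
illustrative.

```python
# (Yes %, No %, Outcome %) per proposition, copied from Table 3.
table_3 = {
    73: (43, 54, 47.4),
    74: (45, 52, 45.1),
    75: (48, 49, 46.7),
    76: (40, 56, 38.0),
    77: (41, 52, 40.6),
    78: (33, 55, 41.5),
    79: (38, 46, 39.0),
}


def yes_share_of_decided(yes, no):
    """Yes percentage after dropping undecided respondents (assumed)."""
    return 100.0 * yes / (yes + no)


# Assumed definition of Error: survey Yes share of decided minus outcome.
errors = {
    prop: round(yes_share_of_decided(yes, no) - outcome, 1)
    for prop, (yes, no, outcome) in table_3.items()
}

for prop, err in errors.items():
    print(f"Proposition {prop}: {err:+.1f}")
```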

4. Validation of Sample Matching

2005 California Special Election
During the 2005 California special election, YouGovPolimetrix released
survey estimates of the proportion of voters intending to vote for and
against seven propositions on the ballot. These estimates were contained
in press releases that were published with several public sources (the
National Journal's Hotline, www.realclearpolitics.com and
www.pollingreport.com). The outcome of all seven propositions was
correctly predicted (a record matched by only one other polling
organization) and the root mean square error was 3.0% (only slightly
larger than what would be expected from random sampling).

While one (or even seven) estimates do not prove that the methodology
works, these results are very encouraging. In an election in which a
number of phone and other Internet surveys provided very misleading
estimates, sample matching performed very well.

Summary
Most samples today, whether for phone or the Internet, do not even
roughly approximate random samples. The primary sampling problem that
researchers face is one of sample selection.

Most Internet surveys are not conducted using a random sample of
Internet users. Rather, most employ access panels from which samples are
selected for individual studies. By far the most common method for
sample selection in consumer market research (both online and offline)
is quota sampling. Quota sampling has no basis in sampling theory, since
the survey researcher has almost complete discretion in the selection of
respondents within the cells. Additionally, hard-to-fill quota cells
often end up being highly unrepresentative.

Post-selection stratification methods such as raking can be influenced
by which variables are used for weighting and can often have serious
implications for survey estimates. Cell weighting also suffers from the
fact that only a few variables can be used to weight results and can
become highly unstable when large weights are used.

Sample matching is a newly developed methodology for selection of
representative samples from non-randomly selected pools of respondents.
Sample matching results in a sample of respondents who have similar
properties to a true random sample. That is, the matched sample mimics
the characteristics of the target sample.

A number of side-by-side comparisons of matched samples against other
offline and online samples show this new sampling method to be stable
and highly accurate.
