Journal of Advertising
ISSN: 0091-3367 (Print) 1557-7805 (Online) Journal homepage: http://www.tandfonline.com/loi/ujoa20
An Analysis of Data Quality: Professional Panels, Student Subject Pools, and Amazon's Mechanical Turk
Jeremy Kees, Christopher Berry, Scot Burton & Kim Sheehan
To cite this article: Jeremy Kees, Christopher Berry, Scot Burton & Kim Sheehan (2017) An Analysis
of Data Quality: Professional Panels, Student Subject Pools, and Amazon's Mechanical Turk, Journal
of Advertising, 46:1, 141-155, DOI: 10.1080/00913367.2016.1269304
To link to this article: https://doi.org/10.1080/00913367.2016.1269304
Published online: 23 Jan 2017.

Journal of Advertising, 46(1), 141-155
Copyright © 2017, American Academy of Advertising
ISSN: 0091-3367 print / 1557-7805 online
DOI: 10.1080/00913367.2016.1269304
An Analysis of Data Quality: Professional Panels, Student Subject Pools, and Amazon's Mechanical Turk
Jeremy Kees
Villanova University, Villanova, Pennsylvania, USA
Christopher Berry and Scot Burton

University of Arkansas, Fayetteville, Arkansas, USA
Kim Sheehan
University of Oregon, Eugene, Oregon, USA
Data collection using Internet-based samples has become increasingly popular in many social science disciplines, including advertising. This research examines whether one popular Internet data source, Amazon's Mechanical Turk (MTurk), is an appropriate substitute for other popular samples utilized in advertising research. Specifically, a five-sample between-subjects experiment was conducted to help researchers who utilize MTurk in advertising experiments understand the strengths and weaknesses of MTurk relative to student samples and professional panels. In comparisons across five samples, results show that the MTurk data outperformed panel data procured from two separate professional marketing research companies across various measures of data quality. The MTurk data were also compared to two different student samples, and results show the data were at least comparable in quality. While researchers may consider MTurk samples as a viable alternative to student samples when testing theory-driven outcomes, precautions should be taken to ensure the quality of data regardless of the source. Best practices for ensuring data quality are offered for advertising researchers who utilize MTurk for data collection.

Address correspondence to Jeremy Kees, Villanova University, School of Business, Department of Marketing, 800 Lancaster Avenue, Bartley Hall, Room 3006, Villanova, PA 19085. E-mail: jkees@villanova.edu
Jeremy Kees (PhD, University of Arkansas) is a professor, Department of Marketing, Villanova University School of Business, Villanova University.
Christopher Berry (MBA, University of Tennessee at Chattanooga) is a doctoral candidate, Department of Marketing, Sam M. Walton College of Business, University of Arkansas.
Scot Burton (PhD, University of Houston) is Distinguished Professor and Tyson Chair in Food and Consumer Products Retailing, Department of Marketing, Sam M. Walton College of Business, University of Arkansas.
Kim Sheehan (PhD, University of Tennessee at Knoxville) is a professor, School of Journalism and Communication, University of Oregon.
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/ujoa.

Advances in digital technology over the past decade have created opportunities for social science researchers to drastically reduce the time and cost of collecting primary data. As society's reliance on digital technology has grown, new opportunities to collect data from people all over the world have become available. These samples from new data sources are being used increasingly more often in academic research. For example, recent innovations have given researchers the ability to collect data using crowdsourcing platforms, such as Amazon's Mechanical Turk (MTurk), which have afforded researchers the luxury of easy access to diverse nonstudent populations at a fraction of the cost of traditional panel data. As new data collection methodologies emerge, however, questions arise as to the validity and quality of the data obtained using these new sources.

While data sourced from student subject pools, professional panels, and MTurk may all have certain advantages, researchers must weigh the perceived trade-offs from the different data sources. For instance, student samples have long been popular in advertising research due to low cost and ease of access. However, the most apparent drawback from student samples is the lack of external validity, which often prevents the results from being generalized to a broader population. Samples from professional panels address many of the external validity concerns of student samples but are much more expensive to obtain. In contrast, MTurk samples are inexpensive and easy to obtain. However, the quality of MTurk data (relative to student and professional panel data) has not been specifically addressed in the context of advertising experiments. Therefore, the purpose of this research is to test differences between samples on a number of issues related to overall data quality in the context of an advertising experiment designed to test theory-driven outcomes.

In this article, we examine whether MTurk samples are a viable substitute for more traditional samples used for conducting academic advertising research studies. To do this, we
conduct an identical between-subjects advertising experiment using five samples (two student samples, two samples from professional research companies, and an MTurk sample). Across the five samples, we examine comparisons of data quality through various measures including the reliability of multi-item measures, effects of manipulations, responses to attention checks, participants' response involvement, tests of hypothesized predictions, and other measures related to response quality. We offer conclusions related to the relative performance across the different samples. These findings are especially relevant for researchers who utilize MTurk as a data source for advertising experiments.

LITERATURE REVIEW

Data Sources in Advertising Experiments
Advertising researchers, like researchers in other fields, often rely on obtaining data from nonprobability samples, including convenience samples. Convenience samples consist of consumers who are easily accessible rather than consumers who are randomly selected from the entire population of interest. In a convenience sample, then, many members of a target population have no chance of being selected, and the extent to which a convenience sample actually represents the entire population cannot be known (Schuman and Kalton 1985). Until recently, advertising research generally has utilized two main sources of convenience samples: student subject pools and professional panel data. Student samples are popular in numerous social science disciplines and are generally acceptable in many respected journals, including the Journal of Advertising. Because the availability of resources often influences the selection of research participants (Sackett and Larson 1990), one of the most popular reasons researchers use student samples in advertising research is the low cost of these data. The most often cited and most significant drawback of student samples is related to external validity (Peterson and Merunka 2014). Findings from studies using student samples are limited in that these younger, geographically constrained, and relatively well-educated participants may not always be generalizable to broader populations of nonstudent adults.

To overcome such limitations of student samples, many advertising researchers use Internet panels to obtain nonstudent samples that are presumably more diverse and more representative of the general population. However, studies using professional panel data are also more expensive than student data. The cost of panel data can vary significantly depending on the sampling criteria. General convenience samples can be obtained through professional panels for less than $4 per completed survey. However, these samples can become much more expensive if the researcher seeks a quota sample (with proportions of individuals equal to the population in terms of specified demographics or characteristics) or a truly random probability sample.¹ In addition to cost, panel data has several other practical drawbacks, including slower data collection and significant administrative time costs for the researcher to coordinate the data collection with the panel company. Therefore, many advertising researchers are beginning to consider MTurk as a source for obtaining convenience samples, because this source potentially overcomes some of the external validity issues related to the limited age, income, and education of student samples yet avoids the high monetary and administrative time costs associated with panel data.

Amazon's MTurk offers a specific mechanism for obtaining a convenience sample of demographically diverse, geographically dispersed, nonstudent participants for advertising research (Sheehan and Pittman 2016). MTurk is a crowdsourcing marketplace that provides (1) "Workers" with thousands of Human Intelligence Tasks (HITs) (e.g., audio transcribing, image tagging, surveys) they can complete for pay at their convenience and (2) "Requesters" (e.g., businesses, developers, academics) access to an on-demand scalable workforce. Researchers from many disciplines have learned to leverage this diverse, nonstudent, adult population to participate in their studies. In 2007, more than 100,000 people globally had registered as Workers, with that number growing to half a million by 2014 (Marvit 2014). In an average day, there are more than 430,000 HITs available on MTurk, where academic studies account for less than 10% of all MTurk HITs offered at any given time (Sheehan and Pittman 2016).

The Use of Amazon's MTurk in Advertising Research
Advertising experiments often rely largely on convenience samples from student subject pools, online panels, or, more recently, crowdsourcing platforms such as MTurk. This trend is evident in articles published in the Journal of Advertising. The Journal of Advertising published 163 articles from 2011 to 2015. Contained in these articles are 284 studies involving human subjects. A surprising 160 (56%) of these studies utilized student samples compared to only 71 (25%) using nonstudent adult respondents (for the remaining 53 studies, it is unclear who participated in the studies). In contrast, consider 2016's issue 2 of the Journal of Advertising: These published articles that used human subjects included nine studies (in five articles) that used student samples and eight studies (in five articles) that used nonstudent samples. Of those eight studies using nonstudent samples, five studies (in three articles) featured samples from MTurk. Indeed, as shown in Figure 1, while Journal of Advertising studies utilizing student samples and samples from nonstudent panels and field studies have remained relatively constant over the past five years, there has been a recent increase in the number of studies that have utilized MTurk as a source of data. Articles published in the Journal of Advertising using MTurk samples have covered an array of topics, including sustainable marketing and social media (Minton et al. 2012), nostalgic advertising (Muehling, Sprott, and Sultan 2014), brand placements in music (Ferguson
and Burkhalter 2015), and health advertising (Kareklas, Muehling, and Weber 2015; Newton, Wong, and Newton 2015; Pounders, Lee, and Mackert 2015).

While some Journal of Advertising articles have utilized MTurk in conjunction with other samples, none have directly examined the consistency of results or potential differences in data quality across samples. Specifically, of the 163 articles represented in Figure 1, only four articles reported using MTurk in conjunction with data from other sources (Kulkarni and Yuan 2015; Leonhardt, Catlin, and Pirouz 2015; Minton et al. 2012; Mohanty and Ratneshwar 2015). While results seem to appear consistent across the samples in these articles, these studies do not directly examine the differences in the results, compare the quality of data across samples, or directly examine the effects of the experimental manipulations on manipulation checks and theory-driven outcomes across samples. Yet a number of experiments outside of the advertising literature have examined the quality of MTurk data or considered the consistency of results across sample sources. An overview of these experiments is provided in Table 1.

As shown in Table 1, several experiments in other social science disciplines have examined various aspects of MTurk data. Although these studies employ experimental designs, they are substantially different from the types of experiments generally conducted in advertising. For example, some include games used in behavioral economics (e.g., Amir, Rand, and Gal 2012), and others replicate popular psychology tasks (e.g., Crump, McDonnell, and Gureckis 2013). Therefore, none of these experiments directly examine potential differences in the effectiveness of ad-based manipulations on manipulation checks or theory-driven outcomes across samples. Furthermore, none of these studies include continuous measured moderators, which are common in advertising research, none examine study completion time, and none compare results obtained from professional panel data to MTurk data. In addition, only a few of these studies examine responses to attention checks and the reliabilities of multi-item measures. The studies that do use attention checks do not directly compare results across different samples. Therefore, a direct comparison of the quality of data and the effects on theory-driven outcomes across samples often used in advertising experiments would be useful for advertising researchers who use, or are considering use of, MTurk.

Although nonexperimental research is not included in Table 1, a number of survey-based studies worth noting have been conducted to examine the sample composition and/or data quality of MTurk samples relative to other survey platforms. Some large-scale, survey-based studies have found that MTurk respondents are not too different from respondents on other survey platforms (Huff and Tingley 2015), but others have identified some differences in age, education, gender, and income (Maynard 2014; Peer et al. 2015). However, a survey of sample composition across MTurk, Qualtrics, and SurveyMonkey samples found that none of the samples approximated the U.S. population profile (Heen, Lieberman, and Miethe 2014). With regard to data quality, some research suggests that MTurk respondents may be prone to multitasking (Hauser and Schwarz 2016) and fail attention checks at a higher rate than other samples (Goodman, Cryder, and Cheema 2013). Yet other studies have shown that MTurk respondents pass attention checks at the same level as other pools of research participants (Roulin 2015) and that MTurk respondents are often more attentive (Kung et al. 2016). Thus, in the current research, we examine multiple aspects related to the data quality of MTurk samples, and we extend prior survey-based research to an advertising experiment with manipulations of advertising stimuli, ad manipulation checks, and measured moderators used to test hypotheses derived from theory.
[Figure: "Published JA Studies 2011 - 2015"; x-axis: 2011 to 2015; y-axis scale: 0 to 40; series: Student Sample, (Nonstudent) Panel or Field Sample, MTurk Sample, Undefined]
FIG. 1. Samples used in Journal of Advertising articles from 2011 to 2015.
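
As a quick arithmetic check on the aggregate counts reported in the text for 2011 to 2015 (284 studies: 160 student, 71 nonstudent adult, 53 unclear), the shares can be tallied as in the sketch below. The variable names are hypothetical and chosen only for illustration, and the per-year values plotted in Figure 1 are not reproduced because they are not given numerically.

```python
# Aggregate study counts reported in the text for 2011-2015.
# Dictionary keys are illustrative labels, not the authors' coding scheme.
study_counts = {"student": 160, "nonstudent adult": 71, "unclear": 53}

# Total number of studies involving human subjects.
total = sum(study_counts.values())

# Share of each sample type, rounded to the nearest whole percent.
shares = {label: round(100 * n / total) for label, n in study_counts.items()}

print(total)
print(shares)
```

The rounded student and nonstudent shares agree with the 56% and 25% stated in the text, and the three counts sum to the 284 studies reported.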

Overview of the Current Research
Table 1 shows that currently no research directly compares the quality of data obtained from MTurk to other convenience samples and panel data in the context of an advertising experiment. Further, as suggested by Table 1, advertising contexts differ from many psychology, economics, and political science experiments and even cross-sectional studies. Specifically, higher levels of attention may be required due to detailed manipulations embedded in advertising stimuli, manipulation checks associated with the ad treatments, and various dependent variables in which reliabilities are critical. Insufficient levels of attention for ad experiments may result in poor data quality, which may differ in comparison to other types of experiments not involving ad manipulations (and differ in comparison to cross-sectional studies). To address this gap in the literature, we replicate a published advertising experiment utilizing different samples collected using five different sources of data: MTurk workers, two student samples, and two online samples using two different professional panel providers. Using this advertising experiment, we address several research questions regarding data quality and related measures for the performance of MTurk participants compared to student samples and participants from professional panels.

RQ1: Are there differences between these samples on reliabilities for dependent measures and an enduring psychological moderator, and do checks on the effectiveness of ad stimuli manipulations differ across these samples?

RQ2: How do MTurk workers compare to other samples in regard to participant response involvement and performance on attention checks embedded in the survey?

RQ3: How different are MTurk participants in terms of levels of research participation, general computer knowledge, and survey completion time relative to other sample sources?

RQ4: Do theory-derived effects on dependent measure outcomes related to the advertising experiment differ across samples?

We summarize similarities and differences across each of the five samples and offer some suggestions for advertising researchers conducting experiments in online environments.

METHODS

Research Design
The current study followed the design of research previously published in the Journal of Advertising (Kees, Burton, and Tangari 2010) that manipulated the framing of an advertisement for pursuing a health-related goal and included the measured moderator of consideration of future consequences (CFC).² CFC is a multi-item scale that measures consumers' time-related orientation concerning future consequences. CFC is viewed as a stable consumer attribute, and its measure assesses "the extent to which people consider the potential distant outcomes of their current behaviors and the extent to which they are influenced by these potential outcomes."

We chose this specific study because of its use of a relatively straightforward experimental design, predicted effects based on theoretical rationale for both a manipulated variable and a measured enduring construct (CFC), use of manipulation checks, and use of several multi-item measures that varied in difficulty level for study participants. The original study employed a student sample; in contrast, the current study compares a convenience sample of MTurk participants to convenience samples of two (new) student samples, a convenience sample of adults collected by Qualtrics,³ and a sample that is marketed as reflective of the online U.S. population (for adults 18 years of age or older) collected by Lightspeed, a respected professional panel provider.

In the current study, we randomly assigned participants to ad conditions, and participants from all of the samples completed the identical survey. Participants in the MTurk sample were recruited through Amazon's MTurk and received $0.75 for participation in the study (n = 163). The student samples were recruited from classes with junior- and senior-level business students, and all students received extra course credit for their participation. The first group of student participants (n = 168) was sent a link to a survey and given several days to respond at any location or time they desired. The second group of students (n = 146) responded to the identical study, but these participants did so in a computer laboratory environment that had a lab administrator nearby in the room. The participants in the Qualtrics convenience sample (n = 193) received points that can be accumulated and redeemed for prizes for their participation. The final sample (n = 179) was procured from Lightspeed. It was promoted as a probability sample representative of online U.S. consumers balanced on various demographic variables (e.g., age, gender, household income, ethnicity). As an incentive for participating in the study, these panel participants also received points that can be pooled and later redeemed for prizes.

Sample
Table 2 provides demographic information for the overall sample of 849 participants as well as the demographic information for each of the five samples. As would be expected, the MTurk sample was older, more educated, and had higher personal income levels than both of the student samples (all ps < .001). Compared to the Qualtrics sample, the MTurk sample was younger (p < .001) and more educated (p < .01); however, personal income was not significantly different between the two samples. Compared to the Lightspeed sample, the MTurk sample was younger (p < .001) and had lower personal income (p < .001), but education level did not differ significantly between the two samples.
TABLE 2
Sample Characteristics

                        Overall Sample  MTurk Sample       Student Sample (Lab)  Student Sample (Link)  Qualtrics Sample  Lightspeed Sample
                        (n = 849)       (n = 163)          (n = 168)             (n = 146)              (n = 193)         (n = 179)
Age (years)             36.8            38.9               21.1                  21.2                   49.9              50.0
Personal income (USD)   10,000-19,999   30,000-39,999      < 10,000              < 10,000               30,000-39,999     50,000-74,999
Education               Some college    Four-year college  Some college          Some college           Some college      Four-year college
Male (female)           47% (53%)       50% (50%)          57% (44%)             40% (60%)              36% (64%)         52% (48%)
English                 93%             99%                88%                   93%                    98%               89%
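
Percentages such as those in the English row of Table 2 can be compared across two samples with a two-proportion z test. The sketch below is a standard pooled-variance version and is offered only as an illustration: the article does not spell out how its z values were computed, the function name is hypothetical, and because the Table 2 percentages are rounded, the output should not be expected to reproduce the reported statistics exactly.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic with a two-sided p value.

    p1, p2 are sample proportions; n1, n2 are the sample sizes.
    (Hypothetical helper; not taken from the article.)
    """
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference under the null.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Rounded Table 2 values: MTurk (99%, n = 163) vs. Lightspeed (89%, n = 179).
z, p = two_proportion_z(0.99, 163, 0.89, 179)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The same call can be repeated for each student sample by substituting its percentage and sample size from Table 2.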
In addition, the MTurk sample was equally divided between males and females, whereas the Lightspeed sample contained more males, and the Qualtrics sample contained more females. Finally, a question was included asking if English was the primary language spoken in the participants' households. Interestingly, English was the primary language spoken in the households of nearly the entire MTurk sample, whereas this percentage was lower for both student samples and the Lightspeed sample (z values ranged from 3.14 to 4.34, ps < .01 for each).

Measures of Data Quality
A key issue for the study was the various measures of data quality related to participants' performance on the survey. To evaluate data quality, three broad categories of measures were included: measures of involvement, attention checks, and measures of research participation and general computer knowledge. Measures of involvement included total time (in minutes) to complete the study, two open-ended items for which the number of characters in the open-ended response was captured, a two-item measure of ad processing involvement, and a two-item measure of multitasking. Specifically, the open-ended items were (1) "What does the ad say or suggest to you about body weight management?" and (2) "Please describe what you have eaten in the past 24 hours." Ad processing involvement was measured with two 7-point semantic differential scale items (r = .90, p < .001; i.e., "How involved were you in processing the information in the advertisement?" with end points of Skimmed it quickly/Read it very quickly and Paid little attention/Paid a lot of attention). This ad processing involvement measure was drawn from prior research (Kees, Burton, and Tangari 2010). Multitasking was measured using 7-point Likert scale items: "While answering the questions in this survey, I was multitasking."

In addition to the measures of participant involvement, four attention check questions were included that have been recommended as metrics for respondent data quality (Goodman, Cryder, and Cheema 2013; Oppenheimer, Meyvis, and Davidenko 2009; Smith et al. 2016). These four measures were dispersed throughout the questionnaire. Three of the attention check questions were 7-point Likert scale questions: (1) "The Sun rotates around the Earth," (2) "Obama was the first American president," and (3) "I have never heard of Facebook" (Smith et al. 2016). For each of these (incorrect) attention check measures, a response of Strongly disagree (coded as a 1) would indicate that the participant was attentive to the question. The fourth attention check measure was a modified instructional manipulation check (IMC). In general, IMCs are designed to test whether participants are carefully reading the instructions that they are given as part of a study (Oppenheimer, Meyvis, and Davidenko 2009). In this study, the IMC was modified from prior research to be relevant in this specific advertising context (Goodman, Cryder, and Cheema 2013). As part of this IMC, participants were first given the following instructions: "Research shows that people, when answering questions, prefer not to pay attention and minimize their effort as much as possible. If you are reading this question, please select 'none of the above' on the next question." On the same page and just below these instructions, participants were asked: "What was this study about?" with response options of Advertisements, Fast food, Managing body weight, and None of the above. This particular IMC is considered to be relatively difficult because several of the (incorrect) response options are related to the experiment and the specific ad stimuli (Goodman, Cryder, and Cheema 2013).

After participants completed the measures relevant to the experiment, but before they provided their demographic information, participants responded to measures regarding their prior research participation, the influence of compensation on their willingness to participate in research, and their overall level of computer knowledge. Specifically, they were asked to indicate the approximate number of research studies and advertising studies they participated in during the past month. The influence of compensation on likelihood of participating in this research was measured with two 7-point items (r = .94, p < .001): "How likely would you have been to respond to
this survey if you were receiving no type of compensation?" with things might be in the future, and try to influence those things
end points of Not likely at all/Very likely and Not probable at with my day-to-day behavior" and "I am willing to sacrifice my
all/Very probable. Level of computer knowledge was measured immediate happiness or well-being in order to achieve future
with a semantic differential scale item: "How knowledgeable outcomes" (a = .79), and it also included both positively and
about computers do you consider yourself to be?" with endpoints negatively worded items.
of Not knowledgeable/Extremely knowledgeable .
RESULTS
Results for Multi-Item Scale Reliabilities

Measures Used for Tests of Predictions Table 3 shows the coefficient as for all multi-item measures
All multi-item measures were taken directly from those used for the entire sample as well as for each of the five samples
in the original study (see Kees, Burton, and Tangari 2010, p. 23). individually. As shown in Table 3, all as were acceptable across
Primary dependent measures included attitude toward the ad the MTurk and the two student samples (as > .70). In general,
(Aad) and perceived risk. The measure of overall Aad used a while there are a number of significant differences in a levels
measure adapted from Chandran and Menon (2004) and had (Feldt 1969), most of the reliability estimates were satisfactory
participants rate the ad on three 7-point semantic differential across all of the samples. The critical exceptions were the as for
items anchored with Negative/Positive, Unfavorable/Favorable, the manipulation check, which were much lower for the Qualtrics
and Bad/Good. Risk perception was measured using four related (a = .58) and Lightspeed samples (a = .48) than the other three
two-item Likert scales consistent with the specific goal-framing samples (as ranging from .76 to .79). Thus, the manipulation
manipulation in the ad message (e.g., "Failing to consume check exhibited an unacceptable reliability level for the two most
healthy foods (e.g., fruits, vegetables, and whole grains) as a expensive sample sources. While this result, along with
regular part of my diet, will put me at risk for poor health"; differences in reliabilities in other established measures discussed
"Consuming unhealthy foods (e.g., foods high in saturated fat and above, seems to indicate higher levels of attention for
.....• sugar) as a regular part of my diet will put me at risk for poor respondents in the MTurk and student samples, inconsistent
N health"). The manipulation check for the framing manipulation reliabilities for manipulation check variables also could possibly
er,
was drawn from prior research (Lee and Aaker 2004) and be an indicator of a problem with the manipulation check
consisted of four 7-point items anchored by Strongly measure. In addition, the a for CFC (with its mix of positively
disagree/Strongly agree. For example, participants reported the and negatively worded items) was significantly higher for the
degree to which the advertisement highlighted issues such as MTurk sample (.89) than for the other samples (as ranging
"Eating healthy foods such as fruits and vegetables" and from .71 to .76). As shown in Table 3, the as for the MTurk
"Avoiding unhealthy foods such as fat and sugars" (reverse sample were equal to or higher than all other samples for three of
scored). the four multi-item measures.
The measures of the manipulation check and the risk measure
were considered more challenging (they included reverse coded
items) than the standard Aad measure, and this was reflected in Results for Data Quality
the coefficient alpha reliability estimates. For Aad, the a across For the ad experiment conducted, we examined a variety of
the five samples was .96; while for risk and the check measures, measures of the participants' performance and overall response
the as were .82 and .70, respectively. Example items for the 12- quality. For any academic advertising experiment in which a
item CFC scale include "I consider how latent construct is manipulated, researchers need to
TABLE 3
Reliabilities of Multi-Item Measures

                                      Overall  MTurk           Student Sample  Student Sample  Qualtrics       Lightspeed
                                      Sample   Sample (a)      (Lab) (b)       (Link) (c)      Sample (d)      Sample (e)
Consideration of future consequences  .79      .89b,c,d,e      .71a            .76a            .76a            .76a
Perceived risk                        .82      .79b,d,e        .84a,c          .73b,d,e        .84a,c          .8?a,c
Attitude toward the ad                .96      .98b,c,d        .94a,d          .95a,e          .96b,e          .98b,c,d
Manipulation check                    .70      .79d,e          .76d,e          .79d,e          .58a,b,c        .48a,b,c

Note. Superscripts indicate significant differences between coefficient α reliabilities (Feldt 1969).
Consideration of future consequences, perceived risk, and manipulation check all had positively and
negatively worded (recoded) items.
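
The Table 3 note cites Feldt (1969) for the comparison of coefficient α values. As a minimal sketch of how such a comparison can be set up for two independent samples, the ratio W = (1 - α2) / (1 - α1) is referred to an F distribution with (N1 - 1, N2 - 1) degrees of freedom. The function name `feldt_w`, the choice of which sample goes in which position, and the use of the sample sizes from the Table 4 column headers are assumptions made for illustration; the article does not report the W values themselves.

```python
def feldt_w(alpha_1: float, n_1: int, alpha_2: float, n_2: int):
    """Return Feldt's W ratio and the F degrees of freedom it is referred to.

    W = (1 - alpha_2) / (1 - alpha_1), compared against F(n_1 - 1, n_2 - 1).
    Hypothetical helper for illustration, not the authors' code.
    """
    if alpha_1 >= 1.0 or alpha_2 >= 1.0:
        raise ValueError("coefficient alpha must be below 1")
    w = (1.0 - alpha_2) / (1.0 - alpha_1)
    return w, (n_1 - 1, n_2 - 1)


# CFC reliabilities from Table 3 (MTurk vs. student lab sample);
# sample sizes are assumed to be those in the Table 4 column headers.
w, df = feldt_w(alpha_1=0.89, n_1=163, alpha_2=0.71, n_2=168)
print(f"W = {w:.2f} on {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.f.sf(w, *df)` would give the upper-tail probability for this ratio.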
have study participants perform adequately on a manipulation check measure. To examine whether the
manipulation was effective and whether the effect of the ad framing manipulation was consistent
across the five different samples, a two-way factorial analysis of variance (ANOVA) was performed.
Overall results indicate that the framing manipulation was effective (F (1, 839) = 525.30,
p < .001) such that significant differences were found between the eager (M = 5.17) and vigilant
means conditions (M = 3.50). However, the effect of framing on the manipulation check was
qualified by the significant framing × sample interaction (F (1, 839) = 15.42, p < .001). Although
the effect of the framing manipulation was significant for each of the five samples (p < .001 for
each sample), the interaction indicates that the size of the effect varied across samples.
Specifically, the mean differences between the manipulated eager and vigilant means conditions
were all greater than 1.98 for the students in the lab (M_Eager = 5.53 and M_Vigilant = 3.22),
students sent a link (M_Eager = 5.43 and M_Vigilant = 3.46), and MTurk samples (M_Eager = 5.35 and
M_Vigilant = 3.15). In contrast, the mean differences between conditions were much lower for the
professional samples: 1.09 for the Qualtrics (M_Eager = 4.89 and M_Vigilant = 3.80) and 0.85 for
the Lightspeed samples (M_Eager = 4.66 and M_Vigilant = 3.81).

Other results for measures of participants' performance and data quality across samples are shown
in Table 4. To examine potential differences in the data quality of the five samples, we performed
one-way ANOVAs. The ANOVA results reveal significant differences in time taken to complete the
study, the number of characters used to respond to the open-ended questions, level of ad
processing involvement, and reported multitasking (Fs (4, 844) ranged from 2.80 to 68.95,
ps < .05 for all). As shown in Table 4, the MTurk participants took less time to complete the
study than the Lightspeed sample (p < .01) yet used more characters to answer the open-ended
questions than the Qualtrics and
TABLE 4
Data Quality Measures across the Five Samples

                                  Grand   F         MTurk          Student Sample Student Sample Qualtrics      Lightspeed
                                  Mean    Value     Sample         (Lab)          (Link)         Sample         Sample
                                                    (n = 163) (a)  (n = 168) (b)  (n = 146) (c)  (n = 193) (d)  (n = 179) (e)
Measures of participant involvement
Time (in minutes)                 16.19   2.80*     11.4e          13.0e          16.0           14.9e          25.1a,b,d
Open-end 1 (no. of characters)    90.22   24.24**   111.9c,d,e     116.1c,d,e     96.5a,b,d,e    74.4a,b,c,e    58.2a,b,c,d
Open-end 2 (no. of characters)    91.06   9.38**    102.7d,e       119.3c,d,e     85.8b          81.7a,b        68.3a,b
Ad processing involvement†        5.78    68.95**   6.60b,c,d,e    5.02a,d,e      4.7?a,d,e      6.2?a,b,c      6.07a,b,c
Multitasking†                     2.04    29.49**   1.31c,d,e      1.61c,e        3.03a,b,d,e    1.74a,c,e      2.61a,b,c,d
Attention checks
Sun around earth†                 2.18    16.41**   1.66d,e        1.65d,e        1.71d,e        2.61a,b,c,e    3.06a,b,c,d
Facebook†                         1.27    12.89**   1.0?e          1.1?e          1.02d,e        1.29c,e        1.75a,b,c,d
Obama first U.S. president†       1.57    12.35**   1.23d,e        1.29d,e        1.32d,e        1.74a,b,c,e    2.16a,b,c,d
IMC (percent correct)             61.3%             90.8%b,c,d,e   64.3%a,c,d,e   52.1%a,b       52.3%a,b       48.6%a,b
Research participation and computer knowledge
Total studies (last month)        41.68   55.57**   162.0b,c,d,e   0.5a,d         0.8a,d         29.3a,b,c      17.6a
Ad studies (last month)           11.33   13.81**   35.4b,c,d,e    0.4a,d         0.6a,d         11.8a,b,c      7.9a
No compensation†                  3.01    41.?0**   1.92c,d,e      2.32c,d,e      2.74a,b,d,e    3.93a,b,c      3.89a,b,c
Computer knowledge†               5.25    1.77      5.43d          5.21           5.24           5.08a          5.31

Notes. †Means based on 7-point scales. Superscripts indicate significant differences between means.
**p < .01; *p < .05.
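
The IMC row of Table 4 compares proportions rather than means, and the text reports z values for MTurk against each of the other samples. A minimal sketch of a pooled two-proportion z-test is shown below, assuming the rounded percentages and the sample sizes printed in Table 4. The function name is hypothetical, and the article does not state which variant of the test was used, so the output need not match the reported values exactly.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Return the pooled two-proportion z statistic and its two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Proportion passing the IMC and sample size, as printed in Table 4.
MTURK = (0.908, 163)
OTHER_SAMPLES = {
    "Student (Lab)": (0.643, 168),
    "Student (Link)": (0.521, 146),
    "Qualtrics": (0.523, 193),
    "Lightspeed": (0.486, 179),
}

for name, (p, n) in OTHER_SAMPLES.items():
    z, p_value = two_proportion_z(*MTURK, p, n)
    print(f"MTurk vs {name}: z = {z:.2f}, p = {p_value:.2g}")
```

With these rounded inputs the MTurk versus student (lab) comparison comes out close to the smallest z value the article reports; the other comparisons can differ from the reported range because of rounding or a different test variant.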
Lightspeed samples (ps < .05). The MTurk participants also used more characters than the students
using the link to answer the first open-ended question (p < .05) but not the second open-ended
question (p > .05). There were no differences between MTurk participants and students taking the
study in the laboratory on these three measures (ps > .05). In addition, MTurk participants
reported greater ad processing involvement than the student, Qualtrics, and Lightspeed samples
(ps < .01) and reported significantly less multitasking than three of the four other samples.

The results of the attention check measures, including the IMC, are also shown in Table 4. The
ANOVA results reveal significant differences in performance on the three attention checks
(Fs (4, 844) ranging from 12.35 to 16.41, ps < .001 for all). There were no differences between
the MTurk and student samples (ps > .40), with all of these samples performing relatively well.
The means for these three measures were all less than 1.71 for the MTurk and student samples,
indicating that participants strongly (and accurately) disagreed with the statements that the Sun
rotates around the Earth, that President Obama was the first American president, and that they
had never heard of Facebook. In contrast, the Qualtrics and Lightspeed samples did not perform as
well. For example, the means for these attention check measures for the Lightspeed sample were
significantly higher than the means for the check measures for each of the other four samples,
indicating that the Lightspeed sample performed more poorly on the attention checks than each of
the other four samples (ps < .05 for all).

Because the IMC was a nominally scaled variable, the proportion of participants who correctly
answered the IMC (where they selected None of the above as instructed) was compared across the
five samples. Compared to each of the other four samples, the MTurk participants were
significantly more likely to correctly answer the IMC (91%; compared to other sample percentages
ranging from 49% to 64%; z values from 5.76 to 7.61, ps < .01 for all), indicating that MTurk
participants read the instructions more carefully than the other participants.

In addition, we compared the samples with regard to their recent participation in research and
advertising studies, their likelihood of participating in this study if they were not
compensated, and their knowledge of computers. We again used ANOVAs, and Table 4 shows these
results. ANOVA results reveal significant differences in research participation and likelihood of
participation without compensation (Fs (4, 844) ranged from 13.81 to 55.57, ps < .001 for all).
As could be expected, the MTurk sample reported participating in far more total research studies
and advertising studies in the past month than each of the other four samples (ps < .001 for
all). Although all of the samples generally indicated that they would have been unlikely to
participate in the study if they were not compensated, compared to students taking the study
using a link, Qualtrics participants, and Lightspeed participants, MTurkers indicated that they
would have been less likely to participate in this research if they were not compensated
(ps < .001 for all). However, there was not a significant difference between MTurkers and
students in the laboratory in their likelihood of participating without compensation (p > .05).
Finally, there was not a significant difference between the samples' overall computer knowledge
(F (4, 844) = 1.77, p > .10); all samples indicated that they were relatively knowledgeable about
computers (all means were greater than 5.08).⁴

Tests of Effects in the Experiment

Although the primary purpose of this research was not to replicate results found in extant
literature (Kees, Burton, and Tangari 2010), we examined effects of advertisement framing (i.e.,
eager versus vigilant means) and individuals' CFC on Aad and risk perceptions to compare
differences in these effects across the five different samples. To mimic the original study, we
divided participants into high and low groups on CFC using a median split (Iacobucci et al. 2015)
to be able to include the construct measure as a factor in analyses of variance. To test the
effects of framing and CFC on Aad and risk perceptions across samples, we performed a
three-factor ANOVA. The ad framing manipulation (eager versus vigilant means), CFC (high versus
low), and the sample source (MTurk; students in the lab; students using a link; Qualtrics;
Lightspeed samples) all served as factors in the three-factor between-subjects design.

The ANOVA results for the effects on Aad revealed significant main effects for framing
(F (1, 829) = 10.73, p < .01) and CFC (F (1, 829) = 16.19, p < .001), as expected. These results
indicate that participants reported more positive Aad for the ad framed using eager means
(M = 5.84) than the ad framed using vigilant means (M = 5.53), and participants with high CFC
reported more positive Aad (M = 5.88) than participants with low CFC (M = 5.49). Most
importantly, although there was a main effect of the sample on Aad (F (4, 829) = 8.48, p < .001),
there was no indication of any differences in the findings across the samples. In other words,
the sample did not interact with either the framing manipulation or CFC to affect Aad (ps > .60
for all). These nonsignificant results indicate that the effects of framing and CFC on Aad were
consistent across the five samples.

Similarly, the ANOVA results for the main effects of framing (F (1, 829) = 5.75, p < .05) and CFC
(F (1, 829) = 10.20, p < .01) on the risk perception dependent variable were significant.
Specifically, the results indicate that perceived risk was greater among participants who viewed
the ad framed using vigilant means (M = 5.61) than the participants who viewed the ad framed
using eager means (M = 5.43), and risk perceptions among participants with high CFC were greater
(M = 5.64) than participants with low CFC (M = 5.41). In addition, there was a main effect for
sample on risk perceptions (F (4, 829) = 3.14,
p < .05).⁵ Follow-up contrasts reveal that risk perceptions among MTurkers (M = 5.67), Qualtrics
participants (M = 5.60), and Lightspeed participants (M = 5.64) were greater than those of
students taking the study in the laboratory (M = 5.36; p < .01), and risk perceptions among the
Lightspeed participants were greater than those of students taking the study using a link
(M = 5.43; p < .05). In addition, the framing × CFC interaction and all of the interactions
involving the samples were nonsignificant (ps > .10 for all). These nonsignificant interaction
results for the sample factor indicate that the effects of framing and CFC on risk perceptions
were consistent across the five samples.⁶

GENERAL DISCUSSION

The purpose of this study was to test differences between samples on a number of issues related
to overall data quality in the context of an advertising experiment designed to test
theory-driven outcomes. We examined differences across five distinct samples: two student
samples, two different samples obtained from professional research companies, and a nonstudent
sample of MTurk workers. To assess response quality, we examined a variety of outcomes, including
reliabilities for dependent measures and a psychological construct used as a measured moderator,
manipulation checks, response involvement, several attention checks, measures of prior research
participation, general computer knowledge, and dependent variables for the advertising
experiment. Overall findings from the current study indicate that MTurk is a viable data
collection platform for obtaining a sample for advertising research experiments. In addition, as
summarized in Table 5, using MTurk to obtain samples for advertising experiments appears to have
several distinct advantages over other popular approaches, including both more expensive
professional panels and student subject pools.

Our results showed many differences across the samples with regard to manipulation checks,
attention checks, measures of participant involvement, and prior research participation, but
fewer differences were found for computer knowledge. In general, our results revealed that the
response quality of the MTurk sample was good in comparison to the two student samples, while the
samples from the professional panels had the lowest overall response quality. Results show that
MTurkers appeared significantly more involved in processing the ad and reported less multitasking
than the other samples. In addition, MTurk workers wrote more text when responding to the two
open-ended questions, exceeding both of the panel samples.

The results of the attention checks, which have been recommended and used in prior research as
metrics for respondent data quality (Smith et al. 2016), revealed some important differences
among the samples. The IMC attention check (Oppenheimer, Meyvis, and Davidenko 2009), which was
modified from prior research (Goodman, Cryder, and Cheema 2013), showed that the MTurk
participants performed better than the other four samples. The vast majority of MTurkers
correctly answered this question, which required careful reading of the study's instructions,
while the participants in the two student and two panel samples failed this test at a
substantially higher rate.

Interestingly, in studies outside of the advertising context, conflicting findings have been
reported regarding whether MTurkers or students perform better on IMCs. That is, some studies
have shown that MTurk workers pass IMCs at a greater rate than student participants (Hauser and
Schwarz 2016), whereas others have found that student participants significantly outperformed
MTurk
TABLE 5
Summary of Comparisons across Sample Sources

                                               MTurk           Student Sample  Student Sample  Qualtrics       Lightspeed
                                               Sample          (Lab)           (Link)          Sample          Sample
Cost per respondent                            $0.75                                           $3.75           $5.88
Measure reliability                            Best            Better          Better          Worst           Worst
Manipulation checks                            Best            Best            Best            Good            Good
Instructional manipulation check               Best            Better          Worst           Worst           Worst
Attention checks                               Best            Best            Best            Better          Worst
Ease of data collection for researcher         Easiest         Variable        Variable        Easier          Most Difficult
Total time for researcher to collect all data  Fastest         Variable        Variable        Fast            Slowest
Number of studies participated in (30 days)    Highest         Lowest          Lowest          Moderate        Moderate
Hypothesis testing                             Consistent      Consistent      Consistent      Consistent      Consistent

Note. Although the cost per respondent shown for each sample above pertains to this specific
advertising experiment, MTurk data are generally much less expensive than professional panel data.
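
To make the cost row of Table 5 concrete, the per-respondent figures can be multiplied by the sample sizes. The sketch below is illustrative only: it assumes the three listed costs belong to the MTurk, Qualtrics, and Lightspeed columns (the student columns carry no figure), and it assumes the number of paid respondents equals the sample sizes in the Table 4 column headers. Neither assumption is confirmed by the article, which does not report total costs.

```python
# Cost per respondent in USD from Table 5 (column assignment is an assumption).
COST_PER_RESPONDENT = {"MTurk": 0.75, "Qualtrics": 3.75, "Lightspeed": 5.88}

# Sample sizes from the Table 4 column headers (assumed to equal paid respondents).
SAMPLE_SIZE = {"MTurk": 163, "Qualtrics": 193, "Lightspeed": 179}


def total_cost(sample: str) -> float:
    """Approximate total data collection cost for one sample, in USD."""
    return round(COST_PER_RESPONDENT[sample] * SAMPLE_SIZE[sample], 2)


for sample in COST_PER_RESPONDENT:
    print(f"{sample}: about ${total_cost(sample):,.2f}")
```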
workers on difficult IMCs (Goodman, Cryder, and Cheema 2013). With over 90% of the MTurk workers
passing the IMC used in this study, compared to around 50% for the professional samples, our
results from this advertising experiment are consistent with the former (Hauser and Schwarz
2016). Given the high number of studies in which the MTurk population participates, it should be
noted that it is possible that MTurk respondents recognize IMC-type attention screening questions
more readily than respondents in other samples. For the three other attention check measures, the
MTurk and student samples performed well. In contrast, the professional panel samples did not
perform as well on these attention check measures, with the most expensive sample (Lightspeed)
performing the worst among the five samples.

Some have expressed concerns with MTurk workers' repeated research participation and compensation
as a motivation for their participation (Landers and Behrend 2015). Our findings show that MTurk
workers do participate in a large number of research studies, as compared to both the student and
panel samples. This experience through repeated participation could explain why MTurkers were
relatively quick in their ability to complete this experiment. In addition, MTurk workers report
that they would have been less likely than the respondents from the other four samples to
participate in the study if they did not receive compensation for their participation. However,
it is important to note that all study participants indicated that they would have been unlikely
to participate if they were not compensated (means for all samples were less than the scale
midpoint). While repeated participation and worker motivation may be a concern to some, our
results summarized in Table 5 for participant involvement, attention checks, and the IMC indicate
that response quality for MTurkers is generally better than, or at least as high as, the response
quality for student participants, and the response quality is generally superior to that of
participants from professional panels.

Finally, the five different samples were included as an additional between-subjects factor when
examining the effects of advertisement framing and participants' CFC on Aad and risk perceptions
to test whether these theory-driven effects were consistent across the samples. The manipulation
check results showed that the manipulation was effective across the five samples. However, the
sample factor did interact with the framing manipulation to affect the manipulation check. The
pattern of these results suggests that the effect related to the manipulation was stronger for
the MTurk and student samples than for the two other panel samples. The experimental effect of
framing and the effect of participants' CFC on attitude toward the ad and risk perception were
consistent across the five samples. Thus, although there are significant demographic differences
across the samples (see Table 2), and there may be concerns with the generalizability of the
results from any of the five samples, these results show that effects on theory-driven outcomes
in this experimental advertising context are consistent across the MTurk sample, student samples,
and samples drawn from professional panels.

Although the primary focus of this research was on comparing the quality of data obtained from
MTurk to professional panels and student subject pools, it is important to note that many
differences between the two student samples were found, especially with regard to the measures of
participant involvement. Not only did students taking the study outside of the controlled
laboratory environment (via a link provided to them) appear to be less involved than the MTurk
workers, these students also were generally less involved than the students taking the study in a
controlled laboratory environment. For example, compared to students taking the study in the
laboratory, the students participating in the experiment on their own time using a link provided
to them by the research team elaborated less on open-ended questions, reported greater levels of
multitasking while taking the study, and took significantly longer to complete the experiment. In
addition, it is concerning that only about half of these student participants outside of the lab
passed the IMC, indicating that these students did not carefully read the instructions provided
to them. These results suggest researchers should use caution when allowing students to
participate in research outside of a controlled environment. Based on these results, for studies
using student samples, we strongly recommend that researchers conduct their advertising
experiments in a controlled environment. If a researcher does not have access to a controlled
environment, use of MTurk as a participant source appears beneficial, because our results
indicate that these participants are equally attentive, generally more involved, and more
carefully read study instructions than students participating in studies outside of a controlled
environment.

Although the number of studies comparing MTurk samples to student and professional panel samples
in domains outside of advertising is increasing (e.g., Bartneck et al. 2015; Berinsky, Huber, and
Lenz 2012; Casler, Bickel, and Hackett 2013; Clifford, Jewell, and Waggoner 2015; Goodman,
Cryder, and Cheema 2013; Heen, Lieberman, and Miethe 2014; Hamby and Taylor 2016; Huff and
Tingley 2015; Maynard 2014; Paolacci, Chandler, and Ipeirotis 2010; Peer et al. 2015; Simons and
Chabris 2012; Sprouse 2011; Suri and Watts 2011), our study makes an important contribution to
the literature. As shown in Table 1, our study extends findings from prior experiments involving
the assessment of MTurk, and it identifies important differences in data quality across samples
in an advertising experiment context. This is important for advertising researchers due to the
uniqueness of advertising experiments, where lack of attention to manipulated advertising
stimuli, manipulation check measures, and multi-item variables can have detrimental effects on
tests of theory. Results from our study reveal few differences in theory-driven outcomes,
suggesting that MTurk samples appear to be a viable option for advertising experiments. In
addition, we found that manipulation effectiveness, check measures, and other measures of overall
data quality vary across data sources, with the MTurk sample performing better than the two
professional samples. These differences are important for advertising researchers to consider as
they decide on specific data sources to use for their research.

Suggestions for Researchers

Given the methodological issues identified in this study, we offer some suggestions for
researchers collecting advertising data in online environments, and specifically when utilizing
MTurk, in the appendix. While not comprehensive, these best practices will help ensure that
researchers maximize data quality. For example, it is highly recommended that researchers use
quality assurance measures (i.e., attention checks, speeding traps) for MTurk and other studies
conducted online, including professional panels. In addition to the several attention and
instructional manipulation check measures used in this research that could be used by advertising
researchers (see Table 4), future research could consider using and continuing to develop new
attention checks to which MTurk workers and other panel members may not have been previously
exposed. It may be tempting to assume that data quality is high when sourcing data from
professional panel data providers. However, as shown in the current study (and observed in many
other experiments conducted by the authors), data from professional panels may often be
particularly susceptible to quality control issues. In addition, it is relatively easy for the
researcher to implement multiple safeguards to ensure only U.S. respondents participate in MTurk
and other online data collections. This is important in order to overcome the popular critique of
online data collections (and MTurk in particular) that samples purported to be composed of U.S.
consumers are contaminated by respondents outside of the United States. Technological innovations
will continue to enhance online data collections (e.g., ballot stuffing prevention in Qualtrics,
microbatching in TurkPrime) (Qualtrics 2016; TurkPrime 2016). Researchers are encouraged to
continue to explore new tools and techniques that will increase the quality and validity of
studies conducted online. Finally, MTurk is distinctly different from professional panel data in
that participant compensation is under the direct control of the researcher. Thus, while the low
cost of conducting studies can be a benefit of MTurk, researchers should keep in mind ethical
considerations and strive to compensate MTurk workers fairly (e.g., estimate the average time it
will take workers to complete the study and then strive to pay them at least minimum wage).

CONCLUSION

In summary, although researchers publishing in the Journal of Advertising have primarily relied
on data obtained from student samples and nonstudent samples from professional panels (see
Figure 1), we found the MTurk sample to be of equal or better quality than both of the student
samples and clearly of better quality than both of the professional panel samples examined. Thus,
these specific results indicate that researchers conducting advertising experiments may consider
the use of MTurk as a viable alternative to student samples and panel data when testing
theory-driven outcomes.

NOTES

1. Multiple quotes were obtained for a true random probability sample of 170 respondents to use
in the current study. Quotes for data collection ranged from $10,000 to $16,000.
2. The detailed study procedure including the specific manipulation embedded in the experimental
ad stimuli can be found in Kees, Burton, and Tangari (2010; Study 1).
3. While Qualtrics provides panel services, it outsources the actual data collection to partner
companies with established consumer panels.
4. To assess whether the effects were consistent when controlling for age and gender, we
performed analyses of covariance (ANCOVAs) for each of the measures reported in Table 4. The
ANCOVA results were fully consistent with the ANOVA results reported in Table 4 with one
exception. Controlling for age and gender, results reveal that there was a difference in computer
knowledge across the samples (F (4, 842) = 9.06, p < .05). Contrasts reveal that the student
samples reported somewhat lower computer knowledge (M_Student Lab = 4.80; M_Student Link = 4.87)
than all three of the nonstudent adult samples (M_MTurk = 5.42; M_Qualtrics = 5.43;
M_Lightspeed = 5.62; ps < .001 for all), when controlling for age and gender.
5. The main effect of the sample on risk perceptions is nonsignificant when controlling for age
(F (4, 828) = 2.24, p > .05).
6. To assess whether the effects of framing and CFC were consistent when controlling for
differences in age and gender, we performed ANCOVAs for the effects on (1) Aad and (2) risk
perception. The results from both ANCOVAs were fully consistent with the ANOVA results for each
of the dependent measures reported. We also performed ANOVAs for the effects of framing and CFC
on Aad and risk perception after removing outliers in terms of time taken to complete the survey.
These results also were consistent with the ANOVA results in the text. In addition, we conducted
moderated regression analyses with CFC as a continuous predictor to ensure that results were
consistent with the results from the ANOVAs using a median split for CFC. These regression
results showed that the direct effects of framing and CFC on the dependent measures were
significant, the interaction nonsignificant, and all findings were fully consistent with results
reported using a median split for CFC.

REFERENCES

Amir, Ofra, David G. Rand, and Ya'akov Kobi Gal (2012), "Economic Games on the Internet: The
Effect of $1 Stakes," PLOS ONE, 7 (2), e31461.
Bartneck, Christoph, Andreas Duenser, Elena Moltchanova, and Karolina Zawieska (2015), "Comparing
the Similarity of Responses Received from Studies in Amazon's Mechanical Turk to Studies
Conducted Online and with Direct Recruitment," PLOS ONE, 10 (4), e0121595.
Berinsky, Adam J., Gregory A. Huber, and Gabriel Lenz (2012), "Evaluating Online Labor Markets
for Experimental Research: Amazon.com's Mechanical Turk," Political Analysis, 20 (3), 351-68.
Buhrmester, Michael, Tracy Kwang, and Samuel D. Gosling (2011), "Amazon's Mechanical Turk: A New
Source of Inexpensive, Yet High-Quality, Data?," Perspectives on Psychological Science, 6 (1),
3-5.
Casler, Krista, Lydia Bickel, and Elizabeth Hackett (2013), "Separate but Equal? A Comparison of
Participants and Data Gathered via Amazon's MTurk, Social Media, and Face-to-Face Behavioral
Testing," Computers in Human Behavior, 29 (6), 2156-60.
Chandran, Sucharita, and Geeta Menon (2004), "When a Day Means More than a Year: Effects of
Temporal Framing on Judgments of Health Risk," Journal of Consumer Research, 31 (2), 375-89.
Clifford, Scott, Ryan M. Jewell, and Philip D. Waggoner (2015), "Are Samples Drawn from
Mechanical Turk Valid for Research on Political Ideology?," Research and Politics, 2 (4), 1-9.
Crump, Matthew J.C., John V. McDonnell, and Todd M. Gureckis (2013), "Evaluating Amazon's
Mechanical Turk as a Tool for Experimental Behavioral Research," PLOS ONE, 8 (3), e57410.
Feldt, Leonard S. (1969), "A Test of the Hypothesis That Cronbach's Alpha or Kuder-Richardson
Coefficient Twenty Is the Same for Two Tests," Psychometrika, 34 (3), 363-73.
Ferguson, Nakeisha S., and Janee N. Burkhalter (2015), "Yo, DJ, That's My Brand: An Examination
of Consumer Response to Brand Placements in Hip-Hop Music," Journal of Advertising, 44 (1),
47-57.
Goodman, Joseph K., Cynthia E. Cryder, and Amar Cheema (2013), "Data Collection in a Flat World:
The Strengths and Weaknesses of Mechanical Turk Samples," Journal of Behavioral Decision Making,
26 (3), 213-24.
Hamby, Tyler, and Wyn Taylor (2016), "Survey Satisficing Inflates Reliability and Validity
Measures: An Experimental Comparison of College and Amazon Mechanical Turk Samples," Educational
and Psychological Measurement, 76 (6), 912-32.
Hauser, David J., and Norbert Schwarz (2016), "Attentive Turkers: MTurk Participants Perform
Better on Online Attention Checks Than Subject Pool Participants," Behavior Research Methods,
48 (1), 400-407.
Heen, M.S.J., Joel D. Lieberman, and Terence D. Miethe (2014), A Comparison of Different Online
Sampling Approaches for Generating National Samples (Report CCJP 2014-01). Las Vegas: University
of Nevada, Las Vegas, Center for Crime and Justice Policy,
http://www.unlv.edu/sites/default/files/page_files/27/ComparisonDifferentOnlineSampling.pdf
Horton, John J., David G. Rand, and Richard J. Zeckhauser (2011), "The Online Laboratory:
Conducting Experiments in a Real Labor Market," Experimental Economics, 14, 399-425.
Huff, Connor, and Dustin Tingley (2015), "'Who Are These People?' Evaluating the Demographic
Characteristics and Political Preferences of MTurk Survey Respondents," Research and Politics,
2 (3), 1-12.
Iacobucci, Dawn, Steven S. Posavac, Frank R. Kardes, Matthew J. Schneider, and Deidre L. Popovich
(2015), "Toward a More Nuanced Understanding of the Statistical Properties of a Median Split,"
Journal of Consumer Psychology, 25 (4), 652-65.
Joireman, Jeff A., Alan Strathman, and Daniel Balliet (2006), "Considering Future Consequences:
An Integrative Model," in Judgments Over Time: The Interplay of Thoughts, Feelings, and
Behaviors, Lawrence J. Sanna and Edward Chin-Ho Chang, eds., Oxford: Oxford University Press,
82-99.
Kareklas, Ioannis, Darrel D. Muehling, and T.J. Weber (2015), "Reexamining Health Messages in the
Digital Age: A Fresh Look at Source Credibility Effects," Journal of Advertising, 44 (2), 88-104.
Feedback across Cultures," Journal of Cross-Cultural Psychology, 47 (5), 696-712.
Landers, Richard N., and Tara S. Behrend (2015), "An Inconvenient Truth: Arbitrary Distinctions
between Organizational, Mechanical Turk, and Other Convenience Samples," Industrial and
Organizational Psychology, 8 (2), 142-64.
Lee, Angela Y., and Jennifer L. Aaker (2004), "Bringing the Frame into Focus: The Influence of
Regulatory Fit on Processing Fluency and Persuasion," Journal of Personality and Social
Psychology, 86 (2), 205-218.
Leonhardt, James M., Jesse R. Catlin, and Dante M. Pirouz (2015), "Is Your Product Facing the
Ad's Center? Facing Direction Affects Processing Fluency and Ad Evaluation," Journal of
Advertising, 44 (4), 315-25.
Marvit, Moshe Z. (2014), "How Crowdworkers Become Ghosts in the Digital Machine," The Nation,
February 5, http://www.thenation.com/article/178241/how-crowdworkers-became-ghosts-digital-machine
Maynard, Andrew (2014), "Has Anyone Heard of Bisphenol A," 2020 Science, December 1,
http://2020science.org/2014/12/01/anyone-heardbpa/
Minton, Elizabeth, Christopher Lee, Ulrich Orth, Chung-Hyun Kim, and Lynn Kahle (2012),
"Sustainable Marketing and Social Media: A Cross-Country Analysis of Motives for Sustainable
Behaviors," Journal of Advertising, 41 (4), 69-84.
Mohanty, Praggyan, and S. Ratneshwar (2015), "Did You Get It? Factors Influencing Subjective
Comprehension of Visual Metaphors in Advertising," Journal of Advertising, 44 (3), 232-42.
Muehling, Darrel D., David E. Sprott, and Abdullah J. Sultan (2014), "Exploring the Boundaries of
Nostalgic Advertising Effects: A Consideration of Childhood Brand Exposure and Attachment on
Consumers' Responses to Nostalgia-Themed Advertisements," Journal of Advertising, 43 (1), 73-84.
Newton, Joshua D., Jimmy Wong, and Fiona J. Newton (2015), "The Social Status of Health Message
Endorsers Influences the Health Intentions of the Powerless," Journal of Advertising, 44 (2),
151-60.
Oppenheimer, Daniel M., Tom Meyvis, and Nicolas Davidenko (2009), "Instructional Manipulation
Checks: Detecting Satisficing to Increase Statistical Power," Journal of Experimental Social
Psychology, 45 (4), 867-72.
Paolacci, Gabriele, Jesse Chandler, and Panagiotis G. Ipeirotis (2010), "Running Experiments on
Amazon Mechanical Turk," Judgment and Decision Making, 5 (5), 411-19.
Peer, Eyal, Sonam Samat, Laura Brandimarte, and Alessandro Acquisti (2015), "Beyond the Turk: An
Empirical Comparison of Alternative Platforms for Crowdsourcing Online Behavioral Research,"
SSRN, April 14, http://ssrn.com/abstract=2594183
---, Joachim Vosgerau, and Alessandro Acquisti (2014), "Reputation as a Sufficient Condition for
Data Quality on Amazon Mechanical Turk," Behavior Research Methods, 46 (4), 1023-31.
Peterson, Robert A., and Dwight R. Merunka (2014), "Convenience Samples of College Students and
Research Reproducibility," Journal of Business Research, 67 (5), 1035-41.
Pounders, Kathrynn R., Seungae Lee, and Mike Mackert (2015), "Matching Temporal Frame, Self-View,
and Message Frame Valence: Improving Persuasiveness in Health Communications," Journal of
Advertising, 44 (4), 388-402.
Qualtrics (2016), "Q Support: Survey Protection,"
www.qualtrics.com/support/survey-platform/survey-module/survey-options/survey-protection/
Kees, Jeremy, Scot Burton, and Andrea Heintz Tangari (2010), "The Impact of Roulin, Nicolas (2015), "Don't Throw the Baby Out with the Bathwater: Com-
Regulatory Focus, Temporal Orientation, and Fit on Consumer Responses to paring Data Quality of Crowdsourcing, Online Panels, and Student Samples,"
Health-Related Advertising," Journal of Advertising, 39 (1), 19-34. Industrial and Organizational Psychology, 8 (2), 190-96.
Kulkarni, Atul A. and Hong Yuan (2015), "Effect of Ad-Irrelevant Distance Cues Sackett, Paul R., and James R. Larson (1990), "Research Strategies and Tactics in
on Persuasiveness of Message Framing," Journal of Advertising, 44 (3), 254- Industrial and Organizational Psychology," in Handbook of Industrial and
63. Organizational Psychology, Marvin D. Dunnette and Leanetta M. Hough, eds.,
Kung, Franki Y.H., Young-Hoon Kim, Daniel Y.-J. Yang, and Shirley Y.Y. Palo Alto, CA: Consulting Psychologists Press, 419-89.
Cheng (2016), "The Role of Regulatory Fit in Framing Effective Negative
5
Schuman, Howard, and Graham Kalton (1985), "Survey Methods," in Handbook of Sprouse, Jon (2011), "A Validation of Amazon Mechanical Turk for the Collection
Social Psychology, Vol. 1, Gardner Lindzey and Elliot Aronson, eds., New of Acceptability Judgments in Linguistic Theory," Behavior Research
York: Random House, 635-97. Methods, 43 (I), 155-67.
Sheehan, Kim, and Matthew Pittman (2016), The Academic Researcher's Guide to Strathman, Alan, Faith Gleicher, David S. Boninger, and Scott C. Edwards (1994),
Mechanical Turk, Irvine, CA: Melvin and Leigh. "The Consideration of Future Consequences: Weighing Immediate and Distant
Simons, Daniel J., and Christopher F. Chabris (2012), "Common (Mis) Beliefs Outcomes of Behavior," Journal of Personality and Social Psychology, 66 (4),
about Memory: A Replication and Comparison of Telephone and Mechanical 742-52.
Turk Survey Methods," PLOS ONE, 7 (12), e51876. Suri, Siddharth, and Duncan J. Watts (2011), "Cooperation and Contagion in Web-
Smith, Scott M., Catherine A. Roster, Linda L. Golden, and Gerald S. Albaum Based, Networked Public Goods Experiments," PLOS ONE, 6 (3), el 6836.
(2016), "A Multi-Group Analysis of Online Survey Respondent Data Quality: TurkPrime. (2016), "Run Large Surveys as Multiple Time-Released Smaller
Comparing a Regular USA Consumer Panel to MTurk Samples," Journal of HITs," http://www.turkprime.com/Home/MicroBatch
Business Research, 69 (8), 3139-48.
APPENDIX
"BEST PRACTICES" FOR ONLINE DATA COLLECTION IN ADVERTISING EXPERIMENTS
• For studies that require a U.S. sample, capture respondent IP address and location parameters (latitude and longitude). These data can be easily captured using survey design software (e.g., Qualtrics, Survey Monkey). If using MTurk, be sure to set study options to only allow U.S. workers to participate in the study.
• When collecting data using MTurk, predetermine the required acceptance rate and required number of completed HITs by workers; consider reporting these when describing a specific MTurk sample.
• Include multiple attention check measures in different parts of the survey. See the Measures of Data Quality section of this article for examples of effective attention check measures to use.
• Implement "speeding traps" to monitor how quickly respondents complete the study.
• Always "soft launch" the survey to collect a small subset of responses. This allows the researcher to identify and fix any potential data quality and programming problems before the entire data set is collected.
• Take your own survey prior to a "soft launch" to test programming and ad stimuli appearance and to obtain an initial idea of time needed to respond.
• When working with a panel company, discuss data quality expectations before committing to the project. Be upfront about the specific data quality checks that will be used, the expected pass rate for the attention checks, and the time to complete the survey.
• In some systems, it is possible for the same subject to return to the survey and complete the questionnaire several times. Some survey design software has an option to prevent respondents from taking a survey more than once. To avoid repeated responses in MTurk, only use a single batch, and add additional requests to the existing batch if you decide to increase the sample size after the study has been launched.
• When using MTurk, consider collecting data on different days and at different times during the week to capture a more diverse and representative sample. TurkPrime has a microbatch feature that automates this process for researchers.
• When using MTurk, pay workers a fair wage. Minimum wage is often recommended.
• With MTurk, establishing a good reputation with workers is critical. Workers may examine your reviews on MTurk websites and forums when deciding whether to participate in your study. To maintain a good reputation with workers, be responsive to their e-mails.
• Consider the device used by the respondent. For example, ad stimuli and copy may be difficult to view from the screen of a smartphone.
• Monitor the amount of time the ad is viewed to ensure there is sufficient attention to the ad.
• For specialized samples and specific topics, avoid specifying the desired sample characteristics and purpose of the study in the posting to minimize the potential for self-selection bias and cheating.
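
Several of the screening practices above (location checks, attention checks, speeding traps, and duplicate detection) can be applied to an exported response file after data collection. The following Python sketch is illustrative only and is not the procedure used in this article. The `Response` fields, the `screen` function, and the speeding cutoff of one third of the median completion time are all assumptions introduced for the example; researchers should set their own thresholds before launching a study.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Response:
    # All field names are hypothetical; map them to your survey export.
    worker_id: str       # respondent or worker identifier
    country: str         # country inferred from the captured IP address
    seconds: float       # total completion time in seconds
    checks_passed: int   # attention checks answered correctly
    checks_total: int    # attention checks included in the survey


def screen(responses, required_country="US", min_fraction_of_median=1 / 3):
    """Split responses into kept rows and flagged rows.

    Returns (kept, flagged), where flagged is a list of
    (row_index, reasons) tuples. The speeding cutoff is an assumed
    rule of thumb: a fixed fraction of the median completion time.
    """
    cutoff = min_fraction_of_median * median(r.seconds for r in responses)
    seen_ids = set()
    kept, flagged = [], []
    for index, r in enumerate(responses):
        reasons = []
        if r.country != required_country:
            reasons.append("location")
        if r.checks_passed < r.checks_total:
            reasons.append("attention")
        if r.seconds < cutoff:
            reasons.append("speeding")
        if r.worker_id in seen_ids:
            # Only the repeat submission is flagged; the first is retained.
            reasons.append("duplicate")
        seen_ids.add(r.worker_id)
        if reasons:
            flagged.append((index, reasons))
        else:
            kept.append(r)
    return kept, flagged


if __name__ == "__main__":
    sample = [
        Response("A", "US", 300, 3, 3),
        Response("B", "CA", 310, 3, 3),   # outside the required country
        Response("C", "US", 50, 3, 3),    # far faster than the median
        Response("D", "US", 290, 2, 3),   # missed one attention check
        Response("A", "US", 305, 3, 3),   # same worker submitting again
    ]
    kept, flagged = screen(sample)
    print(len(kept), "kept;", len(flagged), "flagged")
    for index, reasons in flagged:
        print(index, reasons)
```

Flagging rows rather than deleting them keeps the exclusion reasons available for reporting, which fits the recommendation to agree on data quality checks and expected pass rates in advance.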