Psychological Methods © 2014 American Psychological Association

2014, Vol. 19, No. 4, 528–541 1082-989X/14/$12.00 DOI: 10.1037/met0000016

Measurement and Control of Response Styles Using Anchoring Vignettes: A Model-Based Approach

Daniel M. Bolt, University of Wisconsin–Madison
Yi Lu, ACT, Iowa City, Iowa
Jee-Seon Kim, University of Wisconsin–Madison
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
This document is copyrighted by the American Psychological Association or one of its allied publishers.

Response styles are frequently of concern when rating scales are used in psychological survey instruments. While latent trait models provide an attractive way of controlling for response style effects (Morren, Gelissen, & Vermunt, 2011), the analyses are generally limited to accommodating only a small number of response style types. The use of anchoring vignettes provides an opportunity to overcome this limitation. In this article, a new latent trait model is introduced that uses vignette responses to measure and control for any form of response style. An illustration is provided with data from a cross-national study of self-reported conscientiousness by Mõttus, Allik, Realo, Pullman, et al. (2012).

Keywords: multidimensionality, response style, item response theory, anchoring vignettes

Supplemental materials: http://dx.doi.org/10.1037/met0000016.supp

This article was published Online First April 28, 2014.
Daniel M. Bolt, Department of Educational Psychology, University of Wisconsin–Madison; Yi Lu, ACT, Iowa City, Iowa; Jee-Seon Kim, Department of Educational Psychology, University of Wisconsin–Madison.
We thank Rene Mõttus for providing the data used in this article.
Correspondence concerning this article should be addressed to Daniel M. Bolt, Department of Educational Psychology, University of Wisconsin, 859 Educational Sciences, 1025 West Johnson Street, Madison WI 53706-1796. E-mail: dmbolt@wisc.edu

Response styles (or “response sets”) refer to stylistic tendencies in how respondents assign ratings to survey items that are unrelated to item content (Baumgartner & Steenkamp, 2001). Various response style types have been documented in the literature (Van Vaerenbergh & Thomas, 2013). For example, extreme response style (ERS) refers to a tendency to overuse the extreme endpoints of a rating scale (e.g., 1 = strongly disagree and 7 = strongly agree on a 1–7 scale). Other common forms of response style include midpoint response style (MRS; e.g., a tendency to select 4 = neutral), acquiescent response style (ARS; consistent selection of positive or agree categories), and disacquiescent response style (DRS; consistent selection of negative or disagree categories), among others (van Rosmalen, van Herk, & Groenen, 2010). Although usually defined in relation to agree–disagree rating scales, similar response style types are observed under other response formats as well, including frequency-based rating scales (Greenleaf, 1992a) and semantic differential scales (Gibbins, 1968).

The stability of response style behavior across both survey instruments and time (Weijters, Geuens, & Schillewaert, 2010a, 2010b) makes response styles a likely source of systematic measurement error. For example, ARS will naturally lead to positive bias if higher item scores are taken to imply a higher level of the intended-to-be-measured construct. Response styles such as ERS can also lead to bias (see, e.g., Bolt & Johnson, 2009). These biases can be consequential not only for the interpretation of individual respondent scores, but also in evaluating how the intended-to-be-measured construct is related to other variables (Moors, 2012). These latter effects are often complex because response styles frequently correlate with other respondent characteristics that, in turn, may also correlate with the measured construct. ERS, for example, has been correlated with level of education (Meisenberg & Williams, 2008; Weijters et al., 2010), trait anxiety (Lewis & Taylor, 1955), and age (Billiet & McClendon, 2000), and also varies across cultures (Harzing, 2006; Hui & Triandis, 1989; Johnson, Kulesa, Cho, & Shavitt, 2005).

Latent trait models, which introduce separate traits to model the effects of the substantive construct and response style, provide an appealing way to control for the effects of response style behavior (Morren, Gelissen, & Vermunt, 2011). However, these models are generally limited to a small number (often just one) of response style types (see, e.g., Billiet & McClendon, 2000; Bolt & Johnson, 2009; Moors, 2003). A primary reason for this limitation is the inherent challenge in disentangling the simultaneous effects of both the substantive trait and response style on the item response. For example, on a survey instrument for which item agreement implies a higher level of the construct, a respondent may consistently select the highest possible agree score categories across items not because of a response style (such as ARS), but because of a high level of the construct. Such forms of confounding often make latent variable models of response style intractable when attempting to simultaneously control for multiple response style types. Unfortunately, failure to account for the full range of response style types can be problematic not only because of the
biasing effects of missed response styles, but also because any response style attended to in the model may also be mischaracterized in the presence of the missed response styles.

A promising design consideration to address this limitation of latent variable models is the use of anchoring vignettes. Anchoring vignettes are typically short texts involving hypothetical persons or scenarios that are rated with the same rating scales as the self-rating instrument. The vignettes are designed so as to instill a common subjective reaction from all respondents; thus, variability in respondent ratings is attributed to response style heterogeneity. An important consideration in the use of a vignette design is the writing of the vignettes. Ideally, the vignettes should be written with unambiguous language and should avoid any unnecessary additional information that could differentially alter perceptions of the vignette. As noted by King, Murray, Salomon, and Tandon (2004), the vignette should be “geared to encourage respondents to think the person described is someone just like themselves in all other ways” (p. 194), thus making respondents more likely to use the rating categories in the same way as for the self-rating items. While various vignette designs are possible, a common strategy is to develop five to seven vignettes for each self-rating item, with each vignette resembling the self-rating item in every way except for the object of the rating. Further details on this methodology are provided by King et al. and will also be discussed in relation to a real data example presented in a later section.

The goal of this article is to illustrate a new latent trait model that can be applied with an anchoring vignette design so as to theoretically measure and control for any response style type. The model is a generalization of a multidimensional item response theory (IRT) model for response style presented by Bolt and Johnson (2009) that accounts for response styles through continuous latent trait variables (see also Moors, 2003, for an analogous discrete representation) using a multidimensional version of the nominal response model (Bock, 1972). The Bolt and Johnson model (likewise Bolt & Newton, 2011) was only able to introduce a single response style latent variable. When applied with anchoring vignettes, the new approach allows the vignette responses to serve as uncontaminated indicators of response style, thus making a more general model that accounts for any response style type possible. As will be shown, the new model also permits an explicit evaluation of the biasing effects of any response style type on survey scores. The method is illustrated with data collected in a cross-national study of conscientiousness by Mõttus, Allik, Realo, Pullman, et al. (2012), which also provides a demonstration of the full range of response style types that are seen in survey data.

Approaches to Measuring Response Style

As noted above, a fundamental challenge in measuring many of the response style types in psychological surveys is the frequent confounding of effects related to response style and the substantive trait. Some approaches to measuring response style choose not to attend to the effects of a substantive construct. For example, a number of index-based methods use simple counts of the frequency with which particular response categories are selected as indicators of response style (Baumgartner & Steenkamp, 2001). ERS, for example, can be measured by the proportion of times the extreme endpoints of the rating scale are selected. Similarly, certain latent class approaches to modeling response style (e.g., van Rosmalen et al., 2010) distinguish response style classes based on response category selection without accounting for the effects of a substantive trait. It is important to acknowledge that such methods may be entirely appropriate for surveys not intended to measure a latent construct, as frequently occurs with marketing surveys, for instance. However, such methods are likely less useful in defining response styles when latent constructs are present, and perhaps the intended goal of measurement, as is often the case in psychological measurement.

Scale heterogeneity models represent another class of methods for handling response styles. Such models are well suited to account for differences in respondent interpretations of the rating scale metric (e.g., the subjective level of agreement that a statement like strongly agree implies). For example, Rossi, Gilula, and Allenby (2001) defined a model in which each respondent is associated with unique location ($\tau$) and scale ($\sigma$) parameters corresponding to linear translations of the rating scale metric. One difference between their approach and latent variable approaches, similar to the methods described above, is that scale heterogeneity models do not provide a representation of response style types as distinguished from effects of the substantive trait. A second difference can be understood with respect to the perceived underlying cause of response styles. A common psychological explanation for response styles is that they represent a respondent’s attempt to simplify the process of responding to items (Baumgartner & Steenkamp, 2001). Response styles have been hypothesized to reflect a mechanism by which respondents can reduce cognitive effort by only attending to a subset of categories in providing their ratings. Such a theory would appear to be better represented by the latent trait approach to response style where respondents have higher propensities toward selecting particular score categories for reasons that do not imply a different psychological interpretation of the rating scale. Of course, this theory also introduces a likelihood that many response style types will occur, as different respondents will likely use different subsets of categories to simplify responding.

Another shared limitation of the various methods just described is that they do not provide for statistical control of response style. A subsequent step is thus needed. A frequent strategy applied with index-based methods controls for response style bias through a linear regression of the survey scores on the response style indices (see, e.g., Buckley, 2009; Mõttus, Allik, Realo, Rossier, et al., 2012). In effect, this approach leads to a residualized survey score with the predictive effects of the response style index removed. While the approach is straightforward, some assumptions of the method can be questioned. First, it assumes response styles will be uncorrelated with the substantive trait. This is a strong and likely violated assumption, because (as noted earlier) response styles frequently do correlate with respondent characteristics. Second, the approach assumes the biasing effects of response style are constant across levels of the substantive trait, a particularly questionable assumption for response styles such as ERS, where the biasing effects should logically vary, often quite substantially (Baumgartner & Steenkamp, 2001).

Due to the capacity of latent trait models to simultaneously measure and control for the effects of response style (Morren et al., 2011), there would appear to be value in extending these models to account for a full range of response style types. The subsequent
section outlines how the use of an anchoring vignette design makes such an extension possible.

A Multidimensional IRT Model for Response Styles Using Anchoring Vignettes

Latent trait models for response style can be adapted to incorporate data from vignettes as indicators of response style, while simultaneously correcting estimates of the substantive trait for response style bias. By use of an anchoring vignette design, the latent variable approach can be extended to accommodate any response style type. To illustrate, we consider the self-rating survey and associated vignettes administered by Mõttus, Allik, Realo, Pullman, et al. (2012) in a cross-national study of conscientiousness. A goal of the Mõttus, Allik, Realo, Rossier, et al. (2012) study was to compare the distributions of conscientiousness across countries in a way that also controlled for country-level response style differences. Mõttus, Allik, Realo, Pullman, et al. found peculiar results in terms of country-level conscientiousness estimates, in particular, an unexpected inverse relationship with mean life expectancy, a result they speculated may be due to response style differences across countries. The data analyzed consist of the item responses of 2,965 respondents from 22 countries (in some cases, country regions) to 36 items. Six items were self-rating items of conscientiousness, while the remaining 30 items were anchoring vignettes. All items had five response categories. Appendix A in the supplemental materials displays the six self-rating items, which are distinguished by the use of different adjectives to define the endpoints of the scale. Each self-rating item is scored 1–5 according to the category selected. Note that the lowest ratings correspond to the highest levels of conscientiousness for Items 1, 3, and 5, but the lowest levels of conscientiousness for Items 2, 4, and 6. The 30 anchoring vignettes are shown at https://mywebspace.wisc.edu/dmbolt/respstyle/. The vignettes used the same rating scales as the self-rating items, with each self-rating item having five corresponding vignettes that used the same rating scale endpoint labels. For example, the rating scale for self-rating Item 1, in which the respondent places herself along a 5-point continuum from capable, efficient, competent to inept, unprepared, is also used for five vignette items involving hypothetical persons. The anchoring vignettes were designed to reflect scenarios aligned with the five rating scale locations so as to better understand each respondent’s use of categories across the entire scale continuum. Below, we index Items 1–30 as the vignette items and Items 31–36 as the self-rating items. In the current analysis, we applied reverse scoring to half of the items so that higher item scores correspond to higher levels of conscientiousness, consistent with Mõttus, Allik, Realo, Rossier, et al. (2012).

The new proposed model can be viewed as a multidimensional IRT model in which respondent variability in both conscientiousness and response style tendencies is represented through continuous latent traits. A multinomial logistic model defines the probability that respondent $r$ selects category $k$ on item $i$ as

\[
P(U_{ri} = k \mid \theta_r, \mathbf{s}_r) = \frac{\exp(a_{ik}\theta_r + s_{rk} + c_{ik})}{\sum_{h=1}^{K} \exp(a_{ih}\theta_r + s_{rh} + c_{ih})}, \tag{1}
\]

where the numerator can be viewed as defining a propensity toward category $k$ and the denominator represents the sum of propensities across all score categories $h = 1, \ldots, K$ (Thissen & Steinberg, 1986). In the current application, $K = 5$, $\theta$ denotes level of conscientiousness, $\mathbf{s}_r = [s_{r1}, s_{r2}, s_{r3}, s_{r4}, s_{r5}]$ is a respondent-specific response style vector accounting for respondent differences in selecting each rating category, and $\mathbf{c}_i = [c_{i1}, c_{i2}, c_{i3}, c_{i4}, c_{i5}]$ is an item-specific vector accounting for item differences related to category selection (i.e., differences in item “difficulty”). When applied with anchor vignettes, we set $\mathbf{a}_i = [0, 0, 0, 0, 0]$ for all vignette items, $i = 1, \ldots, 30$, and $\mathbf{a}_i = [-2, -1, 0, 1, 2]$ for all self-rating items, $i = 31, \ldots, 36$. In this way, only the self-rating items define the conscientiousness trait, $\theta$, intended to be measured by the instrument. The specification of fixed interval integer values for the $\mathbf{a}$ vector on self-rating items reflects interval-level scoring of items, a common assumption made in IRT analyses of polytomously scored items (e.g., Masters, 1982). Importantly, this constraint also allows a clearer comparison between $\theta$ and sum scores across the six self-rating items in evaluating the effects of response style control. A conceptual illustration of the model, which demonstrates the anchoring vignettes as pure indicators of response style tendencies, is shown in Figure 1. The presence of pure indicators of response style makes it possible to measure response style tendencies such as ARS, which if based only on self-rating items would otherwise be confounded with the substantive trait. In this respect, the vignette approach can be preferred to alternative strategies (e.g., Greenleaf, 1992a, 1992b) based on the administration of additional self-report items measuring different content.

For statistical identification, the $c_{ih}$ are normalized across categories within item (i.e., $\sum_h c_{ih} = 0$), while the $s_{rh}$ are normalized across categories within respondent (i.e., $\sum_h s_{rh} = 0$). As a result, we can view an item category intercept $c_{ik}$ as determining the relative propensity toward selecting category $k$ (in comparison to other categories) on item $i$ across respondents. In a similar way, the respondent category intercepts $s_{rk}$ reflect the relative propensity with which respondent $r$ selects category $k$ (in comparison to other categories) across items.

Figure 2 provides examples of item category probability curves for a single item corresponding to three hypothetical response style types: a neutral response type ($\mathbf{s}_r = [0, 0, 0, 0, 0]$), an extreme response type ($\mathbf{s}_r = [2, -1.3, -1.3, -1.3, 2]$), and a midpoint response type ($\mathbf{s}_r = [-.5, -.5, 2, -.5, -.5]$). The figures illustrate how the probability of selecting a response category is influenced by both the substantive trait and response style. The relative effects of each response style type are apparent from the elevated curves associated with particular score categories.

Figure 1. Conceptual illustration of response style model (Model 3) using anchoring vignettes. SR = self-report; AV = anchoring vignette. [Diagram: latent variables $\theta$ and $s$; indicators SR Items 1–6 and AV Items 1–30.]
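As a minimal sketch of how Equation 1 turns a trait level and a response style vector into category probabilities, the following Python snippet evaluates the model for the three hypothetical response style types named in the text. The slope vectors are those stated above; the all-zero item intercepts `C_ITEM` and the function name `category_probs` are illustrative assumptions, not estimates or code from the study.

```python
import math


def category_probs(theta, a, s, c):
    """Multinomial-logit category probabilities in the form of Equation 1."""
    logits = [a_k * theta + s_k + c_k for a_k, s_k, c_k in zip(a, s, c)]
    top = max(logits)  # subtract the largest logit for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


A_SELF = [-2, -1, 0, 1, 2]     # fixed slopes for self-rating items (from the text)
A_VIGNETTE = [0, 0, 0, 0, 0]   # vignette items carry no theta information
C_ITEM = [0.0] * 5             # hypothetical item intercepts (assumption), sum to 0

STYLES = {
    "neutral": [0, 0, 0, 0, 0],
    "extreme": [2, -1.3, -1.3, -1.3, 2],
    "midpoint": [-0.5, -0.5, 2, -0.5, -0.5],
}

if __name__ == "__main__":
    for name, s in STYLES.items():
        self_item = category_probs(0.0, A_SELF, s, C_ITEM)
        vignette = category_probs(0.0, A_VIGNETTE, s, C_ITEM)
        print(name, "self-rating:", [round(p, 3) for p in self_item])
        print(name, "vignette:   ", [round(p, 3) for p in vignette])
```

Because the vignette slopes are all zero, the vignette probabilities in this sketch do not change with `theta`, which is the sense in which vignettes act as pure indicators of response style.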
A compelling feature of the model in Equation 1 is that it can theoretically accommodate any form of response style. Because of the normalizing constraint, a respondent with $\mathbf{s}_r = [0, 0, 0, 0, 0]$, characterized as a neutral response type above, is at the mean of $\mathbf{s}$ and thus provides a natural reference condition against which the bias of any response style type can be evaluated. An important special case of the model is one in which $\mathbf{s}_r = [0, 0, 0, 0, 0]$ for all respondents, effectively removing this component of the model. Such a model assumes no response style heterogeneity across respondents. In the current application, this model provides a baseline model against which more general models can be compared to validate the presence of response styles. We denote this baseline model as Model 1 in subsequent analyses.
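To make the baseline concrete, the following worked reduction shows what Model 1 implies for each item type. It is a sketch that follows directly from Equation 1 and the slope constraints stated earlier, not an additional result from the study.

```latex
% Model 1: set s_{rk} = 0 for every respondent r and category k in Equation 1.
\[
P(U_{ri} = k \mid \theta_r)
  = \frac{\exp(a_{ik}\theta_r + c_{ik})}
         {\sum_{h=1}^{K} \exp(a_{ih}\theta_r + c_{ih})}
\]
% Vignette items additionally have a_{ik} = 0, so theta_r drops out as well:
\[
P(U_{ri} = k)
  = \frac{\exp(c_{ik})}{\sum_{h=1}^{K} \exp(c_{ih})},
  \qquad i = 1, \ldots, 30 .
\]
```

Under Model 1, then, every respondent has the same category probabilities on a given vignette, so systematic respondent differences in vignette ratings are exactly what the more general models are positioned to capture.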
Because it can accommodate any form of response style, the model in Equation 1 can also be used to study models that consider only particular response style types as special cases. Specifically, constraints can be applied to the model so as to define those response style types. The analyses of Mõttus, Allik, Realo, Rossier, et al. (2012) considered both ERS and MRS in the conscientiousness survey. We can define a special case of Equation 1 in which two latent traits, denoted $\eta_1$ and $\eta_2$, both centered at 0, take the place of $\mathbf{s}_r$ in representing ERS and MRS. To define $\eta_1$ as an ERS trait, we specify associated category slope parameters for the trait as $\mathbf{b}_1 = [1.5, -1, -1, -1, 1.5]$ so as to capture a disproportionate tendency to select endpoint categories. Likewise, the MRS trait $\eta_2$ is defined through slope parameters $\mathbf{b}_2 = [-.75, -.75, 3, -.75, -.75]$. The resulting model is a special case of Equation 1 that can be written as

\[
P(U_{ri} = k) = \frac{\exp(a_{ik}\theta_r + b_{1k}\eta_{r1} + b_{2k}\eta_{r2} + c_{ik})}{\sum_{h=1}^{K} \exp(a_{ih}\theta_r + b_{1h}\eta_{r1} + b_{2h}\eta_{r2} + c_{ih})}, \tag{2}
\]

using the same specified values for the $a_{ik}$ and the same normalizing constraints for the $c_{ik}$s as in the general model. We will refer to the model in Equation 2 as Model 2 below. Other types of response styles (e.g., ARS, DRS) could be similarly specified. The Bolt and Johnson (2009) model is a special case of Model 2 in which only the response style trait $\eta_{r1}$ is included, and thus the $b_{2k}\eta_{r2}$ and $b_{2h}\eta_{r2}$ terms are dropped from the numerator and denominator, respectively, in Equation 2.
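The sense in which Equation 2 is a special case of Equation 1 can be sketched in a few lines of Python: the two style traits imply a response style vector with entries b1k·η1 + b2k·η2, which automatically sums to zero because each slope vector does. The slope vectors come from the text; the function names, the zero item intercepts, and the example trait values are hypothetical choices for illustration only.

```python
import math

B_ERS = [1.5, -1, -1, -1, 1.5]            # ERS category slopes (from the text)
B_MRS = [-0.75, -0.75, 3, -0.75, -0.75]   # MRS category slopes (from the text)
A_SELF = [-2, -1, 0, 1, 2]                # self-rating item slopes (from the text)
C_ITEM = [0.0] * 5                        # hypothetical item intercepts (assumption)


def implied_style_vector(eta1, eta2):
    """Response style vector s implied by the ERS and MRS traits."""
    return [b1 * eta1 + b2 * eta2 for b1, b2 in zip(B_ERS, B_MRS)]


def model2_probs(theta, eta1, eta2, a, c):
    """Category probabilities in the form of Equation 2."""
    s = implied_style_vector(eta1, eta2)
    logits = [a_k * theta + s_k + c_k for a_k, s_k, c_k in zip(a, s, c)]
    top = max(logits)  # subtract the largest logit for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Illustrative trait values only: a respondent high on ERS, slightly low on MRS.
    print(implied_style_vector(0.8, -0.3))
    print([round(p, 3) for p in model2_probs(0.0, 0.8, -0.3, A_SELF, C_ITEM)])
```

Setting `eta2` to 0 throughout drops the MRS terms, which corresponds to the single-trait special case described above.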
A comparison of Model 2 against the more general model of Equation 1 in terms of model fit can be used to determine whether the specified response styles of ERS and MRS are sufficient in accounting for response style heterogeneity. We will denote the general model of Equation 1 as Model 3 in subsequent analyses. Due to its account for any response style type, Model 3 also functions as a saturated model in which response styles beyond those accounted for by Model 2 will be captured.


Figure 2. Item category probability curves for an example item, three response style types. ERS = extreme response style. [Panels: A, neutral responder, s = [0, 0, 0, 0, 0]; B, ERS responder, s = [2, -1.3, -1.3, -1.3, 2]; C, midpoint responder, s = [-.5, -.5, 2, -.5, -.5]. Horizontal axis: Theta, -3 to 3; vertical axis: Probability, 0 to 1; curves labeled by category 1–5.]

Specification as Multiple Group Models

Models 1, 2, and 3 can be estimated and compared to evaluate the presence of particular response style types. They can also be adapted to accommodate the multiple group (country) structure of the Mõttus, Allik, Realo, Pullman, et al. (2012) data so as to permit cross-country comparisons. Such extensions are important in the Mõttus, Allik, Realo, Pullman, et al. analysis given the speculation of response style as a potential cause of unexpected country-level
differences in conscientiousness. More generally, multigroup models of response style are useful given the frequency with which variability in response style is a concern in cross-cultural assessment. For each model, we assume the distributions of the conscientiousness and response style traits vary at both the group (country) and respondent levels. For all models, we assume invariance of the item category intercepts across countries and a constant within-country covariance matrix of the latent traits. Thus, the multigroup analysis effectively introduces country-specific mean vectors for the conscientiousness and response style traits, that is, $\boldsymbol{\mu}_j = [\mu_{\theta j}]$ for Model 1, $\boldsymbol{\mu}_j = [\mu_{\theta j}, \mu_{\eta_1 j}, \mu_{\eta_2 j}]$ for Model 2, and $\boldsymbol{\mu}_j = [\mu_{\theta j}, \mu_{s_1 j}, \mu_{s_2 j}, \mu_{s_3 j}, \mu_{s_4 j}, \mu_{s_5 j}]$ for Model 3, where $j$ indexes country.

A comparison of the $\mu_{\theta j}$ estimates across models shows how controlling for response style alters the estimated mean levels of conscientiousness. The current approach contrasts with the method applied by Mõttus, Allik, Realo, Rossier, et al. (2012), which performed a correction for response style by regressing the mean country conscientiousness survey score on the mean country response style indices, using the resulting residuals as the adjusted country-level mean conscientiousness estimates. Such an approach may be expected to differ substantially from the current approach in part because of its assumption that response style should be uncorrelated with conscientiousness at the country level.

An appealing feature of the multigroup implementation is its capability to simultaneously introduce response style control at both the respondent and country levels. That is, the presence of the $s_1, s_2, \ldots, s_5$ at the respondent level permits interpretation of $\theta$ as a bias-corrected estimate of respondent conscientiousness, while the additional presence of $\mu_{s_1}, \mu_{s_2}, \ldots, \mu_{s_5}$ at the country level permits interpretation of $\mu_\theta$ as a bias-corrected estimate of mean conscientiousness for a country.

A Bayesian Estimation Algorithm

To estimate the models described above, we implement a fully Bayesian approach using Markov chain Monte Carlo simulation. A Bayesian approach was desirable due to the high dimensionality of the model, which makes traditional estimation methods computationally intractable. The Bayesian approach also can be applied in a consistent way across models considered in this article, making model comparison straightforward. An initial step in such analyses is a specification of priors for the parameters in each model. For each of Models 1, 2, and 3, the item category intercept parameters are assigned priors of

\[
c_{ik} \sim \text{Normal}(0, 5),
\]

which, as noted earlier, are subsequently normalized for identification purposes. As the three models differ in regard to specification of the person traits, different prior specifications are needed for the remaining parameters. Under Model 1, we assign priors to the trait parameters as

\[
\theta_{r(j)} \sim \text{Normal}(\mu_j, \sigma^2),
\]

where $\theta_{r(j)}$ denotes the conscientiousness trait for respondent $r$ (from country $j$),

\[
\mu_j \sim \text{Normal}(0, 1), \quad \text{and} \quad \sigma^{-2} \sim \text{Gamma}(1, 1),
\]

implying the countries differ in terms of mean conscientiousness, $\mu_{\theta j}$, but have a constant within-country variance. For Model 2, we denote $\boldsymbol{\theta}_{r(j)} = [\theta_{r(j)}, \eta_{r(j)1}, \eta_{r(j)2}]$, and assign

\[
\boldsymbol{\theta}_{r(j)} \sim \text{MultiNormal}(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}),
\]

where

\[
\boldsymbol{\mu}_j \sim \text{MultiNormal}(\mathbf{0}, \mathbf{I}_{3 \times 3}), \quad \text{and} \quad \boldsymbol{\Sigma}^{-1} \sim \text{Wishart}(12\mathbf{I}_{3 \times 3}, 11),
\]

implying the countries differ in terms of mean conscientiousness, ERS, and MRS, but have a constant within-country covariance matrix.

Finally, for Model 3, we denote $\boldsymbol{\theta}_{r(j)} = [\theta_{r(j)}, s_{r(j)1}, s_{r(j)2}, s_{r(j)3}, s_{r(j)4}, s_{r(j)5}]$ and assign priors to respondent parameters as

\[
\boldsymbol{\theta}_{r(j)} \sim \text{MultiNormal}(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}),
\]

where

\[
\boldsymbol{\mu}_j \sim \text{MultiNormal}(\mathbf{0}, \mathbf{I}_{6 \times 6}), \quad \text{and} \quad \boldsymbol{\Sigma}^{-1} \sim \text{Wishart}(41\mathbf{I}_{6 \times 6}, 40),
\]

implying the countries differ in terms of mean conscientiousness and mean propensities toward each of the five categories, but have a constant within-country covariance matrix. In all cases priors were intentionally chosen as weak so as to maximize the extent to which the data would inform model results. Through application of these priors, a Gibbs sampling procedure can be used to estimate model parameters. The most efficient method by which this sampling occurs is dependent upon the nature of the priors chosen and the structure of the model (see Gilks, Richardson, & Spiegelhalter, 1996). Under Model 2, a Metropolis–Hastings within Gibbs approach is applied, while direct Gibbs sampling can be applied for Models 1 and 3. The algorithms were implemented with the software program WinBUGS 1.4 (Spiegelhalter, Thomas, Best, & Lunn, 2003). Appendix B in the supplemental materials displays code used for Model 3. Simulated chains were carried out to at least 10,000 iterations for each of the models. The chains for each model were evaluated for convergence, and in some cases continued beyond 10,000 iterations if the criteria were not satisfied. Following identification of an appropriate number of burn-in iterations, we calculate parameter estimates as the mean of the sampled states across the remaining iterations post-burn-in. WinBUGS code for the models applied in this article, example data files, and supplementary output files can be found at https://mywebspace.wisc.edu/dmbolt/respstyle/.

Model comparison was conducted with the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & van der Linde, 2002). The DIC is an index analogous to the Akaike information criterion and Bayesian information criterion that weighs both statistical model fit and model complexity in identifying a preferred model. The DIC is based on the posterior distribution of the deviance ($-2$ log-likelihood) and a term representing the “effective number of parameters” that accounts for the expected decrease in deviance attributed to the added parameters of the more complex model. As with the other information criteria, the preferred
model has the lower DIC. DIC can be estimated within WinBUGS 1.4.

Assumptions in Using Anchor Vignettes

Despite their potential value in measuring any response style type, it is important to evaluate some core assumptions of the anchoring vignette approach (Grol-Prokopczyk, Freese, & Hauser, 2011; King et al., 2004). King et al. (2004) presented these assumptions in a context where responses to individual self-rating items are being corrected for response style effects. As our model focuses on a collection of six items that collectively measure a substantive trait, the approaches taken to evaluating these assumptions are slightly different, but consistent with King et al.’s definitions. Response consistency (RC) implies that respondents use the rating scale in the same way for the self-rating items as for the anchoring vignettes. This assumption is made explicit in the current model by the assumed invariance of the response style parameters ($\eta_r$ in Model 2 and $\mathbf{s}_r$ in Model 3) across the self-rating and anchoring vignette items. We can evaluate this assumption through comparison of Models 2 and 3 against more general models that allow the response style parameters to be different for the self-rating versus vignette items. Each of these comparison models effectively doubles the number of respondent and country response style parameters in the model. For Model 3, the comparison RC model can be written as a generalized form of Equation 1:

\[
P(U_{ri} = k \mid \theta_r, \mathbf{s}_r) = \frac{\exp(a_{ik}\theta_r + s_{rk,t(i)} + c_{ik})}{\sum_{h=1}^{K} \exp(a_{ih}\theta_r + s_{rh,t(i)} + c_{ih})}, \tag{3}
\]

where $t(i) = 1, 2$ indexes the type of item (1 = self-rating, 2 = vignette) and each of $s_{rk,1}$ and $s_{rk,2}$ is normalized across categories within respondent. At the country level, this generalization also

1 with the same notation as in Equation 3, but where $t(i) = 1, \ldots, 6$ now indexes the type of item in terms of its rating scale endpoints (1 = first rating scale type, up to 6 = last rating scale type) and each of $s_{rk,1}, s_{rk,2}, \ldots, s_{rk,6}$ is normalized across categories within respondent. At the country level, we therefore introduce as many country mean parameter vectors as there are rating scale types, $\boldsymbol{\mu}_{j,1}, \ldots, \boldsymbol{\mu}_{j,6}$, but again assume a common covariance matrix for the $s_{rk,1}, s_{rk,2}, \ldots, s_{rk,6}$ within each country (the covariance matrices are also assumed constant across countries). As for the RC comparison model, we independently sample the $\boldsymbol{\mu}_{j,1}, \ldots, \boldsymbol{\mu}_{j,6}$ using the same priors as specified for Model 3.

Each of the RC and VE comparison models can be implemented with the same general algorithms applied to Models 2 and 3. In evaluating the RC and VE assumptions, model comparison is also performed with the DIC index as described above.

Analyses of Self-Reported Conscientiousness

In Markov chain Monte Carlo applications, an important initial step involves inspection of chain convergence. The sequence of states for the Markov chain should theoretically converge to a stationary distribution such that the sampled observations can be viewed as a sample from the posterior distribution of the model parameters. Failure to see evidence of convergence suggests potential problems with the sampling mechanism or the identifiability of model parameters. To evaluate convergence of the Markov chains, we applied recommended statistical criteria (e.g., Geweke, 1992; Raftery & Lewis, 1992) along with visual inspection of the sampling histories of model parameters (Spiegelhalter et al., 2003). Briefly, the Geweke criterion involves calculating a z score from the difference of means between the first 10% and last 50% of sampled states divided by their pooled standard deviation. Nonsignificant z values (e.g., $-1.96 \le z \le 1.96$) support convergence.
implies two country mean vectors, a ␮j,1 for the self-rating items The Raftery and Lewis criterion considers the sample size needed
and a ␮j,2 for the vignette items, where the first element of ␮j,2 is to estimate quantiles of the posterior with sufficient precision. An
irrelevant due to the absence of a substantive trait for the vignette index I is returned indicating the increase in the number of sam-
items. A common covariance matrix is applied for the srk,1 and pled states needed to reach convergence due to autocorrelations in
srk,2 within country (as well as across countries). The ␮j,1 and ␮j,2 the chain; values of I ⱖ 5 indicate problems with convergence.
are independently sampled with the same prior as specified for the Finally, graphical inspection of the sampling histories seeks to
␮j in Model 3. identify a “caterpillar-like” shape for each of the parameters.
The second assumption, vignette equivalence (VE), implies that The R package coda (Plummer, Best, Cowles, & Vines, 2006)
respondents perceive each vignette in the same way (or that any was used to process and graph the sampling histories observed for
departures in how a vignette item is perceived are random across all model parameters, evaluate diagnostic criteria, and calculate
items and unrelated to what is being measured). This assumption parameter estimates. Such analyses focused on the country-level
is made explicit in the modeling of vignette responses only in and item category intercept parameters for each model. The con-
relation to the sr and ci parameters. As there are 30 vignettes in the vergence criteria examined for these parameters were deemed
current application, there are various ways in which departures adequate for interpretation of posterior distributions of the model
from VE could be explored. One feature of the current data that parameters (Spiegelhalter et al., 2003). For example, under Model
might be anticipated to produce a lack of VE relates to the different 3, we find each of the ␮␪, ␮s, and c parameters to pass the
rating scale endpoint labels used for different vignettes. Specifi- Raftery–Lewis criterion (I ⱕ 5), while the Geweke criterion rejects
cally, it might be questioned whether respondents demonstrate at levels only slightly above chance (0/22 for ␮␪, 9/110 for ␮s,
variability in responses to vignettes according to the type of rating 28/180 for c), with even statistically significant differences not
scale (as defined by these endpoints) used for the vignette. We being large in magnitude and consistently displaying a caterpillar-
therefore consider a more general comparison model in which the like sampling history. Results for other parameters in Model 3 are
sr parameters and their country-level means are allowed to differ provided at https://mywebspace.wisc.edu/dmbolt/respstyle/. In ad-
by vignette type. As there exist six scale types, this comparison dition, the Raftery–Lewis criterion suggested a burn-in that was
model increases by 5 times the number of respondent and country consistently less than 500; thus, we use all iterations after 500 (up
mean response style parameters. For Model 3, the comparison VE to at least 10,000 iterations, depending on the model) in calculating
model can thus also be written as a generalized form of Equation posterior distributions for model parameters. We use the means
Table 1
Model Comparison Results, Mõttus, Allik, Realo, et al. (2012) Data

Model                                               D-bar     D-hat     pD          DIC
1. No response style                                203632    201531    2110.91     205733
2. Only ERS, MRS                                    185414    179749    5665.06     191079
3. All response style                               182240    174320    7919.68     190160
4. Scale heterogeneity model                        192122    186359    5763.15     197885
5. Model 3, with RC assumption relaxed              179418    170732    8686.56     188105
6. Model 3, with RC and VE assumptions relaxed      175132    156233    18899.70    194032

Note. D-bar = posterior mean of −2 log-likelihood; pD = D-bar − D-hat; DIC = D-bar + pD. DIC = deviance information criterion; ERS = extreme response style; MRS = midpoint response style; RC = response consistency; VE = vignette equivalence.
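As a rough illustration of how the columns of Table 1 relate, the sketch below recomputes pD and DIC from the rounded D-bar and D-hat values printed in the table and picks the model with the lowest DIC. The names `TABLE_1`, `dic_components`, and `preferred_model` are hypothetical helpers introduced here for illustration; they are not part of WinBUGS or of the authors' analysis code, and because the inputs are rounded, the recomputed values can differ from the reported ones by about one unit.

```python
# Rounded values copied from Table 1: (D-bar, D-hat, reported DIC).
TABLE_1 = {
    "1. No response style": (203632, 201531, 205733),
    "2. Only ERS, MRS": (185414, 179749, 191079),
    "3. All response style": (182240, 174320, 190160),
    "4. Scale heterogeneity model": (192122, 186359, 197885),
    "5. Model 3, with RC assumption relaxed": (179418, 170732, 188105),
    "6. Model 3, with RC and VE assumptions relaxed": (175132, 156233, 194032),
}


def dic_components(d_bar, d_hat):
    """Return (pD, DIC), using pD = D-bar - D-hat and DIC = D-bar + pD."""
    p_d = d_bar - d_hat
    return p_d, d_bar + p_d


def preferred_model(table):
    """Return the name of the model with the lowest recomputed DIC."""
    return min(table, key=lambda name: dic_components(*table[name][:2])[1])


if __name__ == "__main__":
    for name, (d_bar, d_hat, reported) in TABLE_1.items():
        p_d, dic = dic_components(d_bar, d_hat)
        print(f"{name:48s} pD={p_d:6d}  DIC={dic:7d}  (reported {reported})")
    print("Lowest DIC:", preferred_model(TABLE_1))
```

Restricting the dictionary to Models 1 to 3 gives the comparison among the main models; including Models 5 and 6 gives the comparison used to evaluate the RC and VE assumptions.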

and standard deviations of the posterior distributions to define parameter estimates and standard errors, respectively.

Model Comparison Results

Table 1 displays model comparison results based on the DIC criterion. The models evaluated include multigroup versions of the Models 1, 2, and 3 described earlier, a multigroup extension of the Rossi et al. (2001) scale heterogeneity model, and the comparison RC and VE models for Model 3. The rightmost column displays the final DIC estimates. Based on the DIC criterion, Models 2 and 3 emerge as clearly superior to Model 1, suggesting that the response style variability introduced in Models 2 and 3 is statistically meaningful. Moreover, Model 3 is found statistically superior to Model 2, suggesting the detectable presence of response style types beyond those of ERS and MRS. At the same time, the relative decline in DIC is clearly greatest from Models 1 to 2, suggesting that ERS and MRS likely represent the predominant forms of response style heterogeneity in these data. In addition, Model 3 appears superior to the scale heterogeneity model, suggesting the latent trait modeling approach of Equation 1 may be preferred to a scale heterogeneity model for these data.

To confirm our use of DIC for model comparison, we conducted a series of simulation analyses based on the real data analyses applied using Models 1, 2, and 3. Specifically, we generated item response data sets of the same structure as the Mõttus, Allik, Realo, Pullman, et al. (2012) data using parameter samples from the posterior distributions of the simulated chains for Models 1, 2, and 3 as generating parameters. We then proceeded to fit each of Models 1, 2, and 3 to each data set. Ten data sets were generated for each model using a different sample of parameters as generating parameters in each replication (resulting in a total of 90 analyses).¹ A comparison of DICs across the three models in each case identified the correct generating model (by returning the lowest DIC), that is, 30 out of 30 correct model identifications. Such results appear to provide good support for our use of DIC in this context.

As noted, the final RC and VE comparison models can also be examined against Model 3 to evaluate anchoring vignette assumptions. From a comparison of DIC indices, it would appear that the assumption of RC is violated, implying that respondents use the scales in detectably different ways for the self-rating and vignette items. This should not be taken to imply, however, that the response style tendencies are independent across item types. To evaluate the extent to which response style tendencies are correlated for self-rating and vignette items, we calculated estimates of the country-level mean response style trait estimates (the second through sixth elements of each of the μ_{j,1} and μ_{j,2} vectors) for each item type. The correlations between the country-level self-rating and vignette response style estimates by category ranged from .55 to .78.² Thus, while the RC comparison model confirms that there are detectable differences in response style behavior for the vignette versus self-rating items, there are also strong consistencies that likely make the anchoring vignette data useful in understanding and controlling for response style behavior on the self-rating items. Consequently, we focus below primarily on the results observed for Model 3.

¹ Computational time is the primary reason more replications were not performed here.
² The subject intercept estimates can also be correlated across item types at the respondent level; however, the lack of reliability for these intercepts in the self-rating condition, which are based on only six item responses, precluded meaningful comparisons.

Model Estimates

Table 2
Respondent-Level Correlation Estimates Among Conscientiousness and Response Style Traits, Model 2, Mõttus, Allik, Realo, et al. (2012) Data

Variable    θ       η1      η2
θ           –
η1          −.13    –
η2          .11     −.11    –

To better understand the nature of response style heterogeneity captured under Models 2 and 3, we can inspect estimates of the trait correlations, as shown in Tables 2 and 3. These correlations are derived from the corresponding Σ estimates for each model, and are based on 9,500 iterations post-burn-in for Model 2 and 19,500 iterations post-burn-in for Model 3. In both models, response style tendencies appear to be weakly correlated with θ, the substantive trait. Moreover, among response style traits, ERS and MRS appear only weakly negatively correlated in Model 2, implying there are likely respondents that represent a blend of these response style types (e.g., respondents that consistently select Categories 1, 3 and 5). For Model 3, the correlations among
elements of s_r in Model 3 suggest a pattern reflective of ERS, with moderate positive correlations among respondent propensities for the endpoint Categories 1 and 5 and for the intermediate Categories 2, 3, and 4.

Table 3
Respondent-Level Correlation Estimates Among Conscientiousness and Response Style Traits, Model 3, Mõttus, Allik, Realo, et al. (2012) Data

Variable    θ       s1      s2      s3      s4      s5
θ           –
s1          −.08    –
s2          .09     .09     –
s3          .11     −.15    .57     –
s4          .01     −.09    .58     .55     –
s5          −.18    .54     −.16    −.19    .08     –

Table 4 displays the item category intercept estimates from Model 3 for the six self-rating items. These estimates indicate the relative propensity to select each category for a respondent at the mean level (0) on both θ and s_r. Apparent from these estimates is the near-parallelism of the six self-rating items. Such parallelism may make sum scores particularly susceptible to bias from response style effects, as the biasing effects observed for a single item will tend to be replicated across the other items (Bolt & Newton, 2011).

Table 4
Item Category Intercept Estimates, Model 3, Mõttus, Allik, Realo, et al. (2012) Data

        Category 1      Category 2      Category 3      Category 4      Category 5
Item    ĉ1      SE      ĉ2      SE      ĉ3      SE      ĉ4      SE      ĉ5      SE
31      −3.91   0.16    −0.96   0.08    1.12    0.06    2.37    0.06    1.39    0.07
32      −3.08   0.11    −0.39   0.06    0.93    0.05    1.73    0.05    0.81    0.06
33      −3.89   0.15    −1.22   0.08    0.82    0.06    2.39    0.06    1.89    0.07
34      −3.39   0.13    −0.81   0.07    1.22    0.05    1.99    0.05    0.98    0.06
35      −3.06   0.11    −0.43   0.06    0.98    0.05    1.77    0.05    0.75    0.06
36      −3.09   0.11    −0.65   0.06    1.03    0.05    1.76    0.05    0.94    0.06

Evaluating the Biasing Effects of Response Styles

The category intercept estimates of Table 4 can be used to evaluate bias in the survey scores as a function of response style. Due to the flexibility of Model 3, it is possible to use the Model 3 estimates to evaluate bias for any hypothetical response style type. For example, for a given response style vector s under Model 3, we can compute an expected total score on the self-rating items of the survey as

\[
ES(\theta, s) = \sum_{j=31}^{36} \sum_{k=1}^{5} k \times P(U_j = k \mid \theta, s),
\]

where P(U_j = k | θ, s) is as defined in Equation 1. If we assume the mean of s (= [0, 0, 0, 0, 0]) to define an appropriate reference point, we can also use the equation to determine an expected score, namely ES(θ, 0), for a neutral response style. Then bias can be estimated as

\[
\mathrm{Bias}(\theta, s) = ES(\theta, s) - ES(\theta, 0).
\]

It is important to note that since there exists no “correct” level of response style, s = [0, 0, 0, 0, 0] is to some extent an arbitrary reference point, although the relative bias will stay the same for other choices. An examination of bias curves allows for an evaluation of the practical influence of a particular response style on the self-rating scores.

Figure 3 illustrates bias in the self-rating total score as a function of θ for several different response style types based on the estimates of Model 3. The example response styles shown here reflect ERS, MRS, and a response style tendency toward Category 2 (denoted RS-2). Apparent from the curves is the differential bias seen not only across response style types but also in relation to θ. Of particular concern is the substantial bias for ERS seen near the mean of θ (= 0), a location where a high density of respondents might be expected. Such a result suggests there may well be value in controlling for response style in the Mõttus, Allik, Realo, Pullman, et al. (2012) self-rating items, given the likely prevalence of ERS.

Respondent-Level Results

Application of Model 3 permits evaluations of bias corrections at both the respondent and country levels in the presence of any response style type. Corrections at the respondent level can be evaluated by contrasting θ estimates derived under Model 3 against ordinary sum scores calculated for the self-rating conscientiousness items. Figure 4 illustrates a scatterplot of respondent θ estimates (bias corrected) based on Model 3 against the sum scores. It is important to note that the distribution of self-rating sum scores is concentrated toward the high end of the scale (overall M = 22.97, SD = 3.87). Where the scatterplot illustrates dispersion of the θ estimates at a fixed sum score, we can see corrections for response style bias as accounted for under Model 3. Although the correlation between sum scores and θ estimates is high overall (r = .87), it is apparent from the scatterplot that there are respondents and sum score levels for which the corrections due to response style bias are substantial. (Moreover, the correlation is notably lower than the correlation between the sum scores and θ estimates under Model 1 [r = .97], indicating that the changes reflect an adjustment for response style as opposed to the use of an IRT latent trait.) One pattern apparent from the scatterplot is the tendency to see larger and more variable bias adjustments near the mean sum score, where ERS was suggested in Figure 3 to contribute more substantially to bias.
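The expected-score and bias definitions given under "Evaluating the Biasing Effects of Response Styles" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the intercepts are the Table 4 estimates and the ERS vector is taken from the Figure 3 legend, but the equal-interval category slopes in `SLOPES` are an assumption made here for illustration (the actual a_ik are defined with Equation 1, which is not reproduced in this section), so the printed values should not be read as reproducing Figure 3. The function names are hypothetical.

```python
import math

# Item category intercept estimates for self-rating items 31-36 (Table 4).
INTERCEPTS = {
    31: [-3.91, -0.96, 1.12, 2.37, 1.39],
    32: [-3.08, -0.39, 0.93, 1.73, 0.81],
    33: [-3.89, -1.22, 0.82, 2.39, 1.89],
    34: [-3.39, -0.81, 1.22, 1.99, 0.98],
    35: [-3.06, -0.43, 0.98, 1.77, 0.75],
    36: [-3.09, -0.65, 1.03, 1.76, 0.94],
}

# ASSUMPTION: equal-interval category slopes for theta. The real a_ik come
# from Equation 1 of the article and may differ from these values.
SLOPES = [-2.0, -1.0, 0.0, 1.0, 2.0]

NEUTRAL = [0.0, 0.0, 0.0, 0.0, 0.0]   # reference response style, s = 0
ERS = [2.0, -1.3, -1.3, -1.3, 2.0]    # ERS vector from the Figure 3 legend


def category_probs(theta, s, c, a=SLOPES):
    """P(U = k | theta, s) for k = 1..5 as a softmax over a_k*theta + s_k + c_k."""
    logits = [a_k * theta + s_k + c_k for a_k, s_k, c_k in zip(a, s, c)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, s):
    """ES(theta, s): expected sum score over the six self-rating items."""
    return sum(
        k * p
        for c in INTERCEPTS.values()
        for k, p in enumerate(category_probs(theta, s, c), start=1)
    )


def bias(theta, s):
    """Bias(theta, s) = ES(theta, s) - ES(theta, 0)."""
    return expected_score(theta, s) - expected_score(theta, NEUTRAL)


if __name__ == "__main__":
    for theta in (-2, -1, 0, 1, 2):
        print(f"theta={theta:+d}  bias under ERS = {bias(theta, ERS):+.2f}")
```

Evaluating `bias` over a grid of θ values for several s vectors gives bias curves of the kind shown in Figure 3, subject to the slope assumption above.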
Figure 3. Estimated bias in conscientiousness self-rating sum score as a function of θ, three response style types, Model 3. ERS = extreme response style; MRS = midpoint response style; RS-2 = response style tendency toward Category 2. [Plot of bias (vertical axis, −10 to 10) against θ (horizontal axis, −3 to 3) for ERS, s = [2, −1.3, −1.3, −1.3, 2]; MRS, s = [−.5, −.5, 2, −.5, −.5]; and RS-2, s = [−.5, 2, −.5, −.5, −.5].]

Figure 4. Scatterplot of Model 3 θ estimates by conscientiousness self-rating sum scores, Mõttus, Allik, Realo, Pullman, et al. (2012) data.

To further illustrate some of the effects of Model 3 on the scoring of individual respondents, Table 5 presents some actual response patterns observed in the data as well as their associated parameter estimates under Models 1, 2, and 3. Each respondent’s ratings for the vignettes are shown in the leftmost column, followed by the responses to the self-rating items. The subsequent columns indicate the θ estimates under each model, while the rightmost columns display the estimates of the response style traits under Models 2 and 3.

Respondents A–D were chosen as examples of response style tendencies that are well captured by both Models 2 and 3 and for which similar bias corrections are performed by both models. Respondent A’s pattern reflects a disproportionate selection of 2s, 3s, and 4s. Control for this response style results in a substantial change to the respondent’s θ estimate under Models 2 and 3, where the self-ratings (containing primarily 4s) result in a much higher θ estimate than is observed when response style is ignored (under Model 1). Model 2, which attends to ERS and MRS, captures Respondent A’s response style through low and high estimates on η1 and η2, respectively. Respondents B and C, both of whom demonstrate ERS, differ from each other in that Respondent C also selects 3s on some vignettes. The θ estimates for both respondents are adjusted in a negative direction relative to Model 1, consistent with the positive biasing effects for ERS seen in Figure 3 when θ ≥ −1. Finally, Respondent D responds entirely with 3s, and is represented as such from both the Model 2 and Model 3 response style estimates. Moreover, while Models 2 and 3 imply a large change to the θ estimate (essentially pulling it to the mean) there is also a substantial increase in the standard error, implying little is known about θ for this respondent.

The superior fit of Model 3 relative to Model 2 suggests the presence of response styles other than ERS and MRS. Respondents E, G, H, and I provide examples of such patterns. Respondent E, like D, consistently selects only one response category, in this case Category 5. Although such a pattern is not ERS, it emerges as ERS under Model 2. Thus, the estimated θ for Respondent E under Model 2 is quite different from that seen under Model 3, which can identify the response style as being related to overselection of only 5s. In contrast to Respondent E, Respondent F reflects a more
neutral respondent in terms of vignette responses, and thus despite having the same scores for the self-rating items, receives a much higher θ estimate than Respondent E under Models 2 and 3. Respondents G, H, and I, like Respondent E, illustrate response style types that are uniquely identified under Model 3, do not reflect either ERS or MRS, and return substantially different θ estimates as a result.

Table 5
Examples of Individual Response Patterns and Corresponding Respondent Trait Estimates Under Models 1, 2, and 3

Columns: Respondent ID; response pattern, vignettes; response pattern, self-ratings; θ̂ under Model 1; θ̂ under Model 2; θ̂ under Model 3; η̂1 and η̂2 under Model 2; ŝ1, ŝ2, ŝ3, ŝ4, and ŝ5 under Model 3.

Respondent    Response pattern, vignettes          Response pattern, self-ratings
A             433333244333444 443533254332255      454344
B             115551115511155 115551155511155      515511
C             113551135511155 113551135511155      555533
D             333333333333333 333333333333333      333333
E             555555555555555 555555555555555      555555
F             244452215421145 433442135533254      555555
G                                                  444454
H                                                  553555
I                                                  444224

[The table is printed rotated on the page; the trait estimates with their standard errors, and the vignette response patterns for Respondents G–I, could not be recovered in row order.]

Note. Standard errors appear in parentheses.

While the examples of Table 5 apply bias corrections consistent with intuition (based on vignette responses) thus lending support to the model, it is also possible to examine effects with respect to external correlates. In the Mõttus, Allik, Realo, Pullman, et al. (2012) study, respondents also answered two items reflecting religiosity (“How important is God in your life?” and “How religious would you say you are?”), both answered on a 1–10 scale with 10 reflecting high religiosity. Religiosity and conscientiousness are expected to positively correlate (Saroglou, 2002). The correlations observed between the original conscientiousness sum scores and each of these items are positive, as expected (.23 and .25, respectively). Interestingly, these correlations actually weaken with respect to the θ estimates of Model 3 (to .18 and .22, respectively). However, the weakened correlations have a clear explanation, in that religiosity is known to positively correlate with ERS (Marshall & Lee, 1998). Indeed, multiple regression models that regress each religiosity item onto both the θ̂ and ŝ5 of Model 3 (ŝ5 reflecting a respondent’s disproportionate tendency to select Category 5) yield significant positive coefficients for both the θ̂ (β = .18, t = 10.27, p < .001; β = .22, t = 12.08, p < .001, respectively) and ŝ5 (β = .14, t = 7.51, p < .001; β = .12, t = 6.52, p < .001, respectively). Thus, it would appear that use of the original sum score actually produced a slight overestimate of the correlation due to the conflation of effects related to both actual conscientiousness and extreme response tendencies, both of which positively correlate with religiosity.

Country-Level Results

The other level at which the implications of response style can be examined is in the country-level mean conscientiousness (μ_θ) estimates. This was the focus of Mõttus, Allik, Realo, Rossier, et al. (2012). Figure 5 illustrates the μ_{s1 j}, μ_{s2 j}, μ_{s3 j}, μ_{s4 j}, and μ_{s5 j} estimates for six exemplar countries based on Model 3. Each plot illustrates the estimates in profile form along with their corresponding 95% credible intervals. Estimates above 0 reflect a disproportionate tendency among respondents in the country to select the category; estimates below 0 the opposite. Some of the patterns seen in Figure 5 are consistent with the findings of Mõttus, Allik, Realo, Rossier, et al. in that certain countries are more ERS (e.g., Senegal), whereas others (e.g., Germany, Hong Kong) more anti-ERS. At the same time, through Model 3 we can see that there are countries whose mean response style tendencies appear to reflect patterns other than the ERS and MRS types considered in Mõttus, Allik, Realo, Rossier, et al. and whose actual response style tendencies could be misconstrued under Model 2, thus illustrating the value of a model that attends to any response style type. For example, one country region highlighted as strongly ERS in Mõttus, Allik, Realo, Rossier, et al. is Changchun (China). The Model 3 analysis, however, shows that


Changchun (China) appears to have a disproportionate tendency only toward Category 1, not Category 5. Similarly, countries such as Mali and Mauritius illustrate response style profiles not well captured by any combination of ERS and MRS and are thus better represented under Model 3.

Figure 5. Country-level mean response style estimates for six exemplar countries, Model 3, Mõttus, Allik, Realo, Pullman, et al. (2012) data. Each plot illustrates the estimates in profile form along with their corresponding 95% credible intervals. [Six panels: Changchun (China), Germany, Hong Kong, Mali, Mauritius, Senegal. Horizontal axis: Category (1 to 5); vertical axis: Mean Category Intercept (−0.6 to 0.6).]

Table 6 illustrates how the control of response style under Models 2 and 3 affects the μ_θ estimates. Country-level means for the self-rating items are reported along with the μ_θ estimates and the rank ordering of countries according to these estimates under Models 1, 2, and 3. Comparison of the μ_θ estimates for Model 3 against Model 2 illustrates how Model 3 represents a different form of response style control. Countries highlighted in italics in Table 6 represent those showing more sizable adjustments in their mean conscientiousness estimates across models, the same countries as shown in Figure 5. The adjustments are largely consistent with expectations related to the bias curves in Figure 3. For example, countries such as Hong Kong and Germany show larger increases in their μ_θ estimates owing to their anti-ERS tendencies, while a country such as Senegal has its μ_θ decreased due to ERS tendencies. One interesting result is again seen for Changchun (China), which under Model 3 showed a disproportionate tendency to select Category 1. Under Model 2, this behavior emerges as ERS, and leads to a lower μ_θ relative to Model 1. However, under Model 3 the μ_θ is adjusted in the reverse direction, due to the correct form of response style captured. Along these lines, both Mauritius and Mali provide examples of countries with more sizable adjustments related to their own unique response style tendencies that are not well captured by Model 2.

Despite some notable and interpretable changes in the country-level means when controlling for response style under Model 3, the adjustments are less extreme than those seen in Mõttus, Allik, Realo, Rossier, et al. (2012), as shown in the rightmost column of Table 6. Here again, the capability of Model 3 in measuring response style is useful in understanding why the country-level corrections are minimal. Specifically, a decomposition of the response style estimates ŝ1, . . . , ŝ5 between versus within countries returns intraclass correlation estimates ranging from .06 to .09, suggesting that the vast majority of heterogeneity in response style occurs within as opposed to between countries. Thus, it is not surprising that in the current analysis the more significant response style adjustments are occurring at the respondent level.

Discussion and Conclusion

The primary goal of this article is to illustrate a new latent variable approach to modeling response styles that can accommodate the measurement and control of any response style type. Due to the frequent confounding of response styles with substantive traits in self-rating responses, anchoring vignettes play a fundamental role in making such a model estimable. An advantage of the current approach relative to competing approaches for controlling response style, such as hierarchical ordered probit (hopit) models (e.g., King et al., 2004), is the potential within the current model
Table 6
Country-Level Mean Conscientiousness Estimates, Models 1, 2 and 3, Mõttus, Allik, Realo, et al. (2012) Data

                     Item score     Model 1                Model 2                Model 3
Country              M      Rank    μ̂_θ            Rank    μ̂_θ            Rank    μ̂_θ            Rank    Mõttus ranking
Japan                3.13   1       −1.03 (0.09)   1       −1.06 (0.09)   1       −1.08 (0.10)   1       4
South Korea          3.45   2       −0.64 (0.09)   2       −0.58 (0.09)   2       −0.64 (0.10)   2       9
Lithuania            3.54   3       −0.52 (0.08)   3       −0.49 (0.08)   3       −0.54 (0.09)   3       2
Australia            3.62   4       −0.39 (0.04)   4       −0.45 (0.05)   4       −0.38 (0.05)   4       3
Russia               3.67   5       −0.31 (0.09)   5       −0.23 (0.09)   6       −0.34 (0.10)   5       1
Switzerland          3.69   6       −0.31 (0.09)   6       −0.26 (0.09)   5       −0.29 (0.09)   6       7
Estonia              3.73   7       −0.24 (0.09)   7       −0.21 (0.09)   7       −0.21 (0.09)   7       6
Mauritius            3.74   8       −0.21 (0.09)   8       −0.10 (0.09)   9       −0.04 (0.10)   9       13
Hong Kong (China)    3.76   9       −0.16 (0.07)   10      0.06 (0.08)    12      0.02 (0.09)    11      20
Germany              3.77   10      −0.18 (0.11)   9       −0.01 (0.11)   10      −0.01 (0.12)   10      18
Sweden               3.85   11      −0.07 (0.09)   11      0.07 (0.09)    13      0.06 (0.10)    13      16
Malaysia             3.89   12      0.01 (0.07)    12      −0.11 (0.07)   8       −0.07 (0.07)   8       5
United States        3.93   13      0.08 (0.07)    13      0.14 (0.08)    16      0.08 (0.08)    14      12
Poland               3.94   14      0.12 (0.09)    14      0.02 (0.09)    11      0.06 (0.10)    12      8
Beijing (China)      3.96   15      0.12 (0.08)    15      0.11 (0.08)    15      0.12 (0.09)    16      19
Philippines          3.97   16      0.15 (0.08)    16      0.07 (0.08)    14      0.08 (0.09)    15      10
South Africa         4.01   17      0.23 (0.09)    17      0.20 (0.09)    17      0.32 (0.10)    17      14
Changchun (China)    4.06   18      0.31 (0.09)    18      0.21 (0.09)    18      0.40 (0.09)    18      11
Mali                 4.23   19      0.62 (0.10)    19      0.53 (0.11)    20      0.42 (0.10)    19      21
Senegal              4.25   20      0.70 (0.09)    20      0.51 (0.10)    19      0.54 (0.11)    20      17
Burkina Faso         4.27   21      0.74 (0.10)    21      0.64 (0.10)    21      0.65 (0.11)    21      15
Benin                4.37   22      0.99 (0.10)    22      0.93 (0.11)    22      0.86 (0.11)    22      22

Note. Standard errors appear in parentheses. Italicized countries showed more sizable adjustments in their mean conscientiousness estimates across models.

to measure and interpret response style types through the s_r (and μ_{s_j}) profiles. These profiles can be directly related to a respondent’s tendency to disproportionately over- or underselect particular score categories, and are seen in this article to frequently reflect tendencies beyond those of common response style types (e.g., ERS, MRS). The direct measures of response style can prove useful in understanding how response styles contribute to bias.

A clear concern in specifying models that attend to only a limited number of response style types (such as ERS and MRS in Model 2) is that unmodeled response style types that may have different biasing effects can easily become confounded, and inappropriate biasing corrections applied. At the respondent level, this was seen in example respondents E, G, H, and I of Table 5, while at the country level such effects were most noticeable for Changchun (China). In each case, forcing response style types to conform to ERS or MRS types either resulted in a mischaracterization of the actual response style or else completely missed the presence of an alternative response style (e.g., Respondent I in Table 5).

Another advantage of the current model-based approach is its capacity to systematically evaluate assumptions that are made in the use of anchoring vignettes. In the current analysis, there was evidence for a lack of RC. The reasons for this lack of consistency are open to further investigation. Naturally, the process of developing and applying anchoring vignettes can be a delicate operation requiring various considerations to help ensure RC. There is also the possibility that in certain contexts, regardless of such considerations, it will prove difficult to avoid respondents altering their response tendencies when responding in relation to self as opposed to hypothetical others. Further study is needed regarding the robustness of the model under RC violations and in developing methods for handling such violations.

Future simulation work based on this model can also investigate the effects of different design considerations (e.g., the number of vignettes per rating scale item; variability across vignette scenarios) as well as different respondent population characteristics (e.g., amounts and types of response style heterogeneity) on the effectiveness of response style control. Clearly there remains much to study in regard to the appropriate design and use of vignettes.

Additional applications of the current model are needed to evaluate its flexibility in other contexts. In the current analysis, it appears that more substantial bias corrections tend to occur at the respondent level than at the country level. The less consequential effects at the country level were attributed to what appears to be the relatively large within-country heterogeneity in response style. It is of course also possible that other forms of item bias across countries may also exist in the instrument, effects that would not be captured in the current analyses. In addition, the reverse coding applied to the items in the current analysis may mask other forms of response style behavior that would be detected without the reverse coding. Applications of the current methodology with other survey instruments would also permit study of the model in the presence of different types of response style bias. As discussed in relation to Figure 3, the biasing effects of response style are very much dependent on psychometric characteristics (e.g., difficulty levels) of the self-rating items. In the current analysis, it appeared that ERS would have more significant biasing effects at the center of the substantive trait distribution. Other survey instruments may yield more substantial biasing effects in relation to other response style types and/or at different trait locations.

There is potential to explore other variations on the proposed model, as may be appropriate in certain contexts. For example, one possibility would be to relax the assumption of specified
equal-interval category slope (a) estimates for θ under Model 3. Alternative ways of studying RC and VE could also be considered. We acknowledge that our approaches, occurring within a latent trait model, are somewhat different from those described by King et al. (2004). There may be more powerful ways of evaluating these assumptions than we considered in this article. As suggested by a reviewer, an alternative possibility for testing VE using the Mõttus, Allik, Realo, Pullman, et al. (2012) design would be a model that introduces latent variables (factors) related to rating scale type. Along these lines, further empirical study of the current methodology in comparison to other models (e.g., hopit models) for response style would be useful. As noted earlier, an apparent advantage of the current model is its explicit measurement of response style through model parameters. Whether the models also yield different results in terms of response style control, however, requires further study.

References

Baumgartner, H., & Steenkamp, J. B. E. M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38, 143–156. doi:10.1509/jmkr.38.2.143.18840
Billiet, J. B., & McClendon, M. J. (2000). Modeling acquiescence in measurement models for two balanced sets of items. Structural Equation Modeling, 7, 608–628. doi:10.1207/S15328007SEM0704_5
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51. doi:10.1007/BF02291411
Bolt, D. M., & Johnson, T. R. (2009). Addressing score bias and DIF due to individual differences in response style. Applied Psychological Measurement, 33, 335–352. doi:10.1177/0146621608329891
Bolt, D. M., & Newton, J. R. (2011). Multiscale measurement of extreme response style. Educational and Psychological Measurement, 71, 814–833. doi:10.1177/0013164410388411
Buckley, J. (2009, June). Cross-national response styles in international educational assessments: Evidence from PISA 2006. Paper presented at the NCES Conference on the Program for International Student Assessment, Washington, DC. Retrieved from https://edsurveys.rti.org/PISA/
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics (Vol. 4, pp. 169–193). Oxford, England: Oxford University Press.
Gibbins, K. (1968). Response sets and the semantic differential. British Journal of Social and Clinical Psychology, 7, 253–263. doi:10.1111/j.2044-8260.1968.tb00567.x
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov chain Monte Carlo in practice. Boca Raton, FL: Chapman & Hall/CRC.
Greenleaf, E. A. (1992a). Improving rating scale measures by detecting and correcting bias components in some response styles. Journal of Marketing Research, 29, 176–188. doi:10.2307/3172568
Greenleaf, E. A. (1992b). Measuring extreme response style. Public Opinion Quarterly, 56, 328–351. doi:10.1086/269326
Grol-Prokopczyk, H., Freese, J., & Hauser, R. M. (2011). Using anchoring vignettes to assess group differences in general self-rated health. Journal of Health and Social Behavior, 52, 246–261. doi:10.1177/0022146510396713
Harzing, A.-W. (2006). Response style in cross-national survey research: A 26-country study. International Journal of Cross Cultural Management, 6, 243–266. doi:10.1177/1470595806066332
Hui, C. H., & Triandis, H. C. (1989). Effects of culture and response format on extreme response style. Journal of Cross-Cultural Psychology, 20, 296–309. doi:10.1177/0022022189203004
Johnson, T., Kulesa, P., Cho, Y. I., & Shavitt, S. (2005). The relation between culture and response styles: Evidence from 19 countries. Journal of Cross-Cultural Psychology, 36, 264–277. doi:10.1177/0022022104272905
King, G., Murray, C. J. L., Salomon, J. A., & Tandon, A. (2004). Enhancing the validity and cross-cultural comparability of survey research. American Political Science Review, 98, 191–207. doi:10.1017/S000305540400108X
Lewis, N. A., & Taylor, J. A. (1955). Anxiety and extreme preferences. Educational and Psychological Measurement, 15, 111–116. doi:10.1177/001316445501500203
Marshall, R., & Lee, C. (1998). A cross-cultural, between-gender study of extreme response style. In B. G. Englis & A. Olofsson (Eds.), European advances in consumer research (Vol. 3, pp. 90–95). Provo, UT: Association for Consumer Research.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174. doi:10.1007/BF02296272
Meisenberg, G., & Williams, A. (2008). Are acquiescent and extreme response styles related to low intelligence and education? Personality and Individual Differences, 44, 1539–1550. doi:10.1016/j.paid.2008.01.010
Moors, G. (2003). Diagnosing response style behavior by means of a latent-class factor approach: Socio-demographic correlations of gender role attitudes and perceptions of ethnic discrimination reexamined. Quality and Quantity, 37, 277–302. doi:10.1023/A:1024472110002
Moors, G. (2012). The effect of response style bias on the measurement of leadership. European Journal of Work and Organizational Psychology, 21, 271–298. doi:10.1080/1359432X.2010.550680
Morren, M., Gelissen, J., & Vermunt, J. (2011). Dealing with extreme response style in cross-cultural research: A restricted latent class factor analysis approach. Sociological Methodology, 41, 13–47. doi:10.1111/j.1467-9531.2011.01238.x
Mõttus, R., Allik, J., Realo, A., Pullman, H., Rossier, J., Zecca, J., . . . Tseung, C. N. (2012). Comparability of self-reported conscientiousness across 20 countries. European Journal of Personality, 26, 307–317. doi:10.1002/per.840
Mõttus, R., Allik, J., Realo, A., Rossier, J., Zecca, J., Ah-Kion, J., . . . Johnson, W. (2012). The effect of response style on self-reported conscientiousness across 20 countries. Personality and Social Psychology Bulletin, 38, 1423–1436. doi:10.1177/0146167212451275
Plummer, M., Best, N., Cowles, K., & Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis for MCMC. R News, 6(1), 7–11.
Raftery, A. E., & Lewis, S. M. (1992). One long run with diagnostics: Implementation strategies for Markov chain Monte Carlo. Statistical Science, 7, 493–497. doi:10.1214/ss/1177011143
Rossi, P. E., Gilula, Z., & Allenby, G. M. (2001). Overcoming scale usage heterogeneity: A Bayesian hierarchical approach. Journal of the American Statistical Association, 96, 20–31. doi:10.1198/016214501750332668
Saroglou, V. (2002). Religion and the five factors of personality: A meta-analytic review. Personality and Individual Differences, 32, 15–25. doi:10.1016/S0191-8869(00)00233-6
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B. Statistical Methodology, 64, 583–639. doi:10.1111/1467-9868.00353
Spiegelhalter, D. J., Thomas, A., Best, N., & Lunn, D. (2003). WinBUGS Version 1.4 user’s manual. Cambridge, England: MRC Biostatistics Unit.
Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577. doi:10.1007/BF02295596
van Rosmalen, J. M., van Herk, H., & Groenen, P. J. F. (2010). Identifying response styles: A latent-class bilinear multinomial logit model. Journal of Marketing Research, 47, 157–172. doi:10.1509/jmkr.47.1.157
Van Vaerenbergh, Y., & Thomas, T. D. (2013). Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research, 25, 195–217. doi:10.1093/ijpor/eds021
Weijters, B., Geuens, M., & Schillewaert, N. (2010a). The individual consistency of acquiescence and extreme response style in self-report questionnaires. Applied Psychological Measurement, 34, 105–121. doi:10.1177/0146621609338593
Weijters, B., Geuens, M., & Schillewaert, N. (2010b). The stability of individual response styles. Psychological Methods, 15, 96–110. doi:10.1037/a0018721

Received March 31, 2013
Revision received November 14, 2013
Accepted November 21, 2013