Construct-Driven Development
of a Video-Based Situational
Judgment Test for Integrity
A Study in a Multi-Ethnic Police Setting
Lonneke A. L. de Meijer,1 Marise Ph. Born,1 Jaap van Zielst,2
and Henk T. van der Molen1
Abstract. In a field study conducted in a multi-ethnic selection setting at the Dutch police, we examined the construct validity of a video-based
situational judgment test (SJT) aimed to measure the construct of integrity. Integrity is of central importance to productive work performance of
police officers. We used a sample of police applicants, which consisted of a Dutch ethnic majority group and an ethnic minority group. The ethnic
minority applicants came from one of the four largest ethnic minority groups in The Netherlands, namely groups with a Dutch Antillean, a
Moroccan, a Surinamese, or a Turkish background. A critical issue is the often-found construct-heterogeneity of SJTs. However, we found that a
construct-driven approach may be fruitful in the development of SJTs aiming to measure one single construct. Confirming our expectations, we
found support for the construct validity of the SJT intended to measure the construct of integrity. These results held across ethnic majority and
ethnic minority applicants. Therefore, the SJT is a promising test for personnel selection in a multi-ethnic setting.
Keywords: situational judgment test, integrity, construct validity, ethnicity
Situational judgment tests (SJTs) typically consist of scenarios of hypothetical work situations in which a problem has
arisen. Accompanying each scenario are multiple possible
ways to respond to the hypothetical situation. The test taker
is then asked to judge the possible courses of action.
Although SJTs have been in use since the 1920s, they have
become increasingly popular in personnel selection and in
the research literature during the last two decades (e.g., Chan
& Schmitt, 1997, 2005; Dalessio, 1994; McDaniel, Hartman,
Whetzel, & Grubb, 2007; Olson-Buchanan et al., 1998;
Weekley & Jones, 1997, 1999). Several characteristics of
the SJT have caused its revival. First, McDaniel et al.
(2007) meta-analytically showed the criterion-related validity
and the incremental validity of SJTs over and above a composite of cognitive ability tests and personality questionnaires in predicting job performance. Second, SJTs have
been found to have less adverse impact on ethnic minority
groups than more traditionally used cognitive ability tests
(Clevenger, Pereira, Wiechmann, Schmitt, & Harvey, 2001;
Motowidlo, Dunnette, & Carter, 1990; Nguyen & McDaniel,
2003; O'Connell, Harman, McDaniel, Grubb, & Lawrence,
2007; Olson-Buchanan et al., 1998; Weekley & Jones,
1997, 1999). Finally, new technology has made the development of SJTs based on video material possible. The video-based SJT appears to have several advantages compared to
2010 Hogrefe Publishing
The present study aims to demonstrate that a construct-driven development of SJTs is possible. We developed an
SJT intended to measure the concept of integrity and based
on video scenarios (i.e., a video-based SJT for integrity). We
collected field data in a multi-ethnic setting during Dutch
police officer selection. Therefore, the construct validity of
the SJT was examined for both the ethnic majority and
the ethnic minority group. The largest ethnic minority
groups in The Netherlands are groups with a Dutch
Antillean, a Moroccan, a Surinamese, and a Turkish background. We will first discuss the concept of integrity and of
integrity tests. Second, the Integrity-SJT will be introduced
and the hypotheses will be described. Finally, the relevance
of the Integrity-SJT for personnel selection will be dealt with.
An Integrity-SJT
The SJT that was developed for the Dutch police consists of
videos of critical situations in each of which police-integrity
violations are presented. Little is known in the literature
European Psychologist 2010; Vol. 15(3):229–236
Overview of Hypotheses
Since both integrity tests and SJTs have been shown to be related
to conscientiousness, agreeableness, and emotional stability
(McDaniel & Nguyen, 2001; Ones, 1993, in Wanek,
1999), examining correlations between the present SJT and
these three Big Five dimensions solely will not give much
insight into the question whether the SJT indeed measures
integrity. If, for instance, correlations around .25 are found
in the present study, would this mean that the SJT measures
integrity or would this mean that the test is yet another multidimensional SJT? Therefore, the SJT's convergent validity
was examined by means of the relationship between the SJT
score and several integrity-related measures, namely the
dimension Honesty-Humility of the HEXACO-model (Lee
& Ashton, 2004) and cognitive distortions by means of the
How-I-Think questionnaire (HIT questionnaire; Barriga,
Gibbs, Potter, & Liau, 2001). Also, the discriminant validity
of the SJT is investigated by means of a cognitive ability test.
In the following, we will state the hypotheses and the
arguments for these hypotheses. The first hypothesis states
that scores on the Integrity-SJT will be related to scores
on other integrity-related dimensions (Hypothesis 1).
A dimension that has shown a strong resemblance to the
concept of integrity is the sixth factor of the recently introduced HEXACO-model of personality (Lee & Ashton,
2004; Lee, Ashton, Morrison, Cordery, & Dunlop, 2008).
This sixth factor is labeled Honesty-Humility and is typically described as honesty, fairness, sincerity, modesty, and
lack of greed. Lee, Ashton, and De Vries (2005) argue that
the dimension Honesty-Humility has a clear conceptual link
to integrity, since both consist of admissions of wrongdoing such as theft, fraud, sabotage, and alcohol and drug
abuse (p. 182). Hence, they investigated the relationship
between Honesty-Humility on the one hand and workplace
delinquency and scores on an overt integrity test on the other
hand. They found correlations of .47 for workplace
Method
Sample and Procedure
Data came from ethnic majority and ethnic minority applicants who applied for a position at the Police Academy of
The Netherlands in the period from June 2006 to August
2006. The dataset consisted of 203 applicants (59% male;
M_age = 23.34, SD = 5.98), of which 151 were ethnic majority
Measures
Situational Judgment Test for Integrity
SJTs typically consist of hypothetical scenarios describing
interpersonal work situations in which a problem has arisen.
The scenario may represent an actual situation on the target
job or a situation constructed in such a manner that it is psychologically identical to an actual work situation (Chan &
Schmitt, 1997). Scenarios within the test are usually developed on the basis of a critical-incident analysis involving
subject matter experts.
An approach analogous to earlier SJT studies was used
for the development of the Integrity-SJT (see, e.g., Weekley
& Jones, 1997; for an example of an SJT item, see
Appendix A). First, we collected realistic critical incidents
regarding interactions between police officers and civilians
or among police colleagues from 15 experienced police officers (both policemen and policewomen; both ethnic majority and minority police officers; police experts had around
15 years of police work experience). All incidents focused
on integrity violations and potential reactions to these violations. For example, several incidents dealt with resisting
fraudulent people or situations. Second, critical incidents
that were similar were grouped and scenarios were written
about each of these groups of critical incidents. Simultaneously, with the help of the experienced police officers,
four response options were derived for each scenario. This
procedure resulted in 14 SJT items (a scenario including
its four response options is labeled "item") that were
pilot-tested in a written version of the test. Third, after examining the descriptives and the factor-analytic results of the
pilot-study data (N = 228, 72% male, M_age = 24.08;
SD = 6.78), 3 of the 14 SJT items were eliminated.
Subsequently, a video-based version of the test was developed. Both professional actors and police officers were
trained to act in scenarios. After this training, the scenarios
were videotaped in a professional manner. Finally, a panel
of experts was asked to fill out the video-based SJT in order
to develop a scoring key. The expert panel consisted of 50
experienced police officers with on average 14.06 years of
work experience (SD = 6.38) and with different ethnic backgrounds, namely 10 ethnic majority experts, 10 Antillean
How-I-Think Questionnaire
To measure applicants' cognitive distortions, the Dutch
translation (translated from English by Utrecht University,
The Netherlands) of the HIT questionnaire (Barriga et al.,
2001) was used. The HIT questionnaire was developed to
measure two broad dimensions, namely cognitive distortions
and behavioral referents, each consisting of four subdimensions. Cognitive Distortions consist of the subdimensions
Self-centered, Blaming others, Minimizing/Mislabeling,
and Assuming the Worst. Behavioral Referents consist of
the subdimensions Opposition-Defiance, Physical Aggression, Lying, and Stealing (for definitions, see Appendix B).
The alpha reliability of the dimension Cognitive Distortions
was .90 and of the dimension Behavioral Referents was .89,
based on the present sample. The alpha reliabilities of the
subdimensions varied from .70 for Blaming others to .79
for Stealing. Two models were tested in accordance with
Barriga et al. (2001), each consisting of one dimension (i.e., Cognitive Distortions or Behavioral Referents)
and four subdimensions. Both models showed a good fit
to the data (χ² [df = 2] = 8.82, p < .05, and 1.66, ns;
TLI = .94 and 1.00; CFI = .99 and 1.00; and
RMSEA = .05 and .00). All Cognitive Distortions subdimensions loaded significantly (.80 < β < .90, p < .001)
on the Cognitive Distortions dimension. All Behavioral Referents subdimensions loaded significantly (.75 < β < .90,
p < .001) on the Behavioral Referents dimension. However,
because the two dimensions Cognitive Distortions and
Behavioral Referents were highly correlated (r = .99,
p < .001), further analyses were conducted with one general
HIT scale consisting of the mean of the two dimensions.
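As an illustration of the alpha reliabilities reported above, the following is a minimal sketch of how Cronbach's alpha can be computed from item-level scores. The function name and the toy data are hypothetical; the actual HIT item responses are not part of this article, so the output does not reproduce the reported values.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five applicants to four items (not real data)
toy_scores = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
print(round(cronbach_alpha(toy_scores), 2))
```

The same function applies to a subdimension (its own items) or to a broad dimension (all items of its four subdimensions); the ratio of variances is unaffected by whether population or sample variances are used, as long as the choice is consistent.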
Analyses
Preliminary Analyses
Because response styles can affect answers on questionnaires
(e.g., Van de Vijver & Leung, 1997), structural equivalence
(i.e., absence of bias) was checked across the ethnic majority
and minority group for each measure separately before conducting further analyses. In accordance with Van de Vijver
and Leung (1997), structural equivalence across cultures is
interpreted as follows: A test measures the same trait cross-culturally, but not necessarily on the same quantitative scale.
Using an equal measurement weights model in Amos 6.0
(Arbuckle, 2005), no significant differences between factor
structures of the measures were found between the ethnic
majority group and the minority group (for detailed information, please contact the first author).
Main Analyses
Using Amos 6.0 (Arbuckle, 2005), a general model was
tested to examine the correlations between the Integrity-SJT and the two integrity-related instruments and between
the Integrity-SJT and the cognitive ability test. The model
showed a good fit to the data (χ² [df = 4] = 4.73, ns;
TLI = .89; CFI = .98; RMSEA = .03). Multigroup analysis
was conducted to test for cross-ethnic invariance in correlations. Correlations between the SJT, the in-depth interview,
and the HIT-questionnaire were calculated to examine the
convergent validity of the SJT. The correlation between
the SJT and the cognitive ability test was calculated to investigate its discriminant validity.
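To make the reported fit indices easier to follow, the sketch below shows one common point-estimate formula for the RMSEA, computed from the model chi-square, its degrees of freedom, and the sample size. It is an illustration rather than the Amos computation itself: the function name is hypothetical, some programs divide by N instead of N − 1, and N = 203 assumes that the full applicant sample entered the model.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# General model reported in the text: chi-square = 4.73, df = 4
# n = 203 is an assumption (full sample, no missing data)
print(round(rmsea(4.73, 4, 203), 2))
```

Under these assumptions the result rounds to .03, in line with the value reported for the general model. When the chi-square is smaller than its degrees of freedom, the formula returns zero.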
Results
First, we expected that scores on the Integrity-SJT would
correlate with scores on other integrity-related tests (Hypothesis 1). Second, we expected that scores on the Integrity-SJT
would not correlate with scores on the cognitive ability test
(Hypothesis 2). Finally, we examined potential correlational
differences between the ethnic majority group and the ethnic
minority group (Research Question).
With regard to the Research Question, a multigroup analysis was conducted. A model in which the covariances were
held constant across the ethnic majority and minority group
was compared to the unconstrained model. No significant
difference was found between the two models (Δχ²
[Δdf = 8] = 10.26, ns), meaning that no significant difference in covariances was found between the two ethnic
groups. Therefore, the correlations between scores on
the SJT and the other measures were examined in the total sample, without
distinguishing between ethnic majority and minority applicants.
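The model comparison above is a chi-square difference test: the difference in chi-square between the constrained and the unconstrained model is referred to a chi-square distribution with degrees of freedom equal to the difference in df. A minimal sketch of the p-value computation follows, using only the standard library. The function name is hypothetical, and the closed form used here holds for even degrees of freedom only, which covers the values reported in this article.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Reported difference between constrained and unconstrained model
p_diff = chi2_sf_even_df(10.26, 8)
print(f"p = {p_diff:.3f}")
```

A p-value above .05 means the equality constraints do not significantly worsen fit, which is the basis for pooling the two groups.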
With regard to the integrity-related measures, the
observed correlations with the SJT were .23 (p < .05) for
Honesty-Humility and .36 (p < .001) for the HIT questionnaire. On cognitive ability, the correlation with the
SJT was .13 (ns). Thus, concerning the convergent-validity
evidence, the correlations were moderate to high (Cohen,
1988) between the SJT and the in-depth Honesty-Humility
interview and between the SJT and the HIT questionnaire.
Concerning the discriminant-validity results, the correlation
between the SJT and the cognitive ability test was small
(Cohen, 1988). These findings support Hypotheses 1 and
2 and support the notion that the SJT seems to measure
integrity in both the ethnic majority and minority group.
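As a rough check on the significance levels attached to these correlations, the sketch below tests a correlation against zero with Fisher's z transformation and a normal approximation. This is not the procedure used in the study, where the correlations were estimated within the Amos model; the function name is hypothetical, and n = 203 assumes that every applicant completed every measure, which may not hold for the interview.

```python
import math


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 using Fisher's z (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


# Correlations as printed in the text; n = 203 is an assumption
for label, r in [("Honesty-Humility", 0.23), ("HIT", 0.36), ("Cognitive ability", 0.13)]:
    print(label, fisher_z_pvalue(r, 203) < 0.05)
```

Because the test is two-sided, the result depends only on the magnitude of the correlation, not on its sign.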
Discussion
Situational judgment tests recently have gained in popularity
because of their incremental validity over and above cognitive
ability tests and because of their smaller adverse impact against
ethnic minority groups (Clevenger et al., 2001; Motowidlo
et al., 1990; Nguyen & McDaniel, 2003; Weekley & Jones,
1997, 1999). New technology has made the development of
video-based SJTs possible, which show even higher criterion-related validity (Lievens & Sackett, 2006) and less
adverse impact (Chan & Schmitt, 1997) than paper-and-pencil SJTs. Furthermore, more and more attention is given
to measuring integrity of applicants during personnel selection because it has been shown to be predictive of counterproductive work behavior varying from theft to job performance
(Ones et al., 1993). It is of central importance to assess
integrity as a work style for police officer positions. In
fact, integrity is seen as the most important work style of
police officers according to O*Net (O*Net Online, 2009,
May 6). Because of the impact that integrity violations have
on the police organization, a video-based SJT intended to
measure integrity was developed. In a field study conducted
in a multi-ethnic setting at the Dutch police, we examined the
construct validity of this Integrity-SJT. We investigated convergent and discriminant validity of the SJT, including potential correlational differences between the ethnic majority
group and the ethnic minority group. The largest ethnic
minority groups in The Netherlands have a Dutch Antillean,
a Moroccan, a Surinamese, or a Turkish background.
Firstly, we found no significant differences in correlations
between the ethnic majority and minority group. Secondly,
correlations between the SJT and the integrity-related measures (the in-depth Honesty-Humility interview and the
HIT questionnaire) were moderate to high (Cohen,
1988). Finally, the correlation between the SJT and the cognitive ability test was low (Cohen, 1988). These results are in
line with our expectations (Hypotheses 1 and 2) and demonstrate the construct validity of the SJT measuring integrity.
The construct Honesty-Humility includes subdimensions such as Morality (i.e., being able to avoid fraud and
corruption and unwilling to take advantage of other individuals or of society at large) and Honesty (i.e., being genuine in
interpersonal relations and unwilling to manipulate others).
The HIT questionnaire contains subdimensions such as
Opposition-Defiance (i.e., being disrespectful of rules, laws,
or authorities), Stealing, and Lying. Therefore, the moderate
to high correlations between the SJT and the Honesty-Humility interview and between the SJT and the HIT questionnaire showed support for the construct validity of the
SJT.
Additionally, we found a negligible relationship between
the SJT and the cognitive ability test. Regarding integrity
tests, Ones and Viswesvaran (1998) showed that they have
negligible correlations with cognitive ability. Since integrity
was the intended SJT construct, we expected a small correlation between the SJT score and scores on the cognitive
ability test. This was what we found, providing more evidence for the construct validity of the present SJT.
Acknowledgments
We acknowledge Hans van Loon and Hellen Westerveld for
their contribution to the execution of this study.
Limitations
Our study had some limitations. First, the small sample
size of ethnic minority applicants resulted in small power
concerning the multigroup analysis. With regard to the ethnic minority group, a larger sample size would have
allowed us to draw firmer conclusions. Also, a larger sample
size of ethnic minorities would allow a further differentiation within the ethnic minority group. De Meijer, Born,
Terlouw, and Van der Molen (2006) showed that large
score differences on various selection measures exist
between ethnic minority groups, which might be explained
by differences in history and culture between the ethnic
groups. Investigating these ethnic minority groups separately may result in more useful information compared to
merely contrasting the ethnic majority to minority group
and not taking into account potential differences between
ethnic groups.
Second, we did not have criterion data at our disposal to
investigate the criterion-related validity of the present SJT.
Although the construct-validity results are promising, we do
not know whether the present SJT is able to predict job performance, workplace (dis)honesty, theft, fraud, etc. Since little is
known about SJTs measuring a single construct, in general,
and their criterion-related validity, specifically, future research
should be focused on these types of SJTs and their predictive
power. Furthermore, SJTs intended to measure a single construct should be developed in different companies, in different
settings, and on different job levels to be able to properly generalize the findings in the present study.
Conclusion
A critical issue regarding SJTs is the often-found construct-heterogeneity. However, we argue that a construct-driven
approach may be fruitful in the development of SJTs measuring one single construct. In a field study conducted in a
Appendix A
Description of Situation
A police officer (police officer 1) comes to work on his
motorbike. When he enters the parking garage of the police
station he accidentally hits a police car, causing a big scratch
on the police car. Shortly after, he meets a colleague (police
officer 2) and tells her what happened.
Police Officer 1
Hi! Listen: I just entered the parking garage with my
motorbike and caused a big scratch on one of the police cars.
I feel really bad about it and, actually, I don't know what to
do.
Appendix B
Integrity-Related Dimensions and Their Descriptions
Dimension: Description
In-depth interview
Modesty: Being modest, unassuming, and seeing oneself as an ordinary person without any claim to special treatment
Honesty
Morality: Being able to avoid fraud and corruption and unwilling to take advantage of other individuals or of society at large
Avoidance of materialism
How-I-Think questionnaire
Self-centered
Blaming others
Minimizing/mislabeling
Opposition-defiance
Note. Definitions of the facets of the in-depth Integrity interview are from Lee and Ashton (2004) and definitions of the (sub-)
dimensions of the HIT questionnaire are from Barriga et al. (2001). Definitions of the subdimensions Physical Aggression, Stealing,
and Lying of the HIT were not listed here, because we assumed that they are self-explanatory.