This study describes the development and validation of the Call Center Aptitude Test
(C-A-T), an Internet-based multi-method measure for the pre-selection of call center
agents. The C-A-T includes a short biographical form, a cognitive speed test, a personality
inventory, and a multimedia situational judgement test. A total of 327 call center agents
completed the C-A-T and were concurrently assessed by their superiors. In a pilot study
(N = 151), a prototype was tested and subsequently optimized. In the main study
(N = 176), the revised and extended version of the C-A-T was reliable (alpha = .85) and
showed good convergent and discriminant validity. The concurrent validity (r = .33) can
be considered to be satisfactory for a computerized pre-selection instrument. Evidence
was also found for high face validity and acceptance in both studies.
Introduction
© Blackwell Publishing Ltd 2003, 9600 Garsington Road, Oxford OX4 2DQ, UK and
350 Main Street, Malden, MA 02148, USA.
WEB-BASED ASSESSMENT
Computerized Testing
Computerized testing has received considerable attention
in the last few years with an increasing impact on
companies' selection and personnel development systems
(Bartram, 1999; Buchanan and Smith, 1999; Drasgow
et al., 1993). In addition to offline computer-based tests
that run on a standard PC without any connection
to computer networks, web-based or online measures
have been developed that have a number of advantages
compared to traditional paper-and-pencil instruments.
First, internet-based measures allow testing at any time
and anywhere. Thus, testing can be extremely cost-effective
and efficient, particularly when assessing large samples.
Several studies have shown psychometric equivalence
between paper-and-pencil and online applications (Buchanan and Smith, 1999; Hertel, Naumann, Konradt and
Batinic, 2002; Stanton, 1998). A meta-analysis comparing
computerized and paper-and-pencil ability tests found high
levels of equivalence (r around .90) for power tests. Only
speed tests showed somewhat lower equivalence (r around
.60; cf. Mead and Drasgow, 1993).
Second, computerized measures can have positive effects
on test reliability because participants sometimes answer
more frankly and truthfully (Hinkle, Sampson and
Radonsky, 1991). Third, relatively high social acceptance scores are reported for computerized tests (Arvey
et al., 1990; Chan and Schmitt, 1997; Chan et al., 1997;
Richman-Hirsch, Olson-Buchanan and Drasgow, 2000).
For example, managers completing a multimedia test on
conflict resolution skills rated the instrument as more face
valid and had more positive attitudes towards this test form
compared to managers who completed a paper-and-pencil version of the test.
PILOT STUDY
Method
Participants and Design
One hundred and fifty-one agents from a mid-sized call
center volunteered to participate in the first study. Since the
focus of this study was mainly on scale development,
no biographical data were collected. However, according
to the HR department of the call center, the participants
met average call center standards regarding age, gender and
education.
Measures
The prototype of the C-A-T was developed as a test battery
with three modules. After a short introduction on how to
use the C-A-T and how to navigate through the modules,
Procedure
Participants were personally invited by the call center
manager to participate in the study. He also informed them
about the goals and procedure. The call center agents
completed the C-A-T during their working hours using
their PC in the call center office. Following a brief
introduction, participants first registered by entering a
[Table 1. Means, standard deviations, reliabilities, and intercorrelations of the concentration measure, the five personality scales, and the six criteria (pilot study). Notes. Alpha coefficients are presented in the main diagonal. * denotes significance at p < .05, ** denotes significance at p < .01. Reliability of the concentration ability (mental processing speed) was calculated as a split-half coefficient.]
Results
An exploratory factor analysis (Principal Component
Analysis, Varimax rotation) on the personality test data
of Module 2 was conducted to explore the underlying
dimensions of the newly developed measure. This analysis
revealed seven factors, explaining 61.9% of the total
variance. Thus, the results of the factor analysis were not
consistent with the planned five-factor structure, suggesting
further adjustments of the developed scales.
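The variance figure above follows directly from the eigenvalues of the item correlation matrix. As a minimal sketch of this step (the data below are synthetic stand-ins, not the study's item responses):

```python
import numpy as np

# Minimal sketch of how a principal component analysis apportions variance.
# The data are synthetic, not the study's item responses.
rng = np.random.default_rng(0)
items = rng.normal(size=(151, 10))        # hypothetical: 151 agents x 10 items

corr = np.corrcoef(items, rowvar=False)   # 10 x 10 item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted in descending order

# Each component's share of the total variance is its eigenvalue divided by
# the trace of the correlation matrix (which equals the number of items).
explained = eigenvalues / eigenvalues.sum()
variance_first_seven = explained[:7].sum()
```

With real data, a seven-component solution explaining 61.9% of the variance corresponds to `explained[:7].sum()` of about .619; a varimax rotation redistributes the loadings but leaves the total explained variance unchanged.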
The intercorrelations between the cognitive ability
(concentration) measure, the five personality factors, and
the six criteria are shown in Table 1. All personality factors
were significantly intercorrelated, at least at the one
percent level (range between .15 and .39; all p < .01), but were
uncorrelated with the cognitive performance measure
(all rs < .11, n.s.). With regard to the alpha reliabilities,
only the cognitive ability (concentration) measure
(alpha = .87; split-half) met the satisfactory level of .70.
Except for the Telephone communication skills scale (alpha = .24), reliabilities for the other personality scales
ranged between .59 and .69. Moreover, a ceiling effect for
Conscientiousness occurred (M = 6.44; SD = .48). The
multiple regression that was used to calculate concurrent
validity yielded a coefficient of R = .20 for the test
battery (corrected R = .22).
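The alpha coefficients reported for the scales can be sketched in a few lines (synthetic data; the four-item scale below is a hypothetical example, not a C-A-T scale):

```python
import numpy as np

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sum).
# The data below are synthetic; they are not the study's item responses.
def cronbach_alpha(items):
    """items: (n_persons, k_items) array of item scores."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(size=(200, 1))                 # common trait
scale = trait + 0.8 * rng.normal(size=(200, 4))   # four noisy parallel items
alpha = cronbach_alpha(scale)                     # high: items share one trait
```

Low alphas such as the .24 observed for Telephone communication skills arise when the items share little common variance, which is why that scale was singled out for revision.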
A factor analysis of the acceptance items revealed a two-factor solution with a first factor, Appropriateness (alpha =
.52), and a second factor, Comprehension (alpha = .41).
Overall acceptance, expressed by the mean of all items,
exceeded the chosen threshold of 75% agreement. The acceptance ratings were unrelated to the cognitive ability
measure (r = .02, n.s.) but were significantly related to the
aggregated mean personality score (r = .26, p < .001),
indicating that the better the test performance, the
higher the acceptance of the test.
Discussion
The pilot study revealed several flaws and some strengths
of the prototype. First of all, the expected factor structure
could not be found in the personality test module,
indicating insufficient construct validity. This had to be
addressed through further adjustment work. Moreover, the
criterion-related validity was low. One explanation may be
that the criteria had already been collected several months
before the predictors were measured, and thus may have
changed over time. Consequently, the validity of the
criteria might have been impaired. For this reason, we
decided to collect concurrent criteria in our main study.
Results showed that the reliabilities of the scales, with the
notable exception of the cognitive speed test module, were
not satisfactory. However, acceptance of the C-A-T was
fairly high when compared with the acceptance scores for
personality tests and for cognitive performance tests that
have been found in other studies (e.g. Rynes and Connerley,
1993).
Another limitation of the pilot study was that the time of
the day when participants were tested had not been
controlled. Agents at the end of their shift might
have been considerably more tired than at the
beginning of the shift, and these variations might have
had a negative effect on test performance, especially on
the concentration test. Moreover, the test battery was
applied in the call center office, which may have led to
disturbances, noise, and/or interruptions by colleagues.
As a first step in our revision of the measure, we decided
to re-run the task analysis. To this end, we conducted
semi-structured in-depth interviews with three call center
managers. Following these results, two constructs
(Ability to communicate on the telephone and Persistence)
were changed and items were renamed. The final five
dimensions of the personality module, their definitions and
sample items are presented in Table 2. Furthermore,
a forced-choice situational judgement test with multimedia
elements was added to the C-A-T. This situational
test consisted of four typical and critical working situations
including an impolite, a verbose, an incomprehensible,
and a complaining caller. The data from this test were not
Table 2. Definitions and Sample Items for the Dimensions of the Personality Questionnaire Module included in C-A-T.
1. Willingness to learn: A tendency to be open to training and to enjoy getting into new subjects. Sample item:
I am fascinated by complex subjects.
2. Social competence: A tendency to act socially skilled and to respond cleverly to customers. Sample item:
It is rather easy for me to reveal my opinion in an unknown situation.
3. Self-motivation: A tendency to motivate oneself without external incentives. Sample item: When I don't succeed,
I increase my efforts.
4. Stress resistance: A tendency to cope with time pressure in an efficient manner. Sample item: When I have to work
to a deadline under pressure, I perform less than usual.
5. Conscientiousness: A tendency to be reliable and to meet time-limits as agreed. Sample item: Responsible tasks
are important to me.
MAIN STUDY
Method
Participants and Design
One hundred and seventy-six agents (64 females, 112
males) working in a mid-sized call center (different
from the one in the pilot study) participated voluntarily
in the study. On average, participants were 26.3 years
old (SD = 3.98). Participants were assessed by four supervisors (1 female, 3 males; on average 26.0 years old,
SD = 1.41).
Results
Prior to confirmatory factor analyses (CFA), item characteristics were assessed. Five items were removed due
to an unacceptable discriminatory power. A CFA was
then used to test the fit of the validation sample data to
the underlying five factor model. Evidence that the model
fits the data acceptably is provided by the RMSEA value
of .06, which meets the desired cutoff exactly (cf. Hu
and Bentler, 1999). Similarly, the goodness-of-fit index
(GFI = 0.86; adjusted GFI = 0.82) and the normed fit
index (NFI = 0.71) indicated a good fit of the model.
However, the Tucker-Lewis index of .83 falls below the
recommended threshold of .95, so model fit can only be
regarded as partly satisfactory.
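RMSEA is derived from the model chi-square, its degrees of freedom, and the sample size. Since the chi-square statistic itself is not reported in the text, the numbers in this sketch are purely illustrative:

```python
import math

# RMSEA = sqrt(max(chi2 - df, 0) / (df * (N - 1))).
# The chi-square and df below are hypothetical; the study reports only
# the resulting RMSEA of .06.
def rmsea(chi2, df, n):
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

value = rmsea(320.0, 199, 176)   # ≈ .059, i.e. just under the .06 cutoff
```

When the chi-square does not exceed its degrees of freedom, the numerator is clamped to zero and RMSEA is 0, indicating exact fit.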
Table 3 presents the alpha reliabilities and the intercorrelations between the constructs. The reliabilities of
Social competence, Self-motivation, and Stress resistance
ranged between .59 and .68 and did not meet the level of
.70 required for the internal consistency of personality measures. However, the relatively low reliabilities
seem to be mainly due to the small scale sizes. Following
the Spearman-Brown prophecy formula, an extension to
eight items for each scale would lead to reliabilities
between rtt = .70 and rtt = .86. The overall reliability of
the test battery was r = .85 (rtt = .92 after scale extension).
As shown in Table 3, the factors were highly correlated,
but do not suggest an underlying single factor model. All
five personality constructs showed significant intercorrelations (p < .01), but were uncorrelated with the concentration score, indicating convergent and discriminant
validity. With the exception of Stress resistance, no
difference was found between inbound and outbound
activity. Stress resistance was higher for agents who
worked outbound (M = 5.61; SD = .55) compared to those
working predominantly inbound (M = 5.14; SD = .75)
(t = 2.56; df = 174; p < .01). No differences were found
in the other five constructs (all |t| < 1.34, n.s.) or in a
compounded total test score (t = 1.01; df = 174; n.s.).
Moreover, there was no significant correlation between
hours on duty before test completion and the concentration
score (r = .02, n.s.) or the compounded personality score
(r = .12, n.s.), respectively. The superiors' ratings were highly
intercorrelated (each p < .01) and were in part only marginally
correlated with the predictors.
Forced entry multiple regression analyses were conducted in which the cognitive ability measure and the five
personality factors were used as predictors, and mean
superiors' rating was used as the dependent variable. This
analysis revealed a significant result (F = 2.94, df = 6,
p < .01). As revealed by the standardized regression
coefficients (Table 4), significant positive effects were
observed for Social competence (Beta = .17, p < .05),
Stress resistance (Beta = .17, p < .05), Conscientiousness
(Beta = .19, p < .05), and Cognitive ability (Beta = .19,
p < .05).
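The forced-entry regression with standardized coefficients can be sketched as follows. The data are synthetic, and the coefficient values are not the study's:

```python
import numpy as np

# Sketch of a forced-entry multiple regression: all six predictors are
# entered simultaneously, and standardizing both sides turns the raw
# coefficients B into standardized coefficients (Beta). Synthetic data.
rng = np.random.default_rng(2)
n = 176
X = rng.normal(size=(n, 6))                              # six predictors
y = X @ np.array([0.2, 0.1, 0.2, 0.1, 0.2, 0.2]) + rng.normal(size=n)

Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)        # z-scored predictors
yz = (y - y.mean()) / y.std(ddof=1)                      # z-scored criterion
betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)          # standardized Betas

# Multiple correlation R: correlation between fitted and observed scores.
R = np.corrcoef(Xz @ betas, yz)[0, 1]
```

Because predictors and criterion are z-scored, each Beta gives the expected change in the criterion, in standard deviations, per standard deviation of that predictor with the others held constant.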
[Table 3. Means, standard deviations, reliabilities, and intercorrelations of the C-A-T scales and the criteria (main study). Notes. Only correlation coefficients > .10 are depicted. Alpha coefficients are presented in the main diagonal. Coefficients for the criteria are interrater agreements (see text). * denotes significance at p < .05, ** denotes significance at p < .01. Reliability of the concentration ability was calculated as a split-half coefficient.]
UDO KONRADT, GUIDO HERTEL AND KARIN JODER
Table 4. Multiple Regression of the Mean Superiors' Rating on the C-A-T Predictors.

Predictor               B      SE B   Beta   t
Concentration ability   0.015  0.006  0.19   2.56*
Willingness to learn    0.086  0.068  0.11   1.26
Social competence       0.125  0.061  0.18   2.05*
Self-motivation         0.069  0.073  0.09   0.95
Stress resistance       0.152  0.081  0.17   1.88*
Conscientiousness       0.192  0.095  0.19   2.02*

Note: * p < .05.
For the C-A-T predictors, the overall multiple correlation coefficient of .31 (corrected R = .33) is statistically
significant at the .001 level, indicating that 9.4 percent of
the overall criterion variance is explained by the C-A-T
predictors. On the other hand, performance was unrelated
to gender, age, education, profession, and the length of time
at work before completing the C-A-T (all rs < .12).
Finally, a factor analysis of the acceptance ratings
(Principal Component Analysis, Varimax, rotated solution)
revealed two components (loadings > .67), with the first
factor (Appropriateness) consisting of Items 1, 2, and
4, and a second factor (Comprehension) consisting
of Items 3 and 5. The first factor explained 31.2
percent, and the second factor explained 28.0 percent of
the variance. Overall, mean acceptance was 3.85
(SD = 0.46) and exceeded the expected threshold of 75
percent agreement. Acceptance was found to be independent of the concentration score (r = .08, n.s.) but correlated
positively with the accumulated personality score (r = .32,
p < .001) and the total test score (r = .28, p < .001).
Discussion
The data analysis of the main study using the revised
instrument revealed good construct validity. The five
personality dimensions, which were confirmed by the
confirmatory factor analysis, were highly intercorrelated. Apart
from the convergent validity within the personality
module, the personality factors were uncorrelated with
the cognitive concentration measure, indicating discriminant validity (Ackerman and Heggestad, 1997). In
comparison to the pilot study, the psychometric qualities
of the revised scales were significantly improved. Moreover, the ceiling effects observed in the pilot study were
reduced and item difficulties were increased. Four of the six
predictors contributed significantly to the performance
criterion, showing satisfactory concurrent validity. With the
exception of the Stress resistance scale, there were no
differences between inbound and outbound agents. This
indicates a high generalizability of the C-A-T, emphasizing
the function of C-A-T as a general pre-selection instrument.
General Discussion
The aim of this research was to develop and validate a web-based instrument for the pre-selection of call center agents.
The main study supports the utility of this web-based
instrument in pre-selection settings. Overall, a criterion-related validity of .33 (corrected) was found, which is fairly
satisfactory and higher than validity scores usually
observed for biographical data (Bliesener, 1996; Harvey-Cook and Taffler, 2000). In a recent meta-analysis of
selection instruments, Schmidt and Hunter (1998) reported
a predictive validity score for biodata as a single selection
method of about .35, and for structured interviews of about
.38. Even though a direct comparison between a single
score and a meta-analytic estimate cannot be drawn, the
criterion validity of C-A-T can be seen as a first example
that the predictive power of longer and more expensive
instruments can already be achieved by relatively brief
computerized instruments.
C-A-T was developed as a pre-selection tool and may
serve two functions. First, it can be used to select those
applicants who are possibly qualified to perform call center
agent tasks. Second, the C-A-T can guide a
structured selection interview by enabling interviewers
to focus on contradictory or missing aspects that might
have emerged in the test results. However, such incremental validity cannot be estimated within the design of
the reported studies.
The data on social acceptance yielded results which were
confirmatory in nature. The agents ascribed high face
validity to the instrument and considered it to be helpful,
appropriate and essential. It should be noted that this
high acceptance cannot be ascribed solely to the inclusion of
multimedia components, which usually lead to more positive
reactions (cf. Richman-Hirsch et al., 2000; Shotland,
Alliger and Sales, 1998). The C-A-T already received
positive reactions in the pilot study with no multimedia
elements. Instead, it seems that other aspects of C-A-T, such
as transparency, fairness, and perceived relevance of the
References
Ackerman, P.L. and Heggestad, E.D. (1997) Intelligence, Personality, and Interests: Evidence for overlapping traits. Psychological Bulletin, 121, 219–245.
Arvey, R.D., Strickland, W., Drauden, G. and Martin, C. (1990)
Motivational components of test taking. Personnel Psychology,
43, 695–716.
Bagnara, S. (2000) Towards Telework in Call Centres. Report
produced with the support of the European Commission.
Brussels [http://www.euro-telework.org]
Bartram, D. (1999) Internet Recruitment and Selection: Kissing
Frogs to find Princes. International Journal of Selection and
Assessment, 8, 261–274.
Schmidt, F.L. and Hunter, J.E. (1998) The validity and utility of
selection methods in personnel psychology: Practical and
theoretical implications of 85 years of research findings.
Psychological Bulletin, 124, 262–274.
Schmitt, N. and Chan, D. (1999) The status of research on applicant
reactions to selection tests and its implications for managers.
International Journal of Management Reviews, 1, 45–62.
Schuler, H. (1993) Social Validity of selection situations: A
concept and some empirical results. In H. Schuler, J. Farr and
M. Smith (eds), Personnel Selection and Assessment: Individual
and Organizational Perspectives (pp. 11–26). Hillsdale, NJ:
Erlbaum.
Schuler, H. and Moser, K. (1995) Die Validität des multimodalen
Interviews [Validity of the multimodal interview]. Zeitschrift für
Arbeits- und Organisationspsychologie, 39, 2–12.
Schuler, H. and Pabst, J. (2000) Personalentwicklung im Call-Center der Zukunft: Fluktuation verhindern, Mitarbeiter langfristig binden [Personnel development in call centers of the
future: Prevention of employee turnover, committing employees
on a long-term basis]. Neuwied: Luchterhand.
Sczesny, S. and Stahlberg, D. (2000) Sexual harassment over the
telephone: occupational risk at call centers. Work & Stress, 14,
121–136.
Shotland, A., Alliger, G.M. and Sales, T. (1998) Face Validity in the
Context of Personnel Selection: A Multimedia Approach.
International Journal of Selection and Assessment, 6, 124–130.
Snell, A.F., Sydell, E.J. and Lueke, S.B. (1999) Towards a theory of
applicant faking: Integrating studies of deception. Human
Resource Management Review, 9, 219–242.
Stanton, J.M. (1998) An empirical assessment of data collection
using the Internet. Personnel Psychology, 51, 709–725.
Tinsley, H.E.A. and Weiss, D.J. (1975) Interrater reliability and
agreement of subjective judgments. Journal of Counseling
Psychology, 22, 358–376.
Zapf, D., Seifert, C., Schmutte, B. and Mertini, H. (2001) Emotion
work and job stressors and their effects on burnout. Psychology and Health, 16, 527–545.