This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Clemson University
The authors evaluated the extent to which a personality-based structured interview was susceptible to
response inflation. Interview questions were developed to measure facets of agreeableness, conscientiousness, and emotional stability. Interviewers administered mock interviews to participants instructed
to respond honestly or like a job applicant. Interviewees completed scales of the same 3 facets from the
NEO Personality Inventory, under the same honest and applicant-like instructions. Interviewers also
evaluated interviewee personality with the NEO. Multitrait–multimethod analysis and confirmatory
factor analysis provided some evidence for the construct-related validity of the personality interviews. As
for response inflation, analyses revealed that the scores from the applicant-like condition were significantly more elevated (relative to honest condition scores) for self-report personality ratings than for
interviewer personality ratings. In addition, instructions to respond like an applicant appeared to have a
detrimental effect on the structure of the self-report and interview ratings, but not interviewer NEO
ratings.
Although personality is typically assessed with self-report measures (Hough & Ones, 2001), recently there has been an increased
interest in using employment interviews to assess personality
(Barrick, Patton, & Haugland, 2000; Binning, LeBreton, &
Adorno, 1999; Huffcutt, Conway, Roth, & Stone, 2001). There are
several reasons why it is worthwhile to evaluate interviews as a
method of personality assessment. For one, research suggests that
interviewer ratings of personality-related constructs may predict
job performance ratings with higher validity than self-report personality scores (Huffcutt, Conway, et al., 2001). There is also
evidence that interviews often result in more favorable applicant
reactions than paper-and-pencil personality tests (e.g., Hamill &
Bartle, 1998; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993;
Steiner & Gilliland, 1996). Decision makers also like to include
interviews in the selection process to get to know job candidates.
Finally, interviews could be used in addition to self-report personality measures to provide a more complete assessment of the
personality dimensions relevant to the job of interest. For example,
a personality-based interview could be used in the final stage of the
selection process to verify or further probe self-report personality
data obtained in earlier stages.
A recurring theme in the personality assessment literature concerns applicant response distortion (Hough, 1998; Hough et al.,
1990; Ones & Viswesvaran, 1998). Although an abundance of
studies have examined the effects of faking on self-report personality measures, researchers have all but ignored the prevalence and
effects of faking in employment interviews. With this in mind, the
purpose of the current study was to determine the extent to which
a personality-based interview was susceptible to response inflation by comparing it to the amount of inflation on a self-report measure that assessed the same personality dimensions.
We begin by reviewing the results of previous studies relevant
to personality assessment and response inflation within an
interview context.
In the past decade, a substantial amount of research has examined relations between the Five-Factor Model of personality (FFM;
Goldberg, 1990, 1993; Tupes & Christal, 1992) and behavior at
work. This research indicates that the FFM dimensions can be
related to various organizational outcomes, including job performance, leadership, and training success (Barrick, Mount, & Judge,
2001; Colquitt, LePine, & Noe, 2000; Judge, Bono, Ilies, &
Gerhardt, 2002), as well as counterproductive work behaviors like
delinquency (Hough, Eaton, Dunnette, Kamp, & McCloy, 1990),
turnover (Barrick & Mount, 1996), and absenteeism (Judge, Martocchio, & Thoresen, 1997). Researchers have also found that the
Big Five factors can explain variance in performance beyond that
of other predictors, such as cognitive ability test scores (McHenry,
Hough, Toquam, Hanson, & Ashworth, 1990) and assessment
center ratings (Goffin, Rothstein, & Johnston, 1996). Furthermore,
in addition to this validity evidence, traditional paper-and-pencil
personality inventories are relatively inexpensive, are easy to administer, and tend not to produce adverse impact against protected
groups (Hough, Oswald, & Ployhart, 2001).
The results of some studies suggest that faking can have detrimental effects on the psychometric properties of personality inventories and the selection decisions based on these measures. For
example, several studies (e.g., Schmit & Ryan, 1993) have found
that a sixth, ideal employee factor emerges in analyses of FFM
data from applicant samples. Other researchers (e.g., Stark,
Chernyshenko, Chan, Lee, & Drasgow, 2001) have found differential item and test functioning between applicants and incumbents
who completed the same personality measure. Studies have also
found that faking can attenuate the criterion-related validity of
personality measures (e.g., Douglas, McDaniel, & Snell, 1996).
Furthermore, some researchers (e.g., Griffith, Chmielowski, Snell,
Frei, & McDaniel, 2000) have demonstrated that even if faking
does not reduce the predictive validity of personality measures, it
can influence which applicants are hired in a top-down selection
system. There is also some evidence that the expected performance
of individuals who distort their responses on personality inventories is systematically lower than the expected performance of
nonfakers (Mueller-Hanson, Heggestad, & Thornton, 2003).
Other researchers provide evidence to suggest that faking does
not influence the validity and psychometric quality of personality
measures. For instance, recent studies (e.g., D. B. Smith, Hanges,
& Dickson, 2001) have reported that the factor structure of the
FFM inventories is invariant across applicant and nonapplicant
samples. M. A. Smith, Moriarty, Lutrick, and Canger (2001)
discovered that the FFM actually fit better in an applicant sample
than in a student sample. Other research (e.g., Robie, Zickar, &
Schmit, 2001) has failed to find evidence of differential item and
test functioning when comparing personality data from applicant
and incumbent groups. Moreover, results of some studies suggest
that faking does not negatively influence the criterion-related validity of personality measures. For example, a meta-analysis by
Ones, Viswesvaran, and Schmidt (1993) found higher relations
between integrity tests and supervisor ratings of performance in
applicant samples (rc = .41) than in incumbent samples (rc = .33).
Hough (1998) categorized studies by setting (applicant, incumbent, and directed faking studies) and concluded that setting moderates the predictive validity of personality measures. Specifically,
she found only minimal validity differences between applicant and
incumbent studies, but observed that directed faking experiments
result in much lower validity coefficients.
We do not attempt to resolve the faking debate in the current
study. We believe that there is evidence to suggest that response
distortion occurs in at least some applied settings and can have
deleterious effects when personality tests are used for selection.
We also note that many selection researchers and practitioners are
still concerned that faking is a potential limitation of self-report
personality measures. Thus, we think it is important for researchers
to continue to develop and evaluate methods for assessing personality that are more resistant to faking and response inflation in
general.
the California Psychological Inventory, and the Sixteen Personality Factor Questionnaire.
Thus, the extent to which response inflation influences the
underlying structure of personality measures is unclear. Given the
previously discussed reasons why structured interview ratings may
inhibit response inflation, the structure of a personality-based
interview may be more resistant to instructions to respond like a
job applicant than the structure of a self-report personality measure. However, given the lack of prior research, as well as the
inconsistent findings from the above personality studies, we did
not make any specific predictions, but rather posed the following
research question:
What is the relative influence of instructions to respond like a job applicant on the factor structure of interview-based and self-report personality ratings?
Method
Participants
The interviewees for this study were 143 advanced undergraduates and
graduate students. The typical interviewee was 23 years old and at least a
junior in college. Every participant had been through at least one formal
selection interview (M = 3.4 interviews). In addition, 62% of the students
had at least 1 year of full- or part-time customer service experience, and
35% of students had 3 years or more of such work experience. This level
of experience is noteworthy because we developed a simulated selection
situation for the job of grocery store customer service manager (CSM).
Experienced interviewers from area organizations (N = 52) administered the interviews. Each interviewer conducted an average of five interviews (SD = 1.94). Interviewers represented a wide variety of industries (e.g., automotive, retail, e-commerce) and positions (e.g., recruiter, manager, vice president). All participants had prior interviewing experience (M = 11 years), and most (71%) had taken at least one formal interviewer training course.
Design
All interviewees were administered (in random order) a structured
interview and select facet scales from the NEO Personality Inventory
(Costa & McCrae, 1992). Half of the participants were asked to respond
honestly to the interview and NEO questions, and half were instructed to
respond like a motivated applicant for the CSM position on both
measures.¹
Recently, some researchers (e.g., Hough & Ones, 2001; Robie et al.,
2001; Smith & Ellingson, 2002) have concluded that faking in experimental studies does not approximate the faking behavior of actual applicants.
They contend that directed faking studies only indicate the maximum
degree to which a measure can be faked, and provide limited information
about the extent to which faking occurs in applied settings, or how it
influences the validity of personality measures.
As with most types of experimental selection research, it is difficult to
replicate the motivation and thought processes of real applicants. This is
unfortunate because it is very difficult to study phenomena such as response inflation in an operational setting. However, we believe that several
design features of the current study enhance the generalizability of results
compared with those of previous experimental studies. For one, interviewees were advanced undergraduates and graduate students who had both
previous interview and relevant work experience. In addition, most students participated in this study to prepare for the postgraduation job search
process, and thus appeared to take the interviews quite seriously (e.g.,
dressed up for the interviews, appeared nervous, requested feedback on
their performance). In addition, experienced interviewers administered the
interviews in a professional setting (e.g., interviews were conducted in
actual interview rooms in the university career center).
Perhaps the most notable reason for the enhanced generalizability of the
current study is the experimental manipulation. First, before the interview,
participants reviewed a detailed description of the CSM job. Then, rather
than simply asking participants to "fake good" as in the typical faking
study, interviewees in the applicant-like condition were asked to respond to
the interview and NEO like an applicant who was highly interested in this
specific position. Not only did this increase the realism of the study, there
was also an empirical basis for using these instructions. Specifically, there
is increasing evidence that asking individuals to respond toward the requirements of a specific job is more representative of applicant behavior
than is simply asking individuals to "fake good" (Burnkrant, 2001; Kluger & Colella, 1993; Miller & Tristan, 2002; Shilobod & Raymark, 2003).
Finally, our goal was to determine, under controlled conditions, whether
applicant-like instructions would produce differential effects across interview and self-report scores of the same personality dimensions. Thus, even
if response inflation in an experimental setting is more severe than in an
applicant setting, the results of this study provide important evidence
regarding the relative effect of applicant-like instructions on these two
types of measures. We do not attempt to estimate the effect sizes that would
be expected in an operational setting.
¹ Within each condition, participants were administered either a behavioral interview or a situational interview. Thus, the design underlying this study was actually a 2 (honest vs. applicant-like instructions) × 2 (behavioral vs. situational interview) between-subjects design. The two sets of interview questions were identical with the exception of verb tense (i.e., "What did you do?" vs. "What would you do?"). In fact, the means, standard deviations, reliability estimates, and construct-related validity evidence of the behavioral and situational interviews were highly similar. As a result, the data from the two interviews were combined.
Measures
Procedure
Each session of the study began with a 1-hr interviewer orientation led
by Chad H. Van Iddekinge. During this time, the general purpose of the
study was described and interviewers were instructed how to (a) administer
the interview, (b) use the interview guide, and (c) make evaluations using
the behavioral rating scales. The orientation concluded with a brief review
of the critical aspects of the study and an opportunity to ask questions. It
should be noted, however, that interviewers were blind to the specific
purpose of the study.
Interviews were conducted in interview rooms at the campus career
center. The orientation leader greeted each interviewee when they arrived
and escorted them to an interview room where they were asked to read a
brief job description of the CSM position for which they were interviewing.
Next, interviewees were read the instructions for the NEO (or interview)
according to the condition to which they were assigned (i.e., respond
honestly or like an applicant). Interviewees in the honest condition were
told that their interview and NEO responses would be kept completely
confidential and that the answers they provided should be as truthful as
possible. Interviewees in the applicant-like condition were asked to respond like an applicant who really wanted to become a CSM. Consistent
with past research (e.g., McFarland & Ryan, 2000; Mueller-Hanson et al.,
2003), interviewees in this condition were also told that top scoring
participants would be rewarded (i.e., given $50), providing them additional
motivation to perform.
Panels of two interviewers conducted each interview. Within each interview, the order of the nine interview items was randomized across
participants. Interviewers took turns asking questions and independently
rated each question on the 5-point scale immediately after the response.
This was done to minimize the general impression effects that have been
shown to influence ratings made once the entire interview is completed
(Webster, 1982). After the interview, interviewers returned to the orientation room to evaluate the overall performance of the interviewee (using the
three global ratings) and rate his or her personality with the NEO. The
interviewee remained in the interview room to complete a poststudy
questionnaire or to take the NEO (followed by the questionnaire) if the
interview was administered first. Lastly, participants were given an opportunity to ask questions and were thanked for their time.
Results
Manipulation Check
In the poststudy questionnaire, participants were asked to recall
both the position they were applying for and the instructions they
were given prior to completing the interview and NEO. Over 95%
of participants correctly indicated that they were completing the
measures for the CSM position, and 98% recalled the instructions
they were given. Participants were also asked six questions about the degree to which they attempted to inflate their responses in the interview (three items, α = .89) and on the NEO (three items, α = .88). Each item (e.g., "My test responses were completely honest") was rated on a 5-point scale that ranged from strongly disagree (1) to strongly agree (5). The mean of the three interview items was 4.63 (SD = 0.58) in the honest condition and 3.04 (SD = 0.98) in the applicant-like condition (higher ratings indicating greater honesty). A similar pattern of results was found on the three NEO items, whereby the mean in the honest group was 4.60 (SD = 0.55) and the mean in the applicant-like group was 2.79 (SD = 0.93). Taken together, participants in the applicant-like condition reported attempting to inflate their interview and NEO responses to a significantly greater extent (p < .01) than did participants in the honest condition.
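The internal consistency coefficients above can be reproduced with a short routine. A minimal sketch of Cronbach's alpha follows; the function name and the item scores are made up for illustration and are not from the study's data.

```python
# Hypothetical sketch: Cronbach's alpha for a small set of Likert items.
def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per item."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = sum(var(it) for it in items)
    # Total score for each respondent across the k items
    totals = [sum(it[i] for it in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# Fabricated ratings from five interviewees on three honesty items:
honesty_items = [
    [5, 4, 5, 3, 4],
    [5, 4, 4, 3, 5],
    [4, 5, 5, 3, 4],
]
alpha = cronbach_alpha(honesty_items)
```

When the items are perfectly parallel, alpha reaches 1.0, which is a convenient sanity check on the implementation.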
Descriptive Statistics
Table 1 displays descriptive statistics and reliability estimates
for interview ratings in the honest and applicant-like conditions.
All dimension means were higher in the applicant-like condition
than in the honest condition, whereas the standard deviations and
internal consistency reliability estimates were smaller in the
applicant-like condition. In contrast, the intraclass correlation coefficients (ICCs; McGraw & Wong, 1996) between the two interviewers in each interview were similar in the two conditions.
The descriptives and internal consistency reliability estimates
for self-report and interviewer NEO ratings are shown in Table 2.
As with the interview ratings, all dimension means were higher in
the applicant-like condition than in the honest condition, whereas
the standard deviations and reliability estimates were consistently
smaller. Once again, the interrater reliability estimates were similar in the two conditions. We also note that the mean interview and self-report NEO ratings in the honest condition were almost identical (Ms = 4.02 and 4.01, respectively). Thus, any observed differences in response inflation cannot be attributed to a leniency effect often observed in interview ratings.
Table 1
Descriptive Statistics and Reliability Estimates for Structured Interview Ratings in the Honest and Applicant-Like Conditions

Variable/condition      M     SD    α     ICC
Altruism
  Honest^a             3.93  0.76  .71   .88
  Applicant-like^b     4.27  0.55  .55   .75
Self-discipline
  Honest               4.13  0.73  .68   .78
  Applicant-like       4.36  0.58  .59   .79
Vulnerability
  Honest               3.99  0.64  .62   .64
  Applicant-like       4.32  0.48  .53   .67
Mean
  Honest               4.02  0.55  .67   .86
  Applicant-like       4.31  0.43  .56   .80
Global ratings
  Honest               5.32  1.28  .95   .82
  Applicant-like       5.90  1.00  .96   .79

Note. Reliability estimates for the mean interview ratings represent the mean alpha across the three personality dimensions. Structured interview ratings were made on a 1–5 scale, whereas global interview ratings were made on a 1–7 scale. α = internal consistency reliability estimate (alpha). ICC = intraclass correlation coefficient (C,2) between the ratings of the two interviewers in each interview.
^a n = 73. ^b n = 70.
Table 2
Descriptive Statistics and Reliability Estimates for Interviewer and Self-Report NEO Ratings in the Honest and Applicant-Like Conditions

Variable/condition        M     SD    α     ICC
Interviewer NEO ratings
  Altruism
    Honest^a             3.67  0.51  .84   .71
    Applicant-like^b     3.75  0.46  .85   .73
  Self-Discipline
    Honest               4.02  0.51  .89   .62
    Applicant-like       4.05  0.50  .88   .67
  Vulnerability
    Honest               3.92  0.46  .85   .51
    Applicant-like       4.05  0.41  .85   .69
  Mean
    Honest               3.87  0.42  .91   .63
    Applicant-like       3.95  0.37  .92   .72
Self-report NEO ratings
  Altruism
    Honest               4.18  0.51  .81
    Applicant-like       4.58  0.39  .74
  Self-Discipline
    Honest               3.95  0.62  .85
    Applicant-like       4.78  0.29  .68
  Vulnerability
    Honest               3.91  0.52  .80
    Applicant-like       4.77  0.28  .77
  Mean
    Honest               4.01  0.40  .82
    Applicant-like       4.71  0.26  .73

Note. α = internal consistency reliability estimate (alpha). ICC = intraclass correlation coefficient (C,2) between the ratings of the two interviewers in each interview.
^a n = 73. ^b n = 70.
the ratings. We report results from both the honest and applicant-like conditions to compare the extent to which instructions to
inflate influenced the construct-related validity of interviewer and
self-report personality ratings.
With regard to interrater reliability, the ICCs (C,2) for Altruism, Self-Discipline, and Vulnerability (respectively) were .88, .78, and .58 in the honest condition and .75, .79, and .67 in the applicant-like condition. The ICCs for the mean ratings of the two interviewers in each interview were .86 and .80, respectively, in the two conditions. These values compare favorably to current meta-analytic interrater reliability estimates for employment interviews of similar structure. For example, Conway, Jako, and Goodman (1995) reported a mean interrater reliability of .75 (k = 33, n = 3,428) for interviews with standardized questions and follow-up probing.
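The ICC(C,k) form used above (two-way model, consistency definition, average of k raters; McGraw & Wong, 1996) can be computed from a subjects-by-raters matrix with a standard ANOVA decomposition. The sketch below is a generic implementation, not the study's code, and the example ratings are fabricated.

```python
def icc_c_k(ratings):
    """ICC(C,k): two-way model, consistency, average of the k raters.

    ratings: list of rows, one per subject; each row holds the k raters' scores.
    """
    n = len(ratings)          # subjects
    k = len(ratings[0])       # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / ms_rows

# Under the consistency definition, a constant rater offset is not penalized:
# icc_c_k([[1, 2], [2, 3], [3, 4]]) == 1.0
```

Because the consistency form removes the rater main effect, two interviewers who rank interviewees identically but differ in leniency still obtain an ICC of 1.0.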
The internal consistency reliability estimates for the three interview dimensions were .71, .68, and .62 in the honest condition and .55, .59, and .53 in the applicant-like condition. These estimates were notably lower than the estimates for the corresponding self-report NEO ratings (mean α = .87). However, interview reliabilities were based on only three items, whereas NEO reliabilities were based on eight items. For comparison, the internal consistency reliability of an eight-item interview was estimated with the Spearman-Brown prophecy formula. The corrected coefficients alpha for Altruism, Self-Discipline, and Vulnerability (honest condition only) were .87, .85, and .81 (M = .84), which are similar to the corresponding NEO reliability estimates.
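The Spearman-Brown step can be checked directly. The sketch below applies the standard prophecy formula to project the three-item interview alphas onto an eight-item test (lengthening factor k = 8/3); the function name is illustrative.

```python
def spearman_brown(reliability, k):
    """Projected reliability of a test lengthened by factor k:
    r' = k * r / (1 + (k - 1) * r)."""
    return k * reliability / (1 + (k - 1) * reliability)

# Projecting the three-item honest-condition alphas onto eight items (k = 8/3):
corrected = [round(spearman_brown(a, 8 / 3), 2) for a in (0.71, 0.68, 0.62)]
# corrected -> [0.87, 0.85, 0.81], matching the values reported above
```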
We then conducted two multitrait–multimethod (MTMM) analyses. The first analysis was performed on the mean dimension ratings of the two interviewers in each interview. Interviewer was treated as the method factor in this analysis, and the three personality dimensions represented the trait factors. We randomly designated the two interviewers in each interview as Interviewer 1 and Interviewer 2 and computed Pearson correlations between the two sets of ratings. Correlations were then transformed to z scores, averaged, and converted back to the corresponding correlation coefficients.
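The z-transformation step described above can be sketched as follows; `average_correlations` is a hypothetical helper name, not a function from the study.

```python
import math

def average_correlations(rs):
    """Average correlations via Fisher's r-to-z transformation:
    z = atanh(r); average the z scores; back-transform with tanh."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))
```

Averaging in z space rather than averaging the raw r values corrects for the skew of the sampling distribution of r; for unequal correlations, the back-transformed mean is slightly larger than the arithmetic mean of the r values.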
The MTMM matrix for this analysis is presented in Table 3. All convergent validity estimates were statistically significant (p < .01). In the honest condition, the mean monotrait–heteromethod correlation (i.e., mean correlation within dimension and between interviewers) was .66, and the mean heterotrait–monomethod correlation (i.e., mean correlation across dimensions and within interviewer) was .38. In the applicant-like condition, the mean monotrait–heteromethod correlation was lower (.60), whereas the mean heterotrait–monomethod correlation was higher (.45). A t test of the mean z scores indicated that the mean monotrait–heteromethod correlation was significantly larger than the mean heterotrait–monomethod correlation in the honest condition, but not in the applicant-like condition.
The second MTMM analysis examined relations between interviewer ratings of the interview questions and the NEO items. Thus, measure (interview vs. NEO) served as the method factor in this analysis. Table 4 displays the correlation matrix for this analysis, which also includes correlations between interviewer and self-report NEO ratings. In the honest condition, the mean monotrait–heteromethod correlation (i.e., mean correlation within dimension and across measures) was .59 and the mean heterotrait–monomethod correlation (i.e., mean correlation across dimensions and within measure) was .50. Similar results were found in the applicant-like condition, for which the mean monotrait–heteromethod correlation was .63 and the mean heterotrait–monomethod correlation was .52. The fact that the mean monotrait–heteromethod correlations were larger than the mean heterotrait–monomethod correlations provides evidence for the construct-related validity of the interview ratings (Campbell & Fiske, 1959). However, the difference between the two correla-
Fiske, 1959). However, the difference between the two correla-
Table 3
Multitrait–Multimethod Matrix of the Structured Interview Ratings in the Honest and Applicant-Like Conditions

Variables: Interviewer 1 ratings — 1. Altruism, 2. Self-Discipline, 3. Vulnerability; Interviewer 2 ratings — 4. Altruism, 5. Self-Discipline, 6. Vulnerability.
[The 6 × 6 correlation matrix is not recoverable from this copy. Honest-condition correlations appear below the diagonal and applicant-like correlations above, with convergent validities in bold; the extracted entries range from .10 to .79, and all but two are significant at p < .05 or p < .01.]
Table 4
Multitrait–Multimethod Matrix of Interviewer and Self-Report Personality Ratings in the Honest and Applicant-Like Conditions

Variables: Structured interview ratings — 1. Altruism, 2. Self-Discipline, 3. Vulnerability; Interviewer NEO ratings — 4. Altruism, 5. Self-Discipline, 6. Vulnerability; Self-report NEO ratings — 7. Altruism, 8. Self-Discipline, 9. Vulnerability.
[The 9 × 9 correlation matrix is not recoverable from this copy.]
Note. Honest condition correlations are below the diagonal, and applicant-like condition correlations are above the diagonal. Convergent correlations are in bold.
* p < .05. ** p < .01.
index, the adjusted goodness-of-fit index, the comparative fit index, and the root-mean-square error of approximation (RMSEA). Goodness-of-fit index, adjusted goodness-of-fit index, and comparative fit index values greater than .90 are generally considered acceptable, whereas values of .95 or above indicate an excellent fit to the data. RMSEA values around .05 indicate a good fit for the model (Browne & Cudeck, 1993; Hu & Bentler, 1998, 1999).

Analysis of the structured interview ratings revealed a decent fit for the three-factor model in the honest condition. Although the fit statistics for this model were actually better in the applicant-like condition, the correlation between the Self-Discipline and Vulnerability factors exceeded unity. Thus, we assessed the fit of a two-factor model in each condition by fixing the correlation between the Self-Discipline and Vulnerability factors to 1.0. This model demonstrated a good fit to the data in the applicant-like condition, and eliminating the third factor did not harm model fit, Δχ²(2, N = 70) = 2.49, ns. In contrast, the fit of the two-factor model was significantly worse than that of the three-factor model in the honest condition, Δχ²(2, N = 73) = 22.97, p < .001.

For interviewer NEO ratings, the fit of the three-factor model was almost identical in the honest and applicant-like conditions. Although model fit was not great in either condition, all factor loadings were large and statistically significant, the interfactor correlations were modest, and the suggested modifications would
Table 5
Fit Statistics for Models of Interviewer and Self-Report Personality Ratings

Personality ratings/model         χ²      df   GFI   AGFI   CFI   RMSEA
Structured interview ratings
  3-factor model
    Honest condition             43.07    24   .90   .81    .90    .09
    Applicant-like condition     30.99    24   .91   .84    .94    .06
  2-factor model
    Honest condition             66.04    26   .83   .70    .80    .15
    Applicant-like condition     33.48    26   .91   .84    .93    .06
Interviewer NEO ratings
  3-factor model
    Honest condition            393.29   249   .73   .68    .87    .06
    Applicant-like condition    393.34   249   .70   .64    .85    .08
  2-factor model
    Honest condition            428.75   251   .70   .64    .84    .08
    Applicant-like condition    402.54   251   .70   .64    .81    .09
Self-report NEO ratings
  3-factor model
    Honest condition            402.17   249   .69   .63    .75    .09
    Applicant-like condition    575.04   249   .63   .55    .50    .12
  2-factor model
    Honest condition            454.41   251   .66   .59    .67    .11
    Applicant-like condition    578.76   251   .63   .56    .50    .12

Note. All chi-square statistics are significant (p < .05) with the exception of the three- and two-factor models of the structured interview ratings in the applicant-like condition. df = degrees of freedom. GFI = goodness-of-fit index. AGFI = adjusted goodness-of-fit index. CFI = comparative fit index. RMSEA = root-mean-square error of approximation.
not notably enhance the fit of the model. Nonetheless, we performed an exploratory factor analysis to identify potential alternative models. As with the structured interview ratings, a two-factor model appeared to have some promise. However, the fit of this model was significantly worse than that of the three-factor model in both the honest condition, Δχ²(2, N = 73) = 35.46, p < .001, and the applicant-like condition, Δχ²(2, N = 70) = 9.20, p < .05.
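The nested-model comparisons above can be reproduced from the χ² values in Table 5. A minimal sketch follows, assuming the standard chi-square difference test (hard-coded here for the 2-df comparisons used in this study, where the survival function is exactly exp(-x/2)) and the usual RMSEA point estimate; neither function is from the study's own analysis code.

```python
import math

def chi2_sf_df2(x):
    """P(chi-square with 2 df >= x); for df = 2 this is exactly exp(-x/2)."""
    return math.exp(-x / 2.0)

def chi2_difference_p(chi2_restricted, chi2_full):
    """p value for a 2-df chi-square difference between nested models."""
    return chi2_sf_df2(chi2_restricted - chi2_full)

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Interviewer NEO ratings (Table 5): the 2- vs. 3-factor comparison yields
# chi2_difference_p(428.75, 393.29) well below .001 in the honest condition and
# chi2_difference_p(402.54, 393.34) between .001 and .05 in the applicant-like
# condition, consistent with the significance levels reported above.
```

Published RMSEA values can differ slightly from this point estimate depending on the software's exact formula, so `rmsea` should be read as an approximation.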
We then evaluated the underlying structure of the self-report
NEO ratings. In the honest condition, results suggested that the
three-factor model did not fit the data very well. However, the
three factors were relatively independent, all factor loadings were
significant and greater than .50, and the suggested modifications
would provide only a minimal improvement in model fit. This
finding is consistent with prior NEO research (e.g., McCrae,
Zonderman, Costa, & Bond, 1996) in which self-report scores
appear to represent that intended personality factors, yet the associated CFA statistics do not indicate a good fit. The fit statistics,
however, were appreciably better in the honest condition than in
the applicant-like condition. In addition, several factor loadings in
the applicant-like condition were small and nonsignificant, and the
interfactor correlations were much larger. The correlation of .93
between the Self-Discipline and Vulnerability factors suggested
that a two-factor model might better explain these data. We therefore evaluated the fit of the two-factor model. In the applicant-like
condition, the fit indices for this model were practically identical
to those of the three-factor model, whereas the two-factor model fit
much worse in the honest condition. Indeed, the change in chi-square (from the two- to three-factor model) was significant in the
honest condition, Δχ²(2, N = 73) = 52.24, p < .001, but not in the
applicant-like condition, Δχ²(2, N = 73) = 3.72, ns.
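The model comparisons above are chi-square difference tests: for nested models, the difference in χ² is itself χ²-distributed, with degrees of freedom equal to the difference in model df. A minimal sketch follows (not the authors' analysis code); every comparison in this study has Δdf = 2, for which the upper-tail probability reduces to exp(−Δχ²/2):

```python
import math

def chi2_sf_df2(x):
    """Upper-tail p value of a chi-square variate with 2 degrees of freedom.

    With df = 2 the chi-square distribution is exponential, so the survival
    function is exp(-x / 2); this covers the delta-df = 2 comparisons here.
    """
    return math.exp(-x / 2.0)

def chi_square_difference(chisq_restricted, df_restricted, chisq_full, df_full):
    """Chi-square difference test for two nested CFA models.

    The restricted model (fewer factors, more df) cannot fit better than
    the full model, so both differences are non-negative.
    """
    delta_chisq = chisq_restricted - chisq_full
    delta_df = df_restricted - df_full
    assert delta_df == 2, "this sketch only covers the delta-df = 2 case"
    return delta_chisq, delta_df, chi2_sf_df2(delta_chisq)

# Honest-condition self-report NEO: two-factor chi2(251) = 454.41 vs. three-factor chi2(249) = 402.17
d, ddf, p = chi_square_difference(454.41, 251, 402.17, 249)
print(round(d, 2), ddf, p < .001)  # 52.24 2 True -- the three-factor model fits significantly better
```

Running the same test on the applicant-like self-report values (578.76 vs. 575.04) gives Δχ² = 3.72, which is nonsignificant, matching the pattern reported above.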
To summarize, results of the analyses described above provide
some evidence that structured interviews can be developed to
assess facets of the FFM in a simulated selection setting. Results
also suggest that instructions to respond like a job applicant had a
negative effect on the construct-related validity of both the interview
and self-report personality ratings. Analysis of the structured interview ratings and self-report NEO ratings yielded similar results
in that the a priori model fit the data best in the honest condition,
whereas a two-factor model appeared to be the most parsimonious
model in the applicant-like condition. In contrast, analysis of
interviewer NEO ratings indicated that the a priori model demonstrated the best fit in both conditions.
Discussion
Summary of Main Findings
The objective of this study was to compare the effects of
response inflation in a structured interview and a self-report measure.
Table 6
Effect Size Differences Between the Applicant-Like and Honest
Conditions for Interviewer and Self-Report Personality Ratings

Variable                   d        dc       SE     95% CI
Altruism ratings
  Structured interview     0.44*    0.52**   0.17   0.19, 0.85
  Interviewer NEO          0.15     0.16     0.17   −0.17, 0.49
  Self-report NEO          0.79**   0.88**   0.18   0.53, 1.23
Self-Discipline ratings
  Structured interview     0.31*    0.38*    0.17   0.05, 0.71
  Interviewer NEO          0.07     0.07     0.17   −0.26, 0.40
  Self-report NEO          1.67**   1.87**   0.20   1.48, 2.26
Vulnerability ratings
  Structured interview     0.51**   0.65**   0.17   0.32, 0.98
  Interviewer NEO          0.30*    0.32*    0.17   −0.01, 0.65
  Self-report NEO          1.34**   1.45**   0.19   1.08, 1.82
Mean ratings
  Structured interview     0.54**   0.66**   0.17   0.33, 0.99
  Global interview         0.43**   0.46**   0.17   0.13, 0.79
  Interviewer NEO          0.20     0.21     0.17   −0.12, 0.54
  Self-report NEO          1.74**   1.92**   0.20   1.53, 2.31
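The confidence intervals in Table 6 have the usual normal-theory form, dc ± 1.96 × SE (e.g., 0.88 ± 1.96 × 0.18 ≈ [0.53, 1.23] for the self-report Altruism ratings). Below is a minimal sketch of a between-groups standardized mean difference with that interval; the correction yielding dc is not reproduced here, and the example values are hypothetical:

```python
import math

def cohens_d_with_ci(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference (Cohen's d) with an approximate 95% CI.

    Uses the pooled standard deviation and the large-sample standard error
    of d; the interval is d +/- 1.96 * SE, matching the form of the
    intervals reported in Table 6.
    """
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    se = math.sqrt((n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2)))
    return d, (d - 1.96 * se, d + 1.96 * se)

# Hypothetical applicant-like vs. honest means on a 5-point rating scale
d, (lo, hi) = cohens_d_with_ci(4.1, 0.6, 70, 3.8, 0.6, 73)
```

An interval that excludes zero corresponds to the asterisked (significant) effect sizes in the table.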
Limitations
Although we believe the present results are informative, it is
important to note some limitations of this study. First, although we
recruited experienced interviewers to conduct the interviews, the
ratings that they provided had no real consequences. This clearly limits
the extent to which these results can be generalized to an organizational context. The fact that college students served as interviewees in this study may also decrease the generalizability of the
results. The vast majority of students, however, were advanced
undergraduates and graduate students who had both previous interview and customer service experience. Further, most students
participated in this study to help prepare for the postgraduation job
search process and thus appeared to take the interviews quite
seriously. Nonetheless, readers should be cautious about generalizing these results to the use of structured interviews in applied
settings.
Second, we compared response inflation in the interviews and
NEO by manipulating the experimental instructions (i.e., respond
honestly or like a job applicant). As discussed, experimental faking
studies have been criticized in the literature (e.g., Hough & Ones,
2001). We agree that the behavior of participants instructed to
pretend to be a job applicant may be quite different from the
behavior of actual applicants. However, we went to great lengths
to make the research environment as realistic as possible and
included design features previous studies have not (e.g., used
stimuli to encourage job-desirable responding). Moreover, even if
inflation in an experimental setting is more severe than in an
applicant setting, these results are still relevant because we demonstrated that there were relative differences in inflation for self
and interviewer ratings of the same personality factors. Thus, we
believe that these results provide useful information about the use
of structured interviews for personality assessment.
Finally, the time commitment required of both the interviewers
and the student interviewees limited the number of participants
available for this study. Although the size of this sample compares
favorably to those of previous studies in this general area of
research (e.g., Barrick et al., 2000; Blackman, 2002), it would have
been helpful if these results had been based on a larger sample. We
hope that researchers with access to larger samples can attempt to
replicate and extend the present findings.
correlations) is higher for more observable traits such as extraversion than for less visible traits such as openness (e.g., Borkenau &
Liebler, 1992; Connolly & Viswesvaran, 1998; Funder & Dobroth,
1987; Watson, 1989). Thus, studies could examine whether employment interviews are more effective for measuring some personality factors than others.
The constructs interviews are designed to measure may also
influence the extent to which applicants can inflate their scores.
For example, inflation may be higher in interviews designed to
assess personality-related constructs than those designed to assess
job knowledge. Given that employment interviews are developed
to measure a variety of constructs (Huffcutt, Conway, et al., 2001),
additional research is needed to better understand the amount and
effects of response inflation in an interview setting.
Finally, future research should compare behavioral and situational interviews for assessing personality and resistance to response inflation. There are reasons why these two interview formats may be differentially effective for assessing personality. For
example, behavioral interviews should be effective for evaluating
personality because traits influence behavior. It is also likely that
personality influences intentions, and thus situational interviews
could be useful for measuring personality.
These two interview formats may also vary in the extent to
which they minimize response inflation. For example, Ellis et al.
(2002) found that interviewees use different impression management tactics in behavioral and situational interviews. Because
future behaviors cannot be verified, it may be easier for individuals
to inflate their responses to situational questions than to behavioral
questions, which in many cases can be confirmed (e.g., via a
reference check). However, there is evidence that performance in
situational interviews depends to some extent on job knowledge
(Conway & Peneno, 1999; Motowidlo, 1999). Therefore, interviewees unfamiliar with the job of interest may be less able to
inflate their responses in a situational interview. Given the popularity of behavioral and situational interviews, we encourage researchers to directly compare the two formats for measuring personality and susceptibility to inflation.
References
Bargh, J. A. (1984). Automatic and conscious processing of social information. In R. Wyer & T. Srull (Eds.), Handbook of social cognition
(Vol. 3, pp. 1–43). Mahwah, NJ: Erlbaum.
Bargh, J. A. (1994). The four horsemen of automaticity: Awareness,
intention, efficiency, and control in social cognition. In R. Wyer & T.
Srull (Eds.), Handbook of social cognition (Vol. 1, 2nd ed., pp. 1–40).
Mahwah, NJ: Erlbaum.
Bargh, J. A., & Thein, R. D. (1985). Individual construct accessibility,
person memory, and the recall-judgment link: The case of information
overload. Journal of Personality and Social Psychology, 49, 1129–1146.
Barrick, M. R., & Mount, M. K. (1996). Effects of impression management
and self-deception on the predictive validity of personality constructs.
Journal of Applied Psychology, 81, 261–272.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and
performance at the beginning of the new millennium: What do we know
and where do we go next? International Journal of Selection and
Assessment, 9, 9–30.
Barrick, M. R., Patton, G. K., & Haugland, S. N. (2000). Accuracy of
interview judgments of job applicant personality traits. Personnel Psychology, 53, 925–951.
Binning, J. F., LeBreton, J. M., & Adorno, A. J. (1999). Assessing
procedures and future job seeking behaviors. Paper presented at the 13th
Annual Conference of the Society for Industrial and Organizational
Psychology, Dallas, TX.
Hogan, R., Hogan, J., & Roberts, B. W. (1996). Personality measurement
and employment decisions. American Psychologist, 51, 469–477.
Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation of suggested palliatives. Human Performance,
11, 209–244.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy,
R. A. (1990). Criterion-related validities of personality constructs and
the effect of response distortion on those validities. Journal of Applied
Psychology, 75, 581–595.
Hough, L. M., & Furnham, A. (2003). Importance and use of personality
variables in work settings. In I. B. Weiner (Ed.-in-Chief) & W. Borman,
D. Ilgen, & R. Klimoski (Vol. Eds.), Handbook of psychology: Vol. 12.
Industrial and organizational psychology (pp. 131–169). New York:
Wiley.
Hough, L. M., & Ones, D. S. (2001). The structure, measurement, validity,
and use of personality variables in industrial, work, and organizational
psychology. In N. R. Anderson, D. S. Ones, H. K. Sinangil, & C.
Viswesvaran (Eds.), Handbook of work psychology (pp. 233–377). London and New York: Sage.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants,
detection, and amelioration of adverse impact in personnel selection
procedures: Issues, evidence, and lessons learned. International Journal
of Selection and Assessment, 9, 152–194.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure
modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance
structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Stone, N. J. (2001).
Identification and meta-analytic assessment of psychological constructs
measured in employment interviews. Journal of Applied Psychology, 86,
897–913.
Huffcutt, A. I., Weekley, J., Wiesner, W. H., DeGroot, T., & Jones, C.
(2001). Comparison of situational and behavior description interview
questions for higher-level positions. Personnel Psychology, 54, 619–644.
Jackson, D. N., Peacock, A. C., & Holden, R. R. (1982). Professional
interviewers' trait inferential structures for diverse occupational groups.
Organizational Behavior and Human Performance, 29, 1–20.
Jackson, D. N., Peacock, A. C., & Smith, J. P. (1980). Impressions of
personality in the employment interview. Journal of Personality and
Social Psychology, 39, 294–307.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User's reference guide.
Chicago: Scientific Software International.
Judge, T. A., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality
and leadership: A qualitative and quantitative review. Journal of Applied
Psychology, 87, 765–780.
Judge, T. A., Martocchio, J. J., & Thoresen, C. J. (1997). Five-factor model
of personality and employee absence. Journal of Applied Psychology,
82, 745–755.
Kluger, A. N., & Colella, A. (1993). Beyond the mean bias: The effect of
warning against faking on biodata item variances. Personnel Psychology, 46, 763–780.
Kristof, A. L. (2000). Perceived applicant fit: Distinguishing between
recruiters' perceptions of person-job and person-organization fit. Personnel Psychology, 53, 643–671.
Kristof-Brown, A., Barrick, M. R., & Franke, M. (2002). Influences and
outcomes of candidate impression management use in job interviews.
Journal of Management, 28, 27–46.
Trull, T. J., Widiger, T. A., & Burr, R. (2001). A structured interview for
the assessment of the five-factor model of personality: Facet-level relations to the Axis II personality disorders. Journal of Personality, 69,
175–198.
Trull, T. J., Widiger, T. A., Useda, J. D., Holcomb, J., Doan, B., Axelrod,
S. R., et al. (1998). A structured interview for the assessment of the
five-factor model of personality. Psychological Assessment, 10, 229–240.
Tupes, E. C., & Christal, R. E. (1992). Recurrent personality factors based
on trait ratings. Journal of Personality, 60, 225–251.
Van Iddekinge, C. H., Raymark, P. H., Eidson, C. E., Jr., & Attenweiler,
W. (2004). What do structured interviews really measure? The construct
validity of behavior description interviews. Human Performance, 17,
71–93.
Van Iddekinge, C. H., Raymark, P. H., Eidson, C. E., Jr., & Putka, D. P.
(2003, April). Applicant-incumbent differences on personality, integrity,
and customer service measures. Paper presented at the 18th Annual
Conference of the Society for Industrial and Organizational Psychology,
Orlando, FL.
Vrij, A., Edward, K., Roberts, K. P., & Bull, R. (2000). Detecting deceit via
analysis of verbal and nonverbal behavior. Journal of Nonverbal Behavior, 24, 239–263.
Vrij, A., Semin, G. R., & Bull, R. (1996). Insight into behavior displayed
during deception. Human Communication Research, 22, 544–562.
Watson, D. (1989). Strangers' ratings of the five robust personality factors:
Evidence of a surprising convergence with self-report. Journal of Personality and Social Psychology, 57, 120–128.
Webster, E. C. (1982). The employment interview: A social judgment
process. Schomberg, Canada: S. I. P.
Widiger, T. A., & Sanderson, C. J. (1995). Assessing personality disorders.
In J. N. Butcher (Ed.), Clinical personality assessment: Practical approaches (pp. 380–394). New York: Oxford University Press.
Woehr, D. J., & Arthur, W., Jr. (2003). The construct-related validity of
assessment center ratings: A review and meta-analysis of the role of
methodological factors. Journal of Management, 29, 231–258.
Zimmerman, M. (1994). Diagnosing personality disorders: A review of
issues and research methods. Archives of General Psychiatry, 51, 225–245.