
Journal of Applied Psychology

2005, Vol. 90, No. 3, 536–552

Copyright 2005 by the American Psychological Association


0021-9010/05/$12.00 DOI: 10.1037/0021-9010.90.3.536


Assessing Personality With a Structured Employment Interview:
Construct-Related Validity and Susceptibility to Response Inflation
Chad H. Van Iddekinge
Human Resources Research Organization

Patrick H. Raymark and Philip L. Roth
Clemson University

The authors evaluated the extent to which a personality-based structured interview was susceptible to
response inflation. Interview questions were developed to measure facets of agreeableness, conscientiousness, and emotional stability. Interviewers administered mock interviews to participants instructed
to respond honestly or like a job applicant. Interviewees completed scales of the same 3 facets from the
NEO Personality Inventory, under the same honest and applicant-like instructions. Interviewers also
evaluated interviewee personality with the NEO. Multitrait–multimethod analysis and confirmatory
factor analysis provided some evidence for the construct-related validity of the personality interviews. As
for response inflation, analyses revealed that the scores from the applicant-like condition were significantly more elevated (relative to honest condition scores) for self-report personality ratings than for
interviewer personality ratings. In addition, instructions to respond like an applicant appeared to have a
detrimental effect on the structure of the self-report and interview ratings, but not interviewer NEO
ratings.

Although personality is typically assessed with self-report measures (Hough & Ones, 2001), recently there has been an increased
interest in using employment interviews to assess personality
(Barrick, Patton, & Haugland, 2000; Binning, LeBreton, &
Adorno, 1999; Huffcutt, Conway, Roth, & Stone, 2001). There are
several reasons why it is worthwhile to evaluate interviews as a
method of personality assessment. For one, research suggests that
interviewer ratings of personality-related constructs may predict
job performance ratings with higher validity than self-report personality scores (Huffcutt, Conway, et al., 2001). There is also
evidence that interviews often result in more favorable applicant
reactions than paper-and-pencil personality tests (e.g., Hamill &
Bartle, 1998; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993;
Steiner & Gilliland, 1996). Decision makers also like to include
interviews in the selection process to get to know job candidates.
Finally, interviews could be used in addition to self-report personality measures to provide a more complete assessment of the
personality dimensions relevant to the job of interest. For example,
a personality-based interview could be used in the final stage of the
selection process to verify or further probe self-report personality
data obtained in earlier stages.
A recurring theme in the personality assessment literature concerns applicant response distortion (Hough, 1998; Hough et al.,
1990; Ones & Viswesvaran, 1998). Although an abundance of
studies have examined the effects of faking on self-report personality measures, researchers have all but ignored the prevalence and
effects of faking in employment interviews. With this in mind, the
purpose of the current study was to determine the extent to which
a personality-based interview was susceptible to response inflation by comparing it to the amount of inflation on a self-report measure that assessed the same personality dimensions.
We begin by reviewing the results of previous studies relevant
to personality assessment and response inflation within an
interview context.

In the past decade, a substantial amount of research has examined relations between the Five-Factor Model of personality (FFM;
Goldberg, 1990, 1993; Tupes & Christal, 1992) and behavior at
work. This research indicates that the FFM dimensions can be
related to various organizational outcomes, including job performance, leadership, and training success (Barrick, Mount, & Judge,
2001; Colquitt, LePine, & Noe, 2000; Judge, Bono, Ilies, &
Gerhardt, 2002), as well as counterproductive work behaviors like
delinquency (Hough, Eaton, Dunnette, Kamp, & McCloy, 1990),
turnover (Barrick & Mount, 1996), and absenteeism (Judge, Martocchio, & Thoresen, 1997). Researchers have also found that the
Big Five factors can explain variance in performance beyond that
of other predictors, such as cognitive ability test scores (McHenry,
Hough, Toquam, Hanson, & Ashworth, 1990) and assessment
center ratings (Goffin, Rothstein, & Johnston, 1996). Furthermore,
in addition to this validity evidence, traditional paper-and-pencil
personality inventories are relatively inexpensive, are easy to administer, and tend not to produce adverse impact against protected
groups (Hough, Oswald, & Ployhart, 2001).

Chad H. Van Iddekinge, Human Resources Research Organization, Alexandria, Virginia; Patrick H. Raymark, Department of Psychology, Clemson University; Philip L. Roth, Department of Management, Clemson University.
An earlier version of this article was presented at the 17th Annual
Conference of the Society for Industrial and Organizational Psychology,
Toronto, Ontario, Canada, April 2002.
The authors thank Lynn McFarland and Jose Cortina for their helpful
comments on drafts of this article. We are also indebted to Flora Riley and
the Clemson University Career Center for helping us recruit study participants and for providing a professional setting in which to conduct this
research.
Correspondence concerning this article should be addressed to Chad H.
Van Iddekinge, Human Resources Research Organization (HumRRO), 66
Canal Center Plaza, Suite 400, Alexandria, VA 22314-1591. E-mail:
cvaniddekinge@humrro.org

Sources of Personality Ratings


The source of personality ratings has received increased research attention in recent years. Although personality is typically
measured with self-report measures, studies suggest that observer
assessments of personality are similar to self-report ratings and
may be more valid (Funder, 1999, 2001). For example, Funder and
colleagues (e.g., Blackman & Funder, 1998, 2002; Colvin &
Funder, 1991; Funder & Colvin, 1988; Funder & Sneed, 1993)
have repeatedly demonstrated that peer judgments of personality
correlate highly with one another and with self-judgments. A
meta-analysis of 36 studies by Connolly and Viswesvaran (1998)
revealed that self and observer ratings of the FFM correlated
between .46 (agreeableness) and .62 (extraversion). Corrected
correlations were much higher when the observer was a parent or
spouse (rc = .64) than when the observer was a stranger who saw only a short video of the target (rc = .20). Interrater reliability
estimates of observer ratings were quite high across the Big Five
factors, ranging from .69 for emotional stability to .81 for
extraversion.
It has also been proposed that because the FFM dimensions are
behavioral in nature, observer ratings of personality should correlate more highly with job performance than self-ratings, particularly if observer ratings are based on an individual's personality at
work (Hogan, Hogan, & Roberts, 1996). Several studies have
shown that supervisor and/or peer ratings of personality are more
predictive of job performance than are self-report ratings (e.g.,
Brown, Diefendorff, Kamin, & Lord, 1999; Mount, Barrick, &
Strauss, 1994; Nilsen, 1995). Nevertheless, the observers in these
studies (i.e., supervisors, peers, and subordinates) presumably
knew the target workers quite well. At this point, it is unclear
whether these findings generalize to personality assessment in an
interview setting, given that interviewers typically do not know job
applicants.

Assessing Personality in Interviews


Personnel researchers have become increasingly interested in
assessing personality with employment interviews. Early research
in this area indicated that (a) interviewers can correctly identify the
personality characteristics associated with different occupations,
and that (b) applicant personality influences interviewer judgments
of job suitability (Jackson, Peacock, & Holden, 1982; Jackson,
Peacock, & Smith, 1980; Paunonen & Jackson, 1987; Paunonen,
Jackson, & Oberman, 1987; Rothstein & Jackson, 1980).
More recently, Huffcutt, Conway, et al. (2001) found that more
than one third of all employment interview questions appear to be
logically related to personality variables such as conscientiousness
(e.g., dependability) and emotional stability (e.g., stress tolerance).
Results also suggested that interviewer ratings of these dimensions
might be more predictive of job performance than self-report
personality scores. Corrected correlations between ratings of
highly structured interviews and job performance ratings varied
from .37 for conscientiousness to .56 for emotional stability.
However, Huffcutt and colleagues cautioned that criterion-related
validity evidence is insufficient for determining whether employment interviews measure the constructs that they are designed to
measure. In fact, they are among a growing contingent of researchers who have called for additional research on the construct-related validity of interview-based personality judgments (Barrick et al., 2000; Borman, Hanson, & Hedge, 1997; Hough & Furnham, 2003; Kristof, 2000; Landy, Shankster, & Kohler, 1994).
Other recent studies have examined the relationship between
overall interview ratings and applicant scores on self-report personality measures (e.g., Conway & Peneno, 1999; Cook, Vance, &
Spector, 2000; Cortina, Goldstein, Payne, Davison, & Gilliland,
2000; Huffcutt, Weekley, Wiesner, DeGroot, & Jones, 2001). In
general, this research has found only modest correlations between
interview ratings (of various constructs) and self-report scores of
FFM dimensions such as conscientiousness and extraversion.
However, the interviews in these studies were not specifically
designed to measure the personality factors assessed in the corresponding self-report measures. As such, little is known about the
association between interviewer and self-report ratings of the same
personality factors.
We know of only two published studies that have examined the
construct-related validity of interviewer personality judgments in a
simulated selection setting. Barrick et al. (2000) had experienced
interviewers conduct mock interviews with undergraduates and
then assess interviewee personality with an adjective checklist.
The resulting personality judgments were correlated with self-report ratings of the same personality factors. Analyses revealed that interviewer and self-report personality ratings were moderately related. Statistically significant self–interviewer correlations (corrected for test unreliability) were found for extraversion (rc = .49), openness (rc = .44), and agreeableness (rc = .37). However,
nonsignificant relations were found for the two Big Five factors
that most consistently predict job performance: conscientiousness
and emotional stability. Unfortunately, the researchers did not
report discriminant correlations, which are needed to properly
evaluate these convergent validity estimates. We also note that the
interviewer personality ratings in this study were based on interviewers' general impressions of students over the course of the
interview, and were not tied to specific interview questions.
Blackman (2002) also investigated the correspondence between
self and interviewer judgments of personality. In this study, undergraduate students conducted mock job interviews, and then
interviewers, interviewees, and peers completed a modified version of the California Q-Set. The results revealed moderate to high
convergence between self, peer, and interviewer personality judgments. The mean correlation between self and other ratings was
.50 for a structured interview and .61 for an unstructured interview.
Like the Barrick et al. (2000) study, however, these interviews
were not designed to assess specific personality variables (e.g., one question was "Describe yourself to me."), nor did Blackman report
any evidence concerning the discriminant validity of interviewer
judgments.
Findings from studies within clinical psychology also provide
insight into using interviews to measure personality. Clinicians
have used interviews to assess personality for decades, and several
semistructured interview formats have been developed to assess
dysfunctional personality (Perry, 1992; Widiger & Sanderson,
1995; Zimmerman, 1994). However, there is surprisingly little
research supporting the validity of clinical judgments of healthy
personality (McIntire & Miller, 2000; Murphy & Davidshofer,
1998). Only in recent years has a comparable interview-based
method for assessing normal personality emerged from this
literature.


Trull and colleagues have developed an interview to measure the Big Five factors, the Structured Interview for the Five-Factor
Model of Personality (SIFFM; Trull & Widiger, 1997; Trull,
Widiger, & Burr, 2001; Trull et al., 1998). The SIFFM includes
120 items that the clinician rates on a Likert-type scale. For
example, the question "Is it important for you to get what you want? If yes, have you ever exploited or conned somebody out of something?" is rated on a 3-point scale that ranges from "no, it is not important to get what he or she wants" to "yes, has exploited or conned somebody out of something." The results of studies on the
psychometric characteristics of the SIFFM have been encouraging.
For instance, Trull et al. (1998) reported interrater reliability
estimates above .90 across the five SIFFM dimensions. They also
found (within an undergraduate population) large correlations between interviewer and self-report personality ratings, ranging from
.69 for emotional stability and openness to .82 for conscientiousness. SIFFM ratings also demonstrated evidence of discriminant
validity in that the highest discriminant correlation across the Big
Five factors was only .34.
We should note, however, that the SIFFM is more similar to an
oral version of a typical self-report personality measure (e.g.,
dozens of items, Likert-type rating scale) than a typical structured
employment interview. In addition, although the SIFFM was intended to assess the FFM, the developers caution that many of the
questions are skewed toward maladaptive variants of the Big Five.
Nonetheless, the work by Trull et al. (1998) provides some evidence that accurate personality assessments can be made with this
type of interview.
In summary, there is evidence from applied and clinical psychology research to suggest that interviewer personality judgments
can correspond moderately with self-judgments of personality. At
the same time, however, the absence of discriminant validity
evidence in studies that have used mock employment interviews
means that this agreement may be the result of common stereotypes and other judgmental biases. In fact, a recent review of this
literature has raised serious questions about the assumption that
employment interviews are effective for personality assessment
(Hough & Furnham, 2003). Furthermore, we were unable to find
any studies that have examined the influence of response inflation
on the construct- or criterion-related validity of personality-based
interviews. As we discuss in the following section, response inflation may be an issue if interviews designed to assess personality
are used for selection.

Response Distortion in Self-Report Personality Measures


Despite the potential benefits of assessing personality during the
selection process, some researchers remain hesitant to use self-report personality inventories because they believe that these measures are vulnerable to response distortion, or faking (Ones &
Viswesvaran, 1998). Most experts believe that people can misrepresent themselves on personality measures (Hough & Ones, 2001).
There is also evidence that at least some applicants distort
their responses on such measures during the selection process
(e.g., Hough, 1998; Rosse, Stecher, Miller, & Levin, 1998; Van
Iddekinge, Raymark, Eidson, & Putka, 2003). However, the debate
continues about what effect, if any, faking has on the construct- and criterion-related validity of personality measures in applied
settings.

The results of some studies suggest that faking can have detrimental effects on the psychometric properties of personality inventories and the selection decisions based on these measures. For
example, several studies (e.g., Schmit & Ryan, 1993) have found
that a sixth, "ideal employee" factor emerges in analyses of FFM
data from applicant samples. Other researchers (e.g., Stark,
Chernyshenko, Chan, Lee, & Drasgow, 2001) have found differential item and test functioning between applicants and incumbents
who completed the same personality measure. Studies have also
found that faking can attenuate the criterion-related validity of
personality measures (e.g., Douglas, McDaniel, & Snell, 1996).
Furthermore, some researchers (e.g., Griffith, Chmielowski, Snell,
Frei, & McDaniel, 2000) have demonstrated that even if faking
does not reduce the predictive validity of personality measures, it
can influence which applicants are hired in a top-down selection
system. There is also some evidence that the expected performance
of individuals who distort their responses on personality inventories is systematically lower than the expected performance of
nonfakers (Mueller-Hanson, Heggestad, & Thornton, 2003).
Other researchers provide evidence to suggest that faking does
not influence the validity and psychometric quality of personality
measures. For instance, recent studies (e.g., D. B. Smith, Hanges,
& Dickson, 2001) have reported that the factor structure of the
FFM inventories is invariant across applicant and nonapplicant
samples. M. A. Smith, Moriatry, Lutrick, and Canger (2001)
discovered that the FFM actually fit better in an applicant sample
than in a student sample. Other research (e.g., Robie, Zickar, &
Schmit, 2001) has failed to find evidence of differential item and
test functioning when comparing personality data from applicant
and incumbent groups. Moreover, results of some studies suggest
that faking does not negatively influence the criterion-related validity of personality measures. For example, a meta-analysis by
Ones, Viswesvaran, and Schmidt (1993) found higher relations
between integrity tests and supervisor ratings of performance in
applicant samples (rc = .41) than in incumbent samples (rc = .33).
Hough (1998) categorized studies by setting (applicant, incumbent, and directed faking studies) and concluded that setting moderates the predictive validity of personality measures. Specifically,
she found only minimal validity differences between applicant and
incumbent studies, but observed that directed faking experiments
result in much lower validity coefficients.
We do not attempt to resolve the faking debate in the current
study. We believe that there is evidence to suggest that response
distortion occurs in at least some applied settings and can have
deleterious effects when personality tests are used for selection.
We also note that many selection researchers and practitioners are
still concerned that faking is a potential limitation of self-report
personality measures. Thus, we think it is important for researchers
to continue to develop and evaluate methods for assessing personality that are more resistant to faking and response inflation in
general.

Response Inflation in Employment Interviews


Although response inflation on self-report personality measures
has been widely studied in recent years, researchers have virtually
ignored the prevalence of inflation in perhaps the most frequently
used selection measure, the employment interview. Several recent
studies have examined the use and effects of impression management (IM) behaviors in interviews (e.g., Ellis, West, Ryan, & DeShon, 2002; Kristof-Brown, Barrick, & Franke, 2002; McFarland, Ryan, & Kriska, 2003). Individuals attempting to inflate in an
interview are likely to use the various IM tactics identified in this
literature (e.g., self-promotion, entitlements, opinion conformity).
However, interview response inflation also involves manipulating
the content of responses to the interview questions. In a structured
interview, this means inflating how interviewees describe past
behavior in response to behavioral questions, or inflating how they
would behave in hypothetical scenarios in response to situational
questions. As noted, we were unable to locate any published
studies that have directly investigated this type of response inflation in an employment interview.
We compared the amount and effects of response inflation in a
personality-based structured interview to the amount and effects of
inflation on a self-report inventory designed to assess the same
personality factors. We believe that there are at least three reasons
why it may be more difficult for individuals to inflate their scores
in structured interviews than on the typical self-report personality
measure. Specifically, we maintain that inflation in interviews is
inhibited by (a) the cognitive demands placed on interviewees, (b)
the behavioral nature of the interview situation, and (c) the fact that
the interviewer (rather than the interviewee) determines the ratings
in an interview. We discuss each of these factors in turn.
The first reason we believe that interviews may be more resistant to response inflation than self-report personality inventories is
the greater information processing or cognitive load interviews
require. On the basis of prior research, we contend that response
inflation is a cognitively complex task and that the information
processing demands of employment interviews may result in less
inflation. Results of studies from a variety of areas within psychology indicate that the cognitive load on deceivers is high
because they must conceive of distortions that are consistent with
what the receiver of information might know (DePaulo, Stone, &
Lassiter, 1985; Levashina & Campion, 2003; Vrij, Semin, & Bull,
1996), while at the same time continuing to interact with the receiver. We first discuss the cognitive load associated with interviews and self-report measures and then review the relevant literature from other research domains.
The cognitive load that structured interviews place on applicants
is high in that not only do they have to generate their own socially
desirable responses to interview questions without contradicting
known facts, but applicants must also monitor their verbal and
nonverbal behaviors so as not to cue the interviewer to the
inflation. Inflation in structured interviews requires interviewees to
make up or exaggerate their behavior in past job-related situations
or to determine the appropriate way to behave in hypothetical
situations relevant to the job of interest. Inflated responses to such
questions might also have to withstand follow-up or probing
questions. Time can also be a concern in that interviewees generally have only a few seconds to prepare their responses. Moreover, applicants must also continually interact with the interviewer
and monitor the interviewer's verbal and nonverbal behaviors.
The above issues are much less relevant to completion of a
self-report personality measure. For one, most commercial measures of the Big Five factors tend to sample relatively broad (and
difficult to verify) behaviors and preferences that are often external
to the work context. Time is also less of an issue with self-report
measures. In fact, when completing personality instruments, applicants are allowed to skip questions or change their responses to previously answered questions. Moreover, self-report instruments
are not typically completed in the direct presence of a hiring
official, and thus applicants do not need to worry about their
nonverbal behaviors or monitor the reactions of an assessor. Taken
together, response inflation on self-report personality inventories
appears to require substantially fewer cognitive resources than
inflation in a structured interview.
Research from the social psychology literature provides support
for the influence of cognitive load on behavior. An important
distinction made in this literature is that between automatic mental
processes and controlled mental processes (Bargh, 1984). Automatic processes take place outside of our awareness and do not
require conscious monitoring, whereas controlled processes demand high amounts of attentional energy. A mental task that
requires controlled cognitive processing (e.g., providing an inflated example of previous job-relevant behavior) can be disabled
if a similarly effortful mental task (e.g., monitoring the interviewer's verbal and nonverbal behavior) is introduced (Bargh, 1984, 1994; Bargh & Thein, 1985; Gilbert & Osborne, 1989; Shiffrin & Schneider, 1977).
Studies have tested the influence of increased cognitive load by
adding a second task on top of a primary task. Available evidence suggests that heavy cognitive loads can hinder information
processing on other tasks as a result of the allocation of limited
mental resources (Gilbert & Osborne, 1989). A study by Pontari
and Schlenker (2000) is perhaps the most relevant to the current
investigation. These researchers found that increased cognitive
demands can minimize self-presentation behaviors in interviews.
They increased cognitive load in simulated interviews by asking
interviewees to remember an eight-digit number. Results indicated
that it was difficult for extroverted individuals to present themselves as introverted when they had the additional task of remembering the number during the interview. We should note, however,
that the interviews that Pontari and Schlenker used consisted of
more conventional questions (e.g., "Describe your strengths and weaknesses."). We believe that cognitive load may be even greater
(and response inflation more difficult) in structured interviews that
require applicants to describe their behavior in past or hypothetical
situations.
Research in applied psychology also highlights the importance
of cognitive load on behavior. For example, studies in the goal-setting literature have shown that performance on a primary task
can suffer when an individual is concerned with monitoring performance on that task (e.g., DeShon & Alexander, 1996). This is
likely due to the substantial cognitive demands required by the
monitoring process. McFarland et al. (2003) recently found that
the differing cognitive demands of selection methods can influence
applicant IM behaviors. They compared IM in a situational interview and a role-play exercise. Using cybernetic theory (Bozeman
& Kacmar, 1997) as a framework, the researchers hypothesized
that the role-play exercise would be more cognitively demanding
than the interview because applicants tend to be less familiar with
such exercises. They also noted that the nature of the role-play is
more demanding because applicants must act out and maintain a
particular role for an extended period. McFarland et al. predicted
that the increased cognitive demands would, in turn, inhibit the use
of IM tactics. Results supported this hypothesis in that applicants
not only engaged in less IM during the role-play, but those tactics also appeared to be less effective than the tactics used in the situational interview.
Overall, we suggest that the results of studies in the social and
applied psychology literatures converge to show that the demand
for, and allocation of, limited mental resources can greatly influence behavior. On the basis of this research, we suggest that the
cognitive demands of structured interviews are likely to inhibit
response inflation relative to self-report personality measures,
which appear to require far fewer cognitive resources. This is
because the presentation of accurate information is more automatic, whereas inflation requires more controlled information processing in that applicants must determine what they think the
interviewer wants to see and then attempt to behave accordingly.
The cognitive load associated with interview response inflation is
further increased by various factors inherent in the interview
context, including time constraints and the continual interaction
between applicant and interviewer. Thus, the nature of employment interviews would appear to inhibit the controlled information
processing that response inflation requires.
A second, related reason why structured interviews may be more
resistant to inflation than self-report personality inventories is the
behavioral nature of the interview situation. Unlike self-report
measures, applicants must actually demonstrate certain characteristics or behaviors of interest during an interview. For example,
individuals interviewing for a job that requires good interpersonal
or oral communication skills must behave in a way that convinces
the interviewer that they have such skills. In this sense, employment interviews are similar to work sample tests (Cascio & Phillips, 1979; DuBois, Sackett, Zedeck, & Fogli, 1993) in that they sample
maximal performance relevant to important job-related behaviors
that interviewers can observe and evaluate. We suggest that applicants may have a difficult time inflating interview performance
related to such skills.
In contrast, applicants do not have to present observable samples
of behavior when completing a self-report personality measure.
Response inflation on a typical self-report instrument simply requires applicants to select the high or low end of a Likert-type
scale, whichever they think will make them look the best for the
job. Such measures do not require applicants to demonstrate how
a given trait (e.g., agreeableness) manifests itself in their behavior
(e.g., working well with others). Thus, inflating responses on a
self-report measure would seem to be a much easier task than the
behavioral manipulation required to inflate in a structured
interview.
A final reason we believe that response inflation may be more
difficult in structured interviews than on self-report personality
measures is that applicants have less control over interview outcomes. That is, applicants provide the ratings on a self-report
instrument, whereas interviewers provide the ratings in an interview. Although applicants can influence the ratings they receive in
an interview, the outcome is ultimately under the control of another individual. This would seem to further attenuate the effect of
response inflation. For example, interviewers are information processors. As such, even if applicants inflate their responses, interviewers may be able to detect some of this inflation and adjust
their ratings accordingly. Research has shown, for example, that
deception is associated with more speech disturbances, a higher
pitched voice, and longer response latencies (Vrij et al., 1996).
There is also evidence that compared with honest individuals,

people who are lying tend to provide longer answers to questions


(DePaulo et al., 2003) and use their hands and arms to supplement
what they say (Vrij, Edward, Roberts, & Bull, 2000). We should
note, however, that there is also evidence to suggest that some
people have difficulty detecting deception (e.g., they focus on the
wrong cues; Mann, Vrij, & Bull, 2004). Nevertheless, even if some
interviewers are not very effective at detecting response inflation,
interviewees may perceive that they can and, in turn, choose not to
inflate their responses.
Thus, there are several reasons why the nature of structured
interviews may inhibit response inflation. We believe that inflation
is more difficult in interviews than on self-report personality
measures because of the cognitive and behavioral demands of the
interview situation. Furthermore, because interviewers provide the
ratings in an interview, response inflation is likely to have less of
an effect than on a self-report inventory in which applicants have
greater control over their responses. Taken together, we hypothesized the following:
The standardized difference between ratings of individuals instructed to respond honestly and individuals instructed to respond
like a job applicant on a personality-based interview will be less
than the difference between ratings of individuals instructed to
respond honestly and those instructed to respond like an applicant
on a self-report personality measure.
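In formal terms, the standardized difference referenced in this hypothesis is an effect size of the Cohen's d form. A minimal sketch of the conventional computation (the pooled-variance denominator shown is one common choice, not a detail specified in the hypothesis itself) is

d = \frac{\bar{X}_{\mathrm{applicant}} - \bar{X}_{\mathrm{honest}}}{SD_{\mathrm{pooled}}}, \qquad SD_{\mathrm{pooled}} = \sqrt{\frac{(n_a - 1)s_a^2 + (n_h - 1)s_h^2}{n_a + n_h - 2}}.

The hypothesis thus predicts that d computed from the interview ratings will be smaller than d computed from the self-report ratings.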
We were also interested in how instructions to respond like an
applicant would influence the underlying structure of self-report
and interview-based personality ratings. As discussed, we know of
no published studies that have examined how response inflation
affects the psychometric characteristics (e.g., factor structure) of
selection interview ratings. However, several studies have compared the factor structure of self-report personality measures in
settings where differences in motivation to inflate would seem
likely to exist (i.e., in applicant and nonapplicant samples). Some
research has found that the factor structure of the Big Five does not
hold up in applicant samples, or at least is not as "clean" as the
structure of nonapplicant data (e.g., Cellar, Miller, Doverspike, &
Klawsky, 1996; Schmit & Ryan, 1993; M. A. Smith et al., 2001).
A consistent finding in these studies is that six distinguishable
factors emerge in the applicant data (although the composition of
this sixth factor varies somewhat by study). Some researchers have
interpreted this sixth dimension as an "ideal employee" factor
resulting from applicant response inflation on the personality factor (or factors) most relevant to the job of interest.
The results of other studies suggest that response inflation (to
the extent it occurs in applicant settings) does not influence the
underlying structure of self-report personality measures. For instance, M. A. Smith et al. (2001) compared the factor structure of
student, applicant, and job incumbent responses on the Hogan
Personality Inventory. Multigroup confirmatory factor analysis
(CFA) supported a 5-factor model in all three samples. In fact, the
goodness-of-fit indices in each sample were practically identical,
which suggests that the factor structure was invariant across the
three groups. Ellingson, Smith, and Sackett (2001) discovered
similar results when they compared the structure of data from
applicants with high and low social desirability scores. Multigroup CFA yielded remarkably similar models for the high and
low groups on four personality measures: the Hogan Personality
Inventory, the Assessment of Background and Life Experiences,

This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

RESPONSE INFLATION IN EMPLOYMENT INTERVIEWS

the California Psychological Inventory, and the Sixteen Personality Factor Questionnaire.
Thus, the extent to which response inflation influences the
underlying structure of personality measures is unclear. Given the
previously discussed reasons why structured interview ratings may
inhibit response inflation, the structure of a personality-based
interview may be more resistant to instructions to respond like a
job applicant than the structure of a self-report personality measure. However, given the lack of prior research, as well as the
inconsistent findings from the above personality studies, we did
not make any specific predictions, but rather posed the following
research question:
What is the relative influence of instructions to respond like a job applicant on the factor structure of interview-based and self-report personality ratings?

Contributions of the Current Study


This study addresses two critical gaps in the selection literature.
First, it responds to recent calls for research on employment
interviews specifically designed to assess personality factors such
as the Big Five (e.g., Barrick et al., 2000; Huffcutt, Conway, et al.,
2001; Kristof, 2000). Despite recent interest in personality–interview relations, surprisingly few studies have systematically
examined the construct-related validity of interviews specifically
designed to assess personality. Given that organizations appear to
use employment interviews to assess personality-related job dimensions (Huffcutt, Conway, et al., 2001), it is critical to determine
how effectively interviews measure such dimensions.
Second, we are unaware of any other studies that have directly
examined response inflation within a structured employment interview. Determining the extent to which interviews are vulnerable
to inflation is important for several reasons. For example, estimating the amount of inflation in interviews could increase understanding of the construct- and criterion-related validity of employment interviews. This is also the first known study to compare
response inflation in interview and self-report measures designed
to assess the same personality factors. The results of this research
may provide information to help selection researchers and practitioners decide whether to assess personality in an interview, with
a self-report measure, or with some combination of the two methods. For these reasons, we believe that this study makes an important contribution to the existing literature.

Method

Participants
The interviewees for this study were 143 advanced undergraduates and
graduate students. The typical interviewee was 23 years old and at least a
junior in college. Every participant had been through at least one formal
selection interview (M = 3.4 interviews). In addition, 62% of the students
had at least 1 year of full- or part-time customer service experience, and
35% of students had 3 years or more of such work experience. This level
of experience is noteworthy because we developed a simulated selection
situation for the job of grocery store customer service manager (CSM).
Experienced interviewers from area organizations (N = 52) administered the interviews. Each interviewer conducted an average of five interviews (SD = 1.94). Interviewers represented a wide variety of industries
(e.g., automotive, retail, e-commerce) and positions (e.g., recruiter, manager, vice president). All participants had prior interviewing experience

541

(M = 11 years), and most (71%) had taken at least one formal interviewer
training course.

Design
All interviewees were administered (in random order) a structured
interview and select facet scales from the NEO Personality Inventory
(Costa & McCrae, 1992). Half of the participants were asked to respond
honestly to the interview and NEO questions, and half were instructed to
respond like a motivated applicant for the CSM position on both
measures.1
Recently, some researchers (e.g., Hough & Ones, 2001; Robie et al.,
2001; Smith & Ellingson, 2002) have concluded that faking in experimental studies does not approximate the faking behavior of actual applicants.
They contend that directed faking studies only indicate the maximum
degree to which a measure can be faked, and provide limited information
about the extent to which faking occurs in applied settings, or how it
influences the validity of personality measures.
As with most types of experimental selection research, it is difficult to
replicate the motivation and thought processes of real applicants. This is
unfortunate because it is very difficult to study phenomena such as response inflation in an operational setting. However, we believe that several
design features of the current study enhance the generalizability of results
compared with those of previous experimental studies. For one, interviewees were advanced undergraduates and graduate students who had both
previous interview and relevant work experience. In addition, most students participated in this study to prepare for the postgraduation job search
process, and thus appeared to take the interviews quite seriously (e.g.,
dressed up for the interviews, appeared nervous, requested feedback on
their performance). In addition, experienced interviewers administered the
interviews in a professional setting (e.g., interviews were conducted in
actual interview rooms in the university career center).
Perhaps the most notable reason for the enhanced generalizability of the
current study is the experimental manipulation. First, before the interview,
participants reviewed a detailed description of the CSM job. Then, rather
than simply asking participants to "fake good" as in the typical faking
study, interviewees in the applicant-like condition were asked to respond to
the interview and NEO like an applicant who was highly interested in this
specific position. Not only did this increase the realism of the study, there
was also an empirical basis for using these instructions. Specifically, there
is increasing evidence that asking individuals to respond toward the requirements of a specific job is more representative of applicant behavior
than is simply asking individuals to "fake good" (Burnkrant, 2001; Kluger & Colella, 1993; Miller & Tristan, 2002; Shilobod & Raymark, 2003).
Finally, our goal was to determine, under controlled conditions, whether
applicant-like instructions would produce differential effects across interview and self-report scores of the same personality dimensions. Thus, even
if response inflation in an experimental setting is more severe than in an
applicant setting, the results of this study provide important evidence
regarding the relative effect of applicant-like instructions on these two
types of measures. We do not attempt to estimate the effect sizes that would
be expected in an operational setting.

1 Within each condition, participants were administered either a behavioral interview or a situational interview. Thus, the design underlying this study was actually a 2 (honest vs. applicant-like instructions) × 2 (behavioral vs. situational interview) between-subjects design. The two sets of interview questions were identical with the exception of verb tense (i.e., "What did you do?" vs. "What would you do?"). In fact, the means,
standard deviations, reliability estimates, and construct-related validity
evidence of the behavioral and situational interviews were highly similar.
As a result, the data from the two interviews were combined.


Measures

Personality-based interview. The debate continues among selection researchers about the relative value of broad versus narrow personality
traits for making personnel decisions (e.g., Hough & Furnham, 2003; Ones
& Viswesvaran, 1996; Paunonen, Rothstein, & Jackson, 1999; Schneider,
Hough, & Dunnette, 1996). The decision about whether to focus on broad
or narrow personality factors in the current study was based on the results
of a structured job analysis for the CSM position at a large grocery
organization. The analysis indicated that three FFM facets that the NEO
measures are particularly important for success in the CSM position:
Altruism (agreeableness), Self-Discipline (conscientiousness), and Vulnerability (emotional stability). Altruism is the extent to which an individual
is generous, considerate, and willing to help others. Self-discipline is the
degree to which an individual is motivated to get the job done and follows
through on tasks despite boredom and distractions. Vulnerability is the
extent to which an individual is able to cope with stressful events. The
importance of these personality variables in jobs that require extensive
interpersonal interaction is consistent with the findings of previous studies
(e.g., Frei & McDaniel, 1998; Mount, Barrick, & Stewart, 1998).
Once the critical personality dimensions of the CSM job were identified, we developed several interview items to assess each dimension.
After an initial bank of questions was created, we reviewed the items
several times, revised and discarded many of the original items, and
developed several new ones. Next, 15 advanced graduate students from
a doctoral program in industrial–organizational psychology provided
structured evaluations of 24 interview items. The analysis revealed 18
items on which there was substantial agreement about the personality
factor measured and the quality of that assessment. For example, one of
the final Vulnerability questions was "We all have had times when the pressure at work is extremely high. Tell me about a time when you had several competing deadlines or had a very important project you were counted on to complete successfully. Describe how you felt. How did you deal with the situation?"
We then developed rating scales with behavioral anchors for the final set
of interview items. For each item, a list of good, marginal, and poor
potential responses was developed. We reviewed and modified this initial
set of rating anchors several times before asking another group of I-O
graduate students to evaluate the responses. The students rated the quality
of each response on a 5-point scale where 1 = extremely poor response and 5 = extremely good response. Responses on which there was substantial
agreement were then included in the rating scales as example answers to
provide standard criteria for rating interviewee responses.
Finally, we created an interview guide that consisted of nine questions,
three items to assess each of the three personality facets critical to the CSM
position. For each question, a 5-point scale was provided with behavioral
anchors that represented very poor (1), satisfactory (3), and very good (5)
performance. Interviewers were also asked to evaluate the overall performance of the interviewee (i.e., hereafter referred to as global interview
ratings) with a 3-item scale adapted from Stevens and Kristof (1995).
Interviewers rated, on a 7-point scale, (a) how well the applicant did in the
interview, (b) how likely they would be to offer the applicant a second
interview, and (c) how attractive the applicant is as a potential employee of
their organization.
Self-report personality measure. The personality of interviewees was
also assessed with the Altruism, Self-Discipline, and Vulnerability facet
scales of the revised NEO Personality Inventory (NEO PI-R; Costa &
McCrae, 1992). Each subscale consists of 8 items rated on a 5-point
scale that ranges from strongly disagree (1) to strongly agree (5).
Interviewees completed the self-report form of the NEO, whereas
interviewers completed an observer version of the NEO that contained
the same 24 items. Reliability estimates for all self-report and interviewer ratings were at least .70.

Procedure

Each session of the study began with a 1-hr interviewer orientation led
by Chad H. Van Iddekinge. During this time, the general purpose of the
study was described and interviewers were instructed how to (a) administer
the interview, (b) use the interview guide, and (c) make evaluations using
the behavioral rating scales. The orientation concluded with a brief review
of the critical aspects of the study and an opportunity to ask questions. It
should be noted, however, that interviewers were blind to the specific
purpose of the study.
Interviews were conducted in interview rooms at the campus career
center. The orientation leader greeted each interviewee when they arrived
and escorted them to an interview room where they were asked to read a
brief job description of the CSM position for which they were interviewing.
Next, interviewees were read the instructions for the NEO (or interview)
according to the condition to which they were assigned (i.e., respond
honestly or like an applicant). Interviewees in the honest condition were
told that their interview and NEO responses would be kept completely
confidential and that the answers they provided should be as truthful as
possible. Interviewees in the applicant-like condition were asked to respond like an applicant who really wanted to become a CSM. Consistent
with past research (e.g., McFarland & Ryan, 2000; Mueller-Hanson et al.,
2003), interviewees in this condition were also told that top scoring
participants would be rewarded (i.e., given $50), providing them additional
motivation to perform.
Panels of two interviewers conducted each interview. Within each interview, the order of the nine interview items was randomized across
participants. Interviewers took turns asking questions and independently
rated each question on the 5-point scale immediately after the response.
This was done to minimize the general impression effects that have been
shown to influence ratings made once the entire interview is completed
(Webster, 1982). After the interview, interviewers returned to the orientation room to evaluate the overall performance of the interviewee (using the
three global ratings) and rate his or her personality with the NEO. The
interviewee remained in the interview room to complete a poststudy
questionnaire or to take the NEO (followed by the questionnaire) if the
interview was administered first. Lastly, participants were given an opportunity to ask questions and thanked for their time.

Results

Manipulation Check
In the poststudy questionnaire, participants were asked to recall
both the position they were applying for and the instructions they
were given prior to completing the interview and NEO. Over 95%
of participants correctly indicated that they were completing the
measures for the CSM position, and 98% recalled the instructions
they were given. Participants were also asked six questions about
the degree to which they attempted to inflate their responses in the
interview (three items, α = .89) and on the NEO (three items, α = .88). Each item (e.g., "My test responses were completely honest.") was rated on a 5-point scale that ranged from strongly disagree (1) to strongly agree (5). The mean of the three interview items was 4.63 (SD = 0.58) in the honest condition and 3.04 (SD = 0.98) in the applicant-like condition (higher ratings indicating greater honesty). A similar pattern of results was found on
the three NEO items whereby the mean in the honest group was
4.60 (SD = 0.55) and the mean in the applicant-like group was 2.79 (SD = 0.93). Taken together, participants in the applicant-like condition reported attempting to inflate their interview and NEO responses to a significantly greater extent (p < .01) than did
participants in the honest condition.


Descriptive Statistics
Table 1 displays descriptive statistics and reliability estimates
for interview ratings in the honest and applicant-like conditions.
All dimension means were higher in the applicant-like condition
than in the honest condition, whereas the standard deviations and
internal consistency reliability estimates were smaller in the
applicant-like condition. In contrast, the intraclass correlation coefficients (ICCs; McGraw & Wong, 1996) between the two interviewers in each interview were similar in the two conditions.
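For readers unfamiliar with this index, the ICC (C,2) reported here is, in McGraw and Wong's (1996) notation, the average-measures consistency coefficient from a two-way mixed-effects ANOVA of the two interviewers' ratings:

\mathrm{ICC}(C, k) = \frac{MS_R - MS_E}{MS_R}, \qquad k = 2,

where MS_R is the between-interviewee mean square and MS_E is the residual (Interviewee × Interviewer) mean square.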
The descriptives and internal consistency reliability estimates
for self-report and interviewer NEO ratings are shown in Table 2.
As with the interview ratings, all dimension means were higher in
the applicant-like condition than in the honest condition, whereas
the standard deviations and reliability estimates were consistently
smaller. Once again, the interrater reliability estimates were similar in the two conditions. We also note that the mean interview
and self-report NEO ratings in the honest condition were almost
identical (Ms = 4.02 and 4.01, respectively). Thus, any observed
differences in response inflation cannot be attributed to a leniency
effect often observed in interview ratings.

Construct-Related Validity of Personality-Based Interview Ratings
Before assessing the effects of response inflation, we wanted to
determine whether there was evidence that interviewer ratings
reflected the intended personality constructs. The first step in
examining the construct-related validity of a measure is to estimate
its reliability (Cronbach & Meehl, 1955). We also conducted a
multitrait–multimethod (MTMM; Campbell & Fiske, 1959) analysis to evaluate the convergent and discriminant validity of the
interviews and used CFA to examine the underlying structure of

Table 1
Descriptive Statistics and Reliability Estimates for Structured Interview Ratings in the Honest and Applicant-Like Conditions

Variable/condition        M      SD     α      ICC
Altruism
  Honest (a)             3.93   0.76   .71    .88
  Applicant-like (b)     4.27   0.55   .55    .75
Self-Discipline
  Honest                 4.13   0.73   .68    .78
  Applicant-like         4.36   0.58   .59    .79
Vulnerability
  Honest                 3.99   0.64   .62    .64
  Applicant-like         4.32   0.48   .53    .67
Mean
  Honest                 4.02   0.55   .67    .86
  Applicant-like         4.31   0.43   .56    .80
Global ratings
  Honest                 5.32   1.28   .95    .82
  Applicant-like         5.90   1.00   .96    .79

Note. Reliability estimates for the mean interview ratings represent the mean alpha across the three personality dimensions. Structured interview ratings were made on a 1–5 scale, whereas global interview ratings were made on a 1–7 scale. α = internal consistency reliability estimate (alpha). ICC = intraclass correlation coefficient (C,2) between the ratings of the two interviewers in each interview.
(a) n = 73. (b) n = 70.


Table 2
Descriptive Statistics and Reliability Estimates for Interviewer and Self-Report NEO Ratings in the Honest and Applicant-Like Conditions

Variable/condition          M      SD     α      ICC
Interviewer NEO ratings
  Altruism
    Honest (a)             3.67   0.51   .84    .71
    Applicant-like (b)     3.75   0.46   .85    .73
  Self-Discipline
    Honest                 4.02   0.51   .89    .62
    Applicant-like         4.05   0.50   .88    .67
  Vulnerability
    Honest                 3.92   0.46   .85    .51
    Applicant-like         4.05   0.41   .85    .69
  Mean
    Honest                 3.87   0.42   .91    .63
    Applicant-like         3.95   0.37   .92    .72
Self-report NEO ratings
  Altruism
    Honest                 4.18   0.51   .81
    Applicant-like         4.58   0.39   .74
  Self-Discipline
    Honest                 3.95   0.62   .85
    Applicant-like         4.78   0.29   .68
  Vulnerability
    Honest                 3.91   0.52   .80
    Applicant-like         4.77   0.28   .77
  Mean
    Honest                 4.01   0.40   .82
    Applicant-like         4.71   0.26   .73

Note. α = internal consistency reliability estimate (alpha). ICC = intraclass correlation coefficient (C,2) between the ratings of the two interviewers in each interview.
(a) n = 73. (b) n = 70.

the ratings. We report results from both the honest and applicant-like conditions to compare the extent to which instructions to
inflate influenced the construct-related validity of interviewer and
self-report personality ratings.
With regard to interrater reliability, the ICCs (C,2) for Altruism,
Self-Discipline, and Vulnerability (respectively) were .88, .78, and
.58 in the honest condition and .75, .79, and .67 in the applicant-like condition. The ICCs for the mean ratings of the two interviewers in each interview were .86 and .80, respectively, in the two
conditions. These values compare favorably to current meta-analytic interrater reliability estimates for employment interviews
of similar structure. For example, Conway, Jako, and Goodman
(1995) reported a mean interrater reliability of .75 (k = 33, n = 3,428) for interviews with standardized questions and follow-up
probing.
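For reference, the coefficient reported here is the two-way consistency ICC for the average of the two raters, following McGraw and Wong (1996); a standard expression (the notation is ours, not the authors') is

    \mathrm{ICC}(C,2) = \frac{MS_{\mathrm{rows}} - MS_{\mathrm{error}}}{MS_{\mathrm{rows}}}

where MS_rows is the between-interviewee mean square and MS_error is the residual mean square from a two-way (interviewee by interviewer) ANOVA.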
The internal consistency reliability estimates for the three interview dimensions were .71, .68, and .62 in the honest condition and .55, .59, and .53 in the applicant-like condition. These estimates were notably lower than the estimates for the corresponding self-report NEO ratings (mean α = .87). However, interview reliabilities were based on only three items, whereas NEO reliabilities were based on eight items. For comparison, the internal consistency reliability of an eight-item interview was estimated with the Spearman-Brown prophecy formula. The corrected coefficients alpha for Altruism, Self-Discipline, and Vulnerability (honest condition only) were .87, .85, and .81 (M = .84), which are similar to the corresponding NEO reliability estimates.
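The Spearman-Brown projection used here is the standard one; as a worked check with the honest-condition Altruism estimate (α = .71) lengthened from three to eight items (m = 8/3),

    \alpha^{*} = \frac{m\alpha}{1 + (m - 1)\alpha} = \frac{(8/3)(.71)}{1 + (5/3)(.71)} \approx .87,

which matches the corrected value reported above.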
We then conducted two MTMM analyses. The first analysis was performed on the mean dimension ratings of the two interviewers in each interview. Interviewer was treated as the method factor in this analysis, and the three personality dimensions represented the trait factors. We randomly designated the two interviewers in each interview as Interviewer 1 and Interviewer 2 and computed Pearson correlations between the two sets of ratings. Correlations were then transformed to z scores, averaged, and converted back to the corresponding correlation coefficients.
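A minimal sketch of this averaging step (Python with NumPy is our illustration, not part of the original analyses; the function name is ours, and the input values are the honest-condition convergent validities from Table 3):

    import numpy as np

    def fisher_mean(rs):
        """Average correlations via Fisher's r-to-z transformation."""
        zs = np.arctanh(rs)        # z = .5 * ln[(1 + r) / (1 - r)]
        return np.tanh(zs.mean())  # back-transform the mean z to a correlation

    # Honest-condition convergent validities from Table 3
    print(fisher_mean(np.array([.79, .67, .48])))  # approximately .66, the value reported below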
The MTMM matrix for this analysis is presented in Table 3. All convergent validity estimates were statistically significant (p < .01). In the honest condition, the mean monotrait–heteromethod correlation (i.e., mean correlation within dimension and between interviewers) was .66, and the mean heterotrait–monomethod correlation (i.e., mean correlation across dimensions and within interviewer) was .38. In the applicant-like condition, the mean monotrait–heteromethod correlation was lower (.60), whereas the mean heterotrait–monomethod correlation was higher (.45). A t test of the mean z scores indicated that the mean monotrait–heteromethod correlation was significantly larger than the mean heterotrait–monomethod correlation in the honest condition, but not in the applicant-like condition.
The second MTMM analysis examined relations between interviewer ratings of the interview questions and the NEO items. Thus, measure (interview vs. NEO) served as the method factor in this analysis. Table 4 displays the correlation matrix for this analysis, which also includes correlations between interviewer and self-report NEO ratings. In the honest condition, the mean monotrait–heteromethod correlation (i.e., mean correlation within dimension and across measures) was .59 and the mean heterotrait–monomethod correlation (i.e., mean correlation across dimensions and within measure) was .50. Similar results were found in the applicant-like condition, for which the mean monotrait–heteromethod correlation was .63 and the mean heterotrait–monomethod correlation was .52. The fact that the mean monotrait–heteromethod correlations were larger than the mean heterotrait–monomethod correlations provides evidence for the construct-related validity of the interview ratings (Campbell & Fiske, 1959).
Table 3
Multitrait–Multimethod Matrix of the Structured Interview Ratings in the Honest and Applicant-Like Conditions

Variable                 1      2      3      4      5      6
Interviewer 1 ratings
  1. Altruism            —    .47**  .32**  .61**  .42**  .41**
  2. Self-Discipline   .43**    —    .65**  .17    .66**  .56**
  3. Vulnerability     .29*   .31**    —    .10    .39**  .51**
Interviewer 2 ratings
  4. Altruism          .79**  .40**  .27*     —    .41**  .22*
  5. Self-Discipline   .25*   .67**  .24*   .43**    —    .57**
  6. Vulnerability     .35**  .53**  .48**  .45**  .36**    —

Note. Honest condition correlations are below the diagonal, and applicant-like condition correlations are above the diagonal. Convergent correlations are those between the same dimension rated by the two interviewers (Variables 1 and 4, 2 and 5, and 3 and 6).
* p < .05. ** p < .01.

However, the difference between the two correlations was not statistically significant in either condition. The main reason for this was the substantial relationship between NEO ratings of Self-Discipline and Vulnerability (r = .80 in the honest condition and .82 in the applicant-like condition). The mean heterotrait–monomethod correlations for the other two dimensions were about half the size (r = .42 and .43 in the two conditions).
Although we limited the construct-related validity analyses to interviewer personality ratings, we briefly note relations between the interviewer and self-report ratings shown in Table 4. First, consistent with prior research (e.g., Barrick et al., 2000), self-report and interviewer personality ratings were only modestly correlated, and there was limited evidence of discriminant validity. Second, the magnitude of these relations differed by condition. Specifically, the mean monotrait–heteromethod correlations between self-report NEO ratings and interviewer ratings of the interview questions and NEO items were 2–3 times larger in the honest condition (.24 and .32, respectively) than in the applicant-like condition (.12 and .11, respectively). This is interesting because if all participants inflated their responses to the same extent on both the interview and NEO, self–interviewer correlations in the two conditions should be similar. These correlations would change only if individuals inflated their responses to varying degrees on the two measures. The smaller mean differences between interview ratings in the honest and applicant-like conditions, relative to such differences for self-report NEO scores (to be discussed later), suggest that some, but not all, participants inflated their NEO responses when instructed, whereas fewer individuals inflated their interview responses.
Next, we used CFA to determine the degree to which interviewer ratings reflected the three personality dimensions the interviews were designed to measure. We also used these analyses to address our research question regarding whether instructions to respond like a job applicant would have a differential effect on the underlying structure of interviewer and self-report personality ratings. All analyses were conducted using LISREL 8.3 (Jöreskog & Sörbom, 1996) with a maximum likelihood method of estimation.
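For completeness, maximum likelihood estimation minimizes the standard discrepancy between the sample covariance matrix S and the model-implied matrix Σ(θ); this is the general fit function, not anything specific to this study:

    F_{\mathrm{ML}} = \ln\lvert\Sigma(\theta)\rvert + \mathrm{tr}\!\left[S\,\Sigma(\theta)^{-1}\right] - \ln\lvert S\rvert - p,

where p is the number of observed variables.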
We began by fitting separate models for the structured interview
ratings and interviewer NEO ratings using the averaged item
ratings from the two interviewers in each interview. We chose to
analyze the interview and NEO ratings separately (rather than
within the same model) because the honest–applicant-like effect
sizes were different in the two sets of ratings (discussed below),
and thus we had reason to believe that a different pattern of results
might emerge. Further, we believe it is unlikely that organizations
would incorporate both types of measures in an interview process.
Given this, we thought it would be useful to report the results
separately so that readers could see how the applicant-like instructions influenced the two sets of ratings.
To analyze the interview data, we used the ratings averaged
between the two interviewers who conducted each interview because we thought that the results would be more generalizable, as
panel interview ratings are typically aggregated before making
selection decisions. Moreover, the interrater reliability estimates
were quite high, so it seemed reasonable to average the ratings
from the two interviewers in each interview. Using the averaged
ratings also increased the subject-to-variable ratio on which the
results were based.
The fit statistics for these models are shown in Table 5. In addition to the chi-square statistic, we report the goodness-of-fit index, the adjusted goodness-of-fit index, the comparative fit index, and the root-mean-square error of approximation (RMSEA). Goodness-of-fit index, adjusted goodness-of-fit index, and comparative fit index values greater than .90 are generally considered acceptable, whereas values of .95 or above indicate an excellent fit to the data. RMSEA values around .05 indicate a good fit for the model (Browne & Cudeck, 1993; Hu & Bentler, 1998, 1999).
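For readers comparing the entries in Table 5, the RMSEA point estimate is commonly computed as follows (a standard large-sample form; software implementations can differ in minor details):

    \mathrm{RMSEA} = \sqrt{\max\!\left(\frac{\chi^{2} - df}{df\,(N - 1)},\; 0\right)}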

Table 4
Multitrait–Multimethod Matrix of Interviewer and Self-Report Personality Ratings in the Honest and Applicant-Like Conditions

Variable                        1      2      3      4      5      6      7      8      9
Structured interview ratings
  1. Altruism                   —    .46**  .31**  .66**  .33**  .28*   .15    .12    .07
  2. Self-Discipline          .43**    —    .63**  .31**  .66**  .54**  .00    .11    .08
  3. Vulnerability            .36**  .38**    —    .22*   .55**  .55**  .01    .16    .10
Interviewer NEO ratings
  4. Altruism                 .63**  .44**  .37**    —    .37**  .32**  .10    .26*   .22*
  5. Self-Discipline          .34**  .58**  .51**  .44**    —    .82**  .04    .15    .02
  6. Vulnerability            .39**  .59**  .55**  .47**  .80**    —    .08    .17    .07
Self-report NEO ratings
  7. Altruism                 .33**  .37**  .10    .43**  .18    .24*     —    .46**  .29**
  8. Self-Discipline          .14    .29*   .32**  .25*   .32**  .31**  .24*     —    .72**
  9. Vulnerability            .01    .02    .10    .05    .18    .20*   .09    .54**    —

Note. Honest condition correlations are below the diagonal, and applicant-like condition correlations are above the diagonal. Convergent correlations are those between the same dimension assessed by different methods (e.g., Variables 1, 4, and 7).
* p < .05. ** p < .01.

Analysis of the structured interview ratings revealed a decent fit for the three-factor model in the honest condition. Although the fit statistics for this model were actually better in the applicant-like condition, the correlation between the Self-Discipline and Vulnerability factors exceeded unity. Thus, we assessed the fit of a two-factor model in each condition by fixing the correlation between the Self-Discipline and Vulnerability factors to 1.0. This model demonstrated a good fit to the data in the applicant-like condition, and eliminating the third factor did not harm model fit, Δχ²(2, N = 70) = 2.49, ns. In contrast, the fit of the two-factor model was significantly worse than the three-factor model in the honest condition, Δχ²(2, N = 73) = 22.97, p < .001.
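These nested-model comparisons follow directly from the chi-squares in Table 5; for the structured interview ratings in the applicant-like condition, for example,

    \Delta\chi^{2} = 33.48 - 30.99 = 2.49, \qquad \Delta df = 26 - 24 = 2,

which falls well short of the critical value of 5.99 for 2 degrees of freedom at p = .05.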
For interviewer NEO ratings, the fit of the three-factor model was almost identical in the honest and applicant-like conditions. Although model fit was not great in either condition, all factor loadings were large and statistically significant, the interfactor correlations were modest, and the suggested modifications would not notably enhance the fit of the model.
Table 5
Fit Statistics for Models of Interviewer and Self-Report Personality Ratings

Personality ratings/model          χ²      df   GFI   AGFI   CFI   RMSEA
Structured interview ratings
  3-factor model
    Honest condition             43.07    24    .90    .81    .90    .09
    Applicant-like condition     30.99    24    .91    .84    .94    .06
  2-factor model
    Honest condition             66.04    26    .83    .70    .80    .15
    Applicant-like condition     33.48    26    .91    .84    .93    .06
Interviewer NEO ratings
  3-factor model
    Honest condition            393.29   249    .73    .68    .87    .06
    Applicant-like condition    393.34   249    .70    .64    .85    .08
  2-factor model
    Honest condition            428.75   251    .70    .64    .84    .08
    Applicant-like condition    402.54   251    .70    .64    .81    .09
Self-report NEO ratings
  3-factor model
    Honest condition            402.17   249    .69    .63    .75    .09
    Applicant-like condition    575.04   249    .63    .55    .50    .12
  2-factor model
    Honest condition            454.41   251    .66    .59    .67    .11
    Applicant-like condition    578.76   251    .63    .56    .50    .12

Note. All chi-square statistics are significant (p < .05) with the exception of the three- and two-factor models of the structured interview ratings in the applicant-like condition. df = degrees of freedom. GFI = goodness-of-fit index. AGFI = adjusted goodness-of-fit index. CFI = comparative fit index. RMSEA = root-mean-square error of approximation.

Nonetheless, we performed an exploratory factor analysis to identify potential alternative models. As with the structured interview ratings, a two-factor model appeared to have some promise. However, the fit of this model was significantly worse than the three-factor model in both the honest condition, Δχ²(2, N = 73) = 35.46, p < .001, and the applicant-like condition, Δχ²(2, N = 70) = 9.20, p < .05.
We then evaluated the underlying structure of the self-report NEO ratings. In the honest condition, results suggested that the three-factor model did not fit the data very well. However, the three factors were relatively independent, all factor loadings were significant and greater than .50, and the suggested modifications would provide only a minimal improvement in model fit. This finding is consistent with prior NEO research (e.g., McCrae, Zonderman, Costa, & Bond, 1996) in which self-report scores appear to represent the intended personality factors, yet the associated CFA statistics do not indicate a good fit. The fit statistics, however, were appreciably better in the honest condition than in the applicant-like condition. In addition, several factor loadings in the applicant-like condition were small and nonsignificant, and the interfactor correlations were much larger. The correlation of .93 between the Self-Discipline and Vulnerability factors suggested that a two-factor model might better explain these data. We therefore evaluated the fit of the two-factor model. In the applicant-like condition, the fit indices for this model were practically identical to those of the three-factor model, whereas the two-factor model fit much worse in the honest condition. Indeed, the change in chi-square (from the two- to three-factor model) was significant in the honest condition, Δχ²(2, N = 73) = 52.24, p < .001, but not in the applicant-like condition, Δχ²(2, N = 70) = 3.72, ns.
To summarize, results of the analyses described above provide some evidence that structured interviews can be developed to assess facets of the FFM in a simulated selection setting. Results also suggest that instructions to respond like a job applicant had a negative effect on the construct-related validity of both the structured interview ratings and the self-report personality ratings. Analysis of the structured interview ratings and self-report NEO ratings yielded similar results in that the a priori model fit the data best in the honest condition, whereas a two-factor model appeared to be the most parsimonious model in the applicant-like condition. In contrast, analysis of interviewer NEO ratings indicated that the a priori model demonstrated the best fit in both conditions.

Influence of Applicant-Like Instructions on Interviewer and Self-Report Ratings
The primary goal of this study was to examine the extent to which a personality-based structured interview was vulnerable to response inflation relative to a typical self-report personality measure of the same dimensions. We hypothesized that the cognitive and behavioral demands of the interview situation would inhibit response inflation. To this end, we compared the difference between interviewer ratings of individuals from the honest and applicant-like conditions to the difference between self-report NEO scores from the two conditions. The standardized group difference (d) is typically calculated by dividing the mean difference between the two relevant groups (e.g., honest and faking conditions) by the average, sample-size-weighted standard deviation of the groups. However, the standard deviations for both interviewer and self-report personality ratings were notably larger in the honest condition than in the applicant-like condition. Thus, we used only the standard deviation from the honest group to compute the d statistics.²
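Concretely, using the structured interview Altruism ratings as an illustration (the means, SDs, and alpha are those in Table 1; the standard error expression is the conventional large-sample one, which reproduces the values in Table 6):

    d = \frac{M_{\mathrm{applicant}} - M_{\mathrm{honest}}}{SD_{\mathrm{honest}}} = \frac{4.27 - 3.93}{0.76} \approx 0.44, \qquad d_{c} = \frac{d}{\sqrt{\alpha_{\mathrm{honest}}}} = \frac{0.44}{\sqrt{.71}} \approx 0.52,

    SE_{d_{c}} = \sqrt{\frac{n_{1} + n_{2}}{n_{1} n_{2}} + \frac{d_{c}^{2}}{2(n_{1} + n_{2})}} \approx 0.17, \qquad 95\%\ \mathrm{CI} = d_{c} \pm 1.96\,SE \approx [0.19,\ 0.85],

with n1 = 73 (honest) and n2 = 70 (applicant-like).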
Table 6 displays the d statistics for the three sets of personality ratings. Also shown are the ds corrected for unreliability (using the internal consistency estimates from the honest condition) and their associated standard errors and 95% confidence intervals. For the structured interview ratings, participants in the applicant-like condition received higher ratings than did participants in the honest condition across all nine interview questions. For five of the nine interview items, the mean ratings from the applicant-like condition were significantly higher than the ratings from the honest condition. Observed effect sizes for the Altruism, Self-Discipline, and Vulnerability scales were 0.44, 0.31, and 0.51, respectively. The d for the mean ratings across the nine interview questions was 0.54, and the d for the mean global interview ratings was 0.43. Effect sizes between honest and applicant-like scores were much smaller for interviewer NEO ratings, for which only ratings of Vulnerability were significantly higher in the applicant-like condition. Observed d values were 0.15, 0.07, and 0.30 for Altruism, Self-Discipline, and Vulnerability. The mean d across the 24 NEO ratings was 0.20.
For self-report NEO scores, participants in the applicant-like condition had significantly higher scores than participants in the honest condition across the three personality dimensions and overall. Large effect sizes between honest and applicant-like scores were observed for self-reported Altruism (0.79), Self-Discipline (1.67), and Vulnerability (1.34). The mean d across the 24 NEO items (1.74) was about three times larger than the mean honest–applicant-like effect size for the structured interview ratings and about nine times larger than the mean effect size for interviewer NEO ratings.
To test whether the self-report d statistics were significantly larger than the effect sizes for interviewer personality ratings, we transformed the interview and NEO d statistics to zero-order correlations, then to z scores, which, in turn, were tested for significant differences (Cohen & Cohen, 1983). Results revealed that the difference between self-report scores from the applicant-like and honest conditions was significantly larger (p < .01) than the difference between interviewer ratings (both structured interview and NEO ratings) from the two conditions across the three personality dimensions and the overall mean. We also calculated the standard error of the d statistics and the 95% confidence intervals around the ds. With the exception of structured interview ratings of Altruism, the confidence intervals around the mean interview ds did not overlap with the confidence intervals around the mean self-report ds. This overall pattern of results provides strong support for the study hypothesis.
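The exact formulas are not spelled out in the text, but a common reading of the Cohen and Cohen (1983) procedure, assuming the equal-n conversion of d to r, is the following (our sketch, not a verbatim reproduction of the authors' computations):

    r = \frac{d}{\sqrt{d^{2} + 4}}, \qquad z = \operatorname{arctanh}(r), \qquad Z = \frac{z_{1} - z_{2}}{\sqrt{\frac{1}{N_{1} - 3} + \frac{1}{N_{2} - 3}}},

where N1 and N2 are the total sample sizes on which the two effect sizes are based.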

Discussion
Summary of Main Findings
The objective of this study was to compare the effects of response inflation in a structured interview and a self-report measure designed to assess the same personality factors.

² We thank an anonymous reviewer for suggesting this approach.

Table 6
Effect Size Differences Between the Applicant-Like and Honest Conditions for Interviewer and Self-Report Personality Ratings

Variable                    d       dc      SE     95% CI
Altruism ratings
  Structured interview    0.44*   0.52**   0.17   [0.19, 0.85]
  Interviewer NEO         0.15    0.16     0.17   [−0.17, 0.49]
  Self-report NEO         0.79**  0.88**   0.18   [0.53, 1.23]
Self-Discipline ratings
  Structured interview    0.31*   0.38*    0.17   [0.05, 0.71]
  Interviewer NEO         0.07    0.07     0.17   [−0.26, 0.40]
  Self-report NEO         1.67**  1.87**   0.20   [1.48, 2.26]
Vulnerability ratings
  Structured interview    0.51**  0.65**   0.17   [0.32, 0.98]
  Interviewer NEO         0.30*   0.32*    0.17   [−0.01, 0.65]
  Self-report NEO         1.34**  1.45**   0.19   [1.08, 1.82]
Mean ratings
  Structured interview    0.54**  0.66**   0.17   [0.33, 0.99]
  Global interview        0.43**  0.46**   0.17   [0.13, 0.79]
  Interviewer NEO         0.20    0.21     0.17   [−0.12, 0.54]
  Self-report NEO         1.74**  1.92**   0.20   [1.53, 2.31]

Note. d = standardized mean difference between ratings from the applicant-like and honest conditions. dc = d corrected for unreliability using the internal consistency reliability estimate of the relevant dimension from the honest condition. SE = standard error of dc. 95% CI = confidence interval of dc. Asterisks indicate that ratings from the applicant-like condition are significantly higher than ratings from the honest condition (* p < .05. ** p < .01).

However, before advocating structured interviews for personality assessment, it must be shown that interviewers are capable of making valid and reliable personality judgments. Multiple methods were used to assess the construct-related validity of the personality-based interview we developed. Internal consistency and interrater reliability estimates were higher than the current meta-analytic estimates of interviews of similar structure. MTMM analyses revealed good convergent and discriminant evidence between interviewers and, to a lesser extent, between interviewer ratings of the interview questions and NEO items. In addition, CFA of the structured interview ratings indicated an acceptable fit for the three-factor trait model in the honest condition.
The construct-related validity evidence for this interview was
much better than the evidence reported in other recent interview
studies (e.g., Conway & Peneno, 1999; Huffcutt, Weekley, et al.,
2001; Van Iddekinge, Raymark, Eidson, & Attenweiler, 2004).
Several factors could have contributed to this result. For instance,
we went to great lengths to ensure that the interview items tapped
the intended personality dimensions. Specifically, we used a structured validation process in which raters evaluated the correspondence between interview items and the personality dimensions
they were designed to assess. Interviewers were also provided with
behavioral rating scales (unique to each question) and were required to evaluate each question immediately after the interviewee
responded to minimize the general impression effects that can
occur when interviewers assign their ratings after the interview.
Finally, recent studies in the assessment center literature suggest
that the construct-related validity of assessor ratings is better when
a limited number of conceptually distinct dimensions are assessed
(Lievens & Conway, 2001; Woehr & Arthur, 2003). In contrast to previous studies that have not found evidence of construct-related validity (e.g., Huffcutt, Weekley, et al., 2001), the interviews we developed measured only three, relatively independent dimensions. Taken together, we believe these results provide some optimism for using structured employment interviews to measure narrow personality dimensions.
As for relations between interviewer and self-report personality ratings, they were rather modest and generally consistent with prior studies (e.g., Barrick et al., 2000; Cortina et al., 2000; Huffcutt, Weekley, et al., 2001). An important contribution of this investigation is that developing interviews specifically designed to assess certain personality variables did not appear to increase correlations with self-report ratings of the same dimensions. A variety of reasons could contribute to the modest correspondence between self and interviewer personality judgments. For one, interviews and self-report inventories are very different assessment methods. Interview questions and rating scales are designed to elicit and evaluate work-related behaviors, whereas items on self-report measures like the NEO are not context specific (e.g., "She keeps a cool head in emergencies"). Likewise, self-report items generally contain a discrete number of response options, whereas structured interview questions involve open-ended responses that interviewers must try to match with predetermined behavioral benchmarks. The two sets of ratings are also based on different behavioral domains. Interviewees probably considered their behavior in a variety of situations (e.g., at work, at home, at social events) when making their ratings, whereas interviewer personality ratings are based on a much more restricted domain of behaviors (i.e., how the interviewee behaved during the interview).
Thus, moderate relations between interview and self-report personality ratings do not necessarily indicate that interviewer personality ratings lack construct-related validity. In fact, self-report personality scores are not necessarily the criteria against which all other methods of personality assessment should be evaluated. For example, it is likely that some intentional distortion and self-deception (Paulhus, 1991) exist even when there is no motivation for individuals to present themselves in a desirable way. These factors underscore how difficult it can be to assess the construct-related validity of personality judgments within a nomological net.
The main goal of this study was to assess the degree to which a personality-based structured employment interview was susceptible to response inflation. We hypothesized that the various cognitive and behavioral aspects of the interview situation would inhibit response inflation relative to a self-report measure designed to assess the same personality factors. We also maintained that applicants have less control over interview scores than self-report personality scores and that this could likewise attenuate interview response inflation. The study hypothesis was supported in that the mean differences between the applicant-like and honest conditions were significantly larger for self-report NEO scores than for the structured interview ratings and interviewer NEO ratings. In fact, the mean difference between self-report scores from the two conditions was about three times larger than the mean difference between the two groups for the structured interview ratings, and was about nine times larger than the honest–applicant-like differences for interviewer NEO ratings. These results help rule out the possibility that the level of response inflation is unaffected by using a structured interview to measure personality.

We also examined the extent to which instructions to respond like a job applicant influenced the underlying structure of interviewer and self-report personality judgments. CFA indicated that the applicant-like instructions had a detrimental effect on both the structured interview ratings and self-report NEO ratings, but had no appreciable effect on interviewer NEO ratings. Interviewer NEO ratings were best described by the a priori three-factor model in both conditions. On the other hand, items from the Self-Discipline and Vulnerability scales comprised a single factor in the structured interview and self-report data from the applicant-like condition. Thus, although the applicant-like instructions resulted in more response inflation within the self-report NEO data than within the structured interview data, these instructions appeared to have a similar influence on the underlying structure of the two personality measures.

Limitations
Although we believe the present results are informative, it is
important to note some limitations of this study. First, although we
recruited experienced interviewers to conduct the interviews, the
ratings that they provided were not "for keeps." This clearly limits
the extent to which these results can be generalized to an organizational context. The fact that college students served as interviewees in this study may also decrease the generalizability of the
results. The vast majority of students, however, were advanced
undergraduates and graduate students who had both previous interview and customer service experience. Further, most students
participated in this study to help prepare for the postgraduation job
search process and thus appeared to take the interviews quite
seriously. Nonetheless, readers should be cautious about generalizing these results to the use of structured interviews in applied
settings.
Second, we compared response inflation in the interviews and
NEO by manipulating the experimental instructions (i.e., respond
honestly or like a job applicant). As discussed, experimental faking
studies have been criticized in the literature (e.g., Hough & Ones,
2001). We agree that the behavior of participants instructed to
pretend like a job applicant may be quite different from the
behavior of actual applicants. However, we went to great lengths
to make the research environment as realistic as possible and
included design features previous studies have not (e.g., used
stimuli to encourage job-desirable responding). Moreover, even if
inflation in an experimental setting is more severe than in an
applicant setting, these results are still relevant because we demonstrated that there were relative differences in inflation for self
and interviewer ratings of the same personality factors. Thus, we
believe that these results provide useful information about the use
of structured interviews for personality assessment.
Finally, the time commitment required of both the interviewers
and the student interviewees limited the number of participants
available for this study. Although the size of this sample compares
favorably to those of previous studies in this general area of
research (e.g., Barrick et al., 2000; Blackman, 2002), it would have
been helpful if these results had been based on a larger sample. We
hope that researchers with access to larger samples can attempt to
replicate and extend the present findings.

Directions for Future Research


To our knowledge, this was the first study to assess the
construct-related validity of a structured employment interview
specifically designed to measure personality. It is also the only
study we know of to compare response inflation of interview and
self-report scores of the same dimensions. Clearly, additional
research is needed to further understanding of these areas. Results
suggested that structured interviews might be less vulnerable to
inflation than self-report personality measures such as the NEO.
However, this study was conducted in a simulated selection setting. Future studies are needed to compare interview and self-report measures of the same dimensions in a field setting.
In addition, although these findings suggest that response inflation might be less of an issue in structured interviews, the study
design did not allow us to determine why interviews might be less
susceptible to inflation. For example, we do not know whether the
lower inflation we observed occurred because interviewees were
unable to inflate their responses, or whether they were able to
inflate their responses but the experimental situation did not provide enough motivation to do so. An important avenue for future
research is to determine whether response inflation in interviews is
a can-do issue, a will-do issue, or some combination of the
two. If it is a can-do issue, then studies are needed to identify why
interviews are more resistant to inflation. We proposed several
reasons structured interviews may inhibit response inflation (e.g.,
cognitive load on interviewees, behavioral nature of the interview
situation), but we did not have the opportunity to investigate their
relative influence. We hope other researchers will be able to
examine the influence of these and other features of the interview
situation.
We examined the influence of instructions to inflate on the
means and construct-related validity of interview and self-report
personality ratings. However, the present results provide only
preliminary (and incomplete) evidence regarding the relative usefulness of these two methods of personality assessment. A critical
area for future research is to compare interview and self-report
measures on other important factors, including criterion-related
validity, subgroup differences, and applicant reactions. Researchers should also compare the utility of interview and self-report
assessment techniques. Several well-validated personality measures are available to selection practitioners, and the costs of using
such measures are relatively small. In contrast, considerable time
and resources may be needed to develop, validate, and administer
a personality-based interview. Studies are needed to determine
whether such costs outweigh the superior predictive validity that
personality-based interviews may provide.
The interviews we developed assessed only three facets of the
Big Five. Future studies should evaluate interviews designed to
measure other broad and narrow dimensions of personality. In fact,
relatively little is known about the constructs (personality or otherwise) interviews can validly assess. Researchers need to determine what types of knowledge, skills, abilities, and other characteristics interviews can assess, and then examine the predictive
validity and adverse impact associated with interviewer judgments
of those characteristics. For example, Funder (1995, 1999) suggested that certain personality variables may be easier for observers to judge than others. Indeed, studies in the personality literature
have found that judgment accuracy (e.g., in terms of self–other

correlations) is higher for more observable traits such as extraversion than for less visible traits such as openness (e.g., Borkenau &
Liebler, 1992; Connolly & Viswesvaran, 1998; Funder & Dobroth,
1987; Watson, 1989). Thus, studies could examine whether employment interviews are more effective for measuring some personality factors than others.
The constructs interviews are designed to measure may also
influence the extent to which applicants can inflate their scores.
For example, inflation may be higher in interviews designed to
assess personality-related constructs than those designed to assess
job knowledge. Given that employment interviews are developed
to measure a variety of constructs (Huffcutt, Conway, et al., 2001),
additional research is needed to better understand the amount and
effects of response inflation in an interview setting.
Finally, future research should compare behavioral and situational interviews for assessing personality and resistance to response inflation. There are reasons why these two interview formats may be differentially effective for assessing personality. For
example, behavioral interviews should be effective for evaluating
personality because traits influence behavior. It is also likely that
personality influences intentions, and thus situational interviews
could be useful for measuring personality.
These two interview formats may also vary in the extent to
which they minimize response inflation. For example, Ellis et al.
(2002) found that interviewees use different impression management tactics in behavioral and situational interviews. Because
future behaviors cannot be verified, it may be easier for individuals
to inflate their responses to situational questions than to behavioral
questions, which in many cases can be confirmed (e.g., via a
reference check). However, there is evidence that performance in
situational interviews depends to some extent on job knowledge
(Conway & Peneno, 1999; Motowidlo, 1999). Therefore, interviewees unfamiliar with the job of interest may be less able to
inflate their responses in a situational interview. Given the popularity of behavioral and situational interviews, we encourage researchers to directly compare the two formats for measuring personality and susceptibility to inflation.

References

Bargh, J. A. (1984). Automatic and conscious processing of social information. In R. Wyer & T. Srull (Eds.), Handbook of social cognition (Vol. 3, pp. 1–43). Mahwah, NJ: Erlbaum.
Bargh, J. A. (1994). The four horsemen of automaticity: Awareness, intention, efficiency, and control in social cognition. In R. Wyer & T. Srull (Eds.), Handbook of social cognition (Vol. 1, 2nd ed., pp. 1–40). Mahwah, NJ: Erlbaum.
Bargh, J. A., & Thein, R. D. (1985). Individual construct accessibility, person memory, and the recall-judgment link: The case of information overload. Journal of Personality and Social Psychology, 49, 1129–1146.
Barrick, M. R., & Mount, M. K. (1996). Effects of impression management and self-deception on the predictive validity of personality constructs. Journal of Applied Psychology, 81, 261–272.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at the beginning of the new millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9, 9–30.
Barrick, M. R., Patton, G. K., & Haugland, S. N. (2000). Accuracy of interviewer judgments of job applicant personality traits. Personnel Psychology, 53, 925–951.
Binning, J. F., LeBreton, J. M., & Adorno, A. J. (1999). Assessing personality. In R. W. Eder & M. M. Harris (Eds.), The employment interview handbook (pp. 105–123). Thousand Oaks, CA: Sage.
Blackman, M. C. (2002). Personality judgment and the utility of the unstructured employment interview. Basic and Applied Social Psychology, 24, 241–250.
Blackman, M. C., & Funder, D. C. (1998). The effect of information on consensus and accuracy in personality judgment. Journal of Experimental Social Psychology, 34, 164–181.
Blackman, M. C., & Funder, D. C. (2002). Effective interview practices for accurately assessing counterproductive traits. International Journal of Selection and Assessment, 10, 109–116.
Borkenau, P., & Liebler, A. (1992). Trait inferences: Sources of validity at zero acquaintance. Journal of Personality and Social Psychology, 62, 645–657.
Borman, W. C., Hanson, M. A., & Hedge, J. W. (1997). Personnel selection. Annual Review of Psychology, 48, 299–337.
Bozeman, D. P., & Kacmar, K. M. (1997). A cybernetic model of impression management processes in organizations. Organizational Behavior and Human Decision Processes, 69, 9–30.
Brown, D. J., Diefendorff, J. M., Kamin, A. M., & Lord, R. G. (1999, April). Predicting organizational citizenship with the big five: The source matters. Paper presented at the 14th Annual Conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Burnkrant, S. R. (2001, April). Faking personality profiles: Job desirability or social desirability. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Cascio, W. F., & Phillips, N. F. (1979). Performance testing: A rose among thorns. Personnel Psychology, 32, 751–766.
Cellar, D. F., Miller, M. L., Doverspike, D. D., & Klawsky, J. D. (1996). Comparison of factor structures and criterion-related validity coefficients for two measures of personality based on the five factor model. Journal of Applied Psychology, 81, 694–704.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85, 678–707.
Colvin, C. R., & Funder, D. C. (1991). Predicting personality and behavior: A boundary on the acquaintanceship effect. Journal of Personality and Social Psychology, 60, 884–894.
Connolly, J. J., & Viswesvaran, C. (1998, April). The convergent validity between self- and observer ratings of personality. Paper presented at the 13th Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
Conway, J. M., Jako, R. A., & Goodman, D. F. (1995). A meta-analysis of interrater and internal consistency reliability of selection interviews. Journal of Applied Psychology, 80, 565–579.
Conway, J. M., & Peneno, G. M. (1999). Comparing structured interview question types: Construct validity and applicant reactions. Journal of Business and Psychology, 13, 485–506.
Cook, K. W., Vance, C. A., & Spector, P. E. (2000). The relation of candidate personality with selection-interview outcomes. Journal of Applied Social Psychology, 30, 867–885.
Cortina, J. M., Goldstein, N. B., Payne, S. C., Davison, H. K., & Gilliland, S. W. (2000). The incremental validity of interview scores over and above cognitive ability and conscientiousness scores. Personnel Psychology, 53, 325–347.
Costa, P. T., & McCrae, R. R. (1992). Professional manual for the Revised NEO Personality Inventory (NEO PI-R) and the NEO Five-Factor Inventory (NEO-FFI). Odessa, FL: Psychological Assessment Resources.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
DePaulo, B. M., Lindsay, J. L., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129, 74–118.
DePaulo, B. M., Stone, J. I., & Lassiter, G. D. (1985). Deceiving and detecting deceit. In B. R. Schlenker (Ed.), The self and social life (pp. 323–370). New York: McGraw-Hill.
DeShon, R. P., & Alexander, R. A. (1996). Goal setting effects on implicit and explicit learning of complex tasks. Organizational Behavior and Human Decision Processes, 65, 18–36.
Douglas, E. F., McDaniel, M. A., & Snell, A. F. (1996, August). The validity of non-cognitive measures decays when applicants fake. Paper presented at the 11th Annual Conference of the Academy of Management, Cincinnati, OH.
DuBois, C. L. Z., Sackett, P. R., Zedeck, S., & Fogli, L. (1993). Further exploration of typical and maximum performance criteria: Definitional issues, prediction, and White–Black differences. Journal of Applied Psychology, 78, 205–211.
Ellingson, J. E., Smith, D. B., & Sackett, P. R. (2001). Investigating the influence of social desirability on personality factor structure. Journal of Applied Psychology, 86, 122–133.
Ellis, A. P. J., West, B. J., Ryan, A. M., & DeShon, R. P. (2002). The use of impression management tactics in structured interviews: A function of question type? Journal of Applied Psychology, 87, 1200–1208.
Frei, R. L., & McDaniel, M. A. (1998). Validity of customer service measures in personnel selection: A review of criterion and construct evidence. Human Performance, 11, 1–27.
Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological Review, 102, 652–670.
Funder, D. C. (1999). Personality judgment: A realistic approach to person perception. San Diego, CA: Academic Press.
Funder, D. C. (2001). The personality puzzle (2nd ed.). New York: Norton.
Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: Acquaintanceship, agreement, and the accuracy of personality judgment. Journal of Personality and Social Psychology, 55, 149–158.
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with inter-judge agreement. Journal of Personality and Social Psychology, 52, 409–418.
Funder, D. C., & Sneed, C. D. (1993). Behavioral manifestations of personality: An ecological approach to judgmental accuracy. Journal of Personality and Social Psychology, 64, 479–490.
Gilbert, D. T., & Osborne, R. E. (1989). Thinking backward: Some curable and incurable consequences of cognitive busyness. Journal of Personality and Social Psychology, 57, 940–946.
Goffin, R. D., Rothstein, M. G., & Johnston, N. G. (1996). Personality testing and the assessment center: Incremental validity in managerial selection. Journal of Applied Psychology, 81, 746–756.
Goldberg, L. R. (1990). An alternative "description of personality": The Big-Five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34.
Griffith, R. L., Chmielowski, T. S., Snell, A. F., Frei, R. L., & McDaniel, M. A. (2000, April). Does faking matter? An examination of rank order changes in applicant data. In D. B. Schmidt (Chair), Assessing the prevalence and impact of applicant faking. Symposium conducted at the 15th Annual Conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Hamill, D., & Bartle, S. (1998, April). Applicant perceptions of selection procedures and future job seeking behaviors. Paper presented at the 13th Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
Hogan, R., Hogan, J., & Roberts, B. W. (1996). Personality measurement and employment decisions. American Psychologist, 51, 469–477.
Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation of suggested palliatives. Human Performance, 11, 209–244.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581–595.
Hough, L. M., & Furnham, A. (2003). Importance and use of personality variables in work settings. In I. B. Weiner (Ed.-in-Chief) & W. Borman, D. Ilgen, & R. Klimoski (Vol. Eds.), Handbook of psychology: Vol. 12. Industrial and organizational psychology (pp. 131–169). New York: Wiley.
Hough, L. M., & Ones, D. S. (2001). The structure, measurement, validity, and use of personality variables in industrial, work, and organizational psychology. In N. R. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), Handbook of work psychology (pp. 233–377). London and New York: Sage.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detection, and amelioration of adverse impact in personnel selection procedures: Issues, evidence, and lessons learned. International Journal of Selection and Assessment, 9, 152–194.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Stone, N. J. (2001). Identification and meta-analytic assessment of psychological constructs measured in employment interviews. Journal of Applied Psychology, 86, 897–913.
Huffcutt, A. I., Weekley, J., Wiesner, W. H., DeGroot, T., & Jones, C. (2001). Comparison of situational and behavior description interview questions for higher-level positions. Personnel Psychology, 54, 619–644.
Jackson, D. N., Peacock, A. C., & Holden, R. R. (1982). Professional interviewers' trait inferential structures for diverse occupational groups. Organizational Behavior and Human Performance, 29, 1–20.
Jackson, D. N., Peacock, A. C., & Smith, J. P. (1980). Impressions of personality in the employment interview. Journal of Personality and Social Psychology, 39, 294–307.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User's reference guide. Chicago: Scientific Software International.
Judge, T. A., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality and leadership: A qualitative and quantitative review. Journal of Applied Psychology, 87, 765–780.
Judge, T. A., Martocchio, J. J., & Thoresen, C. J. (1997). Five-factor model of personality and employee absence. Journal of Applied Psychology, 82, 745–755.
Kluger, A. N., & Colella, A. (1993). Beyond the mean bias: The effect of warning against faking on biodata item variances. Personnel Psychology, 46, 763–780.
Kristof, A. L. (2000). Perceived applicant fit: Distinguishing between recruiters' perceptions of person–job and person–organization fit. Personnel Psychology, 53, 643–671.
Kristof-Brown, A., Barrick, M. R., & Franke, M. (2002). Influences and outcomes of candidate impression management use in job interviews. Journal of Management, 28, 27–46.
Landy, F. J., Shankster, L. J., & Kohler, S. S. (1994). Personnel selection and placement. Annual Review of Psychology, 45, 261–296.
Levashina, J., & Campion, M. A. (2003, August). Faking in the employment interview. Paper presented at the 18th Annual Conference of the Academy of Management, Seattle, WA.
Lievens, F., & Conway, J. M. (2001). Dimension and exercise variance in assessment center scores: A large-scale evaluation of multitrait-multimethod studies. Journal of Applied Psychology, 86, 1202–1222.
Mann, S., Vrij, A., & Bull, R. (2004). Detecting true lies: Police officers' ability to detect suspects' lies. Journal of Applied Psychology, 89, 137–149.
McCrae, R. R., Zonderman, A. B., Costa, P. T., Jr., & Bond, M. H. (1996). Evaluating the replicability of factors in the Revised NEO Personality Inventory: Confirmatory factor analysis versus Procrustes rotation. Journal of Personality and Social Psychology, 70, 552–566.
McFarland, L. A., & Ryan, A. M. (2000). Variance in faking across noncognitive measures. Journal of Applied Psychology, 85, 812–821.
McFarland, L. A., Ryan, A. M., & Kriska, S. D. (2003). Impression management use and effectiveness across assessment methods. Journal of Management, 29, 641–661.
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46.
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–354.
McIntire, S. A., & Miller, L. A. (2000). Foundations of psychological testing. Boston: McGraw-Hill.
Miller, C. E., & Tristan, E. (2002, April). Expanding the definition of faking beyond social desirability: The case for job desirability. Paper presented at the 17th Annual Conference of the Society for Industrial and Organizational Psychology, Toronto, ON.
Motowidlo, S. J. (1999). Asking about past behavior versus hypothetical behavior. In R. W. Eder & M. M. Harris (Eds.), The employment interview handbook (pp. 179–190). Thousand Oaks, CA: Sage.
Mount, M. K., Barrick, M. R., & Stewart, G. L. (1998). Five-factor model of personality and performance in jobs involving interpersonal interactions. Human Performance, 11, 145–165.
Mount, M. K., Barrick, M. R., & Strauss, J. P. (1994). Validity of observer ratings of the big five personality factors. Journal of Applied Psychology, 79, 272–280.
Mueller-Hanson, R., Heggestad, E. D., & Thornton, G. C., III. (2003). Faking and selection: Considering the use of personality from select-in and select-out perspectives. Journal of Applied Psychology, 88, 348–355.
Murphy, K. R., & Davidshofer, C. O. (1998). Psychological testing: Principles and applications (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Nilsen, D. (1995). Investigation of the relationship between personality and leadership performance. Unpublished doctoral dissertation, University of Minnesota.
Ones, D. S., & Viswesvaran, C. (1996). Bandwidth–fidelity dilemma in personality measurement for personnel selection. Journal of Organizational Behavior, 17, 609–626.
Ones, D. S., & Viswesvaran, C. (1998). The effects of social desirability and faking on personality and integrity assessment for personnel selection. Human Performance, 11, 245–269.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance [Monograph]. Journal of Applied Psychology, 78, 679–703.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). San Diego, CA: Academic Press.
Paunonen, S. V., & Jackson, D. N. (1987). Accuracy of interviewers and students in identifying the personality characteristics of personnel managers and computer programmers. Journal of Vocational Behavior, 31, 26–36.
Paunonen, S. V., Jackson, D. N., & Oberman, S. M. (1987). Personnel selection decisions: Effects of applicant personality and letter of reference. Organizational Behavior and Human Decision Processes, 40, 96–114.
Paunonen, S. V., Rothstein, M. G., & Jackson, D. N. (1999). Narrow reasoning about the use of broad personality measures for personnel selection. Journal of Organizational Behavior, 20, 389–405.
Perry, J. C. (1992). Problems and considerations in the valid assessment of personality disorders. American Journal of Psychiatry, 149, 1645–1653.
Pontari, B. A., & Schlenker, B. R. (2000). The influence of cognitive load on self-presentation: Can cognitive busyness help as well as harm social performance? Journal of Personality and Social Psychology, 78, 1092–1108.
Robie, C., Zickar, M. J., & Schmit, M. J. (2001). Measurement equivalence between applicant and incumbent groups: An IRT analysis of personality scales. Human Performance, 14, 187–207.
Rosse, J. G., Stecher, M. D., Miller, J. L., & Levin, R. A. (1998). The impact of response distortion on preemployment personality testing and hiring decisions. Journal of Applied Psychology, 83, 634–644.
Rothstein, M., & Jackson, D. N. (1980). Decision making in the employment interview: An experimental approach. Journal of Applied Psychology, 65, 271–283.
Schmit, M. J., & Ryan, A. M. (1993). The big five in personnel selection: Factor structure in applicant and nonapplicant populations. Journal of Applied Psychology, 78, 966–974.
Schneider, R. J., Hough, L. M., & Dunnette, M. D. (1996). Broadsided by broad traits: How to sink science in five dimensions or less. Journal of Organizational Behavior, 17, 639–655.
Shilobod, T. L., & Raymark, P. H. (2003, August). Individual differences in ability to fake on personality measures. Paper presented at the 18th Annual Conference of the Academy of Management, Seattle, WA.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic information processing: Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Smith, D. B., & Ellingson, J. E. (2002). Substance versus style: A new look at social desirability in motivating contexts. Journal of Applied Psychology, 87, 211–219.
Smith, D. B., Hanges, P. J., & Dickson, M. W. (2001). Personnel selection and the five-factor model: Reexamining the effects of applicants' frame of reference. Journal of Applied Psychology, 86, 304–315.
Smith, M. A., Moriatry, K. O., Lutrick, E. C., & Canger, J. M. (2001, April). Exploratory factor analysis of NEO PI-R for job applicants: Evidence against the Big Five. Paper presented at the 16th Annual Convention of the Society for Industrial and Organizational Psychology, San Diego, CA.
Smither, J. W., Reilly, R. R., Millsap, R. E., Pearlman, K., & Stoffey, R. W. (1993). Applicant reactions to selection procedures. Personnel Psychology, 46, 49–76.
Stark, S., Chernyshenko, O. S., Chan, K.-Y., Lee, W. C., & Drasgow, F. (2001). Effects of the testing situation on item responding: Cause for concern. Journal of Applied Psychology, 86, 943–953.
Steiner, D. D., & Gilliland, S. W. (1996). Fairness reactions to personnel selection techniques in France and the United States. Journal of Applied Psychology, 81, 134–141.
Stevens, C. K., & Kristof, A. L. (1995). Making the right impression: A field study of applicant impression management during job interviews. Journal of Applied Psychology, 80, 587–606.
Trull, T. J., & Widiger, T. A. (1997). Structured Interview for the Five-Factor Model of Personality (SIFFM): Professional manual. Odessa, FL: Psychological Assessment Resources.
Trull, T. J., Widiger, T. A., & Burr, R. (2001). A structured interview for the assessment of the five-factor model of personality: Facet-level relations to the Axis II personality disorders. Journal of Personality, 69, 175–198.
Trull, T. J., Widiger, T. A., Useda, J. D., Holcomb, J., Doan, B., Axelrod, S. R., et al. (1998). A structured interview for the assessment of the five-factor model of personality. Psychological Assessment, 10, 229–240.
Tupes, E. C., & Christal, R. E. (1992). Recurrent personality factors based on trait ratings. Journal of Personality, 60, 225–251.
Van Iddekinge, C. H., Raymark, P. H., Eidson, C. E., Jr., & Attenweiler, W. (2004). What do structured interviews really measure? The construct validity of behavior description interviews. Human Performance, 17, 71–93.
Van Iddekinge, C. H., Raymark, P. H., Eidson, C. E., Jr., & Putka, D. P. (2003, April). Applicant–incumbent differences on personality, integrity, and customer service measures. Paper presented at the 18th Annual Conference of the Society for Industrial and Organizational Psychology, Orlando, FL.
Vrij, A., Edward, K., Roberts, K. P., & Bull, R. (2000). Detecting deceit via analysis of verbal and nonverbal behavior. Journal of Nonverbal Behavior, 24, 239–263.
Vrij, A., Semin, G. R., & Bull, R. (1996). Insight into behavior displayed during deception. Human Communication Research, 22, 544–562.
Watson, D. (1989). Strangers' ratings of the five robust personality factors: Evidence of a surprising convergence with self-report. Journal of Personality and Social Psychology, 57, 120–128.
Webster, E. C. (1982). The employment interview: A social judgment process. Schomberg, Canada: S.I.P.
Widiger, T. A., & Sanderson, C. J. (1995). Assessing personality disorders. In J. N. Butcher (Ed.), Clinical personality assessment: Practical approaches (pp. 380–394). New York: Oxford University Press.
Woehr, D. J., & Arthur, W., Jr. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231–258.
Zimmerman, M. (1994). Diagnosing personality disorders: A review of issues and research methods. Archives of General Psychiatry, 51, 225–245.
Received October 4, 2002
Revision received June 18, 2004
Accepted July 12, 2004