
Journal of Applied Psychology, 2011, Vol. 96, No. 4, 762–773

© 2010 American Psychological Association. 0021-9010/10/$12.00 DOI: 10.1037/a0021832

Validity of Observer Ratings of the Five-Factor Model of Personality Traits: A Meta-Analysis


In-Sue Oh
Virginia Commonwealth University

Gang Wang and Michael K. Mount


University of Iowa

Conclusions reached in previous research about the magnitude and nature of personality–performance linkages have been based almost exclusively on self-report measures of personality. The purpose of this study is to address this void in the literature by conducting a meta-analysis of the relationship between observer ratings of the five-factor model (FFM) personality traits and overall job performance. Our results show that the operational validities of FFM traits based on observer ratings are higher than those based on self-report ratings. In addition, the results show that when based on observer ratings, all FFM traits are significant predictors of overall performance. Further, observer ratings of FFM traits show meaningful incremental validity over self-reports of corresponding FFM traits in predicting overall performance, but the reverse is not true. We conclude that the validity of FFM traits in predicting overall performance is higher than previously believed, and our results underscore the importance of disentangling the validity of personality traits from the method of measurement of the traits.

Keywords: personality, five-factor model, observer ratings, job performance, selection

This article was published Online First December 13, 2010. In-Sue Oh, Department of Management, School of Business, Virginia Commonwealth University; Gang Wang and Michael K. Mount, Department of Management and Organizations, Tippie College of Business, University of Iowa. An earlier version of this article was presented at the 2010 annual meeting of the Academy of Management, Montreal, Quebec, Canada, August 6–10. We thank Frank Schmidt, Murray Barrick, and Kibeom Lee for their useful comments on an earlier version of this article. We also thank Jared LeDoux and Donald H. Kluemper for sharing their data with us. Correspondence concerning this article should be addressed to In-Sue Oh, Department of Management, School of Business, Virginia Commonwealth University, 301 West Main Street, Box 844000, Richmond, VA 23284, or to Michael K. Mount, Department of Management and Organizations, Tippie College of Business, S358 John Pappajohn Business Building, University of Iowa, Iowa City, IA 52242-1994. E-mail: isoh@vcu.edu or michael-mount@uiowa.edu

Numerous meta-analytic studies using the five-factor model (FFM) personality traits as an organizing framework have shown that personality traits are valid predictors of job performance for numerous criteria. In particular, Conscientiousness, and to a lesser extent Emotional Stability, are the most consistent predictors across jobs and criteria (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997). Other FFM traits (e.g., Agreeableness) are also valid predictors of performance in certain jobs or for certain performance criteria (e.g., organizational citizenship behavior), but their validities do not generalize across jobs and performance criteria (see Ilies, Fulmer, Spitzmuller, & Johnson, 2009). Despite these positive findings, the magnitude of the validity obtained in these studies is only moderate, with estimated operational validity in the .20s range (Barrick, Mount, & Judge, 2001; Schmidt, Shaffer, & Oh, 2008). In light of these findings, a set of former journal editors of the Journal of Applied Psychology and Personnel Psychology reviewed the literature pertaining to the relationship between personality traits and job performance (Morgeson et al., 2007), and they concluded that the validity of personality measures in predicting job performance is so low that the use of self-report personality tests in selection contexts should be reconsidered. Similarly, Ployhart (2006, pp. 878–879) stated, "This concern is not restricted to academics; applied experience suggests many practitioners and human resource (HR) managers remain skeptical of personality testing because these validities appear so small." It should be noted, however, that other scholars have refuted these claims and have presented a more favorable view of the validity of self-report personality measures (e.g., Oh & Berry, 2009; Ones, Dilchert, Viswesvaran, & Judge, 2007; Tett & Christiansen, 2007). Thus, considering all arguments pro and con, it is fair to say that there is considerable debate about the amount of progress that has been made in understanding relationships between personality traits and job performance. However, one area where there is no disagreement is that virtually all of our knowledge about the usefulness of personality traits in selection contexts (e.g., which personality traits predict which criteria; how well they predict; and whether the validities generalize across jobs, organizations, and occupations) is based exclusively on a single method of measurement: self-reports of personality traits (see Mount, Barrick, & Strauss, 1994, for an exception). Accordingly, in Morgeson et al.'s (2007) article, several of the journal editors suggested that future research should focus on identifying alternatives to self-report personality measures. For example, K. Murphy (p. 719) stated,
I have more faith in the constructs than in the measures. And I think the problem is that we have been relying on self-report measures for the last 100 years. We should look at other ways of assessing personality. If you want to know about someone's personality, just ask his or her co-workers.


Similarly, M. Campion (p. 719) stated, "I think one theme I hear is let's think about different ways of measuring these constructs. Let's not abandon the construct, but instead abandon self-report measurement and think about new and innovative ways of measuring the constructs." In their Annual Review of Psychology chapter, Sackett and Lievens (2008, p. 424) also noted that one strategy for improving the measurement of personality is to measure the same constructs with methods other than self-reports. Examples of this strategy include measuring the FFM traits using interviews rather than self-reports (Barrick, Patton, & Haugland, 2000; Van Iddekinge, Raymark, & Roth, 2005) and developing implicit measures of personality using situational judgment tests (Motowidlo, Hooper, & Jackson, 2006) or conditional reasoning measures (James et al., 2005; also see Berry, Sackett, & Tobares, 2010). Clearly, in any field of science, it is important to demonstrate the robustness of findings by showing that results converge across multiple methods of measuring the same traits (Campbell & Fiske, 1959). Given the important role that personality traits play in predicting job performance (e.g., Barrick et al., 2001) and given recent, recurring pessimistic views of the use of self-reports of personality in high-stakes selection contexts (e.g., Morgeson et al., 2007), it is imperative that researchers examine alternatives to self-report measures of personality in predicting job performance. Accordingly, the major purpose of this study is to conduct a meta-analysis to estimate the validity of observer ratings of FFM personality traits in predicting job performance. We compare the magnitude of the validity estimates of observer ratings of FFM traits with previously derived validity estimates based on self-report measures of FFM traits. We also examine the incremental validity of observer ratings of personality over self-reports of personality using currently available meta-analytic evidence. We believe the findings have the potential to make important practical and theoretical contributions to the field. For example, if the results show that observer ratings of personality have substantially higher validities than those based on self-reports, this affirms the perspective of both Murphy and Campion in Morgeson et al. (2007) that we should not abandon FFM constructs but rather should explore alternative ways of measuring them. Theoretically, such findings would mean that personality traits should play even more prominent roles in models that seek to explain job performance.

Validity of Self-Reports Versus Observer Ratings of Personality Traits


Historically, the estimated strength of relationships between FFM traits and outcome variables has been based on the ability of self-reports of personality to predict the outcomes. In the field of industrial/organizational (I/O) psychology, use of self-reports of job-relevant personality traits as a selection procedure is often the norm, given their positive and generalizable validity and their practically meaningful incremental validity over cognitive ability, the single best selection procedure (Schmidt & Hunter, 1998). Why have self-reports been the most frequently used method of personality measurement? First, it may reflect the convenience of obtaining such information. It is easier for researchers and practitioners to obtain ratings from a single individual who is the target research participant or the focal job applicant or incumbent than it is to collect ratings from other individuals. Second, it may reflect the belief that personality judgments made by the target person are the most accurate representations of his or her actual personality. Third, it may reflect the fact that some personality traits (e.g., affective personality traits, including Emotional Stability) are inherently internal to the target person and thus are less able to be observed by others or are perhaps unknowable by others (e.g., Spain, Eaton, & Funder, 2000). Lastly, it may reflect research findings showing that self-reports of personality are valid in predicting numerous important life outcomes, including job performance, both cross-sectionally and longitudinally (e.g., Judge, Higgins, Thoresen, & Barrick, 1999; Roberts, Kuncel, Shiner, Caspi, & Goldberg, 2007). Nonetheless, as discussed earlier, the validity of self-reports of personality is not as high as many researchers expect it to be. As Sackett and Lievens (2008, p. 424) stated, "there seems to be a general sense that personality should fare better than it does."

There are several possible explanations for the relatively low validity of self-reports of personality traits. First, self-reports of personality are subject to response distortion (faking). Generally speaking, it is clear that individuals can fake or distort their responses to personality measures when instructed to do so, in both laboratory and field settings (see Hooper & Sackett, 2008, for meta-analytic evidence). One form of response distortion is impression management, whereby in high-stakes testing individuals intentionally present themselves in a favorable light to obtain valued rewards (e.g., employment). A second form is self-deception, whereby individuals unintentionally distort their ratings of socially desirable traits in a positive direction (e.g., Paulhus, 1991; Paulhus & Reid, 1984). Both of these response distortion tendencies represent systematic, but irrelevant, variance that may lower the validity of the ratings. There is considerable debate in the literature about the impact of response distortion on the validity of self-reports of personality, and existing research is inconclusive about whether these response distortion tendencies significantly alter that validity. For example, Ellingson, Smith, and Sackett (2001) and Ones, Viswesvaran, and Reiss (1996) found that social desirability does not affect the construct validity and criterion-related validity of self-report measures of personality, respectively. However, in actual selection settings, equal validity does not necessarily translate into the same applicants being selected (Kehoe, 2002; Morgeson et al., 2007). To the degree that applicants do not distort their ratings equally, their rank order on test scores may change, which in turn may alter who is selected if a top-down selection strategy is used. An anonymous reviewer suggested to us that response distortion may be less severe for applicants with high levels of positive FFM traits (e.g., Conscientiousness) than for applicants with low levels of the same traits, which further indicates that the degree of response distortion may not be equal among applicants; some may fake more than others. Overall, the above discussion clearly illustrates that personality measurement (particularly in selection contexts) should not be limited to self-reports and that we need to look for alternative methods of measurement to move this field forward (Morgeson et al., 2007).
For example, there is evidence (e.g., Kolar, Funder, & Colvin, 1996; see Hofstee, 1994) that personality assessments made by well-acquainted observers can provide equally accurate information about the target person. According to socioanalytic theory


(R. T. Hogan, 1991), there is a clear distinction between what self-reports and observer ratings of personality measure. Self-reports assess the internal dynamics (e.g., identity) that shape the behavior of an individual, whereas observer ratings capture the reputation of an individual. Because reputations typically are based on the individual's past performance, and because past performance is a good predictor of future performance, reputations are likely to be more predictive of behavior in work settings than the internal dynamics of one's personality. That is, observer ratings of personality are not so much a foresight as a hindsight, in which personality is inferred from the person's behavior rather than vice versa. Funder and Dobroth (1987, p. 409; also see Funder & Colvin, 1988) also argued that "evaluations of the people in our social environment are central to our decisions about who to befriend and avoid, trust and distrust, hire and fire, and so on." Hence, the goal of observer ratings of personality is behavioral prediction; from an evolutionary psychology perspective, prediction accuracy is critical to survival (R. T. Hogan, 2007). Furthermore, an individual's identity (self-reports of personality) and reputations (observer ratings of personality) are not necessarily the same, although they are related to each other to a meaningful degree. Connolly, Kavanagh, and Viswesvaran (2007) presented meta-analytic evidence showing that the true-score correlations between self-reports and observer ratings of the FFM personality traits are moderate to strong, with values ranging from a low of .46 for Agreeableness to a high of .62 for Extraversion. This suggests that ratings/reports of the same trait obtained from different sources are related but also capture unique variance. Thus, one of the main questions we address in the present study is whether observer ratings of personality account for incremental validity in job performance over that accounted for by self-reports of personality. In line with this reasoning, Mount et al. (1994) found that observer ratings of sales representatives' personality showed substantial incremental validity above and beyond self-reports of personality in predicting overall job performance. When examined in the opposite direction, however, self-reports of personality did not show significant incremental validity over observer ratings of personality. Despite these positive findings, it should be noted that their results were limited to a single job (sales); thus, we do not yet know whether the higher validity of observer ratings compared with self-reports of personality generalizes to other jobs. Although Mount et al.'s (1994) results are encouraging, there are several possible reasons that observer ratings actually may have lower validity than self-reports. One is that observers have limited opportunities to observe a target person's behavior, both in terms of the diversity and relevance of situations and in terms of the duration and frequency of time spent observing the target. As mentioned earlier (Connolly et al., 2007), some personality traits are private to the target person and thus not easily observable to others. In other words, target persons are in a better position to rate/report their own personality traits because they observe themselves all the time and across numerous different situations.
Second, it is possible that observer ratings, like self-reports, contain response distortion (e.g., friendship bias), albeit of different types than those associated with self-reports (e.g., faking). On one hand, it is likely that observers (friends, parents, referees) intentionally minimize targets' socially undesirable traits (e.g., being sloppy, rude, nervous, untrustworthy, or unimaginative) or exaggerate targets' socially desirable traits (e.g., being hardworking, assertive, cooperative, unenvious, or creative). On the other hand, the fundamental attribution error posits that observers are inclined to unconsciously make dispositional attributions for targets' behavior (Martinko & Gardner, 1987; L. Ross, 1977). Thus, observers may unintentionally amplify the role that targets' personality traits play but overlook the influence of situational factors. As a result, it is possible that observers unintentionally give high ratings on desirable traits to targets who achieve good results (e.g., job performance) or low ratings on desirable traits to those who achieve poor results, even though the good or bad results might be mainly due to situational factors that are out of the targets' control. To the extent that these unintentional distortions exist, observer ratings of personality may have lower validity than self-reports of personality. Third, when assessing others' personality, observers may also be subject to rater errors, such as the leniency, central tendency, and halo errors found in performance appraisal (Ployhart, Schneider, & Schmitt, 2006; Scullen, Mount, & Goff, 2000). To the degree that observers make these rater errors, observer ratings of personality may have lower validities than self-reports. Nonetheless, given the lack of meta-analytic evidence regarding the validity of observer ratings of personality, it remains an empirical question whether the validity of observer ratings is higher or lower than that of self-reports.

In summary, there are compelling reasons to investigate the validity of observer ratings of personality traits versus self-reports, as well as their incremental validity over self-reports, in the prediction of job performance. First, nearly all of the previous research on the validity of personality traits is based on self-reports, and this has limited the robustness of conclusions about the usefulness of personality for selection purposes. Second, recent research (Connolly et al., 2007) shows that observer ratings demonstrate adequate psychometric properties (such as internal consistency, test–retest reliability, and convergent validity), which establishes the necessary conditions for further exploration of their validity. Third, by definition, observer ratings do not contain the self-distortion bias (e.g., faking) that constrains the validity, fairness, applicant reactions, and practicality (manager reactions) of self-reports, although observer ratings may be subject to other equally or more serious types of bias (Ployhart, 2006). Fourth, although the evidence is very limited, empirical studies (e.g., Mount et al., 1994) and socioanalytic theory (R. T. Hogan, 1991, 2007) suggest that, compared with self-reports, observer ratings (one's reputation) may have higher validities and can account for unique variance in performance measures beyond that explained by self-reports (one's identity). Finally, on the basis of their reviews of the literature, members of a distinguished panel of former journal editors (Morgeson et al., 2007) and authors of recent reviews of selection research (Ployhart, 2006; Sackett & Lievens, 2008; Schmidt et al., 2008) have called for more research that examines alternatives to self-reports of personality. Therefore, the main purpose of this study is to conduct a meta-analysis of the validity of observer ratings of personality traits for overall work performance. We examine, via meta-analysis and meta-analytic regression, two important research questions: (a) are observer ratings of FFM personality traits valid predictors of overall job performance?
and (b) do observer ratings of FFM personality traits account for incremental validity over self-reports of FFM traits?


Method

Literature Search


We conducted an extensive electronic and manual search for both published and unpublished studies to minimize publication bias (Cooper, 2003). For the electronic search, we searched electronic databases such as EBSCO, PsycINFO, Web of Science, and Dissertation Abstracts. Given the large number of studies on the relationship between personality and performance, we developed two decision rules to search for relevant primary studies. First, we limited our initial search to studies whose abstract included at least one keyword from each of the following two categories: (a) keywords representing personality traits (e.g., personality, Big Five, five-factor model [FFM]) and (b) keywords representing performance criteria (e.g., performance, organizational citizenship behavior [OCB], helping, deviance, counterproductive work behavior). Second, we searched for primary studies citing Mount et al. (1994), the pioneering study examining the validity of observer ratings of personality in predicting job performance. For the manual search, we consulted all issues of major I/O psychology journals (e.g., Journal of Applied Psychology, Personnel Psychology) published as of May 2010 for in-press articles that may not yet have been included in the electronic databases. In addition, we searched abstracts of recent major scholarly meetings (e.g., the Society for Industrial and Organizational Psychology and the Academy of Management conference programs) and contacted several researchers active in personality research. Finally, we searched for possible unpublished and in-press studies by sending e-mail requests to Academy of Management listservs and by posting a request in the call-for-papers section of the Society for Industrial and Organizational Psychology's official website.

Inclusion Criteria

To be included in the meta-analysis, primary studies had to meet the following criteria. First, at least one of the FFM personality traits had to be measured. We mainly included studies in which measures were explicitly designed to measure one of the FFM personality traits (factors), as was done in Hurtz and Donovan (2000). That is, studies in which non-FFM personality traits (e.g., Core Self-Evaluations; Scott & Judge, 2009) were measured were excluded. Second, participants' FFM personality traits had to be assessed by single or multiple observers. Third, participants' work performance (e.g., overall, task, contextual, counterproductive performance; Rotundo & Sackett, 2002) had to be examined at the individual level. We excluded primary studies in which interview performance was the criterion (e.g., Van Iddekinge, McFarland, & Raymark, 2007). Fourth, participants' FFM personality and performance had to be rated by different people/sources to avoid common source bias (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). At the suggestion of an anonymous reviewer, however, we relaxed this inclusion criterion to increase the number of relevant studies. We included four primary studies (Peterson, Smith, Martorana, & Owens, 2003; S. M. Ross & Offermann, 1997; Shaffer, Harrison, Gregersen, Black, & Ferzandi, 2006; Zimmerman, Triana, & Barrick, 2010) in which there is or might be slight overlap between personality and performance rating sources. For example, in Zimmerman et al. (2010), Master of Business Administration (MBA) applicants' personality was measured by three referees, one of whom may have been the current supervisor who provided performance ratings (R. Zimmerman, personal communication, May 12, 2010). In addition, we included two primary studies (Aronson, Reilly, & Lynn, 2006; Brennan & Skarlicki, 2004) in which personality and performance were assessed by the same observer. Contrary to our expectations, we found that the extrapolated results were virtually the same with and without these six primary studies (differences of no more than .02). Thus, in Table 1, we report the results that include all six studies. (Results without these six studies are available upon request from the first author.) Fifth, we focused on samples of working adults in organizational settings to generalize our findings to a typical organizational setting and to be comparable with similar meta-analyses examining the validity of self-reports of personality for job performance (e.g., Hurtz & Donovan, 2000). As a result, a total of 16 primary studies with 18 independent samples met the inclusion criteria and were included in the meta-analysis. Among the 16 primary studies, 12 were published articles and four were unpublished (a dissertation, a data set, and two working papers). All of the samples are incumbent samples. To maximize coding accuracy, the first two authors independently coded all studies and compared their coding results. The initial agreement rate was 93%. All remaining discrepancies were resolved through a series of discussions.

Meta-Analytic Procedures

We estimated the operational (true) validity (corrected for unreliability in the criterion measure and range restriction on the predictor measure) of observer ratings of personality using Hunter and Schmidt's (2004) validity generalization methods, which have been used in all other meta-analyses on the relationship between self-reports of personality and job performance (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000). Specifically, we used individual correction methods to synthesize validity coefficients across studies using the Hunter–Schmidt Meta-Analysis Package Program 1.1 (VG6 Module; Schmidt & Le, 2004). To be comparable with the procedures used in previous meta-analyses (e.g., Hurtz & Donovan, 2000), we corrected each validity (correlation) coefficient for measurement error in the criterion measure using the meta-analytically derived mean interrater reliability estimate, given that no primary study reported interrater reliability estimates (Viswesvaran, Ones, & Schmidt, 1996). If there were multiple raters, we used the meta-analytic single-rater reliability (.52) and the Spearman–Brown formula to derive a reliability estimate that reflected the number of individuals who provided performance ratings. The mean interrater reliability estimate across the studies included in this meta-analysis was .67 (SD = .16). It should be noted that Hurtz and Donovan (2000) used a somewhat lower interrater reliability estimate (.53) to correct for measurement error in the criterion measures. With regard to which type of reliability to use (interrater or coefficient alpha) in correcting for unreliability of job performance ratings to estimate operational validity, there is an ongoing debate in the literature (Murphy & DeShon, 2000; Schmidt, Viswesvaran, & Ones, 2000). For informational purposes, we report results corrected for measurement error using the meta-analytic mean interrater reliability estimate adjusted for the number of raters and the


Table 1
Operational Validity of Observer Ratings of Five-Factor Model (FFM) Personality Traits for Overall Job Performance

FFM trait                 (1) k   (2) N    (3) Nraters   (4) r   (5) ρ   (6) SDρ   (7) 80% CrI   (8) SE   (9) 95% CI   (10) % Var
Conscientiousness(a)      17      2,171    1.72          .28     .37     .17       .16, .59      .05      .29, .46     31
Conscientiousness(b)      17      2,171    1.72          .28     .33     .15       .14, .53      .04      .25, .42     30
Emotional Stability(a)    16      1,872    1.70          .15     .21     .00       .21, .21      .02      .16, .25     140
Emotional Stability(b)    16      1,872    1.70          .15     .18     .00       .18, .18      .02      .14, .23     136
Agreeableness(a)          16      2,074    1.83          .21     .31     .14       .12, .49      .04      .22, .39     39
Agreeableness(b)          16      2,074    1.83          .21     .26     .15       .06, .46      .04      .17, .35     31
Extraversion(a)           14      1,735    1.73          .19     .27     .15       .08, .46      .05      .18, .36     40
Extraversion(b)           14      1,735    1.73          .19     .24     .13       .07, .40      .04      .16, .32     40
Openness/Intellect(a)     14      1,735    1.73          .18     .26     .19       .02, .50      .06      .15, .37     29
Openness/Intellect(b)     14      1,735    1.73          .18     .23     .17       .01, .44      .05      .13, .33     28

Note. (1) k = number of independent validity coefficients (validities for counterproductive work behaviors were reverse coded when aggregating across performance criteria); (2) N = total sample size; (3) Nraters = average number of observers (raters) for personality; (4) r = sample-size-weighted mean observed validity; (5) ρ = estimated mean operational (true) validity, that is, validity corrected for unreliability in the criterion measure and range restriction on the personality measures; (6) SDρ = estimated standard deviation of the operational validities; (7) 80% CrI = lower and upper bounds of the 80% credibility interval for the validity distribution; (8) SE = estimated standard error for the mean operational validity; (9) 95% CI = lower and upper bounds of the 95% confidence interval for the estimated mean operational validity; (10) % Var = percentage of observed variance accounted for by statistical artifacts.
(a) Corrected for measurement error in the criterion using interrater reliability after adjusting for the number of raters (M = .67, SD = .16) and for range restriction on the predictor (.92 for Conscientiousness and Extraversion and .91 for the other FFM traits). (b) Corrected for measurement error in the criterion using internal consistency (coefficient alpha; M = .89, SD = .08) and for range restriction on the predictor (.92 for Conscientiousness and Extraversion and .91 for the other FFM traits).

alpha coefficient estimates available from the primary studies included in this meta-analysis (M = .89, SD = .08). Because we were interested in estimating operational validity, we did not correct for measurement error in the predictor (FFM personality) measures; the mean reliabilities (coefficients alpha) computed across the primary studies included in this meta-analysis range from .77 for Openness to .86 for Extraversion (on average .82 across all FFM traits), which are generally comparable with Viswesvaran and Ones's (2000) FFM reliability generalization results. We further corrected for direct range restriction on the predictor measure using the meta-analytic information (ux [= restricted incumbent SD/unrestricted applicant SD] = .92 for Conscientiousness and Extraversion, and ux = .91 for the other FFM traits) reported in Schmidt et al. (2008). It should be noted that Hurtz and Donovan (2000) used essentially the same mean ux estimate (.92) to correct for range restriction on their predictor measures. It is noted that by using the mean artifact estimates rather than their distributions, the variance due to artifacts was somewhat underestimated, and consequently the true standard deviation of operational validities was somewhat overestimated (McDaniel, Whetzel, Schmidt, & Maurer, 1994). This practice, nonetheless, made the 90% credibility value (a cutoff value used to determine validity generalization) more conservative. To ensure that the validity coefficients included in our meta-analysis were statistically independent, we computed a composite correlation when original studies reported multiple validity estimates within a single sample (e.g., correlations between personality and several facets of performance); otherwise, the average of the validity estimates was used. Finally, we initially set the cutoff value for the minimum number of primary studies to be included in each meta-analysis to three, based on Chambless and Hollon (1998), who argued that good empirical evidence exists when an important relationship is found in a minimum of three studies from at least two different researchers.
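To make the correction sequence described in this section concrete, the sketch below illustrates the individual-correction logic in Python: the single-rater interrater reliability of .52 is stepped up with the Spearman–Brown formula to reflect the number of performance raters, and each observed validity is then corrected for criterion unreliability and for direct range restriction using the ux values quoted above. This is a minimal illustration of the general Hunter–Schmidt approach under those stated artifact values, not the VG6 program used in the study, and the function names are ours.

```python
# Minimal sketch of the individual-correction steps described above
# (not the Hunter-Schmidt VG6 module; function names are illustrative).

def spearman_brown(single_rater_rel: float, n_raters: float) -> float:
    """Reliability of the mean of n_raters ratings, given single-rater reliability."""
    return n_raters * single_rater_rel / (1 + (n_raters - 1) * single_rater_rel)

def operational_validity(r_obs: float, criterion_rel: float, u_x: float) -> float:
    """Correct an observed validity for criterion unreliability and then for direct
    range restriction on the predictor (u_x = restricted SD / unrestricted SD)."""
    r = r_obs / criterion_rel ** 0.5             # disattenuate for criterion measurement error
    big_u = 1.0 / u_x                            # degree of range restriction to undo
    return (r * big_u) / (1 + r ** 2 * (big_u ** 2 - 1)) ** 0.5

# Example with the artifact values quoted in the text: a single-rater interrater
# reliability of .52 stepped up for two performance raters, the mean criterion
# reliability of .67, and u_x = .92 for Conscientiousness (mean observed r = .28).
print(round(spearman_brown(0.52, 2), 2))                  # about .68
print(round(operational_validity(0.28, 0.67, 0.92), 2))   # about .37, cf. Table 1
```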

Results

Overall Validity Coefficients


Table 1 presents the results of the omnibus meta-analysis aggregated across performance criteria (e.g., overall performance, task performance, contextual performance, and counterproductive work behavior). In this analysis, validity (correlation) coefficients for counterproductive work behavior are reverse coded. The average numbers of observers (raters) for the FFM personality traits are 1.72 for Conscientiousness, 1.70 for Emotional Stability, 1.83 for Agreeableness, 1.73 for Extraversion, and 1.73 for Openness/Intellect. The mean operational validity estimates (ρ) range from .21 (Emotional Stability) to .37 (Conscientiousness). Consistent with previous meta-analyses (Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997) and second-order meta-analyses (Barrick et al., 2001; Schmidt et al., 2008), Conscientiousness (ρ = .37, k = 17, N = 2,171) has the highest mean operational validity among the FFM traits, which represents a moderate to strong level of validity. The estimated operational validities for Agreeableness, Openness/Intellect, Extraversion, and Emotional Stability are all greater than .20 (ks = 14–16, Ns = 1,735–2,074). The 80% credibility intervals for all FFM traits do not include zero, suggesting that all FFM traits' positive validities generalize in at least 90% of the cases. Further, the 95% confidence intervals (CIs) for all FFM traits do not include zero, which indicates that all mean operational validity estimates are significantly greater than zero (nonzero) at the .05 level (Whitener, 1990).
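For reference, the two intervals reported in Table 1 are formed in the standard validity-generalization way; the following summary uses our notation and the usual normal-distribution multipliers rather than anything reproduced from the article:

```latex
% Credibility vs. confidence intervals as reported in Table 1
% (standard Hunter-Schmidt conventions; notation is ours).
\[
\text{80\% CrI} \approx \hat{\rho} \pm 1.28 \, SD_{\rho},
\qquad
\text{95\% CI} \approx \hat{\rho} \pm 1.96 \, SE_{\hat{\rho}}
\]
% The credibility interval describes how operational validities are distributed
% across studies (generalizability); the confidence interval describes the
% uncertainty in the estimated mean operational validity itself.
```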


Validity Coefficients as a Function of the Number of Observers (Raters)


The validity estimates in Table 1 were based on more than one observer (rater), and the average number of raters varied somewhat across the FFM traits. (On average, across the FFM traits there were 1.74 observers [raters].) It is well established that validity estimates may be influenced (increased) by the use of multiple observers (raters), because averaging across multiple raters may increase both reliability and validity, as the reliability index sets the upper limit of validity (Oh & Berry, 2009; Schmidt & Zimmerman, 2004). Thus, to make the validity estimates based on observer ratings of personality comparable with those based on self-reports of personality, we applied the procedure used in Schmidt and Zimmerman (2004), based on the Spearman–Brown formula, to extrapolate mean observed and operational validity estimates based on one, two, and three observer ratings of personality. Accordingly, Table 2 presents the mean observed and operational validity estimates of the FFM traits in terms of the average number of observers (raters) and compares the results with operational validity estimates of self-reports for overall job performance. Using the average number of observers for each FFM personality trait and/or the meta-analytic evidence on interrater reliability for a single observer (rater) reported in Connelly (2008), we computed the mean observed and operational validities as a function of the number of observers (reported in Columns 1 and 2 of Table 2). To make the comparison, we carefully chose the meta-analytic results for self-reports of personality in Hurtz and Donovan (2000, Table 3, p. 875). As shown in Table 2, the mean observed and operational validity estimates of the FFM personality traits measured by a single observer are higher than the corresponding results based on self-reports for overall job performance, on average by .10 (ranging from .04 [Emotional Stability] to .13 [Openness/Intellect]) and .12 (ranging from .04 [Emotional Stability] to .17 [Openness/Intellect]), respectively. Further, the mean observed and operational validities of the FFM personality traits based on three observers are even higher than the corresponding results based on self-reports for overall job performance, on average by .14 and .19, respectively. Standard deviations of the operational validity estimates and the standard errors for the mean operational validity estimates were also estimated using the procedure described in Hunter and Schmidt (2004, pp. 158–159, 205).
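The extrapolation across numbers of observers follows directly from the Spearman–Brown logic: averaging k observers raises the reliability of the observer-rating composite, and validity rises with the square root of that reliability gain. The sketch below is our illustration of that relationship rather than the exact Schmidt and Zimmerman (2004) procedure; the single-rater reliability of roughly .41 is back-solved from the Conscientiousness rows of Table 2 for illustration, not taken from Connelly (2008).

```python
# Extrapolating a single-observer validity to k observers via Spearman-Brown.
# Illustrative only: rel_one = .41 is back-solved from Table 2's Conscientiousness
# rows, not a value reported by Connelly (2008).

def validity_for_k_raters(r_one: float, rel_one: float, k: int) -> float:
    """Validity of the mean of k observer ratings, given single-observer validity
    r_one and single-rater (interrater) reliability rel_one of the ratings."""
    return r_one * (k / (1 + (k - 1) * rel_one)) ** 0.5

single_observer_rho = 0.32   # Conscientiousness, one observer (Table 2)
single_rater_rel = 0.41      # assumed single-rater reliability of observer ratings

for k in (1, 2, 3):
    print(k, round(validity_for_k_raters(single_observer_rho, single_rater_rel, k), 2))
# Prints values close to the .32, .38, and .41 shown for Conscientiousness in Table 2.
```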

Incremental Validity of Single Observer Ratings of Personality Over Self-Reports


Table 3 presents the incremental validity of single observer ratings of personality over self-reports of personality for overall job performance. Column 1 shows mean operational validity estimates for self-reports of FFM traits for job performance reported in Hurtz and Donovan (2000, Table 3). Column 2 shows the

Table 2
Operational Validity of Observer Ratings of Five-Factor Model (FFM) Personality for Overall Job Performance as a Function of the Number of Observers

FFM trait / Rating source      (1) r   (2) ρ   (3) SDρ   (4) 80% CrI   (5) SE   (6) 95% CI   (7) Diff.
Conscientiousness
  Self-report(a)               .15     .22     .13       .06, .38      .03      .17, .27
  1 observer(b)                .25     .32     .15       .13, .51      .05      .22, .42     .10
  2 observers(b)               .29     .38     .18       .15, .61      .06      .26, .50     .16
  3 observers(b)               .31     .41     .19       .17, .65      .07      .27, .55     .19
Emotional Stability
  Self-report(a)               .09     .14     .05       .07, .21      .03      .09, .19
  1 observer(b)                .13     .18     .00       .18, .18      .04      .10, .26     .04
  2 observers(b)               .16     .22     .00       .22, .22      .05      .12, .32     .08
  3 observers(b)               .17     .24     .00       .24, .24      .05      .14, .34     .10
Agreeableness
  Self-report(a)               .07     .10     .10       .02, .22      .02      .05, .15
  1 observer(b)                .18     .26     .12       .11, .41      .05      .16, .36     .16
  2 observers(b)               .21     .32     .14       .14, .50      .06      .20, .44     .22
  3 observers(b)               .23     .34     .15       .15, .53      .06      .22, .46     .24
Extraversion
  Self-report(a)               .06     .09     .11       -.05, .23     .03      .04, .14
  1 observer(b)                .17     .24     .13       .07, .41      .05      .14, .34     .15
  2 observers(b)               .19     .28     .15       .09, .47      .05      .18, .38     .19
  3 observers(b)               .21     .29     .16       .09, .49      .06      .17, .41     .20
Openness/Intellect
  Self-report(a)               .03     .05     .08       -.05, .15     .03      -.01, .11
  1 observer(b)                .16     .22     .16       .02, .42      .07      .08, .36     .17
  2 observers(b)               .19     .27     .20       .01, .53      .08      .11, .43     .22
  3 observers(b)               .20     .29     .21       .02, .56      .08      .13, .45     .24

Note. (1) r = estimated mean observed validity; (2) ρ = estimated mean operational (true) validity, that is, validity corrected for unreliability in the criterion measure and range restriction on the personality measures; (3) SDρ = estimated standard deviation of the operational validity; (4) 80% CrI = lower and upper bounds of the 80% credibility interval for the validity distribution; (5) SE = estimated standard error for the mean operational validity; (6) 95% CI = lower and upper bounds of the 95% confidence interval for the estimated mean operational validity; (7) Diff. = raw difference between operational validities for self-reports and observer ratings of a given FFM trait. Total N and k for each FFM personality trait are as follows: Conscientiousness (N = 2,171, k = 17); Emotional Stability (N = 1,872, k = 16); Agreeableness (N = 2,074, k = 16); Extraversion (N = 1,735, k = 14); Openness/Intellect (N = 1,735, k = 14).
(a) Values for self-report personality are taken from Hurtz and Donovan (2000, Table 3). (b) Values for observer ratings of personality in this table are extrapolated using the Spearman–Brown formula based on the results in boldface in Table 1 (the rows corrected using interrater reliability) and the average number of observers (raters) for each FFM trait: 1.72 for Conscientiousness, 1.70 for Emotional Stability, 1.83 for Agreeableness, 1.73 for Extraversion, and 1.73 for Openness to Experience.


Table 3
Regression and Usefulness Analyses for Self-Report Versus Single Observer Ratings of Personality in Predicting Overall Job Performance

FFM trait            (1) ρself   (2) ρother   (3) ΔRself (ΔR²self)   (4) ΔRother (ΔR²other)   (5) Rs+o (R²s+o)   (6) βself   (7) βother
Conscientiousness    .22         .32          .019 (.013)            .119 (.067)              .339 (.115)        .12         .28
Emotional Stability  .14         .18          .020 (.008)            .060 (.020)              .200 (.040)        .09         .15
Agreeableness        .10         .26          .001 (.001)            .161 (.058)              .261 (.068)        .02         .25
Extraversion         .09         .24          .001 (.000)            .151 (.050)              .241 (.058)        .02         .25
Openness/Intellect   .05         .22          .003 (.001)            .173 (.047)              .223 (.050)        .04         .24

Note. (1) ρself = mean operational validity estimates for self-reports of five-factor model (FFM) personality traits from Hurtz and Donovan (2000, Table 3); (2) ρother = mean operational validity estimates of observer ratings of personality from the current study (validity is attenuated to a single observer; see Table 2); (3) ΔRself (ΔR²self) = incremental validity (and change in R²) due to adding self-reports to observer ratings of a given FFM personality trait; (4) ΔRother (ΔR²other) = incremental validity (and change in R²) due to adding observer ratings to self-reports of a given FFM personality trait; (5) Rs+o (R²s+o) = multiple R (and R²) from the combination of self-reports and observer ratings of a given FFM personality trait; (6)–(7) βself and βother = standardized regression weights comparing self-reports versus observer ratings of a given FFM personality trait.

attenuated operational validity estimates for a single observer rating. Column 3 shows the incremental validities of self-reports over a single observer rating of a given FFM trait, computed using the meta-analytically derived intercorrelations between self-reports and observer ratings of FFM traits (Connolly et al., 2007), whereas Column 4 reports the incremental validities of a single observer rating over self-reports of a given FFM trait. The results for each of the FFM traits show that self-reports of personality provide negligible incremental validity (.001–.020; on average .009) above and beyond observer ratings of personality. However, observer ratings of personality provide substantial incremental validity (.060–.173; on average .132) over self-reports of personality in predicting overall job performance. Column 5 shows the overall multiple Rs (.200–.339; on average .25) for both self-reports and observer ratings of a given FFM personality trait; it clearly shows that the multiple Rs are mostly due to observer ratings, rather than self-reports, of personality. Because the validity of a hiring method is a direct determinant of its practical value or utility, we focused on multiple R and incremental validity (change in R) above (Schmidt & Hunter, 1998, p. 262). However, as an anonymous reviewer suggested, some readers may be interested in seeing the variance accounted for (R²) and change in R². Thus, these values are also provided in Table 3 (numbers in parentheses in Columns 3, 4, and 5). It is again clear that for each FFM trait, observer ratings of personality (.020–.067; on average .048, or 4.8%) account for more variance in job performance than do self-reports of personality (.000–.013; on average .005, or 0.5%), net of each other. Columns 6 and 7 show the standardized regression weights (βs) for self-reports and observer ratings of a given FFM trait. For example, when both observer ratings and self-reports of Conscientiousness are entered in the same equation predicting overall job performance, the estimated standardized regression weight for observer ratings (.28) is 2.33 times larger than that for self-reports (.12).
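The incremental validities in Table 3 follow from standard two-predictor regression algebra applied to the meta-analytic correlations. The sketch below is our reconstruction of that arithmetic; the self-observer correlation of .36 is back-solved from the Conscientiousness multiple R for illustration and is not a value taken from Connolly et al. (2007).

```python
# Two-predictor regression algebra behind Table 3 (our reconstruction).
# r_so = .36 is back-solved from the Conscientiousness multiple R for illustration;
# it is not the intercorrelation reported by Connolly et al. (2007).

def multiple_r(rho_self: float, rho_other: float, r_so: float) -> float:
    """Multiple R when overall performance is regressed on self-reports and observer
    ratings of the same trait, given their validities and their intercorrelation."""
    r_sq = (rho_self ** 2 + rho_other ** 2
            - 2 * rho_self * rho_other * r_so) / (1 - r_so ** 2)
    return r_sq ** 0.5

rho_self, rho_other, r_so = 0.22, 0.32, 0.36   # Conscientiousness (Table 3; r_so assumed)
big_r = multiple_r(rho_self, rho_other, r_so)
print(round(big_r, 3))                  # about .339, matching Column 5 of Table 3
print(round(big_r - rho_self, 3))       # incremental validity of observer ratings (~.12)
print(round(big_r - rho_other, 3))      # incremental validity of self-reports (~.02)
```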

Discussion

Although progress has been made in the past 20 years in understanding the validity of personality traits for predicting job performance, results of multiple meta-analyses and second-order meta-analyses have shown that the operational validities of self-reports of FFM traits are modest (in the .20s range) at best. Accordingly, this meta-analysis responds to the call by several prominent scholars (e.g., Morgeson et al., 2007; Ployhart, 2006) for research that investigates alternatives to self-report measures of personality. The results lead to different conclusions about the validity of FFM traits in predicting job performance and may inform the debate surrounding the usefulness of personality traits for predicting job performance. Prior to discussing specific results, we urge the reader to use appropriate levels of caution in interpreting our results because of the relatively small number of primary studies/samples included in the meta-analysis (ks = 14–17). For example, given the relatively small ks, the standard deviations (variances) of the observed and operational validity estimates may be somewhat overestimated. This may result in wider credibility intervals and conservative estimates of the generalizability of the results across primary studies. However, the mean operational validities are likely to be accurate given the considerable sample sizes (Ns = 1,735–2,171). Relatedly, Hunter and Schmidt (2004, p. 408) stated that several studies combined meta-analytically contain limited information about between-study variance (although they provide substantial information about means).

Magnitude of the Validity of Observer Ratings of FFM Traits

One important contribution of this study is that the operational validities of all FFM traits based on a single observer rating are higher than those obtained in meta-analyses based on self-report measures. The operational validities we obtained for a single observer rating predicting overall performance ranged from .18 (Emotional Stability) to .32 (Conscientiousness), all of which except for Emotional Stability exceed or equal the highest validity reported in previous meta-analyses based on self-reports of the FFM traits (i.e., .22 for Conscientiousness; Hurtz & Donovan, 2000, Table 3). In fact, the magnitude of the difference between observer-rating-based and self-report-based validities in predicting overall performance is substantial (at least .10) except for Emotional Stability. Even for Emotional Stability, the observer validity


is larger by .04 (.18 vs. .14), which translates into a gain in validity of about 30%. As mentioned earlier, the reason why Emotional Stability based on observer ratings shows a relatively smaller validity gain might be that the internal thought processes inherent to anxiety and depression are difficult to observe. Nonetheless, the findings regarding the mean operational validity for observer ratings of Openness/Intellect (.22 vs. .05 for self-reports; a 340% gain) are especially noteworthy in this regard because every prior meta-analysis based on self-reports has shown that Openness has essentially no validity in predicting overall performance (Schmidt et al., 2008). One possible explanation pertains to the method of measuring Openness. In fact, most of the primary studies included in this meta-analysis measured Openness/Intellect using Goldberg (1992) or similar variants such as Saucier (1994; in fact, a short version of Goldberg, 1992) and the International Personality Item Pool (http://ipip.ori.org/ipip/).1 These FFM questionnaires emphasize more g-loaded facets of Openness, such as ideas, inquisitiveness, and intellectance, than other facets, such as aesthetics and fantasy. Accordingly, the items' shared variance with g may account for the high validity obtained for Openness. This also relates to the distinction we made earlier between personality viewed from the self versus the other perspective. From the self perspective, Openness may refer to traits associated with one's internal experience, such as the facets of fantasy, feelings, and aesthetics, whereas from the observer's perspective, Openness refers to those traits associated with external experience, such as the facets of actions, ideas, and values. Because the external facets are more observable (visible to others) and are more highly correlated with g, observer ratings are more valid predictors of job performance (Griffin & Hesketh, 2004). In sum, our findings based on a single observer rating lead to more optimistic conclusions about the validity of FFM traits in predicting overall performance; not only are the validity estimates substantially higher than those based on self-reports, but all of the FFM traits are also more generalizable predictors of overall performance.

1 We thank an anonymous reviewer for bringing this issue to our attention.

Incremental Validity of Observer Ratings


Another contribution of the study is that single observer ratings have substantial incremental validity (on average .132) over corresponding self-report measures of FFM traits in predicting overall performance. However, the reverse is not true, as the incremental gain of self-reports over observer ratings was negligible (close to zero) for all FFM traits. Thus, when single observer ratings are used to predict overall performance, self-report measures add little to the prediction. These findings broadly support the findings in the pioneering study by Mount et al. (1994), which showed higher validities of observer ratings of personality relative to self-reports and substantial incremental validity of observer ratings of personality over self-reports in predicting job performance among sales representatives. Our results extend their findings by including multiple samples and by using meta-analytic methods that remove biases due to statistical and methodological artifacts.

Practical and Theoretical Implications

Practically speaking, one way to show the potential of observer ratings of personality as a selection tool is whether this predictor accounts for variance in job performance above that accounted for by other well-established predictors. Following Schmidt and Hunter's (1998) analysis scheme, we computed the incremental validity of single observer ratings of Conscientiousness (ρ = .32) over cognitive ability (ρ = .51), which is the single best predictor of overall job performance. The overall R is .605, and the incremental validity (ΔR) of single observer ratings of Conscientiousness over general mental ability is .10. Further, one noteworthy advantage of observer ratings of personality over self-reports is the use of multiple raters, which increases reliability and, in turn, validity (Schmidt et al., 2008). If three observers (raters) were used to assess a target person's personality, we could expect the validity to be estimated at .41 (with a ΔR of .17) for Conscientiousness. In sum, our findings address many of the concerns raised by prominent scholars that personality traits should predict better than they do; in fact, our results confirm what those scholars suspected, as we found that all of the FFM traits predict overall job performance substantially better than previous research has shown.

Given the aforementioned advantage whereby multiple observer ratings are available, one possible use of observer ratings for selection purposes is to embed FFM personality items in reference check procedures. For example, Zimmerman et al. (2010; included in the current meta-analysis) measured two FFM personality traits (Conscientiousness and Emotional Stability) using a reference checklist form. They reported stronger (observed) validities of .39 for Conscientiousness and .26 for Emotional Stability. Similarly, Taylor, Pajo, Cheung, and Stringfield (2004; included in the current meta-analysis) measured two FFM personality traits (Conscientiousness and Agreeableness) through a structured telephone reference check with a brief (less than 15-min) interview with each referee, and they also reported stronger validities. Another noteworthy practical issue is that applicant reactions to selection procedures and tools can influence recruitment outcomes (Ployhart, 2006). As Hausknecht, Day, and Thomas (2004, p. 675) stated, "organizations using selection tools and procedures that are perceived unfavorably by applicants may find that they are unable to attract top applicants, and may be more likely to face litigation or negative public relations." On the positive side, the good news is that recent meta-analytic evidence indicates that reference forms (M = 3.29, SD = 0.93) are rated more favorably than are (self-report) personality tests (M = 2.83, SD = 1.01; d = 0.47, 95% CI [.36, .58]). As such, it is possible that embedding personality tests in reference forms may lead to more positive applicant reactions than asking applicants to respond to a personality questionnaire. Confirming this expectation, Zimmerman et al. (2010) reported that MBA applicant reactions to a reference checklist containing personality measures were positive.

Notwithstanding the above advantages and possibilities of using observer ratings of personality in selection, Hurtz and Donovan (2000, p. 877) noted that "the practice of using rating sources other than oneself is not likely to be adopted in personnel selection practices." Although we do not share this view, we acknowledge that there are important issues that need to be addressed before observer ratings can be used for personnel selection purposes. As an example, although it is possible to embed personality measures in other selection tools (e.g.,


reference forms, phone reference checks, interviews), it is difficult to overcome some of the contextual aspects of the rating procedures. As an anonymous reviewer suggested, applicants are unlikely to seek recommendation letters from individuals who are likely to rate them poorly, and past employers are generally unwilling to provide negative references because of concerns about litigation.2 More research is definitely needed on this issue. Nonetheless, we believe it is premature to dismiss the possible use of observer ratings for personnel selection purposes. Our results show that observer ratings have much higher validities in predicting job performance than self-reports and, additionally, have an important advantage over self-reports because multiple observer ratings can be averaged, which can reduce the amount of bias (idiosyncrasy) across observers and thus can increase validity.

Regardless of the practical issues associated with using observer ratings for selection purposes, the present findings have significant theoretical implications for models of job performance. An important goal in I/O psychology is to develop models that seek to explain how individual differences and situational factors relate to job performance (e.g., Schmidt & Hunter, 1992). By disentangling the validity of FFM traits from their method of measurement, our results contribute to the development of these models by providing a better understanding of the importance of one set of individual difference variables, personality traits, in explaining job performance. Our finding that the operational validities for the FFM traits based on single observer ratings are, on average, 1.5 times larger (or larger by .12) than those for corresponding traits based on self-reports clearly shows that personality traits play a more central role in explaining job performance than previous research has revealed. Further, our findings that each of the FFM traits is a valid (nonzero) predictor of overall performance and that the validities of all FFM traits generalize (80% credibility intervals do not contain zero) suggest that models that include only the FFM constructs of Conscientiousness (and Emotional Stability) are likely to be deficient in explaining job performance. Finally, the findings that FFM traits based on observer ratings account for substantial incremental validity over cognitive ability (e.g., for Conscientiousness alone there is an incremental gain of .10 based on a single observer rating) reveal that FFM traits are comparatively more important relative to general mental ability than previous research has suggested.

2 We thank an anonymous reviewer for bringing this issue to our attention.

Limitations and Future Research Directions


There are several limitations of this meta-analysis. First, as we acknowledged in several places, the number of primary studies and the sample sizes in this meta-analysis are relatively small. However, the overall findings are consistent with those of representative primary studies such as Mount et al. (1994). Specifically, the average raw-score difference across the FFM traits is only .02 between the validities based on one observer in Table 2 and the validities reported in Table 1 of Mount et al., with coworkers as the personality rating source and supervisors as the performance rating source. Given the much larger sample sizes in the current study, our study provides more accurate and more powerful validity estimates of observer ratings of personality than Mount et al. (Hunter & Schmidt, 2004). In addition, we used relatively stringent inclusion criteria, which restricted the population of primary studies. In these circumstances, it is useful to know the robustness of the findings.

For example, the operational validity of Conscientiousness measured via a single observer rating (ρ = .32) is based on 17 independent samples (N = 2,171; average n per sample of approximately 130). Following Berry et al. (2010), we determined how many primary studies, each with an operational validity of .20 (a typical level of operational validity) and a sample size of 130, would be needed to lower the operational validity of observer ratings of Conscientiousness for overall job performance (.32) to .22, the operational validity of self-reports of Conscientiousness. We found that it would take about 100 such primary studies with the specified parameters to lower the validity to .22 (a simplified version of this calculation is sketched at the end of this section). Thus, these findings are quite robust, as it would take the addition of a large number of primary studies to change our conclusions. Having said this, one major goal of this meta-analysis was to stimulate additional research pertaining to observer ratings of personality. It is clear that more primary studies are needed so that we can obtain more accurate estimates and test more moderators (e.g., occupation, narrower facets, rating source, and characteristics of observers such as level of acquaintanceship).

We have alluded to areas where additional research is needed, but there are certainly other areas that need to be addressed. One is the extent to which the characteristics of the observers influence the validity of the ratings. For example, in the present study, most observers are coworkers (peers), which we expect are likely to be the most frequent source of observer ratings in organizational settings. One research question is whether other rating sources, such as subordinates, supervisors, customers, friends, or referees (those explicitly asked to provide ratings in high-stakes contexts), may yield comparable or even higher validities. Thus, a fruitful area for future research is to examine the psychometric properties of ratings made by raters from different perspectives. Potential moderators include how long the rater has known the ratee, in what context he or she has known the person (work or nonwork), and what the purpose of the observer rating is (selection or development). Another fruitful direction for future research might be to selectively choose and combine facets of the FFM, because this would yield shorter measures that may have higher validities (Hurtz & Donovan, 2000). Recent research findings show that some facets of FFM traits are significantly more predictive of job performance than other facets and than the overall score on the FFM construct (e.g., Ashton, 1998; Dudley, Orvis, Lebiecki, & Cortina, 2006; Mount & Barrick, 1995; Paunonen & Nicol, 2001; Stewart, 1999; Tett, Steele, & Beauregard, 2003).

Second, both validation type (predictive vs. concurrent) and validation sample (applicants vs. incumbents) are important issues (moderators) but have rarely been empirically examined in the domain of personality (see Van Iddekinge & Ployhart, 2008, pp. 883–885). However, there are several reasons that these issues should be carefully examined in future personality validation studies. For example, applicants are thought to be more likely than incumbents to attempt to distort their responses (i.e., fake) on noncognitive predictors to increase their chances of being selected (Van Iddekinge & Ployhart, 2008, p. 894). As discussed earlier, response distortion may lead to lower validity. In support of this notion, Hough (1998) in fact found that observed validity
Having said this, one major goal of this meta-analysis was to stimulate additional research on observer ratings of personality. It is clear that more primary studies are needed so that we can obtain more accurate estimates and test more moderators (e.g., occupation, narrower facets, rating source, and characteristics of observers such as level of acquaintanceship). We have alluded to areas where additional research is needed, but there are certainly other areas that need to be addressed. One is the extent to which the characteristics of the observers influence the validity of the ratings. For example, in the present study most observers were coworkers (peers), who we expect are likely to be the most frequent source of observer ratings in organizational settings. One research question is whether other rating sources, such as subordinates, supervisors, customers, friends, or referees (those explicitly asked to provide ratings in high-stakes contexts), may yield comparable or even higher validities. Thus, a fruitful area for future research is to examine the psychometric properties of ratings made by raters from different perspectives. Potential moderators include how long the rater has known the ratee, the context in which he or she has known the ratee (work or nonwork), and the purpose of the observer rating (selection or development).

Another fruitful direction for future research would be to selectively choose and combine facets of the FFM, because this could yield shorter measures with higher validities (Hurtz & Donovan, 2000). Recent research findings show that some facets of FFM traits are significantly more predictive of job performance than other facets and than the overall score on the FFM construct (e.g., Ashton, 1998; Dudley, Orvis, Lebiecki, & Cortina, 2006; Mount & Barrick, 1995; Paunonen & Nicol, 2001; Stewart, 1999; Tett, Steele, & Beauregard, 2003).

Second, both validation type (predictive vs. concurrent) and validation sample (applicants vs. incumbents) are important potential moderators but have rarely been examined empirically in the domain of personality (see Van Iddekinge & Ployhart, 2008, pp. 883–885). However, there are several reasons these issues should be carefully examined in future personality validation studies. For example, applicants are thought to be more likely than incumbents to attempt to distort their responses (i.e., fake) on noncognitive predictors to increase their chances of being selected (Van Iddekinge & Ployhart, 2008, p. 894). As discussed earlier, response distortion may lead to lower validity. In support of this notion, Hough (1998), reanalyzing personality validation data (albeit not based on explicit FFM measures) collected during Project A, found that observed validity was smaller for predictive designs than for concurrent designs by .07 on average. The same may apply to observer ratings of personality. Observers (e.g., referees) are more likely to distort their ratings (i.e., be susceptible to friendship bias) when their target is in a high-stakes selection context rather than a nonselection context. To the extent this is true, the validity estimates reported in this study may overestimate the validity that would be obtained in predictive designs, given that all primary studies included in the meta-analysis were based on incumbents and that all but two (Taylor et al., 2004; Zimmerman et al., 2010) used concurrent designs. However, Weekley, Ployhart, and Harold (2004) did not find significant differences in the validity of three FFM traits (Conscientiousness, Extraversion, Agreeableness) between predictive designs (based on three primary studies) and concurrent designs (based on five primary studies). Research on this issue is thus inconclusive, and more primary studies should examine it further.³

Lastly, Ployhart (2006, p. 884) argued that previous meta-analyses (e.g., Barrick & Mount, 1991; J. Hogan & Holland, 2003; Hurtz & Donovan, 2000) provide an effective summary of what has been done on the relationships between personality and job performance, but we may often be interested in questions of what could be done or what should be done. On this point, as did Ployhart (2006), we agree with Landy, Shankster, and Kohler (1994, p. 286), who noted that meta-analysis and traditional research should be complementary and not competitors. That is, meta-analysis should encourage, rather than discourage, future primary studies. We hope that this meta-analysis will further encourage organizational researchers to investigate the use of sources of personality assessment other than self-reports in their primary studies, as well as to explore procedural issues (both legal and logistical) in the use of observer ratings for selection purposes.

Conclusion
Due caution should be exercised when interpreting our results, given the relatively small number of primary studies included in the meta-analyses. Nonetheless, the results reveal that the validities of observer ratings of FFM traits in predicting overall performance are substantially higher than those based on self-reports. Further, consistent with previous research, Conscientiousness has the highest validity in predicting overall job performance. The new substantive finding is that, when based on observer ratings, all FFM traits are valid (nonzero) and generalizable predictors of overall performance across situations. The results also show that observer ratings have substantial incremental validity over self-reports of FFM traits and over general mental ability in predicting overall performance.

³ We thank the Action Editor, Robert Ployhart, for bringing this issue to our attention.

References
References marked with an asterisk indicate studies included in the meta-analysis.

*Aronson, Z. H., Reilly, R. R., & Lynn, G. S. (2006). The impact of leader personality on new product development teamwork and performance: The moderating role of uncertainty. Journal of Engineering and Technology Management, 23, 221–247. doi:10.1016/j.jengtecman.2006.06.003
Ashton, M. C. (1998). Personality and job performance: The importance of narrow traits. Journal of Organizational Behavior, 19, 289–303. doi:10.1002/(SICI)1099-1379(199805)19:3<289::AID-JOB841>3.0.CO;2-C
*Barrick, M. R. (2009). [Observer ratings of personality and multiple performance outcomes]. Unpublished raw data.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance. Personnel Psychology, 44, 1–26. doi:10.1111/j.1744-6570.1991.tb00688.x
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). The FFM personality dimensions and job performance: Meta-analysis of meta-analyses. International Journal of Selection and Assessment, 9, 9–30. doi:10.1111/1468-2389.00160
Barrick, M. R., Patton, G. K., & Haugland, S. N. (2000). Accuracy of interviewer judgments of job applicant personality traits. Personnel Psychology, 53, 925–951. doi:10.1111/j.1744-6570.2000.tb02424.x
Berry, C. M., Sackett, P. R., & Tobares, V. (2010). A meta-analysis of conditional reasoning tests of aggression. Personnel Psychology, 63, 361–384. doi:10.1111/j.1744-6570.2010.01173.x
*Brennan, A., & Skarlicki, D. P. (2004). Personality and perceived justice as predictors of survivors' reactions following downsizing. Journal of Applied Social Psychology, 34, 1306–1328. doi:10.1111/j.1559-1816.2004.tb02008.x
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105. doi:10.1037/h0046016
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18. doi:10.1037/0022-006X.66.1.7
Connelly, B. S. (2008). The reliability, convergence, and predictive validity of personality ratings: An other perspective (Unpublished doctoral dissertation). University of Minnesota, Minneapolis.
Connolly, J. J., Kavanagh, E. J., & Viswesvaran, C. (2007). The convergent validity between self and observer ratings of personality: A meta-analytic review. International Journal of Selection and Assessment, 15, 110–117. doi:10.1111/j.1468-2389.2007.00371.x
Cooper, H. (2003). Editorial. Psychological Bulletin, 129, 3–9. doi:10.1037/0033-2909.129.1.3
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of conscientiousness in the prediction of job performance: Examining the intercorrelations and the incremental validity of narrow traits. Journal of Applied Psychology, 91, 40–57. doi:10.1037/0021-9010.91.1.40
Ellingson, J. E., Smith, D. B., & Sackett, P. R. (2001). Investigating the influence of social desirability on personality factor structure. Journal of Applied Psychology, 86, 122–133. doi:10.1037/0021-9010.86.1.122
Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: Acquaintanceship, agreement, and the accuracy of personality judgment. Journal of Personality and Social Psychology, 55, 149–158. doi:10.1037/0022-3514.55.1.149
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with inter-judge agreement. Journal of Personality and Social Psychology, 52, 409–418. doi:10.1037/0022-3514.52.2.409
Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4, 26–42. doi:10.1037/1040-3590.4.1.26
Griffin, B., & Hesketh, B. (2004). Why openness to experience is not a good predictor of job performance. International Journal of Selection and Assessment, 12, 243–251. doi:10.1111/j.0965-075X.2004.278_1.x
Hausknecht, J. P., Day, D. V., & Thomas, S. C. (2004). Applicant reactions to selection procedures: An updated model and meta-analysis. Personnel Psychology, 57, 639–683. doi:10.1111/j.1744-6570.2004.00003.x
Hofstee, W. K. B. (1994). Who should own the definition of personality? European Journal of Personality, 8, 149–162. doi:10.1002/per.2410080302
Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88, 100–112. doi:10.1037/0021-9010.88.1.100
Hogan, R. T. (1991). Personality and personality measurement. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 2, 2nd ed., pp. 873–919). Palo Alto, CA: Consulting Psychologists Press.
Hogan, R. T. (2007). Personality and the fate of organizations. Mahwah, NJ: Erlbaum.
Hooper, A. C., & Sackett, P. R. (2008, April). Self-presentation on personality measures: A meta-analysis. Paper presented at the 23rd Annual Conference of the Society for Industrial and Organizational Psychology, San Francisco, CA.
Hough, L. M. (1998). Personality at work: Issues and evidence. In M. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection (pp. 131–166). Mahwah, NJ: Erlbaum.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of Applied Psychology, 85, 869–879. doi:10.1037/0021-9010.85.6.869
Ilies, R., Fulmer, I., Spitzmuller, M., & Johnson, M. (2009). Personality and citizenship behavior: The mediating role of job satisfaction. Journal of Applied Psychology, 94, 945–959. doi:10.1037/a0013329
James, L. R., McIntyre, M. D., Glisson, C. A., Green, P. D., Patton, T. W., LeBreton, J. M., . . . Williams, L. J. (2005). A conditional reasoning measure for aggression. Organizational Research Methods, 8, 69–99. doi:10.1177/1094428104272182
*Judge, T. A., & Colbert, A. E. (2001). Personality and leadership: A multi-sample study. Working paper.
Judge, T. A., Higgins, C., Thoresen, C. J., & Barrick, M. R. (1999). The Big Five personality traits, general mental ability, and career success across the life span. Personnel Psychology, 52, 621–652. doi:10.1111/j.1744-6570.1999.tb00174.x
*Kamdar, D., & Dyne, L. V. (2007). The joint effects of personality and workplace social exchange relationships in predicting task performance and citizenship performance. Journal of Applied Psychology, 92, 1286–1298. doi:10.1037/0021-9010.92.5.1286
Kehoe, J. (2000). Research and practice in selection. In J. Kehoe (Ed.), Managing selection in changing organizations: Human resource strategies (pp. 397–437). San Francisco, CA: Jossey-Bass.
Kolar, D. W., Funder, D. C., & Colvin, C. R. (1996). Comparing the accuracy of personality judgments by the self and knowledgeable others. Journal of Personality, 64, 311–337. doi:10.1111/j.1467-6494.1996.tb00513.x
Landy, F. J., Shankster, L. J., & Kohler, S. S. (1994). Personnel selection and placement. Annual Review of Psychology, 45, 261–296. doi:10.1146/annurev.ps.45.020194.001401
*LeDoux, J., & Kluemper, D. H. (2010, August). An examination of potential antecedents and organization-based outcomes of metaperception accuracy. Paper presented at the annual meeting of the Academy of Management, Montreal, Quebec, Canada.
Martinko, M. J., & Gardner, W. L. (1987). The leader/member attribution process. Academy of Management Review, 12, 235–249. doi:10.2307/258532
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616. doi:10.1037/0021-9010.79.4.599
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007). Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60, 683–729. doi:10.1111/j.1744-6570.2007.00089.x
Motowidlo, S. J., Hooper, A. C., & Jackson, H. L. (2006). Implicit policies about relations between personality traits and behavioral effectiveness in situational judgment items. Journal of Applied Psychology, 91, 749–761. doi:10.1037/0021-9010.91.4.749
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for research and practice in human resource management. Research in Personnel and Human Resources Management, 13, 153–200.
*Mount, M. K., Barrick, M. R., & Strauss, J. P. (1994). Validity of observer ratings of the Big Five personality factors. Journal of Applied Psychology, 79, 272–280. doi:10.1037/0021-9010.79.2.272
Murphy, K. R., & DeShon, R. P. (2000). Inter-rater correlations do not estimate the reliability of job performance ratings. Personnel Psychology, 53, 873–900. doi:10.1111/j.1744-6570.2000.tb02421.x
*Nilsen, D. (1995). An investigation of the relationship between personality and leadership performance (Unpublished doctoral dissertation). University of Minnesota, Minneapolis.
Oh, I.-S., & Berry, C. M. (2009). The five-factor model of personality and managerial performance: Validity gains through the use of 360 degree performance ratings. Journal of Applied Psychology, 94, 1498–1513. doi:10.1037/a0017221
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60, 995–1027. doi:10.1111/j.1744-6570.2007.00099.x
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679. doi:10.1037/0021-9010.81.6.660
*Parks, L., & Mount, M. K. (2005). The dark side of self-monitoring: Engaging in counterproductive behavior at work. Best Papers Proceedings of the Academy of Management, 11–16.
Paulhus, D. L. (1991). Enhancement and denial in socially desirable responding. Journal of Personality and Social Psychology, 60, 307–317. doi:10.1037/0022-3514.60.2.307
Paulhus, D. L., & Reid, D. B. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46, 598–609. doi:10.1037/0022-3514.46.3.598
Paunonen, S. V., & Nicol, A. A. M. (2001). The personality hierarchy and the prediction of work behaviors. In B. W. Roberts & R. Hogan (Eds.), Personality psychology in the workplace (pp. 161–191). Washington, DC: American Psychological Association. doi:10.1037/10434-007
*Peterson, R. S., Smith, D. B., Martorana, P. V., & Owens, P. D. (2003). The impact of chief executive officer personality on top management team dynamics: One mechanism by which leadership affects organizational performance. Journal of Applied Psychology, 88, 795–808. doi:10.1037/0021-9010.88.5.795
Ployhart, R. E. (2006). Staffing in the 21st century: New challenges and strategic opportunities. Journal of Management, 32, 868–897. doi:10.1177/0149206306293625
Ployhart, R. E., Schneider, B., & Schmitt, N. (2006). Staffing organizations: Contemporary practice and research. Mahwah, NJ: Erlbaum.
Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903. doi:10.1037/0021-9010.88.5.879
Roberts, B. W., Kuncel, N., Shiner, R. N., Caspi, A., & Goldberg, L. R. (2007). The power of personality: The comparative validity of personality traits, socio-economic status, and cognitive ability for predicting important life outcomes. Perspectives on Psychological Science, 2, 313–345. doi:10.1111/j.1745-6916.2007.00047.x
Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. Advances in Experimental Social Psychology, 10, 173–220. doi:10.1016/S0065-2601(08)60357-3
*Ross, S. M., & Offerman, L. R. (1997). Transformational leaders: Measurement of personality attributes and work group performance. Personality and Social Psychology Bulletin, 23, 1078–1086. doi:10.1177/01461672972310008
Rotundo, M., & Sackett, P. R. (2002). The relative importance of task, citizenship, and counterproductive performance to global ratings of job performance: A policy-capturing approach. Journal of Applied Psychology, 87, 66–80. doi:10.1037/0021-9010.87.1.66
Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419–450. doi:10.1146/annurev.psych.59.103006.093716
Salgado, J. F. (1997). The five factor model of personality and job performance in the European community. Journal of Applied Psychology, 82, 30–43. doi:10.1037/0021-9010.82.1.30
Saucier, G. (1994). Mini-markers: A brief version of Goldberg's unipolar Big-Five markers. Journal of Personality Assessment, 63, 506–516. doi:10.1207/s15327752jpa6303_8
Schmidt, F. L., & Hunter, J. E. (1992). Development of causal models of processes determining job performance. Current Directions in Psychological Science, 1, 89–92. doi:10.1111/1467-8721.ep10768758
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274. doi:10.1037/0033-2909.124.2.262
Schmidt, F. L., & Le, H. (2004). Software for Hunter–Schmidt meta-analysis methods. Iowa City, IA: Department of Management and Organizations, University of Iowa.
Schmidt, F. L., Shaffer, J. A., & Oh, I.-S. (2008). Increased accuracy of range restriction corrections: Implications for the role of personality and general mental ability in job and training performance. Personnel Psychology, 61, 827–868. doi:10.1111/j.1744-6570.2008.00132.x
Schmidt, F. L., Viswesvaran, C., & Ones, D. S. (2000). Reliability is not validity and validity is not reliability. Personnel Psychology, 53, 901–912. doi:10.1111/j.1744-6570.2000.tb02422.x
Schmidt, F. L., & Zimmerman, R. D. (2004). A counterintuitive hypothesis about employment interview validity and some supporting evidence. Journal of Applied Psychology, 89, 553–561. doi:10.1037/0021-9010.89.3.553
Scott, B. A., & Judge, T. A. (2009). The popularity contest at work: Who wins, why, and what do they receive? Journal of Applied Psychology, 94, 20–33. doi:10.1037/a0012951
Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85, 956–970. doi:10.1037/0021-9010.85.6.956
*Shaffer, M. A., Harrison, D. A., Gregersen, H., Black, J. S., & Ferzandi, L. A. (2006). You can take it with you: Individual differences and expatriate effectiveness. Journal of Applied Psychology, 91, 109–125. doi:10.1037/0021-9010.91.1.109
*Small, E. E., & Diffendorff, J. M. (2006). The impact of contextual self-ratings and observer ratings of personality on the personality–performance relationship. Journal of Applied Social Psychology, 36, 297–320. doi:10.1111/j.0021-9029.2006.00009.x
Spain, J. S., Eaton, L. G., & Funder, D. C. (2000). Perspectives on personality: The relative accuracy of self vs. others for the prediction of behavior and emotion. Journal of Personality, 68, 837–867. doi:10.1111/1467-6494.00118
Stewart, G. L. (1999). Trait bandwidth and stages of job performance: Assessing differential effects for conscientiousness and its subtraits. Journal of Applied Psychology, 84, 959–968. doi:10.1037/0021-9010.84.6.959
*Taylor, P. J., Pajo, K., Cheung, G. W., & Stringfield, P. (2004). Dimensionality and validity of a structured telephone reference check procedure. Personnel Psychology, 57, 745–772. doi:10.1111/j.1744-6570.2004.00006.x
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967–993. doi:10.1111/j.1744-6570.2007.00098.x
Tett, R. P., Steele, J. R., & Beauregard, R. S. (2003). Broad and narrow measures on both sides of the personality–job performance relationship. Journal of Organizational Behavior, 24, 335–356. doi:10.1002/job.191
Van Iddekinge, C. H., McFarland, L. A., & Raymark, P. H. (2007). Antecedents of impression management use and effectiveness in a structured interview. Journal of Management, 33, 752–773. doi:10.1177/0149206307305563
Van Iddekinge, C. H., & Ployhart, R. E. (2008). Developments in the criterion-related validation of selection procedures: A critical review and recommendations for practice. Personnel Psychology, 61, 871–925. doi:10.1111/j.1744-6570.2008.00133.x
Van Iddekinge, C. H., Raymark, P. H., & Roth, P. L. (2005). Assessing personality with a structured employment interview: Construct-related validity and susceptibility to response inflation. Journal of Applied Psychology, 90, 536–552. doi:10.1037/0021-9010.90.3.536
Viswesvaran, C., & Ones, D. S. (2000). Measurement error in Big Five factors personality assessment: Reliability generalization across studies and measures. Educational and Psychological Measurement, 60, 224–235. doi:10.1177/00131640021970475
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574. doi:10.1037/0021-9010.81.5.557
Weekley, J. A., Ployhart, R. E., & Harold, C. M. (2004). Personality and situational judgment tests across applicant and incumbent settings: An examination of validity, measurement, and subgroup differences. Human Performance, 17, 433–461. doi:10.1207/s15327043hup1704_5
Whitener, E. M. (1990). Confusion of confidence intervals and credibility intervals in meta-analysis. Journal of Applied Psychology, 75, 315–321. doi:10.1037/0021-9010.75.3.315
*Yoo, T.-Y. (2007). The relationship between HEXACO personality factors and a variety of performance in work organizations. Korean Journal of Industrial and Organizational Psychology, 20, 283–314.
*Zimmerman, R. D., Triana, M. C., & Barrick, M. R. (2010). Predictive criterion-related validity of observer ratings of personality and job-related competencies using multiple raters and multiple performance criteria. Human Performance, 23, 361–378.

Received February 25, 2010
Revision received September 20, 2010
Accepted September 23, 2010
