

An investigation of motivational factors influencing performance ratings:
rating audience and incentive

Sylvia G. Roch
Department of Psychology, University at Albany, State University of New York,
Albany, New York, USA

Received June 2005

Acknowledgements: The author would like to thank Susan Adams, Donhwa Lee, and Laura Roof
for their assistance in data collection. This work was partly completed while the author was a
faculty member at the Illinois Institute of Technology.

Abstract
Purpose – Managers frequently complain that performance ratings are inflated; thus, this study
aims to explore to what extent two motivational factors theoretically associated with accountability,
rating audience and incentive, can influence rating inflation.
Design/methodology/approach – One hundred and forty-nine raters were assigned to one of four
audience conditions (ratee, expert, both ratee and expert – dual, and no audience) and either to an
incentive or no incentive condition.
Findings – Results showed that when an incentive was offered, raters expecting an expert audience
to view their ratings provided significantly lower ratings, and raters expecting a dual audience
provided significantly higher ratings compared to raters not offered an incentive. Furthermore, raters
expecting a ratee audience inflated their ratings, regardless of incentive.
Research limitations/implications – Financial incentives were used in this study and more
research is needed to explore other types of incentives. Nonetheless, this research shows that
incentives influence rating level.
Practical implications – The research suggests that if managers wish to reduce rating inflation,
they should ensure that an audience, other than the person being rated, will view the ratings.
Originality/value – This study is the first to show that feelings of accountability and rating level
are influenced by incentives, and that the audience of the ratings can determine whether incentives
result in lower or higher ratings. Furthermore, it appears that the tendency to inflate ratings given a
ratee audience may be quite powerful, even in the absence of specific incentives.
Keywords Motivation (psychology), Performance measures
Paper type Research paper

One of the most frequent complaints regarding performance appraisal is that ratings
are inflated (Ilgen and Feldman, 1983; Landy and Farr, 1983; Murphy and Cleveland,
1995). Murphy and Cleveland (1995) state that 80 percent of all ratings conducted on a
7-point scale are either a six or a seven. Bretz et al. (1992) surveyed companies regarding
their performance appraisal systems and found that 77 percent of the sampled
companies thought that lenient appraisals jeopardized their appraisal system. Lenient
or inflated ratings make it difficult to compare employees for promotion or
compensation purposes, validate selection instruments, identify training needs, or
justify termination decisions. An increased understanding of the motivational factors
influencing rating inflation may help organizations design performance appraisal
environments that reduce rating inflation.
Harris (1994) reviewed the motivational factors influencing performance ratings
and identified accountability as one such factor. Even though research has provided
some preliminary explanations of how accountability can influence performance
ratings, many questions still remain. The present study focuses on resolving questions
regarding two aspects of accountability: the audience expected to view the ratings and
the incentive attached to them.

Accountability
Most researchers investigating accountability base their work on Tetlock’s extensive
social psychological research investigating accountability. Tetlock (1983) defined
accountability in terms of the pressure to justify one’s opinions to others. Implicit in
this definition is the idea that accountability implies consequences, either positive or
negative (Lerner and Tetlock, 1999).
Tetlock (1985) proposed that individuals who feel accountable use more complex
cognitive strategies and base their opinions and impressions on data, as long as they
do not know the views of their audience (to whom they are accountable). Empirical
work has found that accountable individuals use more self-critical strategies
(Simonson and Nye, 1992; Tetlock, 1983) and form more complex impressions (Tetlock
and Kim, 1987). On the other hand, if accountable individuals know the views of their
audience, they will not engage in complex information processing but will simply
present impressions or opinions that agree with their audience (Tetlock, 1983; Tetlock
et al., 1989).
Furthermore, Lerner and Tetlock (1999), in their review of the accountability
literature, state that accountability is not a unitary phenomenon but do not decompose
accountability into its components. They do, however, categorize accountability
manipulations into four categories:
(1) mere presence;
(2) reason giving;
(3) identifiability; and
(4) evaluation.

Performance appraisal accountability research


The performance appraisal research investigating accountability can also be placed in
these four categories. The first category, mere presence, includes research in which
participants expect that another will observe their performance. Performance appraisal
researchers have investigated mere presence in the context of rating audience;
however, it should be noted that in this context, the accountability effect is probably
not only due to mere presence but also evaluation apprehension and perhaps even
reason-giving and identifiability. Specifically, this research has addressed the question
of whether face-to-face feedback leads to inflated ratings if the person receiving the
feedback is the ratee (ratee audience) or more accurate ratings if the person receiving
the feedback is an expert and not the person being rated (expert audience). Most of
the research using this manipulation investigated a ratee audience, but in this research
participants did not actually deliver the performance appraisal to the ratee; they were
only led to believe that they would.
Fisher (1979), Ilgen and Knowlton (1980), Klimoski and Inks (1990) and Waung and
Highhouse (1997) found that informing raters that they will discuss their ratings in a
face-to-face meeting with the ratee resulted in inflated ratings, especially for low
performing ratees. Jones (1992) informed accountable raters that their ratings would be
discussed not with the ratee but with an expert and informed other raters that they
were free to leave after providing the ratings. In this study, the accountable raters
provided more accurate ratings than nonaccountable raters, as long as they were not
given information regarding the ratee’s previous performance. Accurate ratings,
however, do not necessarily imply lower ratings. Very little relationship exists between
accuracy and rating errors, including leniency (Murphy and Balzer, 1989).
In the most recent study investigating accountability using a face-to-face meeting,
Curtis et al. (2005) investigated the effects of upcoming meetings with the ratee and
with the experimenter. They found that participants who believed that they would be
speaking with the experimenter regarding the ratings they assigned provided
significantly higher ratings than raters informed that they would be speaking with the
ratee. It is not clear, however, whether either condition was significantly different from
the control condition. Thus, it appears that face-to-face feedback to a ratee, especially a
low performing one, will result in inflated ratings but it is not entirely clear whether
feedback to an expert will result in significantly lower ratings.
The second category of accountability manipulations, reason-giving, has also been
used to investigate performance ratings. Lerner and Tetlock (1999) define
reason-giving as informing accountable individuals that they must give reasons for
what they say or do. Only two studies investigating performance appraisal have used
this manipulation, one involving a ratee audience (Klimoski and Inks, 1990) and
another involving an expert audience (Mero and Motowidlo, 1995). Klimoski and Inks,
in addition to investigating the effect of a face-to-face meeting with the ratee, also
included an anonymous written feedback condition. They found no significant
difference between ratings assigned by raters in an anonymous written feedback
condition and a control condition. Based on this finding, Klimoski and Inks concluded
that face-to-face feedback is needed to induce rating inflation. The only other study
investigating reason-giving asked accountable participants to provide an identifiable
written justification of their ratings to an expert (Mero and Motowidlo, 1995). Mero and
Motowidlo found that raters required to provide an identifiable written justification of
their ratings assigned more accurate ratings but not lower ratings. In analyzing the
process data from this study, Mero et al. (2003) found that accountable raters attended
more to ratee behaviors and took better notes, and that these behaviors partly mediated
the relationship between accountability and rating accuracy. Nonetheless, whether
reason-giving alone can result in adjustments in rating level remains unclear; the
existing research indicates that it may not.
Lerner and Tetlock (1999) also listed identifiability as a method of manipulating
accountability. According to Lerner and Tetlock, under conditions of identifiability
individuals expect that what they say or do can be linked to them personally. Not much
performance appraisal research has investigated accountability using identifiability.
One such study, however, was conducted by Antonioni (1994). Antonioni investigated
upward feedback in an organizational setting and found that when asked to provide
their names along with ratings of their supervisors, not surprisingly, raters inflated
their ratings. Also, as mentioned earlier, Mero and Motowidlo (1995) asked their
accountable participants to write identifiable paragraphs and, thus, used a
combination of both reason giving and identifiability to manipulate accountability
but they found no effect on rating level. Another study investigating identifiability,
Roch et al. (2005), found that identifiability and conscientiousness interacted in
predicting rating level in a classroom setting in which students rated their instructor’s
performance. Students relatively low in conscientiousness provided higher ratings
when they could be identified but raters relatively high in conscientiousness were not
influenced by identifiability.
Finally, Lerner and Tetlock (1999) identified evaluation as a type of
accountability manipulation. According to Lerner and Tetlock, for evaluation to
influence accountability, participants must expect that their performance will be
assessed by another according to some normative rules with implied consequences,
and this type of accountability pressure may be especially useful to organizations
wishing to reduce rating inflation. One possible consequence in an organizational
setting is a financial incentive, such as a bonus or raise, based on rating quality. Only
two studies have directly investigated the use of incentives, such as a financial
incentive. One of the two studies, Salvemini et al. (1993), did not conceptualize their
study in terms of accountability but in terms of motivation. Their manipulation
nonetheless included both assessment by another according to normative rules
(assessment by experts for accuracy) and consequences (money). They found that
participants who received a financial incentive provided more accurate ratings. Even
though it appears that participants given a financial incentive provided slightly higher
average ratings than the control participants, Salvemini et al. unfortunately do not
report whether this difference was significant.
Roch and Woehr (1997) also used evaluation to manipulate accountability to both
ratee and expert audiences. They informed raters either that:
. their ratings would be given to the ratee, who would assign an acceptability score
to their ratings; or
. their ratings would be given to an expert, who would assign an accuracy score.

Half of the participants were given the opportunity to receive money based on their
acceptability or accuracy score. Roch and Woehr found no rating inflation for the ratee
audience condition; however, it may be that their raters would have inflated their
ratings if their instructions clearly indicated that “acceptable” ratings were ones that
match the ratee’s self ratings. On the other hand, they found a significant main effect
for the expert audience on rating level (but no interaction with incentive) and concluded
that the knowledge that one’s ratings will be compared with expert ratings is sufficient
to induce lower ratings. This finding raises the question whether knowing that one’s
ratings will be compared with the ratings of different appraisal audiences is sufficient
to influence ratings or is an incentive necessary. Even though Roch and Woehr (1997)
did not find an interaction between their audience and incentive manipulations, they
did find a three-way interaction between these two manipulations and another study
manipulation, effort. Thus, it remains unclear whether, and what kind of, incentives are
needed for ratings to be influenced by potential audiences of the performance ratings.
The research discussed thus far examines various accountability pressures to only
one type of audience at a time. It may often be the case that raters are aware that both
the ratee and their immediate supervisor will view their ratings. For example, both the
ratee and the rater’s supervisor will most likely receive copies of the performance
appraisal. In this situation, individuals should experience conflict, especially when one
party, the supervisor, represents a pull towards more accurate ratings (as long as raters
believe their supervisor expects accurate ratings) and the other party, the ratee,
represents a pull toward more inflated ratings. Rating pressure due to a dual audience
has only been examined in one study (Curtis et al., 2005). Using an upcoming
face-to-face meeting with both the ratee and the experimenter to manipulate
accountability, they found that even though raters in this dual accountability condition
provided more lenient ratings than those in a control condition, the difference was not
significant. The present research will also investigate dual accountability.

Present research and hypotheses


Research questions
The present research addresses four questions based on the existing literature
investigating rater accountability. The first question is whether Klimoski and Inks
(1990) are correct in asserting that a face-to-face feedback session is needed to induce
rating inflation. Antonioni (1994) investigated identifiable written feedback to the ratee
(supervisor in this case) and found rating inflation. Nonetheless, it is possible
that the subordinates in the Antonioni study feared a potential face-to-face meeting
with their supervisor regarding their ratings; therefore, this study does not eliminate
the possibility that a face-to-face meeting is needed to induce rating inflation given a
ratee audience. On the other hand, Roch and Woehr (1997) used evaluation to
manipulate accountability and found no rating inflation in their ratee audience
condition but it may be that their experimental manipulation was confusing. The
ratees may have been confused by the term “acceptable ratings”. Thus, the question of
whether a face-to-face feedback session is the only motivational incentive that can
result in inflated ratings still remains unanswered.
This question is closely related to the broader question of whether an interpersonal
incentive is needed to induce rating accountability and changes in rating level.
Tetlock’s research and almost all other research investigating accountability have
focused on interpersonal pressure, based on the desire to please another person, but it
is possible that other pressures may also induce feelings of accountability and,
therefore, changes in rating level. Specifically, the question of whether other incentives
result in changes in rating levels similar to those found using interpersonal
manipulations of accountability has not been adequately answered and will be
addressed in the present study. Salvemini et al. (1993) provided financial incentives for
accurate ratings, and the incentives resulted in more accurate ratings but not lower
ratings. Roch and Woehr (1997) also used financial incentives but only found a weak
three-way interaction between audience, incentive, and another study variable, effort.
The last two questions address not the specific pressures that induce feelings of
accountability but the effects of accountability. It is clear that accountability can
contribute to rating inflation given a ratee audience but whether it can result in lower
ratings given an expert audience, such as a supervisor or any other person who is
interested in accurate ratings, is less clear. The only study that found that expectations
of an expert audience result in significantly lower ratings is Roch and Woehr (1997). Can
this finding be replicated?
The last question addressed in the present research has only been addressed by one
previous study (Curtis et al., 2005), but with nonsignificant findings and using an
interpersonal manipulation of accountability. What is the effect of a dual audience? Can
a dual audience moderate ratings?
Hypotheses
First of all, Lerner and Tetlock (1999) speculated that a consequence is needed for
accountability to have an effect. If there is evaluation without consequences, most likely
raters will not feel a need to please the rating audience. The greater the consequence, the
more likely raters will try to please the rating audience or, in other words, experience
accountability. Thus, without a consequence, raters should not feel accountable. There
are many different ways to incorporate consequences into the rating process. Murphy
and Cleveland (1995) suggest that if an organization values accuracy, rewards for
accuracy should be provided to employees, such as salary increases, promotions, and
assignments to choice positions. Even though directly assessing accuracy can be
difficult in an organizational setting, Murphy and Cleveland suggest evaluating raters
based on how well they follow rating procedures, on the assumption that the more
closely raters' strategies for evaluating performance mirror procedures likely to result
in accurate ratings, the more accurate the ratings will be.
Thus, though an incentive, such as a financial incentive either in the form of a bonus
or salary increase, may not provide exactly the same motive to match expert or self
ratings as an interpersonal incentive, the end result should still be the same. Given an
incentive, raters should feel increased accountability and adjust their ratings in the
same manner as when experiencing accountability that is interpersonal in origin.
However, without an incentive, raters may not experience much accountability and
thus not change their rating levels.
H1. An incentive is needed for changes in rating levels to occur.
Specifically, a ratee audience may place pressure on the rater to match the ratee's
expected self ratings, most likely inflated ratings. The 360-degree feedback literature
investigating the convergence between self, supervisor, and peer ratings has
consistently found that self-ratings tend to be the highest (Conway and Huffcutt, 1997;
Harris and Schaubroeck, 1988). This effect may be due to the possibility that most
people have an inflated view of themselves. Taylor and Brown (1988) propose that
most people have more positive views of themselves, their abilities, and their control
over the environment than warranted by an objective appraisal. Furthermore, many
self-serving biases have been identified that help to maintain this overly positive view
of ourselves. Among these biases is the tendency to see ourselves as better than
average on any dimension that is both subjective and desirable (Allison et al., 1989). It
may be that the general public also recognizes this tendency. Thus, given an incentive
to match self-ratings, raters may feel pressure to provide inflated ratings.
H2a. Given an incentive, a ratee audience should result in inflated ratings.
On the other hand, raters given an incentive to match expert ratings may have more
difficulty in determining the type of rating that would agree with their audience. Thus,
they should put forth more effort and evaluate the data (Tetlock, 1985), which is ratee
performance in this case. Given that most ratings are inflated (Murphy and Cleveland,
1995) and assuming that actual performance is lower than indicated by most ratings, it
may be that by evaluating performance carefully, raters held accountable to an expert
audience provide less inflated ratings.
H2b. Given an incentive, an expert audience should result in deflated ratings.
Raters in the dual audience condition, however, should feel pressure from the ratee
audience to provide inflated ratings; pressure from the expert audience to provide
lower ratings should temper this inflationary pressure, thus resulting in moderate
ratings – lower than if they had experienced a ratee audience only and higher than if
they had experienced an expert audience only.
H3. Given an incentive, raters whose ratings will be compared with both ratees’
self ratings and experts’ ratings (dual audience) should provide moderate
ratings.

Method
Participants
The participants were 150 undergraduate psychology students from a private
mid-western university who participated for extra credit. One person could not be
included because of missing data, leaving 149 raters: 103 men, 45 women, and one
person who did not complete the gender question.

Materials
Stimulus material. Presentation of ratee performance was via a 25-minute videotape.
This videotape presented four people playing roles in an assessment center allocation
exercise in which each person represented a bureau in an imaginary government. The
purpose of the exercise was to distribute a $10 million surplus among the bureaus.
Performance ratings. The performance rating measure consisted of six Likert type
scales. Each scale corresponded to one of six assessment center dimensions: oral
communication, influencing others, problem solving, flexibility, organizing and
planning, and team building. Clark et al. (1992) developed these dimensions for an
assessment center. Each dimension was rated on a scale of 1-7 (1 = poor to
7 = excellent). A seventh "overall" evaluation scale also was included (1 = poor to
7 = excellent). Two videotaped assessment center participants were rated, one man
and one woman.
The averages of the six dimension ratings for the male and female ratees were used
to determine rating level. Even though the six dimensions conceptually assessed
different constructs, the ratings were highly correlated. Rating dimensions are often
highly correlated (Murphy and Cleveland, 1995). The rating index created by
averaging the ratings resulted in a Cronbach α of 0.75 for the male ratee and 0.80 for
the female ratee.
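For readers who want to reproduce this kind of reliability check, the sketch below shows one way to build such a rating index and compute Cronbach's α in Python. This is not the author's original analysis code; the column names and the simulated ratings are hypothetical stand-ins for the study's data.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of item columns (one row per rater)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each dimension
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 149 raters x 6 dimension ratings (1-7) for one ratee.
rng = np.random.default_rng(0)
ratings = pd.DataFrame(
    rng.integers(3, 8, size=(149, 6)),
    columns=["oral_comm", "influence", "problem_solving",
             "flexibility", "org_planning", "team_building"],
)
ratings["rating_index"] = ratings.iloc[:, :6].mean(axis=1)  # rating level
print(f"alpha = {cronbach_alpha(ratings.iloc[:, :6]):.2f}")
```

With real, correlated dimension ratings the function should return values in the neighborhood of the 0.75-0.80 reported above; the random placeholder data will not.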
Questionnaire. The questionnaire consisted of 13 items assessing demographic
characteristics, understanding of the experimental task, and a series of seven items
designed to assess raters’ perceptions of the task. See the Appendix for the seven
perception questions.
Design and procedure
A 3 (rating audience) × 2 (incentive) × 2 (ratee) factorial design was used with ratee as
the repeated measure. A control condition, not experiencing any of the experimental
manipulations, was also included to determine whether rating inflation or deflation
occurred. All participants, except the control participants, were randomly assigned to a
rating audience condition and an incentive condition before arrival.
Rating audience. Participants were randomly assigned to meet the expectations of
one of three audience conditions: expert, ratee, and dual. In the expert condition,
participants were informed that their ratings would be compared with expert ratings;
in the ratee condition, that their ratings would be compared with self ratings provided
by the ratees; and in the dual condition, that their ratings would be compared with self
ratings provided by the ratees and with expert ratings.
Incentive. Participants were either:
. told that the ten people whose ratings came closest to the relevant comparison
ratings (rating audience condition) would receive $15.00 each; or
. not told of an incentive.

All participants took part in groups of one to five. Participants were instructed that
they would watch a 25-minute videotape of an assessment center group allocation
exercise and that their task would be to rate two of the individuals shown. All
participants were instructed to watch the same two individuals, one man and one
woman. Participants also received instructions regarding their respective experimental
condition. All instructions were presented both verbally and in written form.
After watching the videotape, participants completed two rating forms, one for each
ratee. The name of the person to be rated was on the rating form to ensure that half of the
participants rated the male ratee first and the other half rated the female ratee first.
After participants finished rating, they completed the questionnaire assessing
demographics and perceptions of the task. The experimental session lasted
approximately an hour.

Results
Rating data
To investigate the effect of the experimental manipulations on performance ratings, a
2 (incentive) × 3 (appraisal audience) × 2 (ratee) repeated measures ANOVA was
performed with the average rating for each ratee as the repeated measure. A
repeated measures ANOVA was used instead of a MANOVA based on a
recommendation by Maxwell and Delaney (1990), who note that when the number of
participants is relatively small in comparison to the number of factors, a repeated
measures ANOVA is preferable to a MANOVA because the ANOVA has more power.
The repeated measures ANOVA revealed no main effects or interactions for ratee.
Thus, to simplify data interpretation, data were collapsed across ratees for all further
analyses. A 2 (incentive) × 3 (appraisal audience) between-factors ANOVA revealed a
significant two-way interaction between incentive and appraisal audience,
F(2,108) = 9.28, p < 0.001. To interpret this interaction, three one-way ANOVAs
were performed, one for each rating audience.
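As a rough sketch of this analysis strategy, assuming a long-format data frame with hypothetical columns rating, incentive, and audience (one row per rater, already collapsed across ratees), the omnibus test and the follow-up one-way ANOVAs could look like this with statsmodels; the file name is a placeholder, not part of the original study.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format data, one row per rater, already collapsed
# across the two ratees: columns 'rating' (mean of the six dimensions),
# 'incentive' ('yes'/'no'), and 'audience' ('expert'/'ratee'/'dual').
df = pd.read_csv("ratings.csv")  # placeholder file name

# 2 (incentive) x 3 (audience) between-factors ANOVA.
model = ols("rating ~ C(incentive) * C(audience)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Follow-up: a one-way ANOVA on incentive within each audience condition.
for audience, group in df.groupby("audience"):
    follow_up = ols("rating ~ C(incentive)", data=group).fit()
    print(audience, sm.stats.anova_lm(follow_up, typ=2), sep="\n")
```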
The one-way ANOVAs for the expert and dual audience conditions revealed
significant main effects for incentive, F(1,109) = 4.35, p < 0.05, and F(1,109) = 4.43,
p < 0.05, respectively, but the ANOVA for the ratee audience condition did not reveal a
significant main effect, F(1,109) = 0.36, ns. See Table I for the means and standard
deviations by condition. These results indicate that when provided an incentive, raters
in the expert audience condition provided significantly lower ratings, raters in the dual
condition provided significantly higher ratings, and raters in the ratee condition
provided essentially the same ratings as their respective counterparts in the no
incentive condition.
Next, the ratings were compared with control raters, who did not experience being
held to the expectations of a rating audience or an incentive, to determine which ratings
were significantly inflated or deflated. Surprisingly, in the expert condition, only the
raters not offered a financial incentive provided ratings significantly different from
control raters, inflated ratings in this case. When offered an incentive, raters in the
expert audience condition significantly lowered their ratings, but these ratings were
not significantly lower than the ratings assigned by control raters. In the ratee
audience condition, regardless of incentive, raters provided inflated ratings relative to
the control ratings. Lastly, in the dual audience condition, raters provided ratings
significantly inflated relative to the control raters only when offered an incentive.
Furthermore, it should be noted that when offered a financial incentive, raters in the
dual audience condition did not provide significantly lower ratings than the ratee
audience condition but they did provide significantly higher ratings than the expert
condition (Figure 1).
In summary, H1, stating that an incentive is needed to induce changes in rating
level according to rating audience, received mixed support. The results show that a
financial incentive resulted in different rating patterns for the raters in the expert and
dual audience conditions, but not for raters in the ratee audience condition. H2a, stating
that given an incentive a ratee audience should result in inflated ratings, was
supported. H2b, stating that given an incentive an expert audience should result in
deflated ratings, was only partly supported.

Table I. Mean ratings (and standard deviations) by experimental condition

                             Incentive
Audience condition   Yes                    No                     Mean
Expert               4.37a (0.52), n = 21   4.89b (0.58), n = 18   4.61 (0.60), n = 39
Ratee                4.76b (0.51), n = 21   4.92b (0.48), n = 18   4.84 (0.49), n = 39
Dual                 4.83b (0.66), n = 20   4.26a (0.59), n = 17   4.57 (0.68), n = 37
Mean                 4.65 (0.59), n = 62    4.70 (0.62), n = 53

Note: Means with different superscript letters differ significantly

[Figure 1. Mean rating by rating audience and incentive]

Even though the expert audience condition
provided significantly lower ratings when offered an incentive, these ratings were not
significantly lower than those of the control condition. Lastly, H3 was also partially
supported. Raters in the dual condition did provide significantly inflated ratings, even
though the ratings were not significantly different from those provided by raters in the
ratee audience condition.

Perception questions
Participants rated the perception questions in the post-experimental questionnaire on
a seven-point scale from (1) not at all to (7) very much or (1) strongly disagree to (7)
strongly agree, depending on the wording of the question. Originally, the items asking
to what extent participants felt accountable, responsible, and answerable were to be
combined into an accountability index; however, the correlation between responsible
and the other two variables was smaller than the correlation between accountable and
answerable. Furthermore, the Cronbach α was slightly higher for the two-item index
than for the three-item index. Upon further reflection, it appears that "responsible" may
be a broader term than either accountable or answerable. In an effort to create the best
possible composite index of accountability, only accountable and answerable were
combined, resulting in an index with a Cronbach α of 0.81. A 2 (incentive) × 3 (rating
audience) ANOVA with the accountability index as the dependent variable revealed a
significant main effect for incentive, with participants in the incentive condition
reporting significantly more accountability than participants in the no incentive
condition. Similar ANOVAs were conducted on the remainder of the perception
questions. There were no other significant main effects or interactions. Table II
presents the means by incentive condition.
Lastly, correlations between the perception questions and ratings were computed.
The accountability index and a question asking about accountability to the ratee were
significantly and positively correlated with performance ratings. The correlations
between the perception questions and rating level can be found in Table III.
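A minimal sketch of this correlation analysis follows, again with a hypothetical data file and column names; the two-item index mirrors the accountable/answerable composite described above.

```python
import pandas as pd
from scipy import stats

# Hypothetical file and column names for the seven perception items
# plus the average performance rating per rater.
df = pd.read_csv("perceptions.csv")  # placeholder file name
items = ["accountable", "answerable", "responsible",
         "acct_to_ratees", "acct_to_experts", "deserve", "higher"]

# Two-item accountability index: mean of accountable and answerable.
df["acct_index"] = df[["accountable", "answerable"]].mean(axis=1)

# Pearson correlation of each perception measure with rating level.
for col in ["acct_index"] + items:
    r, p = stats.pearsonr(df[col], df["rating"])
    print(f"{col}: r = {r:.2f}, p = {p:.3f}")
```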
Discussion
The results generally support the hypotheses, especially the hypotheses directly based
on previous research. Two aspects of this study that have received limited research
attention, dual audience and no rating incentive, resulted in some surprising findings.
Thus, the findings both replicate and expand upon the previous literature.
As predicted, raters whose ratings were compared with expert ratings significantly
lowered their ratings when provided with an incentive in comparison to raters not
offered an incentive. Thus, the present results suggest that if consequences are
attached to whether raters match expert ratings, such as supervisor ratings, raters will
assign lower ratings. But, even though raters in the expert condition significantly
lowered their ratings when provided with an incentive, they did not assign ratings
significantly lower than control raters. In order to conclusively say that the
expectations of an expert audience (or any of the other conditions) produce deflated or
inflated ratings, the ratings must differ significantly from those of control raters, and this
was not the case, at least not when offered a financial incentive. Surprisingly, in the
absence of an incentive, raters faced with an expert audience provided significantly
inflated ratings relative to control raters. This finding was unexpected and future
research is needed to both replicate this finding and to provide an explanation.
Nonetheless, the expert condition closely resembled the expert condition in the Roch
and Woehr (1997) study, in which the researchers found that raters held to the
expectations of an expert audience provided significantly lower ratings than control raters.

Table II. Means, standard deviations, and significance tests for the general perception items

                           Incentive        No incentive
Question                   Mean    SD       Mean    SD       F      p
Accountable/answerable     5.22    1.51     4.68    1.40     4.10   0.045
Responsible                5.23    1.62     5.02    1.35     0.50   0.48
Accountable to ratee       4.15    1.89     4.60    1.83     1.59   0.21
Accountable to experts     4.19    1.83     3.88    1.65     0.90   0.35
Deserve rating             6.10    1.37     5.53    1.69     3.62   0.06
Higher rating              2.54    1.67     2.10    1.19     2.62   0.11

Table III. Correlations between all perception items and rating level

                          Average   Accountable/                 Accountable  Accountable  Deserve
Question                  rating    answerable     Responsible   to ratees    to experts   rating
Accountable/answerable    0.21*
Responsible               0.14      0.57**
Accountable to ratees     0.24**    0.42**         0.28**
Accountable to experts    0.04      0.44**         0.28**        0.42**
Deserve rating            0.03      0.07           0.13          -0.04        0.07
Higher rating             0.14      0.09           -0.03         0.09         0.18*        -0.26**

Note: *p < 0.05; **p < 0.01
As mentioned, in the present study, the raters in the expert audience condition
also assigned lower ratings but only when given an incentive and, even then, they did
not assign significantly deflated ratings. The difference between these two studies may
be a function of power: The Roch and Woehr study had almost three times as many
subjects. An examination of the effect sizes suggests that this may be the case. Relative
to the control raters, the effect of being offered a financial incentive to match expert
ratings was very small (d = 0.08); even though Roch and Woehr found a slightly
larger effect (d = 0.20), it was still small. Thus, it appears that, at best, pressure
to match expert ratings results in a small reduction in rating level, relative to
individuals not experiencing any pressure. Of course, other manipulations of
accountability to an expert may result in larger effects. Future research is needed to
determine if this is the case.
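For reference, the d values quoted above are standardized mean differences; a small helper along these lines computes Cohen's d from summary statistics. The call below merely illustrates it with the expert-audience cells of Table I (incentive vs no incentive); the reported d = 0.08 instead compared the incentive cell with the control group, whose summary statistics are not tabled here.

```python
import math

def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Cohen's d based on the pooled standard deviation of two groups."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Illustration with the expert-audience cells of Table I (incentive vs
# no incentive); negative d means lower ratings in the incentive cell.
print(f"d = {cohens_d(4.37, 0.52, 21, 4.89, 0.58, 18):.2f}")
```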
Furthermore, as hypothesized, raters in the dual audience condition who were
provided with an incentive assigned significantly inflated ratings relative to control
and expert raters, even though their ratings were not significantly different from those
provided by raters in the ratee condition. To put this effect in perspective, the dual
audience raters did not assign ratings significantly different from control raters in the
absence of a financial incentive but significantly inflated their ratings when offered an
incentive. It may be that when given an incentive, dual audience raters weighed the
ratee’s self-ratings more heavily than when not offered an incentive. In the absence of
an incentive, it may be that raters were not motivated to spend much time thinking
about the type of ratings expected by the various audiences and provided ratings
similar to those of the control condition. These results replicate and expand upon the
findings of Curtis et al. (2005), who did not find significant differences between a
control condition and a dual audience condition using an upcoming face-to-face
meeting with both the ratee and the experimenter to manipulate accountability. It may
be that this face-to-face meeting in a laboratory setting with the ratee and experimenter
was not a sufficient incentive for raters to significantly inflate ratings. It appears that a
stronger incentive, such as a financial incentive, is needed for raters to inflate
performance ratings in a dual accountability situation than when held accountable
only to the ratee.
As predicted, raters in the ratee audience condition given an incentive generated
inflated ratings. This finding is significant for several reasons. First, it expands on
previous research investigating ratee accountability by showing that a face-to-face
meeting with the ratee is not needed for raters to inflate their ratings. Previous
research (Ilgen and Knowlton, 1980; Klimoski and Inks, 1990; Waung and
Highhouse, 1997) found that raters held accountable to the ratee provided
significantly inflated ratings, but only when required to provide face-to-face
feedback. Previous attempts to produce rating inflation with other manipulations of
accountability have been unsuccessful. For example, Klimoski
and Inks (1990) found no significant rating difference between control and
anonymous written ratee feedback conditions, and Roch and Woehr (1997) found no
significant difference between their control and ratee audience conditions. Even
though the Roch and Woehr study may appear similar to the present study, as
mentioned earlier, they may have confused raters by asking them to assign ratings
acceptable to the ratee. The task in the present study was more easily understood:
provide ratings matching the ratee’s self-ratings.
Another important but surprising finding is that raters not given an incentive but
only told that their ratings would be compared with the ratee's self ratings also provided
significantly inflated ratings, indicating that rater inflation can occur even without a
specific incentive. It should be noted, however, that the no incentive condition was not
devoid of motivational pressure. All participants received extra credit to participate in
this experiment. Even though the extra credit was not contingent on their performance,
participants nonetheless may have experienced motivational pressure. It may be that
this situational pressure, slight as it may be, was sufficient for participants to feel some
accountability. The perception questions provide support for this possibility; nearly all
participants experienced accountability. The means for the questions assessing to
what extent participants felt accountable and answerable were five or above on a
7-point scale.
Thus, the present results show that even when relatively naïve raters are asked to
provide ratings matching ratees’ self ratings, they provide inflated ratings, even
without much incentive. It may be that raters knew the ratee audience’s views – high
ratings. As mentioned earlier, we know from previous research that people often have
more positive views of themselves, their abilities, and their control over the
environment than what others perceive (Taylor and Brown, 1988). Furthermore, the
research investigating 360-degree feedback has found that self ratings tend to be inflated in
comparison with other rating sources, such as supervisors or peers (Conway and
Huffcutt, 1997; Harris and Schaubroeck, 1988). The present research suggests that the
finding that self-ratings are inflated may be common knowledge and that whenever a
rater wishes to provide ratings close to a ratee's self-ratings, either to please the ratee or
for any other purpose, the rater will provide inflated ratings. Thus, it is not surprising
that 80 percent of all ratings conducted on a 7-point scale are either a six or seven
(Murphy and Cleveland, 1995), and that one of the most frequent complaints about
performance appraisal is that ratings are inflated (Ilgen and Feldman, 1983; Landy and
Farr, 1983; Longenecker et al., 1987; Murphy and Cleveland, 1995).
It may be that it takes less cognitive effort to simply give inflated ratings when
given the task of matching an audience’s self ratings than to closely view the
performance to determine how experts or experts and ratees rated the performance. In
other words, even though the situation had adequate motivational pressure to induce
rating inflation for raters held accountable to a ratee audience regardless of
incentive, an incentive may have been needed to motivate the raters held accountable
to an expert or dual audience. Because it is relatively cognitively demanding to
estimate expert ratings, or both expert and self ratings, raters in these audience
conditions may not have been motivated to put forth the necessary cognitive effort
until given a financial incentive, resulting in different rating patterns for the
incentive versus no incentive conditions. Thus, when faced with a more
cognitively demanding task than only matching self ratings, an incentive appears to be
needed to influence rating levels.
Furthermore, the present research shows that interpersonal incentives, such as a
face-to-face feedback session, are not the only way to induce feelings of accountability.
Participants in the incentive condition reported feeling significantly more
accountable/answerable than participants in the no incentive condition. It should be
noted, however, that the type of rating audience did not directly relate to feelings of
accountability. It may be that the type of audience does not directly determine the
amount of accountability but only the direction of the accountability effect.
Also, in the present study, the incentive consisted of a financial bonus to the raters
who provided ratings that most closely resembled their audience’s ratings – expert,
ratee, or dual. In an organizational setting, it would be possible to provide similar
financial incentives, but in this case the expert would most likely be an upper level
supervisor or a human resource professional. If the upper level supervisor or HR
professional cannot directly assess the accuracy of the manager’s ratings, the
supervisor or HR professional could still distribute a bonus based on factors such as
perceived rating quality, as evidenced by how well the rater followed the rating
procedures, to promote less inflated ratings. As Murphy and Cleveland (1995) suggest,
if raters follow strategies that should result in more accurate ratings, most likely the
ratings will be more accurate. Other ways of assessing rating quality include
examining other indicators of performance. For example, if subordinates are rated
highly, do objective indicators of their performance also suggest good performance?
If a rater recommends a subordinate for promotion based on superior performance
ratings, does this subordinate receive good ratings in his/her new position? Thus, it is
possible to provide incentives that may reduce rating inflation in an organizational
setting.
Furthermore, even though financial incentives are a convenient method of
promoting less inflated performance ratings, other incentives also could be used. For
example, as suggested by Murphy and Cleveland (1995), how well a rater conducts
performance evaluation could be considered part of the rater’s performance ratings or
tied to valued outcomes, such as a choice position, office space, or even parking space.
More research is needed to determine whether these incentives have similar influences
on rating level. Nonetheless, it appears that incentives can play an important role in
determining feelings of accountability and rating level.
The present study has some potential limitations. One concern shared by all
laboratory research is limited external validity. The present results have unknown
generalizability to other subject populations and research settings. Only replication
can determine the generalizability of this study, but as Dipboye (1990) reminds us, all
studies, both laboratory and field, need to be replicated to establish generalizability.
Also, as Mook (1983) mentions, the purpose of some research is to test theories and not
to generalize. The present study falls in that category. The purpose of the present
research was to investigate the combined effects of certain rating audiences and incentive
on rating level within an accountability framework. Future research is needed to see if
the results generalize to other populations. It should be noted, however, that the length
and amount of contact with the ratee was not unlike that found in assessment centers.
Raters conducting performance appraisals in organizations, however, do have more
contact with their ratees, and the impact of accountability may be even greater in
organizational settings. In the present study, the raters did not know their ratees, yet
they all felt accountable. In organizations, raters know the ratee and the expert (i.e.
supervisor), which should magnify the effects revealed in the present study. Thus, even
though the effect of accountability may be greater in the field than in the laboratory,
the direction of the effect itself should be generalizable.
Furthermore, it should be noted that in the present study the focus was on rating
level and not accuracy. Without true scores, it is impossible to determine rating
accuracy, and it is difficult to obtain true scores in most organizations. Thus, in the
present study the focus was on how to structure the rating environment to reduce
rating inflation, which would allow employers to discriminate between employees for
purposes such as promotion or compensation. It is difficult to discriminate between
employees when 80 percent of employees are rated six or higher on a 7-point scale.
Lastly, the experts in this study were not clearly identified. In an organizational
setting, the experts are usually the rater's supervisors; however, not all supervisors
value accurate ratings. Some supervisors may expect other types of ratings from the
raters. For example, if the rater knows that his/her supervisor thinks highly of one of
the ratees, the rater may assign inflated ratings to please the supervisor. Longenecker
et al. (1987) showed that often managers do not value accurate ratings. Thus, the
results regarding experts may generalize only to supervisors who communicate that
they value accuracy and who do not express prior opinions regarding the ratees.
Further research is needed to determine if this is indeed the case. The results, however,
demonstrate that if upper level managers wish for their subordinates to provide less
inflated ratings, it may be beneficial to publicly espouse the value of providing
accurate ratings, even though the exact accuracy of the ratings may not be easily
determined.
In summary, the present research reveals that feelings of accountability and rating
level can be manipulated via incentives, demonstrating that incentives other than the
type of interpersonal incentives traditionally included in the study of accountability
can influence rating level and feelings of accountability. Furthermore, the present
research suggests that the tendency to inflate ratings may be quite powerful, even in
the absence of specific incentives. The finding that just telling raters that their ratings
will be compared with the ratee’s self ratings causes rating inflation should be of
interest to both researchers and practitioners. This is especially relevant in
organizations in which the ratees provide self-ratings and the rater’s supervisor
does not examine the ratings. In this type of situation, there is much pressure to inflate
ratings. The present research suggests that one simple way to counteract this inflation
pressure is by involving another audience in the rating process. The dual
accountability raters did not provide inflated ratings when no incentive was offered,
indicating that in situations in which no financial incentives are offered, as in most
organizational settings, simply telling the raters that both the ratee and their
supervisor will view the ratings may be sufficient to reduce the rating inflation, as long
as the supervisor espouses the importance of accurate ratings. It appears that the worst
possible environment for rating inflation is one in which raters’ managers do not
express any interest in reviewing the performance ratings, and only the ratee reviews
the ratings.
The present results also have implications for assessment centers. The results
reinforce the importance of requiring more than one rater to rate each assessment
center participant. By having several raters view a participant’s behavior, preferably
the same behavior, raters should feel dual accountability – to their fellow raters, who
should value accuracy in this situation, and to the assessment center participant,
especially if the rater must give feedback to the participant. More research is needed to
explore dual accountability. The more we learn about how the rating context can
influence ratings, the better we can structure the rating context to produce the type of
ratings we deem appropriate.
References
Allison, S.T., Messick, D.M. and Goethals, G.R. (1989), "On being better but not smarter than
others: the Muhammad Ali effect”, Social Cognition, Vol. 7, pp. 275-96.
Antonioni, D. (1994), “The effects of feedback accountability on upward appraisal ratings”,
Personnel Psychology, Vol. 47, pp. 349-56.
Bretz, R.D. Jr, Milkovich, G.T. and Read, W. (1992), “The current state of performance appraisal
research and practice: concerns, directions, and implications", Journal of Management,
Vol. 18 No. 2, pp. 321-52.
Clark, K., Arthur, W. Jr, Bennett, W. Jr and Hedley-Goode, A. (1992), Designing a Higher
Education Administrator Developmental Workshop for College Faculty, Texas A&M
University, College Station, TX.
Conway, J. and Huffcutt, A.I. (1997), “Psychometric properties of multisource ratings: a
meta-analysis of subordinate, supervisor, peer, and self-ratings”, Human Performance,
Vol. 10 No. 4, pp. 331-60.
Curtis, A.B., Harvey, R.D. and Ravden, D. (2005), “Sources of political distortions in performance
appraisals”, Group and Organization Management, Vol. 30, pp. 42-60.
Dipboye, R.L. (1990), “Laboratory vs field research in industrial and organizational psychology”,
in Cooper, C.L. and Robertson, I.T. (Eds), International Review of Industrial and
Organizational Psychology, Wiley, New York, NY, Vol. 5, pp. 1-34.
Fisher, C.D. (1979), “The transmission of positive and negative feedback to subordinates: a
laboratory investigation”, Journal of Applied Psychology, Vol. 64, pp. 533-40.
Harris, M.M. (1994), “Rater motivation in the performance appraisal context: a theoretical
framework”, Journal of Management, Vol. 20 No. 4, pp. 737-56.
Harris, M.M. and Schaubroeck, J. (1988), “A meta-analysis of self-supervisor, self-peer, and
peer-supervisor ratings”, Personnel Psychology, Vol. 41, pp. 43-62.
Ilgen, D.R. and Feldman, J.M. (1983), “Performance appraisal: a process focus”, in Cummings, L.
and Staw, B. (Eds), Research in Organizational Behavior, Vol. 5, JAI Press, Greenwich, CT.
Ilgen, D.R. and Knowlton, W.A. Jr (1980), “Performance attributional effects on feedback from
superiors”, Organizational Behavior and Human Performance, Vol. 25, pp. 441-56.
Jones, R.G. (1992), “Attention allocation choices as an influence on accuracy and discriminant
validity of observational ratings: when are we diligent?”, unpublished doctoral
dissertation, Ohio State University, Columbus, OH.
Klimoski, R. and Inks, L. (1990), “Accountability forces in performance appraisal”,
Organizational Behavior and Human Decision Processes, Vol. 45, pp. 194-208.
Landy, F.J. and Farr, J.L. (1983), The Measurement of Work Performance: Methods, Theory, and
Applications, Academic Press, New York, NY.
Lerner, J.S. and Tetlock, P.E. (1999), “Accounting for the effects of accountability”, Psychological
Bulletin, Vol. 125 No. 2, pp. 255-75.
Longenecker, C.O., Sims, H.P. and Gioia, D.A. (1987), “Behind the mask: the politics of employee
appraisal”, The Academy of Management Executive, Vol. 1 No. 3, pp. 183-93.
Maxwell, S. and Delaney, H.D. (1990), Designing Experiments and Analyzing Data, Wadsworth,
Belmont, CA.
Mero, N.P. and Motowidlo, S.J. (1995), “Effects of rater accountability on the accuracy and
favorability of performance ratings”, Journal of Applied Psychology, Vol. 80 No. 4, pp. 517-24.
Mero, N.P., Motowidlo, S.J. and Anna, A.L. (2003), “Effects of accountability on rating behavior
and rater accuracy", Journal of Applied Social Psychology, Vol. 33, pp. 2493-514.
Mook, D.G. (1983), "In defense of external invalidity", American Psychologist, Vol. 38, pp. 379-87.
Murphy, K.R. and Balzer, W.K. (1989), "Rater errors and rating accuracy", Journal of Applied
Psychology, Vol. 74 No. 4, pp. 619-24.
Murphy, K.R. and Cleveland, J.N. (1995), Understanding Performance Appraisal: Social,
Organizational, and Goal-based Perspectives, Sage, Thousand Oaks, CA.
Roch, S.G. and Woehr, D.J. (1997), “The effect of rater motivation on the accuracy of performance
appraisal: an NPI approach", poster presented at the annual meeting of the American
Psychological Association, Chicago, IL, August.
Roch, S.G., Ayman, R., Newhouse, N.K. and Harris (2005), “Effect of identifiability, rating
audience, and conscientiousness on rating level”, International Journal of Selection and
Assessment, Vol. 13, pp. 53-62.
Salvemini, N.J., Reilly, R.R. and Smither, J.W. (1993), “The influence of rater motivation on
assimilation effects and accuracy in performance ratings”, Organizational Behavior and
Human Decision Processes, Vol. 55, pp. 41-60.
Simonson, I. and Nye, P. (1992), “The effect of accountability on susceptibility to decision error”,
Organizational Behavior and Human Decision Processes, Vol. 51, pp. 416-46.
Taylor, S.E. and Brown, J.D. (1988), "Illusion and well-being: a social psychological perspective on
mental health”, Psychological Bulletin, Vol. 103, pp. 193-210.
Tetlock, P.E. (1983), “Accountability and complexity of thought”, Journal of Personality and
Social Psychology, Vol. 45, pp. 74-83.
Tetlock, P.E. (1985), “Accountability: the neglected social context of judgment and choice”, in
Staw, B.M. and Cummings, L. (Eds), Research in Organizational Behavior, Vol. 7, JAI Press,
Greenwich, CT, pp. 297-332.
Tetlock, P.E. and Kim, J.I. (1987), “Accountability and judgment processes in a personality
prediction task”, Journal of Personality and Social Psychology, Vol. 52, pp. 700-9.
Tetlock, P.E., Skitka, L. and Boettger, R. (1989), “Social and cognitive strategies for coping with
accountability: conformity, complexity, and bolstering”, Journal of Personality and Social
Psychology, Vol. 57, pp. 632-40.
Waung, M. and Highhouse, S. (1997), “Fear of conflict and empathic buffering: two explanations
for the inflation of performance feedback”, Organizational Behavior and Human Decision
Processes, Vol. 71 No. 1, pp. 37-54.

Appendix. Perception questions

(1) To what extent did you feel accountable for your ratings?
(2) To what extent did you feel responsible for your ratings?
(3) To what extent did you feel answerable for your ratings?
(4) I thought that my ratings would be viewed more favorably if I gave the participants
higher ratings than I thought they deserved.
(5) I thought that my ratings would be viewed more favorably if I gave the participants the
ratings I thought they deserved.
(6) To what extent did you feel accountable to Cindy and Robert for your ratings?
(7) To what extent did you feel accountable to experts for your ratings?
