
The Leadership Quarterly 28 (2017) 153–177


A review, analysis, and extension of peer-leader feedback agreement: Contrasting group aggregate agreement vs. self-other agreement using entity analytics and visualization☆

Steven E. Markham a,⁎, Ina S. Markham b, Janice Witt Smith c

a Department of Management, Pamplin College of Business, Virginia Tech, Blacksburg, VA 24061, United States
b Department of Computer Information Systems & Business Analytics, College of Business, MSC 0202, James Madison University, Harrisonburg, VA 22807, United States
c Department of Management and Marketing, College of Arts, Science, Business, & Education, Winston-Salem State University, Winston-Salem, NC 27110, United States

☆ For making the data available, special thanks to Bill Gentry, Ph.D., Senior Research Scientist at CCL, and the staff at the Center for Creative Leadership in Greensboro, North Carolina. This work was supported by the College of Business, James Madison University, and the Department of Management at the Pamplin College of Business, Virginia Tech.
⁎ Corresponding author. E-mail addresses: markhami@vt.edu (S.E. Markham), markhais@jmu.edu (I.S. Markham), drjwsmith@jwsmithconsulting.com (J.W. Smith).

http://dx.doi.org/10.1016/j.leaqua.2016.10.001

Article history: Received 26 May 2016; Received in revised form 9 September 2016; Accepted 4 October 2016; Available online 26 October 2016

Keywords: 360° feedback; Dyadic peer feedback; Leadership; WABA; Self-other agreement; Organizational visualization

Abstract

In reviewing peer-leader feedback within Multi-Source Feedback programs, the group aggregate agreement (GAA) method is contrasted with self-other agreement (SOA). Past research (Markham, Smith, et al., 2014) has demonstrated convergence problems with GAA for groups of peer raters. To evaluate dyadic convergence, we used the Benchmarks data to investigate two derailment factors (Building & Mending Relationships and Interpersonal Problems) for 4607 peers describing 1505 focal respondents. For high-agreement dyads, r(total) = −0.66**, with 88% of the combined variance and covariance based on dyadic averages converging as whole units. Only 50% of all dyads demonstrated this type of high convergence. For low-agreement dyads, the matching correlation (r(total) = −0.56**) was almost exclusively a function of within-dyad divergence, with only 4% stemming from between-dyad sources. Research implications for evaluating SOA under these agreement conditions are highlighted. Practitioner applications for using an entity-based visualization of dyads also are prototyped and discussed.

© 2016 Elsevier Inc. All rights reserved.

Introduction & background

Peer ratings and multisource feedback program

Multisource Feedback (MSF) programs are essential tools for organizations that are engaged in formal leadership development
programs (Atwater & Waldman, 1998). A wealth of anecdotal experiences exists suggesting that the comparison of one's percep-
tions with others' perceptions can be transformative, under the right conditions. It is not clear, however, if the effort required to
gather information from the focal respondents, their bosses, their subordinates, their peers, and others provides equally valuable
information. This overarching issue of how to provide feedback has important theoretical and practical implications in light of the
large numbers of participants who experience a 360° MSF program each year (Van Velsor, McCauley, & Ruderman, 2010). There is
also a perceived need to dramatically improve the efficacy of leadership development programs, as argued by Kellerman (2012) and Pfeffer (2015). They have criticized the leadership development and feedback industry in the popular press, especially in
terms of the large sums of money spent while returning such small benefits. As such, the underlying themes of this article in-
clude: (1) reviewing the importance of peer feedback, especially within the 360° context, (2) identifying problems with tradition-
al methods for analyzing and reporting peer feedback, and (3) developing alternative analysis and reporting approaches for peer
feedback.
The first purpose of this research is to review our knowledge concerning peer feedback and how it is tied to fundamental data
configuration issues, regardless of whether averages of groups of peer raters are examined, averages of dyadic pairings of a focal
respondent with a single peer are used, or if simple individual reports are utilized. Researchers have debated the meaning and
utility of peer reports in an MSF feedback context for a number of years (Abdulla, 2008; Bettenhausen & Fedor, 1997; Conway,
Lombardo, & Sanders, 2001; Cushing, Abbott, Lothian, Hall, & Westwood, 2011; Dalessio & Vasilopulos, 2001; Dominick, Reilly,
& McGourty, 1997; Feudo, Vining-Bethea, Shulman, Shedlin, & Burleson, 1998; Facteau, Facteau, Schoel, Russell, & Poteet, 1998;
Facteau & Craig, 2001; Furnham & Stringfield, 1998; Mayo, Kakarika, Pastor, & Brutus, 2012). The superior-subordinate self-
other agreement (SOA) literature is often used for guidance in this debate (Heidemeier & Moser, 2009).
There is sufficient evidence to suggest that peer feedback is significantly associated with a number of positive outcomes for
individuals, teams, and organizations (Antonioni & Park, 2001; Atwater & Brett, 2006; Bettenhausen & Fedor, 1997; Byrd,
Martin, Nichols, & Edmondson, 2015). At the same time, peer reports are also substantially different from subordinate feedback
so that they cannot be considered duplicate information. In fact, it may be the case that peer feedback works better if the
peers are not anonymous, unlike other sources of feedback (Bamberger, Erev, Kimmel, & Oref-Chen, 2005). Finally, however,
peer feedback reports suffer from the same problem as do subordinate reports when matched against the focal respondents' re-
ports, which is the kernel of SOA. In both cases, while statistically significant, there are relatively small amounts of variance in the
focal reports that are explained by the raters' reports (Zhou & Schriesheim, 2009).
A second purpose of this research is to empirically demonstrate dyads as an alternative level of analysis that can be used in-
stead of whole groups for analyzing peer reports. This will help bridge the gap between the practitioner focus on GAA and the
academic focus on SOA. Researchers are confronted with the question of how we know when there is sufficient variation between
rater groups coupled with adequate convergence within such groups to justify creating an average for a particular respondent
group. The practical issue of aggregability (defined as meeting the requirements for statistical and practical justification so as to
aggregate hierarchically-nested, granular data to higher level entities without creating statistical artifacts or false inferences),
can be summarized quite simply. If feedback reports are being generated based upon averages that are essentially statistical arti-
facts, then false or misleading feedback is being given to focal respondents. It is also not clear if the feedback from these three
grouped sources (subordinates, peers, and others, such as customers) should be aggregated in order to provide feedback. Thus,
we pose a fundamental research question: Does convergence operate at the same level of analysis for peer reports as it does
for direct subordinate reports?
The examination of leadership processes that focus on dyads instead of whole groups has a long history (Dansereau, 1995).
Similarly, the general analysis of dyadic data also has been advocated by Kenny, Kashy, and Cook (2006), as has the recommen-
dation that multi-level methods (MLM) be applied to multisource feedback (Yammarino, 2003). However, dyadic studies using
complete multi-level methods are scarce (Gooty & Yammarino, 2011), and, to date, they rarely have been applied to multisource
feedback data. An exception to this observation was recently illustrated in a predecessor article (Markham, Markham, & Smith,
2015) in which the dyadic level of analysis was operationalized for pairs of focal respondents and direct subordinates. In that
study, the dyadic level of analysis provided more useful, convergent information than the whole group level of analysis. In this
study, we will apply the dyadic level of analysis to peer data so as to determine its configurational characteristics compared to
whole groups of peers.
The third purpose of this research is to develop alternative approaches to providing peer-based feedback. One potential tool is
a visual 3D display, prototyped below, that will simultaneously show the scores of the focal respondent, the direct subordinates,
and the peers based on the Within And Between Analysis (WABA) methodology. What does it mean to develop such types of
displays for dyadic feedback reports? The foundation to this question can be partially provided by the field of organizational
visualization (Markham, 1988, 1998; Markham, Markham, & Braekkan, 2014b) in which a variety of 2D representations have
been offered. Specifically, a 3D visualization of 360° data can provide an alternative way of communicating to focal respondents
the nature of their feedback, especially when the average scores of rating groups should not be utilized.

The importance and use of peer feedback

Peer reports can be a valuable source of feedback beyond the information provided by subordinates or bosses. Peers represent
a unique social resource. They are not part of the direct chain of command, and thus less subject to rules, constraints, and cen-
soring. They constitute a potentially much larger pool of raters when compared to the smaller number of direct subordinates.
Peers are usually considered “friendlies” when they are nominated by the focal respondent, in contrast to direct subordinates
who often might be hostile to the focal respondent. Finally, peers can provide a comparison group for the focal leader when seek-
ing role model information, political alliances, and tacit information about the social system, etc.
Evidence for the value of peer feedback in 360° leadership development programs is relatively clear: it is valuable. However, it is difficult to ascertain exactly how much value it provides and in what manner, because most of the research is difficult to interpret.
Peer feedback is clearly correlated with a variety of leadership effectiveness measures. However, it is also substantially different
from other sources of feedback, be they subordinates or bosses. As such, it has often been a subsidiary concern because of the

emphasis on direct subordinates and bosses. As a result, it has rarely been a central focus, despite its positive research record. In
fact, the use of peer reports has often outperformed other rating sources (Brutus, Fleenor, & Tisak, 1999).
Outside of the management research domain, there is a surprisingly large literature that suggests that peer reports and peer
relations are helpful as well as sometimes critical factors for an individual's success. These additional interest groups include med-
ical doctors (Boerboom et al., 2011; Donnon, Al Ansari, Al Alawi, & Violato, 2014; Lee, Tsai, Chiu, & Ho, 2016; Lockyer, Violato, &
Fidler, 2003; Roberts, Campbell, Richards, & Wright, 2013; Saedon, Salleh, Balakrishnan, Imray, & Saedon, 2012; Wright et al.,
2016), nurses (DeStephano, Crawford, Jashi, & Wold, 2014; Fedor, Bettenhausen, & Davis, 1999; van Schaik, O'Sullivan, Eva,
Irby, & Regehr, 2016), K-12 teachers (Howe & Stubbs, 2003), college and high school learners (Diab, 2016; Gielen, Peeters,
Dochy, Onghena, & Struyven, 2010), patients in recovery programs (Firmin, Luther, Lysaker, & Salyers, 2015; Naslund,
Aschbrenner, Marsch, & Bartels, 2016; Perreault et al., 2016), and at-risk teenagers (Feudo et al., 1998; Kalafat & Elias, 1992;
McIntosh, MacDonald, & McKeganey, 2006; Pedersen et al., 2013; Podschun, 1993).
Given the ubiquity and importance of peer relations and feedback for businesses (Baker, 2014), it should not be surprising that
peers also can be a very important source of political support, tacit knowledge, and social comparison for managers. This is despite
the fact that peer relations are much less frequently studied and reported when compared to the plethora of studies examining
the direct hierarchical chain composed of the boss, the focal leader, and the direct reports of that focal leader.

The objective: understanding the problem of derailment

The history of 360° programs is tied to leadership development, and in particular to the notion of derailment, a long-term re-
search initiative led by the CCL starting in the 1980s. In fact, leadership derailment factors have become the focus of a dedicated
literature (Braddy, Gooty, Fleenor, & Yammarino, 2014; Leslie & Braddy, 2013). This research stream attempts to identify the kinds
of developmental activities and training necessary for individuals to attain senior-level positions. Supported by the CCL, the Exec-
utive Derailment research project captured data from individuals who were successful and then compared this information to
data from executives who had been only somewhat successful; in other words, those who had been derailed. This comparison
group had not been fired, but they had not reached the organizational level that they and the organization expected them to
achieve.
By way of definition, a leadership derailment factor is any item or event that has been shown to sidetrack a manager's career
such that he or she is no longer considered “hi-po” (high potential) by the organization (Zenger & Folkman, 2009). Derailment is a
precursor process that leads ultimately to sidelining, underutilization, demotion, voluntary turnover, premature retirement, or
outright firing (Carson et al., 2012). While derailed managers are not often fired immediately, they usually are removed from
an internal pool of candidates for promotion (Herbst, 2014; Thuraisingham, 2010). Numerous studies have supported the inves-
tigation of derailment (Zenger & Folkman, 2009) and have found that ratings from peers and subordinates, aside from self-other
agreement between subordinates and focal respondents, are important when predicting the risk of derailment (Braddy et al.,
2014).
The significance of the problem of leader derailment is especially salient for the special case of CEOs who fail (Dotlich & Cairo,
2003). While there are a number of alternative explanations for these expensive failures, one of the most important has to do with an
inability to manage interpersonal issues and problems, a key derailment concept (Conger & Nadler, 2004). The significance of the
issue of derailment for leaders in the boardroom is just as pronounced as it is for middle management, if not more so (Furnham,
2010). There is some data to suggest that a variety of personality disorders and a lack of interpersonal skills are more evident at
higher levels in organizations (Tang, Dai, & De Meuse, 2013) which can lead to these terminations. Because financial and organi-
zational costs associated with CEO failures are so high (Stoddard & Wyckoff, 2008), this single area can stand alone as a significant
problem within the leadership domain.
We will be using the same model and subset of variables that were analyzed by Markham et al. (2015) who examined the
extent to which focal respondents and direct subordinates could be universally modeled as whole dyadic units. They found
that focal respondent-subordinate dyads did not universally converge. Previous work by Markham, Smith, Markham, and
Braekkan (2014a) found that the reports from subordinates, peers, and others should not be aggregated at the rating group
level of analysis. The data for these studies were provided by the Center for Creative Leadership (CCL), and they are part of a
much larger set contained within the Benchmarks 360° Instrument that focuses on preventing managerial derailment (Leslie &
Braddy, 2013). In this study, we will be using the same two derailment precursor variables from a larger set: (1) Building &
Mending Relationships and (2) Problems with Interpersonal Relationships. The first scale, Building & Mending Relationships, re-
fers to the types of activities and the amount of effort spent by the subject in reaching out to and investing in his or her social
support network. The second, Problems with Interpersonal Relationships, refers to the severity and difficulty of interpersonal problems
faced by the focal respondent. Both variables are highly salient to the problem of derailment (Williams, Campbell, McCartney, &
Gooding, 2013).

The problems with peer feedback reports

Contrasting group aggregate agreement (GAA) with self-other agreement (SOA)

Within MSF programs, the traditional method for reporting data from groups of raters (subordinates, peers, or others) is to
calculate an average score and use that either in its raw form or as a percentile. This is the essence of the group aggregate

agreement (GAA) approach. The self-other agreement (SOA) model examines matched pairs of reports between the focal respon-
dent (aka, self) and a single rater (aka, a peer, a subordinate, a customer, etc.). If GAA can be viewed as a horizontal form of ag-
gregation where the members of the rating groups are at the same hierarchical level of analysis, then SOA can be seen as a
vertical, one-on-one event. From the perspective of a leadership development professional who uses 360° surveys, there is little
question that the GAA model is dominant. This is because data are collected from peers with the proviso that no individual will be
revealed or reported. Concomitantly, an average for the group of peer raters will only be calculated if a certain minimum number
of raters respond. This leads to an underlying question: Should peer feedback even be organized and reported by averaging each
focal respondent's feedback group? This method would imply an average leadership style that can be perceived by each group
member, whether the group is composed of subordinates or peers.
The analysis of peer reports suffers from the same type of problems as does the analysis of direct subordinate reports in 360°
programs — namely, a reliance on GAA approaches which, despite their popularity in the field, have been shown to
be fraught with difficulties from a methodological point of view. Past research (Markham et al., 2014a) has demonstrated conver-
gence problems with the GAA method for feedback from groups of peer raters, and no studies have examined and tested the dy-
adic convergence between peers and focal leaders.
In contrast to the practitioner preference for GAA-based techniques, the SOA approach has garnered much more interest from
academics, due largely to the difficulty of reporting about individual raters who might lose their anonymity. As an alter-
native to the GAA approach, the SOA method inherently implies the following question: Should the data be examined by pairing a
self-report of the focal respondent with an individual peer report, thereby pointing towards a more individualized, dyadic style of
leadership, which is not necessarily consistent across all members of the rating group?

Whole groups of peer raters: the problem with group aggregate agreement (GAA) models

The traditional method of dealing with raters, such as peers, is to calculate an average score for each rater group, regardless of
the type of rater. Alternatively, some form of inter-rater agreement might be used (Brown & Hauenstein, 2005). In either event, a
group in this context is defined as all of the raters of the same type who are describing the same focal respondent. Many consult-
ing organizations, including the CCL, have a procedure for adding an asterisk to a group average if there is more than a three-point
spread among respondents. Nevertheless, that average is still reported and interpreted.
In theory, for the GAA approach to make sense, the creation of these averages should meet minimum criteria to prevent the
introduction of statistical artifacts (Dansereau, Cho, & Yammarino, 2006). These criteria include: (1) significant differences be-
tween focal groups, (2) minimal variation within focal rating groups, (3) significant correlations based on the focal group aver-
ages, and (4) simultaneously no substantial residual correlations within the groups. If these criteria can be met as either a
weak or strong inference, then a “whole” unit effect has been identified. In quick overview, this is the essence of the WABA in-
ferential system (Dansereau, Alutto, & Yammarino, 1984). The WABA system was selected for this study because of its focus on
identifying and testing significant entities as well as significant relationships between variables. In its updated version
(Markham et al., 2014b), it forms the essence of the term “entity analytics”.
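To make this inferential logic concrete, the following minimal sketch (our illustration, not the DETECT implementation) decomposes two stacked variables into between- and within-entity components and returns the etas and correlations that WABA compares; the function name and output format are our own, and the entity ids could index rating groups here or dyads in the later analyses.

```python
import numpy as np

def waba_decompose(entity_ids, x, y):
    """Split each raw score into a between-entity part (the entity mean)
    and a within-entity part (the deviation from that mean), then return
    the eta coefficients and the between/within correlations."""
    entity_ids = np.asarray(entity_ids)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def split(v):
        # Assign each report its entity's mean; the remainder is the deviation.
        means = {e: v[entity_ids == e].mean() for e in np.unique(entity_ids)}
        between = np.array([means[e] for e in entity_ids])
        return between, v - between

    def eta(part, total):
        # Proportion of the total standard deviation carried by this part.
        return part.std() / total.std()

    bx, wx = split(x)
    by, wy = split(y)
    return {
        "eta_between_x": eta(bx, x), "eta_within_x": eta(wx, x),
        "eta_between_y": eta(by, y), "eta_within_y": eta(wy, y),
        "r_between": np.corrcoef(bx, by)[0, 1],  # based on entity averages
        "r_within": np.corrcoef(wx, wy)[0, 1],   # based on deviation scores
    }
```

With stacked data, the components returned here should recombine to the total raw-score correlation, which offers a direct check of the decomposition against np.corrcoef(x, y).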
This whole-group approach was used on the peer groups reported in a previous article (See Tables 2 and 6, Markham et al.,
2014a). These results for peer rating groups are summarized here. The raw score, individually-based correlation for the peers'
views of Building & Mending Relations compared to their views of Interpersonal Problems was r = −0.77**, with a correlation based on peer group averages of r = −0.83** and a within-rating group correlation of r = −0.70**. It is this large, within-rater
group correlation in conjunction with moderately significant between-group F tests that is problematical. (The within-group cor-
relation is calculated from deviation scores above and below the group's mean.) It represents a different configuration for the
group in which the dispersion within the group becomes of central interest. The combination of these two factors prevents a
strong, whole group inference from being drawn. When examining the aggregability of the peer groups, these results were on
the border between the Equivocal and Weak Whole Group conditions. It was concluded that peer group ratings should not uni-
versally be aggregated and reported especially in light of a further analysis of the dispersion of peer rating group coefficients of
variation statistics (CV). The 2014 article noted above only looked at peer data; it did not examine any convergence correlations
between peers and focal respondents, as does the current article.

The meaning of SOA convergence (or its lack)

The pursuit of SOA convergence has a history independent of the MSF movement. Originally, researchers considered close con-
vergence between a “self” (focal respondent) and a rater to be an ideal that was universally desirable. As such, this form of dyadic
convergence fits closely into a growing literature that emphasizes the importance of dyads in the workplace (Ferris et al., 2009).
When examining SOA in 360° programs, the use of dyads is an increasing trend (Gooty & Yammarino, 2011, 2016; Yammarino,
2003). Despite this growth, the meaning of convergence between a focal leader and a rater is not clear. The desirability of close
convergence might be interpreted as a form of “self-awareness” on the part of the focal respondent (Van Velsor, Taylor, & Leslie,
1993). Convergence has been interpreted as the ability to successfully read social cues from others and thereby have highly con-
sistent views about leadership behavior. Tang et al. (2013) and Swanson et al. (2010) represent a group of researchers who would
like to equate SOA convergence with self-awareness. Those leaders who show more convergence are therefore more self-aware
and better able to read and control social situations (Cullen, Gentry, & Yammarino, 2015). Consequently, they will be better
able to handle feedback in a non-defensive manner (Taylor & Bright, 2011). While this use of “gap analysis” (Calhoun, Rider,

Peterson, & Meyer, 2010) might be an example of this convergence approach, there is little data to support it. With respect to the
self-awareness concept, Day and Dragoni (2015) also confirm its appeal, despite the fact that there is little evidence yet to support
a relationship between rater agreement and leader effectiveness (Day, Fleenor, Atwater, Sturm, & McKee, 2014).
Part of the problem with defining SOA exclusively as a form of self-awareness is that it ignores valid reasons for differences in
the perspectives of the focal respondent and the other. Because there are a number of alternative explanations for the meaning of
convergence between “self and other”, more research needs to be done to rule them out. However, one point is clear in support of
the multiple interpretation approach: namely, convergence correlations in leadership are notoriously low. They might be statisti-
cally significant, but they do not share much common variance. This point has been explained theoretically and documented em-
pirically by Schriesheim and his students (Schriesheim, Wu, & Cooper, 2011; Zhou & Schriesheim, 2009, 2010). This research
group has shown that for superior-subordinate pairings that are queried about Leader Member Exchange (LMX), convergence
is highly varied; some dyads converged closely, many did not. The lack of convergence would appear to be just as likely as
that of close convergence. Similarly low correlations in the range of r = 0.22 have been found in a wide variety of superior-sub-
ordinate SOA research (Heidemeier & Moser, 2009). These results have similar orders of magnitude to peer-leader findings (Tang
et al., 2013).
From a theoretical point of view, it is attractive to wish to examine this discrepancy as a lack of self-awareness on the part of
the focal respondent. We disagree because there are a number of alternative explanations for the lack of convergence. The lack of
convergence might reflect valid sources of perceptual or attitudinal differences. Research by Armstrong, Allinson, and Hayes
(2004) used the Myers-Briggs Type Indicator (MBTI) which can explain such discrepancies. In a dyadic context, superiors who
were intuitives were seen as less nurturing and more dominant. Dyads composed of the non-intuitives reported the most positive
outcomes. Another alternative explanation for this difference is that the focal respondent might use a form of persuasion (hard,
soft, or rational) that does not fit well with the rater's preferences (Berson & Sosik, 2007). In a similar vein, how a focal leader

scores on the Big 5 personality traits also affects these rating outcomes in a 360° program (Bergman, Lornudd, Sjoberg, & Schwarz, 2014).

[Fig. 1. Self-other agreement congruence matrices. Four panels — Matrix 1: traditional individual-level convergence matrix; Matrix 2: superimposition of a circle showing misclassification of mid-range scores; Matrix 3: superimposition of confidence bands for the individual-level hypothesized convergence relationship; Matrix 4: superimposition of curvilinear confidence bands on dyadic averages. Adapted from Cogliser et al. (2009).]

The problems in SOA methods

A series of articles in the 1990s shifted the discussion of the feedback convergence problem away from the aggregation of rater
groups' reports to a comparison of a focal respondent with just one rater, which is a version of SOA. In this case, the focal respon-
dent could be compared with a single subordinate, a single boss, or a single peer. These authors introduced the general term “self-
perception accuracy”, especially as it applies to the 360° feedback context (Atwater & Yammarino, 1997; Church & Bracken, 1997;
Fleenor, McCauley, & Brutus, 1996; Nowack, 1997; Yammarino & Atwater, 1993). Three specific terms, convergence, congru-
ence, and in-agreement, were presented across these articles to describe the SOA condition of accurate self-perceptions. Originally,
deviation scores were utilized to categorize different levels of agreement among self-other pairings (Yammarino & Atwater, 1993).
Due to methodological problems in using difference scores (Edwards, 1994, 1995, 2001; Edwards & Parry, 1993), Atwater, Ostroff,
Yammarino, and Fleenor (1998) and later authors moved to the recommended use of polynomial regression to reveal a more
complex picture of congruence than what could be revealed by a simple use of under-estimators and over-estimators.
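To illustrate the recommended approach, the sketch below gives a simplified rendering of polynomial regression for congruence, regressing an outcome on the centered self and other ratings plus their quadratic and interaction terms; the function and variable names are ours, and Edwards' full response-surface testing procedure involves additional steps beyond this fit.

```python
import numpy as np

def congruence_surface(self_rating, other_rating, outcome):
    """Edwards-style polynomial regression: outcome on S, O, S^2, S*O, O^2.
    Ratings are centered first to reduce collinearity among the terms."""
    s = np.asarray(self_rating, dtype=float)
    o = np.asarray(other_rating, dtype=float)
    s -= s.mean()
    o -= o.mean()
    X = np.column_stack([np.ones_like(s), s, o, s**2, s * o, o**2])
    beta, *_ = np.linalg.lstsq(X, np.asarray(outcome, dtype=float), rcond=None)
    return beta  # [b0, bS, bO, bS2, bSO, bO2] for response-surface tests
```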
Nevertheless, the attraction of the notion of self-perception accuracy remains, regardless of the method used to investigate it.
The above research helped provide the basis for Cogliser, Schriesheim, Scandura, and Gardner (2009, p. 454), who summarized this
line of research with the matrix shown in the upper left quadrant of Fig. 1. This 2 × 2 matrix is designed to capture the issue of
over- and under-estimation when comparing these two sources of rating information. This figure has been re-created with a mod-
ification. Specifically, Cogliser, et al. used the term balance to indicate convergence or congruence. Furthermore, their original fig-
ure was presented from the perspective of the follower. Instead, we use the terms over- and under-estimator, and focus on the
leader, which is the terminology most often used in this SOA literature on leadership feedback.
Note that Fig. 1 has four quadrants. The upper right quadrant represents a condition of relative agreement and, at the same
time, high average scores between the two rating sources. The lower left quadrant also represents a condition of relative congru-
ence, also termed in-agreement (e.g., see Berson & Sosik, 2007; Devos, Hulpia, Tuytens, & Sinnaeve, 2013); however, this agree-
ment converges on low ratings, not high ratings. Both off-diagonal quadrants represent conditions of low convergence. In the
upper left quadrant, the focal respondent overestimates, and in the lower right quadrant the focal respondent underestimates.
This matrix recognizes that not all of the dyadic pairings will necessarily produce scores that converge. This lack of convergence
can occur for very valid reasons. For example, there may be hostility on the part of a subordinate who had been a favorite of the
previous leader, but not the current one. Some peers could also be antagonistic because they had wanted the job that had been
given to the focal respondent. Other peers or subordinates may have a problem with the focal respondent that they are not will-
ing to be open about sharing. This could result in a form of sabotage that would contaminate whatever reports they are making
about the focal respondent. Alternatively, it could also be the case that the focal respondent, while applying some variation of in-
dividualized dyadic leadership (Dansereau et al., 1995), intentionally has a different perspective on the type of leadership actions
taken. As a result, there is a theoretical basis for all four quadrants of this congruence matrix; however, at the same time, it should
not be surprising that the convergence correlations between focal respondents and their peers are modest, at best.
In summary, this matrix is a theoretically attractive option because it acknowledges the problem of overestimation and underestimation, and it has been used by many researchers (Tang et al., 2013). It also sidesteps the problem of aggregating all members of a group of raters in order to provide a single number for feedback purposes. Nevertheless, it still suffers from weaknesses, even when tested via polynomial regression.
What are these weaknesses? If one looks at the hypothesized 45° congruence regression line as shown in Matrix 2 of Fig. 1,
then it is evident that there is an artificial dichotomy that has been introduced by the traditional matrix. Specifically, a group
of individuals who share essentially similar feedback ratings and similar degrees of convergence is represented by the circle at the center of this matrix. However, despite their similarity, these individuals were assigned to different quadrants. This
type of misclassification problem is unavoidable despite the negligible differences between them. Inevitably, the use of a dichot-
omous classification that is superimposed on continuous variables will lead to a loss of information. At the same time, the resulting analysis will be weaker and less accurate in detecting any effects.
An alternative to this categorization would be to put confidence intervals around the hypothesized, ideal regression line as
shown in Matrix 3 of Fig. 1. Respondents can then be viewed as a function of the degree of overestimation or underestimation
as a continuous variable measured by the distance from the regression line and the distance from the confidence bands that
are superimposed. This alternative would make it easier to see the degree of convergence as a continuous variable that is orthog-
onal to the regression line. It also would allow different degrees of convergence to be defined by various confidence intervals.
More importantly, this alternative permits for the possibility of having all ranges of the ratings come into play, not just low versus
high ratings.
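Under our reading of Matrix 3, this continuous measure can be written as the signed perpendicular distance of a dyad's point (s, o) from the hypothesized 45° line o = s; the notation is ours, not the article's:

```latex
% Signed perpendicular distance from the hypothesized congruence line o = s
% (positive d = the focal leader over-estimates; negative d = under-estimates).
d = \frac{s - o}{\sqrt{2}}
```

The confidence bands in Matrix 3 then correspond to fixed cutoffs on |d|, so degrees of convergence can be graded rather than dichotomized.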
While it may appear that the solution represented by Matrix 3 of Fig. 1 has advantages over the traditional approach, it is not
without its own difficulties. Specifically, this approach does not work at multiple levels of analysis. While it may work for individ-
uals, it does not work for dyads. Recent work by Gooty and Yammarino (2016) suggests that the SOA problem of convergence can
be conceptualized as an entity identification problem. In other words, the testing of entities built around dyads and rating groups
is highly salient to MSF programs. From a practical point of view, this is akin to asking if there is sufficient congruence between
the focal respondent and a rater such that a whole, unitary dyad (as defined above) can be identified. For the same reason, it may well be that a "dispersed dyadic" model is supported, in which the lack of congruence in the data is paramount, not the

degree of convergence. Their conclusions using different techniques are very similar to the findings of Markham et al. (2015) who
reported that for focal respondent-direct subordinate pairings, under some boundary conditions, whole dyads could be identified,
while under other conditions, a dispersed dyadic model worked best.
Matrix 4 in Fig. 1 is an operationalization of this framework incorporating the dyadic level of analysis. By way of explanation,
note first that Matrix 4 applies to dyadic averages, not individual reports. Also note that these confidence intervals are curvilinear
with respect to the idealized regression line (adapted from Markham et al., 2015). In other words, if the self-peer pairing pro-
duces a very low average score, then there is very little freedom for either rater's score to vary from the average. What constitutes
a high degree of divergence under this condition will be very different from the mid-range of scores where either the focal re-
spondent or the peer can use the full range of the scales. The same notion applies at the high-end of the rating scale: there is
very little latitude for divergence by either rater. In contrast, at the middle of the rating scale, the opportunity for divergence is
maximized, where one respondent can give the lowest possible score while the other respondent can give the highest possible
score. This type of curvilinear confidence interval has been superimposed on Matrix 4 of Fig. 1. Additionally, at the peer group
level of analysis, Markham et al. (2014b, 2014a) also found that the confidence intervals around peer rater group averages
were curvilinear, not linear.
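The curvature of these bands follows directly from the bounded response scale; assuming a 1-to-5 scale, the largest possible within-dyad gap for a dyad with average m is fixed by the nearer scale endpoint (our derivation of the constraint just described):

```latex
% Maximum possible within-dyad gap for a dyad with average m on a 1-5 scale:
% zero at the endpoints (m = 1 or m = 5) and largest (4 points) at m = 3.
\left| x_{\mathrm{self}} - x_{\mathrm{peer}} \right| \le 2\,\min(m - 1,\; 5 - m)
```

For example, a dyad averaging 1.5 can disagree by at most one scale point, while a dyad averaging 3.0 can disagree by the full four points, which is why a fixed-width linear band would be too permissive at the extremes and too strict in the middle.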
The direct application of Matrix 4 in Fig. 1 supports differential rules about what constitutes a dispersed dyadic condition. In
representing the dyadic SOA pairings, regardless of whether the other is a subordinate, a boss, or a peer, if researchers plan to use
these dyadic averages then they need to be restricted to those pairings which fall within a relatively narrow, curvilinear confi-
dence band around the hypothesized convergence line. This is because the average scores are meaningful and are substantially
larger than whatever residuals might occur within the dyads. On the other hand, if researchers wish to investigate the phenomena
of overestimation and underestimation, then they would want to do this with the deviation scores from the dyad averages, but
only using those dyads that have enough dispersion within to be meaningful. Thus, researchers should not use dyads whose av-
erages are near the hypothesized regression line, which represents close convergence. Instead, they would have to choose those
nominal dyads outside of the confidence limits in Matrix 4 because they would represent dyads where the reports are dispersed
substantially enough to be of statistical interest (Castro, 2002; Gooty, Serban, Thomas, Gavin, & Yammarino, 2012).

Working with dyads: self-other agreement

Given the previous weak findings when looking at whole groups of peer raters (Markham et al., 2014a), a shift to the dyadic level of analysis represents a viable alternative (Gooty & Yammarino, 2016). In support of this shift, the value of the
dyadic approach was shown for direct subordinate reports in Markham et al. (2015), the companion piece for this article.
What does it mean to analyze matched leadership reports at the dyadic level of analysis? In other words, what might a dyadic
entity look like?
The answer to this question can be partially provided by the field of organizational visualization (Markham, 1998, 2010;
Markham et al., 2014b). Figure 2 suggests three alternative visualizations of a dyad, which is defined as an organizational entity
composed of both a focal respondent and an “other”. In this study, we concentrate exclusively on the pairing of a focal respondent
with a single peer rater, and we test the data to see which type of dyadic data configuration best suits the data.

[Fig. 2. 2D visualization of alternative views of dyadic leadership pairings of self-reports with peer-reports. Three panels, each on a 1.0–5.0 rating scale: Unitary Dyads ("Whole"); Nominal Only, but Independent ("Equivocal"); Dispersed Dyads ("Parts").]

On the left side of the figure, a unitary dyad composed of a self-report and a single report from a peer rater is shown
along with a vertical axis indicating a traditional rating scale of 1 to 5. “Unitary” in this context means that the pairing can be
considered as a single, whole dyadic unit. Three attributes lead to this conclusion. First, the two reports closely overlap (as
shown by the single circle shaded in a gradient gray in Fig. 2). Second, there is minimal variation within the dyad (as shown
by the relatively short vertical ellipse). Third, the use of the dyadic mean represents the pairing as a single entity (as shown
by the black triangle and the small confidence band around it). By definition, a unitary dyad will have close convergence in
the reports from both members. As a necessary corollary, in this unitary configuration, a small amount of variability within
each dyad also implies that most of the variance and covariance will exist between dyads. Hence, there is a large vertical spread
between the two different dyads illustrated under the unitary, whole dyad condition.
At the far right of Fig. 2, a dispersed dyad (traditionally called a “parts” condition) is shown. In this case, there is maximum
variation and covariation within dyads, and very few, if any, meaningful differences between the dyadic averages. It can also
be termed a "frog pond" effect that focuses on within-dyad differences (Alicke, Zell, & Bloom, 2010; Bachman & O'Malley,
1986; Espenshade, Hale, & Chung, 2005; Firebaugh, 1980; Jiang, Probst, & Benson, 2014; Markham & McKee, 1991; Pierro,
Presaghi, Higgins, Klein, & Kruglanski, 2012; Zell & Alicke, 2009). Notice in the case of the far right image in Fig. 2 that the pairing
of the two reports based on nominal dyadic membership is still crucial; however, instead of creating a dyadic mean, within-dyad
difference scores are highlighted and used. This is why the triangle showing the dyadic mean is left unshaded: it represents a sta-
tistical artifact. By definition, a dispersed dyad will have a lack of convergence in the reports from both members.
Figure 2 also illustrates a third alternative in the center column. In this view neither a dyadic average nor dyadic difference
scores should be utilized. In this third condition, the two individuals who are reporting are nominally part of a formally assigned
dyad, but, for whatever reason, their scores should be viewed independently of the dyad. In contrast to a unitary dyad (focusing
on average scores) or a dispersed dyad (focusing on within-dyad dispersion scores), this last alternative, the nominal dyad, exists
in name only and would be considered an equivocal finding.

Inductive entity testing: Are leader-peer dyads universally aggregable?

We will focus upon the manner in which the two variables of interest are configured around our entities, in this case a pairing
of matched reports from the focal leader and a peer. To do this, we will use the inferential system of WABA as developed
by Dansereau and his colleagues (Dansereau et al., 1984; Dansereau & McConnell, 2000). We will apply this system in its induc-
tive, entity testing mode as recently illustrated by Markham, Markham, and Braekkan (2014b) and as generally recommended by
Yammarino, Dionne, Chun, and Dansereau (2005).
The central issue when testing dyads as entities appears to be straightforward: Should each dyadic pairing of a focal respon-
dent and peer who are both describing the same leadership behavior be aggregated into a single, dyadic average? We also pose a
related question: How strong must the evidence be in order to proceed with a universal approach to creating averages for all
dyads?
From a methodological perspective, how do we identify this type of dyadic convergence? Is it enough to simply align all of
these feedback groups as cells in a one-way Analysis of Variance (ANOVA), and then calculate the F statistic and proceed to av-
erage the scores? While any ANOVA test is an improvement over past practices with respect to aggregation, our experience has
been that there are additional issues and complexities that should be addressed before concluding that all dyads should be rep-
resented by their mean on any given scale.
The simultaneous consideration of these two questions forms the basis for the inductive approach to testing multiple configurations of the data using WABA. Substantial between-dyad variance should be coupled with minimal within-dyad variance and
covariance if a strong inference is to be drawn towards the whole-dyad condition. If a dispersed dyadic configuration is to be de-
tected, then the reverse should be found: minimal contribution of the between-dyad sources of variance and covariance with
maximum contribution from the within-dyad components. This application of the WABA approach leads to an epistemological do-
main of science in which multiple models are simultaneously examined in order to reach a relatively stronger inference. This
“strong inference” approach was advocated in the classic article by Platt (1964), and it calls for examining data based on different
models in order to induce a conclusion (Yammarino et al., 2005). As a way of operationalizing this strong inference approach to
test variables and entities, the following research question is offered.
RQ 1. Does the agreement represented by each dyadic pairing of a self-report and a peer-report work equally well across all
dyads in terms of showing maximum convergence?
Given the possibility that a clear answer cannot be determined for this research question, we need to be able to assess those
dyads that are convergent and those that are not. This inquiry leads to our second research question.

The search for boundary conditions: the agreement indicator using the coefficient of variation

For each dyad, we will calculate its coefficient of variation (CV) as an indicator of convergence. The lower the
CV, the closer the members' scores converge. This leads to our second research question.
RQ 2. Assuming that whole dyad effects can be detected for peers-focal respondent pairings, what percentage of these pairings
would be considered as having an unacceptably large range of within-dyad variability such that creating a dyadic average would
not be justified?
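A minimal sketch of this CV screen for RQ 2 is shown below; the computation follows the standard CV definition (standard deviation over mean), but the flagging threshold is a placeholder assumption, not the study's cutoff.

```python
import numpy as np

def dyad_cv(self_score, peer_score):
    """Coefficient of variation for one dyad's pair of reports:
    standard deviation divided by mean (lower CV = tighter convergence)."""
    pair = np.array([self_score, peer_score], dtype=float)
    return pair.std() / pair.mean()

# Flag dyads whose within-dyad spread is too large to justify averaging;
# the 0.30 cutoff is an illustrative assumption, not the article's value.
dyads = {101: (4.0, 3.8), 102: (1.5, 4.5)}
too_dispersed = [d for d, pair in dyads.items() if dyad_cv(*pair) > 0.30]
```

Here dyad 101 (CV ≈ 0.03) would be averaged, while dyad 102 (CV = 0.50) would be set aside as too dispersed for a meaningful dyadic mean.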

The search for boundary conditions: higher levels of analysis

In multi-level studies, the possibility of finding an effect at a given level without checking to see if a higher level of analysis
might account for the results is always a potential threat to the study's validity. In this case, if we discover dyadic level effects,
we would also wish to minimize any chance that a higher level (in this case, whole rating groups of peers) might explain the
results. This observation leads to our third research question.
RQ 3. Do all of the dyadic pairings between the focal respondent and each peer work the same within each rating group such
that a whole group effect will be found at the higher level of analysis?

Methodology

Setting

For this study we use the same data set as did Markham et al. (2014a). We did this for the express purpose of comparing their
peer group level results with the proposed dyadic analyses described herein. The current investigation uniquely incorporates the
focal respondents' information along with their peer information. The CCL, a premier center for leadership training and develop-
ment since the 1970s, provided access to their data. Their flagship 360° leadership feedback instrument, Benchmarks (Leslie &
Braddy, 2013), has been used by more than 200,000 participants.
These data were drawn from 1505 focal respondents and 4607 peers between 2005 and 2009 who had been nominated
for the 360° feedback experience by their organization. The usual protocols for issuing invitations to peers, subordinates, and
others (such as customers) were used as per the CCL manual.
The Benchmarks 360° Instrument was developed as a comprehensive assessment for middle- and upper-level managers to ex-
amine 16 competencies that research indicates are critical for success as well as five potential career derailers. The results of the
Benchmarks report are norm referenced against respondents at the same level (middle or upper management), whether in public
or private sector, and (upon request) based on particular industries. According to CCL, the Benchmarks 360° Instrument was de-
veloped for, and should continue to be used for, developmental purposes only, and not for selection, compensation, or perfor-
mance evaluation. The feedback report belongs to the focal respondent, and only that individual can share the feedback results
with others. This usually occurs after their confidential feedback debriefing with a certified facilitator. Respondents other than
the boss provide anonymous feedback, unless they have agreed to disclosure.

Measures

The Benchmarks 360° Instrument is composed of four overall content areas: (a) leading the organization; (b) leading others;
(c) leading yourself; and (d) problems that can stall a career. Because the last area, also called “derailment” factors (Braddy et al.,
2014), is paramount to the feedback process, a scale from this area will be the focus of this project. As noted above, leadership
derailment is a well-researched domain focusing on the processes leading to leader underutilization and termination.
From a convergence perspective, the focal leaders describe how well they engage in a leadership activity, such as Building &
Mending relationships, or how much of a problem they experience with a derailment factor, such as Interpersonal Problems. Peer
raters describe how well the focal leaders engage in a leadership behavior or how much of a problem the focal respondent expe-
riences. In other words, the focal respondent and rater describe the same phenomenon.

Variable 1: Building & Mending Relationships

Background information about these scales, their norms, and their developmental history can be found in Leslie and Braddy
(2013). The description for this competency is also explained in the Benchmarks Feedback Report: Knows how to build and main-
tain working relationships with co-workers and external parties; can negotiate and handle work problems without alienating peo-
ple; understands others and is able to get their cooperation in non-authority relationships.
Three sample items from the 11 items comprising this larger scale include: (1) gets things done without creating unnecessary
adversarial relationships; (2) tends to understand what other people think before making judgments about them; and (3) settles
problems with external groups without alienating them. The general population average for Building & Mending Relationships is
about 3.88, with an alpha of 0.91.

Variable 2: Problems with Interpersonal Relationships

The scale, Problems with Interpersonal Relationships, is a direct derailment factor that follows from Building & Mending Rela-
tionships, a key leadership competency. The description for this 10-item scale was taken from the Benchmarks Feedback Report:
Difficulties in developing good working relationships with others. Sample items are: (1) is arrogant (e.g., devalues the contribu-
tions of others); (2) makes direct reports or peers feel stupid or unintelligent; and (3) adopts a bullying style under stress.
Notice that, due to the phrasing of these items, lower scores are more favorable. As a result, any expected correlation with
Building Relationships will be negative. The mean value for the general CCL population for Problems with Interpersonal Relation-
ships is about 1.7. The alpha for this competency is 0.93.

Variable 3: Focal leader and peer age

We have included the age of the respondents so as to be consistent with previous research in this series, which has found no direct effect for age; it therefore will not be included in later analyses. For an example of how to analyze age at the dyadic level, see Markham et al. (2015).

Analytical techniques

Software and inferential system

As in the companion research (Markham et al., 2014a), the Within and Between Analysis (WABA) inferential method is well
suited to the detection of entity effects (in this case a whole dyad effect) (Dansereau et al., 1984). As demonstrated by Yammarino
(2003), it is also useful in analyzing multi-source feedback data. All relevant statistical and practical tests of significance will be
calculated for dyad-based effects for the entire population of focal role incumbents. From a software perspective, both SPSS and DETECT (Dansereau & McConnell, 2000) are used. (A free version of the DETECT software, related manuals, and ancillary articles can be downloaded from the CLS website; contact the Center for Leadership Studies at Binghamton University: http://www.cls.binghamton.edu/#!detect-waba/c21c2.)
In addition to the univariate tests of statistical and practical significance, WABA's Cumulative R statistic can provide a joint,
multivariate estimate of both the contribution of the variability between dyads and the covariability among variables based on
these dyads in detecting self-other agreement (SOA). We will be able to apply the same inferential procedures regarding the Cu-
mulative R statistics as used by Markham et al. (2014b, 2014a) where a full discussion of the Cumulative R statistic can be found.
They will be adapted for the dyadic level of analysis.
There are two versions of the Cumulative R statistic that are compared. The Cumulative R(between) condenses information about both between-dyad differences and correlations based on between-dyad averages. The Cumulative R(within) encapsulates information from both the within-dyad variances and the within-dyad correlations in one term. Observe that the Cumulative R(between) is the multiplicative product of the between-dyad variance and covariance components, as is the Cumulative R(within) term for the corresponding
within-dyad components. When these two terms are added together, they should always sum to the total, raw score correlation
of the two variables of interest.
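In standard WABA notation (our rendering of the identity just described), with η denoting the between- and within-dyad eta coefficients for each variable and r(B), r(W) the between- and within-dyad correlations:

```latex
% The WABA identity behind the two Cumulative R statistics: the between and
% within components always sum to the total, raw-score correlation.
r_{\mathrm{total}}
  = \underbrace{\eta_{B_x}\,\eta_{B_y}\,r_{B}}_{\text{Cumulative } R_{\mathrm{between}}}
  \; + \;
  \underbrace{\eta_{W_x}\,\eta_{W_y}\,r_{W}}_{\text{Cumulative } R_{\mathrm{within}}}
```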

Descriptive statistics and database layouts

Table 1 shows the variables' means, standard deviations, and correlations for the focal respondents and their peers. The underlying data are formatted in a traditional matrix such that the self-report for a variable and the peer report are contained in
the same row. To understand the importance of the database layout, consult the far left image in Fig. 3. This is the original data-
base layout from CCL where each person, regardless of whether they are a focal respondent, a peer, or a subordinate is assigned to
a different row. However, each rating source (i.e. row) has the same names for the variables. (The data in Fig. 3 are synthetic for
illustrative purposes.) In this original configuration, there are 1505 focal respondents and 4607 peer reports. However, it is not
possible to calculate convergence correlations with this layout. As a result, the data are transposed to fit the middle image
where there is one row for each dyadic pairing of a focal leader and a peer. The focal leaders' ratings about his or her behavior
are replicated in this matrix. Thus, there are only 4607 rows. Traditional convergence correlations for peer and self-reports can
be calculated from this database layout. This data layout was used to calculate the information found in Table 1.
In examining Table 1, the self-peer correlation of Building Relationships (focal respondent or self-view) with Interpersonal
Problems (peer view) is r = −0.22, p < 0.01, and it is based on 4607 rows containing both the target respondent's and peer's data. The matching, self-peer correlation of Building Relationships (peer view) with Interpersonal Problems (self-view) is r = −0.15, p < 0.01.
Unfortunately, this database layout will not allow us to answer the question concerning significant differences between dyads
as called for by a one-way ANOVA. To accomplish this test, the data must be rearranged from the display shown in the middle
image of Fig. 3 which shows the traditional convergence data layout to the reconfigured layout shown in the bottom image of
Fig. 3. Observe that in the bottom image each dyad is represented by two rows of data, one for the peer and one for the focal
respondent. A dyad number has also been introduced. This data reconfiguration allows a dyadic average to be calculated within
the ANOVA procedure. As an inherent consequence, the number of rows is doubled: J = 4607 still represents the number of dyadic averages, but there are N = 9214 total reports that will be used. This is called a vertical or "stacked" layout and is a prerequisite for
conducting both an F test and any WABA tests.
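A minimal sketch of the View 2 → View 3 reshaping described here, using illustrative column names rather than the CCL file's actual fields:

```python
import pandas as pd

# View 2: one row per dyad, with the focal leader's (replicated) scores and
# the matched peer's scores side by side.
view2 = pd.DataFrame({
    "dyad_id":    [1, 2],
    "build_self": [3.9, 4.2], "build_peer": [3.5, 4.0],
    "prob_self":  [1.4, 1.6], "prob_peer":  [2.1, 1.5],
})

# View 3: stacked layout with two rows per dyad (one self, one peer),
# the prerequisite for the one-way ANOVA and the WABA tests.
view3 = pd.wide_to_long(view2, stubnames=["build", "prob"],
                        i="dyad_id", j="source", sep="_",
                        suffix=r"\w+").reset_index()

# Dyadic averages can then be computed directly from the stacked frame.
dyad_means = view3.groupby("dyad_id")[["build", "prob"]].mean()
```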
The results of the reconfigured data layout represented by the bottom image in Fig. 3 are shown in Table 2. It has the same
descriptive information as Table 1, but it is arranged for dyads. This table also includes the WABA bivariate correlation information
and utilizes the following format. First, the traditional, raw score correlation based on the total number of peers is always shown
to the left of the vertical bar. Second, the correlation based on the number of dyads is calculated from dyadic averages and is lo-
cated to the right of the vertical bar and above the horizontal bar. Third, the correlation based on within-dyad dispersion is


Table 1
Descriptive statistics and traditional convergence correlations for focal leaders and peers.

Variable                                  Mean   Std. dev.  (1)      (2)      (3)     (4)      (5)
(1) Building & mending relations (self)   3.91   0.45       X
(2) Interpersonal problems (self)         1.55   0.56       −0.44**  X
(3) Subject's age                         42.74  7.35       0.04**   −0.07**  X
(4) Building & mending relations (peers)  3.85   0.72       0.25**   −0.15**  0.00    X
(5) Interpersonal problems (peers)        1.80   0.83       −0.22**  0.20**   0.01    −0.77**  X
(6) Peer's age                            44.94  0.50       0.05**   −0.05**  0.29**  0.04     −0.03*

Original number of focal subjects J = 1505; peer reports N = 4607.
For matched correlations, focal leader data have been replicated to match peer reports so that the final J = 4607. This corresponds to the data layout diagram in View 2 of Fig. 3.
* p < 0.05. ** p < 0.01.


Fig. 3. Data layout requirements for traditional convergence analysis vs. ANOVA and WABA.

Table 2
Descriptive statistics, ANOVAs, and individual, between-dyad, and within-dyad correlations for matched focal leaders and peer reports.

Variable                                                       Dyadic mean  Std. dev.  F ratio  η²    (1)
(1) Building & Mending Relationships (matched dyadic reports)  3.88         0.60       1.55**   0.61  X
(2) Problems with Interpersonal Relationships                  1.67         0.72       1.36**   0.58  −0.67** | −0.70** / −0.63**
    (matched dyadic reports)

The data from Table 1 have been rearranged from a matrix format for calculating matched correlations (see View 2 of Fig. 3) to a stacked format for calculating an ANOVA; thus each dyad has two rows of data, one from the focal respondent and one from a peer (see View 3 of Fig. 3 for the data layout diagram).
The F test aligns the dyads as cells in a one-way ANOVA. Degrees of freedom: 9214 stacked dyadic reports, one each from the focal individual and a matched peer, and J = 4607 peer-based dyads. The η² symbol is the ANOVA equivalent of R² (explained variance) in linear regression.
The correlation to the left of the vertical bar is the total correlation with N = 9214. To the right of the bar, the upper correlation (−0.70**) is based on dyadic averages with J = 4607; the lower correlation (−0.63**) is based on within-dyad deviation scores after holding constant dyadic averages, with N − J − 1 degrees of freedom.
* p < 0.05. ** p < 0.01.

Results

RQ 1: Do focal leader and matched peers form universally convergent dyads?

The traditional convergence correlations for peer-leader agreement are reported in Table 1. Within rating sources, the correlations are substantial. From the peers' view of the focal leader, Building & Mending (peer) and Interpersonal Problems (peer) were correlated at r = −0.77**. From the focal leaders' own perspective, Building & Mending (self) and Interpersonal Problems (self) were correlated at r = −0.44**. However, the convergence correlations were substantially lower: (1) Building & Mending (self) with Interpersonal Problems (peer) showed a correlation of r = −0.22**, and (2) Building & Mending (peer) with Interpersonal Problems (self) a correlation of r = −0.15**. Significant but small findings of this kind are typical of convergence correlations in the leadership SOA literature (Zhou & Schriesheim, 2010). Observe that the middle data layout (View 2) in Fig. 3 describes how the focal leader and peer reports were arranged for this traditional analysis.
These results do not rule out the possibility that there might be hidden group effects (e.g., Markham, 1988). A hidden group
effect refers to a situation where an untested alternative level of analysis is cloaking a potential finding. By moving to the different
level of analysis, the relationship between the variables of interest can be detected better. In this case, dyadic effects might better
model the data while substantially improving our understanding of congruence. Thus, the issue of convergence within dyads is of
central importance to this project.
As a first approach in addressing the question of convergence, notice that in Table 2 the F tests based on a one-way ANOVA are designed to detect between-dyad differences. (The arrangement of the data in this table has been reconfigured from Table 1 and now conforms to View 3 in Fig. 3. For all remaining ANOVA and WABA tests, View 3 is used.) Note that all of the F tests are statistically significant. In Table 3, notice that the correlation based on dyadic means (rbetween-dyad) equals −0.70 (p < 0.01). Both of these pieces of evidence initially would suggest that there is some merit in using the aggregated dyadic means. However, these two pieces of information by themselves are insufficient to draw this conclusion (Dansereau et al., 1984). Our next step for assessing the appropriateness of aggregating these data deploys the WABA inferential system; the relevant data and test results are located in Table 3 and discussed below.

Table 3
WABA inferences about the relationship of derailment scales for all matched focal leader-peer dyads. Grouping entity: dyads (composed of a leader and a matched peer).

WABA I: Variation source
Variable                                       Eta (Bet.)  Eta (With.)  E ratio  F ratio  Inference
(1) Building & Mending Relationships           0.78        0.63         1.24     1.55**   Weak wholes
(2) Problems with Interpersonal Relationships  0.76        0.65         1.16     1.36**   Weak wholes

WABA II: Covariation source and Cumulative R components (Scale 1 × Scale 2)
Total corr.  r (Bet.)  r (With.)  A     Z       Inference  Cum. R (Bet.)  Cum. R (With.)  Inference
−0.67**      −0.70**   −0.63**    0.09  5.86**  Equiv.     −0.41          −0.26           Equiv.

† Significant by the 15° test; ‡ Significant by the 30° test. * p < 0.05; ** p < 0.01. Degrees of freedom: N = 9214 stacked dyadic reports and J = 4607 dyadic pairings. See Tables 1 and 2 and Fig. 3 for a full explanation of the data layout configurations.

The steps for employing the WABA system in its original and updated forms have been well-articulated elsewhere (Dansereau et al., 1984; Markham et al., 2014b) and are summarized here; a computational sketch follows the list.

1. Evaluate the total correlation: Data sets with nested entities, such as this one, offer the potential for three types of correlations to be calculated. One is based upon dyad (unit) averages. A second is based on deviation scores above or below the dyad average. The third is the total correlation based on the entire pool of observations, in this case 9214 paired reports. (Recall that each dyad is described by two rows of data, one from the focal respondent and one from a peer.) Thus, we start by calculating and evaluating the raw score correlation. In the last row of Table 3, the total correlation is r = −0.67**. This represents the pooled assessment of peers and of focal respondents with respect to the negative relationship of Building & Mending Relationships and Interpersonal Problems. However, it is not a convergence correlation in the traditional sense.
2. Evaluate dyadic differences: As a starting point, a one-way ANOVA with cells aligned with the dyads is used to evaluate the significance of the differences between dyads. This is a more difficult criterion to meet than other indicators such as rwg or ICC1. The WABA system also includes tests of practical significance, which are discussed below. The combination of the statistical and practical tests for detecting group (dyad) differences is termed WABA I because of its focus on questions of variation. The results for this data set can be found in the WABA I section of Table 3, which is based on the logic of ANOVA. WABA I tests are necessary to ensure that we are not creating statistical artifacts through inappropriate aggregation before proceeding to WABA II, where the focus is on covariation. (See Yammarino and Markham (1992) for a detailed explanation with examples.) Each dyad serves as a cell and contains two matched reports: one from the focal leader and one from a peer. In this case, the F tests are statistically significant for both variables. We also used the E tests (Dansereau et al., 1984) of practical significance; neither variable meets these more stringent practical tests. These results lead to an inference that these dyads show weak differences.
3. Evaluate the two dyadic-based correlations: Recall from Step 1 that there are three types of correlations. In this step, the correlation based on dyadic averages is calculated, as is the within-dyad correlation. A significant correlation for the dyadic averages is a new form of convergence correlation. To achieve it, not only must each dyad's two reports converge, but the joint peer-focal view of Building & Mending must also correlate with the joint view of Interpersonal Problems. (In contrast, a significant within-dyad correlation would indicate that there is a substantial gap between the two raters above and below the dyadic average.) The results for this data set are located in the middle section of Table 3, labelled "WABA II: Covariation Source". (WABA II focuses on issues of covariation and correlation and utilizes both statistical and practical tests of significance.) Here we observe an rbetween = −0.70 (p < 0.01) based on dyadic averages. However, at the same time, the partial (or within-unit) correlation is rwithin = −0.63 (p < 0.01). To compare these two forms of the correlation, we used two parameters: the statistically based Z test and the geometrically based A test. There is a significant difference between these correlations based on the Z test, but the A test of practical significance falls short. Thus, an equivocal inference is shown for this section.
4. Evaluate the joint variance and covariance effects: A unique aspect of the WABA inferential system is that it offers a method to examine and assess the joint effects of the differences and the correlations. This is contained in the Cumulative R components. In other words, the Cumulative R results represent a joint, multivariate test for inductions because each Cumulative R component simultaneously contains information about both variances (from the WABA I tests) and covariances (from the WABA II tests). The two components are labelled Cumulative Rbetween (for between-dyad joint effects) and Cumulative Rwithin (for within-dyad joint effects). More specifically, the between-dyad Cumulative R component is the multiplicative product of three terms: the two between-dyad etas (from the F tests) and the between-dyad correlation. The interpretation of this product is intriguing: the Cumulative Rbetween component shows how much of the raw score, total correlation is simultaneously derived from dyad-level variation and covariation. In this case, when examining Scale 1 (Building Relationships) and Scale 2 (Interpersonal Problems), more than half (i.e., 61%) of the total correlation (rtotal = −0.67**) is a simultaneous function of both between-unit differences and between-unit correlations (Cumulative Rbetween = −0.41). The within-unit Cumulative R component (Cumulative Rwithin = −0.26) is likewise the multiplicative product of three terms: the two within-dyad etas and the within-dyad correlation, thereby showing how much of the raw score, total correlation is derived from within-dyad sources. In this case, about 39% of the total correlation is a function of within-unit differences and within-unit correlations. (Note that if the between-dyad and within-dyad cumulative components are summed, the raw score, total correlation is recreated: −0.41 + −0.26 = −0.67.) From the cumulative perspective in the far right panel of Table 3, there is not enough of a difference between these two components to draw a strong inference towards the whole dyad condition. Instead, a weaker inference is drawn, and Table 3 shows an equivocal inference. By way of refinement to the original WABA inferential system, this particular configuration of results might also be considered a weak, whole dyad effect from an entity analytics perspective (Markham et al., 2014b). In either case, it is the substantial within-dyad correlation of −0.63** that is problematic.
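The four steps above can be condensed into a single function. This is a minimal sketch under stated assumptions: it expects the stacked (View 3) layout; the A statistic is expressed in radians with the sign convention that matches Tables 3 and 5; and the Z test uses one common Fisher r-to-z variant. The function name waba is ours, and the exact small-sample corrections and 15°/30° decision rules should be taken from Dansereau et al. (1984) or the DETECT software.

```python
import numpy as np
import pandas as pd

def waba(stacked: pd.DataFrame, dyad: str, x: str, y: str) -> dict:
    """Sketch of WABA Steps 1-4 for two variables in a stacked layout."""
    g = stacked.groupby(dyad)
    n, j = len(stacked), g.ngroups
    res = {"r_total": stacked[x].corr(stacked[y])}               # Step 1

    parts = {}
    for col in (x, y):
        dev_total = stacked[col] - stacked[col].mean()
        dev_betw  = g[col].transform("mean") - stacked[col].mean()
        dev_with  = stacked[col] - g[col].transform("mean")
        eta_b = np.sqrt((dev_betw ** 2).sum() / (dev_total ** 2).sum())
        eta_w = np.sqrt((dev_with ** 2).sum() / (dev_total ** 2).sum())
        res[f"E_{col}"] = eta_b / eta_w                           # Step 2: WABA I
        res[f"F_{col}"] = (eta_b**2 / (j - 1)) / (eta_w**2 / (n - j))
        parts[col] = (eta_b, eta_w, dev_betw, dev_with)

    # Step 3: WABA II -- between- and within-dyad correlations and their contrast
    r_b = np.corrcoef(parts[x][2], parts[y][2])[0, 1]
    r_w = np.corrcoef(parts[x][3], parts[y][3])[0, 1]
    res.update(r_between=r_b, r_within=r_w)
    res["A"] = np.arccos(r_b) - np.arccos(r_w)   # radians; 15 deg ~ 0.26, 30 deg ~ 0.52
    res["Z"] = (np.arctanh(r_b) - np.arctanh(r_w)) / np.sqrt(
        1.0 / (j - 3) + 1.0 / (n - j - 3))       # one common Fisher r-to-z variant

    # Step 4: Cumulative R components; by construction they sum back to r_total
    res["cum_between"] = parts[x][0] * parts[y][0] * r_b
    res["cum_within"]  = parts[x][1] * parts[y][1] * r_w
    return res
```

Applied to the stacked frame sketched earlier, waba(stacked, "dyad_id", "build_mend", "interp_prob") returns quantities of the kind reported in Table 3.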

RQ 2: Evaluating dyadic convergence using the coefficient of variation (CV)

If we had found clear, unitary dyadic results based on all sets of statistical and practical tests (see Markham et al. (2014b) for an example of this type of whole group finding that meets both statistical and practical tests of significance), then it would be straightforward to conduct additional hypothesis testing by developing hierarchical regressions. Instead, we have some evidence to support aggregating to the dyad level based on the WABA I F tests, but we also have substantial within-dyad variability and covariability that weakens the ultimate inference for Table 3.

Fig. 4. Intensity Curve of convergence for two derailment scales for dyads (based upon coefficient of variation and dyadic means).

We would like to know if all of these dyads possess the same amount of within-dyad variance and covariance, or if there are some dyads that clearly converge and others that just as clearly do not. We will evaluate the meaning of a lack of agreement within dyads by calculating each dyad's standard deviation and then converting it into the coefficient of variation (CV) for each leadership scale. This puts all measures on the same scale with the same type of interpretation. The use of the CV as an index of agreement for feedback based on whole groups of raters has been explicated and detailed by Markham, Markham, and Braekkan (2014).
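A minimal sketch of this per-dyad CV calculation, again assuming the hypothetical stacked layout from above:

```python
# For each dyad, the standard deviation of its two reports scaled by the
# dyadic mean; the CV puts every dyad on a comparable agreement metric.
dyad_stats = stacked.groupby("dyad_id")["interp_prob"].agg(["mean", "std"])
dyad_stats["cv"] = dyad_stats["std"] / dyad_stats["mean"]
```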

At a theoretical level, it may not be possible to have a single decision rule or criterion about the size of a CV that disqualifies dyads with too much internal variability. This is because the range of standard deviations, and consequently their CVs, is dependent upon the dyad means, as shown in Fig. 4. The bottom chart in Fig. 4 shows the curvilinear relationship between the scale mean and its associated coefficient of variation for Interpersonal Problems. In this case, scale averages at either end of the x-axis necessarily imply that the within-dyad variability will be zero or near zero. A dyad average of exactly 1.00 or 5.00 necessarily means that there is perfect agreement between the two sources. Only in the midrange of the scale shown in Fig. 4 are the standard deviation and CV of a dyad free to vary. The top chart in Fig. 4, for Building & Mending Relationships, shows only a linear relationship rather than the curvilinear relation found in previous research. This is because of a scale truncation problem: there are no dyads in the range of 1.00 to 2.00, and there are very few dyads with a score below 2.75. This is not surprising given that most of the population of CCL focal respondents, a relatively elite group, have been nominated by their organizations because of their high leadership potential.
We assigned each dyad to one of three categories based on the Interpersonal Problems scale shown in the bottom chart of Fig. 4 (a sketch of this assignment procedure follows the list).
• Category 1: High Agreement. If the dyadic CV fell below the curvilinear, best-fit prediction curve (that is, there was very little variation between the peer and focal respondent reports), then the dyad was assigned to the "high agreement" category. A total of 2317 dyads, or 50% of the data set, were categorized here.
• Category 2: Moderate Agreement. If the dyadic CV fell above the best-fit curve but below the 95% upper control limit (UCL), then the dyad was assigned "moderate" relative agreement. About 47% (or 2153 dyads) were placed in this category.
• Category 3: Low Agreement. If the dyadic CV fell outside the 95% UCL for the prediction curve, the dyad was assigned to the "low agreement" category. Here there were only 137 dyads (or 3% of the total).
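The categorization just listed can be sketched as follows. The quadratic curve family and the simple residual-based 95% upper control limit are our illustrative assumptions; the exact curve-fitting and control-limit choices used with the Benchmarks data are not reproduced here.

```python
import numpy as np

# Fit the curvilinear CV-versus-mean band of Fig. 4 with a quadratic, form
# a crude 95% upper control limit from the residual spread, and bin dyads.
coef   = np.polyfit(dyad_stats["mean"], dyad_stats["cv"], deg=2)
fitted = np.polyval(coef, dyad_stats["mean"])
resid  = dyad_stats["cv"] - fitted
ucl    = fitted + 1.96 * resid.std(ddof=1)

dyad_stats["agreement"] = np.select(
    [dyad_stats["cv"] <= fitted, dyad_stats["cv"] <= ucl],
    ["high", "moderate"],        # below the curve; between curve and UCL
    default="low")               # outside the 95% UCL
```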
Notice that the same CV score, for example 0.25, would have different interpretations at different points along the x-axis in this system. If it were associated with a mean of 3.00, it would be considered high agreement, but if it were associated with a mean of 4.5, it would be considered an indicator of only moderate agreement.
Given that different types of control limits can be used, it is possible that the assignments made above are not optimal. However, as a cross-check, we evaluated the differences between the average CVs by agreement classification. Using this tripartite categorization as the cells in a one-way ANOVA, the significance of the differences in the dyadic CVs was evaluated as shown in Table 4.
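This cross-check is a one-way ANOVA of the dyadic CVs across the three categories; a minimal SciPy sketch, assuming dyad_stats now carries at least two agreement categories:

```python
from scipy import stats

# One-way ANOVA of dyadic CVs across the agreement categories
# (Table 4 reports F = 7297.1 for the real data).
groups = [grp["cv"].values for _, grp in dyad_stats.groupby("agreement")]
f_val, p_val = stats.f_oneway(*groups)
```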
It is not surprising that there are statistically significant differences in the average CVs of these three categories of agreement. Nevertheless, the mean differences are substantial, with a very large F value of 7297.1. About 50% of all dyads fell into the High Agreement category, with minimal differences between the matched leader and peer reports as represented by an average CV of 0.12. This result contrasts sharply with the 47% of dyads categorized as moderate agreement, which have an average CV of 0.37, about three times as high as the first category. The remaining dyads (about 3% of the total) had a substantially higher average CV of 0.77, which indicates practically no agreement at all between the matched peer and focal respondent reports. These findings for peer-focal dyads almost perfectly replicate what was found for dyads composed of direct subordinates and focal respondents (Markham et al., 2015).
The above evidence is not as complete as it could be in evaluating the nature of agreement for these dyads. In other words, the
results in Table 4 speak to general differences between the three agreement categories, but they do not provide any information
about the configuration of the dyads within each agreement category. That is to say, does the problem of having a high within-
dyad correlation, as noted above for Table 3, persist across these three categories? To answer this question, three separate WABAs
were conducted at the dyadic level of analysis for each of the three agreement categories. These results are shown in Table 5, and
they are evaluated using the same framework described above in Steps 1 through 4.
The top third of Table 5 applies to the High Agreement category. These results provide clear evidence that the dyads in this category demonstrate a strong, unitary, whole dyad effect that meets both statistical and practical tests of significance across the three sections of tests (WABA I, WABA II, and the Cumulative R components). This is tantamount to saying that these data are configured most closely with the left-hand symbol representing a unitary, whole dyadic effect in Fig. 2. The statistical and inferential conclusions are clear, but it must be remembered that this finding applies to only 50% of this study's dyads. The between-dyad correlation of rbetween = −0.74** can be interpreted as indicating that whole dyads converge: these dyads jointly report that when the focal leader evidences high amounts of Building Relationships, there are at the same time very low amounts of Interpersonal Problems.

Table 4
ANOVA for differences in dyadic CVs for high, medium, and low focal leader-peer agreement.

Building & Mending     High relative agreement   Moderate relative agreement                Low relative agreement
Relationships          (< fit line in Fig. 4)    (between fit line and 95% UCL in Fig. 4)   (> 95% UCL in Fig. 4)   F       p          η²    df
Avg. CV for dyads      0.12                      0.37                                       0.77                    7297.1  p < 0.001  0.61  2, 4604
Min & max CV of dyads  0.00 to 0.40              0.09 to 0.72                               0.67 to 0.94
Dyadic N for cells     2317                      2153                                       137
Percent of total       50%                       47%                                        3%

Total dyadic pairings N = 4607.


Table 5
WABA inferences about derailment scales for matched focal leader-peer dyads for three agreement conditions.

High dyadic leader-peer agreement condition (N = 4634 stacked reports; J = 2317 dyads)
WABA I: Variation source
Variable                                       Eta (Bet.)  Eta (With.)  E ratio  F ratio  Inference
(1) Building & Mending Relationships           0.84        0.54         1.57†    2.47**   Wholes (15°)
(2) Problems with Interpersonal Relationships  0.92        0.38         2.40‡    5.77**   Wholes (30°)
WABA II: Covariation source and Cumulative R components (Scale 1 × Scale 2)
Total corr.  r (Bet.)  r (With.)  A      Z       Inference     Cum. R (Bet.)  Cum. R (With.)  Inference
−0.66**      −0.74**   −0.38**    0.45†  19.5**  Wholes (15°)  −0.58          −0.08           Wholes

Moderate dyadic leader-peer agreement condition (N = 4306 stacked reports; J = 2153 dyads)
WABA I: Variation source
Variable                                       Eta (Bet.)  Eta (With.)  E ratio  F ratio  Inference
(1) Building & Mending Relationships           0.74        0.67         1.09     1.19**   Weak wholes
(2) Problems with Interpersonal Relationships  0.67        0.74         0.91     0.82     Equivocal
WABA II: Covariation source and Cumulative R components (Scale 1 × Scale 2)
Total corr.  r (Bet.)  r (With.)  A      Z      Inference  Cum. R (Bet.)  Cum. R (With.)  Inference
−0.70**      −0.69**   −0.71**    −0.02  −1.13  Equiv.     −0.34          −0.35           Equivocal

Low dyadic leader-peer agreement condition (N = 274 stacked reports; J = 137 dyads)
WABA I: Variation source
Variable                                       Eta (Bet.)  Eta (With.)  E ratio  F ratio  Inference
(1) Building & Mending Relationships           0.62        0.78         0.80     0.64     –
(2) Problems with Interpersonal Relationships  0.28        0.96         0.29‡    0.09     Dispersed parts
WABA II: Covariation source and Cumulative R components (Scale 1 × Scale 2)
Total corr.  r (Bet.)  r (With.)  A       Z        Inference        Cum. R (Bet.)  Cum. R (With.)  Inference
−0.56**      0.09      −0.77**    −0.79‡  −7.65**  Dispersed parts  0.02           −0.58**         Weak parts

† Significant by the 15° test. ‡ Significant by the 30° test. * p < 0.05. ** p < 0.01.

The middle horizontal third of Table 5 shows the Moderate Agreement category, and it demonstrates a different entity configuration. Based on the tests in the third columnar section, a clear equivocal condition is inferred. Notice that the Cumulative Rbetween and Cumulative Rwithin components are nearly identical. This corresponds to the middle graphic in Fig. 2. It is interpreted as indicating that neither a clear unitary effect nor a clear dispersed dyad effect can be determined. For an equivocal effect like this, it is more parsimonious to view this category as individuals. Thus, the total correlation of rtotal = −0.70** is not cloaking a hidden dyad effect. The fact that there is membership in a nominal dyad is not relevant to the interpretation of these data because the imposition of dyads as cells does not help explain the data. This is clearly illustrated with the Interpersonal Problems scale; there are no detectable dyadic differences based on an F test of 0.82, n.s.

Table 6
Frequency distributions of peer-leader dyads across dyad agreement categories and peer rating group agreement categories.

Cross tabulation of the number of dyads in the two agreement conditions (counts).

                                 Peer dyadic agreement indicator
Peer group agreement indicator   (1) High  (2) Moderate  (3) Low  Total count
(1) High group agreement         1344      997           59       2400
(2) Moderate group agreement     919       1086          54       2059
(3) Low group agreement          54        70            24       148
Total                            2317      2153          137      4607

Total dyads: N = 4607.


The lower third of Table 5 shows a very different pattern of results for the Low Agreement category. Both scales show so little between-dyad variability that the F values are very small and a "parts" condition can be inferred. This corresponds to the dispersed dyad condition described earlier in this paper. The most interesting evidence, however, is contained in the Cumulative R components, where the Cumulative Rwithin of −0.58 shows that almost all of the total correlation is a function of within-dyad differences, not between-dyad differences. These results fit best with the third, right-hand image in Fig. 2, corresponding to a "dispersed" dyadic condition. In this case, the within-dyad correlation of rwithin = −0.77** can be used, with the understanding that it is based on the deviation scores above and below the dyadic means.

RQ 3: The search for boundary conditions while crossing levels of analysis from dyads to rating groups

Detecting the three dramatically different configurations for the dyads in the three agreement conditions in Table 5 raises a
new set of researchable questions. How can we understand (and model) the conditions under which we would expect to find
unitary dyads, equivocal dyads, or dispersed dyads? In other words, does a focal respondent whose self-report converges with
one peer also show convergence with most of the remaining peer raters? This question then leads to the search for boundary con-
ditions for identifying when each configuration would be predicted as noted by Markham (2010).
One alternative approach has to do with nesting effects at higher levels of analysis. We know from previous research
(Markham et al., 2014a) which peer in this data set belongs to which group of raters for a focal respondent. (Note that these rat-
ing groups are composed of all of the peers nominated by a given focal leader.) When the data set is configured to examine entire
groups of raters for each focal leader, we can evaluate how much convergence exists horizontally between these raters. When the
data set is arranged to examine dyads, we can evaluate on a pair-wise basis how much convergence exists vertically between
each leader and a peer. Because these dyads are all nested within the larger rater groups, how do we simultaneously examine
if the convergence at the rating group level is related to dyadic convergence?
A straightforward method for an initial evaluation of this question, without having to conduct a Multiple Relationship Analysis in WABA (for an example, see Markham & Halverson, 2002), is to use the CV information from both the dyads and the whole rating groups. In other words, each dyad has already been assigned to an agreement category based on where its score fell relative to the intensity convergence curve in Fig. 4. We then re-calculated the same agreement indicator for the whole group of peers who rate the same focal respondent. By creating a crosstab display, we could determine whether all of the high agreement dyads are nested within high agreement rating groups. These results are shown in Table 6.
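A minimal sketch of the crosstab in Table 6, using a hypothetical frame that carries one row per dyad with both agreement indicators:

```python
import pandas as pd

# One row per dyad, carrying its own agreement category and the category of
# the peer rating group it is nested in (toy values for illustration only).
dyads = pd.DataFrame({
    "dyad_agreement":  ["high", "high", "moderate", "low", "moderate"],
    "group_agreement": ["high", "moderate", "moderate", "low", "high"],
})

table6 = pd.crosstab(dyads["group_agreement"], dyads["dyad_agreement"],
                     margins=True, margins_name="Total")
print(table6)
```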
Table 6 reveals a form of "misclassification" problem. With respect to High Agreement dyads, only 1344 of 2317 (or 58%) were also found in peer rating groups that were considered high agreement. Similarly, only 50% of moderate agreement dyads were located in rating groups with moderate agreement. This finding tentatively suggests that high agreement dyads might be dispersed across all three types of rating groups. As such, it is unlikely that the findings at the dyadic level can be generalized to the whole group of peer raters from which they were drawn.

Discussion

This investigation into the suitability of aggregating matched leader-peer reports using dyads as testable entities has revealed
pronounced differences in how convergence is configured. We will briefly review the following issues:

(1) What is the state of research regarding peer feedback?
(2) Should universal aggregation of dyadic leader-peer data continue to be the de facto standard?
(3) If aggregation is not universal, what alternatives exist for analyzing this type of data?
(4) If aggregation is not universal, what alternatives exist for feedback?
(5) Is the search for boundary conditions for detecting entity effects at the dyadic level fruitful?
(6) What are the implications for future practice?
(7) What are the implications for future research?

Fig. 5. Relative strength of inductions for dyadic configurations (adapted from Markham et al., 2015).

The state of research regarding peer feedback

On balance, the state of research regarding the use of peer feedback is encouraging and improving. For much of the history of
360° programs, peer feedback has been collected as an ancillary to subordinate and boss feedback, and it has rarely been the focus
in its own right. Recently, however, researchers have come to realize that peer feedback provides information that goes beyond
that provided by the respondents who are part of the direct chain of command. In fact, it is possible, from a reception and utilization point of view, that feedback from peers might be the easiest form to accept, even when the peers are not anonymous.
In light of the general acknowledgment that there are inherent problems with the GAA model when applied to whole groups of peer raters, researchers have moved into two subsidiary research streams. The first stream attempts to ameliorate the problem of false feedback derived from peer group averages by providing alternative ways of identifying and tagging groups whose averages are not valid. The second stream of research has moved to the use of the SOA model in attempting to
understand the nature and meaning of one-on-one leader-peer convergence. In response to significant, but relatively low tradi-
tional convergence correlations, recent research has gone beyond the use of simple difference scores that have inherent problems.
Instead, researchers are now exploring a variety of techniques ranging from surface-response models built upon polynomial re-
gression equations to entity analytic techniques.
These analytical improvements are necessary to overcome past methodological limitations and to move the field forward. Improved techniques are required to provide better and more targeted feedback for focal respondents. Given the size of the population that participates in these programs, this is a substantial audience. It is only by improving the accuracy and targeting of this type of feedback that current criticisms of the leadership development field (Kellerman, 2012; Pfeffer, 2015) can be addressed and ameliorated.

Table 7
Summary table of inductions for two derailment scales under three dyadic agreement conditions for matched focal leader-peer and focal leader-subordinate reports.

Condition (dyad composition)                Source table  # dyads  Sig. F (B&M / Int. Prob.)  Total corr.  Cum. Rbetween  Cum. Rwithin  Rbetween as % of total corr.  Induction (see Fig. 5 for definitions of bands)
1a. All direct leader-subordinate dyads     Table 5^b     4811     Yes / Yes                  −0.63**      −0.39          −0.24         62%                           Band A: weak whole dyad effect
1b. All leader-peer dyads                   Table 3       4607     Yes / Yes                  −0.67**      −0.41          −0.26         61%                           Band A: weak whole dyad effect
2a. Leader-subordinate, high agreement      Table 7^b     2250     Yes + E / Yes              −0.51**      −0.42          −0.09         82%                           Band B: moderate whole dyad effect
2b. Leader-peer, high agreement             Table 5       2317     Yes + E / Yes + E          −0.66**      −0.58          −0.08         88%                           Band C: strong whole dyad effect
3a. Leader-subordinate, moderate agreement  Table 8^b     1927     No / Yes                   −0.65**      −0.37          −0.29         54%                           Equivocal (individuals)
3b. Leader-peer, moderate agreement         Table 5       2153     Yes / No                   −0.70**      −0.34          −0.35         50%                           Equivocal (individuals)
4a. Leader-subordinate, low agreement       Table 9^b     188      No / No                    −0.85**      −0.24          −0.60         28%                           Band D: weak dispersed dyads
4b. Leader-peer, low agreement              Table 5       137      No / No                    −0.56**      0.02           −0.58         4%                            Band D: weak dispersed dyads

"Yes + E" indicates that the E ratio test of practical significance was also met. The ^b symbol indicates that the information is derived from the source table in Markham et al. (2015), "At the crux of dyadic leadership: Self–other agreement of leaders and direct reports — Analyzing 360-degree feedback," The Leadership Quarterly, 26, 958–977.
* p < 0.05. ** p < 0.01.

Universal aggregation of leader-peer dyadic data

These results have revealed some contradictory data configurations for Building & Mending Relationships with respect to the career derailer, Problems with Interpersonal Relationships. Our primary research question asked whether dyads composed of one focal leader and one peer would converge on their descriptions of these two variables, and whether the assumed negative relationship between them could then be characterized as a property of clear, whole dyadic units. We did not find evidence for a strong inference; at best, a very weak, borderline-equivocal whole unit inference holds across all dyads. Using the CV indicator to classify dyads in terms of peer-focal respondent agreement proved to have some utility. An evaluation of the coefficients of variation for these peer dyads indicates that about half of all dyads (3% in the low agreement category and 47% in the moderate agreement category) should be disqualified from having their dyadic averages reported. This finding is similar to other findings showing a lack of homogeneity within many types of feedback groups, as was the case with Beus, Jarrett, Bergman, and Payne (2012), whose tests of eight of 12 subgroups also failed to support aggregation. Universal aggregation of peer-leader dyads should not be conducted without adequate checks for hidden dyadic levels of analysis effects.

Relative strength of inductions and the CV

As a key ancillary research point, we can attempt to answer the question: How strong does the evidence need to be in order to have confidence that a full, unitary group effect has been detected? A visual framework for addressing this question for dyads, adapted from Markham et al. (2014b, 2014a), can be found in Fig. 5.
This figure is derived from the four inferential conditions earlier visualized in Fig. 2. The first inferential condition focuses on a
unitary, whole dyad induction that highlights between-dyad sources of variance and covariance. The second inference condition
focuses on an equivocal induction with respect to dyads where raters are viewed as independent of their dyads and no adjust-
ment for dyad membership is required in the data. A third condition shows a dispersed “dyad-parts” induction that emphasizes
within-dyad signed deviation scores. Finally, a fourth inference corresponds to the traditional null condition where there is no
existing relationship between the two variables at any level of analysis.
Arranging these conditions in a layered, circular fashion makes it easier to visualize relatively weak versus moderate versus
strong inductions. (The bands relating to strength of induction for the dispersed group model are labelled D, E, and F. We will
focus on their mirror images, Bands A, B, and C, as they apply to the unitary, whole group condition.)
Overall, Table 3, which is the central data table of this study containing all of the available dyads, ultimately did not show a general, strong unitary dyadic effect, as was expected. However, the table does indicate that there were sufficient differences between dyads with respect to the ANOVAs to make understandable the conclusions derived from traditional aggregation methods.
In contrast to the results in Table 3, the findings in Table 5 show much clearer inductions. For high agreement dyads, a very clear picture of them as whole units with high levels of convergence could be induced. The large eta statistics for this condition indicate a high degree of convergence in the matched reports. However, only half of the dyads could be classified in this manner, which is why the use of dyadic averages should be restricted to those dyads that meet these criteria. In the moderate agreement category, while peers and focal respondents were nominal members of a dyad, the imposition of dyadic membership on this subset of data did not reveal any clear effects. While there appears to be a negative relationship of Building Relationships with Interpersonal Problems, it seems to be a function of individual differences with equal amounts of divergence and convergence. The low agreement condition was configured very differently from the other two categories. For this condition, the imposition of dyadic membership was necessary in order to calculate a dyadic average and remove its effect. For these low agreement dyads, there were no discernible differences between them; all reported a fair degree of interpersonal problems. However, there was maximum divergence within the dyads, and the large deviation scores around the dyad averages mean that the peers and focal respondents had almost no similarity when describing these derailment factors.

Fig. 6. 3D visualization alternatives of multiple sources with focal respondent data in molecular form: ball-and-stick vs. space-filling models.

Fig. 7. 3D surface model adapted to 360° feedback sources.

The search for boundary conditions while spanning across levels

The short answer to the question about the hope of identifying boundary conditions appears to be affirmative. However, rather
than discuss in isolation each data table supporting this conclusion, Table 7 summarizes the relevant findings of this study across
some of the proposed boundary conditions and then juxtaposes these results with a similar study of the subordinate-focal respon-
dent dyads.
Table 7 combines data about peers from this study with data about subordinates from a previous study (Markham et al., 2015), adding the new peer information for each of the agreement conditions. A single-level dyadic WABA analysis was conducted for each condition (1a through 4b). Each condition (row) is also summarized by its source table, the number of dyads involved, the results of the WABA I tests, the total correlation between Building Relationships and Interpersonal Problems, the Cumulative R components, and the final induction.

Fig. 8. 3D surface model showing 3 types of peer relationships.



Rather than parsing this table line by line, we will examine two overall conclusions. First, the dyadic configuration of peers closely parallels that of direct subordinates, both for the overall study and within each agreement condition. This suggests that within agreement conditions, similar processes might be occurring for both peers and subordinates, while the configurational differences between the three agreement conditions remain substantial for both rater types. A second, important observation is that the total (raw score) correlations, all of which are significant and range from rtotal = −0.51** (high agreement focal respondent-subordinate dyads) to rtotal = −0.85** (low agreement focal respondent-subordinate dyads), can be deceptive in terms of the underlying entity configurations they might cloak. On the one hand, in Condition 2b, the strongest possible dyadic inference (Band C in Fig. 5) is evidenced, with almost all of the raw score correlation being driven by dyadic differences for high agreement peer-focal dyads. This group of high agreement peers also has the largest Cumulative Rbetween statistic, which might indicate that peers can be an essential source of feedback information with more clarity than subordinate feedback. On the other hand, in Condition 4b, the reverse is true. For whatever reason, this group of peer-based dyads shows the lowest level of convergence of all the groupings in Table 7. This agreement group's raw score correlation is almost exclusively a function of within-dyad variances and covariances. This means that when something misfires in this type of dyad, it creates a situation of maximal divergence in the perspectives of the peer and focal respondent. Just as peers can provide the clearest feedback of the groups in Table 7, they can also show the highest level of disagreement.

Applications: visualization for feedback

This research poses a quandary for the multisource feedback specialist. If the results from some groups of peers should not be reported, then what alternatives exist? One answer might come from the field of organizational visualization, in which more than just average rating group scores can be displayed (Markham, 1998). At least two alternatives exist. First, one can display dispersion data for rating groups without violating anonymity. Second, one can move to a different level of analysis — in this case, dyads. How might this work?
To begin this process of visual prototyping, note that in scientific visualization work, 2D and 3D data plots are considered foundational. Rather than working with traditional, predefined x, y, and z data axes for the explicit display of variables, many visualization practitioners are interested in displaying the entities behind the variables of interest. In this respect, WABA's focus on testing entities fits well with the needs found in the physical and engineering sciences for more realistic visualizations and simulations (Markham, 2002).
There is a potential overlap between the organizational sciences and the visualizations available in molecular chemistry
(Markham, 1998). By way of example, the display in Fig. 6 shows two traditional visualizations drawn from the field of molecular
modeling.
The left frame shows a "ball-and-stick" model of a molecule. The right frame displays a "space-filling" figure that enhances the underlying structure with a variety of colored envelopes providing additional information. Both figures have the potential to help solve the 360° feedback problem in MSF programs. Starting with the space-filling model, the first issue to be solved is how to move in a space not defined by the data axes and, at the same time, to incorporate multiple types of feedback raters. Fig. 7 gives a potential solution.
In this figure, the central "atom" is the focal respondent. There are two intersecting planes, one horizontal and one vertical. Each plane is again divided in half by the central axis line. This creates four half-planes, each at right angles to the others, rather like Japanese fans that have been set at right angles and then glued together. Each of these 180° surfaces, i.e., "fans", corresponds to one of four sources of feedback collected by a typical 360° program: (1) direct subordinates; (2) peers; (3) bosses; and (4) others, such as customers. The surfaces are labelled by rater type in Fig. 7. Note that this model is not yet populated; in the next figure, sample peers are added.
Fig. 8 builds on the previous figure by adding three different types of peers. The placement and bonding of each peer in relation to the focal respondent are determined by the types of results developed in this paper. Thus, the distance of the peer from the central focal respondent in the relevant plane is determined by the dyadic CV; the lower the CV (i.e., the higher the agreement derived from convergent scores), the closer the position to the focal respondent. The "bond" is determined by the level of analysis inference. A double bond, shown by the two lines connecting Peer 1 to the focal respondent, represents a high agreement, tightly convergent dyad. A single, dotted line, such as that for Peer 2, represents the equivocal inference and moderate agreement. Finally, if there is no connecting line, as with Peer 3, then this is the most divergent, dispersed condition corresponding to the low agreement category. Peer 3 is a nominal member of the peer group but is certainly on the periphery.
While this figure only represents a prototype, it does offer the possibility of visualizing dyads in a 360° space as entities while
at the same time capturing relevant conclusions from the WABA inductions. More important, it offers an alternative to the tradi-
tional row and column data display of group means used by most traditional 360° practitioners.
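As a rough illustration of how such a display could be drawn, the following matplotlib sketch renders a 2D cross-section of the peer "fan": the distance from the focal respondent is proportional to the dyadic CV, and the bond style encodes the WABA inference. All values, names, and styling choices are illustrative only, not the authors' actual prototype.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy dyads: (label, dyadic CV, agreement category from the WABA inductions)
peers  = [("Peer 1", 0.10, "high"), ("Peer 2", 0.35, "moderate"),
          ("Peer 3", 0.80, "low")]
angles = [35, 90, 145]  # spread the peers across the 180-degree peer surface

fig, ax = plt.subplots()
ax.scatter([0], [0], s=500, zorder=3)            # the focal respondent
ax.annotate("Focal", (0, 0), xytext=(0, -0.12), ha="center")
for (name, cv, category), deg in zip(peers, angles):
    x = cv * np.cos(np.radians(deg))
    y = cv * np.sin(np.radians(deg))
    if category == "high":          # thick "double bond": convergent dyad
        ax.plot([0, x], [0, y], linewidth=3)
    elif category == "moderate":    # dotted single bond: equivocal dyad
        ax.plot([0, x], [0, y], linestyle=":")
    # low agreement: no connecting line at all (dispersed dyad)
    ax.scatter([x], [y], s=200, zorder=3)
    ax.annotate(name, (x, y), xytext=(x, y + 0.05), ha="center")
ax.set_aspect("equal")
ax.set_axis_off()
plt.show()
```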

Future research

The application of this entity analytic technique to peer feedback data extends previous research findings derived from leader-
subordinate feedback. The key question for future research is why some focal respondents' perceptions can converge so closely
with peers while others are so divergent. There are also a number of methodological issues that need to be addressed to refine
this technique's application to 360° programs. These issues are encapsulated in the following list.

1. Agreement categories: The current agreement categories produced clear conclusions with respect to the way in which the entities were configured. However, it is not clear whether the decision rules based on the confidence bands used to assign membership in these categories were the most valid. This is especially important in light of how few low agreement dyads were identified compared to the other two groups. Thus, further research into selecting the correct decision rules, and finding an external way of validating them, is of paramount concern.
2. Additional raters: MSF survey tools, including Benchmarks, make accommodations for a category of raters called “others” and
for the superiors of the focal leaders. In both cases, there has not been any research that applies this entity-focused analytics
framework to raters composed of organizational outsiders, such as customers.
3. Antecedent variables: From a nomological network perspective, it would be helpful to have independent variables that help
contribute to understanding when convergence with raters might occur at the dyadic level. This could be based upon similarity
in age, time together, the degree of trust, etc.
4. Identify outcomes: Do focal respondents who receive feedback from convergent raters accept it more easily? And then act
upon it more willingly? Are these peers the most trusted of the group of peer raters? To what extent do convergent dyads
jointly agree that there are substantial problems? Could it be the case that the process of convergence somehow prevents neg-
ative feedback from being given?
5. Establish norms: While it might be tempting to expect that all organizations will show a similar proportion of high, moderate,
and low agreement dyads when the same decision categorization rules are used, such an expectation is not warranted. This is
because the CCL sample is unusual. It contains two types of extremes. First, there are focal leaders who have been nominated
by their sponsors because they are seen as high potential, and they have evidenced positive leadership behaviors. Second, there
is also a smaller group of leaders for whom the CCL experience represents their last chance to change. They are participating in
hopes of turning around a failing career. As such, it is not possible to generalize from the CCL experience to national norms
without considering the invitation process for participants. Similarly, any MSF program within a large corporation will have
a comparable generalization problem unless the organization requires all managers to participate. (It may be highly advanta-
geous if, for example, all CEOs of publicly traded organizations were required to do this on a regular basis.) Finally, while the
use of 360° feedback is a crucial part of leadership development programs, it is not clear that findings from US data sets will
necessarily generalize to other cultures (Bartram, 2004; Gooty & Yammarino, 2016), especially those that have a more collec-
tivistic basis.
6. Utilizing visualizations: With respect to visualizations, lab research to identify the most salient graphical characteristics for
users will need to be done. The prototype presented here is based on previous field experience. It is also not clear if the
same graphical display can be used for a participant as well as the coach or consultant who is giving the feedback.
7. Matching methods and data: The application of this entity analytics framework offers at least two positive possibilities for re-
search. On the one hand, it provides an alternative explanation and solution for why past leadership convergence studies have
found weak results. By mixing convergent dyads with divergent dyads and with nominal dyads (best thought of as individ-
uals), previous research has produced correlations on par with those in Table 1. On the other hand, these findings also suggest
that certain methods are more appropriate for the different agreement categories. For example, in the high agreement category,
using WABA to verify the appropriate level of analysis could be followed by using its Multiple Relationship Analysis (MRA)
technique or its Multiple Variable Analysis (MVA) technique. Any technique that focuses on the use of dyadic averages
would be appropriate for this category. For the same reason, the use of polynomial regression for this group would not be a good match because it would miss the hidden group effects. Polynomial regression, however, would be an excellent choice for the equivocal (moderate agreement) group because it focuses on individual scores under a condition where no dyadic effects interfere (a minimal specification is sketched after this list). Finally, the low agreement group would be appropriate for techniques that highlight the deviation scores above and below the mean. (Note that these are not the same as the difference scores that polynomial regression was designed to replace; those difference scores were generated by subtracting the leader's report from the rater's report.)
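For the moderate agreement category, the response-surface specification mentioned above takes a form like the following. This is a minimal sketch using ordinary least squares; S and P stand for the self and peer ratings, and the outcome z is simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(1, 5, size=200)                         # self ratings (illustrative)
P = rng.uniform(1, 5, size=200)                         # peer ratings (illustrative)
z = 0.5 * S + 0.4 * P + rng.normal(0, 0.3, size=200)    # simulated outcome

# Quadratic polynomial (response-surface) regression:
# z = b0 + b1*S + b2*P + b3*S^2 + b4*S*P + b5*P^2 + error
X = np.column_stack([np.ones_like(S), S, P, S**2, S * P, P**2])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
```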

These results lead to a conundrum for the MSF field. The GAA model has been, and probably will continue to be, dominant in the practitioner realm: peer ratings will continue to be averaged and reported. That said, academic researchers have done far more work based on the SOA model. To merge these two streams, it might be time to reinvent 360° feedback to accommodate both development and performance needs (Toegel & Conger, 2003). There is some preliminary evidence to suggest this is possible. For example, anonymity may no longer be needed for the peer raters; in a study by Bamberger et al. (2005), the feedback from peer raters who were not anonymous was the most useful of the types of feedback offered. As another example, the use of an alternative form of 360° feedback specifically designed for performance evaluation could go a long way toward ameliorating the organizational problems associated with rewarding only extreme levels of performance (Zenger, 1992). A detailed visualization of dyadic reports might be helpful in a performance evaluation situation.

Summary

This research has approached the issue of contrasting the GAA method with the SOA approach in understanding focal leader/
peer dyads by using the levels of analysis perspective. The imposition of a levels of analysis framework built upon dyadic group
membership has substantially clarified the traditional, low convergence correlations between focal leaders and their raters. For the
same reason, these results confirm that problems abound with the GAA model, even if it endures in the practitioner's toolkit. The

universal aggregation and reporting of rating groups should not be done because it introduces too many errors and unknowns in
the feedback process.
As an alternative to the traditional approach of simply assuming the viability of aggregating peer rater reports, the WABA in-
ferential system has been used to test assumptions about the configuration of individuals, dyads, and whole rating groups. While
clear dyadic effects are not universal, it is evident that under certain conditions, strong dyadic inferences can be detected. Only
about 50% of all dyads demonstrated sufficient convergence to warrant being considered unitary, whole dyads. At the other ex-
treme, 3% of all dyads showed so much within-dyad variability that they fit the dispersed “parts” model where there is a maximal
lack of convergence. As such, two critical research questions arise for the future. First, under what other organizational boundary
conditions does this unitary dyadic convergence of focal leader/peer pairings most frequently occur? Second, how can the percep-
tual processes best be understood for the category of peer-focal pairings that shows so little convergence?

References

Abdulla, A. (2008). A critical analysis of mini peer assessment tool (mini-PAT). Journal of the Royal Society of Medicine, 101, 22–26.
Alicke, M. D., Zell, E., & Bloom, D. L. (2010). Mere categorization and the frog-pond effect. Psychological Science, 21, 174–177.
Antonioni, D., & Park, H. (2001). The relationship between rater affect and three sources of 360-degree feedback ratings. Journal of Management, 27, 479–495.
Armstrong, S. J., Allinson, C. W., & Hayes, J. (2004). The effects of cognitive style on research supervision: A study of student-supervisor dyads in management educa-
tion. Academy of Management Learning & Education, 3, 41–63.
Atwater, L. E., & Brett, J. F. (2006). 360-degree feedback to leaders - Does it relate to changes in employee attitudes? Group & Organization Management, 31, 578–600.
Atwater, L. E., & Waldman, D. A. (1998). 360 degree feedback and leadership development. The Leadership Quarterly, 9, 423–426.
Atwater, L. E., & Yammarino, F. J. (1997). Self-other rating agreement: A review and model. Research in Personnel and Human Resources Management, 15, 121–174.
Atwater, L. E., Ostroff, C., Yammarino, F. J., & Fleenor, J. W. (1998). Self-other agreement: Does it really matter? Personnel Psychology, 51, 577–598.
Bachman, J. G., & O'Malley, P. M. (1986). Self-concepts, self-esteem, and educational experiences: The frog pond revisited (again). Journal of Personality and Social Psychology, 50, 35–46.
Baker, M. N. (2014). Peer to peer leadership: Why the network is the leader. San Francisco, CA: Berrett-Koehler Publishers.
Bamberger, P. A., Erev, I., Kimmel, M., & Oref-Chen, T. (2005). Peer assessment, individual performance, and contribution to group processes: The impact of rater anonymity. Group & Organization Management, 30, 344–377.
Bartram, D. (2004). Assessment in organisations. Applied Psychology: An International Review, 53, 237–259.
Bergman, D., Lornudd, C., Sjoberg, L., & Schwarz, U. V. (2014). Leader personality and 360-degree assessments of leader behavior. Scandinavian Journal of Psychology, 55,
389–397.
Berson, Y., & Sosik, J. J. (2007). The relationship between self-other rating agreement and influence tactics and organizational processes. Group & Organization
Management, 32, 675–698.
Bettenhausen, K. L., & Fedor, D. B. (1997). Peer and upward appraisals - A comparison of their benefits and problems. Group & Organization Management, 22, 236–263.
Beus, J. M., Jarrett, S. M., Bergman, M. E., & Payne, S. C. (2012). Perceptual equivalence of psychological climates within groups: When agreement indices do not agree.
Journal of Occupational and Organizational Psychology, 85, 454–471.
Boerboom, T. B. B., Jaarsma, D., Dolmans, D., Scherpbier, A., Mastenbroek, N., & Van Beukelen, P. (2011). Peer group reflection helps clinical teachers to critically reflect
on their teaching. Medical Teacher, 33, E615–E623.
Braddy, P. W., Gooty, J., Fleenor, J. W., & Yammarino, F. J. (2014). Leader behaviors and career derailment potential: A multi-analytic method examination of rating
source and self-other agreement. The Leadership Quarterly, 25, 373.
Brown, R. D., & Hauenstein, N. M. A. (2005). Interrater agreement reconsidered: An alternative to the r(wg) indices. Organizational Research Methods, 8, 165–184.
Brutus, S., Fleenor, J. W., & Tisak, J. (1999). Exploring the link between rating congruence and managerial effectiveness. Canadian Journal of Administrative Sciences-
Revue Canadienne Des Sciences De L Administration, 16, 308–322.
Byrd, B., Martin, C., Nichols, C., & Edmondson, A. (2015). Examination of the quality and effectiveness of peer feedback and self-reflection exercises among medical
students. Federation of American Societies for Experimental Biology Journal, 29, 1.
Calhoun, A. W., Rider, E. A., Peterson, E., & Meyer, E. C. (2010). Multi-rater feedback with gap analysis: An innovative means to assess communication skill and self-
insight. Patient Education and Counseling, 80, 321–326.
Carson, M. A., Shanock, L. R., Heggestad, E. D., Andrew, A. M., Pugh, S. D., & Walter, M. (2012). The relationship between dysfunctional interpersonal tendencies, derailment potential behavior, and turnover. Journal of Business and Psychology, 27, 291–304.
Castro, S. L. (2002). Data analytic methods for the analysis of multilevel questions: A comparison of intraclass correlation coefficients, r(wg(j)), hierarchical linear
modeling, within- and between-analysis, and random group resampling. The Leadership Quarterly, 13, 69–93.
Church, A. H., & Bracken, D. W. (1997). Advancing the state of the art of 360-degree feedback: Guest editors' comments on the research and practice of multirater
assessment methods. Group & Organization Management, 22, 149–161.
Cogliser, C. C., Schriesheim, C. A., Scandura, T. A., & Gardner, W. L. (2009). Balance in leader and follower perceptions of leader-member exchange: Relationships with
performance and work attitudes. The Leadership Quarterly, 20, 452–465.
Conger, J. A., & Nadler, D. A. (2004). When CEOs step up to fail. MIT Sloan Management Review, 45, 50+.
Conway, J. M., Lombardo, K., & Sanders, K. C. (2001). A meta-analysis of incremental validity and nomological networks for subordinate and peer ratings. Human
Performance, 14, 267–303.
Cullen, K. L., Gentry, W. A., & Yammarino, F. J. (2015). Biased self-perception tendencies: Self-enhancement/self-diminishment and leader derailment in individualistic and collectivistic cultures. Applied Psychology: An International Review, 64, 161–207.
Cushing, A., Abbott, S., Lothian, D., Hall, A., & Westwood, O. M. R. (2011). Peer feedback as an aid to learning - What do we want? Feedback. When do we want it? Now!
Medical Teacher, 33, e105–e112.
Dalessio, A. T., & Vasilopulos, N. L. (2001). Multisource feedback reports: Content, formats, and levels of analysis. In D. W. Bracken, C. W. Timmreck, & A. H. Church
(Eds.), The handbook of multisource feedback (pp. 181–203). San Francisco, CA: Pfeiffer/Jossey-Bass.
Dansereau, F. (1995). A dyadic approach to leadership: Creating and nurturing this approach under fire. The Leadership Quarterly, 6, 479–490.
Dansereau, F., & McConnell, J. J. (2000). Data enquiry that tests entity and correlational/causal theories for Windows® user's manual. Buffalo, NY: Institute for Theory Testing.
Dansereau, F., Alutto, J. A., & Yammarino, F. J. (1984). Theory testing in organizational behavior: The varient approach. Englewood Cliffs, NJ: Prentice-Hall.
Dansereau, F., Yammarino, F. J., Markham, S. E., Alutto, J. A., Newman, J., Dumas, M., ... Keller, T. (1995). Individualized leadership: A new multiple-level approach. The
Leadership Quarterly, 6, 413–450.
Dansereau, F., Cho, J., & Yammarino, F. J. (2006). Avoiding the “fallacy of the wrong level”: A within and between analysis (WABA) approach. Group & Organization
Management, 31, 536–577.
Day, D. V., & Dragoni, L. (2015). Leadership development: An outcome-oriented review based on time and levels of analyses. In F. P. Morgeson (Ed.), Annual review of organizational psychology and organizational behavior. Vol. 2 (pp. 133–156). Palo Alto, CA: Annual Reviews.
Day, D. V., Fleenor, J. W., Atwater, L. E., Sturm, R. E., & McKee, R. A. (2014). Advances in leader and leadership development: A review of 25 years of research and theory.
The Leadership Quarterly, 25, 63–82.
DeStephano, C. C., Crawford, K. A., Jashi, M., & Wold, J. L. (2014). Providing 360-degree multisource feedback to nurse educators in the country of Georgia: A formative
evaluation of acceptability. Journal of Continuing Education in Nursing, 45, 278–284.
Devos, G., Hulpia, H., Tuytens, M., & Sinnaeve, I. (2013). Self-other agreement as an alternative perspective of school leadership analysis: An exploratory study. School
Effectiveness and School Improvement, 24, 296–315.
Diab, N. M. (2016). A comparison of peer, teacher and self-feedback on the reduction of language errors in student essays. System, 57, 55–65.
Dominick, P. G., Reilly, R. R., & McGourty, J. W. (1997). The effects of peer feedback on team member behavior. Group & Organization Management, 22, 508–520.
Donnon, T., Al Ansari, A., Al Alawi, S., & Violato, C. (2014). The reliability, validity, and feasibility of multisource feedback physician assessment: A systematic review.
Academic Medicine, 89, 511–516.
Dotlich, D. L., & Cairo, P. C. (2003). Why CEOs fail: The 11 behaviors that can derail your climb to the top and how to manage them. San Francisco, CA: Jossey-Bass.
Edwards, J. R. (1994). Regression analysis as an alternative to difference scores. Journal of Management, 20, 683–689.
Edwards, J. R. (1995). Alternatives to difference scores as dependent variables in the study of congruence in organizational research. Organizational Behavior and
Human Decision Processes, 64, 307–324.
Edwards, J. R. (2001). Ten difference score myths. Organizational Research Methods, 4, 265–287.
Edwards, J. R., & Parry, M. E. (1993). On the use of polynomial regression equations as an alternative to difference scores in organizational research. Academy of
Management Journal, 36, 1577–1613.
Espenshade, T. J., Hale, L. E., & Chung, C. Y. (2005). The frog pond revisited: High school academic context, class rank, and elite college admission. Sociology of Education,
78, 269–293.
Facteau, J. D., & Craig, S. B. (2001). Are performance appraisal ratings from different rating sources comparable? Journal of Applied Psychology, 86, 215–227.
Facteau, C. L., Facteau, J. D., Schoel, L. C., Russell, J. E. A., & Poteet, M. L. (1998). Reactions of leaders to 360-degree feedback from subordinates and peers. The Leadership
Quarterly, 9, 427–448.
Fedor, D. B., Bettenhausen, K. L., & Davis, W. (1999). Peer reviews: Employees' dual roles as raters and recipients. Group & Organization Management, 24, 92–120.
Ferris, G. R., Liden, R. C., Munyon, T. P., Summers, J. K., Basik, K. J., & Buckley, M. R. (2009). Relationships at work: Toward a multidimensional conceptualization of dyadic work relationships. Journal of Management, 35, 1379–1403.
Feudo, R., Vining-Bethea, S., Shulman, L. C., Shedlin, M. G., & Burleson, J. A. (1998). Bridgeport's Teen Outreach and Primary Services (TOPS) project: A model for raising
community awareness about adolescent HIV risk. Journal of Adolescent Health, 23, 49–58.
Firebaugh, G. (1980). Groups as contexts and frog ponds. In K. H. Roberts, & L. Burstein (Eds.), Issues in aggregation (pp. 43–52). San Francisco, CA: Jossey-Bass.
Firmin, R. L., Luther, L., Lysaker, P. H., & Salyers, M. P. (2015). Self-initiated helping behaviors and recovery in severe mental illness: Implications for work, volunteerism,
and peer support. Psychiatric Rehabilitation Journal, 38, 336–341.
Fleenor, J., McCauley, C., & Brutus, S. (1996). Self-other rating agreement and leader effectiveness. The Leadership Quarterly, 7, 487–506.
Furnham, A. (2010). The elephant in the boardroom: The causes of leadership derailment. Hampshire, England: Palgrave Macmillan.
Furnham, A., & Stringfield, P. (1998). Congruence in job-performance ratings: A study of 360 degree feedback examining self, manager, peers, and consultant ratings.
Human Relations, 51, 517–530.
Gielen, S., Peeters, E., Dochy, F., Onghena, P., & Struyven, K. (2010). Improving the effectiveness of peer feedback for learning. Learning and Instruction, 20,
304–315.
Gooty, J., & Yammarino, F. J. (2011). Dyads in organizational research: Conceptual issues and multilevel analyses. Organizational Research Methods, 14, 456–483.
Gooty, J., & Yammarino, F. J. (2016). The leader-member exchange relationship: A multisource, cross-level investigation. Journal of Management, 42, 915–935.
Gooty, J., Serban, A., Thomas, J. S., Gavin, M. B., & Yammarino, F. J. (2012). Use and misuse of levels of analysis in leadership research: An illustrative review of leader-
member exchange. The Leadership Quarterly, 23, 1080–1103.
Heidemeier, H., & Moser, K. (2009). Self-other agreement in job performance ratings: A meta-analytic test of a process model. Journal of Applied Psychology, 94,
353–370.
Herbst, T. (2014). The dark side of leadership: A psycho-spiritual approach towards understanding the origins of personality dysfunctions, derailment and the restoration of
personality. Bloomington, IN: AuthorHouse.
Howe, A. C., & Stubbs, H. S. (2003). From science teacher to teacher leader: Leadership development as meaning making in a community of practice. Science Education,
87, 281–297.
Jiang, L. X., Probst, T. M., & Benson, W. L. (2014). Why me? The frog-pond effect, relative deprivation and individual outcomes in the face of budget cuts. Work and
Stress, 28, 387–403.
Kalafat, J., & Elias, M. (1992). Adolescents' experience with and response to suicidal peers. Suicide and Life-Threatening Behavior, 22, 315–321.
Kellerman, B. (2012). The end of leadership. New York, NY: HarperCollins.
Kenny, D. A., Kashy, D. A., & Cook, W. L. (2006). Dyadic data analysis. New York, NY: Guilford Press.
Lee, K. L., Tsai, S. L., Chiu, Y. T., & Ho, M. J. (2016). Can student self-ratings be compared with peer ratings? A study of measurement invariance of multisource feedback.
Advances in Health Sciences Education, 21, 401–413.
Leslie, J. B., & Braddy, P. W. (2013). Benchmarks technical manual (3rd ed.). Greensboro, NC: Center for Creative Leadership.
Lockyer, J., Violato, C., & Fidler, H. (2003). Likelihood of change: A study assessing surgeon use of multisource feedback data. Teaching and Learning in Medicine, 15,
168–174.
Markham, S. E. (1988). Pay-for-performance dilemma revisited: Empirical example of the importance of group effects. Journal of Applied Psychology, 73, 172–180.
Markham, S. E. (1998). The scientific visualization of organizations: A rationale for a new approach to organizational modeling. Decision Sciences, 29, 1–23.
Markham, S. E. (2002). Multi-level simulation analysis issues: Four themes. In F. Dansereau, & F. J. Yammarino (Eds.), Research in multi-level modeling: The many faces of
multi-level issues. Vol. 1. (pp. 387–396). Greenwich, CT: JAI Press.
Markham, S. E. (2010). Leadership, levels of analysis, and déjà vu: Modest proposals for taxonomy and cladistics coupled with replication and visualization. The
Leadership Quarterly, 21, 1121–1143.
Markham, S. E., & Halverson, R. R. (2002). Within- and between-entity analyses in multilevel research: A leadership example using single level analyses and boundary
conditions (MRA). The Leadership Quarterly, 13, 35–52.
Markham, S. E., & McKee, G. H. (1991). Declining organizational size and increasing unemployment rates: Predicting employee absenteeism from within-plant and
between-plant perspectives. Academy of Management Journal, 34, 952–965.
Markham, S. E., Smith, J. W., Markham, I. S., & Braekkan, K. F. (2014a). A new approach to analyzing the Achilles' heel of multisource feedback programs: Can we really
trust ratings of leaders at the group level of analysis? The Leadership Quarterly, 25, 1120–1142.
Markham, S. E., Markham, I. S., & Braekkan, K. F. (2014b). A visual illustration of induction in multilevel methods: The problem of leaders awarding countervailing
merit components. Journal of Business and Psychology, 29, 503–518.
Markham, S. E., Markham, I. S., & Smith, J. W. (2015). At the crux of dyadic leadership: Self–other agreement of leaders and direct reports — Analyzing 360-degree
feedback. The Leadership Quarterly, 26, 958–977.
Mayo, M., Kakarika, M., Pastor, J. C., & Brutus, S. (2012). Aligning or inflating your leadership self-image? A longitudinal study of responses to peer feedback in MBA
teams. Academy of Management Learning & Education, 11, 631–652.
McIntosh, J., MacDonald, F., & McKeganey, N. (2006). Why do children experiment with illegal drugs? The declining role of peer pressure with increasing age. Addiction
Research and Theory, 14, 275–287.
Naslund, J. A., Aschbrenner, K. A., Marsch, L. A., & Bartels, S. J. (2016). The future of mental health care: Peer-to-peer support and social media. Epidemiology and
Psychiatric Sciences, 25, 113–122.
Nowack, K. M. (1997). Congruence between self-other ratings and assessment center performance. Journal of Social Behavior and Personality, 12, 145–166.
Pedersen, E. R., Miles, J. N. V., Hunter, S. B., Osilla, K. C., Ewing, B. A., & D'Amico, E. J. (2013). Perceived norms moderate the association between mental health symptoms and drinking outcomes among at-risk adolescents. Journal of Studies on Alcohol and Drugs, 74, 736–745.
Perreault, M., Milton, D., Komaroff, J., Levesque, G. P., Perron, C., & Wong, K. (2016). Resident perspectives on a Montreal peer-run housing project for opioid users.
Journal of Substance Use, 21, 355–360.
Pfeffer, J. (2015). Leadership BS: Fixing workplaces and careers one truth at a time. New York, NY: HarperBusiness.
Pierro, A., Presaghi, F., Higgins, E. T., Klein, K. M., & Kruglanski, A. W. (2012). Frogs and ponds: A multilevel analysis of the regulatory mode complementarity hypothesis. Personality and Social Psychology Bulletin, 38, 269–279.
Platt, J. R. (1964). Strong inference. Science, 146, 347–353.
Podschun, G. D. (1993). Teen peer outreach-street work project: HIV prevention education for runaway and homeless youth. Public Health Reports, 108, 150–155.
Roberts, M. J., Campbell, J. L., Richards, S. H., & Wright, C. (2013). Self-other agreement in multisource feedback: The influence of doctor and rater group characteristics.
Journal of Continuing Education in the Health Professions, 33, 14–23.
Saedon, H., Salleh, S., Balakrishnan, A., Imray, C. H. E., & Saedon, M. (2012). The role of feedback in improving the effectiveness of workplace based assessments: A
systematic review. BMC Medical Education, 12, 8.
Schriesheim, C. A., Wu, J. B., & Cooper, C. D. (2011). A two-study investigation of item wording effects on leader-follower convergence in descriptions of the leader-
member exchange (LMX) relationship. The Leadership Quarterly, 22, 881–892.
Stoddard, N., & Wyckoff, C. (2008). The costs of CEO failure. Chief Executive, 66–70.
Swanson, J. A., Antonoff, M. B., Martodam, D. L., Schmitz, C. C., D'Cunha, J., & Maddaus, M. A. (2010). Surgical leadership development: Identification of discrepancies in
self-awareness using a customized 360-degree feedback assessment. Journal of the American College of Surgeons, 211, S113.
Tang, K. Y., Dai, G. R., & De Meuse, K. P. (2013). Assessing leadership derailment factors in 360 degree feedback: Differences across position levels and self-other agreement. Leadership and Organization Development Journal, 34, 326–343.
Taylor, S. N., & Bright, D. S. (2011). Open-mindedness and defensiveness in multisource feedback processes: A conceptual framework. The Journal of Applied Behavioral
Science, 47, 432–460.
Thuraisingham, M. (2010). Derailed! What smart executives do to stay on track. Singapore: Bluetoffie Pte Ltd.
Toegel, G., & Conger, J. A. (2003). 360-degree assessment: Time for reinvention. Academy of Management Learning & Education, 2, 297–311.
van Schaik, S. M., O'Sullivan, P. S., Eva, K. W., Irby, D. M., & Regehr, G. (2016). Does source matter? Nurses' and physicians' perceptions of interprofessional feedback.
Medical Education, 50, 181–188.
Van Velsor, E., Taylor, S., & Leslie, J. B. (1993). An examination of the relationships among self-perception accuracy, self-awareness, gender, and leader effectiveness.
Human Resource Management, 32, 249–263.
Van Velsor, E., McCauley, C. D., & Ruderman, M. N. (2010). The Center for Creative Leadership handbook of leadership development (3rd ed.). San Francisco, CA: Jossey-
Bass.
Williams, F. I., Campbell, C., McCartney, W., & Gooding, C. (2013). Leader derailment: The impact of self-defeating behaviors. Leadership and Organization Development
Journal, 34, 85–97.
Wright, C., Campbell, J., McGowan, L., Roberts, M. J., Jelley, D., & Chatterjee, A. (2016). Interpreting multisource feedback: Online study of consensus and variation
among GP appraisers. British Journal of General Practice, 66, E277–E284.
Yammarino, F. J. (2003). Modern data analytic techniques for multisource feedback. Organizational Research Methods, 6, 6–14.
Yammarino, F. J., & Atwater, L. E. (1993). Understanding self-perception accuracy: Implications for human-resource management. Human Resource Management, 32,
231–247.
Yammarino, F. J., & Markham, S. E. (1992). On the application of within and between analysis: Are absence and affect really group-based phenomena? Journal of Applied
Psychology, 77, 168–176.
Yammarino, F. J., Dionne, S. D., Chun, J. U., & Dansereau, F. (2005). Leadership and levels of analysis: A state-of-the-science review. The Leadership Quarterly, 16,
879–919.
Zell, E., & Alicke, M. D. (2009). Contextual neglect, self-evaluation, and the frog-pond effect. Journal of Personality and Social Psychology, 97, 467–482.
Zenger, T. R. (1992). Why do employers only reward extreme performance? Examining the relationships among performance, pay, and turnover. Administrative
Science Quarterly, 37, 198–219.
Zenger, J., & Folkman, J. (2009). Ten fatal flaws that derail leaders. Harvard Business Review, 87, 18+.
Zhou, X. H., & Schriesheim, C. A. (2009). Supervisor-subordinate convergence in descriptions of leader-member exchange (LMX) quality: Review and testable propositions. The Leadership Quarterly, 20, 920–932.
Zhou, X. H., & Schriesheim, C. A. (2010). Quantitative and qualitative examination of propositions concerning supervisor-subordinate convergence in descriptions of
leader-member exchange (LMX) quality. The Leadership Quarterly, 21, 826–843.