
There are two power matching methods that are commonly used at debate tournaments: high/low and high/high power matching. All debate tournaments are designed to increase education and fairness by matching each team with an opponent in the same win-loss bracket, preventing the cross-bracket pull-ups where one team's record already reveals it is more skilled than its opponent.1 The purpose of either method of power matching is to further increase the fairness of matching opponents within brackets. The high/low method matches highly ranked teams against poorly ranked teams in the same win-loss bracket; the high/high method matches highly ranked teams against each other in the same bracket.2 Many tournaments alternate between these two methods for different rounds.

If all the rounds were high/low, a highly ranked team would face the lowest ranked team in each bracket; as a worst-case scenario, its opponents could garner 14 wins between them in six rounds.3 If all the rounds were high/high, the best teams would always face each other and would have falsely poor win-loss records. A mixture of high/low and high/high power matching is intended to give each team a good mixture of opponents.

However, final results from any tournament show wide variation in the best measure of the fairness of the matches: opponent wins. Within a final bracket, opponent wins vary despite the power matching methods used. Some teams had a much tougher set of opponents than other teams with the same final win-loss record. These disparities in strength of schedule are not only unfair; they also make win-loss record a less reliable indicator of a team's true skill.4 Therefore, a new power matching method is needed to control for strength of schedule and reduce the variability of opponent wins. This paper describes such a method, the strength-of-schedule (S-o-S) power matching method, which was used to run a hypothetical tournament whose results could be compared to those of an actual tournament.5 This paper first describes the tournament procedure.

Hypothetical tournament procedure


A major (180+ teams) 2008 national debate tournament was the point of comparison. The actual results of the seven preliminary rounds were available online.

1 The use of win-loss brackets, with winners versus winners and losers versus losers, was first developed for a chess tournament in Zurich in 1895: the Swiss system.
2 Readers wishing to learn the details of these methods should read "A Primer on Debate Tabulation" by Dr. Jon Bruschke.
3 By hitting two 0-6 teams in the first two, pre-set rounds, then a 2-4, a 3-3, a 4-2, and a 5-1 in the power matched rounds. Of course, this is a worst-case scenario, but the high/low power matching method does nothing to control for it.
4 A 5-1 team with 18 opponent wins may not be as good as a 4-2 team with 28 opponent wins. Thank you to Scott Devoid, Stephen Gray, and Owen Zahorcak for a discussion about the reliability of win-loss records when strengths of schedule are unequal. That discussion was the inspiration to develop a technique to control for strength of schedule.
5 Thank you to Joe Kelly and Orion Smith for the idea of a hypothetical tournament as a test.

For the hypothetical tournament, round 1 used the same pre-set matches and results as the actual tournament. Round 2 and every subsequent even round used the strength-of-schedule power matching method. Round 3 and every subsequent odd round used the traditional high/low power matching method in TRPC.6 Once the pairing was set, a ballot was entered for every round. If the teams had met at the actual tournament, the real results were entered. If the teams had not met, each side received the speaker points it had earned in its respective debate for that round (against a different opponent) at the actual tournament, and the winner was determined using the final rankings from the actual tournament (the higher-ranked team won).7 Thus, the hypothetical tournament results were meant to replicate the actual tournament results as closely as possible.8
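This decision rule is simple enough to state as code. The following Java fragment is only a sketch of the rule just described: the Team and Result records and the actualBallot lookup are hypothetical stand-ins, not part of TRPC or the author's materials.

    import java.util.Optional;

    record Team(String name, int finalRank, int[] pointsByRound) {}
    record Result(int affPoints, int negPoints, boolean affWon) {}

    class BallotEntry {
        // Returns the real ballot if these two teams met in this round at the
        // actual tournament; empty otherwise (here a stub for the sketch).
        static Optional<Result> actualBallot(Team aff, Team neg, int round) {
            return Optional.empty();
        }

        static Result resolve(Team aff, Team neg, int round) {
            // If the teams met at the actual tournament, enter the real result.
            Optional<Result> actual = actualBallot(aff, neg, round);
            if (actual.isPresent()) return actual.get();
            // Otherwise, each side keeps the speaker points it earned that round
            // (against a different opponent), and the team with the better final
            // ranking at the actual tournament is recorded as the winner.
            return new Result(aff.pointsByRound()[round - 1],
                              neg.pointsByRound()[round - 1],
                              aff.finalRank() < neg.finalRank());
        }
    }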

Strength-of-schedule pairing
For all even rounds, this analysis used the strength-of-schedule power matching method: teams that had had strong opponents so far faced the weaker teams in their bracket; teams that had had weak opponents so far faced the stronger teams in their bracket. It is therefore distinct from both the high/low and high/high power matching methods. At the heart of a strength-of-schedule power match is an optimization matrix, like so:

                                  Team 1 (Neg)          Team 2 (Neg)          Team 3 (Neg)
                                  Spkr. pts.: High      Spkr. pts.: Average   Spkr. pts.: Low
                                  Opp. pts.:  Low       Opp. pts.:  High      Opp. pts.:  Average

    Team A (Aff)                  Good                  Unfair: 2 strong      Unfair: 2 weak
    Spkr.: High, Opp.: Low                              opp. for Team 2       opp. for Team A

    Team B (Aff)                  Unfair: 2 strong      Unfair: both teams    Good
    Spkr.: Average, Opp.: High    opp. for Team B       deserve weak opp.

    Team C (Aff)                  Unfair: 2 weak        Good                  Unfair: both teams
    Spkr.: Low, Opp.: Average     opp. for Team 1                             deserve avg. opp.

6 Although this analysis did not look at the TRPC code itself, thank you to Dr. Rich Edwards for generously sharing it with me.
7 The actual tournament rankings followed this tie-breaking procedure: wins, dropped high/low speaker points, opponent wins, total points, double-dropped high/low speaker points, opponent points, and judge variance. Opponent wins were only rarely needed as tiebreakers, and double-dropped high/low speaker points were only needed once.
8 All rounds were power matched using only information available to the hypothetical tab director at the time: round 2 was paired on round 1 results, round 3 on round 2, etc. There was no ex post facto use of the final rankings of the actual tournament or any other future information to manipulate the power matching during the process. The final rankings of the actual tournament were only used to decide rounds, serving in lieu of judges, and not to pair them, like a prescient tab director.

The worst pairings are unfair to both sides; the second worst are unfair to one team or the other; and the best pairings give both sides an opponent they deserve. In this example, the optimal solution is A-1, B-3, and C-2. It is possible to solve these kinds of matrices numerically. Each cell is populated with a score, from 0 (perfectly fair to both sides) up to a large number for matches that are very unfair to both sides. Then, a computer algorithm solves the entire matrix for the optimal solution: the set of pairings fairest to every team, with the lowest overall score. The optimization matrices for this analysis were created in an Excel spreadsheet. The formula for populating each cell was:
$$\mathrm{score}_{ij} = \frac{10^{\,\lvert wins_i - wins_j \rvert}}{1 + \dfrac{\lvert opp.strng_i - strng_j \rvert + \lvert opp.strng_j - strng_i \rvert}{2}}$$

where $strng_i$ is the z-score of $wins_i + \frac{pts_i}{rds \times 30}$, and $opp.strng_i$ is the z-score of $wins_i + \frac{opp.pts_i}{rds \times 30}$ for round 2 and of $wins_i + \frac{opp.wins_i}{rds^2} + \frac{opp.pts_i}{rds \times 30}$ for every subsequent even round, once opponent wins had become a meaningful statistic.9 This formula ensured several desirable characteristics:

(a) within each bracket, the scores closest to 0 were those that maximized the distance between teams: teams with the lowest opposition records were paired against teams with the highest speaker points, and teams with the highest opposition records were paired against teams with the lowest speaker points;

(b) win-loss brackets were broken the minimum number of times, since any within-bracket match possible had a lower score than even the best pull-up match, and thus the fewest pull-up matches possible were chosen in the optimized matrix;10

(c) when brackets were broken, teams with the lowest opposition records were always pulled up first, since they had the lowest scores for pull-up matches.11
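To make the scoring concrete, here is a minimal sketch in Java of the cell population formula as reconstructed above, together with the arbitrarily high re-match and same-school penalties described in footnote 9. The class, method, and parameter names are mine; the actual analysis computed these scores in an Excel spreadsheet.

    class CellScore {
        // Score a potential aff i vs. neg j pairing; lower is fairer.
        static double score(int winsI, int winsJ,
                            double strngI, double oppStrngI,
                            double strngJ, double oppStrngJ,
                            int rds, boolean rematch, boolean sameSchool) {
            // Re-matches and same-school pairings get arbitrarily high values,
            // 10^rds and 10^(rds+1), so the optimal solution never picks them.
            if (sameSchool) return Math.pow(10, rds + 1);
            if (rematch)    return Math.pow(10, rds);

            // Bracket factor: grows tenfold per win of difference, so any
            // within-bracket match is always cheaper than any pull-up match.
            double bracket = Math.pow(10, Math.abs(winsI - winsJ));

            // Fairness term: the farther each team's opponent strength sits
            // from the other team's strength, the larger the denominator and
            // the lower (fairer) the score.
            double distance = (Math.abs(oppStrngI - strngJ)
                             + Math.abs(oppStrngJ - strngI)) / 2.0;
            return bracket / (1.0 + distance);
        }
    }

Since z-scores rarely leave the range -3 to +3 (footnote 10), the denominator stays below about 7, so the worst within-bracket score (10^0/1 = 1) is still lower than the best possible pull-up score (10^1/7, about 1.43), which is what enforces characteristic (b).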

9 Cells representing a pairing of two teams from the same school, or a re-match of two teams that had already met at the tournament, had to be given an arbitrarily high value. This analysis used 10^rds for re-matches and 10^(rds+1) for same-school pairings, values high enough that the optimal solution never included these possibilities. At very small tournaments where the tab directors know that re-matches must happen, they could relax that condition by lowering the re-match value by a factor of ten.
10 Z-scores for at least 90% of a data set range from +3 to -3, so the denominator is always a smaller factor than the numerator in the cell population formula.
11 Conceivably, the cell population formula could be set up differently yet still achieve these same desirable outcomes. For example, different measures could be factored into a team's strength, such as judge variance or dropped high/low points, or the calculation could be set up in different ways, such as scaling speaker points as $(pts_i - 25)/6$ to account for their real range of 30 to 25. Even geographic zones could be factored in, too. What is important is that the strength and opponent strength scores are in distinct, non-overlapping ranges for each bracket; e.g., the highest ranked 2-1 team should have a lower strength score than the worst 3-0 team even though the 2-1 team may have better speaker points. This is necessary to ensure that brackets are broken correctly.

Once the matrix was populated, a simple algorithm found the optimal solution of the matrix.12 The program selected one match per column and per row, i.e., one match per affirmative and negative team, producing one set of strength-of-schedule pairings.
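Footnote 12 credits a Java implementation of the Hungarian algorithm for this step. As a self-contained illustration of what "one match per column and per row" means, the brute-force sketch below finds the same minimum-total-score assignment by trying every permutation of columns; it is practical only for tiny matrices, which is exactly why the real analysis used the Hungarian algorithm instead.

    // Brute-force assignment solver, for illustration only. The Hungarian
    // algorithm used in the actual analysis finds the same optimum in O(n^3).
    class BruteForceAssignment {
        static int[] best;            // best[row] = column assigned to that row
        static double bestTotal;

        static int[] solve(double[][] matrix) {
            best = new int[matrix.length];
            bestTotal = Double.MAX_VALUE;
            permute(matrix, new int[matrix.length], new boolean[matrix.length], 0, 0.0);
            return best;
        }

        static void permute(double[][] m, int[] assign, boolean[] used,
                            int row, double total) {
            if (total >= bestTotal) return;   // prune branches already too costly
            if (row == m.length) {            // every aff has a distinct neg
                bestTotal = total;
                best = assign.clone();
                return;
            }
            for (int col = 0; col < m.length; col++) {
                if (!used[col]) {
                    used[col] = true;
                    assign[row] = col;
                    permute(m, assign, used, row + 1, total + m[row][col]);
                    used[col] = false;
                }
            }
        }
    }

On the three-team example above, with any numeric scores consistent with the Good/Unfair labels, this returns the assignment A-1, B-3, C-2, the only pairing in which every cell is Good.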

Results
As indicated before, the strength-of-schedule power matching method generated only the correct, minimum number of pull-ups. Teams pulled up were always those with the weakest opposition record in their bracket. Otherwise, all teams were correctly paired within their brackets. The results from the hypothetical tournament closely matched those of the actual tournament. Of the 32 teams that made it into elimination rounds at the actual tournament, 30 would have made it into elimination rounds at the hypothetical tournament. (At the actual tournament, there was a four-way speaker point tie for 31st place, broken on opponent wins. The 30th and 32nd place teams at the actual tournament had different opponents and lower opponent wins at the hypothetical tournament and dropped below the threshold.) In every bracket except the 7-0s, the hypothetical tournament using the strength-of-schedule power matching method had narrower ranges for opponent wins and smaller standard deviations.

    Bracket        7-0s             6-1s             5-2s             4-3s
                   Actual  S-o-S    Actual  S-o-S    Actual  S-o-S    Actual  S-o-S
    Range          34-32   34-29    37-28   33-29    37-22   38-25    34-19   32-22
    Average        33.00   31.33    30.54   31.40    28.39   29.97    25.29   26.49
    Std. dev.      1.00    2.05     2.44    1.36     3.38    2.93     3.17    2.64

With so few 7-0s, the addition of one outlier made a large impact, so this is not the most revealing statistic. The range of opponent wins for 6-1s at the actual tournament was nine, but four at the hypothetical tournament, and the standard deviation was nearly halved. The range for 5-2s decreased from 15 to seven. The range for 4-3s decreased from 15 to ten.
12 Thank you to Jake Stults for developing the Java program of the Hungarian algorithm used for this analysis. Although the algorithm is simple to understand, it would have been extremely laborious to do by hand. With the time constraints of a normal tournament, a computer must do this step. Computer time to find a solution for a 180+ team tournament was less than one second; by hand, perhaps a week.


Another telling statistic is the comparison between the top 32 teams at the actual tournament and the top 32 teams at the hypothetical tournament:

                   Top 32           94% *
                   Actual  S-o-S    Actual  S-o-S
    Range          37-25   38-26    37-25   35-27
    Average        29.72   31.28    29.63   31.23
    Std. dev.      3.12    2.43     2.81    1.96

* 94% dropped the teams with the highest and lowest opposition wins.

The ranges were the same, but the standard deviation was far lower at the hypothetical tournament. The middle 94% of the top 32 teams, eliminating the highest and lowest outliers, is even more telling: the range for the actual tournament, even after eliminating the outliers, was still 12, while the range for the hypothetical tournament dropped to eight.

Discussion
Although the opponent wins ranges improved using the strength-of-schedule power matching method, they were not as narrow as they could have been. One concern is the limit of speaker points as a measure of a team's strength. Opponent wins effectively measure a team's strength of schedule, but the strength-of-schedule power matching method evens this out only insofar as speaker points are an accurate predictor of a team's final record. A team with a weak opponent record needs to debate a good opponent who will finish with many opponent wins, but how a team will finish is not known in advance. Speaker points are the same predictor that the traditional high/high and high/low power matching methods rely on, but they may be too poor a predictor.13 A second reason that ranges were not as narrow as they could have been is that the last round used the high/low power matching method. Since using high/low power matching in odd rounds increased the ranges, finishing with an odd round meant that there was no chance to even them out again with an even-round strength-of-schedule power match.

Odd rounds
The main reason the strength-of-schedule power matching method was not used for odd rounds is that there was no immediate solution to the side assignment problem. The sides must be assigned first, before the matrix can be created and then solved. Assigning teams to sides at random would create side-skew problems: if more than 50% of the stronger teams ended up on the affirmative, then many matches in the subsequent even round would need to be pull-ups. (This is analogous to the problem with using high/high power matching for odd rounds, too: there is no clearly better team, so side assignment to create balance in the next round is a guess.) High/low power matching largely avoids this problem because 50% of the highly ranked teams, which will presumably win, are assigned to each side.

13 Perhaps another measure, such as judge variance, would be a better predictor of team strength. Another thought is creating a new measure, effective wins, adding opposition wins only for opponents defeated, which might more accurately measure a team's true strength, although it only makes sense after several rounds (idea from Steven Gray). Alternatively, tournaments could try to make speaker points more objective. In our experience, Owen Zahorcak and I have noticed that even a simple rubric (e.g., 25 = poor, 26 = fair, 27 = average, ...) posted at the ballot table is effective in reducing the variability of speaker points between different judges.

Byes
Of course, as is done now, the bye should be selected first, so that there is an even number of teams to power-match. The algorithm then proceeds as described before. Byes present three interesting questions.

The first question is whether the team receiving the bye should have the weakest record and weakest speaker points (i.e., weakest overall strength) or the weakest record and strongest opponent record (i.e., the 0-3 most deserving a break). There are good reasons on either side.

The second question raised by byes is how to handle side assignments for teams that have received byes in earlier rounds. A team might be 1-2, with one win (a bye in round 2) and two losses (on the affirmative in round 1 and the negative in round 3). This team is not side-constrained in round 4. In fact, to equalize the number of teams on each side, it may need to be on the negative again (thus being negative in rounds 3 and 4). Current power-matching methods may have a way of dealing with this contingency, but the strength-of-schedule power matching method needs to assign a side before the matrix can be solved.

The third question raised by byes is how to handle opponent wins and opponent points for bye rounds. In TRPC, the default is to average: if a team's two opponents have three and four wins, then for the round 3 bye it receives 3.5 opponent wins, raising its total to 10.5 opponent wins for the three rounds. An alternative is to give zeros. In this method, the team would have only seven opponent wins recorded after three rounds. Giving zeros in bye rounds is accurate, since the team had no opponent, but the effect is that teams receiving byes are among the first teams pulled up in every subsequent round. If the teams receiving byes were chosen because they had the weakest record with the strongest opponent record, then all of that team's opponents would be the strongest in the zero-win bracket. This may reasonably be viewed as fair and thus an argument for giving zeros (all strong opponents offsets one round off), or it may reasonably be viewed as unfair and thus an argument for averaging.
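The two bookkeeping policies for bye rounds can be stated compactly. A minimal sketch, with illustrative names that are mine rather than TRPC's:

    import java.util.List;

    enum ByePolicy { AVERAGE, ZERO }

    class ByeAccounting {
        // opponentWins holds the wins of each real opponent faced so far;
        // the team is assumed to have had exactly one bye.
        static double totalOpponentWins(List<Integer> opponentWins, ByePolicy policy) {
            double real = opponentWins.stream().mapToInt(Integer::intValue).sum();
            double byeCredit = switch (policy) {
                case AVERAGE -> opponentWins.isEmpty()
                        ? 0.0
                        : real / opponentWins.size();  // TRPC default: the mean
                case ZERO    -> 0.0;                   // no opponent, no credit
            };
            return real + byeCredit;
        }
    }

For the example in the text, opponents with three and four wins plus one bye: AVERAGE yields 3 + 4 + 3.5 = 10.5 opponent wins, while ZERO yields 3 + 4 + 0 = 7.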

Programming
There was nothing in the process that required human judgment or that would be difficult to do in a computer program:

1. Retrieve the relevant statistics from the tabulation program.
2. Assign byes.
3. Calculate the z-scores of each team's strength and its opponents' strength.
4. Populate an optimization matrix.
5. Solve the optimization matrix using the Hungarian algorithm.
6. Feed the solution back into the tabulation program as the next round's pairings.

Alternatively, the new power-matching algorithm could be written into an existing tabulation program, eliminating steps 1 and 6.
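As a final sketch, the six steps might be wired together as follows. The TabProgram interface is a hypothetical stand-in for whatever tabulation software supplies and receives the data, and the solver is the BruteForceAssignment sketch from earlier (the Hungarian algorithm in practice); none of this is TRPC's actual code.

    // Hypothetical glue for the six-step process above.
    interface TabProgram {
        // Steps 1-4: statistics, byes, z-scores, and the populated matrix.
        double[][] buildScoreMatrix(int round);
        // Step 6: push the chosen pairings back into the tabulation program.
        void enterPairings(int[] affToNeg);
    }

    class SosPairing {
        static void pairNextRound(TabProgram tab, int round) {
            double[][] matrix = tab.buildScoreMatrix(round);     // steps 1-4
            int[] pairing = BruteForceAssignment.solve(matrix);  // step 5
            tab.enterPairings(pairing);                          // step 6
        }
    }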
