Most LD tournaments run mutual judge preferences (MJP) using the category method. Competitors rate judges into categories (often 1, 2, 3, 4, 5, and 6/Strike), from most preferred (1) to least preferred (6). When pairing, the tab software tries to find a judge that both competitors rated in the same category while also controlling for judge quality. The best judges for a given round are those both competitors rated as 1s, followed by those both rated as 2s. At a certain point, judge quality and mutuality trade off: some tournaments place a 3-3 judge ahead of a 1-2 judge, while others do the opposite.
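To make that tradeoff concrete, here is a minimal sketch of how a category-based scorer could work. This is not the actual tab software's algorithm; the scoring formula, weights, and function names are assumptions for illustration.

```python
# Illustrative sketch only, not the actual tab software's pairing algorithm.
# prefs map judge name -> category (1 = most preferred, 6 = strike).

def category_score(cat_a, cat_b, quality_weight=1.0, mutuality_weight=0.5, strike=6):
    """Lower is better; struck judges are never eligible."""
    if cat_a == strike or cat_b == strike:
        return float("inf")
    quality = (cat_a + cat_b) / 2     # how highly both debaters rate the judge
    mutuality = abs(cat_a - cat_b)    # how far apart the two ratings are
    return quality_weight * quality + mutuality_weight * mutuality

def best_judge(prefs_a, prefs_b, available):
    # prefs_a / prefs_b: dicts mapping judge name -> category for each debater
    return min(available, key=lambda j: category_score(prefs_a[j], prefs_b[j]))
```

With these placeholder weights, a 1-2 judge scores 2.0 and a 3-3 judge scores 3.0, so the 1-2 wins; a tournament that values mutuality more heavily would flip that ordering by raising `mutuality_weight`.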
Some LD tournaments, such as the Loyola Invitational, use the ordinal method. Competitors rank-order all the judges relative to each other, so if there are 50 judges, competitors rank them 1 through 50. When pairing, the tab software uses an algorithm that weights mutuality and judge quality. Instead of matching categories, the mutuality variable tries to find a judge the two competitors ranked similarly (e.g., less than a 10% difference in ranking).
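An ordinal scorer can use the same two ingredients, just computed from ranks rather than categories. Again, this is a sketch under assumed weights, not the real algorithm:

```python
# Illustrative sketch of an ordinal scorer. The real algorithm's weights and
# functional form aren't public, so treat these as placeholders.

def ordinal_score(rank_a, rank_b, pool_size, quality_weight=1.0, mutuality_weight=1.0):
    """Ranks run 1..pool_size; lower score is better."""
    quality = (rank_a + rank_b) / 2 / pool_size * 100    # average rank, as % of the pool
    mutuality = abs(rank_a - rank_b) / pool_size * 100   # rank gap, as % of the pool
    return quality_weight * quality + mutuality_weight * mutuality
```

A `mutuality` value under 10 corresponds to the "less than a 10% difference in ranking" heuristic described above.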
The decision between the category and ordinal methods matters. A major objective of MJP is to use preferences to allocate the available judges optimally. Placing a judge that neither debater wants is wasteful not just for that round, but for the whole pool: that judge could be someone two other debaters would really enjoy having. Let’s take a look at some of the data from the Loyola Invitational to see how well we are achieving that objective, and what potential advantages might stem from ordinals.
Not all 1s are the same.
The category method treats all 1s equally. If a tournament that uses 5 categories allocates them evenly, each category contains 20% of the judge pool; for a pool of 50 judges, a competitor will rate 10 judges a 1. But are all 10 of those judges equally preferred? Of course not! If you’re a regular competitor at big national tournaments, you probably have a favorite judge. Call this judge your “ordinal 1.”
While an ordinal method can place an “ordinal 1” relatively frequently (because it knows who that judge is based on your rankings), categories can only place that judge by pure luck. Here are two screenshots of the best prefs for a few prelims at Loyola:
(The dashes represent people that didn’t fill out prefs; the computer assumes mutuality for those, so fill out your prefs!)
In the screenshot on the left, 10 debaters had their favorite judge in their debate – not just any category 1 judge, but their ordinal 1. The algorithm is clearly hitting 1-1 with some regularity, which is something any debater should be happy to hear.
I’m focusing on the very top, but of course, ordinals increase mutuality and judge quality across the board. With categories, the software can’t distinguish between your 40th judge and your opponent’s 20th judge if they’re both 2s in a 100-judge pool, but with ordinals, we might find a 24-26 instead.
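Here is a quick illustration of what each method “sees” in that 100-judge pool, assuming perfectly even 20-judge categories (real tournaments may split categories differently):

```python
# Hypothetical 100-judge pool split into 5 even categories of 20 judges each.
def category_of(rank, pool_size=100, num_cats=5):
    return (rank - 1) * num_cats // pool_size + 1

# A 25th-ranked and a 40th-ranked judge both look like a "mutual 2" to categories...
print(category_of(25), category_of(40))   # -> 2 2
# ...while ordinals see a 15-rank gap and would prefer a genuinely close 24-26 pairing.
print(abs(25 - 40), abs(24 - 26))         # -> 15 2
```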
Precision helps quality, and the empirics prove it.
Tournaments don’t generally release their pref data, but I think they should. We’ll lead by example and share some pref data from Loyola:
| Round | Pref (%) | Mutuality (%) |
|-------|----------|---------------|
| 1     | 14.5     | 10.8          |
| 2     | 17.1     | 12.1          |
| 3     | 18.8     | 11.5          |
| 4     | 23.3     | 12.0          |
| 5     | 20.6     | 13.5          |
| 6     | 16.3     | 11.0          |
If the data is confusing: in every round, the average judge assigned fell within the top 25% of a debater’s rankings. In a five-category system, this would mean mostly 1s with some 2s. On average, the two debaters’ rankings of their assigned judge also differed by only about 11.8% of the pool. In a five-category system, this would mean mostly judges in the same category, with some one-category differences. In round 6, the average pref looked something like an 11 for one debater and a 21.5 for the other, in percentile terms. While we can’t be sure until more data is available, this seems better than what I generally see as a coach traveling on the circuit.
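For readers who want the columns spelled out, here is a minimal sketch of how they could be computed, assuming “Pref” is the average assigned rank and “Mutuality” is the average rank gap, both expressed as a percentage of the pool size (the data shape and function below are illustrative, not Loyola’s actual tab export):

```python
# Sketch of the per-round metrics in the table above. The data shape is an
# assumption for illustration, not the actual tab export format.

def round_metrics(pairings, pool_size):
    """pairings: list of (rank_by_debater_1, rank_by_debater_2) for each debate."""
    n = len(pairings)
    pref = sum((a + b) / 2 for a, b in pairings) / n / pool_size * 100
    mutuality = sum(abs(a - b) for a, b in pairings) / n / pool_size * 100
    return round(pref, 1), round(mutuality, 1)

# e.g. round_metrics([(7, 9), (12, 20), (3, 5)], pool_size=65)
```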
Ordinals functionally create more strikes.
Strikes are great because they guarantee that you can never get a judge you don’t want. You’ll notice ordinals don’t have explicit strikes, but that doesn’t tell the whole story. Throughout the tournament, if you had 2 losses or fewer, you never got a judge in the bottom 50% of your rankings. That’s a lot of strikes! Ordinals seem to do a great job of making sure you get exactly the judges you want and avoid the judges you don’t.
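A rough back-of-the-envelope comparison makes the point; both numbers below are placeholder assumptions rather than Loyola’s actual figures:

```python
# Back-of-the-envelope comparison; both numbers are placeholder assumptions.
pool_size = 60                       # hypothetical judge pool
effective_strikes = pool_size // 2   # never seeing your bottom 50% ~= 30 implicit strikes
explicit_strikes = 6                 # a hypothetical explicit strike allotment under categories
print(effective_strikes, explicit_strikes)   # -> 30 6
```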
2 Comments
What about ordinals uniquely causes the bottom 50% of the pool to be “strikes”? If the same tournament had a category-based pref system without a “strike” category, why would you be unable to give debaters judges that are only in the top half of their categories?
If your first point is true about categories artificially increasing the mutuality of 1-1 pairings, shouldn’t that increase the ability of the tournament to give debaters mutual 1s (at the cost of less mutuality within the 1s category)? I’d think that should make avoiding bottom-half judges easier.
I support ordinal prefs, but I don’t follow point #3.
The 50% number is specific to Loyola. In general, though, ordinals functionally maximize strikes because their added specificity allocates judges more efficiently.