Friday, March 5, 2010

SLUGgish Math, Kinda

Warning - this is me getting really nerdy about Ultimate and a lecture I heard this week on incorporating uncertainty into wildlife management practices. Read at your peril.

So here's an applied philosophy / math problem for you from the domain of Ultimate, namely our Saturday morning scrimmages. The general situation is this: 20 people show up to play an Ultimate game, and each person is effectively betting five dollars (a pizza lunch) to play. Our goal is to make the games as fair as possible (balance the two teams) and not have any appearance of having picked teams in a biased manner. This is important not just in the spirit of fairness, but because if people show up and lose their five dollars on an unfairly disadvantaged team, they will be disinclined to show up for the following weeks' games. And we can't have that.

To simplify the problem (and this is not to say that Ultimate players are like this, just to demonstrate how difficult this is even with an oversimplified heuristic), let's pretend that you can evaluate Ultimate players on a Madden-esque 1-100 scale, and that the only factor that matters for comparing teams is the aggregate point total of the individual players. I.e., we're not concerned about holistic evaluations of the teams, whether you have a good balance of height, handling and speed, etc., just whether the team point totals match. Let's further pretend that no one is perfect at evaluating a player's point total, but that the best evaluators can get within +/- 1 of a player's true ability.

So there are obvious situations in which fair teams would be impossible. E.g., if there were nineteen 90 point players and one 10 point player, one of the teams will be necessarily outmanned from the get-go. So to make this even simpler, let's say there are 10 pairs of players whom our best evaluator puts at 95, 90, 85, 80, 75, 70, 65, 60, 55, and 50. Again, this is ridiculously over-simplified, but this is a situation that at least looks like it would be amenable to crafting fair teams.

We are trying to dodge all appearances of bias, so we don't want anyone, even the best evaluator, to pick the teams by himself due to the obvious conflict of interest involved. Using the two 95 point players as captains and drafting schoolyard style has drawbacks, too - 1, if the best evaluator is involved, he will have an inherent advantage; 2, if the best evaluator is not involved but there is still a disparity in evaluation ability, one captain will still have an inherent advantage*; 3, if the captains are nowhere near the best evaluator and are picking (relatively) haphazardly, getting the balance correct will be very unlikely.

* - Either 1 or 2 would test the better evaluator's sense of duty to fairness v. his lust for pizza; presumably, if he were a really stand-up individual, he could draft to correct any mistakes the opposing captain made to keep things fair. But I don't know how likely this is, as pizza is really tasty.

All of that was setup for what I find interesting: our solution and its success or failure. We've had the best evaluator (or a sort of communal "best evaluator council" of a few knowledgeable players) name these pairs and then select the teams by coin flips. This randomizes the teams to some extent* and is pretty effective in killing the appearance of bias - unless we're rigging the coin flips, we're pretty clearly not rigging the teams. So one goal is well accomplished. But does this method result in fair teams?

* - It also happens to mandate some sets of never-teammates - e.g., Griesy and Cole have never played together because they are a somewhat natural ability-level pair. So no matter which way the coin flips go, they're still on opposing teams every week.

The answer is "hmmm." Even if we believe the generous assumption that we can get within one of a player's "true ability," and even when people are lined up as I've outlined above with a seemingly balanced pool from which to pick, the coin flips actually work against balanced teams. Look at each pair - it's either going to be a wash (we got both players right, or got them both wrong in the same direction - e.g., 90 v. 90 was right, or they're both 91, or both 89), or +1 (e.g., the second round pick is "actually" 91 v. 90) or +2 (e.g., 91 v. 89). To really crunch the math on this, we'd need to start inventing probabilities of being off by one in each direction, etc. But to make the general point, I'll keep the math a little simple and note that for the errors to balance out, you effectively need the coin to come up heads (the better player of the pair ending up on team A) the same number of times that it comes up tails (the better player of the pair ending up on team B). You're asking, in ten coin flips, for the coin to come up heads exactly five times.

The problem is, that's not a common event. The chance of five heads in ten flips is (10 combination 5) * (.5)^10 = .246. So the teams will be "fair" or as fair as possible given the constraints only a quarter of the time.
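For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function name `prob_exact_heads` is just something I made up for illustration; the only assumption is a fair coin.

```python
from math import comb

def prob_exact_heads(n_flips: int, n_heads: int) -> float:
    """Probability of exactly n_heads heads in n_flips fair coin flips."""
    return comb(n_flips, n_heads) * 0.5 ** n_flips

# Ten pairs, ten flips: how often does the "better" player land
# on each team exactly five times?
p_even_split = prob_exact_heads(10, 5)
print(round(p_even_split, 3))  # 0.246
```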

But that's assuming that all ten coin flips mattered. Let's say that all of those probabilities of error in individual player evaluation are equal. In other words, it's equally likely that we get the rating right, overestimate by one, or underestimate by one. If this is the case, then 1/3 of the time a pair will be equal, 2/9 of the time they'll be off by two, and 4/9 of the time they'll be off by one*. The important thing is that one third of the coin flips won't matter. With ten pairs that's about three and a third flips, so let's be generous and call it four coin flips that don't matter; now we only need to get 3 out of 6 heads.

* - E.g., there is a 1/3 chance that I overrate player A times a 1/3 chance that I overrate player B, so there is a 1/9th chance that I overrate them both. There's also a 1/9th chance that I get them both exactly right and a 1/9th chance that I underrate them both, so there's a 1/3 total chance that they end up washing. Similar calculations can be made for the other states.
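Those "similar calculations" can be done by brute force. This is a small sketch that lists all nine combinations of rating errors for a pair, assuming (as above) that underrating by one, getting it right, and overrating by one are equally likely and that the two errors are independent. The names `ERRORS` and `pair_gap_distribution` are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed error model: underrate by one, exactly right, overrate by one,
# each with probability 1/3.
ERRORS = (-1, 0, 1)

def pair_gap_distribution():
    """Distribution of the true-ability gap within one supposedly equal pair."""
    dist = Counter()
    weight = Fraction(1, len(ERRORS) ** 2)  # each of the 9 combinations is 1/9
    for err_a, err_b in product(ERRORS, repeat=2):
        dist[abs(err_a - err_b)] += weight
    return dict(dist)

gaps = pair_gap_distribution()
for gap in sorted(gaps):
    print(gap, gaps[gap])  # gap 0 -> 1/3, gap 1 -> 4/9, gap 2 -> 2/9
```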

That's more likely - (6 combination 3) * (.5)^6 = .3125, or getting toward a third of the time. But that still leaves two thirds of the time where the teams will be unbalanced*, which is not great.

* - I'm leaving out the complications of those +1's and +2's washing out or not at varying rates in both the 3 out of 6 heads situations and otherwise. That difference is going to complicate both the allegedly balanced and unbalanced situations, so I think it's reasonable for simplicity's sake to collapse those +2's to +1's to make the point that there is a better chance that one team will have an advantage than that they will be even.
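If you'd rather not collapse the +2's into +1's, the full version can be worked out exactly instead. The sketch below is not the calculation used in this post; it is one way to do the un-simplified version, assuming independent rating errors of -1, 0, or +1 at 1/3 each and all ten pairs in play. Under that assumption each pair's signed gap (team A minus team B) is -2, -1, 0, +1, or +2 with probability 1/9, 2/9, 3/9, 2/9, 1/9, and the coin flip doesn't change that because it is already symmetric. The names `PAIR_GAP` and `net_gap_distribution` are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Signed gap (team A minus team B) contributed by one pair, under the
# assumed error model of -1, 0, or +1 with probability 1/3 each.
PAIR_GAP = {-2: Fraction(1, 9), -1: Fraction(2, 9), 0: Fraction(3, 9),
            1: Fraction(2, 9), 2: Fraction(1, 9)}

def net_gap_distribution(n_pairs):
    """Exact distribution of the summed signed gap over n_pairs independent pairs."""
    dist = {0: Fraction(1)}
    for _ in range(n_pairs):
        nxt = defaultdict(Fraction)  # missing totals start at Fraction(0)
        for total, p in dist.items():
            for gap, q in PAIR_GAP.items():
                nxt[total + gap] += p * q
        dist = dict(nxt)
    return dist

dist = net_gap_distribution(10)
p_dead_even = float(dist[0])
p_within_two = float(sum(p for total, p in dist.items() if abs(total) <= 2))
print(f"dead even: {p_dead_even:.3f}, within two points: {p_within_two:.3f}")
```

Fractions keep the probabilities exact until the final print, so the only thing to argue about is the error model itself.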

But let's step back into something resembling reality and introduce the notion that maybe a +1 or +2 differential won't *really* make that much of a difference spread over a team. I.e., as long as one team gets at most two "advantages" in the coin-flipping process, things will be effectively even. In other words, if I get 4 heads, 3 heads, or 2 heads out of the 6 coin flips that matter, the teams are "balanced enough." Ah, now we're talking. Because that means that all three of the following situations would be fair:

(6 combination 3) * (.5)^6 = .3125
(6 combination 2) * (.5)^6 = .2344
(6 combination 4) * (.5)^6 = .2344
----------------------------------
Fair teams = .7813
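The same total can be reproduced in a couple of lines of Python. This is only a sketch of the sum above, with a fair coin and six flips that matter; `prob_heads_between` is a made-up helper name.

```python
from math import comb

def prob_heads_between(n_flips: int, lo: int, hi: int) -> float:
    """Probability of getting between lo and hi heads (inclusive) in n_flips fair flips."""
    favorable = sum(comb(n_flips, k) for k in range(lo, hi + 1))
    return favorable / 2 ** n_flips

# 2, 3, or 4 heads out of the 6 flips that matter counts as "balanced enough".
print(prob_heads_between(6, 2, 4))  # 0.78125
```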

So we can rest easy, knowing that between 3/4ths and 4/5ths of the time, the teams are within a tolerable level of fairness. It's probably worth noting that the remaining "unfair teams" situations here are driven by the uncertainty in our evaluation process, not by the drafting process. In other words, dropping the coin flips and just having the best evaluator split each of the balanced pairs however he will would still result in this ratio of fair and unfair team situations. Then again, perfectly "balanced pairs" aren't likely in the real world, so we don't want to let the best evaluator choose the teams that way, lest he appear biased. (Particularly if he wins pizza!)

Right - all of that is qualified by the notion that this is a pretty simplistic model, and the pairs involved are probably more disparate than this. To account for obvious discrepancies, we could probably improve the process (in the event of some wacky coin-flipping that appears to be favoring one team over the other) by stopping the coin-flipping and actively balancing the teams with the last couple of picks. And we could keep the bias out by assigning people involved in the team-crafting process by coin flip at the end. Voila!

Alright, hope that wasn't too painful. But it is interesting to me that coin-flipping is automatically assumed to be fair, and in many situations - where the individual player disparities are large enough to swing a game, and you really do need the coin to come up heads exactly half of the time - the randomness of the process is actually more likely to produce an imbalance. Seems that, especially if some of the player pairs are not ideal, some conscious adjustments need to be made to keep the people coming back to SLUG it out.
