Fantasy, in Theory: If We Aren't All Playing, Then What's The Point?

Measuring the quality of measurements of quality.

Welcome to Fantasy, in Theory. One of the driving forces behind this column is the idea that some things in fantasy are interesting, even important, but not especially useful, and there should be a place for those ideas, too.

Today, I wanted to look at one of the most common questions in fantasy football: “how good is my team?” This is a very interesting question— Footballguys' Rate My Team tool is one of our most popular features during the preseason— but it's not an especially useful question, at least not in redraft. After all, whether your team is good or bad, it's the only team you've got.

(This question is actually of some practical use in dynasty leagues, where the earlier you can decide whether to compete or rebuild, the better off you are. But in redraft it's just one of those things that we want to know for the sake of knowing.)

We could just say that the team with the most wins is the best team, but everyone knows that's a terrible method. You could have a week where the team with the second-highest score loses and the team with the second-lowest score wins.

Indeed, in one of my leagues, an owner is in the middle of the worst stretch of schedule luck I've ever seen. He has finished each week with the 4th-, 3rd-, 4th-, and 3rd-highest score... and he's lost all four games. Third in the league in points scored, still waiting for his first win. Meanwhile, there's a team he has outscored in three out of four weeks, and that team is 4-0.

The old-school, low-tech response has been to just look at how many points a team has scored, the “total points” method. But since we like to make things super-complicated, a newer approach called “all-play record” has been steadily gaining in popularity over the last decade.

“All-play record” is basically what it sounds like: if you played every team every week, what would your record be? In a twelve-team league, the team that got the highest weekly score would have beaten all 11 other teams, so its all-play record would be 11-0. The team with the fourth-highest score would lose to the three teams with higher scores and beat the eight teams with lower scores, so its all-play record would be 8-3. And so on down the line.
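For readers who want to compute this themselves, here is a minimal sketch of a single week's all-play record. The function name `all_play_record` and the scores are made up for illustration, and ties are simply ignored, which is an assumption (leagues handle them differently).

```python
def all_play_record(scores):
    """Return {team: (wins, losses)} as if every team played every other team.

    Ties count as neither a win nor a loss (an assumption for this sketch).
    """
    records = {}
    for team, pts in scores.items():
        wins = sum(1 for other, o in scores.items() if other != team and pts > o)
        losses = sum(1 for other, o in scores.items() if other != team and pts < o)
        records[team] = (wins, losses)
    return records


# Hypothetical twelve-team week: Team 1 scores 145, Team 2 scores 140, and so on.
week = {f"Team {i}": 150 - 5 * i for i in range(1, 13)}
records = all_play_record(week)

print(records["Team 1"])  # highest score: (11, 0)
print(records["Team 4"])  # fourth-highest score: (8, 3)
```

Summing these weekly records over the season gives the all-play record discussed in the rest of this column.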

Now, all-play record is a much better measure of team quality than just plain old head-to-head record. But that's irrelevant because head-to-head record wasn't the goal to beat. In order to be useful, all-play record has to be a better measure of team quality than total points.

I have a team in one league that ranks 2nd in all-play winning percentage, but 5th in points scored. Is this team closer to the 2nd-best team in the league, or the 5th?

Failure Modes

Let's start with the basics. Why did players start using all-play instead of total points in the first place? The main reason— the “failure mode” of total points, if you will— is that total points is very sensitive to outliers. If one team scores 50, 50, 50, 250, and the other team scores 99, 99, 99, 99, then total points says the first team is better than the second. But the second team was much more consistent, and as a result, would have won three out of four potential matchups.
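The arithmetic behind that example is easy to check. This is just a sketch using the two score lines from the paragraph above; the variable names are mine.

```python
# Weekly scores from the example: one boom-or-bust team, one steady team.
streaky = [50, 50, 50, 250]
steady = [99, 99, 99, 99]

total_streaky = sum(streaky)  # 400
total_steady = sum(steady)    # 396

# Count the weeks in which the steady team would have beaten the streaky one.
steady_wins = sum(1 for a, b in zip(steady, streaky) if a > b)

print(total_streaky, total_steady)  # 400 396
print(steady_wins, "of", len(steady))  # 3 of 4
```

Total points favors the streaky team by four points, while the week-by-week comparison favors the steady team three times out of four.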

All-play record doesn't just reward production, it rewards consistency, and who doesn't want that? (Don't answer that yet, we'll come back to it.)

But even if we accept the “failure mode” of total points, all-play record has a failure mode, too. Namely, it has no sense of proportion or scale.

Imagine a 10-team league where the teams finished with the following points:

According to all-play record, the team in second place is much closer in quality to the team in first than it is to the team in last. But the scoring difference between first and second is more than nine times greater than the scoring difference between second and tenth.

Additionally, all-play record suggests that the team with 120.7 points could score 79 more points without actually getting any better, but if the team in last place scored just nine more points its all-play winning percentage would jump by 89 percentage points. (Heck, the team in first could score a thousand more points and all-play record would never recognize the improvement.)

Obviously, this is an exaggerated example. But so is the idea of a team that scores 50, 50, 50, and 250. Are tightly-clustered scores more common than huge outlier weeks? It's a hard comparison to make, but both are fairly common. And I'd say all-play's failure mode is the bigger problem, because it both exaggerates small differences and understates large ones.

“Failure Modes”

That's right, it's time to bust out the scare quotes. Because you know that question I said I'd revisit later about who doesn't value consistency? Surprise, the answer is me!

There's something important to remember about consistency: a fantasy team isn't a single unit that has “good weeks” and “bad weeks”. It's composed, instead, of 8-10 individual players who have “good weeks” and “bad weeks”, and the performance of those players is for the most part totally uncorrelated. If you own Todd Gurley and Stefon Diggs and Todd Gurley has a good week, Stefon Diggs isn't magically more likely to have a bad week just because he's on your team.

Think of players like dice rolls. There's a range of possible outcomes, with the average outcome representing how good a player is and the spread of outcomes representing how consistent he is.

Le'Veon Bell is really good and really consistent, so maybe his weekly production is akin to rolling five 4-sided dice and totaling the results. The average of this will be 12.5, and you're likely to get a result that's pretty close to that average.

Meanwhile, let's say Odell Beckham is about as good, but much less consistent, (receivers tend to be less consistent than running backs with a comparable weekly average). Say Odell Beckham is akin to rolling one twenty-sided die. The average result is 10.5, in the same neighborhood, but the actual outcome is going to vary a lot from week to week.

The chances of Odell Beckham scoring 20 points are 5%. The chances of Le'Veon Bell scoring 20 points? About 0.1%. On the other end, Beckham has a 5% chance of scoring just 1 point, while Bell can't score fewer than 5, and his chances of landing exactly there are the same 0.1%.

Ignore the names I've used if you're getting hung up on them, (surely Beckham is more consistent than that in real life and Bell is less). The key thing to pay attention to is this: the more dice you're rolling with a given average, the less likely you are to get an extreme result.
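The dice numbers can be checked by brute-force enumeration. The sketch below is only an illustration of the analogy, not a model of real players; the helper names `dice_distribution`, `mean`, and `std_dev` are mine. Enumeration gives an average of 12.5 for five 4-sided dice against 10.5 for one twenty-sided die, so the two hypothetical players are close in quality rather than identical, but the gap in spread is much larger.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_distribution(num_dice, sides):
    """Exact probability of each total when rolling num_dice fair dice."""
    rolls = product(range(1, sides + 1), repeat=num_dice)
    counts = Counter(sum(roll) for roll in rolls)
    total = sides ** num_dice
    return {value: Fraction(n, total) for value, n in counts.items()}


def mean(dist):
    return sum(v * p for v, p in dist.items())


def std_dev(dist):
    m = mean(dist)
    return float(sum(p * (v - m) ** 2 for v, p in dist.items())) ** 0.5


five_d4 = dice_distribution(5, 4)   # the "consistent" player
one_d20 = dice_distribution(1, 20)  # the "inconsistent" player

print(float(mean(five_d4)), float(mean(one_d20)))  # 12.5 10.5
print(f"{float(five_d4[20]):.2%}")  # chance of a 20 with five d4: 0.10%
print(f"{float(one_d20[20]):.2%}")  # chance of a 20 with one d20: 5.00%
print(std_dev(five_d4) < std_dev(one_d20))  # True
```

The single die has more than twice the standard deviation of the five-dice total, which is the whole point of the analogy: spreading the same kind of production over more dice squeezes out the extremes.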

Now, remember, every team has 8-10 players, so they're rolling at a bare minimum 8-10 dice. Which means even a team of wildly inconsistent players is going to wind up being fairly consistent on a weekly basis! Some players will have massive games, other players will be huge disappointments, and the results will tend to average out.
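To put a rough number on that averaging-out, here is a sketch that treats every player as an independent twenty-sided die (the "wildly inconsistent" case from the analogy, and an assumption rather than a claim about real players) and measures how wide the weekly total is relative to its average. The function names are hypothetical.

```python
def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent random totals."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def team_distribution(num_players, sides=20):
    """Weekly total for a team whose players are each one fair die."""
    die = {face: 1.0 / sides for face in range(1, sides + 1)}
    team = {0: 1.0}
    for _ in range(num_players):
        team = convolve(team, die)
    return team


def relative_spread(dist):
    """Standard deviation divided by the mean (coefficient of variation)."""
    mean = sum(v * p for v, p in dist.items())
    var = sum(p * (v - mean) ** 2 for v, p in dist.items())
    return var ** 0.5 / mean


one_player = relative_spread(team_distribution(1))
nine_players = relative_spread(team_distribution(9))

print(round(one_player, 3))    # 0.549
print(round(nine_players, 3))  # 0.183
```

Under these assumptions a nine-player lineup is three times steadier, relative to its average, than any one of its players, because the relative spread shrinks with the square root of the number of independent dice.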

Meanwhile, huge outlier weeks give us valuable information. In order to score 200 points, you need to be really, really lucky... but you also need to have players whose maximum totals add up to at least 200. Which means you probably have some really good players.

Having the top weekly score with 200 points tells us you're probably a better team than if you'd had the top weekly score with 120 points, even if that 200-point performance was extremely lucky.

The Takeaway

All-play record is a fun descriptive stat that's useful for telling which teams have benefited from schedule luck and which teams have been hurt by it. But as a measure of team quality, it adds complexity without adding any utility.

It's old-school, it's low-tech, it's retro, but in case you've missed the memo, retro is making a comeback. If you want to know how good your team is, just check how much it's scored.

More articles from Adam Harstad
