I wrote yesterday about how we can improve the way we calculate a player’s value in fantasy, but that piece had one big omission: the way we calculate a player’s value over baseline is important, but the way we determine our baselines is arguably even more so.
Determining Starters on a Per-Game Basis
When calculating VBD on a season-long basis, the “worst starter” is easy to identify. In a league that starts 12 quarterbacks, the 12th-best quarterback is the “worst starter”. When calculating VBD on a per-game basis, though, that’s not necessarily the case.
Consider, for instance, the 2013 season. Jay Cutler started 11 games for Chicago, while Josh McCown started the other 5. Because each played only part of the season, Cutler ranked 24th in season-ending fantasy points in standard Footballguys scoring, while McCown ranked 28th.
From a season-long perspective, neither player ranked high enough to play a role in determining our quarterback baseline. But from a per-game standpoint? If you added together the fantasy points Cutler and McCown produced, the combined “Chicago QB” amalgam would have ranked 3rd for the year!
Clearly, then, both Cutler and McCown need to factor into our per-game baseline. But how do we accomplish this? It’s simple: instead of counting the number of “starting quarterbacks” in a league, count the number of “quarterback starts”.
For instance, in a league that starts 12 quarterbacks over 16 weeks, there will be 12 * 16 or 192 “quarterback starts” over the course of the season. In other words, there will be 192 distinct times that a team sets a quarterback in its starting lineup. (Yes, technically there will be 204 “quarterback starts” because the season is 17 weeks long, thanks to byes. But for the purposes of later calculations, it’s better to assume a 16-week season since no quarterback will start more than 16 times.)
To find the baseline, then, we can sort all quarterbacks by points per game, then count down the list until we count out enough quarterbacks to account for 192 starts. Thinking about it conceptually, if one of the top 12 quarterbacks missed a single start, then that means that someone would have been forced to start the 13th best quarterback at some point. This is simply a way to account for that.
Last year, among top 12 quarterbacks in points per game, Carson Palmer missed ten games and Cam Newton missed two games, which means our “192nd best start” baseline winds up being QB13 in points per game. In 2013, thanks in large part to substantially abbreviated seasons by Aaron Rodgers, Josh McCown, and Sam Bradford, the “192nd best start” actually belonged to QB15 in points per game.
We can use a similar procedure to find a per-game baseline at the other positions, too. We estimate how many players at each position will start per week, we multiply that number by 16, and then we count down the player list until that many starts have been accounted for. This gives us our “worst starter” points per game baseline.
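The counting procedure can be sketched in a few lines of Python. This is a minimal illustration, not the code behind these rankings: the Player record, its field names, and the tiny four-week sample league are all hypothetical. Each player contributes one start per game played (capped at the season length), and the baseline is the player whose starts push the running total up to the league’s start budget.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    # Hypothetical record: season fantasy points and games played.
    name: str
    points: float
    games: int

    @property
    def ppg(self) -> float:
        return self.points / self.games


def per_game_baseline(players, starters_per_week, weeks=16) -> Optional[Player]:
    """Return the player whose starts fill the league's total start budget."""
    budget = starters_per_week * weeks  # e.g. 12 QBs * 16 weeks = 192 starts
    counted = 0
    for player in sorted(players, key=lambda p: p.ppg, reverse=True):
        # Nobody can start more often than there are weeks in the season.
        counted += min(player.games, weeks)
        if counted >= budget:
            return player
    return None  # not enough players to account for every start


# Tiny hypothetical league: 2 starters per week over a 4-week season (8 starts).
sample = [
    Player("A", 100.0, 4),  # 25.0 ppg
    Player("B", 60.0, 3),   # 20.0 ppg, missed one game
    Player("C", 72.0, 4),   # 18.0 ppg
    Player("D", 30.0, 2),   # 15.0 ppg
]
baseline = per_game_baseline(sample, starters_per_week=2, weeks=4)
print(baseline.name, baseline.ppg)
```

In the sample, B’s missed game means A and B cover only 7 of the 8 starts, so C’s 18 points per game becomes the baseline, the same effect Palmer’s and Newton’s absences had on the 2014 quarterback baseline.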
Problems With This Approach
The biggest limitation of this approach is that it assumes talent is perfectly evenly distributed. If one team owns two of the top twelve quarterbacks, for instance, that leaves another team starting QB13 on a weekly basis, at best.
This method of determining a baseline also assumes that we have perfect foresight. Many owners in 2013 left Josh McCown on their waiver wire rather than starting him, simply because they didn’t anticipate him being as productive as he was.
The final problem is that it treats a quarterback’s weekly production as if it were unrelated to his matchups. It assumes that we should expect a quarterback who averages 15 points per game to produce 15 points every week, whoever he is facing. But that’s simply not the case.
Instead, we are able to play matchups to try to start players in games where they outperform their average. Imagine I have two quarterbacks who average 15 points per game overall. Imagine they average 18 points per game against bad defenses, though, and 12 points per game against good defenses.
If the schedule lines up just right, it’s possible that by alternating two starters who average 15 points per game, I will be able to average 18 points per game from my quarterback position.
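The arithmetic of that best case can be checked with a short sketch. The 18/12 split comes from the example above; the four-week schedule, in which exactly one of the two quarterbacks faces a bad defense every week, is a hypothetical assumption chosen to make the schedule “line up just right”.

```python
from statistics import mean

# From the example: both QBs score 18 vs. bad defenses and 12 vs. good ones,
# which averages to 15 ppg over an even schedule.
POINTS_BY_MATCHUP = {"bad": 18.0, "good": 12.0}

# Hypothetical schedule: every week, exactly one QB faces a bad defense.
schedule = [
    {"QB A": "bad", "QB B": "good"},
    {"QB A": "good", "QB B": "bad"},
    {"QB A": "bad", "QB B": "good"},
    {"QB A": "good", "QB B": "bad"},
]


def season_average(qb):
    """Average if you started this quarterback every single week."""
    return mean(POINTS_BY_MATCHUP[week[qb]] for week in schedule)


def streaming_average():
    """Average if you always start whichever QB has the better matchup."""
    return mean(max(POINTS_BY_MATCHUP[m] for m in week.values()) for week in schedule)


print(season_average("QB A"), season_average("QB B"), streaming_average())
```

Any week in which both quarterbacks face good defenses would pull the streaming average back toward 15, which is why the schedule has to cooperate.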
The first few factors suggest that a theoretical “worst starter” baseline might overstate the worst-case scenario that a team might expect to get from the position. The last factor suggests that it might understate the worst-case scenario. Which effect is stronger?
Instead of continuing on with a strictly theoretical approach, I figured the best way to go would be to look at actual real-world results. To that end, I looked at several dozen actual MFL leagues from the 2014 season.
What follows is an in-depth look at my process for determining the worst starter. If you’re not interested in how the sausage is made, feel free to skip the next section and scroll down to “Recapping Replacement Level” to see what “real world” baselines I found. I promise you won’t hurt my feelings any.
How I Determined the “Worst Starter” in Actual Leagues
Still with me? Great!
To find my baselines, I examined real leagues and asked myself two questions. First, what was the minimum production that any team managed to get from a position in real-world scenarios? Second, what was the true average production from the position as a whole in real-world scenarios?
Then I sorted all players at each position by points per game, and I counted how many “player starts” were above the real-world “worst starter” and “average starter” measurements. To avoid situations where eliminated teams stopped setting their lineups, I only looked at the first 13 weeks of the season when determining both baselines, then pro-rated out to 16 weeks.
This probably sounds confusing, so let me walk you through a sample league. In one MFL league, the fewest points any franchise scored at the QB position through 13 weeks was 217.7. On a per-game basis, that works out to a “worst starter” baseline of 16.75 points per game.
There were 25 quarterbacks with a higher points-per-game average than this “worst starter” baseline. Those quarterbacks collectively accounted for 254 total starts, meaning the “worst starter” baseline was effectively the 255th-best fantasy start.
Now remember that this is just through the first 13 weeks, and we want baselines for a full season. If we pro-rate the “255th best player start” from 13 weeks out to 16 weeks, we get a baseline of “313th best player start” for the full season.
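The real-league “worst starter” calculation can be sketched as three small steps. This is an illustrative sketch, not the author’s code: the helper names are hypothetical, the (ppg, starts) pairs for individual quarterbacks in the test are made up, and only the 217.7-point and 255-start inputs come from the sample league.

```python
def worst_starter_ppg(fewest_points_at_position, weeks=13):
    """Per-game 'worst starter' from the franchise with the fewest points."""
    return fewest_points_at_position / weeks


def worst_starter_rank(player_lines, threshold):
    """Rank of the baseline start.

    player_lines: hypothetical (ppg, starts) pairs over the same 13 weeks.
    Every start by a player averaging more than the threshold ranks ahead
    of the baseline, so the baseline is the next start after those.
    """
    return sum(starts for ppg, starts in player_lines if ppg > threshold) + 1


def prorate(starts, from_weeks=13, to_weeks=16):
    """Scale a 13-week start count to a full 16-week season."""
    return starts * to_weeks / from_weeks


# Inputs taken from the sample league described in the text.
print(round(worst_starter_ppg(217.7), 2))
print(prorate(255))
```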
In that very same league, the average points per game among all quarterback starts was 20.4. If we sorted every quarterback start from best to worst and kept a running, start-weighted average, we would need to accumulate 266 “player starts” before that running average fell to the league-wide figure of 20.4 fantasy points per game. Pro-rating this to 16 weeks again, we get an implied baseline of 328 “player starts”.
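The implied “average starter” baseline can be sketched the same way. The assumptions here are illustrative rather than the article’s: each start is treated as scoring exactly the player’s per-game average, and the three-player pool is hypothetical. The function walks down the ranked starts, keeping a start-weighted running average, and stops at the first start where that average falls to the league figure.

```python
def implied_average_baseline(player_lines, league_avg):
    """Starts needed before the running, start-weighted average falls to league_avg.

    player_lines: hypothetical (ppg, starts) pairs; every start is assumed
    to score exactly the player's per-game average.
    """
    total_points = 0.0
    total_starts = 0
    # Sorting the tuples in reverse puts the highest ppg first.
    for ppg, starts in sorted(player_lines, reverse=True):
        for _ in range(starts):
            total_points += ppg
            total_starts += 1
            if total_points / total_starts <= league_avg:
                return total_starts
    return None  # the pool never averages down to the league figure


sample = [(26.0, 4), (20.0, 4), (14.0, 4)]
print(implied_average_baseline(sample, league_avg=20.0))
```

The resulting 13-week count would then be pro-rated to 16 weeks exactly as in the “worst starter” case.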
So, in this one particular league, it looks like if your quarterback situation fell apart and you were forced to rely on replacement-level production, you could expect to score on par with the 313th best player start (roughly on par with QB19 or QB20 on a weekly basis). Similarly, the league average suggests that the league was starting more or less everyone who produced above the 328th best player start (roughly on par with QB20 or QB21 on a weekly basis).
I went digging through MFL’s archives looking for leagues with basic PPR scoring, 12 teams, and a starting lineup consisting of 1 QB, 2 RBs, 3 WRs, 1 TE, and 1 flex. I found over a hundred leagues that fit my requirements.
I then started digging through them by hand and tossing out any league where owners stopped setting lineups at some point in the first 13 weeks. Doing this left me with a sample of 12 leagues with standardized scoring and lineup rules and an active owner base.
All 12 of these leagues featured a buy-in, with seven costing $100 and five costing $50. I believe these owners are reasonably representative of what one could expect to find in an active league with experienced owners, so I think the implied baselines generalize well to active leagues as a whole, including your own.
Averaging results over these dozen leagues, I found an implied “worst starter” baseline at quarterback of the 330th best “quarterback start” over a full season, and an implied “average starter” baseline of the 290th best “quarterback start”. Together, these work out to somewhere in the QB18-QB21 range on a weekly basis.
I then repeated the process at tight end and found an implied “worst starter” baseline of the 343rd best player start, and an implied “average starter” baseline of the 355th best player start. That’s roughly on par with TE21-TE22 on a weekly basis. (An aside: the higher baseline likely suggests that tight end is more difficult to project from week to week.)
When attempting to find the worst starter at running back and wide receiver, though, I ran into a big problem. At quarterback and tight end, the team with the “worst starter” was the team that scored the fewest points, since teams only started one of each (outside of rare situations where someone put a tight end in their flex, and the fact that they were willing to flex that tight end suggests it’s unlikely that tight end was actually the worst starter).
How was I to find which team constituted the “worst starter” at running back or wide receiver, though? Was it the team that scored the fewest points at the position? The problem with that is that perhaps the guy with the worst RB2 situation also had Le’Veon Bell, and therefore still managed to finish middle-of-the-pack in RB production.
Further compounding this problem, thanks to the flex, I’d be comparing teams starting 2 RBs a week to teams starting 3. Of course the “3 RB” teams would score more, even if they had worse backs.
Even if I figured out which team represented the “worst starter”, teams were starting multiple running backs per week, so how was I to determine which particular one was the “worst”? I couldn’t simply take whichever RB had the worst game that week, or else sometimes guys like Marshawn Lynch and Le’Veon Bell could contribute to the “worst starter” baseline.
Basically, the idea of finding a “worst starter” was a non-starter. But thankfully for me, something interesting happened at quarterback and tight end. The actual calculated “worst starter” baseline wound up being very close to the implied “average starter” baseline.
Even better, the “worst starter” baseline was higher in 8 cases, lower in 10 cases, and the same in 2 cases. This suggests that even if the worst starter baseline varies from the average starter baseline, it is not biased to do so in any systematic way.
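One way to put a number on the “not biased” claim, though it is not a test the original analysis reports, is an exact two-sided sign test: drop the 2 ties and ask how surprising an 8-versus-10 split would be if “higher” and “lower” were equally likely. A minimal sketch:

```python
from math import comb


def sign_test_p_value(higher, lower):
    """Exact two-sided sign test; ties should be dropped before calling."""
    n = higher + lower
    k = min(higher, lower)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p = sign_test_p_value(higher=8, lower=10)
print(f"{p:.2f}")  # 0.81
```

A p-value around 0.81 means an 8-10 split is entirely unremarkable if there is no bias; with this few cases it cannot prove the absence of bias, only that the data show no sign of one.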
Armed with this information, I felt comfortable merely calculating an implied “average starter” baseline at running back and wide receiver and letting that stand in as the “worst starter” baseline, too.
In the various leagues, the average points per game among all running back starts implied a baseline of approximately 541 “RB starts”, which works out to about RB34 on a weekly basis. At wide receiver, the implied baseline corresponded to the 866th best “player start”, which is equivalent to around WR54 on a weekly basis.
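Translating a full-season start count into a weekly rank simply divides by 16 starts per roster spot. A minimal sketch, assuming rounding to the nearest rank (elsewhere the text sometimes quotes a range instead of a single rank):

```python
# Full-season "player start" baselines quoted in the text.
BASELINE_STARTS = {"RB": 541, "WR": 866}


def weekly_rank(starts, weeks=16):
    """Approximate weekly rank equivalent to a full-season start count."""
    return round(starts / weeks)


for pos, starts in BASELINE_STARTS.items():
    print(f"{pos}{weekly_rank(starts)} ({starts / 16:.1f})")
```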
Then, just to get a single unified baseline at quarterback and tight end, I used a “wisdom of Solomon” approach to split the difference between the “worst starter” and “average starter” baselines. I settled on 300 “quarterback starts” and 350 “tight end starts” as my baselines, which correspond to roughly QB19 and TE21 on a weekly basis.
Another quick aside: I feel comfortable estimating these baselines because of how tightly clustered the data is. There is a lot of separation among RBs at the top, but in the ranges we’re dealing with, they become much more closely packed. The 541st best player start gives us a replacement level of 8.20 points per game. Moving down to the 600th best player start would give us 7.85 ppg as a baseline, while moving up to the 500th best player start gives us 8.73 ppg. Even if I’m off by as much as 10% in either direction, we’re looking at a difference of about half a point per game at most.
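The sensitivity check above is simple arithmetic on the three quoted figures; the short sketch below (the dictionary and function names are illustrative) makes the percentage shifts and point-per-game differences explicit.

```python
# Replacement-level points per game quoted in the text at three RB start counts.
PPG_AT_START = {500: 8.73, 541: 8.20, 600: 7.85}


def sensitivity(ppg_at_start, base):
    """For each alternative start count: (% change in starts, change in ppg)."""
    base_ppg = ppg_at_start[base]
    return {
        starts: ((starts - base) / base * 100, ppg - base_ppg)
        for starts, ppg in ppg_at_start.items()
        if starts != base
    }


for starts, (pct, delta) in sensitivity(PPG_AT_START, base=541).items():
    print(f"{starts} starts: {pct:+.1f}% vs. 541, {delta:+.2f} ppg")
```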
Recapping Replacement Level
According to real-world results from 2014, if you find yourself in a doomsday scenario and forced to rely on “replacement level” at quarterback, running back, wide receiver, or tight end, you can count on weekly production roughly on par with QB19, RB34, WR54, or TE21. That’s the absolute worst-case scenario that an average owner should be able to cobble together through minor trades and the waiver wire.
Similarly, those baselines also apply if you want to know how much your studs are outproducing average starters at each position. The average starter at quarterback on a weekly basis will score like the average of the top 19 quarterbacks on a weekly basis, and so on.
In the broadest possible sense, this means that the commonly accepted “worst starter” baselines in use are hopelessly optimistic. I’ve seen some use an “average backup” baseline, which would be a mid-tier QB2, RB3, WR4, or TE2. Actual real-world results suggest that even that overstates the quality of replacement level.
Instead, if everything goes wrong, owners will likely have to turn to a mid-to-low QB2, low-end RB3, mid-tier WR5, or low-end TE2.
Using these newly calculated baselines and my previously outlined method for calculating value, here are the top 48 players of 2014 by both EVoB (Estimated Value over Baseline) and EVoS (Estimated Value over Starters).
Again, it’s interesting to note how similar the two lists really are. In fact, the only position that seems to make a big move from one list to the other is quarterback. That suggests that production at running back, wide receiver, and tight end is relatively evenly distributed. A higher EVoS than EVoB at quarterback, on the other hand, suggests that the position was very top-heavy.
Or to put it another way, elite players like Rodgers and Luck gave you a substantial advantage over every other starter… but they were also easier to replace than studs at any other position. This jibes with what we know about quarterback: the top guys are difference makers, but at the same time, it’s the easiest position to get by with players off the street.
I like the fact that all of this work is essentially telling us something we already know. If a new statistic jibes with conventional wisdom, then it’s probably a pretty good descriptor of what is really happening. And even if we already “know” this about quarterbacks, it’s always a good idea to check things that are common sense to make sure they’re actually true.
Three Final Takeaways
One thing to remember is that these replacement values are from actual owners in actual leagues. And, being “worst-case scenario” benchmarks, they naturally represent the owners who failed the most at any given position. It’s entirely possible to adopt a quarterback streaming strategy that outperforms this “replacement level”, perhaps even by a substantial margin.
If you believe you can get QB8-type production by streaming quarterbacks and playing matchups, this data is not meant to discourage you or convince you otherwise. Instead, this simply attempts to show, if you’re successful, how much of an advantage your quarterback stream could be reasonably expected to provide in an actual league.
Also, while it would be ideal to re-calculate these baselines for every season to determine if 2014 was an anomaly in any respect, the fact that I started with a list of over 100 leagues and ended up with just 12 that were usable should provide a clue as to how hard it was to get quality data, even for just last season.
I’ll re-run this analysis after the 2015 season, but in the meantime, the idea of getting quality data from 2013 or earlier seems like a pipe dream. As such, I will merely assume that these baselines hold true throughout history and will continue to hold true in 2015. Perhaps not the best assumption, but certainly the best I can do until I have a few more seasons’ worth of data to work with.
Finally, this is an improvement over existing methods, but I hold no illusions that this is the perfect measure of player value. C.J. Anderson ranks pretty low by this method because he had just 116 yards and 0 touchdowns in his first seven games. That stretch as a backup brought down his per-game averages, and the actual value he provided per start was therefore much higher than was listed here.
Similarly, for the purposes of EVoB and EVoS, a missed game is a missed game. In reality, a missed game during the playoffs is more damaging than a missed game during the regular season. Likewise, a big game during the playoffs will do much more to help you win a title than a big game during the regular season.
These effects are not anything we can easily control for without going through by hand and adjusting. In addition to being time-consuming, it would not be feasible to give the same treatment to every year in history. So while EVoB and EVoS allow us to quickly compare value across several years, an adjusted and “more accurate” value measurement would not.
The goal here is not to create the perfect statistic so we can all just pack up and go home. It is to keep getting us incrementally closer to measuring a player’s true contributions to our championships.
Note: In an earlier version of this article, the statistics were called VAR and VOAS. After receiving feedback that the names made the statistics sound unrelated and were confusing, I opted to change them to Estimated Value over Baseline (EVoB) and Estimated Value over Starters (EVoS) in the name of clarity. Thanks to those who have contacted me via email and Twitter to let me know how they felt!