I wrote last week about how to determine “worst starter” baselines based on real-world results, at least in 12-team PPR leagues that start 1 quarterback, 2 running backs, 3 wide receivers, 1 tight end, and 1 flex. Since then, I’ve been approached by several people who wanted to know how to generalize those findings to other league sizes and starting lineups. This is a really reasonable idea, but unfortunately, there’s no easy answer.
For starters, the method I used to find the baselines in the first place is right out. My goal was to move beyond the typical theory and estimation and measure actual real-world outcomes. I was only able to do so because of the introduction of standardized leagues on MyFantasyLeague.com. Last year, MFL created a series of “best ball” and active management leagues with fixed buy-ins ($10, $25, $50, or $100). These were a godsend for me, because they gave me a pool of data that was large, standardized, and easily searchable.
The problem was that, in order to get clean data, the leagues needed to meet certain requirements. The biggest was that all participants were active and aggressive through the entire 13-week sample.
The nice thing about having a pool of leagues that were played for money is that it guaranteed me owners who were, on the whole, pretty into fantasy football. Few casual players are going to drop $50 or $100 to play in a league with strangers.
The downside of the leagues being played for money, however, is that once teams started falling behind, there was a tendency to just abandon the league. Many teams, especially teams that suffered an early setback, would simply stop setting lineups within the first six weeks. Obviously teams like that are going to distort the data. Baselines will seem artificially higher if there are two or three teams who simply aren’t trying anymore.
I could have tried using a smaller sample, such as the first six weeks, but I felt like if I didn’t at least include all of the NFL byes, my findings were going to be distorted and unrepresentative. So out of my pool of hundreds of leagues, maybe 10% met my requirements.
Quite simply, the challenge of finding enough leagues in various other formats (10 teams, 14 teams, superflex, non-PPR, etc.) to produce a viable sample is daunting, so daunting as to seem insurmountable.
But all is not lost
While measuring real-world outcomes is a non-starter, we can take what we learned from our original sample, extract the lessons, and then apply them more broadly. Yes, this means we’re stepping back out of the world of actual results and back into the world of theory and estimation. But at least we’re now using theory and estimation that is informed, at its heart, by real-world results.
In this next section, I’m going to lay out the assumptions and estimations I used to get working baselines for other league types. If all you’re interested in is brass tacks and you trust that I used a good process to get there, feel free to skip down to the “Applying Our Estimates” section to see what numbers I came up with.
Showing Our Work: How We Generalize Our Findings
Our original sample consisted of 12-team leagues that started 1 quarterback, 2 running backs, 3 wide receivers, 1 tight end, and one flex. The scoring was PPR, and roster size was 16 players. From that sample, we found baselines of 300 starts at quarterback, 541 starts at running back, 866 starts at wide receiver, and 350 starts at tight end.
These baselines roughly correspond, on a weekly basis, to QB19, RB34, WR54, and TE21. In terms of points per game, the quarterback baseline was 17.3 ppg, the running back baseline was 8.3 ppg, the wide receiver baseline was 8.8 ppg, and the tight end baseline was 6.5 ppg.
My first observation is relatively obvious: the running back baseline was very similar to the wide receiver baseline. This makes perfect sense because of the existence of a flex. In theory, if the “worst starter” at running back is scoring fewer points than the “worst starter” at wide receiver, teams will start more receivers and fewer backs in their flex until things equalize.
If anything, the relative baselines suggest that teams are still starting slightly too many running backs. A perfectly optimal distribution would be about 520 starts at running back and about 880 starts at wide receiver, which would have both positions scoring around 8.6 points per game. That suggests leagues may be flexing about one running back too many every week. But either way, the leagues are settling into a distribution that is really close to optimal.
The first takeaway we get from this is that the real-world running back and wide receiver baselines should be very similar in terms of points per game, with the receiver baseline being slightly higher to account for the apparent preference for running backs in the flex.
Assuming that every flex was either a running back or a wide receiver (probably not entirely true, but likely pretty close, given how much lower the tight end baseline was), teams started 72 receivers and running backs a week. We know that the receiver baseline included 60% more starts than the runner baseline (866 vs. 541). If the baselines were representative, we might assume that teams started 60% more receivers than runners every week. So, using those two equations (RBs + WRs = 72 and WRs = 1.6*RBs), we can estimate that teams started between 27 and 28 running backs and between 44 and 45 wide receivers on a weekly basis.
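The two equations above can be solved directly. Here is a minimal sketch in Python, under the same assumption that every flex spot goes to a running back or a wide receiver. The function and parameter names are mine, chosen only for illustration.

```python
def weekly_rb_wr_starts(teams=12, rb_slots=2, wr_slots=3, flex_slots=1,
                        rb_baseline=541, wr_baseline=866):
    """Split weekly RB + WR starts using the ratio of the measured baselines."""
    total = teams * (rb_slots + wr_slots + flex_slots)  # RBs + WRs = 72
    ratio = wr_baseline / rb_baseline                   # WRs = ratio * RBs (about 1.6)
    rbs = total / (1 + ratio)                           # substitute and solve for RBs
    wrs = total - rbs
    return rbs, wrs


rbs, wrs = weekly_rb_wr_starts()
print(f"RBs per week: {rbs:.1f}, WRs per week: {wrs:.1f}")
```

With the measured baselines plugged in, this lands between 27 and 28 running backs and between 44 and 45 wide receivers per week, matching the estimate above.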
So we assume the league would start 27-28 backs per week, which over a 16-game season would equate to 432-448 total starts. We observed that the real “worst starter” baseline was equivalent to 541 starts, which is about 22% more starts (in other words, the real baseline sits about 22% deeper in the rankings than the theoretical one). The baseline at wide receiver is similarly 22% deeper (because in the last step we assumed a similar ratio of inefficiency at both positions). Because of unequal talent distribution, the actual measured “worst starter” baseline is about 22% deeper than the theoretical “worst starter” baseline.
Running backs and wide receivers squared away, it’s time to move on to the “singlet” positions, quarterback and tight end, so-called because teams only start one per week. With only one starter per team, each additional player represents a much larger percentage of the number of starters, so the ratio between “theoretical worst starter” and “actual worst starter” is much higher. So if these ratios look high, that’s why.
In a 12-team league, the “theoretical worst starter” at quarterback is QB12. Over 16 weeks, that’s the equivalent of 192 “quarterback starts”. Our calculated real-world “worst starter” baseline was around 300 starts, which is about 56% more. Similarly, our real-world tight end baseline was 81% more starts than the theoretical baseline.
(As before, I don’t have a great reason for why the tight end baseline should be so much lower than the quarterback baseline. I suspect it might be because tight ends are more unpredictable from week to week. One could think of these percentages as a measure of real-world inefficiency in selecting starters.)
So now we have our rules for estimating baselines in other scoring systems. We assume about 56% inefficiency when selecting quarterback starters, 81% inefficiency at tight end, and 22% inefficiency at running back and wide receiver.
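As a quick check on those percentages, here is a rough Python sketch that divides each measured baseline by its theoretical counterpart. The dictionary names are mine, and the RB/WR split reuses the rounded 1.6 ratio from earlier, so treat the output as approximate: it lands within a point or so of the figures quoted above.

```python
WEEKS = 16
TEAMS = 12

# Measured season-long "worst starter" baselines from the original sample.
observed_starts = {"QB": 300, "RB": 541, "WR": 866, "TE": 350}

# Theoretical weekly starters. QB and TE are one per team; the RB/WR split
# assumes 72 weekly RB + WR starts with WRs = 1.6 * RBs, as in the text.
rb_weekly = 72 / 2.6
theoretical_weekly = {
    "QB": TEAMS * 1,
    "RB": rb_weekly,
    "WR": 72 - rb_weekly,
    "TE": TEAMS * 1,
}

# Multiplier = measured starts / theoretical starts over the season.
multipliers = {
    pos: observed_starts[pos] / (theoretical_weekly[pos] * WEEKS)
    for pos in observed_starts
}

for pos, m in multipliers.items():
    print(f"{pos}: {m:.2f}")
```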
The last thing we need to estimate is how many starters there will be at each position each week. Obviously this is easy enough when there is no flex: the number of starters is simply the number of teams times the number of starters per team.
Adding a flex complicates matters, because we need to calculate how many players at each position we should expect to start in the flex. We remember our rule from earlier that the flex position tends to be distributed among all eligible positions based on expected points per game, with a slight bias towards running back.
Based on per-game averages from last season, we can estimate that in PPR leagues, about 30% of flex starters will be running backs and about 70% will be wide receivers. In non-PPR scoring, we should assume about 80% of flex starters will be running backs and the remaining 20% should be wide receivers.
In tight end premium scoring, (1.5 points per reception for tight ends only), flexing a tight end suddenly becomes palatable. It’s hard to estimate how often it would make sense for teams to do so, given the unpredictability of the position, but conservatively we might expect teams to opt to do so about 20% of the time. Assuming the ratio between running backs and wide receivers holds true, that would imply 56% of flexes are receivers, 24% are running backs, and 20% are tight ends.
Finally, because quarterbacks tend to outscore every other position, I will assume that in leagues with a “superflex” (i.e. a flex position for which quarterbacks are eligible), the limiting factor will not be expected production, but available supply. Nearly every team able to start a quarterback in the superflex position will opt to do so. Because of byes and certain teams stockpiling the position, I’d estimate that starting a quarterback in the superflex will only be feasible about 75% of the time.
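The flex and superflex shares described above can be collected in one place. In this Python sketch, the assumption that the 25% of superflex spots not filled by a quarterback follows the same split as a regular flex is my reading of the coefficients in the formulas further down, not something stated outright. The names are illustrative.

```python
QB_SUPERFLEX_SHARE = 0.75  # estimated share of superflex spots filled by QBs

# Estimated share of flex starts by position for each scoring system.
FLEX_SHARES = {
    "non_ppr":    {"RB": 0.80, "WR": 0.20, "TE": 0.00},
    "ppr":        {"RB": 0.30, "WR": 0.70, "TE": 0.00},
    "te_premium": {"RB": 0.24, "WR": 0.56, "TE": 0.20},
}


def superflex_shares(scoring):
    """Assumed: non-QB superflex starts follow the regular flex split."""
    leftover = 1 - QB_SUPERFLEX_SHARE
    shares = {pos: round(leftover * share, 3)
              for pos, share in FLEX_SHARES[scoring].items()}
    shares["QB"] = QB_SUPERFLEX_SHARE
    return shares


for scoring in FLEX_SHARES:
    print(scoring, superflex_shares(scoring))
```

For PPR this gives 0.075 for running backs and 0.175 for wide receivers, which round to the 0.08 and 0.17 used in the PPR formulas.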
Applying Our Estimates
Using the aforementioned estimations, we have a formula for what a real-world “worst starter” baseline might look like in various league setups. Let “N” be the number of teams in the league, “S” equal the number of weekly starters at the position (not counting flexes), “F” equal the number of flexes, and let “SF” equal the number of superflexes. Here are the formulas for determining the weekly “worst starter” baseline in various scoring systems:
Non-PPR scoring:
QB = N * (S + 0.75*SF) * 1.56
RB = N * (S + 0.8*F + 0.2*SF) * 1.22
WR = N * (S + 0.2*F + 0.05*SF) * 1.22
TE = N * S * 1.81
PPR scoring:
QB = N * (S + 0.75*SF) * 1.56
RB = N * (S + 0.3*F + 0.08*SF) * 1.22
WR = N * (S + 0.7*F + 0.17*SF) * 1.22
TE = N * S * 1.81
TE premium scoring:
QB = N * (S + 0.75*SF) * 1.56
RB = N * (S + 0.24*F + 0.06*SF) * 1.22
WR = N * (S + 0.56*F + 0.14*SF) * 1.22
TE = N * (S + 0.20*F + 0.05*SF) * 1.81
Finally, you can take that result and multiply by 16 to get the “worst starter” baseline for the whole season.
Here’s a sample to walk you through the process. Let’s assume we’re playing in a 10-team TE premium league that starts 1/2/3/1 with one flex and one superflex position. N = 10, S = 1/2/3/1, F = 1, SF = 1
QB = 10 * (1 + 0.75) * 1.56 = 27.30 (on average, the “worst starter” every week is QB27)
RB = 10 * (2 + .24 + .06) * 1.22 = 28.06 (on average, the “worst starter” every week is RB28)
WR = 10 * (3 + .56 + .14) * 1.22 = 45.14 (on average, the “worst starter” every week is WR45)
TE = 10 * (1 + 0.20 + 0.05) * 1.81 = 22.63 (on average, the “worst starter” every week is TE23)
If we multiply all of those results by 16, we get a baseline of 437 starts at quarterback, 449 starts at running back, 722 starts at wide receiver, and 362 starts at tight end.
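For anyone who would rather not do this by hand, the formulas and the worked example above can be sketched in Python. The coefficients are copied from the three formula sets. Matching those sets to non-PPR, PPR, and TE premium follows the flex percentages given earlier. The function and key names are mine.

```python
WEEKS = 16
MULTIPLIER = {"QB": 1.56, "RB": 1.22, "WR": 1.22, "TE": 1.81}

# (flex coefficient, superflex coefficient) per position and scoring system.
COEFFS = {
    "non_ppr":    {"QB": (0, 0.75), "RB": (0.80, 0.20), "WR": (0.20, 0.05), "TE": (0, 0)},
    "ppr":        {"QB": (0, 0.75), "RB": (0.30, 0.08), "WR": (0.70, 0.17), "TE": (0, 0)},
    "te_premium": {"QB": (0, 0.75), "RB": (0.24, 0.06), "WR": (0.56, 0.14), "TE": (0.20, 0.05)},
}


def weekly_baselines(n_teams, starters, flexes, superflexes, scoring):
    """Weekly 'worst starter' rank per position: N * (S + f*F + sf*SF) * multiplier."""
    result = {}
    for pos, (f_coef, sf_coef) in COEFFS[scoring].items():
        slots = starters[pos] + f_coef * flexes + sf_coef * superflexes
        result[pos] = n_teams * slots * MULTIPLIER[pos]
    return result


# The 10-team TE premium example: 1/2/3/1 with one flex and one superflex.
weekly = weekly_baselines(10, {"QB": 1, "RB": 2, "WR": 3, "TE": 1}, 1, 1, "te_premium")
season = {pos: round(value * WEEKS) for pos, value in weekly.items()}
print({pos: round(value, 2) for pos, value in weekly.items()})
print(season)
```

Run as written, the season totals come out to 437, 449, 722, and 362 starts, the same figures as the hand calculation.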
Using these formulas, you should be able to get pretty close to what an actual "worst case scenario" might look like in your leagues this year. If everything goes sideways in week 1, you lose your starting quarterback for the season, and you're forced to try to cobble something together on the fly, what might that look like? These calculations should give you a pretty rough idea.
One final note: our estimate for the superflex position breaks down quickly in larger leagues. In a 12-team league, it already puts the baseline at about QB33 (12 * 1.75 * 1.56 = 32.76); in other words, every single starting quarterback in the NFL will be considered "startable" in fantasy every week. In 14-team leagues, it would estimate the baseline at 38 quarterbacks a week, which is just silly. No matter how quarterback-friendly the league is, nobody is going to opt to start backup quarterbacks if they have any alternative.
We could try to build a formula that fixes this, but honestly, a simpler solution is just to assume when you play in a large league with an available superflex that the quarterback baseline is QB32 on a weekly basis and 512 “quarterback starts” on a season-long basis. In short, if there aren’t enough quarterbacks to go around, just assume that every team will start every quarterback they have every week.
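One way to express that shortcut is to cap the formula's output at 32. This is only a sketch of the rule of thumb above. Using a hard `min` for every league size is my simplification, and the function name is hypothetical.

```python
NFL_STARTING_QBS = 32
WEEKS = 16


def capped_qb_baseline(n_teams, qb_starters=1, superflexes=1):
    """Weekly QB baseline from the formula, capped at the number of NFL starters."""
    raw = n_teams * (qb_starters + 0.75 * superflexes) * 1.56
    return min(raw, NFL_STARTING_QBS)


for n in (10, 12, 14):
    weekly = capped_qb_baseline(n)
    print(n, round(weekly, 2), round(weekly * WEEKS))
```

A 10-team superflex league stays below the cap, while 12-team and 14-team leagues both settle at QB32 weekly and 512 starts for the season.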