Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions. On a case-by-case basis, it's easy to find reasons why any given player is going to buck the trend and sustain production. So I constrain myself and remove my ability to rationalize on a case-by-case basis.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared. Here's a similar list from 2017.
The Scorecard
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I explained why touchdowns follow yards (but yards don't follow back) and predicted that the players with the fewest touchdowns per yard gained would outscore the players with the most touchdowns per yard gained going forward.
In Week 5, I talked about how preseason expectations still held as much predictive power as performance through four weeks. No specific prediction was made.
In Week 6, I talked about why quarterbacks tended to regress less than other positions but nevertheless predicted that Patrick Mahomes II would somehow manage to get even better and score ten touchdowns over the next four weeks.
In Week 7, I talked about why watching the game and forming opinions about players makes it harder to trust the cold hard numbers when the time comes to put our chips on the table. (I did not recommend against watching football; football is wonderful and should be enjoyed to its fullest.)
In Week 8, I discussed how yard-to-touchdown ratios can be applied to tight ends but the players most likely to regress positively were already the top performers at the position. I made a novel prediction to try to overcome this quandary.
In Week 9, I discussed several of the challenges in predicting regression for wide receiver "efficiency" stats such as yards per target. No specific prediction was made.
In Week 10, I proposed a "leaderboard test" to quickly tell whether a statistic was noisy (and more prone to regression) or stable (and less prone to regression). I illustrated this test in action and made another prediction that yards per carry would regress.
In Week 11, I mentioned that many unexpected things were at the mercy of regression to the mean, highlighting how the average age of players at a given position tends to regress over time as incoming talent ebbs and flows.
In Week 12, I predicted that because players regress, and units are made up of players, units should regress, too. I identified the top five offenses, bottom five offenses, top five defenses, and bottom five defenses, and predicted that after four weeks those twenty units would collectively be less "extreme" (defined as closer to league average). Because offense tends to be more stable than defense, I added a bonus prediction that the defenses would regress more than the offenses.
In Week 13, I delved into how interceptions are the only quarterback stat that is mostly noise and predicted that the most interception-prone quarterbacks in the league (yes, including Jameis Winston) would start throwing fewer interceptions than the least interception-prone quarterbacks in the league.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
|---|---|---|---|
| Yards per Carry | Group A had 20% more rushing yards per game | Group B has 30% more rushing yards per game | Success! |
| Yard:Touchdown Ratio | Group A had 23% more points per game | Group B has 47% more points per game | Success! |
| Mahomes TDs | Mahomes averaged 2.2 touchdowns per game | Mahomes averages 2.0 touchdowns per game | Failure |
| Yard:Touchdown Ratios | Group B had 76% more points per game | Group B has 146% more points per game | Success! |
| Mahomes TDs Redux | Mahomes averaged 2.2 touchdowns per game | Mahomes averages 2.3 touchdowns per game | Failure |
| Yards per Carry Redux | Group A had 22% more rushing yards per game | Group B has 23% more rushing yards per game | Success! |
| "Extreme" performance | "Extreme" units were ~6.4 ppg from average | "Extreme" units are 93% as "extreme" | 2 |
| Defense vs. Offense | | Defenses have regressed 7% more than Offenses | 2 |
| Team Interceptions | Group A had 87% as many interceptions | Group B has 43% as many interceptions | 3 |
Take two on our "Patrick Mahomes II will score more touchdowns" prediction came down to the wire but ultimately ended in failure as well. I could certainly offer excuses and justifications (a tenth touchdown called back due to offsetting penalties, a windy game that heavily impacted passing, an awful performance by the opposing Raiders that resulted in Mahomes attempting just two passes in the fourth quarter), but honestly, I'm okay with the loss.
The point of Regression Alert is to make sure our process is sound and trust that positive outcomes will follow more often than not as a result. The process that resulted in the Mahomes prediction was noting that while he had thrown for a lot of yards, his yard-to-touchdown ratio of 166 was actually fairly average. For a star quarterback like Mahomes, we'd expect his yard-to-touchdown ratio to be much lower (in other words, we'd expect more touchdowns for any given number of yards).
Typically yards are fairly stable and these ratios regress because touchdowns increase or decrease to a more commensurate level. But in the short term, there are all manner of things that can cause yards to fluctuate, too— blowout wins, windy days, injuries, etc. That's what happened to Mahomes; he averaged 366 yards per game at the time of the prediction and, discounting the game he left early due to injury, he's averaged just 269 yards per game since.
But yard-to-touchdown ratio, the underlying metric, did regress. Since the time of the prediction, Mahomes has passed for one touchdown for every 128 passing yards, which is just off the 130-140 yard per touchdown ratio I predicted for him. So the prediction was wrong: full stop, no justifications. But I'd rather be wrong for the right reasons than right for the wrong reasons, and in this case, I'll wear that loss with pride.
One loss that I won't have to wear (with pride or otherwise) is our running back yards per carry prediction. This is the sixth time we've predicted that yards per carry would regress, this is the sixth decisive win on the subject, and this is the fifth time that our low-ypc cohort averaged more yards per carry than our high-ypc cohort. I cannot stress this enough: regression suggests that the groups should move closer together, not that the low outliers should pass the high outliers entirely, but that's exactly what has happened in five out of six trials.
Our Group A backs averaged 5.12 yards per carry at the time of our prediction. They average 4.41 yards per carry in the four weeks since. Our Group B backs averaged 3.61 yards per carry at the time of our prediction. They average 4.54 yards per carry in the four weeks since. Group A was averaging a yard and a half more per carry through nine weeks, and yet they averaged fewer yards per carry straight up over the last four weeks.
I say all the time that yards per carry is not a thing, but I really cannot stress this enough. If you take one thing away from this entire column, you could do a lot worse than this: yards per carry is maximally not a thing. It's just not.
I don't want to belabor the point, but just to drive it home: before the season I found that over the last two decades there have been fourteen backs who had 200 carries in their first two seasons and averaged 5 or more yards per carry. Three had yet to play their third season (Aaron Jones, Alvin Kamara, and Saquon Barkley). Of the other eleven, only two averaged more than 4.4 yards per carry in year three and the median result was 4.3 yards per carry. People would think Aaron Jones averaging 4.3 yards per carry after back-to-back seasons averaging 5.5 was a shocking outcome, but in reality that was the expected result.
So far this year, Alvin Kamara is at 4.7 yards per carry, Aaron Jones is at 4.1, and Saquon Barkley is at 4.0. Meanwhile, Jamaal Williams (Aaron Jones' much-derided backup who entered this year with a 3.7 career yards per carry average) is gaining 4.4 yards per carry, substantially ahead of Jones' figure. This shouldn't be a surprising result; yards per carry is functionally a random number generator and the absolute easiest profit you will ever make comes from just constantly betting on it to regress. And yet somehow this is a surprising result to nearly everyone, which I suppose is why betting against yards per carry remains so profitable year after year after year.
Not a whole lot to say about our other predictions. Offense and defense continue to regress, with defense regressing more than offense. If it seems like they're regressing a small amount, remember that while usually we would compare Weeks 1-11 to Weeks 12-15, in this case, we're actually comparing Weeks 1-11 to Weeks 1-15, so the "pre-prediction" sample still makes up the bulk of our "post-prediction" data and it's unsurprising that the numbers aren't moving much. If I had OSRS and DSRS just for the last two weeks, that would show a much more dramatic swing... but I don't, so we're working with what we have.
Finally, interceptions. Not only did the "high-interception" teams throw fewer interceptions than the "low-interception" teams last week... they actually threw fewer interceptions per game. Our "interception-prone" teams were less likely to throw multiple interceptions, and they were less likely to throw any interceptions at all (44% of our "interception-prone" teams finished without a pick vs. 39% of our "interception-avoidant" teams). It's a shame fantasy leagues don't penalize interceptions more, because there'd be another fantastic opportunity to make an easy profit by betting against them.
Regression in its Purest Form
I often say that observed results are merely true performance level plus random chance, and regression to the mean is just the tendency for random chance to even out and production to revert to true performance level going forward. With this in mind, the larger a role random chance plays in any given value, the more strongly that value will tend to regress.
Since Week 14 marks the point where we no longer have enough season left to make predictions, track them for four weeks, and then follow up, I thought this would be a great time to dive into some more theory. And to kick us off, I wanted to talk about what regression to the mean looks like for something where there is no "true performance level", where results are 100% determined by random chance.
I love head-to-head fantasy football leagues. I think it's the most exciting format and the most fun. I get why some people prefer to use Victory Points or All-Play or any of a number of other systems designed to minimize the role of luck, but to me, the luck is the point. Sometimes the best team has a bad draw and loses. My favorite leagues are dynasty leagues, and sometimes you'll see an owner manage to put together a truly monstrous squad. If the best teams didn't lose sometimes, the rest of the league would find the season a lot less interesting!
So I'm not denying that head-to-head introduces a large element of luck to fantasy football. I'm saying that, in my opinion, the luck makes head-to-head leagues more fun. With that said, schedule luck is unequivocally luck in its purest, most unadulterated form. And as pure luck, it regresses. Hard.
Here's what that luck looks like. My oldest dynasty league has ten teams. If I finish with the highest score of the week, I'm going to win 100% of the time regardless of who I happen to play that week. My "expected winning percentage" in this case is 100%.
If I have the second-highest score of the week, I could expect to beat eight of the nine other teams and lose to one (the weekly high scorer). If I finished with the second-highest score a million times, I'd expect to win 8/9ths of those games, or 88.9%. So I could say that a second-place weekly finish is "worth" 0.89 wins.
Obviously I can't win fractions of a game. In practice, either I'll win or I'll lose. If I win, I overperformed expectations by 0.11 wins (I was expected to get 0.89, I actually got 1.00, so I was very slightly lucky to avoid the top team). If I lose, then I underperform expectations by 0.89 wins (I was expected to get 0.89, I actually got zero, and I was quite unlucky indeed to draw the one team that week that was capable of beating me).
I can then repeat that calculation for every game from every team and come up with a quick and dirty estimate of how many wins or losses they gained or lost due to schedule luck. (If your league tracks your all-play winning percentage for you, you can find this value much more quickly by subtracting your all-play winning percentage from your actual winning percentage and multiplying by games played.)
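The bookkeeping described above is easy to automate. Here's a minimal Python sketch, assuming a league where every team posts one score per week; the function names are my own, not from any real tracking tool.

```python
# Minimal sketch of the expected-wins / schedule-luck calculation.
# Assumes weekly_scores maps team -> list of weekly point totals.

def expected_wins(weekly_scores):
    """A team's expected wins for a week equal the fraction of the
    other teams it outscored that week, summed across all weeks."""
    teams = list(weekly_scores)
    n_opponents = len(teams) - 1
    n_weeks = len(next(iter(weekly_scores.values())))
    exp = {t: 0.0 for t in teams}
    for week in range(n_weeks):
        for t in teams:
            outscored = sum(weekly_scores[o][week] < weekly_scores[t][week]
                            for o in teams if o != t)
            exp[t] += outscored / n_opponents
    return exp

def schedule_luck(actual_wins, weekly_scores):
    """Actual wins minus expected wins; positive means lucky."""
    exp = expected_wins(weekly_scores)
    return {t: actual_wins[t] - exp[t] for t in exp}
```

In a ten-team league, the weekly high scorer earns 9/9 = 1.00 expected wins and the second-highest scorer earns 8/9 ≈ 0.89, matching the numbers above.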
But when it comes to pure luck, there's no "true performance level". Everyone's "true luck level" is "neither good nor bad luck". There's practically no correlation between your luck in one sample and your luck in another. (If you want to get extremely technical, if you have the second-highest score and lose that means you probably played a very good team, which means you're less likely to face a very good team later in the season, so we should expect a minuscule negative correlation between luck from one sample to the next during the same season. But this effect is completely negligible.)
I can put numbers to this. My oldest dynasty league drafted in 2007, which means we have twelve complete seasons on the books. We can pair seasons and compare results from one year to results the next— twelve seasons means eleven sets of season-pairs, times ten teams gives us 110 season-pairs. (Example: my team in 2011 compared against my team in 2012, Mike's team in 2015 compared to Mike's team in 2016, and so on.)
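The season-pairing step can be sketched in a few lines; `season_pairs` is a hypothetical helper name, and the idea is simply to line up each team's value in year N with its value in year N+1, across every team.

```python
# Sketch of the season-pairing construction described above.

def season_pairs(history):
    """history maps team -> [value in season 1, season 2, ...].
    Returns (year N, year N+1) tuples for every consecutive pair."""
    pairs = []
    for values in history.values():
        pairs += list(zip(values, values[1:]))
    return pairs

# With 10 teams and 12 complete seasons, each team contributes 11
# consecutive-season pairs, giving 10 * 11 = 110 pairs in total,
# the sample size described in the article.
```

Feeding these pairs to a correlation function then measures how well one year predicts the next.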
I've tracked ten different variables across these season-pairs and I can measure how consistent they are from year to year. The variables are: points scored, potential points (how many points a team would score if they submitted their best lineup every week), wins, expected wins (using the formula above), points rank (where the team ranked from 1st to 10th in scoring, which helps reduce the impact of outliers), potential points rank, wins rank, expected wins rank, playoff results (how far each team advanced), and "schedule luck" (the difference between actual wins and expected wins).
With this data, I can tell how well one variable correlates with another. Correlation is just the tendency for one variable to move in tandem with the other. For instance, height and weight are correlated; taller people tend to weigh more than shorter people. But this correlation isn't perfect, and you can doubtless think of many examples of a taller person who weighs less than a shorter person. Indeed, the correlation between height and weight is around 0.7, and if you square that correlation coefficient you get the R^2 value, which represents how much of one variable is explained by variations in the other. In this case, variations in height explain about 50% of the variations in weight.
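The height/weight illustration above can be sketched as a toy computation; the numbers below are invented for illustration, and `pearson_r` is just the standard Pearson correlation written out by hand to show the arithmetic.

```python
# Toy illustration of correlation and R^2 (data is made up).
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance normalized by both spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

heights = [62, 65, 68, 70, 72, 75]        # inches (hypothetical)
weights = [120, 150, 155, 180, 175, 210]  # pounds (hypothetical)

r = pearson_r(heights, weights)
print(f"r = {r:.2f}, R^2 = {r * r:.2f}")  # R^2 = share of variance explained
```

A correlation of 0.7, as with real-world height and weight, squares to an R^2 of 0.49, i.e. roughly half the variation explained.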
And how well do these variables in one year predict these variables in the next? It depends on the variable. The correlation between potential points or potential points rank in one year and potential points or potential points rank in the next year is about 0.45, the strongest relationship on the board. This makes sense; potential points measure not just how good your starters are, but also how good your depth is, and strong + deep teams are the ones that are most likely to survive age and injuries to remain competitive in future years.
But notice how much smaller this correlation coefficient is than the 0.7 correlation between height and weight. Potential points in one year explain just 20% of the variation in potential points the next year. The other 80% is explained by other factors. This 80% is where regression to the mean does its work. If that number seems shockingly high, perhaps it's a useful reminder of why it's a good idea to buy future picks at a discount from good teams; regression is a much harsher mistress than anyone anticipates.
If you compare potential points (or potential points rank) to all nine (non-luck) variables, the average correlation is 0.35, which is the strongest predictor in the bunch (explaining about 12% of the variation in next year's data). Meanwhile, wins, win rank, and playoff finish are the three weakest predictors with a correlation coefficient of just 0.17 (meaning they explain less than 3% of the variation in next year's data). This is the basis for comparison.
Now let's look at schedule luck. The correlation between schedule luck one year and the other nine variables the next year is just 0.1, meaning schedule luck explains just 1% of the variance in things like record and points scored. (If I had a larger sample, this would drop to 0%; it's just that 110 data points isn't quite enough to get us there.) Meanwhile, the correlation between the nine primary variables one year and schedule luck the next is just 0.05, which means these factors explain less than one-quarter of one percent of the variation in luck next year.
What I'm saying is that luck doesn't just regress, it regresses maximally and instantly. Productive teams tend to remain productive and unproductive teams tend to remain unproductive, but if you've been lucky to this point your expected luck going forward is zero, and if you've been unlucky to this point your expected luck going forward is also zero.
The inspiration for this topic came from one of my leaguemates named Mike. Mike had a great team coming into the season. He had won three of the last four titles and his team last year set the all-time record for points scored in the regular season. But over the first six games, Mike wasn't just unlucky, he had the most-unlucky six-game stretch by any team in any season in the history of our league. He opened the season with the third-highest score in each of the first two weeks and lost both games. He started 0-6 after schedule luck cost him a total of 2.8 wins.
Was Mike unlucky? Absolutely. IS Mike unlucky? Absolutely not, because luck is completely random. Despite the terrible start, there was no reason to believe Mike's bad luck would continue. And indeed, not only did his poor luck stop, it completely reversed itself; over the final seven weeks, Mike won 3.1 games more than he would have been expected to based on his weekly scores, the luckiest seven-game stretch in league history. The result was essentially neutral schedule luck over the full season and one of the odder 7-6 finishes I've ever seen.
Don't make the mistake of thinking that because Mike's luck had been bad he was due for some good luck to come his way. That's a classic mistake when it comes to regression. Mike's hot streak was just as fluky and unlikely as his cold streak. It's just that at any given time a team is as likely to go on a run of good luck as it is to go on a run of bad luck. Luck is pure noise.
If you, like Mike, managed to make your league's playoffs, then congratulations. If you had favorable luck assisting you, then I regret to inform you that you can't count on that luck going forward. You might get lucky again, though it's just as likely that your luck turns.
Likewise, if you made the playoffs despite the headwinds of bad luck, take comfort. History isn't destiny and there's as good of a chance your bad luck changes immediately as there is that it persists.
If you missed the playoffs due to bad luck, I'm sorry. Hopefully, you can take comfort in the fact that bad luck comes for everyone eventually, but good luck does as well. It's rather like yards per carry in that respect— it tells us a lot about what has happened but virtually nothing about what will happen next.
I can't control luck, and neither can you. But I can wish you luck nevertheless, and comfort myself with the fact that, for half of you, my wish will come true. Best of luck in the playoffs if you made it, and may you have better luck next year if you did not.