Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
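For readers who like to see the mechanics, here is a minimal sketch of that procedure in Python. The player records, the `td_rate` field, and the group size are all hypothetical placeholders, not the data or cutoffs actually used in this column.

```python
from statistics import mean

def split_groups(players, metric, group_size):
    """Rank players by `metric` (highest first); return (group_a, group_b)."""
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:group_size], ranked[-group_size:]

def average_points(group):
    """Average fantasy points per game across a group."""
    return mean(p["points_per_game"] for p in group)

# Hypothetical players; names and numbers are made up for illustration.
players = [
    {"name": "P1", "td_rate": 0.12, "points_per_game": 14.0},
    {"name": "P2", "td_rate": 0.10, "points_per_game": 13.0},
    {"name": "P3", "td_rate": 0.06, "points_per_game": 11.5},
    {"name": "P4", "td_rate": 0.05, "points_per_game": 11.0},
    {"name": "P5", "td_rate": 0.02, "points_per_game": 10.0},
    {"name": "P6", "td_rate": 0.01, "points_per_game": 9.0},
]

group_a, group_b = split_groups(players, "td_rate", group_size=2)

# Step 1: confirm Group A has outscored Group B so far.
a_leads_so_far = average_points(group_a) > average_points(group_b)
# Step 2 (the prediction) is checked later with the same comparison
# on games played after the prediction date.
```

The split is purely mechanical: once the metric is chosen, the ranking decides who lands in each group.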
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A, and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2019 and their final results, here's the list from 2018, and here's the list from 2017.
THE SCORECARD
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about how the ability to convert yards into touchdowns was most certainly a skill, but it was a skill that operated within a fairly narrow and clearly-defined range, and any values outside of that range were probably just random noise and therefore due to regress. I predicted that high-yardage, low-touchdown receivers would outscore low-yardage, high-touchdown receivers going forward.
In Week 5, I talked about how historical patterns suggested we had just reached the informational tipping point, the time when performance to this point in the season carried as much predictive power as ADP. In general, I predicted that players whose early performance differed substantially from their ADP would tend to move toward a point between their early performance and their draft position, but no specific prediction was made.
In Week 6, I talked about simple ways to tell whether a statistic was especially likely to regress or not. No specific prediction was made.
In Week 7, I speculated that kickers were people, too, and lamented the fact that I'd never discussed them in this column before. To remedy that, I identified teams that were scoring "too many" field goals relative to touchdowns and "too many" touchdowns relative to field goals and predicted that scoring mix would regress and kickers from the latter teams would outperform kickers from the former going forward.
In Week 8, I noted that more-granular measures of performance tended to be more stable than less-granular measures and predicted that teams with a great point differential would win more games going forward than teams with an identical record but a substantially worse point differential.
In Week 9, I talked about the interesting role regression to the mean plays in dynasty, where the mere fact that a player is likely to regress sends signals that that player is probably quite good and worth rostering long-term, anyway. No specific prediction was made.
In Week 10, I explained why Group B's lead in these predictions tended to get smaller the longer each prediction ran and showed how a small edge over a huge sample could easily be more impressive than a huge edge over a small sample. No specific prediction was made.
In Week 11, I wrote that yards per pass attempt was an example of a statistic that was significantly less prone to regression, and for the first time I bet against it regressing.
In Week 12, I talked about "on pace" stats and how many of the players who wound up setting records were not the players who were "on pace" to do so.
In Week 13, I came up with a list of players who were getting hot just in time for the playoffs... and then explained why they probably weren't getting hot just in time for the playoffs, predicting that they'd cool off back to their normal production level going forward.
In Week 14, I offered the cold comfort that if you lose in the fantasy playoffs, the odds were never in your favor, anyway.
In Week 15, I made our last prediction of the year, once again looking at yard-to-touchdown ratios for touchdown regression. I also noted that if we cut the duration of our prediction in half, we could offset it by doubling the size of our prediction.
Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
---|---|---|---|
Yards per Carry | Group A had 3% more rushing yards per game | Group B has 36% more rushing yards per game | Success! |
Yard to Touchdown Ratio | Group A averaged 2% more fantasy points per game | Group B averages 40% more fantasy points per game | Success! |
TD to FG ratio | Group A averaged 20% more points per game | Group B averages 36% more points per game | Success! |
Wins vs. Points | Both groups had an identical win% | Group B has a 4% higher win% | Failure |
Yards per Attempt | Group B had 14% more yards per game | Group B has 28% more yards per game | Success! |
Recent Performances | Players were "hot" for the playoffs | Players regressed 97% back to their previous avg | 1 |
Yard to Touchdown Ratio | Group A had 15% more points per game | Group B has 34% more points per game | 1 |
There's still a week to go, but at this point, we can all-but-officially close out our prediction that "hot" players would cool down. Players in our sample were averaging 15.6 points per game in recent weeks, but 11.1 points per game over the full season. In the three weeks since, they averaged 11.6, 10.8, and 11.3 points per game, for an overall average of 11.2 points per game. They haven't just regressed in the direction of their season average; they've regressed all the way back to it. In order for the prediction to fail, these players would need to average 16.6 points per game next week, a full point better than they were doing even on their hot streak.
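The "97% back to their previous average" figure in the scorecard can be roughly reproduced from the numbers above. This is a sketch that assumes the three weeks count equally and that "percent regressed" means the share of the gap between the hot-streak average and the season average that has closed; the function name is my own.

```python
def fraction_regressed(hot_avg, season_avg, since_avg):
    """Share of the gap between hot-streak and season averages that has closed."""
    return (hot_avg - since_avg) / (hot_avg - season_avg)

weekly_since = [11.6, 10.8, 11.3]
since_avg = sum(weekly_since) / len(weekly_since)  # about 11.2

share = fraction_regressed(hot_avg=15.6, season_avg=11.1, since_avg=since_avg)
print(f"{share:.0%}")  # about 97%
```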
Does this mean the last four weeks don't matter? Of course not. Consider Robert Woods. Over the first seven games of the season, he averaged 13.9 points per game. Over the next four, he averaged 20.9. That four-game stretch brought his full-season average up to 16.4 points per game. And in the three weeks since, he's averaged... 16.1 points per game. It's less than his performance over the last four games... but it's more than his performance over the first seven games, too. Instead, it's right at his average over the full season.
So a player's last four games certainly matter when predicting how he'll perform next; they just don't matter more than the player's first four games, or his third, fifth, seventh, and ninth games, or any other four-game sample. (The one exception would be if there was a dramatic change in a player's role; if a starting running back gets hurt, you should look at his backup's performances in games that the starter has missed.) Otherwise, if you have a league host that reports your players' full-season averages as well as their average in recent weeks, you can safely ignore the latter number because the former number already accounts for it.
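To see why the full-season number "already accounts for" the recent stretch, here is the Robert Woods arithmetic as a small sketch. The helper name is hypothetical; the inputs are the figures quoted above.

```python
def blended_average(samples):
    """samples: list of (games, points_per_game) pairs; returns the games-weighted average."""
    total_games = sum(games for games, _ in samples)
    return sum(games * ppg for games, ppg in samples) / total_games

# First seven games at 13.9 per game, next four at 20.9 per game.
season_avg = blended_average([(7, 13.9), (4, 20.9)])
print(round(season_avg, 2))
```

The hot stretch carries 4/11 of the weight, exactly its share of the games played, which is why it doesn't need to be counted a second time.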
As for our yard-to-touchdown prediction, everything so far has happened as expected. Group A averaged 54 yards per game at the time of prediction and 46 yards per game last week. Group B averaged 68 yards per game at the time of the prediction and 71 yards per game last week. Both groups' yardage totals remained substantially the same. But the touchdown totals changed dramatically; Group A fell from 0.70 touchdowns per game to 0.33 touchdowns per game, while Group B rose from 0.25 touchdowns to 0.29 touchdowns per game. As a result, Group B outscored Group A fairly handily.
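Here is a rough check of how those yardage and touchdown figures turn into fantasy points. The scoring settings are an assumption on my part (0.1 points per yard, 6 points per touchdown, nothing for receptions); the column doesn't state them here, so treat the output as approximate.

```python
POINTS_PER_YARD = 0.1  # assumed scoring, not stated in the column
POINTS_PER_TD = 6.0    # assumed scoring, not stated in the column

def points_per_game(yards, touchdowns):
    """Fantasy points per game from yards and touchdowns per game."""
    return yards * POINTS_PER_YARD + touchdowns * POINTS_PER_TD

a_before = points_per_game(54, 0.70)
b_before = points_per_game(68, 0.25)
a_after = points_per_game(46, 0.33)
b_after = points_per_game(71, 0.29)

a_lead_before = a_before / b_before - 1  # Group A's edge at prediction time
b_lead_after = b_after / a_after - 1     # Group B's edge last week

print(f"Group A led by {a_lead_before:.0%}; Group B now leads by {b_lead_after:.0%}")
```

Under those assumed settings, Group A's edge comes out around 16% before the prediction and Group B's around 34% after it, close to the 15% and 34% in the scorecard. Nearly all of the swing comes from the touchdown term.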
More Things Regress Than You Might Expect
Two years ago, I wrote about the aging of the quarterback position in the NFL. By fantasy-point weighted age (which gives the most weight to the most productive quarterbacks), the position was the oldest it had been since 2008, as far back as I tracked. It broke the previous record which had been set in 2017. The position was old and getting ever older.
The consensus at the time was that breakthroughs in modern medicine were extending careers and old quarterbacks were the new normal. I hypothesized that instead "quarterback age" was just a reflection of incoming talent. The league had a lot of really good quarterbacks enter between 2000 and 2005. Then it had a long drought with little incoming talent, so starting quarterbacks kept getting older and older with fewer young challengers around to earn starting jobs.
The practical implication of this hypothesis is that all it would take is a few strong draft classes to reverse this aging trend and lead us into a new era of young quarterbacks. Little did I know at the time that the seeds of that revolution were already in place. By midseason, the average quarterback was 30.8 years old; by the end of the year, that figure would fall a full year to 29.5. When I revisited last year, it had plummeted all the way down to 27.9 (though it would rebound a bit to 28.1 by the end of the season). Within a year and a half, the average age of fantasy-relevant quarterbacks had fallen by nearly three full years.
Today, that average age (weighted by fantasy production) has risen back to 29.8 years. Why? Lamar Jackson, Russell Wilson, Deshaun Watson, Josh Allen, Patrick Mahomes II, Kyler Murray, Aaron Rodgers, and Tom Brady all repeated as Top 12 quarterbacks from 2019 to 2020, but all got a year older. Dak Prescott (25), Jameis Winston (25), and Carson Wentz (26) fell out of the Top 12. (So did 34-year-old Matt Ryan, but he only dropped from 11th to 15th, so it didn't move the average much.) They were replaced by Ryan Tannehill (32), Kirk Cousins (32), Justin Herbert (22), and Ben Roethlisberger (38). The new guys were older, on average, than the old guys, bringing up the position's age.
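A quick sketch of that turnover, using the ages listed above. Note that this is a plain, unweighted average; the figures quoted in this column are weighted by fantasy production, so the two won't match exactly.

```python
from statistics import mean

# Ages as listed above for quarterbacks leaving and entering the Top 12.
dropped_out = {"Dak Prescott": 25, "Jameis Winston": 25, "Carson Wentz": 26, "Matt Ryan": 34}
newcomers = {"Ryan Tannehill": 32, "Kirk Cousins": 32, "Justin Herbert": 22, "Ben Roethlisberger": 38}

avg_out = mean(dropped_out.values())  # 27.5
avg_in = mean(newcomers.values())     # 31

print(f"Left the Top 12: {avg_out} years; joined the Top 12: {avg_in} years")
```

Even with a 22-year-old rookie in the mix, the newcomers are about three and a half years older on average than the players they replaced.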
This is how average ages shift. When new talents like Justin Herbert explode onto the scene, they pull the age down by a lot. When established talents continue performing at a high level, they pull the age up a little bit. And when older players see a resurgence, they pull up the average age by a lot.
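Those three effects fall straight out of how a production-weighted average works. Below is a minimal sketch; the ages and point totals are hypothetical and chosen only to show the size of each pull.

```python
def weighted_age(players):
    """players: list of (age, fantasy_points) pairs; returns the points-weighted average age."""
    total_points = sum(points for _, points in players)
    return sum(age * points for age, points in players) / total_points

# Hypothetical veterans as (age, fantasy points).
veterans = [(36, 300), (33, 280), (31, 260)]

baseline = weighted_age(veterans)
# A productive 22-year-old joins: the average drops by about three years.
with_rookie = weighted_age(veterans + [(22, 300)])
# Everyone repeats the same production a year later: the average rises by exactly one year.
year_later = weighted_age([(age + 1, points) for age, points in veterans])

print(round(baseline, 1), round(with_rookie, 1), round(year_later, 1))
```

Because the weights are fantasy points, a single star rookie moves the number far more than a roster full of low-scoring backups would.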
This means we tend to see the age creep down during periods where lots of transcendent talents are entering the league at the same time, then creep back up in the lulls in between when the incoming talent is much sparser. And we see this in the data; the 2008 and 2017 running back classes and the 2014 wide receiver class were famously good, and we saw the average running back and wide receiver ages plummet in the immediate aftermath. Meanwhile, there was a dearth of great quarterbacks entering the league from around 2006 to 2016, and the quarterback age gradually crept upwards over that span.
We often think of regression as applying to individual players. Aaron Rodgers has a "true production level", and sometimes he'll overperform that level while other times he'll underperform it. Regardless of whether he's been overperforming or underperforming recently, however, he always returns to his true mean.
But this idea that everything has a "true mean" doesn't just apply to individual players. It applies to anything where chance predominates, including abstract concepts like "incoming talent". There's a "true average" amount of wide receiver talent entering the league, but on a year-to-year basis, you'll see wild fluctuations. Some years are like 2014, which has already given us nine different receivers with a 1,000-yard season. Others are more like 2005, which brought just three.
These fluctuations in incoming talent have massive, far-reaching consequences, but because we never think of "incoming talent" as having a "true mean", it never occurs to us that those fluctuations might be what's driving our various observations. We see quarterbacks getting older and attribute it to modern medicine. We see running backs getting drafted later and attribute it to structural shifts toward the pass. We see wide receivers getting younger and assume that the learning curve at the position must be getting easier. In the NFL, talent is the invisible force that dominates outcomes, and unless we're controlling for it, we're going to be led astray.
In dynasty leagues, we see this in which strategies are currently in vogue. Around 2015, it was conventional wisdom that the best way to build a team was to focus on cornerstone wide receivers and fill in the rest around them. At the time, this was true because we were fresh off a draft that had given us a huge crop of new cornerstone wide receivers. But it wasn't a universal law of roster construction; in 2017, one would have been better-served focusing on running backs because that's where the incoming talent was.
At any given point in dynasty, it's generally better to be heavily invested in the positions with the youngest average age. (Of course, individual circumstances can vary.) But going forward it's always best to invest where you think the talent is, because incoming talent today dictates where the productive youth will be tomorrow.
Here's a heatmap of the (production-weighted) average age of every position since 2008; it's shaded from red (younger) to blue (older). You can see for yourself where a lot of young talent has entered the league in a short span (the position will trend toward red), and also where the league has gone long stretches with minimal incoming talent (the position will trend blue).
You might also notice how much red there has been across the board in recent seasons. Indeed, if you take the fantasy-point weighted age of all players regardless of position, the last three seasons are the youngest three seasons since 2008. A lot of talented young players have entered the league at all positions in recent years. But this, too, is prone to regression; at some point in the future the spigot of stars will turn off and players will trend older once again.
Because recent drafts have been so phenomenal, you see the price of draft picks creeping upwards in many leagues. No one wants to sell their future picks when they see even late 1sts, 2nds, and 3rds turning into players like Justin Jefferson, D.K. Metcalf, Alvin Kamara, Justin Herbert, Kyler Murray, or George Kittle. Any GM who over-invested in the rookie draft over the last few years has likely built a powerhouse.
But we've had plenty of lean years, too; from 2012 to 2014 the number and quality of incoming stars was terribly anemic. Any GM who over-invested in those drafts probably took a long time to recover. If you are tempted to buy future picks in the hopes that next year's draft, or the draft after that, or the draft after that will somehow be as talent-rich as the last few drafts... well, they might be. But you should temper your expectations and guard against the possibility that they won't be, too.