Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes, I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions. On a case-by-case basis, it's easy to find reasons why any given player is going to buck the trend and sustain production. So I constrain myself and remove my ability to rationalize on a case-by-case basis.
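For anyone who wants to try the Group A / Group B setup at home, here's a minimal sketch in Python. The players and numbers are made up purely for illustration, and the function names (`split_groups`, `pct_more`) are hypothetical helpers, not part of any real tool.

```python
from statistics import mean

def split_groups(players, metric, group_size):
    """Rank players by a metric; the top go to Group A, the bottom to Group B."""
    if group_size < 1 or 2 * group_size > len(players):
        raise ValueError("group_size must be >= 1 and the groups must not overlap")
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:group_size], ranked[-group_size:]

def pct_more(group_x, group_y, stat):
    """Percent by which group_x's average in `stat` exceeds group_y's average."""
    x = mean(p[stat] for p in group_x)
    y = mean(p[stat] for p in group_y)
    return 100.0 * (x - y) / y

# Made-up players (not real stats): yards per carry and fantasy points per game.
players = [
    {"name": "RB1", "ypc": 5.6, "ppg": 16.0},
    {"name": "RB2", "ypc": 5.1, "ppg": 14.0},
    {"name": "RB3", "ypc": 4.4, "ppg": 12.0},
    {"name": "RB4", "ypc": 3.9, "ppg": 13.0},
    {"name": "RB5", "ypc": 3.5, "ppg": 11.0},
    {"name": "RB6", "ypc": 3.1, "ppg": 9.0},
]

group_a, group_b = split_groups(players, "ypc", 2)
lead = pct_more(group_a, group_b, "ppg")  # Group A's lead before the prediction
print(f"Group A averages {lead:.0f}% more points per game than Group B")
```

The key design choice is that the only input I control is the metric. Who lands in which group falls out of the sort, so there is no room to swap out a player I happen to like.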
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared. Here's a similar list from 2017.
The Scorecard
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I explained why touchdowns follow yards (but yards don't follow back) and predicted that the players with the fewest touchdowns per yard gained would outscore the players with the most touchdowns per yard gained going forward.
In Week 5, I talked about how preseason expectations still held as much predictive power as performance through four weeks. No specific prediction was made.
In Week 6, I talked about why quarterbacks tended to regress less than other positions but nevertheless predicted that Patrick Mahomes II would somehow manage to get even better and score ten touchdowns over the next four weeks.
In Week 7, I talked about why watching the game and forming opinions about players makes it harder to trust the cold hard numbers when the time comes to put our chips on the table. (I did not recommend against watching football; football is wonderful and should be enjoyed to its fullest.)
In Week 8, I discussed how yard-to-touchdown ratios can be applied to tight ends, but the players most likely to regress positively were already the top performers at the position. I made a novel prediction to try to overcome this quandary.
In Week 9, I discussed several of the challenges in predicting regression for wide receiver "efficiency" stats such as yards per target. No specific prediction was made.
In Week 10, I proposed a "leaderboard test" to quickly tell whether a statistic was noisy (and more prone to regression) or stable (and less prone to regression). I illustrated this test in action and made another prediction that yards per carry would regress.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
|---|---|---|---|
| Yards per Carry | Group A had 20% more rushing yards per game | Group B has 30% more rushing yards per game | None (Success!) |
| Yard:Touchdown Ratio | Group A had 23% more points per game | Group B has 47% more points per game | None (Success!) |
| | Mahomes averaged 2.2 touchdowns per game | Mahomes averages 2.0 touchdowns per game | None (Failure) |
| Yard:Touchdown Ratios | Group B had 76% more points per game | Group B has 132% more points per game | 1 |
| Mahomes TDs Redux | Mahomes averaged 2.2 touchdowns per game | Mahomes averages 3.0 touchdowns per game | 2 |
| Yards per Carry Redux | Group A had 22% more rushing yards per game | Group B has 183% more rushing yards per game | 3 |
Our Group A tight ends relied heavily on touchdowns for fantasy relevance; last week, those touchdowns dried up entirely and none of Group A reached the end zone. Without commensurate yardage to backstop their value, Group B pulled even further ahead.
Our first Mahomes prediction ended in failure, but I mentioned last week that it's a shame to let it go to waste entirely, so we're reviving it now that he's healthy and on the field again. Mahomes has now thrown for six touchdowns in his two full games since the prediction and is on pace to meet our goal of ten in four weeks.
You didn't misread that last row; Group B had 183% more rushing yards per game than Group A last week. This is mostly a product of small sample sizes; three out of ten backs had a bye last week, so we're comparing three games from Group A to four games from Group B, and one of those latter games was a 188-yard outing from Derrick Henry.
Keeping in mind how tiny the sample is so far, it's interesting to note that Group A averaged 1.51 more yards per carry than Group B prior to our prediction, but in the first week of our sample, Group B averaged 1.59 more yards per carry than Group A. Of that difference, 0.9 yards per carry came from a single 68-yard Derrick Henry run, but that's kind of the point: yards per carry is extremely sensitive to single outlier runs, and those runs are essentially random events, which is why it is so unstable from one sample to the next.
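To see how much one long run can move a per-carry average, here's a quick arithmetic sketch in Python. The 68-yard run is borrowed from the example above; the other 19 carries (4 yards apiece) are invented for illustration and are not Henry's actual game log.

```python
def yards_per_carry(runs):
    """Average yards gained per rushing attempt."""
    return sum(runs) / len(runs)

# Hypothetical game: 19 ordinary 4-yard carries plus one 68-yard breakaway.
runs = [4] * 19 + [68]

with_long = yards_per_carry(runs)          # 144 yards on 20 carries
without_long = yards_per_carry(runs[:-1])  # 76 yards on 19 carries

print(f"With the long run:    {with_long:.1f} yards per carry")
print(f"Without the long run: {without_long:.1f} yards per carry")
```

In this made-up game, one carry out of twenty is the difference between a pedestrian 4.0 average and a gaudy 7.2.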
EVEN ABSTRACT CONCEPTS REGRESS.
Last year, I wrote about the aging of the quarterback position in the NFL. By fantasy-point weighted age (which gives the most weight to the most productive quarterbacks), the position was the oldest it had been since 2008, which is as far back as I tracked. That broke the previous record, which had been set in 2017. The position was old and getting ever older.
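Here's one plausible way to compute a fantasy-point weighted age, sketched in Python. It assumes a simple weighted mean (each quarterback's age weighted by his fantasy points); the exact formula behind the figures in this column isn't spelled out here, and the two quarterbacks below are invented.

```python
def weighted_age(quarterbacks):
    """Production-weighted average age.

    `quarterbacks` is a list of (age, fantasy_points) pairs. Assumes a plain
    weighted mean: sum(age * points) / sum(points).
    """
    total_points = sum(points for _, points in quarterbacks)
    if total_points <= 0:
        raise ValueError("total fantasy points must be positive")
    return sum(age * points for age, points in quarterbacks) / total_points

# Invented example: a productive 24-year-old and a less productive 38-year-old.
qbs = [(24, 300.0), (38, 100.0)]

simple_avg = sum(age for age, _ in qbs) / len(qbs)
weighted_avg = weighted_age(qbs)

print(f"Simple average age:   {simple_avg:.1f}")
print(f"Weighted average age: {weighted_avg:.1f}")
```

Because the younger quarterback scored three times as many points, he pulls the weighted figure (27.5) well below the simple average (31.0). That's the sense in which the most productive quarterbacks get the most weight.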
The consensus at the time was that breakthroughs in modern medicine were extending careers and old quarterbacks were the new normal. I hypothesized that instead "quarterback age" was just a reflection of incoming talent. The league had a lot of really good quarterbacks enter between 2000 and 2005. Then it had a long drought with little incoming talent, so starting quarterbacks kept getting older and older with fewer young challengers around to earn starting jobs.
The practical implication of this hypothesis is that all it would take is a few strong draft classes to reverse this aging trend and lead us into a new era of young quarterbacks. Little did I know at the time that the seeds of that revolution were already in place. At midseason last year, the average quarterback was 30.8 years old; by the end of the year, that figure would fall by more than a full year, to 29.5.
Today, four of the top five quarterbacks in fantasy football are 24 or younger: Dak Prescott (class of 2016), Deshaun Watson (2017), Lamar Jackson (2018), and Kyler Murray (2019). Patrick Mahomes II (2017) ranks 7th and would be higher if he hadn't missed time to injury. Josh Allen (2018), Carson Wentz (2016), Gardner Minshew (2019), Daniel Jones (2019), Jared Goff (2016), Jacoby Brissett (2016), Baker Mayfield (2018), and Mason Rudolph (2018) have all performed well enough to this point that 13 of the top 24 fantasy quarterbacks hail from the past four draft classes.
The result has been a stunning and dramatic reversal; as of today the weighted average age of all fantasy quarterbacks is 27.9 (the lowest figure since I began tracking in 2008). That number will likely rise a little over the remainder of the season, but this year will almost certainly remain the youngest the position has produced since at least 2012.
We often think of regression as applying to individual players. Aaron Rodgers has a "true production level", and sometimes he'll overperform that level while other times he'll underperform it. Regardless of whether he's been overperforming or underperforming recently, however, he always returns to his true mean.
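A toy simulation makes the "true production level" idea concrete. This is only a sketch under loud assumptions: every simulated player has the same true level (10 points), weekly luck is normally distributed, and all of the numbers are invented rather than drawn from real NFL data.

```python
import random
from statistics import mean

def simulate(num_players=1000, true_mean=10.0, noise_sd=3.0, top_n=100, seed=0):
    """Return (first-sample avg, second-sample avg) for the hottest players.

    Every player shares the same true level, so any gap between players in
    the first sample is pure luck by construction.
    """
    rng = random.Random(seed)
    first = [rng.gauss(true_mean, noise_sd) for _ in range(num_players)]
    second = [rng.gauss(true_mean, noise_sd) for _ in range(num_players)]
    # Pick the players who looked best in the first sample...
    hot = sorted(range(num_players), key=lambda i: first[i], reverse=True)[:top_n]
    # ...and see how that same group did in the second sample.
    return mean(first[i] for i in hot), mean(second[i] for i in hot)

before, after = simulate()
print(f"Hot group, first sample:  {before:.1f}")
print(f"Hot group, second sample: {after:.1f}")
```

The group that looked elite in the first sample lands back near the true level of 10 in the second, not because anything changed about the players, but because the luck that put them on top doesn't carry over.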
But this idea that everything has a "true mean" doesn't just apply to individual players. It applies to anything where chance predominates, including abstract concepts like "incoming talent". There's a "true average" amount of wide receiver talent entering the league, but on a year to year basis, you'll see wild fluctuations. Some years are like 2014, which has already given us nine different receivers with a 1,000-yard season. Others are more like 2005, which brought just three.
These fluctuations in incoming talent have massive, far-reaching consequences, but because we never think of "incoming talent" as having a "true mean", it never occurs to us that those fluctuations might be what's driving our various observations. We see quarterbacks getting older and attribute it to modern medicine. We see running backs getting drafted later and attribute it to structural shifts toward the pass. We see wide receivers getting younger and assume that the learning curve at the position must be getting easier.
But in the NFL, talent is the invisible force that dominates outcomes, and unless we're controlling for it, we're going to be led astray.
Here's a heatmap of the (production-weighted) average age of every position since 2018; it's shaded from red (younger) to blue (older). You can see for yourself where a lot of young talent has entered the league in a short span (the position will trend toward red), and also where the league has gone long stretches with minimal incoming talent (the position will trend toward blue).