Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions. On a case-by-case basis, it's easy to find reasons why any given player is going to buck the trend and sustain production. So I constrain myself and remove my ability to rationalize on a case-by-case basis.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared. Here's a similar list from 2017.
The Scorecard
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
Yards per Carry | Group A had 20% more rushing yards per game | Group B has 15% more rushing yards per game | SUCCESS! |
I don't want to make too much of a one-week sample. (Not out of any principled stand or anything; I've just seen plenty of one-week samples flip wildly come weeks two, three, and four, and I'd hate to look foolish for celebrating too early.) But this is pretty much exactly what we expect to see when we bet on yards per carry. The yards per carry gap between Group A and Group B was 2.31 over the first two weeks, but just 0.61 in Week 3. Meanwhile, no back in Group A topped 17 carries, while half of the backs in Group B managed the feat. As a result, Group B's superior volume carried the week. Now we'll see if it can carry the next three weeks as well.
Playing the Hits
If you go see Lynyrd Skynyrd live, you know they're playing Sweet Home Alabama and Freebird. The Stones are going to play (I Can't Get No) Satisfaction. KISS is going to play Rock and Roll All Nite and Detroit Rock City, and of course Ozzy is eventually going to get around to Crazy Train.
Similarly, Regression Alert loves delving into the back catalog for obscure stats and deep cuts from time to time, but we know where our bread is buttered and we aren't shy about serving up the hits, either. Last week we played our old classic "Yards Per Carry is Pseudoscience". This week we have our seminal work "Touchdowns Follow Yards (But Yards Don't Follow Back)". Next week we're going to really drive the crowd nuts with our smash "Revisiting Preseason Expectations". But that's getting ahead of ourselves.
First, let's talk about touchdowns. Actually, before we talk about touchdowns, let's talk about vocabulary.
sto·chas·tic
adjective
randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
Touchdowns are stochastic. Over his career, Julio Jones has scored 55 touchdowns in 114 games, an average of 0.48 touchdowns per game. Let's be generous and round it off to 0.5 to make things easy. We could say that's his "true production level", and over a long timeline, we'd probably expect him to conform to that, averaging 0.5 touchdowns per game going forward.
Despite that being his true production level, though, guess how many times Julio Jones has scored half a touchdown in a game? As far as I can tell (and I have researched this topic extensively), it has never happened. Instead, he either scores zero touchdowns... or he scores one touchdown. (On very rare occasions he has even been known to score two touchdowns.) Because they are binary outcomes, we can analyze Julio Jones' touchdowns statistically, but we cannot predict them precisely.
Yards don't really behave like that. Over his career, Julio Jones averages 96.5 yards per game, the highest average in history. But it's not like every week he's either getting you 0 yards or else he's getting you 200 yards. Instead, he's usually getting you somewhere between 50 and 150 yards. His yardage total is much more consistent from game to game than his touchdown total.
One way to measure consistency is something called standard deviation, which measures how much something varies around the average. The standard deviation of Jones' receiving yards is 51.5 yards. The standard deviation of Jones' receiving touchdowns is 0.67 touchdowns.
Now, these numbers are not directly comparable. But if you divide a player's standard deviation by that player's average, you get something called the coefficient of variation, or CV. CV is a way to compare how volatile different statistics are. The CV of Jones' yards is 53%, meaning it tends to vary by about 53% of his overall average. The CV of Jones' touchdowns is 138%. Touchdowns are much more random from week to week than yards are, about 2.6 times as random, according to CV.
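If you'd like to check that arithmetic yourself, here's a minimal Python sketch. It uses only the rounded figures quoted above, and the function name is mine, purely for illustration. Because the inputs are rounded, expect the results to land within a point or so of the percentages in the text rather than matching them exactly.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation expressed as a fraction of the mean."""
    return std_dev / mean

# Rounded career figures quoted in the text above.
yards_mean, yards_sd = 96.5, 51.5  # receiving yards per game
tds_mean = 55 / 114                # 55 touchdowns in 114 games
tds_sd = 0.67                      # touchdowns per game

cv_yards = coefficient_of_variation(yards_mean, yards_sd)
cv_tds = coefficient_of_variation(tds_mean, tds_sd)

print(f"Yards CV:      {cv_yards:.0%}")
print(f"Touchdowns CV: {cv_tds:.0%}")
print(f"Touchdowns are about {cv_tds / cv_yards:.1f}x as volatile as yards")
```

Dividing by the mean is what makes the two numbers comparable: it strips out the units, so yards and touchdowns end up on the same scale.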
Not only that, but touchdowns are also much more valuable than yards. In most scoring systems, one extra touchdown is worth the equivalent of 60 extra yards. Which means if Jones catches the high side of variance and scores a few extra touchdowns early in the year, it can dramatically inflate his fantasy production to date. And if he catches the low side of variance and fails to reach the end zone, it can leave him far lower than we'd otherwise expect.
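To put a number on that, here's a rough sketch of standard scoring for receivers. The point values (one point per 10 receiving yards, six per receiving touchdown) are my assumption of a typical setup, chosen because they match the 60-yard equivalence above; your league may differ. The 96.5-yard game is just Jones' career average, used for illustration.

```python
YARDS_PER_POINT = 10      # assumed: 1 point per 10 receiving yards
POINTS_PER_TOUCHDOWN = 6  # assumed: 6 points per receiving touchdown

def standard_points(yards, touchdowns):
    """Fantasy points for a receiving line under the assumed scoring."""
    return yards / YARDS_PER_POINT + touchdowns * POINTS_PER_TOUCHDOWN

# Under these values, one touchdown is worth exactly as much as 60 yards.
assert standard_points(60, 0) == standard_points(0, 1)

# An average Julio Jones yardage day, with and without a touchdown.
baseline = standard_points(96.5, 0)
with_td = standard_points(96.5, 1)
print(f"No touchdown:  {baseline:.2f} points")
print(f"One touchdown: {with_td:.2f} points ({with_td / baseline - 1:.0%} more)")
```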
Which gives rise to my favorite statistic for regression: yard-to-touchdown ratios. Some players are really, really good at getting yards and/or not quite as good at scoring touchdowns. For years, Jones has been the most famous example of this; he has gained 202 receiving yards in his career for every touchdown he has scored. This is a very high average, but there are plenty of other wide receivers in this general range; Andre Johnson averaged 203 yards for every touchdown, Henry Ellard averaged 212, etc.
Other players are really, really good at getting touchdowns but typically aren't commensurately good at getting yards. For his career, Davante Adams scores a touchdown for every 113 yards he gains receiving. Again, this is a very low average, but not historically implausible; Dez Bryant averaged 102 yards for every touchdown, while Randy Moss was all the way down at 98 yards per touchdown.
Importantly: yard-to-touchdown ratio is not a measure of player quality. Davante Adams has twice scored 10 or more touchdowns with 1,000 or fewer yards. All else being equal, a guy with 1,500 yards and 10 touchdowns is better than a guy with 1,000 yards and 10 touchdowns, even if the latter guy has a "better" yard-to-touchdown ratio. Calvin Ridley averages a touchdown for every 83 yards in his career; he is not a better receiver than Julio Jones.
With that in mind, over the long term, receivers tend to average between 100 and 200 yards per touchdown, with the majority of the league clustered between 120 and 180. Any rate that falls in that range is plausibly sustainable and perhaps a true representation of a player's relative skill at scoring touchdowns. But because touchdowns are stochastic, in the short run we see yard-to-touchdown ratios that are wildly outside of that "sustainable" zone. And because touchdowns count for so many points in fantasy football, this gives us a ton of targets for regression.
So let's pit the receivers with a lot of yards but very few touchdowns against the receivers with a lot of touchdowns but very few yards and see what happens. There are a dozen receivers in the NFL right now who have 200 or fewer yards and 2 or more touchdowns (guaranteeing a yard-to-touchdown ratio of 100 or lower). Similarly, there are a dozen receivers in the NFL right now who have 201 or more yards and 1 or fewer touchdowns (resulting in a yard-to-touchdown ratio above 200, or an undefined one for receivers who have yet to score). Here are the 24 receivers in question, sorted by their yard-to-touchdown ratio.
Player | RecYds | RecTDs | Ratio |
Tyler Boyd | 249 | 0 | undefined |
Michael Gallup | 226 | 0 | undefined |
Christian Kirk | 205 | 0 | undefined |
Allen Robinson | 203 | 0 | undefined |
Courtland Sutton | 247 | 0 | undefined |
Odell Beckham | 288 | 1 | 288.0 |
Michael Thomas | 266 | 1 | 266.0 |
John Brown | 246 | 1 | 246.0 |
JuJu Smith-Schuster | 243 | 1 | 243.0 |
Brandin Cooks | 225 | 1 | 225.0 |
DK Metcalf | 217 | 1 | 217.0 |
D.J. Moore | 217 | 1 | 217.0 |
Emmanuel Sanders | 194 | 2 | 97.0 |
Kenny Golladay | 176 | 2 | 88.0 |
Calvin Ridley | 175 | 2 | 87.5 |
Adam Thielen | 173 | 2 | 86.5 |
Mecole Hardman | 158 | 2 | 79.0 |
DeSean Jackson | 154 | 2 | 77.0 |
Paul Richardson Jr | 135 | 2 | 67.5 |
Phillip Dorsett | 187 | 3 | 62.3 |
Tyrell Williams | 180 | 3 | 60.0 |
Nelson Agholor | 168 | 3 | 56.0 |
T.Y. Hilton | 195 | 4 | 48.8 |
Taylor Gabriel | 110 | 3 | 36.7 |
Emmanuel Sanders, Kenny Golladay, Calvin Ridley, Adam Thielen, Mecole Hardman, DeSean Jackson, Paul Richardson Jr, Phillip Dorsett, Tyrell Williams, Nelson Agholor, T.Y. Hilton, and Taylor Gabriel have all scored more than one touchdown per every 100 receiving yards; collectively they are averaging 11.2 fantasy points per game in standard scoring. This is your Group A.
Tyler Boyd, Michael Gallup, Christian Kirk, Allen Robinson, Courtland Sutton, Odell Beckham, Michael Thomas, John Brown, JuJu Smith-Schuster, Brandin Cooks, DK Metcalf, and D.J. Moore have all scored fewer than one touchdown per every 200 receiving yards; collectively they are averaging 9.1 fantasy points per game in standard scoring. This is your Group B.
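For anyone who wants to recreate the split, here's a minimal sketch that applies the two cutoffs to the yardage and touchdown totals from the table. The function names are hypothetical, and treating a scoreless receiver's ratio as infinite (rather than "undefined") is just a convenience of mine so the list sorts cleanly.

```python
import math

# (player, receiving yards, receiving touchdowns), copied from the table above.
RECEIVERS = [
    ("Tyler Boyd", 249, 0), ("Michael Gallup", 226, 0),
    ("Christian Kirk", 205, 0), ("Allen Robinson", 203, 0),
    ("Courtland Sutton", 247, 0), ("Odell Beckham", 288, 1),
    ("Michael Thomas", 266, 1), ("John Brown", 246, 1),
    ("JuJu Smith-Schuster", 243, 1), ("Brandin Cooks", 225, 1),
    ("DK Metcalf", 217, 1), ("D.J. Moore", 217, 1),
    ("Emmanuel Sanders", 194, 2), ("Kenny Golladay", 176, 2),
    ("Calvin Ridley", 175, 2), ("Adam Thielen", 173, 2),
    ("Mecole Hardman", 158, 2), ("DeSean Jackson", 154, 2),
    ("Paul Richardson Jr", 135, 2), ("Phillip Dorsett", 187, 3),
    ("Tyrell Williams", 180, 3), ("Nelson Agholor", 168, 3),
    ("T.Y. Hilton", 195, 4), ("Taylor Gabriel", 110, 3),
]

def yards_per_touchdown(yards, touchdowns):
    """Yards gained per touchdown; infinite if no touchdown has been scored."""
    return math.inf if touchdowns == 0 else yards / touchdowns

def assign_group(yards, touchdowns):
    """Apply the cutoffs from the text: 'A', 'B', or None if neither fits."""
    if yards <= 200 and touchdowns >= 2:
        return "A"  # lots of touchdowns, few yards
    if yards >= 201 and touchdowns <= 1:
        return "B"  # lots of yards, few touchdowns
    return None

group_a = [name for name, yds, tds in RECEIVERS if assign_group(yds, tds) == "A"]
group_b = [name for name, yds, tds in RECEIVERS if assign_group(yds, tds) == "B"]

# Highest ratio (fewest touchdowns per yard) first, as in the table.
ranked = sorted(RECEIVERS, key=lambda r: yards_per_touchdown(r[1], r[2]), reverse=True)
for name, yds, tds in ranked:
    print(f"{name:<20} {yds:>4} {tds:>2} {yards_per_touchdown(yds, tds):>7.1f}")
```

Note that the cutoffs are applied mechanically to every receiver, which is the whole point: no case-by-case exceptions.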
Despite Group A scoring 23% more points per game to this point, I would expect touchdown production to normalize and Group B to average more points per game (in standard scoring) over the next four weeks. Tune in later as we track our results.