If you read about fantasy football for long enough, you'll eventually encounter the phrase "regression to the mean." It's statistician-speak for "things even out over time." All the phrase really means is that extreme results are more likely to occur over a small sample than over a large one. Landing on heads 80% of the time on five coin flips is much more likely than it is on 50 coin flips. And a sixteen-game NFL season is a small sample size.
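To put rough numbers on the coin-flip comparison, here is a minimal sketch that assumes a fair coin and reads "80% of the time" as "at least 80% heads." The function name `prob_at_least_heads` is made up for this illustration.

```python
from math import comb


def prob_at_least_heads(flips: int, heads: int) -> float:
    """Chance of getting `heads` or more heads in `flips` fair coin flips."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips


five_flips = prob_at_least_heads(5, 4)      # 4 or 5 heads out of 5
fifty_flips = prob_at_least_heads(50, 40)   # 40 or more heads out of 50

print(f"5 flips:  {five_flips:.4f}")
print(f"50 flips: {fifty_flips:.8f}")
print(f"ratio:    {five_flips / fifty_flips:,.0f}x more likely on 5 flips")
```

The five-flip case works out to 6/32, or just under 19%, while the 50-flip case is smaller by several orders of magnitude.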
So how -- and why -- does regression to the mean work for fantasy football?
Let's create a hypothetical quarterback named Brew Drees. Suppose that over every four-game sample, Drees will average either 250 or 350 passing yards per game, giving him either 1,000 or 1,400 yards in every set of four games. Over a long sample, Brew Drees would be expected to average 4,800 passing yards in a season (assuming good health). However, let's assume that whether he averages 250 or 350 yards is completely out of Drees' control: it is essentially a coin flip as to whether he produces 1,000 or 1,400 passing yards in each quarter of the season. This means that occasionally, Drees will pass for as many as 5,600 yards or as few as 4,000 yards, with the outcome determined entirely by chance. In fact, Drees has 16 different possible seasons (two outcomes in each of four quarters), assuming the order in which he gains his yards matters:
|Games 1--4||Games 5--8||Games 9--12||Games 13--16||Total|
So how often will each season-ending total occur (assuming the order in which he gained his yards doesn't matter)?
5600 yards ------->1/16
5200 yards ------->4/16
4800 yards ------->6/16
4400 yards ------->4/16
4000 yards ------->1/16
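The counts above can be checked by brute force. This is a small sketch that lists every ordered sequence of four quarter-season results and then tallies the season totals; the variable names are chosen for this illustration only.

```python
from collections import Counter
from itertools import product

# Yards in a four-game stretch: 250 or 350 per game, times four games.
QUARTER_TOTALS = (1000, 1400)

# Every ordered sequence of four quarter-season results: 2**4 = 16 of them.
sequences = list(product(QUARTER_TOTALS, repeat=4))

# Collapse the sequences to season totals, ignoring the order of the quarters.
season_counts = Counter(sum(seq) for seq in sequences)

for total in sorted(season_counts, reverse=True):
    print(f"{total} yards: {season_counts[total]}/{len(sequences)}")
```

The 1-4-6-4-1 pattern is simply the number of ways to pick which of the four quarters are the good ones.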
The mode -- the result occurring most often -- is 4,800 yards. That's what the median (middle result) and mean (average result) are, too. Seven out of every eight times, Brew Drees will finish with between 4,400 and 5,200 yards. Once every eight times, however, we'll see a crazy result of either 4,000 yards or 5,600 yards.
Now Brew Drees is a very good quarterback. Let's create 11 more hypothetical passers who are less talented. Like Drees, they will have two possible per-game averages.
|QB||Bad Games||Good Games||16-game avg|
Now that we know the potential outcomes for each quarterback, how many times (out of 16 possible results) will each quarterback end up with each number of passing yards? Here's how to read the table below. QB3 will throw for 5,280 yards one out of 16 times, 4,880 yards four out of 16 times, 4,480 yards six out of 16 times, and so on. On average, he'll gain 4,480 yards per season.
Our worst quarterback will have a 2,240-yard season once every sixteen years, but he'll also throw for 3,840 yards once, when everything breaks right in each quarter of the season. By now you might have figured out where this is going. We now have 12 different quarterbacks with five different possible season-ending numbers for each passer. With our dozen quarterbacks, we can then get a sense of the possible outcomes for the group.
Let's say these 12 quarterbacks are simulated over sixteen different seasons. How many times should we expect a quarterback to finish with 4,000 yards over a sixteen-year period? How many times will a passer finish in the 3360 to 3680 range? What number will we see the most?
We will only see a 5600-yard season once every sixteen years; that's because only one of our quarterbacks (Drees) can hit that mark, and he can only hit it when he averages 350 yards per game in each of his four quarter seasons. Let's look at the 4480-4639 range: over sixteen years, 11 quarterback seasons will land in that range: QB8 has a 1/16 chance of doing it each year, QB5 has a 4/16 chance, and QB3 has a 6/16 chance.
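The range counts can be reproduced with a short sketch. The per-quarterback figures are not listed in the text here, so the code assumes that each quarterback after Drees is 10 yards per game worse in both good and bad stretches. That assumption matches the numbers quoted in the text (QB3 averaging 4,480 yards, the worst passer ranging from 2,240 to 3,840), but it is an inference, and the helper names are invented for this example.

```python
from fractions import Fraction
from math import comb

GAMES_PER_QUARTER = 4
QUARTERS = 4

# ASSUMPTION: QB1 (Drees) is 250/350 yards per game, and each later
# quarterback is 10 yards per game worse in both bad and good stretches.
quarterbacks = {f"QB{i + 1}": (250 - 10 * i, 350 - 10 * i) for i in range(12)}


def season_distribution(bad, good):
    """Map each possible season total to its probability (coin-flip quarters)."""
    dist = {}
    for good_quarters in range(QUARTERS + 1):
        per_game_sum = good * good_quarters + bad * (QUARTERS - good_quarters)
        total = per_game_sum * GAMES_PER_QUARTER
        dist[total] = Fraction(comb(QUARTERS, good_quarters), 2 ** QUARTERS)
    return dist


def expected_seasons_in_range(low, high, years=16):
    """Expected number of QB seasons with low <= total <= high over `years`."""
    count = Fraction(0)
    for bad, good in quarterbacks.values():
        for total, prob in season_distribution(bad, good).items():
            if low <= total <= high:
                count += prob * years
    return count


print(expected_seasons_in_range(4480, 4639))  # QB3, QB5 and QB8 contribute
print(expected_seasons_in_range(5600, 5600))  # only Drees can get here
```

Under that assumption the first call gives 11 and the second gives 1, in line with the counts described above.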
Now we get to the point of today's post. Assume that these quarterbacks don't age and their situations don't change. We'd project the same number -- their average projection -- for them every single year. Now, what happens the year we see a quarterback (Drees) hit 5600 yards? We'd project 4800 yards for him the next year. What about when we see a quarterback end up in the 4800-4959 range? One time it will be QB6 (4800), and we'll project 4000 yards the next year. Four times it will be QB3 (4880), and we'll project 4480 for the next season. And six times it will be Drees, and we'll project 4800 yards the next year.
So what does that mean? In Year N, the 11 QB seasons that landed in the 4800-4959 range averaged 4,829 yards. In Year N+1, we'd project a weighted average of about 4,611 yards for those quarterbacks. Remember, absolutely nothing changed in between the two seasons, yet we'd reduce our projection for those quarterbacks by over 200 yards. When a quarterback throws for 5,280 yards -- and only QB3 can do that -- we expect him to throw for 800 fewer yards the next season. This is the concept behind regression to the mean.
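The two averages are plain weighted means over the 11 seasons described in the previous paragraph. Here is a minimal sketch of the arithmetic, using only the totals, projections and counts given in the text.

```python
# (Year N season total, next-year projection, seasons out of 16 years)
seasons_in_range = [
    (4800, 4000, 1),  # QB6 at his ceiling
    (4880, 4480, 4),  # QB3 one good quarter above his average
    (4800, 4800, 6),  # Drees at exactly his average
]

n = sum(count for _, _, count in seasons_in_range)

# What the group actually did in Year N.
year_n_avg = sum(total * count for total, _, count in seasons_in_range) / n

# What we would project for the same group in Year N+1.
year_n1_avg = sum(proj * count for _, proj, count in seasons_in_range) / n

print(f"Year N average:     {year_n_avg:.1f}")
print(f"Year N+1 projected: {year_n1_avg:.1f}")
print(f"Expected drop:      {year_n_avg - year_n1_avg:.1f}")
```

The Year N+1 figure comes out to roughly 4,610.9, so the expected drop for the group is about 218 yards even though no individual quarterback's ability changed.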
When a player pulls off an impressive feat, there's a good bit of luck involved. Sometimes the player really is as good as his stats, like Drees (but this becomes less likely the more impressive the feat is). Other times it's a player who was a little lucky, and sometimes it's a player who was really lucky.
Now NFL players aren't computer programs or dice, but the same theory applies. And we see these results every year in the NFL. When Drew Brees (yes, that guy) threw for 5,476 yards in 2011, we didn't project him to do the same in 2012 because we know his true ability isn't 5500 yards per season. To reach such a ridiculous result, a good bit of "luck" had to be involved. I put that word in quotes because I mean more than just luck in the general sense: things like falling behind early and needing to pass, or playing weak pass defenses, or having your supporting cast stay healthy all fall in the category of "luck," too. If a player has a monster season, he: a) is an excellent player; b) had some luck in getting there (e.g., getting and staying hot); and c) had other things fall into place, too (strength of schedule, game script, health, etc.). And only one of those three factors is likely to be there the next season.
This isn't just theory. From 1990 to 2011, there were 407 quarterbacks who played in at least 10 games in a season (Year N), averaged at least 125 passing yards per game, and then played in at least 10 games in the next season (Year N+1). I grouped the quarterbacks into ranges and placed them into the table below. For example, 44 different quarterbacks averaged at least 150 but fewer than 175 yards per game in Year N. That group averaged 165 yards per game in Year N; in Year N+1, they averaged 178 yards per game. That's regression to the mean at work.
|Range||# QBs||Yr N Avg||Yr N+1 Avg|
Or, if you are more graphically inclined, take a look at the same data:
The N+1 data resembles the Year N data, but you can see that passing yards "regress" for the high totals and increase for the low totals. And some, if not most, of that convergence can be explained by regression to the mean. No one likes to attribute incredible success to luck, but it plays a much bigger role in sports than we tend to remember.
I'll close with one non-quarterback example. Since 1970, 376 running backs have recorded at least 200 carries in consecutive years for the same team. I grouped those running backs into ranges based on their yards per carry average and placed them into the table below. For example, 23 running backs averaged 3.5 yards per carry or less in Year N. That group averaged 3.35 yards per carry in Year N and 3.94 yards per carry in Year N+1. That's regression to the mean, which impacts both really high and really low statistical outliers.
|Category||# RB||Yr N YPC||Yr N+1 YPC|
|3.5 or less||23||3.35||3.94|
Here's how that data looks on a graph:
If a running back has a really high or really low yards per carry average, chances are there are many things outside of his control influencing that number. Regression to the mean tells us not to count on those events to repeat themselves the next year.