# Regression to the Mean

Regression to the mean is often cited but rarely defined. What exactly does this phrase mean, and what doesn't it mean?

If you read about fantasy football for long enough, you'll eventually encounter the phrase "regression to the mean." It's statistician-speak for "things even out over time." All the phrase really means is that over a small sample, extreme results are more likely to occur than they are over a longer period. Landing on heads at least 80% of the time is much more likely over five coin flips than over 50 coin flips. And a sixteen-game NFL season is a small sample.
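To put numbers on the coin-flip comparison, here is a minimal Python sketch that computes the exact binomial probability of a fair coin landing heads at least 80% of the time (4 or more of 5 flips versus 40 or more of 50). The helper name `prob_at_least` is purely illustrative.

```python
from math import comb

def prob_at_least(heads, flips):
    """Chance that a fair coin shows at least `heads` heads in `flips` tosses."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

# 80% heads means at least 4 of 5 flips, or at least 40 of 50 flips.
print(f"5 flips:  {prob_at_least(4, 5):.4f}")    # 0.1875
print(f"50 flips: {prob_at_least(40, 50):.2e}")
```

Roughly one five-flip trial in five reaches the 80% mark; at 50 flips it happens fewer than once in 80,000 tries.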

So how -- and why -- does regression to the mean work for fantasy football?

Let's create a hypothetical quarterback named Brew Drees. Suppose that in every four-game stretch, Drees averages either 250 or 350 passing yards per game, giving him either 1,000 or 1,400 yards for that set of four games. Over a long sample, Brew Drees would be expected to average 4,800 passing yards per season (assuming good health). However, let's assume that whether he averages 250 or 350 yards is completely out of Drees' control: it is essentially a coin flip whether he produces 1,000 or 1,400 passing yards in each quarter of the season. This means that occasionally Drees will pass for as many as 5,600 yards or as few as 4,000 yards, purely by chance. In fact, if we track the order in which he gains his yards, Drees can follow 16 different paths through a season:

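| Games 1-4 | Games 5-8 | Games 9-12 | Games 13-16 | Total |
|---|---|---|---|---|
| 1400 | 1400 | 1400 | 1400 | 5600 |
| 1000 | 1400 | 1400 | 1400 | 5200 |
| 1400 | 1000 | 1400 | 1400 | 5200 |
| 1400 | 1400 | 1000 | 1400 | 5200 |
| 1400 | 1400 | 1400 | 1000 | 5200 |
| 1000 | 1000 | 1400 | 1400 | 4800 |
| 1000 | 1400 | 1000 | 1400 | 4800 |
| 1000 | 1400 | 1400 | 1000 | 4800 |
| 1400 | 1000 | 1000 | 1400 | 4800 |
| 1400 | 1000 | 1400 | 1000 | 4800 |
| 1400 | 1400 | 1000 | 1000 | 4800 |
| 1000 | 1000 | 1000 | 1400 | 4400 |
| 1000 | 1000 | 1400 | 1000 | 4400 |
| 1000 | 1400 | 1000 | 1000 | 4400 |
| 1400 | 1000 | 1000 | 1000 | 4400 |
| 1000 | 1000 | 1000 | 1000 | 4000 |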

So how often will each season-ending total occur (assuming the order in which he gained his yards doesn't matter)?

| Season total | Frequency |
|---|---|
| 5600 yards | 1/16 |
| 5200 yards | 4/16 |
| 4800 yards | 6/16 |
| 4400 yards | 4/16 |
| 4000 yards | 1/16 |
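The 16-path table and these frequency counts can be reproduced by brute force. This sketch assumes exactly the setup described (four quarters, each worth 1,000 or 1,400 yards with equal probability), enumerates every ordered path with `itertools.product`, and tallies the season totals.

```python
from collections import Counter
from itertools import product

# Four games at 250 or 350 yards per game -> 1,000 or 1,400 yards per quarter.
QUARTER_TOTALS = (1000, 1400)

# Every ordered path through a four-quarter season: 2**4 = 16 equally likely paths.
paths = list(product(QUARTER_TOTALS, repeat=4))

# Ignore the order: count how many paths produce each season total.
season_totals = Counter(sum(path) for path in paths)

for total in sorted(season_totals, reverse=True):
    print(f"{total} yards -> {season_totals[total]}/{len(paths)}")
```

The 1-4-6-4-1 pattern is simply the binomial coefficients for four coin flips.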

The mode (the result occurring most often) is 4,800 yards. That's also the median (middle result) and the mean (average result). Seven out of every eight times, Brew Drees will finish with between 4,400 and 5,200 yards. Once every eight times, however, we'll see an extreme result of either 4,000 or 5,600 yards.
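A quick check of the mode, median, mean, and the seven-in-eight claim, using Python's standard `statistics` module on the 16 equally likely seasons (the distribution is hard-coded from the frequency list above):

```python
from statistics import mean, median, mode

# Season total -> number of the 16 equally likely paths that produce it.
distribution = {5600: 1, 5200: 4, 4800: 6, 4400: 4, 4000: 1}
seasons = [total for total, count in distribution.items() for _ in range(count)]

# All three measures of the center land on 4,800 yards.
print(mode(seasons), median(seasons), mean(seasons))

within = sum(1 for s in seasons if 4400 <= s <= 5200)
print(f"{within}/{len(seasons)} seasons between 4,400 and 5,200 yards")  # 14/16
```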

Now Brew Drees is a very good quarterback.  Let's create 11 more hypothetical passers who are less talented.  Like Drees, they will have two possible per-game averages.

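| Quarterback | Low average (yds/game) | High average (yds/game) | Expected season total |
|---|---|---|---|
| Brew Drees | 250 | 350 | 4800 |
| QB2 | 240 | 340 | 4640 |
| QB3 | 230 | 330 | 4480 |
| QB4 | 220 | 320 | 4320 |
| QB5 | 210 | 310 | 4160 |
| QB6 | 200 | 300 | 4000 |
| QB7 | 190 | 290 | 3840 |
| QB8 | 180 | 280 | 3680 |
| QB9 | 170 | 270 | 3520 |
| QB10 | 160 | 260 | 3360 |
| QB11 | 150 | 250 | 3200 |
| QB12 | 140 | 240 | 3040 |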
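The quarterback table follows a simple rule: each passer's two per-game averages sit 10 yards below the previous passer's and 100 yards apart, and his expected season total is 16 games times the midpoint. A small sketch that generates the table under that reading (the `build_qbs` helper and its parameters are hypothetical):

```python
GAMES_PER_SEASON = 16

def build_qbs(count=12, top_low_avg=250, step=10, spread=100):
    """Return {name: (low per-game avg, high per-game avg, expected season yards)}."""
    qbs = {}
    for i in range(count):
        low = top_low_avg - step * i
        high = low + spread
        name = "Brew Drees" if i == 0 else f"QB{i + 1}"
        # Both averages are equally likely, so the expectation is the midpoint.
        expected = GAMES_PER_SEASON * (low + high) // 2
        qbs[name] = (low, high, expected)
    return qbs

qbs = build_qbs()
for name, (low, high, expected) in qbs.items():
    print(f"{name:<10} {low} {high} {expected}")
```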

Now that we know the potential outcomes for each quarterback, how many times (out of 16 possible results) will each quarterback end up with each number of passing yards? Here's how to read the table below.  QB3 will throw for 5,280 yards one out of 16 times, 4,880 yards four out of 16 times, 4,480 yards six out of 16 times, and so on.  On average, he'll gain 4,480 yards per season.

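| QB | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 | Average |
|---|---|---|---|---|---|---|
| Drees | 5600 | 5200 | 4800 | 4400 | 4000 | 4800 |
| QB2 | 5440 | 5040 | 4640 | 4240 | 3840 | 4640 |
| QB3 | 5280 | 4880 | 4480 | 4080 | 3680 | 4480 |
| QB4 | 5120 | 4720 | 4320 | 3920 | 3520 | 4320 |
| QB5 | 4960 | 4560 | 4160 | 3760 | 3360 | 4160 |
| QB6 | 4800 | 4400 | 4000 | 3600 | 3200 | 4000 |
| QB7 | 4640 | 4240 | 3840 | 3440 | 3040 | 3840 |
| QB8 | 4480 | 4080 | 3680 | 3280 | 2880 | 3680 |
| QB9 | 4320 | 3920 | 3520 | 3120 | 2720 | 3520 |
| QB10 | 4160 | 3760 | 3360 | 2960 | 2560 | 3360 |
| QB11 | 4000 | 3600 | 3200 | 2800 | 2400 | 3200 |
| QB12 | 3840 | 3440 | 3040 | 2640 | 2240 | 3040 |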
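Each row of the outcome table is the same binomial pattern, shifted by the quarterback's two averages. A sketch, assuming four equally likely four-game quarters per season as above; `season_outcomes` is an illustrative name:

```python
from math import comb

def season_outcomes(low_avg, high_avg, quarters=4, games_per_quarter=4):
    """Map each possible season total to how many of the 2**quarters equally likely paths produce it."""
    outcomes = {}
    for hot in range(quarters, -1, -1):  # number of quarters at the higher average
        total = games_per_quarter * (hot * high_avg + (quarters - hot) * low_avg)
        outcomes[total] = comb(quarters, hot)
    return outcomes

qb3 = season_outcomes(230, 330)
print(qb3)  # {5280: 1, 4880: 4, 4480: 6, 4080: 4, 3680: 1}

# Long-run average: weight each total by its share of the 16 paths.
print(sum(total * n for total, n in qb3.items()) / 2 ** 4)  # 4480.0
```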

Our worst quarterback will have a 2,240-yard season once every sixteen years, but he'll also throw for 3,840 yards once every sixteen years, when everything breaks right in each quarter of the season. By now you might have figured out where this is going. We now have 12 different quarterbacks with five possible season-ending totals for each passer. With our dozen quarterbacks, we can get a sense of the possible outcomes for the group as a whole.

Let's say these 12 quarterbacks are simulated over sixteen different seasons. How many times should we expect a quarterback to finish with 4,000 yards over a sixteen-year period? How many times will a passer finish in the 3360 to 3680 range? What number will we see the most?

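| Passing yards | Frequency |
|---|---|
| 5600-5759 | 1 |
| 5440-5599 | 1 |
| 5280-5439 | 1 |
| 5120-5279 | 5 |
| 4960-5119 | 5 |
| 4800-4959 | 11 |
| 4640-4799 | 11 |
| 4480-4639 | 11 |
| 4320-4479 | 15 |
| 4160-4319 | 15 |
| 4000-4159 | 16 |
| 3840-3999 | 16 |
| 3680-3839 | 15 |
| 3520-3679 | 15 |
| 3360-3519 | 15 |
| 3200-3359 | 11 |
| 3040-3199 | 11 |
| 2880-3039 | 5 |
| 2720-2879 | 5 |
| 2560-2719 | 5 |
| 2400-2559 | 1 |
| 2240-2399 | 1 |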
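The pooled frequency table can be rebuilt by adding up every quarterback's 16 outcomes and dropping each total into a 160-yard bucket starting at 2,240 yards, the bucket edges the table uses. A sketch (`qb_outcomes` is an illustrative helper):

```python
from collections import Counter
from math import comb

BIN_START, BIN_WIDTH = 2240, 160

def qb_outcomes(low_avg):
    """Season total -> path count for a QB whose high average is low_avg + 100."""
    # 16 games at the low average, plus 400 extra yards (4 games x 100) per 'hot' quarter.
    return {16 * low_avg + 400 * hot: comb(4, hot) for hot in range(5)}

freq = Counter()
for i in range(12):  # Brew Drees (250/350) down to QB12 (140/240)
    for total, count in qb_outcomes(250 - 10 * i).items():
        bin_low = BIN_START + BIN_WIDTH * ((total - BIN_START) // BIN_WIDTH)
        freq[bin_low] += count

for bin_low in sorted(freq, reverse=True):
    print(f"{bin_low}-{bin_low + BIN_WIDTH - 1} {freq[bin_low]}")
```

The counts sum to 192, i.e. 12 quarterbacks times 16 equally likely seasons each.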

We will only see a 5,600-yard season once every sixteen years; that's because only one of our quarterbacks (Drees) can hit that mark, and he can only hit it when he averages 350 yards per game in all four quarters of the season. Now look at the 4480-4639 range: over sixteen years, 11 quarterback seasons will land in that range. QB8 has a 1/16 chance of doing it each year, QB5 has a 4/16 chance, and QB3 has a 6/16 chance.

Now we get to the point of today's post. Assume that these quarterbacks don't age and their situations don't change. We'd project the same number (their average) for them every single year. Now, what happens in the year a quarterback (Drees) hits 5,600 yards? We'd still project 4,800 yards for him the next year. What about when a quarterback ends up in the 4800-4959 range? One time it will be QB6 (4,800), and we'll project 4,000 yards for him the next year. Four times it will be QB3 (4,880), and we'll project 4,480 for the next season. And six times it will be Drees (4,800), and we'll project 4,800 yards the next year.

So what does that mean? In Year N, the 11 QB seasons that landed in the 4800-4959 range averaged 4,829 yards. In Year N+1, we'd project a weighted average of 4,611 yards for those quarterbacks. Remember, absolutely nothing changed between the two seasons, yet we'd reduce our projection for those QBs by over 200 yards. When a quarterback throws for 5,280 yards (and only QB3 can do that), we expect him to throw for 800 fewer yards the next season. This is the concept behind regression to the mean.
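The projection arithmetic can be checked by listing every outcome that lands in a given range along with the projection we would make for that passer. This sketch assumes, as the post does, that each quarterback's projection is simply his long-run average; `seasons_in_range` is a hypothetical helper.

```python
from math import comb

# Name -> low per-game average (the high average is always 100 yards higher).
QBS = {("Brew Drees" if i == 0 else f"QB{i + 1}"): 250 - 10 * i for i in range(12)}

def seasons_in_range(lo, hi):
    """Return (name, season total, paths out of 16, next-year projection) for totals in [lo, hi]."""
    rows = []
    for name, low_avg in QBS.items():
        projection = 16 * (low_avg + 50)  # long-run average: 16 games at the midpoint
        for hot in range(5):  # number of quarters at the high average
            total = 16 * low_avg + 400 * hot
            if lo <= total <= hi:
                rows.append((name, total, comb(4, hot), projection))
    return rows

rows = seasons_in_range(4800, 4959)
n = sum(count for _, _, count, _ in rows)
observed = sum(total * count for _, total, count, _ in rows) / n
projected = sum(proj * count for _, _, count, proj in rows) / n
print(n, round(observed), round(projected))  # 11 4829 4611
```

In this model the roughly 218-yard gap between those two numbers comes entirely from which passers happened to run hot; nobody's underlying talent changed.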

When an impressive feat is achieved, there's a good bit of luck involved. Sometimes it's achieved by someone who is actually as good as his stats suggest, like Drees (but this becomes less likely the more impressive the feat is). Other times it's achieved by a player who got a little lucky, and sometimes by a player who got really lucky.

Now, NFL players aren't computer programs or dice, but the same theory applies, and we see these results every year in the NFL. When Drew Brees (yes, that guy) threw for 5,476 yards in 2011, we didn't project him to do the same in 2012, because we knew his true ability wasn't 5,500 yards per season. To reach such a ridiculous result, a good bit of "luck" had to be involved. I put that word in quotes because I mean more than luck in the narrow sense: things like falling behind early and needing to pass, facing weak pass defenses, or having your supporting cast stay healthy all fall into the category of "luck," too. If a player has a monster season, he (a) is an excellent player, (b) had some luck in getting there (e.g., getting and staying hot), and (c) had other things fall into place, too (strength of schedule, game script, health, etc.). Only one of those traits, the first, is likely to be there the next season.

This isn't just theory. From 1990 to 2011, there were 407 quarterback seasons in which a passer played in at least 10 games (Year N), averaged at least 125 passing yards per game, and then played in at least 10 games the next season (Year N+1). I grouped those seasons into ranges by Year N passing yards per game and placed them in the table below. For example, 44 quarterbacks averaged at least 150 but fewer than 175 yards per game in Year N. As a group, they averaged 165 yards per game that year; in Year N+1, they averaged 178 yards per game. That's regression to the mean at work.

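| Range (pass yds/game) | # QBs | Year N avg | Year N+1 avg |
|---|---|---|---|
| 125-149 | 9 | 140 | 159 |
| 150-174 | 44 | 165 | 178 |
| 175-199 | 74 | 188 | 197 |
| 200-224 | 114 | 211 | 211 |
| 225-249 | 83 | 236 | 233 |
| 250-274 | 51 | 260 | 252 |
| 275-299 | 24 | 284 | 260 |
| 300-324 | 6 | 312 | 286 |
| 325+ | 2 | 335 | 313 |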

Or, if you are more graphically inclined, take a look at the same data:

The Year N+1 data resembles the Year N data, but you can see that passing yards "regress" for the high totals and increase for the low totals. Some, if not most, of that convergence can be explained by regression to the mean. No one likes to attribute incredible success to luck, but luck plays a much bigger role in sports than we tend to remember.
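One way to summarize the quarterback table in a single number is a weighted least-squares slope of the Year N+1 group averages on the Year N group averages: a slope of 1 would mean no regression at all, and a slope of 0 would mean Year N tells us nothing about Year N+1. This is a rough sketch on grouped data (the individual seasons aren't listed here), so treat the result as approximate; `weighted_fit` and `project` are illustrative names.

```python
# (Year N range, number of QBs, Year N avg, Year N+1 avg) in passing yards per game, from the table above.
QB_GROUPS = [
    ("125-149", 9, 140, 159),
    ("150-174", 44, 165, 178),
    ("175-199", 74, 188, 197),
    ("200-224", 114, 211, 211),
    ("225-249", 83, 236, 233),
    ("250-274", 51, 260, 252),
    ("275-299", 24, 284, 260),
    ("300-324", 6, 312, 286),
    ("325+", 2, 335, 313),
]

def weighted_fit(groups):
    """Least-squares fit of Year N+1 avg on Year N avg, weighting each group by its size."""
    total = sum(n for _, n, _, _ in groups)
    mean_x = sum(n * x for _, n, x, _ in groups) / total
    mean_y = sum(n * y for _, n, _, y in groups) / total
    sxy = sum(n * (x - mean_x) * (y - mean_y) for _, n, x, y in groups)
    sxx = sum(n * (x - mean_x) ** 2 for _, n, x, _ in groups)
    return mean_x, mean_y, sxy / sxx

mean_x, mean_y, slope = weighted_fit(QB_GROUPS)
print(f"Year N mean {mean_x:.1f}, Year N+1 mean {mean_y:.1f}, slope {slope:.2f}")

def project(ypg):
    """Shrink a Year N per-game average toward the mean by the fitted slope."""
    return mean_y + slope * (ypg - mean_x)
```

On these groups the slope comes out at roughly 0.74, so as a first-order rule of thumb a group's distance from the sample average shrinks by about a quarter from Year N to Year N+1.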

I'll close with one non-quarterback example. Since 1970, there have been 376 instances of a running back recording at least 200 carries in consecutive years for the same team. I grouped those running backs into ranges based on their Year N yards-per-carry average and placed them in the table below. For example, 23 running backs averaged fewer than 3.5 yards per carry in Year N. On average, that group produced a 3.35 yards-per-carry average in Year N and a 3.94 YPC average in Year N+1. That's regression to the mean, which affects both really high and really low statistical outliers.

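| YPC range | # RBs | Year N YPC | Year N+1 YPC |
|---|---|---|---|
| 3.5 or less | 23 | 3.35 | 3.94 |
| 3.5-3.75 | 27 | 3.66 | 3.9 |
| 3.75-4.00 | 67 | 3.89 | 3.97 |
| 4-4.25 | 62 | 4.13 | 4.12 |
| 4.25-4.5 | 68 | 4.36 | 4.23 |
| 4.5-4.75 | 51 | 4.62 | 4.36 |
| 4.75-5.00 | 40 | 4.85 | 4.36 |
| 5.00+ | 38 | 5.34 | 4.53 |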

Here's how that data looks on a graph:

If a running back has a really high or really low yards per carry average, chances are there are many things outside of his control influencing that number.  Regression to the mean tells us not to count on those events to repeat themselves the next year.
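The same pattern can be read off the running back table by measuring each group's distance from that year's overall average, weighted by group size. Using each year's own weighted average as "the mean" is an assumption of this sketch; it matters because the overall average in the table is itself lower in Year N+1 (about 4.19 versus 4.31), so comparing raw YPC alone would mix in that drop.

```python
# (YPC range, number of RBs, Year N YPC, Year N+1 YPC), copied from the table above.
RB_GROUPS = [
    ("3.5 or less", 23, 3.35, 3.94),
    ("3.5-3.75", 27, 3.66, 3.90),
    ("3.75-4.00", 67, 3.89, 3.97),
    ("4-4.25", 62, 4.13, 4.12),
    ("4.25-4.5", 68, 4.36, 4.23),
    ("4.5-4.75", 51, 4.62, 4.36),
    ("4.75-5.00", 40, 4.85, 4.36),
    ("5.00+", 38, 5.34, 4.53),
]

total = sum(n for _, n, _, _ in RB_GROUPS)
mean_n = sum(n * ypc for _, n, ypc, _ in RB_GROUPS) / total
mean_n1 = sum(n * ypc for _, n, _, ypc in RB_GROUPS) / total

# Distance of each group from that year's weighted average, before and after.
gaps = {label: (ypc_n - mean_n, ypc_n1 - mean_n1) for label, _, ypc_n, ypc_n1 in RB_GROUPS}
for label, (gap_n, gap_n1) in gaps.items():
    print(f"{label:>11}: {gap_n:+.2f} in Year N -> {gap_n1:+.2f} in Year N+1")
```

Every group's gap shrinks in Year N+1, and the two extreme groups give back the most ground.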