Regression Alert: Weighted Coins

You know what fantasy football needs? More coin-flip analogies!

Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.

The Scorecard

Returning readers, you know how this works by now, but for new readers here's the deal. Every week I take a look at a specific statistic that is prone to regression and identify high and low outliers in that statistic, and then I wave my hands in the air and shout “regression!”

But since predictions aren't any fun without someone holding your feet to the fire afterward, I don't stop there. I lump all of the high outliers into Group A. I lump all of the low outliers into Group B. I verify that Group A is outperforming Group B. And then I predict that Group B will outperform Group A over the next four weeks.

I don't get to pick and choose my groups, beyond being free to pick and choose what statistics are especially prone to regression. If I'm tracking yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions.
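For readers who think better in code, here is a rough sketch of that procedure. It is deliberately simplified: the player names and numbers are invented placeholders, and `split_outliers` is a hypothetical helper, not something I actually run each week. It ranks players by the regression-prone statistic, takes the top of the list as Group A and the bottom as Group B, and checks that Group A really is ahead before any prediction gets made.

```python
from statistics import mean

# Placeholder rows, invented purely to illustrate the procedure.
# Each tuple: (name, regression-prone statistic, production per game so far)
players = [
    ("Player 1", 6.2, 88.0),
    ("Player 2", 5.4, 75.0),
    ("Player 3", 4.3, 60.0),
    ("Player 4", 4.0, 58.0),
    ("Player 5", 3.3, 52.0),
    ("Player 6", 2.9, 47.0),
]


def split_outliers(rows, n_per_group):
    """Return (high outliers, low outliers) ranked by the statistic in column 1."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return ranked[:n_per_group], ranked[-n_per_group:]


# No picking and choosing: whoever the statistic flags goes into the group.
group_a, group_b = split_outliers(players, n_per_group=2)

a_avg = mean(row[2] for row in group_a)
b_avg = mean(row[2] for row in group_b)

# Step three from the text: confirm Group A is currently outperforming Group B.
group_a_is_ahead = a_avg > b_avg
print("Group A:", [row[0] for row in group_a], f"{a_avg:.1f} per game")
print("Group B:", [row[0] for row in group_b], f"{b_avg:.1f} per game")
print("Group A ahead so far:", group_a_is_ahead)
```

The prediction itself is then just the claim that this ordering flips over the next four weeks.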

And then, groups chosen and predictions made, I track my progress. That's this.

In Week 2, I outlined what regression was, what it wasn't, and how it worked. No prediction was made.

In Week 3, I listed running backs with exceptionally high and low yards per carry averages and predicted that the low-ypc cohort would outperform the high-ypc cohort over the next four weeks.

In Week 4, I looked at receivers who were overperforming and underperforming in yards per target and predicted that the underperformers would outperform the overperformers over the next four weeks.

In Week 5, I compared the predictive accuracy of in-season results to the predictive accuracy of preseason ADP. Outside of a general prediction that players would tend to regress in the direction of their preseason ADP, no specific prediction was made.

In Week 6, I looked at quarterbacks who were throwing too many or too few touchdowns given the amount of passing yards they were accumulating, then predicted that the underperformers would score more fantasy points than the overperformers going forward.

In Week 7, I looked at receivers who were catching too many or too few touchdowns based on their yardage total, then predicted that the underperformers would score more fantasy points than the overperformers going forward.

In Week 8, I revisited yards per carry, again predicting that the high-carry, low-ypc group would outrush the low-carry, high-ypc group going forward.

In Week 9, I went back to yard to touchdown ratios, predicting that the low-touchdown group would close the gap substantially with the high-touchdown group going forward.

In Week 10, I discussed the pitfalls of predicting regression over 4-week windows. No specific prediction was made.

Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining
yards per carry | Group A had 60% more rushing yards per game | Group B has 16% more rushing yards per game | None (Win!)
yards per target | Group A had 16% more receiving yards per game | Group B has 11% more receiving yards per game | None (Win!)
passing yards per touchdown | Group A had 13% more fantasy points per game | Group A has 17% more fantasy points per game | None (Loss)
receiving yards per touchdown | Group A had 28% more fantasy points per game | Group B has 1% more fantasy points per game | None (Win!)
yards per carry | Group A had 25% more fantasy points per game | Group B has 34% more fantasy points per game | 1
rushing yards per touchdown | Group A had 21% more fantasy points per game | Group B has 0% more fantasy points per game | 2

Another week, another closed prediction. A 1% advantage for Group B here might not seem like much, but remember how big the lead was coming in. Group A averaged 10.02 points per game over the first six weeks. That fell to 8.67 over the last four. Meanwhile, Group B rose from 7.82 points per game over the first six weeks to 8.72 over the last four.
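If you'd like to check that arithmetic yourself, here's a minimal Python sketch using the per-game averages from the paragraph above. The helper name `percent_lead` is my own invention for illustration. Because the averages are already rounded to two decimals, the result can differ slightly from what the unrounded data would give.

```python
def percent_lead(leader: float, trailer: float) -> float:
    """Return how much larger `leader` is than `trailer`, as a percentage."""
    return (leader / trailer - 1.0) * 100.0


# Fantasy points per game quoted in the paragraph above.
group_a_before, group_b_before = 10.02, 7.82  # first six weeks
group_a_since, group_b_since = 8.67, 8.72     # last four weeks

lead_before = percent_lead(group_a_before, group_b_before)
lead_since = percent_lead(group_b_since, group_a_since)

print(f"Before the prediction: Group A led by {lead_before:.1f}%")
print(f"Since the prediction:  Group B leads by {lead_since:.1f}%")
```

The first figure comes out to roughly 28%, matching the table. The second comes out a bit under 1%, which rounds to the 1% shown.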

The most impressive part, for me, is exactly how that gap was closed. Both groups averaged nearly the same number of yards per game in our four-week sample (61.7 ypg for Group A, 62.0 ypg for Group B). But Group B, the group that couldn't reach the end zone to save its life through six weeks, actually scored more touchdowns over the last four weeks than Group A! If this doesn't convince you that touchdowns are largely random and primed for regression, I'm not sure what will.

The other two outstanding predictions both had strong weeks as well. Given the small samples involved (just five players per group, which after byes sometimes means as few as three performances per week), the standings in those predictions tend to be pretty swingy. Remember, Group A in the yards per touchdown prediction led last week by 147%, but this week Group B has completely reversed that and now leads by 0.4%. (I rounded in the table above.)

With our regression predictions standing with a strong 3-1 record so far, we'll see if they can add another win or two in the coming weeks.

Weighted Coins

Hopefully you'll forgive me for not diving into a prediction for a second consecutive week, but by this point, you've seen all of my favorite tricks. Yard to touchdown ratio, yards per carry, and yards per target... these are the most volatile metrics on the block.

Statistics are often said to be either descriptive or predictive. (Really great statistics can sometimes be both.) Descriptive statistics tell you what happened and why. Predictive statistics tell you what is going to happen and why.

An ideal candidate for regression is any statistic that is strongly descriptive but weakly predictive, and my favorite regression metrics all fall into that bucket. Catching a lot of 60-yard touchdowns tells us a lot about how many points you've scored to date, but very little about how many points you'll score going forward.

My goal with this column isn't just to tell you who is going to regress; it's to equip you with the tools and an understanding of how regression operates so you can tell for yourself what kind of production is sustainable and what kind is not. Give a man a burger and you'll feed him for a day; give a man a Five Guys franchise and you'll feed him until his arteries give out.

So since you've already seen my favorite tricks, I want to hammer a bit more on the conceptual side.

I say it every week: regression operates on longer timescales. The prediction that came due this week is the perfect illustration of this. I predicted touchdown regression after six weeks, and for two more weeks that regression failed to materialize. One week after the prediction was made, Group A still led Group B in points per game by 27%. Two weeks after the prediction, it had increased that lead to 60%. Group A had nine touchdowns during those two weeks, while Group B had just four.

When you identify something that's bound to regress and it doesn't regress, that's pretty discouraging. It's especially discouraging if you've already invested resources into your belief, if you've bought Robert Woods or Marqise Lee or Rishard Matthews or Demaryius Thomas in anticipation that the tap was about to open and the touchdowns would begin to flow.

But the biggest weapon in the entire arsenal when predicting regression is time.

Think of regression like a weighted coin. If there's a coin that comes up heads 60% of the time, there's still a really good chance on any given flip that it will come up tails. If you're gambling using that coin, you should really want to make as many flips as possible.
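To put rough numbers on that intuition, here's a small sketch that computes the exact chance a 60/40 coin shows heads on more than half of its flips. The function name `prob_heads_majority` and the particular flip counts are my own choices for illustration, and I stuck to odd flip counts so ties can't happen. It also assumes every flip is independent, which real football weeks only approximate.

```python
from math import comb


def prob_heads_majority(n_flips: int, p_heads: float = 0.6) -> float:
    """Exact binomial probability that heads lands on more than half the flips."""
    needed = n_flips // 2 + 1  # smallest head count that is a strict majority
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(needed, n_flips + 1)
    )


# Odd flip counts only, so there is always a clear winner.
for n in (1, 5, 25, 101):
    chance = prob_heads_majority(n)
    print(f"{n:>3} flips: heads takes the majority {chance:.1%} of the time")
```

With one flip the answer is 60% by construction, and with five flips it is already about 68%. The more flips you make, the closer the favored side gets to a sure thing, which is the whole argument for making as many bets as you can.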

“Making more flips” means diversifying your portfolio wherever possible. Some of the individual players in Group B wound up being major disappointments. Pierre Garcon gained 66 yards in two weeks and then was lost for the season. Danny Amendola doesn't even have an injury to excuse the 67 yards he's gained in three games.

But the two lowest-scoring players in Group B through six weeks, Robert Woods and Marqise Lee, were actually the first- and third-highest-scoring receivers in Group B on a per-game basis after the prediction! (Adam Thielen came in second.)

This is a big reason why I don't pick and choose who goes into my groups. Had I tried to limit myself just to stars who were primed to regress, I'd have been worse off. Antonio Brown, Julio Jones, T.Y. Hilton, Demaryius Thomas, Kelvin Benjamin, and Keenan Allen combined to average 7.98 points per game. Adam Thielen, Pierre Garcon, Robert Woods, Marqise Lee, Rishard Matthews, and Danny Amendola combined for 9.65.

Again, Garcon and Amendola were the two biggest busts in the sample, and Matthews underperformed as well. But if I'd tried to eliminate potential misses, I'd have also weeded out potential hits. So I select the metric for regression, and whoever it tells me is going to regress is who I bet on. The sample is the sample. (If anything, the unintuitive nature of the results is kind of the point.)

The second way to get more flips of this weighted coin (besides refusing to weed out results that make you uncomfortable) is simply to give it more time. And this is why I'm always harping on how regression operates on longer timescales. Sometimes things go wrong in the short term, as we saw with my passing yardage to touchdown ratio prediction. But just as a weighted coin is going to favor the side it's weighted towards over enough flips, over a long enough timeline regression is going to be undefeated.


More articles from Adam Harstad
