Regression Alert: Massive Samples

Regression operates more consistently the more chances you give it. So let's give it a ton of chances and see what happens.

Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.

The Scorecard

Returning readers, you know how this works by now, but for new readers here's the deal. Every week I take a look at a specific statistic that is prone to regression and identify high and low outliers in that statistic, and then I wave my hands in the air and shout “regression!”

But since predictions aren't any fun without someone holding your feet to the fire afterward, I don't stop there. I lump all of the high outliers into Group A. I lump all of the low outliers into Group B. I verify that Group A is outperforming Group B. And then I predict that Group B will outperform Group A over the next four weeks.

I don't get to pick and choose my groups, beyond being free to pick and choose what statistics are especially prone to regression. If I'm tracking yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions.
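For readers who like to see the bookkeeping spelled out, here is a minimal sketch of how one of these predictions could be graded. The function name and the sample numbers are hypothetical, and it assumes the simplest form of the prediction (Group B must outscore Group A outright), which is not how every prediction in this column is worded.

```python
def grade_prediction(a_before, b_before, a_after, b_after):
    """Grade a 'Group B will outperform Group A' prediction.

    All four arguments are per-game averages for the chosen statistic.
    """
    # The prediction is only interesting if Group A was ahead when it was made.
    if a_before <= b_before:
        raise ValueError("Group A must lead Group B when the prediction is made")
    # Assumed win condition: Group B is ahead over the tracking window.
    return "Win" if b_after > a_after else "Loss"


# Hypothetical per-game averages, not taken from any real prediction.
print(grade_prediction(10.0, 8.0, 7.0, 9.0))  # Win
```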

And then, groups chosen and predictions made, I track my progress. That's this.

In Week 2, I outlined what regression was, what it wasn't, and how it worked. No prediction was made.

In Week 3, I listed running backs with exceptionally high and low yards per carry averages and predicted that the low-ypc cohort would outperform the high-ypc cohort over the next four weeks.

In Week 4, I looked at receivers who were overperforming and underperforming in yards per target and predicted that the underperformers would outperform the overperformers over the next four weeks.

In Week 5, I compared the predictive accuracy of in-season results to the predictive accuracy of preseason ADP. Outside of a general prediction that players would tend to regress in the direction of their preseason ADP, no specific prediction was made.

In Week 6, I looked at quarterbacks who were throwing too many or too few touchdowns given the amount of passing yards they were accumulating, then predicted that the underperformers would score more fantasy points than the overperformers going forward.

In Week 7, I looked at receivers who were catching too many or too few touchdowns based on their yardage total, then predicted that the underperformers would score more fantasy points than the overperformers going forward.

In Week 8, I revisited yards per carry, again predicting that the high-carry, low-ypc group would outrush the low-carry, high-ypc group going forward.

In Week 9, I went back to yard-to-touchdown ratios, predicting that the low-touchdown group would close the gap substantially with the high-touchdown group going forward.

In Week 10, I discussed the pitfalls of predicting regression over 4-week windows. No specific prediction was made.

In Week 11, I once more delved into the theory behind regression and highlighted the importance of not cherry-picking which players are “too good” or “not good enough” to regress.

In Week 12, I took one more shot at touchdown regression for quarterbacks, predicting that the low-touchdown cohort would close the gap with the high-touchdown cohort going forward.

| Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
|---|---|---|---|
| yards per carry | Group A had 60% more rushing yards per game | Group B has 16% more rushing yards per game | None (Win!) |
| yards per target | Group A had 16% more receiving yards per game | Group B has 11% more receiving yards per game | None (Win!) |
| passing yards per touchdown | Group A had 13% more fantasy points per game | Group A has 17% more fantasy points per game | None (Loss) |
| receiving yards per touchdown | Group A had 28% more fantasy points per game | Group B has 1% more fantasy points per game | None (Win!) |
| yards per carry | Group A had 25% more fantasy points per game | Group B has 16% more fantasy points per game | None (Win!) |
| rushing yards per touchdown | Group A had 21% more fantasy points per game | Group B has 8% more fantasy points per game | None (Win!) |
| passing yards per touchdown | Group A had 14% more fantasy points per game | Group A has 37% more fantasy points per game | 3 |

Despite a down week for Group B, our low-touchdown backs had built enough of a lead to hold on and give regression to the mean another win. So far, regression remains undefeated.

Well, undefeated except on quarterback predictions, where it continued to take a beating in Week 12. Honestly, the rules of statistics do apply to quarterbacks, despite all evidence to the contrary so far this season. With three weeks left on my latest quarterback prediction, maybe they'll show up sometime to help me out.

Now on to the prediction.

One Last Hurrah

The format of this column calls for 4-week predictions. With that in mind, this is the last chance I'll have all season to make a prediction and test it out. Don't worry, I'll have plenty to write about in coming weeks as I revisit old predictions and delve more into the theory and application of regression.

But for our last hurrah, I wanted to do something big. I've written about how the bigger a sample is, the more regression benefits. So let's make our biggest sample yet.
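To get a feel for why group size matters, here is a toy simulation rather than anything run on real NFL data. It assumes every player has the same true scoring rate and that all differences are luck; the function names, the 8.5-point average, and the noise level are made up for illustration. It sorts players by their early results, takes the top and bottom thirds, and measures how much the later gap between those groups bounces around from trial to trial.

```python
import random
import statistics


def late_gap(group_size, rng, mean=8.5, noise=4.0):
    """Later-period gap (Group A minus Group B) for one simulated season."""
    # Each player gets an early and a late score drawn from the same distribution,
    # so any early difference between players is pure luck (an assumption).
    players = [(rng.gauss(mean, noise), rng.gauss(mean, noise))
               for _ in range(3 * group_size)]
    players.sort(key=lambda p: p[0], reverse=True)  # rank by early score
    group_a = players[:group_size]    # early overperformers
    group_b = players[-group_size:]   # early underperformers
    late_a = statistics.mean(p[1] for p in group_a)
    late_b = statistics.mean(p[1] for p in group_b)
    return late_a - late_b


def gap_spread(group_size, trials=500, seed=0):
    """Standard deviation of the later gap across many simulated seasons."""
    rng = random.Random(seed)
    gaps = [late_gap(group_size, rng) for _ in range(trials)]
    return statistics.pstdev(gaps)


for size in (5, 33, 100):
    print(size, round(gap_spread(size), 2))
```

Under these assumptions the later gap centers on zero for every group size, but it wanders far less when the groups are large, which is the sense in which a bigger sample gives regression more chances to show up.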

I took the top 100 fantasy performers at running back, wide receiver, and tight end so far this season, everyone from Todd Gurley (200.4 fantasy points) down to Randall Cobb (60.2 fantasy points). Then I tossed them all together and sorted them by their yard-to-touchdown ratios, and divided that group into thirds.

The top third (Group A), the group scoring touchdowns most often relative to its yardage and so carrying the lowest yard-to-touchdown ratio, consists of Will Fuller V, Corey Clement, Jimmy Graham, Tyler Kroft, Nelson Agholor, Jordy Nelson, Michael Crabtree, Kyle Rudolph, Austin Ekeler, Alshon Jeffery, Chris Hogan, Zach Ertz, Marvin Jones Jr, Evan Engram, Sammy Watkins, Rob Gronkowski, Davante Adams, Amari Cooper, Robby Anderson, Jermaine Kearse, Jarvis Landry, Latavius Murray, Melvin Gordon III, Tevin Coleman, Ezekiel Elliott, Cameron Brate, Marshawn Lynch, Ty Montgomery, DeAndre Hopkins, DeMarco Murray, JuJu Smith-Schuster, Mohamed Sanu, Paul Richardson Jr, and Stefon Diggs.

The bottom third (Group B), the group scoring touchdowns least often relative to its yardage and so carrying the highest yard-to-touchdown ratio, consists of Jay Ajayi, Marquise Goodwin, Jamison Crowder, Michael Thomas, Isaiah Crowell, Julio Jones, Alex Collins, Adam Thielen, Orleans Darkwa, Bilal Powell, Delanie Walker, Adrian Peterson, Marqise Lee, Le'Veon Bell, Jack Doyle, Kelvin Benjamin, James White, Samaje Perine, Frank Gore, T.Y. Hilton, Matt Forte, Randall Cobb, Golden Tate, Keenan Allen, Carlos Hyde, C.J. Anderson, LeGarrette Blount, Ted Ginn Jr, Demaryius Thomas, LeSean McCoy, Rishard Matthews, Kareem Hunt, and DeSean Jackson.
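The sort-and-split step can be sketched in a few lines. This is an illustration under stated assumptions, not the spreadsheet actually used for the column: the field names and the six sample players are hypothetical, players without a touchdown are treated as having an infinite yards-per-touchdown figure, and Group A is assumed to be the end of the list that scores most often relative to yardage.

```python
import math


def yards_per_td(player):
    """Yards per touchdown; players with no touchdowns sort to the very end."""
    if player["touchdowns"] == 0:
        return math.inf
    return player["yards"] / player["touchdowns"]


def split_into_thirds(players):
    """Return (group_a, group_b): the outer thirds when ranked by yards per TD."""
    if len(players) < 3:
        raise ValueError("need at least three players to form thirds")
    ranked = sorted(players, key=yards_per_td)
    size = len(ranked) // 3  # any remainder stays in the middle third (an assumption)
    group_a = ranked[:size]   # fewest yards per touchdown
    group_b = ranked[-size:]  # most yards per touchdown
    return group_a, group_b


# Hypothetical stat lines, not real players.
sample = [
    {"name": "Player 1", "yards": 300, "touchdowns": 6},
    {"name": "Player 2", "yards": 800, "touchdowns": 2},
    {"name": "Player 3", "yards": 500, "touchdowns": 5},
    {"name": "Player 4", "yards": 600, "touchdowns": 0},
    {"name": "Player 5", "yards": 450, "touchdowns": 3},
    {"name": "Player 6", "yards": 700, "touchdowns": 4},
]
group_a, group_b = split_into_thirds(sample)
print([p["name"] for p in group_a])  # ['Player 1', 'Player 3']
print([p["name"] for p in group_b])  # ['Player 2', 'Player 4']
```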

Collectively, Group A has averaged 8.98 fantasy points per game compared to 8.36 for Group B, a 7% advantage. With samples this large, you can't expect a single huge outlier to swing things much one way or the other; in order for Group B to overtake Group A, you'd need a sustained shift across the whole group.

So that's what we're predicting. A sustained shift across the entire group large enough for Group B to outscore Group A over the rest of the season.
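For reference, the 7% figure is just the ratio of the two per-game averages quoted above. A quick check, with a hypothetical helper name:

```python
def pct_advantage(leader, trailer):
    """Percentage by which the leader's per-game average exceeds the trailer's."""
    return (leader / trailer - 1) * 100


# Group A at 8.98 points per game versus Group B at 8.36.
print(f"{pct_advantage(8.98, 8.36):.1f}%")  # 7.4%
```

For the prediction to come through, that number has to flip sign over the remaining weeks, with Group B's average ending up above Group A's.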

Be sure to check back in coming weeks to see how they do.

