Fantasy, in Theory: Bayes and Bob

Using Bayesian Inference to improve our predictions (and also: Bob Henry is pretty good).

I wrote last week about how, historically, preseason ADP predicts late-season performance about as well as a player’s performance over the first four games does. Highly-drafted disappointments tended to rebound. Late-round surprises tended to fade. The final result wound up resembling our initial expectations much more than we might assume.

But the question of whether ADP or early-season performance is better doesn’t answer what we really want to know: is a combination of ADP and early-season performance better than either one alone?

I wrote a couple of years back about a mathematical strategy called “Bayesian Inference”. The math can get quite complicated, but the underlying idea is very simple: we take our initial belief (called our prior) and update it based on any new information we receive. How much we update depends on the strength of our prior relative to the strength of the new information.

So, for instance, if our prior is that Rob Gronkowski is really, really good, that’s a pretty strong prior. It’s based on a lot of previous information. If Rob Gronkowski has a bad game, we revise our opinion of him downward, but not very much. The weight of one bad game is very small compared to the weight of our prior.

On the other hand, if we think Hunter Henry is pretty decent, that’s not based on nearly as much information. It’s a weaker prior. And so if Hunter Henry has an amazing game, that one game is going to affect our opinion of him much more than it would affect our opinion of Rob Gronkowski.
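One standard way to make “weighted by strength” concrete is the normal-normal update, where each source counts in proportion to its precision (one over its variance). This is a textbook conjugate-prior sketch, not necessarily the math from my earlier article, and the point totals and variances below are invented purely to mirror the Gronkowski and Henry examples.

```python
def bayes_update(prior_mean, prior_var, obs, obs_var):
    """Normal prior + one normal observation -> normal posterior.

    Each source is weighted by its precision (1 / variance), so a confident
    prior barely moves while a vague prior moves a lot.
    """
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_precision = prior_precision + obs_precision
    post_mean = (prior_mean * prior_precision + obs * obs_precision) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical numbers, in fantasy points per game.
GAME_VAR = 36.0  # assumed noise in a single game (standard deviation of 6 points)

# Strong prior: years of data pin the estimate down (variance 1).
gronk_mean, _ = bayes_update(prior_mean=15.0, prior_var=1.0, obs=5.0, obs_var=GAME_VAR)
# Weak prior: little history, so the estimate is loose (variance 9).
henry_mean, _ = bayes_update(prior_mean=8.0, prior_var=9.0, obs=20.0, obs_var=GAME_VAR)

print(f"Gronkowski: 15.00 -> {gronk_mean:.2f}")  # shows 14.73
print(f"Henry:       8.00 -> {henry_mean:.2f}")  # shows 10.40
```

Under these made-up inputs, a ten-point dud moves the strong prior by about a quarter of a point, while a twelve-point breakout moves the weak prior by 2.4 points.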

This might seem like common sense; indeed, as I outlined in my article, most people operate as closet Bayesians in real life, whether they know it or not. But for some reason we tend to forget this in fantasy, placing way too much emphasis on early-season results at the expense of all of our offseason research.

What does this mean for predicting the future?

As I said, the actual math behind Bayes’ Theorem can be pretty involved, but if we say that ADP (our prior) and early-season performance (our new information) are roughly equally predictive… then, in theory, weighting them evenly and combining them should be even more predictive than either alone.

Is that what we see? I mentioned last week the correlations between ADP and stretch performance and between early-season performance and stretch performance. Here’s a chart that has both of those correlations, as well as the correlation between our Bayesian-inspired average of the two and stretch performance.

Position        ADP     Early-season   ADP + early-season
Quarterback     0.260   0.215          0.296
Running back    0.309   0.644          0.655
Wide receiver   0.648   0.632          0.706
Tight end       0.295   0.559          0.533
All             0.548   0.659          0.697

Huzzah! In four out of five instances, the simple Bayesian average outperformed both ADP and early-season performance at predicting performance down the stretch. (The one exception was at tight end, where early-season performance was slightly more predictive.)
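Averaging isn’t guaranteed to help, and a little algebra shows why tight end could be the exception. For two standardized predictors, the correlation of their equal-weight average with the outcome is (r1 + r2) / sqrt(2(1 + r12)), where r12 is how strongly the two predictors correlate with each other. The sketch below assumes the table’s figures are ordinary Pearson correlations and that both inputs were put on the same scale before averaging; neither is spelled out above, and r12 isn’t reported, so the code only computes the break-even point.

```python
from math import sqrt


def blend_correlation(r1, r2, r12):
    """Correlation with the outcome of an equal-weight average of two standardized
    predictors, given each one's correlation with the outcome (r1, r2) and their
    correlation with each other (r12)."""
    return (r1 + r2) / sqrt(2 * (1 + r12))


def max_overlap_for_gain(r1, r2):
    """Largest r12 at which the blend still beats the better of the two inputs.

    Solves blend_correlation(r1, r2, r12) > max(r1, r2) for r12.
    """
    best = max(r1, r2)
    return (r1 + r2) ** 2 / (2 * best**2) - 1


# Figures from the first table (ADP, early-season).
for position, r_adp, r_early in [("QB", 0.260, 0.215), ("TE", 0.295, 0.559)]:
    limit = max_overlap_for_gain(r_adp, r_early)
    print(f"{position}: blend wins only if ADP and early-season correlate below {limit:.3f}")
# QB -> 0.669, TE -> 0.167
```

When the two inputs are close in strength, as at quarterback, the blend can tolerate a lot of overlap between them. When one is much weaker, as ADP is at tight end, the blend only wins if the two are nearly independent, which is a plausible reason tight end was the lone exception.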

So this seems like a pretty clear case for Bayesian updating. But let’s put Bayes to the real test.

So far we’re just using a very simple, naive average of the two values, not taking into account any external factors. But Footballguys employs Bob Henry to project rest-of-season performance. Henry has been writing about fantasy football for 20 years, and over that span he has won numerous industry awards for the accuracy of his projections. And, luckily, I’ve saved records of all of his rest-of-season projections from every week of 2015.

So let’s compare Bob Henry, one of the best in the business, against our simple Bayesian average.

Position        ADP     Early-season   ADP + early-season   Bob Henry
Quarterback     0.260   0.215          0.296                0.404
Running back    0.309   0.644          0.655                0.651
Wide receiver   0.648   0.632          0.706                0.669
Tight end       0.295   0.559          0.533                0.716
All             0.548   0.659          0.697                0.703

Bob: 1, Bayes: 0.

I have two big takeaways from this. First, it’s possible for an experienced and skilled observer to take into account additional factors beyond just ADP and early-season performance and integrate them to improve predictive accuracy. As an example: Derek Carr had an ADP of QB19 last season. Over the first four weeks, he was QB17. Bob Henry projected him as QB12 the rest of the way, a projection that wasn’t supported by any “hard” data to that point but likely reflected factors such as the quality of Carr’s own play and of his skill players (both Crabtree and Cooper were looking better than expected early). Indeed, Carr wound up performing as QB11 down the stretch.

The second takeaway, though, is this: the advantage that Bob Henry offers over our naive Bayesian average is not huge. The bulk of the predictive power is coming from preseason expectations and early-season performance. Bob Henry is able to incorporate other information to improve on it, but the improvement is mostly on the margins.

There’s one last thing I want to test. Bob is very, very good at what he does; the awards and trophies on his mantle can make this case more forcefully than I could. He is basing his projections on a number of different pieces of information, and obviously preseason expectations and early-season performance are going to be two huge pieces in that.

If Henry is weighting those two factors optimally, then adding them to his own projections should not improve predictive power at all. So let’s put this to the test. I’ve weighted Henry’s projections, ADP, and early-season performance equally and tested the correlations again.

Position        ADP     Early-season   ADP + early-season   Bob Henry   ADP + Early + Bob
Quarterback     0.260   0.215          0.296                0.404       0.357
Running back    0.309   0.644          0.655                0.651       0.682
Wide receiver   0.648   0.632          0.706                0.669       0.707
Tight end       0.295   0.559          0.533                0.716       0.618
All             0.548   0.659          0.697                0.703       0.718

Ah, an interesting result. This would suggest that perhaps Bob Henry is not assigning enough weight to either preseason ADP or early-season performance. Or, you know, it could just suggest that the data is noisy and we’re overfitting.

But if we’re going to overfit anyway, why not go all the way with it? I’ve played around with the weights of each of the three factors (ADP, early-season performance, and Bob Henry’s rest-of-season projections) to find the combination that produced the highest correlation with rest-of-season performance.

Again, to be clear: this is certainly a case of overfitting. Data is noisy from year to year, and just because these weights produced the optimal results in 2015 doesn’t mean they will produce the optimal results in 2016. In fact, I would bet a large sum of money that they won’t. This is mostly presented as an interesting thought experiment.

With the emptors properly caveated, the ideal weighting I found was 43.2% Bob Henry’s projections, 35.7% average draft position, and 21.1% early-season performance. This is consistent with Bob Henry alone being the best single predictor of future performance, but the fact that ADP earns a larger weight than early-season performance suggests that if he has any bias, it’s towards early-season performance at the expense of preseason expectations.

Or, again, that perhaps 2015 was just a weird year, because this data is extremely noisy.

Here’s one last chart that includes all of the correlations to this point, including one using our calculated weights for the three inputs.

Position        ADP     Early-season   ADP + early-season   Bob Henry   ADP + Early + Bob   Weighted A+E+B
Quarterback     0.260   0.215          0.296                0.404       0.357               0.382
Running back    0.309   0.644          0.655                0.651       0.682               0.674
Wide receiver   0.648   0.632          0.706                0.669       0.707               0.710
Tight end       0.295   0.559          0.533                0.716       0.618               0.632
All             0.548   0.659          0.697                0.703       0.718               0.721

For all of our overfitting, you can see how small the predictive gains actually are. That suggests we were already close to a practical ceiling. No one will ever be able to predict future performance with 100% accuracy; there will always be some unanticipated variation, which caps how high our correlations can get. The fact that all of our efforts only got us up to a correlation of 0.721 suggests how close to that limit we already were.

Squaring a correlation coefficient gives R^2, which represents the share of the variation in the outcome that is explained by the input. So a correlation coefficient in the low 0.7 range gives an R^2 of around 0.5. In other words, despite our best efforts, we’re probably never going to be able to explain significantly more than 50% of the variation in the data ahead of time.

As for the rest of the time… well, fantasy football is weird.
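The arithmetic behind that 50% figure, using correlations from the “All” row of the final table, is short enough to spell out. Plain Python; the only inputs are the numbers already reported.

```python
# R^2 is the squared correlation: the share of variance in stretch-run results
# that a single linear predictor accounts for.
correlations = {
    "ADP alone": 0.548,
    "Bob Henry": 0.703,
    "Weighted A+E+B": 0.721,
}
r_squared = {name: r**2 for name, r in correlations.items()}
for name, r2 in r_squared.items():
    print(f"{name:15s} r = {correlations[name]:.3f}  R^2 = {r2:.3f}")

# Correlation that would be needed to explain three-quarters of the variance.
print(f"r needed for R^2 = 0.75: {0.75 ** 0.5:.3f}")  # 0.866
```

Going from ADP alone to the best blend moves R^2 from about 0.30 to about 0.52. Explaining 75% would take a correlation near 0.87, well above anything in these tables.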

Conclusions:

  • Bayesian Inference is a simple but very powerful tool. We should update our prior beliefs to account for new information in proportion to the strength of that information relative to the strength of our initial beliefs.
  • Bob Henry is demonstrably good at what he does.
  • If Bob Henry has a blind spot, it’s possible that he underrates preseason expectations a hair when creating his rest-of-season projections.
  • It’s more likely, of course, that we’re committing statistical malpractice and badly overfitting our model to the data available.
  • No matter what we do, we’re probably never going to be able to explain much more than 50% of the variation in future results.

Follow @AdamHarstad

