Understanding Expectation and Variance
By Maurile Tremblay
When we project Keenan Allen to score 9.5 points in FanDuel's scoring system, what does it mean?
It doesn't mean that we expect him to score precisely 9.5 points. That's possible, but it's very unlikely. Even if 9.5 is more likely than any other specific number, that exact outcome occupies an exceedingly small slice of probability space.
What it means in theory is that if you take each fantasy point total Allen could conceivably get, multiply it by the respective probability of getting that score, and add all of those products up, you'd get a sum of 9.5. (Using the same procedure, we'd project the roll of a six-sided die to produce a value of 3.5, because 1*1/6 + 2*1/6 + … + 6*1/6 = 3.5. Even though the die lacks a side with 3.5 on it, 3.5 is a good projection in the sense that it would be the fair over/under at even odds.)
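As a minimal sketch of that weighted-sum procedure, the Python snippet below reproduces the die example with exact fractions. The function name `expected_value` and the dictionary layout are illustrative choices, not part of any real projection system.

```python
from fractions import Fraction


def expected_value(distribution):
    """Return the probability-weighted sum of outcomes.

    `distribution` maps each possible outcome to its probability.
    """
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(outcome * prob for outcome, prob in distribution.items())


# A fair six-sided die: every face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_expectation = expected_value(die)
print(float(die_expectation))  # 3.5
```

The same function would work on a player's point distribution if anyone actually estimated a probability for every possible score.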
I say "in theory" because nobody actually does projections that way. If you consult the section on projections, you won't see anyone estimating the probability that Keenan Allen will score 0.0 points, and then doing the same for 0.1 points, 0.2 points, and so on all the way up to 60+ points before doing some multiplication and addition to get a projection of 9.5 points.
Rather, 9.5 points represents a decent estimate of his points if the game goes the way we expect—if Allen catches an expected number of passes for an expected number of yards and touchdowns, based on all the factors outlined in section 4.5 on projections.
But we can reverse engineer that 9.5-point projection to tell us something about the implied distribution curve comprising all those other possibilities. If you know what a normal distribution is (sometimes called a "bell curve"), the distribution of probabilities implied by a player's projection will share a number of features with it. (A player's distribution of point probabilities is not actually a normal curve. A normal curve is bilaterally symmetrical, but a player's fantasy-point distribution will be a bit skewed because it extends further to the right than to the left, where it reaches a fairly hard wall at zero. If you want to nerd out, a player's fantasy-point probability distribution is more like a gamma distribution than a normal distribution.)
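For readers who do want to nerd out, here is a rough Python sketch of what a gamma-shaped point distribution looks like. The variance of 16 is a made-up input, and picking the gamma parameters by matching the mean and variance is an assumption made for illustration, not a method described here.

```python
import random
import statistics


def gamma_parameters(mean, variance):
    """Moment-match a gamma distribution.

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


# Hypothetical inputs: a 9.5-point projection with an assumed variance of 16.
shape, scale = gamma_parameters(9.5, 16.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.gammavariate(shape, scale) for _ in range(20000)]

sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)
lowest_score = min(samples)

print(round(sample_mean, 2), round(sample_median, 2), lowest_score >= 0)
```

The simulated scores never drop below zero, and the median lands below the mean, which is the right-hand skew described above.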
For one thing, a player's fantasy-point probability distribution will generally be unimodal, which is a fancy way of saying that it generally has just one peak. And that peak will generally be roughly equal to the projection itself. So that means that while it is unlikely that Keenan Allen will score exactly 9.5 points, he is more likely to score 9.5 points than 10 or 11 or 12 points, or than eight or seven or six points. The further a particular point total is from 9.5, the less likely it is to occur.
Different players, however, will have differently shaped distributions even if they have the same projected point total.
In a given week, Keenan Allen and Anquan Boldin may both be projected to score 9.5 points. But Anquan Boldin's distribution curve might be relatively tall and skinny while Keenan Allen's is relatively short and fat. What that would mean is that while both players should score around 9.5 points on average, Boldin is likely to score between seven and 12 points, while Allen is likely to score between four and 15 points. While both players' projected point totals have the same expectation, Allen's projection has a greater variance.
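To make "same expectation, different variance" concrete, here is a small Python sketch. The two three-outcome distributions are toy numbers invented for illustration, chosen only so that both average 9.5 points. Real distributions would have many more possible scores.

```python
def dist_mean(distribution):
    """Probability-weighted average of the point totals."""
    return sum(points * prob for points, prob in distribution.items())


def dist_variance(distribution):
    """Probability-weighted average squared distance from the mean."""
    mu = dist_mean(distribution)
    return sum(prob * (points - mu) ** 2 for points, prob in distribution.items())


# Made-up distributions mapping point totals to probabilities.
boldin = {7.0: 0.25, 9.5: 0.50, 12.0: 0.25}  # tall and skinny
allen = {4.0: 0.25, 9.5: 0.50, 15.0: 0.25}   # short and fat

for name, dist in (("Boldin", boldin), ("Allen", allen)):
    print(name, dist_mean(dist), dist_variance(dist))
# Boldin 9.5 3.125
# Allen 9.5 15.125
```

Both toy players project to 9.5 points, but the wider spread gives Allen nearly five times the variance.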
Just as any individual player's projected point total will have an expectation and variance, so will any group of players. In fact, the group's projected total will just be the sum of the individuals' totals. As long as none of the players are playing in the same games, the same is true for variance: you find the group's variance by summing the variances of the individual players.
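A quick Python sketch of that bookkeeping follows. The player labels and the (expectation, variance) pairs are hypothetical, and the variance sum is only valid under the assumption that the players are in different games and their scores are independent.

```python
# Hypothetical (expectation, variance) pairs for players in different games.
players = {
    "QB": (18.0, 36.0),
    "RB": (12.0, 25.0),
    "WR": (9.5, 16.0),
}


def group_totals(players):
    """Sum expectations and variances across a group.

    The expectation sum always holds; the variance sum assumes the
    players' scores are independent of one another.
    """
    expectation = sum(exp for exp, _ in players.values())
    variance = sum(var for _, var in players.values())
    return expectation, variance


lineup_expectation, lineup_variance = group_totals(players)
lineup_std_dev = lineup_variance ** 0.5

print(lineup_expectation, lineup_variance)  # 39.5 77.0
```

Note that it is the variances that add, not the standard deviations: the group's standard deviation here is the square root of 77, which is well below 6 + 5 + 4 = 15.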
Keep in mind that when multiple players from the same group are playing in the same game, the variance of the group cannot be reached through a simple sum. The group's variance can be greater than or less than the sum of the individual players' variances, depending on how the performances of the individuals are correlated with each other.
For example, a quarterback's performance and his primary receiver's performance are positively correlated with each other, meaning that when one does well, the other will usually do well, and when one does poorly, the other will usually do poorly. In this situation, the variance of the two players as a group is greater than the sum of their individual variances.
By the same token, a quarterback's performance is negatively correlated with that of the defense opposing him. To put it another way, when one does well, it's bad news for the other. When considering a quarterback and the defense opposing him as a group, the group's variance will be less than the sum of the variances of the component players.
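The effect of correlation can be sketched with the standard identity Var(A + B) = Var(A) + Var(B) + 2 * Cov(A, B), where Cov(A, B) is the correlation times the two standard deviations. In the Python snippet below, the variances and the correlation values (+0.4 and -0.3) are hypothetical numbers picked for illustration, not measured figures.

```python
import math


def pair_variance(var_a, var_b, correlation):
    """Variance of A + B, given each variance and their correlation."""
    covariance = correlation * math.sqrt(var_a) * math.sqrt(var_b)
    return var_a + var_b + 2 * covariance


# Hypothetical variances for a quarterback, his top receiver,
# and the defense lined up against that quarterback.
QB_VAR, WR_VAR, DEF_VAR = 36.0, 25.0, 16.0

# Positive correlation: the stack is more volatile than a plain sum.
stack_variance = pair_variance(QB_VAR, WR_VAR, correlation=0.4)

# Negative correlation: the pairing is less volatile than a plain sum.
hedge_variance = pair_variance(QB_VAR, DEF_VAR, correlation=-0.3)

print(stack_variance > QB_VAR + WR_VAR)   # True
print(hedge_variance < QB_VAR + DEF_VAR)  # True
```

With a correlation of zero, the function falls back to the simple sum that applies to players in different games.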
Here's something that's true of variance across all of life's uncertain activities: for the underdog, variance is a friend. It's the only thing giving the underdog a chance to win. For the favorite, variance is the enemy. It's what gives his opponents a chance to beat him.
How can we use that bit of wisdom in our DFS exploits? Consider the difference between cash games and tournaments.
In a cash game, let's say we think we'll have to score 110 fantasy points in order to finish in the money, and let's say that we construct a lineup that is expected to score 116 points. That makes us the favorite! If our expectations are calculated correctly, we'll win more than half the time no matter what. And in fact, if it weren't for variance, we'd win every time. With zero variance and a correctly calculated expectation of 116 points, we'd score 116 points with 100% certainty—never more, never less—and automatically beat our goal of 110. Zero variance is impossible in fantasy football (unless you start only players who are inactive, which we don't recommend), but as long as your expectation is above the projected cutoff to finish in the money, less variance is better than more variance.
In tournaments, on the other hand, your expectation will nearly always be out of the money. Let's say we think we'll have to score 140 points to cash in a particular tournament, for example, but our best lineup is expected to score only 116 points. With zero variance in this case, we'd be toast. The only reason we have a chance to finish in the money is because of variance—because of the fact that sometimes we'll score well above 116 points, and sometimes we'll score well below 116 points. It's the "above" part that we care about here. Even if our team scores only 116 points on average, with a high enough variance, we may score more than 140 points as often as 25% of the time. That will make us money if only 20% of the field gets paid.
So we see that, in cash games, we want a high expectation with a low variance; and in tournaments, we want a high expectation with a high variance. That means that in a cash game, we generally want to fill our roster with low-variance players, while in a tournament, we're happy to include more high-variance players.
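Here is a rough Python sketch of that trade-off, using the 116-point lineup with the 110-point cash cutoff and the 140-point tournament cutoff from the examples above. It treats the lineup's score as normally distributed, which is a simplifying assumption (real scores are somewhat skewed), and the two standard deviations are hypothetical.

```python
from statistics import NormalDist


def chance_to_cash(expectation, std_dev, cutoff):
    """Probability of scoring at least `cutoff` under a normal approximation."""
    return 1.0 - NormalDist(mu=expectation, sigma=std_dev).cdf(cutoff)


EXPECTATION = 116.0
LOW_SD, HIGH_SD = 10.0, 30.0  # hypothetical lineup standard deviations

# Cash game: the cutoff (110) sits below our expectation.
cash_low = chance_to_cash(EXPECTATION, LOW_SD, 110.0)
cash_high = chance_to_cash(EXPECTATION, HIGH_SD, 110.0)

# Tournament: the cutoff (140) sits above our expectation.
tourney_low = chance_to_cash(EXPECTATION, LOW_SD, 140.0)
tourney_high = chance_to_cash(EXPECTATION, HIGH_SD, 140.0)

print(f"cash game:  low variance {cash_low:.1%}, high variance {cash_high:.1%}")
print(f"tournament: low variance {tourney_low:.1%}, high variance {tourney_high:.1%}")
```

Under these assumptions the low-variance lineup cashes about 73% of the time in the cash game against about 58% for the high-variance lineup, while in the tournament the high-variance lineup clears the cutoff about 21% of the time against less than 1% for the low-variance one.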
How can you distinguish between low-variance players and high-variance players? There isn't a magic statistic that gives it away. The simplest rule of thumb, if you are a generally well-informed NFL fan, is to ask yourself how well you think you can predict a player's performance in the upcoming game. If you're pretty sure you can pin down his likely production within a fairly narrow range, he's a low-variance player. If you have only a wild guess rather than a well-grounded estimate, he's a high-variance player.
In more concrete terms, high-variance players are likely to fit into one of the following categories:
(1) His role in the offense is uncertain due to a teammate's injury. An example would be Philadelphia's Ryan Mathews if DeMarco Murray is banged up and may not play his usual role. (Incidentally, Murray himself would be high-variance in that situation as well, but since FanDuel generally does not significantly discount a player's salary if he is banged up but expected to play, injured players generally don't provide great value, and should usually be avoided even if they offer high variance.)
(2) His role in the offense varies significantly based on game script. Maybe the Giants' Shane Vereen will get a lot of touches if his team gets behind early, but few touches if his team is protecting a lead. The prospect of the Giants getting behind early may be a worthwhile gamble.
(3) He is a goal-line specialist who isn't a big part of the offense between the twenties. This fits the boom-or-bust paradigm because the player could score multiple touchdowns, but if he fails to find the end zone he'll be nearly worthless.
(4) He is a complementary player in an offense that is expected to score a lot of points. If a game becomes a shootout, even a team's No. 3 wide receiver could have a big day. Look for games with high over/unders.