Larry Johnson's 417 Carries - A Reason To Avoid Him?

  Posted 8/27 by Maurile Tremblay, Exclusive to Footballguys.com

Larry Johnson set a new NFL record last year with 417 regular-season carries. It is commonly suggested that such a voluminous workload generally portends a steep decline in productivity the following year. This suggestion has been dubbed the "Curse of 370," referring to the number of regular-season carries thought to trigger the decline.

Is the curse for real? This article will apply the statistical method of hypothesis-testing to lend some insight into that question.

In statistics, hypothesis-testing has three steps: (1) forming a hypothesis, (2) testing it, and (3) interpreting the results.

1. Forming the hypothesis

The "Curse of 370" was first written about in 2004 by Aaron Schatz, in Pro Football Forecast 2004 and in an article at FootballOutsiders.com. At the time, Ricky Williams was coming off a 392-carry season in 2003, and Schatz noted that the 22 RBs who had previously gotten 370 or more carries in a season performed fairly dismally, as a group, the following year.

Schatz has summarized the "Curse of 370" as follows: "A running back with 370 or more carries during the regular season will usually suffer either a major injury or loss of effectiveness the following year, unless he is named Eric Dickerson."

Note that even if there were no validity at all to the "Curse of 370," we would still expect players with 370+ carries to substantially regress to the mean the following year. In general, fantasy points are positively associated with carries; and 370 carries is far above reasonable expectations for just about any player. So any running back who gets 370+ carries in a season is a huge favorite to get fewer carries - and thus fewer fantasy points - the following year.

What matters, therefore, is not whether a player who got 370+ carries last year should be expected to see a decline in his fantasy production this year. Of course he should. What matters is whether a player who got 370+ carries last year should be expected to underperform a player who got, say, a mere 350 carries last year.

In other words, if you are drafting out of the #3 spot this year and are trying to decide between Larry Johnson and Frank Gore, the relevant question is not whether Larry Johnson 2007 will do better than Larry Johnson 2006. The relevant question, rather, is whether Larry Johnson 2007 will do better than Frank Gore 2007.

So the "Curse of 370" hypothesis, for practical purposes, might be phrased thus:

A player who gets 370 or more carries in Year N should, in Year N+1, be expected to underperform a player who got 344-369 carries in Year N.

(Entering the 2004 season, there had been 22 players who had gotten 370+ carries in Year N and who had already played their Year N+1. I chose the 344-369 range for comparison because, as of 2004, it also contained 22 such players.)

2. Testing the hypothesis

There is a problem with testing hypotheses like the "Curse of 370." Such hypotheses are typically formed using all the data currently available - which means that there are no fresh data left to test them on. It is a fundamental rule of hypothesis-testing that, whenever possible, you should not use the same data to both formulate and test your hypothesis. A short example will illustrate why this is so.

Suppose I roll a six-sided die 100 times and analyze the results. I will be able to find many patterns in the results of those 100 rolls. I may find, for example, that a three was followed by a six 40% of the time, or that a one was never followed by a six.

Would you trust any such patterns to hold true over the next hundred rolls? You shouldn't. If they did, it would be just coincidence. It is easy to find patterns by looking for them in a given set of data; but the test of whether those patterns are meaningful is whether they hold true in data that have not yet been examined.
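The die example can be tried directly. What follows is a minimal sketch, not a rigorous experiment: the seed is arbitrary, and the helper names follow_rate and strongest_pattern are my own inventions for illustration. It hunts for the strongest "X is followed by Y" pattern in 100 simulated rolls and then checks how often that same pattern shows up in 100 fresh rolls.

```python
import random

def follow_rate(rolls, first, second):
    """Share of the time `first` is immediately followed by `second`.

    Returns None if `first` never appears with a roll after it.
    """
    followers = [b for a, b in zip(rolls, rolls[1:]) if a == first]
    if not followers:
        return None
    return followers.count(second) / len(followers)

def strongest_pattern(rolls):
    """Return (first, second, rate) for the pair with the highest follow rate."""
    best = None
    for first in range(1, 7):
        for second in range(1, 7):
            rate = follow_rate(rolls, first, second)
            if rate is not None and (best is None or rate > best[2]):
                best = (first, second, rate)
    return best

rng = random.Random(2007)  # arbitrary seed, chosen only so the run is repeatable
old_rolls = [rng.randint(1, 6) for _ in range(100)]  # data used to FIND the pattern
new_rolls = [rng.randint(1, 6) for _ in range(100)]  # fresh data used to TEST it

first, second, old_rate = strongest_pattern(old_rolls)
new_rate = follow_rate(new_rolls, first, second)

print(f"Strongest pattern in the first 100 rolls: {first} followed by {second}")
print(f"  rate in the rolls used to find it: {old_rate:.0%}")
if new_rate is not None:
    print(f"  rate in 100 fresh rolls:           {new_rate:.0%}")
print(f"  rate expected from a fair die:     {1/6:.0%}")
```

With a fair die, the "best" pattern in the first hundred rolls will always look better than one in six, simply because it was selected for looking good. The fresh rolls have no such advantage.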

So if the "Curse of 370" hypothesis was formed using data available up through the 2003 season, it should be tested only on data from 2004 and later.

The problem is that this leaves us with too small a sample to test it on meaningfully. Since 2003, only three RBs have played seasons following up a 370+ carry season - Jamal Lewis in 2004, Curtis Martin in 2005, and Shaun Alexander in 2006. (Ricky Williams had 392 carries in 2003, but did not play a follow-up season. If he had missed the 2004 season due to injury, it would make sense to include him in our data set; but since he missed the 2004 season due to retirement, which was almost certainly unrelated to his number of carries in 2003, his 2004 nonperformance is just noise.) Lewis, Martin, and Alexander all had terrible follow-up seasons, far underperforming the median from the group of 9 RBs coming off seasons with 344-369 carries during that period. So the "Curse of 370" theory is currently going three for three. But three for three is not a long enough track record to confirm the theory in a statistically significant sense.

The most commonly used standard of statistical significance is about five percent, or two standard errors, which means you will get a false positive in about one out of every 20 tests, on average. Whether this standard is the appropriate one to use for evaluating the "Curse of 370" will be discussed below. For now, suffice it to say that the standard cannot be satisfied with a sample of three players.
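To see why three players cannot satisfy the standard, here is a small sketch. It assumes the null hypothesis that each RB has a 50% chance of beating the comparison group's median, and it measures a record the same way the results section of this article does: wins minus losses, divided by the square root of the sample size. The function name standard_errors is mine.

```python
import math

def standard_errors(wins, losses):
    """How many standard errors a record sits from an even split,
    assuming each player has a 50% chance of beating the median."""
    n = wins + losses
    return (wins - losses) / math.sqrt(n)

# Most extreme possible result with three players: all three underperform.
worst_case = standard_errors(0, 3)
print(round(worst_case, 2))   # -1.73, short of the two standard errors required

# Chance of an 0-3 record arising from coin flips alone.
chance = 0.5 ** 3
print(chance)                 # 0.125, well above the five percent standard
```

Even a clean sweep leaves the Curse at about 1.73 standard errors, so no outcome among three players could have cleared the bar.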

But do we really have to limit ourselves to data from 2004 and beyond? It is preferable, but is it absolutely necessary? What if all 25 of the RBs who had ever played follow-up seasons to 370-carry years had underperformed the median from the 344-369-carry group? There must be a point where the pattern is so strong that we would be justified in accepting the Curse on the basis of "back-testing" it against previously known data, right?

Right. If the most commonly used standard of statistical significance when testing a hypothesis against fresh data is two standard errors, the typical standard when testing it against previously known data is four standard errors. ("Think of it as two standard errors to develop the hypothesis, and then two more to test it," writes Stanford Wong in his book, Sharp Sports Betting.)
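For a sense of scale, the sketch below works out how lopsided a 25-player record would have to be to reach each standard. It assumes a 50/50 null hypothesis, ignores ties with the median, and counts standard errors as losses minus wins, divided by the square root of the sample size. The helper min_losses_for is hypothetical, written only for this illustration.

```python
import math

def min_losses_for(threshold, n):
    """Smallest number of underperformers (out of n) that puts the record
    at least `threshold` standard errors below an even split.

    Returns None if even a clean sweep falls short.
    """
    for losses in range(n + 1):
        wins = n - losses
        if (losses - wins) / math.sqrt(n) >= threshold:
            return losses
    return None

print(min_losses_for(2, 25))  # 18 -> an 18-7 record or worse clears two standard errors
print(min_losses_for(4, 25))  # 23 -> a 23-2 record or worse clears four standard errors
print(min_losses_for(2, 3))   # None -> three players can never clear two
```

So on previously known data, something like 23 of the 25 RBs would need to have underperformed the median before the back-test standard was met.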

So here is how we can set up a test using the 25 RBs who have so far carried the ball 370+ times and then played a follow-up season. There have been 31 RBs who have carried the ball 344-369 times and then played a follow-up season. In their follow-up seasons, they produced a median VBD value of 71 points. (I am using VBD value instead of fantasy points to help control for the different eras they played in.)

If there is nothing to the "Curse of 370," and the 370+ carry group fares as well the following year as the 344-369 carry group, then we should expect about half of the 25 RBs in the 370+ carry group to surpass the median from the 344-369 group in their follow-up seasons.

Here is the group of RBs who have had 370+ carries and then played follow-up seasons.

Running Back          Year N   Carries   N+1 VBD
Shaun Alexander       2005     370       0
Curtis Martin         2004     371       0
Jamal Lewis           2003     387       0
Ricky Williams        2002     383       90
LaDainian Tomlinson   2002     372       202
Eddie George          2000     403       17
Edgerrin James        2000     387       0
Jamal Anderson        1998     410       0
Terrell Davis         1998     392       0
Jerome Bettis         1997     375       18
Emmitt Smith          1995     377       110
Barry Foster          1992     390       35
Emmitt Smith          1992     373       138
Christian Okoye       1989     370       0
Eric Dickerson        1988     388       80
Eric Dickerson        1986     404       87
Gerald Riggs          1985     397       62
Marcus Allen          1985     380       25
James Wilder          1984     407       74
Walter Payton         1984     381       119
Eric Dickerson        1984     379       57
Eric Dickerson        1983     390       182
John Riggins          1983     375       87
George Rogers         1981     378       0
Earl Campbell         1980     373       67

Here is the group of RBs who have had 344-369 carries and then played follow-up seasons:

Running Back          Year N   Carries   N+1 VBD
Edgerrin James        2005     360       13
Tiki Barber           2005     357       92
Clinton Portis        2005     352       0
Rudi Johnson          2004     361       84
Shaun Alexander       2004     353       221
Corey Dillon          2004     345       26
Ahman Green           2003     355       39
Deuce McAllister      2003     351       27
Fred Taylor           2003     345       18
Stephen Davis         2001     356       0
Jerome Bettis         2000     355       7
Edgerrin James        1999     369       179
Curtis Martin         1999     367       78
Curtis Martin         1998     369       72
Eddie George          1998     348       124
Terrell Davis         1997     369       233
Eddie George          1997     357       69
Ricky Watters         1996     353       65
Terry Allen           1996     347       0
Terrell Davis         1996     345       162
Curtis Martin         1995     368       125
Emmitt Smith          1994     368       225
Thurman Thomas        1993     355       71
Emmitt Smith          1991     365       209
Dalton Hilliard       1989     344       0
Herschel Walker       1988     361       68
James Wilder          1985     365       0
Gerald Riggs          1984     353       108
Earl Campbell         1981     361       0
Walter Payton         1979     369       97
Earl Campbell         1979     368       149

The second group had a median VBD value of 71 in their follow-up seasons. Of the 25 RBs in the first group, 10 of them surpassed a VBD value of 71 in their follow-up seasons, while 15 of them did worse than a VBD value of 71 in their follow-up seasons.
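Anyone who wants to check the tally can do so with a few lines of code. The sketch below simply transcribes the N+1 VBD columns from the two tables above, in table order, and recomputes the median and the record against it. The variable names are my own.

```python
import statistics

# N+1 VBD for the 25 RBs coming off 370+ carries (table order)
vbd_370_plus = [0, 0, 0, 90, 202, 17, 0, 0, 0, 18, 110, 35, 138, 0, 80,
                87, 62, 25, 74, 119, 57, 182, 87, 0, 67]

# N+1 VBD for the 31 RBs coming off 344-369 carries (table order)
vbd_344_369 = [13, 92, 0, 84, 221, 26, 39, 27, 18, 0, 7, 179, 78, 72, 124,
               233, 69, 65, 0, 162, 125, 225, 71, 209, 0, 68, 0, 108, 0,
               97, 149]

median_vbd = statistics.median(vbd_344_369)

# Record of the 370+ group against the comparison group's median
beat_median = sum(1 for v in vbd_370_plus if v > median_vbd)
fell_short = sum(1 for v in vbd_370_plus if v < median_vbd)

print(median_vbd)    # 71
print(beat_median)   # 10
print(fell_short)    # 15
```

No one in the first group landed exactly on 71, so there are no ties to worry about.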

3. Interpreting the results

The group coming off 370+ carry seasons has a 10-15 record against the median of the group coming off 344-369 carry seasons. That is exactly one standard error below what would be expected if the groups were presumed equal. (If each RB from the first group, in accordance with the null hypothesis, has a 50% chance of beating the median from the second group, then the expected number beating it is 12.5 and the standard error of that count is half the square root of the sample size, or 2.5. The number of standard errors by which the record departs from expectation therefore works out to the difference between the RBs who beat that median and the RBs who are beaten by it, divided by the square root of the sample size - in this case, five divided by the square root of 25, or one.)
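The arithmetic can be laid out as follows. This is only a sketch under the same null hypothesis (each RB has a 50% chance of beating the median). The exact binomial tail at the end is my own addition for comparison; it is not part of the standard-error shortcut used in the text.

```python
import math

wins, losses = 10, 15
n = wins + losses

# Shortcut used in the text: (wins - losses) / sqrt(n)
z_shortcut = (wins - losses) / math.sqrt(n)

# Same thing the long way: the count of wins is binomial with p = 0.5
expected_wins = n * 0.5                  # 12.5
sd_wins = math.sqrt(n * 0.5 * 0.5)       # 2.5
z_binomial = (wins - expected_wins) / sd_wins

print(z_shortcut, z_binomial)            # -1.0 -1.0

# Exact chance of 10 or fewer wins in 25 coin flips (an added cross-check)
tail = sum(math.comb(n, k) for k in range(wins + 1)) / 2 ** n
print(round(tail, 3))                    # 0.212
```

In other words, if there were no Curse at all, a record of 10-15 or worse would still turn up roughly one time in five.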

So not only does that fall short of the four standard errors generally required when back-testing a hypothesis against the data used to form it, but it even falls short of the two standard errors generally required when testing a hypothesis against fresh data.

Are those really the appropriate standards, though? It depends.

Those standards are overly stringent if we care only about whether a hypothesis is more likely to be true than not. If we have Larry Johnson and Frank Gore otherwise rated exactly evenly, for example, and are using the "Curse of 370" strictly as a tiebreaker, then we would not need airtight evidence of the Curse to pass on Johnson for Gore - a mere scintilla would suffice. But if we otherwise have Johnson rated solidly ahead of Gore, then we should want to be rather more confident in the Curse before selecting Gore instead of Johnson.

There is a sort of sliding scale. Assuming we like Johnson better than Gore before considering the curse, then the larger the difference between the two, the more confident we would need to be in the reality of the Curse before we should pass on Johnson.
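One rough way to put numbers on the sliding scale is sketched below. Everything in it is an assumption made for illustration: the function break_even_confidence, the idea that the Curse (if real) costs a fixed number of points, and the sample figures, none of which are projections. The model is simply that passing on Johnson makes sense only when the probability that the Curse is real, times its cost, exceeds the edge we otherwise give Johnson.

```python
def break_even_confidence(johnson_edge, curse_cost):
    """Minimum probability that the Curse is real before passing on Johnson.

    johnson_edge: points by which we rate Johnson ahead of Gore, ignoring the Curse
    curse_cost:   points we assume the Curse would cost Johnson if it is real
    Both inputs are hypothetical; the result is capped at 1.0, meaning
    "never pass on Johnson for this reason."
    """
    if curse_cost <= 0:
        raise ValueError("curse_cost must be positive")
    return min(1.0, max(0.0, johnson_edge) / curse_cost)

# Made-up numbers: assume the Curse, if real, costs 50 points.
print(break_even_confidence(0, 50))    # 0.0 -> a pure tiebreaker; any evidence at all suffices
print(break_even_confidence(20, 50))   # 0.4 -> need to be 40% sure the Curse is real
print(break_even_confidence(60, 50))   # 1.0 -> the edge is too large for the Curse to matter
```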

While I would consider the Curse to be unconfirmed from a statistical standpoint, my tentative view is that, beyond a certain workload in Year N, a running back's productivity in Year N+1 will probably be adversely affected. But the sample of very high-volume runners is currently small enough that it's hard to draw any firm conclusions about what the cutoff point is (although it is very likely below 417 carries), what categories of runners are most likely to be affected (e.g., does college workload count?), or how it should be quantified. This is especially true since many of the runners in that sample played in a slightly different era, with different equipment and different medical treatment available and so on.

Personally, I am drafting Larry Johnson third this year.