Larry Johnson's 417 Carries: A Reason To Avoid Him?
Posted 8/27 by Maurile Tremblay, Exclusive to Footballguys.com
Larry Johnson set a new NFL record last year with 417 regular-season carries. It is commonly suggested that such a voluminous workload generally portends a steep decline in productivity the following year. This suggestion has been dubbed the "Curse of 370," referring to the number of regular-season carries thought to trigger the decline.
Is the curse for real? This article will apply the statistical method of hypothesis testing to lend some insight into that question.
In statistics, hypothesis testing has three steps: (1) forming a hypothesis, (2) testing it, and (3) interpreting the results.
1. Forming the hypothesis
The "Curse of 370" was first written about in 2004 by Aaron Schatz, in Pro Football Forecast 2004 and in an article at FootballOutsiders.com. At the time, Ricky Williams was coming off a 392-carry season in 2003, and Schatz noted that the 22 RBs who had previously gotten 370 or more carries in a season performed fairly dismally, as a group, the following year.
Schatz has summarized the "Curse of 370" as follows: "A running back with 370 or more carries during the regular season will usually suffer either a major injury or loss of effectiveness the following year, unless he is named Eric Dickerson."
Note that even if there were no validity at all to the "Curse of 370," we would still expect players with 370+ carries to substantially regress to the mean the following year. In general, fantasy points are positively associated with carries, and 370 carries is far above reasonable expectations for just about any player. So any running back who gets 370+ carries in a season is a huge favorite to get fewer carries, and thus fewer fantasy points, the following year.
What matters, therefore, is not whether a player who got 370+ carries last year should be expected to see a decline in his fantasy production this year. Of course he should. What matters is whether a player who got 370+ carries last year should be expected to underperform a player who got, say, a mere 350 carries last year.
In other words, if you are drafting out of the #3 spot this year and are trying to decide between Larry Johnson and Frank Gore, the relevant question is not whether Larry Johnson 2007 will do better than Larry Johnson 2006. The relevant question, rather, is whether Larry Johnson 2007 will do better than Frank Gore 2007.
So the "Curse of 370" hypothesis, for practical purposes, might be phrased thus:
A player who gets 370 or more carries in Year N should, in Year N+1, be expected to underperform a player who got 344-369 carries in Year N.
(Entering the 2004 season, there had been 22 players who had gotten 370+ carries in Year N and who had already played their Year N+1. I chose the 344-369 range as the comparison group because, as of 2004, there had also been 22 players in that range.)
2. Testing the hypothesis
There is a problem with testing hypotheses like the "Curse of 370." Such hypotheses are typically formed using all the data currently available, which means that there are no fresh data left to test them on. It is a fundamental rule of hypothesis testing that, whenever possible, you should not use the same data to both formulate and test your hypothesis. A short example will illustrate why this is so.
Suppose I roll a six-sided die 100 times and analyze the results. I will be able to find many patterns in the results of those 100 rolls. I may find, for example, that a three was followed by a six 40% of the time, or that a one was never followed by a six.
Would you trust any such patterns to hold true over the next hundred rolls? You shouldn't. If they do, it would just be coincidence. It is easy to find patterns by looking for them in a given set of data; but the test of whether those patterns are meaningful is whether they hold true in data that have not yet been examined.
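The die example is easy to try for yourself. What follows is only a rough sketch: the seed, the helper name follow_rates, and the choice to hunt for the single most extreme "X is followed by Y" pattern are all my own assumptions for illustration. It digs the strongest pattern out of 100 simulated rolls and then checks how that same pattern fares in 100 fresh rolls.

```python
import random
from collections import Counter

def follow_rates(rolls):
    """Map (a, b) to the share of a's that were immediately followed by b."""
    pair_counts = Counter(zip(rolls, rolls[1:]))
    first_counts = Counter(rolls[:-1])  # every roll that has a successor
    return {(a, b): pair_counts[(a, b)] / first_counts[a]
            for a in first_counts for b in range(1, 7)}

rng = random.Random(370)  # arbitrary seed, used only so reruns match
first_100 = [rng.randint(1, 6) for _ in range(100)]
next_100 = [rng.randint(1, 6) for _ in range(100)]

# "Form the hypothesis": find the strongest pattern in the first 100 rolls.
old = follow_rates(first_100)
best_pair = max(old, key=old.get)

# "Test the hypothesis": look at the same pattern in rolls not yet examined.
new = follow_rates(next_100)
print("pattern:", best_pair)
print("rate in first 100 rolls:", round(old[best_pair], 2))
print("rate in next 100 rolls:", new.get(best_pair))
print("rate for a fair die:", round(1 / 6, 2))
```

Because the pattern was picked precisely for being extreme, its rate in the first 100 rolls will sit well above one in six. There is no reason for it to stay there in the second batch.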
So if the "Curse of 370" hypothesis was formed using data available up through the 2003 season, it should be tested only on data from 2004 and later.
The problem is that we are left with too small a sample of data to meaningfully test it on. Since 2003, only three RBs have played seasons following up a 370+ carry season: Jamal Lewis in 2004, Curtis Martin in 2005, and Shaun Alexander in 2006. (Ricky Williams had 392 carries in 2003, but did not play a follow-up season. If he had missed the 2004 season due to injury, it would make sense to include him in our data set; but since he missed the 2004 season due to retirement, which was almost certainly unrelated to his number of carries in 2003, his 2004 non-performance is just noise.) Lewis, Martin, and Alexander all had terrible follow-up seasons, far underperforming the median from the group of 9 RBs coming off seasons with 344-369 carries during that period. So the "Curse of 370" theory is currently going three for three. The problem is that going three for three is not a sufficient track record to be considered confirmed in a statistically significant sense.
The most commonly used standard of statistical significance is about five percent, or two standard errors, which means you will get a false positive in about one out of every 20 tests, on average. Whether this standard is the appropriate one to use for evaluating the "Curse of 370" will be discussed below. For now, suffice it to say that the standard cannot be satisfied with a sample of three players.
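To see why three players cannot clear that bar, here is a back-of-the-envelope sketch. It assumes (my simplification, not a claim about real running backs) that with no Curse each back independently has a 50% chance of landing below the comparison group's median, and it treats the rough five percent figure as a one-sided cutoff. The function name chance_all_trail is made up for the example.

```python
def chance_all_trail(n_players, p_trail=0.5):
    """Chance that every one of n_players trails the median by luck alone."""
    return p_trail ** n_players

three_for_three = chance_all_trail(3)
print(three_for_three)         # 0.125
print(three_for_three < 0.05)  # False: luck alone does this 1 time in 8

# Smallest perfect record that would dip under five percent.
n = 1
while chance_all_trail(n) >= 0.05:
    n += 1
print(n)  # 5
```

Under those assumptions, even a perfect record would need to run to five players before it started to look like more than luck.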
But do we really have to limit ourselves to data from 2004 and beyond? It is preferable, but is it absolutely necessary? What if all 25 of the RBs who had ever played follow-up seasons to 370-carry years had underperformed the median from the 344-369 carry group? There must be a point where the pattern is so strong that we would be justified in accepting the Curse on the basis of "backtesting" it against previously known data, right?
Right. If the most commonly used standard of statistical significance when testing a hypothesis against fresh data is two standard errors, the typical standard when testing it against previously known data is four standard errors. ("Think of it as two standard errors to develop the hypothesis, and then two more to test it," writes Stanford Wong in his book, Sharp Sports Betting.)
So here is how we can set up a test using the 25 RBs who have so far carried the ball 370+ times and then played a follow-up season. There have been 31 RBs who have carried the ball 344-369 times and then played a follow-up season. In their follow-up seasons, they produced a median VBD value of 71 points. (I am using VBD value instead of fantasy points to help control for the different eras they played in.)
If there is nothing to the "Curse of 370," and the 370+ carry group fares equally well the following year as the 344-369 carry group, then we should expect about half of the 25 RBs in the 370+ carry group to surpass the median from the 344-369 group in their follow-up seasons.
Here is the group of RBs who have had 370+ carries and then played follow-up seasons:

Running Back          Year   Carries   N+1 VBD
Shaun Alexander       2005   370       0
Curtis Martin         2004   371       0
Jamal Lewis           2003   387       0
Ricky Williams        2002   383       90
LaDainian Tomlinson   2002   372       202
Eddie George          2000   403       17
Edgerrin James        2000   387       0
Jamal Anderson        1998   410       0
Terrell Davis         1998   392       0
Jerome Bettis         1997   375       18
Emmitt Smith          1995   377       110
Barry Foster          1992   390       35
Emmitt Smith          1992   373       138
Christian Okoye       1989   370       0
Eric Dickerson        1988   388       80
Eric Dickerson        1986   404       87
Gerald Riggs          1985   397       62
Marcus Allen          1985   380       25
James Wilder          1984   407       74
Walter Payton         1984   381       119
Eric Dickerson        1984   379       57
Eric Dickerson        1983   390       182
John Riggins          1983   375       87
George Rogers         1981   378       0
Earl Campbell         1980   373       67

Here is the group of RBs who have had 344-369 carries and then played follow-up seasons:

Running Back          Year   Carries   N+1 VBD
Edgerrin James        2005   360       13
Tiki Barber           2005   357       92
Clinton Portis        2005   352       0
Rudi Johnson          2004   361       84
Shaun Alexander       2004   353       221
Corey Dillon          2004   345       26
Ahman Green           2003   355       39
Deuce McAllister      2003   351       27
Fred Taylor           2003   345       18
Stephen Davis         2001   356       0
Jerome Bettis         2000   355       7
Edgerrin James        1999   369       179
Curtis Martin         1999   367       78
Curtis Martin         1998   369       72
Eddie George          1998   348       124
Terrell Davis         1997   369       233
Eddie George          1997   357       69
Ricky Watters         1996   353       65
Terry Allen           1996   347       0
Terrell Davis         1996   345       162
Curtis Martin         1995   368       125
Emmitt Smith          1994   368       225
Thurman Thomas        1993   355       71
Emmitt Smith          1991   365       209
Dalton Hilliard       1989   344       0
Herschel Walker       1988   361       68
James Wilder          1985   365       0
Gerald Riggs          1984   353       108
Earl Campbell         1981   361       0
Walter Payton         1979   369       97
Earl Campbell         1979   368       149


The second group had a median VBD value of 71 in their follow-up seasons. Of the 25 RBs in the first group, 10 of them surpassed a VBD value of 71 in their follow-up seasons, while 15 of them did worse than a VBD value of 71 in their follow-up seasons.
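For anyone who wants to check the tally, here is a small sketch that recomputes it from the N+1 VBD columns of the two tables above. The variable names are mine; the numbers are copied from the tables in the order listed.

```python
from statistics import median

# Follow-up (Year N+1) VBD values, in table order.
vbd_370_plus = [0, 0, 0, 90, 202, 17, 0, 0, 0, 18, 110, 35, 138, 0,
                80, 87, 62, 25, 74, 119, 57, 182, 87, 0, 67]
vbd_344_369 = [13, 92, 0, 84, 221, 26, 39, 27, 18, 0, 7, 179, 78, 72,
               124, 233, 69, 65, 0, 162, 125, 225, 71, 209, 0, 68, 0,
               108, 0, 97, 149]

# The comparison bar is the median follow-up season of the 344-369 group.
baseline = median(vbd_344_369)

beat = sum(1 for v in vbd_370_plus if v > baseline)
trailed = sum(1 for v in vbd_370_plus if v < baseline)
print(baseline, beat, trailed)  # 71 10 15
```

No one in the 370+ group landed exactly on 71, so every back counts cleanly as either beating or trailing the median.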
3. Analyzing the results
The group coming off of 370+ carry seasons has a 10-15 record against the median of the group coming off of 344-369 carry seasons. That is exactly one standard error below what would be expected if the groups were presumed equal. (If each RB from the first group, in accordance with the null hypothesis, has a 50% chance of beating the median from the second group, the number of standard errors is just the difference between the number of RBs who beat that median and the number who are beaten by it, divided by the square root of the sample size: in this case, five divided by the square root of 25, or one.)
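The shortcut in that parenthetical can be sketched in a few lines. The exact coin-flip tail probability next to it is my own addition for context, and both function names are made up for the example. Like the shortcut itself, it assumes each back has an independent 50% chance of beating the median.

```python
import math

def standard_errors(wins, losses):
    """(wins - losses) / sqrt(n): distance from an even split, in standard errors."""
    return (wins - losses) / math.sqrt(wins + losses)

def chance_of_at_most(wins, n):
    """Exact chance of `wins` or fewer successes in n fair coin flips."""
    return sum(math.comb(n, k) for k in range(wins + 1)) / 2 ** n

z = standard_errors(10, 15)
tail = chance_of_at_most(10, 25)
print(z)               # -1.0
print(round(tail, 3))  # 0.212
```

In other words, if the two groups really were equal, a record of 10-15 or worse would still turn up about one time in five.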
So not only does that fall short of the four standard errors generally required when backtesting a hypothesis against the data used to form it, it even falls short of the two standard errors generally required when testing a hypothesis against fresh data.
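To put those two bars in concrete terms, the sketch below asks how many of the 25 backs would have had to trail the median to reach two or four standard errors. It reuses the same difference-over-square-root rule described above; the function name trailing_needed is hypothetical.

```python
import math

def trailing_needed(n, k):
    """Fewest backs (out of n) trailing the median to reach k standard errors."""
    for trailing in range(n + 1):
        beating = n - trailing
        if (trailing - beating) / math.sqrt(n) >= k:
            return trailing
    return None  # not reachable with a sample this small

print(trailing_needed(25, 2))  # 18
print(trailing_needed(25, 4))  # 23
```

So the actual figure of 15 would have needed to be 18 to pass the fresh-data standard, and 23 to pass the backtesting standard.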
Are those really the appropriate standards, though? It depends.
Those standards are overly stringent if we care only about whether a hypothesis is more likely to be true than not. If we have Larry Johnson and Frank Gore otherwise rated exactly evenly, for example, and are using the "Curse of 370" strictly as a tiebreaker, then we would not need airtight evidence of the Curse to pass on Johnson for Gore; a mere scintilla would suffice. But if we otherwise have Johnson rated solidly ahead of Gore, then we should want to be rather more confident in the Curse before selecting Gore instead of Johnson.
There is a sort of sliding scale. Assuming we like Johnson better than Gore before considering the curse, then the larger the difference between the two, the more confident we would need to be in the reality of the Curse before we should pass on Johnson.
While I would consider the Curse to be unconfirmed from a statistical standpoint, my tentative view is that, beyond a certain workload in Year N, a running back's productivity in Year N+1 will probably be adversely affected. But the sample of very high-volume runners is currently small enough that it's hard to draw any firm conclusions about what the cutoff point is (although it is very likely below 417 carries), what categories of runners are most likely to be affected (e.g., does college workload count?), or how it should be quantified. This is especially true since many of the runners in that sample played in a slightly different era, with different equipment and different medical treatment available and so on.
Personally, I am drafting Larry Johnson third this year.