Friday, June 13, 2014


Yes, we are probably overdue for another visitation with one of the few universally useful precepts that emerged in the "neo-sabe" era--the notion of "regression to the mean."

And that principle has been highlighted and intensified in the data surrounding our ongoing "pet project" of 2014--the complete game.

Because CGs are quite clearly the exclusive province of starting pitchers, we can apply a favorite tool--the Quality Matrix (QMAX)--to an examination of what happens after a great individual performance.

QMAX tells us that, as they have dwindled, complete games are now almost exclusively "great performances." The aggregate ERA for the 44 CGs (still not counting Clayton Kershaw!) demonstrates this as well: 0.52.

That's right. Half a run per nine innings. 27 of the 44 CGs thus far in 2014 are shutouts. There were two more games where no earned runs were permitted. Either way you slice it, more than 60% of all complete games (27 of 44 by the stricter count, 29 of 44 by the looser one) feature performances in which no runs or no earned runs are allowed.

The QMAX data parses this further, and tells us more. As the chart shows, 25 of these games fall in the 1,1 slot (the very best games possible according to QMAX).

38 of the 44 CGs reside inside the "Elite Square" (the grouping of the four best performance gradations based on hit and walk prevention--the slots marked 1,1; 1,2; 2,1; and 2,2). That's 86% of the games.

The percentage of games in the "Success Square" (which, yes, we know, is not quite a square in the same way that a house is not a home...) is even higher. 42 of the 44 CGs fall inside it, which works out to 95%.
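For anyone who wants to check the arithmetic, here is a minimal sketch that turns the counts quoted above into percentages. The counts come straight from the text; the dictionary labels and the `share` helper are our own illustrative names, not part of QMAX.

```python
# Counts quoted in the text for the 44 complete games of 2014 to date.
TOTAL_CGS = 44

counts = {
    "no runs or no earned runs": 27 + 2,  # 27 shutouts plus 2 games with no earned runs
    "1,1 slot": 25,
    "Elite Square": 38,
    "Success Square": 42,
}


def share(count, total=TOTAL_CGS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage of all CGs falling in each grouping.
shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The Elite Square and Success Square shares round to the 86% and 95% cited above.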

These are, then, with only a very few exceptions, the top hit and walk prevention games that occur during the season.

The next question is: what happens to these starters--the ones who've achieved a pinnacle of starting pitcher performance--when they start their next game?

What happens? They regress to the mean. And it's a bit shocking, in fact, because one might expect that these starters who reach the heights in such spectacular fashion (did we mention that their won-loss record in those CGs is 38-6 this year?) would actually be an aggregation that is a good bit better than the average starting pitcher.

But, in fact, when these pitchers take the mound the next time, they are, collectively speaking, quite ordinary. And both the QMAX data and the more basic measures bear that out.
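The general mechanism can be illustrated with a toy simulation. This is only a sketch under assumptions of our own: every number in it (the talent level of 4.0 runs, the spreads, the pool size) is hypothetical and is not drawn from the 2014 data. It picks the best single-game results out of a pool of noisy performances and then looks at the same pitchers' next game.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simulate(n_pitchers=2000, n_top=200, talent_sd=0.5, noise_sd=2.0, seed=0):
    """Toy model: a game score is stable talent plus independent game-to-game noise.

    All parameters are hypothetical; lower scores mean better games
    (think of them as runs allowed per nine innings).
    """
    rng = random.Random(seed)
    talents = [rng.gauss(4.0, talent_sd) for _ in range(n_pitchers)]
    first = [t + rng.gauss(0, noise_sd) for t in talents]   # the "great game" candidates
    second = [t + rng.gauss(0, noise_sd) for t in talents]  # each pitcher's next start
    # Select the pitchers with the best (lowest) first-game scores.
    best = sorted(range(n_pitchers), key=lambda i: first[i])[:n_top]
    return {
        "selected_game": mean([first[i] for i in best]),
        "game_after": mean([second[i] for i in best]),
        "all_games": mean(first + second),
    }


result = simulate()
for label, value in result.items():
    print(f"{label}: {value:.2f}")
```

In this toy model the selected games look spectacular, while the "game after" average sits close to the overall average. It shows plain regression to the mean only; it does not attempt to model any selection effect that would push the "game after" past the mean.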

In the QMAX data, the basic score rises from 1.68, 1.09/2.77 (a number that no individual pitcher could get remotely near...) to 4.26, 2.59/6.85. The Success Square (SS) ratio drops almost by half (from 95% down to 49%). The Elite Square (ES) ratio is only about one-fourth of what it was in the previous CG (23%, down from 86%). The "top hit prevention" games (those in the 1S and 2S rows of the matrix chart) drop almost as much as the ES data (from 86% down to just 26%). "Hit hard" (HH) games--the 6S and 7S rows shown in orange--jump from 2% to 36%.
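To put those before-and-after percentages on one scale, here is a small sketch that divides each "game after" rate by its rate in the CGs. The percentages are the ones quoted above; the labels and the `retained` name are our own.

```python
# (rate in the CGs, rate in the "game after"), both in percent, as quoted in the text.
before_after = {
    "Success Square": (95, 49),
    "Elite Square": (86, 23),
    "Top hit prevention (1S-2S)": (86, 26),
    "Hit hard (6S-7S)": (2, 36),
}

# Ratio of the "game after" rate to the CG rate; below 1 is a drop, above 1 a rise.
retained = {
    label: round(after / before, 2)
    for label, (before, after) in before_after.items()
}

for label, ratio in retained.items():
    print(f"{label}: {ratio}x the CG rate")
```

The Success Square rate retains roughly half its CG level and the Elite Square rate roughly a quarter, while the "hit hard" rate is many times its CG level.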

In conventional stats, these starters go from a 38-6, 0.52 ERA performance to 13-15, 4.33 ERA in the "game after."

What's exceptionally interesting to us is the fact that neither of these game populations contains a single game in the boxed region at the upper right of the chart, where hit prevention is great but walk prevention is spotty--a region we've named the "Power Precipice." It's not really surprising that there aren't any CGs in this region: allowing walks means throwing more pitches, and the defining feature of the 21st-century CG is that there are few walks (1.0 per 9 IP) and a high degree of pitch-count economy. But it is nothing short of astonishing that there are no such games in this region to be found in the "game after" data.

[Image caption: Tommy John...a medical procedure, a QMAX region, a clothing about a string of burger joints??]

Instead, there are 10 games (26%, equal to the top hit prevention--S12--region) which fall in the lower left region where hits allowed are greater than average but walks are still exceptionally low--a zone we call the "Tommy John" region.

It should also be noted that these CGs are not a cluster of high-K performances: the average K/9 here is 7.3--right about average for all starting pitchers in 2014. The K/9 in the "games after" is actually a bit higher, at 7.5.

So we can conclude that CGs, at least as they are manifested in 2014, have a noticeable selection bias toward finesse pitchers as opposed to power pitchers--and this explains why the "regression to the mean" here actually goes a bit beyond "the mean." The pitchers who strike out five or fewer batters in their complete games outnumber those who strike out nine or more by about 35%, and their aggregate ERA in their "games after" is 5.92, more than two runs higher than that of the higher-K pitchers (3.63).

The suspicion here is that this effect has (as we said) intensified as complete games have grown significantly more scarce in the last twenty years. We'll go back and look at this for past years, and then we'll see just how consistent this manifestation of "regression to the mean" really is...