Friday, November 18, 2011

QMAX SMOKES OUT THE CYA--NL & ELSEWHERE

That skunky smell you may be noticing is emanating out of the virtual cubicles at SB Nation, where BMOC Rob Neyer is wasting no time in proving our point about the vagaries of sabe-centric dogmatism, as referenced in our most recent post (!!), where we alluded to:

"the strange intractability inherent in the war over WAR, the guerrilla infighting, the race to phantom regions of moral rectitude..."

Now Bill James really is to blame for all of this--it was his high moral tone in the midst of his long-term, intransigent gadfly-ism that set the bar for the knee-jerk "us vs. them" mentality that Neyer and others have absorbed through the pores. Consequently it has made a quest that should have been conducted from the cerebrum into something that continues to be ruled by the cerebellum, despite all of the metamathematical anathemas that the Whole Sick Crew has been conjuring for better and worse in the neo-sabe Iron Age.

Rob is currently tilting at windmills over the selection of Clayton Kershaw as the 2011 Cy Young Award winner. He and his former ESPN colleague, Keith Law, seem to have decided that they know best with respect to which version of WAR (Wins Above Replacement, for those of you who still break bread with a Visigothic splinter group...) is the one to apply to the task of ranking Cy Young candidates.

To say that this is a decision made based on factionalism and politics as opposed to any demonstrable technical knowledge on the part of these two is perhaps a bolder statement than we should make, given that these two are now BBWAA members (and, unbeknownst to themselves, have jumped the shark). But overly bold as it might be, it is the plain and ugly truth.

The plain fact of the matter is that the war over WAR is a pointless one, and the idea that anyone could cite one or the other of the competing formulae as definitive is both insult and injury. Both versions--the one at Forman et fils and the one at Fangraphs--are flawed. Just how flawed is one of the murky embarrassments of the field, because rather than working to clarify the issues involved and possibly leapfrog past the limitations and distortions, we instead have careerist insiders using these tools for their own agendas.

It's ironic that Bill James just wrote an eloquent rejection of the notion of "expertise" as a claim for methodological superiority, only to witness the exact type of behavior he is critiquing rear its head.

It's ironic, but it's not surprising, given the track record of the two men in question. This is what happens when the quest for knowledge gets compromised by careerism.

The idea that Clayton Kershaw's selection as NL CYA is any kind of a blot on the award process, or on sabermetrics, or any similarly phrased journalistic exercise in misdirection, is beyond silly. (We'll address this question in greater detail below.)

Apparently our heroes think that because one version of WAR incorporates BABIP into its calculation and it creates distance between Kershaw and Roy Halladay, this is proof that we have all gone right back down the rabbit hole that we all just climbed out of when Felix Hernandez was awarded the AL CYA last year.

There are a few interesting conceptual problems about BABIP and how it should be adjusted for in a WAR statistic that rarely--if ever--get addressed. The one that seems to elude most of its practitioners is that its slice of pitching statistics is both incomplete and based on half-truths. (For one thing, it's highly ironic that a stat based on batting average has become so pivotal in a field that continues to insist that BA is a woefully inadequate tool.)

Another, possibly more devastating problem is that other slices of pitching performance that may more accurately depict the way in which pitchers prevent run scoring--such as situational pitching--are completely ignored and discarded in the mad rush to a so-called "fielding independent" perspective. Each of these slices is a subset of the data, but one has become inordinately privileged as a result of a series of assumptions that are nowhere near being verified.

On second thought...maybe not.
In the continuing absence of a definitive solution to the war over WAR, there is a need--now more than ever--to revisit other probabilistic modeling methods, particularly for pitching. We've been doing just that here for a while with the Quality Matrix, which--yes--was invented here so many years ago. Rather than simply sum up and perform adjustments on run prevention, it provides a probabilistic basis for winning percentage by creating a bidirectional performance grid that is tied to actual game results.

What's different in QMAX is its willingness to throw away the runs to get at the probabilities of the combined components that result in runs. Its indirectness is upfront, as opposed to the indirectness in the application of WAR for pitchers, which makes a series of murky assumptions about what the "replacement level" of runs allowed is for each individual pitcher. You will chase your tail in trying to reconcile how those replacement level figures are calculated.

You don't have to bother with that for QMAX, because the runs are removed and probabilistic winning percentages are calculated from the thousands of individual games played in each season. BABIP-based systems really slide over the sample size issues in their calculations, figuring (conveniently) that the details of run-scoring don't really matter--their regression model is supposed to handle it all. There is increasing evidence that it doesn't.

Each of those QMAX cells has hundreds, even thousands, of games represented, with probabilistic winning percentages that are linear in their descent from the best games (at the top left of the matrix) to the worst (in the lower right). Let's see what QMAX has to say about the 2011 NL CYA.

First we'll look at four QMAX matrix boxes--for Kershaw, Halladay, Halladay's illustrious teammate Cliff Lee, and Ian Kennedy of the Arizona Diamondbacks. These are the four best starting pitchers in the 2011 NL according to QMAX. The other benefit with this tool is that it gives a graphic presentation of the quality pattern of the individual pitcher.

The measurements on the "S" (hit/XB prevention) and "C" (walk prevention) create a "shape" function that can't be found in other pitching statistics. We'll look at the "shape data" for these four pitchers and what it tells us as we go along.

Remembering that the charts depict the best in the upper left and the worst in the lower right, we can see that these four pitchers had very fine years in 2011.

What's clear from the matrix boxes is that Kershaw and Lee had many more games in the very best area of the QMAX chart, the yellow-shaded area that we call the "Elite Square," where 83% of the games in that region wind up as wins for the team whose starter inhabits it.

What's also clear is that Kershaw and Kennedy were both able to avoid getting "hit hard" (the region on the chart that's shown in orange) to a far greater extent than the two Phillies' aces.

These two veterans inhabit what we call the "Tommy John" region of the QMAX chart (the box at the lower left) with much greater frequency than the two younger pitchers. These are games where hits are plentiful, walks are scarce, and runs saved over probabilistic expectation can be achieved.

We can't simply read the QMAX chart to know which of these four had the best season; we need to compile the QMAX range data to see how the candidates compare. The range data creates totals for each of the regions defined on the chart--the aforementioned "Elite Square", the broader "Success Square" (which many folks have already pointed out, thank you, is not quite a square), the counterintuitive regions in the lower left and upper right (the "Tommy John" and "Power Precipice" regions--that last one may be familiar to you from our look at Jonathan Sanchez recently), and the deadly box in the lower right, the "Blown Start" region (not much in play for the four pitchers here).

The range data shows how well these guys really did. "Success Square" percentages in the 60s and 70s; "Elite Square" numbers in the 20s through 40s (Lee had a magnificent run of these games in the second half of 2011 and wound up at 44% in this category, one of the highest totals in recent memory). Kershaw excelled at avoiding "hit hard" games, with only 6%; Kennedy was very good as well (15%), while the two Phils were much closer to league average in this stat.

In terms of top hit prevention games (measured in the top two rows of the QMAX diagram, the ones referred to as "S12"), Kershaw and Lee had excellent percentages, while Halladay and Kennedy were merely above average.
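
For readers who want to see how range data of this kind could be tallied, here is a minimal sketch in Python. It is not our production code. It assumes a 7x7 grid with 1 as the best score on each axis, and the region boundaries in it are hypothetical placeholders (we name the regions above but don't spell out their exact cells here). The list of starts is made up for illustration and belongs to no actual pitcher.

```python
# Each start is an (S, C) pair: S = hit/XB prevention, C = walk prevention.
# ASSUMPTION: scores run from 1 (best) to 7 (worst) on both axes.
# ASSUMPTION: the region boundaries below are illustrative guesses only.
REGIONS = {
    "elite_square": lambda s, c: s <= 2 and c <= 2,
    "success_square": lambda s, c: s <= 4 and c <= 4,  # treated as a true square here
    "hit_hard": lambda s, c: s >= 6,
    "tommy_john": lambda s, c: s >= 5 and c <= 2,      # lower left: hits plentiful, walks scarce
    "power_precipice": lambda s, c: s <= 2 and c >= 5,  # upper right: few hits, many walks
    "blown_start": lambda s, c: s >= 6 and c >= 6,      # lower right
    "s12": lambda s, c: s <= 2,                         # top two rows
}


def range_data(starts):
    """Return the fraction of starts that land in each named region."""
    if not starts:
        raise ValueError("need at least one start")
    total = len(starts)
    return {
        name: sum(1 for s, c in starts if test(s, c)) / total
        for name, test in REGIONS.items()
    }


# Made-up season of ten starts, purely for illustration.
example_starts = [(1, 1), (2, 2), (1, 5), (3, 3), (4, 4),
                  (5, 1), (6, 6), (7, 2), (2, 1), (3, 6)]

for region, share in range_data(example_starts).items():
    print(f"{region:16s} {share:.0%}")
```

Note that the regions overlap by design: an "Elite Square" game also counts toward the "Success Square" and "S12" totals, so the percentages are not expected to sum to 100.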


We can rank these for each pitcher relative to the others, assigning points to the relative values: the bright orange worth three points, the pale orange two, and the yellow one. That adds up to what we call the "Quality Range Score" (QRS). It's just a crude indicator, but it might well be suitable for breaking a tie or moving a close contest in one particular direction. Kershaw has the advantage here.
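
A rough sketch of that point-assignment idea in Python follows. It rests on one reading of the scheme, which is an assumption: within each range category the best of the compared pitchers gets three points, the second-best two, the third-best one, and anyone else none. The pitcher labels, category names and percentages are invented for illustration and are not the 2011 figures.

```python
POINTS = (3, 2, 1)  # ASSUMPTION: best, second, third; everyone else gets 0


def quality_range_score(table, lower_is_better=()):
    """Sum rank-based points per pitcher across categories.

    table maps category -> {pitcher: value}. Categories named in
    lower_is_better (e.g. "hit hard" rates) are ranked ascending.
    Ties are broken by dictionary order, which is a simplification.
    """
    scores = {}
    for category, values in table.items():
        for pitcher in values:
            scores.setdefault(pitcher, 0)
        best_first = sorted(
            values, key=values.get, reverse=category not in lower_is_better
        )
        for pitcher, points in zip(best_first, POINTS):
            scores[pitcher] += points
    return scores


# Invented numbers for four hypothetical pitchers.
example_table = {
    "elite_square": {"A": 0.40, "B": 0.30, "C": 0.25, "D": 0.20},
    "hit_hard": {"A": 0.06, "B": 0.20, "C": 0.15, "D": 0.22},
}

print(quality_range_score(example_table, lower_is_better={"hit_hard"}))
```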

The summary stats for QMAX are the "S" and "C" averages, which add up to a total ("T") ranking. (The lower the "T" score, the better--just like ERA.) By using the probabilistic win percentages or values assigned to each cell in the matrix (these are called QWVs), we can calculate the pitcher's overall quality value, his Quality Winning Percentage (QWP).
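
The arithmetic behind those summary numbers can be sketched in a few lines of Python. Two loud caveats: the win-value function below is a made-up placeholder that simply declines in a straight line from the top left of an assumed 7x7 grid to the lower right, not the actual QWV table; and the starts are invented. Swap in real cell values and real game scores to get real answers.

```python
from statistics import mean


def placeholder_qwv(s, c):
    """HYPOTHETICAL win value for cell (s, c) on an assumed 7x7 grid.

    Falls linearly from 0.90 at (1, 1) to 0.10 at (7, 7). These are
    stand-in numbers, not the historical QWVs.
    """
    return 0.90 - 0.80 * (s + c - 2) / 12


def summarize(starts, qwv=placeholder_qwv):
    """Return the S and C averages, their total T, and the QWP."""
    s_avg = mean(s for s, _ in starts)
    c_avg = mean(c for _, c in starts)
    return {
        "S": s_avg,
        "C": c_avg,
        "T": s_avg + c_avg,                       # lower is better, like ERA
        "QWP": mean(qwv(s, c) for s, c in starts),  # average win value per start
    }


# Invented (S, C) scores for four starts.
example_starts = [(1, 1), (2, 2), (3, 1), (2, 4)]
print(summarize(example_starts))
```

Passing the win-value lookup in as a parameter is deliberate: the averaging stays the same no matter which table of cell values is plugged in.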

Remember, we are pushing back against actual wins and losses here, as represented in the linkage between each performance cell and the historical results in that cell. What is lost by ignoring questions of "fielding independence" is offset by a grounding in both probability and reality.

The result is "abstract," but so is FIP. We can adjust the "T" value to an ERA construct if it makes you feel more at home, but we haven't bothered. What you need to know is that any "T" score below six is an ace, anything below five is an historic season. (Pedro Martinez scored 2.00 "S", 1.90 "C"/3.90 "T" in 2000, which was pretty darned historic.)

What QMAX shows us is a narrow lead for Kershaw in both the "T" score and in the QWP. What it does is reinforce the widely held impression among intelligent folks (those who are undraped in questionable "expertise" and sportswriterly posturing) that the race between Kershaw, Halladay and Lee was extremely close.

Notice that we are not discussing pitching triple crowns here, or any other stats. QMAX is agnostic concerning strikeouts. Based on the down-at-the-game level probabilities, and without recourse to any of the ideological puffery of a Law or the grandstanding bully pulpit of a Neyer (OK, we're partially guilty on that one...), we figure that it's actually OK to vote for Clayton Kershaw with a (relatively) free conscience. (The relativity has more to do with what else you've been up to.)

We sincerely wish that our two fine feathered "friends" would quit leading us on a wild goose chase. However, it is migration season and the skies are pretty crowded. No one would blame you if you hauled out the shotgun and took aim at the sky. Just don't take aim at our two friends here, who seem to think they are flying high up in the air, but are actually stuck on the ground.