Tuesday, November 22, 2011

A FLUKE FOR ALL TIME, or: QMAX AND STARTING PITCHER MVPs


We weren't quite expecting Justin Verlander to become a double award winner this year, but the BB-WAA has had its way with us. Rather than remain doctrinaire, we figured it might be worth using the Quality Matrix (QMAX) to examine Verlander in the context of those starting pitchers who've won the MVP award.

In our previous entry we showed how closely bunched four NL pitchers (Clayton Kershaw, Roy Halladay, Cliff Lee, and Cole Hamels) were in the QMAX data. Given that QMAX is a 7 x 7 bidirectional matrix, it's probably not surprising to you that the aggregate average in any given year is somewhere around 7--which is right at the dead center of the matrix chart. As offensive levels change, that average fluctuates up and down--from a low of 6.81 in 1968 to a high of 7.59 in 2000.

So you can see that scores around 5 are excellent and usually match up well with an ERA+ between 145 and 160. The fluctuations in ERA+ generally have to do with individual factors--flukes of clutch pitching, or extremely low distributions of extra-base hits, etc.--that pull away from QMAX's large-scale probabilistic centrifugal force.

OK, enough of that, let's try to contextualize Verlander using QMAX. How does his season compare to the other starting pitchers who either won both the CYA and MVP, or won the MVP prior to the creation of the CYA, or who won the CYA and were strongly ballyhooed for the MVP?

Let's cut right to the QMAX value chart, the basic QMAX average and the Quality Winning Percentage (QWP).

Don Newcombe in his heyday...

We can see that while Verlander ranks seventh in the overall QMAX score and sixth in QWP, his "T" score of 4.92 and his .725 QWP are well within the range of performance where pitchers have been awarded MVPs as well as CYAs. (Don Newcombe, the first pitcher to win a CYA, and who also won the MVP award that year in large part due to his winning 27 games, has the lowest QWP of any of these pitchers.)

Verlander's 2011 season, from the basic QMAX data, looks like pretty much a dead ringer for Roger Clemens' 1986 season.

But this is where we can bring in QMAX's "shape" component to add context to the basic data. The QMAX range data is extremely suggestive in providing us with a series of percentages for performance criteria within the expanse of the QMAX matrix chart.

We want to look at the ranges within the QMAX chart that seem to have the greatest "range" (distance from the best score to the worst from among pitchers in this most distinguished sample). When we examine the QMAX range chart, we can see that the range categories that show the most fluctuation are the "Elite Square" (ES) and the "Hit Hard" (HH) sector.

We can see that with respect to those stats, Verlander is again down in the pack a bit. His ES score is seventh best and his HH percentage, while excellent (the average AL pitcher was hit hard in 30% of his starts during 2011), is tied for sixth in this rarefied company.

We follow with a whole series of QMAX charts for these pitchers, and we will conclude by adding the QMAX data for three pitchers not on the current list--Ron Guidry and his storybook 25-3 season with the Yankees in 1978; Pedro Martinez' best-ever season in 2000 (even though most would expect us to be looking at 1999 instead, when Pudge Rodriguez beat him out for MVP); and Zack Greinke in his much ballyhooed 2009 campaign.

What these charts mostly tell you is that great pitchers have very similar success patterns. They may flip-flop on their top hit prevention (S12) games--some have more in the "1" (most dominant) area, some have it in "2", but these games constitute at least 50% of all their starts. In the case of Bob Gibson and his legendary 1968 season, that figure breaks 75%.
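For readers who want to see how a range percentage like the S12 share is tallied, here is a minimal sketch. It assumes each start is logged as an (S, C) pair with S (hit prevention) listed first; the function name and the sample game log are hypothetical and are not taken from any pitcher's actual data.

```python
def s12_share(starts):
    """Fraction of starts whose S (hit prevention) value is 1 or 2."""
    top = sum(1 for s, _c in starts if s in (1, 2))
    return top / len(starts)

# Hypothetical six-start log of (S, C) cells, for illustration only.
sample = [(1, 2), (2, 3), (2, 1), (4, 4), (6, 5), (1, 1)]
print(f"{s12_share(sample):.0%}")  # four of six starts land in S12
```

By the yardstick used here, a log clears the "great pitcher" threshold when this share is 0.50 or better.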

Denny McLain isn't really given all that much credit for his achievements in 1968--the modern low point in run scoring has become a bit exaggerated, and the campaign against the value of wins has also caused many to put aside his 31-win season. (It's now been longer since McLain achieved this feat in 1968--forty-three years and counting--than it was between McLain and Dizzy Dean, who did it in 1934. It will probably be a whole lot longer before anyone does it again.) He also suffers in comparison to Gibson's incredible achievement.

Bobby Shantz--a true "pocket ace" in 1952

Of the three pre-1960 pitchers on this list (Newcombe, Bobby Shantz, Hal Newhouser) that we chose to include, it's Shantz' season that is the most notable--if only for the fact that Bobby was one of the tiniest aces in baseball history. (Forman et fils lists him at 5'6" and 139 lbs.--now that's not just tiny, that's virtually nonexistent.)

A look at Shantz' game logs in 1952 shows that he was on his way to a sub-4 QMAX season as late as August 22nd (when his record was 22-4, 1.81), but he just didn't have the stamina to sustain such an effort over a full season and he faded badly in September. (Five of the seven games that Shantz had in the "HH" category came in his last ten starts of the season, a sign that the little lefty was simply gassed. And, of course, he never came close to duplicating his 1952 performance.)

That leads us to the three might-have-been MVPs--Guidry, Martinez, and Greinke. When we look at the basic data, we start to get a sense that something is out of order with one of these guys. Whereas nobody in the original CYA/MVP list shows up with a "T" score higher than 5.1 or a QWP lower than .680, all of a sudden we have one "legendary" season--Greinke's 2009--that looks more than a little peaked.

We'll get back to that in a while. What's clear from the rest of the data above (and in the associated QMAX range data) is that Guidry was right in the pocket for the double trophy.

Pedro Martinez: in 2000, that upwardly-pointed finger gave him direct access to whatever celestial deity floats your boat...

And Pedro's 2000 season, despite being lower in wins (18-6 vs. 23-4 in 1999), is the truly killer year for him, with only Gibson's 1968 being in its gunsights. He holds the record for all of the QMAX range categories (save the "TJ" and "PP" ranges, shown in green because they are mostly descriptive and not a direct measure of performance).

Now you may be wondering why Pedro's basic QMAX score (3.79) is higher than Gibson's (3.68). The answer is that the winning percentages for each QMAX cell are different in 2000 than they were in 1968. That difference across the entire matrix means that the value of a 1,1 game is a few points of WPCT higher in 2000 than was the case in 1968. Thus QMAX is also era-adjusted (taking away another of the original objections that surfaced when this method was introduced in the mid-90s). While the basic S, C, T numbers do not change, the "win values" for the cells do fluctuate from year to year. The effect is not as striking as some of the other "sabe-centric" adjustments that have become popular, but it does correct for run-scoring levels appropriately.
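To make the era adjustment concrete, here is a rough sketch of how the same set of games can carry one basic score and two different win values. It rests on assumptions that are ours for illustration: that "T" is simply S plus C for each start, and that QWP is the average of the per-cell winning percentages for the season in question. The win values below are invented placeholders, not the actual 1968 or 2000 cell figures.

```python
from statistics import mean

# Hypothetical per-cell winning percentages (made-up numbers, two cells only).
WIN_VALUES = {
    1968: {(1, 1): 0.930, (4, 4): 0.480},
    2000: {(1, 1): 0.950, (4, 4): 0.500},
}

def basic_t(starts):
    """Average of S + C over a list of (S, C) starts (assumed definition of T)."""
    return mean(s + c for s, c in starts)

def qwp(starts, season):
    """Average of the season-specific win value for each start's cell (assumed)."""
    table = WIN_VALUES[season]
    return mean(table[cell] for cell in starts)

starts = [(1, 1), (1, 1), (4, 4)]  # identical game log in both seasons
print(basic_t(starts))             # the basic score does not depend on the year
print(qwp(starts, 1968), qwp(starts, 2000))  # the win value does
```

The point of the sketch is only the shape of the calculation: the cell coordinates stay fixed, while the lookup table behind them moves with the run-scoring environment.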

But let's get back to Greinke. This is a guy whose 2009 season has been spoken of in hushed tones by a large coterie of baseball folks. An adjusted ERA (ERA+) of 205, for Crissakes. How can he be showing up so poorly in comparison to the rest of these guys? Surely that means that QMAX is full of it, nicht wahr?

Sorry, the answer is "nein." Remember that we've always pointed out that QMAX is a probabilistic system. While it does correct for XB/H in the "S" value, it doesn't attempt to be as precise as what all the other systems do when they simply start with runs. What QMAX does is tell you that, with all reasonable conditions controlled for and with baserunner strand rates assumed to fall within relatively narrow range parameters, this is what you can expect from the hit and walk prevention figures that it computes.

So why does it vary so much with respect to Greinke? Well, there are some odd aspects to Zack's 2009. It turns out that Zack had an incredibly hot start (0.94 ERA and 8-1 record in his first ten starts) and a blistering finish (5-0 and a 1.29 ERA over his last eight starts). In between those two streaks, however, he was just about a league average pitcher (3.66 ERA and a 3-7 record).

Let's look at the QMAX range data for Zack as it maps out for the 18-game "Buddha" period in 2009, and for the 15-game "Bubba" period. Remember, Greinke has a 1.29 ERA in that first group, a figure that should by all rights be producing a sub-4 QMAX "T" score a la Gibson and Pedro, but instead is coming out in the high 4's--great, but nowhere near the "godhead" level.

In the other 15 games, he's a very hittable pitcher with good control, and he's just barely more than a .500 pitcher. This evaluation is supported by the range data, which shows that Greinke had 55% of an historic season in 2009, and 45% of a season where he was--in the immortal words of Zbigniew Brzezinski: "Meh."

The Brzezinskian "grand chessboard" might be an excuse for the strategic deployment of dirty bombs, but it's no match for the geopolitical pitching intricacies as they are laid out in the QMAX matrix chart...

But so what, you say? The adjusted ERA is what matters, right? And the fact that his other measures--his BABIP, for example--don't point to great luck? Who cares? You don't even want to use runs in this crazy system!

Ah, but that's the point. A counter-intuitive system needs to have something that it uses as a fulcrum. FIP-based stats use BABIP as that fulcrum, pretending that it's random enough to operate that way, preferring to believe that the slice of data that it uses is somehow sufficient to leach out all the "fielding luck."

In QMAX that fulcrum is the strand rate, which can either be measured directly or can be simulated (just as BABIP acts as a proxy in that system for focusing only on balls in play) by looking at the difference in batter vs. pitcher OPS in general and a key subset of that measure--batter vs. pitcher OPS with men on base.

You want to argue as to which one is more "valid"? Are any of these measures necessarily more "valid" than any others? Take a shot at it. Take your best shot. What might just be mind-opening is to take a look at the deviation in those two batter vs. pitcher OPS figures and see if they point in one particular direction, and whether Greinke might just have had one of the most aberrantly brilliant seasons in baseball history.

The chart at right shows those figures for the twelve pitchers we've looked at in this study. As you can see, while there are some differences, the major trend for these pitchers with respect to batter vs. pitcher OPS is for that figure to rise when men are on base. There are three exceptions to this: Guidry, Pedro, and--

Oh my god. Look at that outlier for Greinke. The average differential for this comparison over all of MLB in 2011 is -3.4% (.780 OPS with men on vs. .754 OPS overall).

Greinke's batter vs. pitcher OPS with men on base in 2009 was about 20% better than it was overall.
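For anyone who wants to check the arithmetic behind that differential, here is a small sketch. The sign convention and the choice of denominator are inferred from the MLB figure quoted above (a negative number means hitters do better with men on base), and the function name is ours. The second example uses made-up OPS values for a hypothetical pitcher, not Greinke's actual splits.

```python
def ops_differential(ops_overall: float, ops_men_on: float) -> float:
    """Percent by which OPS allowed with men on is lower (+) or higher (-) than overall."""
    return (ops_overall - ops_men_on) / ops_overall * 100.0

# MLB 2011, using the two figures quoted in the text.
print(f"{ops_differential(0.754, 0.780):.1f}%")  # -3.4%

# Hypothetical pitcher who tightens up with men on (illustrative values only).
print(f"{ops_differential(0.600, 0.480):.1f}%")
```

A pitcher sitting near the league figure is behaving the way QMAX expects; one who lands around +20% is far outside it.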

That is how his ERA ended up being so low for the year. It was a year of pitching unconsciously when men were on base. This is the textbook extreme of how one maximizes performance elements into a season of overachievement.

Sure, we can give Greinke credit for his great and sustained clutch performance over 2009. And let's face it, the fact that he was in Kansas City that year made it certain that he would get adulation from the sabe-centric world. Joe Posnanski, maybe the key fence-straddler in the mainstream media with his ties to Bill James and his folksy, aberrantly experimental writing forays--and his unique access to the socially challenged Greinke--made absolutely sure of that. Quirky anti-folk hero? Check. Struggling small market team? Double check. All "sympathetic trope systems" were "go." Pos's brilliant proselytizing for Greinke, timed to the tail end of his first hot streak, opened the door for many other sportswriters, and there was undoubtedly a domino effect when it came time for the CYA voting. Zack didn't hurt his chances by getting white hot down the last six weeks of the season, either.

Greinke in 2009: a dominance created by a fast start, historic clutch pitching and a brilliant tout from Joe Pos...not from his hit prevention!

But what QMAX tells us is that it was a glorious fluke season of overachieving greatness, propelled by a stat that's much more of an outlier than any of the BABIP data in the "fielding independent" cigar box.

And you will note that Greinke has never come close to doing that, either before or since.

In fact, usually the greater the pitcher, the closer these two OPS figures cling to one another.

QMAX assumes that this will be the case. Therefore, it stubbornly--and correctly--suggests that Greinke's true value in 2009 was a good bit lower than what the ERA/ERA+ (and WAR data) suggest.

Now this isn't going to help get QMAX accepted in the little sabe-centric world. They like their myths, especially when the myth looks empirically, walks empirically, and talks empirically like a duck. After all, that 205 ERA+ is real: it really happened. The fact that it may be part of a different type of illusion, one that has yet to be recognized, is unlikely to register at this time. The underlying feeling from this piece for many will be that we're trying to "take down" Greinke--who, having moved to another midwest franchise that actually made its move into the playoffs, is still benefitting from the original halo effect. And that's not going to sit well.

But keep in mind that Greinke has never come close to that season. His next best ERA+ to that 205 in 2009 was 126 the previous year. The fact that his QWP was actually only .640 in 2009 tells us that he really hasn't fallen as far from his "peak" as is commonly thought. His .566 QWP in 2011 could well be his true level--unless he goes unconscious with men on base again.

But chances are that he'll always be thought of as someone who showed a singular glimpse of unalloyed greatness. After all, that's kinder and gentler (and more in keeping with this most empirical of myths) than calling it a "fluke."

How about we just agree to call it the "greatest and grandest fluke in the history of baseball"? That sounds more positive, to be sure--and it also happens to be the stone cold truth.