Tuesday, June 14, 2011

GEE, THIS CAN'T CONTINUE...RIGHT?

Dillon Gee (Hong Kong crossover actor or porn star?) is 7-0.

Now, no, he can't be Abb Vaughn (go back aways and you'll get the reference...) but he is one of the few genuinely bright lights in the New York Mets' rather soggy 2011.

There is a lot of skepticism out there in the over-numerated world of baseball anal-y-sis concerning Dillon, however. It seems that he's living some kind of charmed life according to a series of "advanced metrics":

--Fangraphs shows that his FIP (fielding independent pitching, a formula that supposedly measures the portions of a pitcher's performance not colored by the fielders who surround him) is a good bit higher than his actual ERA.

--He's giving up far fewer HR than would be expected given his performance in the minors.

--He's having the luck of ten Irishmen (actually 9.68, but what's one-third of an Irishman among friends??) with respect to line drives: according to the data at Forman et fils, opposing hitters are batting only .484 on line drives against him, as opposed to the league average of .718.

On the basis of this type of data, there's actually a sportswriter in New Jersey who suggested that the Mets trade Gee now while he's hot. (Perhaps the ultimate endorsement for the "wins mean nothing" approach, but somehow eating cake and having it too, as it assumes that insiders--saturated with numbers-numbers--can't quite grasp the concept and will fall for the sabermetric variant of the Indian rope trick.)

What's clear from the rest of the data is that Gee is getting great run support (5.7/g) and he's pitching extremely well in his home ball park (1.77 ERA thus far.) What's equally clear is that he's not going to keep winning at this pace.

What we want to find, however, is as simple a method as possible for looking at young starting pitchers after "x" number of games and seeing if the data can tell us anything about how their careers will unfold. Dillon Gee has been in just seventeen games (only 14 of which have been starts). What happens when we look at the young pitchers over the past 5+ years (2006-2011), freeze their stats after 17 games (with at least 10 GS), and see what they've done subsequently?
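For anyone who wants to play along, the "freeze" step can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the GameLine record and the freeze_after function are hypothetical names invented for illustration, and the game logs are assumed to be in chronological order with innings stored as true decimals (6.1 innings entered as 6.333..., not 6.1).

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    """One appearance from a pitcher's game log (hypothetical structure)."""
    started: bool  # True if the appearance was a start
    ip: float      # innings pitched, as a true decimal
    h: int         # hits allowed
    bb: int        # walks allowed
    k: int         # strikeouts
    hr: int        # home runs allowed


def freeze_after(games, n_games=17, min_starts=10):
    """Total a pitcher's first n_games appearances.

    Returns None when the pitcher has fewer than n_games appearances,
    or fewer than min_starts starts within that window.
    """
    if len(games) < n_games:
        return None
    window = games[:n_games]  # everything after this point is ignored
    starts = sum(1 for g in window if g.started)
    if starts < min_starts:
        return None
    return {
        "g": n_games,
        "gs": starts,
        "ip": sum(g.ip for g in window),
        "h": sum(g.h for g in window),
        "bb": sum(g.bb for g in window),
        "k": sum(g.k for g in window),
        "hr": sum(g.hr for g in window),
    }
```

The 17 and the 10 are left as parameters because, as admitted here, the cutoff was picked only because that's where Gee happens to be.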

The big objection, of course, is that it's not enough time to know anything. In the immortal words of Eric Blore: bumblepuppy. Pitchers rise to great heights, or establish a level of major league competence, or regress/keep stumbling around in relatively short order, and things tend to shake out more logically than is often thought to be the case (even when applying "advanced metrics"). And it seems that we can get a handle on this when using only 17 games (yes, we picked it because that's just where Dillon Gee is at the moment) and at least 10 starts.

But how do we come up with a sorting mechanism that isn't already determined by results? Meaning: not by won-loss record (completely discredited), WHIP (mostly discredited), ERA (on the list of things to be discredited) or FIP (which is a "result" proxy, so we don't want to use it for this purpose; we want to let someone compare it to an approach built from a different set of assumptions).

Well, we do it by taking some basic rate stats that are considered to be significant these days, such as K/9, K/W, and adding back in WHIP (eyebrows rise in consternation even as we type this), plus a few more arcane but easily computed stats such as HR/IP, and what back in the days of BBBA we liked to call POW (K/H).

[Image caption: If you read the text carefully, this reference will make "sense"...if you don't and it doesn't, just dig Neo's cape and practice your kewpie-doll face in the mirror....]

If you want to try the formula at home, first get some additional insurance, and then put this into Ye Olde Pipe and smoke it:

((K9*KW*POW)*(.3*IPHR))/WHIP

Yes, it's kind of a bathtub gin version of TangoTiger's construct, only it's regression-free and contains 50% less fat than the usual tub of theoretical goo that's in the refrigerated section of a neo-sabe's ideological shopping cart.
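For those who would rather not do the arithmetic by hand, here is a minimal Python sketch of the formula above. Two things in it are assumptions and not gospel: the function name xk_score is made up for illustration, and IPHR is read as innings per home run (IP/HR), which is the reading that rewards keeping the ball in the park. K9, KW and POW follow the definitions in the text (K/9, K/W, K/H).

```python
def xk_score(ip, h, bb, k, hr):
    """Compute ((K9*KW*POW)*(.3*IPHR))/WHIP from raw totals.

    ip: innings pitched (true decimal), h: hits, bb: walks,
    k: strikeouts, hr: home runs allowed.
    Assumes IPHR means IP/HR; that reading is an inference, not
    something spelled out in the text.
    """
    if ip <= 0 or h <= 0 or bb <= 0 or hr <= 0:
        # Every one of these ends up in a denominator somewhere.
        raise ValueError("ip, h, bb and hr must all be positive")
    k9 = 9.0 * k / ip     # strikeouts per nine innings
    kw = k / bb           # strikeout-to-walk ratio
    pow_ = k / h          # POW: strikeouts per hit
    iphr = ip / hr        # innings per home run (assumed meaning)
    whip = (bb + h) / ip  # walks plus hits per inning
    return (k9 * kw * pow_) * (0.3 * iphr) / whip


# A made-up stat line, purely for illustration (not any real pitcher):
example = xk_score(ip=90.0, h=80, bb=30, k=60, hr=10)
```

A pitcher with zero homers or zero walks in the sample breaks the division, so the sketch refuses those lines instead of guessing at how they ought to be handled.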

To make a long story short, we take this value and matricize it against the WHIP value. Why do we do that? Because we want to, that's why. No, actually, it's to create a set of gradations based on the comparison of the most basic non-run-based measure (WHIP) with a set of values that try to simulate a variant of the "fielding independent" suite of stats. Once we do that, we can break each group out into quintiles, and see what we get.
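One way to read the matricizing step, sketched in Python: rank every pitcher into quintiles on the formula value and, separately, on WHIP, then pair the two numbers into a box on a 5x5 grid. This is a guess at the mechanics, not a description of the actual spreadsheet. The function names are hypothetical, quintile 1 is assumed to be the best fifth, and the order of the pair (formula quintile first, WHIP quintile second) is likewise an assumption.

```python
def quintile_ranks(values, higher_is_better):
    """Assign each value a quintile from 1 (best fifth) to 5 (worst fifth)."""
    n = len(values)
    # Indices sorted from best to worst.
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0] * n
    for position, idx in enumerate(order):
        ranks[idx] = position * 5 // n + 1
    return ranks


def xk_whip_grid(pitchers):
    """Place pitchers on a 5x5 grid.

    pitchers: list of (name, formula_value, whip) tuples.
    Returns {name: (formula_quintile, whip_quintile)}.
    """
    xk_q = quintile_ranks([p[1] for p in pitchers], higher_is_better=True)
    whip_q = quintile_ranks([p[2] for p in pitchers], higher_is_better=False)
    return {p[0]: (xq, wq) for p, xq, wq in zip(pitchers, xk_q, whip_q)}


# Ten made-up pitchers, purely for illustration:
sample = [(chr(ord("A") + i), 20.0 - 2 * i, 1.00 + 0.05 * i) for i in range(10)]
grid = xk_whip_grid(sample)
```

With a small group the fifths are lumpy, so the boxes only start to mean something once the pool is reasonably large.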

And what we get from this is kind of interesting. Looking at the chart of pitchers from 2006-2011 in their first 17 games (with at least 10 of them GS), we see a virtually linear descent of career evolution that radiates into the future. While it's not by any means perfect, it's still surprisingly predictive of who will be really good, who will be fair to middling, and who will be borderline at best (we like to call it "Meh", as you'll see below).


The pitchers who begin their careers with solid WHIP built into their "XK" stats (the formula designed to mimic the fielding independent data by emphasizing K/9, K/W, and the ability to keep the ball in the park) tend to roll onto a very solid level of performance (read the names in red--a few of these could probably be moved around, but it seems like a reasonably viable list...you can quibble about Dice-K if you want, but in his first two seasons with the Sox he was quite a successful pitcher. If you want to keep quibbling, we can just take the older pitchers and the Japanese pitchers out of the data and it really doesn't affect the results.)

There does seem to be a significant anomaly here, of course--some types of pitchers come up and struggle before making a big leap forward. Interesting how many of the guys over in the "5" column who turned it around despite semi-inauspicious beginnings happen to be lefties.

[Image caption: Rick Reed: the same number as Gee, but not really the same guy...]

So what about Dillon Gee? That's what all this hoo-hah was about. Well, he's in the 3,1 box. That's a place where pitchers do go on to solid success, but it's not a slam dunk. The question that many are asking is whether Gee's pitching profile, pitch selection strategy and execution have evolved since reaching the big leagues. The short form of that question is being stated thusly: is Dillon Gee the second coming of Rick Reed?

The answer: no, he's not. He doesn't have Rick's pinpoint control. He can probably strike out more guys if he keeps his change-up in good form. If he keeps the ball down, he can keep the ball in the park (something that Reed didn't really do).

Can he do all that? Signs are promising that he just might. He's pitched well against good teams (2.92 ERA), but he hasn't faced all the good hitting teams in the NL as yet. There is going to have to be some level of regression due to that low batting average on line drives compared to the league, but the effect of that might not be all that dramatic if he can avoid feeding his gopher.

(We're going to go back and test this method with earlier pitchers; it will probably show up here at some point...when you least expect it, of course.)