Wednesday, February 6, 2019


Most of you know that Bill James has his own site, where he and a crew of significantly lesser lights crank out content, most of it driven by Bill's wormhole-like gaze into the desiccated remains of sabermetric inquiry. Thanks to the continuing double-dribble of the Tango Love Pie™ (indisputably the Donald Trump of neo-sabermetrics, who's worked hard to pre-empt James in subtle--and not-so-subtle--ways), the "dialogue" between Bill's pet method for assigning a "single number" to player performances (Win Shares) and the tortured, mangled, compromised Wins Above Replacement has resulted in a series of tragic, retrograde "linkages" between concepts now wielded more as political tools than as vessels of actual insight.

Recently Bill created a new Hall of Fame "projection system" which purported to bridge the "gaposis" between these two "tools of overreach" by force-fitting them, Jay Jaffe-style, into a silly set of gradations that ignored the essential incompatibility of the two methods. The results added little to nothing to currently existing approaches to that knotty problem, which (as noted last time) requires a lot more nuance than any of these fine feathered folk are willing to provide for it.

But that's NOT where we're going here, actually. We're here today to address a totally different topic--a side issue that stemmed from Bill's most recent effort to elasticize his pet method (Win Shares) into a slippery new-old vernacular (a refashioned variant of "winning percentage"). Critiquing that unfortunate effort--which, like so much of what Bill works on in his dotage, needs much more space than he's either willing or able to devote to it--will have to wait for another time, but suffice to say that it's yet another unfortunate concession to modeling imperatives that produces no actual/practical result with genuine utility in the real world (and by that we also mean the current "real world" inside baseball itself, which is increasingly beset by measures that lead us down multiple rabbit holes simultaneously).

In the midst of that effort, Bill--as he's often wont to do--suddenly shifted gears and moved from theory to colloquialism (a writing strategy that he's over-employed to the point where it's long since teetered over into self-parody; folks are so used to this writerly tic of his that they've become unable to distinguish between legitimate and gratuitous uses of this furtively defiant sleight-of-hand).

In this instance, the subject shift was to the Yankees' rookie third baseman Miguel Andujar, who had an impressive power year (even in the context of another semi-absurd "power year" for baseball) as a 23-year-old in 2018: 27 HR, 47 doubles, .527 SLG. These are all solid numbers: his OPS+ is also solid at 126. All very respectable, even if he's overly aggressive at the plate (just 4% base-on-balls percentage, and an extremely low OBP/SLG ratio), a factor that often retards further offensive development.

Bill, however, decided to enter into his "blurt mode" once he'd encountered Andujar's name on his team-by-team "winning percentage" lists. He gushed out a statement to the effect that Andujar could hit 400 lifetime HRs. He offered no accompanying context whatsoever. When challenged by several skeptical readers, he added nothing but his own first-hand observation of Andujar and the admonition that he trusted his own judgment more than those who challenged his assertion.

Of course, we've all become used to such discussion in "chat" situations by now; by the same token, Bill's use of social media has, over time, become aggressive to the point of bellicosity (an unfortunate sign of the times we live in that's difficult for anyone to successfully avoid). But rather than dwell on that issue, it suddenly became clear that a better approach would be to find how to place that remark into some kind of actual historical context.

And that led to an effort to concoct, just as Bill has done himself on so many occasions, an intriguing little jackleg study that could put Andujar's 23-year-old 2018 season into a perspective capable of generating a range of lifetime HR predictions. To do so, it was also desirable to avoid the overused and not very reliable chestnut of projection tools that Bill had developed back in his pre-dotage days: the Brock2 system (or Brock6, or 201.1, or whatever "final version number" it had stalled at back in the days of yore before it wound up in the lost universe of floppy disks). If Bill had dusted it off for such a projection, it probably would have produced a lifetime 400 HR projection for Andujar; but given its set of assumptions and the almost comically optimistic results it produces, it clearly makes sense to omit any reference to it.

No, there had to be a better way than that. And so, after quickly soaking my head (at last following the kind long-term advice from such a sizable plurality of you...) I struck upon a way to do so. The control study would be to create a list of 23-year-old hitters with 20+ HRs and with an OPS+ tightly in the range of what Andujar had posted in 2018. (Since his was--remember?--126, the search range was 125-129.)
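For anyone who wants to rerun the search, the three criteria (age 23, 20+ HRs, OPS+ from 125 to 129) boil down to a simple filter. What follows is only a minimal sketch: the `Season` record, the `comparable_seasons` function, and the four sample rows are all hypothetical placeholders, not the actual query or data behind the list.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str
    age: int
    hr: int
    ops_plus: int


def comparable_seasons(seasons, age=23, min_hr=20, ops_plus_range=(125, 129)):
    """Return the seasons that match the age, the HR floor, and the OPS+ window."""
    low, high = ops_plus_range
    return [
        s for s in seasons
        if s.age == age and s.hr >= min_hr and low <= s.ops_plus <= high
    ]


# Placeholder rows for illustration only; these are NOT real player seasons.
sample = [
    Season("Hitter A", 23, 27, 126),  # passes all three tests
    Season("Hitter B", 23, 18, 127),  # too few HRs
    Season("Hitter C", 23, 31, 140),  # OPS+ outside the window
    Season("Hitter D", 24, 25, 126),  # wrong age
]

matches = comparable_seasons(sample)
print([s.player for s in matches])
```

Only "Hitter A" survives, which is the point: the narrow OPS+ window keeps the comparison group close to Andujar's actual level of production rather than merely to his HR total.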

As you can see in the table below, this actually produced a robust little list of players--17 in all, beginning with Harlond Clift in 1936. Adding to the seasonal data for each player, we focus on HRs hit prior to age 24 (15 of the 17 on the list had played at least a significant portion of a season or seasons as a younger player), followed by HRs hit from age 24 on.

This data permits us to create an average expected career HR total for the group, ranging from the highest achievers (Jim Thome, with 20 HRs in 98 games during his age-23 year in 1994, who wound up with 612 lifetime HRs; Andruw Jones, with 36 HRs in his fifth major league season at age 23, whose career total was 434) to those who flamed out (Ellis Valentine, Tommie Agee, Carlos Baerga, Billy Butler, Nate Colbert, Clift, and Cesar Cedeno, all of whom wound up with fewer than 200 lifetime HRs). It was clear from this initial list that Andujar was in a group that would produce far fewer than 400 HRs--35% fewer, in fact: the group's average lifetime HR total was 257.
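The gap between the 400 figure and the group average is simple arithmetic, sketched here under the assumption that 257 is the rounded group mean. The `shortfall` helper is just an illustrative name.

```python
def shortfall(target, projected):
    """Fraction by which `projected` falls short of `target`."""
    return (target - projected) / target


claimed_hr = 400        # the lifetime total Bill floated
group_average_hr = 257  # average lifetime HRs for the 17-player list

gap = shortfall(claimed_hr, group_average_hr)
print(f"{gap:.1%}")
```

The result lands a little under 36%, in line with the roughly 35% quoted above.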

But looking at that list again, and adding some more precise parameters, we can refine our projection in a way that better takes into account Andujar's power profile.

We like to measure "power profiles" by using a stat we call ISOBA. (You might recall it from earlier posts: ISOBA measures the ratio of isolated power to batting average. The higher it is, the more power-based the hits that are being produced.)
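For those who want to check the arithmetic, here is a minimal sketch of the calculation, taking isolated power in its usual form (SLG minus batting average). The `isoba` function name is ours for illustration. Andujar's .527 SLG is quoted in this post; the .297 batting average is an assumption taken from his public 2018 line.

```python
def isoba(slg, ba):
    """Isolated power (SLG minus BA) divided by batting average."""
    if ba <= 0:
        raise ValueError("batting average must be positive")
    return (slg - ba) / ba


# .527 SLG comes from this post; .297 BA is an assumed value from the public record.
andujar_isoba = isoba(slg=0.527, ba=0.297)
print(round(andujar_isoba, 3))  # 0.774
```

A hitter whose SLG is exactly double his batting average would come in at 1.000; a pure singles hitter would sit at zero.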

Andujar's ISOBA is .774, while most of the players on the list (including those with fewer than 200 lifetime HRs) had a significantly lower ISOBA value. So, to better predict Andujar's likely career HR total, we need to remove those players (Valentine, Agee, Butler, Baerga, Vern Stephens, Ron Santo).

We added two layers of refinement: first, a lifetime HR projection for the players on the list with ISOBA higher than .700, and second, a subsequent projection using only those players whose ISOBA was higher than Andujar's (exit Clift and Cedeno). The projection range calculated for these two groups is shown above in two locations: in the boxes at the far right of Andujar's stat line, and again in the "HR 23+" column (down at the bottom where the numbers are displayed in a light green background).
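The two layers are nested filters: everyone above .700, then everyone above Andujar's own .774. The sketch below shows only that filtering-and-averaging step. It does not reproduce how the table turns the "HR 23+" column into a projection, and the function name and the four rows are hypothetical placeholders rather than the real list.

```python
from statistics import mean

ANDUJAR_ISOBA = 0.774  # from this post


def tiered_hr_averages(players, floor=0.700, andujar=ANDUJAR_ISOBA):
    """Average career HRs for two nested ISOBA tiers.

    `players` is an iterable of (name, isoba, career_hr) tuples.
    Returns (average above `floor`, average above `andujar`).
    """
    rows = list(players)
    tier_one = [hr for _, iso, hr in rows if iso > floor]
    tier_two = [hr for _, iso, hr in rows if iso > andujar]
    return mean(tier_one), mean(tier_two)


# Placeholder rows for illustration only; these are NOT the players in the table.
sample = [
    ("Hitter A", 0.650, 150),  # below .700: dropped from both tiers
    ("Hitter B", 0.720, 200),  # first tier only
    ("Hitter C", 0.800, 300),  # both tiers
    ("Hitter D", 0.900, 400),  # both tiers
]

first_tier_avg, second_tier_avg = tiered_hr_averages(sample)
print(first_tier_avg, second_tier_avg)
```

With these made-up rows the averages come out to 300 and 350; tightening the ISOBA threshold leaves a smaller, more power-heavy group.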

What they suggest is that Andujar's likely range for his career HR total is between 274 and 317.

So Bill is right that Andujar "could" hit 400 HRs. Two players on this list--Thome and Jones--did so. Of course, Andruw had an 80 HR head start from breaking into MLB as a teenager, while Thome proved to be a bona fide Hall of Fame slugger. Both players showed significantly more strike zone judgment at this age than Andujar has (walk rates of 12% for Thome and 8% for Andruw, as opposed to just 4% for Miguel), and this is a factor that can't be discounted as a likely retardant for sustained career success.

But Bill would have been much more comfortably connected to the level of rigor that people assume exists in his work if he'd revised his statement to say that there's a good chance that Andujar will hit 300+ HRs.

So why did he opt to go with the 400 figure? Was there actually something else behind that number, which just appeared out of thin air and reads like one more of his patented "fanboy excited utterances" (you know, like "Don Mattingly: 100% baseball, 0% bullshit": sheesh...) used as his preferred form of stylistic lubricant? Apparently not: his responses to skeptics in the subsequent chat sequence do not open any doors with respect to a viable rationale...

One final note, not related to Bill's "excited utterance." Note that the player on the above list with the lowest total of offensive WAR in his age-23 year is Jim Thome--who happened to wind up with the most lifetime HRs of anyone on the list. That's because even "offensive WAR" is subject to distortion, befouled by positional adjustments and insufficient attention to rate stat and league-relative valuations. It's untrustworthy as a gauge of overall and/or future value: in short, it's as much of a hot mess, honey, as a radioactive Tango Love Pie™ left randomly out in the sun...