Sunday, May 29, 2022

STATCAST: THE ANGLE OF VELOCITY & OTHER HIDDEN AGENDAS

[The Dodgers and Giants resume their 1962 showdown tomorrow. They both had two days off on May 28th and May 29th as they traveled east to continue slapping around the Phillies and Mets. We hope you'll join us for our continuing coverage...]

Meanwhile, let's spend time today grappling with the data behemoth that's issued forth from the problematic mind-meld we like to call the Tango Love Pie™. Armed and dangerous is a loaded phrase in every sense in present-day America (and we apologize for the pun), and the personage designing the massive Statcast data set--which tracks tons of details many of us in the baseball analysis community have clamored for since the 1990s--is at best a mixed blessing for the profession. 

There is useful knowledge to be gained from Statcast, but it's not getting out to the public in a way that doesn't also create more conceptual confusion; sadly, that seems to be the strategy behind not creating a more definitive path through the data. The good news is that the data has been made available to the public, and for that we are grateful--there are too many stories floating around about private data modeling work being conducted by the analytics folks now employed by most of MLB's franchises. But the bad news is that the data is being characterized in ways that have accelerated strategic changes in the game--and this has led to unforeseen consequences that can be traced back to a particular analytical mindset...one which is, unfortunately, embodied in Statcast--in how it's set up and in how it is articulated (and not articulated) to the public.

We've spent some time with the data, and there is definitely interesting info to be gleaned there--so long as the interpretation isn't slanted to a hidden agenda. We'll try to walk you through what we see as being the key takeaway from the combined measures that have been the most ballyhooed within the Statcast data set: exit velocity for a batted ball and the launch angle off the bat when it makes contact.

So let's get right to the bi-directional chart summarizing more than half a million batted balls since the launch (pardon the pun, again...) of Statcast in 2015...

You're saying, OK...and you wouldn't be wrong. The first thing to note is that we've rolled up the results for ranges of exit velocities (which run across the page in rows, beginning with the hardest-hit balls--those in excess of 100mph--and ending with the softest-hit...the group marked ≤79mph). Those have then been broken out by the launch angle of the ball off the bat, as captured for every batted ball since 2015.

When you see insanely high BA and SLG data (and you'll see it immediately in the 100+ mph section), remember that these numbers measure only at-bats where the ball is hit. There are no strikeouts or walks or hit batters in this data set. What you have here is the 65% of the plate appearances where the at-bat ends with the batter making contact with the ball.
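For readers who want to try this kind of roll-up themselves, here is a minimal Python sketch of the contact-only bookkeeping. The function names, the (exit velocity, total bases) record layout, and the four sample records are our own assumptions for illustration; they are not Statcast's format and not real data. We also assume every record is an at-bat that ended on a batted ball, and that a fractional speed such as 99.9 mph belongs to the range below it.

```python
from collections import defaultdict

def ev_range(exit_velocity_mph):
    """Assign an exit velocity (mph) to one of the four ranges used in the table."""
    if exit_velocity_mph >= 100:
        return "100+"
    if exit_velocity_mph >= 90:
        return "90-99"
    if exit_velocity_mph >= 80:
        return "80-89"
    return "<=79"

def ba_slg_on_contact(batted_balls):
    """Return {range: (BA, SLG)} for at-bats that ended on a batted ball.

    batted_balls: iterable of (exit_velocity_mph, total_bases), where
    total_bases is 0 for an out and 1-4 for a hit (hypothetical layout).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # at-bats, hits, total bases
    for ev, bases in batted_balls:
        row = totals[ev_range(ev)]
        row[0] += 1
        row[1] += 1 if bases > 0 else 0
        row[2] += bases
    return {rng: (hits / ab, tb / ab) for rng, (ab, hits, tb) in totals.items()}

# Made-up records: a homer, a hard-hit out, a single, a soft out.
sample = [(105.2, 4), (101.0, 0), (94.3, 1), (71.8, 0)]
result = ba_slg_on_contact(sample)
print(result["100+"])  # (0.5, 2.0)
```

Because strikeouts never enter the denominator, even this toy sample produces a .500 BA and a 2.000 SLG in the top range, which is the same effect that inflates the figures in the table.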

To orient you further about the exit velocity ranges--particularly in terms of how frequently they occur relative to each other--we direct you to the darker blue boxes near the right-hand edge of the table. There you can see a box for each of our four exit velocity ranges that quantifies the percentage of total at-bats found in it. Just to walk you through it: note that 24% of all batted balls are hit with an exit velocity of 100+ mph; 30% are hit with a velocity between 90-99 mph; 21% create an exit speed of 80-89 mph; and 25% are what folks would likely call "soft contact" (≤ 79mph).

The summary data shown in white type on the brown background (the rightmost column in the table) tells us that BA/SLG is, naturally enough, at its highest when the batter hits the ball the hardest. We also provided the percentage of these hits in each range that were XBH (doubles-triples-homers) and which were HRs: you can see that the percentage of XBH descends down the velocity ranges, going from 55% at 100+ mph to 34% for 90-99 mph; then, down to just 17% for 80-89 mph, and finally to just 9% at ≤79 mph. 

Now, when you go over and look at the percentages of XBH and HR in the 100+ mph range, you may be confused by the incredibly high percentages we see there in the middle of the table--where certain launch angles produce 90% homers and 98-100% XBH. How can those numbers be so high but the overall number for each only be 55% XBH and just 28% HR? 

The answer is in the italicized numbers in red that move across the top of each data section. For 100+ mph, the data subsets that produced the fewest fly balls are the < -1 (grounder) and 0-10 launch angles, which also happen to represent nearly half of the at-bats in the 100+ mph data set (47.5% to be exact: add 21.8% and 25.7%). If you look at the left two columns and examine the italicized numbers in red all the way down, you'll see that the percentages are very consistent across the exit velocity ranges. Grounders and low-mid liners account for about half the plate appearances with batted balls. Generally speaking, the higher the launch angle, the fewer instances there are.
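The arithmetic behind that answer is a pooled average: the overall XBH% for a velocity range is the total of extra-base hits divided by the total of hits, so the most heavily populated launch angles dominate the result. Here is a short Python sketch of that calculation. The hit and XBH counts are hypothetical, chosen only to mimic the shape of the situation described above; they are not taken from the table, and `pooled_xbh_pct` is our own illustrative name.

```python
def pooled_xbh_pct(cells):
    """Overall XBH% across launch-angle groups.

    cells: list of (hits, extra_base_hits) per group (hypothetical counts).
    """
    total_hits = sum(hits for hits, _ in cells)
    total_xbh = sum(xbh for _, xbh in cells)
    return 100.0 * total_xbh / total_hits

hypothetical_cells = [
    (600, 48),   # grounders: many hits, 8% of them XBH
    (700, 210),  # low liners: many hits, 30% XBH
    (300, 294),  # mid-range angles: fewer hits, 98% XBH
]
overall = pooled_xbh_pct(hypothetical_cells)
print(round(overall, 1))  # 34.5
```

A 98% cell barely lifts the pooled figure when it holds fewer than a fifth of the hits, which is why a row can contain near-100% cells and still post a much lower overall number.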

The areas highlighted in orange show us the combinations of exit velocities (EVs) and launch angles (LAs) that produce XBHs/HRs, but note that as the launch angle increases, the BA/SLG drops. Too much loft means the ball is not going to clear the fence or hit the wall--it's going to get caught. And in the 80-89 and 90-99 mph ranges, you can see by the cells in light blue just how much those come into play. Another way to quantify it: guaranteed fly ball outs (the cells where you find sub-.200 BAs) occur only about 2% of the time on balls with EVs of 100+ mph, but about 25% of the time on balls with 90-99 mph EVs. For balls with 80-89 mph EVs, that figure goes up to just under 40% of the time.
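The "guaranteed out" share for a velocity range is simply the sum of the batted-ball shares of every launch-angle cell whose BA falls under .200. As a rough Python sketch (the cell values below are invented for illustration and do not come from the table; `auto_out_share` and its `ba_cutoff` parameter are our own names):

```python
def auto_out_share(cells, ba_cutoff=0.200):
    """Sum the batted-ball share (%) of cells whose BA is below the cutoff.

    cells: list of (share_of_batted_balls_pct, batting_average) for one
    exit velocity range (hypothetical values).
    """
    return sum(share for share, ba in cells if ba < ba_cutoff)

# Invented row of five launch-angle cells whose shares add up to 100%.
hypothetical_row = [
    (22.0, 0.240),
    (26.0, 0.610),
    (27.0, 0.550),
    (15.0, 0.150),  # high fly balls
    (10.0, 0.050),  # pop-ups
]
print(auto_out_share(hypothetical_row))  # 25.0
```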

Two more "trajectory zones" should be noted here, shown with a bold line drawn around them where they occur on the left side of the table. The one furthest left is the "grounder zone" (the column for the < -1 angle...which, strictly speaking, is not really a "launch angle" at all). In this zone, the harder the ball is hit, the more likely it is to be a hit, but almost always a single, as the XBH% show (reading down: 7%, 9%, 10%, 5%). The next zone, which kind of looks like the panhandle of Texas, is the "line drive zone." If you've looked at stats breaking out hitters' BAs on line drives, you'll see that these numbers look extremely similar to what you'll see in that other breakout. These have higher XBH% associated with the hits (ranging roughly from 12-35%), but still no homers.

The final numbers that will orient you to all of the nuances of the data sample are found in the columns next to the AVG column at the far right of the table. These show the percentage of HRs and XBH from the entire sample as they occur in the EV ranges. To follow those through, 84% of all HRs hit from 2015 to the present with measured EVs have occurred when that EV is 100+ mph. 16% occur with an EV in the 90-99 range; less than 0.1% of all HRs occur with an EV below 90 mph. (It's almost enough to make you want to quote those EVs mindlessly about individual HRs in game summaries, isn't it?)

Those Tango Love Pies remain radioactive for years!

For XBH, it's not quite so monolithic--only 64% of XBH have an EV of 100+ mph. A lot of doubles and triples are hit a bit less hard--26% in the 90-99 mph range, 6% in the 80-89 mph range, and 4% in the ≤79mph range (these are likely the "fly ball doubles" that you occasionally see...now you know that, in this instance, "occasional" means 4%).
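One way to put these shares in perspective is to divide each range's share of XBH (or HRs) by its share of all batted balls, using only the percentages quoted in this post. The Python sketch below does that; the "concentration ratio" label and the `concentration` helper are our own illustrative additions, not a Statcast measure. We leave the sub-90 mph ranges out of the HR calculation because the post gives only a combined "less than 0.1%" for them.

```python
# Percentages quoted in the post, keyed by exit velocity range.
batted_ball_share = {"100+": 24, "90-99": 30, "80-89": 21, "<=79": 25}
xbh_share = {"100+": 64, "90-99": 26, "80-89": 6, "<=79": 4}
hr_share = {"100+": 84, "90-99": 16}  # sub-90 mph ranges omitted (<0.1% combined)

def concentration(outcome_share, base_share):
    """Ratio of outcome share to batted-ball share; above 1 means over-represented."""
    return {rng: round(outcome_share[rng] / base_share[rng], 2)
            for rng in outcome_share}

xbh_ratio = concentration(xbh_share, batted_ball_share)
hr_ratio = concentration(hr_share, batted_ball_share)
print(xbh_ratio)  # {'100+': 2.67, '90-99': 0.87, '80-89': 0.29, '<=79': 0.16}
print(hr_ratio)   # {'100+': 3.5, '90-99': 0.53}
```

Read this way, balls hit 100+ mph produce HRs at 3.5 times their share of batted balls and XBH at about 2.7 times, while every other range comes in below 1.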

This is a lot to absorb, so we're going to shove off for now to let your brains cool down--but we'll return to this data a bit later to discuss and demonstrate the limitations of its actual utility. It has many fascinating components, but it's not material that will move us toward a solution of what plagues baseball at this moment in time. The best intellectual exercise that the would-be data hierophant we call the Tango Love Pie™ could do for us is to use a table such as this one to reverse-engineer what this data would have looked like in the 1920s and 1930s...that would be a data model that could actually have some use as a target for how to modify the game in ways that would restore its long-lost diversity. Sadly, the arrow associated with the not-so-hidden agenda contained within data sets like Statcast seems to point in the opposite direction.