Applying Data Science to Predict MLB Pitching Patterns


mlb analytics booz allen hamilton

All right, you’re an MLB manager and your team is down 1-0, facing your division rival’s ace in game 158 of a grueling 162-game season. Your team is batting in the sixth inning of the first game of a pivotal four-game series whose winner will advance to the playoffs. Your pitcher just reached first on a fielder’s choice while attempting to lay down a sacrifice bunt to move your eighth hitter, who had singled to lead off the inning for your team’s first hit of the game, from first to second. So now your starting pitcher is on first base with one out. Do you pinch-run for your pitcher and try to exploit the catcher’s below-average caught-stealing percentage by stealing your way into scoring position, knowing all the while that your bullpen is spent from the previous day’s game, which went 13 innings?

Decisions, decisions.

In situations like this, all sorts of variables come into play, and MLB managers have to be odds-calculating machines who combine statistics with years of baseball intuition to make decisions.

But what if I told you that there was a 74.5% chance that you could correctly predict the pitcher’s next pitch type? Would that change how you handle the situation above? For an MLB manager, the ability to predict a pitch with that kind of accuracy might just provide the statistical edge needed to make that crucial decision.

One particular company is looking to provide this potential gold mine of predictability for the baseball world.

Booz Allen Hamilton, a management and technology consulting firm that recently celebrated its 100th anniversary, has started to turn its data science capabilities towards predicting pitcher behavior.

To be clear, data science is the process of drawing actionable insights from data, which allows an organization’s decision-makers to look beyond the raw information. If this shouts “MLB manager” to you, then you are right on and clearly still have your managing hat on from the opening scenario.

Booz Allen’s Strategic Innovation Group typically provides clients in the federal and commercial sectors with technology-focused solutions. Given the amount of data generated in sports and the firm’s knowledge from other verticals, the space is a natural fit for expanding its data science capabilities.

Joel Bock and Andrea Gallego of Booz Allen’s Strategic Innovation Group have developed statistical models of pitcher behaviors and tendencies using pitch sequences thrown during the 2011-2013 MLB seasons. The goal of their models was to predict the next pitch type as accurately as possible. The prediction for each pitcher was based on the data available at that moment in each at-bat.

They used Justin Verlander as an example to explain how independent models were developed to predict each of a pitcher’s most frequently thrown pitches:

“Consider for example Detroit Tigers pitcher Justin Verlander. His most frequent pitches, in order of frequency, are: a four-seam fastball (FF), changeup (CH), curveball (CU) and a slider (SL). From historical data, we train models to predict the likelihood of a batter seeing FF, CH, CU or SL as his very next pitch. This is done over a multitude of batters and various game situations.”
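To make the idea in the quote concrete, here is a minimal sketch of one simple way to estimate the likelihood of each pitch type from historical data: count how often each pitch followed a given game situation. This is not Booz Allen's model, whose details the article does not describe. The situation features (balls, strikes, previous pitch), the function names and the toy pitch log are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, made-up pitch log for one pitcher (NOT real Verlander data).
# Each row: (balls, strikes, previous pitch in the at-bat, pitch actually thrown).
TOY_PITCHES = [
    (0, 0, None, "FF"),
    (0, 1, "FF", "CU"),
    (1, 1, "CU", "FF"),
    (1, 2, "FF", "SL"),
    (0, 0, None, "FF"),
    (1, 0, "FF", "FF"),
    (1, 1, "FF", "CH"),
    (0, 0, None, "CU"),
    (0, 1, "CU", "FF"),
    (0, 2, "FF", "SL"),
]

# The four pitch types named in the quote.
PITCH_TYPES = ("FF", "CH", "CU", "SL")


def train(pitches):
    """Count pitch types per (balls, strikes, previous pitch) situation and overall."""
    by_situation = defaultdict(Counter)
    overall = Counter()
    for balls, strikes, prev, thrown in pitches:
        by_situation[(balls, strikes, prev)][thrown] += 1
        overall[thrown] += 1
    return by_situation, overall


def predict_proba(model, balls, strikes, prev):
    """Estimate the likelihood of each pitch type for a situation.

    Falls back to the pitcher's overall frequencies when the exact
    situation never appeared in the training data.
    """
    by_situation, overall = model
    counts = by_situation.get((balls, strikes, prev)) or overall
    total = sum(counts.values())
    return {pitch: counts[pitch] / total for pitch in PITCH_TYPES}


def predict(model, balls, strikes, prev):
    """Return the single most likely next pitch type."""
    probs = predict_proba(model, balls, strikes, prev)
    return max(probs, key=probs.get)


model = train(TOY_PITCHES)
print(predict_proba(model, 0, 0, None))  # likelihoods on the first pitch of an at-bat
print(predict(model, 0, 0, None))        # most likely first pitch in this toy log
```

A real system would presumably draw on far more context (batter, inning, score, runners on base) and a proper statistical or machine-learning model, but the input and output have the same shape: a game situation goes in, and a likelihood for each of FF, CH, CU and SL comes out.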

With their models, they achieved an overall accuracy of 74.5% in predicting a pitcher’s next pitch. This is a striking level of accuracy and a great example of how a company focused on data science can turn its attention to the sports world and potentially change the way an MLB game is played and managed.
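For readers wondering how an accuracy figure like this is calculated, the sketch below shows the usual arithmetic: the share of pitches for which the predicted type matched the pitch actually thrown. It also compares that figure with a naive baseline that always guesses the pitcher's most common pitch, which is one common way to put an accuracy number in context. The article does not say how Booz Allen evaluated its models, and the pitch sequences and function names here are invented purely for illustration.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of pitches where the predicted type matched the pitch thrown."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one pitch to score")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def most_frequent_baseline(training_pitches, n):
    """Naive baseline: always guess the most common pitch seen in training."""
    most_common_pitch = Counter(training_pitches).most_common(1)[0][0]
    return [most_common_pitch] * n


# Hypothetical sequences, made up for illustration (not real game data).
training = ["FF", "FF", "CU", "FF", "SL", "CH"]
actual = ["FF", "CU", "FF", "SL", "FF", "CH", "FF", "FF"]
model_predictions = ["FF", "CU", "FF", "SL", "FF", "FF", "FF", "CU"]

baseline_predictions = most_frequent_baseline(training, len(actual))

print(f"model accuracy:    {accuracy(model_predictions, actual):.1%}")
print(f"baseline accuracy: {accuracy(baseline_predictions, actual):.1%}")
```

In this toy example the model is right on 6 of 8 pitches (75.0%), while always guessing a four-seam fastball is right on 5 of 8 (62.5%). The gap between the two, rather than the raw percentage alone, is what indicates how much a model adds.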

A system architecture for real-time, in-game analytics using this pitch predictability model from Booz Allen is currently under development.

This predictability of pitcher behavior is just one of numerous examples of the exciting potential that data science has in the sports world. However, it should be noted that the ability to predict a pitcher’s next pitch, or to find patterns in data that provide other valuable insights into the game, will never replace a manager’s decision making. It will simply supplement it.

As always, knowledge is power, and the more knowledge a team can gain from data science offered by companies like Booz Allen, the more efficient and successful it can make its franchise. For example, Booz Allen believes that the right data makes it possible to measure player performance as well as a team’s reliance on a particular individual for success. This information could subsequently affect the business side of an organization and inform contract negotiations, as the data doesn’t lie.

Additionally, player safety, and thus the franchise’s investment, can be improved, and players can be given the best chance to stay in the lineup. By analyzing player biometrics, weather, field conditions, geographic location and other measurables, data scientists can break down the factors behind an in-game injury and better determine the treatment needed and the preventative measures that can mitigate future injuries. Freak injuries will never be prevented, but cutting down on nagging injuries and putting players in the best position to stay healthy is extremely valuable to teams, considering how much MLB players are paid.

Booz Allen Hamilton is currently pursuing opportunities with MLB teams this season and has provided the league with technical expertise, leveraging the firm’s government experience in developing command centers. The data Booz Allen analyzes should be seen as a cost-effective learning tool for MLB management, and its impact on team performance is just beginning to emerge.