Chapter 6: Statcast Analytics - Hitting



6.1 The Statcast Revolution in Hitting

6.1.1 What Statcast Measures

In 2015, Major League Baseball installed Statcast, a ball- and player-tracking system, in all 30 ballparks, changing how the game is analyzed. The original system combined two technologies: TrackMan Doppler radar (for ball tracking) and ChyronHego cameras (for player tracking). Since 2020, Statcast has run on Hawk-Eye optical cameras.

For hitting analysis, Statcast captures an unprecedented level of detail on every batted ball:

  • Exit Velocity: The speed of the ball as it comes off the bat (measured in mph)
  • Launch Angle: The vertical angle at which the ball leaves the bat (measured in degrees)
  • Hit Distance: The projected distance a batted ball would travel
  • Hit Direction: The horizontal spray angle of the batted ball
  • Hang Time: How long the ball remains in the air

For player movement, Statcast tracks:

  • Sprint Speed: A runner's maximum speed on competitive plays (measured in feet/second)
  • Home to First Time: Time elapsed from contact to reaching first base
  • Baserunning Routes: Efficiency and path optimization

This wealth of data has enabled the creation of expected statistics (xStats) - what a player's outcomes should be based purely on contact quality, independent of defensive positioning, park factors, or luck.
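To make the idea of an expected statistic concrete, here is a minimal sketch. It is not Statcast's actual model: it estimates an expected batting average by grouping past batted balls into coarse exit velocity and launch angle bins and using each bin's observed hit rate. The bin edges, the helper names `fit_binned_xba` and `lookup_xba`, the `is_hit` column, and the toy data are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical bin edges, chosen only for illustration
EV_BINS = [0, 85, 95, 105, 130]   # mph
LA_BINS = [-90, 10, 25, 50, 90]   # degrees

def fit_binned_xba(history: pd.DataFrame) -> pd.Series:
    """Return the observed hit rate for each (EV bin, LA bin) cell."""
    binned = history.assign(
        ev_bin=pd.cut(history["launch_speed"], EV_BINS, right=False),
        la_bin=pd.cut(history["launch_angle"], LA_BINS, right=False),
    )
    return binned.groupby(["ev_bin", "la_bin"], observed=True)["is_hit"].mean()

def lookup_xba(table: pd.Series, ev: float, la: float) -> float:
    """Look up the hit rate of the cell that contains one batted ball."""
    for (ev_bin, la_bin), rate in table.items():
        if ev in ev_bin and la in la_bin:
            return float(rate)
    return float("nan")  # no comparable batted balls in the history

# Toy history: three hard line drives, two soft grounders, one pop-up
history = pd.DataFrame({
    "launch_speed": [101.0, 103.5, 99.0, 80.0, 82.0, 97.0],
    "launch_angle": [15.0, 18.0, 12.0, 5.0, -3.0, 60.0],
    "is_hit":       [1, 1, 0, 0, 1, 0],
})
xba_table = fit_binned_xba(history)
xba_line_drive = lookup_xba(xba_table, ev=100.0, la=20.0)
print(f"Expected BA for a 100 mph, 20 degree batted ball: {xba_line_drive:.3f}")
```

Because the estimate depends only on how the ball was hit, two identical batted balls receive the same expected value regardless of whether a fielder happened to catch one of them.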

6.1.2 Key Hitting Metrics Overview

Here's a comprehensive table of the most important Statcast hitting metrics:

| Metric | Definition | MLB Average | Elite Threshold | What It Reveals |
|---|---|---|---|---|
| Exit Velocity (EV) | Speed off bat | ~89 mph | 93+ mph | Raw power and contact quality |
| Max Exit Velocity | Hardest contact | ~110 mph | 115+ mph | Peak power capability |
| Hard-Hit Rate | % of batted balls ≥95 mph | ~35% | 45%+ | Consistency of hard contact |
| Launch Angle | Vertical angle | ~10-15° | Varies by approach | Ball flight trajectory |
| Sweet Spot % | % hit at 8-32° | ~33% | 40%+ | Optimal contact rate |
| Barrel % | % of "perfect" contact | ~6-8% | 12%+ | Elite contact quality |
| xBA | Expected batting avg | .245 | .280+ | True contact quality |
| xwOBA | Expected wOBA | .320 | .360+ | Comprehensive hitting value |
| Sprint Speed | Max speed (ft/s) | ~27 ft/s | 30+ ft/s | Athleticism and baserunning |

These metrics form the foundation of modern hitting analysis. Unlike traditional statistics that only tell us what happened, Statcast metrics tell us how it happened and often what should have happened.



6.2 Exit Velocity Deep Dive

6.2.1 Understanding Exit Velocity

Exit Velocity (EV) is perhaps the single most important Statcast metric. It measures the speed of the baseball immediately after contact with the bat, before air resistance affects the ball's flight. Think of it as the "power" measurement - harder hit balls are more likely to become hits and extra-base hits.

The physics is straightforward: exit velocity is determined by three factors:

  1. Bat speed: How fast the bat is moving at contact
  2. Pitch velocity: The incoming speed of the pitch (energy transfer)
  3. Contact quality: Where on the bat and ball the collision occurs
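A common simplified model from baseball physics ties these three factors together: exit velocity ≈ q × pitch speed + (1 + q) × bat speed, where q is the collision efficiency (roughly 0.2 for contact near the sweet spot, lower for mis-hits). The sketch below applies that approximation. The function name and the sample speeds are illustrative assumptions, and real collisions depend on more than these inputs.

```python
def estimate_exit_velocity(pitch_mph: float, bat_mph: float, q: float = 0.2) -> float:
    """Approximate exit velocity from a simplified bat-ball collision model.

    q is the collision efficiency: about 0.2 near the sweet spot (the assumed
    default here), smaller when contact is off the end or near the handle.
    """
    return q * pitch_mph + (1 + q) * bat_mph

squared_up = estimate_exit_velocity(pitch_mph=90, bat_mph=70)        # q = 0.2
mis_hit = estimate_exit_velocity(pitch_mph=90, bat_mph=70, q=0.1)    # poorer contact

# How much does 5 mph of extra pitch speed or bat speed add?
pitch_gain = estimate_exit_velocity(pitch_mph=95, bat_mph=70) - squared_up
bat_gain = estimate_exit_velocity(pitch_mph=90, bat_mph=75) - squared_up

print(f"Squared up: {squared_up:.1f} mph, mis-hit: {mis_hit:.1f} mph")
print(f"+5 mph pitch adds {pitch_gain:.1f} mph; +5 mph bat adds {bat_gain:.1f} mph")
```

Under this model, bat speed carries several times the weight of pitch speed, which is why swing speed and contact quality dominate a hitter's exit velocity profile.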

The relationship between exit velocity and outcomes is remarkably strong:

  • < 85 mph: Low probability of becoming a hit (~.100 BA)
  • 85-95 mph: Moderate hit probability (~.250 BA)
  • 95-105 mph: High hit probability (~.500 BA)
  • 105+ mph: Very high hit probability (~.700+ BA)

The MLB average exit velocity hovers around 88-89 mph on all batted balls. Elite power hitters consistently average 92-94 mph, with the very best reaching 95+ mph. As of 2024, players like Aaron Judge, Giancarlo Stanton, and Kyle Schwarber regularly lead in average exit velocity.
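The exit velocity bands listed above can be checked against any set of batted balls. The sketch below groups a small made-up sample with pandas' `cut`. The column names follow Statcast's `launch_speed` and `events` fields, while the sample rows and the helper name `batting_avg_by_ev_band` are assumptions for illustration.

```python
import pandas as pd

HIT_EVENTS = {"single", "double", "triple", "home_run"}

def batting_avg_by_ev_band(batted_balls: pd.DataFrame) -> pd.DataFrame:
    """Count batted balls and compute the hit rate within each EV band."""
    bands = pd.cut(
        batted_balls["launch_speed"],
        bins=[0, 85, 95, 105, float("inf")],
        labels=["<85", "85-95", "95-105", "105+"],
        right=False,  # each band includes its lower edge, e.g. 95.0 -> "95-105"
    )
    is_hit = batted_balls["events"].isin(HIT_EVENTS)
    return is_hit.groupby(bands, observed=False).agg(
        batted_balls="count", hit_rate="mean"
    )

# Made-up sample, not real Statcast output
sample = pd.DataFrame({
    "launch_speed": [78.2, 84.9, 88.0, 95.0, 101.3, 104.9, 108.7, 112.4],
    "events": ["field_out", "single", "field_out", "double",
               "field_out", "single", "home_run", "home_run"],
})
summary = batting_avg_by_ev_band(sample)
print(summary)
```

With a full season of data in place of the sample, the `hit_rate` column gives the batting average on contact for each band.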

6.2.2 Exit Velocity Metrics with Code

Let's explore how to calculate and analyze various exit velocity metrics using both R and Python.

Python Implementation

import pandas as pd
from pybaseball import statcast_batter, playerid_lookup

# Look up the player's MLBAM ID (Aaron Judge example)
lookup = playerid_lookup('judge', 'aaron')
player_id = int(lookup['key_mlbam'].iloc[0])  # 592450 for Aaron Judge

# Fetch Statcast data for the 2024 regular season (March 28 to September 29)
start_date = '2024-03-28'
end_date = '2024-09-29'
statcast_data = statcast_batter(start_date, end_date, player_id)

# Calculate comprehensive exit velocity metrics
def calculate_ev_metrics(df):
    """
    Calculate comprehensive exit velocity metrics from Statcast data.

    Parameters:
    df: DataFrame with Statcast data including 'launch_speed' column

    Returns:
    Dictionary of exit velocity metrics
    """
    # Keep balls in play only: type 'X' marks a ball put in play
    # (foul balls can also carry a launch_speed reading)
    batted_balls = df[(df['type'] == 'X') & df['launch_speed'].notna()].copy()

    if len(batted_balls) == 0:
        return None

    metrics = {
        'avg_ev': batted_balls['launch_speed'].mean(),
        'max_ev': batted_balls['launch_speed'].max(),
        'min_ev': batted_balls['launch_speed'].min(),
        'ev_90th_percentile': batted_balls['launch_speed'].quantile(0.90),
        'ev_50th_percentile': batted_balls['launch_speed'].median(),
        'hard_hit_count': (batted_balls['launch_speed'] >= 95).sum(),
        'hard_hit_pct': (batted_balls['launch_speed'] >= 95).mean() * 100,
        'soft_contact_pct': (batted_balls['launch_speed'] < 85).mean() * 100,
        'medium_contact_pct': ((batted_balls['launch_speed'] >= 85) &
                               (batted_balls['launch_speed'] < 95)).mean() * 100,
        'batted_balls': len(batted_balls)
    }

    return metrics

# Calculate metrics
ev_metrics = calculate_ev_metrics(statcast_data)

# Display results
print("Aaron Judge 2024 Exit Velocity Profile")
print("=" * 50)
print(f"Average Exit Velocity: {ev_metrics['avg_ev']:.1f} mph")
print(f"Maximum Exit Velocity: {ev_metrics['max_ev']:.1f} mph")
print(f"90th Percentile EV: {ev_metrics['ev_90th_percentile']:.1f} mph")
print(f"Median Exit Velocity: {ev_metrics['ev_50th_percentile']:.1f} mph")
print(f"\nContact Distribution:")
print(f"Hard Hit Rate (≥95 mph): {ev_metrics['hard_hit_pct']:.1f}%")
print(f"Medium Contact (85 to <95 mph): {ev_metrics['medium_contact_pct']:.1f}%")
print(f"Soft Contact (<85 mph): {ev_metrics['soft_contact_pct']:.1f}%")
print(f"\nTotal Batted Balls: {ev_metrics['batted_balls']}")

R Implementation

library(baseballr)
library(dplyr)
library(tidyr)

# Fetch Statcast data for Aaron Judge (2024 regular season)
judge_data <- statcast_search_batters(
  start_date = "2024-03-28",
  end_date = "2024-09-29",
  batterid = 592450  # Aaron Judge
)

# Calculate comprehensive exit velocity metrics
calculate_ev_metrics <- function(df) {
  # Keep balls in play only (type "X"); fouls can also carry a launch_speed
  batted_balls <- df %>%
    filter(type == "X", !is.na(launch_speed))

  if (nrow(batted_balls) == 0) {
    return(NULL)
  }

  metrics <- list(
    avg_ev = mean(batted_balls$launch_speed, na.rm = TRUE),
    max_ev = max(batted_balls$launch_speed, na.rm = TRUE),
    min_ev = min(batted_balls$launch_speed, na.rm = TRUE),
    ev_90th = quantile(batted_balls$launch_speed, 0.90, na.rm = TRUE),
    ev_50th = median(batted_balls$launch_speed, na.rm = TRUE),
    hard_hit_count = sum(batted_balls$launch_speed >= 95, na.rm = TRUE),
    hard_hit_pct = mean(batted_balls$launch_speed >= 95, na.rm = TRUE) * 100,
    soft_contact_pct = mean(batted_balls$launch_speed < 85, na.rm = TRUE) * 100,
    medium_contact_pct = mean(batted_balls$launch_speed >= 85 &
                              batted_balls$launch_speed < 95, na.rm = TRUE) * 100,
    batted_balls = nrow(batted_balls)
  )

  return(metrics)
}

# Calculate metrics
ev_metrics <- calculate_ev_metrics(judge_data)

# Display results
cat("Aaron Judge 2024 Exit Velocity Profile\n")
cat(strrep("=", 50), "\n")
cat(sprintf("Average Exit Velocity: %.1f mph\n", ev_metrics$avg_ev))
cat(sprintf("Maximum Exit Velocity: %.1f mph\n", ev_metrics$max_ev))
cat(sprintf("90th Percentile EV: %.1f mph\n", ev_metrics$ev_90th))
cat(sprintf("Median Exit Velocity: %.1f mph\n", ev_metrics$ev_50th))
cat("\nContact Distribution:\n")
cat(sprintf("Hard Hit Rate (≥95 mph): %.1f%%\n", ev_metrics$hard_hit_pct))
cat(sprintf("Medium Contact (85 to <95 mph): %.1f%%\n", ev_metrics$medium_contact_pct))
cat(sprintf("Soft Contact (<85 mph): %.1f%%\n", ev_metrics$soft_contact_pct))
cat(sprintf("\nTotal Batted Balls: %d\n", ev_metrics$batted_balls))

6.2.3 Hard-Hit Rate: The 95 mph Threshold

Hard-Hit Rate is defined as the percentage of batted balls with an exit velocity of 95 mph or greater. This threshold isn't arbitrary - research shows that 95 mph represents a meaningful breakpoint where hit probability increases dramatically.

Why 95 mph matters:

  • Balls hit 95+ mph have a batting average around .500
  • They're more likely to find gaps and fall for hits
  • They're harder for fielders to react to and convert into outs
  • They correlate strongly with power output (HR, XBH)

The MLB average hard-hit rate is approximately 35-37%. Elite hitters consistently post hard-hit rates of 45%+, with the best in baseball reaching 50%+.

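Here's code to compare outcomes on hard-hit balls with outcomes on all other contact:

Python Implementation

# Analyze hard-hit rate by outcome
def analyze_hard_hit_outcomes(df):
    """Analyze outcomes of hard-hit balls vs. other contact."""
    # Balls in play only (type 'X'); fouls have no event and would deflate the hit rate
    batted_balls = df[(df['type'] == 'X') & df['launch_speed'].notna()].copy()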
    batted_balls['is_hard_hit'] = batted_balls['launch_speed'] >= 95

    # Group by hard-hit status
    outcomes = batted_balls.groupby('is_hard_hit').agg({
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean(),
        'estimated_ba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'launch_angle': 'mean'
    }).round(3)

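    outcomes.columns = ['Hit_Rate', 'xBA', 'Avg_EV', 'Avg_LA']
    # Rename by key so a missing group (e.g. no hard-hit balls) does not raise
    outcomes = outcomes.rename(index={False: 'Not Hard Hit (<95)', True: 'Hard Hit (95+)'})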

    return outcomes

hard_hit_analysis = analyze_hard_hit_outcomes(statcast_data)
print("\nHard-Hit vs. Non-Hard-Hit Outcomes:")
print(hard_hit_analysis)

R Implementation

# Analyze hard-hit rate by outcome
analyze_hard_hit_outcomes <- function(df) {
  # Balls in play only (type "X"); fouls have no event and would deflate the hit rate
  batted_balls <- df %>%
    filter(type == "X", !is.na(launch_speed)) %>%
    mutate(is_hard_hit = launch_speed >= 95)

  outcomes <- batted_balls %>%
    group_by(is_hard_hit) %>%
    summarise(
      hit_rate = mean(events %in% c('single', 'double', 'triple', 'home_run'),
                     na.rm = TRUE),
      xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
      avg_ev = mean(launch_speed, na.rm = TRUE),
      avg_la = mean(launch_angle, na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    # Pass arguments inside a lambda; across(..., round, 3) is deprecated in dplyr >= 1.1.0
    mutate(across(where(is.numeric), ~ round(.x, 3)))

  return(outcomes)
}

hard_hit_analysis <- analyze_hard_hit_outcomes(judge_data)
print("Hard-Hit vs. Non-Hard-Hit Outcomes:")
print(hard_hit_analysis)
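
To see how hard-hit rate moves over a season, one option is a rolling window over batted balls in date order. This is a sketch under stated assumptions: it expects Statcast's `game_date` and `launch_speed` columns, it assumes the frame already contains only balls in play, and the window size and the helper name `rolling_hard_hit_rate` are illustrative choices rather than a standard definition.

```python
import pandas as pd

def rolling_hard_hit_rate(df: pd.DataFrame, window: int = 50) -> pd.Series:
    """Hard-hit rate (%) over the most recent `window` batted balls."""
    batted_balls = (
        df.dropna(subset=["launch_speed"])
        .assign(game_date=lambda d: pd.to_datetime(d["game_date"]))
        .sort_values("game_date", kind="stable")
    )
    is_hard_hit = (batted_balls["launch_speed"] >= 95).astype(float)
    # Require a full window so early values are not based on a handful of balls
    rate = is_hard_hit.rolling(window, min_periods=window).mean() * 100
    rate.index = batted_balls["game_date"].to_numpy()
    return rate.dropna()

# Toy input (made-up values), deliberately out of date order
toy = pd.DataFrame({
    "game_date": ["2024-04-03", "2024-04-01", "2024-04-02", "2024-04-04", "2024-04-05"],
    "launch_speed": [99.0, 80.0, 96.5, None, 101.2],
})
trend = rolling_hard_hit_rate(toy, window=2)
print(trend)
```

With real data, a window of 50 to 100 batted balls is a reasonable starting point; smaller windows react faster but are noisier.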

6.2.4 Case Study: Exit Velocity Leaders

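Let's look at what an exit velocity leaderboard of elite power hitters looks like. The figures in the code below are illustrative placeholders for demonstration, not official 2024 values: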

# Fetch league-wide data for qualifying hitters (sample approach)
# Note: This would typically require aggregating data for all players

def get_league_leaders_ev(year=2024, min_pa=200):
    """
    Get exit velocity leaders for a given season.
    This is a conceptual example - full implementation would require
    iterating through all players or using Baseball Savant's leaderboards.
    """
    # Example data structure for demonstration
    leaders_data = {
        'Player': ['Aaron Judge', 'Giancarlo Stanton', 'Kyle Schwarber',
                   'Yordan Alvarez', 'Marcell Ozuna'],
        'Avg_EV': [95.2, 94.8, 93.9, 93.5, 93.2],
        'Max_EV': [122.4, 121.8, 119.5, 118.9, 118.2],
        'Hard_Hit_Pct': [58.2, 55.7, 52.3, 51.8, 50.9],
        'Barrel_Pct': [18.5, 17.2, 15.8, 15.1, 14.6],
        'xwOBA': [.412, .385, .368, .372, .361]
    }

    df = pd.DataFrame(leaders_data)
    return df

ev_leaders = get_league_leaders_ev(2024)
print("\n2024 Exit Velocity Leaders (illustrative values)")
print("=" * 70)
print(ev_leaders.to_string(index=False))

# Calculate the difference from MLB average
mlb_avg_ev = 88.5
ev_leaders['EV_Above_Avg'] = ev_leaders['Avg_EV'] - mlb_avg_ev
print(f"\nMLB Average Exit Velocity: {mlb_avg_ev} mph")
print("\nDifference from League Average:")
print(ev_leaders[['Player', 'EV_Above_Avg']].to_string(index=False))

Key Insights from Exit Velocity Leaders:

  1. Aaron Judge consistently ranks among the top exit velocity producers, typically averaging 94-95 mph
  2. Elite exit velocity correlates with high Barrel% and xwOBA
  3. Players with 93+ mph average EV are almost exclusively power threats
  4. The gap between elite (95 mph) and average (89 mph) is large: a difference of about 6 mph in average exit velocity represents a substantial power disparity

6.3 Launch Angle Deep Dive

6.3.1 Understanding Launch Angle

Launch Angle measures the vertical angle at which the ball leaves the bat, measured in degrees from horizontal. A ball hit straight into the ground would have a negative launch angle, while a ball hit straight up would be 90°.

Launch angle is categorized into four primary types:

| Category | Launch Angle Range | Expected Outcome | Typical BA |
|---|---|---|---|
| Ground Ball (GB) | < 10° | Mostly outs, some singles | .240 |
| Line Drive (LD) | 10° to 25° | High hit rate, XBH | .600+ |
| Fly Ball (FB) | 25° to 50° | Home runs or fly outs | .200-.250 |
| Pop-up (PU) | > 50° | Almost always outs | ~.020 |

Line drives (10-25°) have the highest batting average because they stay in the air long enough to get past infielders but not long enough for outfielders to comfortably track them down. However, they rarely result in home runs.

Fly balls (25-50°) are where home run power comes from. When combined with high exit velocity, fly balls in the 25-35° range become home runs. Without sufficient exit velocity, they become routine fly outs.

Ground balls (< 10°) can be effective for speedy players who can beat out infield hits, but generally result in lower production. The shift era (2015-2022) made pull-side ground balls particularly ineffective.

Pop-ups (> 50°) are almost universally negative outcomes, giving fielders ample time to position themselves.

6.3.2 The Optimal Launch Angle Debate

The "launch angle revolution" began around 2015-2016 when coaches and analysts realized that players were optimizing for the wrong outcomes. Traditionally, hitting coaches emphasized "staying on top of the ball" and hitting line drives. However, Statcast data revealed that slight uppercut swings producing launch angles of 25-35° with high exit velocity were the most valuable.

The Home Run Peak: Home runs are most common at launch angles between 25 and 35 degrees when exit velocity exceeds 95 mph. This discovery contributed to a league-wide increase in home runs from 2015 to 2019.

The Line Drive Counter-Argument: While 25-35° produces homers, line drives (10-25°) still have the highest BABIP (Batting Average on Balls In Play). Players who can consistently hit line drives with authority remain extremely valuable.

The Modern Approach: Elite hitters aim for the "sweet spot" (launch angles between 8 and 32 degrees), which balances the high BABIP of line drives with the power of fly balls.
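
The ranges in this debate overlap, which is easier to see when each batted ball is tagged with every zone it falls into. A minimal sketch follows, using the angle and exit velocity cutoffs quoted above. The function name `tag_launch_zones` and the sample values are assumptions for illustration.

```python
import pandas as pd

def tag_launch_zones(df: pd.DataFrame) -> pd.DataFrame:
    """Add boolean flags for the launch angle zones discussed in this section."""
    la = df["launch_angle"]
    ev = df["launch_speed"]
    return df.assign(
        line_drive=(la >= 10) & (la < 25),              # line drive range
        sweet_spot=(la >= 8) & (la <= 32),              # sweet spot range
        hr_window=(la >= 25) & (la <= 35) & (ev > 95),  # home run peak
    )

# Made-up batted balls covering each case
sample = pd.DataFrame({
    "launch_angle": [5, 12, 28, 34, 55],
    "launch_speed": [102.0, 98.0, 104.0, 93.0, 88.0],
})
tagged = tag_launch_zones(sample)
print(tagged)
```

The 28° ball sits in both the sweet spot and the home run window, while the 34° ball misses both: it is above the sweet spot and lacks the exit velocity to carry.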

6.3.3 Launch Angle Metrics with Code

Python Implementation

def calculate_launch_angle_metrics(df):
    """
    Calculate comprehensive launch angle distribution metrics.

    Parameters:
    df: Statcast DataFrame with 'launch_angle' column

    Returns:
    Dictionary of launch angle metrics
    """
    # Balls in play only (type 'X'); foul balls can also carry a launch_angle
    batted_balls = df[(df['type'] == 'X') & df['launch_angle'].notna()].copy()

    if len(batted_balls) == 0:
        return None

    # Categorize each batted ball
    def categorize_launch_angle(la):
        if la < 10:
            return 'ground_ball'
        elif la < 25:
            return 'line_drive'
        elif la < 50:
            return 'fly_ball'
        else:
            return 'popup'

    batted_balls['la_category'] = batted_balls['launch_angle'].apply(categorize_launch_angle)

    metrics = {
        'avg_la': batted_balls['launch_angle'].mean(),
        'median_la': batted_balls['launch_angle'].median(),
        'gb_count': (batted_balls['launch_angle'] < 10).sum(),
        'gb_pct': (batted_balls['launch_angle'] < 10).mean() * 100,
        'ld_count': ((batted_balls['launch_angle'] >= 10) &
                     (batted_balls['launch_angle'] < 25)).sum(),
        'ld_pct': ((batted_balls['launch_angle'] >= 10) &
                   (batted_balls['launch_angle'] < 25)).mean() * 100,
        'fb_count': ((batted_balls['launch_angle'] >= 25) &
                     (batted_balls['launch_angle'] < 50)).sum(),
        'fb_pct': ((batted_balls['launch_angle'] >= 25) &
                   (batted_balls['launch_angle'] < 50)).mean() * 100,
        'popup_count': (batted_balls['launch_angle'] >= 50).sum(),
        'popup_pct': (batted_balls['launch_angle'] >= 50).mean() * 100,
        'sweet_spot_count': ((batted_balls['launch_angle'] >= 8) &
                             (batted_balls['launch_angle'] <= 32)).sum(),
        'sweet_spot_pct': ((batted_balls['launch_angle'] >= 8) &
                           (batted_balls['launch_angle'] <= 32)).mean() * 100
    }

    # Calculate performance by category
    category_performance = batted_balls.groupby('la_category').agg({
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean(),
        'estimated_ba_using_speedangle': 'mean'
    }).round(3)

    metrics['category_performance'] = category_performance

    return metrics

# Calculate and display
la_metrics = calculate_launch_angle_metrics(statcast_data)

print("\nLaunch Angle Distribution")
print("=" * 50)
print(f"Average Launch Angle: {la_metrics['avg_la']:.1f}°")
print(f"Median Launch Angle: {la_metrics['median_la']:.1f}°")
print(f"\nBatted Ball Distribution:")
print(f"Ground Balls (<10°): {la_metrics['gb_pct']:.1f}%")
print(f"Line Drives (10-25°): {la_metrics['ld_pct']:.1f}%")
print(f"Fly Balls (25-50°): {la_metrics['fb_pct']:.1f}%")
print(f"Pop-ups (>50°): {la_metrics['popup_pct']:.1f}%")
print(f"\nSweet Spot % (8-32°): {la_metrics['sweet_spot_pct']:.1f}%")
print("\nPerformance by Batted Ball Type:")
print(la_metrics['category_performance'])

R Implementation

# Launch angle metrics
calculate_launch_angle_metrics <- function(df) {
  # Balls in play only (type "X"); foul balls can also carry a launch_angle
  batted_balls <- df %>%
    filter(type == "X", !is.na(launch_angle))

  if (nrow(batted_balls) == 0) {
    return(NULL)
  }

  # Categorize launch angles
  batted_balls <- batted_balls %>%
    mutate(
      la_category = case_when(
        launch_angle < 10 ~ 'ground_ball',
        launch_angle < 25 ~ 'line_drive',
        launch_angle < 50 ~ 'fly_ball',
        TRUE ~ 'popup'
      ),
      in_sweet_spot = launch_angle >= 8 & launch_angle <= 32
    )

  # Calculate metrics
  metrics <- list(
    avg_la = mean(batted_balls$launch_angle, na.rm = TRUE),
    median_la = median(batted_balls$launch_angle, na.rm = TRUE),
    gb_pct = mean(batted_balls$launch_angle < 10, na.rm = TRUE) * 100,
    ld_pct = mean(batted_balls$launch_angle >= 10 &
                  batted_balls$launch_angle < 25, na.rm = TRUE) * 100,
    fb_pct = mean(batted_balls$launch_angle >= 25 &
                  batted_balls$launch_angle < 50, na.rm = TRUE) * 100,
    popup_pct = mean(batted_balls$launch_angle >= 50, na.rm = TRUE) * 100,
    sweet_spot_pct = mean(batted_balls$in_sweet_spot, na.rm = TRUE) * 100
  )

  # Performance by category
  category_performance <- batted_balls %>%
    group_by(la_category) %>%
    summarise(
      hit_rate = mean(events %in% c('single', 'double', 'triple', 'home_run'),
                     na.rm = TRUE),
      xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    )

  metrics$category_performance <- category_performance

  return(metrics)
}

la_metrics <- calculate_launch_angle_metrics(judge_data)

cat("\nLaunch Angle Distribution\n")
cat(strrep("=", 50), "\n")
cat(sprintf("Average Launch Angle: %.1f°\n", la_metrics$avg_la))
cat(sprintf("Median Launch Angle: %.1f°\n", la_metrics$median_la))
cat("\nBatted Ball Distribution:\n")
cat(sprintf("Ground Balls (<10°): %.1f%%\n", la_metrics$gb_pct))
cat(sprintf("Line Drives (10-25°): %.1f%%\n", la_metrics$ld_pct))
cat(sprintf("Fly Balls (25-50°): %.1f%%\n", la_metrics$fb_pct))
cat(sprintf("Pop-ups (>50°): %.1f%%\n", la_metrics$popup_pct))
cat(sprintf("\nSweet Spot %% (8-32°): %.1f%%\n", la_metrics$sweet_spot_pct))
cat("\nPerformance by Batted Ball Type:\n")
print(la_metrics$category_performance)

6.3.4 Sweet Spot Percentage

Sweet Spot Percentage represents the proportion of batted balls hit at launch angles between 8 and 32 degrees. This range combines the high BABIP of line drives with the power potential of fly balls.

Why 8-32°?

  • Below 8°: Too many ground balls, lower BA
  • 8-25°: Line drive range, highest BABIP
  • 25-32°: Power range, home run potential with high EV
  • Above 32°: Decreasing hit probability, more fly outs

The MLB average sweet spot percentage is approximately 33-35%. Elite hitters often achieve 40%+, demonstrating exceptional bat-to-ball skills and optimal swing planes.

Sweet Spot leaders tend to be complete hitters who combine contact skills with power. Players like Freddie Freeman, Mookie Betts, and Ronald Acuña Jr. consistently rank among the leaders in this metric.
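
Ranking hitters by this metric amounts to a grouped mean with a playing-time cutoff. The sketch below assumes a frame of batted balls with Statcast's `batter` and `launch_angle` columns. The `min_bbe` threshold, the helper name `sweet_spot_leaders`, and the toy batter IDs are hypothetical.

```python
import pandas as pd

def sweet_spot_leaders(batted_balls: pd.DataFrame, min_bbe: int = 100) -> pd.DataFrame:
    """Sweet spot % per batter, limited to hitters with enough batted balls."""
    valid = batted_balls.dropna(subset=["launch_angle"])
    in_zone = valid["launch_angle"].between(8, 32)  # inclusive on both ends
    summary = in_zone.groupby(valid["batter"]).agg(
        bbe="count", sweet_spot_pct="mean"
    )
    summary["sweet_spot_pct"] *= 100
    qualified = summary[summary["bbe"] >= min_bbe]
    return qualified.sort_values("sweet_spot_pct", ascending=False)

# Toy data: batter IDs 1-3 are placeholders, not real MLBAM IDs
toy = pd.DataFrame({
    "batter":       [1, 1, 1, 1, 2, 2, 2, 2, 3],
    "launch_angle": [10, 20, 40, 30, -5, 8, 50, 33, 15],
})
leaders = sweet_spot_leaders(toy, min_bbe=4)
print(leaders)
```

Batter 3 is 1-for-1 in the sweet spot but is excluded by the cutoff, which is the reason for requiring a minimum number of batted balls before comparing rates.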


print(f"Pop-ups (≥50°): {la_metrics['popup_pct']:.1f}%")
print(f"\nSweet Spot % (8-32°): {la_metrics['sweet_spot_pct']:.1f}%")
print("\nPerformance by Batted Ball Type:")
print(la_metrics['category_performance'])

6.4 Barrels

6.4.1 What is a Barrel?

A Barrel is Statcast's definition of "perfect contact" - a batted ball with the ideal combination of exit velocity and launch angle to produce the highest expected outcomes. The exact definition is complex because the optimal launch angle varies with exit velocity.

The Barrel Formula:

  • At 98 mph exit velocity: launch angle must be within 26-30°
  • At 99 mph: the range widens to 25-31°
  • At 100 mph: the range widens again to 24-33°
  • Above 100 mph: each additional mph adds roughly 2-3° to the range
  • At 116+ mph: any launch angle from 8° to 50° qualifies

The key insight: harder-hit balls are more forgiving of launch angle. At 116 mph, anything from a low line drive to a high fly ball counts as a barrel, while a 98 mph ball needs near-perfect elevation.

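Barrel outcomes are exceptional:

  • Batted balls in the barrel zone have produced at least a .500 batting average and a 1.500 slugging percentage; that minimum is how Statcast draws the zone, and actual results on barrels are typically well above it
  • They result in home runs approximately 30-40% of the time
  • They are very hard to defend, because many leave the park or land beyond the reach of fielders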

Here's the mathematical relationship:

def is_barrel(exit_velocity, launch_angle):
    """
    Determine if a batted ball qualifies as a barrel based on MLB's definition.

    Parameters:
    exit_velocity: Exit velocity in mph
    launch_angle: Launch angle in degrees

    Returns:
    Boolean indicating barrel status
    """
    # Must be at least 98 mph (a missing/NaN exit velocity also fails this check)
    if not exit_velocity >= 98:
        return False

    # Linear approximation of MLB's barrel zone, fitted to the published
    # anchor points (26-30° at 98 mph, 24-33° at 100 mph, 8-50° at 116+ mph).
    # The window keeps widening with every extra mph up to 116 mph.
    mph_over = exit_velocity - 98
    min_angle = max(26 - mph_over, 8)         # floor drops 1° per mph, stops at 8°
    max_angle = min(30 + 1.5 * mph_over, 50)  # ceiling rises ~1.5° per mph, caps at 50°

    return min_angle <= launch_angle <= max_angle

# Example usage
print(is_barrel(98, 28))   # True - perfect barrel
print(is_barrel(105, 29))  # True - high EV barrel
print(is_barrel(98, 40))   # False - too steep despite good EV
print(is_barrel(92, 28))   # False - EV too low
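
Calling is_barrel once per row works, but it can be slow on a full season of data. A vectorized version might look like the sketch below. It assumes a straight-line interpolation between the anchor points listed above (the lower bound drops 1° per mph from 26°, the upper bound rises about 1.5° per mph from 30°, and the window is capped at 8-50°), so it approximates MLB's classification rather than reproducing it. The name `barrel_mask` is chosen here for illustration.

```python
import numpy as np

def barrel_mask(exit_velocity, launch_angle):
    """Vectorized, approximate barrel check for arrays of EV (mph) and LA (degrees)."""
    ev = np.asarray(exit_velocity, dtype=float)
    la = np.asarray(launch_angle, dtype=float)

    mph_over = ev - 98.0
    # Assumed linear bounds: floor falls 1 deg/mph to 8, ceiling rises 1.5 deg/mph to 50
    min_angle = np.maximum(26.0 - mph_over, 8.0)
    max_angle = np.minimum(30.0 + 1.5 * mph_over, 50.0)

    # Comparisons against NaN are False, so missing readings are never barrels
    return (ev >= 98.0) & (la >= min_angle) & (la <= max_angle)

evs = [98, 105, 98, 92, 116]
las = [28, 29, 40, 28, 10]
print(barrel_mask(evs, las))
```

With a Statcast DataFrame, the same call can be written as `df['is_barrel'] = barrel_mask(df['launch_speed'], df['launch_angle'])`.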

6.4.2 Barrel Rates with Code

def calculate_barrel_metrics(df):
    """
    Calculate barrel-related metrics from Statcast data.

    Note: if the dataset already carries a 'barrel' flag it is used as-is;
    otherwise barrels are classified manually with is_barrel().
    """
    batted_balls = df[(df['launch_speed'].notna()) &
                     (df['launch_angle'].notna())].copy()

    if len(batted_balls) == 0:
        return None

    # Calculate barrels (using Statcast's column if available)
    if 'barrel' in batted_balls.columns:
        batted_balls['is_barrel'] = batted_balls['barrel'] == 1
    else:
        # Calculate manually
        batted_balls['is_barrel'] = batted_balls.apply(
            lambda row: is_barrel(row['launch_speed'], row['launch_angle']),
            axis=1
        )

    barrel_balls = batted_balls[batted_balls['is_barrel']]

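    # Statcast rows are pitches, not plate appearances; count a PA wherever
    # an event is recorded (assumes 'events' is null on non-terminal pitches)
    plate_appearances = df['events'].notna().sum()

    metrics = {
        'barrel_count': len(barrel_balls),
        'barrel_pct': (len(barrel_balls) / len(batted_balls)) * 100,
        'barrel_pa_pct': ((len(barrel_balls) / plate_appearances) * 100
                          if plate_appearances > 0 else 0),  # Per plate appearance
        'avg_barrel_ev': barrel_balls['launch_speed'].mean() if len(barrel_balls) > 0 else 0,
        'avg_barrel_la': barrel_balls['launch_angle'].mean() if len(barrel_balls) > 0 else 0,
    }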

    # Barrel outcomes
    if len(barrel_balls) > 0:
        barrel_outcomes = barrel_balls['events'].value_counts()
        metrics['barrel_outcomes'] = barrel_outcomes

        # Calculate barrel performance
        hits = barrel_balls['events'].isin(['single', 'double', 'triple', 'home_run']).sum()
        hr = barrel_balls['events'].eq('home_run').sum()

        metrics['barrel_ba'] = hits / len(barrel_balls)
        metrics['barrel_hr_pct'] = (hr / len(barrel_balls)) * 100

    return metrics

barrel_metrics = calculate_barrel_metrics(statcast_data)

print("\nBarrel Analysis")
print("=" * 50)
print(f"Barrel Count: {barrel_metrics['barrel_count']}")
print(f"Barrel% (of batted balls): {barrel_metrics['barrel_pct']:.1f}%")
print(f"Barrel/PA%: {barrel_metrics['barrel_pa_pct']:.1f}%")
print(f"\nAverage Barrel Exit Velocity: {barrel_metrics['avg_barrel_ev']:.1f} mph")
print(f"Average Barrel Launch Angle: {barrel_metrics['avg_barrel_la']:.1f}°")
print(f"\nBarrel Batting Average: {barrel_metrics['barrel_ba']:.3f}")
print(f"Barrel HR Rate: {barrel_metrics['barrel_hr_pct']:.1f}%")
print("\nBarrel Outcomes:")
print(barrel_metrics['barrel_outcomes'])
# R version: Barrel metrics
calculate_barrel_metrics <- function(df) {
  batted_balls <- df %>%
    filter(!is.na(launch_speed), !is.na(launch_angle))

  if (nrow(batted_balls) == 0) {
    return(NULL)
  }

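  # Use Statcast's barrel column if available; otherwise approximate the
  # barrel zone so that is_barrel always exists
  if ('barrel' %in% colnames(batted_balls)) {
    batted_balls <- batted_balls %>%
      mutate(is_barrel = barrel == 1)
  } else {
    batted_balls <- batted_balls %>%
      mutate(is_barrel = launch_speed >= 98 &
               launch_angle >= pmax(26 - (launch_speed - 98), 8) &
               launch_angle <= pmin(30 + 1.5 * (launch_speed - 98), 50))
  }

  barrel_balls <- batted_balls %>%
    filter(is_barrel == TRUE)

  # Rows are pitches, so count plate appearances as rows with a recorded event
  plate_appearances <- sum(!is.na(df$events) & df$events != "")

  metrics <- list(
    barrel_count = nrow(barrel_balls),
    barrel_pct = (nrow(barrel_balls) / nrow(batted_balls)) * 100,
    barrel_pa_pct = (nrow(barrel_balls) / plate_appearances) * 100
  )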

  if (nrow(barrel_balls) > 0) {
    metrics$avg_barrel_ev <- mean(barrel_balls$launch_speed, na.rm = TRUE)
    metrics$avg_barrel_la <- mean(barrel_balls$launch_angle, na.rm = TRUE)

    barrel_hits <- barrel_balls %>%
      filter(events %in% c('single', 'double', 'triple', 'home_run'))

    barrel_hr <- barrel_balls %>%
      filter(events == 'home_run')

    metrics$barrel_ba <- nrow(barrel_hits) / nrow(barrel_balls)
    metrics$barrel_hr_pct <- (nrow(barrel_hr) / nrow(barrel_balls)) * 100

    metrics$barrel_outcomes <- barrel_balls %>%
      count(events) %>%
      arrange(desc(n))
  }

  return(metrics)
}

barrel_metrics <- calculate_barrel_metrics(judge_data)

cat("\nBarrel Analysis\n")
cat(strrep("=", 50), "\n")
cat(sprintf("Barrel Count: %d\n", barrel_metrics$barrel_count))
cat(sprintf("Barrel%% (of batted balls): %.1f%%\n", barrel_metrics$barrel_pct))
cat(sprintf("Barrel/PA%%: %.1f%%\n", barrel_metrics$barrel_pa_pct))
cat(sprintf("\nAverage Barrel Exit Velocity: %.1f mph\n", barrel_metrics$avg_barrel_ev))
cat(sprintf("Average Barrel Launch Angle: %.1f°\n", barrel_metrics$avg_barrel_la))
cat(sprintf("\nBarrel Batting Average: %.3f\n", barrel_metrics$barrel_ba))
cat(sprintf("Barrel HR Rate: %.1f%%\n", barrel_metrics$barrel_hr_pct))
cat("\nBarrel Outcomes:\n")
print(barrel_metrics$barrel_outcomes)
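
Raw exports do not always carry a ready-made `barrel` flag. Baseball Savant's CSV documentation describes a `launch_speed_angle` column that codes each batted ball from 1 (weak) to 6 (barrel). Assuming your data has that column, a barrel flag can be derived as sketched below; the helper name `add_barrel_flag` and the toy rows are illustrative only.

```python
import pandas as pd

# launch_speed_angle codes: 1 Weak, 2 Topped, 3 Under, 4 Flare/Burner, 5 Solid Contact, 6 Barrel
BARREL_CODE = 6

def add_barrel_flag(df):
    """Return a copy of df with a 0/1 'barrel' column derived from launch_speed_angle."""
    out = df.copy()
    # Missing codes compare as False, so they become 0
    out['barrel'] = (out['launch_speed_angle'] == BARREL_CODE).astype(int)
    return out

# Made-up rows for illustration
toy = pd.DataFrame({
    'launch_speed': [104.2, 88.0, 99.5, 110.1],
    'launch_angle': [27.0, 12.0, -4.0, 24.0],
    'launch_speed_angle': [6, 4, 2, 6],
})
flagged = add_barrel_flag(toy)
print(flagged[['launch_speed', 'launch_angle', 'barrel']])
```

The result can be passed straight to calculate_barrel_metrics, which looks for a `barrel` column equal to 1.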

6.4.3 Barrel vs. Hard Hit Comparison

While related, Barrels and Hard-Hit Balls are distinct metrics:

Metric      | Definition                                               | MLB Average | What It Measures
Hard-Hit %  | % of batted ball events (BBE) at ≥95 mph                 | ~35-37%     | Raw power, contact strength
Barrel %    | % of BBE with an optimal EV/LA combination               | ~6-8%       | Perfect contact quality

Key Differences:


  • All barrels are hard-hit (by definition ≥98 mph)

  • NOT all hard-hit balls are barrels (many have poor launch angles)

  • A 105 mph ground ball is hard-hit but NOT a barrel

  • A 99 mph ball at 28° is both hard-hit AND a barrel

# Compare hard-hit vs. barrel rates
def compare_hard_hit_barrel(df):
    """Compare hard-hit and barrel classifications."""
    batted_balls = df[(df['launch_speed'].notna()) &
                     (df['launch_angle'].notna())].copy()

    batted_balls['is_hard_hit'] = batted_balls['launch_speed'] >= 95

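    if 'barrel' in batted_balls.columns:
        batted_balls['is_barrel'] = batted_balls['barrel'] == 1
    else:
        # Fall back to the manual classifier from Section 6.4.1
        batted_balls['is_barrel'] = batted_balls.apply(
            lambda row: is_barrel(row['launch_speed'], row['launch_angle']),
            axis=1
        )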

    # Create comparison categories
    batted_balls['category'] = 'Neither'
    batted_balls.loc[batted_balls['is_hard_hit'], 'category'] = 'Hard-Hit Only'
    batted_balls.loc[batted_balls['is_barrel'], 'category'] = 'Barrel'

    # Analyze outcomes by category
    comparison = batted_balls.groupby('category').agg({
        'launch_speed': ['count', 'mean'],
        'launch_angle': 'mean',
        'estimated_ba_using_speedangle': 'mean',
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean()
    }).round(3)

    comparison.columns = ['Count', 'Avg_EV', 'Avg_LA', 'xBA', 'Hit_Rate']

    return comparison

comparison = compare_hard_hit_barrel(statcast_data)
print("\nHard-Hit vs. Barrel Comparison:")
print(comparison)

Insight: Barrels represent the intersection of power (high EV) and optimal trajectory (ideal LA). A player can have a high hard-hit rate with a low barrel rate if they consistently hit balls too low (ground balls) or too high (pop-ups).
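
One way to see why a hitter's hard-hit rate and barrel rate diverge is to ask where the hard-hit balls go. The sketch below splits balls hit at 95+ mph into the same launch-angle buckets used earlier in this chapter. The function name `hard_hit_breakdown` and the five toy rows are made up for illustration.

```python
import pandas as pd

def hard_hit_breakdown(df):
    """Percentage of hard-hit balls (>= 95 mph) falling in each launch-angle bucket."""
    hard = df[df['launch_speed'] >= 95]
    buckets = pd.cut(
        hard['launch_angle'],
        bins=[-90, 10, 25, 50, 90],
        labels=['Ground Ball', 'Line Drive', 'Fly Ball', 'Pop-up'],
        right=False,  # left-closed: 10.0 degrees counts as a line drive
    )
    return buckets.value_counts(normalize=True).sort_index() * 100

# Made-up batted balls: four hard-hit, one soft
toy = pd.DataFrame({
    'launch_speed': [100.0, 102.0, 97.0, 96.0, 80.0],
    'launch_angle': [-5.0, 5.0, 15.0, 30.0, 20.0],
})
print(hard_hit_breakdown(toy))
```

A hitter with a large share in the Ground Ball row is making hard contact that the barrel definition will never credit.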



6.5 Expected Statistics (xStats)

6.5.1 The Philosophy: Removing Luck and Defense

Traditional batting statistics like batting average, slugging percentage, and wOBA tell us what happened. Expected statistics (xStats) tell us what should have happened based purely on the quality of contact, independent of:

  • Defense: A great play by a fielder shouldn't penalize the hitter
  • Luck: A bloop single and a line drive out have very different contact quality
  • Park factors: In the moment of contact, the ballpark shouldn't matter
  • Weather: Wind, temperature, humidity affect actual but not expected outcomes

The xStats Philosophy:
Batted balls tracked since Statcast's 2015 debut with a similar exit velocity and launch angle have produced a certain average outcome. By looking at thousands of comparable batted balls, we can estimate the expected outcome of any new batted ball.

For example:

  • Suppose balls hit 105 mph at 28° have resulted in hits approximately 75% of the time
  • A new ball hit 105 mph at 28° then has an xBA of .750 for that batted ball
  • Sum these hit probabilities over all of a player's batted balls and divide by at-bats (strikeouts count as zero) to get the player's season xBA
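
The idea of comparing a new batted ball with similar past ones can be sketched with a simple lookup table. This is a toy illustration, not Statcast's actual model: it assumes 2 mph by 2° bins and a hypothetical `history` DataFrame with `launch_speed`, `launch_angle`, and a 0/1 `is_hit` column.

```python
import pandas as pd

def build_xba_table(history, ev_step=2, la_step=2):
    """Hit rate of past batted balls in each (exit velocity, launch angle) bin."""
    binned = history.assign(
        ev_bin=(history['launch_speed'] // ev_step) * ev_step,
        la_bin=(history['launch_angle'] // la_step) * la_step,
    )
    return binned.groupby(['ev_bin', 'la_bin'])['is_hit'].mean()

def lookup_xba(table, ev, la, ev_step=2, la_step=2):
    """Expected BA for one new batted ball, or None if its bin has no history."""
    key = ((ev // ev_step) * ev_step, (la // la_step) * la_step)
    try:
        return float(table.loc[key])
    except KeyError:
        return None

# Made-up history: four balls near 105 mph / 28 degrees, three of them hits
history = pd.DataFrame({
    'launch_speed': [104.5, 105.0, 105.9, 104.1, 88.0],
    'launch_angle': [28.0, 28.5, 29.9, 29.0, 3.0],
    'is_hit': [1, 1, 1, 0, 0],
})
table = build_xba_table(history)
print(lookup_xba(table, 105.2, 28.7))
```

A real model needs far more history per bin and some smoothing between neighbouring bins, but the principle is the same: the new ball inherits the hit rate of its closest comparables.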

6.5.2 xBA (Expected Batting Average)

Expected Batting Average (xBA) estimates what a player's batting average should be based solely on the quality of contact, removing defensive plays and luck.

import pandas as pd

def calculate_xba_metrics(df):
    """
    Calculate expected batting average metrics and compare to actual.

    Uses Statcast's 'estimated_ba_using_speedangle' which is calculated
    using exit velocity and launch angle comparisons to historical data.
    Both figures are averaged over batted balls only, so they are
    on-contact rates rather than full-season BA and xBA.
    """
    # Keep batted balls that carry an xBA estimate
    batted_balls = df[df['estimated_ba_using_speedangle'].notna()].copy()

    # Calculate actual outcomes
    batted_balls['is_hit'] = batted_balls['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])

    metrics = {
        'xBA': batted_balls['estimated_ba_using_speedangle'].mean(),
        'actual_BA_on_contact': batted_balls['is_hit'].mean(),
        'ba_diff': batted_balls['is_hit'].mean() -
                   batted_balls['estimated_ba_using_speedangle'].mean(),
        'batted_balls': len(batted_balls)
    }

    # Calculate xBA by exit velocity bins (left-closed, so 95.0 mph lands in '95-105')
    batted_balls['ev_bin'] = pd.cut(
        batted_balls['launch_speed'],
        bins=[0, 85, 95, 105, float('inf')],
        labels=['<85', '85-95', '95-105', '105+'],
        right=False
    )

    xba_by_ev = batted_balls.groupby('ev_bin', observed=False).agg({
        'estimated_ba_using_speedangle': 'mean',
        'is_hit': 'mean',
        'launch_speed': 'count'
    }).round(3)

    xba_by_ev.columns = ['xBA', 'Actual_BA', 'Count']
    metrics['xba_by_ev'] = xba_by_ev

    # Identify over/under-performers (individual batted balls)
    batted_balls['xba_diff'] = (batted_balls['is_hit'].astype(int) -
                                batted_balls['estimated_ba_using_speedangle'])

    # Find biggest outperformers (hits with low xBA)
    lucky_hits = batted_balls[
        (batted_balls['is_hit'] == True) &
        (batted_balls['estimated_ba_using_speedangle'] < 0.300)
    ].nsmallest(5, 'estimated_ba_using_speedangle')[
        ['game_date', 'events', 'launch_speed', 'launch_angle',
         'estimated_ba_using_speedangle']
    ]

    metrics['lucky_hits'] = lucky_hits

    return metrics

xba_metrics = calculate_xba_metrics(statcast_data)

print("\nExpected Batting Average (xBA) Analysis")
print("=" * 60)
print(f"Expected BA (xBA): {xba_metrics['xBA']:.3f}")
print(f"Actual BA on Contact: {xba_metrics['actual_BA_on_contact']:.3f}")
print(f"Difference (Actual - Expected): {xba_metrics['ba_diff']:+.3f}")

if xba_metrics['ba_diff'] > 0.020:
    print("  → Player is outperforming contact quality (lucky or good speed)")
elif xba_metrics['ba_diff'] < -0.020:
    print("  → Player is underperforming contact quality (unlucky or poor speed)")
else:
    print("  → Performance matches contact quality")

print(f"\nxBA by Exit Velocity:")
print(xba_metrics['xba_by_ev'])

print("\nLuckiest Hits (Low xBA but resulted in hit):")
print(xba_metrics['lucky_hits'].to_string(index=False))
# R version: xBA metrics
calculate_xba_metrics <- function(df) {
  batted_balls <- df %>%
    filter(!is.na(estimated_ba_using_speedangle)) %>%
    mutate(
      is_hit = events %in% c('single', 'double', 'triple', 'home_run'),
      ev_bin = cut(
        launch_speed,
        breaks = c(0, 85, 95, 105, Inf),
        labels = c('<85', '85-95', '95-105', '105+'),
        right = FALSE
      )
    )

  metrics <- list(
    xBA = mean(batted_balls$estimated_ba_using_speedangle, na.rm = TRUE),
    actual_BA = mean(batted_balls$is_hit, na.rm = TRUE),
    batted_balls = nrow(batted_balls)
  )

  metrics$ba_diff <- metrics$actual_BA - metrics$xBA

  # xBA by exit velocity
  xba_by_ev <- batted_balls %>%
    group_by(ev_bin) %>%
    summarise(
      xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
      actual_BA = mean(is_hit, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    ) %>%
    mutate(across(c(xBA, actual_BA), ~ round(.x, 3)))

  metrics$xba_by_ev <- xba_by_ev

  # Lucky hits
  lucky_hits <- batted_balls %>%
    filter(is_hit == TRUE, estimated_ba_using_speedangle < 0.300) %>%
    arrange(estimated_ba_using_speedangle) %>%
    select(game_date, events, launch_speed, launch_angle,
           estimated_ba_using_speedangle) %>%
    head(5)

  metrics$lucky_hits <- lucky_hits

  return(metrics)
}

xba_metrics <- calculate_xba_metrics(judge_data)

cat("\nExpected Batting Average (xBA) Analysis\n")
cat(strrep("=", 60), "\n")
cat(sprintf("Expected BA (xBA): %.3f\n", xba_metrics$xBA))
cat(sprintf("Actual BA on Contact: %.3f\n", xba_metrics$actual_BA))
cat(sprintf("Difference (Actual - Expected): %+.3f\n", xba_metrics$ba_diff))

if (xba_metrics$ba_diff > 0.020) {
  cat("  → Player is outperforming contact quality\n")
} else if (xba_metrics$ba_diff < -0.020) {
  cat("  → Player is underperforming contact quality\n")
} else {
  cat("  → Performance matches contact quality\n")
}

cat("\nxBA by Exit Velocity:\n")
print(xba_metrics$xba_by_ev)

cat("\nLuckiest Hits (Low xBA but resulted in hit):\n")
print(xba_metrics$lucky_hits)

Interpreting xBA Differences:

  • xBA > Actual BA: Player has been unlucky or is facing strong defensive positioning. Expect some positive regression.
  • Actual BA > xBA: Player has been lucky, has exceptional speed, or benefits from weak defensive positioning. Expect some negative regression unless speed explains the gap.
  • |Actual BA - xBA| < .020: Performance matches contact quality - what you see is what you get.
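
Note that calculate_xba_metrics averages over batted balls only, so its xBA is an on-contact figure. A season-level xBA also has to account for strikeouts, which are at-bats with no chance of a hit. A minimal sketch, assuming every batted ball in the list counts as an at-bat (sacrifices are ignored for simplicity) and using made-up hit probabilities:

```python
def season_xba(hit_probabilities, strikeouts):
    """Expected hits divided by at-bats, with strikeouts contributing zero hits."""
    at_bats = len(hit_probabilities) + strikeouts
    if at_bats == 0:
        return None
    return sum(hit_probabilities) / at_bats

# Four batted balls plus one strikeout (illustrative values)
on_contact = [0.75, 0.10, 0.90, 0.25]
print(round(sum(on_contact) / len(on_contact), 3))     # on-contact xBA
print(round(season_xba(on_contact, strikeouts=1), 3))  # season-style xBA
```

Here the on-contact figure is .500 while the season-style figure is .400, which is why the two should never be compared directly.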

6.5.3 xwOBA (Expected Weighted On-Base Average)

Expected wOBA (xwOBA) is the most comprehensive expected statistic. While xBA only considers hits vs. outs, xwOBA accounts for the type of hit (single, double, triple, home run) expected based on exit velocity and launch angle.

xwOBA is calculated similarly to wOBA (covered in Chapter 3), but uses expected outcomes:

  • Each batted ball gets an expected wOBA value based on its EV/LA combination
  • Walks and hit-by-pitches keep their actual wOBA values, and strikeouts count as zero
  • These values are averaged across all plate appearances

Why xwOBA is more informative than xBA:

  • xBA treats all hits equally
  • xwOBA distinguishes between expected singles and expected home runs
  • xwOBA also credits walks, so it gives a fuller picture of offensive value
  • Because it captures power as well as hit probability, xwOBA tends to be the better guide to future production
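
As a worked example of the calculation described above, the sketch below computes xwOBA for five hypothetical plate appearances. The per-ball expected values are invented, and the 0.69 walk weight is the simplified weight used in this section's code.

```python
WALK_WEIGHT = 0.69  # simplified walk weight, an assumption for this example

def xwoba(plate_appearances):
    """Average expected value per plate appearance.

    Each item is ('batted_ball', expected_woba), ('walk', None) or ('strikeout', None).
    """
    total = 0.0
    for outcome, expected_woba in plate_appearances:
        if outcome == 'batted_ball':
            total += expected_woba   # expected value from EV/LA
        elif outcome == 'walk':
            total += WALK_WEIGHT     # actual value, no contact involved
        # strikeouts add nothing
    return total / len(plate_appearances)

pas = [
    ('batted_ball', 1.20),  # crushed fly ball
    ('batted_ball', 0.05),  # weak grounder
    ('batted_ball', 0.35),  # soft liner
    ('walk', None),
    ('strikeout', None),
]
print(round(xwoba(pas), 3))
```

The three batted balls contribute 1.60, the walk adds 0.69, and the strikeout adds nothing, so the total of 2.29 over five plate appearances gives an xwOBA of .458.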

import pandas as pd

def calculate_xwoba_metrics(df):
    """
    Analyze expected wOBA (xwOBA) compared to actual wOBA.

    Note: Statcast provides 'estimated_woba_using_speedangle' which
    represents the expected wOBA for each batted ball. Walks, hit-by-pitches
    and strikeouts keep their actual values, so both numbers share the same
    plate-appearance denominator.
    """
    # Calculate actual wOBA (simplified - using typical weights)
    woba_weights = {
        'walk': 0.69,
        'hit_by_pitch': 0.72,
        'single': 0.88,
        'double': 1.24,
        'triple': 1.56,
        'home_run': 2.08
    }

    df_calc = df.copy()
    df_calc['woba_value'] = df_calc['events'].map(woba_weights).fillna(0)

    # Count plate appearances: rows with a recorded event, excluding
    # baserunning events that do not end a plate appearance
    non_pa_events = ['caught_stealing_2b', 'caught_stealing_3b',
                     'caught_stealing_home', 'pickoff_1b',
                     'pickoff_2b', 'pickoff_3b']
    pa_events = df_calc['events'].notna() & ~df_calc['events'].isin(non_pa_events)
    plate_appearances = df_calc[pa_events]

    actual_woba = plate_appearances['woba_value'].mean()

    # Batted balls with a Statcast expected value
    batted_balls_xwoba = df_calc[df_calc['estimated_woba_using_speedangle'].notna() &
                                 df_calc['launch_angle'].notna()]

    if len(batted_balls_xwoba) > 0:
        # Expected value on batted balls, actual value on every other plate appearance
        xwoba = plate_appearances['estimated_woba_using_speedangle'].fillna(
            plate_appearances['woba_value']).mean()
    else:
        xwoba = None

    metrics = {
        'actual_wOBA': actual_woba,
        'xwOBA': xwoba,
        'woba_diff': actual_woba - xwoba if xwoba is not None else None,
        'batted_balls': len(batted_balls_xwoba)
    }

    # xwOBA by launch angle category (left-closed bins: 10.0° counts as a line drive)
    if len(batted_balls_xwoba) > 0:
        batted_balls_xwoba_copy = batted_balls_xwoba.copy()
        batted_balls_xwoba_copy['la_category'] = pd.cut(
            batted_balls_xwoba_copy['launch_angle'],
            bins=[-90, 10, 25, 50, 90],
            labels=['Ground Ball', 'Line Drive', 'Fly Ball', 'Pop-up'],
            right=False
        )

        xwoba_by_la = batted_balls_xwoba_copy.groupby('la_category', observed=False).agg({
            'estimated_woba_using_speedangle': 'mean',
            'launch_speed': ['mean', 'count']
        }).round(3)

        xwoba_by_la.columns = ['xwOBA', 'Avg_EV', 'Count']
        metrics['xwoba_by_la'] = xwoba_by_la

    return metrics

xwoba_metrics = calculate_xwoba_metrics(statcast_data)

print("\nExpected wOBA (xwOBA) Analysis")
print("=" * 60)
print(f"Actual wOBA: {xwoba_metrics['actual_wOBA']:.3f}")
print(f"Expected wOBA (xwOBA): {xwoba_metrics['xwOBA']:.3f}")
print(f"Difference (Actual - Expected): {xwoba_metrics['woba_diff']:+.3f}")

if xwoba_metrics['woba_diff'] > 0.020:
    print("  → Outperforming expected - possibly lucky or elite speed")
elif xwoba_metrics['woba_diff'] < -0.020:
    print("  → Underperforming expected - regression likely upcoming")
else:
    print("  → Performance matches expectations")

print("\nxwOBA by Batted Ball Type:")
print(xwoba_metrics['xwoba_by_la'])

# Interpretation guide
print("\nxwOBA Scale:")
print("  Excellent: .390+")
print("  Great: .360 - .389")
print("  Above Average: .330 - .359")
print("  Average: .310 - .329")
print("  Below Average: .290 - .309")
print("  Poor: < .290")

6.5.4 Interpreting xStat Differences

The difference between actual and expected stats is one of the most useful inputs for predictive analysis, because luck tends to even out over time while contact quality is more stable:

Large Positive Difference (Actual >> Expected):


  • Player has been lucky with batted ball outcomes

  • Weak defensive positioning by opponents

  • Exceptional speed creating extra hits

  • Prediction: Expect decline toward xStat level

Large Negative Difference (Actual << Expected):


  • Player has been unlucky with batted ball outcomes

  • Facing strong defensive positioning (shift effectiveness)

  • Poor speed limiting infield hits

  • Prediction: Expect improvement toward xStat level

Small Difference (|Actual - Expected| < .020):


  • Performance matches underlying contact quality

  • Prediction: Expect similar future performance

def identify_regression_candidates(df, threshold=0.030):
    """
    Flag whether one player's results look likely to regress, based on the
    gap between actual wOBA and xwOBA.

    Parameters:
    df: Statcast data for a single player
    threshold: Minimum difference to flag (default .030)

    Returns:
    Regression prediction and analysis
    """
    xwoba_metrics = calculate_xwoba_metrics(df)

    if xwoba_metrics['xwOBA'] is None:
        return "Insufficient data for analysis"

    diff = xwoba_metrics['woba_diff']
    actual = xwoba_metrics['actual_wOBA']
    expected = xwoba_metrics['xwOBA']

    analysis = {
        'actual_wOBA': actual,
        'expected_wOBA': expected,
        'difference': diff,
        'regression_likely': abs(diff) >= threshold
    }

    if diff >= threshold:
        analysis['prediction'] = 'NEGATIVE REGRESSION LIKELY'
        analysis['reason'] = f'Actual wOBA ({actual:.3f}) significantly exceeds xwOBA ({expected:.3f})'
        analysis['action'] = 'SELL HIGH - Performance likely unsustainable'
    elif diff <= -threshold:
        analysis['prediction'] = 'POSITIVE REGRESSION LIKELY'
        analysis['reason'] = f'Actual wOBA ({actual:.3f}) significantly below xwOBA ({expected:.3f})'
        analysis['action'] = 'BUY LOW - Improvement expected'
    else:
        analysis['prediction'] = 'PERFORMANCE SUSTAINABLE'
        analysis['reason'] = 'Actual and expected wOBA closely aligned'
        analysis['action'] = 'HOLD - What you see is what you get'

    return analysis

regression_analysis = identify_regression_candidates(statcast_data)
print("\n" + "=" * 60)
print("REGRESSION ANALYSIS")
print("=" * 60)
for key, value in regression_analysis.items():
    print(f"{key}: {value}")
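
identify_regression_candidates works on one hitter's data at a time. To scan a whole roster, the same threshold logic can be applied to a table of season totals. A sketch, assuming a DataFrame with `player`, `wOBA`, and `xwOBA` columns; the function name `flag_regression`, the three players, and their numbers are all invented.

```python
import numpy as np
import pandas as pd

def flag_regression(players, threshold=0.030):
    """Label each player by the size and sign of wOBA minus xwOBA."""
    out = players.copy()
    out['diff'] = out['wOBA'] - out['xwOBA']
    out['signal'] = np.select(
        [out['diff'] >= threshold, out['diff'] <= -threshold],
        ['negative regression likely', 'positive regression likely'],
        default='sustainable',
    )
    # Biggest overperformers first
    return out.sort_values('diff', ascending=False).reset_index(drop=True)

players = pd.DataFrame({
    'player': ['Hitter A', 'Hitter B', 'Hitter C'],
    'wOBA':  [0.380, 0.300, 0.330],
    'xwOBA': [0.340, 0.345, 0.325],
})
print(flag_regression(players))
```

Sorting by the difference puts the most likely decliners at the top and the most likely improvers at the bottom, which makes the table easy to scan.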

Python
def calculate_xba_metrics(df):
    """
    Calculate expected batting average metrics and compare to actual.

    Uses Statcast's 'estimated_ba_using_speedangle' which is calculated
    using exit velocity and launch angle comparisons to historical data.
    """
    # Filter for balls in play (exclude strikeouts, walks, etc.)
    batted_balls = df[df['estimated_ba_using_speedangle'].notna()].copy()

    # Calculate actual outcomes
    batted_balls['is_hit'] = batted_balls['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])

    metrics = {
        'xBA': batted_balls['estimated_ba_using_speedangle'].mean(),
        'actual_BA_on_contact': batted_balls['is_hit'].mean(),
        'ba_diff': batted_balls['is_hit'].mean() -
                   batted_balls['estimated_ba_using_speedangle'].mean(),
        'batted_balls': len(batted_balls)
    }

    # Calculate xBA by exit velocity bins
    batted_balls['ev_bin'] = pd.cut(
        batted_balls['launch_speed'],
        bins=[0, 85, 95, 105, 125],
        labels=['<85', '85-95', '95-105', '105+']
    )

    xba_by_ev = batted_balls.groupby('ev_bin').agg({
        'estimated_ba_using_speedangle': 'mean',
        'is_hit': 'mean',
        'launch_speed': 'count'
    }).round(3)

    xba_by_ev.columns = ['xBA', 'Actual_BA', 'Count']
    metrics['xba_by_ev'] = xba_by_ev

    # Identify over/under-performers (individual batted balls)
    batted_balls['xba_diff'] = (batted_balls['is_hit'].astype(int) -
                                batted_balls['estimated_ba_using_speedangle'])

    # Find biggest outperformers (hits with low xBA)
    lucky_hits = batted_balls[
        (batted_balls['is_hit'] == True) &
        (batted_balls['estimated_ba_using_speedangle'] < 0.300)
    ].nsmallest(5, 'estimated_ba_using_speedangle')[
        ['game_date', 'events', 'launch_speed', 'launch_angle',
         'estimated_ba_using_speedangle']
    ]

    metrics['lucky_hits'] = lucky_hits

    return metrics

xba_metrics = calculate_xba_metrics(statcast_data)

print("\nExpected Batting Average (xBA) Analysis")
print("=" * 60)
print(f"Expected BA (xBA): {xba_metrics['xBA']:.3f}")
print(f"Actual BA on Contact: {xba_metrics['actual_BA_on_contact']:.3f}")
print(f"Difference (Actual - Expected): {xba_metrics['ba_diff']:+.3f}")

if xba_metrics['ba_diff'] > 0.020:
    print("  → Player is outperforming contact quality (lucky or good speed)")
elif xba_metrics['ba_diff'] < -0.020:
    print("  → Player is underperforming contact quality (unlucky or poor speed)")
else:
    print("  → Performance matches contact quality")

print(f"\nxBA by Exit Velocity:")
print(xba_metrics['xba_by_ev'])

print("\nLuckiest Hits (Low xBA but resulted in hit):")
print(xba_metrics['lucky_hits'].to_string(index=False))
Python
def calculate_xwoba_metrics(df):
    """
    Analyze expected wOBA (xwOBA) compared to actual wOBA.

    Note: Statcast provides 'estimated_woba_using_speedangle' which
    represents the expected wOBA for each batted ball.
    """
    # Calculate actual wOBA (simplified - using typical weights)
    woba_weights = {
        'walk': 0.69,
        'hit_by_pitch': 0.72,
        'single': 0.88,
        'double': 1.24,
        'triple': 1.56,
        'home_run': 2.08
    }

    df_calc = df.copy()
    df_calc['woba_value'] = df_calc['events'].map(woba_weights).fillna(0)

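    # Count plate appearances: only rows with an 'events' value end a plate
    # appearance (pitch-level data leaves it empty on other pitches), and
    # baserunning-only events are excluded
    pa_events = df_calc['events'].notna() & ~df_calc['events'].isin([
        'caught_stealing_2b', 'caught_stealing_3b', 'caught_stealing_home',
        'pickoff_1b', 'pickoff_2b', 'pickoff_3b'
    ])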

    actual_woba = df_calc.loc[pa_events, 'woba_value'].sum() / pa_events.sum()

    # Expected wOBA from Statcast
    batted_balls_xwoba = df_calc[df_calc['estimated_woba_using_speedangle'].notna()]

    if len(batted_balls_xwoba) > 0:
        xwoba = batted_balls_xwoba['estimated_woba_using_speedangle'].mean()
    else:
        xwoba = None

    metrics = {
        'actual_wOBA': actual_woba,
        'xwOBA': xwoba,
        'woba_diff': actual_woba - xwoba if xwoba is not None else None,
        'batted_balls': len(batted_balls_xwoba)
    }

    # xwOBA by launch angle category
    if len(batted_balls_xwoba) > 0:
        batted_balls_xwoba_copy = batted_balls_xwoba.copy()
        batted_balls_xwoba_copy['la_category'] = pd.cut(
            batted_balls_xwoba_copy['launch_angle'],
            bins=[-90, 10, 25, 50, 90],
            labels=['Ground Ball', 'Line Drive', 'Fly Ball', 'Pop-up']
        )

        xwoba_by_la = batted_balls_xwoba_copy.groupby('la_category').agg({
            'estimated_woba_using_speedangle': 'mean',
            'launch_speed': ['mean', 'count']
        }).round(3)

        xwoba_by_la.columns = ['xwOBA', 'Avg_EV', 'Count']
        metrics['xwoba_by_la'] = xwoba_by_la

    return metrics

xwoba_metrics = calculate_xwoba_metrics(statcast_data)

print("\nExpected wOBA (xwOBA) Analysis")
print("=" * 60)
print(f"Actual wOBA: {xwoba_metrics['actual_wOBA']:.3f}")
print(f"Expected wOBA (xwOBA): {xwoba_metrics['xwOBA']:.3f}")
print(f"Difference (Actual - Expected): {xwoba_metrics['woba_diff']:+.3f}")

if xwoba_metrics['woba_diff'] > 0.020:
    print("  → Outperforming expected - possibly lucky or elite speed")
elif xwoba_metrics['woba_diff'] < -0.020:
    print("  → Underperforming expected - regression likely upcoming")
else:
    print("  → Performance matches expectations")

print("\nxwOBA by Batted Ball Type:")
print(xwoba_metrics['xwoba_by_la'])

# Interpretation guide
print("\nxwOBA Scale:")
print("  Excellent: .390+")
print("  Great: .360 - .389")
print("  Above Average: .330 - .359")
print("  Average: .310 - .329")
print("  Below Average: .290 - .309")
print("  Poor: < .290")
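
One caveat with the comparison above: the actual wOBA is computed over all plate appearances, while the xwOBA average covers only batted balls. A like-for-like check restricts both sides to batted balls. The sketch below is a minimal illustration, assuming the same simplified weights and the Statcast column names `events` and `estimated_woba_using_speedangle`; the function name `woba_on_contact_gap` and the toy data are hypothetical.

```python
import pandas as pd

# Illustrative linear weights (the same simplified values used above;
# real weights vary by season)
CONTACT_WEIGHTS = {'single': 0.88, 'double': 1.24, 'triple': 1.56, 'home_run': 2.08}

def woba_on_contact_gap(df):
    """Compare actual and expected wOBA using batted balls only."""
    # Rows with an xwOBA estimate are treated as batted balls (an assumption)
    contact = df[df['estimated_woba_using_speedangle'].notna()]
    if contact.empty:
        return None

    # Outs and any unmapped events count as zero
    actual = contact['events'].map(CONTACT_WEIGHTS).fillna(0).mean()
    expected = contact['estimated_woba_using_speedangle'].mean()

    return {
        'wOBAcon': actual,
        'xwOBAcon': expected,
        'diff': actual - expected,
        'batted_balls': len(contact),
    }

# Toy data: four batted balls and one strikeout (no xwOBA estimate)
toy = pd.DataFrame({
    'events': ['single', 'field_out', 'home_run', 'strikeout', 'field_out'],
    'estimated_woba_using_speedangle': [0.40, 0.10, 1.90, None, 0.30],
})
gap = woba_on_contact_gap(toy)
print(f"wOBAcon {gap['wOBAcon']:.3f} vs xwOBAcon {gap['xwOBAcon']:.3f} "
      f"on {gap['batted_balls']} batted balls")
```

Because strikeouts and walks are excluded from both sides, any gap here reflects outcomes on contact rather than differences in plate discipline.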

6.6 Spray Angle and Pull Tendency

6.6.1 Calculating Spray Angle

Spray Angle (also called Hit Direction) measures the horizontal angle at which a ball is hit, expressed here relative to the batter's pull side:


  • Negative angles: Opposite field (right field for RHH, left field for LHH)

  • Zero degrees: Straightaway center field

  • Positive angles: Pull side (left field for RHH, right field for LHH)

Statcast provides the hc_x and hc_y coordinates of where the ball landed or was fielded. We can calculate spray angle from these coordinates:

import numpy as np
import pandas as pd

def calculate_spray_angle(hc_x, hc_y, batter_side):
    """
    Calculate spray angle from hit coordinates.

    Parameters:
    hc_x: Horizontal coordinate (Statcast coordinate system)
    hc_y: Vertical coordinate (Statcast coordinate system)
    batter_side: 'R' for right-handed, 'L' for left-handed

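    Returns:
    Spray angle in degrees (0 = center field, positive = pull side)
    """
    # Convert Statcast coordinates to spray angle
    # Home plate is roughly at (125, 205) in Statcast coordinates
    home_x, home_y = 125, 205

    # Calculate position relative to home plate. hc_y decreases as the ball
    # travels away from home plate, so subtract it from home_y to make
    # "toward center field" positive.
    rel_x = hc_x - home_x
    rel_y = home_y - hc_y

    # Calculate angle in radians, then convert to degrees
    # (0 = straightaway center, positive = toward right field)
    angle_rad = np.arctan2(rel_x, rel_y)
    angle_deg = np.degrees(angle_rad)

    # Adjust for batter handedness
    # RHH pull to left field (negative raw angle), so flip the sign for RHH
    # to keep positive = pull for every batter
    if batter_side == 'R':
        angle_deg = -angle_deg

    return angle_deg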

def categorize_spray_direction(spray_angle):
    """
    Categorize spray angle into pull, center, opposite field.

    Standard definitions:
    - Pull: > 15 degrees
    - Center: -15 to 15 degrees
    - Opposite: < -15 degrees
    """
    if spray_angle > 15:
        return 'Pull'
    elif spray_angle < -15:
        return 'Opposite'
    else:
        return 'Center'

# Example: Add spray metrics to dataframe
def add_spray_metrics(df):
    """Add spray angle and direction to Statcast dataframe."""
    df_spray = df.copy()

    # Calculate spray angle for each batted ball
    df_spray['spray_angle'] = df_spray.apply(
        lambda row: calculate_spray_angle(
            row['hc_x'], row['hc_y'], row['stand']
        ) if pd.notna(row['hc_x']) else None,
        axis=1
    )

    # Categorize direction
    df_spray['spray_direction'] = df_spray['spray_angle'].apply(
        lambda x: categorize_spray_direction(x) if pd.notna(x) else None
    )

    return df_spray

# Apply to our data
statcast_with_spray = add_spray_metrics(statcast_data)
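
Row-wise `apply` can be slow on a full season of pitches. A vectorized sketch of the same idea follows. It assumes home plate near (125, 205), relies on `hc_y` shrinking as the ball travels away from home plate, and uses the convention that positive angles are pulled. The function name `add_spray_angle_vectorized` and the toy coordinates are hypothetical.

```python
import numpy as np
import pandas as pd

def add_spray_angle_vectorized(df, home_x=125.0, home_y=205.0):
    """Return a copy of df with pull-positive spray_angle and spray_direction."""
    out = df.copy()

    # Angle from the home-to-center-field line; positive toward right field
    raw = np.degrees(np.arctan2(out['hc_x'] - home_x, home_y - out['hc_y']))

    # Right-handed batters pull to left field, so flip their sign
    out['spray_angle'] = np.where(out['stand'] == 'R', -raw, raw)

    # Same thresholds as categorize_spray_direction: > 15 pull, < -15 opposite
    angle = out['spray_angle']
    direction = pd.Series('Center', index=out.index, dtype=object)
    direction.loc[angle > 15] = 'Pull'
    direction.loc[angle < -15] = 'Opposite'

    # Rows with missing coordinates get no direction label
    out['spray_direction'] = direction.where(angle.notna())
    return out

# Toy batted balls: RHH to left field, RHH to right field, LHH to center,
# and one row with a missing coordinate
toy = pd.DataFrame({
    'hc_x': [60.0, 190.0, 125.0, None],
    'hc_y': [120.0, 120.0, 80.0, 100.0],
    'stand': ['R', 'R', 'L', 'R'],
})
result = add_spray_angle_vectorized(toy)
print(result[['stand', 'spray_angle', 'spray_direction']])
```

The thresholds are applied with boolean masks rather than `pd.cut` so that a value of exactly 15 or -15 degrees lands in 'Center', matching the row-wise function.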

6.6.2 Pull, Center, Opposite Field Breakdown

Understanding a hitter's spray tendencies is crucial for:


  • Defensive positioning: Extreme pull hitters invite shifts

  • Power assessment: Most home runs are pulled

  • Pitch approach: Pull-heavy hitters struggle with away pitches

  • Development: Learning to use the whole field

def analyze_spray_tendencies(df):
    """
    Comprehensive spray chart analysis.

    Analyzes distribution and performance by spray direction.
    """
    # Add spray metrics if not already present
    if 'spray_direction' not in df.columns:
        df = add_spray_metrics(df)

    spray_data = df[df['spray_direction'].notna()].copy()

    if len(spray_data) == 0:
        return None

    # Distribution analysis
    distribution = spray_data['spray_direction'].value_counts(normalize=True) * 100

    # Performance by direction
    performance = spray_data.groupby('spray_direction').agg({
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean(),
        'estimated_ba_using_speedangle': 'mean',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'launch_angle': 'mean'
    }).round(3)

    performance.columns = ['BA', 'xBA', 'xwOBA', 'Avg_EV', 'Avg_LA']
    performance['Percentage'] = distribution.round(1)

    # Home runs by direction
    hr_data = spray_data[spray_data['events'] == 'home_run']
    hr_distribution = hr_data['spray_direction'].value_counts()

    metrics = {
        'distribution': distribution.to_dict(),
        'performance': performance,
        'hr_distribution': hr_distribution.to_dict(),
        'total_batted_balls': len(spray_data),
        'total_hr': len(hr_data)
    }

    # Pull tendency score (-100 to +100)
    # +100 = extreme pull, -100 = extreme opposite field
    pull_pct = distribution.get('Pull', 0)
    oppo_pct = distribution.get('Opposite', 0)
    metrics['pull_tendency_score'] = pull_pct - oppo_pct

    return metrics

spray_analysis = analyze_spray_tendencies(statcast_with_spray)

print("\nSpray Chart Analysis")
print("=" * 60)
print("\nBatted Ball Distribution:")
for direction, pct in spray_analysis['distribution'].items():
    print(f"  {direction}: {pct:.1f}%")

print(f"\nPull Tendency Score: {spray_analysis['pull_tendency_score']:.1f}")
if spray_analysis['pull_tendency_score'] > 20:
    print("  → PULL-HEAVY hitter (vulnerable to shifts)")
elif spray_analysis['pull_tendency_score'] < -20:
    print("  → OPPOSITE FIELD hitter (uses whole field)")
else:
    print("  → BALANCED spray approach")

print("\nPerformance by Direction:")
print(spray_analysis['performance'])

print("\nHome Run Distribution:")
for direction, count in spray_analysis['hr_distribution'].items():
    pct = (count / spray_analysis['total_hr']) * 100
    print(f"  {direction}: {count} ({pct:.1f}%)")

# R version: Spray analysis
library(dplyr)
library(ggplot2)

calculate_spray_angle <- function(hc_x, hc_y, batter_side) {
  home_x <- 125
  home_y <- 205

  # hc_y decreases as the ball travels away from home plate
  rel_x <- hc_x - home_x
  rel_y <- home_y - hc_y

  angle_rad <- atan2(rel_x, rel_y)
  angle_deg <- angle_rad * 180 / pi

  # RHH pull to left field, so flip the sign to keep positive = pull
  if (batter_side == 'R') {
    angle_deg <- -angle_deg
  }

  return(angle_deg)
}

analyze_spray_tendencies <- function(df) {
  spray_data <- df %>%
    filter(!is.na(hc_x), !is.na(hc_y)) %>%
    rowwise() %>%
    mutate(
      spray_angle = calculate_spray_angle(hc_x, hc_y, stand),
      spray_direction = case_when(
        spray_angle > 15 ~ 'Pull',
        spray_angle < -15 ~ 'Opposite',
        TRUE ~ 'Center'
      )
    ) %>%
    ungroup()

  # Distribution
  distribution <- spray_data %>%
    count(spray_direction) %>%
    mutate(percentage = n / sum(n) * 100)

  # Performance by direction
  performance <- spray_data %>%
    group_by(spray_direction) %>%
    summarise(
      BA = mean(events %in% c('single', 'double', 'triple', 'home_run'),
               na.rm = TRUE),
      xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
      xwOBA = mean(estimated_woba_using_speedangle, na.rm = TRUE),
      avg_ev = mean(launch_speed, na.rm = TRUE),
      avg_la = mean(launch_angle, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    ) %>%
    mutate(across(c(BA, xBA, xwOBA, avg_ev, avg_la), ~ round(.x, 3)))

  # Pull tendency score
  pull_pct <- distribution %>%
    filter(spray_direction == 'Pull') %>%
    pull(percentage)
  oppo_pct <- distribution %>%
    filter(spray_direction == 'Opposite') %>%
    pull(percentage)

  pull_tendency <- ifelse(length(pull_pct) > 0, pull_pct, 0) -
                   ifelse(length(oppo_pct) > 0, oppo_pct, 0)

  list(
    distribution = distribution,
    performance = performance,
    pull_tendency_score = pull_tendency,
    spray_data = spray_data
  )
}

spray_analysis <- analyze_spray_tendencies(judge_data)

cat("\nSpray Chart Analysis\n")
cat(strrep("=", 60), "\n")
cat("\nBatted Ball Distribution:\n")
print(spray_analysis$distribution)

cat(sprintf("\nPull Tendency Score: %.1f\n", spray_analysis$pull_tendency_score))

cat("\nPerformance by Direction:\n")
print(spray_analysis$performance)

6.6.3 The Shift Era and Its End

From roughly 2015 through 2022, MLB went through a "Shift Era" in which defensive positioning became increasingly extreme, particularly against pull-heavy left-handed hitters. Teams would position three or even four infielders on the pull side, putting hitters who couldn't adjust at a serious disadvantage.

Impact of Shifts:


  • Pull-heavy hitters saw BABIP drops of 20-40 points

  • Ground ball pull hitters were most affected

  • Created incentive for launch angle revolution (hit over the shift)

  • Some hitters learned to hit opposite field, others refused to adjust

2023 Rule Change:
MLB banned extreme shifts starting in 2023, requiring:


  • Two infielders on each side of second base

  • All infielders on the infield dirt when pitch is released

Post-Shift Results:


  • BABIP increased league-wide by ~10 points

  • Pull-heavy hitters benefited most

  • Batting averages rose across the board

  • Reduced the penalty for being pull-dominant

For historical analysis (2015-2022 data), spray tendencies are crucial for understanding player value. From 2023 onward they matter less for defensive positioning, but they remain relevant for hitting approach and pitch coverage.
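
One way to probe the shift's effect in your own data is to compare the hit rate on pulled ground balls before and after the 2023 rule change. The sketch below is only an outline. It assumes a `game_year` column, a pull-positive `spray_direction` label as computed earlier in this section, and the chapter's launch-angle cutoff of 10 degrees for ground balls. The function name `pulled_gb_hit_rate_by_era` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

HIT_EVENTS = ['single', 'double', 'triple', 'home_run']

def pulled_gb_hit_rate_by_era(df, rule_change_year=2023):
    """Hit rate on pulled ground balls, split at the shift-restriction year."""
    # Pulled ground balls only (launch angle below 10 degrees)
    gb = df[(df['launch_angle'] < 10) & (df['spray_direction'] == 'Pull')].copy()
    gb['is_hit'] = gb['events'].isin(HIT_EVENTS)

    # Label each batted ball by whether shifts were still unrestricted
    gb['era'] = np.where(gb['game_year'] >= rule_change_year,
                         'shift_restricted', 'shift_allowed')

    return gb.groupby('era')['is_hit'].agg(hit_rate='mean', batted_balls='count')

# Toy data: two pulled grounders per era, plus rows the filter should drop
toy = pd.DataFrame({
    'game_year': [2022, 2022, 2022, 2023, 2023, 2023],
    'launch_angle': [5, -3, 20, 2, 8, 4],
    'spray_direction': ['Pull', 'Pull', 'Pull', 'Pull', 'Pull', 'Center'],
    'events': ['field_out', 'single', 'single', 'single', 'single', 'single'],
})
rates = pulled_gb_hit_rate_by_era(toy)
print(rates)
```

For a single hitter the per-era samples are small, so a gap between the two rows is suggestive at best; league-wide data gives a steadier picture.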



6.7 Sprint Speed and Baserunning

6.7.1 Understanding Sprint Speed

Sprint Speed measures a player's maximum running speed in feet per second (ft/s). Unlike stolen base totals (which depend on opportunity and decision-making), sprint speed is a pure athleticism metric.

Statcast defines sprint speed as: "A player's fastest one-second window on competitive plays"

Competitive plays include:


  • Home to first on ground balls or bunt hits

  • First to third on singles

  • Second to home on singles

  • First to home on doubles

  • Any baserunning advancement attempt

Sprint Speed Scale:

Category         Sprint Speed (ft/s)   Examples
Elite            30+                   Bobby Witt Jr., Elly De La Cruz, Ronald Acuña Jr.
Plus             28.5 - 29.9           Trea Turner, Jazz Chisholm, CJ Abrams
Above Average    27.5 - 28.4           Mookie Betts, Francisco Lindor
Average          27.0 - 27.4           League average
Below Average    26.0 - 26.9           Many DHs and corner players
Poor             < 26.0                Slow-footed power hitters

The MLB average sprint speed is approximately 27 ft/s (about 18.4 mph).
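
The unit conversion and the scale above are easy to encode. The following is a small sketch; the helper names `fps_to_mph` and `sprint_speed_tier` are hypothetical, and the tier boundaries are simply the lower bounds from the table.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def fps_to_mph(feet_per_second):
    """Convert feet per second to miles per hour."""
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE

def sprint_speed_tier(feet_per_second):
    """Map a sprint speed (ft/s) to the chapter's scale."""
    if feet_per_second >= 30.0:
        return 'Elite'
    if feet_per_second >= 28.5:
        return 'Plus'
    if feet_per_second >= 27.5:
        return 'Above Average'
    if feet_per_second >= 27.0:
        return 'Average'
    if feet_per_second >= 26.0:
        return 'Below Average'
    return 'Poor'

for speed in (30.4, 27.0, 25.5):
    print(f"{speed:.1f} ft/s = {fps_to_mph(speed):.1f} mph -> {sprint_speed_tier(speed)}")
```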

6.7.2 Sprint Speed Impact

Sprint speed affects baseball outcomes in multiple ways:

  1. Infield Hits: Fast runners beat out more ground balls
  2. BABIP: Higher speed = higher BABIP, especially on ground balls
  3. Extra Bases: Speed allows taking extra bases on hits
  4. Stolen Bases: Prerequisite for successful base stealing
  5. Defensive Range: Fast players cover more ground (for position players)
def analyze_sprint_speed_impact(df):
    """
    Analyze how sprint speed correlates with offensive outcomes.

    Note: Sprint speed data requires full season aggregation.
    This example shows the analytical approach.
    """
    # Filter for ground balls (most affected by speed)
    ground_balls = df[
        (df['launch_angle'].notna()) &
        (df['launch_angle'] < 10)
    ].copy()

    if len(ground_balls) == 0:
        return None

    # Analyze ground ball outcomes
    ground_balls['is_hit'] = ground_balls['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])

    gb_analysis = {
        'total_ground_balls': len(ground_balls),
        'gb_hits': ground_balls['is_hit'].sum(),
        'gb_hit_rate': ground_balls['is_hit'].mean(),
        'avg_ev_on_gb': ground_balls['launch_speed'].mean(),
        'infield_singles': len(ground_balls[
            (ground_balls['events'] == 'single') &
            (ground_balls['hit_distance_sc'] < 150)
        ])
    }

    # Calculate expected vs actual on ground balls
    if 'estimated_ba_using_speedangle' in ground_balls.columns:
        gb_xba = ground_balls['estimated_ba_using_speedangle'].mean()
        gb_actual = ground_balls['is_hit'].mean()
        gb_analysis['gb_xBA'] = gb_xba
        gb_analysis['gb_actual_BA'] = gb_actual
        gb_analysis['speed_boost'] = gb_actual - gb_xba

    return gb_analysis

speed_impact = analyze_sprint_speed_impact(statcast_data)

print("\nSprint Speed Impact Analysis")
print("=" * 60)
print(f"Total Ground Balls: {speed_impact['total_ground_balls']}")
print(f"Ground Ball Hit Rate: {speed_impact['gb_hit_rate']:.3f}")
print(f"Average EV on GB: {speed_impact['avg_ev_on_gb']:.1f} mph")
print(f"Infield Singles: {speed_impact['infield_singles']}")

if 'speed_boost' in speed_impact:
    print(f"\nGround Ball xBA: {speed_impact['gb_xBA']:.3f}")
    print(f"Ground Ball Actual BA: {speed_impact['gb_actual_BA']:.3f}")
    print(f"Speed Boost: {speed_impact['speed_boost']:+.3f}")

    if speed_impact['speed_boost'] > 0.030:
        print("  → Elite speed creating extra value on ground balls")
    elif speed_impact['speed_boost'] < -0.030:
        print("  → Poor speed costing value on ground balls")

6.7.3 xBA Adjustment for Speed

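One limitation of an xBA built only from exit velocity and launch angle is that it says nothing about who is running. (Statcast has factored seasonal sprint speed into xBA on certain weakly hit and topped balls since 2019, but it is still instructive to see how much speed can move the number.) A 70 mph ground ball has different hit probabilities for: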


  • Elite speed (30 ft/s): ~.350 BA

  • Average speed (27 ft/s): ~.250 BA

  • Poor speed (25 ft/s): ~.180 BA

More advanced models incorporate sprint speed into expected batting average calculations:

def calculate_speed_adjusted_xba(df, sprint_speed):
    """
    Adjust xBA based on player's sprint speed.

    This is a simplified, illustrative model, not MLB's method: it adds a
    flat per-ft/s term to the Statcast estimate on ground balls.

    Parameters:
    df: Statcast dataframe
    sprint_speed: Player's sprint speed in ft/s
    """
    batted_balls = df[df['estimated_ba_using_speedangle'].notna()].copy()

    # Speed adjustment factor
    # Average sprint speed is ~27 ft/s
    # Each 1 ft/s above/below average adds/subtracts ~.015 to GB xBA
    speed_adjustment = (sprint_speed - 27.0) * 0.015

    # Apply adjustment only to ground balls where speed matters most
    batted_balls['speed_adjusted_xba'] = batted_balls['estimated_ba_using_speedangle']

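    ground_ball_mask = batted_balls['launch_angle'] < 10
    batted_balls.loc[ground_ball_mask, 'speed_adjusted_xba'] += speed_adjustment
    # Keep adjusted hit probabilities within [0, 1]
    batted_balls['speed_adjusted_xba'] = batted_balls['speed_adjusted_xba'].clip(0, 1)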

    # Calculate overall adjusted xBA
    standard_xba = batted_balls['estimated_ba_using_speedangle'].mean()
    adjusted_xba = batted_balls['speed_adjusted_xba'].mean()

    results = {
        'sprint_speed': sprint_speed,
        'standard_xBA': standard_xba,
        'speed_adjusted_xBA': adjusted_xba,
        'adjustment': adjusted_xba - standard_xba,
        'ground_ball_pct': ground_ball_mask.mean() * 100
    }

    return results

# Example for a fast player (e.g., Bobby Witt Jr. at 30.4 ft/s)
fast_player_xba = calculate_speed_adjusted_xba(statcast_data, sprint_speed=30.4)

print("\nSpeed-Adjusted xBA Analysis")
print("=" * 60)
print(f"Player Sprint Speed: {fast_player_xba['sprint_speed']:.1f} ft/s")
print(f"Standard xBA: {fast_player_xba['standard_xBA']:.3f}")
print(f"Speed-Adjusted xBA: {fast_player_xba['speed_adjusted_xBA']:.3f}")
print(f"Speed Value: {fast_player_xba['adjustment']:+.3f}")
print(f"Ground Ball Rate: {fast_player_xba['ground_ball_pct']:.1f}%")

Key Insight: Speed is most valuable for ground ball hitters. Fly ball power hitters gain minimal benefit from elite speed on batted balls (though it helps in baserunning).
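
To see why, work the model's arithmetic directly. Under the simplified assumption used above (roughly .015 of ground-ball xBA per ft/s relative to a 27 ft/s average), the overall shift in xBA is the per-ground-ball adjustment multiplied by the share of batted balls that are ground balls. The function name `overall_speed_adjustment` is hypothetical, and the ground-ball shares below are made-up illustrations, not real player data.

```python
def overall_speed_adjustment(sprint_speed, ground_ball_share,
                             league_avg_speed=27.0, per_fps=0.015):
    """Overall xBA shift implied by the simplified ground-ball speed model."""
    # Adjustment applied to each ground ball
    per_ground_ball = (sprint_speed - league_avg_speed) * per_fps
    # Only ground balls are adjusted, so scale by their share of batted balls
    return per_ground_ball * ground_ball_share

# Same 30.4 ft/s runner, two hypothetical batted-ball profiles
gb_hitter = overall_speed_adjustment(30.4, ground_ball_share=0.55)
fb_hitter = overall_speed_adjustment(30.4, ground_ball_share=0.30)

print(f"Ground-ball hitter: {gb_hitter:+.3f}")
print(f"Fly-ball hitter:    {fb_hitter:+.3f}")
```

With identical speed, the ground-ball-heavy profile gains nearly twice as much xBA in this model, purely because more of its batted balls are eligible for the adjustment.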


Python
def analyze_sprint_speed_impact(df):
    """
    Analyze how sprint speed correlates with offensive outcomes.

    Note: Sprint speed data requires full season aggregation.
    This example shows the analytical approach.
    """
    # Filter for ground balls (most affected by speed)
    ground_balls = df[
        (df['launch_angle'].notna()) &
        (df['launch_angle'] < 10)
    ].copy()

    if len(ground_balls) == 0:
        return None

    # Analyze ground ball outcomes
    ground_balls['is_hit'] = ground_balls['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])

    gb_analysis = {
        'total_ground_balls': len(ground_balls),
        'gb_hits': ground_balls['is_hit'].sum(),
        'gb_hit_rate': ground_balls['is_hit'].mean(),
        'avg_ev_on_gb': ground_balls['launch_speed'].mean(),
        'infield_singles': len(ground_balls[
            (ground_balls['events'] == 'single') &
            (ground_balls['hit_distance_sc'] < 150)
        ])
    }

    # Calculate expected vs actual on ground balls
    if 'estimated_ba_using_speedangle' in ground_balls.columns:
        gb_xba = ground_balls['estimated_ba_using_speedangle'].mean()
        gb_actual = ground_balls['is_hit'].mean()
        gb_analysis['gb_xBA'] = gb_xba
        gb_analysis['gb_actual_BA'] = gb_actual
        gb_analysis['speed_boost'] = gb_actual - gb_xba

    return gb_analysis

speed_impact = analyze_sprint_speed_impact(statcast_data)

print("\nSprint Speed Impact Analysis")
print("=" * 60)
print(f"Total Ground Balls: {speed_impact['total_ground_balls']}")
print(f"Ground Ball Hit Rate: {speed_impact['gb_hit_rate']:.3f}")
print(f"Average EV on GB: {speed_impact['avg_ev_on_gb']:.1f} mph")
print(f"Infield Singles: {speed_impact['infield_singles']}")

if 'speed_boost' in speed_impact:
    print(f"\nGround Ball xBA: {speed_impact['gb_xBA']:.3f}")
    print(f"Ground Ball Actual BA: {speed_impact['gb_actual_BA']:.3f}")
    print(f"Speed Boost: {speed_impact['speed_boost']:+.3f}")

    if speed_impact['speed_boost'] > 0.030:
        print("  → Elite speed creating extra value on ground balls")
    elif speed_impact['speed_boost'] < -0.030:
        print("  → Poor speed costing value on ground balls")
Python
def calculate_speed_adjusted_xba(df, sprint_speed):
    """
    Adjust xBA based on player's sprint speed.

    This is a simplified model. MLB's official xBA doesn't account for speed,
    but more advanced models do.

    Parameters:
    df: Statcast dataframe
    sprint_speed: Player's sprint speed in ft/s
    """
    batted_balls = df[df['estimated_ba_using_speedangle'].notna()].copy()

    # Speed adjustment factor (an illustrative coefficient for this simplified model)
    # Average sprint speed is ~27 ft/s
    # Assume each 1 ft/s above/below average adds/subtracts ~.015 to ground-ball xBA
    speed_adjustment = (sprint_speed - 27.0) * 0.015

    # Apply adjustment only to ground balls where speed matters most
    batted_balls['speed_adjusted_xba'] = batted_balls['estimated_ba_using_speedangle']

    ground_ball_mask = batted_balls['launch_angle'] < 10
    batted_balls.loc[ground_ball_mask, 'speed_adjusted_xba'] += speed_adjustment

    # Keep adjusted values inside the valid probability range
    batted_balls['speed_adjusted_xba'] = batted_balls['speed_adjusted_xba'].clip(0, 1)

    # Calculate overall adjusted xBA
    standard_xba = batted_balls['estimated_ba_using_speedangle'].mean()
    adjusted_xba = batted_balls['speed_adjusted_xba'].mean()

    results = {
        'sprint_speed': sprint_speed,
        'standard_xBA': standard_xba,
        'speed_adjusted_xBA': adjusted_xba,
        'adjustment': adjusted_xba - standard_xba,
        'ground_ball_pct': ground_ball_mask.mean() * 100
    }

    return results

# Example for a fast player (e.g., Bobby Witt Jr. at 30.4 ft/s)
fast_player_xba = calculate_speed_adjusted_xba(statcast_data, sprint_speed=30.4)

print("\nSpeed-Adjusted xBA Analysis")
print("=" * 60)
print(f"Player Sprint Speed: {fast_player_xba['sprint_speed']:.1f} ft/s")
print(f"Standard xBA: {fast_player_xba['standard_xBA']:.3f}")
print(f"Speed-Adjusted xBA: {fast_player_xba['speed_adjusted_xBA']:.3f}")
print(f"Speed Value: {fast_player_xba['adjustment']:+.3f}")
print(f"Ground Ball Rate: {fast_player_xba['ground_ball_pct']:.1f}%")
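
Because the adjustment is applied only to ground balls, the overall shift in xBA is roughly the ground-ball share multiplied by the per-ground-ball bonus. The sketch below works through that arithmetic using the same illustrative constants as the function above (27 ft/s average, .015 per ft/s). The helper name `overall_speed_adjustment` and the 45% ground-ball share are assumptions made for this example, not measured values.

```python
def overall_speed_adjustment(sprint_speed, ground_ball_share,
                             league_avg_speed=27.0, per_ft_s=0.015):
    """Approximate overall xBA shift when only ground balls get the speed bonus."""
    # Bonus (or penalty) applied to each ground ball
    per_ground_ball = (sprint_speed - league_avg_speed) * per_ft_s
    # Non-ground balls are unchanged, so the overall mean moves by the GB share
    return ground_ball_share * per_ground_ball

# Hypothetical inputs: a 30.4 ft/s runner and a 25.5 ft/s runner,
# each hitting 45% of batted balls on the ground
fast = overall_speed_adjustment(30.4, 0.45)
slow = overall_speed_adjustment(25.5, 0.45)

print(f"Fast runner overall adjustment: {fast:+.3f}")
print(f"Slow runner overall adjustment: {slow:+.3f}")
```

Under these assumptions the fast runner gains about .023 of overall xBA, which shows why the same sprint speed matters more for a ground-ball hitter than for a fly-ball hitter.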

6.8 Advanced Hitting Analysis

6.8.1 Plate Coverage Analysis

Understanding where a hitter performs best in the strike zone reveals:

  • Pitch coverage: Can they handle pitches inside, outside, up, and down?
  • Weaknesses: Where do they struggle? These are the zones pitchers will target.
  • Approach adjustments: How have they adapted?

def analyze_plate_coverage(df):
    """
    Analyze performance by pitch location zones.

    Uses Statcast's zone classification:
    - Zones 1-9: Inside the strike zone
    - Zones 11-14: Outside the strike zone
    """
    pitch_data = df[df['zone'].notna()].copy()

    # Categorize zones
    def categorize_zone(zone):
        if zone in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
            return 'In Zone'
        elif zone in [11, 12, 13, 14]:
            return 'Out of Zone'
        else:
            return 'Other'

    pitch_data['zone_category'] = pitch_data['zone'].apply(categorize_zone)

    # Descriptions that count as a swing (Statcast `description` values)
    swing_descriptions = ['hit_into_play', 'foul', 'foul_tip',
                          'swinging_strike', 'swinging_strike_blocked']
    hit_events = ['single', 'double', 'triple', 'home_run']

    # Overall zone performance (raw counts)
    zone_performance = pitch_data.groupby('zone_category').agg({
        'description': lambda x: x.isin(swing_descriptions).sum(),  # Swings
        'events': lambda x: x.isin(hit_events).sum(),  # Hits
        'type': lambda x: (x == 'X').sum()  # Balls in play
    })
    zone_performance.columns = ['Swings', 'Hits', 'Balls_In_Play']

    # More detailed zone breakdown
    detailed_zones = pitch_data.groupby('zone').agg({
        'type': 'count',  # Total pitches
        'description': lambda x: x.isin(swing_descriptions).mean(),  # Swing%
        'events': lambda x: x.isin(hit_events).mean()  # Hits per pitch seen
    }).round(3)

    detailed_zones.columns = ['Pitches', 'Swing_Rate', 'Hit_Rate']

    # High/Low/Middle breakdown (plate_z is height in feet; pitches outside
    # the bins become NaN and drop out of the grouping)
    pitch_data['vertical_location'] = pd.cut(
        pitch_data['plate_z'],
        bins=[0, 2.0, 3.5, 5.0],
        labels=['Low', 'Middle', 'High']
    )

    # plate_x is measured from the catcher's view, so negative values are
    # "Inside" only for right-handed batters; mirror plate_x for left-handed batters
    pitch_data['horizontal_location'] = pd.cut(
        pitch_data['plate_x'],
        bins=[-2.5, -0.5, 0.5, 2.5],
        labels=['Inside', 'Middle', 'Outside']
    )

    location_performance = pitch_data.groupby(
        ['vertical_location', 'horizontal_location'], observed=True
    ).agg({
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'type': 'count'
    }).round(3)

    location_performance.columns = ['xwOBA', 'Avg_EV', 'Pitches']

    return {
        'zone_performance': zone_performance,
        'detailed_zones': detailed_zones,
        'location_performance': location_performance
    }

coverage_analysis = analyze_plate_coverage(statcast_data)

print("\nPlate Coverage Analysis")
print("=" * 60)
print("\nPerformance by Zone Category:")
print(coverage_analysis['zone_performance'])
print("\nDetailed Zone Breakdown:")
print(coverage_analysis['detailed_zones'])
print("\nPerformance by Location (Vertical × Horizontal):")
print(coverage_analysis['location_performance'])

# R version: Plate coverage
analyze_plate_coverage <- function(df) {
  pitch_data <- df %>%
    filter(!is.na(zone)) %>%
    mutate(
      zone_category = case_when(
        zone %in% 1:9 ~ 'In Zone',
        zone %in% 11:14 ~ 'Out of Zone',
        TRUE ~ 'Other'
      ),
      vertical_location = cut(
        plate_z,
        breaks = c(0, 2.0, 3.5, 5.0),
        labels = c('Low', 'Middle', 'High')
      ),
      horizontal_location = cut(
        plate_x,
        breaks = c(-2.5, -0.5, 0.5, 2.5),
        labels = c('Inside', 'Middle', 'Outside')
      )
    )

  # Zone performance
  zone_performance <- pitch_data %>%
    group_by(zone_category) %>%
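    summarise(
      pitches = n(),
      swing_rate = mean(description %in% c('hit_into_play', 'foul', 'foul_tip',
                                           'swinging_strike',
                                           'swinging_strike_blocked'), na.rm = TRUE),
      # Share of pitches put in play (type 'X'), not contact per swing
      in_play_rate = mean(type == 'X', na.rm = TRUE),
      .groups = 'drop'
    )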

  # Location performance
  location_performance <- pitch_data %>%
    filter(!is.na(vertical_location), !is.na(horizontal_location)) %>%
    group_by(vertical_location, horizontal_location) %>%
    summarise(
      xwOBA = mean(estimated_woba_using_speedangle, na.rm = TRUE),
      avg_ev = mean(launch_speed, na.rm = TRUE),
      pitches = n(),
      .groups = 'drop'
    ) %>%
    mutate(across(c(xwOBA, avg_ev), ~ round(.x, 3)))

  list(
    zone_performance = zone_performance,
    location_performance = location_performance
  )
}

coverage <- analyze_plate_coverage(judge_data)
print("Plate Coverage Analysis:")
print(coverage$zone_performance)
print(coverage$location_performance)
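
One caveat on the Inside/Outside labels: `plate_x` is measured from the catcher's point of view, so a negative value is inside to a right-handed batter but outside to a left-handed one. The sketch below shows one way to make the label batter-relative. It assumes the Statcast `stand` column ('L' or 'R') is present; the helper name `add_batter_relative_location` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def add_batter_relative_location(df):
    """Label horizontal pitch location relative to the batter, not the catcher."""
    out = df.copy()
    # Mirror plate_x for left-handed batters so negative always means "inside"
    sign = np.where(out['stand'] == 'L', -1.0, 1.0)
    out['plate_x_batter'] = out['plate_x'] * sign
    out['horizontal_location'] = pd.cut(
        out['plate_x_batter'],
        bins=[-2.5, -0.5, 0.5, 2.5],
        labels=['Inside', 'Middle', 'Outside']
    )
    return out

# Toy rows for illustration only (not real pitches)
toy = pd.DataFrame({
    'stand': ['R', 'L', 'R', 'L'],
    'plate_x': [-1.0, -1.0, 0.0, 1.0],
})
labeled = add_batter_relative_location(toy)
print(labeled[['stand', 'plate_x', 'horizontal_location']])
```

The same `plate_x` of -1.0 is labeled Inside for the right-handed batter and Outside for the left-handed batter, which is what a combined sample of hitters needs.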

6.8.2 Performance by Pitch Type

Different hitters have different strengths against pitch types:

def analyze_pitch_type_performance(df):
    """
    Analyze performance against different pitch types.

    Common pitch types:
    - FF: Four-seam fastball
    - SI: Sinker
    - SL: Slider
    - CH: Changeup
    - CU: Curveball
    - FC: Cutter
    """
    pitch_data = df[df['pitch_type'].notna()].copy()

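    # Descriptions that count as a swing (Statcast `description` values)
    swing_descriptions = ['hit_into_play', 'foul', 'foul_tip',
                          'swinging_strike', 'swinging_strike_blocked']

    def k_rate(events):
        # `events` is only filled in on the pitch that ends a plate appearance,
        # so divide strikeouts by plate appearances rather than by pitches
        plate_appearances = events.notna().sum()
        if plate_appearances == 0:
            return float('nan')
        return (events == 'strikeout').sum() / plate_appearances

    performance_by_pitch = pitch_data.groupby('pitch_type').agg({
        'type': 'count',  # Total pitches
        'description': lambda x: x.isin(swing_descriptions).mean(),  # Swing%
        'events': k_rate,  # K rate per PA ending on this pitch type
        'estimated_woba_using_speedangle': 'mean',  # xwOBA on batted balls
        'launch_speed': 'mean',  # Avg EV
        'launch_angle': 'mean'  # Avg LA
    }).round(3)

    performance_by_pitch.columns = ['Pitches', 'Swing%', 'K%', 'xwOBA',
                                    'Avg_EV', 'Avg_LA']
    performance_by_pitch = performance_by_pitch.sort_values('Pitches',
                                                            ascending=False)

    # Fastball vs. Offspeed ("Offspeed" here means every non-fastball,
    # including breaking balls such as sliders and curveballs)
    pitch_data['pitch_category'] = pitch_data['pitch_type'].apply(
        lambda x: 'Fastball' if x in ['FF', 'SI', 'FC'] else 'Offspeed'
    )

    category_performance = pitch_data.groupby('pitch_category').agg({
        'type': 'count',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'events': k_rate
    }).round(3)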

    category_performance.columns = ['Pitches', 'xwOBA', 'Avg_EV', 'K%']

    return {
        'by_pitch_type': performance_by_pitch,
        'by_category': category_performance
    }

pitch_performance = analyze_pitch_type_performance(statcast_data)

print("\nPitch Type Performance Analysis")
print("=" * 60)
print("\nPerformance by Specific Pitch Type:")
print(pitch_performance['by_pitch_type'])
print("\nFastball vs. Offspeed:")
print(pitch_performance['by_category'])
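
Swing rate alone does not show whether a hitter makes contact with a pitch type. A common companion metric is whiff rate, the share of swings that miss. The sketch below uses one common convention (swinging strikes, including blocked ones, divided by all swings); the helper name `whiff_rate_by_pitch_type` and the toy rows are hypothetical.

```python
import pandas as pd

# Statcast `description` values treated as swings and as whiffs
SWINGS = ['hit_into_play', 'foul', 'foul_tip',
          'swinging_strike', 'swinging_strike_blocked']
WHIFFS = ['swinging_strike', 'swinging_strike_blocked']

def whiff_rate_by_pitch_type(df):
    """Whiffs per swing for each pitch type."""
    swings = df[df['description'].isin(SWINGS)].copy()
    swings['is_whiff'] = swings['description'].isin(WHIFFS)
    summary = swings.groupby('pitch_type')['is_whiff'].agg(['size', 'mean'])
    summary.columns = ['Swings', 'Whiff_Rate']
    return summary.sort_values('Swings', ascending=False)

# Toy rows for illustration only (not real pitches)
toy = pd.DataFrame({
    'pitch_type': ['FF', 'FF', 'FF', 'SL', 'SL', 'SL', 'SL'],
    'description': ['foul', 'hit_into_play', 'called_strike',
                    'swinging_strike', 'swinging_strike', 'foul', 'ball'],
})
whiffs = whiff_rate_by_pitch_type(toy)
print(whiffs)
```

Takes (called strikes and balls) are excluded from the denominator, so the metric isolates bat-to-ball skill from swing decisions.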

6.8.3 Count-Based Performance

How hitters perform in different counts reveals approach and discipline:

def analyze_count_performance(df):
    """
    Analyze performance by ball-strike count.

    Key counts:
    - Hitter's counts: 1-0, 2-0, 3-0, 2-1, 3-1
    - Pitcher's counts: 0-1, 0-2, 1-2, 2-2
    - Even counts: 0-0, 1-1, and the full count 3-2 (every count not listed above)
    """
    count_data = df[df['balls'].notna() & df['strikes'].notna()].copy()

    count_data['count_string'] = (count_data['balls'].astype(int).astype(str) +
                                   '-' +
                                   count_data['strikes'].astype(int).astype(str))

    # Categorize counts
    def categorize_count(count_str):
        hitters_counts = ['1-0', '2-0', '3-0', '2-1', '3-1']
        pitchers_counts = ['0-1', '0-2', '1-2', '2-2']

        if count_str in hitters_counts:
            return "Hitter's Count"
        elif count_str in pitchers_counts:
            return "Pitcher's Count"
        else:
            return "Even Count"

    count_data['count_category'] = count_data['count_string'].apply(categorize_count)

    # Performance by count category
    category_performance = count_data.groupby('count_category').agg({
        'type': 'count',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'description': lambda x: (x.isin(['swinging_strike', 'swinging_strike_blocked',
                                          'foul', 'foul_tip',
                                          'hit_into_play'])).mean()
    }).round(3)

    category_performance.columns = ['Pitches', 'xwOBA', 'Avg_EV', 'Swing%']

    # Detailed count performance
    detailed_counts = count_data.groupby('count_string').agg({
        'type': 'count',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean'
    }).round(3)

    detailed_counts.columns = ['Pitches', 'xwOBA', 'Avg_EV']
    detailed_counts = detailed_counts.sort_values('Pitches', ascending=False)

    return {
        'by_category': category_performance,
        'by_count': detailed_counts
    }

count_performance = analyze_count_performance(statcast_data)

print("\nCount-Based Performance Analysis")
print("=" * 60)
print("\nPerformance by Count Category:")
print(count_performance['by_category'])
print("\nTop 10 Counts by Frequency:")
print(count_performance['by_count'].head(10))
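
Discipline shows up most clearly in how often a hitter swings at pitches outside the zone, and that tendency usually changes with the count. The sketch below computes a chase rate (swings on zones 11-14) for the same three count categories. The helper name `chase_rate_by_count` and the toy rows are hypothetical, and the swing definition is an assumption of this example.

```python
import pandas as pd

SWINGS = ['hit_into_play', 'foul', 'foul_tip',
          'swinging_strike', 'swinging_strike_blocked']
HITTERS_COUNTS = {'1-0', '2-0', '3-0', '2-1', '3-1'}
PITCHERS_COUNTS = {'0-1', '0-2', '1-2', '2-2'}

def chase_rate_by_count(df):
    """Swing rate on out-of-zone pitches (zones 11-14) by count category."""
    out = df[df['zone'].isin([11, 12, 13, 14])].copy()
    count = (out['balls'].astype(int).astype(str) + '-' +
             out['strikes'].astype(int).astype(str))
    out['count_category'] = count.map(
        lambda c: "Hitter's Count" if c in HITTERS_COUNTS
        else "Pitcher's Count" if c in PITCHERS_COUNTS
        else "Even Count"
    )
    out['is_swing'] = out['description'].isin(SWINGS)
    result = out.groupby('count_category')['is_swing'].agg(['size', 'mean'])
    result.columns = ['Pitches_Out_Of_Zone', 'Chase_Rate']
    return result

# Toy rows for illustration only (not real pitches)
toy = pd.DataFrame({
    'zone': [13, 14, 5, 11, 12, 13],
    'balls': [0, 0, 1, 2, 3, 1],
    'strikes': [2, 2, 1, 0, 1, 2],
    'description': ['swinging_strike', 'ball', 'foul', 'ball', 'ball', 'foul'],
})
chase = chase_rate_by_count(toy)
print(chase)
```

In the toy data the hitter chases only when behind in the count, the pattern a two-strike approach tends to produce.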


6.9 Building a Complete Hitter Profile

6.9.1 The Essential Stats for a Profile

A complete Statcast hitter profile should include:

  1. Power Metrics: Exit velocity, hard-hit%, barrel%, max EV
  2. Contact Quality: xBA, xwOBA, sweet spot%
  3. Approach: Launch angle, GB/LD/FB distribution
  4. Speed: Sprint speed, impact on BABIP
  5. Spray: Pull tendency, field usage
  6. Discipline: Plate coverage, pitch type splits
  7. Context: Expected vs. actual stats (luck/regression indicators)

def create_complete_hitter_profile(df, player_name, sprint_speed=27.0):
    """
    Generate a comprehensive Statcast hitter profile.

    Parameters:
    df: Statcast dataframe for the player
    player_name: Player's name
    sprint_speed: Player's sprint speed (if available)

    Returns:
    Dictionary containing complete profile
    """
    profile = {'player_name': player_name}

    # 1. Power Metrics
    ev_metrics = calculate_ev_metrics(df)
    profile['power'] = {
        'avg_exit_velocity': ev_metrics['avg_ev'],
        'max_exit_velocity': ev_metrics['max_ev'],
        'hard_hit_rate': ev_metrics['hard_hit_pct'],
        'ev_90th_percentile': ev_metrics['ev_90th_percentile']
    }

    # 2. Barrel Metrics
    barrel_metrics = calculate_barrel_metrics(df)
    profile['barrels'] = {
        'barrel_rate': barrel_metrics['barrel_pct'],
        'barrel_pa_rate': barrel_metrics['barrel_pa_pct'],
        'avg_barrel_ev': barrel_metrics['avg_barrel_ev']
    }

    # 3. Launch Angle / Contact Type
    la_metrics = calculate_launch_angle_metrics(df)
    profile['batted_ball_profile'] = {
        'avg_launch_angle': la_metrics['avg_la'],
        'gb_rate': la_metrics['gb_pct'],
        'ld_rate': la_metrics['ld_pct'],
        'fb_rate': la_metrics['fb_pct'],
        'sweet_spot_rate': la_metrics['sweet_spot_pct']
    }

    # 4. Expected Stats
    xba_metrics = calculate_xba_metrics(df)
    xwoba_metrics = calculate_xwoba_metrics(df)
    profile['expected_stats'] = {
        'xBA': xba_metrics['xBA'],
        'xwOBA': xwoba_metrics['xwOBA'],
        'actual_wOBA': xwoba_metrics['actual_wOBA'],
        'woba_diff': xwoba_metrics['woba_diff']
    }

    # 5. Speed
    profile['speed'] = {
        'sprint_speed': sprint_speed,
        'speed_rating': 'Elite' if sprint_speed >= 30 else
                       'Plus' if sprint_speed >= 28.5 else
                       'Above Avg' if sprint_speed >= 27.5 else
                       'Average' if sprint_speed >= 27 else
                       'Below Avg'
    }

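    # 6. Summary Statistics
    # Statcast data has one row per pitch; `events` is only filled in on the
    # pitch that ends a plate appearance, so count those rows as PAs
    total_pa = int(df['events'].notna().sum())
    batted_balls = len(df[df['launch_speed'].notna()])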

    profile['summary'] = {
        'total_plate_appearances': total_pa,
        'total_batted_balls': batted_balls,
        'batted_ball_rate': (batted_balls / total_pa * 100) if total_pa > 0 else 0
    }

    return profile

def print_hitter_profile(profile):
    """Pretty print a hitter profile."""
    print("\n" + "=" * 70)
    print(f"STATCAST HITTER PROFILE: {profile['player_name']}")
    print("=" * 70)

    print("\n>>> POWER METRICS <<<")
    power = profile['power']
    print(f"  Average Exit Velocity: {power['avg_exit_velocity']:.1f} mph")
    print(f"  Maximum Exit Velocity: {power['max_exit_velocity']:.1f} mph")
    print(f"  Hard-Hit Rate (95+ mph): {power['hard_hit_rate']:.1f}%")
    print(f"  90th Percentile EV: {power['ev_90th_percentile']:.1f} mph")

    print("\n>>> BARREL METRICS <<<")
    barrels = profile['barrels']
    print(f"  Barrel Rate: {barrels['barrel_rate']:.1f}%")
    print(f"  Barrels per PA: {barrels['barrel_pa_rate']:.1f}%")
    print(f"  Avg Barrel Exit Velo: {barrels['avg_barrel_ev']:.1f} mph")

    print("\n>>> BATTED BALL PROFILE <<<")
    bb = profile['batted_ball_profile']
    print(f"  Average Launch Angle: {bb['avg_launch_angle']:.1f}°")
    print(f"  Ground Ball Rate: {bb['gb_rate']:.1f}%")
    print(f"  Line Drive Rate: {bb['ld_rate']:.1f}%")
    print(f"  Fly Ball Rate: {bb['fb_rate']:.1f}%")
    print(f"  Sweet Spot Rate (8-32°): {bb['sweet_spot_rate']:.1f}%")

    print("\n>>> EXPECTED STATISTICS <<<")
    xstats = profile['expected_stats']
    print(f"  Expected BA (xBA): {xstats['xBA']:.3f}")
    print(f"  Expected wOBA (xwOBA): {xstats['xwOBA']:.3f}")
    print(f"  Actual wOBA: {xstats['actual_wOBA']:.3f}")
    print(f"  wOBA Difference: {xstats['woba_diff']:+.3f}", end="")
    if abs(xstats['woba_diff']) > 0.025:
        if xstats['woba_diff'] > 0:
            print(" (OUTPERFORMING - regression risk)")
        else:
            print(" (UNDERPERFORMING - positive regression likely)")
    else:
        print(" (sustainable)")

    print("\n>>> SPEED METRICS <<<")
    speed = profile['speed']
    print(f"  Sprint Speed: {speed['sprint_speed']:.1f} ft/s ({speed['speed_rating']})")

    print("\n>>> SUMMARY <<<")
    summary = profile['summary']
    print(f"  Total Plate Appearances: {summary['total_plate_appearances']}")
    print(f"  Total Batted Balls: {summary['total_batted_balls']}")
    print(f"  Batted Ball Rate: {summary['batted_ball_rate']:.1f}%")

    print("\n" + "=" * 70)

# Create and display profile
player_profile = create_complete_hitter_profile(
    statcast_data,
    "Aaron Judge",
    sprint_speed=27.5  # Judge's approximate sprint speed
)
print_hitter_profile(player_profile)

This profile gives scouts, analysts, and fantasy players a fuller picture of a hitter's underlying talent, one that is less sensitive to luck and defensive circumstance than traditional results.
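
Profiles become more useful when several hitters can be compared side by side. The sketch below flattens nested profile dictionaries into one table row per player with `pandas.json_normalize`. The helper name `profiles_to_table`, the player labels, and every number in the example are placeholders made up for illustration; real profiles would come from `create_complete_hitter_profile`.

```python
import pandas as pd

def profiles_to_table(profiles):
    """Flatten nested profile dictionaries into one row per player."""
    # Nested keys become column names such as 'power_avg_exit_velocity'
    table = pd.json_normalize(profiles, sep='_')
    return table.set_index('player_name')

# Placeholder profiles with invented values (not real player data)
profiles = [
    {'player_name': 'Player A',
     'power': {'avg_exit_velocity': 92.0, 'hard_hit_rate': 48.0},
     'expected_stats': {'xwOBA': 0.380, 'actual_wOBA': 0.350}},
    {'player_name': 'Player B',
     'power': {'avg_exit_velocity': 88.5, 'hard_hit_rate': 36.0},
     'expected_stats': {'xwOBA': 0.310, 'actual_wOBA': 0.340}},
]

table = profiles_to_table(profiles)
# Positive gap: results ahead of contact quality; negative gap: results behind it
table['woba_gap'] = (table['expected_stats_actual_wOBA'] -
                     table['expected_stats_xwOBA'])
print(table.sort_values('woba_gap'))
```

Sorting by the gap puts the hitters whose results trail their contact quality at the top, a convenient starting list for regression candidates.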

6.9.2 Visualizing a Complete Profile

While we can't generate actual plots in this text format, here's what a complete visualization suite should include:

1. Exit Velocity Distribution Histogram

  • Shows the distribution of all exit velocities
  • Highlights the hard-hit threshold (95+ mph) and the minimum exit velocity for a barrel (98+ mph; barrels also require a qualifying launch angle)
  • Includes percentile markers

2. Launch Angle Distribution

  • Histogram of launch angles
  • Color-coded by outcome (HR, hit, out)
  • Shows sweet spot zone (8-32°)

3. Spray Chart

  • Visual representation of where balls are hit
  • Sized by exit velocity
  • Colored by outcome
  • Shows pull/center/opposite tendencies

4. xwOBA vs. Actual wOBA Scatter

  • Each point is a batted ball
  • Shows over/underperformance
  • A reference line (y = x) marks where results match contact quality; points far from it suggest luck or defense rather than skill

5. Heat Map by Pitch Location

  • Strike zone divided into grid
  • Color represents xwOBA or BA by zone
  • Reveals weaknesses and strengths

6. Performance by Count

  • Bar chart showing xwOBA in different counts
  • Separates hitter's counts, pitcher's counts, and even counts

7. Radar Chart

  • Multi-dimensional profile
  • Axes: Exit Velo, Barrel%, Sweet Spot%, xwOBA, Sprint Speed, etc.
  • Compares the hitter to league average

# Example code structure for visualizations (requires matplotlib/seaborn)
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_hitter_profile(df, player_name):
    """
    Create comprehensive visualization suite for a hitter.
    Requires matplotlib and seaborn libraries.
    """
    fig, axes = plt.subplots(3, 3, figsize=(18, 15))
    fig.suptitle(f'{player_name} - Complete Statcast Profile', fontsize=16)

    batted_balls = df[df['launch_speed'].notna()].copy()

    # 1. Exit Velocity Distribution
    axes[0, 0].hist(batted_balls['launch_speed'], bins=30, edgecolor='black')
    axes[0, 0].axvline(95, color='red', linestyle='--', label='Hard Hit (95mph)')
    axes[0, 0].set_title('Exit Velocity Distribution')
    axes[0, 0].set_xlabel('Exit Velocity (mph)')
    axes[0, 0].legend()

    # 2. Launch Angle Distribution
    axes[0, 1].hist(batted_balls['launch_angle'], bins=40, edgecolor='black')
    axes[0, 1].axvline(8, color='green', linestyle='--', label='Sweet Spot')
    axes[0, 1].axvline(32, color='green', linestyle='--')
    axes[0, 1].set_title('Launch Angle Distribution')
    axes[0, 1].set_xlabel('Launch Angle (degrees)')
    axes[0, 1].legend()

    # 3. EV vs LA Scatter (colored by outcome)
    scatter_data = batted_balls.copy()
    scatter_data['is_hit'] = scatter_data['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])
    colors = scatter_data['is_hit'].map({True: 'green', False: 'red'})

    axes[0, 2].scatter(scatter_data['launch_speed'],
                      scatter_data['launch_angle'],
                      c=colors, alpha=0.5, s=10)
    axes[0, 2].set_title('Exit Velocity vs Launch Angle')
    axes[0, 2].set_xlabel('Exit Velocity (mph)')
    axes[0, 2].set_ylabel('Launch Angle (degrees)')

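    # Additional plots would follow...
    # (Sprint speed gauge, xwOBA comparison, zone heat map, etc.)

    # Hide the grid cells that have no plot yet so the figure
    # does not show six empty axes
    for ax in axes.ravel()[3:]:
        ax.set_visible(False)

    plt.tight_layout()
    return fig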

# Note: This is example code structure - actual implementation would require
# data and proper visualization library setup
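
As one example of the remaining panels, the sketch below draws the heat map by pitch location for the nine in-zone Statcast zones. It assumes the usual zone layout (1-3 across the top, 4-6 in the middle, 7-9 along the bottom, viewed from the catcher's side); the helper name `plot_zone_heatmap` and the toy rows are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_zone_heatmap(df, value_col='estimated_woba_using_speedangle'):
    """Draw a 3x3 heat map of mean `value_col` for Statcast zones 1-9."""
    in_zone = df[df['zone'].isin(range(1, 10))]
    means = in_zone.groupby('zone')[value_col].mean()
    # Zones with no data stay NaN and are drawn as blank cells
    grid = means.reindex(range(1, 10)).to_numpy(dtype=float).reshape(3, 3)

    fig, ax = plt.subplots(figsize=(4, 4))
    image = ax.imshow(grid, cmap='coolwarm')
    # Print the value inside each cell that has data
    for (row, col), value in np.ndenumerate(grid):
        if not np.isnan(value):
            ax.text(col, row, f"{value:.3f}", ha='center', va='center')
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title("xwOBA by zone (catcher's view)")
    fig.colorbar(image, ax=ax)
    return fig, grid

# Toy rows for illustration only (not real batted balls)
toy = pd.DataFrame({
    'zone': [5, 5, 1, 9],
    'estimated_woba_using_speedangle': [0.8, 0.4, 0.1, 0.3],
})
fig, grid = plot_zone_heatmap(toy)
print(grid)
```

Returning the grid alongside the figure makes it easy to check the numbers behind the colors, or to reuse the same grid as one panel of the larger profile figure.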

Python
# Example code structure for visualizations (requires matplotlib/seaborn)
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_hitter_profile(df, player_name):
    """
    Create comprehensive visualization suite for a hitter.
    Requires matplotlib and seaborn libraries.
    """
    fig, axes = plt.subplots(3, 3, figsize=(18, 15))
    fig.suptitle(f'{player_name} - Complete Statcast Profile', fontsize=16)

    batted_balls = df[df['launch_speed'].notna()].copy()

    # 1. Exit Velocity Distribution
    axes[0, 0].hist(batted_balls['launch_speed'], bins=30, edgecolor='black')
    axes[0, 0].axvline(95, color='red', linestyle='--', label='Hard Hit (95mph)')
    axes[0, 0].set_title('Exit Velocity Distribution')
    axes[0, 0].set_xlabel('Exit Velocity (mph)')
    axes[0, 0].legend()

    # 2. Launch Angle Distribution
    axes[0, 1].hist(batted_balls['launch_angle'], bins=40, edgecolor='black')
    axes[0, 1].axvline(8, color='green', linestyle='--', label='Sweet Spot')
    axes[0, 1].axvline(32, color='green', linestyle='--')
    axes[0, 1].set_title('Launch Angle Distribution')
    axes[0, 1].set_xlabel('Launch Angle (degrees)')
    axes[0, 1].legend()

    # 3. EV vs LA Scatter (colored by outcome)
    scatter_data = batted_balls.copy()
    scatter_data['is_hit'] = scatter_data['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])
    colors = scatter_data['is_hit'].map({True: 'green', False: 'red'})

    axes[0, 2].scatter(scatter_data['launch_speed'],
                      scatter_data['launch_angle'],
                      c=colors, alpha=0.5, s=10)
    axes[0, 2].set_title('Exit Velocity vs Launch Angle')
    axes[0, 2].set_xlabel('Exit Velocity (mph)')
    axes[0, 2].set_ylabel('Launch Angle (degrees)')

    # Additional plots would follow...
    # (Sprint speed gauge, xwOBA comparison, zone heat map, etc.)

    plt.tight_layout()
    return fig

# Note: This is example code structure - actual implementation would require
# data and proper visualization library setup

6.10 Interactive Statcast Visualizations

Statcast data is rich enough that a static plot captures only part of the picture. Interactive visualizations let analysts rotate 3D views, filter by outcome type, and inspect individual batted balls, exposing relationships among exit velocity, launch angle, hit distance, spray direction, and outcomes that a single static view can hide. This section builds interactive Statcast visualizations with Plotly in both Python and R.

6.10.1 3D Scatter Plot: Exit Velocity, Launch Angle, and Distance

The relationship between exit velocity, launch angle, and hit distance forms the foundation of batted ball physics. A 3D interactive scatter plot allows us to explore this relationship dynamically, rotating the view to understand optimal launch conditions and identify barrels visually.

Python Implementation with Plotly:

import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from pybaseball import statcast_batter, playerid_lookup
import numpy as np

# Fetch Statcast data for a power hitter (Aaron Judge example)
player_id = 592450  # Aaron Judge
start_date = '2024-04-01'
end_date = '2024-10-01'

statcast_data = statcast_batter(start_date, end_date, player_id)

# Filter for batted balls with complete data
batted_balls = statcast_data[
    (statcast_data['launch_speed'].notna()) &
    (statcast_data['launch_angle'].notna()) &
    (statcast_data['hit_distance_sc'].notna())
].copy()

# Classify outcomes for color coding
def classify_outcome(event):
    if event in ['home_run']:
        return 'Home Run'
    elif event in ['triple', 'double']:
        return 'Extra-Base Hit'
    elif event in ['single']:
        return 'Single'
    else:
        return 'Out'

batted_balls['outcome_type'] = batted_balls['events'].apply(classify_outcome)

# Create color mapping
color_map = {
    'Home Run': '#FF0000',
    'Extra-Base Hit': '#FFA500',
    'Single': '#00FF00',
    'Out': '#808080'
}

batted_balls['color'] = batted_balls['outcome_type'].map(color_map)

# Create 3D scatter plot
fig = go.Figure()

for outcome in ['Out', 'Single', 'Extra-Base Hit', 'Home Run']:
    df_subset = batted_balls[batted_balls['outcome_type'] == outcome]

    fig.add_trace(go.Scatter3d(
        x=df_subset['launch_speed'],
        y=df_subset['launch_angle'],
        z=df_subset['hit_distance_sc'],
        mode='markers',
        name=outcome,
        marker=dict(
            size=4,
            color=color_map[outcome],
            opacity=0.7,
            line=dict(width=0.5, color='DarkSlateGray')
        ),
        text=[
            f"<b>{outcome}</b><br>" +
            f"EV: {ev:.1f} mph<br>" +
            f"LA: {la:.1f}°<br>" +
            f"Distance: {dist:.1f} ft<br>" +
            f"Date: {date}"
            for ev, la, dist, date in zip(
                df_subset['launch_speed'],
                df_subset['launch_angle'],
                df_subset['hit_distance_sc'],
                df_subset['game_date']
            )
        ],
        hoverinfo='text'
    ))

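# Add barrel zone reference (simplified)
# Barrels require 98+ mph exit velocity. At exactly 98 mph the qualifying
# launch-angle window is 26-30 degrees, and it widens as exit velocity rises.
# These markers are only a rough visual guide along the narrowest window;
# the distance values are illustrative placeholders, not measurements.
barrel_ev = np.linspace(98, 120, 10)
barrel_la = np.linspace(26, 30, 10)
barrel_dist = np.linspace(375, 450, 10)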

fig.add_trace(go.Scatter3d(
    x=barrel_ev,
    y=barrel_la,
    z=barrel_dist,
    mode='markers',
    name='Barrel Zone Reference',
    marker=dict(size=8, color='gold', symbol='diamond', opacity=0.5),
    showlegend=True
))

# Update layout
fig.update_layout(
    title='3D Batted Ball Profile: Exit Velocity × Launch Angle × Distance',
    scene=dict(
        xaxis_title='Exit Velocity (mph)',
        yaxis_title='Launch Angle (degrees)',
        zaxis_title='Hit Distance (feet)',
        camera=dict(
            eye=dict(x=1.5, y=1.5, z=1.3)
        )
    ),
    width=1000,
    height=800,
    showlegend=True,
    legend=dict(
        x=0.02,
        y=0.98,
        bgcolor='rgba(255, 255, 255, 0.8)'
    ),
    font=dict(size=12)
)

fig.show()
# fig.write_html('3d_batted_ball_profile.html')
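
The gold reference markers above cover only the narrowest barrel window. As a hedged sketch, the function below applies a commonly used linear approximation of the barrel zone, in which the qualifying launch-angle range widens as exit velocity rises above 98 mph. The name `is_barrel_approx` and the coefficients are assumptions for illustration, not an official Statcast implementation. Where it is present, the `launch_speed_angle` column in Statcast exports (value 6) carries the official classification.

```python
import numpy as np

def is_barrel_approx(launch_speed, launch_angle):
    """Approximate barrel flag from exit velocity (mph) and launch angle (degrees).

    Assumption: a linear approximation of the barrel zone, not the official rule.
    """
    ev = np.asarray(launch_speed, dtype=float)
    la = np.asarray(launch_angle, dtype=float)
    return (
        (ev >= 98)                  # minimum exit velocity
        & (la <= 50)                # hard cap on launch angle
        & (1.5 * ev - la >= 117)    # upper angle bound: 30 degrees at 98 mph, rising with EV
        & (ev + la >= 124)          # lower angle bound: 26 degrees at 98 mph, falling with EV
    )

# Spot checks: inside and outside the window at 98 and 100 mph, plus a 97 mph ball
flags = is_barrel_approx([98, 98, 100, 100, 97], [28, 32, 33, 20, 28])
print(flags.tolist())  # -> [True, False, True, False, False]
```

Because the function is vectorized, it can be applied directly to DataFrame columns, for example `is_barrel_approx(batted_balls['launch_speed'], batted_balls['launch_angle'])`.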

R Implementation:

library(plotly)
library(dplyr)
library(baseballr)

# Fetch Statcast data
player_id <- 592450  # Aaron Judge
statcast_data <- statcast_search_batters(
  start_date = "2024-04-01",
  end_date = "2024-10-01",
  batterid = player_id
)

# Filter and prepare data
batted_balls <- statcast_data %>%
  filter(
    !is.na(launch_speed),
    !is.na(launch_angle),
    !is.na(hit_distance_sc)
  ) %>%
  mutate(
    outcome_type = case_when(
      events == "home_run" ~ "Home Run",
      events %in% c("triple", "double") ~ "Extra-Base Hit",
      events == "single" ~ "Single",
      TRUE ~ "Out"
    ),
    color = case_when(
      outcome_type == "Home Run" ~ "#FF0000",
      outcome_type == "Extra-Base Hit" ~ "#FFA500",
      outcome_type == "Single" ~ "#00FF00",
      outcome_type == "Out" ~ "#808080"
    )
  )

# Create 3D scatter plot
fig <- plot_ly(
  data = batted_balls,
  x = ~launch_speed,
  y = ~launch_angle,
  z = ~hit_distance_sc,
  color = ~outcome_type,
  colors = c(
    "Home Run" = "#FF0000",
    "Extra-Base Hit" = "#FFA500",
    "Single" = "#00FF00",
    "Out" = "#808080"
  ),
  type = 'scatter3d',
  mode = 'markers',
  marker = list(
    size = 4,
    opacity = 0.7,
    line = list(width = 0.5, color = 'rgba(50, 50, 50, 0.5)')
  ),
  text = ~paste0(
    "<b>", outcome_type, "</b><br>",
    "EV: ", round(launch_speed, 1), " mph<br>",
    "LA: ", round(launch_angle, 1), "°<br>",
    "Distance: ", round(hit_distance_sc, 1), " ft<br>",
    "Date: ", game_date
  ),
  hoverinfo = 'text'
) %>%
  layout(
    title = list(
      text = "3D Batted Ball Profile: Exit Velocity × Launch Angle × Distance",
      font = list(size = 14)
    ),
    scene = list(
      xaxis = list(title = "Exit Velocity (mph)"),
      yaxis = list(title = "Launch Angle (degrees)"),
      zaxis = list(title = "Hit Distance (feet)"),
      camera = list(
        eye = list(x = 1.5, y = 1.5, z = 1.3)
      )
    ),
    width = 1000,
    height = 800,
    showlegend = TRUE,
    legend = list(
      x = 0.02,
      y = 0.98,
      bgcolor = 'rgba(255, 255, 255, 0.8)'
    )
  )

fig
# htmlwidgets::saveWidget(fig, "3d_batted_ball_profile.html")

Key Insights from 3D Visualization:

  • Optimal Launch Windows: Visually identify the exit velocity and launch angle combinations that produce the longest distances
  • Barrel Recognition: Home runs cluster in the high-exit-velocity, mid-launch-angle region of the 3D space, where the barrel zone sits
  • Outcome Patterns: Outs dominate at extreme launch angles (too high or too low) regardless of exit velocity
  • Power Threshold: A clear velocity threshold exists below which home runs become extremely rare
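
One way to make the "optimal launch window" idea concrete is to bin batted balls by launch angle and compare the average distance in each bin. The following is a minimal sketch, assuming a DataFrame with the `launch_angle` and `hit_distance_sc` columns used above. The helper name `distance_by_launch_angle` and the 10-degree bin width are illustrative choices, and the toy data is made up.

```python
import pandas as pd

def distance_by_launch_angle(df, bin_width=10):
    """Mean hit distance and sample size per launch-angle bin."""
    data = df.dropna(subset=['launch_angle', 'hit_distance_sc']).copy()
    # Floor each angle to the lower edge of its bin (e.g. 27 -> 20, -5 -> -10)
    data['la_bin'] = (data['launch_angle'] // bin_width) * bin_width
    return (
        data.groupby('la_bin')['hit_distance_sc']
        .agg(mean_distance='mean', batted_balls='count')
        .sort_index()
    )

# Made-up batted balls for demonstration only
toy = pd.DataFrame({
    'launch_angle': [-5, 12, 18, 25, 28, 55],
    'hit_distance_sc': [20, 250, 300, 400, 410, 180],
})
summary = distance_by_launch_angle(toy)
print(summary)
print("Longest average distance in bin starting at:",
      summary['mean_distance'].idxmax(), "degrees")
```

With real data, the `batted_balls` count column matters as much as the mean: bins with only a handful of batted balls are too noisy to interpret.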

6.10.2 Interactive Spray Chart with Hit Data

Spray charts visualize where batted balls land on the field, revealing hitter tendencies, shift vulnerabilities, and approach patterns. An interactive spray chart with hover data transforms this classic visualization into a powerful analytical tool.

Python Implementation:

import plotly.graph_objects as go
import numpy as np

# Filter for balls in play
balls_in_play = statcast_data[
    (statcast_data['hc_x'].notna()) &
    (statcast_data['hc_y'].notna())
].copy()

# Classify hit outcomes and directions
def get_hit_value(event):
    """Assign numeric value to hit outcomes"""
    if event == 'home_run':
        return 4
    elif event == 'triple':
        return 3
    elif event == 'double':
        return 2
    elif event == 'single':
        return 1
    else:
        return 0

balls_in_play['hit_value'] = balls_in_play['events'].apply(get_hit_value)
balls_in_play['is_hit'] = balls_in_play['hit_value'] > 0

# Create spray chart
fig = go.Figure()

# Add outs
outs = balls_in_play[~balls_in_play['is_hit']]
fig.add_trace(go.Scatter(
    x=outs['hc_x'],
    y=outs['hc_y'],
    mode='markers',
    name='Outs',
    marker=dict(
        size=8,
        color='lightgray',
        symbol='circle',
        opacity=0.5,
        line=dict(width=0.5, color='gray')
    ),
    text=[
        f"<b>Out</b><br>" +
        f"EV: {ev:.1f} mph<br>" +
        f"LA: {la:.1f}°<br>" +
        f"Distance: {dist:.0f} ft"
        for ev, la, dist in zip(
            outs['launch_speed'],
            outs['launch_angle'],
            outs['hit_distance_sc'].fillna(0)
        )
    ],
    hoverinfo='text'
))

# Add hits by type
hit_colors = {1: '#90EE90', 2: '#4169E1', 3: '#FF8C00', 4: '#FF0000'}
hit_names = {1: 'Single', 2: 'Double', 3: 'Triple', 4: 'Home Run'}

for hit_val in [1, 2, 3, 4]:
    hits = balls_in_play[balls_in_play['hit_value'] == hit_val]
    if len(hits) > 0:
        fig.add_trace(go.Scatter(
            x=hits['hc_x'],
            y=hits['hc_y'],
            mode='markers',
            name=hit_names[hit_val],
            marker=dict(
                size=10,
                color=hit_colors[hit_val],
                symbol='circle',
                opacity=0.8,
                line=dict(width=1, color='black')
            ),
            text=[
                f"<b>{hit_names[hit_val]}</b><br>" +
                f"EV: {ev:.1f} mph<br>" +
                f"LA: {la:.1f}°<br>" +
                f"Distance: {dist:.0f} ft<br>" +
                f"Date: {date}"
                for ev, la, dist, date in zip(
                    hits['launch_speed'],
                    hits['launch_angle'],
                    hits['hit_distance_sc'].fillna(0),
                    hits['game_date']
                )
            ],
            hoverinfo='text'
        ))

# Add field dimensions (simplified diamond)
# Home plate at approximately (125, 200) in Statcast coordinates
fig.add_shape(type="line", x0=125, y0=200, x1=125, y1=50,
              line=dict(color="green", width=2))  # Center field line

# Update layout for baseball field appearance
fig.update_layout(
    title='Interactive Spray Chart - 2024 Season',
    xaxis=dict(
        title='Horizontal Position',
        range=[0, 250],
        showgrid=False,
        zeroline=False
    ),
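    yaxis=dict(
        title='hc_y (chart units, outfield toward top)',
        # Reversed range: hc_y decreases as the ball travels away from home
        # plate, so an unreversed axis would draw the field upside down
        range=[300, 0],
        showgrid=False,
        zeroline=False,
        scaleanchor="x",
        scaleratio=1
    ),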
    plot_bgcolor='rgba(34, 139, 34, 0.1)',  # Light green background
    width=900,
    height=900,
    showlegend=True,
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='closest'
)

fig.show()

R Implementation:

library(plotly)
library(dplyr)

# Filter for balls in play
balls_in_play <- statcast_data %>%
  filter(!is.na(hc_x), !is.na(hc_y)) %>%
  mutate(
    hit_value = case_when(
      events == "home_run" ~ 4,
      events == "triple" ~ 3,
      events == "double" ~ 2,
      events == "single" ~ 1,
      TRUE ~ 0
    ),
    is_hit = hit_value > 0,
    outcome_label = case_when(
      hit_value == 4 ~ "Home Run",
      hit_value == 3 ~ "Triple",
      hit_value == 2 ~ "Double",
      hit_value == 1 ~ "Single",
      TRUE ~ "Out"
    )
  )

# Create spray chart
fig <- plot_ly()

# Add outs
outs <- balls_in_play %>% filter(!is_hit)
fig <- fig %>%
  add_trace(
    data = outs,
    x = ~hc_x,
    y = ~hc_y,
    type = 'scatter',
    mode = 'markers',
    name = 'Outs',
    marker = list(
      size = 8,
      color = 'lightgray',
      opacity = 0.5,
      line = list(width = 0.5, color = 'gray')
    ),
    text = ~paste0(
      "<b>Out</b><br>",
      "EV: ", round(launch_speed, 1), " mph<br>",
      "LA: ", round(launch_angle, 1), "°<br>",
      "Distance: ", round(hit_distance_sc, 0), " ft"
    ),
    hoverinfo = 'text'
  )

# Add hits by type
hit_data <- list(
  list(value = 1, name = "Single", color = "#90EE90"),
  list(value = 2, name = "Double", color = "#4169E1"),
  list(value = 3, name = "Triple", color = "#FF8C00"),
  list(value = 4, name = "Home Run", color = "#FF0000")
)

for (hit_type in hit_data) {
  hits <- balls_in_play %>% filter(hit_value == hit_type$value)
  if (nrow(hits) > 0) {
    fig <- fig %>%
      add_trace(
        data = hits,
        x = ~hc_x,
        y = ~hc_y,
        type = 'scatter',
        mode = 'markers',
        name = hit_type$name,
        marker = list(
          size = 10,
          color = hit_type$color,
          opacity = 0.8,
          line = list(width = 1, color = 'black')
        ),
        text = ~paste0(
          "<b>", outcome_label, "</b><br>",
          "EV: ", round(launch_speed, 1), " mph<br>",
          "LA: ", round(launch_angle, 1), "°<br>",
          "Distance: ", round(hit_distance_sc, 0), " ft<br>",
          "Date: ", game_date
        ),
        hoverinfo = 'text'
      )
  }
}

# Update layout
fig <- fig %>%
  layout(
    title = "Interactive Spray Chart - 2024 Season",
    xaxis = list(
      title = "Horizontal Position",
      range = c(0, 250),
      showgrid = FALSE,
      zeroline = FALSE
    ),
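    yaxis = list(
      title = "hc_y (chart units, outfield toward top)",
      # Reversed range: hc_y decreases as the ball travels away from home
      # plate, so an unreversed axis would draw the field upside down
      range = c(300, 0),
      showgrid = FALSE,
      zeroline = FALSE,
      scaleanchor = "x",
      scaleratio = 1
    ),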
    plot_bgcolor = 'rgba(34, 139, 34, 0.1)',
    width = 900,
    height = 900,
    showlegend = TRUE,
    legend = list(
      orientation = "h",
      yanchor = "bottom",
      y = 1.02,
      xanchor = "right",
      x = 1
    ),
    hovermode = 'closest'
  )

fig
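
The `hc_x` and `hc_y` values are chart coordinates rather than feet, with home plate near (125, 200) and `hc_y` shrinking as the ball travels toward the outfield. A hedged sketch of converting them into a spray angle follows. The home-plate offsets (125.42, 198.27) are a commonly used community approximation rather than an official constant, the sign convention (negative toward left field, positive toward right field) is a choice made here, and `add_spray_angle` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Assumed home-plate location in hc_x / hc_y chart units
HOME_X, HOME_Y = 125.42, 198.27

def add_spray_angle(df):
    """Return a copy of df with a 'spray_angle' column in degrees.

    0 is straight up the middle; negative is the left-field side,
    positive is the right-field side.
    """
    out = df.copy()
    dx = out['hc_x'] - HOME_X   # positive toward the right-field side
    dy = HOME_Y - out['hc_y']   # flip so that larger values are deeper in the field
    out['spray_angle'] = np.degrees(np.arctan2(dx, dy))
    return out

# Made-up points: dead center, left-field line, right-field line
toy = pd.DataFrame({
    'hc_x': [125.42, 75.42, 175.42],
    'hc_y': [98.27, 148.27, 148.27],
})
angles = add_spray_angle(toy)['spray_angle'].round(1).tolist()
print(angles)  # -> [0.0, -45.0, 45.0]
```

Labeling a ball as pulled or hit the opposite way additionally requires the batter's handedness (the `stand` column), since a negative angle is the pull side for a right-handed hitter but the opposite field for a left-handed one.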

6.10.3 Animated Barrel Rate Trends Over Time

Barrel rate is one of the most stable and predictive Statcast metrics. Plotting a rolling barrel rate across a season can reveal hot streaks, mechanical adjustments, or fatigue patterns. The charts below use a rolling window of 50 batted balls and are interactive (hover, zoom, pan) rather than frame-by-frame animations.

Python Implementation:

import plotly.express as px
from datetime import datetime

# Calculate rolling barrel rate
def calculate_rolling_metrics(df, window=50):
    """Calculate rolling Statcast metrics"""
    df = df.sort_values('game_date').copy()

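    # Identify barrels with a commonly used linear approximation of the
    # barrel zone: 98+ mph, with the launch-angle window (26-30 degrees at
    # 98 mph) widening as exit velocity increases, capped at 50 degrees
    df['is_barrel'] = (
        (df['launch_speed'] >= 98) &
        (df['launch_angle'] <= 50) &
        (df['launch_speed'] * 1.5 - df['launch_angle'] >= 117) &
        (df['launch_speed'] + df['launch_angle'] >= 124)
    )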

    # Calculate rolling metrics
    df['rolling_barrel_rate'] = (
        df['is_barrel'].rolling(window=window, min_periods=10).mean() * 100
    )
    df['rolling_ev'] = df['launch_speed'].rolling(
        window=window, min_periods=10
    ).mean()
    df['rolling_hard_hit'] = (
        (df['launch_speed'] >= 95).rolling(window=window, min_periods=10).mean() * 100
    )

    return df

# Apply rolling calculations
batted_balls_rolling = calculate_rolling_metrics(batted_balls, window=50)
batted_balls_rolling = batted_balls_rolling.dropna(
    subset=['rolling_barrel_rate']
)

# Create interactive line plot of the rolling barrel rate
fig = px.line(
    batted_balls_rolling,
    x='game_date',
    y='rolling_barrel_rate',
    title='Rolling 50-Batted Ball Barrel Rate - 2024 Season',
    labels={
        'game_date': 'Date',
        'rolling_barrel_rate': 'Barrel Rate (%)'
    },
    markers=True
)

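# Add season average reference line
# (is_barrel exists only on the frame returned by calculate_rolling_metrics,
# not on batted_balls; the warm-up rows dropped above are excluded here)
season_avg_barrel = batted_balls_rolling['is_barrel'].mean() * 100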
fig.add_hline(
    y=season_avg_barrel,
    line_dash="dash",
    line_color="red",
    annotation_text=f"Season Avg: {season_avg_barrel:.1f}%",
    annotation_position="right"
)

# Add MLB average reference
fig.add_hline(
    y=8.0,
    line_dash="dash",
    line_color="gray",
    annotation_text="MLB Avg: 8.0%",
    annotation_position="left"
)

fig.update_layout(
    width=1100,
    height=600,
    hovermode='x unified',
    template='plotly_white',
    font=dict(size=12)
)

fig.update_traces(
    line=dict(width=3, color='#1f77b4'),
    marker=dict(size=6)
)

fig.show()

Alternative: Multi-Metric Animated Dashboard (Python):

from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Create subplot figure with multiple metrics
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=(
        'Rolling Barrel Rate (%)',
        'Rolling Average Exit Velocity (mph)',
        'Rolling Hard-Hit Rate (%)'
    ),
    vertical_spacing=0.1
)

# Barrel Rate
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_barrel_rate'],
        mode='lines+markers',
        name='Barrel Rate',
        line=dict(color='#FF4444', width=2),
        marker=dict(size=4)
    ),
    row=1, col=1
)

# Exit Velocity
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_ev'],
        mode='lines+markers',
        name='Avg Exit Velocity',
        line=dict(color='#4444FF', width=2),
        marker=dict(size=4)
    ),
    row=2, col=1
)

# Hard-Hit Rate
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_hard_hit'],
        mode='lines+markers',
        name='Hard-Hit Rate',
        line=dict(color='#44FF44', width=2),
        marker=dict(size=4)
    ),
    row=3, col=1
)

fig.update_xaxes(title_text="Date", row=3, col=1)
fig.update_yaxes(title_text="%", row=1, col=1)
fig.update_yaxes(title_text="mph", row=2, col=1)
fig.update_yaxes(title_text="%", row=3, col=1)

fig.update_layout(
    title_text='Statcast Metrics Trends - Rolling 50 Batted Balls',
    height=900,
    width=1100,
    showlegend=False,
    hovermode='x unified',
    template='plotly_white'
)

fig.show()

R Implementation with Time Series:

library(plotly)
library(dplyr)
library(zoo)

# Calculate rolling metrics
batted_balls_rolling <- batted_balls %>%
  arrange(game_date) %>%
  mutate(
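    # Approximate barrel zone: the 26-30 degree window at 98 mph widens
    # as exit velocity increases, capped at 50 degrees
    is_barrel = (launch_speed >= 98 & launch_angle <= 50 &
                   launch_speed * 1.5 - launch_angle >= 117 &
                   launch_speed + launch_angle >= 124),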
    rolling_barrel_rate = rollapply(
      is_barrel, width = 50, FUN = mean, fill = NA, align = "right"
    ) * 100,
    rolling_ev = rollapply(
      launch_speed, width = 50, FUN = mean, fill = NA, align = "right", na.rm = TRUE
    )
  ) %>%
  filter(!is.na(rolling_barrel_rate))

# Create interactive line plot of the rolling barrel rate
fig <- plot_ly(
  data = batted_balls_rolling,
  x = ~game_date,
  y = ~rolling_barrel_rate,
  type = 'scatter',
  mode = 'lines+markers',
  line = list(color = '#1f77b4', width = 3),
  marker = list(size = 6),
  text = ~paste0(
    "Date: ", game_date, "<br>",
    "Barrel Rate: ", round(rolling_barrel_rate, 1), "%<br>",
    "Avg EV: ", round(rolling_ev, 1), " mph"
  ),
  hoverinfo = 'text'
) %>%
  add_trace(
    y = mean(batted_balls_rolling$is_barrel) * 100,
    type = 'scatter',
    mode = 'lines',
    line = list(dash = 'dash', color = 'red', width = 2),
    name = 'Season Average',
    showlegend = TRUE
  ) %>%
  add_trace(
    y = 8.0,
    type = 'scatter',
    mode = 'lines',
    line = list(dash = 'dash', color = 'gray', width = 2),
    name = 'MLB Average',
    showlegend = TRUE
  ) %>%
  layout(
    title = "Rolling 50-Batted Ball Barrel Rate - 2024 Season",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Barrel Rate (%)"),
    width = 1100,
    height = 600,
    hovermode = 'x unified',
    template = 'plotly_white'
  )

fig

Key Insights from Temporal Visualizations:

  • Consistency: Stable barrel rates indicate reliable power production
  • Trend Identification: Upward or downward trends may signal mechanical changes or injury
  • Volatility: High variance suggests inconsistent contact quality
  • Comparative Context: Reference lines provide immediate context against league and personal averages
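
To put rough numbers on the trend and volatility ideas above, one option is to summarize a rolling series by its standard deviation and a least-squares slope. The following is a minimal sketch on synthetic data. `summarize_rolling_series` is a hypothetical helper, and a straight-line slope is only a coarse descriptor, not a significance test.

```python
import numpy as np
import pandas as pd

def summarize_rolling_series(values):
    """Return the mean, standard deviation, and per-observation slope of a series."""
    s = pd.Series(values, dtype=float).dropna().reset_index(drop=True)
    # Fit y = slope * position + intercept; slope is the change per batted ball
    slope, _intercept = np.polyfit(s.index.to_numpy(), s.to_numpy(), deg=1)
    return {'mean': s.mean(), 'std': s.std(), 'slope': slope}

# Synthetic rolling barrel rate rising steadily from 10% to 20%
rising = np.linspace(10, 20, 101)
trend_stats = summarize_rolling_series(rising)
print({name: round(float(value), 3) for name, value in trend_stats.items()})
```

With real data this could be applied to `batted_balls_rolling['rolling_barrel_rate']`. Note that consecutive rolling values share most of their window, so the series is smoother than the underlying batted balls.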

Best Practices for Interactive Statcast Visualizations:

  1. Always Include Context: Add league average reference lines or comparison groups
  2. Rich Hover Data: Include date, outcome, exit velocity, launch angle, and distance
  3. Color Encoding: Use intuitive colors (red for home runs, gray for outs)
  4. Export Options: Save as HTML for sharing with non-technical stakeholders
  5. Performance: Limit data points to reasonable numbers (< 5000) for responsive interactions
  6. Accessibility: Ensure sufficient color contrast and alternative text descriptions
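
For the performance guideline, a simple approach is to sample rows at random before plotting whenever a query returns more than a few thousand batted balls. The following is a minimal sketch. `limit_points` and its defaults are illustrative assumptions, and random sampling is only one option (filtering by date range or outcome works too).

```python
import pandas as pd

def limit_points(df, max_points=5000, seed=42):
    """Return df unchanged if it is small enough, else a reproducible random sample."""
    if len(df) <= max_points:
        return df
    # Sample without replacement, then restore the original row order
    return df.sample(n=max_points, random_state=seed).sort_index()

# Made-up frame standing in for a large multi-player Statcast query
big = pd.DataFrame({'launch_speed': range(12000)})
small = limit_points(big)
print(len(big), '->', len(small))
```

Fixing the seed keeps the sampled chart stable between runs, which makes shared HTML exports easier to compare.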

Interactive Statcast visualizations empower analysts to explore complex multidimensional data efficiently, uncover hidden patterns, and communicate findings compellingly to diverse audiences.


Python
import plotly.graph_objects as go
import numpy as np

# Filter for balls in play
balls_in_play = statcast_data[
    (statcast_data['hc_x'].notna()) &
    (statcast_data['hc_y'].notna())
].copy()

# Classify hit outcomes and directions
def get_hit_value(event):
    """Assign numeric value to hit outcomes"""
    if event == 'home_run':
        return 4
    elif event == 'triple':
        return 3
    elif event == 'double':
        return 2
    elif event == 'single':
        return 1
    else:
        return 0

balls_in_play['hit_value'] = balls_in_play['events'].apply(get_hit_value)
balls_in_play['is_hit'] = balls_in_play['hit_value'] > 0

# Create spray chart
fig = go.Figure()

# Add outs
outs = balls_in_play[~balls_in_play['is_hit']]
fig.add_trace(go.Scatter(
    x=outs['hc_x'],
    y=outs['hc_y'],
    mode='markers',
    name='Outs',
    marker=dict(
        size=8,
        color='lightgray',
        symbol='circle',
        opacity=0.5,
        line=dict(width=0.5, color='gray')
    ),
    text=[
        f"<b>Out</b><br>" +
        f"EV: {ev:.1f} mph<br>" +
        f"LA: {la:.1f}°<br>" +
        f"Distance: {dist:.0f} ft"
        for ev, la, dist in zip(
            outs['launch_speed'],
            outs['launch_angle'],
            outs['hit_distance_sc'].fillna(0)
        )
    ],
    hoverinfo='text'
))

# Add hits by type
hit_colors = {1: '#90EE90', 2: '#4169E1', 3: '#FF8C00', 4: '#FF0000'}
hit_names = {1: 'Single', 2: 'Double', 3: 'Triple', 4: 'Home Run'}

for hit_val in [1, 2, 3, 4]:
    hits = balls_in_play[balls_in_play['hit_value'] == hit_val]
    if len(hits) > 0:
        fig.add_trace(go.Scatter(
            x=hits['hc_x'],
            y=hits['hc_y'],
            mode='markers',
            name=hit_names[hit_val],
            marker=dict(
                size=10,
                color=hit_colors[hit_val],
                symbol='circle',
                opacity=0.8,
                line=dict(width=1, color='black')
            ),
            text=[
                f"<b>{hit_names[hit_val]}</b><br>" +
                f"EV: {ev:.1f} mph<br>" +
                f"LA: {la:.1f}°<br>" +
                f"Distance: {dist:.0f} ft<br>" +
                f"Date: {date}"
                for ev, la, dist, date in zip(
                    hits['launch_speed'],
                    hits['launch_angle'],
                    hits['hit_distance_sc'].fillna(0),
                    hits['game_date']
                )
            ],
            hoverinfo='text'
        ))

# Add field dimensions (simplified diamond)
# Home plate at approximately (125, 200) in Statcast coordinates
fig.add_shape(type="line", x0=125, y0=200, x1=125, y1=50,
              line=dict(color="green", width=2))  # Center field line

# Update layout for baseball field appearance
fig.update_layout(
    title='Interactive Spray Chart - 2024 Season',
    xaxis=dict(
        title='Horizontal Position',
        range=[0, 250],
        showgrid=False,
        zeroline=False
    ),
    yaxis=dict(
        title='Vertical Position (hc_y)',
        range=[300, 0],  # hc_y shrinks toward the outfield, so reverse the axis
        showgrid=False,
        zeroline=False,
        scaleanchor="x",
        scaleratio=1
    ),
    plot_bgcolor='rgba(34, 139, 34, 0.1)',  # Light green background
    width=900,
    height=900,
    showlegend=True,
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='closest'
)

fig.show()
Python
import plotly.express as px

# Calculate rolling barrel rate
def calculate_rolling_metrics(df, window=50):
    """Calculate rolling Statcast metrics"""
    df = df.sort_values('game_date').copy()

    # Identify barrels (simplified: the official window widens beyond 26-30°
    # as exit velocity rises above 98 mph)
    df['is_barrel'] = (
        (df['launch_speed'] >= 98) &
        (df['launch_angle'] >= 26) &
        (df['launch_angle'] <= 30)
    )

    # Calculate rolling metrics
    df['rolling_barrel_rate'] = (
        df['is_barrel'].rolling(window=window, min_periods=10).mean() * 100
    )
    df['rolling_ev'] = df['launch_speed'].rolling(
        window=window, min_periods=10
    ).mean()
    df['rolling_hard_hit'] = (
        (df['launch_speed'] >= 95).rolling(window=window, min_periods=10).mean() * 100
    )

    return df

# Apply rolling calculations
batted_balls_rolling = calculate_rolling_metrics(batted_balls, window=50)
batted_balls_rolling = batted_balls_rolling.dropna(
    subset=['rolling_barrel_rate']
)

# Create line plot of the rolling barrel rate
fig = px.line(
    batted_balls_rolling,
    x='game_date',
    y='rolling_barrel_rate',
    title='Rolling 50-Batted Ball Barrel Rate - 2024 Season',
    labels={
        'game_date': 'Date',
        'rolling_barrel_rate': 'Barrel Rate (%)'
    },
    markers=True
)

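# Add season average reference line
# (is_barrel is created inside calculate_rolling_metrics, so recompute it on the full data)
season_avg_barrel = calculate_rolling_metrics(batted_balls)['is_barrel'].mean() * 100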
fig.add_hline(
    y=season_avg_barrel,
    line_dash="dash",
    line_color="red",
    annotation_text=f"Season Avg: {season_avg_barrel:.1f}%",
    annotation_position="right"
)

# Add MLB average reference
fig.add_hline(
    y=8.0,
    line_dash="dash",
    line_color="gray",
    annotation_text="MLB Avg: 8.0%",
    annotation_position="left"
)

fig.update_layout(
    width=1100,
    height=600,
    hovermode='x unified',
    template='plotly_white',
    font=dict(size=12)
)

fig.update_traces(
    line=dict(width=3, color='#1f77b4'),
    marker=dict(size=6)
)

fig.show()
Python
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Create subplot figure with multiple metrics
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=(
        'Rolling Barrel Rate (%)',
        'Rolling Average Exit Velocity (mph)',
        'Rolling Hard-Hit Rate (%)'
    ),
    vertical_spacing=0.1
)

# Barrel Rate
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_barrel_rate'],
        mode='lines+markers',
        name='Barrel Rate',
        line=dict(color='#FF4444', width=2),
        marker=dict(size=4)
    ),
    row=1, col=1
)

# Exit Velocity
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_ev'],
        mode='lines+markers',
        name='Avg Exit Velocity',
        line=dict(color='#4444FF', width=2),
        marker=dict(size=4)
    ),
    row=2, col=1
)

# Hard-Hit Rate
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_hard_hit'],
        mode='lines+markers',
        name='Hard-Hit Rate',
        line=dict(color='#44FF44', width=2),
        marker=dict(size=4)
    ),
    row=3, col=1
)

fig.update_xaxes(title_text="Date", row=3, col=1)
fig.update_yaxes(title_text="%", row=1, col=1)
fig.update_yaxes(title_text="mph", row=2, col=1)
fig.update_yaxes(title_text="%", row=3, col=1)

fig.update_layout(
    title_text='Statcast Metrics Trends - Rolling 50 Batted Balls',
    height=900,
    width=1100,
    showlegend=False,
    hovermode='x unified',
    template='plotly_white'
)

fig.show()
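
The rolling-metric code above flags barrels with a fixed 26-30° window, which is the official window only at exactly 98 mph. MLB's glossary describes the window widening with exit velocity until it spans 8-50° at 116 mph. Here is a rough sketch of that shape. It assumes a straight-line interpolation between those two published endpoints, so the exact boundary is an approximation. `approx_barrel` is a hypothetical helper name, not a Statcast or pybaseball function, and the toy rows are made up for illustration.

```python
import numpy as np
import pandas as pd

def approx_barrel(launch_speed, launch_angle):
    """Approximate barrel mask: 26-30 deg at 98 mph, widening linearly to 8-50 deg at 116 mph.

    The linear interpolation is an assumption, not the official boundary.
    """
    ev = np.asarray(launch_speed, dtype=float)
    la = np.asarray(launch_angle, dtype=float)

    # Fraction of the way from 98 mph to 116 mph, capped at 0 and 1
    t = np.clip((ev - 98.0) / (116.0 - 98.0), 0.0, 1.0)

    low = 26.0 - t * (26.0 - 8.0)    # lower launch-angle bound
    high = 30.0 + t * (50.0 - 30.0)  # upper launch-angle bound

    # Missing values compare as False, so untracked balls are never barrels
    return (ev >= 98.0) & (la >= low) & (la <= high)

# Made-up batted balls, for illustration only
toy = pd.DataFrame({
    "launch_speed": [98.0, 98.0, 107.0, 116.0, 97.0],
    "launch_angle": [28.0, 20.0, 18.0, 10.0, 28.0],
})
toy["is_barrel"] = approx_barrel(toy["launch_speed"], toy["launch_angle"])
print(toy)
```

Because this zone contains the fixed 26-30° window for every exit velocity of 98 mph or more, swapping it into `calculate_rolling_metrics` can only add barrels, never remove them.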

6.11 Exercises

Exercise 1: xwOBA Investigation

Task: Identify the three biggest "regression candidates" from the 2024 season - players whose actual wOBA significantly differs from xwOBA (minimum 300 PA).

Steps:


  1. Fetch Statcast data for multiple players or use Baseball Savant leaderboards

  2. Calculate actual wOBA and xwOBA for each player

  3. Identify players with largest positive difference (overperforming)

  4. Identify players with largest negative difference (underperforming)

  5. Analyze why - look at their batted ball profile, sprint speed, etc.

Questions to answer:


  • Who is most likely to decline next season?

  • Who is most likely to improve?

  • What's causing the discrepancy for each player?
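
One possible starting point for steps 2-4 is sketched below. It assumes a leaderboard-style DataFrame with one row per player. The column names `player`, `pa`, `woba`, and `xwoba` are hypothetical, so rename them to match whatever export you use. The toy rows are made-up placeholders, not real 2024 figures.

```python
import pandas as pd

def regression_candidates(leaderboard, min_pa=300, n=3):
    """Return (overperformers, underperformers) ranked by wOBA minus xwOBA."""
    qualified = leaderboard[leaderboard["pa"] >= min_pa].copy()
    qualified["woba_minus_xwoba"] = qualified["woba"] - qualified["xwoba"]

    # Largest positive gap: results ahead of contact quality
    over = qualified.nlargest(n, "woba_minus_xwoba")
    # Largest negative gap: results behind contact quality
    under = qualified.nsmallest(n, "woba_minus_xwoba")
    return over, under

# Made-up placeholder rows, for illustration only
toy = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "pa": [650, 120, 480, 510, 330],
    "woba": [0.360, 0.420, 0.310, 0.335, 0.300],
    "xwoba": [0.330, 0.350, 0.340, 0.334, 0.325],
})

over, under = regression_candidates(toy, n=1)
print(over[["player", "woba_minus_xwoba"]])
print(under[["player", "woba_minus_xwoba"]])
```

The gap alone does not answer the "why" questions. Fast runners and extreme pull hitters can sustain a positive gap, so step 5 still requires a look at each player's profile.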

Exercise 2: Barrel Rate vs. Home Run Analysis

Task: Analyze the relationship between barrel rate and home run totals.

Steps:


  1. Collect barrel% and HR data for 20+ qualified hitters

  2. Create a scatter plot: Barrel% (x-axis) vs. HR (y-axis)

  3. Calculate correlation coefficient

  4. Identify outliers - high barrel%, low HR and vice versa

  5. Investigate why outliers exist (park factors, launch angle within barrels)

Questions to answer:


  • How strong is the correlation between Barrel% and HR?

  • What barrel% typically produces 30+ HR?

  • Which players have high barrels but low HR? Why?
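
Steps 3 and 4 could be sketched as follows, assuming a DataFrame with hypothetical columns `player`, `barrel_pct`, and `hr`. The sketch computes the Pearson correlation, fits a straight line with `np.polyfit`, and ranks hitters by how far their home run total sits above or below that line. A linear fit is just one simple way to define an outlier here, and the toy rows are invented for illustration.

```python
import numpy as np
import pandas as pd

def barrel_hr_summary(hitters):
    """Return (Pearson r, hitters ranked by HR residual from a linear fit)."""
    r = hitters["barrel_pct"].corr(hitters["hr"])

    # Least-squares line: hr is approximated by slope * barrel_pct + intercept
    slope, intercept = np.polyfit(hitters["barrel_pct"], hitters["hr"], deg=1)

    out = hitters.copy()
    out["expected_hr"] = slope * out["barrel_pct"] + intercept
    out["hr_residual"] = out["hr"] - out["expected_hr"]
    return r, out.sort_values("hr_residual", ascending=False)

# Invented rows, for illustration only
toy = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "barrel_pct": [5.0, 8.0, 11.0, 14.0, 17.0],
    "hr": [12, 20, 25, 45, 38],
})

r, ranked = barrel_hr_summary(toy)
print(f"Pearson r = {r:.2f}")
print(ranked[["player", "barrel_pct", "hr", "hr_residual"]])
```

Rows at the top of `ranked` hit more home runs than their barrel rate suggests, and rows at the bottom hit fewer. Those are the players to investigate in step 5.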

Exercise 3: Complete Hitter Profile

Task: Create a complete Statcast profile for two contrasting hitters - one power hitter and one contact/speed hitter.

Suggested players:


  • Power: Aaron Judge, Kyle Schwarber, Pete Alonso

  • Contact/Speed: Luis Arraez, Steven Kwan, Elly De La Cruz

Requirements:


  1. Use the create_complete_hitter_profile() function

  2. Calculate all key metrics for both players

  3. Compare and contrast their profiles

  4. Explain how their different approaches produce value

  5. Predict future performance based on Statcast data

Exercise 4: Launch Angle Revolution Analysis

Task: Analyze how launch angles have changed over time and their impact on home runs.

Steps:


  1. Fetch league-wide data from 2015, 2018, and 2024

  2. Calculate average launch angle for each year

  3. Calculate FB% (25-50°) for each year

  4. Compare to league-wide HR totals

  5. Identify when the "launch angle revolution" peaked

Questions to answer:


  • How much has average launch angle increased since 2015?

  • What year had the highest fly ball rate?

  • Has the revolution plateaued or reversed?

  • What's the relationship between league-wide launch angle and HR totals?
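
Steps 2 and 3 reduce to a grouped summary. The sketch below assumes a batted-ball DataFrame with `game_year` and `launch_angle` columns and uses the 25-50° fly ball band from step 3. `launch_angle_by_year` is a hypothetical helper name, and the toy values are made up rather than real league data.

```python
import numpy as np
import pandas as pd

def launch_angle_by_year(batted, fb_min=25, fb_max=50):
    """Average launch angle and fly ball rate (fb_min to fb_max degrees) per season."""
    # Keep only batted balls with a tracked launch angle
    tracked = batted.dropna(subset=["launch_angle"]).copy()
    tracked["is_fly_ball"] = tracked["launch_angle"].between(fb_min, fb_max)

    summary = tracked.groupby("game_year").agg(
        avg_launch_angle=("launch_angle", "mean"),
        fb_pct=("is_fly_ball", "mean"),
        batted_balls=("launch_angle", "size"),
    )
    summary["fb_pct"] *= 100  # convert share to percent
    return summary

# Made-up values, for illustration only
toy = pd.DataFrame({
    "game_year": [2015, 2015, 2015, 2015, 2024, 2024, 2024, 2024, 2024],
    "launch_angle": [0.0, 10.0, 30.0, 40.0, 10.0, 30.0, 40.0, 28.0, np.nan],
})

summary = launch_angle_by_year(toy)
print(summary)
```

League home run totals for step 4 would come from a separate source and can be joined to `summary` on the season index.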


Chapter Summary

In this chapter, we've explored the revolutionary world of Statcast Analytics, focusing specifically on hitting metrics. We've learned:

  1. Statcast's Impact: Since 2015, ball and player tracking has provided unprecedented insight into player performance
  2. Exit Velocity: The foundational power metric, with 95+ mph representing "hard hit" contact and elite hitters averaging 93+ mph
  3. Launch Angle: The vertical component of batted balls, with the "sweet spot" (8-32°) combining the high BABIP of line drives with home run power
  4. Barrels: The perfect combination of exit velocity and launch angle, representing elite contact quality with .500+ BA and 1.500+ SLG
  5. Expected Statistics (xStats): Metrics like xBA and xwOBA that isolate contact quality from luck and defense, crucial for identifying regression candidates
  6. Spray Analysis: Understanding pull/center/opposite field tendencies and their implications for defensive positioning
  7. Sprint Speed: The pure athleticism metric that impacts BABIP, especially on ground balls, and baserunning value
  8. Advanced Analysis: Plate coverage, pitch type splits, and count-based performance reveal the complete picture of a hitter's approach
  9. Complete Profiles: Combining all Statcast metrics creates a comprehensive view of hitter talent, separating skill from circumstance

The Statcast revolution has fundamentally changed baseball analysis. We no longer need to rely solely on outcome-based statistics that blend skill, luck, defense, and park factors. Instead, we can isolate the hitter's contribution - the quality of contact - and build predictive models that identify future performance more accurately than ever before.

In the next chapter, we'll shift our focus to Statcast Pitching Analytics, exploring how similar tracking technologies have transformed our understanding of pitchers.

Chapter Summary

In this chapter, you learned about Statcast analytics for hitting. Key topics covered:

  • The Statcast Revolution in Hitting
  • Exit Velocity Deep Dive
  • Launch Angle Deep Dive
  • Barrels
  • Expected Statistics (xStats)
  • Spray Angle and Pull Tendency