Chapter 21: Umpire Analysis & Strike Zone Modeling

21.1 The Importance of Umpire Analysis

The Human Element in Baseball

Home plate umpires make approximately 150-180 ball-strike decisions per game, totaling over 350,000 calls per MLB season. Each of these decisions can influence at-bats, innings, and ultimately game results. Unlike many other sports, where technology has largely replaced human judgment on close calls, baseball has kept the umpire as the final arbiter of balls and strikes (at least until the recent introduction of the Automated Ball-Strike system, ABS, in the minor leagues).

The importance of umpire analysis extends to multiple stakeholders:

For Teams and Players:

Understanding which umpires expand or contract the strike zone

Adjusting game strategy based on the umpire assignment

Training hitters to protect against umpires with larger zones

Pitcher preparation and pitch selection optimization

For Broadcasters and Fans:

Contextualizing controversial calls within an umpire's historical tendencies

Evaluating umpire consistency and accuracy

Enriching game narratives with umpire-specific insights

For League Officials:

Assessing umpire performance and providing feedback

Ensuring competitive balance through consistent strike zone enforcement

Making informed decisions about rule changes and technology adoption

Historical Context and Technology Evolution

Before the PITCHf/x era (introduced in 2007), umpire analysis was largely anecdotal. Individual umpires developed reputations among scouts and players, but there was no pitch-location data to support systematic quantitative assessment. The introduction of pitch tracking technology revolutionized our ability to evaluate umpire performance:

PITCHf/x (2007-2016): Camera-based system that tracked pitch location and trajectory
Statcast with TrackMan (2015-2019): Doppler radar combined with optical cameras; the radar took over pitch tracking from PITCHf/x in 2017
Statcast with Hawk-Eye (2020-present): Optical camera system and the current MLB standard, with finer location precision than the earlier systems

These systems allow us to compare each called ball or strike against the rulebook strike zone, creating objective measures of umpire accuracy and consistency.

Key Metrics in Umpire Analysis

Several metrics have emerged as standards for evaluating umpire performance:

Accuracy Metrics:

Overall accuracy rate: Percentage of calls that match the rulebook zone

Called strike accuracy (CSA): Accuracy on pitches called strikes

Called ball accuracy (CBA): Accuracy on pitches called balls

Edge consistency: Performance on borderline pitches (typically within 1-3 inches of the zone boundary; the code in this chapter uses a 3-inch band)

Impact Metrics:

Runs Above Average (RAA): Run value of incorrect calls

Win Probability Added (WPA): Impact of calls on win probability

Favor metrics: Whether an umpire's calls systematically benefit one team

Descriptive Metrics:

Strike zone expansion/contraction: How the umpire's zone differs from the rulebook

Zone shape characteristics: Width, height, and asymmetries in the enforced zone

Context sensitivity: How the zone changes with count, score, inning, etc.
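
To make the accuracy definitions above concrete, here is a minimal Python sketch that computes overall accuracy, CSA, and CBA from a handful of calls. The function name `accuracy_metrics` and the toy data are illustrative assumptions, not part of any library or of the dataset used later in the chapter.

```python
def accuracy_metrics(calls):
    """Compute overall accuracy, CSA, and CBA.

    calls: iterable of (in_zone, called_strike) boolean pairs, where
    in_zone is the rulebook truth and called_strike is the umpire's call.
    """
    calls = list(calls)

    def rate(flags):
        # Share of True values; NaN when there is nothing to average
        return sum(flags) / len(flags) if flags else float("nan")

    # A call is correct when it agrees with the rulebook zone
    correct = [in_zone == called for in_zone, called in calls]
    # CSA: of the pitches called strikes, how many were truly in the zone
    strike_calls = [in_zone for in_zone, called in calls if called]
    # CBA: of the pitches called balls, how many were truly outside the zone
    ball_calls = [not in_zone for in_zone, called in calls if not called]

    return {
        "overall": rate(correct),
        "csa": rate(strike_calls),
        "cba": rate(ball_calls),
    }


# Hypothetical sample: (in rulebook zone, called strike)
toy_calls = [
    (True, True), (True, True), (True, True), (False, True),
    (True, False), (False, False), (False, False), (False, False),
    (False, False), (False, False),
]

metrics = accuracy_metrics(toy_calls)
print(metrics)
```

On this toy sample the umpire is correct on 8 of 10 calls overall, on 3 of the 4 pitches called strikes, and on 5 of the 6 pitches called balls, which shows how CSA and CBA can diverge from the overall rate.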

Let's begin our analysis by loading and exploring umpire and pitch data:

# R: Loading and exploring umpire data
library(tidyverse)
library(baseballr)
library(mgcv)        # For GAM models
library(randomForest)
library(ggplot2)
library(gridExtra)

# Load pitch data with umpire information
# In practice, this would come from Statcast or PITCHf/x data
load_pitch_data <- function(season = 2024) {
  # This is a placeholder - in practice, use baseballr or similar
  # pitch_data <- statcast_search(start_date = "2024-04-01",
  #                                end_date = "2024-10-01")

  # For demonstration, we'll create a sample dataset structure
  set.seed(42)
  n_pitches <- 100000

  data.frame(
    game_date = sample(seq.Date(as.Date("2024-04-01"),
                                as.Date("2024-09-30"), by = "day"),
                      n_pitches, replace = TRUE),
    umpire = sample(paste("Umpire", 1:20), n_pitches, replace = TRUE),
    pitcher = sample(paste("Pitcher", 1:100), n_pitches, replace = TRUE),
    batter = sample(paste("Batter", 1:100), n_pitches, replace = TRUE),
    plate_x = rnorm(n_pitches, 0, 0.8),      # Horizontal location (feet)
    plate_z = rnorm(n_pitches, 2.5, 0.8),    # Vertical location (feet)
    sz_top = rnorm(n_pitches, 3.4, 0.15),    # Top of strike zone
    sz_bot = rnorm(n_pitches, 1.5, 0.1),     # Bottom of strike zone
    balls = sample(0:3, n_pitches, replace = TRUE),
    strikes = sample(0:2, n_pitches, replace = TRUE),
    outs = sample(0:2, n_pitches, replace = TRUE),
    pitch_type = sample(c("FF", "SI", "SL", "CH", "CU"), n_pitches, replace = TRUE),
    stand = sample(c("L", "R"), n_pitches, replace = TRUE),
    p_throws = sample(c("L", "R"), n_pitches, replace = TRUE),
    description = sample(c("called_strike", "ball", "hit_into_play", "foul",
                          "swinging_strike"), n_pitches, replace = TRUE,
                        prob = c(0.15, 0.35, 0.20, 0.20, 0.10))
  )
}

# Load data
pitch_data <- load_pitch_data(2024)

# Filter to called pitches only
called_pitches <- pitch_data %>%
  filter(description %in% c("called_strike", "ball")) %>%
  mutate(
    called_strike = as.numeric(description == "called_strike"),
    # Distance from center of zone
    dist_from_center = sqrt(plate_x^2 + (plate_z - (sz_top + sz_bot)/2)^2),
    # Normalized vertical position (0 = bottom, 1 = top)
    norm_z = (plate_z - sz_bot) / (sz_top - sz_bot)
  )

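# Summary statistics by umpire
umpire_summary <- called_pitches %>%
  group_by(umpire) %>%
  summarise(
    pitches_called = n(),
    strike_rate = mean(called_strike),
    avg_abs_plate_x = mean(abs(plate_x)),
    avg_plate_z = mean(plate_z),
    # SD of a 0/1 outcome is approximately sqrt(p * (1 - p)), so it tracks
    # strike_rate rather than measuring call-to-call consistency on its own
    consistency = sd(called_strike)
  ) %>%
  arrange(desc(pitches_called))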

print(head(umpire_summary, 10))

# Python: Loading and exploring umpire data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)

def load_pitch_data(season=2024):
    """
    Load pitch data with umpire information
    In practice, use pybaseball or MLB Stats API
    """
    np.random.seed(42)
    n_pitches = 100000

    # Generate sample data
    data = pd.DataFrame({
        # Sample calendar dates with replacement, matching the R version
        'game_date': np.random.choice(
            pd.date_range('2024-04-01', '2024-09-30', freq='D'), n_pitches),
        'umpire': np.random.choice([f'Umpire_{i}' for i in range(1, 21)],
                                   n_pitches),
        'pitcher': np.random.choice([f'Pitcher_{i}' for i in range(1, 101)],
                                    n_pitches),
        'batter': np.random.choice([f'Batter_{i}' for i in range(1, 101)],
                                   n_pitches),
        'plate_x': np.random.normal(0, 0.8, n_pitches),
        'plate_z': np.random.normal(2.5, 0.8, n_pitches),
        'sz_top': np.random.normal(3.4, 0.15, n_pitches),
        'sz_bot': np.random.normal(1.5, 0.1, n_pitches),
        'balls': np.random.choice([0, 1, 2, 3], n_pitches),
        'strikes': np.random.choice([0, 1, 2], n_pitches),
        'outs': np.random.choice([0, 1, 2], n_pitches),
        'pitch_type': np.random.choice(['FF', 'SI', 'SL', 'CH', 'CU'], n_pitches),
        'stand': np.random.choice(['L', 'R'], n_pitches),
        'p_throws': np.random.choice(['L', 'R'], n_pitches),
        'description': np.random.choice(
            ['called_strike', 'ball', 'hit_into_play', 'foul', 'swinging_strike'],
            n_pitches,
            p=[0.15, 0.35, 0.20, 0.20, 0.10]
        )
    })

    return data

# Load data
pitch_data = load_pitch_data(2024)

# Filter to called pitches only
called_pitches = pitch_data[
    pitch_data['description'].isin(['called_strike', 'ball'])
].copy()

called_pitches['called_strike'] = (
    called_pitches['description'] == 'called_strike'
).astype(int)

# Distance from center of zone
called_pitches['dist_from_center'] = np.sqrt(
    called_pitches['plate_x']**2 +
    (called_pitches['plate_z'] -
     (called_pitches['sz_top'] + called_pitches['sz_bot'])/2)**2
)

# Normalized vertical position
called_pitches['norm_z'] = (
    (called_pitches['plate_z'] - called_pitches['sz_bot']) /
    (called_pitches['sz_top'] - called_pitches['sz_bot'])
)

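# Summary statistics by umpire
# Note: the std of a 0/1 outcome is approximately sqrt(p * (1 - p)), so the
# 'consistency' column tracks strike_rate rather than call-to-call consistency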
umpire_summary = called_pitches.groupby('umpire').agg({
    'called_strike': ['count', 'mean', 'std'],
    'plate_x': lambda x: np.mean(np.abs(x)),
    'plate_z': 'mean'
}).round(4)

umpire_summary.columns = ['pitches_called', 'strike_rate', 'consistency',
                          'avg_abs_plate_x', 'avg_plate_z']

print(umpire_summary.sort_values('pitches_called', ascending=False).head(10))

The code above demonstrates how to load and prepare pitch data for umpire analysis. In practice, you would replace the simulated data with actual Statcast data from Baseball Savant, retrieved through the baseballr package (R) or the pybaseball library (Python).
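
Real exports tend to be messier than the simulated frame above: location fields can be missing, and taken pitches may appear under more than one `description` value. The sketch below uses plain Python and assumes rows are dictionaries keyed by Statcast-style column names. Treating `blocked_ball` as a called ball alongside `ball` is an assumption to verify against the description values in your own export; the names `TAKEN_DESCRIPTIONS` and `keep_called_pitches` are illustrative.

```python
import math

# Assumed set of descriptions for pitches the batter did not swing at
TAKEN_DESCRIPTIONS = {"called_strike", "ball", "blocked_ball"}
# Fields needed to compare a call against the rulebook zone
REQUIRED_FIELDS = ("plate_x", "plate_z", "sz_top", "sz_bot")


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_called_pitches(rows):
    """Keep taken pitches with complete location data and add a 0/1 flag."""
    kept = []
    for row in rows:
        if row.get("description") not in TAKEN_DESCRIPTIONS:
            continue  # swings, fouls, balls in play, etc.
        if any(is_missing(row.get(field)) for field in REQUIRED_FIELDS):
            continue  # cannot be located relative to the zone
        cleaned = dict(row)
        cleaned["called_strike"] = int(row["description"] == "called_strike")
        kept.append(cleaned)
    return kept


# Hypothetical rows for illustration
sample_rows = [
    {"description": "called_strike", "plate_x": 0.1, "plate_z": 2.5,
     "sz_top": 3.4, "sz_bot": 1.5},
    {"description": "blocked_ball", "plate_x": 0.3, "plate_z": 0.2,
     "sz_top": 3.4, "sz_bot": 1.5},
    {"description": "foul", "plate_x": 0.0, "plate_z": 2.0,
     "sz_top": 3.4, "sz_bot": 1.5},
    {"description": "ball", "plate_x": float("nan"), "plate_z": 2.0,
     "sz_top": 3.4, "sz_bot": 1.5},
    {"description": "ball", "plate_x": None, "plate_z": 2.0,
     "sz_top": 3.4, "sz_bot": 1.5},
]

called = keep_called_pitches(sample_rows)
print(len(called), [row["called_strike"] for row in called])
```

Dropping incomplete rows before computing accuracy avoids silently counting untracked pitches as out-of-zone.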

21.2 Defining & Measuring the Strike Zone

The Rulebook Strike Zone

According to MLB's Official Baseball Rules, the strike zone is defined as:

"That area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball."

This definition creates several measurement challenges:

Individual variation: Each batter has a unique strike zone based on their height and stance
Dynamic nature: The zone is determined by the batting stance, which may vary
Ambiguous boundaries: "Midpoint between shoulders and pants" is subjective
Horizontal boundaries: The 17-inch home plate width is clear, but pitch location is three-dimensional

Operational Strike Zone Definition

For analytical purposes, we typically define the vertical limits of the strike zone using Statcast's sz_top and sz_bot variables, which are estimated for each batter from their physical dimensions and stance. The horizontal boundaries are fixed by the width of home plate:

Left edge: -0.708 feet (-8.5 inches) from the center of home plate
Right edge: +0.708 feet (+8.5 inches) from the center of home plate
Width: 1.417 feet (17 inches)
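
Because plate_x and plate_z record the center of the ball, and the rules count a pitch as a strike if any part of the ball passes through the zone, some analysts widen the rectangle by roughly one ball radius. The following Python sketch shows that adjustment under stated assumptions: a radius of about 1.45 inches (derived from the regulation 9 to 9.25 inch circumference), and a simple rectangular expansion that is slightly generous at the corners. The function name and the `ball_radius` parameter are illustrative.

```python
HALF_PLATE_FT = 17 / 2 / 12      # 8.5 inches, about 0.708 ft
BALL_RADIUS_FT = 1.45 / 12       # assumed ball radius, about 0.121 ft


def in_zone_with_ball_radius(plate_x, plate_z, sz_top, sz_bot,
                             ball_radius=BALL_RADIUS_FT):
    """Zone check that lets the edge of the ball, not its center, clip the zone.

    Setting ball_radius=0 reproduces the center-of-ball rectangle.
    """
    half_width = HALF_PLATE_FT + ball_radius
    in_horizontal = abs(plate_x) <= half_width
    in_vertical = (sz_bot - ball_radius) <= plate_z <= (sz_top + ball_radius)
    return in_horizontal and in_vertical


# A pitch whose center is 9 inches (0.75 ft) from the middle of the plate
pitch = dict(plate_x=0.75, plate_z=2.5, sz_top=3.4, sz_bot=1.5)

strict = in_zone_with_ball_radius(**pitch, ball_radius=0.0)
adjusted = in_zone_with_ball_radius(**pitch)
print(strict, adjusted)
```

Whichever convention you choose, apply it consistently: accuracy figures computed with and without the ball-radius adjustment are not directly comparable.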

Let's create functions to determine whether a pitch is in the rulebook zone:

# R: Strike zone classification functions

# Check if pitch is in rulebook strike zone
in_strike_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
  # Horizontal boundaries (in feet)
  left_edge <- -0.708
  right_edge <- 0.708

  in_horizontal <- (plate_x >= left_edge) & (plate_x <= right_edge)
  in_vertical <- (plate_z >= sz_bot) & (plate_z <= sz_top)

  return(in_horizontal & in_vertical)
}

# Calculate distance to nearest strike zone edge
distance_to_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
  left_edge <- -0.708
  right_edge <- 0.708

  # Horizontal distance
  dx <- pmax(0, pmax(left_edge - plate_x, plate_x - right_edge))

  # Vertical distance
  dz <- pmax(0, pmax(sz_bot - plate_z, plate_z - sz_top))

  # Euclidean distance to zone
  return(sqrt(dx^2 + dz^2))
}

# Apply to our data
called_pitches <- called_pitches %>%
  mutate(
    in_zone = in_strike_zone(plate_x, plate_z, sz_top, sz_bot),
    dist_to_zone = distance_to_zone(plate_x, plate_z, sz_top, sz_bot),
    # Classify pitch location
    location_type = case_when(
      in_zone ~ "In Zone",
      dist_to_zone <= 0.25 ~ "Edge (0-3in)",
      dist_to_zone <= 0.5 ~ "Near (3-6in)",
      TRUE ~ "Outside (6in+)"
    ),
    # A call is correct when it matches the rulebook zone.
    # Stored on called_pitches so later per-umpire summaries can reuse it.
    correct_call = in_zone == (called_strike == 1)
  )

# Accuracy analysis
accuracy_by_location <- called_pitches %>%
  group_by(location_type) %>%
  summarise(
    n_pitches = n(),
    accuracy = mean(correct_call),
    strike_rate = mean(called_strike),
    expected_strike_rate = mean(in_zone)
  )

print(accuracy_by_location)

# Python: Strike zone classification functions

def in_strike_zone(plate_x, plate_z, sz_top, sz_bot):
    """Check if pitch is in rulebook strike zone"""
    left_edge = -0.708
    right_edge = 0.708

    in_horizontal = (plate_x >= left_edge) & (plate_x <= right_edge)
    in_vertical = (plate_z >= sz_bot) & (plate_z <= sz_top)

    return in_horizontal & in_vertical

def distance_to_zone(plate_x, plate_z, sz_top, sz_bot):
    """Calculate distance to nearest strike zone edge"""
    left_edge = -0.708
    right_edge = 0.708

    # Horizontal distance
    dx = np.maximum(0, np.maximum(left_edge - plate_x, plate_x - right_edge))

    # Vertical distance
    dz = np.maximum(0, np.maximum(sz_bot - plate_z, plate_z - sz_top))

    # Euclidean distance to zone
    return np.sqrt(dx**2 + dz**2)

# Apply to our data
called_pitches['in_zone'] = in_strike_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)

called_pitches['dist_to_zone'] = distance_to_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)

# Classify pitch location (vectorized; the first matching condition wins)
called_pitches['location_type'] = np.select(
    [
        called_pitches['in_zone'],
        called_pitches['dist_to_zone'] <= 0.25,
        called_pitches['dist_to_zone'] <= 0.5,
    ],
    ["In Zone", "Edge (0-3in)", "Near (3-6in)"],
    default="Outside (6in+)"
)

# Accuracy analysis
called_pitches['correct_call'] = (
    (called_pitches['in_zone'] & (called_pitches['called_strike'] == 1)) |
    (~called_pitches['in_zone'] & (called_pitches['called_strike'] == 0))
)

accuracy_by_location = called_pitches.groupby('location_type').agg({
    'called_strike': ['count', 'mean'],
    'in_zone': 'mean',
    'correct_call': 'mean'
}).round(4)

accuracy_by_location.columns = ['n_pitches', 'strike_rate',
                                 'expected_strike_rate', 'accuracy']
print(accuracy_by_location)

Visualizing the Strike Zone

Creating effective visualizations of the strike zone is crucial for understanding umpire tendencies. Let's create several visualization functions:

# R: Strike zone visualization functions

# Basic strike zone plot
plot_strike_zone <- function(data, title = "Strike Zone") {
  ggplot(data, aes(x = plate_x, y = plate_z)) +
    # Add strike zone box (using average sz_top and sz_bot);
    # annotate() draws one rectangle instead of one per data row
    annotate("rect", xmin = -0.708, xmax = 0.708,
             ymin = mean(data$sz_bot), ymax = mean(data$sz_top),
             fill = NA, color = "black", linewidth = 1) +
    # Add home plate
    annotate("segment", x = -0.708, xend = 0.708, y = 0, yend = 0,
             color = "black", linewidth = 1.5) +
    coord_fixed(ratio = 1) +
    labs(title = title, x = "Horizontal Location (ft)",
         y = "Vertical Location (ft)") +
    theme_minimal()
}

# Heat map of called strike probability
plot_strike_probability_heatmap <- function(data, umpire_name = NULL) {
  if (!is.null(umpire_name)) {
    data <- data %>% filter(umpire == umpire_name)
    title <- paste("Called Strike Probability -", umpire_name)
  } else {
    title <- "Called Strike Probability - All Umpires"
  }

  ggplot(data, aes(x = plate_x, y = plate_z, z = called_strike)) +
    stat_summary_2d(fun = mean, bins = 20) +
    # annotate() draws a single zone outline instead of one per data row
    annotate("rect", xmin = -0.708, xmax = 0.708,
             ymin = mean(data$sz_bot), ymax = mean(data$sz_top),
             fill = NA, color = "white", linewidth = 1.2) +
    scale_fill_gradient2(low = "blue", mid = "yellow", high = "red",
                        midpoint = 0.5, name = "Strike\nProbability") +
    coord_fixed(ratio = 1) +
    labs(title = title, x = "Horizontal Location (ft)",
         y = "Vertical Location (ft)") +
    theme_minimal() +
    theme(legend.position = "right")
}

# Compare umpire to league average
plot_umpire_comparison <- function(data, umpire_name) {
  umpire_data <- data %>% filter(umpire == umpire_name)
  league_data <- data

  p1 <- plot_strike_probability_heatmap(league_data, NULL)
  p2 <- plot_strike_probability_heatmap(umpire_data, umpire_name)

  grid.arrange(p1, p2, ncol = 2)
}

# Example: Plot for a specific umpire
plot_strike_probability_heatmap(called_pitches, "Umpire 1")

# Python: Strike zone visualization functions

def plot_strike_zone_base(ax, sz_top_avg, sz_bot_avg):
    """Add strike zone rectangle to plot"""
    from matplotlib.patches import Rectangle

    zone = Rectangle((-0.708, sz_bot_avg), 1.416, sz_top_avg - sz_bot_avg,
                     fill=False, edgecolor='white', linewidth=2)
    ax.add_patch(zone)

    # Add home plate
    ax.plot([-0.708, 0.708], [0, 0], 'k-', linewidth=2)

    ax.set_aspect('equal')
    ax.set_xlabel('Horizontal Location (ft)', fontsize=12)
    ax.set_ylabel('Vertical Location (ft)', fontsize=12)

def plot_strike_probability_heatmap(data, umpire_name=None, ax=None):
    """Create heatmap of called strike probability"""
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))

    if umpire_name:
        plot_data = data[data['umpire'] == umpire_name].copy()
        title = f"Called Strike Probability - {umpire_name}"
    else:
        plot_data = data.copy()
        title = "Called Strike Probability - All Umpires"

    # Create 2D histogram
    from scipy.stats import binned_statistic_2d

    x = plot_data['plate_x']
    y = plot_data['plate_z']
    values = plot_data['called_strike']

    statistic, x_edges, y_edges, _ = binned_statistic_2d(
        x, y, values, statistic='mean', bins=20,
        range=[[-2, 2], [0, 5]]
    )

    # Plot heatmap
    im = ax.imshow(statistic.T, origin='lower', aspect='auto',
                   extent=[-2, 2, 0, 5], cmap='RdYlBu_r',
                   vmin=0, vmax=1, alpha=0.8)

    # Add strike zone
    plot_strike_zone_base(ax, plot_data['sz_top'].mean(),
                         plot_data['sz_bot'].mean())

    ax.set_xlim(-2, 2)
    ax.set_ylim(0, 5)
    ax.set_title(title, fontsize=14, fontweight='bold')

    # Add colorbar
    plt.colorbar(im, ax=ax, label='Strike Probability')

    return ax

def plot_umpire_comparison(data, umpire_name):
    """Compare umpire to league average"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

    plot_strike_probability_heatmap(data, None, ax1)
    plot_strike_probability_heatmap(data, umpire_name, ax2)

    plt.tight_layout()
    return fig

# Example: Plot for a specific umpire
fig = plot_umpire_comparison(called_pitches, 'Umpire_1')
plt.savefig('umpire_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
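
The heatmaps above average called_strike within each bin, so bins that contain only a few pitches can show extreme rates, especially in single-umpire plots. One common remedy is to hide bins below a minimum sample size. The sketch below illustrates the idea in plain Python; the function name, the 0.25 ft bin size, and the `min_count` threshold are assumptions chosen for illustration rather than recommended values.

```python
import math
from collections import defaultdict


def binned_strike_rate(xs, zs, called, bin_size=0.25, min_count=5):
    """Mean called-strike rate per square bin, dropping sparse bins.

    Returns {(x_bin, z_bin): rate} where the bin indices are
    floor(coordinate / bin_size).
    """
    totals = defaultdict(lambda: [0, 0])  # bin -> [strikes, pitches]
    for x, z, c in zip(xs, zs, called):
        key = (math.floor(x / bin_size), math.floor(z / bin_size))
        totals[key][0] += c
        totals[key][1] += 1
    # Keep only bins with enough pitches to give a stable estimate
    return {
        key: strikes / n
        for key, (strikes, n) in totals.items()
        if n >= min_count
    }


# Hypothetical pitches: six in one bin (four called strikes), two in another
xs = [0.1] * 6 + [1.3] * 2
zs = [2.6] * 6 + [4.1] * 2
called = [1, 1, 1, 1, 0, 0] + [1, 1]

rates = binned_strike_rate(xs, zs, called)
print(rates)
```

In this toy example the two-pitch bin would display a 100% strike rate if plotted; with the threshold it is left blank instead.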

Accuracy Metrics

Once we've defined the strike zone, we can calculate various accuracy metrics for umpires:

# R: Calculate comprehensive umpire accuracy metrics

calculate_umpire_metrics <- function(data) {
  data %>%
    group_by(umpire) %>%
    summarise(
      # Volume
      total_calls = n(),

      # Overall accuracy
      accuracy = mean(correct_call),

      # Accuracy by true zone
      accuracy_in_zone = mean(correct_call[in_zone]),
      accuracy_out_zone = mean(correct_call[!in_zone]),

      # Called strike rate by true zone
      called_strike_in_zone = mean(called_strike[in_zone]),
      called_strike_out_zone = mean(called_strike[!in_zone]),

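      # Edge consistency (out-of-zone pitches within 3 inches of the boundary;
      # dist_to_zone is 0 for every in-zone pitch, so exclude those explicitly)
      edge_pitches = sum(!in_zone & dist_to_zone <= 0.25),
      edge_accuracy = mean(correct_call[!in_zone & dist_to_zone <= 0.25]),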

      # Zone expansion (strikes called outside zone / total outside)
      zone_expansion = mean(called_strike[!in_zone]),

      # Zone contraction (balls called inside zone / total inside)
      zone_contraction = mean(!called_strike[in_zone]),

      # Favor metrics (difference in strike rate for home/away)
      # This would require additional data about home/away team

      .groups = "drop"
    ) %>%
    arrange(desc(total_calls))
}

umpire_metrics <- calculate_umpire_metrics(called_pitches)
print(head(umpire_metrics, 10))

# Visualize umpire accuracy distribution
ggplot(umpire_metrics, aes(x = accuracy)) +
  geom_histogram(bins = 20, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = mean(umpire_metrics$accuracy),
             color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Distribution of Umpire Accuracy Rates",
       x = "Accuracy Rate", y = "Number of Umpires") +
  theme_minimal()

# Python: Calculate comprehensive umpire accuracy metrics

def calculate_umpire_metrics(data):
    """Calculate comprehensive accuracy metrics for each umpire"""

    metrics = []

    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]

        in_zone = ump_data['in_zone']
        called_strike = ump_data['called_strike']
        correct_call = ump_data['correct_call']
        dist_to_zone = ump_data['dist_to_zone']
        # dist_to_zone is 0 for every in-zone pitch, so restrict the edge
        # band to out-of-zone pitches within 3 inches of the boundary
        edge = ~in_zone & (dist_to_zone <= 0.25)

        metric = {
            'umpire': umpire,
            'total_calls': len(ump_data),
            'accuracy': correct_call.mean(),
            'accuracy_in_zone': correct_call[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'accuracy_out_zone': correct_call[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            'called_strike_in_zone': called_strike[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'called_strike_out_zone': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            'edge_pitches': edge.sum(),
            'edge_accuracy': correct_call[edge].mean() if edge.sum() > 0 else np.nan,
            'zone_expansion': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            # called_strike holds 0/1 integers, so compare with 0 instead of
            # applying bitwise ~ (which would turn 1 into -2 and 0 into -1)
            'zone_contraction': (called_strike[in_zone] == 0).mean() if in_zone.sum() > 0 else np.nan,
        }

        metrics.append(metric)

    return pd.DataFrame(metrics).sort_values('total_calls', ascending=False)

umpire_metrics = calculate_umpire_metrics(called_pitches)
print(umpire_metrics.head(10))

# Visualize umpire accuracy distribution
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(umpire_metrics['accuracy'], bins=20, alpha=0.7,
        color='steelblue', edgecolor='black')
ax.axvline(umpire_metrics['accuracy'].mean(), color='red',
           linestyle='--', linewidth=2, label='Mean Accuracy')
ax.set_xlabel('Accuracy Rate', fontsize=12)
ax.set_ylabel('Number of Umpires', fontsize=12)
ax.set_title('Distribution of Umpire Accuracy Rates', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()
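
One limitation of dist_to_zone as defined in this section is that it equals zero for every in-zone pitch, so it cannot distinguish a pitch just inside the boundary from one down the middle. A signed distance, negative inside the zone and positive outside, allows an edge band that covers both sides of the boundary. The following is a minimal Python sketch of that idea; the function names and the 0.25 ft band are illustrative assumptions.

```python
import math

HALF_PLATE_FT = 0.708  # half the 17-inch plate width, in feet


def signed_distance_to_boundary(plate_x, plate_z, sz_top, sz_bot):
    """Distance to the zone boundary: negative inside, positive outside."""
    mid_z = (sz_top + sz_bot) / 2
    half_height = (sz_top - sz_bot) / 2
    dx = abs(plate_x) - HALF_PLATE_FT      # > 0 when left or right of the zone
    dz = abs(plate_z - mid_z) - half_height  # > 0 when above or below the zone
    if dx <= 0 and dz <= 0:
        # Inside: the nearest edge is the one with the smaller margin
        return max(dx, dz)
    # Outside: Euclidean distance to the rectangle, as in distance_to_zone
    return math.hypot(max(dx, 0.0), max(dz, 0.0))


def is_edge_pitch(signed_distance, band=0.25):
    """True when the pitch is within `band` feet of the boundary on either side."""
    return abs(signed_distance) <= band


# Hypothetical batter zone from 1.5 ft to 3.5 ft
zone = dict(sz_top=3.5, sz_bot=1.5)

middle = signed_distance_to_boundary(0.0, 2.5, **zone)
just_inside = signed_distance_to_boundary(0.6, 2.5, **zone)
just_outside = signed_distance_to_boundary(0.808, 2.5, **zone)
print(middle, just_inside, just_outside)
print(is_edge_pitch(middle), is_edge_pitch(just_inside), is_edge_pitch(just_outside))
```

With this definition, edge accuracy can be computed over pitches on both sides of the line, which is where incorrect balls and incorrect strikes each occur.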

# R: Strike zone classification functions

# Check if pitch is in rulebook strike zone
in_strike_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
  # Horizontal boundaries (in feet)
  left_edge <- -0.708
  right_edge <- 0.708

  in_horizontal <- (plate_x >= left_edge) & (plate_x <= right_edge)
  in_vertical <- (plate_z >= sz_bot) & (plate_z <= sz_top)

  return(in_horizontal & in_vertical)
}

# Calculate distance to nearest strike zone edge
distance_to_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
  left_edge <- -0.708
  right_edge <- 0.708

  # Horizontal distance
  dx <- pmax(0, pmax(left_edge - plate_x, plate_x - right_edge))

  # Vertical distance
  dz <- pmax(0, pmax(sz_bot - plate_z, plate_z - sz_top))

  # Euclidean distance to zone
  return(sqrt(dx^2 + dz^2))
}

# Apply to our data
called_pitches <- called_pitches %>%
  mutate(
    in_zone = in_strike_zone(plate_x, plate_z, sz_top, sz_bot),
    dist_to_zone = distance_to_zone(plate_x, plate_z, sz_top, sz_bot),
    # Classify pitch location
    location_type = case_when(
      in_zone ~ "In Zone",
      dist_to_zone <= 0.25 ~ "Edge (0-3in)",
      dist_to_zone <= 0.5 ~ "Near (3-6in)",
      TRUE ~ "Outside (6in+)"
    )
  )

# Accuracy analysis
accuracy_by_location <- called_pitches %>%
  mutate(
    correct_call = (in_zone & called_strike == 1) |
                   (!in_zone & called_strike == 0)
  ) %>%
  group_by(location_type) %>%
  summarise(
    n_pitches = n(),
    accuracy = mean(correct_call),
    strike_rate = mean(called_strike),
    expected_strike_rate = mean(in_zone)
  )

print(accuracy_by_location)

# R: Strike zone visualization functions

# Basic strike zone plot
plot_strike_zone <- function(data, title = "Strike Zone") {
  ggplot(data, aes(x = plate_x, y = plate_z)) +
    # Add strike zone box (using average sz_top and sz_bot)
    geom_rect(aes(xmin = -0.708, xmax = 0.708,
                  ymin = mean(sz_bot), ymax = mean(sz_top)),
              fill = NA, color = "black", linewidth = 1) +
    # Add home plate
    geom_segment(aes(x = -0.708, xend = 0.708, y = 0, yend = 0),
                 color = "black", linewidth = 1.5) +
    coord_fixed(ratio = 1) +
    labs(title = title, x = "Horizontal Location (ft)",
         y = "Vertical Location (ft)") +
    theme_minimal()
}

# Heat map of called strike probability
plot_strike_probability_heatmap <- function(data, umpire_name = NULL) {
  if (!is.null(umpire_name)) {
    data <- data %>% filter(umpire == umpire_name)
    title <- paste("Called Strike Probability -", umpire_name)
  } else {
    title <- "Called Strike Probability - All Umpires"
  }

  ggplot(data, aes(x = plate_x, y = plate_z, z = called_strike)) +
    stat_summary_2d(fun = mean, bins = 20) +
    geom_rect(aes(xmin = -0.708, xmax = 0.708,
                  ymin = mean(sz_bot), ymax = mean(sz_top)),
              fill = NA, color = "white", linewidth = 1.2) +
    scale_fill_gradient2(low = "blue", mid = "yellow", high = "red",
                        midpoint = 0.5, name = "Strike\nProbability") +
    coord_fixed(ratio = 1) +
    labs(title = title, x = "Horizontal Location (ft)",
         y = "Vertical Location (ft)") +
    theme_minimal() +
    theme(legend.position = "right")
}

# Compare umpire to league average
plot_umpire_comparison <- function(data, umpire_name) {
  umpire_data <- data %>% filter(umpire == umpire_name)
  league_data <- data

  p1 <- plot_strike_probability_heatmap(league_data, NULL)
  p2 <- plot_strike_probability_heatmap(umpire_data, umpire_name)

  # grid.arrange() comes from the gridExtra package
  gridExtra::grid.arrange(p1, p2, ncol = 2)
}

# Example: Plot for a specific umpire
plot_strike_probability_heatmap(called_pitches, "Umpire 1")

# R: Calculate comprehensive umpire accuracy metrics

calculate_umpire_metrics <- function(data) {
  data %>%
    group_by(umpire) %>%
    summarise(
      # Volume
      total_calls = n(),

      # Overall accuracy
      accuracy = mean(correct_call),

      # Accuracy by true zone
      accuracy_in_zone = mean(correct_call[in_zone]),
      accuracy_out_zone = mean(correct_call[!in_zone]),

      # Called strike rate by true zone
      called_strike_in_zone = mean(called_strike[in_zone]),
      called_strike_out_zone = mean(called_strike[!in_zone]),

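      # Edge calls: pitches outside the zone but within 3 inches (0.25 ft) of it.
      # dist_to_zone is 0 for every in-zone pitch, so those are excluded explicitly.
      edge_pitches = sum(!in_zone & dist_to_zone <= 0.25),
      edge_accuracy = mean(correct_call[!in_zone & dist_to_zone <= 0.25]),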

      # Zone expansion (strikes called outside zone / total outside)
      zone_expansion = mean(called_strike[!in_zone]),

      # Zone contraction (balls called inside zone / total inside)
      zone_contraction = mean(!called_strike[in_zone]),

      # Favor metrics (difference in strike rate for home/away)
      # This would require additional data about home/away team

      .groups = "drop"
    ) %>%
    arrange(desc(total_calls))
}

umpire_metrics <- calculate_umpire_metrics(called_pitches)
print(head(umpire_metrics, 10))

# Visualize umpire accuracy distribution
ggplot(umpire_metrics, aes(x = accuracy)) +
  geom_histogram(bins = 20, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = mean(umpire_metrics$accuracy),
             color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Distribution of Umpire Accuracy Rates",
       x = "Accuracy Rate", y = "Number of Umpires") +
  theme_minimal()

Python

# Python: Strike zone classification functions

def in_strike_zone(plate_x, plate_z, sz_top, sz_bot):
    """Check if pitch is in rulebook strike zone"""
    left_edge = -0.708
    right_edge = 0.708

    in_horizontal = (plate_x >= left_edge) & (plate_x <= right_edge)
    in_vertical = (plate_z >= sz_bot) & (plate_z <= sz_top)

    return in_horizontal & in_vertical

def distance_to_zone(plate_x, plate_z, sz_top, sz_bot):
    """Calculate distance to nearest strike zone edge"""
    left_edge = -0.708
    right_edge = 0.708

    # Horizontal distance
    dx = np.maximum(0, np.maximum(left_edge - plate_x, plate_x - right_edge))

    # Vertical distance
    dz = np.maximum(0, np.maximum(sz_bot - plate_z, plate_z - sz_top))

    # Euclidean distance to zone
    return np.sqrt(dx**2 + dz**2)

# Apply to our data
called_pitches['in_zone'] = in_strike_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)

called_pitches['dist_to_zone'] = distance_to_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)

# Classify pitch location
def classify_location(dist_to_zone, in_zone):
    if in_zone:
        return "In Zone"
    elif dist_to_zone <= 0.25:
        return "Edge (0-3in)"
    elif dist_to_zone <= 0.5:
        return "Near (3-6in)"
    else:
        return "Outside (6in+)"

called_pitches['location_type'] = called_pitches.apply(
    lambda row: classify_location(row['dist_to_zone'], row['in_zone']),
    axis=1
)

# Accuracy analysis
called_pitches['correct_call'] = (
    (called_pitches['in_zone'] & (called_pitches['called_strike'] == 1)) |
    (~called_pitches['in_zone'] & (called_pitches['called_strike'] == 0))
)

accuracy_by_location = called_pitches.groupby('location_type').agg({
    'called_strike': ['count', 'mean'],
    'in_zone': 'mean',
    'correct_call': 'mean'
}).round(4)

accuracy_by_location.columns = ['n_pitches', 'strike_rate',
                                 'expected_strike_rate', 'accuracy']
print(accuracy_by_location)

Python

# Python: Strike zone visualization functions

def plot_strike_zone_base(ax, sz_top_avg, sz_bot_avg):
    """Add strike zone rectangle to plot"""
    from matplotlib.patches import Rectangle

    zone = Rectangle((-0.708, sz_bot_avg), 1.416, sz_top_avg - sz_bot_avg,
                     fill=False, edgecolor='white', linewidth=2)
    ax.add_patch(zone)

    # Add home plate
    ax.plot([-0.708, 0.708], [0, 0], 'k-', linewidth=2)

    ax.set_aspect('equal')
    ax.set_xlabel('Horizontal Location (ft)', fontsize=12)
    ax.set_ylabel('Vertical Location (ft)', fontsize=12)

def plot_strike_probability_heatmap(data, umpire_name=None, ax=None):
    """Create heatmap of called strike probability"""
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))

    if umpire_name:
        plot_data = data[data['umpire'] == umpire_name].copy()
        title = f"Called Strike Probability - {umpire_name}"
    else:
        plot_data = data.copy()
        title = "Called Strike Probability - All Umpires"

    # Create 2D histogram
    from scipy.stats import binned_statistic_2d

    x = plot_data['plate_x']
    y = plot_data['plate_z']
    values = plot_data['called_strike']

    statistic, x_edges, y_edges, _ = binned_statistic_2d(
        x, y, values, statistic='mean', bins=20,
        range=[[-2, 2], [0, 5]]
    )

    # Plot heatmap
    im = ax.imshow(statistic.T, origin='lower', aspect='auto',
                   extent=[-2, 2, 0, 5], cmap='RdYlBu_r',
                   vmin=0, vmax=1, alpha=0.8)

    # Add strike zone
    plot_strike_zone_base(ax, plot_data['sz_top'].mean(),
                         plot_data['sz_bot'].mean())

    ax.set_xlim(-2, 2)
    ax.set_ylim(0, 5)
    ax.set_title(title, fontsize=14, fontweight='bold')

    # Add colorbar
    plt.colorbar(im, ax=ax, label='Strike Probability')

    return ax

def plot_umpire_comparison(data, umpire_name):
    """Compare umpire to league average"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

    plot_strike_probability_heatmap(data, None, ax1)
    plot_strike_probability_heatmap(data, umpire_name, ax2)

    plt.tight_layout()
    return fig

# Example: Plot for a specific umpire
fig = plot_umpire_comparison(called_pitches, 'Umpire_1')
plt.savefig('umpire_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

Python

# Python: Calculate comprehensive umpire accuracy metrics

def calculate_umpire_metrics(data):
    """Calculate comprehensive accuracy metrics for each umpire"""

    metrics = []

    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]

        in_zone = ump_data['in_zone']
        called_strike = ump_data['called_strike']
        correct_call = ump_data['correct_call']
        dist_to_zone = ump_data['dist_to_zone']

        metric = {
            'umpire': umpire,
            'total_calls': len(ump_data),
            'accuracy': correct_call.mean(),
            'accuracy_in_zone': correct_call[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'accuracy_out_zone': correct_call[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            'called_strike_in_zone': called_strike[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'called_strike_out_zone': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
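            # Edge calls: outside the zone but within 3 inches (0.25 ft) of it.
            # dist_to_zone is 0 for every in-zone pitch, so those are excluded explicitly.
            'edge_pitches': (~in_zone & (dist_to_zone <= 0.25)).sum(),
            'edge_accuracy': correct_call[~in_zone & (dist_to_zone <= 0.25)].mean() if (~in_zone & (dist_to_zone <= 0.25)).sum() > 0 else np.nan,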
            'zone_expansion': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            # called_strike is a 0/1 integer, so compare with 0 (bitwise ~ would give -1/-2)
            'zone_contraction': (called_strike[in_zone] == 0).mean() if in_zone.sum() > 0 else np.nan,
        }

        metrics.append(metric)

    return pd.DataFrame(metrics).sort_values('total_calls', ascending=False)

umpire_metrics = calculate_umpire_metrics(called_pitches)
print(umpire_metrics.head(10))

# Visualize umpire accuracy distribution
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(umpire_metrics['accuracy'], bins=20, alpha=0.7,
        color='steelblue', edgecolor='black')
ax.axvline(umpire_metrics['accuracy'].mean(), color='red',
           linestyle='--', linewidth=2, label='Mean Accuracy')
ax.set_xlabel('Accuracy Rate', fontsize=12)
ax.set_ylabel('Number of Umpires', fontsize=12)
ax.set_title('Distribution of Umpire Accuracy Rates', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()

21.3 Individual Umpire Tendencies

Different umpires exhibit distinct patterns in their strike zone enforcement. Some consistently call a larger zone, while others are more conservative. Understanding these tendencies is valuable for teams preparing for games and for evaluating umpire performance.

Common Umpire Tendency Patterns

Research has identified several common patterns in umpire behavior:

Zone Size Variation: Some umpires call 5-10% more strikes than others
Directional Bias: Preferences for inside/outside or high/low strikes
Handedness Effects: Different zones for left-handed vs. right-handed batters
Count Sensitivity: Expanding the zone in hitter-friendly counts (3-0) and shrinking it in pitcher-friendly counts (0-2)
Experience Effects: Veteran umpires often show more consistency
Game Context: Some umpires tighten their zone in high-leverage situations

Let's analyze these tendencies systematically:

# R: Analyzing individual umpire tendencies

# 1. Zone size by umpire
zone_size_analysis <- called_pitches %>%
  group_by(umpire) %>%
  summarise(
    total_calls = n(),
    strike_rate = mean(called_strike),
    # Effective zone boundaries (5th/95th percentiles of called-strike locations)
    left_boundary = quantile(plate_x[called_strike == 1 & plate_x < 0], 0.05),
    right_boundary = quantile(plate_x[called_strike == 1 & plate_x > 0], 0.95),
    top_boundary = quantile(plate_z[called_strike == 1], 0.95),
    bottom_boundary = quantile(plate_z[called_strike == 1], 0.05),
    # Zone width and height
    zone_width = right_boundary - left_boundary,
    zone_height = top_boundary - bottom_boundary
  ) %>%
  filter(total_calls >= 1000)  # Minimum sample size

# Plot zone size variation
ggplot(zone_size_analysis, aes(x = zone_width, y = zone_height)) +
  geom_point(aes(size = total_calls, color = strike_rate), alpha = 0.7) +
  geom_vline(xintercept = 1.417, linetype = "dashed", color = "red") +
  geom_hline(yintercept = mean(called_pitches$sz_top - called_pitches$sz_bot),
             linetype = "dashed", color = "red") +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = median(zone_size_analysis$strike_rate)) +
  labs(title = "Umpire Zone Size Variation",
       subtitle = "Dashed lines show rulebook zone dimensions",
       x = "Effective Zone Width (ft)", y = "Effective Zone Height (ft)") +
  theme_minimal()

# 2. Directional tendencies
directional_analysis <- called_pitches %>%
  mutate(
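    # Sides are labeled from the catcher's view (the plate_x convention);
    # which side is "inside" depends on batter handedness (see step 4)
    zone_region = case_when(
      plate_x < -0.708 ~ "Off Plate Left",
      plate_x > 0.708 ~ "Off Plate Right",
      plate_z > sz_top ~ "High",
      plate_z < sz_bot ~ "Low",
      TRUE ~ "In Zone"
    )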
  ) %>%
  group_by(umpire, zone_region) %>%
  summarise(
    n = n(),
    strike_rate = mean(called_strike),
    .groups = "drop"
  ) %>%
  filter(n >= 100) %>%
  pivot_wider(names_from = zone_region, values_from = c(n, strike_rate))

print(head(directional_analysis))

# 3. Count sensitivity
count_analysis <- called_pitches %>%
  mutate(
    count_type = case_when(
      balls == 3 & strikes == 0 ~ "3-0",
      balls == 3 & strikes == 1 ~ "3-1",
      balls == 3 & strikes == 2 ~ "3-2",
      balls == 0 & strikes == 2 ~ "0-2",
      balls == 1 & strikes == 2 ~ "1-2",
      balls == 2 & strikes == 2 ~ "2-2",
      TRUE ~ "Other"
    ),
    pitcher_favorable = balls == 0 & strikes == 2,
    hitter_favorable = balls == 3 & strikes == 0
  ) %>%
  group_by(umpire) %>%
  summarise(
    strike_rate_overall = mean(called_strike),
    strike_rate_3_0 = mean(called_strike[hitter_favorable]),
    strike_rate_0_2 = mean(called_strike[pitcher_favorable]),
    count_sensitivity = strike_rate_3_0 - strike_rate_0_2,
    .groups = "drop"
  ) %>%
  filter(!is.na(count_sensitivity))

# Plot count sensitivity
ggplot(count_analysis, aes(x = strike_rate_0_2, y = strike_rate_3_0)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
  geom_point(alpha = 0.6, size = 3) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(title = "Umpire Count Sensitivity",
       subtitle = "Strike rate in pitcher-favorable (0-2) vs hitter-favorable (3-0) counts",
       x = "Strike Rate on 0-2 Count", y = "Strike Rate on 3-0 Count") +
  theme_minimal()

# 4. Batter handedness effects
handedness_analysis <- called_pitches %>%
  mutate(
    inside = (stand == "R" & plate_x < -0.708) |
             (stand == "L" & plate_x > 0.708),
    outside = (stand == "R" & plate_x > 0.708) |
              (stand == "L" & plate_x < -0.708)
  ) %>%
  group_by(umpire, stand) %>%
  summarise(
    strike_rate_inside = mean(called_strike[inside], na.rm = TRUE),
    strike_rate_outside = mean(called_strike[outside], na.rm = TRUE),
    inside_bias = strike_rate_inside - strike_rate_outside,
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = stand, values_from = c(strike_rate_inside,
                                                   strike_rate_outside, inside_bias))

print(head(handedness_analysis))
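
The zone boundaries computed in step 1 are percentiles of called-strike locations. An alternative definition of an umpire's effective zone edge is the location where the called strike rate falls to 50%. The following Python snippet is a minimal sketch of that idea for the right-hand edge only, run on synthetic data. The helper name `fifty_percent_edge`, the 0.1 ft bin width, and the simulated umpire (whose true edge is set to 0.83 ft) are illustrative assumptions, not part of the chapter's dataset.

```python
import numpy as np
import pandas as pd

def fifty_percent_edge(positions, calls, bin_edges):
    """Estimate where the binned called-strike rate drops through 50%.

    Assumes the strike rate falls as position increases (for example, moving
    from the middle of the plate toward the right-hand edge).
    """
    bins = pd.cut(positions, bins=bin_edges)
    rates = calls.groupby(bins, observed=True).mean()
    centers = np.array([interval.mid for interval in rates.index])
    rate_values = rates.to_numpy()

    below = np.flatnonzero(rate_values < 0.5)
    if len(below) == 0 or below[0] == 0:
        return np.nan  # the rate never crosses 50% inside the binned range
    i = below[0]
    # Linear interpolation between the last bin at or above 50% and the first below
    x0, x1 = centers[i - 1], centers[i]
    r0, r1 = rate_values[i - 1], rate_values[i]
    return x0 + (r0 - 0.5) * (x1 - x0) / (r0 - r1)

# Synthetic example: strike probability decays smoothly around a true edge
rng = np.random.default_rng(3)
n = 20000
plate_x = pd.Series(rng.uniform(0, 1.6, size=n))
true_edge = 0.83  # assumed value for the simulated umpire
strike_prob = 1 / (1 + np.exp((plate_x - true_edge) / 0.08))
called_strike = pd.Series((rng.random(n) < strike_prob).astype(int))

edge_estimate = fifty_percent_edge(plate_x, called_strike, np.linspace(0, 1.6, 17))
print(f"Estimated right edge: {edge_estimate:.3f} ft (rulebook: 0.708 ft)")
```

With real data, the same helper could be applied per umpire to pitches between sz_bot and sz_top. Binned rates are noisy for small samples, so the minimum sample size used above still matters.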

# Python: Analyzing individual umpire tendencies

# 1. Zone size by umpire
def analyze_zone_size(data):
    """Analyze effective zone size for each umpire"""
    zone_metrics = []

    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]

        if len(ump_data) < 1000:  # Minimum sample size
            continue

        strikes = ump_data[ump_data['called_strike'] == 1]

        if len(strikes) < 100:
            continue

        metrics = {
            'umpire': umpire,
            'total_calls': len(ump_data),
            'strike_rate': ump_data['called_strike'].mean(),
            'left_boundary': np.percentile(strikes[strikes['plate_x'] < 0]['plate_x'], 5),
            'right_boundary': np.percentile(strikes[strikes['plate_x'] > 0]['plate_x'], 95),
            'top_boundary': np.percentile(strikes['plate_z'], 95),
            'bottom_boundary': np.percentile(strikes['plate_z'], 5),
        }

        metrics['zone_width'] = metrics['right_boundary'] - metrics['left_boundary']
        metrics['zone_height'] = metrics['top_boundary'] - metrics['bottom_boundary']

        zone_metrics.append(metrics)

    return pd.DataFrame(zone_metrics)

zone_size_analysis = analyze_zone_size(called_pitches)

# Plot zone size variation
fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(zone_size_analysis['zone_width'],
                    zone_size_analysis['zone_height'],
                    s=zone_size_analysis['total_calls']/50,
                    c=zone_size_analysis['strike_rate'],
                    alpha=0.6, cmap='RdYlBu_r')
ax.axvline(1.417, linestyle='--', color='red', alpha=0.7, label='Rulebook Width')
ax.axhline(called_pitches['sz_top'].mean() - called_pitches['sz_bot'].mean(),
          linestyle='--', color='red', alpha=0.7, label='Avg Rulebook Height')
ax.set_xlabel('Effective Zone Width (ft)', fontsize=12)
ax.set_ylabel('Effective Zone Height (ft)', fontsize=12)
ax.set_title('Umpire Zone Size Variation', fontsize=14, fontweight='bold')
ax.legend()
plt.colorbar(scatter, label='Strike Rate', ax=ax)
plt.tight_layout()
plt.show()

# 2. Count sensitivity
def analyze_count_sensitivity(data):
    """Analyze how umpire zones change with count"""
    count_metrics = []

    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]

        pitcher_favorable = ump_data[(ump_data['balls'] == 0) &
                                     (ump_data['strikes'] == 2)]
        hitter_favorable = ump_data[(ump_data['balls'] == 3) &
                                    (ump_data['strikes'] == 0)]

        if len(pitcher_favorable) > 20 and len(hitter_favorable) > 20:
            metrics = {
                'umpire': umpire,
                'strike_rate_overall': ump_data['called_strike'].mean(),
                'strike_rate_3_0': hitter_favorable['called_strike'].mean(),
                'strike_rate_0_2': pitcher_favorable['called_strike'].mean(),
            }
            metrics['count_sensitivity'] = (metrics['strike_rate_3_0'] -
                                           metrics['strike_rate_0_2'])
            count_metrics.append(metrics)

    return pd.DataFrame(count_metrics)

count_analysis = analyze_count_sensitivity(called_pitches)

# Plot count sensitivity
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(count_analysis['strike_rate_0_2'],
          count_analysis['strike_rate_3_0'],
          alpha=0.6, s=100)
ax.plot([0, 1], [0, 1], 'k--', alpha=0.3, label='Equal rates')

# Fit line
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(
    count_analysis['strike_rate_0_2'], count_analysis['strike_rate_3_0'])
x_line = np.array([count_analysis['strike_rate_0_2'].min(),
                   count_analysis['strike_rate_0_2'].max()])
ax.plot(x_line, slope * x_line + intercept, 'b-',
        label=f'Fit line (R²={r_value**2:.3f})')

ax.set_xlabel('Strike Rate on 0-2 Count', fontsize=12)
ax.set_ylabel('Strike Rate on 3-0 Count', fontsize=12)
ax.set_title('Umpire Count Sensitivity', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()

# 3. Batter handedness effects
def analyze_handedness_effects(data):
    """Analyze inside/outside tendencies by batter handedness"""
    hand_metrics = []

    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]

        for stand in ['L', 'R']:
            stand_data = ump_data[ump_data['stand'] == stand]

            if stand == 'R':
                inside = stand_data[stand_data['plate_x'] < -0.708]
                outside = stand_data[stand_data['plate_x'] > 0.708]
            else:
                inside = stand_data[stand_data['plate_x'] > 0.708]
                outside = stand_data[stand_data['plate_x'] < -0.708]

            if len(inside) > 20 and len(outside) > 20:
                metrics = {
                    'umpire': umpire,
                    'stand': stand,
                    'strike_rate_inside': inside['called_strike'].mean(),
                    'strike_rate_outside': outside['called_strike'].mean(),
                }
                metrics['inside_bias'] = (metrics['strike_rate_inside'] -
                                         metrics['strike_rate_outside'])
                hand_metrics.append(metrics)

    return pd.DataFrame(hand_metrics)

handedness_analysis = analyze_handedness_effects(called_pitches)
print(handedness_analysis.head(10))
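
The Python walkthrough above covers zone size, count sensitivity, and handedness. The directional breakdown can be sketched in the same style. The snippet below is a minimal, self-contained sketch on synthetic data: the function name `analyze_directional_tendencies`, the `min_pitches` threshold, and the two simulated umpires are illustrative assumptions. Regions are named from the catcher's view, because which side counts as inside or outside depends on the batter's handedness.

```python
import numpy as np
import pandas as pd

def analyze_directional_tendencies(data, min_pitches=100):
    """Called-strike rate by region relative to the rulebook zone, per umpire."""
    half_plate = 0.708  # half the plate width in feet, as used throughout
    conditions = [
        data['plate_x'] < -half_plate,
        data['plate_x'] > half_plate,
        data['plate_z'] > data['sz_top'],
        data['plate_z'] < data['sz_bot'],
    ]
    labels = ['Off Plate Left', 'Off Plate Right', 'High', 'Low']
    # The first matching condition wins, so off-plate takes priority over high/low
    regions = np.select(conditions, labels, default='In Zone')

    summary = (
        data.assign(zone_region=regions)
            .groupby(['umpire', 'zone_region'])['called_strike']
            .agg(n='count', strike_rate='mean')
            .reset_index()
    )
    summary = summary[summary['n'] >= min_pitches]
    return summary.pivot(index='umpire', columns='zone_region', values='strike_rate')

# Synthetic example: calls follow the rulebook zone about 90% of the time
rng = np.random.default_rng(0)
n = 4000
demo = pd.DataFrame({
    'umpire': rng.choice(['Umpire A', 'Umpire B'], size=n),
    'plate_x': rng.uniform(-1.5, 1.5, size=n),
    'plate_z': rng.uniform(1.0, 4.0, size=n),
    'sz_top': 3.4,
    'sz_bot': 1.6,
})
in_zone = (demo['plate_x'].abs() <= 0.708) & demo['plate_z'].between(1.6, 3.4)
demo['called_strike'] = np.where(rng.random(n) < 0.9, in_zone, ~in_zone).astype(int)

directional = analyze_directional_tendencies(demo)
print(directional.round(3))
```

Each row of the result is one umpire and each column is a region, so an umpire who gives extra strikes above the zone would show up as an unusually large value in the High column.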

Umpire Consistency Metrics

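Beyond accuracy, consistency is a crucial aspect of umpire performance. An umpire with a slightly expanded zone is easier to play for than an erratic one: as long as the zone is predictable, pitchers and hitters can adjust. One simple way to measure consistency is to bin pitches by location and compute the standard deviation of the calls within each bin. A bin where nearly every pitch gets the same call scores near 0, while a bin split evenly between balls and strikes scores near 0.5: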

# R: Measuring umpire consistency

# Consistency metric: Standard deviation of calls at similar locations
calculate_consistency <- function(data) {
  # Create location bins
  data <- data %>%
    mutate(
      x_bin = cut(plate_x, breaks = seq(-2, 2, 0.2)),
      z_bin = cut(plate_z, breaks = seq(0, 5, 0.2))
    )

  # Calculate consistency within bins
  consistency_by_bin <- data %>%
    group_by(umpire, x_bin, z_bin) %>%
    summarise(
      n = n(),
      strike_rate = mean(called_strike),
      consistency = sd(called_strike),
      .groups = "drop"
    ) %>%
    filter(n >= 10)  # Need sufficient sample in each bin

  # Aggregate to umpire level
  umpire_consistency <- consistency_by_bin %>%
    group_by(umpire) %>%
    summarise(
      avg_consistency = mean(consistency, na.rm = TRUE),
      consistency_variation = sd(consistency, na.rm = TRUE),
      bins_analyzed = n()
    )

  return(umpire_consistency)
}

consistency_metrics <- calculate_consistency(called_pitches)

# Merge with accuracy metrics
umpire_performance <- umpire_metrics %>%
  left_join(consistency_metrics, by = "umpire")

# Plot accuracy vs consistency
ggplot(umpire_performance, aes(x = accuracy, y = avg_consistency)) +
  geom_point(aes(size = total_calls), alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(title = "Umpire Accuracy vs Consistency",
       x = "Accuracy Rate", y = "Average Consistency (lower is better)",
       size = "Total Calls") +
  theme_minimal()
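
A Python counterpart to the consistency calculation might look like the following. This is a sketch under stated assumptions rather than a definitive implementation: it mirrors the 0.2 ft bins and the 10-pitch minimum from the R version, and it runs on two simulated umpires ("Steady" and "Erratic", hypothetical names) whose calls are flipped at different random rates.

```python
import numpy as np
import pandas as pd

def calculate_consistency(data, min_bin_size=10):
    """Average within-bin standard deviation of calls for each umpire."""
    x_edges = np.linspace(-2, 2, 21)  # 0.2 ft bins horizontally
    z_edges = np.linspace(0, 5, 26)   # 0.2 ft bins vertically
    binned = data.assign(
        x_bin=pd.cut(data['plate_x'], bins=x_edges),
        z_bin=pd.cut(data['plate_z'], bins=z_edges),
    )
    by_bin = (
        binned.groupby(['umpire', 'x_bin', 'z_bin'], observed=True)['called_strike']
              .agg(n='count', strike_rate='mean', consistency='std')
              .reset_index()
    )
    by_bin = by_bin[by_bin['n'] >= min_bin_size]  # need enough pitches per bin
    return (
        by_bin.groupby('umpire')['consistency']
              .agg(avg_consistency='mean',
                   consistency_variation='std',
                   bins_analyzed='count')
              .reset_index()
    )

# Synthetic example: same rulebook zone, different rates of random miscalls
rng = np.random.default_rng(1)
n = 6000
demo = pd.DataFrame({
    'umpire': np.repeat(['Steady', 'Erratic'], n // 2),
    'plate_x': rng.uniform(-1.5, 1.5, size=n),
    'plate_z': rng.uniform(1.0, 4.0, size=n),
})
in_zone = (demo['plate_x'].abs() <= 0.708) & demo['plate_z'].between(1.6, 3.4)
flip_rate = np.where(demo['umpire'] == 'Steady', 0.02, 0.25)
flipped = rng.random(n) < flip_rate
demo['called_strike'] = np.where(flipped, ~in_zone, in_zone).astype(int)

consistency_demo = calculate_consistency(demo)
print(consistency_demo.round(3))
```

Lower values of avg_consistency indicate a more predictable umpire. Note that bins straddling the zone boundary contain mixed calls even for a perfectly consistent umpire, so the metric never reaches exactly 0 in practice.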

21.4 Called Strike Probability Models

Machine learning models can predict the probability that a pitch will be called a strike based on its location, context, and the umpire calling the game. These models serve multiple purposes:

Expected vs Actual: Compare predicted probabilities to actual calls to identify unusual decisions
Umpire Effects: Quantify how much each umpire deviates from expected behavior
Strategic Planning: Help teams understand strike probabilities in different game situations
Automated Strike Zone: Provide a baseline for ABS system calibration
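
As a minimal sketch of the "Expected vs Actual" and "Umpire Effects" ideas above, the snippet below sums actual called strikes and model-predicted strike probabilities per umpire and reports the difference. The function name `strikes_above_expected`, the umpire labels, and the toy probabilities are all hypothetical; in practice the probabilities would come from one of the models fitted later in this section.

```python
# Python: strikes above expected per umpire (illustrative sketch)
from collections import defaultdict


def strikes_above_expected(calls):
    """Summarize actual minus expected called strikes per umpire.

    calls: iterable of (umpire, predicted_strike_prob, called_strike)
    where called_strike is 1 for a called strike and 0 for a called ball.
    """
    totals = defaultdict(lambda: {"n": 0, "actual": 0, "expected": 0.0})
    for umpire, prob, called in calls:
        t = totals[umpire]
        t["n"] += 1
        t["actual"] += called
        t["expected"] += prob

    summary = {}
    for umpire, t in totals.items():
        extra = t["actual"] - t["expected"]
        summary[umpire] = {
            "n": t["n"],
            "extra_strikes": extra,
            "extra_per_100": 100 * extra / t["n"],
        }
    return summary


# Hypothetical toy data: (umpire, model probability, actual call)
toy_calls = [
    ("Ump A", 0.90, 1), ("Ump A", 0.50, 1), ("Ump A", 0.20, 1), ("Ump A", 0.10, 0),
    ("Ump B", 0.90, 1), ("Ump B", 0.50, 0), ("Ump B", 0.20, 0), ("Ump B", 0.10, 0),
]

result = strikes_above_expected(toy_calls)
for umpire, row in result.items():
    print(umpire, round(row["extra_strikes"], 2), round(row["extra_per_100"], 1))
```

A positive value suggests an umpire who calls more strikes than the model expects on the pitches he saw; with real data, a minimum sample size per umpire would be needed before reading much into the difference.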

Logistic Regression Model

We'll start with a logistic regression model, which provides interpretable coefficients:

# R: Logistic regression for called strike probability

library(broom)

# Prepare features
model_data <- called_pitches %>%
  filter(!is.na(plate_x) & !is.na(plate_z)) %>%
  mutate(
    # Distance features
    abs_plate_x = abs(plate_x),
    plate_x_squared = plate_x^2,
    plate_z_squared = plate_z^2,

    # Interaction with zone
    x_z_interaction = plate_x * plate_z,

    # Count features
    count = paste0(balls, "-", strikes),
    pitcher_ahead = strikes > balls,

    # Batter-pitcher matchup
    same_hand = stand == p_throws
  )

# Build logistic regression model
logit_model <- glm(
  called_strike ~ plate_x + plate_z +
    plate_x_squared + plate_z_squared +
    x_z_interaction +
    abs_plate_x +
    count + stand + p_throws +
    same_hand + outs,
  data = model_data,
  family = binomial(link = "logit")
)

# Model summary
summary(logit_model)

# Get coefficients
coef_df <- tidy(logit_model) %>%
  arrange(desc(abs(statistic)))

print(coef_df)

# Add predictions to data
model_data$pred_prob_logit <- predict(logit_model, type = "response")

# Model performance (in-sample: evaluated on the same rows used to fit the model)
library(pROC)
roc_obj <- roc(model_data$called_strike, model_data$pred_prob_logit)
auc_score <- auc(roc_obj)
cat("Logistic Regression AUC:", round(auc_score, 4), "\n")

# Plot ROC curve
plot(roc_obj, main = paste("ROC Curve - Logistic Regression (AUC =",
                           round(auc_score, 3), ")"))

# Calibration plot
model_data %>%
  # include.lowest = TRUE keeps predictions of exactly 0 in the first bin
  mutate(pred_bin = cut(pred_prob_logit, breaks = seq(0, 1, 0.1),
                        include.lowest = TRUE)) %>%
  group_by(pred_bin) %>%
  summarise(
    predicted = mean(pred_prob_logit),
    actual = mean(called_strike),
    n = n()
  ) %>%
  ggplot(aes(x = predicted, y = actual)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  geom_point(aes(size = n), alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Calibration Plot - Logistic Regression",
       x = "Predicted Probability", y = "Actual Strike Rate") +
  theme_minimal()

# Python: Logistic regression for called strike probability

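from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_curve, auc, classification_report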

# Prepare features
model_data = called_pitches.dropna(subset=['plate_x', 'plate_z']).copy()

# Create features
model_data['abs_plate_x'] = np.abs(model_data['plate_x'])
model_data['plate_x_squared'] = model_data['plate_x']**2
model_data['plate_z_squared'] = model_data['plate_z']**2
model_data['x_z_interaction'] = model_data['plate_x'] * model_data['plate_z']
model_data['count'] = model_data['balls'].astype(str) + '-' + model_data['strikes'].astype(str)
model_data['pitcher_ahead'] = (model_data['strikes'] > model_data['balls']).astype(int)
model_data['same_hand'] = (model_data['stand'] == model_data['p_throws']).astype(int)

# Prepare feature matrix
feature_cols = ['plate_x', 'plate_z', 'plate_x_squared', 'plate_z_squared',
                'x_z_interaction', 'abs_plate_x', 'pitcher_ahead',
                'same_hand', 'outs']

X = model_data[feature_cols]
y = model_data['called_strike']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit logistic regression
logit_model = LogisticRegression(max_iter=1000, random_state=42)
logit_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_prob = logit_model.predict_proba(X_test_scaled)[:, 1]
y_pred = logit_model.predict(X_test_scaled)

# Model performance
print("Logistic Regression Performance:")
print(classification_report(y_test, y_pred))

# AUC
fpr, tpr, _ = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)
print(f"\nAUC: {roc_auc:.4f}")

# Plot ROC curve
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

ax1.plot(fpr, tpr, color='blue', lw=2,
         label=f'ROC curve (AUC = {roc_auc:.3f})')
ax1.plot([0, 1], [0, 1], color='red', lw=2, linestyle='--', label='Random')
ax1.set_xlim([0.0, 1.0])
ax1.set_ylim([0.0, 1.05])
ax1.set_xlabel('False Positive Rate', fontsize=12)
ax1.set_ylabel('True Positive Rate', fontsize=12)
ax1.set_title('ROC Curve - Logistic Regression', fontsize=14, fontweight='bold')
ax1.legend(loc="lower right")
ax1.grid(alpha=0.3)

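# Calibration plot
bins = np.linspace(0, 1, 11)
bin_centers = (bins[:-1] + bins[1:]) / 2
# Clip so that a predicted probability of exactly 1.0 lands in the last bin
digitized = np.clip(np.digitize(y_pred_prob, bins) - 1, 0, len(bins) - 2)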

calibration_data = []
for i in range(len(bins) - 1):
    mask = digitized == i
    if mask.sum() > 0:
        calibration_data.append({
            'predicted': y_pred_prob[mask].mean(),
            'actual': y_test.values[mask].mean(),
            'count': mask.sum()
        })

calib_df = pd.DataFrame(calibration_data)

ax2.plot([0, 1], [0, 1], 'r--', lw=2, label='Perfect calibration')
ax2.scatter(calib_df['predicted'], calib_df['actual'],
           s=calib_df['count']/10, alpha=0.6)
ax2.set_xlabel('Predicted Probability', fontsize=12)
ax2.set_ylabel('Actual Strike Rate', fontsize=12)
ax2.set_title('Calibration Plot', fontsize=14, fontweight='bold')
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'coefficient': logit_model.coef_[0]
}).sort_values('coefficient', key=abs, ascending=False)

print("\nFeature Importance (Logistic Regression Coefficients):")
print(feature_importance)
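
The calibration plot above can be complemented by a single-number summary. The sketch below computes the Brier score and log loss by hand on hypothetical toy values; the helper names are illustrative, and scikit-learn offers equivalent functions (`brier_score_loss` and `log_loss` in `sklearn.metrics`) that could be applied to `y_test` and `y_pred_prob` directly.

```python
# Python: Brier score and log loss for predicted strike probabilities (sketch)
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical toy values: actual calls and model probabilities
toy_calls = [1, 0, 1, 0]
toy_probs = [0.9, 0.1, 0.8, 0.3]
coin_flip = [0.5] * len(toy_calls)  # uninformative baseline

print("Brier (model):   ", round(brier_score(toy_calls, toy_probs), 4))
print("Brier (baseline):", round(brier_score(toy_calls, coin_flip), 4))
print("Log loss (model):", round(log_loss(toy_calls, toy_probs), 4))
```

Lower is better for both metrics. Unlike AUC, which depends only on how pitches are ranked, both scores penalize probabilities that are systematically too high or too low.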

Random Forest Model

Random forests can capture non-linear relationships and interactions that logistic regression might miss:

# R: Random Forest for called strike probability

library(randomForest)

# Prepare data for random forest
rf_features <- c("plate_x", "plate_z", "abs_plate_x",
                 "dist_to_zone", "balls", "strikes", "outs",
                 "stand", "p_throws", "pitch_type")

rf_data <- model_data[complete.cases(model_data[, rf_features]), ]
rf_data$called_strike <- as.factor(rf_data$called_strike)

# Split data
set.seed(42)
train_idx <- sample(1:nrow(rf_data), 0.7 * nrow(rf_data))
train_data <- rf_data[train_idx, ]
test_data <- rf_data[-train_idx, ]

# Train random forest
rf_model <- randomForest(
  as.formula(paste("called_strike ~", paste(rf_features, collapse = " + "))),
  data = train_data,
  ntree = 100,
  mtry = 3,
  importance = TRUE
)

# Predictions
test_data$pred_prob_rf <- predict(rf_model, test_data, type = "prob")[, 2]

# Performance
rf_roc <- roc(as.numeric(as.character(test_data$called_strike)),
              test_data$pred_prob_rf)
rf_auc <- auc(rf_roc)
cat("Random Forest AUC:", round(rf_auc, 4), "\n")

# Variable importance
importance_df <- as.data.frame(importance(rf_model)) %>%
  tibble::rownames_to_column("variable") %>%
  arrange(desc(MeanDecreaseGini))

# Plot variable importance
ggplot(importance_df, aes(x = reorder(variable, MeanDecreaseGini),
                          y = MeanDecreaseGini)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Random Forest Variable Importance",
       x = "Variable", y = "Mean Decrease in Gini") +
  theme_minimal()

# Python: Random Forest for called strike probability

# Prepare data
rf_features = ['plate_x', 'plate_z', 'abs_plate_x', 'dist_to_zone',
               'balls', 'strikes', 'outs']

# Encode categorical variables
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

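# Drop rows with missing categorical values first so that a missing value
# is not encoded as if it were its own category
model_data_rf = model_data.dropna(subset=['stand', 'p_throws', 'pitch_type']).copy()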
for col in ['stand', 'p_throws', 'pitch_type']:
    le = LabelEncoder()
    model_data_rf[col + '_encoded'] = le.fit_transform(model_data_rf[col])
    rf_features.append(col + '_encoded')

# Prepare feature matrix
X_rf = model_data_rf[rf_features].dropna()
y_rf = model_data_rf.loc[X_rf.index, 'called_strike']

# Split data
X_train_rf, X_test_rf, y_train_rf, y_test_rf = train_test_split(
    X_rf, y_rf, test_size=0.3, random_state=42, stratify=y_rf
)

# Train random forest
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=20,
    min_samples_split=100,
    random_state=42,
    n_jobs=-1
)

rf_model.fit(X_train_rf, y_train_rf)

# Predictions
y_pred_prob_rf = rf_model.predict_proba(X_test_rf)[:, 1]
y_pred_rf = rf_model.predict(X_test_rf)

# Performance
print("Random Forest Performance:")
print(classification_report(y_test_rf, y_pred_rf))

fpr_rf, tpr_rf, _ = roc_curve(y_test_rf, y_pred_prob_rf)
roc_auc_rf = auc(fpr_rf, tpr_rf)
print(f"\nAUC: {roc_auc_rf:.4f}")

# Feature importance
feature_importance_rf = pd.DataFrame({
    'feature': rf_features,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

# Plot feature importance
fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(feature_importance_rf['feature'], feature_importance_rf['importance'])
ax.set_xlabel('Importance', fontsize=12)
ax.set_title('Random Forest Feature Importance', fontsize=14, fontweight='bold')
ax.invert_yaxis()
plt.tight_layout()
plt.show()

print("\nFeature Importance:")
print(feature_importance_rf)

Umpire-Specific Models

We can build umpire-specific models or include umpire as a feature to capture individual tendencies:

# R: Umpire-specific adjustments

# Add umpire as a factor in the model
umpire_model_data <- model_data %>%
  filter(umpire %in% names(sort(table(umpire), decreasing = TRUE)[1:15]))  # Top 15 umpires

umpire_logit <- glm(
  called_strike ~ plate_x + plate_z +
    plate_x_squared + plate_z_squared +
    count + stand + umpire,
  data = umpire_model_data,
  family = binomial(link = "logit")
)

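# Extract umpire coefficients (each is relative to the reference umpire,
# i.e. the first factor level, which has no coefficient of its own)
umpire_effects <- tidy(umpire_logit) %>%
  filter(str_detect(term, "^umpire")) %>%
  mutate(
    umpire_name = str_remove(term, "^umpire"),
    # Approximate probability shift for a pitch with a 50% baseline strike
    # probability; the shift is smaller for pitches far from the zone edge
    effect_on_strike_prob = plogis(estimate) - 0.5
  ) %>%
  arrange(desc(estimate))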

print(umpire_effects)

# Visualize umpire effects
ggplot(umpire_effects, aes(x = reorder(umpire_name, estimate), y = estimate)) +
  geom_col(aes(fill = estimate > 0)) +
  geom_errorbar(aes(ymin = estimate - 1.96*std.error,
                    ymax = estimate + 1.96*std.error), width = 0.2) +
  coord_flip() +
  scale_fill_manual(values = c("blue", "red"), guide = "none") +
  labs(title = "Umpire Effects on Strike Probability",
       subtitle = "Coefficient estimates with 95% confidence intervals",
       x = "Umpire", y = "Log-Odds Effect") +
  theme_minimal()
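
The expression `plogis(estimate) - 0.5` in the R code above converts a log-odds coefficient to a probability shift at a 50% baseline only. The sketch below shows how the same coefficient translates to different shifts at different baselines. The coefficient value 0.2 is hypothetical, and `prob_shift` is an illustrative helper rather than part of any library.

```python
# Python: converting a log-odds umpire effect to a probability shift (sketch)
import math


def logistic(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Maps a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def prob_shift(baseline_prob, log_odds_effect):
    """Change in strike probability when the effect is added on the log-odds scale."""
    return logistic(logit(baseline_prob) + log_odds_effect) - baseline_prob


umpire_effect = 0.2  # hypothetical coefficient, in log-odds

for baseline in (0.05, 0.50, 0.95):
    shift = prob_shift(baseline, umpire_effect)
    print(f"baseline {baseline:.2f} -> shift {shift:+.4f}")
```

The shift is largest for borderline pitches near a 50% baseline and shrinks toward zero for pitches that are almost certain balls or strikes, which is why umpire effects matter most at the edges of the zone.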

21.5 Impact of Umpires on Game Outcomes

Beyond individual call accuracy, we want to understand how umpire decisions affect game outcomes. Poor calls can change at-bat results, inning dynamics, and ultimately win probabilities.

Run Value of Incorrect Calls

We can assign run values to each ball-strike call based on the change in expected runs:

# R: Calculate run impact of umpire calls

# Run expectancy matrix (simplified - would use actual RE24 values)
run_expectancy <- expand.grid(
  balls = 0:3,
  strikes = 0:2,
  outs = 0:2
) %>%
  mutate(
    re = case_when(
      balls == 3 ~ 0.6 + (2 - strikes) * 0.1,
      strikes == 2 ~ 0.3 - balls * 0.05,
      TRUE ~ 0.4 + (balls - strikes) * 0.05
    )
  )

# Calculate impact of incorrect calls
impact_data <- called_pitches %>%
  left_join(run_expectancy, by = c("balls", "strikes", "outs")) %>%
  mutate(
    # What would the count be with correct call?
    correct_balls = ifelse(!in_zone & called_strike == 1, balls + 1, balls),
    correct_strikes = ifelse(in_zone & called_strike == 0, strikes + 1, strikes),

    # Would the at-bat have ended?
    actual_walk = balls == 3 & called_strike == 0 & !in_zone,
    actual_strikeout = strikes == 2 & called_strike == 1 & in_zone,

    # Run value impact
    incorrect_call = !correct_call,
    # A missed call favors the pitcher only when a pitch outside the zone is
    # called a strike; a pitch in the zone called a ball favors the batter
    favor_pitcher = !in_zone & called_strike == 1
  )

# Calculate run value impact by umpire
umpire_impact <- impact_data %>%
  filter(incorrect_call) %>%
  group_by(umpire) %>%
  summarise(
    incorrect_calls = n(),
    calls_favor_pitcher = sum(favor_pitcher),
    calls_favor_batter = sum(!favor_pitcher),
    net_pitcher_favor = calls_favor_pitcher - calls_favor_batter,
    pct_favor_pitcher = mean(favor_pitcher),
    # Simplified run impact (would use actual RE24)
    estimated_run_impact = sum(ifelse(favor_pitcher, -0.05, 0.05))
  ) %>%
  arrange(desc(abs(estimated_run_impact)))

print(head(umpire_impact, 15))

# Visualize impact
ggplot(umpire_impact, aes(x = incorrect_calls, y = estimated_run_impact)) +
  geom_point(aes(color = pct_favor_pitcher, size = abs(estimated_run_impact)),
             alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0.5, name = "% Favor\nPitcher") +
  labs(title = "Umpire Impact on Run Expectancy",
       x = "Number of Incorrect Calls",
       y = "Estimated Run Impact",
       size = "Absolute Impact") +
  theme_minimal()

# Python: Calculate run impact of umpire calls

# Create simplified run expectancy matrix
re_data = []
for balls in range(4):
    for strikes in range(3):
        for outs in range(3):
            if balls == 3:
                re = 0.6 + (2 - strikes) * 0.1
            elif strikes == 2:
                re = 0.3 - balls * 0.05
            else:
                re = 0.4 + (balls - strikes) * 0.05

            re_data.append({
                'balls': balls,
                'strikes': strikes,
                'outs': outs,
                're': re
            })

run_expectancy = pd.DataFrame(re_data)

# Merge with pitch data
impact_data = called_pitches.merge(
    run_expectancy,
    on=['balls', 'strikes', 'outs'],
    how='left'
)

# Calculate impact of incorrect calls
impact_data['incorrect_call'] = ~impact_data['correct_call']
# A missed call favors the pitcher only when a pitch outside the zone is
# called a strike; a pitch in the zone called a ball favors the batter
impact_data['favor_pitcher'] = (
    ~impact_data['in_zone'] & (impact_data['called_strike'] == 1)
)

# Calculate impact by umpire
umpire_impact = impact_data[impact_data['incorrect_call']].groupby('umpire').agg({
    'incorrect_call': 'count',
    'favor_pitcher': ['sum', lambda x: (~x).sum(), 'mean']
}).reset_index()

umpire_impact.columns = ['umpire', 'incorrect_calls', 'calls_favor_pitcher',
                         'calls_favor_batter', 'pct_favor_pitcher']

umpire_impact['net_pitcher_favor'] = (
    umpire_impact['calls_favor_pitcher'] - umpire_impact['calls_favor_batter']
)

# Vectorized: +0.05 runs per call favoring the batter, -0.05 per call favoring the pitcher
umpire_impact['estimated_run_impact'] = 0.05 * (
    umpire_impact['calls_favor_batter'] - umpire_impact['calls_favor_pitcher']
)

umpire_impact = umpire_impact.sort_values('estimated_run_impact',
                                          key=abs, ascending=False)

print(umpire_impact.head(15))

# Visualize impact
fig, ax = plt.subplots(figsize=(12, 8))
scatter = ax.scatter(umpire_impact['incorrect_calls'],
                    umpire_impact['estimated_run_impact'],
                    c=umpire_impact['pct_favor_pitcher'],
                    s=np.abs(umpire_impact['estimated_run_impact']) * 1000,
                    alpha=0.6, cmap='RdBu_r')
ax.axhline(0, linestyle='--', color='gray', alpha=0.5)
ax.set_xlabel('Number of Incorrect Calls', fontsize=12)
ax.set_ylabel('Estimated Run Impact', fontsize=12)
ax.set_title('Umpire Impact on Run Expectancy', fontsize=14, fontweight='bold')
plt.colorbar(scatter, label='% Favor Pitcher', ax=ax)
plt.tight_layout()
plt.show()
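
The code above charges a flat 0.05 runs to every missed call. A count-aware alternative, sketched below, values a missed call as the run value of the state the call actually produced minus the run value of the state the correct call would have produced. The run values in `COUNT_VALUE` are illustrative placeholders and not measured figures, and the function names are hypothetical; a real analysis would substitute empirically estimated count values.

```python
# Python: count-aware run value of a missed call (illustrative sketch)

# Hypothetical run values for the batting team, relative to an 0-0 count.
# Keys are (balls, strikes); "BB" and "K" are the walk and strikeout outcomes.
COUNT_VALUE = {
    (0, 0): 0.00, (1, 0): 0.03, (2, 0): 0.09, (3, 0): 0.19,
    (0, 1): -0.04, (1, 1): -0.02, (2, 1): 0.03, (3, 1): 0.12,
    (0, 2): -0.10, (1, 2): -0.08, (2, 2): -0.04, (3, 2): 0.05,
    "BB": 0.30, "K": -0.27,
}


def state_after(balls, strikes, call):
    """Return the state that follows a called 'ball' or 'strike'."""
    if call == "strike":
        return "K" if strikes == 2 else (balls, strikes + 1)
    return "BB" if balls == 3 else (balls + 1, strikes)


def missed_call_run_value(balls, strikes, actual_call):
    """Run value (batting team's perspective) of a missed call.

    actual_call is the incorrect call that was made; the correct call is
    assumed to be the opposite one. Negative values favor the pitcher.
    """
    correct_call = "ball" if actual_call == "strike" else "strike"
    actual_state = state_after(balls, strikes, actual_call)
    correct_state = state_after(balls, strikes, correct_call)
    return COUNT_VALUE[actual_state] - COUNT_VALUE[correct_state]


# A ball wrongly called a strike on 0-0 versus on 3-2
first_pitch = missed_call_run_value(0, 0, "strike")
full_count = missed_call_run_value(3, 2, "strike")
print(f"0-0 miss: {first_pitch:+.2f} runs, 3-2 miss: {full_count:+.2f} runs")
```

Under these placeholder values, the same miss costs the batting team several times more on a full count than on the first pitch, because the call decides between a walk and a strikeout instead of nudging the count.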

Win Probability Impact

High-leverage situations magnify the impact of incorrect calls. A missed strike call with bases loaded in a tie game has far greater impact than the same miss in a 10-0 game:

# R: Win probability impact analysis

# Simplified leverage calculation (would use actual WPA in practice)
leverage_data <- called_pitches %>%
  mutate(
    # Simplified leverage index
    score_diff = abs(rnorm(n(), 0, 2)),  # Placeholder
    inning = sample(1:9, n(), replace = TRUE),
    late_inning = inning >= 7,
    close_game = score_diff <= 2,
    leverage = case_when(
      late_inning & close_game & outs == 2 ~ 2.5,
      late_inning & close_game ~ 2.0,
      close_game ~ 1.5,
      TRUE ~ 1.0
    ),

    # Define incorrect_call here so this block does not depend on impact_data
    incorrect_call = !correct_call,

    # High leverage incorrect call
    high_leverage_error = incorrect_call & leverage > 1.5
  )

# Impact in high-leverage situations
high_leverage_impact <- leverage_data %>%
  group_by(umpire) %>%
  summarise(
    total_high_lev = sum(leverage > 1.5),
    high_lev_errors = sum(high_leverage_error, na.rm = TRUE),
    # Error rate among high-leverage pitches only, not among all pitches
    high_lev_error_rate = high_lev_errors / total_high_lev,
    avg_leverage_of_errors = mean(leverage[incorrect_call], na.rm = TRUE)
  ) %>%
  filter(total_high_lev >= 100) %>%
  arrange(desc(high_lev_error_rate))

print(head(high_leverage_impact, 10))
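
A Python sketch of the same high-leverage summary follows. It assumes each pitch is available as a tuple of umpire, leverage value, and whether the call was correct; the umpire labels and leverage values are hypothetical, and the 1.5 threshold simply mirrors the R code above. A real analysis would use an actual leverage index in place of placeholder values.

```python
# Python: error rate on high-leverage pitches by umpire (illustrative sketch)
from collections import defaultdict


def high_leverage_error_rates(pitches, threshold=1.5):
    """Share of high-leverage called pitches that were missed, per umpire.

    pitches: iterable of (umpire, leverage, correct_call) tuples.
    Only pitches with leverage strictly above the threshold are counted.
    """
    stats = defaultdict(lambda: {"high_lev": 0, "errors": 0})
    for umpire, leverage, correct_call in pitches:
        if leverage > threshold:
            stats[umpire]["high_lev"] += 1
            stats[umpire]["errors"] += int(not correct_call)

    return {
        umpire: s["errors"] / s["high_lev"]
        for umpire, s in stats.items()
        if s["high_lev"] > 0
    }


# Hypothetical toy data: (umpire, leverage, call was correct)
toy_pitches = [
    ("Ump A", 2.5, False), ("Ump A", 2.0, True), ("Ump A", 1.0, False),
    ("Ump B", 2.0, True), ("Ump B", 2.5, True), ("Ump B", 1.5, False),
]

rates = high_leverage_error_rates(toy_pitches)
for umpire, rate in sorted(rates.items()):
    print(umpire, f"{rate:.2f}")
```

Dividing by the number of high-leverage pitches, and not by all pitches, keeps the rate comparable between umpires who happened to work different mixes of close and lopsided games.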

# R: Calculate run impact of umpire calls

# Run expectancy matrix (simplified - would use actual RE24 values)
run_expectancy <- expand.grid(
  balls = 0:3,
  strikes = 0:2,
  outs = 0:2
) %>%
  mutate(
    re = case_when(
      balls == 3 ~ 0.6 + (2 - strikes) * 0.1,
      strikes == 2 ~ 0.3 - balls * 0.05,
      TRUE ~ 0.4 + (balls - strikes) * 0.05
    )
  )

# Calculate impact of incorrect calls
impact_data <- called_pitches %>%
  left_join(run_expectancy, by = c("balls", "strikes", "outs")) %>%
  mutate(
    # What would the count be with correct call?
    correct_balls = ifelse(!in_zone & called_strike == 1, balls + 1, balls),
    correct_strikes = ifelse(in_zone & called_strike == 0, strikes + 1, strikes),

    # Would the at-bat have ended?
    actual_walk = balls == 3 & called_strike == 0 & !in_zone,
    actual_strikeout = strikes == 2 & called_strike == 1 & in_zone,

    # Run value impact
    incorrect_call = !correct_call,
    favor_pitcher = (in_zone & called_strike == 0) |  # Should be strike, called ball
                    (!in_zone & called_strike == 1)    # Should be ball, called strike
  )

# Calculate run value impact by umpire
umpire_impact <- impact_data %>%
  filter(incorrect_call) %>%
  group_by(umpire) %>%
  summarise(
    incorrect_calls = n(),
    calls_favor_pitcher = sum(favor_pitcher),
    calls_favor_batter = sum(!favor_pitcher),
    net_pitcher_favor = calls_favor_pitcher - calls_favor_batter,
    pct_favor_pitcher = mean(favor_pitcher),
    # Simplified run impact (would use actual RE24)
    estimated_run_impact = sum(ifelse(favor_pitcher, -0.05, 0.05))
  ) %>%
  arrange(desc(abs(estimated_run_impact)))

print(head(umpire_impact, 15))

# Visualize impact
ggplot(umpire_impact, aes(x = incorrect_calls, y = estimated_run_impact)) +
  geom_point(aes(color = pct_favor_pitcher, size = abs(estimated_run_impact)),
             alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0.5, name = "% Favor\nPitcher") +
  labs(title = "Umpire Impact on Run Expectancy",
       x = "Number of Incorrect Calls",
       y = "Estimated Run Impact",
       size = "Absolute Impact") +
  theme_minimal()

# R: Win probability impact analysis

# Simplified leverage calculation (would use actual WPA in practice)
leverage_data <- called_pitches %>%
  mutate(
    # Simplified leverage index
    score_diff = abs(rnorm(n(), 0, 2)),  # Placeholder
    inning = sample(1:9, n(), replace = TRUE),
    late_inning = inning >= 7,
    close_game = score_diff <= 2,
    leverage = case_when(
      late_inning & close_game & outs == 2 ~ 2.5,
      late_inning & close_game ~ 2.0,
      close_game ~ 1.5,
      TRUE ~ 1.0
    ),

    # High leverage incorrect call
    high_leverage_error = incorrect_call & leverage > 1.5
  )

# Impact in high-leverage situations
high_leverage_impact <- leverage_data %>%
  group_by(umpire) %>%
  summarise(
    total_high_lev = sum(leverage > 1.5),
    high_lev_errors = sum(high_leverage_error, na.rm = TRUE),
    high_lev_error_rate = mean(high_leverage_error, na.rm = TRUE),
    avg_leverage_of_errors = mean(leverage[incorrect_call], na.rm = TRUE)
  ) %>%
  filter(total_high_lev >= 100) %>%
  arrange(desc(high_lev_error_rate))

print(head(high_leverage_impact, 10))

Python

# Python: Calculate run impact of umpire calls

# Create simplified run expectancy matrix
re_data = []
for balls in range(4):
    for strikes in range(3):
        for outs in range(3):
            if balls == 3:
                re = 0.6 + (2 - strikes) * 0.1
            elif strikes == 2:
                re = 0.3 - balls * 0.05
            else:
                re = 0.4 + (balls - strikes) * 0.05

            re_data.append({
                'balls': balls,
                'strikes': strikes,
                'outs': outs,
                're': re
            })

run_expectancy = pd.DataFrame(re_data)

# Merge with pitch data
impact_data = called_pitches.merge(
    run_expectancy,
    on=['balls', 'strikes', 'outs'],
    how='left'
)

# Calculate impact of incorrect calls
impact_data['incorrect_call'] = ~impact_data['correct_call']
impact_data['favor_pitcher'] = (
    (impact_data['in_zone'] & (impact_data['called_strike'] == 0)) |
    (~impact_data['in_zone'] & (impact_data['called_strike'] == 1))
)

# Calculate impact by umpire
umpire_impact = impact_data[impact_data['incorrect_call']].groupby('umpire').agg({
    'incorrect_call': 'count',
    'favor_pitcher': ['sum', lambda x: (~x).sum(), 'mean']
}).reset_index()

umpire_impact.columns = ['umpire', 'incorrect_calls', 'calls_favor_pitcher',
                         'calls_favor_batter', 'pct_favor_pitcher']

umpire_impact['net_pitcher_favor'] = (
    umpire_impact['calls_favor_pitcher'] - umpire_impact['calls_favor_batter']
)

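# Rough estimate using a flat 0.05 runs per missed call (an assumed constant,
# not derived from the merged 're' column); negative values favor pitchers
umpire_impact['estimated_run_impact'] = (
    umpire_impact['calls_favor_batter'] - umpire_impact['calls_favor_pitcher']
) * 0.05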

umpire_impact = umpire_impact.sort_values('estimated_run_impact',
                                          key=abs, ascending=False)

print(umpire_impact.head(15))

# Visualize impact
fig, ax = plt.subplots(figsize=(12, 8))
scatter = ax.scatter(umpire_impact['incorrect_calls'],
                    umpire_impact['estimated_run_impact'],
                    c=umpire_impact['pct_favor_pitcher'],
                    s=np.abs(umpire_impact['estimated_run_impact']) * 1000,
                    alpha=0.6, cmap='RdBu_r')
ax.axhline(0, linestyle='--', color='gray', alpha=0.5)
ax.set_xlabel('Number of Incorrect Calls', fontsize=12)
ax.set_ylabel('Estimated Run Impact', fontsize=12)
ax.set_title('Umpire Impact on Run Expectancy', fontsize=14, fontweight='bold')
plt.colorbar(scatter, label='% Favor Pitcher', ax=ax)
plt.tight_layout()
plt.show()
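
The flat 0.05-run weight used above treats every missed call alike. One possible refinement, sketched here under stated assumptions, values a missed call as the difference between the count that resulted and the count that should have resulted. The `toy_count_value` function and its numbers are hypothetical placeholders, not empirical run values; a real analysis would substitute count-level run values estimated from play-by-play data.

```python
def toy_count_value(balls, strikes):
    """Hypothetical run value of a count for the batting team (placeholder numbers)."""
    if balls >= 4:
        return 0.30    # assumed value of a walk
    if strikes >= 3:
        return -0.30   # assumed value of a strikeout
    return 0.03 * balls - 0.04 * strikes


def miscall_run_delta(balls, strikes, called_strike, count_value=toy_count_value):
    """Runs gained by the batting team from a wrong call, relative to the right one.

    `balls` and `strikes` describe the count before the pitch. The function
    assumes the call was incorrect, so the correct call is the opposite one.
    """
    after_strike = count_value(balls, strikes + 1)
    after_ball = count_value(balls + 1, strikes)
    if called_strike:
        return after_strike - after_ball   # ball called a strike: hurts the batter
    return after_ball - after_strike       # strike called a ball: helps the batter


# Compare a missed call on a full count with the same miss on the first pitch
full_count = miscall_run_delta(3, 2, called_strike=True)
first_pitch = miscall_run_delta(0, 0, called_strike=True)
print(f"3-2 ball called a strike: {full_count:+.2f} runs")
print(f"0-0 ball called a strike: {first_pitch:+.2f} runs")
```

Summing these deltas over an umpire's incorrect calls would replace the constant weight with a count-sensitive estimate.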

21.6 Robot Umpires & ABS Analysis

The Automated Ball-Strike (ABS) system, colloquially known as "robot umpires," has been tested in minor league baseball since 2019 and represents a potential future for MLB. Understanding the differences between human and automated strike zones is crucial for evaluating this technology.

ABS System Overview

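ABS relies on pitch-tracking technology to determine ball-strike calls almost instantly. Early trials used TrackMan radar; later minor league implementations have used Hawk-Eye optical tracking. Two implementations have been tested:

Full ABS: All ball-strike calls are made by the system
ABS Challenge System: The human umpire calls every pitch, and each team gets a limited number of challenges per game (typically two or three, depending on the trial)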

Comparing Human vs ABS Zones

Let's analyze how an ABS system would call pitches compared to human umpires:

# R: Simulate ABS system and compare to human calls

# Define ABS strike zone (strictly rulebook-based)
abs_call <- function(plate_x, plate_z, sz_top, sz_bot) {
  in_strike_zone(plate_x, plate_z, sz_top, sz_bot)
}

# Apply ABS to our data
abs_comparison <- called_pitches %>%
  mutate(
    abs_strike = abs_call(plate_x, plate_z, sz_top, sz_bot),
    human_strike = called_strike == 1,

    # Agreement/disagreement
    calls_agree = abs_strike == human_strike,
    abs_more_lenient = !abs_strike & human_strike,  # Human called strike, ABS would call ball
    abs_more_strict = abs_strike & !human_strike,    # Human called ball, ABS would call strike

    disagreement_type = case_when(
      calls_agree ~ "Agreement",
      abs_more_lenient ~ "ABS More Lenient",
      abs_more_strict ~ "ABS More Strict"
    )
  )

# Overall agreement rate
agreement_summary <- abs_comparison %>%
  summarise(
    total_pitches = n(),
    agreement_rate = mean(calls_agree),
    abs_more_lenient_pct = mean(abs_more_lenient),
    abs_more_strict_pct = mean(abs_more_strict)
  )

print(agreement_summary)

# Agreement by umpire
umpire_abs_comparison <- abs_comparison %>%
  group_by(umpire) %>%
  summarise(
    pitches = n(),
    agreement_rate = mean(calls_agree),
    abs_more_lenient = sum(abs_more_lenient),
    abs_more_strict = sum(abs_more_strict),
    net_stricter_than_abs = sum(abs_more_strict) - sum(abs_more_lenient)
  ) %>%
  filter(pitches >= 1000) %>%
  arrange(agreement_rate)

print(head(umpire_abs_comparison, 10))

# Visualize disagreement locations
ggplot(abs_comparison %>% filter(!calls_agree),
       aes(x = plate_x, y = plate_z, color = disagreement_type)) +
  geom_point(alpha = 0.3, size = 0.5) +
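  # annotate() draws one rectangle; geom_rect() with constant aesthetics
  # would redraw it once per row of the data
  annotate("rect", xmin = -0.708, xmax = 0.708,
           ymin = mean(abs_comparison$sz_bot, na.rm = TRUE),
           ymax = mean(abs_comparison$sz_top, na.rm = TRUE),
           fill = NA, color = "black", linewidth = 1) +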
  scale_color_manual(values = c("ABS More Lenient" = "blue",
                               "ABS More Strict" = "red")) +
  coord_fixed(ratio = 1) +
  labs(title = "Human-ABS Disagreements by Location",
       x = "Horizontal Location (ft)", y = "Vertical Location (ft)",
       color = "Disagreement Type") +
  theme_minimal()

# Impact of ABS on game statistics (called pitches only)
abs_impact <- abs_comparison %>%
  summarise(
    # Called-strike rate on two-strike pitches and called-ball rate on
    # three-ball pitches under human umpires
    current_k_rate = mean(human_strike[strikes == 2]),
    current_bb_rate = mean(!human_strike[balls == 3]),

    # Projected rates with ABS
    abs_k_rate = mean(abs_strike[strikes == 2]),
    abs_bb_rate = mean(!abs_strike[balls == 3]),

    # Differences
    k_rate_change = abs_k_rate - current_k_rate,
    bb_rate_change = abs_bb_rate - current_bb_rate
  )

print(abs_impact)

# Python: Compare human calls to ABS system

# Define ABS strike zone
def abs_call(plate_x, plate_z, sz_top, sz_bot):
    """Automated ball-strike system call (rulebook zone)"""
    return in_strike_zone(plate_x, plate_z, sz_top, sz_bot)

# Apply ABS to data
abs_comparison = called_pitches.copy()
abs_comparison['abs_strike'] = abs_call(
    abs_comparison['plate_x'].values,
    abs_comparison['plate_z'].values,
    abs_comparison['sz_top'].values,
    abs_comparison['sz_bot'].values
)

abs_comparison['human_strike'] = abs_comparison['called_strike'] == 1

# Agreement/disagreement
abs_comparison['calls_agree'] = (
    abs_comparison['abs_strike'] == abs_comparison['human_strike']
)
abs_comparison['abs_more_lenient'] = (
    ~abs_comparison['abs_strike'] & abs_comparison['human_strike']
)
abs_comparison['abs_more_strict'] = (
    abs_comparison['abs_strike'] & ~abs_comparison['human_strike']
)

# Overall agreement
agreement_summary = {
    'total_pitches': len(abs_comparison),
    'agreement_rate': abs_comparison['calls_agree'].mean(),
    'abs_more_lenient_pct': abs_comparison['abs_more_lenient'].mean(),
    'abs_more_strict_pct': abs_comparison['abs_more_strict'].mean()
}

print("Human vs ABS Agreement:")
for key, value in agreement_summary.items():
    print(f"  {key}: {value:.4f}" if isinstance(value, float) else f"  {key}: {value}")

# Agreement by umpire
umpire_abs_comparison = abs_comparison.groupby('umpire').agg({
    'calls_agree': ['count', 'mean'],
    'abs_more_lenient': 'sum',
    'abs_more_strict': 'sum'
}).reset_index()

umpire_abs_comparison.columns = ['umpire', 'pitches', 'agreement_rate',
                                 'abs_more_lenient', 'abs_more_strict']

umpire_abs_comparison['net_stricter_than_abs'] = (
    umpire_abs_comparison['abs_more_strict'] - umpire_abs_comparison['abs_more_lenient']
)

umpire_abs_comparison = umpire_abs_comparison[
    umpire_abs_comparison['pitches'] >= 1000
].sort_values('agreement_rate')

print("\nUmpire vs ABS Agreement Rates:")
print(umpire_abs_comparison.head(10))

# Visualize disagreement locations
fig, ax = plt.subplots(figsize=(10, 10))

disagreements = abs_comparison[~abs_comparison['calls_agree']]

colors = {'ABS More Lenient': 'blue', 'ABS More Strict': 'red'}
for disagreement_type, color in colors.items():
    if disagreement_type == 'ABS More Lenient':
        data = disagreements[disagreements['abs_more_lenient']]
    else:
        data = disagreements[disagreements['abs_more_strict']]

    ax.scatter(data['plate_x'], data['plate_z'],
              c=color, alpha=0.3, s=1, label=disagreement_type)

plot_strike_zone_base(ax, abs_comparison['sz_top'].mean(),
                     abs_comparison['sz_bot'].mean())

ax.set_xlim(-2, 2)
ax.set_ylim(0, 5)
ax.set_title('Human-ABS Disagreements by Location', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()

# Impact on game statistics
def calculate_abs_impact(data):
    """Calculate how ABS would change strikeout and walk rates"""

    # Filter to potential K/BB situations
    potential_k = data[(data['strikes'] == 2)]
    potential_bb = data[(data['balls'] == 3)]

    impact = {
        'current_k_rate': potential_k['human_strike'].mean(),
        'abs_k_rate': potential_k['abs_strike'].mean(),
        'current_bb_rate': (~potential_bb['human_strike']).mean(),
        'abs_bb_rate': (~potential_bb['abs_strike']).mean(),
    }

    impact['k_rate_change'] = impact['abs_k_rate'] - impact['current_k_rate']
    impact['bb_rate_change'] = impact['abs_bb_rate'] - impact['current_bb_rate']

    return impact

abs_impact = calculate_abs_impact(abs_comparison)
print("\nProjected Impact of ABS on Game Outcomes:")
for key, value in abs_impact.items():
    print(f"  {key}: {value:.4f}")
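
The comparison above calls an `in_strike_zone` helper defined earlier in the chapter. For readers working through this section on its own, a minimal stand-in might look like the sketch below. The name `rulebook_zone` is hypothetical, and the half-width of 0.708 ft (half of the 17-inch plate) matches the rectangle drawn in the plots in this section. Whether to pad the edges by the ball's radius is a modeling choice, exposed here as an assumed `pad` parameter that defaults to zero.

```python
PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # half of the 17-inch plate, about 0.708 ft


def rulebook_zone(plate_x, plate_z, sz_top, sz_bot, pad=0.0):
    """Return True where a pitch crosses inside the rectangular rulebook zone.

    Works on plain floats as well as NumPy arrays or pandas Series, because it
    uses only comparisons, abs(), and the elementwise & operator.
    `pad` (feet) optionally widens every edge, e.g. by the ball's radius.
    """
    inside_horizontal = abs(plate_x) <= PLATE_HALF_WIDTH_FT + pad
    inside_vertical = (plate_z >= sz_bot - pad) & (plate_z <= sz_top + pad)
    return inside_horizontal & inside_vertical


# Example pitches (plate_x, plate_z in feet) for a batter with a 1.5-3.5 ft zone
print(rulebook_zone(0.0, 2.5, 3.5, 1.5))            # middle of the zone
print(rulebook_zone(0.8, 2.5, 3.5, 1.5))            # just off the edge of the plate
print(rulebook_zone(0.8, 2.5, 3.5, 1.5, pad=0.12))  # same pitch with padding
```

Because the function is elementwise, it can be applied to whole columns in the same way as the `abs_call` wrapper above.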

ABS Challenge System Analysis

The challenge system allows teams to strategically contest calls. Let's analyze which situations would benefit most from challenges:

# R: Analyze optimal challenge strategy

# Identify calls that would be overturned by ABS
challenge_data <- abs_comparison %>%
  filter(!calls_agree) %>%
  mutate(
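    # Every row here is a human-ABS disagreement, so a challenge would
    # succeed by construction
    challenge_success = TRUE,

    # Heuristic leverage score based on the count (illustrative weights)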
    leverage_score = case_when(
      strikes == 2 & balls == 3 ~ 5,  # Full count
      strikes == 2 ~ 4,                # Two-strike count
      balls == 3 ~ 3,                  # Three-ball count
      TRUE ~ 1
    ),

    # Should challenge?
    worth_challenging = leverage_score >= 3
  )

# Challenge value by situation
challenge_value <- challenge_data %>%
  group_by(balls, strikes) %>%
  summarise(
    incorrect_calls = n(),
    avg_leverage = mean(leverage_score),
    pct_worth_challenge = mean(worth_challenging),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_leverage))

print(challenge_value)

# Average number of incorrect calls, and of high-leverage incorrect calls, per game
# (games with no incorrect calls do not appear in challenge_data)
challenges_per_game <- challenge_data %>%
  group_by(game_date, umpire) %>%
  summarise(
    total_incorrect = n(),
    high_leverage_incorrect = sum(worth_challenging),
    .groups = "drop"
  ) %>%
  summarise(
    avg_incorrect_per_game = mean(total_incorrect),
    avg_challengeable_per_game = mean(high_leverage_incorrect)
  )

print(challenges_per_game)
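
A Python counterpart to the R challenge analysis might look like the following sketch. It assumes a disagreement-level DataFrame with `balls`, `strikes`, `game_date`, and `umpire` columns, as in `abs_comparison` above. The small `demo` frame is made-up data for illustration only, the function names are hypothetical, and the leverage weights simply mirror the heuristic used in the R code.

```python
import numpy as np
import pandas as pd


def score_challenges(disagreements):
    """Add a count-based leverage score and a challenge flag to missed calls."""
    out = disagreements.copy()
    conditions = [
        (out['strikes'] == 2) & (out['balls'] == 3),  # full count
        out['strikes'] == 2,                          # two-strike count
        out['balls'] == 3,                            # three-ball count
    ]
    # np.select takes the first matching condition, like case_when in R
    out['leverage_score'] = np.select(conditions, [5, 4, 3], default=1)
    out['worth_challenging'] = out['leverage_score'] >= 3
    return out


def challenge_value_by_count(scored):
    """Summarize missed calls and challenge-worthiness for each count."""
    return (
        scored.groupby(['balls', 'strikes'])
        .agg(incorrect_calls=('leverage_score', 'count'),
             avg_leverage=('leverage_score', 'mean'),
             pct_worth_challenge=('worth_challenging', 'mean'))
        .reset_index()
        .sort_values('avg_leverage', ascending=False)
    )


# Made-up disagreement rows for illustration only
demo = pd.DataFrame({
    'game_date': ['2024-04-01'] * 3 + ['2024-04-02'] * 2,
    'umpire': ['Umpire A'] * 3 + ['Umpire B'] * 2,
    'balls': [3, 0, 1, 3, 2],
    'strikes': [2, 2, 0, 1, 1],
})

scored = score_challenges(demo)
print(challenge_value_by_count(scored))

# Missed calls per game, counting only games with at least one miss
per_game = scored.groupby(['game_date', 'umpire']).agg(
    total_incorrect=('worth_challenging', 'count'),
    high_leverage_incorrect=('worth_challenging', 'sum'),
)
print(per_game.mean())
```

In practice `score_challenges` would be applied to `abs_comparison[~abs_comparison['calls_agree']]` rather than to the demo frame.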

21.7 Interactive Strike Zone Tools

Interactive strike zone visualizations let broadcasters, teams, and fans explore umpire tendencies in more detail than static summaries allow. Static heat maps and accuracy tables provide useful snapshots, but interactive tools let users filter by game situation, compare several umpires at once, and follow trends over time. This section demonstrates how to build interactive strike zone analysis tools using Plotly and related web visualization frameworks.

Interactive umpire analysis tools serve multiple stakeholders:

Broadcast Integration: Real-time overlays showing umpire tendencies during live games
Team Preparation: Pre-game analysis identifying strategic opportunities based on umpire assignment
League Evaluation: Performance monitoring systems for umpire development and playoff assignments
Public Transparency: Fan-facing tools that increase understanding of ball-strike calling patterns
Academic Research: Comprehensive datasets for studying human decision-making under pressure

Interactive Strike Zone Overlay Comparing Umpires

The strike zone overlay visualization enables direct comparison between umpires by displaying their called strike zones side-by-side or overlapped with adjustable transparency. Users can toggle between different umpires, filter by pitcher/batter handedness, and examine specific count situations.

# R: Interactive Strike Zone Overlay with Umpire Comparison
library(plotly)
library(tidyverse)
library(htmlwidgets)

# Function to calculate umpire strike zone boundaries
calculate_umpire_zone <- function(pitch_data, umpire_name, strike_threshold = 0.5) {
  ump_data <- pitch_data %>%
    filter(umpire == umpire_name)

  # Create grid for probability calculation
  x_seq <- seq(-2, 2, length.out = 40)
  z_seq <- seq(0, 5, length.out = 50)

  grid <- expand.grid(plate_x = x_seq, plate_z = z_seq)

  # Calculate strike probability at each point using local averaging
  grid$strike_prob <- sapply(1:nrow(grid), function(i) {
    # Find pitches within 0.2 feet
    nearby <- ump_data %>%
      filter(abs(plate_x - grid$plate_x[i]) < 0.2,
             abs(plate_z - grid$plate_z[i]) < 0.2)

    if (nrow(nearby) >= 10) {
      mean(nearby$called_strike, na.rm = TRUE)
    } else {
      NA
    }
  })

  return(grid)
}

# Create interactive overlay comparison
create_umpire_overlay <- function(pitch_data, umpires_to_compare) {
  fig <- plot_ly()

  # Color palette for umpires
  colors <- c('rgba(31, 119, 180, 0.6)', 'rgba(255, 127, 14, 0.6)',
              'rgba(44, 160, 44, 0.6)', 'rgba(214, 39, 40, 0.6)')

  # Add contour for each umpire
  for (i in seq_along(umpires_to_compare)) {
    ump_name <- umpires_to_compare[i]
    zone_data <- calculate_umpire_zone(pitch_data, ump_name)

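    # Reshape for contour plot. expand.grid() varies plate_x fastest, so fill
    # a matrix with one row per plate_x value; t() below then gives the
    # rows = plate_z, columns = plate_x layout that plotly expects.
    strike_matrix <- matrix(zone_data$strike_prob,
                           nrow = length(unique(zone_data$plate_x)),
                           ncol = length(unique(zone_data$plate_z)))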

    fig <- fig %>%
      add_contour(
        x = unique(zone_data$plate_x),
        y = unique(zone_data$plate_z),
        z = t(strike_matrix),
        contours = list(
          start = 0.5,
          end = 0.5,
          size = 0.01,
          showlabels = FALSE,
          coloring = "none"  # no fill, so overlapping umpires stay visible
        ),
        line = list(color = colors[i], width = 3),
        name = ump_name,
        showscale = FALSE,
        hovertemplate = paste0(
          ump_name, "<br>",
          "Location: (%{x:.2f}, %{y:.2f})<br>",
          "Strike Prob: %{z:.1%}<extra></extra>"
        )
      )
  }

  # Add rulebook strike zone rectangle
  fig <- fig %>%
    add_segments(
      x = -0.708, xend = 0.708, y = 1.5, yend = 1.5,
      line = list(color = "black", width = 2, dash = "dash"),
      name = "Rulebook Zone",
      showlegend = TRUE,
      inherit = FALSE
    ) %>%
    add_segments(
      x = -0.708, xend = 0.708, y = 3.5, yend = 3.5,
      line = list(color = "black", width = 2, dash = "dash"),
      showlegend = FALSE,
      inherit = FALSE
    ) %>%
    add_segments(
      x = -0.708, xend = -0.708, y = 1.5, yend = 3.5,
      line = list(color = "black", width = 2, dash = "dash"),
      showlegend = FALSE,
      inherit = FALSE
    ) %>%
    add_segments(
      x = 0.708, xend = 0.708, y = 1.5, yend = 3.5,
      line = list(color = "black", width = 2, dash = "dash"),
      showlegend = FALSE,
      inherit = FALSE
    )

  # Layout configuration
  fig <- fig %>%
    layout(
      title = list(
        text = "<b>Umpire Strike Zone Comparison</b><br><sub>50% Called Strike Probability Contours</sub>",
        x = 0.5,
        xanchor = "center"
      ),
      xaxis = list(
        title = "Horizontal Location (feet)",
        range = c(-1.5, 1.5),
        constrain = "domain",
        zeroline = FALSE
      ),
      yaxis = list(
        title = "Vertical Location (feet)",
        range = c(1, 4),
        scaleanchor = "x",
        scaleratio = 1,
        zeroline = FALSE
      ),
      plot_bgcolor = "rgb(250, 250, 250)",
      paper_bgcolor = "white",
      legend = list(
        x = 1.02,
        y = 0.98,
        xanchor = "left",
        yanchor = "top"
      ),
      hovermode = "closest"
    ) %>%
    config(displayModeBar = TRUE)

  return(fig)
}

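# Generate simulated data with umpire-specific zones. The umpire effects below
# are illustrative assumptions, not measured tendencies of the named umpires.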
set.seed(42)
n_pitches <- 15000

sample_umpire_data <- tibble(
  umpire = sample(c("Angel Hernandez", "Joe West", "Pat Hoberg"), n_pitches, replace = TRUE),
  plate_x = rnorm(n_pitches, 0, 0.75),
  plate_z = rnorm(n_pitches, 2.5, 0.7)
) %>%
  mutate(
    dist_from_center = sqrt(plate_x^2 + (plate_z - 2.5)^2),
    # Base strike probability
    base_prob = plogis(2 - 2.5 * dist_from_center),
    # Umpire-specific adjustments
    umpire_effect = case_when(
      umpire == "Angel Hernandez" ~ -0.3,  # Smaller zone
      umpire == "Joe West" ~ 0.2,          # Larger zone
      umpire == "Pat Hoberg" ~ 0.05        # Accurate, slight expansion
    ),
    # Add horizontal bias for variety
    horizontal_bias = case_when(
      umpire == "Angel Hernandez" ~ ifelse(plate_x > 0, -0.2, 0.1),
      TRUE ~ 0
    ),
    strike_prob = plogis(qlogis(base_prob) + umpire_effect + horizontal_bias),
    called_strike = rbinom(n_pitches, 1, strike_prob)
  )

# Create interactive overlay
umpire_overlay <- create_umpire_overlay(
  sample_umpire_data,
  c("Angel Hernandez", "Joe West", "Pat Hoberg")
)
umpire_overlay

# Save as HTML
htmlwidgets::saveWidget(umpire_overlay, "umpire_overlay.html", selfcontained = TRUE)

# Python: Interactive Strike Zone Overlay with Umpire Comparison
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from scipy.ndimage import gaussian_filter

def calculate_umpire_zone(pitch_data, umpire_name, grid_size=40, min_count=5):
    """Calculate a smoothed strike probability surface for an umpire"""
    ump_data = pitch_data[pitch_data['umpire'] == umpire_name]

    # Bin edges, plus the bin centers used as plot coordinates
    x_edges = np.linspace(-2, 2, grid_size)
    z_edges = np.linspace(0, 5, int(grid_size * 1.25))
    x_centers = (x_edges[:-1] + x_edges[1:]) / 2
    z_centers = (z_edges[:-1] + z_edges[1:]) / 2

    # Count pitches and called strikes in each bin
    counts, _, _ = np.histogram2d(
        ump_data['plate_x'], ump_data['plate_z'], bins=[x_edges, z_edges]
    )
    strikes, _, _ = np.histogram2d(
        ump_data['plate_x'], ump_data['plate_z'], bins=[x_edges, z_edges],
        weights=ump_data['called_strike']
    )

    # Smooth strikes and counts separately, then divide. Smoothing the
    # per-bin mean directly would spread NaN from empty bins to their neighbors.
    counts_smooth = gaussian_filter(counts.T, sigma=1.5)
    strikes_smooth = gaussian_filter(strikes.T, sigma=1.5)
    with np.errstate(divide='ignore', invalid='ignore'):
        strike_prob_smooth = strikes_smooth / counts_smooth

    # Mask areas with insufficient (smoothed) data
    strike_prob_smooth[counts_smooth < min_count] = np.nan

    return {
        'x': x_centers,
        'z': z_centers,
        'strike_prob': strike_prob_smooth  # shape: (len(z), len(x))
    }

def create_umpire_overlay(pitch_data, umpires_to_compare):
    """Create interactive overlay comparing umpire strike zones"""

    fig = go.Figure()

    # Color palette for umpires
    colors = ['rgba(31, 119, 180, 0.8)', 'rgba(255, 127, 14, 0.8)',
              'rgba(44, 160, 44, 0.8)', 'rgba(214, 39, 40, 0.8)']

    # Add contour for each umpire
    for i, ump_name in enumerate(umpires_to_compare):
        zone_data = calculate_umpire_zone(pitch_data, ump_name)

        # Add an invisible contour trace (no fill, zero-width lines) so that
        # hovering shows the strike probability at any location
        fig.add_trace(go.Contour(
            x=zone_data['x'],
            y=zone_data['z'],
            z=zone_data['strike_prob'],
            name=ump_name,
            contours=dict(
                start=0,
                end=1,
                size=0.1,
                showlabels=False,
                coloring='none'
            ),
            line=dict(width=0),
            showscale=False,
            hovertemplate=(
                f"{ump_name}<br>" +
                "Location: (%{x:.2f}, %{y:.2f})<br>" +
                "Strike Prob: %{z:.1%}<extra></extra>"
            ),
            visible=True
        ))

        # Add 50% probability contour line (the "zone boundary")
        fig.add_trace(go.Contour(
            x=zone_data['x'],
            y=zone_data['z'],
            z=zone_data['strike_prob'],
            name=f"{ump_name} Zone",
            contours=dict(
                start=0.5,
                end=0.5,
                size=0.01,
                showlabels=False,
                coloring='none'  # with 'lines', line color is taken from the colorscale
            ),
            line=dict(color=colors[i], width=3),
            showscale=False,
            hoverinfo='skip'
        ))

    # Add rulebook strike zone
    zone_x = [-0.708, 0.708, 0.708, -0.708, -0.708]
    zone_z = [1.5, 1.5, 3.5, 3.5, 1.5]

    fig.add_trace(go.Scatter(
        x=zone_x, y=zone_z,
        mode='lines',
        line=dict(color='black', width=2, dash='dash'),
        name='Rulebook Zone',
        hoverinfo='skip'
    ))

    # Update layout
    fig.update_layout(
        title=dict(
            text="<b>Umpire Strike Zone Comparison</b><br><sub>50% Called Strike Probability Contours</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        xaxis=dict(
            title="Horizontal Location (feet)",
            range=[-1.5, 1.5],
            constrain='domain',
            zeroline=False
        ),
        yaxis=dict(
            title="Vertical Location (feet)",
            range=[1, 4],
            scaleanchor="x",
            scaleratio=1,
            zeroline=False
        ),
        plot_bgcolor='rgb(250, 250, 250)',
        paper_bgcolor='white',
        legend=dict(
            x=1.02,
            y=0.98,
            xanchor='left',
            yanchor='top'
        ),
        hovermode='closest',
        width=800,
        height=800
    )

    return fig

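# Generate simulated data with umpire-specific zones. The umpire effects below
# are illustrative assumptions, not measured tendencies of the named umpires.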
np.random.seed(42)
n_pitches = 15000

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

sample_umpire_data = pd.DataFrame({
    'umpire': np.random.choice(['Angel Hernandez', 'Joe West', 'Pat Hoberg'], n_pitches),
    'plate_x': np.random.normal(0, 0.75, n_pitches),
    'plate_z': np.random.normal(2.5, 0.7, n_pitches)
})

sample_umpire_data['dist_from_center'] = np.sqrt(
    sample_umpire_data['plate_x']**2 +
    (sample_umpire_data['plate_z'] - 2.5)**2
)

# Umpire-specific effects
umpire_effects = {
    'Angel Hernandez': -0.3,  # Smaller zone
    'Joe West': 0.2,          # Larger zone
    'Pat Hoberg': 0.05        # Accurate, slight expansion
}

sample_umpire_data['umpire_effect'] = sample_umpire_data['umpire'].map(umpire_effects)

# Add horizontal bias for Angel Hernandez
sample_umpire_data['horizontal_bias'] = np.where(
    (sample_umpire_data['umpire'] == 'Angel Hernandez') & (sample_umpire_data['plate_x'] > 0),
    -0.2,
    np.where(
        (sample_umpire_data['umpire'] == 'Angel Hernandez') & (sample_umpire_data['plate_x'] <= 0),
        0.1,
        0
    )
)

base_logit = 2 - 2.5 * sample_umpire_data['dist_from_center']
strike_prob = inv_logit(
    base_logit + sample_umpire_data['umpire_effect'] + sample_umpire_data['horizontal_bias']
)
sample_umpire_data['called_strike'] = np.random.binomial(1, strike_prob)

# Create interactive overlay
umpire_overlay = create_umpire_overlay(
    sample_umpire_data,
    ['Angel Hernandez', 'Joe West', 'Pat Hoberg']
)
umpire_overlay.show()

# Save as HTML
umpire_overlay.write_html("umpire_overlay.html")
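
The overlay functions above accept any pitch-level data frame, so handedness and count filters can be applied before the zone is estimated. The helper below is a minimal sketch of that step. The column names `stand` and `p_throws` follow Statcast conventions but are assumptions here, since the simulated data in this section does not include them, and `filter_pitches` is a hypothetical name.

```python
import pandas as pd


def filter_pitches(pitch_data, stand=None, p_throws=None, balls=None, strikes=None):
    """Subset pitches by batter side, pitcher hand, and count.

    Any argument left as None is not filtered on. Column names are assumed
    to follow Statcast conventions ('stand', 'p_throws', 'balls', 'strikes').
    """
    criteria = {'stand': stand, 'p_throws': p_throws,
                'balls': balls, 'strikes': strikes}
    mask = pd.Series(True, index=pitch_data.index)
    for column, wanted in criteria.items():
        if wanted is not None:
            mask &= pitch_data[column] == wanted
    return pitch_data[mask]


# Made-up rows for illustration only
demo = pd.DataFrame({
    'umpire': ['Umpire A', 'Umpire A', 'Umpire B', 'Umpire B'],
    'stand': ['L', 'R', 'L', 'L'],
    'p_throws': ['R', 'R', 'L', 'R'],
    'balls': [0, 3, 1, 0],
    'strikes': [2, 2, 0, 2],
    'called_strike': [1, 0, 0, 1],
})

lefties_two_strikes = filter_pitches(demo, stand='L', strikes=2)
print(lefties_two_strikes)
```

The filtered frame can then be passed to `create_umpire_overlay` in place of the full data set, keeping in mind that narrow filters leave fewer pitches per grid cell and therefore noisier contours.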

Called Strike Probability Surface (3D Plotly)

Three-dimensional visualizations of strike probability surfaces reveal the complete landscape of an umpire's zone, showing how strike likelihood varies continuously across horizontal and vertical dimensions. Interactive 3D plots allow rotation, zooming, and hover inspection of specific locations.

# R: 3D Strike Probability Surface
library(plotly)

create_3d_strike_surface <- function(pitch_data, umpire_name) {
  # Filter to specific umpire
  ump_data <- pitch_data %>%
    filter(umpire == umpire_name)

  # Create fine grid for smooth surface
  x_seq <- seq(-1.5, 1.5, length.out = 30)
  z_seq <- seq(1, 4, length.out = 40)

  # Estimate strike probability with a GAM (two-dimensional smooth of location)
  library(mgcv)
  gam_model <- gam(called_strike ~ s(plate_x, plate_z, k = 50),
                   data = ump_data,
                   family = binomial)

  # Predict on grid
  grid <- expand.grid(plate_x = x_seq, plate_z = z_seq)
  grid$strike_prob <- predict(gam_model, newdata = grid, type = "response")

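  # Reshape for 3D surface. expand.grid() varies plate_x fastest, so fill by
  # row to get rows = plate_z and columns = plate_x, as plotly expects.
  strike_matrix <- matrix(grid$strike_prob,
                         nrow = length(z_seq),
                         ncol = length(x_seq),
                         byrow = TRUE)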

  # Create 3D surface plot
  fig <- plot_ly(
    x = x_seq,
    y = z_seq,
    z = strike_matrix,
    type = "surface",
    colorscale = list(
      c(0, "rgb(220, 50, 50)"),      # Red for low probability
      c(0.5, "rgb(255, 255, 200)"),  # Yellow for moderate
      c(1, "rgb(50, 50, 220)")       # Blue for high probability
    ),
    colorbar = list(title = "Strike<br>Probability"),
    hovertemplate = paste0(
      "Horizontal: %{x:.2f} ft<br>",
      "Vertical: %{y:.2f} ft<br>",
      "Strike Prob: %{z:.1%}<extra></extra>"
    )
  ) %>%
    layout(
      title = list(
        text = paste0("<b>", umpire_name, " - 3D Strike Probability Surface</b>"),
        x = 0.5,
        xanchor = "center"
      ),
      scene = list(
        xaxis = list(title = "Horizontal Location (ft)", range = c(-1.5, 1.5)),
        yaxis = list(title = "Vertical Location (ft)", range = c(1, 4)),
        zaxis = list(title = "Strike Probability", range = c(0, 1)),
        camera = list(
          eye = list(x = 1.5, y = -1.5, z = 1.2)
        ),
        aspectmode = "manual",
        aspectratio = list(x = 1, y = 1.5, z = 0.7)
      ),
      paper_bgcolor = "white"
    ) %>%
    config(displayModeBar = TRUE)

  return(fig)
}

# Create 3D surface for Joe West
surface_3d <- create_3d_strike_surface(sample_umpire_data, "Joe West")
surface_3d

# Save as HTML
htmlwidgets::saveWidget(surface_3d, "strike_surface_3d.html", selfcontained = TRUE)

# Python: 3D Strike Probability Surface
import plotly.graph_objects as go
import pandas as pd
import numpy as np

def create_3d_strike_surface(pitch_data, umpire_name):
    """Create 3D surface plot of strike probability"""
    # Filter to specific umpire
    ump_data = pitch_data[pitch_data['umpire'] == umpire_name].copy()

    # Create grid for surface
    x_range = np.linspace(-1.5, 1.5, 30)
    z_range = np.linspace(1, 4, 40)
    X_grid, Z_grid = np.meshgrid(x_range, z_range)

    # Fit a gradient-boosted classifier for the probability surface
    # (tree-based, so the surface is stepped rather than perfectly smooth)
    from sklearn.ensemble import GradientBoostingClassifier

    # Prepare training data
    X_train = ump_data[['plate_x', 'plate_z']].values
    y_train = ump_data['called_strike'].values

    # Train model
    model = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)

    # Predict on grid
    grid_points = np.column_stack([X_grid.ravel(), Z_grid.ravel()])
    strike_prob = model.predict_proba(grid_points)[:, 1]
    strike_prob_matrix = strike_prob.reshape(X_grid.shape)

    # Create 3D surface
    fig = go.Figure(data=[go.Surface(
        x=x_range,
        y=z_range,
        z=strike_prob_matrix,
        colorscale=[
            [0, 'rgb(220, 50, 50)'],      # Red for low probability
            [0.5, 'rgb(255, 255, 200)'],  # Yellow for moderate
            [1, 'rgb(50, 50, 220)']       # Blue for high probability
        ],
        colorbar=dict(title="Strike<br>Probability"),
        hovertemplate=(
            "Horizontal: %{x:.2f} ft<br>" +
            "Vertical: %{y:.2f} ft<br>" +
            "Strike Prob: %{z:.1%}<extra></extra>"
        )
    )])

    # Add semi-transparent reference plane at the 50% probability level
    fig.add_trace(go.Surface(
        x=x_range,
        y=z_range,
        z=np.full_like(strike_prob_matrix, 0.5),
        opacity=0.3,
        colorscale=[[0, 'gray'], [1, 'gray']],
        showscale=False,
        hoverinfo='skip',
        name='50% Threshold'
    ))

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{umpire_name} - 3D Strike Probability Surface</b>",
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        scene=dict(
            xaxis=dict(title="Horizontal Location (ft)", range=[-1.5, 1.5]),
            yaxis=dict(title="Vertical Location (ft)", range=[1, 4]),
            zaxis=dict(title="Strike Probability", range=[0, 1]),
            camera=dict(
                eye=dict(x=1.5, y=-1.5, z=1.2)
            ),
            aspectmode="manual",
            aspectratio=dict(x=1, y=1.5, z=0.7)
        ),
        paper_bgcolor='white',
        width=900,
        height=700
    )

    return fig

# Create 3D surface for Joe West
surface_3d = create_3d_strike_surface(sample_umpire_data, 'Joe West')
surface_3d.show()

# Save as HTML
surface_3d.write_html("strike_surface_3d.html")
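
A probability surface can also be reduced to a single number for comparing umpires: the area enclosed by the 50% contour. The sketch below is one simple way to approximate it, assuming an evenly spaced grid and a matrix laid out like `strike_prob_matrix` above (rows vertical, columns horizontal). `zone_area_sq_ft` is a hypothetical helper, and the logistic surface is synthetic stand-in data rather than a fitted model.

```python
import numpy as np

def zone_area_sq_ft(prob_matrix, x_range, z_range, threshold=0.5):
    """Approximate area (sq ft) where strike probability >= threshold.

    Hypothetical helper: assumes evenly spaced grids and a matrix with
    rows = vertical (z_range) and columns = horizontal (x_range).
    """
    prob = np.asarray(prob_matrix, dtype=float)
    dx = x_range[1] - x_range[0]
    dz = z_range[1] - z_range[0]
    # Treat each grid node as one dx-by-dz cell; NaN compares as False
    n_cells = np.count_nonzero(prob >= threshold)
    return float(n_cells * dx * dz)

# Synthetic surface (assumption): logistic fall-off from the zone center
x_range = np.linspace(-1.5, 1.5, 30)
z_range = np.linspace(1, 4, 40)
X_grid, Z_grid = np.meshgrid(x_range, z_range)
dist = np.sqrt(X_grid**2 + (Z_grid - 2.5)**2)
prob = 1 / (1 + np.exp(-(2 - 2.5 * dist)))

area = zone_area_sq_ft(prob, x_range, z_range)
# Rulebook rectangle as drawn in the overlay comparison (feet)
rulebook_area = (2 * 0.708) * (3.5 - 1.5)
print(f"50% zone area: {area:.2f} sq ft "
      f"({area / rulebook_area:.0%} of the rulebook rectangle)")
```

Counting grid cells is crude next to tracing the contour itself, but it is easy to audit, and the error shrinks as the grid gets finer.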

Animated Umpire Accuracy Over Game Progression

Umpire performance can drift over the course of a game due to fatigue, score effects, or recalibration. Animated visualizations show how accuracy and zone size evolve inning-by-inning, revealing patterns that static aggregates miss.

# R: Animated Umpire Accuracy Over Game Progression
library(plotly)
library(tidyverse)

create_accuracy_animation <- function(pitch_data) {
  # Calculate accuracy by inning for each umpire
  inning_accuracy <- pitch_data %>%
    group_by(umpire, inning) %>%
    summarise(
      pitches = n(),
      accuracy = mean(correct_call, na.rm = TRUE),
      strike_rate = mean(called_strike, na.rm = TRUE),
      zone_expansion = mean(called_strike[!in_zone], na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(umpire, inning)

  # Calculate pitch-weighted cumulative accuracy
  inning_accuracy <- inning_accuracy %>%
    group_by(umpire) %>%
    mutate(
      cumulative_pitches = cumsum(pitches),
      cumulative_accuracy = cumsum(accuracy * pitches) / cumulative_pitches
    ) %>%
    ungroup()

  # Create animated scatter plot
  fig <- plot_ly(
    inning_accuracy,
    x = ~inning,
    y = ~accuracy,
    size = ~pitches,
    color = ~umpire,
    frame = ~inning,
    text = ~paste(
      "Umpire:", umpire, "<br>",
      "Inning:", inning, "<br>",
      "Accuracy:", scales::percent(accuracy, 0.1), "<br>",
      "Pitches:", pitches, "<br>",
      "Strike Rate:", scales::percent(strike_rate, 0.1)
    ),
    hoverinfo = "text",
    type = "scatter",
    # Each frame holds one point per umpire, so a line would never render
    mode = "markers",
    marker = list(
      sizemode = "diameter",
      sizeref = 2,
      opacity = 0.7
    )
  ) %>%
    layout(
      title = list(
        text = "<b>Umpire Accuracy Progression Through Game</b>",
        x = 0.5,
        xanchor = "center"
      ),
      xaxis = list(
        title = "Inning",
        range = c(0.5, 9.5)
      ),
      yaxis = list(
        title = "Accuracy Rate",
        range = c(0.80, 1.00),
        tickformat = ".0%"
      ),
      hovermode = "closest",
      showlegend = TRUE
    ) %>%
    animation_opts(
      frame = 500,
      transition = 300,
      redraw = FALSE
    ) %>%
    animation_button(
      x = 1, xanchor = "right",
      y = 0, yanchor = "bottom"
    ) %>%
    animation_slider(
      currentvalue = list(
        prefix = "Inning: ",
        font = list(color = "black")
      )
    )

  return(fig)
}

# Generate sample game progression data
set.seed(123)
n_games <- 50

game_progression_data <- expand_grid(
  game_id = 1:n_games,
  inning = 1:9,
  umpire = c("Angel Hernandez", "Joe West", "Pat Hoberg")
) %>%
  mutate(
    # Simulate pitches per inning
    pitches = rpois(n(), 15),
    # Base accuracy with fatigue effect
    base_accuracy = 0.92 - (inning - 5) * 0.005,
    # Umpire-specific accuracy
    umpire_accuracy_adj = case_when(
      umpire == "Pat Hoberg" ~ 0.04,
      umpire == "Joe West" ~ 0.00,
      umpire == "Angel Hernandez" ~ -0.03
    ),
    # Random game-to-game variation
    game_variation = rnorm(n(), 0, 0.02),
    # Final accuracy
    accuracy = pmin(0.99, pmax(0.80,
      base_accuracy + umpire_accuracy_adj + game_variation
    ))
  ) %>%
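  # Expand to one row per pitch so the columns match what
  # create_accuracy_animation() expects (logical flags, not per-inning rates)
  uncount(pitches) %>%
  mutate(
    called_strike = rbinom(n(), 1, 0.15) == 1,
    correct_call = rbinom(n(), 1, accuracy) == 1,
    # A correct call matches the true zone; an incorrect call contradicts it
    in_zone = if_else(correct_call, called_strike, !called_strike)
  )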

# Create animation
accuracy_animation <- create_accuracy_animation(game_progression_data)
accuracy_animation

# Save as HTML
htmlwidgets::saveWidget(accuracy_animation, "umpire_accuracy_animation.html",
                        selfcontained = TRUE)

# Python: Animated Umpire Accuracy Over Game Progression
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

def create_accuracy_animation(pitch_data):
    """Create animated plot of umpire accuracy over game progression"""

    # Calculate accuracy by inning for each umpire
    inning_accuracy = pitch_data.groupby(['umpire', 'inning']).agg({
        'correct_call': ['count', 'mean'],
        'called_strike': 'mean'
    }).reset_index()

    inning_accuracy.columns = ['umpire', 'inning', 'pitches', 'accuracy', 'strike_rate']

    # Calculate zone expansion (strikes called outside zone)
    zone_expansion = pitch_data[~pitch_data['in_zone']].groupby(['umpire', 'inning'])['called_strike'].mean()
    inning_accuracy = inning_accuracy.merge(
        zone_expansion.reset_index().rename(columns={'called_strike': 'zone_expansion'}),
        on=['umpire', 'inning'],
        how='left'
    )

    # Calculate pitch-weighted cumulative accuracy
    inning_accuracy = inning_accuracy.sort_values(['umpire', 'inning'])
    inning_accuracy['cumulative_pitches'] = inning_accuracy.groupby('umpire')['pitches'].cumsum()
    correct = inning_accuracy['accuracy'] * inning_accuracy['pitches']
    inning_accuracy['cumulative_accuracy'] = (
        correct.groupby(inning_accuracy['umpire']).cumsum()
        / inning_accuracy['cumulative_pitches']
    )

    # Create animated scatter plot
    fig = px.scatter(
        inning_accuracy,
        x='inning',
        y='accuracy',
        color='umpire',
        size='pitches',
        animation_frame='inning',
        animation_group='umpire',
        hover_data={
            'accuracy': ':.1%',
            'strike_rate': ':.1%',
            'pitches': True,
            'inning': True
        },
        range_x=[0.5, 9.5],
        range_y=[0.80, 1.00],
        labels={
            'inning': 'Inning',
            'accuracy': 'Accuracy Rate',
            'umpire': 'Umpire'
        },
        title="<b>Umpire Accuracy Progression Through Game</b>"
    )

    # Add trend lines for each umpire
    for umpire in inning_accuracy['umpire'].unique():
        ump_data = inning_accuracy[inning_accuracy['umpire'] == umpire]

        # Fit linear trend
        z = np.polyfit(ump_data['inning'], ump_data['accuracy'], 1)
        p = np.poly1d(z)
        trend_y = p(ump_data['inning'])

        fig.add_trace(go.Scatter(
            x=ump_data['inning'],
            y=trend_y,
            mode='lines',
            line=dict(dash='dash', width=1),
            name=f'{umpire} Trend',
            showlegend=True,
            hoverinfo='skip'
        ))

    # Update layout
    fig.update_layout(
        title=dict(
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        xaxis=dict(title="Inning"),
        yaxis=dict(title="Accuracy Rate", tickformat='.0%'),
        hovermode='closest',
        width=1000,
        height=600,
        paper_bgcolor='white'
    )

    # Update animation settings
    fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 500
    fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300

    return fig

# Generate sample game progression data
np.random.seed(123)
n_games = 50

game_ids = np.repeat(range(1, n_games + 1), 9 * 3)
innings = np.tile(np.repeat(range(1, 10), 3), n_games)
umpires = np.tile(['Angel Hernandez', 'Joe West', 'Pat Hoberg'], n_games * 9)

game_progression_data = pd.DataFrame({
    'game_id': game_ids,
    'inning': innings,
    'umpire': umpires
})

# Simulate pitches per inning
game_progression_data['pitches'] = np.random.poisson(15, len(game_progression_data))

# Base accuracy with fatigue effect
game_progression_data['base_accuracy'] = 0.92 - (game_progression_data['inning'] - 5) * 0.005

# Umpire-specific accuracy adjustments
umpire_adj = {
    'Pat Hoberg': 0.04,
    'Joe West': 0.00,
    'Angel Hernandez': -0.03
}
game_progression_data['umpire_accuracy_adj'] = game_progression_data['umpire'].map(umpire_adj)

# Random variation
game_progression_data['game_variation'] = np.random.normal(0, 0.02, len(game_progression_data))

# Final accuracy
game_progression_data['accuracy'] = np.clip(
    game_progression_data['base_accuracy'] +
    game_progression_data['umpire_accuracy_adj'] +
    game_progression_data['game_variation'],
    0.80, 0.99
)

# Expand to one row per pitch so the columns match what
# create_accuracy_animation() expects (boolean flags, not per-inning rates)
game_progression_data = game_progression_data.loc[
    game_progression_data.index.repeat(game_progression_data['pitches'])
].reset_index(drop=True)
n_rows = len(game_progression_data)

# Simulate pitch-level outcomes
game_progression_data['called_strike'] = np.random.binomial(1, 0.15, n_rows).astype(bool)
game_progression_data['correct_call'] = np.random.binomial(
    1, game_progression_data['accuracy']
).astype(bool)
# A correct call matches the true zone; an incorrect call contradicts it
game_progression_data['in_zone'] = np.where(
    game_progression_data['correct_call'],
    game_progression_data['called_strike'],
    ~game_progression_data['called_strike']
)

# Create animation
accuracy_animation = create_accuracy_animation(game_progression_data)
accuracy_animation.show()

# Save as HTML
accuracy_animation.write_html("umpire_accuracy_animation.html")
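
The animation shows drift visually; a pitch-weighted trend slope puts a number on it. The sketch below assumes a per-inning summary with `umpire`, `inning`, `pitches`, and `accuracy` columns, like the `inning_accuracy` frame built inside `create_accuracy_animation`. `inning_drift_slope` is a hypothetical helper, and the toy input simply mirrors the 0.005-per-inning decline built into the simulated data.

```python
import numpy as np
import pandas as pd

def inning_drift_slope(inning_summary):
    """Pitch-weighted linear slope of accuracy per inning, by umpire.

    Hypothetical helper: expects columns umpire, inning, pitches, accuracy.
    """
    slopes = {}
    for umpire, grp in inning_summary.groupby('umpire'):
        # np.polyfit weights multiply the unsquared residuals, so
        # sqrt(pitches) weights each inning in proportion to its sample size
        slope, _intercept = np.polyfit(
            grp['inning'], grp['accuracy'], 1, w=np.sqrt(grp['pitches'])
        )
        slopes[umpire] = slope
    return pd.Series(slopes, name='accuracy_change_per_inning')

# Toy input (assumption): umpire A declines 0.005 per inning, umpire B is flat
innings = np.arange(1, 10)
toy = pd.DataFrame({
    'umpire': ['A'] * 9 + ['B'] * 9,
    'inning': np.tile(innings, 2),
    'pitches': [30] * 18,
    'accuracy': np.concatenate([0.94 - 0.005 * (innings - 1), np.full(9, 0.92)]),
})

slopes = inning_drift_slope(toy)
print(slopes)
```

On real data, a slope should be read alongside its uncertainty, since nine inning-level points per umpire leave a linear fit with little power.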

These interactive visualizations offer three complementary views of umpire behavior. The overlay comparison enables direct evaluation of zone shape differences between umpires, the 3D probability surface shows how strike likelihood varies across the whole plate area, and the animated accuracy tracker shows inning-by-inning patterns that may reflect fatigue or recalibration. Deployed in broadcast systems, team analytics platforms, and public-facing websites, these tools give stakeholders a detailed view of the human element in baseball's most frequent decisions. As MLB continues to evaluate automated ball-strike systems, these visualizations provide context for understanding what is gained and lost by removing human judgment from the game.

Python

# Python: 3D Strike Probability Surface
import plotly.graph_objects as go
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
import pandas as pd
import numpy as np

def create_3d_strike_surface(pitch_data, umpire_name):
    """Create 3D surface plot of strike probability"""
    # Filter to specific umpire
    ump_data = pitch_data[pitch_data['umpire'] == umpire_name].copy()

    # Create grid for surface
    x_range = np.linspace(-1.5, 1.5, 30)
    z_range = np.linspace(1, 4, 40)
    X_grid, Z_grid = np.meshgrid(x_range, z_range)

    # Fit Gaussian Process model for smooth probability surface
    from sklearn.ensemble import GradientBoostingClassifier

    # Prepare training data
    X_train = ump_data[['plate_x', 'plate_z']].values
    y_train = ump_data['called_strike'].values

    # Train model
    model = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)

    # Predict on grid
    grid_points = np.column_stack([X_grid.ravel(), Z_grid.ravel()])
    strike_prob = model.predict_proba(grid_points)[:, 1]
    strike_prob_matrix = strike_prob.reshape(X_grid.shape)

    # Create 3D surface
    fig = go.Figure(data=[go.Surface(
        x=x_range,
        y=z_range,
        z=strike_prob_matrix,
        colorscale=[
            [0, 'rgb(220, 50, 50)'],      # Red for low probability
            [0.5, 'rgb(255, 255, 200)'],  # Yellow for moderate
            [1, 'rgb(50, 50, 220)']       # Blue for high probability
        ],
        colorbar=dict(title="Strike<br>Probability"),
        hovertemplate=(
            "Horizontal: %{x:.2f} ft<br>" +
            "Vertical: %{y:.2f} ft<br>" +
            "Strike Prob: %{z:.1%}<extra></extra>"
        )
    )])

    # Add a semi-transparent reference plane at the 50% probability level
    fig.add_trace(go.Surface(
        x=x_range,
        y=z_range,
        z=np.full_like(strike_prob_matrix, 0.5),
        opacity=0.3,
        colorscale=[[0, 'gray'], [1, 'gray']],
        showscale=False,
        hoverinfo='skip',
        name='50% Threshold'
    ))

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{umpire_name} - 3D Strike Probability Surface</b>",
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        scene=dict(
            xaxis=dict(title="Horizontal Location (ft)", range=[-1.5, 1.5]),
            yaxis=dict(title="Vertical Location (ft)", range=[1, 4]),
            zaxis=dict(title="Strike Probability", range=[0, 1]),
            camera=dict(
                eye=dict(x=1.5, y=-1.5, z=1.2)
            ),
            aspectmode="manual",
            aspectratio=dict(x=1, y=1.5, z=0.7)
        ),
        paper_bgcolor='white',
        width=900,
        height=700
    )

    return fig

# Create 3D surface for Joe West
surface_3d = create_3d_strike_surface(sample_umpire_data, 'Joe West')
surface_3d.show()

# Save as HTML
surface_3d.write_html("strike_surface_3d.html")

Python

# Python: Animated Umpire Accuracy Over Game Progression
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

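def create_accuracy_animation(pitch_data):
    """Create animated plot of umpire accuracy over game progression.

    Expects columns 'umpire', 'inning', 'correct_call', 'called_strike',
    and a boolean 'in_zone'. 'pitches' counts the rows in each
    umpire-inning group, so it is a true pitch count only for pitch-level input.
    """

    # Calculate accuracy by inning for each umpire
    # (named aggregation produces flat column names directly)
    inning_accuracy = pitch_data.groupby(['umpire', 'inning']).agg(
        pitches=('correct_call', 'count'),
        accuracy=('correct_call', 'mean'),
        strike_rate=('called_strike', 'mean')
    ).reset_index()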

    # Calculate zone expansion (strikes called outside zone)
    zone_expansion = pitch_data[~pitch_data['in_zone']].groupby(['umpire', 'inning'])['called_strike'].mean()
    inning_accuracy = inning_accuracy.merge(
        zone_expansion.reset_index().rename(columns={'called_strike': 'zone_expansion'}),
        on=['umpire', 'inning'],
        how='left'
    )

    # Calculate cumulative accuracy, weighting each inning by its pitch count
    inning_accuracy = inning_accuracy.sort_values(['umpire', 'inning'])
    inning_accuracy['cumulative_pitches'] = inning_accuracy.groupby('umpire')['pitches'].cumsum()
    correct_so_far = (
        (inning_accuracy['accuracy'] * inning_accuracy['pitches'])
        .groupby(inning_accuracy['umpire'])
        .cumsum()
    )
    inning_accuracy['cumulative_accuracy'] = correct_so_far / inning_accuracy['cumulative_pitches']

    # Create animated scatter plot
    fig = px.scatter(
        inning_accuracy,
        x='inning',
        y='accuracy',
        color='umpire',
        size='pitches',
        animation_frame='inning',
        animation_group='umpire',
        hover_data={
            'accuracy': ':.1%',
            'strike_rate': ':.1%',
            'pitches': True,
            'inning': True
        },
        range_x=[0.5, 9.5],
        range_y=[0.80, 1.00],
        labels={
            'inning': 'Inning',
            'accuracy': 'Accuracy Rate',
            'umpire': 'Umpire'
        },
        title="<b>Umpire Accuracy Progression Through Game</b>"
    )

    # Add trend lines for each umpire
    for umpire in inning_accuracy['umpire'].unique():
        ump_data = inning_accuracy[inning_accuracy['umpire'] == umpire]

        # Fit linear trend
        z = np.polyfit(ump_data['inning'], ump_data['accuracy'], 1)
        p = np.poly1d(z)
        trend_y = p(ump_data['inning'])

        fig.add_trace(go.Scatter(
            x=ump_data['inning'],
            y=trend_y,
            mode='lines',
            line=dict(dash='dash', width=1),
            name=f'{umpire} Trend',
            showlegend=True,
            hoverinfo='skip'
        ))

    # Update layout
    fig.update_layout(
        title=dict(
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        xaxis=dict(title="Inning"),
        yaxis=dict(title="Accuracy Rate", tickformat='.0%'),
        hovermode='closest',
        width=1000,
        height=600,
        paper_bgcolor='white'
    )

    # Update animation settings
    fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 500
    fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300

    return fig

# Generate sample game progression data
np.random.seed(123)
n_games = 50

game_ids = np.repeat(range(1, n_games + 1), 9 * 3)
innings = np.tile(np.repeat(range(1, 10), 3), n_games)
umpires = np.tile(['Angel Hernandez', 'Joe West', 'Pat Hoberg'], n_games * 9)

game_progression_data = pd.DataFrame({
    'game_id': game_ids,
    'inning': innings,
    'umpire': umpires
})

# Simulate pitches per inning (floor at 1 so the per-pitch rates below never divide by zero)
game_progression_data['pitches'] = np.maximum(
    np.random.poisson(15, len(game_progression_data)), 1
)

# Base accuracy with fatigue effect
game_progression_data['base_accuracy'] = 0.92 - (game_progression_data['inning'] - 5) * 0.005

# Umpire-specific accuracy adjustments
umpire_adj = {
    'Pat Hoberg': 0.04,
    'Joe West': 0.00,
    'Angel Hernandez': -0.03
}
game_progression_data['umpire_accuracy_adj'] = game_progression_data['umpire'].map(umpire_adj)

# Random variation
game_progression_data['game_variation'] = np.random.normal(0, 0.02, len(game_progression_data))

# Final accuracy
game_progression_data['accuracy'] = np.clip(
    game_progression_data['base_accuracy'] +
    game_progression_data['umpire_accuracy_adj'] +
    game_progression_data['game_variation'],
    0.80, 0.99
)

# Simulate other metrics (each row is one umpire-inning aggregate, not a single pitch)
game_progression_data['correct_calls'] = np.random.binomial(
    game_progression_data['pitches'],
    game_progression_data['accuracy']
)
called_strike_counts = np.random.binomial(
    game_progression_data['pitches'],
    0.15
)
# Row-level placeholder flag: about 85% of rows are treated as in-zone, so the
# out-of-zone filter in create_accuracy_animation has rows to work with
game_progression_data['in_zone'] = np.random.binomial(
    1, 0.85, len(game_progression_data)
).astype(bool)
# Convert counts to per-row rates, the form the plotting function averages
game_progression_data['correct_call'] = (
    game_progression_data['correct_calls'] / game_progression_data['pitches']
)
game_progression_data['called_strike'] = (
    called_strike_counts / game_progression_data['pitches']
)

# Create animation
accuracy_animation = create_accuracy_animation(game_progression_data)
accuracy_animation.show()

# Save as HTML
accuracy_animation.write_html("umpire_accuracy_animation.html")

21.8 Exercises

Exercise 21.1: Umpire Accuracy Analysis

Using pitch-level data from the 2024 season:

a) Calculate the overall accuracy rate for each umpire (minimum 1,000 called pitches)
b) Identify the five most accurate and five least accurate umpires
c) Create a visualization comparing each umpire's accuracy on pitches inside vs. outside the strike zone
d) Test whether there is a statistically significant difference in accuracy between the most and least accurate umpires

Hint: Use a two-sample t-test or permutation test to assess statistical significance. Consider whether accuracy rates are normally distributed.
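
The permutation test mentioned in the hint can be sketched as follows. This is a minimal illustration, assuming each umpire's calls are available as an array of 0/1 correct-call indicators. The data are simulated, and the function name permutation_test_accuracy is hypothetical rather than something defined earlier in the chapter.

```python
import numpy as np

def permutation_test_accuracy(calls_a, calls_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in accuracy rates.

    calls_a, calls_b: 1-D arrays of 0/1 correct-call indicators (assumed format).
    Returns (observed_difference, p_value).
    """
    rng = np.random.default_rng(seed)
    calls_a = np.asarray(calls_a, dtype=float)
    calls_b = np.asarray(calls_b, dtype=float)
    observed = calls_a.mean() - calls_b.mean()

    # Under the null hypothesis the umpire labels are exchangeable,
    # so shuffle the pooled calls and re-split them many times
    pooled = np.concatenate([calls_a, calls_b])
    n_a = len(calls_a)
    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Simulated example: two umpires with different true accuracy rates
rng = np.random.default_rng(42)
umpire_a = rng.binomial(1, 0.95, size=3000)
umpire_b = rng.binomial(1, 0.90, size=3000)

observed_diff, p_value = permutation_test_accuracy(umpire_a, umpire_b)
print(f"Observed difference: {observed_diff:.3f}, p-value: {p_value:.4f}")
```

Because the test works directly on the binary call outcomes, it does not require accuracy rates to be normally distributed, which is the concern the hint raises about the t-test.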

Exercise 21.2: Strike Zone Visualization

For a specific umpire of your choice:

a) Create a heat map showing the probability of a called strike at different locations
b) Overlay the rulebook strike zone on your visualization
c) Identify regions where the umpire's zone significantly differs from the rulebook (>20 percentage points)
d) Create a similar visualization for the league average and place them side-by-side for comparison

Hint: Use 2D binning or kernel density estimation to estimate the probability surface. Both stat_summary_2d() in ggplot2 and scipy.stats.binned_statistic_2d() in Python are helpful.
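
As a rough sketch of the 2D binning idea, the example below computes an empirical called-strike rate per location bin using only NumPy (np.histogram2d), as a stand-in for the binned-mean functions named in the hint. The pitch data are simulated, and the zone boundaries used to generate them (0.83 ft half-width, 1.5 to 3.5 ft vertically) are illustrative assumptions, as is the function name binned_strike_probability.

```python
import numpy as np

def binned_strike_probability(plate_x, plate_z, called_strike,
                              x_edges, z_edges, min_pitches=5):
    """Empirical called-strike rate in each (x, z) bin.

    Bins with fewer than min_pitches observations are returned as NaN.
    Output shape is (len(x_edges) - 1, len(z_edges) - 1).
    """
    # Total called pitches per bin
    counts, _, _ = np.histogram2d(plate_x, plate_z, bins=[x_edges, z_edges])
    # Called strikes per bin (weights sum the 0/1 strike indicator)
    strikes, _, _ = np.histogram2d(plate_x, plate_z, bins=[x_edges, z_edges],
                                   weights=called_strike)
    with np.errstate(divide="ignore", invalid="ignore"):
        prob = strikes / counts
    prob[counts < min_pitches] = np.nan
    return prob

# Simulated called pitches (assumed zone: |x| <= 0.83 ft, 1.5 ft <= z <= 3.5 ft)
rng = np.random.default_rng(7)
n = 20000
plate_x = rng.uniform(-1.5, 1.5, n)
plate_z = rng.uniform(1.0, 4.0, n)
in_zone = (np.abs(plate_x) <= 0.83) & (plate_z >= 1.5) & (plate_z <= 3.5)
called_strike = rng.binomial(1, np.where(in_zone, 0.9, 0.1))

x_edges = np.linspace(-1.5, 1.5, 13)
z_edges = np.linspace(1.0, 4.0, 13)
ump_prob = binned_strike_probability(plate_x, plate_z, called_strike,
                                     x_edges, z_edges)
print(ump_prob.shape)
```

The resulting grid can be passed to a heat map function. Computing the same grid for league-wide data with identical bin edges makes the side-by-side comparison in part (d) straightforward.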

Exercise 21.3: Predicting Called Strikes

Build and compare predictive models for called strikes:

a) Train a logistic regression model using pitch location, count, batter handedness, and pitcher handedness as features
b) Train a random forest model with the same features
c) Add umpire identity as a feature to both models (use one-hot encoding)
d) Compare the models using AUC, accuracy, and calibration plots
e) Identify which features are most important in each model
f) Use the best model to identify the 10 most surprising calls from the 2024 season (largest difference between predicted probability and actual call)

Hint: Feature importance can be extracted from logistic regression coefficients and random forest's feature_importances_ attribute. For surprising calls, look for high-probability strikes called balls and vice versa.
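
One way to rank surprising calls, once a model has produced strike probabilities, is sketched below. It assumes two aligned arrays: predicted strike probability and the actual call (1 for strike, 0 for ball). The values are made up for illustration, and most_surprising_calls is a hypothetical helper name.

```python
import numpy as np

def most_surprising_calls(pred_strike_prob, called_strike, k=10):
    """Return indices and scores of the k calls that disagree most with the model.

    Surprise is |actual - predicted|: close to 1 when a near-certain strike
    is called a ball, or a near-certain ball is called a strike.
    """
    pred = np.asarray(pred_strike_prob, dtype=float)
    actual = np.asarray(called_strike, dtype=float)
    surprise = np.abs(actual - pred)
    # Sort descending by surprise; a stable sort keeps ties in original order
    order = np.argsort(-surprise, kind="stable")[:k]
    return order, surprise[order]

# Hypothetical model output for six called pitches
pred = np.array([0.98, 0.50, 0.03, 0.90, 0.10, 0.60])
actual = np.array([0, 1, 1, 1, 0, 0])

idx, scores = most_surprising_calls(pred, actual, k=3)
print(idx.tolist())  # [0, 2, 5]
```

Here pitch 0 (a 98% strike called a ball) and pitch 2 (a 3% strike called a strike) rank highest. With real data, the returned indices would be used to look up the game, count, and umpire for each call.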

Exercise 21.4: ABS Impact Simulation

Simulate the impact of implementing full ABS:

a) For each pitch in your dataset, determine whether the human umpire's call matches what ABS would call
b) Calculate the overall agreement rate and identify systematic biases (e.g., do human umpires call more strikes or fewer strikes than ABS?)
c) Estimate how strikeout rates and walk rates would change under full ABS (focus on called pitches in two-strike counts for strikeouts and in three-ball counts for walks)
d) Calculate the expected number of calls that would be overturned per game
e) Analyze whether certain types of pitchers (high strikeout, high walk, etc.) would be helped or hurt more by ABS

Hint: You'll need to define the ABS zone precisely using the sz_top and sz_bot variables. Consider grouping pitchers by strikeout and walk rates to assess differential impacts.
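
A starting point for parts (a) and (b) might look like the sketch below. It assumes a simplified rectangular ABS zone bounded by the plate edges horizontally and by each batter's sz_bot and sz_top vertically. The actual ABS zone definition may differ (for example in how ball radius or batter height is handled), and the function names and sample pitches are hypothetical.

```python
import numpy as np

# Assumption: 17-inch plate, converted to feet, ignoring ball radius
PLATE_HALF_WIDTH_FT = 17 / 12 / 2

def abs_call(plate_x, plate_z, sz_top, sz_bot, half_width=PLATE_HALF_WIDTH_FT):
    """Return 1 where a simplified rectangular ABS zone would call a strike."""
    plate_x = np.asarray(plate_x, dtype=float)
    plate_z = np.asarray(plate_z, dtype=float)
    inside = (
        (np.abs(plate_x) <= half_width)
        & (plate_z >= np.asarray(sz_bot, dtype=float))
        & (plate_z <= np.asarray(sz_top, dtype=float))
    )
    return inside.astype(int)

def agreement_summary(human_call, abs_strike):
    """Compare human calls (1 = strike) against the simplified ABS calls."""
    human = np.asarray(human_call)
    robot = np.asarray(abs_strike)
    return {
        "agreement_rate": float(np.mean(human == robot)),
        # Human called a strike where ABS would call a ball
        "extra_strikes": int(np.sum((human == 1) & (robot == 0))),
        # Human called a ball where ABS would call a strike
        "missed_strikes": int(np.sum((human == 0) & (robot == 1))),
    }

# Five made-up called pitches (locations in feet)
plate_x = np.array([0.0, 0.9, -0.5, 0.2, -0.5])
plate_z = np.array([2.5, 2.5, 3.8, 1.4, 2.0])
sz_top = np.full(5, 3.5)
sz_bot = np.full(5, 1.5)
human = np.array([1, 1, 0, 1, 0])

abs_strike = abs_call(plate_x, plate_z, sz_top, sz_bot)
summary = agreement_summary(human, abs_strike)
print(summary)
```

Comparing extra_strikes with missed_strikes gives a first look at the direction of any systematic bias. Dividing the total number of disagreements by the number of games gives the per-game overturn estimate asked for in part (d).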

This chapter has covered the fundamentals of umpire analysis and strike zone modeling, from defining accuracy metrics to building predictive models and evaluating the potential impact of automated systems. As MLB continues to consider the role of technology in officiating, these analytical tools will remain essential for understanding how umpires influence the game and how changes to ball-strike calling might affect gameplay and strategy. The combination of granular pitch-tracking data and sophisticated statistical modeling allows us to evaluate umpire performance with unprecedented precision while also informing important decisions about the future of the sport.
