The Human Element in Baseball
Home plate umpires make approximately 150-180 ball-strike decisions per game, totaling over 350,000 calls per MLB season. Each of these decisions can influence at-bats, innings, and ultimately game results. Unlike many other sports, where technology has largely replaced human judgment on close calls, baseball has kept the umpire as the final arbiter of balls and strikes, at least until the recent introduction of the Automated Ball-Strike (ABS) system in the minor leagues.
The importance of umpire analysis extends to multiple stakeholders:
For Teams and Players:
- Understanding which umpires expand or contract the strike zone
- Adjusting game strategy based on the umpire assignment
- Training hitters to protect against umpires with larger zones
- Pitcher preparation and pitch selection optimization
For Broadcasters and Fans:
- Contextualizing controversial calls within an umpire's historical tendencies
- Evaluating umpire consistency and accuracy
- Enriching game narratives with umpire-specific insights
For League Officials:
- Assessing umpire performance and providing feedback
- Ensuring competitive balance through consistent strike zone enforcement
- Making informed decisions about rule changes and technology adoption
Historical Context and Technology Evolution
Before the PITCHf/x era (the system was introduced in 2007), umpire analysis was largely anecdotal. Individual umpires developed reputations among scouts and players, but systematic quantitative assessment was not practical. The introduction of pitch tracking technology changed how umpire performance can be evaluated:
- PITCHf/x (2007-2016): Camera-based system that tracked pitch location and trajectory
- Statcast (2015-present): Launched as a radar and camera fusion system providing more precise measurements
- Hawk-Eye (2020-present): Optical camera system that replaced TrackMan radar as Statcast's tracking technology; the current MLB standard
These systems allow us to compare each called ball or strike against the rulebook strike zone, creating objective measures of umpire accuracy and consistency.
Key Metrics in Umpire Analysis
Several metrics have emerged as standards for evaluating umpire performance:
Accuracy Metrics:
- Overall accuracy rate: Percentage of calls that match the rulebook zone
- Called strike accuracy (CSA): Accuracy on pitches called strikes
- Called ball accuracy (CBA): Accuracy on pitches called balls
- Edge consistency: Performance on borderline pitches (within 1-2 inches of zone boundary)
Impact Metrics:
- Runs Above Average (RAA): Run value of incorrect calls
- Win Probability Added (WPA): Impact of calls on win probability
- Favor metrics: Whether an umpire's calls systematically benefit one team
Descriptive Metrics:
- Strike zone expansion/contraction: How the umpire's zone differs from the rulebook
- Zone shape characteristics: Width, height, and asymmetries in the enforced zone
- Context sensitivity: How the zone changes with count, score, inning, etc.
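To make the accuracy definitions concrete, here is a minimal sketch on ten hand-made calls. The arrays are illustrative values rather than real data, and the helper name accuracy_metrics is an assumption introduced only for this example. Note that CSA and CBA, as defined above, condition on the umpire's call, which is not the same as conditioning on where the pitch actually was.

```python
import numpy as np

def accuracy_metrics(called_strike, in_zone):
    """Overall accuracy, called strike accuracy (CSA), called ball accuracy (CBA)."""
    called_strike = np.asarray(called_strike, dtype=bool)
    in_zone = np.asarray(in_zone, dtype=bool)
    correct = called_strike == in_zone  # call agrees with the rulebook zone
    return {
        "overall": correct.mean(),
        "csa": correct[called_strike].mean(),   # of pitches called strikes
        "cba": correct[~called_strike].mean(),  # of pitches called balls
    }

# Illustrative calls: 1 = called strike / in zone, 0 = called ball / out of zone
called_strike = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
in_zone       = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = accuracy_metrics(called_strike, in_zone)
print(metrics)
```

In this toy sample, seven of ten calls match the zone, three of the four called strikes were in the zone, and four of the six called balls were outside it.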
Let's begin our analysis by loading and exploring umpire and pitch data:
# R: Loading and exploring umpire data
library(tidyverse)
library(baseballr)
library(mgcv)          # For GAM models
library(randomForest)
library(ggplot2)
library(gridExtra)

# Load pitch data with umpire information
# In practice, this would come from Statcast or PITCHf/x data
load_pitch_data <- function(season = 2024) {
  # This is a placeholder - in practice, use baseballr or similar
  # pitch_data <- statcast_search(start_date = "2024-04-01",
  #                               end_date = "2024-10-01")
  # For demonstration, we'll create a sample dataset structure
  # (the simulated data ignores the season argument)
  set.seed(42)
  n_pitches <- 100000
  data.frame(
    game_date = sample(seq.Date(as.Date("2024-04-01"),
                                as.Date("2024-09-30"), by = "day"),
                       n_pitches, replace = TRUE),
    umpire = sample(paste("Umpire", 1:20), n_pitches, replace = TRUE),
    pitcher = sample(paste("Pitcher", 1:100), n_pitches, replace = TRUE),
    batter = sample(paste("Batter", 1:100), n_pitches, replace = TRUE),
    plate_x = rnorm(n_pitches, 0, 0.8),    # Horizontal location (feet)
    plate_z = rnorm(n_pitches, 2.5, 0.8),  # Vertical location (feet)
    sz_top = rnorm(n_pitches, 3.4, 0.15),  # Top of strike zone
    sz_bot = rnorm(n_pitches, 1.5, 0.1),   # Bottom of strike zone
    balls = sample(0:3, n_pitches, replace = TRUE),
    strikes = sample(0:2, n_pitches, replace = TRUE),
    outs = sample(0:2, n_pitches, replace = TRUE),
    pitch_type = sample(c("FF", "SI", "SL", "CH", "CU"), n_pitches, replace = TRUE),
    stand = sample(c("L", "R"), n_pitches, replace = TRUE),
    p_throws = sample(c("L", "R"), n_pitches, replace = TRUE),
    description = sample(c("called_strike", "ball", "hit_into_play", "foul",
                           "swinging_strike"), n_pitches, replace = TRUE,
                         prob = c(0.15, 0.35, 0.20, 0.20, 0.10))
  )
}

# Load data
pitch_data <- load_pitch_data(2024)

# Filter to called pitches only
called_pitches <- pitch_data %>%
  filter(description %in% c("called_strike", "ball")) %>%
  mutate(
    called_strike = as.numeric(description == "called_strike"),
    # Distance from center of zone
    dist_from_center = sqrt(plate_x^2 + (plate_z - (sz_top + sz_bot) / 2)^2),
    # Normalized vertical position (0 = bottom, 1 = top)
    norm_z = (plate_z - sz_bot) / (sz_top - sz_bot)
  )

# Summary statistics by umpire
umpire_summary <- called_pitches %>%
  group_by(umpire) %>%
  summarise(
    pitches_called = n(),
    strike_rate = mean(called_strike),
    avg_abs_plate_x = mean(abs(plate_x)),  # Mean absolute horizontal location
    avg_plate_z = mean(plate_z),
    # SD of a 0/1 outcome; determined largely by strike_rate itself
    consistency = sd(called_strike)
  ) %>%
  arrange(desc(pitches_called))

print(head(umpire_summary, 10))
# Python: Loading and exploring umpire data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)

def load_pitch_data(season=2024):
    """
    Load pitch data with umpire information
    In practice, use pybaseball or MLB Stats API
    """
    np.random.seed(42)
    n_pitches = 100000
    # Generate sample data
    data = pd.DataFrame({
        'game_date': pd.date_range('2024-04-01', '2024-09-30',
                                   periods=n_pitches),
        'umpire': np.random.choice([f'Umpire_{i}' for i in range(1, 21)],
                                   n_pitches),
        'pitcher': np.random.choice([f'Pitcher_{i}' for i in range(1, 101)],
                                    n_pitches),
        'batter': np.random.choice([f'Batter_{i}' for i in range(1, 101)],
                                   n_pitches),
        'plate_x': np.random.normal(0, 0.8, n_pitches),
        'plate_z': np.random.normal(2.5, 0.8, n_pitches),
        'sz_top': np.random.normal(3.4, 0.15, n_pitches),
        'sz_bot': np.random.normal(1.5, 0.1, n_pitches),
        'balls': np.random.choice([0, 1, 2, 3], n_pitches),
        'strikes': np.random.choice([0, 1, 2], n_pitches),
        'outs': np.random.choice([0, 1, 2], n_pitches),
        'pitch_type': np.random.choice(['FF', 'SI', 'SL', 'CH', 'CU'], n_pitches),
        'stand': np.random.choice(['L', 'R'], n_pitches),
        'p_throws': np.random.choice(['L', 'R'], n_pitches),
        'description': np.random.choice(
            ['called_strike', 'ball', 'hit_into_play', 'foul', 'swinging_strike'],
            n_pitches,
            p=[0.15, 0.35, 0.20, 0.20, 0.10]
        )
    })
    return data

# Load data
pitch_data = load_pitch_data(2024)

# Filter to called pitches only
called_pitches = pitch_data[
    pitch_data['description'].isin(['called_strike', 'ball'])
].copy()
called_pitches['called_strike'] = (
    called_pitches['description'] == 'called_strike'
).astype(int)

# Distance from center of zone
called_pitches['dist_from_center'] = np.sqrt(
    called_pitches['plate_x']**2 +
    (called_pitches['plate_z'] -
     (called_pitches['sz_top'] + called_pitches['sz_bot'])/2)**2
)

# Normalized vertical position
called_pitches['norm_z'] = (
    (called_pitches['plate_z'] - called_pitches['sz_bot']) /
    (called_pitches['sz_top'] - called_pitches['sz_bot'])
)

# Summary statistics by umpire
umpire_summary = called_pitches.groupby('umpire').agg({
    'called_strike': ['count', 'mean', 'std'],
    'plate_x': lambda x: np.mean(np.abs(x)),
    'plate_z': 'mean'
}).round(4)
umpire_summary.columns = ['pitches_called', 'strike_rate', 'consistency',
                          'avg_abs_plate_x', 'avg_plate_z']
print(umpire_summary.sort_values('pitches_called', ascending=False).head(10))
The code above demonstrates how to load and prepare pitch data for umpire analysis using a simulated dataset. In practice, you would use actual Statcast data from Baseball Savant, obtained directly or through the baseballr package (R) or the pybaseball library (Python).
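As a rough sketch of what the real-data path might look like in Python, the snippet below wraps pybaseball's statcast function and applies the same called-pitch filter to whatever it returns. The helper names fetch_statcast and prepare_called_pitches are assumptions for illustration, and the small demo frame is made up. It is also an assumption worth verifying that the download carries usable umpire names; if it does not, home plate assignments would need to be merged in from another source.

```python
import numpy as np
import pandas as pd

CALLED_DESCRIPTIONS = ["called_strike", "ball"]
REQUIRED_COLUMNS = ["plate_x", "plate_z", "sz_top", "sz_bot"]

def fetch_statcast(start_dt, end_dt):
    """Download pitch-level data (needs pybaseball and network access)."""
    from pybaseball import statcast  # imported lazily so the rest runs offline
    return statcast(start_dt=start_dt, end_dt=end_dt)

def prepare_called_pitches(raw):
    """Keep called balls/strikes with complete location and zone data."""
    called = raw[raw["description"].isin(CALLED_DESCRIPTIONS)].copy()
    called = called.dropna(subset=REQUIRED_COLUMNS)
    called["called_strike"] = (called["description"] == "called_strike").astype(int)
    return called.reset_index(drop=True)

# Offline demonstration on a tiny made-up frame with Statcast-style columns
demo = pd.DataFrame({
    "description": ["called_strike", "ball", "foul", "ball"],
    "plate_x": [0.1, 1.2, 0.0, np.nan],
    "plate_z": [2.4, 2.0, 2.5, 3.0],
    "sz_top": [3.4, 3.5, 3.3, 3.4],
    "sz_bot": [1.5, 1.6, 1.5, 1.5],
})
called_demo = prepare_called_pitches(demo)
print(called_demo[["description", "called_strike"]])

# With network access, something like:
# called_pitches = prepare_called_pitches(fetch_statcast("2024-04-01", "2024-04-07"))
```

Dropping rows with missing locations before any zone comparison avoids silently counting untracked pitches as outside the zone.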
The Rulebook Strike Zone
According to MLB's Official Baseball Rules, the strike zone is defined as:
"That area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball."
This definition creates several measurement challenges:
- Individual variation: Each batter has a unique strike zone based on their height and stance
- Dynamic nature: The zone is determined by the batting stance, which may vary
- Ambiguous boundaries: "Midpoint between shoulders and pants" is subjective
- Horizontal boundaries: The 17-inch home plate width is clear, but pitch location is three-dimensional
Operational Strike Zone Definition
For analytical purposes, we typically define the strike zone using Statcast's sz_top and sz_bot variables, which are calculated for each batter based on their physical dimensions. The horizontal boundaries are typically:
- Left edge: -0.708 feet (-8.5 inches) from the center of home plate
- Right edge: +0.708 feet (+8.5 inches) from the center of home plate
- Width: 1.417 feet (17 inches)
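The arithmetic behind these edges is simple enough to check directly. The sketch below also shows one optional adjustment some analysts make: because a pitch counts as a strike if any part of the ball crosses the zone, the edges can be padded by the ball's radius. The radius value of 1.45 inches is an approximation, and the function name zone_edges_ft is hypothetical. The rest of this chapter uses the unpadded plate edges.

```python
PLATE_WIDTH_IN = 17.0
BALL_RADIUS_IN = 1.45  # assumption: approximate radius of a baseball, in inches

def zone_edges_ft(ball_radius_in=0.0):
    """Return (left, right) horizontal zone edges in feet from plate center."""
    half_width_ft = (PLATE_WIDTH_IN / 2 + ball_radius_in) / 12
    return -half_width_ft, half_width_ft

left, right = zone_edges_ft()                        # plate edges only
left_pad, right_pad = zone_edges_ft(BALL_RADIUS_IN)  # padded by ball radius

print(f"Plate edges:  {left:.3f} ft to {right:.3f} ft")
print(f"Padded edges: {left_pad:.3f} ft to {right_pad:.3f} ft")
```

Whichever convention is chosen, it should be applied consistently, since padding the zone reclassifies many borderline pitches and therefore shifts every accuracy metric.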
Let's create functions to determine whether a pitch is in the rulebook zone:
# R: Strike zone classification functions
# Check if pitch is in rulebook strike zone
in_strike_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
  # Horizontal boundaries (in feet)
  left_edge <- -0.708
  right_edge <- 0.708
  in_horizontal <- (plate_x >= left_edge) & (plate_x <= right_edge)
  in_vertical <- (plate_z >= sz_bot) & (plate_z <= sz_top)
  return(in_horizontal & in_vertical)
}

# Calculate distance to nearest strike zone edge (0 for pitches inside the zone)
distance_to_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
  left_edge <- -0.708
  right_edge <- 0.708
  # Horizontal distance
  dx <- pmax(0, pmax(left_edge - plate_x, plate_x - right_edge))
  # Vertical distance
  dz <- pmax(0, pmax(sz_bot - plate_z, plate_z - sz_top))
  # Euclidean distance to zone
  return(sqrt(dx^2 + dz^2))
}

# Apply to our data
called_pitches <- called_pitches %>%
  mutate(
    in_zone = in_strike_zone(plate_x, plate_z, sz_top, sz_bot),
    dist_to_zone = distance_to_zone(plate_x, plate_z, sz_top, sz_bot),
    # Classify pitch location
    location_type = case_when(
      in_zone ~ "In Zone",
      dist_to_zone <= 0.25 ~ "Edge (0-3in)",
      dist_to_zone <= 0.5 ~ "Near (3-6in)",
      TRUE ~ "Outside (6in+)"
    ),
    # Store correctness on the data frame so later per-umpire metrics can use it
    correct_call = (in_zone & called_strike == 1) |
      (!in_zone & called_strike == 0)
  )

# Accuracy analysis
accuracy_by_location <- called_pitches %>%
  group_by(location_type) %>%
  summarise(
    n_pitches = n(),
    accuracy = mean(correct_call),
    strike_rate = mean(called_strike),
    expected_strike_rate = mean(in_zone)
  )

print(accuracy_by_location)
# Python: Strike zone classification functions
def in_strike_zone(plate_x, plate_z, sz_top, sz_bot):
    """Check if pitch is in rulebook strike zone"""
    left_edge = -0.708
    right_edge = 0.708
    in_horizontal = (plate_x >= left_edge) & (plate_x <= right_edge)
    in_vertical = (plate_z >= sz_bot) & (plate_z <= sz_top)
    return in_horizontal & in_vertical

def distance_to_zone(plate_x, plate_z, sz_top, sz_bot):
    """Calculate distance to nearest strike zone edge (0 inside the zone)"""
    left_edge = -0.708
    right_edge = 0.708
    # Horizontal distance
    dx = np.maximum(0, np.maximum(left_edge - plate_x, plate_x - right_edge))
    # Vertical distance
    dz = np.maximum(0, np.maximum(sz_bot - plate_z, plate_z - sz_top))
    # Euclidean distance to zone
    return np.sqrt(dx**2 + dz**2)

# Apply to our data
called_pitches['in_zone'] = in_strike_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)
called_pitches['dist_to_zone'] = distance_to_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)

# Classify pitch location
def classify_location(dist_to_zone, in_zone):
    if in_zone:
        return "In Zone"
    elif dist_to_zone <= 0.25:
        return "Edge (0-3in)"
    elif dist_to_zone <= 0.5:
        return "Near (3-6in)"
    else:
        return "Outside (6in+)"

called_pitches['location_type'] = called_pitches.apply(
    lambda row: classify_location(row['dist_to_zone'], row['in_zone']),
    axis=1
)

# Accuracy analysis
called_pitches['correct_call'] = (
    (called_pitches['in_zone'] & (called_pitches['called_strike'] == 1)) |
    (~called_pitches['in_zone'] & (called_pitches['called_strike'] == 0))
)
accuracy_by_location = called_pitches.groupby('location_type').agg({
    'called_strike': ['count', 'mean'],
    'in_zone': 'mean',
    'correct_call': 'mean'
}).round(4)
accuracy_by_location.columns = ['n_pitches', 'strike_rate',
                                'expected_strike_rate', 'accuracy']
print(accuracy_by_location)
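Before trusting these functions on a full season, it can help to run them on a few pitches whose answers are easy to work out by hand. The sketch below repeats the two functions so it runs on its own, and classifies locations with np.select as a vectorized alternative to a row-wise apply. The four pitches and the 1.5 to 3.5 ft zone are made-up values chosen for easy arithmetic.

```python
import numpy as np

LEFT_EDGE, RIGHT_EDGE = -0.708, 0.708  # plate edges in feet

def in_strike_zone(plate_x, plate_z, sz_top, sz_bot):
    """True where the pitch center is inside the rulebook rectangle."""
    return ((plate_x >= LEFT_EDGE) & (plate_x <= RIGHT_EDGE) &
            (plate_z >= sz_bot) & (plate_z <= sz_top))

def distance_to_zone(plate_x, plate_z, sz_top, sz_bot):
    """Euclidean distance (feet) to the zone; 0 for pitches inside it."""
    dx = np.maximum(0, np.maximum(LEFT_EDGE - plate_x, plate_x - RIGHT_EDGE))
    dz = np.maximum(0, np.maximum(sz_bot - plate_z, plate_z - sz_top))
    return np.sqrt(dx**2 + dz**2)

# Four made-up pitches: middle, just off the edge, off the corner, far outside
plate_x = np.array([0.0, 0.9, 1.008, -1.5])
plate_z = np.array([2.5, 2.5, 3.8, 2.5])
sz_top = np.full(4, 3.5)
sz_bot = np.full(4, 1.5)

inside = in_strike_zone(plate_x, plate_z, sz_top, sz_bot)
dist = distance_to_zone(plate_x, plate_z, sz_top, sz_bot)

# np.select takes the first matching condition, mirroring the if/elif chain
labels = np.select(
    [inside, dist <= 0.25, dist <= 0.5],
    ["In Zone", "Edge (0-3in)", "Near (3-6in)"],
    default="Outside (6in+)",
)
for x, z, d, lab in zip(plate_x, plate_z, dist, labels):
    print(f"x={x:+.3f} z={z:.1f} dist={d:.3f} ft -> {lab}")
```

The corner pitch is the useful case: it misses by 0.3 ft both horizontally and vertically, so its distance is the diagonal (about 0.424 ft) rather than 0.3 ft.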
Visualizing the Strike Zone
Creating effective visualizations of the strike zone is crucial for understanding umpire tendencies. Let's create several visualization functions:
# R: Strike zone visualization functions
# Basic strike zone plot
plot_strike_zone <- function(data, title = "Strike Zone") {
  ggplot(data, aes(x = plate_x, y = plate_z)) +
    # Add strike zone box (using average sz_top and sz_bot);
    # annotate() draws it once rather than once per row of data
    annotate("rect", xmin = -0.708, xmax = 0.708,
             ymin = mean(data$sz_bot), ymax = mean(data$sz_top),
             fill = NA, color = "black", linewidth = 1) +
    # Add home plate
    annotate("segment", x = -0.708, xend = 0.708, y = 0, yend = 0,
             color = "black", linewidth = 1.5) +
    coord_fixed(ratio = 1) +
    labs(title = title, x = "Horizontal Location (ft)",
         y = "Vertical Location (ft)") +
    theme_minimal()
}

# Heat map of called strike probability
plot_strike_probability_heatmap <- function(data, umpire_name = NULL) {
  if (!is.null(umpire_name)) {
    data <- data %>% filter(umpire == umpire_name)
    title <- paste("Called Strike Probability -", umpire_name)
  } else {
    title <- "Called Strike Probability - All Umpires"
  }
  ggplot(data, aes(x = plate_x, y = plate_z, z = called_strike)) +
    # Mean of the 0/1 outcome in each bin = empirical strike probability
    stat_summary_2d(fun = mean, bins = 20) +
    annotate("rect", xmin = -0.708, xmax = 0.708,
             ymin = mean(data$sz_bot), ymax = mean(data$sz_top),
             fill = NA, color = "white", linewidth = 1.2) +
    scale_fill_gradient2(low = "blue", mid = "yellow", high = "red",
                         midpoint = 0.5, name = "Strike\nProbability") +
    coord_fixed(ratio = 1) +
    labs(title = title, x = "Horizontal Location (ft)",
         y = "Vertical Location (ft)") +
    theme_minimal() +
    theme(legend.position = "right")
}

# Compare umpire to league average
plot_umpire_comparison <- function(data, umpire_name) {
  umpire_data <- data %>% filter(umpire == umpire_name)
  league_data <- data
  p1 <- plot_strike_probability_heatmap(league_data, NULL)
  p2 <- plot_strike_probability_heatmap(umpire_data, umpire_name)
  grid.arrange(p1, p2, ncol = 2)
}

# Example: Plot for a specific umpire
plot_strike_probability_heatmap(called_pitches, "Umpire 1")
# Python: Strike zone visualization functions
def plot_strike_zone_base(ax, sz_top_avg, sz_bot_avg):
    """Add strike zone rectangle to plot"""
    from matplotlib.patches import Rectangle
    zone = Rectangle((-0.708, sz_bot_avg), 1.416, sz_top_avg - sz_bot_avg,
                     fill=False, edgecolor='white', linewidth=2)
    ax.add_patch(zone)
    # Add home plate
    ax.plot([-0.708, 0.708], [0, 0], 'k-', linewidth=2)
    ax.set_aspect('equal')
    ax.set_xlabel('Horizontal Location (ft)', fontsize=12)
    ax.set_ylabel('Vertical Location (ft)', fontsize=12)

def plot_strike_probability_heatmap(data, umpire_name=None, ax=None):
    """Create heatmap of called strike probability"""
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))
    if umpire_name:
        plot_data = data[data['umpire'] == umpire_name].copy()
        title = f"Called Strike Probability - {umpire_name}"
    else:
        plot_data = data.copy()
        title = "Called Strike Probability - All Umpires"

    # Create 2D histogram (mean of the 0/1 outcome in each bin)
    from scipy.stats import binned_statistic_2d
    x = plot_data['plate_x']
    y = plot_data['plate_z']
    values = plot_data['called_strike']
    statistic, x_edges, y_edges, _ = binned_statistic_2d(
        x, y, values, statistic='mean', bins=20,
        range=[[-2, 2], [0, 5]]
    )

    # Plot heatmap (transpose so rows map to the vertical axis)
    im = ax.imshow(statistic.T, origin='lower', aspect='auto',
                   extent=[-2, 2, 0, 5], cmap='RdYlBu_r',
                   vmin=0, vmax=1, alpha=0.8)

    # Add strike zone
    plot_strike_zone_base(ax, plot_data['sz_top'].mean(),
                          plot_data['sz_bot'].mean())
    ax.set_xlim(-2, 2)
    ax.set_ylim(0, 5)
    ax.set_title(title, fontsize=14, fontweight='bold')

    # Add colorbar
    plt.colorbar(im, ax=ax, label='Strike Probability')
    return ax

def plot_umpire_comparison(data, umpire_name):
    """Compare umpire to league average"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))
    plot_strike_probability_heatmap(data, None, ax1)
    plot_strike_probability_heatmap(data, umpire_name, ax2)
    plt.tight_layout()
    return fig

# Example: Plot for a specific umpire
fig = plot_umpire_comparison(called_pitches, 'Umpire_1')
plt.savefig('umpire_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
Accuracy Metrics
Once we've defined the strike zone, we can calculate various accuracy metrics for umpires:
# R: Calculate comprehensive umpire accuracy metrics
# Expects in_zone, dist_to_zone, and correct_call columns on the input data
calculate_umpire_metrics <- function(data) {
  data %>%
    group_by(umpire) %>%
    summarise(
      # Volume
      total_calls = n(),
      # Overall accuracy
      accuracy = mean(correct_call),
      # Accuracy by true zone
      accuracy_in_zone = mean(correct_call[in_zone]),
      accuracy_out_zone = mean(correct_call[!in_zone]),
      # Called strike rate by true zone
      called_strike_in_zone = mean(called_strike[in_zone]),
      called_strike_out_zone = mean(called_strike[!in_zone]),
      # Edge consistency (within 3 inches of zone)
      edge_pitches = sum(dist_to_zone <= 0.25),
      edge_accuracy = mean(correct_call[dist_to_zone <= 0.25]),
      # Zone expansion (strikes called outside zone / total outside)
      zone_expansion = mean(called_strike[!in_zone]),
      # Zone contraction (balls called inside zone / total inside)
      zone_contraction = mean(!called_strike[in_zone]),
      # Favor metrics (difference in strike rate for home/away)
      # This would require additional data about home/away team
      .groups = "drop"
    ) %>%
    arrange(desc(total_calls))
}

umpire_metrics <- calculate_umpire_metrics(called_pitches)
print(head(umpire_metrics, 10))

# Visualize umpire accuracy distribution
ggplot(umpire_metrics, aes(x = accuracy)) +
  geom_histogram(bins = 20, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = mean(umpire_metrics$accuracy),
             color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Distribution of Umpire Accuracy Rates",
       x = "Accuracy Rate", y = "Number of Umpires") +
  theme_minimal()
# Python: Calculate comprehensive umpire accuracy metrics
def calculate_umpire_metrics(data):
    """Calculate comprehensive accuracy metrics for each umpire"""
    metrics = []
    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]
        in_zone = ump_data['in_zone']
        called_strike = ump_data['called_strike']
        correct_call = ump_data['correct_call']
        dist_to_zone = ump_data['dist_to_zone']
        metric = {
            'umpire': umpire,
            'total_calls': len(ump_data),
            'accuracy': correct_call.mean(),
            'accuracy_in_zone': correct_call[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'accuracy_out_zone': correct_call[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            'called_strike_in_zone': called_strike[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'called_strike_out_zone': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            'edge_pitches': (dist_to_zone <= 0.25).sum(),
            'edge_accuracy': correct_call[dist_to_zone <= 0.25].mean() if (dist_to_zone <= 0.25).sum() > 0 else np.nan,
            'zone_expansion': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            # called_strike is an integer column, so compare with 0 rather than
            # using ~, which would apply a bitwise NOT (~1 == -2) to the values
            'zone_contraction': (called_strike[in_zone] == 0).mean() if in_zone.sum() > 0 else np.nan,
        }
        metrics.append(metric)
    return pd.DataFrame(metrics).sort_values('total_calls', ascending=False)

umpire_metrics = calculate_umpire_metrics(called_pitches)
print(umpire_metrics.head(10))

# Visualize umpire accuracy distribution
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(umpire_metrics['accuracy'], bins=20, alpha=0.7,
        color='steelblue', edgecolor='black')
ax.axvline(umpire_metrics['accuracy'].mean(), color='red',
           linestyle='--', linewidth=2, label='Mean Accuracy')
ax.set_xlabel('Accuracy Rate', fontsize=12)
ax.set_ylabel('Number of Umpires', fontsize=12)
ax.set_title('Distribution of Umpire Accuracy Rates', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()
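Each accuracy rate above is a proportion estimated from a finite number of calls, so an umpire with few games behind the plate will have a noisier estimate than one with a full season. One way to express that uncertainty is a Wilson score interval, sketched below. This is an addition for illustration rather than part of the metric definitions above; the function name wilson_interval is hypothetical, z = 1.96 corresponds to roughly 95% coverage, and the counts are made up.

```python
import math

def wilson_interval(n_correct, n_calls, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n_calls == 0:
        return (float("nan"), float("nan"))
    p_hat = n_correct / n_calls
    denom = 1 + z**2 / n_calls
    center = (p_hat + z**2 / (2 * n_calls)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n_calls + z**2 / (4 * n_calls**2)
    ) / denom
    return (center - half_width, center + half_width)

# Same 90% observed accuracy, very different sample sizes (made-up counts)
low, high = wilson_interval(90, 100)
low_big, high_big = wilson_interval(9000, 10000)

print(f"100 calls:    {low:.3f} to {high:.3f}")
print(f"10,000 calls: {low_big:.3f} to {high_big:.3f}")
```

Comparing intervals rather than point estimates helps avoid ranking two umpires as different when their samples cannot actually distinguish them.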
# R: Strike zone classification functions
# Check if pitch is in rulebook strike zone
in_strike_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
# Horizontal boundaries (in feet)
left_edge <- -0.708
right_edge <- 0.708
in_horizontal <- (plate_x >= left_edge) & (plate_x <= right_edge)
in_vertical <- (plate_z >= sz_bot) & (plate_z <= sz_top)
return(in_horizontal & in_vertical)
}
# Calculate distance to nearest strike zone edge
distance_to_zone <- function(plate_x, plate_z, sz_top, sz_bot) {
left_edge <- -0.708
right_edge <- 0.708
# Horizontal distance
dx <- pmax(0, pmax(left_edge - plate_x, plate_x - right_edge))
# Vertical distance
dz <- pmax(0, pmax(sz_bot - plate_z, plate_z - sz_top))
# Euclidean distance to zone
return(sqrt(dx^2 + dz^2))
}
# Apply to our data
called_pitches <- called_pitches %>%
mutate(
in_zone = in_strike_zone(plate_x, plate_z, sz_top, sz_bot),
dist_to_zone = distance_to_zone(plate_x, plate_z, sz_top, sz_bot),
# Classify pitch location
location_type = case_when(
in_zone ~ "In Zone",
dist_to_zone <= 0.25 ~ "Edge (0-3in)",
dist_to_zone <= 0.5 ~ "Near (3-6in)",
TRUE ~ "Outside (6in+)"
)
)
# Accuracy analysis
accuracy_by_location <- called_pitches %>%
mutate(
correct_call = (in_zone & called_strike == 1) |
(!in_zone & called_strike == 0)
) %>%
group_by(location_type) %>%
summarise(
n_pitches = n(),
accuracy = mean(correct_call),
strike_rate = mean(called_strike),
expected_strike_rate = mean(in_zone)
)
print(accuracy_by_location)
# R: Strike zone visualization functions
# Basic strike zone plot
plot_strike_zone <- function(data, title = "Strike Zone") {
ggplot(data, aes(x = plate_x, y = plate_z)) +
# Add strike zone box (using average sz_top and sz_bot)
geom_rect(aes(xmin = -0.708, xmax = 0.708,
ymin = mean(sz_bot), ymax = mean(sz_top)),
fill = NA, color = "black", linewidth = 1) +
# Add home plate
geom_segment(aes(x = -0.708, xend = 0.708, y = 0, yend = 0),
color = "black", linewidth = 1.5) +
coord_fixed(ratio = 1) +
labs(title = title, x = "Horizontal Location (ft)",
y = "Vertical Location (ft)") +
theme_minimal()
}
# Heat map of called strike probability
plot_strike_probability_heatmap <- function(data, umpire_name = NULL) {
if (!is.null(umpire_name)) {
data <- data %>% filter(umpire == umpire_name)
title <- paste("Called Strike Probability -", umpire_name)
} else {
title <- "Called Strike Probability - All Umpires"
}
ggplot(data, aes(x = plate_x, y = plate_z, z = called_strike)) +
stat_summary_2d(fun = mean, bins = 20) +
geom_rect(aes(xmin = -0.708, xmax = 0.708,
ymin = mean(sz_bot), ymax = mean(sz_top)),
fill = NA, color = "white", linewidth = 1.2) +
scale_fill_gradient2(low = "blue", mid = "yellow", high = "red",
midpoint = 0.5, name = "Strike\nProbability") +
coord_fixed(ratio = 1) +
labs(title = title, x = "Horizontal Location (ft)",
y = "Vertical Location (ft)") +
theme_minimal() +
theme(legend.position = "right")
}
# Compare umpire to league average
plot_umpire_comparison <- function(data, umpire_name) {
umpire_data <- data %>% filter(umpire == umpire_name)
league_data <- data
p1 <- plot_strike_probability_heatmap(league_data, NULL)
p2 <- plot_strike_probability_heatmap(umpire_data, umpire_name)
grid.arrange(p1, p2, ncol = 2)
}
# Example: Plot for a specific umpire
plot_strike_probability_heatmap(called_pitches, "Umpire 1")
# R: Calculate comprehensive umpire accuracy metrics
calculate_umpire_metrics <- function(data) {
data %>%
group_by(umpire) %>%
summarise(
# Volume
total_calls = n(),
# Overall accuracy
accuracy = mean(correct_call),
# Accuracy by true zone
accuracy_in_zone = mean(correct_call[in_zone]),
accuracy_out_zone = mean(correct_call[!in_zone]),
# Called strike rate by true zone
called_strike_in_zone = mean(called_strike[in_zone]),
called_strike_out_zone = mean(called_strike[!in_zone]),
# Edge consistency (within 3 inches of zone)
edge_pitches = sum(dist_to_zone <= 0.25),
edge_accuracy = mean(correct_call[dist_to_zone <= 0.25]),
# Zone expansion (strikes called outside zone / total outside)
zone_expansion = mean(called_strike[!in_zone]),
# Zone contraction (balls called inside zone / total inside)
zone_contraction = mean(!called_strike[in_zone]),
# Favor metrics (difference in strike rate for home/away)
# This would require additional data about home/away team
.groups = "drop"
) %>%
arrange(desc(total_calls))
}
umpire_metrics <- calculate_umpire_metrics(called_pitches)
print(head(umpire_metrics, 10))
# Visualize umpire accuracy distribution
ggplot(umpire_metrics, aes(x = accuracy)) +
geom_histogram(bins = 20, fill = "steelblue", alpha = 0.7) +
geom_vline(xintercept = mean(umpire_metrics$accuracy),
color = "red", linetype = "dashed", linewidth = 1) +
labs(title = "Distribution of Umpire Accuracy Rates",
x = "Accuracy Rate", y = "Number of Umpires") +
theme_minimal()
# Python: Strike zone classification functions
def in_strike_zone(plate_x, plate_z, sz_top, sz_bot):
    """Check if pitch is in rulebook strike zone"""
    left_edge = -0.708
    right_edge = 0.708
    in_horizontal = (plate_x >= left_edge) & (plate_x <= right_edge)
    in_vertical = (plate_z >= sz_bot) & (plate_z <= sz_top)
    return in_horizontal & in_vertical

def distance_to_zone(plate_x, plate_z, sz_top, sz_bot):
    """Distance (ft) from the pitch to the rulebook zone; 0 for pitches inside it"""
    left_edge = -0.708
    right_edge = 0.708
    # Horizontal distance
    dx = np.maximum(0, np.maximum(left_edge - plate_x, plate_x - right_edge))
    # Vertical distance
    dz = np.maximum(0, np.maximum(sz_bot - plate_z, plate_z - sz_top))
    # Euclidean distance to zone
    return np.sqrt(dx**2 + dz**2)

# Apply to our data
called_pitches['in_zone'] = in_strike_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)
called_pitches['dist_to_zone'] = distance_to_zone(
    called_pitches['plate_x'].values,
    called_pitches['plate_z'].values,
    called_pitches['sz_top'].values,
    called_pitches['sz_bot'].values
)

# Classify pitch location (distances are in feet: 0.25 ft = 3 in)
def classify_location(dist_to_zone, in_zone):
    if in_zone:
        return "In Zone"
    elif dist_to_zone <= 0.25:
        return "Edge (0-3in)"
    elif dist_to_zone <= 0.5:
        return "Near (3-6in)"
    else:
        return "Outside (6in+)"

called_pitches['location_type'] = called_pitches.apply(
    lambda row: classify_location(row['dist_to_zone'], row['in_zone']),
    axis=1
)

# Accuracy analysis
called_pitches['correct_call'] = (
    (called_pitches['in_zone'] & (called_pitches['called_strike'] == 1)) |
    (~called_pitches['in_zone'] & (called_pitches['called_strike'] == 0))
)
accuracy_by_location = called_pitches.groupby('location_type').agg({
    'called_strike': ['count', 'mean'],
    'in_zone': 'mean',
    'correct_call': 'mean'
}).round(4)
accuracy_by_location.columns = ['n_pitches', 'strike_rate',
                                'expected_strike_rate', 'accuracy']
print(accuracy_by_location)
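The row-wise `apply` used for `location_type` can be slow on a full season of called pitches. As a minimal sketch, assuming the same 0.25 ft and 0.5 ft cut-offs, the same labels can be produced in one vectorized pass with `np.select`. The function name `classify_location_vectorized` and the four-pitch `demo` frame are illustrative and not part of the original analysis.

```python
import numpy as np
import pandas as pd


def classify_location_vectorized(dist_to_zone, in_zone):
    """Label each pitch by its distance (ft) from the rulebook zone."""
    dist = np.asarray(dist_to_zone, dtype=float)
    inside = np.asarray(in_zone, dtype=bool)
    # np.select takes the first condition that is True for each pitch,
    # which mirrors the if/elif order of classify_location
    conditions = [inside, dist <= 0.25, dist <= 0.5]
    labels = ["In Zone", "Edge (0-3in)", "Near (3-6in)"]
    return np.select(conditions, labels, default="Outside (6in+)")


# Hypothetical pitches: one in the zone, then 0.10, 0.40 and 0.90 ft away
demo = pd.DataFrame({
    "dist_to_zone": [0.0, 0.10, 0.40, 0.90],
    "in_zone": [True, False, False, False],
})
demo["location_type"] = classify_location_vectorized(
    demo["dist_to_zone"], demo["in_zone"]
)
print(demo)
```

On real data the call would take `called_pitches['dist_to_zone']` and `called_pitches['in_zone']` as its two arguments.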
# Python: Strike zone visualization functions
def plot_strike_zone_base(ax, sz_top_avg, sz_bot_avg):
    """Add strike zone rectangle to plot"""
    from matplotlib.patches import Rectangle
    zone = Rectangle((-0.708, sz_bot_avg), 1.416, sz_top_avg - sz_bot_avg,
                     fill=False, edgecolor='white', linewidth=2)
    ax.add_patch(zone)
    # Add home plate
    ax.plot([-0.708, 0.708], [0, 0], 'k-', linewidth=2)
    ax.set_aspect('equal')
    ax.set_xlabel('Horizontal Location (ft)', fontsize=12)
    ax.set_ylabel('Vertical Location (ft)', fontsize=12)

def plot_strike_probability_heatmap(data, umpire_name=None, ax=None):
    """Create heatmap of called strike probability"""
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))
    if umpire_name:
        plot_data = data[data['umpire'] == umpire_name].copy()
        title = f"Called Strike Probability - {umpire_name}"
    else:
        plot_data = data.copy()
        title = "Called Strike Probability - All Umpires"
    # Create 2D histogram
    from scipy.stats import binned_statistic_2d
    x = plot_data['plate_x']
    y = plot_data['plate_z']
    values = plot_data['called_strike']
    statistic, x_edges, y_edges, _ = binned_statistic_2d(
        x, y, values, statistic='mean', bins=20,
        range=[[-2, 2], [0, 5]]
    )
    # Plot heatmap
    im = ax.imshow(statistic.T, origin='lower', aspect='auto',
                   extent=[-2, 2, 0, 5], cmap='RdYlBu_r',
                   vmin=0, vmax=1, alpha=0.8)
    # Add strike zone
    plot_strike_zone_base(ax, plot_data['sz_top'].mean(),
                          plot_data['sz_bot'].mean())
    ax.set_xlim(-2, 2)
    ax.set_ylim(0, 5)
    ax.set_title(title, fontsize=14, fontweight='bold')
    # Add colorbar
    plt.colorbar(im, ax=ax, label='Strike Probability')
    return ax

def plot_umpire_comparison(data, umpire_name):
    """Compare umpire to league average"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))
    plot_strike_probability_heatmap(data, None, ax1)
    plot_strike_probability_heatmap(data, umpire_name, ax2)
    plt.tight_layout()
    return fig

# Example: Plot for a specific umpire
fig = plot_umpire_comparison(called_pitches, 'Umpire_1')
plt.savefig('umpire_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
# Python: Calculate comprehensive umpire accuracy metrics
def calculate_umpire_metrics(data):
    """Calculate comprehensive accuracy metrics for each umpire"""
    metrics = []
    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]
        in_zone = ump_data['in_zone']
        called_strike = ump_data['called_strike']
        correct_call = ump_data['correct_call']
        dist_to_zone = ump_data['dist_to_zone']
        metric = {
            'umpire': umpire,
            'total_calls': len(ump_data),
            'accuracy': correct_call.mean(),
            'accuracy_in_zone': correct_call[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'accuracy_out_zone': correct_call[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            'called_strike_in_zone': called_strike[in_zone].mean() if in_zone.sum() > 0 else np.nan,
            'called_strike_out_zone': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            'edge_pitches': (dist_to_zone <= 0.25).sum(),
            'edge_accuracy': correct_call[dist_to_zone <= 0.25].mean() if (dist_to_zone <= 0.25).sum() > 0 else np.nan,
            # Share of out-of-zone pitches called strikes
            'zone_expansion': called_strike[~in_zone].mean() if (~in_zone).sum() > 0 else np.nan,
            # Share of in-zone pitches called balls. called_strike is a 0/1 integer,
            # so compare with 0 rather than using ~, which is a bitwise NOT on integers
            'zone_contraction': (called_strike[in_zone] == 0).mean() if in_zone.sum() > 0 else np.nan,
        }
        metrics.append(metric)
    return pd.DataFrame(metrics).sort_values('total_calls', ascending=False)

umpire_metrics = calculate_umpire_metrics(called_pitches)
print(umpire_metrics.head(10))

# Visualize umpire accuracy distribution
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(umpire_metrics['accuracy'], bins=20, alpha=0.7,
        color='steelblue', edgecolor='black')
ax.axvline(umpire_metrics['accuracy'].mean(), color='red',
           linestyle='--', linewidth=2, label='Mean Accuracy')
ax.set_xlabel('Accuracy Rate', fontsize=12)
ax.set_ylabel('Number of Umpires', fontsize=12)
ax.set_title('Distribution of Umpire Accuracy Rates', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()
Different umpires exhibit distinct patterns in their strike zone enforcement. Some consistently call a larger zone, while others are more conservative. Understanding these tendencies is valuable for teams preparing for games and for evaluating umpire performance.
Common Umpire Tendency Patterns
Research has identified several common patterns in umpire behavior:
- Zone Size Variation: Some umpires call 5-10% more strikes than others
- Directional Bias: Preferences for inside/outside or high/low strikes
- Handedness Effects: Different zones for left-handed vs. right-handed batters
- Count Sensitivity: The called zone tends to shrink in pitcher-friendly counts (0-2) and expand in hitter-friendly counts (3-0)
- Experience Effects: Veteran umpires often show more consistency
- Game Context: Some umpires tighten their zone in high-leverage situations
Let's analyze these tendencies systematically:
# R: Analyzing individual umpire tendencies
# 1. Zone size by umpire
zone_size_analysis <- called_pitches %>%
group_by(umpire) %>%
summarise(
total_calls = n(),
strike_rate = mean(called_strike),
# Effective zone boundaries (5th and 95th percentiles of called-strike locations)
left_boundary = quantile(plate_x[called_strike == 1 & plate_x < 0], 0.05),
right_boundary = quantile(plate_x[called_strike == 1 & plate_x > 0], 0.95),
top_boundary = quantile(plate_z[called_strike == 1], 0.95),
bottom_boundary = quantile(plate_z[called_strike == 1], 0.05),
# Zone width and height
zone_width = right_boundary - left_boundary,
zone_height = top_boundary - bottom_boundary
) %>%
filter(total_calls >= 1000) # Minimum sample size
# Plot zone size variation
ggplot(zone_size_analysis, aes(x = zone_width, y = zone_height)) +
geom_point(aes(size = total_calls, color = strike_rate), alpha = 0.7) +
geom_vline(xintercept = 1.417, linetype = "dashed", color = "red") +
geom_hline(yintercept = mean(called_pitches$sz_top - called_pitches$sz_bot),
linetype = "dashed", color = "red") +
scale_color_gradient2(low = "blue", mid = "white", high = "red",
midpoint = median(zone_size_analysis$strike_rate)) +
labs(title = "Umpire Zone Size Variation",
subtitle = "Dashed lines show rulebook zone dimensions",
x = "Effective Zone Width (ft)", y = "Effective Zone Height (ft)") +
theme_minimal()
# 2. Directional tendencies
directional_analysis <- called_pitches %>%
mutate(
zone_region = case_when(
plate_x < -0.708 ~ "Off Plate Left", # catcher's view: inside to a right-handed batter
plate_x > 0.708 ~ "Off Plate Right", # catcher's view: inside to a left-handed batter
plate_z > sz_top ~ "High",
plate_z < sz_bot ~ "Low",
TRUE ~ "In Zone"
)
) %>%
group_by(umpire, zone_region) %>%
summarise(
n = n(),
strike_rate = mean(called_strike),
.groups = "drop"
) %>%
filter(n >= 100) %>%
pivot_wider(names_from = zone_region, values_from = c(n, strike_rate))
print(head(directional_analysis))
# 3. Count sensitivity
count_analysis <- called_pitches %>%
mutate(
count_type = case_when(
balls == 3 & strikes == 0 ~ "3-0",
balls == 3 & strikes == 1 ~ "3-1",
balls == 3 & strikes == 2 ~ "3-2",
balls == 0 & strikes == 2 ~ "0-2",
balls == 1 & strikes == 2 ~ "1-2",
balls == 2 & strikes == 2 ~ "2-2",
TRUE ~ "Other"
),
pitcher_favorable = balls == 0 & strikes == 2,
hitter_favorable = balls == 3 & strikes == 0
) %>%
group_by(umpire) %>%
summarise(
strike_rate_overall = mean(called_strike),
strike_rate_3_0 = mean(called_strike[hitter_favorable]),
strike_rate_0_2 = mean(called_strike[pitcher_favorable]),
count_sensitivity = strike_rate_3_0 - strike_rate_0_2,
.groups = "drop"
) %>%
filter(!is.na(count_sensitivity))
# Plot count sensitivity
ggplot(count_analysis, aes(x = strike_rate_0_2, y = strike_rate_3_0)) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
geom_point(alpha = 0.6, size = 3) +
geom_smooth(method = "lm", se = TRUE, color = "blue") +
labs(title = "Umpire Count Sensitivity",
subtitle = "Strike rate in pitcher-favorable (0-2) vs hitter-favorable (3-0) counts",
x = "Strike Rate on 0-2 Count", y = "Strike Rate on 3-0 Count") +
theme_minimal()
# 4. Batter handedness effects
handedness_analysis <- called_pitches %>%
mutate(
inside = (stand == "R" & plate_x < -0.708) |
(stand == "L" & plate_x > 0.708),
outside = (stand == "R" & plate_x > 0.708) |
(stand == "L" & plate_x < -0.708)
) %>%
group_by(umpire, stand) %>%
summarise(
strike_rate_inside = mean(called_strike[inside], na.rm = TRUE),
strike_rate_outside = mean(called_strike[outside], na.rm = TRUE),
inside_bias = strike_rate_inside - strike_rate_outside,
.groups = "drop"
) %>%
pivot_wider(names_from = stand, values_from = c(strike_rate_inside,
strike_rate_outside, inside_bias))
print(head(handedness_analysis))
# Python: Analyzing individual umpire tendencies
# 1. Zone size by umpire
def analyze_zone_size(data):
    """Analyze effective zone size for each umpire"""
    zone_metrics = []
    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]
        if len(ump_data) < 1000:  # Minimum sample size
            continue
        strikes = ump_data[ump_data['called_strike'] == 1]
        if len(strikes) < 100:
            continue
        metrics = {
            'umpire': umpire,
            'total_calls': len(ump_data),
            'strike_rate': ump_data['called_strike'].mean(),
            'left_boundary': np.percentile(strikes[strikes['plate_x'] < 0]['plate_x'], 5),
            'right_boundary': np.percentile(strikes[strikes['plate_x'] > 0]['plate_x'], 95),
            'top_boundary': np.percentile(strikes['plate_z'], 95),
            'bottom_boundary': np.percentile(strikes['plate_z'], 5),
        }
        metrics['zone_width'] = metrics['right_boundary'] - metrics['left_boundary']
        metrics['zone_height'] = metrics['top_boundary'] - metrics['bottom_boundary']
        zone_metrics.append(metrics)
    return pd.DataFrame(zone_metrics)

zone_size_analysis = analyze_zone_size(called_pitches)

# Plot zone size variation
fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(zone_size_analysis['zone_width'],
                     zone_size_analysis['zone_height'],
                     s=zone_size_analysis['total_calls']/50,
                     c=zone_size_analysis['strike_rate'],
                     alpha=0.6, cmap='RdYlBu_r')
ax.axvline(1.417, linestyle='--', color='red', alpha=0.7, label='Rulebook Width')
ax.axhline(called_pitches['sz_top'].mean() - called_pitches['sz_bot'].mean(),
           linestyle='--', color='red', alpha=0.7, label='Avg Rulebook Height')
ax.set_xlabel('Effective Zone Width (ft)', fontsize=12)
ax.set_ylabel('Effective Zone Height (ft)', fontsize=12)
ax.set_title('Umpire Zone Size Variation', fontsize=14, fontweight='bold')
ax.legend()
plt.colorbar(scatter, label='Strike Rate', ax=ax)
plt.tight_layout()
plt.show()
# 2. Count sensitivity
def analyze_count_sensitivity(data):
    """Analyze how umpire zones change with count"""
    count_metrics = []
    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]
        pitcher_favorable = ump_data[(ump_data['balls'] == 0) &
                                     (ump_data['strikes'] == 2)]
        hitter_favorable = ump_data[(ump_data['balls'] == 3) &
                                    (ump_data['strikes'] == 0)]
        if len(pitcher_favorable) > 20 and len(hitter_favorable) > 20:
            metrics = {
                'umpire': umpire,
                'strike_rate_overall': ump_data['called_strike'].mean(),
                'strike_rate_3_0': hitter_favorable['called_strike'].mean(),
                'strike_rate_0_2': pitcher_favorable['called_strike'].mean(),
            }
            metrics['count_sensitivity'] = (metrics['strike_rate_3_0'] -
                                            metrics['strike_rate_0_2'])
            count_metrics.append(metrics)
    return pd.DataFrame(count_metrics)

count_analysis = analyze_count_sensitivity(called_pitches)

# Plot count sensitivity
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(count_analysis['strike_rate_0_2'],
           count_analysis['strike_rate_3_0'],
           alpha=0.6, s=100)
ax.plot([0, 1], [0, 1], 'k--', alpha=0.3, label='Equal rates')
# Fit line
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(
    count_analysis['strike_rate_0_2'], count_analysis['strike_rate_3_0'])
x_line = np.array([count_analysis['strike_rate_0_2'].min(),
                   count_analysis['strike_rate_0_2'].max()])
ax.plot(x_line, slope * x_line + intercept, 'b-',
        label=f'Fit line (R²={r_value**2:.3f})')
ax.set_xlabel('Strike Rate on 0-2 Count', fontsize=12)
ax.set_ylabel('Strike Rate on 3-0 Count', fontsize=12)
ax.set_title('Umpire Count Sensitivity', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()
# 3. Batter handedness effects
def analyze_handedness_effects(data):
    """Analyze inside/outside tendencies by batter handedness"""
    hand_metrics = []
    for umpire in data['umpire'].unique():
        ump_data = data[data['umpire'] == umpire]
        for stand in ['L', 'R']:
            stand_data = ump_data[ump_data['stand'] == stand]
            if stand == 'R':
                inside = stand_data[stand_data['plate_x'] < -0.708]
                outside = stand_data[stand_data['plate_x'] > 0.708]
            else:
                inside = stand_data[stand_data['plate_x'] > 0.708]
                outside = stand_data[stand_data['plate_x'] < -0.708]
            if len(inside) > 20 and len(outside) > 20:
                metrics = {
                    'umpire': umpire,
                    'stand': stand,
                    'strike_rate_inside': inside['called_strike'].mean(),
                    'strike_rate_outside': outside['called_strike'].mean(),
                }
                metrics['inside_bias'] = (metrics['strike_rate_inside'] -
                                          metrics['strike_rate_outside'])
                hand_metrics.append(metrics)
    return pd.DataFrame(hand_metrics)

handedness_analysis = analyze_handedness_effects(called_pitches)
print(handedness_analysis.head(10))
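The R code includes a directional-tendency breakdown that has no Python counterpart here. A minimal sketch of one way to write it is shown below. It assumes the same column names (`umpire`, `plate_x`, `plate_z`, `sz_top`, `sz_bot`, `called_strike`) and a 0.708 ft half-width for the plate. The function name, the `min_pitches` parameter, and the region labels (given from the catcher's view, so they do not depend on batter handedness) are illustrative choices rather than part of the original analysis.

```python
import numpy as np
import pandas as pd


def analyze_directional_tendencies(data, min_pitches=100, half_width=0.708):
    """Called-strike rate by region around the zone, one row per umpire."""
    df = data.copy()
    # Horizontal misses are checked first, matching the order used in the R case_when
    conditions = [
        (df["plate_x"] < -half_width).to_numpy(),
        (df["plate_x"] > half_width).to_numpy(),
        (df["plate_z"] > df["sz_top"]).to_numpy(),
        (df["plate_z"] < df["sz_bot"]).to_numpy(),
    ]
    labels = ["Off Plate Left", "Off Plate Right", "High", "Low"]
    df["zone_region"] = np.select(conditions, labels, default="In Zone")

    summary = (
        df.groupby(["umpire", "zone_region"])["called_strike"]
          .agg(n="count", strike_rate="mean")
          .reset_index()
    )
    # Drop umpire/region cells with too few pitches to be meaningful
    summary = summary[summary["n"] >= min_pitches]
    return summary.pivot(index="umpire", columns="zone_region", values="strike_rate")


# Tiny hypothetical example (min_pitches lowered so every cell is kept)
demo = pd.DataFrame({
    "umpire": ["A"] * 4,
    "plate_x": [-1.0, 1.0, 0.0, 0.0],
    "plate_z": [2.5, 2.5, 4.0, 2.5],
    "sz_top": [3.5] * 4,
    "sz_bot": [1.5] * 4,
    "called_strike": [1, 0, 0, 1],
})
directional_demo = analyze_directional_tendencies(demo, min_pitches=1)
print(directional_demo)
```

With the full data set, `analyze_directional_tendencies(called_pitches)` would return one row per umpire, with `NaN` wherever a region falls below the minimum sample.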
Umpire Consistency Metrics
Beyond accuracy, consistency is a crucial aspect of umpire performance. A consistent umpire may have a slightly expanded zone, but as long as it's predictable, players can adjust. Let's measure consistency:
# R: Measuring umpire consistency
# Consistency metric: Standard deviation of calls at similar locations
calculate_consistency <- function(data) {
# Create location bins
data <- data %>%
mutate(
x_bin = cut(plate_x, breaks = seq(-2, 2, 0.2)),
z_bin = cut(plate_z, breaks = seq(0, 5, 0.2))
)
# Calculate consistency within bins
consistency_by_bin <- data %>%
group_by(umpire, x_bin, z_bin) %>%
summarise(
n = n(),
strike_rate = mean(called_strike),
consistency = sd(called_strike),
.groups = "drop"
) %>%
filter(n >= 10) # Need sufficient sample in each bin
# Aggregate to umpire level
umpire_consistency <- consistency_by_bin %>%
group_by(umpire) %>%
summarise(
avg_consistency = mean(consistency, na.rm = TRUE),
consistency_variation = sd(consistency, na.rm = TRUE),
bins_analyzed = n()
)
return(umpire_consistency)
}
consistency_metrics <- calculate_consistency(called_pitches)
# Merge with accuracy metrics
umpire_performance <- umpire_metrics %>%
left_join(consistency_metrics, by = "umpire")
# Plot accuracy vs consistency
ggplot(umpire_performance, aes(x = accuracy, y = avg_consistency)) +
geom_point(aes(size = total_calls), alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "blue") +
labs(title = "Umpire Accuracy vs Consistency",
x = "Accuracy Rate", y = "Average Consistency (lower is better)",
size = "Total Calls") +
theme_minimal()
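For readers following along in Python, here is a rough counterpart to the R consistency calculation. It is a sketch under the same assumptions: 0.2 ft location bins over the range -2 to 2 ft horizontally and 0 to 5 ft vertically, and at least 10 pitches per bin. The `bin_width` and `min_pitches` parameters and the two-umpire `demo` frame are hypothetical. Unlike the R version, pitches outside the binned range are dropped rather than grouped into an `NA` bin.

```python
import numpy as np
import pandas as pd


def calculate_consistency(data, bin_width=0.2, min_pitches=10):
    """Average within-bin standard deviation of calls for each umpire."""
    df = data.copy()
    x_edges = np.linspace(-2, 2, int(round(4 / bin_width)) + 1)
    z_edges = np.linspace(0, 5, int(round(5 / bin_width)) + 1)
    df["x_bin"] = pd.cut(df["plate_x"], bins=x_edges)
    df["z_bin"] = pd.cut(df["plate_z"], bins=z_edges)

    by_bin = (
        df.dropna(subset=["x_bin", "z_bin"])
          .groupby(["umpire", "x_bin", "z_bin"], observed=True)["called_strike"]
          .agg(n="count", strike_rate="mean", consistency="std")
          .reset_index()
    )
    by_bin = by_bin[by_bin["n"] >= min_pitches]  # need enough pitches per bin

    return (
        by_bin.groupby("umpire")["consistency"]
              .agg(avg_consistency="mean",
                   consistency_variation="std",
                   bins_analyzed="count")
              .reset_index()
    )


# Hypothetical data: both umpires see ten pitches at the same spot.
# "Steady" always calls a strike; "Erratic" splits the calls 5/5.
demo = pd.DataFrame({
    "umpire": ["Steady"] * 10 + ["Erratic"] * 10,
    "plate_x": [0.05] * 20,
    "plate_z": [2.5] * 20,
    "called_strike": [1] * 10 + [1, 0] * 5,
})
consistency_demo = calculate_consistency(demo)
print(consistency_demo)
```

Because the calls are binary, the within-bin standard deviation peaks when a bin's strike rate is near 50%, so this metric is driven mostly by bins along the edge of the zone.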
Machine learning models can predict the probability that a pitch will be called a strike based on its location, context, and the umpire calling the game. These models serve multiple purposes:
- Expected vs Actual: Compare predicted probabilities to actual calls to identify unusual decisions
- Umpire Effects: Quantify how much each umpire deviates from expected behavior
- Strategic Planning: Help teams understand strike probabilities in different game situations
- Automated Strike Zone: Provide a baseline for ABS calibration
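To make the first two uses concrete, the sketch below shows how predicted probabilities could be compared with actual calls once a model has produced them. It assumes a data frame with `umpire` and `called_strike` columns plus a column of predicted strike probabilities. The column name `pred_prob`, the function name, and the 0.10 cut-off for a "surprising" call are hypothetical choices for illustration, and the demo probabilities are made up.

```python
import pandas as pd


def summarize_expected_vs_actual(data, prob_col="pred_prob", max_prob=0.10):
    """Per-umpire strikes above expected and count of low-probability calls."""
    df = data.copy()
    # Probability the model assigned to the call that was actually made
    df["prob_of_actual_call"] = df[prob_col].where(
        df["called_strike"] == 1, 1 - df[prob_col]
    )
    df["surprising_call"] = df["prob_of_actual_call"] <= max_prob

    summary = df.groupby("umpire").agg(
        calls=("called_strike", "size"),
        actual_strikes=("called_strike", "sum"),
        expected_strikes=(prob_col, "sum"),
        surprising_calls=("surprising_call", "sum"),
    )
    # Positive values mean more called strikes than the model expected
    summary["strikes_above_expected"] = (
        summary["actual_strikes"] - summary["expected_strikes"]
    )
    summary["per_100_calls"] = (
        100 * summary["strikes_above_expected"] / summary["calls"]
    )
    return summary


# Made-up example: umpire A calls a strike on a pitch the model rated at 5%
demo = pd.DataFrame({
    "umpire": ["A", "A", "A", "B", "B"],
    "pred_prob": [0.95, 0.50, 0.05, 0.95, 0.05],
    "called_strike": [1, 1, 1, 1, 0],
})
expected_demo = summarize_expected_vs_actual(demo)
print(expected_demo)
```

The probabilities should come from a model that does not itself include the umpire as a feature. Otherwise each umpire's tendency is already absorbed into the expectation and the difference shrinks toward zero.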
Logistic Regression Model
We'll start with a logistic regression model, which provides interpretable coefficients:
# R: Logistic regression for called strike probability
library(broom)
# Prepare features
model_data <- called_pitches %>%
filter(!is.na(plate_x) & !is.na(plate_z)) %>%
mutate(
# Distance features
abs_plate_x = abs(plate_x),
plate_x_squared = plate_x^2,
plate_z_squared = plate_z^2,
# Interaction with zone
x_z_interaction = plate_x * plate_z,
# Count features
count = paste0(balls, "-", strikes),
pitcher_ahead = strikes > balls,
# Batter-pitcher matchup
same_hand = stand == p_throws
)
# Build logistic regression model
logit_model <- glm(
called_strike ~ plate_x + plate_z +
plate_x_squared + plate_z_squared +
x_z_interaction +
abs_plate_x +
count + stand + p_throws +
same_hand + outs,
data = model_data,
family = binomial(link = "logit")
)
# Model summary
summary(logit_model)
# Get coefficients
coef_df <- tidy(logit_model) %>%
arrange(desc(abs(statistic)))
print(coef_df)
# Add predictions to data
model_data$pred_prob_logit <- predict(logit_model, type = "response")
# Model performance
library(pROC)
roc_obj <- roc(model_data$called_strike, model_data$pred_prob_logit)
auc_score <- auc(roc_obj)
cat("Logistic Regression AUC:", round(auc_score, 4), "\n")
# Plot ROC curve
plot(roc_obj, main = paste("ROC Curve - Logistic Regression (AUC =",
round(auc_score, 3), ")"))
# Calibration plot
model_data %>%
mutate(pred_bin = cut(pred_prob_logit, breaks = seq(0, 1, 0.1))) %>%
group_by(pred_bin) %>%
summarise(
predicted = mean(pred_prob_logit),
actual = mean(called_strike),
n = n()
) %>%
ggplot(aes(x = predicted, y = actual)) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
geom_point(aes(size = n), alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Calibration Plot - Logistic Regression",
x = "Predicted Probability", y = "Actual Strike Rate") +
theme_minimal()
# Python: Logistic regression for called strike probability
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_curve, auc, classification_report

# Prepare features
model_data = called_pitches.dropna(subset=['plate_x', 'plate_z']).copy()
# Create features
model_data['abs_plate_x'] = np.abs(model_data['plate_x'])
model_data['plate_x_squared'] = model_data['plate_x']**2
model_data['plate_z_squared'] = model_data['plate_z']**2
model_data['x_z_interaction'] = model_data['plate_x'] * model_data['plate_z']
model_data['count'] = model_data['balls'].astype(str) + '-' + model_data['strikes'].astype(str)
model_data['pitcher_ahead'] = (model_data['strikes'] > model_data['balls']).astype(int)
model_data['same_hand'] = (model_data['stand'] == model_data['p_throws']).astype(int)

# Prepare feature matrix
feature_cols = ['plate_x', 'plate_z', 'plate_x_squared', 'plate_z_squared',
                'x_z_interaction', 'abs_plate_x', 'pitcher_ahead',
                'same_hand', 'outs']
X = model_data[feature_cols]
y = model_data['called_strike']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit logistic regression
logit_model = LogisticRegression(max_iter=1000, random_state=42)
logit_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_prob = logit_model.predict_proba(X_test_scaled)[:, 1]
y_pred = logit_model.predict(X_test_scaled)

# Model performance
print("Logistic Regression Performance:")
print(classification_report(y_test, y_pred))

# AUC
fpr, tpr, _ = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)
print(f"\nAUC: {roc_auc:.4f}")

# Plot ROC curve
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
ax1.plot(fpr, tpr, color='blue', lw=2,
         label=f'ROC curve (AUC = {roc_auc:.3f})')
ax1.plot([0, 1], [0, 1], color='red', lw=2, linestyle='--', label='Random')
ax1.set_xlim([0.0, 1.0])
ax1.set_ylim([0.0, 1.05])
ax1.set_xlabel('False Positive Rate', fontsize=12)
ax1.set_ylabel('True Positive Rate', fontsize=12)
ax1.set_title('ROC Curve - Logistic Regression', fontsize=14, fontweight='bold')
ax1.legend(loc="lower right")
ax1.grid(alpha=0.3)

# Calibration plot
bins = np.linspace(0, 1, 11)
bin_centers = (bins[:-1] + bins[1:]) / 2
# Clip so a predicted probability of exactly 1.0 falls in the last bin
digitized = np.clip(np.digitize(y_pred_prob, bins) - 1, 0, len(bins) - 2)
calibration_data = []
for i in range(len(bins) - 1):
    mask = digitized == i
    if mask.sum() > 0:
        calibration_data.append({
            'predicted': y_pred_prob[mask].mean(),
            'actual': y_test.values[mask].mean(),
            'count': mask.sum()
        })
calib_df = pd.DataFrame(calibration_data)
ax2.plot([0, 1], [0, 1], 'r--', lw=2, label='Perfect calibration')
ax2.scatter(calib_df['predicted'], calib_df['actual'],
            s=calib_df['count']/10, alpha=0.6)
ax2.set_xlabel('Predicted Probability', fontsize=12)
ax2.set_ylabel('Actual Strike Rate', fontsize=12)
ax2.set_title('Calibration Plot', fontsize=14, fontweight='bold')
ax2.legend()
ax2.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Feature importance (coefficients are on standardized features)
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'coefficient': logit_model.coef_[0]
}).sort_values('coefficient', key=abs, ascending=False)
print("\nFeature Importance (Logistic Regression Coefficients):")
print(feature_importance)
Random Forest Model
Random forests can capture non-linear relationships and interactions that logistic regression might miss:
# R: Random Forest for called strike probability
library(randomForest)
# Prepare data for random forest
rf_features <- c("plate_x", "plate_z", "abs_plate_x",
"dist_to_zone", "balls", "strikes", "outs",
"stand", "p_throws", "pitch_type")
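rf_data <- model_data[complete.cases(model_data[, rf_features]), ]
# Convert categorical predictors to factors so randomForest treats them as
# categorical; doing this before the split keeps levels identical in train and test
rf_data <- rf_data %>%
mutate(across(c(stand, p_throws, pitch_type), as.factor))
rf_data$called_strike <- as.factor(rf_data$called_strike)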
# Split data
set.seed(42)
train_idx <- sample(1:nrow(rf_data), 0.7 * nrow(rf_data))
train_data <- rf_data[train_idx, ]
test_data <- rf_data[-train_idx, ]
# Train random forest
rf_model <- randomForest(
as.formula(paste("called_strike ~", paste(rf_features, collapse = " + "))),
data = train_data,
ntree = 100,
mtry = 3,
importance = TRUE
)
# Predictions
test_data$pred_prob_rf <- predict(rf_model, test_data, type = "prob")[, 2]
# Performance
rf_roc <- roc(as.numeric(as.character(test_data$called_strike)),
test_data$pred_prob_rf)
rf_auc <- auc(rf_roc)
cat("Random Forest AUC:", round(rf_auc, 4), "\n")
# Variable importance
importance_df <- as.data.frame(importance(rf_model)) %>%
tibble::rownames_to_column("variable") %>%
arrange(desc(MeanDecreaseGini))
# Plot variable importance
ggplot(importance_df, aes(x = reorder(variable, MeanDecreaseGini),
y = MeanDecreaseGini)) +
geom_col(fill = "steelblue") +
coord_flip() +
labs(title = "Random Forest Variable Importance",
x = "Variable", y = "Mean Decrease in Gini") +
theme_minimal()
# Python: Random Forest for called strike probability
# Prepare data
rf_features = ['plate_x', 'plate_z', 'abs_plate_x', 'dist_to_zone',
'balls', 'strikes', 'outs']
# Encode categorical variables
from sklearn.preprocessing import LabelEncoder
model_data_rf = model_data.copy()
for col in ['stand', 'p_throws', 'pitch_type']:
    le = LabelEncoder()
    model_data_rf[col + '_encoded'] = le.fit_transform(model_data_rf[col])
    rf_features.append(col + '_encoded')
# Prepare feature matrix
X_rf = model_data_rf[rf_features].dropna()
y_rf = model_data_rf.loc[X_rf.index, 'called_strike']
# Split data
X_train_rf, X_test_rf, y_train_rf, y_test_rf = train_test_split(
X_rf, y_rf, test_size=0.3, random_state=42, stratify=y_rf
)
# Train random forest
rf_model = RandomForestClassifier(
n_estimators=100,
max_depth=20,
min_samples_split=100,
random_state=42,
n_jobs=-1
)
rf_model.fit(X_train_rf, y_train_rf)
# Predictions
y_pred_prob_rf = rf_model.predict_proba(X_test_rf)[:, 1]
y_pred_rf = rf_model.predict(X_test_rf)
# Performance
print("Random Forest Performance:")
print(classification_report(y_test_rf, y_pred_rf))
fpr_rf, tpr_rf, _ = roc_curve(y_test_rf, y_pred_prob_rf)
roc_auc_rf = auc(fpr_rf, tpr_rf)
print(f"\nAUC: {roc_auc_rf:.4f}")
# Feature importance
feature_importance_rf = pd.DataFrame({
'feature': rf_features,
'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)
# Plot feature importance
fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(feature_importance_rf['feature'], feature_importance_rf['importance'])
ax.set_xlabel('Importance', fontsize=12)
ax.set_title('Random Forest Feature Importance', fontsize=14, fontweight='bold')
ax.invert_yaxis()
plt.tight_layout()
plt.show()
print("\nFeature Importance:")
print(feature_importance_rf)
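To make the non-linearity point concrete, here is a minimal sketch on synthetic data (not real Statcast pitches). It assumes a toy rectangular zone with 5% of calls flipped at random, and it deliberately gives the logistic regression only the raw `plate_x` and `plate_z` terms, without the squared terms used earlier, so the gap between the two models is easy to see. The zone bounds, noise rate, and sample size are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 4000

# Synthetic pitch locations in feet (illustrative only)
plate_x = rng.uniform(-2.0, 2.0, n)
plate_z = rng.uniform(0.5, 4.5, n)

# Toy "called strike" label: inside a fixed rectangle, with 5% of calls flipped
in_zone = (np.abs(plate_x) <= 0.83) & (plate_z >= 1.5) & (plate_z <= 3.5)
flip = rng.random(n) < 0.05
called_strike = np.where(flip, ~in_zone, in_zone).astype(int)

X = np.column_stack([plate_x, plate_z])
X_tr, X_te, y_tr, y_te = train_test_split(
    X, called_strike, test_size=0.3, random_state=42, stratify=called_strike
)

# Logistic regression with linear terms only vs. a random forest
linear_logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

auc_logit = roc_auc_score(y_te, linear_logit.predict_proba(X_te)[:, 1])
auc_forest = roc_auc_score(y_te, forest.predict_proba(X_te)[:, 1])
print(f"Linear logit AUC:  {auc_logit:.3f}")
print(f"Random forest AUC: {auc_forest:.3f}")
```

Because the toy zone is a closed box, a model that is linear in location cannot separate inside from outside, while the forest can carve out the rectangle with axis-aligned splits. Adding squared terms, as the earlier logistic model does, narrows this gap considerably.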
Umpire-Specific Models
To capture individual tendencies, we can either fit a separate model for each umpire or, as in the example below, include umpire as a categorical feature in a single model:
# R: Umpire-specific adjustments
# Add umpire as a factor in the model
umpire_model_data <- model_data %>%
filter(umpire %in% names(sort(table(umpire), decreasing = TRUE)[1:15])) # Top 15 umpires
umpire_logit <- glm(
called_strike ~ plate_x + plate_z +
plate_x_squared + plate_z_squared +
count + stand + umpire,
data = umpire_model_data,
family = binomial(link = "logit")
)
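# Extract umpire coefficients (each is relative to the reference umpire,
# i.e. the first factor level, which is absorbed into the intercept)
umpire_effects <- tidy(umpire_logit) %>%
  filter(str_detect(term, "umpire")) %>%
  mutate(
    umpire_name = str_remove(term, "umpire"),
    # Shift in strike probability for a pitch that the reference umpire
    # would call a strike 50% of the time
    effect_on_strike_prob = plogis(estimate) - 0.5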
) %>%
arrange(desc(estimate))
print(umpire_effects)
# Visualize umpire effects
ggplot(umpire_effects, aes(x = reorder(umpire_name, estimate), y = estimate)) +
geom_col(aes(fill = estimate > 0)) +
geom_errorbar(aes(ymin = estimate - 1.96*std.error,
ymax = estimate + 1.96*std.error), width = 0.2) +
coord_flip() +
scale_fill_manual(values = c("blue", "red"), guide = "none") +
labs(title = "Umpire Effects on Strike Probability",
subtitle = "Coefficient estimates with 95% confidence intervals",
x = "Umpire", y = "Log-Odds Effect") +
theme_minimal()
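The umpire coefficients are on the log-odds scale, so their effect on strike probability depends on where the pitch sits. The sketch below converts a log-odds effect into a probability shift at several baselines. The function names and the coefficient value 0.20 are hypothetical and chosen only for illustration.

```python
import math


def logistic(x: float) -> float:
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Map a probability (0 < p < 1) to log-odds."""
    return math.log(p / (1.0 - p))


def strike_prob_shift(log_odds_effect: float, baseline_prob: float = 0.5) -> float:
    """Change in strike probability when a log-odds effect is added at a baseline."""
    return logistic(logit(baseline_prob) + log_odds_effect) - baseline_prob


effect = 0.20  # hypothetical umpire coefficient, not an estimate from real data
for p in (0.05, 0.50, 0.95):
    print(f"baseline {p:.2f}: shift {strike_prob_shift(effect, p):+.4f}")
```

The same coefficient moves a borderline (50%) pitch by roughly five percentage points but barely changes a pitch that is already an almost certain ball or strike. This is why umpire effects matter most on the edges of the zone.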
Beyond individual call accuracy, we want to understand how umpire decisions affect game outcomes. Poor calls can change at-bat results, inning dynamics, and ultimately win probabilities.
Run Value of Incorrect Calls
We can assign run values to each ball-strike call based on the change in expected runs:
# R: Calculate run impact of umpire calls
# Run expectancy matrix (simplified - would use actual RE24 values)
run_expectancy <- expand.grid(
balls = 0:3,
strikes = 0:2,
outs = 0:2
) %>%
mutate(
re = case_when(
balls == 3 ~ 0.6 + (2 - strikes) * 0.1,
strikes == 2 ~ 0.3 - balls * 0.05,
TRUE ~ 0.4 + (balls - strikes) * 0.05
)
)
# Calculate impact of incorrect calls
impact_data <- called_pitches %>%
left_join(run_expectancy, by = c("balls", "strikes", "outs")) %>%
mutate(
# What would the count be with correct call?
correct_balls = ifelse(!in_zone & called_strike == 1, balls + 1, balls),
correct_strikes = ifelse(in_zone & called_strike == 0, strikes + 1, strikes),
# Would the at-bat have ended?
actual_walk = balls == 3 & called_strike == 0 & !in_zone,
actual_strikeout = strikes == 2 & called_strike == 1 & in_zone,
# Run value impact
incorrect_call = !correct_call,
# A missed call favors the pitcher only when a pitch outside the zone is
# called a strike; a strike called a ball favors the batter
favor_pitcher = !in_zone & called_strike == 1
)
# Calculate run value impact by umpire
umpire_impact <- impact_data %>%
filter(incorrect_call) %>%
group_by(umpire) %>%
summarise(
incorrect_calls = n(),
calls_favor_pitcher = sum(favor_pitcher),
calls_favor_batter = sum(!favor_pitcher),
net_pitcher_favor = calls_favor_pitcher - calls_favor_batter,
pct_favor_pitcher = mean(favor_pitcher),
# Simplified run impact (would use actual RE24)
estimated_run_impact = sum(ifelse(favor_pitcher, -0.05, 0.05))
) %>%
arrange(desc(abs(estimated_run_impact)))
print(head(umpire_impact, 15))
# Visualize impact
ggplot(umpire_impact, aes(x = incorrect_calls, y = estimated_run_impact)) +
geom_point(aes(color = pct_favor_pitcher, size = abs(estimated_run_impact)),
alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
scale_color_gradient2(low = "blue", mid = "white", high = "red",
midpoint = 0.5, name = "% Favor\nPitcher") +
labs(title = "Umpire Impact on Run Expectancy",
x = "Number of Incorrect Calls",
y = "Estimated Run Impact",
size = "Absolute Impact") +
theme_minimal()
# Python: Calculate run impact of umpire calls
# Create simplified run expectancy matrix
re_data = []
for balls in range(4):
    for strikes in range(3):
        for outs in range(3):
            if balls == 3:
                re = 0.6 + (2 - strikes) * 0.1
            elif strikes == 2:
                re = 0.3 - balls * 0.05
            else:
                re = 0.4 + (balls - strikes) * 0.05
            re_data.append({
                'balls': balls,
                'strikes': strikes,
                'outs': outs,
                're': re
            })
run_expectancy = pd.DataFrame(re_data)
# Merge with pitch data
impact_data = called_pitches.merge(
run_expectancy,
on=['balls', 'strikes', 'outs'],
how='left'
)
# Calculate impact of incorrect calls
impact_data['incorrect_call'] = ~impact_data['correct_call']
# A missed call favors the pitcher only when a pitch outside the zone is
# called a strike; a strike called a ball favors the batter
impact_data['favor_pitcher'] = (
    ~impact_data['in_zone'] & (impact_data['called_strike'] == 1)
)
# Calculate impact by umpire
umpire_impact = impact_data[impact_data['incorrect_call']].groupby('umpire').agg({
'incorrect_call': 'count',
'favor_pitcher': ['sum', lambda x: (~x).sum(), 'mean']
}).reset_index()
umpire_impact.columns = ['umpire', 'incorrect_calls', 'calls_favor_pitcher',
'calls_favor_batter', 'pct_favor_pitcher']
umpire_impact['net_pitcher_favor'] = (
umpire_impact['calls_favor_pitcher'] - umpire_impact['calls_favor_batter']
)
umpire_impact['estimated_run_impact'] = umpire_impact.apply(
lambda row: row['calls_favor_pitcher'] * -0.05 + row['calls_favor_batter'] * 0.05,
axis=1
)
umpire_impact = umpire_impact.sort_values('estimated_run_impact',
key=abs, ascending=False)
print(umpire_impact.head(15))
# Visualize impact
fig, ax = plt.subplots(figsize=(12, 8))
scatter = ax.scatter(umpire_impact['incorrect_calls'],
umpire_impact['estimated_run_impact'],
c=umpire_impact['pct_favor_pitcher'],
s=np.abs(umpire_impact['estimated_run_impact']) * 1000,
alpha=0.6, cmap='RdBu_r')
ax.axhline(0, linestyle='--', color='gray', alpha=0.5)
ax.set_xlabel('Number of Incorrect Calls', fontsize=12)
ax.set_ylabel('Estimated Run Impact', fontsize=12)
ax.set_title('Umpire Impact on Run Expectancy', fontsize=14, fontweight='bold')
plt.colorbar(scatter, label='% Favor Pitcher', ax=ax)
plt.tight_layout()
plt.show()
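The examples above apply a flat 0.05 runs to every missed call. A count-aware alternative is to compare the run expectancy of the count actually reached with the count that the correct call would have produced. The sketch below does this using the same placeholder run expectancy formula as the simplified matrix. `RE_WALK` and `RE_STRIKEOUT` are hypothetical values for at-bat-ending calls, and none of these numbers are real run expectancy estimates.

```python
# Placeholder run values for at-bat-ending calls (hypothetical, not real data)
RE_WALK = 0.90
RE_STRIKEOUT = 0.10


def count_re(balls: int, strikes: int) -> float:
    """Placeholder run expectancy by count, mirroring the simplified matrix."""
    if balls == 3:
        return 0.6 + (2 - strikes) * 0.1
    if strikes == 2:
        return 0.3 - balls * 0.05
    return 0.4 + (balls - strikes) * 0.05


def re_after_call(balls: int, strikes: int, strike_called: bool) -> float:
    """Run expectancy of the state reached after a called ball or strike."""
    if strike_called:
        return RE_STRIKEOUT if strikes == 2 else count_re(balls, strikes + 1)
    return RE_WALK if balls == 3 else count_re(balls + 1, strikes)


def missed_call_run_value(balls: int, strikes: int,
                          called_strike: bool, in_zone: bool) -> float:
    """Runs gained by the batting team relative to the correct call.

    Positive values favor the batter, negative values favor the pitcher,
    and a correct call returns 0.
    """
    actual = re_after_call(balls, strikes, bool(called_strike))
    correct = re_after_call(balls, strikes, bool(in_zone))
    return actual - correct


# A pitch in the zone called a ball on a 1-1 count (favors the batter)
print(round(missed_call_run_value(1, 1, called_strike=False, in_zone=True), 2))
# A pitch off the plate called strike three on a full count (favors the pitcher)
print(round(missed_call_run_value(3, 2, called_strike=True, in_zone=False), 2))
```

With real count-based run values substituted for the placeholders, summing this quantity by umpire would give a run impact estimate that weights a missed call on a full count far more heavily than one on the first pitch.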
Win Probability Impact
High-leverage situations magnify the impact of incorrect calls. A missed strike call with bases loaded in a tie game has far greater impact than the same miss in a 10-0 game:
# R: Win probability impact analysis
# Simplified leverage calculation (would use actual WPA in practice)
# Start from impact_data, which already carries the incorrect_call flag
leverage_data <- impact_data %>%
mutate(
# Simplified leverage index
score_diff = abs(rnorm(n(), 0, 2)), # Placeholder
inning = sample(1:9, n(), replace = TRUE),
late_inning = inning >= 7,
close_game = score_diff <= 2,
leverage = case_when(
late_inning & close_game & outs == 2 ~ 2.5,
late_inning & close_game ~ 2.0,
close_game ~ 1.5,
TRUE ~ 1.0
),
# High leverage incorrect call
high_leverage_error = incorrect_call & leverage > 1.5
)
# Impact in high-leverage situations
high_leverage_impact <- leverage_data %>%
group_by(umpire) %>%
summarise(
total_high_lev = sum(leverage > 1.5),
high_lev_errors = sum(high_leverage_error, na.rm = TRUE),
# Error rate among high-leverage pitches only
high_lev_error_rate = high_lev_errors / total_high_lev,
avg_leverage_of_errors = mean(leverage[incorrect_call], na.rm = TRUE)
) %>%
filter(total_high_lev >= 100) %>%
arrange(desc(high_lev_error_rate))
print(head(high_leverage_impact, 10))
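The same leverage tiers can be sketched in Python. The example below ports the simplified rules into a plain function and computes the error rate among high-leverage pitches only. The six pitches are hypothetical rows made up for illustration, and a real analysis would use an actual leverage index rather than these coarse tiers.

```python
def leverage_index(inning: int, score_diff: int, outs: int) -> float:
    """Coarse leverage tiers, mirroring the simplified rules in the R example."""
    late_inning = inning >= 7
    close_game = abs(score_diff) <= 2
    if late_inning and close_game and outs == 2:
        return 2.5
    if late_inning and close_game:
        return 2.0
    if close_game:
        return 1.5
    return 1.0


# Hypothetical called pitches: (inning, score_diff, outs, incorrect_call)
pitches = [
    (8, 1, 2, True),
    (9, 0, 1, False),
    (7, -2, 0, False),
    (3, 0, 1, True),
    (8, 6, 2, True),
    (9, 1, 2, False),
]

# Keep only pitches above the high-leverage threshold used in the text
high_lev = [p for p in pitches if leverage_index(p[0], p[1], p[2]) > 1.5]
high_lev_errors = sum(1 for p in high_lev if p[3])
high_lev_error_rate = high_lev_errors / len(high_lev)

print(f"High-leverage pitches: {len(high_lev)}")
print(f"High-leverage error rate: {high_lev_error_rate:.2f}")
```

Dividing by the number of high-leverage pitches, rather than by all pitches, keeps the rate comparable across umpires who happen to work different numbers of close, late-inning games.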
The Automated Ball-Strike (ABS) system, colloquially known as "robot umpires," has been tested in professional baseball since 2019, first in the independent Atlantic League and later in the affiliated minor leagues, and represents a potential future for MLB. Understanding the differences between human and automated strike zones is crucial for evaluating this technology.
ABS System Overview
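The ABS system uses pitch-tracking technology to determine ball-strike calls instantaneously. Early trials relied on TrackMan radar, while more recent deployments use Hawk-Eye optical tracking. Two implementations have been tested:
- Full ABS: All ball-strike calls are made by the system
- ABS Challenge System: The human umpire makes every call, and each team gets a limited number of challenges per game (two or three, depending on the league and season)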
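The bookkeeping behind the challenge format can be sketched in a few lines. This is a minimal illustration with a hypothetical `ChallengeTracker` class. It assumes a starting allotment of 3 and that a team keeps its challenge when the call is overturned, so both should be adjusted to the rules of the league being modeled.

```python
from dataclasses import dataclass


@dataclass
class ChallengeTracker:
    """Tracks one team's remaining ABS challenges in a game."""
    remaining: int = 3  # assumed allotment; the actual number varies by league

    def challenge(self, human_strike: bool, abs_strike: bool) -> bool:
        """Use a challenge and return True if the call is overturned."""
        if self.remaining <= 0:
            raise ValueError("no challenges remaining")
        overturned = human_strike != abs_strike
        if not overturned:
            # Assumption: only unsuccessful challenges are consumed
            self.remaining -= 1
        return overturned


team = ChallengeTracker()
team.challenge(human_strike=True, abs_strike=False)  # overturned, challenge kept
team.challenge(human_strike=True, abs_strike=True)   # upheld, challenge lost
print(team.remaining)
```

Because failed challenges are the scarce resource under this assumption, the strategic question becomes when a team is confident enough, and the situation important enough, to risk one.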
Comparing Human vs ABS Zones
Let's analyze how an ABS system would call pitches compared to human umpires:
# R: Simulate ABS system and compare to human calls
# Define ABS strike zone (strictly rulebook-based)
abs_call <- function(plate_x, plate_z, sz_top, sz_bot) {
in_strike_zone(plate_x, plate_z, sz_top, sz_bot)
}
# Apply ABS to our data
abs_comparison <- called_pitches %>%
mutate(
abs_strike = abs_call(plate_x, plate_z, sz_top, sz_bot),
human_strike = called_strike == 1,
# Agreement/disagreement
calls_agree = abs_strike == human_strike,
abs_more_lenient = !abs_strike & human_strike, # Human called strike, ABS would call ball
abs_more_strict = abs_strike & !human_strike, # Human called ball, ABS would call strike
disagreement_type = case_when(
calls_agree ~ "Agreement",
abs_more_lenient ~ "ABS More Lenient",
abs_more_strict ~ "ABS More Strict"
)
)
# Overall agreement rate
agreement_summary <- abs_comparison %>%
summarise(
total_pitches = n(),
agreement_rate = mean(calls_agree),
abs_more_lenient_pct = mean(abs_more_lenient),
abs_more_strict_pct = mean(abs_more_strict)
)
print(agreement_summary)
# Agreement by umpire
umpire_abs_comparison <- abs_comparison %>%
group_by(umpire) %>%
summarise(
pitches = n(),
agreement_rate = mean(calls_agree),
abs_more_lenient = sum(abs_more_lenient),
abs_more_strict = sum(abs_more_strict),
net_stricter_than_abs = sum(abs_more_strict) - sum(abs_more_lenient)
) %>%
filter(pitches >= 1000) %>%
arrange(agreement_rate)
print(head(umpire_abs_comparison, 10))
# Visualize disagreement locations
ggplot(abs_comparison %>% filter(!calls_agree),
aes(x = plate_x, y = plate_z, color = disagreement_type)) +
geom_point(alpha = 0.3, size = 0.5) +
geom_rect(aes(xmin = -0.708, xmax = 0.708,
ymin = mean(sz_bot), ymax = mean(sz_top)),
fill = NA, color = "black", linewidth = 1, inherit.aes = FALSE) +
scale_color_manual(values = c("ABS More Lenient" = "blue",
"ABS More Strict" = "red")) +
coord_fixed(ratio = 1) +
labs(title = "Human-ABS Disagreements by Location",
x = "Horizontal Location (ft)", y = "Vertical Location (ft)",
color = "Disagreement Type") +
theme_minimal()
# Impact of ABS on game statistics
abs_impact <- abs_comparison %>%
summarise(
# Current stats with human umps (rates among called pitches in two-strike / three-ball counts)
current_k_rate = mean(human_strike[strikes == 2]),
current_bb_rate = mean(!human_strike[balls == 3]),
# Projected stats with ABS
abs_k_rate = mean(abs_strike[strikes == 2]),
abs_bb_rate = mean(!abs_strike[balls == 3]),
# Differences
k_rate_change = abs_k_rate - current_k_rate,
bb_rate_change = abs_bb_rate - current_bb_rate
)
print(abs_impact)
# Python: Compare human calls to ABS system
# Define ABS strike zone
def abs_call(plate_x, plate_z, sz_top, sz_bot):
    """Automated ball-strike system call (rulebook zone)"""
    return in_strike_zone(plate_x, plate_z, sz_top, sz_bot)
# Apply ABS to data
abs_comparison = called_pitches.copy()
abs_comparison['abs_strike'] = abs_call(
abs_comparison['plate_x'].values,
abs_comparison['plate_z'].values,
abs_comparison['sz_top'].values,
abs_comparison['sz_bot'].values
)
abs_comparison['human_strike'] = abs_comparison['called_strike'] == 1
# Agreement/disagreement
abs_comparison['calls_agree'] = (
abs_comparison['abs_strike'] == abs_comparison['human_strike']
)
abs_comparison['abs_more_lenient'] = (
~abs_comparison['abs_strike'] & abs_comparison['human_strike']
)
abs_comparison['abs_more_strict'] = (
abs_comparison['abs_strike'] & ~abs_comparison['human_strike']
)
# Overall agreement
agreement_summary = {
'total_pitches': len(abs_comparison),
'agreement_rate': abs_comparison['calls_agree'].mean(),
'abs_more_lenient_pct': abs_comparison['abs_more_lenient'].mean(),
'abs_more_strict_pct': abs_comparison['abs_more_strict'].mean()
}
print("Human vs ABS Agreement:")
for key, value in agreement_summary.items():
    print(f" {key}: {value:.4f}" if isinstance(value, float) else f" {key}: {value}")
# Agreement by umpire
umpire_abs_comparison = abs_comparison.groupby('umpire').agg({
'calls_agree': ['count', 'mean'],
'abs_more_lenient': 'sum',
'abs_more_strict': 'sum'
}).reset_index()
umpire_abs_comparison.columns = ['umpire', 'pitches', 'agreement_rate',
'abs_more_lenient', 'abs_more_strict']
umpire_abs_comparison['net_stricter_than_abs'] = (
umpire_abs_comparison['abs_more_strict'] - umpire_abs_comparison['abs_more_lenient']
)
umpire_abs_comparison = umpire_abs_comparison[
umpire_abs_comparison['pitches'] >= 1000
].sort_values('agreement_rate')
print("\nUmpire vs ABS Agreement Rates:")
print(umpire_abs_comparison.head(10))
# Visualize disagreement locations
fig, ax = plt.subplots(figsize=(10, 10))
disagreements = abs_comparison[~abs_comparison['calls_agree']]
colors = {'ABS More Lenient': 'blue', 'ABS More Strict': 'red'}
for disagreement_type, color in colors.items():
    if disagreement_type == 'ABS More Lenient':
        data = disagreements[disagreements['abs_more_lenient']]
    else:
        data = disagreements[disagreements['abs_more_strict']]
    ax.scatter(data['plate_x'], data['plate_z'],
               c=color, alpha=0.3, s=1, label=disagreement_type)
plot_strike_zone_base(ax, abs_comparison['sz_top'].mean(),
abs_comparison['sz_bot'].mean())
ax.set_xlim(-2, 2)
ax.set_ylim(0, 5)
ax.set_title('Human-ABS Disagreements by Location', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()
# Impact on game statistics
def calculate_abs_impact(data):
    """Calculate how ABS would change strikeout and walk rates"""
    # Filter to potential K/BB situations
    potential_k = data[(data['strikes'] == 2)]
    potential_bb = data[(data['balls'] == 3)]
    impact = {
        'current_k_rate': potential_k['human_strike'].mean(),
        'abs_k_rate': potential_k['abs_strike'].mean(),
        'current_bb_rate': (~potential_bb['human_strike']).mean(),
        'abs_bb_rate': (~potential_bb['abs_strike']).mean(),
    }
    impact['k_rate_change'] = impact['abs_k_rate'] - impact['current_k_rate']
    impact['bb_rate_change'] = impact['abs_bb_rate'] - impact['current_bb_rate']
    return impact
abs_impact = calculate_abs_impact(abs_comparison)
print("\nProjected Impact of ABS on Game Outcomes:")
for key, value in abs_impact.items():
    print(f" {key}: {value:.4f}")
ABS Challenge System Analysis
The challenge system allows teams to strategically contest calls. Let's analyze which situations would benefit most from challenges:
# R: Analyze optimal challenge strategy
# Identify calls that would be overturned by ABS
challenge_data <- abs_comparison %>%
filter(!calls_agree) %>%
mutate(
# Would challenge succeed?
challenge_success = TRUE,
# Leverage of situation
leverage_score = case_when(
strikes == 2 & balls == 3 ~ 5, # Full count
strikes == 2 ~ 4, # Two-strike count
balls == 3 ~ 3, # Three-ball count
TRUE ~ 1
),
# Should challenge?
worth_challenging = leverage_score >= 3
)
# Challenge value by situation
challenge_value <- challenge_data %>%
group_by(balls, strikes) %>%
summarise(
incorrect_calls = n(),
avg_leverage = mean(leverage_score),
pct_worth_challenge = mean(worth_challenging),
.groups = "drop"
) %>%
arrange(desc(avg_leverage))
print(challenge_value)
# Expected number of successful challenges per game
challenges_per_game <- challenge_data %>%
group_by(game_date, umpire) %>%
summarise(
total_incorrect = n(),
high_leverage_incorrect = sum(worth_challenging),
.groups = "drop"
) %>%
summarise(
avg_incorrect_per_game = mean(total_incorrect),
avg_challengeable_per_game = mean(high_leverage_incorrect)
)
print(challenges_per_game)
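The R analysis above assumes every disagreement would be challenged successfully. In a game, a team does not know the ABS result in advance, so a decision rule also needs some estimate of how likely an overturn is. The sketch below ports the count-based leverage tiers to Python and adds a hypothetical `p_overturn` input with a `min_confidence` threshold. Both are illustrative assumptions and not part of the analysis above.

```python
def leverage_score(balls: int, strikes: int) -> int:
    """Count-based leverage tiers, mirroring the rules in the R example."""
    if strikes == 2 and balls == 3:
        return 5  # full count
    if strikes == 2:
        return 4  # two-strike count
    if balls == 3:
        return 3  # three-ball count
    return 1


def should_challenge(balls: int, strikes: int, p_overturn: float,
                     min_score: int = 3, min_confidence: float = 0.5) -> bool:
    """Challenge only in high-leverage counts when an overturn looks likely."""
    high_leverage = leverage_score(balls, strikes) >= min_score
    return high_leverage and p_overturn >= min_confidence


print(should_challenge(3, 2, p_overturn=0.8))  # full count, confident
print(should_challenge(1, 0, p_overturn=0.9))  # confident, but low leverage
print(should_challenge(0, 2, p_overturn=0.3))  # high leverage, but unlikely overturn
```

In practice `p_overturn` could come from a called strike probability model like the ones built earlier, evaluated at the pitch location. The thresholds would then be tuned to the number of challenges a team has left.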
# R: Simulate ABS system and compare to human calls
# Define ABS strike zone (strictly rulebook-based)
abs_call <- function(plate_x, plate_z, sz_top, sz_bot) {
in_strike_zone(plate_x, plate_z, sz_top, sz_bot)
}
# Apply ABS to our data
abs_comparison <- called_pitches %>%
mutate(
abs_strike = abs_call(plate_x, plate_z, sz_top, sz_bot),
human_strike = called_strike == 1,
# Agreement/disagreement
calls_agree = abs_strike == human_strike,
abs_more_lenient = !abs_strike & human_strike, # Human called strike, ABS would call ball
abs_more_strict = abs_strike & !human_strike, # Human called ball, ABS would call strike
disagreement_type = case_when(
calls_agree ~ "Agreement",
abs_more_lenient ~ "ABS More Lenient",
abs_more_strict ~ "ABS More Strict"
)
)
# Overall agreement rate
agreement_summary <- abs_comparison %>%
summarise(
total_pitches = n(),
agreement_rate = mean(calls_agree),
abs_more_lenient_pct = mean(abs_more_lenient),
abs_more_strict_pct = mean(abs_more_strict)
)
print(agreement_summary)
# Agreement by umpire
umpire_abs_comparison <- abs_comparison %>%
group_by(umpire) %>%
summarise(
pitches = n(),
agreement_rate = mean(calls_agree),
abs_more_lenient = sum(abs_more_lenient),
abs_more_strict = sum(abs_more_strict),
net_stricter_than_abs = sum(abs_more_strict) - sum(abs_more_lenient)
) %>%
filter(pitches >= 1000) %>%
arrange(agreement_rate)
print(head(umpire_abs_comparison, 10))
# Visualize disagreement locations
ggplot(abs_comparison %>% filter(!calls_agree),
aes(x = plate_x, y = plate_z, color = disagreement_type)) +
geom_point(alpha = 0.3, size = 0.5) +
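# annotate() draws the zone once; geom_rect(aes(...)) would redraw it for every row
annotate("rect", xmin = -0.708, xmax = 0.708,
ymin = mean(abs_comparison$sz_bot, na.rm = TRUE),
ymax = mean(abs_comparison$sz_top, na.rm = TRUE),
fill = NA, color = "black", linewidth = 1) +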
scale_color_manual(values = c("ABS More Lenient" = "blue",
"ABS More Strict" = "red")) +
coord_fixed(ratio = 1) +
labs(title = "Human-ABS Disagreements by Location",
x = "Horizontal Location (ft)", y = "Vertical Location (ft)",
color = "Disagreement Type") +
theme_minimal()
# Impact of ABS on game statistics
# Rates are conditional on the count, matching the Python version: the share of
# taken two-strike pitches called strike three, and the share of taken
# three-ball pitches called ball four (not full strikeout or walk rates)
abs_impact <- abs_comparison %>%
summarise(
# Current rates with human umps
current_k_rate = mean(human_strike[strikes == 2]),
current_bb_rate = mean(!human_strike[balls == 3]),
# Projected rates with ABS
abs_k_rate = mean(abs_strike[strikes == 2]),
abs_bb_rate = mean(!abs_strike[balls == 3]),
# Differences
k_rate_change = abs_k_rate - current_k_rate,
bb_rate_change = abs_bb_rate - current_bb_rate
)
print(abs_impact)
# Python: Compare human calls to ABS system
import matplotlib.pyplot as plt
# in_strike_zone() and plot_strike_zone_base() are assumed to be available from earlier code
# Define ABS strike zone
def abs_call(plate_x, plate_z, sz_top, sz_bot):
    """Automated ball-strike system call (rulebook zone)"""
    return in_strike_zone(plate_x, plate_z, sz_top, sz_bot)
# Apply ABS to data
abs_comparison = called_pitches.copy()
abs_comparison['abs_strike'] = abs_call(
    abs_comparison['plate_x'].values,
    abs_comparison['plate_z'].values,
    abs_comparison['sz_top'].values,
    abs_comparison['sz_bot'].values
)
abs_comparison['human_strike'] = abs_comparison['called_strike'] == 1
# Agreement/disagreement
abs_comparison['calls_agree'] = (
    abs_comparison['abs_strike'] == abs_comparison['human_strike']
)
# Human called strike, ABS would call ball
abs_comparison['abs_more_lenient'] = (
    ~abs_comparison['abs_strike'] & abs_comparison['human_strike']
)
# Human called ball, ABS would call strike
abs_comparison['abs_more_strict'] = (
    abs_comparison['abs_strike'] & ~abs_comparison['human_strike']
)
# Overall agreement
agreement_summary = {
    'total_pitches': len(abs_comparison),
    'agreement_rate': abs_comparison['calls_agree'].mean(),
    'abs_more_lenient_pct': abs_comparison['abs_more_lenient'].mean(),
    'abs_more_strict_pct': abs_comparison['abs_more_strict'].mean()
}
print("Human vs ABS Agreement:")
for key, value in agreement_summary.items():
    print(f"  {key}: {value:.4f}" if isinstance(value, float) else f"  {key}: {value}")
# Agreement by umpire
umpire_abs_comparison = abs_comparison.groupby('umpire').agg({
    'calls_agree': ['count', 'mean'],
    'abs_more_lenient': 'sum',
    'abs_more_strict': 'sum'
}).reset_index()
umpire_abs_comparison.columns = ['umpire', 'pitches', 'agreement_rate',
                                 'abs_more_lenient', 'abs_more_strict']
umpire_abs_comparison['net_stricter_than_abs'] = (
    umpire_abs_comparison['abs_more_strict'] - umpire_abs_comparison['abs_more_lenient']
)
umpire_abs_comparison = umpire_abs_comparison[
    umpire_abs_comparison['pitches'] >= 1000
].sort_values('agreement_rate')
print("\nUmpire vs ABS Agreement Rates:")
print(umpire_abs_comparison.head(10))
# Visualize disagreement locations
fig, ax = plt.subplots(figsize=(10, 10))
disagreements = abs_comparison[~abs_comparison['calls_agree']]
colors = {'ABS More Lenient': 'blue', 'ABS More Strict': 'red'}
for disagreement_type, color in colors.items():
    if disagreement_type == 'ABS More Lenient':
        data = disagreements[disagreements['abs_more_lenient']]
    else:
        data = disagreements[disagreements['abs_more_strict']]
    ax.scatter(data['plate_x'], data['plate_z'],
               c=color, alpha=0.3, s=1, label=disagreement_type)
plot_strike_zone_base(ax, abs_comparison['sz_top'].mean(),
                      abs_comparison['sz_bot'].mean())
ax.set_xlim(-2, 2)
ax.set_ylim(0, 5)
ax.set_title('Human-ABS Disagreements by Location', fontsize=14, fontweight='bold')
ax.legend()
plt.tight_layout()
plt.show()
# Impact on game statistics
def calculate_abs_impact(data):
    """Calculate how ABS would change called-third-strike and called-ball-four rates"""
    # Filter to potential K/BB situations
    potential_k = data[(data['strikes'] == 2)]
    potential_bb = data[(data['balls'] == 3)]
    impact = {
        'current_k_rate': potential_k['human_strike'].mean(),
        'abs_k_rate': potential_k['abs_strike'].mean(),
        'current_bb_rate': (~potential_bb['human_strike']).mean(),
        'abs_bb_rate': (~potential_bb['abs_strike']).mean(),
    }
    impact['k_rate_change'] = impact['abs_k_rate'] - impact['current_k_rate']
    impact['bb_rate_change'] = impact['abs_bb_rate'] - impact['current_bb_rate']
    return impact
abs_impact = calculate_abs_impact(abs_comparison)
print("\nProjected Impact of ABS on Game Outcomes:")
for key, value in abs_impact.items():
    print(f"  {key}: {value:.4f}")
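Both `abs_call` versions above delegate to an `in_strike_zone` helper that is not defined in this section. The sketch below is a hypothetical stand-in, not necessarily the definition used elsewhere in this chapter. It assumes a rectangular zone with a half-width of 0.708 ft (the plate edge used in the plots above), inclusive boundaries, and no allowance for the radius of the ball.

```python
import numpy as np

PLATE_HALF_WIDTH_FT = 0.708  # 17-inch plate / 2, in feet (matches the plots above)

def in_strike_zone(plate_x, plate_z, sz_top, sz_bot,
                   half_width=PLATE_HALF_WIDTH_FT):
    """Hypothetical stand-in: True where the ball centre is inside the
    rectangular rulebook zone (boundaries inclusive, ball radius ignored)."""
    plate_x = np.asarray(plate_x, dtype=float)
    plate_z = np.asarray(plate_z, dtype=float)
    horizontal = np.abs(plate_x) <= half_width
    vertical = (plate_z >= np.asarray(sz_bot)) & (plate_z <= np.asarray(sz_top))
    # Comparisons with NaN are False, so pitches with missing locations
    # come back as "not in zone" rather than raising an error
    return horizontal & vertical

# Three pitches against a 1.5-3.5 ft zone: middle, off the edge, too high
calls = in_strike_zone([0.0, 0.9, 0.2], [2.5, 2.5, 3.8], sz_top=3.5, sz_bot=1.5)
print(calls)
```

Whether the ball's radius is added to the zone edges changes which borderline pitches count as strikes, so agreement rates depend on which convention the helper follows.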
Interactive strike zone visualizations let broadcasters, teams, and fans explore umpire tendencies directly rather than reading them off fixed charts. Static heat maps and accuracy tables provide useful snapshots; interactive tools add the ability to filter by game situation, compare several umpires at once, and step through how calls change over the course of a game. This section builds three such tools with Plotly, in both R and Python: a zone overlay, a 3D probability surface, and an animated accuracy chart.
Interactive umpire analysis tools serve multiple stakeholders:
- Broadcast Integration: Real-time overlays showing umpire tendencies during live games
- Team Preparation: Pre-game analysis identifying strategic opportunities based on umpire assignment
- League Evaluation: Performance monitoring systems for umpire development and playoff assignments
- Public Transparency: Fan-facing tools that increase understanding of ball-strike calling patterns
- Academic Research: Comprehensive datasets for studying human decision-making under pressure
Interactive Strike Zone Overlay Comparing Umpires
The strike zone overlay compares umpires by drawing each one's 50% called-strike contour on a shared plot, alongside the rulebook zone. Individual umpires can be toggled on and off from the legend. To examine a specific pitcher or batter handedness, or a specific count, filter the pitch data before building the plot.
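The overlay code in this section draws one contour per umpire from whatever pitch table it is given, so handedness and count filters have to be applied to the data first. A minimal sketch of such a pre-filter follows. The `stand` and `p_throws` column names follow Statcast conventions and are assumptions here, as is the helper name `filter_pitches`.

```python
import pandas as pd

def filter_pitches(pitch_data, stand=None, p_throws=None, balls=None, strikes=None):
    """Return the rows matching every filter that is not None.

    Column names are assumed (Statcast-style): stand = batter side,
    p_throws = pitcher hand, balls/strikes = count before the pitch.
    """
    mask = pd.Series(True, index=pitch_data.index)
    for column, value in [("stand", stand), ("p_throws", p_throws),
                          ("balls", balls), ("strikes", strikes)]:
        if value is not None:
            mask &= pitch_data[column] == value
    return pitch_data[mask]

# Toy data (hypothetical rows)
pitches = pd.DataFrame({
    "umpire": ["Ump A", "Ump A", "Ump B", "Ump B"],
    "stand": ["L", "R", "L", "L"],
    "p_throws": ["R", "R", "L", "R"],
    "balls": [3, 0, 3, 1],
    "strikes": [2, 2, 2, 0],
    "called_strike": [1, 0, 1, 0],
})

# Left-handed batters in full counts
full_count_lhb = filter_pitches(pitches, stand="L", balls=3, strikes=2)
print(full_count_lhb)
```

The filtered table can then be passed to `create_umpire_overlay` in place of the full data set, keeping in mind that narrow filters leave few pitches per grid cell and therefore noisier contours.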
# R: Interactive Strike Zone Overlay with Umpire Comparison
library(plotly)
library(tidyverse)
library(htmlwidgets)
# Function to calculate umpire strike zone boundaries
calculate_umpire_zone <- function(pitch_data, umpire_name, strike_threshold = 0.5) {
ump_data <- pitch_data %>%
filter(umpire == umpire_name)
# Create grid for probability calculation
x_seq <- seq(-2, 2, length.out = 40)
z_seq <- seq(0, 5, length.out = 50)
grid <- expand.grid(plate_x = x_seq, plate_z = z_seq)
# Calculate strike probability at each point using local averaging
grid$strike_prob <- sapply(1:nrow(grid), function(i) {
# Find pitches within 0.2 feet
nearby <- ump_data %>%
filter(abs(plate_x - grid$plate_x[i]) < 0.2,
abs(plate_z - grid$plate_z[i]) < 0.2)
if (nrow(nearby) >= 10) {
mean(nearby$called_strike, na.rm = TRUE)
} else {
NA
}
})
return(grid)
}
# Create interactive overlay comparison
create_umpire_overlay <- function(pitch_data, umpires_to_compare) {
fig <- plot_ly()
# Color palette for umpires
colors <- c('rgba(31, 119, 180, 0.6)', 'rgba(255, 127, 14, 0.6)',
'rgba(44, 160, 44, 0.6)', 'rgba(214, 39, 40, 0.6)')
# Add contour for each umpire
for (i in seq_along(umpires_to_compare)) {
ump_name <- umpires_to_compare[i]
zone_data <- calculate_umpire_zone(pitch_data, ump_name)
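# Reshape for contour plot: expand.grid() varies plate_x fastest, so fill a
# length(x) by length(z) matrix column-wise, then transpose to rows = z, cols = x
strike_matrix <- matrix(zone_data$strike_prob,
nrow = length(unique(zone_data$plate_x)),
ncol = length(unique(zone_data$plate_z)))
fig <- fig %>%
add_contour(
x = unique(zone_data$plate_x),
y = unique(zone_data$plate_z),
z = t(strike_matrix),
contours = list(
start = 0.5,
end = 0.5,
size = 0.01,
coloring = "none", # draw only the contour line, in the color set below
showlabels = FALSE
),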
line = list(color = colors[i], width = 3),
name = ump_name,
showscale = FALSE,
hovertemplate = paste0(
ump_name, "<br>",
"Location: (%{x:.2f}, %{y:.2f})<br>",
"Strike Prob: %{z:.1%}<extra></extra>"
)
)
}
# Add rulebook strike zone rectangle
fig <- fig %>%
add_segments(
x = -0.708, xend = 0.708, y = 1.5, yend = 1.5,
line = list(color = "black", width = 2, dash = "dash"),
name = "Rulebook Zone",
showlegend = TRUE,
inherit = FALSE
) %>%
add_segments(
x = -0.708, xend = 0.708, y = 3.5, yend = 3.5,
line = list(color = "black", width = 2, dash = "dash"),
showlegend = FALSE,
inherit = FALSE
) %>%
add_segments(
x = -0.708, xend = -0.708, y = 1.5, yend = 3.5,
line = list(color = "black", width = 2, dash = "dash"),
showlegend = FALSE,
inherit = FALSE
) %>%
add_segments(
x = 0.708, xend = 0.708, y = 1.5, yend = 3.5,
line = list(color = "black", width = 2, dash = "dash"),
showlegend = FALSE,
inherit = FALSE
)
# Layout configuration
fig <- fig %>%
layout(
title = list(
text = "<b>Umpire Strike Zone Comparison</b><br><sub>50% Called Strike Probability Contours</sub>",
x = 0.5,
xanchor = "center"
),
xaxis = list(
title = "Horizontal Location (feet)",
range = c(-1.5, 1.5),
constrain = "domain",
zeroline = FALSE
),
yaxis = list(
title = "Vertical Location (feet)",
range = c(1, 4),
scaleanchor = "x",
scaleratio = 1,
zeroline = FALSE
),
plot_bgcolor = "rgb(250, 250, 250)",
paper_bgcolor = "white",
legend = list(
x = 1.02,
y = 0.98,
xanchor = "left",
yanchor = "top"
),
hovermode = "closest"
) %>%
config(displayModeBar = TRUE)
return(fig)
}
# Generate sample data with umpire-specific zones
set.seed(42)
n_pitches <- 15000
sample_umpire_data <- tibble(
umpire = sample(c("Angel Hernandez", "Joe West", "Pat Hoberg"), n_pitches, replace = TRUE),
plate_x = rnorm(n_pitches, 0, 0.75),
plate_z = rnorm(n_pitches, 2.5, 0.7)
) %>%
mutate(
dist_from_center = sqrt(plate_x^2 + (plate_z - 2.5)^2),
# Base strike probability
base_prob = plogis(2 - 2.5 * dist_from_center),
# Umpire-specific adjustments
umpire_effect = case_when(
umpire == "Angel Hernandez" ~ -0.3, # Smaller zone
umpire == "Joe West" ~ 0.2, # Larger zone
umpire == "Pat Hoberg" ~ 0.05 # Accurate, slight expansion
),
# Add horizontal bias for variety
horizontal_bias = case_when(
umpire == "Angel Hernandez" ~ ifelse(plate_x > 0, -0.2, 0.1),
TRUE ~ 0
),
strike_prob = plogis(qlogis(base_prob) + umpire_effect + horizontal_bias),
called_strike = rbinom(n_pitches, 1, strike_prob)
)
# Create interactive overlay
umpire_overlay <- create_umpire_overlay(
sample_umpire_data,
c("Angel Hernandez", "Joe West", "Pat Hoberg")
)
umpire_overlay
# Save as HTML
htmlwidgets::saveWidget(umpire_overlay, "umpire_overlay.html", selfcontained = TRUE)
# Python: Interactive Strike Zone Overlay with Umpire Comparison
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from scipy.ndimage import gaussian_filter
def calculate_umpire_zone(pitch_data, umpire_name, grid_size=40, min_pitches=5):
    """Calculate a smoothed called-strike probability surface for an umpire"""
    ump_data = pitch_data[pitch_data['umpire'] == umpire_name]
    # Bin edges; the surface is reported at the bin centers
    x_edges = np.linspace(-2, 2, grid_size + 1)
    z_edges = np.linspace(0, 5, int(grid_size * 1.25) + 1)
    # Total pitches and called strikes per bin (2D histograms)
    counts, _, _ = np.histogram2d(
        ump_data['plate_x'], ump_data['plate_z'], bins=[x_edges, z_edges]
    )
    strikes, _, _ = np.histogram2d(
        ump_data['plate_x'], ump_data['plate_z'], bins=[x_edges, z_edges],
        weights=ump_data['called_strike']
    )
    # Apply Gaussian smoothing to numerator and denominator separately;
    # smoothing the per-bin means directly would spread NaN from empty bins
    counts_smooth = gaussian_filter(counts.T, sigma=1.5)
    strikes_smooth = gaussian_filter(strikes.T, sigma=1.5)
    with np.errstate(divide='ignore', invalid='ignore'):
        strike_prob_smooth = strikes_smooth / counts_smooth
    # Mask areas with insufficient data
    strike_prob_smooth[counts.T < min_pitches] = np.nan
    return {
        'x': (x_edges[:-1] + x_edges[1:]) / 2,
        'z': (z_edges[:-1] + z_edges[1:]) / 2,
        'strike_prob': strike_prob_smooth  # shape (len(z), len(x))
    }
def create_umpire_overlay(pitch_data, umpires_to_compare):
    """Create interactive overlay comparing umpire strike zones"""
    fig = go.Figure()
    # Color palette for umpires
    colors = ['rgba(31, 119, 180, 0.8)', 'rgba(255, 127, 14, 0.8)',
              'rgba(44, 160, 44, 0.8)', 'rgba(214, 39, 40, 0.8)']
    # Add contours for each umpire
    for i, ump_name in enumerate(umpires_to_compare):
        zone_data = calculate_umpire_zone(pitch_data, ump_name)
        # Add an invisible contour trace that carries the hover text
        # for the full probability surface
        fig.add_trace(go.Contour(
            x=zone_data['x'],
            y=zone_data['z'],
            z=zone_data['strike_prob'],
            name=ump_name,
            contours=dict(
                start=0,
                end=1,
                size=0.1,
                showlabels=False,
                coloring='none'
            ),
            line=dict(width=0),
            showscale=False,
            hovertemplate=(
                f"{ump_name}<br>" +
                "Location: (%{x:.2f}, %{y:.2f})<br>" +
                "Strike Prob: %{z:.1%}<extra></extra>"
            ),
            visible=True
        ))
        # Add 50% probability contour line (the "zone boundary");
        # coloring='none' is needed for line.color to take effect
        fig.add_trace(go.Contour(
            x=zone_data['x'],
            y=zone_data['z'],
            z=zone_data['strike_prob'],
            name=f"{ump_name} Zone",
            contours=dict(
                start=0.5,
                end=0.5,
                size=0.01,
                showlabels=False,
                coloring='none'
            ),
            line=dict(color=colors[i % len(colors)], width=3),
            showscale=False,
            hoverinfo='skip'
        ))
    # Add rulebook strike zone
    zone_x = [-0.708, 0.708, 0.708, -0.708, -0.708]
    zone_z = [1.5, 1.5, 3.5, 3.5, 1.5]
    fig.add_trace(go.Scatter(
        x=zone_x, y=zone_z,
        mode='lines',
        line=dict(color='black', width=2, dash='dash'),
        name='Rulebook Zone',
        hoverinfo='skip'
    ))
    # Update layout
    fig.update_layout(
        title=dict(
            text="<b>Umpire Strike Zone Comparison</b><br><sub>50% Called Strike Probability Contours</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        xaxis=dict(
            title="Horizontal Location (feet)",
            range=[-1.5, 1.5],
            constrain='domain',
            zeroline=False
        ),
        yaxis=dict(
            title="Vertical Location (feet)",
            range=[1, 4],
            scaleanchor="x",
            scaleratio=1,
            zeroline=False
        ),
        plot_bgcolor='rgb(250, 250, 250)',
        paper_bgcolor='white',
        legend=dict(
            x=1.02,
            y=0.98,
            xanchor='left',
            yanchor='top'
        ),
        hovermode='closest',
        width=800,
        height=800
    )
    return fig
# Generate sample data with umpire-specific zones
np.random.seed(42)
n_pitches = 15000
def inv_logit(x):
    return 1 / (1 + np.exp(-x))
sample_umpire_data = pd.DataFrame({
'umpire': np.random.choice(['Angel Hernandez', 'Joe West', 'Pat Hoberg'], n_pitches),
'plate_x': np.random.normal(0, 0.75, n_pitches),
'plate_z': np.random.normal(2.5, 0.7, n_pitches)
})
sample_umpire_data['dist_from_center'] = np.sqrt(
sample_umpire_data['plate_x']**2 +
(sample_umpire_data['plate_z'] - 2.5)**2
)
# Umpire-specific effects
umpire_effects = {
'Angel Hernandez': -0.3, # Smaller zone
'Joe West': 0.2, # Larger zone
'Pat Hoberg': 0.05 # Accurate, slight expansion
}
sample_umpire_data['umpire_effect'] = sample_umpire_data['umpire'].map(umpire_effects)
# Add horizontal bias for Angel Hernandez
sample_umpire_data['horizontal_bias'] = np.where(
(sample_umpire_data['umpire'] == 'Angel Hernandez') & (sample_umpire_data['plate_x'] > 0),
-0.2,
np.where(
(sample_umpire_data['umpire'] == 'Angel Hernandez') & (sample_umpire_data['plate_x'] <= 0),
0.1,
0
)
)
base_logit = 2 - 2.5 * sample_umpire_data['dist_from_center']
strike_prob = inv_logit(
base_logit + sample_umpire_data['umpire_effect'] + sample_umpire_data['horizontal_bias']
)
sample_umpire_data['called_strike'] = np.random.binomial(1, strike_prob)
# Create interactive overlay
umpire_overlay = create_umpire_overlay(
sample_umpire_data,
['Angel Hernandez', 'Joe West', 'Pat Hoberg']
)
umpire_overlay.show()
# Save as HTML
umpire_overlay.write_html("umpire_overlay.html")
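The 50% contour also gives a single-number summary of zone size: the area of the region where the estimated called-strike probability is at least 0.5. A rough sketch on a synthetic probability grid follows. The logistic surface and the helper name `zone_area_sq_ft` are illustrative assumptions, and cells with missing estimates are simply left out of the area.

```python
import numpy as np

def zone_area_sq_ft(x, z, strike_prob, threshold=0.5):
    """Approximate area (sq ft) where strike_prob >= threshold.

    x, z: evenly spaced grid coordinates; strike_prob has shape (len(z), len(x)).
    NaN cells compare False and so are excluded from the area.
    """
    cell_area = (x[1] - x[0]) * (z[1] - z[0])
    with np.errstate(invalid="ignore"):
        inside = strike_prob >= threshold
    return float(inside.sum() * cell_area)

# Synthetic surface: probability falls off with distance from (0, 2.5) and
# crosses 0.5 at a radius of 0.8 ft, so the true area is pi * 0.8**2
x = np.linspace(-2, 2, 401)
z = np.linspace(0, 5, 501)
X, Z = np.meshgrid(x, z)
dist = np.sqrt(X**2 + (Z - 2.5) ** 2)
strike_prob = 1 / (1 + np.exp(-(2 - 2.5 * dist)))

area = zone_area_sq_ft(x, z, strike_prob)
rulebook_area = (2 * 0.708) * (3.5 - 1.5)  # fixed 1.5-3.5 ft zone used in the plots
print(f"50% zone: {area:.2f} sq ft, rulebook rectangle: {rulebook_area:.2f} sq ft")
```

Applied to the output of `calculate_umpire_zone`, the same function would give one area per umpire, which makes "larger zone" and "smaller zone" comparable as numbers rather than only as shapes.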
Called Strike Probability Surface (3D Plotly)
Three-dimensional visualizations of strike probability surfaces reveal the complete landscape of an umpire's zone, showing how strike likelihood varies continuously across horizontal and vertical dimensions. Interactive 3D plots allow rotation, zooming, and hover inspection of specific locations.
# R: 3D Strike Probability Surface
library(plotly)
create_3d_strike_surface <- function(pitch_data, umpire_name) {
# Filter to specific umpire
ump_data <- pitch_data %>%
filter(umpire == umpire_name)
# Create fine grid for smooth surface
x_seq <- seq(-1.5, 1.5, length.out = 30)
z_seq <- seq(1, 4, length.out = 40)
# Fit a smooth strike probability surface with a binomial GAM (2D thin plate spline)
library(mgcv)
gam_model <- gam(called_strike ~ s(plate_x, plate_z, k = 50),
data = ump_data,
family = binomial)
# Predict on grid
grid <- expand.grid(plate_x = x_seq, plate_z = z_seq)
grid$strike_prob <- predict(gam_model, newdata = grid, type = "response")
# Reshape for 3D surface: expand.grid() varies plate_x fastest, so fill by row
# to get rows = plate_z (y axis) and columns = plate_x (x axis)
strike_matrix <- matrix(as.numeric(grid$strike_prob),
nrow = length(z_seq),
ncol = length(x_seq),
byrow = TRUE)
# Create 3D surface plot
fig <- plot_ly(
x = x_seq,
y = z_seq,
z = strike_matrix,
type = "surface",
colorscale = list(
c(0, "rgb(220, 50, 50)"), # Red for low probability
c(0.5, "rgb(255, 255, 200)"), # Yellow for moderate
c(1, "rgb(50, 50, 220)") # Blue for high probability
),
colorbar = list(title = "Strike<br>Probability"),
hovertemplate = paste0(
"Horizontal: %{x:.2f} ft<br>",
"Vertical: %{y:.2f} ft<br>",
"Strike Prob: %{z:.1%}<extra></extra>"
)
) %>%
layout(
title = list(
text = paste0("<b>", umpire_name, " - 3D Strike Probability Surface</b>"),
x = 0.5,
xanchor = "center"
),
scene = list(
xaxis = list(title = "Horizontal Location (ft)", range = c(-1.5, 1.5)),
yaxis = list(title = "Vertical Location (ft)", range = c(1, 4)),
zaxis = list(title = "Strike Probability", range = c(0, 1)),
camera = list(
eye = list(x = 1.5, y = -1.5, z = 1.2)
),
aspectmode = "manual",
aspectratio = list(x = 1, y = 1.5, z = 0.7)
),
paper_bgcolor = "white"
) %>%
config(displayModeBar = TRUE)
return(fig)
}
# Create 3D surface for Joe West
surface_3d <- create_3d_strike_surface(sample_umpire_data, "Joe West")
surface_3d
# Save as HTML
htmlwidgets::saveWidget(surface_3d, "strike_surface_3d.html", selfcontained = TRUE)
# Python: 3D Strike Probability Surface
import plotly.graph_objects as go
from sklearn.ensemble import GradientBoostingClassifier
import pandas as pd
import numpy as np
def create_3d_strike_surface(pitch_data, umpire_name):
    """Create 3D surface plot of strike probability"""
    # Filter to specific umpire
    ump_data = pitch_data[pitch_data['umpire'] == umpire_name].copy()
    # Create grid for surface
    x_range = np.linspace(-1.5, 1.5, 30)
    z_range = np.linspace(1, 4, 40)
    X_grid, Z_grid = np.meshgrid(x_range, z_range)
    # Fit a gradient boosting classifier for the probability surface
    # (tree-based, so the surface is piecewise constant rather than perfectly smooth)
    # Prepare training data
    X_train = ump_data[['plate_x', 'plate_z']].values
    y_train = ump_data['called_strike'].values
    # Train model
    model = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    # Predict on grid
    grid_points = np.column_stack([X_grid.ravel(), Z_grid.ravel()])
    strike_prob = model.predict_proba(grid_points)[:, 1]
    strike_prob_matrix = strike_prob.reshape(X_grid.shape)
    # Create 3D surface
    fig = go.Figure(data=[go.Surface(
        x=x_range,
        y=z_range,
        z=strike_prob_matrix,
        colorscale=[
            [0, 'rgb(220, 50, 50)'],      # Red for low probability
            [0.5, 'rgb(255, 255, 200)'],  # Yellow for moderate
            [1, 'rgb(50, 50, 220)']       # Blue for high probability
        ],
        colorbar=dict(title="Strike<br>Probability"),
        hovertemplate=(
            "Horizontal: %{x:.2f} ft<br>" +
            "Vertical: %{y:.2f} ft<br>" +
            "Strike Prob: %{z:.1%}<extra></extra>"
        )
    )])
    # Add semi-transparent reference plane at the 50% probability level
    fig.add_trace(go.Surface(
        x=x_range,
        y=z_range,
        z=np.full_like(strike_prob_matrix, 0.5),
        opacity=0.3,
        colorscale=[[0, 'gray'], [1, 'gray']],
        showscale=False,
        hoverinfo='skip',
        name='50% Threshold'
    ))
    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{umpire_name} - 3D Strike Probability Surface</b>",
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        scene=dict(
            xaxis=dict(title="Horizontal Location (ft)", range=[-1.5, 1.5]),
            yaxis=dict(title="Vertical Location (ft)", range=[1, 4]),
            zaxis=dict(title="Strike Probability", range=[0, 1]),
            camera=dict(
                eye=dict(x=1.5, y=-1.5, z=1.2)
            ),
            aspectmode="manual",
            aspectratio=dict(x=1, y=1.5, z=0.7)
        ),
        paper_bgcolor='white',
        width=900,
        height=700
    )
    return fig
# Create 3D surface for Joe West
surface_3d = create_3d_strike_surface(sample_umpire_data, 'Joe West')
surface_3d.show()
# Save as HTML
surface_3d.write_html("strike_surface_3d.html")
Animated Umpire Accuracy Over Game Progression
Umpire performance can drift over the course of a game due to fatigue, score effects, or recalibration. Animated visualizations show how accuracy and zone size evolve inning-by-inning, revealing patterns that static aggregates miss.
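Before animating, it can help to check numerically whether accuracy drifts at all. A minimal sketch follows, assuming per-inning summaries with `inning`, `pitches`, and `accuracy` columns; the numbers are made up for illustration. It fits a pitch-weighted straight line to accuracy by inning and reads the slope as the change in accuracy per inning.

```python
import numpy as np
import pandas as pd

# Hypothetical per-inning summary for one umpire (illustrative numbers only)
inning_summary = pd.DataFrame({
    "inning": np.arange(1, 10),
    "pitches": [760, 742, 755, 731, 748, 739, 751, 720, 698],
    "accuracy": [0.940, 0.935, 0.930, 0.925, 0.920, 0.915, 0.910, 0.905, 0.900],
})

def accuracy_trend(summary):
    """Pitch-weighted least-squares line through accuracy vs inning.

    Returns (slope, intercept); slope is the change in accuracy per inning.
    np.polyfit weights multiply the residuals, so sqrt(pitches) is used to
    weight squared errors in proportion to the pitch count.
    """
    slope, intercept = np.polyfit(
        summary["inning"], summary["accuracy"], deg=1,
        w=np.sqrt(summary["pitches"]),
    )
    return slope, intercept

slope, intercept = accuracy_trend(inning_summary)
print(f"Change in accuracy per inning: {slope:+.4f}")
print(f"Fitted accuracy, 1st vs 9th: {intercept + slope:.3f} vs {intercept + 9 * slope:.3f}")
```

A slope near zero indicates no linear drift. A fuller test would model pitch-level outcomes (for example with logistic regression) rather than inning averages, and would account for which pitches umpires see late in games.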
# R: Animated Umpire Accuracy Over Game Progression
library(plotly)
library(tidyverse)
create_accuracy_animation <- function(pitch_data) {
# Calculate accuracy by inning for each umpire
inning_accuracy <- pitch_data %>%
group_by(umpire, inning) %>%
summarise(
pitches = n(),
accuracy = mean(correct_call, na.rm = TRUE),
strike_rate = mean(called_strike, na.rm = TRUE),
zone_expansion = mean(called_strike[!in_zone], na.rm = TRUE),
.groups = "drop"
) %>%
arrange(umpire, inning)
# Calculate cumulative accuracy
inning_accuracy <- inning_accuracy %>%
group_by(umpire) %>%
mutate(
cumulative_accuracy = cummean(accuracy),
cumulative_pitches = cumsum(pitches)
) %>%
ungroup()
# Create animated scatter plot
fig <- plot_ly(
inning_accuracy,
x = ~inning,
y = ~accuracy,
size = ~pitches,
color = ~umpire,
frame = ~inning,
text = ~paste(
"Umpire:", umpire, "<br>",
"Inning:", inning, "<br>",
"Accuracy:", scales::percent(accuracy, 0.1), "<br>",
"Pitches:", pitches, "<br>",
"Strike Rate:", scales::percent(strike_rate, 0.1)
),
hoverinfo = "text",
type = "scatter",
mode = "markers+lines",
marker = list(
sizemode = "diameter",
sizeref = 2,
opacity = 0.7
)
) %>%
layout(
title = list(
text = "<b>Umpire Accuracy Progression Through Game</b>",
x = 0.5,
xanchor = "center"
),
xaxis = list(
title = "Inning",
range = c(0.5, 9.5)
),
yaxis = list(
title = "Accuracy Rate",
range = c(0.80, 1.00),
tickformat = ".0%"
),
hovermode = "closest",
showlegend = TRUE
) %>%
animation_opts(
frame = 500,
transition = 300,
redraw = FALSE
) %>%
animation_button(
x = 1, xanchor = "right",
y = 0, yanchor = "bottom"
) %>%
animation_slider(
currentvalue = list(
prefix = "Inning: ",
font = list(color = "black")
)
)
return(fig)
}
# Generate sample pitch-level game progression data
set.seed(123)
n_games <- 50
game_progression_data <- expand_grid(
game_id = 1:n_games,
inning = 1:9,
umpire = c("Angel Hernandez", "Joe West", "Pat Hoberg")
) %>%
mutate(
# Simulate called pitches per inning
n_called = rpois(n(), 15),
# Base accuracy with fatigue effect
base_accuracy = 0.92 - (inning - 5) * 0.005,
# Umpire-specific accuracy
umpire_accuracy_adj = case_when(
umpire == "Pat Hoberg" ~ 0.04,
umpire == "Joe West" ~ 0.00,
umpire == "Angel Hernandez" ~ -0.03
),
# Random game-to-game variation
game_variation = rnorm(n(), 0, 0.02),
# Final accuracy
accuracy = pmin(0.99, pmax(0.80,
base_accuracy + umpire_accuracy_adj + game_variation
))
) %>%
# Expand to one row per pitch: create_accuracy_animation() expects
# pitch-level rows with correct_call, called_strike, and in_zone
uncount(n_called) %>%
mutate(
correct_call = rbinom(n(), 1, accuracy) == 1,
called_strike = rbinom(n(), 1, 0.15),
# A correct call matches the zone; an incorrect call is the opposite
in_zone = (called_strike == 1) == correct_call
)
# Create animation
accuracy_animation <- create_accuracy_animation(game_progression_data)
accuracy_animation
# Save as HTML
htmlwidgets::saveWidget(accuracy_animation, "umpire_accuracy_animation.html",
selfcontained = TRUE)
# Python: Animated Umpire Accuracy Over Game Progression
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
def create_accuracy_animation(pitch_data):
    """Create animated plot of umpire accuracy over game progression"""
    # Calculate accuracy by inning for each umpire
    inning_accuracy = pitch_data.groupby(['umpire', 'inning']).agg({
        'correct_call': ['count', 'mean'],
        'called_strike': 'mean'
    }).reset_index()
    inning_accuracy.columns = ['umpire', 'inning', 'pitches', 'accuracy', 'strike_rate']
    # Calculate zone expansion (strikes called outside zone)
    zone_expansion = pitch_data[~pitch_data['in_zone']].groupby(['umpire', 'inning'])['called_strike'].mean()
    inning_accuracy = inning_accuracy.merge(
        zone_expansion.reset_index().rename(columns={'called_strike': 'zone_expansion'}),
        on=['umpire', 'inning'],
        how='left'
    )
    # Calculate cumulative accuracy
    inning_accuracy = inning_accuracy.sort_values(['umpire', 'inning'])
    inning_accuracy['cumulative_accuracy'] = inning_accuracy.groupby('umpire')['accuracy'].transform(
        lambda x: x.expanding().mean()
    )
    inning_accuracy['cumulative_pitches'] = inning_accuracy.groupby('umpire')['pitches'].cumsum()
    # Create animated scatter plot
    fig = px.scatter(
        inning_accuracy,
        x='inning',
        y='accuracy',
        color='umpire',
        size='pitches',
        animation_frame='inning',
        animation_group='umpire',
        hover_data={
            'accuracy': ':.1%',
            'strike_rate': ':.1%',
            'pitches': True,
            'inning': True
        },
        range_x=[0.5, 9.5],
        range_y=[0.80, 1.00],
        labels={
            'inning': 'Inning',
            'accuracy': 'Accuracy Rate',
            'umpire': 'Umpire'
        },
        title="<b>Umpire Accuracy Progression Through Game</b>"
    )
    # Add trend lines for each umpire
    for umpire in inning_accuracy['umpire'].unique():
        ump_data = inning_accuracy[inning_accuracy['umpire'] == umpire]
        # Fit linear trend
        z = np.polyfit(ump_data['inning'], ump_data['accuracy'], 1)
        p = np.poly1d(z)
        trend_y = p(ump_data['inning'])
        fig.add_trace(go.Scatter(
            x=ump_data['inning'],
            y=trend_y,
            mode='lines',
            line=dict(dash='dash', width=1),
            name=f'{umpire} Trend',
            showlegend=True,
            hoverinfo='skip'
        ))
    # Update layout
    fig.update_layout(
        title=dict(
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        xaxis=dict(title="Inning"),
        yaxis=dict(title="Accuracy Rate", tickformat='.0%'),
        hovermode='closest',
        width=1000,
        height=600,
        paper_bgcolor='white'
    )
    # Update animation settings
    fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 500
    fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300
    return fig
# Generate sample pitch-level game progression data
np.random.seed(123)
n_games = 50
game_ids = np.repeat(range(1, n_games + 1), 9 * 3)
innings = np.tile(np.repeat(range(1, 10), 3), n_games)
umpires = np.tile(['Angel Hernandez', 'Joe West', 'Pat Hoberg'], n_games * 9)
game_progression_data = pd.DataFrame({
    'game_id': game_ids,
    'inning': innings,
    'umpire': umpires
})
# Simulate called pitches per inning
game_progression_data['n_called'] = np.random.poisson(15, len(game_progression_data))
# Base accuracy with fatigue effect
game_progression_data['base_accuracy'] = 0.92 - (game_progression_data['inning'] - 5) * 0.005
# Umpire-specific accuracy adjustments
umpire_adj = {
    'Pat Hoberg': 0.04,
    'Joe West': 0.00,
    'Angel Hernandez': -0.03
}
game_progression_data['umpire_accuracy_adj'] = game_progression_data['umpire'].map(umpire_adj)
# Random variation
game_progression_data['game_variation'] = np.random.normal(0, 0.02, len(game_progression_data))
# Final accuracy
game_progression_data['accuracy'] = np.clip(
    game_progression_data['base_accuracy'] +
    game_progression_data['umpire_accuracy_adj'] +
    game_progression_data['game_variation'],
    0.80, 0.99
)
# Expand to one row per pitch: create_accuracy_animation() expects
# pitch-level rows with correct_call, called_strike, and in_zone
game_progression_data = game_progression_data.loc[
    game_progression_data.index.repeat(game_progression_data['n_called'])
].reset_index(drop=True)
game_progression_data['correct_call'] = np.random.binomial(
    1, game_progression_data['accuracy']
).astype(bool)
game_progression_data['called_strike'] = np.random.binomial(
    1, 0.15, len(game_progression_data)
)
# A correct call matches the zone; an incorrect call is the opposite
game_progression_data['in_zone'] = (
    (game_progression_data['called_strike'] == 1) == game_progression_data['correct_call']
)
# Create animation
accuracy_animation = create_accuracy_animation(game_progression_data)
accuracy_animation.show()
# Save as HTML
accuracy_animation.write_html("umpire_accuracy_animation.html")
These three tools give complementary views of umpire behavior. The overlay comparison shows how zone shape differs between umpires, the 3D probability surface shows how strike likelihood changes across the whole area around the plate, and the animated accuracy chart shows how performance changes inning by inning. Deployed in broadcast systems, team analytics platforms, or public-facing websites, they give stakeholders a detailed view of baseball's most frequent decisions. As MLB continues to evaluate automated ball-strike systems, these visualizations provide context for understanding what is gained and lost by removing human judgment from ball-strike calls.
# Generate sample data with umpire-specific zones
set.seed(42)
n_pitches <- 15000
sample_umpire_data <- tibble(
umpire = sample(c("Angel Hernandez", "Joe West", "Pat Hoberg"), n_pitches, replace = TRUE),
plate_x = rnorm(n_pitches, 0, 0.75),
plate_z = rnorm(n_pitches, 2.5, 0.7)
) %>%
mutate(
dist_from_center = sqrt(plate_x^2 + (plate_z - 2.5)^2),
# Base strike probability
base_prob = plogis(2 - 2.5 * dist_from_center),
# Umpire-specific adjustments
umpire_effect = case_when(
umpire == "Angel Hernandez" ~ -0.3, # Smaller zone
umpire == "Joe West" ~ 0.2, # Larger zone
umpire == "Pat Hoberg" ~ 0.05 # Accurate, slight expansion
),
# Add horizontal bias for variety
horizontal_bias = case_when(
umpire == "Angel Hernandez" ~ ifelse(plate_x > 0, -0.2, 0.1),
TRUE ~ 0
),
strike_prob = plogis(qlogis(base_prob) + umpire_effect + horizontal_bias),
called_strike = rbinom(n_pitches, 1, strike_prob)
)
# Create interactive overlay
umpire_overlay <- create_umpire_overlay(
sample_umpire_data,
c("Angel Hernandez", "Joe West", "Pat Hoberg")
)
umpire_overlay
# Save as HTML
htmlwidgets::saveWidget(umpire_overlay, "umpire_overlay.html", selfcontained = TRUE)
# R: 3D Strike Probability Surface
library(plotly)
create_3d_strike_surface <- function(pitch_data, umpire_name) {
# Filter to specific umpire
ump_data <- pitch_data %>%
filter(umpire == umpire_name)
# Create fine grid for smooth surface
x_seq <- seq(-1.5, 1.5, length.out = 30)
z_seq <- seq(1, 4, length.out = 40)
# Calculate strike probability using a GAM with a two-dimensional smooth (mgcv)
library(mgcv)
gam_model <- gam(called_strike ~ s(plate_x, plate_z, k = 50),
data = ump_data,
family = binomial)
# Predict on grid
grid <- expand.grid(plate_x = x_seq, plate_z = z_seq)
grid$strike_prob <- predict(gam_model, newdata = grid, type = "response")
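# Reshape for 3D surface. expand.grid() varies plate_x fastest, so fill by row
# to get one row per z value and one column per x value
strike_matrix <- matrix(grid$strike_prob,
nrow = length(z_seq),
ncol = length(x_seq),
byrow = TRUE)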
# Create 3D surface plot
fig <- plot_ly(
x = x_seq,
y = z_seq,
z = strike_matrix,
type = "surface",
colorscale = list(
c(0, "rgb(220, 50, 50)"), # Red for low probability
c(0.5, "rgb(255, 255, 200)"), # Yellow for moderate
c(1, "rgb(50, 50, 220)") # Blue for high probability
),
colorbar = list(title = "Strike<br>Probability"),
hovertemplate = paste0(
"Horizontal: %{x:.2f} ft<br>",
"Vertical: %{y:.2f} ft<br>",
"Strike Prob: %{z:.1%}<extra></extra>"
)
) %>%
layout(
title = list(
text = paste0("<b>", umpire_name, " - 3D Strike Probability Surface</b>"),
x = 0.5,
xanchor = "center"
),
scene = list(
xaxis = list(title = "Horizontal Location (ft)", range = c(-1.5, 1.5)),
yaxis = list(title = "Vertical Location (ft)", range = c(1, 4)),
zaxis = list(title = "Strike Probability", range = c(0, 1)),
camera = list(
eye = list(x = 1.5, y = -1.5, z = 1.2)
),
aspectmode = "manual",
aspectratio = list(x = 1, y = 1.5, z = 0.7)
),
paper_bgcolor = "white"
) %>%
config(displayModeBar = TRUE)
return(fig)
}
# Create 3D surface for Joe West
surface_3d <- create_3d_strike_surface(sample_umpire_data, "Joe West")
surface_3d
# Save as HTML
htmlwidgets::saveWidget(surface_3d, "strike_surface_3d.html", selfcontained = TRUE)
# R: Animated Umpire Accuracy Over Game Progression
library(plotly)
library(tidyverse)
create_accuracy_animation <- function(pitch_data) {
# Calculate accuracy by inning for each umpire
inning_accuracy <- pitch_data %>%
group_by(umpire, inning) %>%
summarise(
pitches = n(),
accuracy = mean(correct_call, na.rm = TRUE),
strike_rate = mean(called_strike, na.rm = TRUE),
zone_expansion = mean(called_strike[!in_zone], na.rm = TRUE),
.groups = "drop"
) %>%
arrange(umpire, inning)
# Calculate cumulative accuracy
inning_accuracy <- inning_accuracy %>%
group_by(umpire) %>%
mutate(
cumulative_accuracy = cummean(accuracy),
cumulative_pitches = cumsum(pitches)
) %>%
ungroup()
# Create animated scatter plot
fig <- plot_ly(
inning_accuracy,
x = ~inning,
y = ~accuracy,
size = ~pitches,
color = ~umpire,
frame = ~inning,
text = ~paste(
"Umpire:", umpire, "<br>",
"Inning:", inning, "<br>",
"Accuracy:", scales::percent(accuracy, 0.1), "<br>",
"Pitches:", pitches, "<br>",
"Strike Rate:", scales::percent(strike_rate, 0.1)
),
hoverinfo = "text",
type = "scatter",
mode = "markers+lines",
marker = list(
sizemode = "diameter",
sizeref = 2,
opacity = 0.7
)
) %>%
layout(
title = list(
text = "<b>Umpire Accuracy Progression Through Game</b>",
x = 0.5,
xanchor = "center"
),
xaxis = list(
title = "Inning",
range = c(0.5, 9.5)
),
yaxis = list(
title = "Accuracy Rate",
range = c(0.80, 1.00),
tickformat = ".0%"
),
hovermode = "closest",
showlegend = TRUE
) %>%
animation_opts(
frame = 500,
transition = 300,
redraw = FALSE
) %>%
animation_button(
x = 1, xanchor = "right",
y = 0, yanchor = "bottom"
) %>%
animation_slider(
currentvalue = list(
prefix = "Inning: ",
font = list(color = "black")
)
)
return(fig)
}
# Generate sample game progression data
set.seed(123)
n_games <- 50
game_progression_data <- expand_grid(
game_id = 1:n_games,
inning = 1:9,
umpire = c("Angel Hernandez", "Joe West", "Pat Hoberg")
) %>%
mutate(
# Simulate pitches per inning
pitches = rpois(n(), 15),
# Base accuracy with fatigue effect
base_accuracy = 0.92 - (inning - 5) * 0.005,
# Umpire-specific accuracy
umpire_accuracy_adj = case_when(
umpire == "Pat Hoberg" ~ 0.04,
umpire == "Joe West" ~ 0.00,
umpire == "Angel Hernandez" ~ -0.03
),
# Random game-to-game variation
game_variation = rnorm(n(), 0, 0.02),
# Final accuracy
accuracy = pmin(0.99, pmax(0.80,
base_accuracy + umpire_accuracy_adj + game_variation
))
) %>%
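# Add other metrics. Each row is one umpire-inning rather than a single pitch,
# so correct_call and called_strike end up as per-inning rates, and in_zone is
# a count of in-zone called strikes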
mutate(
correct_calls = rbinom(n(), pitches, accuracy),
called_strike = rbinom(n(), pitches, 0.15),
in_zone = rbinom(n(), called_strike, 0.85),
correct_call = correct_calls / pitches,
called_strike = called_strike / pitches
)
# Create animation
accuracy_animation <- create_accuracy_animation(game_progression_data)
accuracy_animation
# Save as HTML
htmlwidgets::saveWidget(accuracy_animation, "umpire_accuracy_animation.html",
selfcontained = TRUE)
# Python: Interactive Strike Zone Overlay with Umpire Comparison
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import binned_statistic_2d
def calculate_umpire_zone(pitch_data, umpire_name, grid_size=40):
    """Calculate strike probability surface for an umpire"""
    ump_data = pitch_data[pitch_data['umpire'] == umpire_name].copy()
    # Bin edges; the surface is reported at the bin centers
    x_edges = np.linspace(-2, 2, grid_size)
    z_edges = np.linspace(0, 5, int(grid_size * 1.25))
    x_centers = (x_edges[:-1] + x_edges[1:]) / 2
    z_centers = (z_edges[:-1] + z_edges[1:]) / 2
    # Called strikes and total pitches in each bin
    strikes, _, _, _ = binned_statistic_2d(
        ump_data['plate_x'], ump_data['plate_z'],
        ump_data['called_strike'],
        statistic='sum',
        bins=[x_edges, z_edges]
    )
    counts, _, _, _ = binned_statistic_2d(
        ump_data['plate_x'], ump_data['plate_z'],
        ump_data['called_strike'],
        statistic='count',
        bins=[x_edges, z_edges]
    )
    # Smooth the numerator and denominator separately: a per-bin mean is NaN
    # for empty bins, and a Gaussian filter would spread that NaN to neighbors
    strikes_smooth = gaussian_filter(strikes.T, sigma=1.5)
    counts_smooth = gaussian_filter(counts.T, sigma=1.5)
    with np.errstate(divide='ignore', invalid='ignore'):
        strike_prob_smooth = strikes_smooth / counts_smooth
    # Mask areas with insufficient data
    strike_prob_smooth[counts.T < 5] = np.nan
    return {
        'x': x_centers,
        'z': z_centers,
        'strike_prob': strike_prob_smooth
    }
def create_umpire_overlay(pitch_data, umpires_to_compare):
    """Create interactive overlay comparing umpire strike zones"""
    fig = go.Figure()
    # Color palette for umpires
    colors = ['rgba(31, 119, 180, 0.8)', 'rgba(255, 127, 14, 0.8)',
              'rgba(44, 160, 44, 0.8)', 'rgba(214, 39, 40, 0.8)']
    # Add contours for each umpire
    for i, ump_name in enumerate(umpires_to_compare):
        zone_data = calculate_umpire_zone(pitch_data, ump_name)
        # Add an invisible contour trace that supplies hover text for the probability surface
        fig.add_trace(go.Contour(
            x=zone_data['x'],
            y=zone_data['z'],
            z=zone_data['strike_prob'],
            name=ump_name,
            contours=dict(
                start=0,
                end=1,
                size=0.1,
                showlabels=False,
                coloring='none'
            ),
            line=dict(width=0),
            showscale=False,
            hovertemplate=(
                f"{ump_name}<br>" +
                "Location: (%{x:.2f}, %{y:.2f})<br>" +
                "Strike Prob: %{z:.1%}<extra></extra>"
            ),
            visible=True
        ))
        # Add 50% probability contour line (the "zone boundary").
        # coloring='none' is used because line.color is ignored when coloring='lines'
        fig.add_trace(go.Contour(
            x=zone_data['x'],
            y=zone_data['z'],
            z=zone_data['strike_prob'],
            name=f"{ump_name} Zone",
            contours=dict(
                start=0.5,
                end=0.5,
                size=0.01,
                showlabels=False,
                coloring='none'
            ),
            line=dict(color=colors[i % len(colors)], width=3),
            showlegend=True,
            showscale=False,
            hoverinfo='skip'
        ))
    # Add rulebook strike zone
    zone_x = [-0.708, 0.708, 0.708, -0.708, -0.708]
    zone_z = [1.5, 1.5, 3.5, 3.5, 1.5]
    fig.add_trace(go.Scatter(
        x=zone_x, y=zone_z,
        mode='lines',
        line=dict(color='black', width=2, dash='dash'),
        name='Rulebook Zone',
        hoverinfo='skip'
    ))
    # Update layout
    fig.update_layout(
        title=dict(
            text="<b>Umpire Strike Zone Comparison</b><br><sub>50% Called Strike Probability Contours</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        xaxis=dict(
            title="Horizontal Location (feet)",
            range=[-1.5, 1.5],
            constrain='domain',
            zeroline=False
        ),
        yaxis=dict(
            title="Vertical Location (feet)",
            range=[1, 4],
            scaleanchor="x",
            scaleratio=1,
            zeroline=False
        ),
        plot_bgcolor='rgb(250, 250, 250)',
        paper_bgcolor='white',
        legend=dict(
            x=1.02,
            y=0.98,
            xanchor='left',
            yanchor='top'
        ),
        hovermode='closest',
        width=800,
        height=800
    )
    return fig
# Generate sample data with umpire-specific zones
np.random.seed(42)
n_pitches = 15000
def inv_logit(x):
    return 1 / (1 + np.exp(-x))
sample_umpire_data = pd.DataFrame({
'umpire': np.random.choice(['Angel Hernandez', 'Joe West', 'Pat Hoberg'], n_pitches),
'plate_x': np.random.normal(0, 0.75, n_pitches),
'plate_z': np.random.normal(2.5, 0.7, n_pitches)
})
sample_umpire_data['dist_from_center'] = np.sqrt(
sample_umpire_data['plate_x']**2 +
(sample_umpire_data['plate_z'] - 2.5)**2
)
# Umpire-specific effects
umpire_effects = {
'Angel Hernandez': -0.3, # Smaller zone
'Joe West': 0.2, # Larger zone
'Pat Hoberg': 0.05 # Accurate, slight expansion
}
sample_umpire_data['umpire_effect'] = sample_umpire_data['umpire'].map(umpire_effects)
# Add horizontal bias for Angel Hernandez
sample_umpire_data['horizontal_bias'] = np.where(
(sample_umpire_data['umpire'] == 'Angel Hernandez') & (sample_umpire_data['plate_x'] > 0),
-0.2,
np.where(
(sample_umpire_data['umpire'] == 'Angel Hernandez') & (sample_umpire_data['plate_x'] <= 0),
0.1,
0
)
)
base_logit = 2 - 2.5 * sample_umpire_data['dist_from_center']
strike_prob = inv_logit(
base_logit + sample_umpire_data['umpire_effect'] + sample_umpire_data['horizontal_bias']
)
sample_umpire_data['called_strike'] = np.random.binomial(1, strike_prob)
# Create interactive overlay
umpire_overlay = create_umpire_overlay(
sample_umpire_data,
['Angel Hernandez', 'Joe West', 'Pat Hoberg']
)
umpire_overlay.show()
# Save as HTML
umpire_overlay.write_html("umpire_overlay.html")
# Python: 3D Strike Probability Surface
import plotly.graph_objects as go
from sklearn.ensemble import GradientBoostingClassifier
import pandas as pd
import numpy as np
def create_3d_strike_surface(pitch_data, umpire_name):
    """Create 3D surface plot of strike probability"""
    # Filter to specific umpire
    ump_data = pitch_data[pitch_data['umpire'] == umpire_name].copy()
    # Create grid for surface
    x_range = np.linspace(-1.5, 1.5, 30)
    z_range = np.linspace(1, 4, 40)
    X_grid, Z_grid = np.meshgrid(x_range, z_range)
    # Fit a gradient-boosted classifier to estimate the probability surface
    # Prepare training data
    X_train = ump_data[['plate_x', 'plate_z']].values
    y_train = ump_data['called_strike'].values
    # Train model
    model = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    # Predict on grid
    grid_points = np.column_stack([X_grid.ravel(), Z_grid.ravel()])
    strike_prob = model.predict_proba(grid_points)[:, 1]
    strike_prob_matrix = strike_prob.reshape(X_grid.shape)
    # Create 3D surface
    fig = go.Figure(data=[go.Surface(
        x=x_range,
        y=z_range,
        z=strike_prob_matrix,
        colorscale=[
            [0, 'rgb(220, 50, 50)'],      # Red for low probability
            [0.5, 'rgb(255, 255, 200)'],  # Yellow for moderate
            [1, 'rgb(50, 50, 220)']       # Blue for high probability
        ],
        colorbar=dict(title="Strike<br>Probability"),
        hovertemplate=(
            "Horizontal: %{x:.2f} ft<br>" +
            "Vertical: %{y:.2f} ft<br>" +
            "Strike Prob: %{z:.1%}<extra></extra>"
        )
    )])
    # Add a semi-transparent reference plane at the 50% probability level
    fig.add_trace(go.Surface(
        x=x_range,
        y=z_range,
        z=np.full_like(strike_prob_matrix, 0.5),
        opacity=0.3,
        colorscale=[[0, 'gray'], [1, 'gray']],
        showscale=False,
        hoverinfo='skip',
        name='50% Threshold'
    ))
    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{umpire_name} - 3D Strike Probability Surface</b>",
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        scene=dict(
            xaxis=dict(title="Horizontal Location (ft)", range=[-1.5, 1.5]),
            yaxis=dict(title="Vertical Location (ft)", range=[1, 4]),
            zaxis=dict(title="Strike Probability", range=[0, 1]),
            camera=dict(
                eye=dict(x=1.5, y=-1.5, z=1.2)
            ),
            aspectmode="manual",
            aspectratio=dict(x=1, y=1.5, z=0.7)
        ),
        paper_bgcolor='white',
        width=900,
        height=700
    )
    return fig
# Create 3D surface for Joe West
surface_3d = create_3d_strike_surface(sample_umpire_data, 'Joe West')
surface_3d.show()
# Save as HTML
surface_3d.write_html("strike_surface_3d.html")
# Python: Animated Umpire Accuracy Over Game Progression
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
def create_accuracy_animation(pitch_data):
    """Create animated plot of umpire accuracy over game progression"""
    # Calculate accuracy by inning for each umpire
    inning_accuracy = pitch_data.groupby(['umpire', 'inning']).agg({
        'correct_call': ['count', 'mean'],
        'called_strike': 'mean'
    }).reset_index()
    inning_accuracy.columns = ['umpire', 'inning', 'pitches', 'accuracy', 'strike_rate']
    # Calculate zone expansion (strikes called outside zone)
    zone_expansion = pitch_data[~pitch_data['in_zone']].groupby(['umpire', 'inning'])['called_strike'].mean()
    inning_accuracy = inning_accuracy.merge(
        zone_expansion.reset_index().rename(columns={'called_strike': 'zone_expansion'}),
        on=['umpire', 'inning'],
        how='left'
    )
    # Calculate cumulative accuracy
    inning_accuracy = inning_accuracy.sort_values(['umpire', 'inning'])
    inning_accuracy['cumulative_accuracy'] = inning_accuracy.groupby('umpire')['accuracy'].transform(
        lambda x: x.expanding().mean()
    )
    inning_accuracy['cumulative_pitches'] = inning_accuracy.groupby('umpire')['pitches'].cumsum()
    # Create animated scatter plot
    fig = px.scatter(
        inning_accuracy,
        x='inning',
        y='accuracy',
        color='umpire',
        size='pitches',
        animation_frame='inning',
        animation_group='umpire',
        hover_data={
            'accuracy': ':.1%',
            'strike_rate': ':.1%',
            'pitches': True,
            'inning': True
        },
        range_x=[0.5, 9.5],
        range_y=[0.80, 1.00],
        labels={
            'inning': 'Inning',
            'accuracy': 'Accuracy Rate',
            'umpire': 'Umpire'
        },
        title="<b>Umpire Accuracy Progression Through Game</b>"
    )
    # Add trend lines for each umpire
    for umpire in inning_accuracy['umpire'].unique():
        ump_data = inning_accuracy[inning_accuracy['umpire'] == umpire]
        # Fit linear trend
        z = np.polyfit(ump_data['inning'], ump_data['accuracy'], 1)
        p = np.poly1d(z)
        trend_y = p(ump_data['inning'])
        fig.add_trace(go.Scatter(
            x=ump_data['inning'],
            y=trend_y,
            mode='lines',
            line=dict(dash='dash', width=1),
            name=f'{umpire} Trend',
            showlegend=True,
            hoverinfo='skip'
        ))
    # Update layout
    fig.update_layout(
        title=dict(
            x=0.5,
            xanchor='center',
            font=dict(size=18)
        ),
        xaxis=dict(title="Inning"),
        yaxis=dict(title="Accuracy Rate", tickformat='.0%'),
        hovermode='closest',
        width=1000,
        height=600,
        paper_bgcolor='white'
    )
    # Update animation settings
    fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 500
    fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300
    return fig
# Generate sample game progression data
np.random.seed(123)
n_games = 50
game_ids = np.repeat(range(1, n_games + 1), 9 * 3)
innings = np.tile(np.repeat(range(1, 10), 3), n_games)
umpires = np.tile(['Angel Hernandez', 'Joe West', 'Pat Hoberg'], n_games * 9)
game_progression_data = pd.DataFrame({
'game_id': game_ids,
'inning': innings,
'umpire': umpires
})
# Simulate pitches per inning
game_progression_data['pitches'] = np.random.poisson(15, len(game_progression_data))
# Base accuracy with fatigue effect
game_progression_data['base_accuracy'] = 0.92 - (game_progression_data['inning'] - 5) * 0.005
# Umpire-specific accuracy adjustments
umpire_adj = {
'Pat Hoberg': 0.04,
'Joe West': 0.00,
'Angel Hernandez': -0.03
}
game_progression_data['umpire_accuracy_adj'] = game_progression_data['umpire'].map(umpire_adj)
# Random variation
game_progression_data['game_variation'] = np.random.normal(0, 0.02, len(game_progression_data))
# Final accuracy
game_progression_data['accuracy'] = np.clip(
game_progression_data['base_accuracy'] +
game_progression_data['umpire_accuracy_adj'] +
game_progression_data['game_variation'],
0.80, 0.99
)
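# Simulate other metrics. Each row is one umpire-inning rather than a single pitch,
# so correct_call and called_strike end up as per-inning rates, and in_zone flags
# whether any simulated called strike was in the zone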
game_progression_data['correct_calls'] = np.random.binomial(
game_progression_data['pitches'],
game_progression_data['accuracy']
)
game_progression_data['called_strike'] = np.random.binomial(
game_progression_data['pitches'],
0.15
)
game_progression_data['in_zone'] = np.random.binomial(
game_progression_data['called_strike'],
0.85
).astype(bool)
game_progression_data['correct_call'] = (
game_progression_data['correct_calls'] / game_progression_data['pitches']
)
game_progression_data['called_strike'] = (
game_progression_data['called_strike'] / game_progression_data['pitches']
)
# Create animation
accuracy_animation = create_accuracy_animation(game_progression_data)
accuracy_animation.show()
# Save as HTML
accuracy_animation.write_html("umpire_accuracy_animation.html")
Exercise 21.1: Umpire Accuracy Analysis
Using pitch-level data from the 2024 season:
a) Calculate the overall accuracy rate for each umpire (minimum 1,000 called pitches)
b) Identify the five most accurate and five least accurate umpires
c) Create a visualization comparing each umpire's accuracy on pitches inside vs. outside the strike zone
d) Test whether there is a statistically significant difference in accuracy between the most and least accurate umpires
Hint: Use a two-sample t-test or permutation test to assess statistical significance. Consider whether accuracy rates are normally distributed.
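The permutation test mentioned in the hint can be sketched as follows. This is a minimal illustration rather than a reference solution: the function name `permutation_test`, the 0/1 encoding of correct calls, and the two made-up umpires are all assumptions introduced here, and only the standard library is used.

```python
import random

def permutation_test(calls_a, calls_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in accuracy rates.

    calls_a, calls_b: lists of 0/1 flags (1 = correct call), one per pitch.
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    n_a, n_b = len(calls_a), len(calls_b)
    observed = sum(calls_a) / n_a - sum(calls_b) / n_b
    pooled = list(calls_a) + list(calls_b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the umpire labels are exchangeable
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical data: 1,000 called pitches each, at 95% and 90% accuracy
umpire_a = [1] * 950 + [0] * 50
umpire_b = [1] * 900 + [0] * 100
diff, p = permutation_test(umpire_a, umpire_b)
print(f"difference = {diff:.3f}, p = {p:.4f}")
```

Because the test shuffles individual calls, it makes no normality assumption about accuracy rates, which is one reason to prefer it when the hint's normality question is in doubt.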
Exercise 21.2: Strike Zone Visualization
For a specific umpire of your choice:
a) Create a heat map showing the probability of a called strike at different locations
b) Overlay the rulebook strike zone on your visualization
c) Identify regions where the umpire's zone significantly differs from the rulebook (>20 percentage points)
d) Create a similar visualization for the league average and place them side-by-side for comparison
Hint: Use 2D binning or kernel density estimation to create smooth probability surfaces. The stat_summary_2d() function in ggplot2 or scipy.stats.binned_statistic_2d() in Python are helpful.
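For part (c), one possible approach is to bin pitches, compute the called-strike rate per bin, and compare it with what the rulebook would call at the bin center (1 inside the zone, 0 outside). The sketch below assumes the fixed rulebook rectangle used in this chapter's plots (±0.708 ft wide, 1.5 to 3.5 ft high); the function name `deviation_bins`, the bin size, and the sample pitches are hypothetical.

```python
import math
from collections import defaultdict

# Fixed rulebook rectangle from this chapter's plots (feet); a simplification,
# since the real top and bottom vary by batter
ZONE_LEFT, ZONE_RIGHT, ZONE_BOTTOM, ZONE_TOP = -0.708, 0.708, 1.5, 3.5

def deviation_bins(pitches, bin_size=0.25, min_pitches=10, threshold=0.20):
    """Flag bins whose called-strike rate differs from the rulebook call.

    pitches: iterable of (plate_x, plate_z, called_strike) tuples.
    Returns sorted tuples (center_x, center_z, rate, expected, n).
    """
    totals = defaultdict(lambda: [0, 0])  # bin index -> [strikes, pitches]
    for x, z, strike in pitches:
        key = (math.floor(x / bin_size), math.floor(z / bin_size))
        totals[key][0] += strike
        totals[key][1] += 1
    flagged = []
    for (ix, iz), (strikes, n) in totals.items():
        if n < min_pitches:
            continue  # too few pitches for a stable rate
        cx = (ix + 0.5) * bin_size
        cz = (iz + 0.5) * bin_size
        in_zone = ZONE_LEFT <= cx <= ZONE_RIGHT and ZONE_BOTTOM <= cz <= ZONE_TOP
        expected = 1.0 if in_zone else 0.0
        rate = strikes / n
        if abs(rate - expected) > threshold:
            flagged.append((cx, cz, rate, expected, n))
    return sorted(flagged)

# Hypothetical pitches: an off-the-plate spot called a strike half the time,
# and a middle-of-the-zone spot always called a strike
sample = [(0.9, 2.5, s) for s in [1, 0] * 10] + [(0.1, 2.6, 1)] * 20
flagged = deviation_bins(sample)
print(flagged)
```

Classifying a bin by its center is a coarse choice; bins that straddle the zone edge could instead be compared against the fraction of the bin lying inside the zone.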
Exercise 21.3: Predicting Called Strikes
Build and compare predictive models for called strikes:
a) Train a logistic regression model using pitch location, count, batter handedness, and pitcher handedness as features
b) Train a random forest model with the same features
c) Add umpire identity as a feature to both models (use one-hot encoding)
d) Compare the models using AUC, accuracy, and calibration plots
e) Identify which features are most important in each model
f) Use the best model to identify the 10 most surprising calls from the 2024 season (largest difference between predicted probability and actual call)
Hint: Feature importance can be extracted from logistic regression coefficients and random forest's feature_importances_ attribute. For surprising calls, look for high-probability strikes called balls and vice versa.
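The "most surprising calls" step in part (f) reduces to ranking pitches by the gap between the actual call and the model's predicted strike probability. A minimal sketch, assuming predictions are already available; the field names `pitch_id`, `pred_strike_prob`, and `called_strike` and the four example pitches are hypothetical:

```python
def most_surprising(calls, n=10):
    """Rank calls by |actual call - predicted strike probability|.

    calls: list of dicts with 'pitch_id', 'pred_strike_prob' (0 to 1),
    and 'called_strike' (0 or 1). Returns the n most surprising calls.
    """
    scored = [
        {**c, "surprise": abs(c["called_strike"] - c["pred_strike_prob"])}
        for c in calls
    ]
    return sorted(scored, key=lambda c: c["surprise"], reverse=True)[:n]

# Hypothetical model output for four pitches
calls = [
    {"pitch_id": "A", "pred_strike_prob": 0.97, "called_strike": 0},
    {"pitch_id": "B", "pred_strike_prob": 0.60, "called_strike": 1},
    {"pitch_id": "C", "pred_strike_prob": 0.05, "called_strike": 1},
    {"pitch_id": "D", "pred_strike_prob": 0.10, "called_strike": 0},
]
top = most_surprising(calls, n=2)
for c in top:
    print(c["pitch_id"], round(c["surprise"], 2))
```

Pitch A (a near-certain strike called a ball) and pitch C (a near-certain ball called a strike) rise to the top, matching the two cases the hint describes.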
Exercise 21.4: ABS Impact Simulation
Simulate the impact of implementing full ABS:
a) For each pitch in your dataset, determine whether the human umpire's call matches what ABS would call
b) Calculate the overall agreement rate and identify systematic biases (e.g., do human umpires call more strikes or fewer strikes than ABS?)
c) Estimate how strikeout rates and walk rates would change under full ABS (focus on pitches with 2 strikes and 3 balls respectively)
d) Calculate the expected number of calls that would be overturned per game
e) Analyze whether certain types of pitchers (high strikeout, high walk, etc.) would be helped or hurt more by ABS
Hint: You'll need to define the ABS zone precisely using the sz_top and sz_bot variables. Consider grouping pitchers by strikeout and walk rates to assess differential impacts.
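Parts (a) and (b) can be sketched with a simple rule-based zone. The version below is an assumption for illustration, not the league's ABS specification: it treats a pitch as an ABS strike when its plate crossing lies within the 0.708 ft half-width used in this chapter and between the per-pitch `sz_bot` and `sz_top` values, and it ignores the ball's radius. The function names and sample pitches are hypothetical.

```python
PLATE_HALF_WIDTH = 0.708  # feet; half-width of the rulebook zone used in this chapter

def abs_call(pitch):
    """Return 1 if the simplified ABS zone would call a strike, else 0."""
    in_width = abs(pitch["plate_x"]) <= PLATE_HALF_WIDTH
    in_height = pitch["sz_bot"] <= pitch["plate_z"] <= pitch["sz_top"]
    return 1 if in_width and in_height else 0

def compare_to_abs(pitches):
    """Summarize agreement between human calls and the simplified ABS zone."""
    agree = extra_strikes = missed_strikes = 0
    for p in pitches:
        machine = abs_call(p)
        if p["called_strike"] == machine:
            agree += 1
        elif p["called_strike"] == 1:
            extra_strikes += 1   # human strike, ABS ball
        else:
            missed_strikes += 1  # human ball, ABS strike
    return {
        "agreement_rate": agree / len(pitches),
        "extra_strikes": extra_strikes,
        "missed_strikes": missed_strikes,
    }

# Hypothetical called pitches for one batter (sz_top = 3.4 ft, sz_bot = 1.6 ft)
pitches = [
    {"plate_x": 0.0, "plate_z": 2.5, "sz_top": 3.4, "sz_bot": 1.6, "called_strike": 1},
    {"plate_x": 0.9, "plate_z": 2.5, "sz_top": 3.4, "sz_bot": 1.6, "called_strike": 1},
    {"plate_x": 0.2, "plate_z": 3.3, "sz_top": 3.4, "sz_bot": 1.6, "called_strike": 0},
    {"plate_x": -1.2, "plate_z": 1.0, "sz_top": 3.4, "sz_bot": 1.6, "called_strike": 0},
]
summary = compare_to_abs(pitches)
print(summary)
```

Comparing `extra_strikes` with `missed_strikes` gives a first look at the direction of any systematic bias, and the disagreement count divided by the number of games approximates the overturned calls per game asked for in part (d).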
This chapter has covered the fundamentals of umpire analysis and strike zone modeling, from defining accuracy metrics to building predictive models and evaluating the potential impact of automated systems. As MLB continues to consider the role of technology in officiating, these analytical tools will remain essential for understanding how umpires influence the game and how changes to ball-strike calling might affect gameplay and strategy. The combination of granular pitch-tracking data and sophisticated statistical modeling allows us to evaluate umpire performance with unprecedented precision while also informing important decisions about the future of the sport.