The catcher has long been considered the most analytically complex position in baseball. While traditional statistics like batting average and home runs capture only a fraction of a catcher's contribution, modern analytics have revolutionized our understanding of defensive value behind the plate. Catchers influence the game in multiple dimensions: pitch framing, blocking pitches in the dirt, controlling the running game, calling pitches, and managing the pitching staff.
The Evolution of Catcher Evaluation
Historically, catchers were evaluated primarily on their offensive production and basic defensive stats like caught stealing percentage and passed balls. Hall of Fame voters often overlooked defensive specialists, favoring catchers who could hit. However, the analytical revolution revealed that elite pitch framers could add 20-30 runs per season through receiving alone—a contribution worth several wins and millions of dollars in free agent value.
The introduction of PITCHf/x in 2006, and later Statcast in 2015, provided the granular pitch-location data necessary to quantify pitch framing. This technological advancement fundamentally changed how front offices evaluate catchers. Teams began prioritizing receiving skills, leading to longer careers for defense-first catchers like Jeff Mathis and the rise of framing specialists.
Components of Catcher Value
Modern catcher evaluation encompasses five primary defensive skills:
- Pitch Framing: The ability to receive pitches in a way that maximizes strike calls
- Blocking: Preventing wild pitches and passed balls on pitches in the dirt
- Throwing: Deterring and throwing out base stealers
- Game Calling: Pitch selection and sequencing
- Pitcher Management: Building rapport and optimizing pitcher performance
Elite catchers like J.T. Realmuto and Adley Rutschman excel in multiple categories, while others specialize. For example, Tucker Barnhart built a career on elite framing despite modest offensive production, while Salvador Perez remained valuable primarily through his offensive contributions despite below-average framing.
Economic Value
The market has recognized catcher defense. When Yasmani Grandal signed a 4-year, $73 million contract with the White Sox in 2020, his elite framing ability (which had saved 150+ runs over the previous five seasons) was central to his valuation. Similarly, teams have traded prospects for rental catchers with elite defensive skills, understanding that catching defense can be the difference in playoff races.
Research by Mike Fast, whose pioneering work quantified how many runs elite receivers add through strike calls alone, helped establish framing as a measurable skill. At the standard conversion of roughly ten runs per win and the 2020 free agent market rate of roughly $8-9 million per win, an elite framer's 20-30 receiving runs translate to $15-25 million in annual value from receiving alone. This explains why teams like the Rays have built organizational expertise around catcher development and framing optimization.
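As a back-of-envelope illustration of that conversion, the sketch below turns seasonal framing runs into wins and dollars using the approximations cited above (roughly ten runs per win and $8-9 million per win on the free agent market); the catcher labels and framing-run totals are hypothetical inputs, not published figures.
# R: Converting framing runs to wins and dollars (illustrative values)
library(tidyverse)
runs_per_win <- 10           # standard runs-to-wins approximation
dollars_per_win <- 8.5e6     # rough 2020 free agent price of one win
framing_value <- tibble(
  catcher = c("Elite framer", "Average framer", "Poor framer"),
  framing_runs = c(25, 0, -15)   # hypothetical seasonal framing runs
) %>%
  mutate(
    framing_wins = framing_runs / runs_per_win,
    framing_dollars = framing_wins * dollars_per_win
  )
print(framing_value)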
The Catcher Defensive Spectrum
Let's examine where recent catchers fall on the defensive spectrum using 2019-2023 data as a reference period:
Elite All-Around Defenders: J.T. Realmuto, Adley Rutschman, Sean Murphy, Will Smith (LAD)
Elite Framers: Tucker Barnhart, Austin Hedges, Tyler Stephenson
Elite Throwing: J.T. Realmuto, Salvador Perez, Jorge Alfaro
Elite Blockers: Yadier Molina (before retirement), Roberto Perez, Yan Gomes
Understanding these specializations helps teams construct rosters. A pitching staff with excellent control might prioritize framing over blocking, while a team with young, wild pitchers might value blocking more highly.
Pitch framing—the art of receiving pitches to maximize called strikes—represents the most quantifiable and valuable defensive skill catchers provide. Research shows that the difference between the best and worst framers can exceed 40 runs per season, equivalent to four wins.
The Mechanics of Framing
Effective pitch framing involves several technical components:
- Quiet Receiving: Minimizing glove movement after pitch reception
- Receiving Inside: Catching the ball near the strike zone rather than reaching
- Smooth Transfers: Presenting pitches with subtle, continuous motion toward the zone
- Thumb Position: Keeping the thumb tucked to create a smooth target
- Stick: Holding the glove position momentarily after reception
The best framers, like Yasmani Grandal and Tucker Barnhart, employ barely perceptible techniques that influence umpire judgment. They avoid "snatching" or yanking pitches into the zone—an obvious technique that often backfires—instead using subtle body positioning and glove work.
Framing Metrics: Called Strike Probability Models
Modern framing metrics use probabilistic models to estimate the expected called strike rate for each pitch based on location, count, pitcher handedness, batter handedness, and umpire tendencies. The catcher receives credit or blame for the difference between actual and expected outcomes.
The basic framework:
- Build a logistic regression model predicting P(Called Strike) based on pitch characteristics
- Calculate expected strikes for each catcher's called pitches
- Compare actual called strikes to expected called strikes
- Convert the difference to runs using linear weights
Key Framing Metrics:
- Framing Runs (FRM): Total runs added or lost through framing (FanGraphs, Baseball Prospectus)
- Called Strikes Above Average (CSAA): Called strikes gained relative to expectation (Baseball Prospectus)
- Catcher Framing Runs: Statcast's framing metric, built from called-strike rates on pitches in the "shadow zone" around the edges of the plate
Let's build a framing model using pitch location data:
# R: Building a Called Strike Probability Model
library(tidyverse)
library(mgcv) # For GAM models
# Simulate pitch-level data structure
set.seed(2023)
n_pitches <- 50000
pitch_data <- tibble(
pitch_id = 1:n_pitches,
plate_x = rnorm(n_pitches, 0, 0.8), # Horizontal location (feet from center)
plate_z = rnorm(n_pitches, 2.5, 0.6), # Vertical location (feet)
catcher = sample(c("Realmuto", "Rutschman", "Barnhart", "Perez", "League_Avg"),
n_pitches, replace = TRUE),
pitch_type = sample(c("FF", "SL", "CH", "CU"), n_pitches, replace = TRUE),
balls = sample(0:3, n_pitches, replace = TRUE),
strikes = sample(0:2, n_pitches, replace = TRUE),
p_throws = sample(c("R", "L"), n_pitches, replace = TRUE),
stand = sample(c("R", "L"), n_pitches, replace = TRUE)
) %>%
# Simulate called strike outcomes based on location
mutate(
# Distance from center of zone
dist_from_center = sqrt(plate_x^2 + (plate_z - 2.5)^2),
# Probability decreases with distance from zone
strike_prob = plogis(2 - 2.5 * dist_from_center),
# Add catcher effect
catcher_effect = case_when(
catcher == "Barnhart" ~ 0.4,
catcher == "Realmuto" ~ 0.3,
catcher == "Rutschman" ~ 0.25,
catcher == "League_Avg" ~ 0,
catcher == "Perez" ~ -0.2
),
strike_prob_adj = plogis(qlogis(strike_prob) + catcher_effect),
called_strike = rbinom(n_pitches, 1, strike_prob_adj),
# Filter to called pitches only (no swings)
is_called = sample(c(TRUE, FALSE), n_pitches, replace = TRUE, prob = c(0.6, 0.4))
) %>%
filter(is_called)
# Build baseline GAM model without catcher
baseline_model <- gam(
called_strike ~ s(plate_x, plate_z, k = 100) +
factor(balls) + factor(strikes) +
factor(p_throws) + factor(stand),
data = pitch_data,
family = binomial
)
# Add predictions to data
pitch_data <- pitch_data %>%
mutate(expected_strike = predict(baseline_model, newdata = ., type = "response"))
# Calculate framing runs by catcher
framing_results <- pitch_data %>%
group_by(catcher) %>%
summarise(
pitches = n(),
actual_strikes = sum(called_strike),
expected_strikes = sum(expected_strike),
extra_strikes = actual_strikes - expected_strikes,
# Convert to runs (approximately 0.125 runs per strike)
framing_runs = extra_strikes * 0.125,
# Per 7000 called pitches (typical season)
framing_runs_per_7000 = framing_runs / pitches * 7000
) %>%
arrange(desc(framing_runs_per_7000))
print(framing_results)
# Visualize the strike zone by catcher
library(ggplot2)
# Create heatmap of framing value by location
framing_heatmap <- pitch_data %>%
filter(catcher %in% c("Barnhart", "Realmuto", "Perez")) %>%
mutate(
x_bin = cut(plate_x, breaks = seq(-2, 2, 0.2)),
z_bin = cut(plate_z, breaks = seq(0, 5, 0.2))
) %>%
group_by(catcher, x_bin, z_bin) %>%
summarise(
framing_value = mean(called_strike - expected_strike),
n = n(),
.groups = "drop"
) %>%
filter(n >= 10) %>%
mutate(
x_mid = as.numeric(x_bin) * 0.2 - 2.1,
z_mid = as.numeric(z_bin) * 0.2 - 0.1
)
ggplot(framing_heatmap, aes(x = x_mid, y = z_mid, fill = framing_value)) +
geom_tile() +
facet_wrap(~ catcher) +
scale_fill_gradient2(low = "red", mid = "white", high = "blue",
midpoint = 0, name = "Framing\nValue") +
coord_fixed() +
geom_rect(aes(xmin = -0.708, xmax = 0.708, ymin = 1.5, ymax = 3.5),
fill = NA, color = "black", linewidth = 1) +
labs(title = "Pitch Framing Value by Location",
x = "Horizontal Location (ft)",
y = "Vertical Location (ft)") +
theme_minimal()
# Python: Pitch Framing Analysis with Machine Learning
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
import seaborn as sns
# Simulate pitch data
np.random.seed(2023)
n_pitches = 50000
pitch_df = pd.DataFrame({
'pitch_id': range(n_pitches),
'plate_x': np.random.normal(0, 0.8, n_pitches),
'plate_z': np.random.normal(2.5, 0.6, n_pitches),
'catcher': np.random.choice(['Realmuto', 'Rutschman', 'Barnhart', 'Perez', 'Smith'],
n_pitches),
'pitch_type': np.random.choice(['FF', 'SL', 'CH', 'CU'], n_pitches),
'balls': np.random.randint(0, 4, n_pitches),
'strikes': np.random.randint(0, 3, n_pitches),
'p_throws': np.random.choice(['R', 'L'], n_pitches),
'stand': np.random.choice(['R', 'L'], n_pitches)
})
# Create strike probability based on location
pitch_df['dist_from_center'] = np.sqrt(
pitch_df['plate_x']**2 + (pitch_df['plate_z'] - 2.5)**2
)
# Catcher effects
catcher_effects = {
'Barnhart': 0.4,
'Realmuto': 0.3,
'Rutschman': 0.25,
'Smith': 0.15,
'Perez': -0.2
}
pitch_df['catcher_effect'] = pitch_df['catcher'].map(catcher_effects)
# Simulate called strikes
def logit(p):
return np.log(p / (1 - p))
def inv_logit(x):
return 1 / (1 + np.exp(-x))
base_logit = 2 - 2.5 * pitch_df['dist_from_center']
adj_logit = base_logit + pitch_df['catcher_effect']
strike_prob = inv_logit(adj_logit)
pitch_df['called_strike'] = np.random.binomial(1, strike_prob)
pitch_df['is_called'] = np.random.choice([True, False], n_pitches,
p=[0.6, 0.4])
# Filter to called pitches
called_pitches = pitch_df[pitch_df['is_called']].copy()
# Prepare features for modeling (exclude catcher)
le_pitch = LabelEncoder()
le_throws = LabelEncoder()
le_stand = LabelEncoder()
X_features = called_pitches.copy()
X_features['pitch_type_enc'] = le_pitch.fit_transform(X_features['pitch_type'])
X_features['p_throws_enc'] = le_throws.fit_transform(X_features['p_throws'])
X_features['stand_enc'] = le_stand.fit_transform(X_features['stand'])
feature_cols = ['plate_x', 'plate_z', 'balls', 'strikes',
'pitch_type_enc', 'p_throws_enc', 'stand_enc']
X = X_features[feature_cols]
y = X_features['called_strike']
# Train baseline model (without catcher)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
gb_model = GradientBoostingClassifier(
n_estimators=100,
max_depth=5,
learning_rate=0.1,
random_state=42
)
gb_model.fit(X_train, y_train)
# Generate expected strike probabilities
X_features['expected_strike_prob'] = gb_model.predict_proba(X)[:, 1]
X_features['called_strike'] = y
# Calculate framing runs
framing_summary = X_features.groupby('catcher').agg({
'pitch_id': 'count',
'called_strike': 'sum',
'expected_strike_prob': 'sum'
}).round(2)
framing_summary.columns = ['pitches', 'actual_strikes', 'expected_strikes']
framing_summary['extra_strikes'] = (
framing_summary['actual_strikes'] - framing_summary['expected_strikes']
)
framing_summary['framing_runs'] = framing_summary['extra_strikes'] * 0.125
framing_summary['framing_runs_per_7000'] = (
framing_summary['framing_runs'] / framing_summary['pitches'] * 7000
)
framing_summary = framing_summary.sort_values('framing_runs_per_7000',
ascending=False)
print("\nFraming Runs by Catcher:")
print(framing_summary)
# Visualize framing by location for top catchers
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
top_catchers = ['Barnhart', 'Realmuto', 'Perez']
for idx, catcher_name in enumerate(top_catchers):
catcher_data = X_features[X_features['catcher'] == catcher_name].copy()
catcher_data['framing_value'] = (
catcher_data['called_strike'] - catcher_data['expected_strike_prob']
)
# Create 2D histogram
heatmap_data = axes[idx].hexbin(
catcher_data['plate_x'],
catcher_data['plate_z'],
C=catcher_data['framing_value'],
gridsize=15,
cmap='RdBu',
vmin=-0.2,
vmax=0.2,
reduce_C_function=np.mean
)
# Add strike zone rectangle
zone_x = [-0.708, 0.708, 0.708, -0.708, -0.708]
zone_z = [1.5, 1.5, 3.5, 3.5, 1.5]
axes[idx].plot(zone_x, zone_z, 'k-', linewidth=2)
axes[idx].set_title(f'{catcher_name} Framing Value')
axes[idx].set_xlabel('Horizontal Location (ft)')
axes[idx].set_ylabel('Vertical Location (ft)')
axes[idx].set_xlim(-2, 2)
axes[idx].set_ylim(0, 5)
plt.colorbar(heatmap_data, ax=axes, label='Framing Value')
plt.tight_layout()
plt.savefig('framing_heatmap.png', dpi=300, bbox_inches='tight')
plt.close()
# Calculate framing by zone
def classify_zone(row):
x, z = row['plate_x'], row['plate_z']
if -0.708 <= x <= 0.708 and 1.5 <= z <= 3.5:
return 'In Zone'
elif -1.5 <= x <= 1.5 and 0.5 <= z <= 4.5:
return 'Edge (Frameable)'
else:
return 'Out of Zone'
X_features['zone_type'] = X_features.apply(classify_zone, axis=1)
zone_framing = X_features.groupby(['catcher', 'zone_type']).agg({
'called_strike': 'sum',
'expected_strike_prob': 'sum',
'pitch_id': 'count'
}).reset_index()
zone_framing['extra_strikes'] = (
zone_framing['called_strike'] - zone_framing['expected_strike_prob']
)
zone_framing['extra_strike_pct'] = (
zone_framing['extra_strikes'] / zone_framing['pitch_id'] * 100
)
print("\nFraming by Zone Type:")
print(zone_framing.pivot_table(
index='catcher',
columns='zone_type',
values='extra_strike_pct'
).round(2))
Edge Strike Mastery
The most valuable framing occurs on pitches near the strike zone border—the "edge" pitches where umpire judgment is most uncertain. Research shows that:
- Pitches 2-4 inches off the plate horizontally have the highest framing impact
- Low strikes (bottom of zone) show more framing variability than high strikes
- Backdoor breaking balls and inside fastballs benefit most from elite framing
Elite framers like Tucker Barnhart gained 3-4% more called strikes on edge pitches compared to poor framers—a difference worth 15-20 runs per season on those pitches alone.
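A quick back-of-envelope check, sketched below, shows how a 3-4% edge on borderline pitches scales to that run total; the count of edge pitches per season is an assumption, and the run value per called strike matches the figure used in the framing model above.
# R: Rough run value of an edge-pitch framing advantage (assumed inputs)
edge_pitches_per_season <- 4000   # assumed called pitches near the zone border
extra_strike_rate <- 0.035        # 3-4% more called strikes than a poor framer
runs_per_strike <- 0.125          # run value of one extra called strike (as above)
extra_strikes <- edge_pitches_per_season * extra_strike_rate
extra_runs <- extra_strikes * runs_per_strike
cat(sprintf("Extra strikes: %.0f, extra runs: %.1f per season\n",
            extra_strikes, extra_runs))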
Umpire and Count Context
Framing value varies significantly by context:
Count Effects: The effective strike zone shifts with the count. Umpires call fewer borderline strikes in 0-2 counts and more in 3-0 counts, so framing models must control for count, and a catcher's framing impact can look very different in pitcher-friendly versus hitter-friendly counts.
Umpire Variation: Some umpires have larger strike zones or are more susceptible to framing. Advanced systems model individual umpire tendencies, allowing catchers to adjust their approach.
Home Plate Umpire Positioning: Umpires who set up more directly behind the catcher may be less influenced by framing than those offset to one side.
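The baseline model above already controls for count through the balls and strikes factors; the same idea extends to umpires by adding an identifier for the home plate umpire, so each catcher is measured against what an average catcher would have received from the same umpires in the same counts. The sketch below assumes the pitch-level data has been joined to a hypothetical umpire column, which the simulation above does not include; many production systems treat the umpire as a random effect instead of a fixed factor.
# R: Adding an umpire term to the called strike model (sketch)
library(mgcv)
# Assumes pitch_data carries an `umpire` identifier for each pitch
# (hypothetical column; the simulation above does not generate one)
context_model <- gam(
  called_strike ~ s(plate_x, plate_z, k = 100) +
    factor(balls) + factor(strikes) +
    factor(p_throws) + factor(stand) +
    factor(umpire),                     # per-umpire zone-size adjustment
  data = pitch_data,
  family = binomial
)
# Expected strikes now reflect the umpires and counts each catcher actually faced
pitch_data$expected_strike_ctx <- predict(context_model, type = "response")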
The Limits of Framing: Robot Umps
The introduction of the Automated Ball-Strike System (ABS) in minor leagues threatens to eliminate framing value entirely. In Triple-A games using ABS, catchers no longer influence strike calls, fundamentally changing the position's value proposition. This has major implications:
- Defensive-first catchers may lose their primary value source
- Teams may prioritize offense more heavily at the position
- Catcher salaries could compress as the skill gap narrows
- Blocking and throwing skills become relatively more important
However, as of 2024-2025, MLB has not committed to full ABS implementation, instead testing hybrid systems where teams can challenge calls. This preserves some framing value while improving accuracy.
While pitch framing receives the most analytical attention, blocking pitches in the dirt remains a crucial skill. Wild pitches and passed balls allow baserunners to advance, with each base costing teams approximately 0.25 runs on average. Elite blockers save 5-10 runs per season compared to poor blockers.
The Mechanics of Blocking
Effective blocking requires:
- Quick Recognition: Identifying pitches in the dirt early
- Proper Positioning: Staying low and square to the pitcher
- Drop Technique: Quickly dropping to knees with chest over the ball
- Creating Surface Area: Spreading the body to cover maximum area
- Angle Control: Directing deflections toward home plate
- Recovery: Quickly recovering to prevent runners from advancing
Yadier Molina, widely considered one of the best blockers in history, demonstrated textbook technique: staying flexible, reading spin early, and using his chest protector to keep balls in front. His blocking ability was particularly valuable with the Cardinals' sinkerball-heavy pitching approach.
Blocking Metrics
Measuring blocking requires accounting for opportunity:
- Block Rate: Percentage of pitches in the dirt successfully blocked
- Expected Blocks: Model-based expected blocks given pitch locations
- Blocks Above Average (BAA): Actual blocks minus expected blocks
- Wild Pitch/Passed Ball Runs: Run value of WP/PB prevented
The challenge lies in defining "blockable" pitches. A pitch that bounces five feet in front of the plate is a far harder chance than one that barely clips the dirt at the plate's edge. Advanced models use pitch trajectory and location to estimate block probability.
# R: Analyzing Blocking Performance
library(tidyverse)
# Simulate pitch in dirt data
set.seed(2024)
n_dirt_pitches <- 5000
blocking_data <- tibble(
pitch_id = 1:n_dirt_pitches,
catcher = sample(c("Molina", "Realmuto", "Perez", "Rutschman", "Avg_Catcher"),
n_dirt_pitches, replace = TRUE),
# Distance from plate (negative = in front of plate)
plate_y = runif(n_dirt_pitches, -4, -0.5),
# Horizontal location
plate_x = rnorm(n_dirt_pitches, 0, 0.6),
# Height at front of plate
plate_z = runif(n_dirt_pitches, -0.5, 0.8),
pitch_type = sample(c("FF", "SL", "CH", "CU"), n_dirt_pitches, replace = TRUE),
# Runner on base indicator
runner_on = sample(c(TRUE, FALSE), n_dirt_pitches, replace = TRUE)
) %>%
mutate(
# Difficulty score based on location
difficulty = sqrt(plate_x^2 + plate_y^2 + plate_z^2),
# Base block probability
base_block_prob = plogis(2 - 0.8 * difficulty),
# Catcher skill modifier
catcher_skill = case_when(
catcher == "Molina" ~ 0.5,
catcher == "Realmuto" ~ 0.3,
catcher == "Rutschman" ~ 0.2,
catcher == "Avg_Catcher" ~ 0,
catcher == "Perez" ~ -0.1
),
block_prob = plogis(qlogis(base_block_prob) + catcher_skill),
blocked = rbinom(n_dirt_pitches, 1, block_prob)
)
# Build expected blocking model
blocking_model <- glm(
blocked ~ plate_x + plate_y + plate_z + I(plate_x^2) + I(plate_y^2),
data = blocking_data,
family = binomial
)
blocking_data <- blocking_data %>%
mutate(expected_block = predict(blocking_model, newdata = ., type = "response"))
# Calculate blocking runs
blocking_summary <- blocking_data %>%
group_by(catcher) %>%
summarise(
dirt_pitches = n(),
actual_blocks = sum(blocked),
expected_blocks = sum(expected_block),
extra_blocks = actual_blocks - expected_blocks,
block_rate = mean(blocked) * 100,
# Each failed block (WP/PB) costs ~0.3 runs on average
blocking_runs = extra_blocks * 0.3,
blocking_runs_per_1000 = blocking_runs / dirt_pitches * 1000
) %>%
arrange(desc(blocking_runs_per_1000))
print(blocking_summary)
# Analyze blocking by difficulty tier
difficulty_analysis <- blocking_data %>%
mutate(
difficulty_tier = cut(difficulty,
breaks = quantile(difficulty, probs = seq(0, 1, 0.25)),
labels = c("Easy", "Medium", "Hard", "Very Hard"),
include.lowest = TRUE)
) %>%
group_by(catcher, difficulty_tier) %>%
summarise(
pitches = n(),
block_rate = mean(blocked) * 100,
.groups = "drop"
) %>%
pivot_wider(names_from = difficulty_tier, values_from = block_rate,
values_fill = 0)
print(difficulty_analysis)
# Visualize blocking performance
ggplot(blocking_summary, aes(x = reorder(catcher, blocking_runs_per_1000),
y = blocking_runs_per_1000, fill = catcher)) +
geom_col() +
coord_flip() +
labs(title = "Blocking Runs Above Average per 1000 Pitches in Dirt",
x = "Catcher",
y = "Blocking Runs per 1000 Dirt Pitches") +
theme_minimal() +
theme(legend.position = "none")
# Impact of blocking with runners on base
runner_impact <- blocking_data %>%
group_by(catcher, runner_on) %>%
summarise(
pitches = n(),
block_rate = mean(blocked) * 100,
.groups = "drop"
) %>%
pivot_wider(names_from = runner_on, values_from = block_rate,
names_prefix = "runner_on_")
print(runner_impact)
# Python: Blocking Analysis with Spatial Components
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Simulate blocking data
np.random.seed(2024)
n_dirt = 5000
blocking_df = pd.DataFrame({
'pitch_id': range(n_dirt),
'catcher': np.random.choice(['Molina', 'Realmuto', 'Perez', 'Rutschman',
'Avg_Catcher'], n_dirt),
'plate_y': np.random.uniform(-4, -0.5, n_dirt),
'plate_x': np.random.normal(0, 0.6, n_dirt),
'plate_z': np.random.uniform(-0.5, 0.8, n_dirt),
'pitch_type': np.random.choice(['FF', 'SL', 'CH', 'CU'], n_dirt),
'runner_on': np.random.choice([True, False], n_dirt)
})
# Calculate difficulty and block probability
blocking_df['difficulty'] = np.sqrt(
blocking_df['plate_x']**2 +
blocking_df['plate_y']**2 +
blocking_df['plate_z']**2
)
catcher_blocking_skill = {
'Molina': 0.5,
'Realmuto': 0.3,
'Rutschman': 0.2,
'Avg_Catcher': 0,
'Perez': -0.1
}
blocking_df['catcher_skill'] = blocking_df['catcher'].map(catcher_blocking_skill)
def inv_logit(x):
return 1 / (1 + np.exp(-x))
base_logit = 2 - 0.8 * blocking_df['difficulty']
adj_logit = base_logit + blocking_df['catcher_skill']
block_prob = inv_logit(adj_logit)
blocking_df['blocked'] = np.random.binomial(1, block_prob)
# Train Random Forest model for expected blocks
feature_cols = ['plate_x', 'plate_y', 'plate_z']
X = blocking_df[feature_cols]
y = blocking_df['blocked']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
rf_model = RandomForestClassifier(
n_estimators=100,
max_depth=10,
random_state=42
)
rf_model.fit(X_train, y_train)
# Generate expected block probabilities
blocking_df['expected_block'] = rf_model.predict_proba(X)[:, 1]
# Calculate blocking metrics
blocking_summary = blocking_df.groupby('catcher').agg({
'pitch_id': 'count',
'blocked': 'sum',
'expected_block': 'sum'
})
blocking_summary.columns = ['dirt_pitches', 'actual_blocks', 'expected_blocks']
blocking_summary['extra_blocks'] = (
blocking_summary['actual_blocks'] - blocking_summary['expected_blocks']
)
blocking_summary['block_rate'] = (
blocking_summary['actual_blocks'] / blocking_summary['dirt_pitches'] * 100
)
blocking_summary['blocking_runs'] = blocking_summary['extra_blocks'] * 0.3
blocking_summary['blocking_runs_per_1000'] = (
blocking_summary['blocking_runs'] / blocking_summary['dirt_pitches'] * 1000
)
blocking_summary = blocking_summary.sort_values('blocking_runs_per_1000',
ascending=False)
print("\nBlocking Performance Summary:")
print(blocking_summary.round(2))
# 3D visualization of blocking difficulty zones
fig = plt.figure(figsize=(12, 5))
# Plot 1: 3D scatter of blocks vs non-blocks
ax1 = fig.add_subplot(121, projection='3d')
blocked = blocking_df[blocking_df['blocked'] == 1]
not_blocked = blocking_df[blocking_df['blocked'] == 0]
ax1.scatter(blocked['plate_x'], blocked['plate_y'], blocked['plate_z'],
c='green', marker='o', alpha=0.3, label='Blocked')
ax1.scatter(not_blocked['plate_x'], not_blocked['plate_y'], not_blocked['plate_z'],
c='red', marker='x', alpha=0.3, label='Not Blocked')
ax1.set_xlabel('Horizontal Location (ft)')
ax1.set_ylabel('Distance from Plate (ft)')
ax1.set_zlabel('Height (ft)')
ax1.set_title('Blocking Success by Location')
ax1.legend()
# Plot 2: Blocking difficulty heatmap
ax2 = fig.add_subplot(122)
difficulty_bins = pd.cut(blocking_df['difficulty'], bins=10)
blocking_df['diff_bin'] = difficulty_bins
diff_analysis = blocking_df.groupby(['catcher', 'diff_bin']).agg({
'blocked': 'mean',
'pitch_id': 'count'
}).reset_index()
diff_pivot = diff_analysis.pivot_table(
index='catcher',
columns='diff_bin',
values='blocked'
)
import seaborn as sns
sns.heatmap(diff_pivot, annot=True, fmt='.2f', cmap='RdYlGn', ax=ax2)
ax2.set_title('Block Rate by Difficulty Tier')
ax2.set_xlabel('Difficulty (Distance from Ideal)')
ax2.set_ylabel('Catcher')
plt.tight_layout()
plt.savefig('blocking_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
# Calculate blocking value by pitch type
pitch_type_blocking = blocking_df.groupby(['catcher', 'pitch_type']).agg({
'blocked': ['sum', 'count', 'mean'],
'expected_block': 'sum'
}).round(3)
print("\nBlocking Performance by Pitch Type:")
print(pitch_type_blocking)
Situational Blocking
Blocking value increases with runners on base, particularly in scoring position. A wild pitch or passed ball with a runner on third typically costs a full run, as the sketch below illustrates. Elite catchers like Yadier Molina and Buster Posey demonstrated heightened focus in high-leverage situations, maintaining block rates even on difficult pitches.
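To weight blocking chances by base state, attach an approximate run cost to each failed block depending on where the runners are. The values below are round numbers consistent with this chapter (roughly 0.25-0.3 runs per extra base, about a full run with a runner on third), not published run-expectancy figures, and the opportunity counts are hypothetical.
# R: Approximate run cost of a failed block by base state (assumed values)
library(tidyverse)
block_cost <- tribble(
  ~base_state,              ~runs_lost_per_miss,
  "Runner on 1st",          0.25,   # runner advances to second
  "Runner on 2nd",          0.30,   # runner advances to third
  "Runner on 3rd",          1.00,   # the run usually scores
  "Runners on 1st and 3rd", 1.25    # run scores and the trail runner advances
)
# Expected seasonal cost if a catcher misses 2% of ~300 dirt pitches in each state
block_cost <- block_cost %>%
  mutate(expected_runs_lost = 300 * 0.02 * runs_lost_per_miss)
print(block_cost)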
The Pitcher-Catcher Blocking Dynamic
Certain pitcher types create more blocking challenges:
- Hard sinkerballers (e.g., Zack Britton, Emmanuel Clase) generate many pitches in the dirt
- Splitter pitchers (e.g., Kevin Gausman) have late-diving action
- Young, wild pitchers have less control over pitch location
Teams consider catcher blocking ability when pairing batteries. A pitcher with elite control (e.g., Zack Greinke) can succeed with a poor blocker, while a wild power pitcher benefits from an elite blocker.
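A rough way to quantify that pairing decision is to multiply a pitcher's rate of bouncing pitches by the gap between two catchers' block rates and the run cost of a miss. Every input in the sketch below is an assumed, illustrative figure; only the run cost per missed block matches the value used in the blocking model above.
# R: Expected blocking runs saved by pairing a wild pitcher with a better blocker
pitches_per_start <- 95
starts <- 30
dirt_rate <- 0.08              # assumed share of pitches that bounce in the dirt
block_rate_elite <- 0.97       # assumed block rate for the elite blocker
block_rate_poor <- 0.93        # assumed block rate for the poor blocker
runs_per_missed_block <- 0.3   # run cost per wild pitch / passed ball (as above)
dirt_pitches <- pitches_per_start * starts * dirt_rate
runs_saved <- dirt_pitches * (block_rate_elite - block_rate_poor) * runs_per_missed_block
cat(sprintf("Dirt pitches over a season of starts: %.0f, runs saved: %.1f\n",
            dirt_pitches, runs_saved))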
Catcher throwing ability encompasses raw arm strength, transfer speed, throwing accuracy, and the more intangible skill of controlling the running game through reputation and pitcher management.
Components of Throwing Ability
Pop Time: The elapsed time from the pitch hitting the catcher's mitt to the ball arriving at the fielder's glove at the target base. Elite catchers achieve sub-1.90 second pop times to second base, with J.T. Realmuto regularly posting times in the 1.85-1.88 range (see the decomposition sketch below).
Exchange Speed: The transfer from glove to throwing hand. Quick exchanges can save 0.1-0.2 seconds compared to slow receivers.
Arm Strength: Raw velocity on throws, typically measured in MPH. Elite arms like Jorge Alfaro and Salvador Perez exceed 85 MPH on throws to second.
Accuracy: Throwing to the correct location for the middle infielder to apply the tag. Erratic throws, even if they arrive on time, often result in safe calls or throwing errors.
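Pop time is essentially the exchange plus the ball's flight time over the roughly 127 feet from home plate to second base, so transfer speed and arm strength trade off directly. The sketch below decomposes pop time under that simplification; the exchange times and throw velocities are assumed for illustration, and real throws lose speed in flight, so measured pop times run slightly higher.
# R: Decomposing pop time into exchange and flight time (simplified)
throw_distance_ft <- 127.3            # home plate to second base
mph_to_fps <- 5280 / 3600             # miles per hour to feet per second
pop_time <- function(exchange_sec, throw_mph) {
  exchange_sec + throw_distance_ft / (throw_mph * mph_to_fps)
}
# Quick exchange with a plus arm vs. slower exchange with an average arm
print(pop_time(exchange_sec = 0.70, throw_mph = 85))  # ~1.72 seconds
print(pop_time(exchange_sec = 0.80, throw_mph = 78))  # ~1.91 seconds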
Caught Stealing Metrics
Traditional caught stealing percentage (CS%) has limitations: it doesn't account for attempt frequency or baserunner speed. Modern metrics include:
- Caught Stealing Above Average: Actual caught stealing relative to an expectation that accounts for baserunner speed and pitcher delivery time
- Stolen Base Runs Prevented (SBRP): Run value of CS and deterred attempts
- Pop Time: Statcast measurement of exchange and throw speed
- Baserunner Attempt Rate: How often runners try to steal against this catcher
# R: Analyzing Throwing and Caught Stealing
library(tidyverse)
# Simulate stolen base attempt data
set.seed(2025)
n_attempts <- 2000
sb_data <- tibble(
attempt_id = 1:n_attempts,
catcher = sample(c("Realmuto", "Perez", "Salvy", "Rutschman", "Avg_Catcher"),
n_attempts, replace = TRUE),
runner_speed = rnorm(n_attempts, 27, 1.5), # Sprint speed in ft/s
pitcher_time = rnorm(n_attempts, 1.3, 0.15), # Time to plate
lead_distance = rnorm(n_attempts, 12, 2), # Lead in feet
pitcher_hand = sample(c("R", "L"), n_attempts, replace = TRUE)
) %>%
mutate(
# Pop time by catcher
pop_time = case_when(
catcher == "Realmuto" ~ rnorm(n(), 1.87, 0.08),
catcher == "Perez" ~ rnorm(n(), 1.92, 0.10),
catcher == "Rutschman" ~ rnorm(n(), 1.95, 0.09),
catcher == "Salvy" ~ rnorm(n(), 1.98, 0.11),
catcher == "Avg_Catcher" ~ rnorm(n(), 2.00, 0.10)
),
# Total time for runner: 90 ft between bases minus the lead,
# plus ~0.2 s as a rough jump/acceleration allowance (simplifying assumption)
total_time_runner = (90 - lead_distance) / runner_speed + 0.2,
# Total time for catcher
total_time_catcher = pitcher_time + pop_time,
# Caught stealing (with some randomness)
caught_stealing = total_time_catcher < total_time_runner + rnorm(n(), 0, 0.1)
)
# Calculate caught stealing metrics
cs_summary <- sb_data %>%
group_by(catcher) %>%
summarise(
attempts = n(),
caught_stealing = sum(caught_stealing),
cs_pct = mean(caught_stealing) * 100,
avg_pop_time = mean(pop_time),
sb_allowed = attempts - caught_stealing
) %>%
arrange(desc(cs_pct))
print(cs_summary)
# Build expected CS model
cs_model <- glm(
caught_stealing ~ runner_speed + pitcher_time + lead_distance + pitcher_hand,
data = sb_data,
family = binomial
)
sb_data <- sb_data %>%
mutate(expected_cs_prob = predict(cs_model, newdata = ., type = "response"))
# Calculate CS Above Average
cs_above_avg <- sb_data %>%
group_by(catcher) %>%
summarise(
attempts = n(),
actual_cs = sum(caught_stealing),
expected_cs = sum(expected_cs_prob),
cs_above_avg = actual_cs - expected_cs,
# Each CS worth ~0.5 runs, each SB costs ~0.2 runs
throwing_runs = cs_above_avg * 0.7
) %>%
arrange(desc(cs_above_avg))
print(cs_above_avg)
# Visualize pop time distribution
ggplot(sb_data, aes(x = pop_time, fill = catcher)) +
geom_density(alpha = 0.5) +
geom_vline(xintercept = 1.95, linetype = "dashed", color = "black") +
annotate("text", x = 1.95, y = 3, label = "MLB Avg (~1.95s)",
angle = 90, vjust = -0.5) +
labs(title = "Pop Time Distribution by Catcher",
x = "Pop Time (seconds)",
y = "Density") +
theme_minimal()
# Analyze success rate by pop time bins
pop_time_analysis <- sb_data %>%
mutate(
pop_time_bin = cut(pop_time,
breaks = c(0, 1.90, 1.95, 2.00, 2.10, Inf),
labels = c("<1.90", "1.90-1.95", "1.95-2.00",
"2.00-2.10", ">2.10"))
) %>%
group_by(pop_time_bin) %>%
summarise(
attempts = n(),
cs_rate = mean(caught_stealing) * 100,
avg_runner_speed = mean(runner_speed)
)
print(pop_time_analysis)
# Create visualization
ggplot(pop_time_analysis, aes(x = pop_time_bin, y = cs_rate,
fill = pop_time_bin)) +
geom_col() +
geom_text(aes(label = sprintf("%.1f%%", cs_rate)),
vjust = -0.5, size = 4) +
labs(title = "Caught Stealing Rate by Pop Time",
subtitle = "Faster exchanges lead to more caught stealers",
x = "Pop Time Range (seconds)",
y = "Caught Stealing Rate (%)") +
theme_minimal() +
theme(legend.position = "none")
# Python: Comprehensive Throwing Analysis
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
# Simulate throwing data
np.random.seed(2025)
n_attempts = 2000
throwing_df = pd.DataFrame({
'attempt_id': range(n_attempts),
'catcher': np.random.choice(['Realmuto', 'Perez', 'Rutschman',
'Salvy', 'Avg_Catcher'], n_attempts),
'runner_speed': np.random.normal(27, 1.5, n_attempts),
'pitcher_time': np.random.normal(1.3, 0.15, n_attempts),
'lead_distance': np.random.normal(12, 2, n_attempts),
'pitcher_hand': np.random.choice(['R', 'L'], n_attempts)
})
# Assign pop times by catcher skill
pop_time_map = {
'Realmuto': (1.87, 0.08),
'Perez': (1.92, 0.10),
'Rutschman': (1.95, 0.09),
'Salvy': (1.98, 0.11),
'Avg_Catcher': (2.00, 0.10)
}
throwing_df['pop_time'] = throwing_df['catcher'].apply(
lambda x: np.random.normal(pop_time_map[x][0], pop_time_map[x][1])
)
# Calculate times: runner covers 90 ft minus the lead, plus ~0.2 s
# as a rough jump/acceleration allowance (simplifying assumption)
throwing_df['total_time_runner'] = (
    (90 - throwing_df['lead_distance']) / throwing_df['runner_speed'] + 0.2
)
throwing_df['total_time_catcher'] = (
throwing_df['pitcher_time'] + throwing_df['pop_time']
)
# Determine outcome with some randomness
throwing_df['caught_stealing'] = (
throwing_df['total_time_catcher'] <
throwing_df['total_time_runner'] + np.random.normal(0, 0.1, n_attempts)
).astype(int)
# Summary statistics
cs_summary = throwing_df.groupby('catcher').agg({
'attempt_id': 'count',
'caught_stealing': ['sum', 'mean'],
'pop_time': 'mean'
}).round(3)
cs_summary.columns = ['attempts', 'caught_stealing', 'cs_rate', 'avg_pop_time']
cs_summary['cs_pct'] = cs_summary['cs_rate'] * 100
cs_summary = cs_summary.sort_values('cs_pct', ascending=False)
print("\nCaught Stealing Summary:")
print(cs_summary)
# Build logistic regression for expected CS
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
throwing_df['pitcher_hand_enc'] = le.fit_transform(throwing_df['pitcher_hand'])
X = throwing_df[['runner_speed', 'pitcher_time', 'lead_distance',
'pitcher_hand_enc']]
y = throwing_df['caught_stealing']
lr_model = LogisticRegression()
lr_model.fit(X, y)
throwing_df['expected_cs_prob'] = lr_model.predict_proba(X)[:, 1]
# Calculate CS Above Average
cs_above_avg = throwing_df.groupby('catcher').agg({
'attempt_id': 'count',
'caught_stealing': 'sum',
'expected_cs_prob': 'sum'
})
cs_above_avg.columns = ['attempts', 'actual_cs', 'expected_cs']
cs_above_avg['cs_above_avg'] = (
cs_above_avg['actual_cs'] - cs_above_avg['expected_cs']
)
cs_above_avg['throwing_runs'] = cs_above_avg['cs_above_avg'] * 0.7
cs_above_avg = cs_above_avg.sort_values('cs_above_avg', ascending=False)
print("\nCS Above Average:")
print(cs_above_avg.round(2))
# Visualization 1: Pop time vs CS rate
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Scatter plot
for catcher in throwing_df['catcher'].unique():
catcher_data = throwing_df[throwing_df['catcher'] == catcher]
axes[0].scatter(catcher_data['pop_time'],
catcher_data['caught_stealing'],
alpha=0.3, label=catcher)
axes[0].set_xlabel('Pop Time (seconds)')
axes[0].set_ylabel('Caught Stealing (1=Yes, 0=No)')
axes[0].set_title('Pop Time vs Caught Stealing Outcome')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Box plot of pop times
throwing_df.boxplot(column='pop_time', by='catcher', ax=axes[1])
axes[1].set_xlabel('Catcher')
axes[1].set_ylabel('Pop Time (seconds)')
axes[1].set_title('Pop Time Distribution by Catcher')
axes[1].axhline(y=1.95, color='r', linestyle='--', label='League Average')
plt.suptitle('') # Remove automatic title
plt.tight_layout()
plt.savefig('throwing_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
# Visualization 2: Success rate by pop time bins
throwing_df['pop_time_bin'] = pd.cut(
throwing_df['pop_time'],
bins=[0, 1.90, 1.95, 2.00, 2.10, np.inf],
labels=['<1.90', '1.90-1.95', '1.95-2.00', '2.00-2.10', '>2.10']
)
bin_analysis = throwing_df.groupby('pop_time_bin').agg({
'caught_stealing': ['count', 'mean'],
'runner_speed': 'mean'
}).round(3)
bin_analysis.columns = ['attempts', 'cs_rate', 'avg_runner_speed']
bin_analysis['cs_pct'] = bin_analysis['cs_rate'] * 100
print("\nCS Rate by Pop Time Bin:")
print(bin_analysis)
# Create bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(range(len(bin_analysis)), bin_analysis['cs_pct'])
plt.xlabel('Pop Time Range (seconds)')
plt.ylabel('Caught Stealing Rate (%)')
plt.title('Impact of Pop Time on Caught Stealing Success')
plt.xticks(range(len(bin_analysis)), bin_analysis.index)
# Add value labels on bars
for i, (idx, row) in enumerate(bin_analysis.iterrows()):
plt.text(i, row['cs_pct'] + 1, f"{row['cs_pct']:.1f}%",
ha='center', va='bottom')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('cs_by_poptime.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nCorrelation between pop time and CS success:")
print(f"Pearson r = {stats.pearsonr(throwing_df['pop_time'], throwing_df['caught_stealing'])[0]:.3f}")
Controlling the Running Game
Beyond throwing out runners, elite catchers deter attempts through reputation. J.T. Realmuto's elite arm strength and quick release led to significantly fewer stolen base attempts per game. This "deterrence value" doesn't appear in traditional stats but prevents runs.
Factors Affecting Attempt Rate:
- Catcher pop time and CS%
- Pitcher delivery time (slow deliveries invite attempts)
- Game situation (score, inning, runners)
- Catcher reputation and scouting reports
Teams with elite throwing catchers can employ pitchers with slower deliveries, while teams with poor throwers must prioritize quick-to-the-plate pitchers or employ more pickoff attempts and slide-step deliveries.
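One way to put a number on deterrence is to compare how often runners test a catcher with the league attempt rate and price the attempts that never happen. The attempt rates and opportunity counts below are assumed, and the run values match the rough figures used in the throwing model above; under these assumptions the deterrence value is real but modest, because attempts near the break-even point cost the defense little when they do happen.
# R: Approximate deterrence value from suppressed steal attempts (assumed inputs)
opportunities <- 900              # steal opportunities behind the plate per season
league_attempt_rate <- 0.07       # assumed league-wide attempt rate per opportunity
attempt_rate_vs_catcher <- 0.04   # assumed attempt rate against an elite arm
league_sb_success <- 0.75         # league success rate on attempts
run_value_sb <- 0.2               # offense gains ~0.2 runs per steal
run_value_cs <- 0.5               # offense loses ~0.5 runs per caught stealing
# Net run value the offense expects from one attempt at the league success rate
net_runs_per_attempt <- league_sb_success * run_value_sb -
  (1 - league_sb_success) * run_value_cs
suppressed_attempts <- opportunities * (league_attempt_rate - attempt_rate_vs_catcher)
deterrence_runs <- suppressed_attempts * net_runs_per_attempt
cat(sprintf("Suppressed attempts: %.0f, deterrence runs saved: %.2f\n",
            suppressed_attempts, deterrence_runs))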
The Declining Stolen Base Environment
Stolen base attempts declined steadily through the 2010s as teams recognized the unfavorable risk-reward ratio (runners generally need roughly a 70-75% success rate to break even). This reduces the relative value of elite throwing, though it remains important against aggressive baserunning teams and in high-leverage playoff situations.
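That break-even threshold falls directly out of the run values used in the throwing model above (a steal gains the offense roughly 0.2 runs, a caught stealing costs it roughly 0.5), as the short calculation below shows.
# R: Break-even stolen base success rate from approximate run values
run_value_sb <- 0.2   # runs gained by the offense on a successful steal
run_value_cs <- 0.5   # runs lost by the offense on a caught stealing
# Solve p * run_value_sb - (1 - p) * run_value_cs = 0 for p
break_even <- run_value_cs / (run_value_sb + run_value_cs)
cat(sprintf("Break-even success rate: %.0f%%\n", break_even * 100))  # ~71%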
The 2023 rule changes (larger bases, restricted pickoffs) increased stolen base attempts significantly, potentially elevating the value of throwing ability once again.
plt.xticks(range(len(bin_analysis)), bin_analysis.index)
# Add value labels on bars
for i, (idx, row) in enumerate(bin_analysis.iterrows()):
plt.text(i, row['cs_pct'] + 1, f"{row['cs_pct']:.1f}%",
ha='center', va='bottom')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('cs_by_poptime.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nCorrelation between pop time and CS success:")
print(f"Pearson r = {stats.pearsonr(throwing_df['pop_time'], throwing_df['caught_stealing'])[0]:.3f}")
The least quantifiable aspect of catcher value involves pitch calling, sequencing, and managing pitcher psychology. While difficult to measure, teams believe these skills significantly impact pitcher performance.
Pitch Calling and Sequencing
Modern pitch calling is increasingly collaborative, with input from:
- Bench coaches calling pitches via signals
- Advance scouting providing batter tendencies
- Analytics departments suggesting pitch types and locations
- Technology (e.g., PitchCom devices) facilitating communication
Despite this support, catchers make real-time adjustments based on:
- Pitcher stuff quality that day
- Batter's swing decisions and timing
- Umpire strike zone tendencies
- Game situation and leverage
- Weather and environmental factors
Elite game callers like Yadier Molina and Buster Posey developed reputations for maximizing pitcher performance. Quantifying this requires comparing pitcher results with different catchers, controlling for opposition quality and other factors.
Measuring Game Calling Impact
Researchers use several approaches:
Catcher ERA: Compare team ERA with each catcher, though this conflates multiple skills and suffers from small sample noise.
Pitch Type Frequency Analysis: Examine whether catchers call for optimal pitch mixes based on situation and batter tendencies.
Sequencing Metrics: Measure whether pitch sequences deviate from expectations in ways that improve outcomes.
Pitcher Testimony: Qualitative feedback from pitchers about catcher impact.
# R: Analyzing Catcher Impact on Pitcher Performance
library(tidyverse)
library(lme4) # For mixed effects models
# Simulate pitcher performance data by catcher
set.seed(2026)
n_games <- 1000
pitcher_catcher_data <- tibble(
game_id = 1:n_games,
pitcher = sample(paste0("P", 1:30), n_games, replace = TRUE),
catcher = sample(c("Realmuto", "Rutschman", "Molina", "Smith", "Avg_Catcher"),
n_games, replace = TRUE),
opponent_wrc = rnorm(n_games, 100, 15), # Opponent quality
temperature = rnorm(n_games, 75, 10),
pitcher_stuff = rnorm(n_games, 50, 10) # Pitcher stuff that day
) %>%
mutate(
# Catcher effect on performance
catcher_effect = case_when(
catcher == "Molina" ~ -0.3, # Negative = better (lower ERA)
catcher == "Realmuto" ~ -0.2,
catcher == "Rutschman" ~ -0.15,
catcher == "Smith" ~ -0.1,
catcher == "Avg_Catcher" ~ 0
),
# Generate ERA for that game
game_era = 4.5 +
(opponent_wrc - 100) * 0.02 +
(pitcher_stuff - 50) * -0.03 +
catcher_effect +
rnorm(n_games, 0, 1.5),
game_era = pmax(0, game_era) # ERA can't be negative
)
# Basic catcher ERA comparison
catcher_era <- pitcher_catcher_data %>%
group_by(catcher) %>%
summarise(
games = n(),
avg_era = mean(game_era),
median_era = median(game_era),
sd_era = sd(game_era)
) %>%
arrange(avg_era)
print(catcher_era)
# Build mixed effects model to control for pitcher quality
# This accounts for the fact that different catchers catch different pitchers
mixed_model <- lmer(
game_era ~ catcher + opponent_wrc + temperature + (1 | pitcher),
data = pitcher_catcher_data
)
# Extract catcher effects
catcher_effects <- summary(mixed_model)$coefficients %>%
as.data.frame() %>%
rownames_to_column("term") %>%
filter(str_detect(term, "catcher")) %>%
select(term, Estimate, `Std. Error`, `t value`)
print(catcher_effects)
# Analyze pitcher performance variance by catcher
# Do certain catchers get more consistent results?
pitcher_variance <- pitcher_catcher_data %>%
group_by(pitcher, catcher) %>%
filter(n() >= 5) %>% # At least 5 starts together
summarise(
starts = n(),
avg_era = mean(game_era),
sd_era = sd(game_era),
.groups = "drop"
)
variance_summary <- pitcher_variance %>%
group_by(catcher) %>%
summarise(
pitcher_pairs = n(),
avg_variance = mean(sd_era),
median_variance = median(sd_era)
) %>%
arrange(avg_variance)
print(variance_summary)
# Visualize catcher impact
ggplot(pitcher_catcher_data, aes(x = reorder(catcher, game_era),
y = game_era, fill = catcher)) +
geom_boxplot() +
coord_flip() +
labs(title = "Game ERA Distribution by Catcher",
subtitle = "Lower ERA indicates better pitcher performance",
x = "Catcher",
y = "Game ERA") +
theme_minimal() +
theme(legend.position = "none")
# Simulate pitch calling data
set.seed(2027)
n_pas <- 10000
pitch_calling <- tibble(
pa_id = 1:n_pas,
catcher = sample(c("Elite_Caller", "Avg_Caller", "Poor_Caller"),
n_pas, replace = TRUE),
count = sample(c("0-0", "0-1", "0-2", "1-0", "1-1", "1-2", "2-0", "2-1", "2-2", "3-0", "3-1", "3-2"),
n_pas, replace = TRUE),
pitch_type_called = sample(c("FB", "SL", "CH", "CU"), n_pas, replace = TRUE),
batter_expects = sample(c("FB", "SL", "CH", "CU"), n_pas, replace = TRUE)
) %>%
mutate(
# Surprise factor
surprised_batter = pitch_type_called != batter_expects,
# Flag pitcher-friendly counts for the effectiveness split below
is_pitchers_count = count %in% c("0-1", "0-2", "1-2", "2-2"),
caller_skill = case_when(
catcher == "Elite_Caller" ~ 0.15,
catcher == "Avg_Caller" ~ 0,
catcher == "Poor_Caller" ~ -0.1
),
# Outcome (simplified: contact probability; better callers suppress contact)
base_contact_prob = ifelse(surprised_batter, 0.65, 0.75),
adj_contact_prob = plogis(qlogis(base_contact_prob) - caller_skill),
contact_made = rbinom(n_pas, 1, adj_contact_prob)
)
# Analyze calling effectiveness
calling_effectiveness <- pitch_calling %>%
group_by(catcher, is_pitchers_count) %>%
summarise(
pa = n(),
contact_rate = mean(contact_made) * 100,
surprise_rate = mean(surprised_batter) * 100,
.groups = "drop"
) %>%
pivot_wider(names_from = is_pitchers_count,
values_from = c(contact_rate, surprise_rate),
names_prefix = "pitchers_count_")
print(calling_effectiveness)
# Python: Advanced Pitcher-Catcher Pairing Analysis
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
# Simulate pitcher-catcher pair data
np.random.seed(2026)
n_games = 1000
pitchers = [f'P{i}' for i in range(1, 31)]
catchers = ['Realmuto', 'Rutschman', 'Molina', 'Smith', 'Avg_Catcher']
pc_df = pd.DataFrame({
'game_id': range(n_games),
'pitcher': np.random.choice(pitchers, n_games),
'catcher': np.random.choice(catchers, n_games),
'opponent_wrc': np.random.normal(100, 15, n_games),
'temperature': np.random.normal(75, 10, n_games),
'pitcher_stuff': np.random.normal(50, 10, n_games)
})
# Assign catcher effects
catcher_effects = {
'Molina': -0.3,
'Realmuto': -0.2,
'Rutschman': -0.15,
'Smith': -0.1,
'Avg_Catcher': 0
}
pc_df['catcher_effect'] = pc_df['catcher'].map(catcher_effects)
# Generate game ERA
pc_df['game_era'] = (
4.5 +
(pc_df['opponent_wrc'] - 100) * 0.02 +
(pc_df['pitcher_stuff'] - 50) * -0.03 +
pc_df['catcher_effect'] +
np.random.normal(0, 1.5, n_games)
)
pc_df['game_era'] = pc_df['game_era'].clip(lower=0)
# Basic catcher ERA summary
catcher_era = pc_df.groupby('catcher').agg({
'game_id': 'count',
'game_era': ['mean', 'median', 'std']
}).round(3)
catcher_era.columns = ['games', 'avg_era', 'median_era', 'sd_era']
catcher_era = catcher_era.sort_values('avg_era')
print("\nCatcher ERA Summary:")
print(catcher_era)
# Build regression model controlling for other factors
X = pd.get_dummies(pc_df[['catcher', 'opponent_wrc', 'temperature']],
columns=['catcher'], drop_first=True)
y = pc_df['game_era']
lr_model = LinearRegression()
lr_model.fit(X, y)
# Extract catcher coefficients
catcher_cols = [col for col in X.columns if 'catcher_' in col]
catcher_coefs = pd.DataFrame({
'catcher': [col.replace('catcher_', '') for col in catcher_cols],
'era_effect': lr_model.coef_[[X.columns.get_loc(col) for col in catcher_cols]]
})
catcher_coefs = catcher_coefs.sort_values('era_effect')
print("\nCatcher ERA Effects (controlling for opponent and weather):")
print(catcher_coefs)
# Analyze specific pitcher-catcher pairs
pair_analysis = pc_df.groupby(['pitcher', 'catcher']).agg({
'game_id': 'count',
'game_era': ['mean', 'std']
}).reset_index()
pair_analysis.columns = ['pitcher', 'catcher', 'games', 'avg_era', 'std_era']
pair_analysis = pair_analysis[pair_analysis['games'] >= 5] # Minimum sample
# Find best and worst pairs for each pitcher
pitcher_pairs = []
for pitcher in pair_analysis['pitcher'].unique():
pitcher_data = pair_analysis[pair_analysis['pitcher'] == pitcher]
if len(pitcher_data) >= 2:
best = pitcher_data.nsmallest(1, 'avg_era')
worst = pitcher_data.nlargest(1, 'avg_era')
diff = worst['avg_era'].values[0] - best['avg_era'].values[0]
pitcher_pairs.append({
'pitcher': pitcher,
'best_catcher': best['catcher'].values[0],
'best_era': best['avg_era'].values[0],
'worst_catcher': worst['catcher'].values[0],
'worst_era': worst['avg_era'].values[0],
'era_diff': diff
})
pairs_df = pd.DataFrame(pitcher_pairs).sort_values('era_diff', ascending=False)
print("\nLargest Pitcher-Catcher Pairing Effects:")
print(pairs_df.head(10))
# Visualization 1: Catcher ERA comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Box plot
pc_df.boxplot(column='game_era', by='catcher', ax=axes[0])
axes[0].set_xlabel('Catcher')
axes[0].set_ylabel('Game ERA')
axes[0].set_title('Game ERA Distribution by Catcher')
plt.suptitle('')
# Violin plot with points
sns.violinplot(data=pc_df, x='catcher', y='game_era', ax=axes[1])
axes[1].set_xlabel('Catcher')
axes[1].set_ylabel('Game ERA')
axes[1].set_title('Game ERA Distribution (Violin Plot)')
axes[1].axhline(y=pc_df['game_era'].mean(), color='r', linestyle='--',
label='Overall Mean')
axes[1].legend()
plt.tight_layout()
plt.savefig('catcher_era_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
# Visualization 2: Pitcher consistency by catcher
variance_by_catcher = pc_df.groupby(['pitcher', 'catcher']).agg({
'game_era': ['mean', 'std', 'count']
}).reset_index()
variance_by_catcher.columns = ['pitcher', 'catcher', 'mean_era', 'std_era', 'games']
variance_by_catcher = variance_by_catcher[variance_by_catcher['games'] >= 3]
fig, ax = plt.subplots(figsize=(10, 6))
for catcher in catchers:
catcher_data = variance_by_catcher[variance_by_catcher['catcher'] == catcher]
ax.scatter(catcher_data['mean_era'], catcher_data['std_era'],
label=catcher, alpha=0.6, s=50)
ax.set_xlabel('Mean Game ERA')
ax.set_ylabel('Standard Deviation of Game ERA')
ax.set_title('Pitcher Consistency by Catcher\n(Lower std = more consistent)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('catcher_consistency.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nAnalysis complete. Visualizations saved.")
Pitcher Relationships and Psychology
Beyond mechanics, elite catchers build trust with pitchers through:
- Communication: Discussing approach between innings and during mound visits
- Confidence Building: Supporting pitchers through struggles
- Strategic Adjustment: Recognizing when pitchers don't have their best stuff
- Veteran Leadership: Calming young pitchers in high-pressure situations
Buster Posey's three World Series championships were partly attributed to his ability to get the most out of Madison Bumgarner, Tim Lincecum, and the rest of the Giants' staff. While these soft skills resist quantification, teams value them highly when evaluating catchers.
Integrating all defensive components into a single value metric represents the ultimate goal of catcher analytics. Several public and proprietary systems exist.
Components of Total Catcher Value
Modern systems typically include:
- Framing Runs (largest component for elite framers)
- Blocking Runs
- Throwing Runs (CS and deterrence)
- Fielding Runs (fielding bunts, pop-ups)
- Game Calling/Pitcher Management (often estimated or omitted)
Popular Metrics:
- FanGraphs DEF: Combines framing, blocking, throwing
- Baseball Prospectus FRAA: Fielding Runs Above Average including framing
- Statcast Catcher Defense: Uses Statcast data for framing and blocking
- Proprietary Team Models: Most teams build internal systems with additional data
Building a Composite Catcher Value Model
# R: Composite Catcher Value Model
library(tidyverse)
# Create synthetic catcher performance data (per 1000 innings)
catcher_performance <- tibble(
catcher = c("Realmuto", "Rutschman", "Smith", "Barnhart", "Perez",
"Grandal", "Contreras", "Kirk", "Vazquez", "Hedges"),
innings = c(1200, 1100, 1150, 900, 1300, 1000, 1100, 950, 1050, 800),
# Framing (runs per 7000 called pitches, normalized to innings)
framing_runs = c(15, 12, 10, 22, -8, 18, -5, 5, 8, 25),
# Blocking (runs per 1000 pitches in dirt)
blocking_runs = c(3, 4, 2, 3, 1, 2, -1, 1, 3, 4),
# Throwing (runs from CS and deterrence)
throwing_runs = c(8, 4, 2, 0, 6, 1, 3, -2, 5, 1),
# Fielding (pop ups, bunts, etc.)
fielding_runs = c(2, 1, 1, 0, 1, 0, 1, 0, 1, 0),
# Offensive runs above average
batting_runs = c(25, 20, 18, -10, 15, 10, 12, 8, -5, -15)
) %>%
mutate(
# Total defensive runs
defensive_runs = framing_runs + blocking_runs + throwing_runs + fielding_runs,
# Total runs above average
total_runs = defensive_runs + batting_runs,
# Convert to WAR (10 runs per win, plus positional adjustment)
# Catchers get +12.5 run positional adjustment per 150 games
games = innings / 9,
positional_adj = (games / 150) * 12.5,
war = (total_runs + positional_adj) / 10
)
# Display comprehensive rankings
catcher_rankings <- catcher_performance %>%
select(catcher, innings, framing_runs, blocking_runs, throwing_runs,
defensive_runs, batting_runs, total_runs, war) %>%
arrange(desc(war))
print(catcher_rankings)
# Analyze value components
value_decomposition <- catcher_performance %>%
select(catcher, framing_runs, blocking_runs, throwing_runs,
fielding_runs, batting_runs) %>%
pivot_longer(cols = -catcher, names_to = "component", values_to = "runs") %>%
mutate(component = str_remove(component, "_runs"))
# Stacked bar chart of value sources
ggplot(value_decomposition, aes(x = reorder(catcher, runs, sum),
y = runs, fill = component)) +
geom_col() +
coord_flip() +
labs(title = "Catcher Value Decomposition",
subtitle = "Runs above average by component",
x = "Catcher",
y = "Runs Above Average",
fill = "Component") +
scale_fill_brewer(palette = "Set2") +
theme_minimal()
# Correlation analysis between components
correlation_data <- catcher_performance %>%
select(framing_runs, blocking_runs, throwing_runs, batting_runs)
cor_matrix <- cor(correlation_data)
print("Correlation between catcher skills:")
print(round(cor_matrix, 3))
# Heatmap of correlations
library(reshape2)
cor_melted <- melt(cor_matrix)
ggplot(cor_melted, aes(Var1, Var2, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 2)), color = "white") +
scale_fill_gradient2(low = "blue", mid = "white", high = "red",
midpoint = 0, limit = c(-1, 1)) +
labs(title = "Correlation Between Catcher Skills",
x = "", y = "") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Identify catcher archetypes using clustering
library(cluster)
cluster_data <- catcher_performance %>%
select(framing_runs, blocking_runs, throwing_runs, batting_runs) %>%
scale() # Normalize
kmeans_result <- kmeans(cluster_data, centers = 3, nstart = 25)
# Note: k-means cluster IDs are arbitrary; inspect kmeans_result$centers and adjust
# these labels so they actually match each cluster's profile
catcher_performance$archetype <- case_when(
kmeans_result$cluster == 1 ~ "Elite Defender",
kmeans_result$cluster == 2 ~ "Balanced",
kmeans_result$cluster == 3 ~ "Offensive-Focused"
)
archetype_summary <- catcher_performance %>%
group_by(archetype) %>%
summarise(
n = n(),
avg_framing = mean(framing_runs),
avg_blocking = mean(blocking_runs),
avg_throwing = mean(throwing_runs),
avg_batting = mean(batting_runs),
avg_war = mean(war)
)
print(archetype_summary)
# Visualize archetypes
ggplot(catcher_performance, aes(x = defensive_runs, y = batting_runs,
color = archetype, label = catcher)) +
geom_point(size = 4) +
geom_text(vjust = -1, size = 3) +
geom_vline(xintercept = 0, linetype = "dashed") +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Catcher Archetypes: Defense vs. Offense",
x = "Defensive Runs Above Average",
y = "Batting Runs Above Average",
color = "Archetype") +
theme_minimal()
# Python: Comprehensive Catcher Value Model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Create catcher performance dataset
catchers = ['Realmuto', 'Rutschman', 'Smith', 'Barnhart', 'Perez',
'Grandal', 'Contreras', 'Kirk', 'Vazquez', 'Hedges']
catcher_df = pd.DataFrame({
'catcher': catchers,
'innings': [1200, 1100, 1150, 900, 1300, 1000, 1100, 950, 1050, 800],
'framing_runs': [15, 12, 10, 22, -8, 18, -5, 5, 8, 25],
'blocking_runs': [3, 4, 2, 3, 1, 2, -1, 1, 3, 4],
'throwing_runs': [8, 4, 2, 0, 6, 1, 3, -2, 5, 1],
'fielding_runs': [2, 1, 1, 0, 1, 0, 1, 0, 1, 0],
'batting_runs': [25, 20, 18, -10, 15, 10, 12, 8, -5, -15]
})
# Calculate composite metrics
catcher_df['defensive_runs'] = (
catcher_df['framing_runs'] +
catcher_df['blocking_runs'] +
catcher_df['throwing_runs'] +
catcher_df['fielding_runs']
)
catcher_df['total_runs'] = catcher_df['defensive_runs'] + catcher_df['batting_runs']
# Positional adjustment and WAR
catcher_df['games'] = catcher_df['innings'] / 9
catcher_df['positional_adj'] = (catcher_df['games'] / 150) * 12.5
catcher_df['war'] = (catcher_df['total_runs'] + catcher_df['positional_adj']) / 10
# Display rankings
catcher_rankings = catcher_df.sort_values('war', ascending=False)
print("\nCatcher WAR Rankings:")
print(catcher_rankings[['catcher', 'defensive_runs', 'batting_runs',
'total_runs', 'war']].round(2))
# Value decomposition visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# 1. Stacked bar chart of value components
value_components = catcher_df[['catcher', 'framing_runs', 'blocking_runs',
'throwing_runs', 'fielding_runs', 'batting_runs']].set_index('catcher')
value_components.plot(kind='barh', stacked=True, ax=axes[0, 0])
axes[0, 0].set_xlabel('Runs Above Average')
axes[0, 0].set_title('Value Decomposition by Component')
axes[0, 0].legend(loc='best', fontsize=8)
axes[0, 0].axvline(x=0, color='black', linestyle='-', linewidth=0.5)
# 2. Defense vs Offense scatter
axes[0, 1].scatter(catcher_df['defensive_runs'], catcher_df['batting_runs'],
s=100, alpha=0.6)
for idx, row in catcher_df.iterrows():
axes[0, 1].annotate(row['catcher'],
(row['defensive_runs'], row['batting_runs']),
fontsize=8, ha='right')
axes[0, 1].axhline(y=0, color='gray', linestyle='--', alpha=0.5)
axes[0, 1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
axes[0, 1].set_xlabel('Defensive Runs Above Average')
axes[0, 1].set_ylabel('Batting Runs Above Average')
axes[0, 1].set_title('Defense vs Offense Profile')
axes[0, 1].grid(True, alpha=0.3)
# 3. WAR bar chart
catcher_sorted = catcher_df.sort_values('war')
colors = ['green' if x > 0 else 'red' for x in catcher_sorted['war']]
axes[1, 0].barh(catcher_sorted['catcher'], catcher_sorted['war'], color=colors)
axes[1, 0].set_xlabel('WAR')
axes[1, 0].set_title('Total Catcher Value (WAR)')
axes[1, 0].axvline(x=0, color='black', linestyle='-', linewidth=0.5)
# 4. Correlation heatmap
corr_cols = ['framing_runs', 'blocking_runs', 'throwing_runs', 'batting_runs']
corr_matrix = catcher_df[corr_cols].corr()
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm',
center=0, ax=axes[1, 1], square=True)
axes[1, 1].set_title('Skill Correlation Matrix')
plt.tight_layout()
plt.savefig('catcher_value_comprehensive.png', dpi=300, bbox_inches='tight')
plt.close()
# Cluster analysis for archetypes
cluster_features = catcher_df[['framing_runs', 'blocking_runs',
'throwing_runs', 'batting_runs']]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(cluster_features)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
catcher_df['cluster'] = kmeans.fit_predict(scaled_features)
# Interpret clusters
cluster_summary = catcher_df.groupby('cluster').agg({
'catcher': 'count',
'framing_runs': 'mean',
'blocking_runs': 'mean',
'throwing_runs': 'mean',
'batting_runs': 'mean',
'war': 'mean'
}).round(2)
cluster_summary.columns = ['count', 'avg_framing', 'avg_blocking',
'avg_throwing', 'avg_batting', 'avg_war']
print("\nCatcher Archetype Clusters:")
print(cluster_summary)
# Assign archetype labels based on cluster characteristics
archetype_map = {}
for cluster_id in range(3):
cluster_data = cluster_summary.loc[cluster_id]
if cluster_data['avg_framing'] > 10:
archetype_map[cluster_id] = 'Elite Framer'
elif cluster_data['avg_batting'] > 10:
archetype_map[cluster_id] = 'Offensive-Focused'
else:
archetype_map[cluster_id] = 'Balanced'
catcher_df['archetype'] = catcher_df['cluster'].map(archetype_map)
print("\nCatcher Archetypes:")
print(catcher_df[['catcher', 'archetype', 'war']].sort_values('war', ascending=False))
# Calculate replacement level value
# Replacement level typically around -2 WAR per 600 PA season
replacement_level_war = -1.5 # For ~1000 innings
catcher_df['war_above_replacement'] = catcher_df['war'] - replacement_level_war
catcher_df['runs_above_replacement'] = catcher_df['war_above_replacement'] * 10
# Estimate dollar value ($8M per WAR in modern market)
dollars_per_war = 8_000_000
catcher_df['estimated_value'] = catcher_df['war_above_replacement'] * dollars_per_war
print("\nEstimated Market Value:")
print(catcher_df[['catcher', 'war', 'war_above_replacement',
'estimated_value']].sort_values('estimated_value', ascending=False))
# Final comprehensive visualization
fig, ax = plt.subplots(figsize=(12, 8))
# Bubble chart: Defense vs Offense, size = WAR, color = archetype
scatter = ax.scatter(catcher_df['defensive_runs'],
catcher_df['batting_runs'],
s=catcher_df['war'].clip(lower=0) * 100, # Size by WAR
c=catcher_df['cluster'],
alpha=0.6,
cmap='viridis',
edgecolors='black',
linewidth=1)
# Add labels
for idx, row in catcher_df.iterrows():
ax.annotate(row['catcher'],
(row['defensive_runs'], row['batting_runs']),
fontsize=9, ha='center', va='bottom')
# Add quadrant lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5, linewidth=1)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5, linewidth=1)
# Add quadrant labels
ax.text(20, 20, 'Elite Overall', fontsize=10, alpha=0.5, ha='center')
ax.text(20, -10, 'Defense Specialists', fontsize=10, alpha=0.5, ha='center')
ax.text(-5, 20, 'Offense-First', fontsize=10, alpha=0.5, ha='center')
ax.text(-5, -10, 'Below Average', fontsize=10, alpha=0.5, ha='center')
ax.set_xlabel('Defensive Runs Above Average', fontsize=12)
ax.set_ylabel('Batting Runs Above Average', fontsize=12)
ax.set_title('Catcher Value Profile\n(Bubble size = WAR)', fontsize=14)
ax.grid(True, alpha=0.3)
plt.colorbar(scatter, ax=ax, label='Archetype Cluster')
plt.tight_layout()
plt.savefig('catcher_value_profile.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nAnalysis complete. All visualizations saved.")
Trade-offs and Value Optimization
Teams face strategic decisions about catcher allocation:
Offense vs. Defense Balance: How much offensive production to sacrifice for elite defense depends on:
- Team offensive context (good offenses can afford defense-first catchers)
- Pitching staff characteristics (young/wild staff benefits from elite defense)
- Division competition (high-scoring division may require more offense)
- Park factors (pitcher-friendly park reduces need for catcher offense)
Playing Time Distribution: Should teams:
- Ride one elite catcher for 130+ games?
- Split time between defensive/offensive specialists?
- Match catchers to specific pitchers?
Research suggests that catcher defensive value increases with playing time because framing value accrues over thousands of called pitches. However, catchers face significant physical demands, making durability management crucial.
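A back-of-the-envelope way to frame that decision is to compare total runs under different playing-time allocations. The sketch below is purely illustrative: the per-game run rates and the fatigue penalty are assumptions chosen for demonstration, not measured values.
# R: Illustrative playing-time allocation sketch (all run rates are assumptions)
library(tidyverse)
# Assumed per-game runs above average for two hypothetical catchers
elite_defender_rpg <- 0.20   # defense-first starter
offense_first_rpg  <- 0.12   # bat-first backup
fatigue_penalty    <- 0.001  # assumed runs lost per game, per game beyond 110 starts
allocation <- tibble(
  strategy = c("Ride starter for 130 games", "Split 80 / 50 with backup"),
  starter_games = c(130, 80),
  backup_games  = c(0, 50)
) %>%
  mutate(
    penalty = fatigue_penalty * pmax(0, starter_games - 110) * starter_games,
    runs_above_avg = starter_games * elite_defender_rpg +
      backup_games * offense_first_rpg - penalty
  )
print(allocation)
Under these made-up rates the two strategies land within a couple of runs of each other, which is why durability assumptions, not just talent, drive the choice.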
The Future of Catcher Valuation
Several trends are shaping catcher analytics:
- Automated Strike Zones: ABS implementation would eliminate framing value
- PitchCom Technology: Reduces game-calling autonomy
- Biomechanics Tracking: New data on receiving mechanics and injury risk
- Advanced Blocking Metrics: Better models of blockable vs. unblockable pitches
- Pitcher Pairing Optimization: Data-driven battery matching
Teams with sophisticated catcher development programs (Rays, Guardians, Cardinals) maintain competitive advantages through superior evaluation and player development.
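To get a rough sense of how much value hinges on the first trend above, the short what-if below reuses the synthetic catcher_performance tibble from the composite value model and recomputes WAR with framing runs zeroed out, approximating a full-ABS world. It is a sketch on made-up data, not a projection.
# R: Rough ABS what-if -- zero out framing and recompute WAR
# Assumes the synthetic catcher_performance tibble from the composite model above
abs_scenario <- catcher_performance %>%
  mutate(
    total_runs_abs = blocking_runs + throwing_runs + fielding_runs + batting_runs,
    war_abs = (total_runs_abs + positional_adj) / 10,
    war_change = war_abs - war
  ) %>%
  select(catcher, war, war_abs, war_change) %>%
  arrange(war_change)
print(abs_scenario)
In this synthetic data the framing-dependent profiles (the Barnhart and Hedges rows) lose the most value, which is exactly the sensitivity front offices are weighing as ABS approaches.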
Modern catcher analytics demands interactive visualizations that allow scouts, coaches, and analysts to explore multi-dimensional performance data dynamically. Static charts and tables, while informative, cannot capture the complexity of catcher performance across different pitch locations, game contexts, and time periods. This section demonstrates how to build interactive dashboards using Plotly and Shiny (R) or Dash (Python) to create powerful analytical tools for catcher evaluation.
Interactive visualizations offer several key advantages for catcher analytics:
- Dynamic Filtering: Users can instantly filter by date range, opponent, pitcher, or game situation to isolate specific performance contexts
- Multi-Dimensional Exploration: Hover interactions reveal detailed pitch-level information while maintaining overall pattern visibility
- Comparative Analysis: Side-by-side or overlapping visualizations enable direct catcher comparisons
- Real-Time Updates: Dashboards can be connected to live data sources for in-season performance tracking
- Exportable Insights: Interactive plots can be saved as HTML files for sharing with coaches and front office staff
Interactive Framing Heat Map
The framing heat map visualization shows strike probability by pitch location, allowing users to identify where each catcher excels or struggles with pitch receiving. Interactive features include location-specific statistics, zone overlays, and catcher comparison modes.
# R: Interactive Framing Heat Map with Plotly
library(plotly)
library(tidyverse)
library(htmlwidgets)
# Prepare framing data with location bins
create_framing_heatmap_data <- function(pitch_data, catcher_name) {
pitch_data %>%
filter(catcher == catcher_name) %>%
mutate(
# Create location bins (20x20 grid)
x_bin = cut(plate_x, breaks = seq(-2, 2, length.out = 21),
labels = FALSE, include.lowest = TRUE),
z_bin = cut(plate_z, breaks = seq(0, 5, length.out = 21),
labels = FALSE, include.lowest = TRUE),
# Calculate bin centers
x_center = seq(-1.9, 1.9, length.out = 20)[x_bin],
z_center = seq(0.125, 4.875, length.out = 20)[z_bin]
) %>%
filter(!is.na(x_bin), !is.na(z_bin)) %>%  # drop the few pitches outside the plotted window
group_by(x_center, z_center) %>%
summarise(
pitches = n(),
actual_strike_rate = mean(called_strike, na.rm = TRUE),
expected_strike_rate = mean(expected_strike, na.rm = TRUE),
framing_value = actual_strike_rate - expected_strike_rate,
.groups = "drop"
) %>%
filter(pitches >= 5) # Minimum sample size per bin
}
# Create interactive heat map
create_interactive_framing_heatmap <- function(pitch_data, catcher_name) {
heatmap_data <- create_framing_heatmap_data(pitch_data, catcher_name)
# Create matrix for heatmap
x_coords <- sort(unique(heatmap_data$x_center))
z_coords <- sort(unique(heatmap_data$z_center))
framing_matrix <- matrix(NA, nrow = length(z_coords), ncol = length(x_coords))
for (i in seq_len(nrow(heatmap_data))) {
row_idx <- which(z_coords == heatmap_data$z_center[i])
col_idx <- which(x_coords == heatmap_data$x_center[i])
framing_matrix[row_idx, col_idx] <- heatmap_data$framing_value[i]
}
# Create custom hover text matrix
hover_matrix <- matrix("", nrow = length(z_coords), ncol = length(x_coords))
for (i in seq_len(nrow(heatmap_data))) {
row_idx <- which(z_coords == heatmap_data$z_center[i])
col_idx <- which(x_coords == heatmap_data$x_center[i])
hover_matrix[row_idx, col_idx] <- paste0(
"Location: (", round(heatmap_data$x_center[i], 2), ", ",
round(heatmap_data$z_center[i], 2), ")<br>",
"Pitches: ", heatmap_data$pitches[i], "<br>",
"Actual Strike %: ", round(heatmap_data$actual_strike_rate[i] * 100, 1), "%<br>",
"Expected Strike %: ", round(heatmap_data$expected_strike_rate[i] * 100, 1), "%<br>",
"Framing Value: ", round(heatmap_data$framing_value[i] * 100, 1), "%"
)
}
# Create plotly heatmap
fig <- plot_ly(
x = x_coords,
y = z_coords,
z = framing_matrix,
type = "heatmap",
colorscale = list(
c(0, "rgb(220, 50, 50)"), # Red for negative
c(0.5, "rgb(255, 255, 255)"), # White for neutral
c(1, "rgb(50, 50, 220)") # Blue for positive
),
zmid = 0,
zmin = -0.15,
zmax = 0.15,
colorbar = list(title = "Framing<br>Value"),
hovertemplate = paste0(
"%{text}<extra></extra>"
),
text = hover_matrix
) %>%
# Add strike zone rectangle
add_segments(
x = -0.708, xend = 0.708, y = 1.5, yend = 1.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
add_segments(
x = -0.708, xend = 0.708, y = 3.5, yend = 3.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
add_segments(
x = -0.708, xend = -0.708, y = 1.5, yend = 3.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
add_segments(
x = 0.708, xend = 0.708, y = 1.5, yend = 3.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
layout(
title = list(
text = paste0("<b>", catcher_name, " - Pitch Framing Heat Map</b>"),
x = 0.5,
xanchor = "center"
),
xaxis = list(
title = "Horizontal Location (feet)",
range = c(-2, 2),
constrain = "domain"
),
yaxis = list(
title = "Vertical Location (feet)",
range = c(0, 5),
scaleanchor = "x",
scaleratio = 1
),
plot_bgcolor = "rgb(240, 240, 240)",
paper_bgcolor = "white"
) %>%
config(displayModeBar = TRUE,
modeBarButtonsToRemove = c("lasso2d", "select2d"))
return(fig)
}
# Example usage with sample data
set.seed(2023)
sample_pitch_data <- tibble(
catcher = sample(c("Realmuto", "Barnhart", "Perez"), 10000, replace = TRUE),
plate_x = rnorm(10000, 0, 0.8),
plate_z = rnorm(10000, 2.5, 0.6),
dist_from_center = sqrt(plate_x^2 + (plate_z - 2.5)^2),
strike_prob = plogis(2 - 2.5 * dist_from_center),
catcher_effect = case_when(
catcher == "Barnhart" ~ 0.4,
catcher == "Realmuto" ~ 0.3,
catcher == "Perez" ~ -0.2
),
expected_strike = strike_prob,
called_strike = rbinom(10000, 1, plogis(qlogis(strike_prob) + catcher_effect))
)
# Create and display interactive heatmap
framing_heatmap <- create_interactive_framing_heatmap(sample_pitch_data, "Barnhart")
framing_heatmap
# Save as HTML file
htmlwidgets::saveWidget(framing_heatmap, "framing_heatmap.html",
selfcontained = TRUE)
# Python: Interactive Framing Heat Map with Plotly
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy.stats import binned_statistic_2d
def create_framing_heatmap_data(pitch_data, catcher_name):
"""Prepare binned framing data for heatmap visualization"""
catcher_data = pitch_data[pitch_data['catcher'] == catcher_name].copy()
# Create 20x20 grid
x_bins = np.linspace(-2, 2, 21)
z_bins = np.linspace(0, 5, 21)
# Calculate statistics for each bin
actual_strikes, x_edges, z_edges, _ = binned_statistic_2d(
catcher_data['plate_x'], catcher_data['plate_z'],
catcher_data['called_strike'],
statistic='mean', bins=[x_bins, z_bins]
)
expected_strikes, _, _, _ = binned_statistic_2d(
catcher_data['plate_x'], catcher_data['plate_z'],
catcher_data['expected_strike'],
statistic='mean', bins=[x_bins, z_bins]
)
pitch_counts, _, _, _ = binned_statistic_2d(
catcher_data['plate_x'], catcher_data['plate_z'],
catcher_data['called_strike'],
statistic='count', bins=[x_bins, z_bins]
)
# Calculate framing value
framing_value = actual_strikes - expected_strikes
# Mask bins with insufficient data
framing_value[pitch_counts < 5] = np.nan
actual_strikes[pitch_counts < 5] = np.nan
expected_strikes[pitch_counts < 5] = np.nan
    # binned_statistic_2d returns arrays indexed (x, z); transpose so rows index z,
    # which is the orientation Plotly expects for heatmap z values (z[row = y, col = x])
    return {
        'framing_value': framing_value.T,
        'actual_strikes': actual_strikes.T,
        'expected_strikes': expected_strikes.T,
        'pitch_counts': pitch_counts.T,
        'x_centers': (x_edges[:-1] + x_edges[1:]) / 2,
        'z_centers': (z_edges[:-1] + z_edges[1:]) / 2
    }
def create_interactive_framing_heatmap(pitch_data, catcher_name):
"""Create interactive Plotly heatmap for pitch framing"""
data = create_framing_heatmap_data(pitch_data, catcher_name)
# Create custom hover text
hover_text = []
for i in range(len(data['z_centers'])):
hover_row = []
for j in range(len(data['x_centers'])):
if np.isnan(data['framing_value'][i, j]):
hover_row.append('')
else:
text = (
f"Location: ({data['x_centers'][j]:.2f}, {data['z_centers'][i]:.2f})<br>"
f"Pitches: {int(data['pitch_counts'][i, j])}<br>"
f"Actual Strike %: {data['actual_strikes'][i, j]*100:.1f}%<br>"
f"Expected Strike %: {data['expected_strikes'][i, j]*100:.1f}%<br>"
f"Framing Value: {data['framing_value'][i, j]*100:+.1f}%"
)
hover_row.append(text)
hover_text.append(hover_row)
# Create figure
fig = go.Figure()
# Add heatmap
fig.add_trace(go.Heatmap(
x=data['x_centers'],
y=data['z_centers'],
z=data['framing_value'],
colorscale=[
[0, 'rgb(220, 50, 50)'], # Red for negative
[0.5, 'rgb(255, 255, 255)'], # White for neutral
[1, 'rgb(50, 50, 220)'] # Blue for positive
],
zmid=0,
zmin=-0.15,
zmax=0.15,
colorbar=dict(title="Framing<br>Value"),
hovertemplate='%{text}<extra></extra>',
text=hover_text
))
# Add strike zone rectangle
zone_x = [-0.708, 0.708, 0.708, -0.708, -0.708]
zone_z = [1.5, 1.5, 3.5, 3.5, 1.5]
fig.add_trace(go.Scatter(
x=zone_x, y=zone_z,
mode='lines',
line=dict(color='black', width=3),
showlegend=False,
hoverinfo='skip'
))
# Update layout
fig.update_layout(
title=dict(
text=f"<b>{catcher_name} - Pitch Framing Heat Map</b>",
x=0.5,
xanchor='center',
font=dict(size=18)
),
xaxis=dict(
title="Horizontal Location (feet)",
range=[-2, 2],
constrain='domain'
),
yaxis=dict(
title="Vertical Location (feet)",
range=[0, 5],
scaleanchor="x",
scaleratio=1
),
plot_bgcolor='rgb(240, 240, 240)',
paper_bgcolor='white',
width=700,
height=700
)
return fig
# Example usage with sample data
np.random.seed(2023)
n = 10000
def inv_logit(x):
return 1 / (1 + np.exp(-x))
sample_pitch_data = pd.DataFrame({
'catcher': np.random.choice(['Realmuto', 'Barnhart', 'Perez'], n),
'plate_x': np.random.normal(0, 0.8, n),
'plate_z': np.random.normal(2.5, 0.6, n)
})
sample_pitch_data['dist_from_center'] = np.sqrt(
sample_pitch_data['plate_x']**2 +
(sample_pitch_data['plate_z'] - 2.5)**2
)
catcher_effects = {'Barnhart': 0.4, 'Realmuto': 0.3, 'Perez': -0.2}
sample_pitch_data['catcher_effect'] = sample_pitch_data['catcher'].map(catcher_effects)
base_logit = 2 - 2.5 * sample_pitch_data['dist_from_center']
sample_pitch_data['expected_strike'] = inv_logit(base_logit)
strike_prob = inv_logit(base_logit + sample_pitch_data['catcher_effect'])
sample_pitch_data['called_strike'] = np.random.binomial(1, strike_prob)
# Create and display interactive heatmap
framing_heatmap = create_interactive_framing_heatmap(sample_pitch_data, 'Barnhart')
framing_heatmap.show()
# Save as HTML file
framing_heatmap.write_html("framing_heatmap.html")
Interactive Catcher Comparison Radar Chart
Radar charts excel at displaying multi-dimensional catcher performance, allowing quick visual comparison across framing, blocking, throwing, fielding, and batting contributions. Interactive radar charts add tooltips with exact values and make it easy to overlay and compare multiple catchers.
# R: Interactive Catcher Comparison Radar Chart
library(plotly)
create_catcher_radar_chart <- function(catcher_stats, catchers_to_compare) {
# Filter to selected catchers
plot_data <- catcher_stats %>%
filter(catcher %in% catchers_to_compare)
# Normalize metrics to 0-100 scale for radar chart
metrics <- c("framing_runs", "blocking_runs", "throwing_runs",
"fielding_runs", "batting_runs")
normalized_data <- plot_data %>%
mutate(across(all_of(metrics),
~scales::rescale(., to = c(0, 100),
from = range(catcher_stats[[cur_column()]]))))
# Create plotly radar chart
fig <- plot_ly(
type = 'scatterpolar',
fill = 'toself'
)
# Add trace for each catcher
for (i in seq_len(nrow(normalized_data))) {
catcher_row <- normalized_data[i, ]
fig <- fig %>%
add_trace(
r = c(catcher_row$framing_runs, catcher_row$blocking_runs,
catcher_row$throwing_runs, catcher_row$fielding_runs,
catcher_row$batting_runs),
theta = c('Framing', 'Blocking', 'Throwing', 'Fielding', 'Batting'),
name = catcher_row$catcher,
mode = 'lines+markers',
marker = list(size = 8),
line = list(width = 2),
hovertemplate = paste0(
"<b>%{theta}</b><br>",
"Percentile: %{r:.1f}<br>",
"<extra>", catcher_row$catcher, "</extra>"
)
)
}
fig <- fig %>%
layout(
polar = list(
radialaxis = list(
visible = TRUE,
range = c(0, 100),
ticktext = c("0", "25", "50", "75", "100"),
tickvals = c(0, 25, 50, 75, 100)
)
),
title = list(
text = "<b>Catcher Performance Comparison</b>",
x = 0.5,
xanchor = "center"
),
showlegend = TRUE,
legend = list(
orientation = "v",
x = 1.1,
y = 0.5
)
)
return(fig)
}
# Create sample catcher statistics
catcher_comparison_stats <- tibble(
catcher = c("Realmuto", "Rutschman", "Smith", "Barnhart", "Perez"),
framing_runs = c(15, 12, 10, 22, -8),
blocking_runs = c(3, 4, 2, 3, 1),
throwing_runs = c(8, 4, 2, 0, 6),
fielding_runs = c(2, 1, 1, 0, 1),
batting_runs = c(25, 20, 18, -10, 15)
)
# Create interactive radar chart
radar_chart <- create_catcher_radar_chart(
catcher_comparison_stats,
c("Realmuto", "Barnhart", "Perez")
)
radar_chart
# Save as HTML
htmlwidgets::saveWidget(radar_chart, "catcher_radar.html", selfcontained = TRUE)
# Python: Interactive Catcher Comparison Radar Chart
import plotly.graph_objects as go
import pandas as pd
import numpy as np
def create_catcher_radar_chart(catcher_stats, catchers_to_compare):
"""Create interactive radar chart comparing multiple catchers"""
# Filter to selected catchers
plot_data = catcher_stats[catcher_stats['catcher'].isin(catchers_to_compare)].copy()
# Metrics to display
metrics = ['framing_runs', 'blocking_runs', 'throwing_runs',
'fielding_runs', 'batting_runs']
metric_labels = ['Framing', 'Blocking', 'Throwing', 'Fielding', 'Batting']
# Normalize to 0-100 percentile scale
for metric in metrics:
min_val = catcher_stats[metric].min()
max_val = catcher_stats[metric].max()
plot_data[f'{metric}_norm'] = (
(plot_data[metric] - min_val) / (max_val - min_val) * 100
)
# Create figure
fig = go.Figure()
# Add trace for each catcher
colors = ['rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)',
'rgb(214, 39, 40)', 'rgb(148, 103, 189)']
for idx, (_, catcher_row) in enumerate(plot_data.iterrows()):
values = [catcher_row[f'{m}_norm'] for m in metrics]
actual_values = [catcher_row[m] for m in metrics]
# Create hover text with actual values
hover_text = [
f"<b>{label}</b><br>Percentile: {val:.1f}<br>Actual: {actual:+.1f} runs"
for label, val, actual in zip(metric_labels, values, actual_values)
]
fig.add_trace(go.Scatterpolar(
r=values,
theta=metric_labels,
fill='toself',
name=catcher_row['catcher'],
line=dict(color=colors[idx % len(colors)], width=2),
marker=dict(size=8),
hovertemplate='%{text}<extra>' + catcher_row['catcher'] + '</extra>',
text=hover_text
))
# Update layout
fig.update_layout(
polar=dict(
radialaxis=dict(
visible=True,
range=[0, 100],
ticktext=['0', '25', '50', '75', '100'],
tickvals=[0, 25, 50, 75, 100]
)
),
title=dict(
text="<b>Catcher Performance Comparison</b>",
x=0.5,
xanchor='center',
font=dict(size=18)
),
showlegend=True,
legend=dict(
orientation="v",
x=1.1,
y=0.5
),
width=800,
height=600
)
return fig
# Create sample catcher statistics
catcher_comparison_stats = pd.DataFrame({
'catcher': ['Realmuto', 'Rutschman', 'Smith', 'Barnhart', 'Perez'],
'framing_runs': [15, 12, 10, 22, -8],
'blocking_runs': [3, 4, 2, 3, 1],
'throwing_runs': [8, 4, 2, 0, 6],
'fielding_runs': [2, 1, 1, 0, 1],
'batting_runs': [25, 20, 18, -10, 15]
})
# Create interactive radar chart
radar_chart = create_catcher_radar_chart(
catcher_comparison_stats,
['Realmuto', 'Barnhart', 'Perez']
)
radar_chart.show()
# Save as HTML
radar_chart.write_html("catcher_radar.html")
Interactive Pop Time Distribution with Filtering
Pop time analysis benefits from interactive filtering that lets users examine performance under specific conditions: pitcher handedness, base, game situation, and temporal trends. The dashboards below combine histograms, box plots, and outcome scatter plots; the R version routes the data through crosstalk's SharedData so filter widgets can be layered on, and the Python version adds a caught-stealing rate panel by pop time range.
# R: Interactive Pop Time Analysis with Filtering
library(plotly)
library(crosstalk)
create_pop_time_dashboard <- function(throwing_data) {
# Create shared data for crosstalk filtering
shared_data <- SharedData$new(throwing_data)
# Create distribution plot
pop_time_hist <- plot_ly(shared_data, x = ~pop_time, color = ~catcher,
type = "histogram", alpha = 0.6, nbinsx = 30) %>%
layout(
title = "Pop Time Distribution",
xaxis = list(title = "Pop Time (seconds)"),
yaxis = list(title = "Count"),
barmode = "overlay"
)
# Create box plot by catcher
pop_time_box <- plot_ly(shared_data, y = ~pop_time, color = ~catcher,
type = "box") %>%
layout(
title = "Pop Time by Catcher",
yaxis = list(title = "Pop Time (seconds)"),
xaxis = list(title = "Catcher")
)
# Create scatter plot: pop time vs caught stealing
pop_time_scatter <- plot_ly(shared_data,
x = ~pop_time,
y = ~caught_stealing,
color = ~catcher,
type = "scatter",
mode = "markers",
marker = list(size = 6, opacity = 0.6),
text = ~paste(
"Catcher:", catcher, "<br>",
"Pop Time:", round(pop_time, 2), "s<br>",
"Result:", ifelse(caught_stealing, "CS", "SB"), "<br>",
"Runner Speed:", round(runner_speed, 1), "ft/s"
),
hoverinfo = "text") %>%
layout(
title = "Pop Time vs Outcome",
xaxis = list(title = "Pop Time (seconds)"),
yaxis = list(title = "Caught Stealing", tickvals = c(0, 1),
ticktext = c("Safe", "Out"))
)
# Combine plots using subplot
combined_plot <- subplot(
pop_time_hist,
pop_time_box,
pop_time_scatter,
nrows = 2,
heights = c(0.4, 0.6),
shareX = FALSE,
titleX = TRUE,
titleY = TRUE
) %>%
layout(
title = list(
text = "<b>Interactive Pop Time Analysis Dashboard</b>",
x = 0.5,
xanchor = "center"
),
showlegend = TRUE
)
return(combined_plot)
}
# Create sample throwing data
set.seed(2025)
n_attempts <- 1000
sample_throwing_data <- tibble(
catcher = sample(c("Realmuto", "Perez", "Rutschman"), n_attempts, replace = TRUE),
runner_speed = rnorm(n_attempts, 27, 1.5),
pitcher_time = rnorm(n_attempts, 1.3, 0.15),
pitcher_hand = sample(c("R", "L"), n_attempts, replace = TRUE)
) %>%
mutate(
pop_time = case_when(
catcher == "Realmuto" ~ rnorm(n(), 1.87, 0.08),
catcher == "Perez" ~ rnorm(n(), 1.92, 0.10),
catcher == "Rutschman" ~ rnorm(n(), 1.95, 0.09)
),
total_time_catcher = pitcher_time + pop_time,
    # Runner covers ~78 ft (90 ft minus a ~12 ft lead); the 0.2 s is a rough
    # jump/reaction allowance so simulated CS rates vary with pop time rather
    # than sitting near 100%
    total_time_runner = (90 - 12) / runner_speed + 0.2,
caught_stealing = as.numeric(
total_time_catcher < total_time_runner + rnorm(n(), 0, 0.1)
)
)
# Create dashboard
pop_time_dashboard <- create_pop_time_dashboard(sample_throwing_data)
pop_time_dashboard
# Save as HTML
htmlwidgets::saveWidget(pop_time_dashboard, "pop_time_dashboard.html",
selfcontained = TRUE)
# Python: Interactive Pop Time Analysis with Filtering
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np
def create_pop_time_dashboard(throwing_data):
"""Create interactive dashboard for pop time analysis"""
# Create figure with subplots
fig = make_subplots(
rows=2, cols=2,
subplot_titles=('Pop Time Distribution', 'Pop Time by Catcher',
'Pop Time vs Outcome', 'Success Rate by Pop Time Range'),
specs=[[{"type": "histogram"}, {"type": "box"}],
[{"type": "scatter"}, {"type": "bar"}]],
vertical_spacing=0.12,
horizontal_spacing=0.1
)
# Get unique catchers for color mapping
catchers = throwing_data['catcher'].unique()
colors = ['rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)']
color_map = {catcher: colors[i % len(colors)] for i, catcher in enumerate(catchers)}
# 1. Histogram of pop times
for catcher in catchers:
data = throwing_data[throwing_data['catcher'] == catcher]
fig.add_trace(
go.Histogram(
x=data['pop_time'],
name=catcher,
marker_color=color_map[catcher],
opacity=0.6,
nbinsx=30,
showlegend=True
),
row=1, col=1
)
# 2. Box plot by catcher
for catcher in catchers:
data = throwing_data[throwing_data['catcher'] == catcher]
fig.add_trace(
go.Box(
y=data['pop_time'],
name=catcher,
marker_color=color_map[catcher],
showlegend=False
),
row=1, col=2
)
# 3. Scatter plot: pop time vs outcome
for catcher in catchers:
data = throwing_data[throwing_data['catcher'] == catcher]
hover_text = [
f"Catcher: {catcher}<br>Pop Time: {pt:.2f}s<br>"
f"Result: {'CS' if cs else 'SB'}<br>Runner Speed: {rs:.1f} ft/s"
for pt, cs, rs in zip(data['pop_time'], data['caught_stealing'],
data['runner_speed'])
]
fig.add_trace(
go.Scatter(
x=data['pop_time'],
y=data['caught_stealing'] + np.random.uniform(-0.05, 0.05, len(data)),
mode='markers',
name=catcher,
marker=dict(size=6, opacity=0.6, color=color_map[catcher]),
text=hover_text,
hovertemplate='%{text}<extra></extra>',
showlegend=False
),
row=2, col=1
)
# 4. Success rate by pop time bins
pop_time_bins = pd.cut(throwing_data['pop_time'],
bins=[0, 1.85, 1.90, 1.95, 2.00, 2.10, np.inf],
labels=['<1.85', '1.85-1.90', '1.90-1.95',
'1.95-2.00', '2.00-2.10', '>2.10'])
throwing_data['pop_time_bin'] = pop_time_bins
bin_stats = throwing_data.groupby('pop_time_bin')['caught_stealing'].agg([
('count', 'count'),
('cs_rate', 'mean')
]).reset_index()
fig.add_trace(
go.Bar(
x=bin_stats['pop_time_bin'].astype(str),
y=bin_stats['cs_rate'] * 100,
text=[f"{rate:.1f}%" for rate in bin_stats['cs_rate'] * 100],
textposition='outside',
marker_color='rgb(55, 83, 109)',
showlegend=False
),
row=2, col=2
)
# Update axes
fig.update_xaxes(title_text="Pop Time (seconds)", row=1, col=1)
fig.update_xaxes(title_text="Catcher", row=1, col=2)
fig.update_xaxes(title_text="Pop Time (seconds)", row=2, col=1)
fig.update_xaxes(title_text="Pop Time Range", row=2, col=2)
fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_yaxes(title_text="Pop Time (seconds)", row=1, col=2)
fig.update_yaxes(title_text="Outcome (0=SB, 1=CS)", row=2, col=1)
fig.update_yaxes(title_text="CS Rate (%)", row=2, col=2)
# Update layout
fig.update_layout(
title=dict(
text="<b>Interactive Pop Time Analysis Dashboard</b>",
x=0.5,
xanchor='center',
font=dict(size=18)
),
height=800,
width=1200,
showlegend=True,
legend=dict(x=1.02, y=0.98),
barmode='overlay'
)
return fig
# Create sample throwing data
np.random.seed(2025)
n_attempts = 1000
pop_time_map = {
'Realmuto': (1.87, 0.08),
'Perez': (1.92, 0.10),
'Rutschman': (1.95, 0.09)
}
sample_throwing_data = pd.DataFrame({
'catcher': np.random.choice(['Realmuto', 'Perez', 'Rutschman'], n_attempts),
'runner_speed': np.random.normal(27, 1.5, n_attempts),
'pitcher_time': np.random.normal(1.3, 0.15, n_attempts),
'pitcher_hand': np.random.choice(['R', 'L'], n_attempts)
})
sample_throwing_data['pop_time'] = sample_throwing_data['catcher'].apply(
lambda x: np.random.normal(pop_time_map[x][0], pop_time_map[x][1])
)
sample_throwing_data['total_time_catcher'] = (
sample_throwing_data['pitcher_time'] + sample_throwing_data['pop_time']
)
# Runner covers ~78 ft (90 ft minus a ~12 ft lead); the 0.2 s is a rough
# jump/reaction allowance so simulated CS rates vary with pop time rather
# than sitting near 100%
sample_throwing_data['total_time_runner'] = (
    (90 - 12) / sample_throwing_data['runner_speed'] + 0.2
)
sample_throwing_data['caught_stealing'] = (
sample_throwing_data['total_time_catcher'] <
sample_throwing_data['total_time_runner'] + np.random.normal(0, 0.1, n_attempts)
).astype(int)
# Create and display dashboard
pop_time_dashboard = create_pop_time_dashboard(sample_throwing_data)
pop_time_dashboard.show()
# Save as HTML
pop_time_dashboard.write_html("pop_time_dashboard.html")
These interactive visualizations transform static catcher analytics into dynamic exploration tools. The framing heat map reveals location-specific receiving skill, the radar chart enables multi-dimensional comparisons, and the pop time dashboard links distributions, catcher-level summaries, and stolen-base outcomes in a single linked view. Incorporating these elements into scouting reports and front office presentations helps teams make better-informed decisions about catcher acquisition, development, and deployment, and exporting the charts as standalone HTML files means the insights can be shared across an organization without specialized software or programming knowledge.
Exercise 1: Build a Pitch Framing Model
Using the provided pitch-level data (or simulated data), build a logistic regression model to predict called strikes based on pitch location, count, and other factors. Then calculate framing runs for different catchers.
Tasks:
a) Build a baseline called strike probability model using pitch location (plate_x, plate_z) and count variables
b) Calculate expected strikes vs. actual strikes for each catcher
c) Convert the difference to framing runs (use 0.125 runs per extra strike)
d) Identify which zones (in-zone, edge, out-of-zone) show the largest framing effects
e) Visualize framing value by location using heatmaps for the top 3 and bottom 3 framers
Extension: Incorporate umpire effects by building separate models for different umpires or umpire types (wide/tight zones).
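One possible starting point is sketched below in Python with statsmodels on simulated pitches; the column names (plate_x, plate_z, balls, strikes, catcher, called_strike), the simulated strike-probability surface, and the per-catcher effects are illustrative assumptions, not a prescribed solution.
# Python: Starter sketch for Exercise 1 (illustrative, simulated data)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 20000
df = pd.DataFrame({
    'catcher': rng.choice(['A', 'B', 'C'], n),
    'plate_x': rng.normal(0, 0.8, n),
    'plate_z': rng.normal(2.5, 0.6, n),
    'balls': rng.integers(0, 4, n),
    'strikes': rng.integers(0, 3, n),
})
dist = np.sqrt(df['plate_x']**2 + (df['plate_z'] - 2.5)**2)
effect = df['catcher'].map({'A': 0.3, 'B': 0.0, 'C': -0.2})  # assumed framing effects
df['called_strike'] = rng.binomial(1, 1 / (1 + np.exp(-(2 - 2.5 * dist + effect))))

# (a) Baseline called-strike model from location and count only (no catcher term),
#     so every pitch gets a catcher-neutral expected strike probability
baseline = smf.logit(
    'called_strike ~ plate_x + I(plate_x**2) + plate_z + I(plate_z**2) + balls + strikes',
    data=df
).fit(disp=0)
df['expected_strike'] = baseline.predict(df)

# (b)-(c) Extra strikes above expectation per catcher, converted to runs
#         at the 0.125 runs per extra strike suggested in the exercise
df['extra_strike'] = df['called_strike'] - df['expected_strike']
framing = df.groupby('catcher')['extra_strike'].sum().to_frame('extra_strikes')
framing['framing_runs'] = framing['extra_strikes'] * 0.125
print(framing)
With real pitch-level data, the same skeleton extends naturally to zone splits (d) and per-catcher location heatmaps (e).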
Exercise 2: Blocking Analysis and Difficulty Adjustment
Analyze blocking performance while accounting for pitch difficulty.
Tasks:
a) Create a difficulty metric based on pitch location (distance from home plate, horizontal location, height)
b) Build a model predicting block probability based on difficulty factors
c) Calculate blocks above expected for different catchers
d) Determine which catchers excel on "hard" blocks vs. "easy" blocks
e) Estimate the run value of blocking ability (assume each failed block on a pitch with runners on costs 0.3 runs)
Extension: Analyze whether blocking ability degrades over the course of a game (late innings) or season (fatigue effects).
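A minimal sketch of the difficulty-adjusted approach is shown below, again on simulated data; the difficulty formula, column names, and catcher skill values are assumptions chosen only to make the example run.
# Python: Starter sketch for Exercise 2 (illustrative, simulated data)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 8000
dirt = pd.DataFrame({
    'catcher': rng.choice(['A', 'B', 'C'], n),
    'plate_x': rng.normal(0, 1.0, n),
    'plate_z': rng.uniform(-0.5, 1.0, n),  # pitches in or near the dirt
})
# (a) A simple difficulty metric: wider and lower pitches are harder to block
#     (this functional form is an assumption, not an established metric)
dirt['difficulty'] = np.abs(dirt['plate_x']) + np.clip(0.5 - dirt['plate_z'], 0, None)
skill = dirt['catcher'].map({'A': 0.6, 'B': 0.0, 'C': -0.4})  # assumed blocking skill
dirt['blocked'] = rng.binomial(1, 1 / (1 + np.exp(-(2.0 - 1.5 * dirt['difficulty'] + skill))))

# (b) Catcher-neutral block probability model driven by difficulty
block_model = smf.logit('blocked ~ difficulty', data=dirt).fit(disp=0)
dirt['expected_block'] = block_model.predict(dirt)

# (c)-(e) Blocks above expected and a rough run value at 0.3 runs per failed
#         block, per the exercise (here every pitch is treated as runners-on)
dirt['blocks_above_expected'] = dirt['blocked'] - dirt['expected_block']
summary = dirt.groupby('catcher')['blocks_above_expected'].sum().to_frame('blocks_above_expected')
summary['blocking_runs'] = summary['blocks_above_expected'] * 0.3
print(summary)
Splitting the same comparison by a difficulty threshold answers part (d), separating "hard" from "easy" blocks.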
Exercise 3: Throwing Value and Deterrence Effects
Examine both direct throwing value (CS) and indirect deterrence value.
Tasks:
a) Build a model predicting caught stealing probability based on runner speed, pitcher delivery time, and lead distance
b) Calculate CS above expected for catchers, accounting for quality of baserunners faced
c) Analyze stolen base attempt rates faced by different catchers (controlling for pitcher and game situation)
d) Estimate the deterrence value: how many stolen base attempts are prevented by elite arms?
e) Combine direct CS value and deterrence value into a comprehensive throwing runs metric
Extension: Investigate whether certain catchers are more effective against specific types of runners (speed, steal success rate).
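A starter sketch for parts (a) and (b) appears below; the predictor coefficients, the ~0.6 run swing per marginal caught steal, and the column names are illustrative assumptions, and the deterrence steps (c)-(e) are outlined only in comments.
# Python: Starter sketch for Exercise 3 (illustrative, simulated data)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(13)
n = 5000
sb = pd.DataFrame({
    'catcher': rng.choice(['A', 'B', 'C'], n),
    'runner_speed': rng.normal(27, 1.5, n),     # ft/s sprint speed
    'pitcher_time': rng.normal(1.35, 0.12, n),  # delivery to the plate, seconds
    'lead_distance': rng.normal(11.5, 1.0, n),  # feet
})
arm = sb['catcher'].map({'A': 0.5, 'B': 0.0, 'C': -0.4})  # assumed arm effects
logit_cs = (
    -1.0
    - 0.3 * (sb['runner_speed'] - 27)
    + 2.0 * (sb['pitcher_time'] - 1.35)
    - 0.15 * (sb['lead_distance'] - 11.5)
    + arm
)
sb['caught_stealing'] = rng.binomial(1, 1 / (1 + np.exp(-logit_cs)))

# (a)-(b) Catcher-neutral CS probability model, then CS above expected per catcher
cs_model = smf.logit(
    'caught_stealing ~ runner_speed + pitcher_time + lead_distance', data=sb
).fit(disp=0)
sb['expected_cs'] = cs_model.predict(sb)
sb['cs_above_expected'] = sb['caught_stealing'] - sb['expected_cs']
direct = sb.groupby('catcher')['cs_above_expected'].sum().to_frame('cs_above_expected')
# Assumed ~0.6 run swing between an SB allowed and a runner thrown out
direct['direct_throwing_runs'] = direct['cs_above_expected'] * 0.6
# (c)-(e) With real data: compare attempt rates faced by each catcher with the
#         league rate (controlling for pitcher and game state), credit prevented
#         attempts at the break-even run value, and add to the direct value above.
print(direct)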
Exercise 4: Comprehensive Catcher Value Model
Build a complete catcher value model integrating all defensive components.
Tasks:
a) Combine framing runs, blocking runs, throwing runs, and fielding runs into a composite defensive value metric
b) Add offensive value (batting runs above average) to create total value
c) Convert total value to WAR using a runs-to-wins converter (typically 10 runs per win)
d) Compare your catcher WAR estimates to public metrics (FanGraphs, Baseball Reference)
e) Identify which catchers provide the most value relative to their salary (create a value-per-dollar metric)
Extension: Build a playing time optimizer that determines the optimal distribution of innings between multiple catchers on a roster, considering both performance and fatigue/injury risk.
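The bookkeeping in parts (a)-(c) and (e) can be compact, as in the sketch below; every number is a made-up input, and the conversion omits the replacement-level and positional adjustments a full WAR calculation would include.
# Python: Starter sketch for Exercise 4 (illustrative numbers only)
import pandas as pd

catchers = pd.DataFrame({
    'catcher': ['A', 'B', 'C'],
    'framing_runs': [18, 5, -6],
    'blocking_runs': [4, 2, 1],
    'throwing_runs': [6, 1, 3],
    'fielding_runs': [1, 0, 1],
    'batting_runs': [-5, 12, 20],
    'salary_millions': [8.0, 12.0, 18.0],
})
components = ['framing_runs', 'blocking_runs', 'throwing_runs',
              'fielding_runs', 'batting_runs']
# (a)-(b) Composite defensive plus offensive value in runs
catchers['total_runs'] = catchers[components].sum(axis=1)
# (c) Runs-to-wins conversion at ~10 runs per win, as suggested in the exercise
catchers['war'] = catchers['total_runs'] / 10
# (e) Value per dollar, assuming roughly $8M per win on the free agent market
catchers['market_value_millions'] = catchers['war'] * 8
catchers['surplus_millions'] = catchers['market_value_millions'] - catchers['salary_millions']
print(catchers[['catcher', 'total_runs', 'war',
                'market_value_millions', 'surplus_millions']])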
Summary
Catcher evaluation has evolved dramatically over the past two decades, shifting from a focus on offensive production to a framework in which defensive skills, particularly pitch framing, drive valuation. Modern catcher metrics quantify:
- Pitch Framing: Elite framers add 15-25 runs per season through receiving
- Blocking: Top blockers save 5-10 runs by preventing wild pitches and passed balls
- Throwing: Elite arms add 5-10 runs through caught stealing and deterrence
- Game Calling: The least quantifiable but still valued skill in pitcher management
The integration of these components into comprehensive value models allows teams to properly value defensive specialists like Tucker Barnhart alongside offensive-minded catchers like Salvador Perez. The position continues to evolve with technological changes (PitchCom, potential automated strike zones) and rule modifications (larger bases, pickoff restrictions) that alter the relative importance of different skills.
Understanding catcher analytics requires both quantitative rigor in measurement and appreciation for the subtleties of receiving technique, pitcher relationships, and game management that resist easy quantification. Teams that excel at identifying, developing, and deploying catcher talent gain meaningful competitive advantages in player acquisition and roster construction.
References and Further Reading
- Fast, M. (2011). "Spinning Yarn: The Art of Pitching." The Hardball Times Baseball Annual
- Judge, J., Pavlidis, H., & Brooks, D. (2015). "Moving Beyond WOWY: A Mixed Approach to Measuring Catcher Framing." Baseball Prospectus
- Turkenkopf, M. (2008). "Evaluating Catchers: Framing Pitches." The Hardball Times
- Lindbergh, B. & Miller, S. (2016). The Only Rule Is It Has to Work. Chapter on catcher framing
- Mills, B. & Braun, S. (2019). "The Effect of Pitch Framing on the Strike Zone." Journal of Sports Analytics
Data Sources:
- Statcast (Baseball Savant): Pop time, arm strength, framing metrics
- FanGraphs: Comprehensive catcher defense metrics
- Baseball Prospectus: Framing runs, blocking runs, FRAA
- Baseball Reference: Traditional catcher statistics and WAR