The catcher has long been considered the most analytically complex position in baseball. While traditional statistics like batting average and home runs capture only a fraction of a catcher's contribution, modern analytics have revolutionized our understanding of defensive value behind the plate. Catchers influence the game in multiple dimensions: pitch framing, blocking pitches in the dirt, controlling the running game, calling pitches, and managing the pitching staff.
The Evolution of Catcher Evaluation
Historically, catchers were evaluated primarily on their offensive production and basic defensive stats like caught stealing percentage and passed balls. Hall of Fame voters often overlooked defensive specialists, favoring catchers who could hit. However, the analytical revolution revealed that elite pitch framers could add 20-30 runs per season through receiving alone—a contribution worth several wins and millions of dollars in free agent value.
The introduction of PITCHf/x in 2006, and later Statcast in 2015, provided the granular pitch-location data necessary to quantify pitch framing. This technological advancement fundamentally changed how front offices evaluate catchers. Teams began prioritizing receiving skills, leading to longer careers for defense-first catchers like Jeff Mathis and the rise of framing specialists.
Components of Catcher Value
Modern catcher evaluation encompasses five primary defensive skills:
- Pitch Framing: The ability to receive pitches in a way that maximizes strike calls
- Blocking: Preventing wild pitches and passed balls on pitches in the dirt
- Throwing: Deterring and throwing out base stealers
- Game Calling: Pitch selection and sequencing
- Pitcher Management: Building rapport and optimizing pitcher performance
Elite catchers like J.T. Realmuto and Adley Rutschman excel in multiple categories, while others specialize. For example, Tucker Barnhart built a career on elite framing despite modest offensive production, while Salvador Perez remained valuable primarily through his offensive contributions despite below-average framing.
Economic Value
The market has recognized catcher defense. When Yasmani Grandal signed a 4-year, $73 million contract with the White Sox in 2020, his elite framing ability (which had saved 150+ runs over the previous five seasons) was central to his valuation. Similarly, teams have traded prospects for rental catchers with elite defensive skills, understanding that catching defense can be the difference in playoff races.
Research by Mike Fast, whose pioneering work quantified how many runs elite receivers add through strike calls alone, helped establish framing as a measurable skill. At the standard conversion of roughly ten runs per win and the 2020 free agent market rate of roughly $8-9 million per win, an elite framer's 20-30 receiving runs translate to $15-25 million in annual value from receiving alone. This explains why teams like the Rays have built organizational expertise around catcher development and framing optimization.
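As a back-of-envelope illustration of that conversion, the sketch below turns seasonal framing runs into wins and dollars using the approximations cited above (roughly ten runs per win and $8-9 million per win on the free agent market); the catcher labels and framing-run totals are hypothetical inputs, not published figures.
# R: Converting framing runs to wins and dollars (illustrative values)
library(tidyverse)
runs_per_win <- 10           # standard runs-to-wins approximation
dollars_per_win <- 8.5e6     # rough 2020 free agent price of one win
framing_value <- tibble(
  catcher = c("Elite framer", "Average framer", "Poor framer"),
  framing_runs = c(25, 0, -15)   # hypothetical seasonal framing runs
) %>%
  mutate(
    framing_wins = framing_runs / runs_per_win,
    framing_dollars = framing_wins * dollars_per_win
  )
print(framing_value)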
The Catcher Defensive Spectrum
Let's examine where recent catchers fall on the defensive spectrum using 2019-2023 data as a reference period:
Elite All-Around Defenders: J.T. Realmuto, Adley Rutschman, Sean Murphy, Will Smith (LAD)
Elite Framers: Tucker Barnhart, Austin Hedges, Tyler Stephenson
Elite Throwing: J.T. Realmuto, Salvador Perez, Jorge Alfaro
Elite Blockers: Yadier Molina (before retirement), Roberto Perez, Yan Gomes
Understanding these specializations helps teams construct rosters. A pitching staff with excellent control might prioritize framing over blocking, while a team with young, wild pitchers might value blocking more highly.
Pitch framing—the art of receiving pitches to maximize called strikes—represents the most quantifiable and valuable defensive skill catchers provide. Research shows that the difference between the best and worst framers can exceed 40 runs per season, equivalent to four wins.
The Mechanics of Framing
Effective pitch framing involves several technical components:
- Quiet Receiving: Minimizing glove movement after pitch reception
- Receiving Inside: Catching the ball near the strike zone rather than reaching
- Smooth Transfers: Presenting pitches with subtle, continuous motion toward the zone
- Thumb Position: Keeping the thumb tucked to create a smooth target
- Stick: Holding the glove position momentarily after reception
The best framers, like Yasmani Grandal and Tucker Barnhart, employ barely perceptible techniques that influence umpire judgment. They avoid "snatching" or yanking pitches into the zone—an obvious technique that often backfires—instead using subtle body positioning and glove work.
Framing Metrics: Called Strike Probability Models
Modern framing metrics use probabilistic models to estimate the expected called strike rate for each pitch based on location, count, pitcher handedness, batter handedness, and umpire tendencies. The catcher receives credit or blame for the difference between actual and expected outcomes.
The basic framework:
- Build a logistic regression model predicting P(Called Strike) based on pitch characteristics
- Calculate expected strikes for each catcher's called pitches
- Compare actual called strikes to expected called strikes
- Convert the difference to runs using linear weights
Key Framing Metrics:
- Framing Runs (FRM): Total runs added or lost through framing (FanGraphs, Baseball Prospectus)
- Called Strikes Above Average (CSAA): Called strikes gained relative to expectation (Baseball Prospectus)
- Catcher Framing Runs: Statcast's framing metric, built from called-strike rates on pitches in the "shadow zone" around the edges of the plate
Let's build a framing model using pitch location data:
# R: Building a Called Strike Probability Model
library(tidyverse)
library(mgcv) # For GAM models
# Simulate pitch-level data structure
set.seed(2023)
n_pitches <- 50000
pitch_data <- tibble(
pitch_id = 1:n_pitches,
plate_x = rnorm(n_pitches, 0, 0.8), # Horizontal location (feet from center)
plate_z = rnorm(n_pitches, 2.5, 0.6), # Vertical location (feet)
catcher = sample(c("Realmuto", "Rutschman", "Barnhart", "Perez", "League_Avg"),
n_pitches, replace = TRUE),
pitch_type = sample(c("FF", "SL", "CH", "CU"), n_pitches, replace = TRUE),
balls = sample(0:3, n_pitches, replace = TRUE),
strikes = sample(0:2, n_pitches, replace = TRUE),
p_throws = sample(c("R", "L"), n_pitches, replace = TRUE),
stand = sample(c("R", "L"), n_pitches, replace = TRUE)
) %>%
# Simulate called strike outcomes based on location
mutate(
# Distance from center of zone
dist_from_center = sqrt(plate_x^2 + (plate_z - 2.5)^2),
# Probability decreases with distance from zone
strike_prob = plogis(2 - 2.5 * dist_from_center),
# Add catcher effect
catcher_effect = case_when(
catcher == "Barnhart" ~ 0.4,
catcher == "Realmuto" ~ 0.3,
catcher == "Rutschman" ~ 0.25,
catcher == "League_Avg" ~ 0,
catcher == "Perez" ~ -0.2
),
strike_prob_adj = plogis(qlogis(strike_prob) + catcher_effect),
called_strike = rbinom(n_pitches, 1, strike_prob_adj),
# Filter to called pitches only (no swings)
is_called = sample(c(TRUE, FALSE), n_pitches, replace = TRUE, prob = c(0.6, 0.4))
) %>%
filter(is_called)
# Build baseline GAM model without catcher
baseline_model <- gam(
called_strike ~ s(plate_x, plate_z, k = 100) +
factor(balls) + factor(strikes) +
factor(p_throws) + factor(stand),
data = pitch_data,
family = binomial
)
# Add predictions to data
pitch_data <- pitch_data %>%
mutate(expected_strike = predict(baseline_model, newdata = ., type = "response"))
# Calculate framing runs by catcher
framing_results <- pitch_data %>%
group_by(catcher) %>%
summarise(
pitches = n(),
actual_strikes = sum(called_strike),
expected_strikes = sum(expected_strike),
extra_strikes = actual_strikes - expected_strikes,
# Convert to runs (approximately 0.125 runs per strike)
framing_runs = extra_strikes * 0.125,
# Per 7000 called pitches (typical season)
framing_runs_per_7000 = framing_runs / pitches * 7000
) %>%
arrange(desc(framing_runs_per_7000))
print(framing_results)
# Visualize the strike zone by catcher
library(ggplot2)
# Create heatmap of framing value by location
framing_heatmap <- pitch_data %>%
filter(catcher %in% c("Barnhart", "Realmuto", "Perez")) %>%
mutate(
x_bin = cut(plate_x, breaks = seq(-2, 2, 0.2)),
z_bin = cut(plate_z, breaks = seq(0, 5, 0.2))
) %>%
group_by(catcher, x_bin, z_bin) %>%
summarise(
framing_value = mean(called_strike - expected_strike),
n = n(),
.groups = "drop"
) %>%
filter(n >= 10) %>%
mutate(
x_mid = as.numeric(x_bin) * 0.2 - 2.1,
z_mid = as.numeric(z_bin) * 0.2 - 0.1
)
ggplot(framing_heatmap, aes(x = x_mid, y = z_mid, fill = framing_value)) +
geom_tile() +
facet_wrap(~ catcher) +
scale_fill_gradient2(low = "red", mid = "white", high = "blue",
midpoint = 0, name = "Framing\nValue") +
coord_fixed() +
geom_rect(aes(xmin = -0.708, xmax = 0.708, ymin = 1.5, ymax = 3.5),
fill = NA, color = "black", linewidth = 1) +
labs(title = "Pitch Framing Value by Location",
x = "Horizontal Location (ft)",
y = "Vertical Location (ft)") +
theme_minimal()
# Python: Pitch Framing Analysis with Machine Learning
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
import seaborn as sns
# Simulate pitch data
np.random.seed(2023)
n_pitches = 50000
pitch_df = pd.DataFrame({
'pitch_id': range(n_pitches),
'plate_x': np.random.normal(0, 0.8, n_pitches),
'plate_z': np.random.normal(2.5, 0.6, n_pitches),
'catcher': np.random.choice(['Realmuto', 'Rutschman', 'Barnhart', 'Perez', 'Smith'],
n_pitches),
'pitch_type': np.random.choice(['FF', 'SL', 'CH', 'CU'], n_pitches),
'balls': np.random.randint(0, 4, n_pitches),
'strikes': np.random.randint(0, 3, n_pitches),
'p_throws': np.random.choice(['R', 'L'], n_pitches),
'stand': np.random.choice(['R', 'L'], n_pitches)
})
# Create strike probability based on location
pitch_df['dist_from_center'] = np.sqrt(
pitch_df['plate_x']**2 + (pitch_df['plate_z'] - 2.5)**2
)
# Catcher effects
catcher_effects = {
'Barnhart': 0.4,
'Realmuto': 0.3,
'Rutschman': 0.25,
'Smith': 0.15,
'Perez': -0.2
}
pitch_df['catcher_effect'] = pitch_df['catcher'].map(catcher_effects)
# Simulate called strikes
def logit(p):
return np.log(p / (1 - p))
def inv_logit(x):
return 1 / (1 + np.exp(-x))
base_logit = 2 - 2.5 * pitch_df['dist_from_center']
adj_logit = base_logit + pitch_df['catcher_effect']
strike_prob = inv_logit(adj_logit)
pitch_df['called_strike'] = np.random.binomial(1, strike_prob)
pitch_df['is_called'] = np.random.choice([True, False], n_pitches,
p=[0.6, 0.4])
# Filter to called pitches
called_pitches = pitch_df[pitch_df['is_called']].copy()
# Prepare features for modeling (exclude catcher)
le_pitch = LabelEncoder()
le_throws = LabelEncoder()
le_stand = LabelEncoder()
X_features = called_pitches.copy()
X_features['pitch_type_enc'] = le_pitch.fit_transform(X_features['pitch_type'])
X_features['p_throws_enc'] = le_throws.fit_transform(X_features['p_throws'])
X_features['stand_enc'] = le_stand.fit_transform(X_features['stand'])
feature_cols = ['plate_x', 'plate_z', 'balls', 'strikes',
'pitch_type_enc', 'p_throws_enc', 'stand_enc']
X = X_features[feature_cols]
y = X_features['called_strike']
# Train baseline model (without catcher)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
gb_model = GradientBoostingClassifier(
n_estimators=100,
max_depth=5,
learning_rate=0.1,
random_state=42
)
gb_model.fit(X_train, y_train)
# Generate expected strike probabilities
X_features['expected_strike_prob'] = gb_model.predict_proba(X)[:, 1]
X_features['called_strike'] = y
# Calculate framing runs
framing_summary = X_features.groupby('catcher').agg({
'pitch_id': 'count',
'called_strike': 'sum',
'expected_strike_prob': 'sum'
}).round(2)
framing_summary.columns = ['pitches', 'actual_strikes', 'expected_strikes']
framing_summary['extra_strikes'] = (
framing_summary['actual_strikes'] - framing_summary['expected_strikes']
)
framing_summary['framing_runs'] = framing_summary['extra_strikes'] * 0.125
framing_summary['framing_runs_per_7000'] = (
framing_summary['framing_runs'] / framing_summary['pitches'] * 7000
)
framing_summary = framing_summary.sort_values('framing_runs_per_7000',
ascending=False)
print("\nFraming Runs by Catcher:")
print(framing_summary)
# Visualize framing by location for top catchers
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
top_catchers = ['Barnhart', 'Realmuto', 'Perez']
for idx, catcher_name in enumerate(top_catchers):
catcher_data = X_features[X_features['catcher'] == catcher_name].copy()
catcher_data['framing_value'] = (
catcher_data['called_strike'] - catcher_data['expected_strike_prob']
)
# Create 2D histogram
heatmap_data = axes[idx].hexbin(
catcher_data['plate_x'],
catcher_data['plate_z'],
C=catcher_data['framing_value'],
gridsize=15,
cmap='RdBu',
vmin=-0.2,
vmax=0.2,
reduce_C_function=np.mean
)
# Add strike zone rectangle
zone_x = [-0.708, 0.708, 0.708, -0.708, -0.708]
zone_z = [1.5, 1.5, 3.5, 3.5, 1.5]
axes[idx].plot(zone_x, zone_z, 'k-', linewidth=2)
axes[idx].set_title(f'{catcher_name} Framing Value')
axes[idx].set_xlabel('Horizontal Location (ft)')
axes[idx].set_ylabel('Vertical Location (ft)')
axes[idx].set_xlim(-2, 2)
axes[idx].set_ylim(0, 5)
plt.colorbar(heatmap_data, ax=axes, label='Framing Value')
plt.tight_layout()
plt.savefig('framing_heatmap.png', dpi=300, bbox_inches='tight')
plt.close()
# Calculate framing by zone
def classify_zone(row):
x, z = row['plate_x'], row['plate_z']
if -0.708 <= x <= 0.708 and 1.5 <= z <= 3.5:
return 'In Zone'
elif -1.5 <= x <= 1.5 and 0.5 <= z <= 4.5:
return 'Edge (Frameable)'
else:
return 'Out of Zone'
X_features['zone_type'] = X_features.apply(classify_zone, axis=1)
zone_framing = X_features.groupby(['catcher', 'zone_type']).agg({
'called_strike': 'sum',
'expected_strike_prob': 'sum',
'pitch_id': 'count'
}).reset_index()
zone_framing['extra_strikes'] = (
zone_framing['called_strike'] - zone_framing['expected_strike_prob']
)
zone_framing['extra_strike_pct'] = (
zone_framing['extra_strikes'] / zone_framing['pitch_id'] * 100
)
print("\nFraming by Zone Type:")
print(zone_framing.pivot_table(
index='catcher',
columns='zone_type',
values='extra_strike_pct'
).round(2))
Edge Strike Mastery
The most valuable framing occurs on pitches near the strike zone border—the "edge" pitches where umpire judgment is most uncertain. Research shows that:
- Pitches 2-4 inches off the plate horizontally have the highest framing impact
- Low strikes (bottom of zone) show more framing variability than high strikes
- Backdoor breaking balls and inside fastballs benefit most from elite framing
Elite framers like Tucker Barnhart gained 3-4% more called strikes on edge pitches compared to poor framers—a difference worth 15-20 runs per season on those pitches alone.
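A quick back-of-envelope check, sketched below, shows how a 3-4% edge on borderline pitches scales to that run total; the count of edge pitches per season is an assumption, and the run value per called strike matches the figure used in the framing model above.
# R: Rough run value of an edge-pitch framing advantage (assumed inputs)
edge_pitches_per_season <- 4000   # assumed called pitches near the zone border
extra_strike_rate <- 0.035        # 3-4% more called strikes than a poor framer
runs_per_strike <- 0.125          # run value of one extra called strike (as above)
extra_strikes <- edge_pitches_per_season * extra_strike_rate
extra_runs <- extra_strikes * runs_per_strike
cat(sprintf("Extra strikes: %.0f, extra runs: %.1f per season\n",
            extra_strikes, extra_runs))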
Umpire and Count Context
Framing value varies significantly by context:
Count Effects: The effective strike zone shifts with the count. Umpires call fewer borderline strikes in 0-2 counts and more in 3-0 counts, so framing models must control for count, and a catcher's framing impact can look very different in pitcher-friendly versus hitter-friendly counts.
Umpire Variation: Some umpires have larger strike zones or are more susceptible to framing. Advanced systems model individual umpire tendencies, allowing catchers to adjust their approach.
Home Plate Umpire Positioning: Umpires who set up more directly behind the catcher may be less influenced by framing than those offset to one side.
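The baseline model above already controls for count through the balls and strikes factors; the same idea extends to umpires by adding an identifier for the home plate umpire, so each catcher is measured against what an average catcher would have received from the same umpires in the same counts. The sketch below assumes the pitch-level data has been joined to a hypothetical umpire column, which the simulation above does not include; many production systems treat the umpire as a random effect instead of a fixed factor.
# R: Adding an umpire term to the called strike model (sketch)
library(mgcv)
# Assumes pitch_data carries an `umpire` identifier for each pitch
# (hypothetical column; the simulation above does not generate one)
context_model <- gam(
  called_strike ~ s(plate_x, plate_z, k = 100) +
    factor(balls) + factor(strikes) +
    factor(p_throws) + factor(stand) +
    factor(umpire),                     # per-umpire zone-size adjustment
  data = pitch_data,
  family = binomial
)
# Expected strikes now reflect the umpires and counts each catcher actually faced
pitch_data$expected_strike_ctx <- predict(context_model, type = "response")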
The Limits of Framing: Robot Umps
The introduction of the Automated Ball-Strike System (ABS) in minor leagues threatens to eliminate framing value entirely. In Triple-A games using ABS, catchers no longer influence strike calls, fundamentally changing the position's value proposition. This has major implications:
- Defensive-first catchers may lose their primary value source
- Teams may prioritize offense more heavily at the position
- Catcher salaries could compress as the skill gap narrows
- Blocking and throwing skills become relatively more important
However, as of 2024-2025, MLB has not committed to full ABS implementation, instead testing hybrid systems where teams can challenge calls. This preserves some framing value while improving accuracy.
While pitch framing receives the most analytical attention, blocking pitches in the dirt remains a crucial skill. Wild pitches and passed balls allow baserunners to advance, with each base costing teams approximately 0.25 runs on average. Elite blockers save 5-10 runs per season compared to poor blockers.
The Mechanics of Blocking
Effective blocking requires:
- Quick Recognition: Identifying pitches in the dirt early
- Proper Positioning: Staying low and square to the pitcher
- Drop Technique: Quickly dropping to knees with chest over the ball
- Creating Surface Area: Spreading the body to cover maximum area
- Angle Control: Directing deflections toward home plate
- Recovery: Quickly recovering to prevent runners from advancing
Yadier Molina, widely considered one of the best blockers in history, demonstrated textbook technique: staying flexible, reading spin early, and using his chest protector to keep balls in front. His blocking ability was particularly valuable with the Cardinals' sinkerball-heavy pitching approach.
Blocking Metrics
Measuring blocking requires accounting for opportunity:
- Block Rate: Percentage of pitches in the dirt successfully blocked
- Expected Blocks: Model-based expected blocks given pitch locations
- Blocks Above Average (BAA): Actual blocks minus expected blocks
- Wild Pitch/Passed Ball Runs: Run value of WP/PB prevented
The challenge lies in defining "blockable" pitches. A pitch that bounces five feet in front of the plate is a far harder chance than one that barely clips the dirt at the plate's edge. Advanced models use pitch trajectory and location to estimate block probability.
# R: Analyzing Blocking Performance
library(tidyverse)
# Simulate pitch in dirt data
set.seed(2024)
n_dirt_pitches <- 5000
blocking_data <- tibble(
pitch_id = 1:n_dirt_pitches,
catcher = sample(c("Molina", "Realmuto", "Perez", "Rutschman", "Avg_Catcher"),
n_dirt_pitches, replace = TRUE),
# Distance from plate (negative = in front of plate)
plate_y = runif(n_dirt_pitches, -4, -0.5),
# Horizontal location
plate_x = rnorm(n_dirt_pitches, 0, 0.6),
# Height at front of plate
plate_z = runif(n_dirt_pitches, -0.5, 0.8),
pitch_type = sample(c("FF", "SL", "CH", "CU"), n_dirt_pitches, replace = TRUE),
# Runner on base indicator
runner_on = sample(c(TRUE, FALSE), n_dirt_pitches, replace = TRUE)
) %>%
mutate(
# Difficulty score based on location
difficulty = sqrt(plate_x^2 + plate_y^2 + plate_z^2),
# Base block probability
base_block_prob = plogis(2 - 0.8 * difficulty),
# Catcher skill modifier
catcher_skill = case_when(
catcher == "Molina" ~ 0.5,
catcher == "Realmuto" ~ 0.3,
catcher == "Rutschman" ~ 0.2,
catcher == "Avg_Catcher" ~ 0,
catcher == "Perez" ~ -0.1
),
block_prob = plogis(qlogis(base_block_prob) + catcher_skill),
blocked = rbinom(n_dirt_pitches, 1, block_prob)
)
# Build expected blocking model
blocking_model <- glm(
blocked ~ plate_x + plate_y + plate_z + I(plate_x^2) + I(plate_y^2),
data = blocking_data,
family = binomial
)
blocking_data <- blocking_data %>%
mutate(expected_block = predict(blocking_model, newdata = ., type = "response"))
# Calculate blocking runs
blocking_summary <- blocking_data %>%
group_by(catcher) %>%
summarise(
dirt_pitches = n(),
actual_blocks = sum(blocked),
expected_blocks = sum(expected_block),
extra_blocks = actual_blocks - expected_blocks,
block_rate = mean(blocked) * 100,
# Each failed block (WP/PB) costs ~0.3 runs on average
blocking_runs = extra_blocks * 0.3,
blocking_runs_per_1000 = blocking_runs / dirt_pitches * 1000
) %>%
arrange(desc(blocking_runs_per_1000))
print(blocking_summary)
# Analyze blocking by difficulty tier
difficulty_analysis <- blocking_data %>%
mutate(
difficulty_tier = cut(difficulty,
breaks = quantile(difficulty, probs = seq(0, 1, 0.25)),
labels = c("Easy", "Medium", "Hard", "Very Hard"),
include.lowest = TRUE)
) %>%
group_by(catcher, difficulty_tier) %>%
summarise(
pitches = n(),
block_rate = mean(blocked) * 100,
.groups = "drop"
) %>%
pivot_wider(names_from = difficulty_tier, values_from = block_rate,
values_fill = 0)
print(difficulty_analysis)
# Visualize blocking performance
ggplot(blocking_summary, aes(x = reorder(catcher, blocking_runs_per_1000),
y = blocking_runs_per_1000, fill = catcher)) +
geom_col() +
coord_flip() +
labs(title = "Blocking Runs Above Average per 1000 Pitches in Dirt",
x = "Catcher",
y = "Blocking Runs per 1000 Dirt Pitches") +
theme_minimal() +
theme(legend.position = "none")
# Impact of blocking with runners on base
runner_impact <- blocking_data %>%
group_by(catcher, runner_on) %>%
summarise(
pitches = n(),
block_rate = mean(blocked) * 100,
.groups = "drop"
) %>%
pivot_wider(names_from = runner_on, values_from = block_rate,
names_prefix = "runner_on_")
print(runner_impact)
# Python: Blocking Analysis with Spatial Components
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Simulate blocking data
np.random.seed(2024)
n_dirt = 5000
blocking_df = pd.DataFrame({
'pitch_id': range(n_dirt),
'catcher': np.random.choice(['Molina', 'Realmuto', 'Perez', 'Rutschman',
'Avg_Catcher'], n_dirt),
'plate_y': np.random.uniform(-4, -0.5, n_dirt),
'plate_x': np.random.normal(0, 0.6, n_dirt),
'plate_z': np.random.uniform(-0.5, 0.8, n_dirt),
'pitch_type': np.random.choice(['FF', 'SL', 'CH', 'CU'], n_dirt),
'runner_on': np.random.choice([True, False], n_dirt)
})
# Calculate difficulty and block probability
blocking_df['difficulty'] = np.sqrt(
blocking_df['plate_x']**2 +
blocking_df['plate_y']**2 +
blocking_df['plate_z']**2
)
catcher_blocking_skill = {
'Molina': 0.5,
'Realmuto': 0.3,
'Rutschman': 0.2,
'Avg_Catcher': 0,
'Perez': -0.1
}
blocking_df['catcher_skill'] = blocking_df['catcher'].map(catcher_blocking_skill)
def inv_logit(x):
return 1 / (1 + np.exp(-x))
base_logit = 2 - 0.8 * blocking_df['difficulty']
adj_logit = base_logit + blocking_df['catcher_skill']
block_prob = inv_logit(adj_logit)
blocking_df['blocked'] = np.random.binomial(1, block_prob)
# Train Random Forest model for expected blocks
feature_cols = ['plate_x', 'plate_y', 'plate_z']
X = blocking_df[feature_cols]
y = blocking_df['blocked']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
rf_model = RandomForestClassifier(
n_estimators=100,
max_depth=10,
random_state=42
)
rf_model.fit(X_train, y_train)
# Generate expected block probabilities
blocking_df['expected_block'] = rf_model.predict_proba(X)[:, 1]
# Calculate blocking metrics
blocking_summary = blocking_df.groupby('catcher').agg({
'pitch_id': 'count',
'blocked': 'sum',
'expected_block': 'sum'
})
blocking_summary.columns = ['dirt_pitches', 'actual_blocks', 'expected_blocks']
blocking_summary['extra_blocks'] = (
blocking_summary['actual_blocks'] - blocking_summary['expected_blocks']
)
blocking_summary['block_rate'] = (
blocking_summary['actual_blocks'] / blocking_summary['dirt_pitches'] * 100
)
blocking_summary['blocking_runs'] = blocking_summary['extra_blocks'] * 0.3
blocking_summary['blocking_runs_per_1000'] = (
blocking_summary['blocking_runs'] / blocking_summary['dirt_pitches'] * 1000
)
blocking_summary = blocking_summary.sort_values('blocking_runs_per_1000',
ascending=False)
print("\nBlocking Performance Summary:")
print(blocking_summary.round(2))
# 3D visualization of blocking difficulty zones
fig = plt.figure(figsize=(12, 5))
# Plot 1: 3D scatter of blocks vs non-blocks
ax1 = fig.add_subplot(121, projection='3d')
blocked = blocking_df[blocking_df['blocked'] == 1]
not_blocked = blocking_df[blocking_df['blocked'] == 0]
ax1.scatter(blocked['plate_x'], blocked['plate_y'], blocked['plate_z'],
c='green', marker='o', alpha=0.3, label='Blocked')
ax1.scatter(not_blocked['plate_x'], not_blocked['plate_y'], not_blocked['plate_z'],
c='red', marker='x', alpha=0.3, label='Not Blocked')
ax1.set_xlabel('Horizontal Location (ft)')
ax1.set_ylabel('Distance from Plate (ft)')
ax1.set_zlabel('Height (ft)')
ax1.set_title('Blocking Success by Location')
ax1.legend()
# Plot 2: Blocking difficulty heatmap
ax2 = fig.add_subplot(122)
difficulty_bins = pd.cut(blocking_df['difficulty'], bins=10)
blocking_df['diff_bin'] = difficulty_bins
diff_analysis = blocking_df.groupby(['catcher', 'diff_bin']).agg({
'blocked': 'mean',
'pitch_id': 'count'
}).reset_index()
diff_pivot = diff_analysis.pivot_table(
index='catcher',
columns='diff_bin',
values='blocked'
)
import seaborn as sns
sns.heatmap(diff_pivot, annot=True, fmt='.2f', cmap='RdYlGn', ax=ax2)
ax2.set_title('Block Rate by Difficulty Tier')
ax2.set_xlabel('Difficulty (Distance from Ideal)')
ax2.set_ylabel('Catcher')
plt.tight_layout()
plt.savefig('blocking_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
# Calculate blocking value by pitch type
pitch_type_blocking = blocking_df.groupby(['catcher', 'pitch_type']).agg({
'blocked': ['sum', 'count', 'mean'],
'expected_block': 'sum'
}).round(3)
print("\nBlocking Performance by Pitch Type:")
print(pitch_type_blocking)
Situational Blocking
Blocking value increases with runners on base, particularly in scoring position. A wild pitch or passed ball with a runner on third typically costs a full run, as the sketch below illustrates. Elite catchers like Yadier Molina and Buster Posey demonstrated heightened focus in high-leverage situations, maintaining block rates even on difficult pitches.
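To weight blocking chances by base state, attach an approximate run cost to each failed block depending on where the runners are. The values below are round numbers consistent with this chapter (roughly 0.25-0.3 runs per extra base, about a full run with a runner on third), not published run-expectancy figures, and the opportunity counts are hypothetical.
# R: Approximate run cost of a failed block by base state (assumed values)
library(tidyverse)
block_cost <- tribble(
  ~base_state,              ~runs_lost_per_miss,
  "Runner on 1st",          0.25,   # runner advances to second
  "Runner on 2nd",          0.30,   # runner advances to third
  "Runner on 3rd",          1.00,   # the run usually scores
  "Runners on 1st and 3rd", 1.25    # run scores and the trail runner advances
)
# Expected seasonal cost if a catcher misses 2% of ~300 dirt pitches in each state
block_cost <- block_cost %>%
  mutate(expected_runs_lost = 300 * 0.02 * runs_lost_per_miss)
print(block_cost)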
The Pitcher-Catcher Blocking Dynamic
Certain pitcher types create more blocking challenges:
- Hard sinkerballers (e.g., Zack Britton, Emmanuel Clase) generate many pitches in the dirt
- Splitter pitchers (e.g., Kevin Gausman) have late-diving action
- Young, wild pitchers have less control over pitch location
Teams consider catcher blocking ability when pairing batteries. A pitcher with elite control (e.g., Zack Greinke) can succeed with a poor blocker, while a wild power pitcher benefits from an elite blocker.
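A rough way to quantify that pairing decision is to multiply a pitcher's rate of bouncing pitches by the gap between two catchers' block rates and the run cost of a miss. Every input in the sketch below is an assumed, illustrative figure; only the run cost per missed block matches the value used in the blocking model above.
# R: Expected blocking runs saved by pairing a wild pitcher with a better blocker
pitches_per_start <- 95
starts <- 30
dirt_rate <- 0.08              # assumed share of pitches that bounce in the dirt
block_rate_elite <- 0.97       # assumed block rate for the elite blocker
block_rate_poor <- 0.93        # assumed block rate for the poor blocker
runs_per_missed_block <- 0.3   # run cost per wild pitch / passed ball (as above)
dirt_pitches <- pitches_per_start * starts * dirt_rate
runs_saved <- dirt_pitches * (block_rate_elite - block_rate_poor) * runs_per_missed_block
cat(sprintf("Dirt pitches over a season of starts: %.0f, runs saved: %.1f\n",
            dirt_pitches, runs_saved))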
Catcher throwing ability encompasses raw arm strength, transfer speed, throwing accuracy, and the more intangible skill of controlling the running game through reputation and pitcher management.
Components of Throwing Ability
Pop Time: The elapsed time from the pitch hitting the catcher's mitt to the ball arriving at the fielder's glove at the target base. Elite catchers achieve sub-1.90 second pop times to second base, with J.T. Realmuto regularly posting times in the 1.85-1.88 range (see the decomposition sketch below).
Exchange Speed: The transfer from glove to throwing hand. Quick exchanges can save 0.1-0.2 seconds compared to slow receivers.
Arm Strength: Raw velocity on throws, typically measured in MPH. Elite arms like Jorge Alfaro and Salvador Perez exceed 85 MPH on throws to second.
Accuracy: Throwing to the correct location for the middle infielder to apply the tag. Erratic throws, even if they arrive on time, often result in safe calls or throwing errors.
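Pop time is essentially the exchange plus the ball's flight time over the roughly 127 feet from home plate to second base, so transfer speed and arm strength trade off directly. The sketch below decomposes pop time under that simplification; the exchange times and throw velocities are assumed for illustration, and real throws lose speed in flight, so measured pop times run slightly higher.
# R: Decomposing pop time into exchange and flight time (simplified)
throw_distance_ft <- 127.3            # home plate to second base
mph_to_fps <- 5280 / 3600             # miles per hour to feet per second
pop_time <- function(exchange_sec, throw_mph) {
  exchange_sec + throw_distance_ft / (throw_mph * mph_to_fps)
}
# Quick exchange with a plus arm vs. slower exchange with an average arm
print(pop_time(exchange_sec = 0.70, throw_mph = 85))  # ~1.72 seconds
print(pop_time(exchange_sec = 0.80, throw_mph = 78))  # ~1.91 seconds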
Caught Stealing Metrics
Traditional caught stealing percentage (CS%) has limitations: it doesn't account for attempt frequency or baserunner speed. Modern metrics include:
- Caught Stealing Above Average: Actual caught stealing relative to an expectation that accounts for baserunner speed and pitcher delivery time
- Stolen Base Runs Prevented (SBRP): Run value of CS and deterred attempts
- Pop Time: Statcast measurement of exchange and throw speed
- Baserunner Attempt Rate: How often runners try to steal against this catcher
# R: Analyzing Throwing and Caught Stealing
library(tidyverse)
# Simulate stolen base attempt data
set.seed(2025)
n_attempts <- 2000
sb_data <- tibble(
attempt_id = 1:n_attempts,
catcher = sample(c("Realmuto", "Perez", "Salvy", "Rutschman", "Avg_Catcher"),
n_attempts, replace = TRUE),
runner_speed = rnorm(n_attempts, 27, 1.5), # Sprint speed in ft/s
pitcher_time = rnorm(n_attempts, 1.3, 0.15), # Time to plate
lead_distance = rnorm(n_attempts, 12, 2), # Lead in feet
pitcher_hand = sample(c("R", "L"), n_attempts, replace = TRUE)
) %>%
mutate(
# Pop time by catcher
pop_time = case_when(
catcher == "Realmuto" ~ rnorm(n(), 1.87, 0.08),
catcher == "Perez" ~ rnorm(n(), 1.92, 0.10),
catcher == "Rutschman" ~ rnorm(n(), 1.95, 0.09),
catcher == "Salvy" ~ rnorm(n(), 1.98, 0.11),
catcher == "Avg_Catcher" ~ rnorm(n(), 2.00, 0.10)
),
# Total time for runner: 90 ft between bases minus the lead,
# plus ~0.2 s as a rough jump/acceleration allowance (simplifying assumption)
total_time_runner = (90 - lead_distance) / runner_speed + 0.2,
# Total time for catcher
total_time_catcher = pitcher_time + pop_time,
# Caught stealing (with some randomness)
caught_stealing = total_time_catcher < total_time_runner + rnorm(n(), 0, 0.1)
)
# Calculate caught stealing metrics
cs_summary <- sb_data %>%
group_by(catcher) %>%
summarise(
attempts = n(),
caught_stealing = sum(caught_stealing),
cs_pct = mean(caught_stealing) * 100,
avg_pop_time = mean(pop_time),
sb_allowed = attempts - caught_stealing
) %>%
arrange(desc(cs_pct))
print(cs_summary)
# Build expected CS model
cs_model <- glm(
caught_stealing ~ runner_speed + pitcher_time + lead_distance + pitcher_hand,
data = sb_data,
family = binomial
)
sb_data <- sb_data %>%
mutate(expected_cs_prob = predict(cs_model, newdata = ., type = "response"))
# Calculate CS Above Average
cs_above_avg <- sb_data %>%
group_by(catcher) %>%
summarise(
attempts = n(),
actual_cs = sum(caught_stealing),
expected_cs = sum(expected_cs_prob),
cs_above_avg = actual_cs - expected_cs,
# Each CS worth ~0.5 runs, each SB costs ~0.2 runs
throwing_runs = cs_above_avg * 0.7
) %>%
arrange(desc(cs_above_avg))
print(cs_above_avg)
# Visualize pop time distribution
ggplot(sb_data, aes(x = pop_time, fill = catcher)) +
geom_density(alpha = 0.5) +
geom_vline(xintercept = 1.95, linetype = "dashed", color = "black") +
annotate("text", x = 1.95, y = 3, label = "MLB Avg (~1.95s)",
angle = 90, vjust = -0.5) +
labs(title = "Pop Time Distribution by Catcher",
x = "Pop Time (seconds)",
y = "Density") +
theme_minimal()
# Analyze success rate by pop time bins
pop_time_analysis <- sb_data %>%
mutate(
pop_time_bin = cut(pop_time,
breaks = c(0, 1.90, 1.95, 2.00, 2.10, Inf),
labels = c("<1.90", "1.90-1.95", "1.95-2.00",
"2.00-2.10", ">2.10"))
) %>%
group_by(pop_time_bin) %>%
summarise(
attempts = n(),
cs_rate = mean(caught_stealing) * 100,
avg_runner_speed = mean(runner_speed)
)
print(pop_time_analysis)
# Create visualization
ggplot(pop_time_analysis, aes(x = pop_time_bin, y = cs_rate,
fill = pop_time_bin)) +
geom_col() +
geom_text(aes(label = sprintf("%.1f%%", cs_rate)),
vjust = -0.5, size = 4) +
labs(title = "Caught Stealing Rate by Pop Time",
subtitle = "Faster exchanges lead to more caught stealers",
x = "Pop Time Range (seconds)",
y = "Caught Stealing Rate (%)") +
theme_minimal() +
theme(legend.position = "none")
# Python: Comprehensive Throwing Analysis
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
# Simulate throwing data
np.random.seed(2025)
n_attempts = 2000
throwing_df = pd.DataFrame({
'attempt_id': range(n_attempts),
'catcher': np.random.choice(['Realmuto', 'Perez', 'Rutschman',
'Salvy', 'Avg_Catcher'], n_attempts),
'runner_speed': np.random.normal(27, 1.5, n_attempts),
'pitcher_time': np.random.normal(1.3, 0.15, n_attempts),
'lead_distance': np.random.normal(12, 2, n_attempts),
'pitcher_hand': np.random.choice(['R', 'L'], n_attempts)
})
# Assign pop times by catcher skill
pop_time_map = {
'Realmuto': (1.87, 0.08),
'Perez': (1.92, 0.10),
'Rutschman': (1.95, 0.09),
'Salvy': (1.98, 0.11),
'Avg_Catcher': (2.00, 0.10)
}
throwing_df['pop_time'] = throwing_df['catcher'].apply(
lambda x: np.random.normal(pop_time_map[x][0], pop_time_map[x][1])
)
# Calculate times: runner covers 90 ft minus the lead, plus ~0.2 s
# as a rough jump/acceleration allowance (simplifying assumption)
throwing_df['total_time_runner'] = (
    (90 - throwing_df['lead_distance']) / throwing_df['runner_speed'] + 0.2
)
throwing_df['total_time_catcher'] = (
throwing_df['pitcher_time'] + throwing_df['pop_time']
)
# Determine outcome with some randomness
throwing_df['caught_stealing'] = (
throwing_df['total_time_catcher'] <
throwing_df['total_time_runner'] + np.random.normal(0, 0.1, n_attempts)
).astype(int)
# Summary statistics
cs_summary = throwing_df.groupby('catcher').agg({
'attempt_id': 'count',
'caught_stealing': ['sum', 'mean'],
'pop_time': 'mean'
}).round(3)
cs_summary.columns = ['attempts', 'caught_stealing', 'cs_rate', 'avg_pop_time']
cs_summary['cs_pct'] = cs_summary['cs_rate'] * 100
cs_summary = cs_summary.sort_values('cs_pct', ascending=False)
print("\nCaught Stealing Summary:")
print(cs_summary)
# Build logistic regression for expected CS
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
throwing_df['pitcher_hand_enc'] = le.fit_transform(throwing_df['pitcher_hand'])
X = throwing_df[['runner_speed', 'pitcher_time', 'lead_distance',
'pitcher_hand_enc']]
y = throwing_df['caught_stealing']
lr_model = LogisticRegression()
lr_model.fit(X, y)
throwing_df['expected_cs_prob'] = lr_model.predict_proba(X)[:, 1]
# Calculate CS Above Average
cs_above_avg = throwing_df.groupby('catcher').agg({
'attempt_id': 'count',
'caught_stealing': 'sum',
'expected_cs_prob': 'sum'
})
cs_above_avg.columns = ['attempts', 'actual_cs', 'expected_cs']
cs_above_avg['cs_above_avg'] = (
cs_above_avg['actual_cs'] - cs_above_avg['expected_cs']
)
cs_above_avg['throwing_runs'] = cs_above_avg['cs_above_avg'] * 0.7
cs_above_avg = cs_above_avg.sort_values('cs_above_avg', ascending=False)
print("\nCS Above Average:")
print(cs_above_avg.round(2))
# Visualization 1: Pop time vs CS rate
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Scatter plot
for catcher in throwing_df['catcher'].unique():
catcher_data = throwing_df[throwing_df['catcher'] == catcher]
axes[0].scatter(catcher_data['pop_time'],
catcher_data['caught_stealing'],
alpha=0.3, label=catcher)
axes[0].set_xlabel('Pop Time (seconds)')
axes[0].set_ylabel('Caught Stealing (1=Yes, 0=No)')
axes[0].set_title('Pop Time vs Caught Stealing Outcome')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Box plot of pop times
throwing_df.boxplot(column='pop_time', by='catcher', ax=axes[1])
axes[1].set_xlabel('Catcher')
axes[1].set_ylabel('Pop Time (seconds)')
axes[1].set_title('Pop Time Distribution by Catcher')
axes[1].axhline(y=1.95, color='r', linestyle='--', label='League Average')
plt.suptitle('') # Remove automatic title
plt.tight_layout()
plt.savefig('throwing_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
# Visualization 2: Success rate by pop time bins
throwing_df['pop_time_bin'] = pd.cut(
throwing_df['pop_time'],
bins=[0, 1.90, 1.95, 2.00, 2.10, np.inf],
labels=['<1.90', '1.90-1.95', '1.95-2.00', '2.00-2.10', '>2.10']
)
bin_analysis = throwing_df.groupby('pop_time_bin').agg({
'caught_stealing': ['count', 'mean'],
'runner_speed': 'mean'
}).round(3)
bin_analysis.columns = ['attempts', 'cs_rate', 'avg_runner_speed']
bin_analysis['cs_pct'] = bin_analysis['cs_rate'] * 100
print("\nCS Rate by Pop Time Bin:")
print(bin_analysis)
# Create bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(range(len(bin_analysis)), bin_analysis['cs_pct'])
plt.xlabel('Pop Time Range (seconds)')
plt.ylabel('Caught Stealing Rate (%)')
plt.title('Impact of Pop Time on Caught Stealing Success')
plt.xticks(range(len(bin_analysis)), bin_analysis.index)
# Add value labels on bars
for i, (idx, row) in enumerate(bin_analysis.iterrows()):
plt.text(i, row['cs_pct'] + 1, f"{row['cs_pct']:.1f}%",
ha='center', va='bottom')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('cs_by_poptime.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nCorrelation between pop time and CS success:")
print(f"Pearson r = {stats.pearsonr(throwing_df['pop_time'], throwing_df['caught_stealing'])[0]:.3f}")
Controlling the Running Game
Beyond throwing out runners, elite catchers deter attempts through reputation. J.T. Realmuto's elite arm strength and quick release led to significantly fewer stolen base attempts per game. This "deterrence value" doesn't appear in traditional stats but prevents runs.
Factors Affecting Attempt Rate:
- Catcher pop time and CS%
- Pitcher delivery time (slow deliveries invite attempts)
- Game situation (score, inning, runners)
- Catcher reputation and scouting reports
Teams with elite throwing catchers can employ pitchers with slower deliveries, while teams with poor throwers must prioritize quick-to-the-plate pitchers or employ more pickoff attempts and slide-step deliveries.
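One way to put a number on deterrence is to compare how often runners test a catcher with the league attempt rate and price the attempts that never happen. The attempt rates and opportunity counts below are assumed, and the run values match the rough figures used in the throwing model above; under these assumptions the deterrence value is real but modest, because attempts near the break-even point cost the defense little when they do happen.
# R: Approximate deterrence value from suppressed steal attempts (assumed inputs)
opportunities <- 900              # steal opportunities behind the plate per season
league_attempt_rate <- 0.07       # assumed league-wide attempt rate per opportunity
attempt_rate_vs_catcher <- 0.04   # assumed attempt rate against an elite arm
league_sb_success <- 0.75         # league success rate on attempts
run_value_sb <- 0.2               # offense gains ~0.2 runs per steal
run_value_cs <- 0.5               # offense loses ~0.5 runs per caught stealing
# Net run value the offense expects from one attempt at the league success rate
net_runs_per_attempt <- league_sb_success * run_value_sb -
  (1 - league_sb_success) * run_value_cs
suppressed_attempts <- opportunities * (league_attempt_rate - attempt_rate_vs_catcher)
deterrence_runs <- suppressed_attempts * net_runs_per_attempt
cat(sprintf("Suppressed attempts: %.0f, deterrence runs saved: %.2f\n",
            suppressed_attempts, deterrence_runs))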
The Declining Stolen Base Environment
Stolen base attempts declined steadily through the 2010s as teams recognized the unfavorable risk-reward ratio (runners generally need roughly a 70-75% success rate to break even). This reduces the relative value of elite throwing, though it remains important against aggressive baserunning teams and in high-leverage playoff situations.
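That break-even threshold falls directly out of the run values used in the throwing model above (a steal gains the offense roughly 0.2 runs, a caught stealing costs it roughly 0.5), as the short calculation below shows.
# R: Break-even stolen base success rate from approximate run values
run_value_sb <- 0.2   # runs gained by the offense on a successful steal
run_value_cs <- 0.5   # runs lost by the offense on a caught stealing
# Solve p * run_value_sb - (1 - p) * run_value_cs = 0 for p
break_even <- run_value_cs / (run_value_sb + run_value_cs)
cat(sprintf("Break-even success rate: %.0f%%\n", break_even * 100))  # ~71%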
The 2023 rule changes (larger bases, restricted pickoffs) increased stolen base attempts significantly, potentially elevating the value of throwing ability once again.
plt.xticks(range(len(bin_analysis)), bin_analysis.index)
# Add value labels on bars
for i, (idx, row) in enumerate(bin_analysis.iterrows()):
plt.text(i, row['cs_pct'] + 1, f"{row['cs_pct']:.1f}%",
ha='center', va='bottom')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('cs_by_poptime.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nCorrelation between pop time and CS success:")
print(f"Pearson r = {stats.pearsonr(throwing_df['pop_time'], throwing_df['caught_stealing'])[0]:.3f}")
The least quantifiable aspect of catcher value involves pitch calling, sequencing, and managing pitcher psychology. While difficult to measure, teams believe these skills significantly impact pitcher performance.
Pitch Calling and Sequencing
Modern pitch calling is increasingly collaborative, with input from:
- Bench coaches calling pitches via signals
- Advance scouting providing batter tendencies
- Analytics departments suggesting pitch types and locations
- Technology (e.g., PitchCom devices) facilitating communication
Despite this support, catchers make real-time adjustments based on:
- Pitcher stuff quality that day
- Batter's swing decisions and timing
- Umpire strike zone tendencies
- Game situation and leverage
- Weather and environmental factors
Elite game callers like Yadier Molina and Buster Posey developed reputations for maximizing pitcher performance. Quantifying this requires comparing pitcher results with different catchers, controlling for opposition quality and other factors.
Measuring Game Calling Impact
Researchers use several approaches:
Catcher ERA: Compare team ERA with each catcher, though this conflates multiple skills and suffers from small sample noise.
Pitch Type Frequency Analysis: Examine whether catchers call for optimal pitch mixes based on situation and batter tendencies.
Sequencing Metrics: Measure whether pitch sequences deviate from expectations in ways that improve outcomes.
Pitcher Testimony: Qualitative feedback from pitchers about catcher impact.
# R: Analyzing Catcher Impact on Pitcher Performance
library(tidyverse)
library(lme4) # For mixed effects models
# Simulate pitcher performance data by catcher
set.seed(2026)
n_games <- 1000
pitcher_catcher_data <- tibble(
game_id = 1:n_games,
pitcher = sample(paste0("P", 1:30), n_games, replace = TRUE),
catcher = sample(c("Realmuto", "Rutschman", "Molina", "Smith", "Avg_Catcher"),
n_games, replace = TRUE),
opponent_wrc = rnorm(n_games, 100, 15), # Opponent quality
temperature = rnorm(n_games, 75, 10),
pitcher_stuff = rnorm(n_games, 50, 10) # Pitcher stuff that day
) %>%
mutate(
# Catcher effect on performance
catcher_effect = case_when(
catcher == "Molina" ~ -0.3, # Negative = better (lower ERA)
catcher == "Realmuto" ~ -0.2,
catcher == "Rutschman" ~ -0.15,
catcher == "Smith" ~ -0.1,
catcher == "Avg_Catcher" ~ 0
),
# Generate ERA for that game
game_era = 4.5 +
(opponent_wrc - 100) * 0.02 +
(pitcher_stuff - 50) * -0.03 +
catcher_effect +
rnorm(n_games, 0, 1.5),
game_era = pmax(0, game_era) # ERA can't be negative
)
# Basic catcher ERA comparison
catcher_era <- pitcher_catcher_data %>%
group_by(catcher) %>%
summarise(
games = n(),
avg_era = mean(game_era),
median_era = median(game_era),
sd_era = sd(game_era)
) %>%
arrange(avg_era)
print(catcher_era)
# Build mixed effects model to control for pitcher quality
# This accounts for the fact that different catchers catch different pitchers
mixed_model <- lmer(
game_era ~ catcher + opponent_wrc + temperature + (1 | pitcher),
data = pitcher_catcher_data
)
# Extract catcher effects
catcher_effects <- summary(mixed_model)$coefficients %>%
as.data.frame() %>%
rownames_to_column("term") %>%
filter(str_detect(term, "catcher")) %>%
select(term, Estimate, `Std. Error`, `t value`)
print(catcher_effects)
# Analyze pitcher performance variance by catcher
# Do certain catchers get more consistent results?
pitcher_variance <- pitcher_catcher_data %>%
group_by(pitcher, catcher) %>%
filter(n() >= 5) %>% # At least 5 starts together
summarise(
starts = n(),
avg_era = mean(game_era),
sd_era = sd(game_era),
.groups = "drop"
)
variance_summary <- pitcher_variance %>%
group_by(catcher) %>%
summarise(
pitcher_pairs = n(),
avg_variance = mean(sd_era),
median_variance = median(sd_era)
) %>%
arrange(avg_variance)
print(variance_summary)
# Visualize catcher impact
ggplot(pitcher_catcher_data, aes(x = reorder(catcher, game_era),
y = game_era, fill = catcher)) +
geom_boxplot() +
coord_flip() +
labs(title = "Game ERA Distribution by Catcher",
subtitle = "Lower ERA indicates better pitcher performance",
x = "Catcher",
y = "Game ERA") +
theme_minimal() +
theme(legend.position = "none")
# Simulate pitch calling data
set.seed(2027)
n_pas <- 10000
pitch_calling <- tibble(
pa_id = 1:n_pas,
catcher = sample(c("Elite_Caller", "Avg_Caller", "Poor_Caller"),
n_pas, replace = TRUE),
count = sample(c("0-0", "0-1", "0-2", "1-0", "1-1", "1-2", "2-0", "2-1", "2-2", "3-0", "3-1", "3-2"),
n_pas, replace = TRUE),
pitch_type_called = sample(c("FB", "SL", "CH", "CU"), n_pas, replace = TRUE),
batter_expects = sample(c("FB", "SL", "CH", "CU"), n_pas, replace = TRUE)
) %>%
mutate(
# Surprise factor
surprised_batter = pitch_type_called != batter_expects,
# Flag pitcher-friendly counts for the effectiveness split below
is_pitchers_count = count %in% c("0-1", "0-2", "1-2", "2-2"),
caller_skill = case_when(
catcher == "Elite_Caller" ~ 0.15,
catcher == "Avg_Caller" ~ 0,
catcher == "Poor_Caller" ~ -0.1
),
# Outcome (simplified: contact probability; better callers suppress contact)
base_contact_prob = ifelse(surprised_batter, 0.65, 0.75),
adj_contact_prob = plogis(qlogis(base_contact_prob) - caller_skill),
contact_made = rbinom(n_pas, 1, adj_contact_prob)
)
# Analyze calling effectiveness
calling_effectiveness <- pitch_calling %>%
group_by(catcher, is_pitchers_count) %>%
summarise(
pa = n(),
contact_rate = mean(contact_made) * 100,
surprise_rate = mean(surprised_batter) * 100,
.groups = "drop"
) %>%
pivot_wider(names_from = is_pitchers_count,
values_from = c(contact_rate, surprise_rate),
names_prefix = "pitchers_count_")
print(calling_effectiveness)
# Python: Advanced Pitcher-Catcher Pairing Analysis
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
# Simulate pitcher-catcher pair data
np.random.seed(2026)
n_games = 1000
pitchers = [f'P{i}' for i in range(1, 31)]
catchers = ['Realmuto', 'Rutschman', 'Molina', 'Smith', 'Avg_Catcher']
pc_df = pd.DataFrame({
'game_id': range(n_games),
'pitcher': np.random.choice(pitchers, n_games),
'catcher': np.random.choice(catchers, n_games),
'opponent_wrc': np.random.normal(100, 15, n_games),
'temperature': np.random.normal(75, 10, n_games),
'pitcher_stuff': np.random.normal(50, 10, n_games)
})
# Assign catcher effects
catcher_effects = {
'Molina': -0.3,
'Realmuto': -0.2,
'Rutschman': -0.15,
'Smith': -0.1,
'Avg_Catcher': 0
}
pc_df['catcher_effect'] = pc_df['catcher'].map(catcher_effects)
# Generate game ERA
pc_df['game_era'] = (
4.5 +
(pc_df['opponent_wrc'] - 100) * 0.02 +
(pc_df['pitcher_stuff'] - 50) * -0.03 +
pc_df['catcher_effect'] +
np.random.normal(0, 1.5, n_games)
)
pc_df['game_era'] = pc_df['game_era'].clip(lower=0)
# Basic catcher ERA summary
catcher_era = pc_df.groupby('catcher').agg({
'game_id': 'count',
'game_era': ['mean', 'median', 'std']
}).round(3)
catcher_era.columns = ['games', 'avg_era', 'median_era', 'sd_era']
catcher_era = catcher_era.sort_values('avg_era')
print("\nCatcher ERA Summary:")
print(catcher_era)
# Build regression model controlling for other factors
X = pd.get_dummies(pc_df[['catcher', 'opponent_wrc', 'temperature']],
columns=['catcher'], drop_first=True)
y = pc_df['game_era']
lr_model = LinearRegression()
lr_model.fit(X, y)
# Extract catcher coefficients
catcher_cols = [col for col in X.columns if 'catcher_' in col]
catcher_coefs = pd.DataFrame({
'catcher': [col.replace('catcher_', '') for col in catcher_cols],
'era_effect': lr_model.coef_[[X.columns.get_loc(col) for col in catcher_cols]]
})
catcher_coefs = catcher_coefs.sort_values('era_effect')
print("\nCatcher ERA Effects (controlling for opponent and weather):")
print(catcher_coefs)
# Analyze specific pitcher-catcher pairs
pair_analysis = pc_df.groupby(['pitcher', 'catcher']).agg({
'game_id': 'count',
'game_era': ['mean', 'std']
}).reset_index()
pair_analysis.columns = ['pitcher', 'catcher', 'games', 'avg_era', 'std_era']
pair_analysis = pair_analysis[pair_analysis['games'] >= 5] # Minimum sample
# Find best and worst pairs for each pitcher
pitcher_pairs = []
for pitcher in pair_analysis['pitcher'].unique():
pitcher_data = pair_analysis[pair_analysis['pitcher'] == pitcher]
if len(pitcher_data) >= 2:
best = pitcher_data.nsmallest(1, 'avg_era')
worst = pitcher_data.nlargest(1, 'avg_era')
diff = worst['avg_era'].values[0] - best['avg_era'].values[0]
pitcher_pairs.append({
'pitcher': pitcher,
'best_catcher': best['catcher'].values[0],
'best_era': best['avg_era'].values[0],
'worst_catcher': worst['catcher'].values[0],
'worst_era': worst['avg_era'].values[0],
'era_diff': diff
})
pairs_df = pd.DataFrame(pitcher_pairs).sort_values('era_diff', ascending=False)
print("\nLargest Pitcher-Catcher Pairing Effects:")
print(pairs_df.head(10))
# Visualization 1: Catcher ERA comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Box plot
pc_df.boxplot(column='game_era', by='catcher', ax=axes[0])
axes[0].set_xlabel('Catcher')
axes[0].set_ylabel('Game ERA')
axes[0].set_title('Game ERA Distribution by Catcher')
plt.suptitle('')
# Violin plot with points
sns.violinplot(data=pc_df, x='catcher', y='game_era', ax=axes[1])
axes[1].set_xlabel('Catcher')
axes[1].set_ylabel('Game ERA')
axes[1].set_title('Game ERA Distribution (Violin Plot)')
axes[1].axhline(y=pc_df['game_era'].mean(), color='r', linestyle='--',
label='Overall Mean')
axes[1].legend()
plt.tight_layout()
plt.savefig('catcher_era_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
# Visualization 2: Pitcher consistency by catcher
variance_by_catcher = pc_df.groupby(['pitcher', 'catcher']).agg({
'game_era': ['mean', 'std', 'count']
}).reset_index()
variance_by_catcher.columns = ['pitcher', 'catcher', 'mean_era', 'std_era', 'games']
variance_by_catcher = variance_by_catcher[variance_by_catcher['games'] >= 3]
fig, ax = plt.subplots(figsize=(10, 6))
for catcher in catchers:
catcher_data = variance_by_catcher[variance_by_catcher['catcher'] == catcher]
ax.scatter(catcher_data['mean_era'], catcher_data['std_era'],
label=catcher, alpha=0.6, s=50)
ax.set_xlabel('Mean Game ERA')
ax.set_ylabel('Standard Deviation of Game ERA')
ax.set_title('Pitcher Consistency by Catcher\n(Lower std = more consistent)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('catcher_consistency.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nAnalysis complete. Visualizations saved.")
Pitcher Relationships and Psychology
Beyond mechanics, elite catchers build trust with pitchers through:
- Communication: Discussing approach between innings and during mound visits
- Confidence Building: Supporting pitchers through struggles
- Strategic Adjustment: Recognizing when pitchers don't have their best stuff
- Veteran Leadership: Calming young pitchers in high-pressure situations
Buster Posey's three World Series championships were partly attributed to his ability to get the most out of Madison Bumgarner, Tim Lincecum, and the rest of the Giants' staff. While these soft skills resist quantification, teams value them highly when evaluating catchers.
Integrating all defensive components into a single value metric represents the ultimate goal of catcher analytics. Several public and proprietary systems exist.
Components of Total Catcher Value
Modern systems typically include:
- Framing Runs (largest component for elite framers)
- Blocking Runs
- Throwing Runs (CS and deterrence)
- Fielding Runs (fielding bunts, pop-ups)
- Game Calling/Pitcher Management (often estimated or omitted)
Popular Metrics:
- FanGraphs DEF: Combines framing, blocking, throwing
- Baseball Prospectus FRAA: Fielding Runs Above Average including framing
- Statcast Catcher Defense: Uses Statcast data for framing and blocking
- Proprietary Team Models: Most teams build internal systems with additional data
Building a Composite Catcher Value Model
# R: Composite Catcher Value Model
library(tidyverse)
# Create synthetic catcher performance data (per 1000 innings)
catcher_performance <- tibble(
catcher = c("Realmuto", "Rutschman", "Smith", "Barnhart", "Perez",
"Grandal", "Contreras", "Kirk", "Vazquez", "Hedges"),
innings = c(1200, 1100, 1150, 900, 1300, 1000, 1100, 950, 1050, 800),
# Framing (runs per 7000 called pitches, normalized to innings)
framing_runs = c(15, 12, 10, 22, -8, 18, -5, 5, 8, 25),
# Blocking (runs per 1000 pitches in dirt)
blocking_runs = c(3, 4, 2, 3, 1, 2, -1, 1, 3, 4),
# Throwing (runs from CS and deterrence)
throwing_runs = c(8, 4, 2, 0, 6, 1, 3, -2, 5, 1),
# Fielding (pop ups, bunts, etc.)
fielding_runs = c(2, 1, 1, 0, 1, 0, 1, 0, 1, 0),
# Offensive runs above average
batting_runs = c(25, 20, 18, -10, 15, 10, 12, 8, -5, -15)
) %>%
mutate(
# Total defensive runs
defensive_runs = framing_runs + blocking_runs + throwing_runs + fielding_runs,
# Total runs above average
total_runs = defensive_runs + batting_runs,
# Convert to WAR (10 runs per win, plus positional adjustment)
# Catchers get +12.5 run positional adjustment per 150 games
games = innings / 9,
positional_adj = (games / 150) * 12.5,
war = (total_runs + positional_adj) / 10
)
# Display comprehensive rankings
catcher_rankings <- catcher_performance %>%
select(catcher, innings, framing_runs, blocking_runs, throwing_runs,
defensive_runs, batting_runs, total_runs, war) %>%
arrange(desc(war))
print(catcher_rankings)
# Analyze value components
value_decomposition <- catcher_performance %>%
select(catcher, framing_runs, blocking_runs, throwing_runs,
fielding_runs, batting_runs) %>%
pivot_longer(cols = -catcher, names_to = "component", values_to = "runs") %>%
mutate(component = str_remove(component, "_runs"))
# Stacked bar chart of value sources
ggplot(value_decomposition, aes(x = reorder(catcher, runs, sum),
y = runs, fill = component)) +
geom_col() +
coord_flip() +
labs(title = "Catcher Value Decomposition",
subtitle = "Runs above average by component",
x = "Catcher",
y = "Runs Above Average",
fill = "Component") +
scale_fill_brewer(palette = "Set2") +
theme_minimal()
# Correlation analysis between components
correlation_data <- catcher_performance %>%
select(framing_runs, blocking_runs, throwing_runs, batting_runs)
cor_matrix <- cor(correlation_data)
print("Correlation between catcher skills:")
print(round(cor_matrix, 3))
# Heatmap of correlations
library(reshape2)
cor_melted <- melt(cor_matrix)
ggplot(cor_melted, aes(Var1, Var2, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 2)), color = "white") +
scale_fill_gradient2(low = "blue", mid = "white", high = "red",
midpoint = 0, limit = c(-1, 1)) +
labs(title = "Correlation Between Catcher Skills",
x = "", y = "") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Identify catcher archetypes using clustering
library(cluster)
cluster_data <- catcher_performance %>%
select(framing_runs, blocking_runs, throwing_runs, batting_runs) %>%
scale() # Normalize
kmeans_result <- kmeans(cluster_data, centers = 3, nstart = 25)
# Note: k-means cluster IDs are arbitrary; inspect kmeans_result$centers and adjust
# these labels so they actually match each cluster's profile
catcher_performance$archetype <- case_when(
kmeans_result$cluster == 1 ~ "Elite Defender",
kmeans_result$cluster == 2 ~ "Balanced",
kmeans_result$cluster == 3 ~ "Offensive-Focused"
)
archetype_summary <- catcher_performance %>%
group_by(archetype) %>%
summarise(
n = n(),
avg_framing = mean(framing_runs),
avg_blocking = mean(blocking_runs),
avg_throwing = mean(throwing_runs),
avg_batting = mean(batting_runs),
avg_war = mean(war)
)
print(archetype_summary)
# Visualize archetypes
ggplot(catcher_performance, aes(x = defensive_runs, y = batting_runs,
color = archetype, label = catcher)) +
geom_point(size = 4) +
geom_text(vjust = -1, size = 3) +
geom_vline(xintercept = 0, linetype = "dashed") +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Catcher Archetypes: Defense vs. Offense",
x = "Defensive Runs Above Average",
y = "Batting Runs Above Average",
color = "Archetype") +
theme_minimal()
# Python: Comprehensive Catcher Value Model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Create catcher performance dataset
catchers = ['Realmuto', 'Rutschman', 'Smith', 'Barnhart', 'Perez',
'Grandal', 'Contreras', 'Kirk', 'Vazquez', 'Hedges']
catcher_df = pd.DataFrame({
'catcher': catchers,
'innings': [1200, 1100, 1150, 900, 1300, 1000, 1100, 950, 1050, 800],
'framing_runs': [15, 12, 10, 22, -8, 18, -5, 5, 8, 25],
'blocking_runs': [3, 4, 2, 3, 1, 2, -1, 1, 3, 4],
'throwing_runs': [8, 4, 2, 0, 6, 1, 3, -2, 5, 1],
'fielding_runs': [2, 1, 1, 0, 1, 0, 1, 0, 1, 0],
'batting_runs': [25, 20, 18, -10, 15, 10, 12, 8, -5, -15]
})
# Calculate composite metrics
catcher_df['defensive_runs'] = (
catcher_df['framing_runs'] +
catcher_df['blocking_runs'] +
catcher_df['throwing_runs'] +
catcher_df['fielding_runs']
)
catcher_df['total_runs'] = catcher_df['defensive_runs'] + catcher_df['batting_runs']
# Positional adjustment and WAR
catcher_df['games'] = catcher_df['innings'] / 9
catcher_df['positional_adj'] = (catcher_df['games'] / 150) * 12.5
catcher_df['war'] = (catcher_df['total_runs'] + catcher_df['positional_adj']) / 10
# Display rankings
catcher_rankings = catcher_df.sort_values('war', ascending=False)
print("\nCatcher WAR Rankings:")
print(catcher_rankings[['catcher', 'defensive_runs', 'batting_runs',
'total_runs', 'war']].round(2))
# Value decomposition visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# 1. Stacked bar chart of value components
value_components = catcher_df[['catcher', 'framing_runs', 'blocking_runs',
'throwing_runs', 'fielding_runs', 'batting_runs']].set_index('catcher')
value_components.plot(kind='barh', stacked=True, ax=axes[0, 0])
axes[0, 0].set_xlabel('Runs Above Average')
axes[0, 0].set_title('Value Decomposition by Component')
axes[0, 0].legend(loc='best', fontsize=8)
axes[0, 0].axvline(x=0, color='black', linestyle='-', linewidth=0.5)
# 2. Defense vs Offense scatter
axes[0, 1].scatter(catcher_df['defensive_runs'], catcher_df['batting_runs'],
s=100, alpha=0.6)
for idx, row in catcher_df.iterrows():
axes[0, 1].annotate(row['catcher'],
(row['defensive_runs'], row['batting_runs']),
fontsize=8, ha='right')
axes[0, 1].axhline(y=0, color='gray', linestyle='--', alpha=0.5)
axes[0, 1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
axes[0, 1].set_xlabel('Defensive Runs Above Average')
axes[0, 1].set_ylabel('Batting Runs Above Average')
axes[0, 1].set_title('Defense vs Offense Profile')
axes[0, 1].grid(True, alpha=0.3)
# 3. WAR bar chart
catcher_sorted = catcher_df.sort_values('war')
colors = ['green' if x > 0 else 'red' for x in catcher_sorted['war']]
axes[1, 0].barh(catcher_sorted['catcher'], catcher_sorted['war'], color=colors)
axes[1, 0].set_xlabel('WAR')
axes[1, 0].set_title('Total Catcher Value (WAR)')
axes[1, 0].axvline(x=0, color='black', linestyle='-', linewidth=0.5)
# 4. Correlation heatmap
corr_cols = ['framing_runs', 'blocking_runs', 'throwing_runs', 'batting_runs']
corr_matrix = catcher_df[corr_cols].corr()
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm',
center=0, ax=axes[1, 1], square=True)
axes[1, 1].set_title('Skill Correlation Matrix')
plt.tight_layout()
plt.savefig('catcher_value_comprehensive.png', dpi=300, bbox_inches='tight')
plt.close()
# Cluster analysis for archetypes
cluster_features = catcher_df[['framing_runs', 'blocking_runs',
'throwing_runs', 'batting_runs']]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(cluster_features)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
catcher_df['cluster'] = kmeans.fit_predict(scaled_features)
# Interpret clusters
cluster_summary = catcher_df.groupby('cluster').agg({
'catcher': 'count',
'framing_runs': 'mean',
'blocking_runs': 'mean',
'throwing_runs': 'mean',
'batting_runs': 'mean',
'war': 'mean'
}).round(2)
cluster_summary.columns = ['count', 'avg_framing', 'avg_blocking',
'avg_throwing', 'avg_batting', 'avg_war']
print("\nCatcher Archetype Clusters:")
print(cluster_summary)
# Assign archetype labels based on cluster characteristics
archetype_map = {}
for cluster_id in range(3):
cluster_data = cluster_summary.loc[cluster_id]
if cluster_data['avg_framing'] > 10:
archetype_map[cluster_id] = 'Elite Framer'
elif cluster_data['avg_batting'] > 10:
archetype_map[cluster_id] = 'Offensive-Focused'
else:
archetype_map[cluster_id] = 'Balanced'
catcher_df['archetype'] = catcher_df['cluster'].map(archetype_map)
print("\nCatcher Archetypes:")
print(catcher_df[['catcher', 'archetype', 'war']].sort_values('war', ascending=False))
# Calculate replacement level value
# Replacement level typically around -2 WAR per 600 PA season
replacement_level_war = -1.5 # For ~1000 innings
catcher_df['war_above_replacement'] = catcher_df['war'] - replacement_level_war
catcher_df['runs_above_replacement'] = catcher_df['war_above_replacement'] * 10
# Estimate dollar value ($8M per WAR in modern market)
dollars_per_war = 8_000_000
catcher_df['estimated_value'] = catcher_df['war_above_replacement'] * dollars_per_war
print("\nEstimated Market Value:")
print(catcher_df[['catcher', 'war', 'war_above_replacement',
'estimated_value']].sort_values('estimated_value', ascending=False))
# Final comprehensive visualization
fig, ax = plt.subplots(figsize=(12, 8))
# Bubble chart: Defense vs Offense, size = WAR, color = archetype
scatter = ax.scatter(catcher_df['defensive_runs'],
catcher_df['batting_runs'],
s=catcher_df['war'].clip(lower=0) * 100, # Size by WAR
c=catcher_df['cluster'],
alpha=0.6,
cmap='viridis',
edgecolors='black',
linewidth=1)
# Add labels
for idx, row in catcher_df.iterrows():
ax.annotate(row['catcher'],
(row['defensive_runs'], row['batting_runs']),
fontsize=9, ha='center', va='bottom')
# Add quadrant lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5, linewidth=1)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5, linewidth=1)
# Add quadrant labels
ax.text(20, 20, 'Elite Overall', fontsize=10, alpha=0.5, ha='center')
ax.text(20, -10, 'Defense Specialists', fontsize=10, alpha=0.5, ha='center')
ax.text(-5, 20, 'Offense-First', fontsize=10, alpha=0.5, ha='center')
ax.text(-5, -10, 'Below Average', fontsize=10, alpha=0.5, ha='center')
ax.set_xlabel('Defensive Runs Above Average', fontsize=12)
ax.set_ylabel('Batting Runs Above Average', fontsize=12)
ax.set_title('Catcher Value Profile\n(Bubble size = WAR)', fontsize=14)
ax.grid(True, alpha=0.3)
plt.colorbar(scatter, ax=ax, label='Archetype Cluster')
plt.tight_layout()
plt.savefig('catcher_value_profile.png', dpi=300, bbox_inches='tight')
plt.close()
print("\nAnalysis complete. All visualizations saved.")
Trade-offs and Value Optimization
Teams face strategic decisions about catcher allocation:
Offense vs. Defense Balance: How much offensive production to sacrifice for elite defense depends on:
- Team offensive context (good offenses can afford defense-first catchers)
- Pitching staff characteristics (young/wild staff benefits from elite defense)
- Division competition (high-scoring division may require more offense)
- Park factors (pitcher-friendly park reduces need for catcher offense)
Playing Time Distribution: Should teams:
- Ride one elite catcher for 130+ games?
- Split time between defensive/offensive specialists?
- Match catchers to specific pitchers?
Research suggests that catcher defensive value increases with playing time because framing value accrues over thousands of called pitches. However, catchers face significant physical demands, making durability management crucial.
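A back-of-the-envelope way to frame that decision is to compare total runs under different playing-time allocations. The sketch below is purely illustrative: the per-game run rates and the fatigue penalty are assumptions chosen for demonstration, not measured values.
# R: Illustrative playing-time allocation sketch (all run rates are assumptions)
library(tidyverse)
# Assumed per-game runs above average for two hypothetical catchers
elite_defender_rpg <- 0.20   # defense-first starter
offense_first_rpg  <- 0.12   # bat-first backup
fatigue_penalty    <- 0.001  # assumed runs lost per game, per game beyond 110 starts
allocation <- tibble(
  strategy = c("Ride starter for 130 games", "Split 80 / 50 with backup"),
  starter_games = c(130, 80),
  backup_games  = c(0, 50)
) %>%
  mutate(
    penalty = fatigue_penalty * pmax(0, starter_games - 110) * starter_games,
    runs_above_avg = starter_games * elite_defender_rpg +
      backup_games * offense_first_rpg - penalty
  )
print(allocation)
Under these made-up rates the two strategies land within a couple of runs of each other, which is why durability assumptions, not just talent, drive the choice.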
The Future of Catcher Valuation
Several trends are shaping catcher analytics:
- Automated Strike Zones: ABS implementation would eliminate framing value
- PitchCom Technology: Reduces game-calling autonomy
- Biomechanics Tracking: New data on receiving mechanics and injury risk
- Advanced Blocking Metrics: Better models of blockable vs. unblockable pitches
- Pitcher Pairing Optimization: Data-driven battery matching
Teams with sophisticated catcher development programs (Rays, Guardians, Cardinals) maintain competitive advantages through superior evaluation and player development.
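To get a rough sense of how much value hinges on the first trend above, the short what-if below reuses the synthetic catcher_performance tibble from the composite value model and recomputes WAR with framing runs zeroed out, approximating a full-ABS world. It is a sketch on made-up data, not a projection.
# R: Rough ABS what-if -- zero out framing and recompute WAR
# Assumes the synthetic catcher_performance tibble from the composite model above
abs_scenario <- catcher_performance %>%
  mutate(
    total_runs_abs = blocking_runs + throwing_runs + fielding_runs + batting_runs,
    war_abs = (total_runs_abs + positional_adj) / 10,
    war_change = war_abs - war
  ) %>%
  select(catcher, war, war_abs, war_change) %>%
  arrange(war_change)
print(abs_scenario)
In this synthetic data the framing-dependent profiles (the Barnhart and Hedges rows) lose the most value, which is exactly the sensitivity front offices are weighing as ABS approaches.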
Modern catcher analytics demands interactive visualizations that allow scouts, coaches, and analysts to explore multi-dimensional performance data dynamically. Static charts and tables, while informative, cannot capture the complexity of catcher performance across different pitch locations, game contexts, and time periods. This section demonstrates how to build interactive dashboards using Plotly and Shiny (R) or Dash (Python) to create powerful analytical tools for catcher evaluation.
Interactive visualizations offer several key advantages for catcher analytics:
- Dynamic Filtering: Users can instantly filter by date range, opponent, pitcher, or game situation to isolate specific performance contexts
- Multi-Dimensional Exploration: Hover interactions reveal detailed pitch-level information while maintaining overall pattern visibility
- Comparative Analysis: Side-by-side or overlapping visualizations enable direct catcher comparisons
- Real-Time Updates: Dashboards can be connected to live data sources for in-season performance tracking
- Exportable Insights: Interactive plots can be saved as HTML files for sharing with coaches and front office staff
Interactive Framing Heat Map
The framing heat map visualization shows strike probability by pitch location, allowing users to identify where each catcher excels or struggles with pitch receiving. Interactive features include location-specific statistics, zone overlays, and catcher comparison modes.
# R: Interactive Framing Heat Map with Plotly
library(plotly)
library(tidyverse)
library(htmlwidgets)
# Prepare framing data with location bins
create_framing_heatmap_data <- function(pitch_data, catcher_name) {
pitch_data %>%
filter(catcher == catcher_name) %>%
mutate(
# Create location bins (20x20 grid)
x_bin = cut(plate_x, breaks = seq(-2, 2, length.out = 21),
labels = FALSE, include.lowest = TRUE),
z_bin = cut(plate_z, breaks = seq(0, 5, length.out = 21),
labels = FALSE, include.lowest = TRUE),
# Calculate bin centers
x_center = seq(-1.9, 1.9, length.out = 20)[x_bin],
z_center = seq(0.125, 4.875, length.out = 20)[z_bin]
) %>%
filter(!is.na(x_bin), !is.na(z_bin)) %>%  # drop the few pitches outside the plotted window
group_by(x_center, z_center) %>%
summarise(
pitches = n(),
actual_strike_rate = mean(called_strike, na.rm = TRUE),
expected_strike_rate = mean(expected_strike, na.rm = TRUE),
framing_value = actual_strike_rate - expected_strike_rate,
.groups = "drop"
) %>%
filter(pitches >= 5) # Minimum sample size per bin
}
# Create interactive heat map
create_interactive_framing_heatmap <- function(pitch_data, catcher_name) {
heatmap_data <- create_framing_heatmap_data(pitch_data, catcher_name)
# Create matrix for heatmap
x_coords <- sort(unique(heatmap_data$x_center))
z_coords <- sort(unique(heatmap_data$z_center))
framing_matrix <- matrix(NA, nrow = length(z_coords), ncol = length(x_coords))
for (i in seq_len(nrow(heatmap_data))) {
row_idx <- which(z_coords == heatmap_data$z_center[i])
col_idx <- which(x_coords == heatmap_data$x_center[i])
framing_matrix[row_idx, col_idx] <- heatmap_data$framing_value[i]
}
# Create custom hover text matrix
hover_matrix <- matrix("", nrow = length(z_coords), ncol = length(x_coords))
for (i in seq_len(nrow(heatmap_data))) {
row_idx <- which(z_coords == heatmap_data$z_center[i])
col_idx <- which(x_coords == heatmap_data$x_center[i])
hover_matrix[row_idx, col_idx] <- paste0(
"Location: (", round(heatmap_data$x_center[i], 2), ", ",
round(heatmap_data$z_center[i], 2), ")<br>",
"Pitches: ", heatmap_data$pitches[i], "<br>",
"Actual Strike %: ", round(heatmap_data$actual_strike_rate[i] * 100, 1), "%<br>",
"Expected Strike %: ", round(heatmap_data$expected_strike_rate[i] * 100, 1), "%<br>",
"Framing Value: ", round(heatmap_data$framing_value[i] * 100, 1), "%"
)
}
# Create plotly heatmap
fig <- plot_ly(
x = x_coords,
y = z_coords,
z = framing_matrix,
type = "heatmap",
colorscale = list(
c(0, "rgb(220, 50, 50)"), # Red for negative
c(0.5, "rgb(255, 255, 255)"), # White for neutral
c(1, "rgb(50, 50, 220)") # Blue for positive
),
zmid = 0,
zmin = -0.15,
zmax = 0.15,
colorbar = list(title = "Framing<br>Value"),
hovertemplate = paste0(
"%{text}<extra></extra>"
),
text = hover_matrix
) %>%
# Add strike zone rectangle
add_segments(
x = -0.708, xend = 0.708, y = 1.5, yend = 1.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
add_segments(
x = -0.708, xend = 0.708, y = 3.5, yend = 3.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
add_segments(
x = -0.708, xend = -0.708, y = 1.5, yend = 3.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
add_segments(
x = 0.708, xend = 0.708, y = 1.5, yend = 3.5,
line = list(color = "black", width = 3),
showlegend = FALSE, inherit = FALSE
) %>%
layout(
title = list(
text = paste0("<b>", catcher_name, " - Pitch Framing Heat Map</b>"),
x = 0.5,
xanchor = "center"
),
xaxis = list(
title = "Horizontal Location (feet)",
range = c(-2, 2),
constrain = "domain"
),
yaxis = list(
title = "Vertical Location (feet)",
range = c(0, 5),
scaleanchor = "x",
scaleratio = 1
),
plot_bgcolor = "rgb(240, 240, 240)",
paper_bgcolor = "white"
) %>%
config(displayModeBar = TRUE,
modeBarButtonsToRemove = c("lasso2d", "select2d"))
return(fig)
}
# Example usage with sample data
set.seed(2023)
sample_pitch_data <- tibble(
catcher = sample(c("Realmuto", "Barnhart", "Perez"), 10000, replace = TRUE),
plate_x = rnorm(10000, 0, 0.8),
plate_z = rnorm(10000, 2.5, 0.6),
dist_from_center = sqrt(plate_x^2 + (plate_z - 2.5)^2),
strike_prob = plogis(2 - 2.5 * dist_from_center),
catcher_effect = case_when(
catcher == "Barnhart" ~ 0.4,
catcher == "Realmuto" ~ 0.3,
catcher == "Perez" ~ -0.2
),
expected_strike = strike_prob,
called_strike = rbinom(10000, 1, plogis(qlogis(strike_prob) + catcher_effect))
)
# Create and display interactive heatmap
framing_heatmap <- create_interactive_framing_heatmap(sample_pitch_data, "Barnhart")
framing_heatmap
# Save as HTML file
htmlwidgets::saveWidget(framing_heatmap, "framing_heatmap.html",
selfcontained = TRUE)
# Python: Interactive Framing Heat Map with Plotly
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy.stats import binned_statistic_2d
def create_framing_heatmap_data(pitch_data, catcher_name):
"""Prepare binned framing data for heatmap visualization"""
catcher_data = pitch_data[pitch_data['catcher'] == catcher_name].copy()
# Create 20x20 grid
x_bins = np.linspace(-2, 2, 21)
z_bins = np.linspace(0, 5, 21)
# Calculate statistics for each bin
actual_strikes, x_edges, z_edges, _ = binned_statistic_2d(
catcher_data['plate_x'], catcher_data['plate_z'],
catcher_data['called_strike'],
statistic='mean', bins=[x_bins, z_bins]
)
expected_strikes, _, _, _ = binned_statistic_2d(
catcher_data['plate_x'], catcher_data['plate_z'],
catcher_data['expected_strike'],
statistic='mean', bins=[x_bins, z_bins]
)
pitch_counts, _, _, _ = binned_statistic_2d(
catcher_data['plate_x'], catcher_data['plate_z'],
catcher_data['called_strike'],
statistic='count', bins=[x_bins, z_bins]
)
# Calculate framing value
framing_value = actual_strikes - expected_strikes
# Mask bins with insufficient data
framing_value[pitch_counts < 5] = np.nan
actual_strikes[pitch_counts < 5] = np.nan
expected_strikes[pitch_counts < 5] = np.nan
    # binned_statistic_2d returns arrays indexed (x, z); transpose so rows index z,
    # which is the orientation Plotly expects for heatmap z values (z[row = y, col = x])
    return {
        'framing_value': framing_value.T,
        'actual_strikes': actual_strikes.T,
        'expected_strikes': expected_strikes.T,
        'pitch_counts': pitch_counts.T,
        'x_centers': (x_edges[:-1] + x_edges[1:]) / 2,
        'z_centers': (z_edges[:-1] + z_edges[1:]) / 2
    }
def create_interactive_framing_heatmap(pitch_data, catcher_name):
"""Create interactive Plotly heatmap for pitch framing"""
data = create_framing_heatmap_data(pitch_data, catcher_name)
# Create custom hover text
hover_text = []
for i in range(len(data['z_centers'])):
hover_row = []
for j in range(len(data['x_centers'])):
if np.isnan(data['framing_value'][i, j]):
hover_row.append('')
else:
text = (
f"Location: ({data['x_centers'][j]:.2f}, {data['z_centers'][i]:.2f})<br>"
f"Pitches: {int(data['pitch_counts'][i, j])}<br>"
f"Actual Strike %: {data['actual_strikes'][i, j]*100:.1f}%<br>"
f"Expected Strike %: {data['expected_strikes'][i, j]*100:.1f}%<br>"
f"Framing Value: {data['framing_value'][i, j]*100:+.1f}%"
)
hover_row.append(text)
hover_text.append(hover_row)
# Create figure
fig = go.Figure()
# Add heatmap
fig.add_trace(go.Heatmap(
x=data['x_centers'],
y=data['z_centers'],
z=data['framing_value'],
colorscale=[
[0, 'rgb(220, 50, 50)'], # Red for negative
[0.5, 'rgb(255, 255, 255)'], # White for neutral
[1, 'rgb(50, 50, 220)'] # Blue for positive
],
zmid=0,
zmin=-0.15,
zmax=0.15,
colorbar=dict(title="Framing<br>Value"),
hovertemplate='%{text}<extra></extra>',
text=hover_text
))
# Add strike zone rectangle
zone_x = [-0.708, 0.708, 0.708, -0.708, -0.708]
zone_z = [1.5, 1.5, 3.5, 3.5, 1.5]
fig.add_trace(go.Scatter(
x=zone_x, y=zone_z,
mode='lines',
line=dict(color='black', width=3),
showlegend=False,
hoverinfo='skip'
))
# Update layout
fig.update_layout(
title=dict(
text=f"<b>{catcher_name} - Pitch Framing Heat Map</b>",
x=0.5,
xanchor='center',
font=dict(size=18)
),
xaxis=dict(
title="Horizontal Location (feet)",
range=[-2, 2],
constrain='domain'
),
yaxis=dict(
title="Vertical Location (feet)",
range=[0, 5],
scaleanchor="x",
scaleratio=1
),
plot_bgcolor='rgb(240, 240, 240)',
paper_bgcolor='white',
width=700,
height=700
)
return fig
# Example usage with sample data
np.random.seed(2023)
n = 10000
def inv_logit(x):
return 1 / (1 + np.exp(-x))
sample_pitch_data = pd.DataFrame({
'catcher': np.random.choice(['Realmuto', 'Barnhart', 'Perez'], n),
'plate_x': np.random.normal(0, 0.8, n),
'plate_z': np.random.normal(2.5, 0.6, n)
})
sample_pitch_data['dist_from_center'] = np.sqrt(
sample_pitch_data['plate_x']**2 +
(sample_pitch_data['plate_z'] - 2.5)**2
)
catcher_effects = {'Barnhart': 0.4, 'Realmuto': 0.3, 'Perez': -0.2}
sample_pitch_data['catcher_effect'] = sample_pitch_data['catcher'].map(catcher_effects)
base_logit = 2 - 2.5 * sample_pitch_data['dist_from_center']
sample_pitch_data['expected_strike'] = inv_logit(base_logit)
strike_prob = inv_logit(base_logit + sample_pitch_data['catcher_effect'])
sample_pitch_data['called_strike'] = np.random.binomial(1, strike_prob)
# Create and display interactive heatmap
framing_heatmap = create_interactive_framing_heatmap(sample_pitch_data, 'Barnhart')
framing_heatmap.show()
# Save as HTML file
framing_heatmap.write_html("framing_heatmap.html")
Interactive Catcher Comparison Radar Chart
Radar charts excel at displaying multi-dimensional catcher performance, allowing quick visual comparison across framing, blocking, throwing, fielding, and batting contributions. Interactive radar charts add tooltips with exact values and make it easy to overlay and compare multiple catchers.
# R: Interactive Catcher Comparison Radar Chart
library(plotly)
create_catcher_radar_chart <- function(catcher_stats, catchers_to_compare) {
# Filter to selected catchers
plot_data <- catcher_stats %>%
filter(catcher %in% catchers_to_compare)
# Normalize metrics to 0-100 scale for radar chart
metrics <- c("framing_runs", "blocking_runs", "throwing_runs",
"fielding_runs", "batting_runs")
normalized_data <- plot_data %>%
mutate(across(all_of(metrics),
~scales::rescale(., to = c(0, 100),
from = range(catcher_stats[[cur_column()]]))))
# Create plotly radar chart
fig <- plot_ly(
type = 'scatterpolar',
fill = 'toself'
)
# Add trace for each catcher
for (i in seq_len(nrow(normalized_data))) {
catcher_row <- normalized_data[i, ]
fig <- fig %>%
add_trace(
r = c(catcher_row$framing_runs, catcher_row$blocking_runs,
catcher_row$throwing_runs, catcher_row$fielding_runs,
catcher_row$batting_runs),
theta = c('Framing', 'Blocking', 'Throwing', 'Fielding', 'Batting'),
name = catcher_row$catcher,
mode = 'lines+markers',
marker = list(size = 8),
line = list(width = 2),
hovertemplate = paste0(
"<b>%{theta}</b><br>",
"Percentile: %{r:.1f}<br>",
"<extra>", catcher_row$catcher, "</extra>"
)
)
}
fig <- fig %>%
layout(
polar = list(
radialaxis = list(
visible = TRUE,
range = c(0, 100),
ticktext = c("0", "25", "50", "75", "100"),
tickvals = c(0, 25, 50, 75, 100)
)
),
title = list(
text = "<b>Catcher Performance Comparison</b>",
x = 0.5,
xanchor = "center"
),
showlegend = TRUE,
legend = list(
orientation = "v",
x = 1.1,
y = 0.5
)
)
return(fig)
}
# Create sample catcher statistics
catcher_comparison_stats <- tibble(
catcher = c("Realmuto", "Rutschman", "Smith", "Barnhart", "Perez"),
framing_runs = c(15, 12, 10, 22, -8),
blocking_runs = c(3, 4, 2, 3, 1),
throwing_runs = c(8, 4, 2, 0, 6),
fielding_runs = c(2, 1, 1, 0, 1),
batting_runs = c(25, 20, 18, -10, 15)
)
# Create interactive radar chart
radar_chart <- create_catcher_radar_chart(
catcher_comparison_stats,
c("Realmuto", "Barnhart", "Perez")
)
radar_chart
# Save as HTML
htmlwidgets::saveWidget(radar_chart, "catcher_radar.html", selfcontained = TRUE)
# Python: Interactive Catcher Comparison Radar Chart
import plotly.graph_objects as go
import pandas as pd
import numpy as np
def create_catcher_radar_chart(catcher_stats, catchers_to_compare):
"""Create interactive radar chart comparing multiple catchers"""
# Filter to selected catchers
plot_data = catcher_stats[catcher_stats['catcher'].isin(catchers_to_compare)].copy()
# Metrics to display
metrics = ['framing_runs', 'blocking_runs', 'throwing_runs',
'fielding_runs', 'batting_runs']
metric_labels = ['Framing', 'Blocking', 'Throwing', 'Fielding', 'Batting']
# Normalize to 0-100 percentile scale
for metric in metrics:
min_val = catcher_stats[metric].min()
max_val = catcher_stats[metric].max()
plot_data[f'{metric}_norm'] = (
(plot_data[metric] - min_val) / (max_val - min_val) * 100
)
# Create figure
fig = go.Figure()
# Add trace for each catcher
colors = ['rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)',
'rgb(214, 39, 40)', 'rgb(148, 103, 189)']
for idx, (_, catcher_row) in enumerate(plot_data.iterrows()):
values = [catcher_row[f'{m}_norm'] for m in metrics]
actual_values = [catcher_row[m] for m in metrics]
# Create hover text with actual values
hover_text = [
f"<b>{label}</b><br>Percentile: {val:.1f}<br>Actual: {actual:+.1f} runs"
for label, val, actual in zip(metric_labels, values, actual_values)
]
fig.add_trace(go.Scatterpolar(
r=values,
theta=metric_labels,
fill='toself',
name=catcher_row['catcher'],
line=dict(color=colors[idx % len(colors)], width=2),
marker=dict(size=8),
hovertemplate='%{text}<extra>' + catcher_row['catcher'] + '</extra>',
text=hover_text
))
# Update layout
fig.update_layout(
polar=dict(
radialaxis=dict(
visible=True,
range=[0, 100],
ticktext=['0', '25', '50', '75', '100'],
tickvals=[0, 25, 50, 75, 100]
)
),
title=dict(
text="<b>Catcher Performance Comparison</b>",
x=0.5,
xanchor='center',
font=dict(size=18)
),
showlegend=True,
legend=dict(
orientation="v",
x=1.1,
y=0.5
),
width=800,
height=600
)
return fig
# Create sample catcher statistics
catcher_comparison_stats = pd.DataFrame({
'catcher': ['Realmuto', 'Rutschman', 'Smith', 'Barnhart', 'Perez'],
'framing_runs': [15, 12, 10, 22, -8],
'blocking_runs': [3, 4, 2, 3, 1],
'throwing_runs': [8, 4, 2, 0, 6],
'fielding_runs': [2, 1, 1, 0, 1],
'batting_runs': [25, 20, 18, -10, 15]
})
# Create interactive radar chart
radar_chart = create_catcher_radar_chart(
catcher_comparison_stats,
['Realmuto', 'Barnhart', 'Perez']
)
radar_chart.show()
# Save as HTML
radar_chart.write_html("catcher_radar.html")
Interactive Pop Time Distribution with Filtering
Pop time analysis benefits from interactive filtering that lets users examine performance under specific conditions: pitcher handedness, base, game situation, and temporal trends. The dashboards below combine histograms, box plots, and outcome scatter plots; the R version routes the data through crosstalk's SharedData so filter widgets can be layered on, and the Python version adds a caught-stealing rate panel by pop time range.
# R: Interactive Pop Time Analysis with Filtering
library(plotly)
library(crosstalk)
create_pop_time_dashboard <- function(throwing_data) {
# Create shared data for crosstalk filtering
shared_data <- SharedData$new(throwing_data)
# Create distribution plot
pop_time_hist <- plot_ly(shared_data, x = ~pop_time, color = ~catcher,
type = "histogram", alpha = 0.6, nbinsx = 30) %>%
layout(
title = "Pop Time Distribution",
xaxis = list(title = "Pop Time (seconds)"),
yaxis = list(title = "Count"),
barmode = "overlay"
)
# Create box plot by catcher
pop_time_box <- plot_ly(shared_data, y = ~pop_time, color = ~catcher,
type = "box") %>%
layout(
title = "Pop Time by Catcher",
yaxis = list(title = "Pop Time (seconds)"),
xaxis = list(title = "Catcher")
)
# Create scatter plot: pop time vs caught stealing
pop_time_scatter <- plot_ly(shared_data,
x = ~pop_time,
y = ~caught_stealing,
color = ~catcher,
type = "scatter",
mode = "markers",
marker = list(size = 6, opacity = 0.6),
text = ~paste(
"Catcher:", catcher, "<br>",
"Pop Time:", round(pop_time, 2), "s<br>",
"Result:", ifelse(caught_stealing, "CS", "SB"), "<br>",
"Runner Speed:", round(runner_speed, 1), "ft/s"
),
hoverinfo = "text") %>%
layout(
title = "Pop Time vs Outcome",
xaxis = list(title = "Pop Time (seconds)"),
yaxis = list(title = "Caught Stealing", tickvals = c(0, 1),
ticktext = c("Safe", "Out"))
)
# Combine plots using subplot
combined_plot <- subplot(
pop_time_hist,
pop_time_box,
pop_time_scatter,
nrows = 2,
heights = c(0.4, 0.6),
shareX = FALSE,
titleX = TRUE,
titleY = TRUE
) %>%
layout(
title = list(
text = "<b>Interactive Pop Time Analysis Dashboard</b>",
x = 0.5,
xanchor = "center"
),
showlegend = TRUE
)
return(combined_plot)
}
# Create sample throwing data
set.seed(2025)
n_attempts <- 1000
sample_throwing_data <- tibble(
catcher = sample(c("Realmuto", "Perez", "Rutschman"), n_attempts, replace = TRUE),
runner_speed = rnorm(n_attempts, 27, 1.5),
pitcher_time = rnorm(n_attempts, 1.3, 0.15),
pitcher_hand = sample(c("R", "L"), n_attempts, replace = TRUE)
) %>%
mutate(
pop_time = case_when(
catcher == "Realmuto" ~ rnorm(n(), 1.87, 0.08),
catcher == "Perez" ~ rnorm(n(), 1.92, 0.10),
catcher == "Rutschman" ~ rnorm(n(), 1.95, 0.09)
),
total_time_catcher = pitcher_time + pop_time,
    # Runner covers ~78 ft (90 ft minus a ~12 ft lead); the 0.2 s is a rough
    # jump/reaction allowance so simulated CS rates vary with pop time rather
    # than sitting near 100%
    total_time_runner = (90 - 12) / runner_speed + 0.2,
caught_stealing = as.numeric(
total_time_catcher < total_time_runner + rnorm(n(), 0, 0.1)
)
)
# Create dashboard
pop_time_dashboard <- create_pop_time_dashboard(sample_throwing_data)
pop_time_dashboard
# Save as HTML
htmlwidgets::saveWidget(pop_time_dashboard, "pop_time_dashboard.html",
selfcontained = TRUE)
# Python: Interactive Pop Time Analysis with Filtering
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np
def create_pop_time_dashboard(throwing_data):
"""Create interactive dashboard for pop time analysis"""
# Create figure with subplots
fig = make_subplots(
rows=2, cols=2,
subplot_titles=('Pop Time Distribution', 'Pop Time by Catcher',
'Pop Time vs Outcome', 'Success Rate by Pop Time Range'),
specs=[[{"type": "histogram"}, {"type": "box"}],
[{"type": "scatter"}, {"type": "bar"}]],
vertical_spacing=0.12,
horizontal_spacing=0.1
)
# Get unique catchers for color mapping
catchers = throwing_data['catcher'].unique()
colors = ['rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rgb(44, 160, 44)']
color_map = {catcher: colors[i % len(colors)] for i, catcher in enumerate(catchers)}
# 1. Histogram of pop times
for catcher in catchers:
data = throwing_data[throwing_data['catcher'] == catcher]
fig.add_trace(
go.Histogram(
x=data['pop_time'],
name=catcher,
marker_color=color_map[catcher],
opacity=0.6,
nbinsx=30,
showlegend=True
),
row=1, col=1
)
# 2. Box plot by catcher
for catcher in catchers:
data = throwing_data[throwing_data['catcher'] == catcher]
fig.add_trace(
go.Box(
y=data['pop_time'],
name=catcher,
marker_color=color_map[catcher],
showlegend=False
),
row=1, col=2
)
# 3. Scatter plot: pop time vs outcome
for catcher in catchers:
data = throwing_data[throwing_data['catcher'] == catcher]
hover_text = [
f"Catcher: {catcher}<br>Pop Time: {pt:.2f}s<br>"
f"Result: {'CS' if cs else 'SB'}<br>Runner Speed: {rs:.1f} ft/s"
for pt, cs, rs in zip(data['pop_time'], data['caught_stealing'],
data['runner_speed'])
]
fig.add_trace(
go.Scatter(
x=data['pop_time'],
y=data['caught_stealing'] + np.random.uniform(-0.05, 0.05, len(data)),
mode='markers',
name=catcher,
marker=dict(size=6, opacity=0.6, color=color_map[catcher]),
text=hover_text,
hovertemplate='%{text}<extra></extra>',
showlegend=False
),
row=2, col=1
)
# 4. Success rate by pop time bins
pop_time_bins = pd.cut(throwing_data['pop_time'],
bins=[0, 1.85, 1.90, 1.95, 2.00, 2.10, np.inf],
labels=['<1.85', '1.85-1.90', '1.90-1.95',
'1.95-2.00', '2.00-2.10', '>2.10'])
throwing_data['pop_time_bin'] = pop_time_bins
bin_stats = throwing_data.groupby('pop_time_bin')['caught_stealing'].agg([
('count', 'count'),
('cs_rate', 'mean')
]).reset_index()
fig.add_trace(
go.Bar(
x=bin_stats['pop_time_bin'].astype(str),
y=bin_stats['cs_rate'] * 100,
text=[f"{rate:.1f}%" for rate in bin_stats['cs_rate'] * 100],
textposition='outside',
marker_color='rgb(55, 83, 109)',
showlegend=False
),
row=2, col=2
)
# Update axes
fig.update_xaxes(title_text="Pop Time (seconds)", row=1, col=1)
fig.update_xaxes(title_text="Catcher", row=1, col=2)
fig.update_xaxes(title_text="Pop Time (seconds)", row=2, col=1)
fig.update_xaxes(title_text="Pop Time Range", row=2, col=2)
fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_yaxes(title_text="Pop Time (seconds)", row=1, col=2)
fig.update_yaxes(title_text="Outcome (0=SB, 1=CS)", row=2, col=1)
fig.update_yaxes(title_text="CS Rate (%)", row=2, col=2)
# Update layout
fig.update_layout(
title=dict(
text="<b>Interactive Pop Time Analysis Dashboard</b>",
x=0.5,
xanchor='center',
font=dict(size=18)
),
height=800,
width=1200,
showlegend=True,
legend=dict(x=1.02, y=0.98),
barmode='overlay'
)
return fig
# Create sample throwing data
np.random.seed(2025)
n_attempts = 1000
pop_time_map = {
'Realmuto': (1.87, 0.08),
'Perez': (1.92, 0.10),
'Rutschman': (1.95, 0.09)
}
sample_throwing_data = pd.DataFrame({
'catcher': np.random.choice(['Realmuto', 'Perez', 'Rutschman'], n_attempts),
'runner_speed': np.random.normal(27, 1.5, n_attempts),
'pitcher_time': np.random.normal(1.3, 0.15, n_attempts),
'pitcher_hand': np.random.choice(['R', 'L'], n_attempts)
})
sample_throwing_data['pop_time'] = sample_throwing_data['catcher'].apply(
lambda x: np.random.normal(pop_time_map[x][0], pop_time_map[x][1])
)
sample_throwing_data['total_time_catcher'] = (
sample_throwing_data['pitcher_time'] + sample_throwing_data['pop_time']
)
# Runner covers ~78 ft (90 ft minus a ~12 ft lead); the 0.2 s is a rough
# jump/reaction allowance so simulated CS rates vary with pop time rather
# than sitting near 100%
sample_throwing_data['total_time_runner'] = (
    (90 - 12) / sample_throwing_data['runner_speed'] + 0.2
)
sample_throwing_data['caught_stealing'] = (
sample_throwing_data['total_time_catcher'] <
sample_throwing_data['total_time_runner'] + np.random.normal(0, 0.1, n_attempts)
).astype(int)
# Create and display dashboard
pop_time_dashboard = create_pop_time_dashboard(sample_throwing_data)
pop_time_dashboard.show()
# Save as HTML
pop_time_dashboard.write_html("pop_time_dashboard.html")
These interactive visualizations transform static catcher analytics into dynamic exploration tools. The framing heat map reveals location-specific receiving skill, the radar chart enables multi-dimensional comparisons, and the pop time dashboard links distributions, catcher-level summaries, and stolen-base outcomes in a single linked view. Incorporating these elements into scouting reports and front office presentations helps teams make better-informed decisions about catcher acquisition, development, and deployment, and exporting the charts as standalone HTML files means the insights can be shared across an organization without specialized software or programming knowledge.
Exercise 1: Build a Pitch Framing Model
Using the provided pitch-level data (or simulated data), build a logistic regression model to predict called strikes based on pitch location, count, and other factors. Then calculate framing runs for different catchers.
Tasks:
a) Build a baseline called strike probability model using pitch location (plate_x, plate_z) and count variables
b) Calculate expected strikes vs. actual strikes for each catcher
c) Convert the difference to framing runs (use 0.125 runs per extra strike)
d) Identify which zones (in-zone, edge, out-of-zone) show the largest framing effects
e) Visualize framing value by location using heatmaps for the top 3 and bottom 3 framers
Extension: Incorporate umpire effects by building separate models for different umpires or umpire types (wide/tight zones).
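One possible starting point is sketched below in Python with statsmodels on simulated pitches; the column names (plate_x, plate_z, balls, strikes, catcher, called_strike), the simulated strike-probability surface, and the per-catcher effects are illustrative assumptions, not a prescribed solution.
# Python: Starter sketch for Exercise 1 (illustrative, simulated data)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 20000
df = pd.DataFrame({
    'catcher': rng.choice(['A', 'B', 'C'], n),
    'plate_x': rng.normal(0, 0.8, n),
    'plate_z': rng.normal(2.5, 0.6, n),
    'balls': rng.integers(0, 4, n),
    'strikes': rng.integers(0, 3, n),
})
dist = np.sqrt(df['plate_x']**2 + (df['plate_z'] - 2.5)**2)
effect = df['catcher'].map({'A': 0.3, 'B': 0.0, 'C': -0.2})  # assumed framing effects
df['called_strike'] = rng.binomial(1, 1 / (1 + np.exp(-(2 - 2.5 * dist + effect))))

# (a) Baseline called-strike model from location and count only (no catcher term),
#     so every pitch gets a catcher-neutral expected strike probability
baseline = smf.logit(
    'called_strike ~ plate_x + I(plate_x**2) + plate_z + I(plate_z**2) + balls + strikes',
    data=df
).fit(disp=0)
df['expected_strike'] = baseline.predict(df)

# (b)-(c) Extra strikes above expectation per catcher, converted to runs
#         at the 0.125 runs per extra strike suggested in the exercise
df['extra_strike'] = df['called_strike'] - df['expected_strike']
framing = df.groupby('catcher')['extra_strike'].sum().to_frame('extra_strikes')
framing['framing_runs'] = framing['extra_strikes'] * 0.125
print(framing)
With real pitch-level data, the same skeleton extends naturally to zone splits (d) and per-catcher location heatmaps (e).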
Exercise 2: Blocking Analysis and Difficulty Adjustment
Analyze blocking performance while accounting for pitch difficulty.
Tasks:
a) Create a difficulty metric based on pitch location (distance from home plate, horizontal location, height)
b) Build a model predicting block probability based on difficulty factors
c) Calculate blocks above expected for different catchers
d) Determine which catchers excel on "hard" blocks vs. "easy" blocks
e) Estimate the run value of blocking ability (assume each failed block on a pitch with runners on costs 0.3 runs)
Extension: Analyze whether blocking ability degrades over the course of a game (late innings) or season (fatigue effects).
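A minimal sketch of the difficulty-adjusted approach is shown below, again on simulated data; the difficulty formula, column names, and catcher skill values are assumptions chosen only to make the example run.
# Python: Starter sketch for Exercise 2 (illustrative, simulated data)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 8000
dirt = pd.DataFrame({
    'catcher': rng.choice(['A', 'B', 'C'], n),
    'plate_x': rng.normal(0, 1.0, n),
    'plate_z': rng.uniform(-0.5, 1.0, n),  # pitches in or near the dirt
})
# (a) A simple difficulty metric: wider and lower pitches are harder to block
#     (this functional form is an assumption, not an established metric)
dirt['difficulty'] = np.abs(dirt['plate_x']) + np.clip(0.5 - dirt['plate_z'], 0, None)
skill = dirt['catcher'].map({'A': 0.6, 'B': 0.0, 'C': -0.4})  # assumed blocking skill
dirt['blocked'] = rng.binomial(1, 1 / (1 + np.exp(-(2.0 - 1.5 * dirt['difficulty'] + skill))))

# (b) Catcher-neutral block probability model driven by difficulty
block_model = smf.logit('blocked ~ difficulty', data=dirt).fit(disp=0)
dirt['expected_block'] = block_model.predict(dirt)

# (c)-(e) Blocks above expected and a rough run value at 0.3 runs per failed
#         block, per the exercise (here every pitch is treated as runners-on)
dirt['blocks_above_expected'] = dirt['blocked'] - dirt['expected_block']
summary = dirt.groupby('catcher')['blocks_above_expected'].sum().to_frame('blocks_above_expected')
summary['blocking_runs'] = summary['blocks_above_expected'] * 0.3
print(summary)
Splitting the same comparison by a difficulty threshold answers part (d), separating "hard" from "easy" blocks.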
Exercise 3: Throwing Value and Deterrence Effects
Examine both direct throwing value (CS) and indirect deterrence value.
Tasks:
a) Build a model predicting caught stealing probability based on runner speed, pitcher delivery time, and lead distance
b) Calculate CS above expected for catchers, accounting for quality of baserunners faced
c) Analyze stolen base attempt rates faced by different catchers (controlling for pitcher and game situation)
d) Estimate the deterrence value: how many stolen base attempts are prevented by elite arms?
e) Combine direct CS value and deterrence value into a comprehensive throwing runs metric
Extension: Investigate whether certain catchers are more effective against specific types of runners (speed, steal success rate).
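A starter sketch for parts (a) and (b) appears below; the predictor coefficients, the ~0.6 run swing per marginal caught steal, and the column names are illustrative assumptions, and the deterrence steps (c)-(e) are outlined only in comments.
# Python: Starter sketch for Exercise 3 (illustrative, simulated data)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(13)
n = 5000
sb = pd.DataFrame({
    'catcher': rng.choice(['A', 'B', 'C'], n),
    'runner_speed': rng.normal(27, 1.5, n),     # ft/s sprint speed
    'pitcher_time': rng.normal(1.35, 0.12, n),  # delivery to the plate, seconds
    'lead_distance': rng.normal(11.5, 1.0, n),  # feet
})
arm = sb['catcher'].map({'A': 0.5, 'B': 0.0, 'C': -0.4})  # assumed arm effects
logit_cs = (
    -1.0
    - 0.3 * (sb['runner_speed'] - 27)
    + 2.0 * (sb['pitcher_time'] - 1.35)
    - 0.15 * (sb['lead_distance'] - 11.5)
    + arm
)
sb['caught_stealing'] = rng.binomial(1, 1 / (1 + np.exp(-logit_cs)))

# (a)-(b) Catcher-neutral CS probability model, then CS above expected per catcher
cs_model = smf.logit(
    'caught_stealing ~ runner_speed + pitcher_time + lead_distance', data=sb
).fit(disp=0)
sb['expected_cs'] = cs_model.predict(sb)
sb['cs_above_expected'] = sb['caught_stealing'] - sb['expected_cs']
direct = sb.groupby('catcher')['cs_above_expected'].sum().to_frame('cs_above_expected')
# Assumed ~0.6 run swing between an SB allowed and a runner thrown out
direct['direct_throwing_runs'] = direct['cs_above_expected'] * 0.6
# (c)-(e) With real data: compare attempt rates faced by each catcher with the
#         league rate (controlling for pitcher and game state), credit prevented
#         attempts at the break-even run value, and add to the direct value above.
print(direct)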
Exercise 4: Comprehensive Catcher Value Model
Build a complete catcher value model integrating all defensive components.
Tasks:
a) Combine framing runs, blocking runs, throwing runs, and fielding runs into a composite defensive value metric
b) Add offensive value (batting runs above average) to create total value
c) Convert total value to WAR using a runs-to-wins converter (typically 10 runs per win)
d) Compare your catcher WAR estimates to public metrics (FanGraphs, Baseball Reference)
e) Identify which catchers provide the most value relative to their salary (create a value-per-dollar metric)
Extension: Build a playing time optimizer that determines the optimal distribution of innings between multiple catchers on a roster, considering both performance and fatigue/injury risk.
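The bookkeeping in parts (a)-(c) and (e) can be compact, as in the sketch below; every number is a made-up input, and the conversion omits the replacement-level and positional adjustments a full WAR calculation would include.
# Python: Starter sketch for Exercise 4 (illustrative numbers only)
import pandas as pd

catchers = pd.DataFrame({
    'catcher': ['A', 'B', 'C'],
    'framing_runs': [18, 5, -6],
    'blocking_runs': [4, 2, 1],
    'throwing_runs': [6, 1, 3],
    'fielding_runs': [1, 0, 1],
    'batting_runs': [-5, 12, 20],
    'salary_millions': [8.0, 12.0, 18.0],
})
components = ['framing_runs', 'blocking_runs', 'throwing_runs',
              'fielding_runs', 'batting_runs']
# (a)-(b) Composite defensive plus offensive value in runs
catchers['total_runs'] = catchers[components].sum(axis=1)
# (c) Runs-to-wins conversion at ~10 runs per win, as suggested in the exercise
catchers['war'] = catchers['total_runs'] / 10
# (e) Value per dollar, assuming roughly $8M per win on the free agent market
catchers['market_value_millions'] = catchers['war'] * 8
catchers['surplus_millions'] = catchers['market_value_millions'] - catchers['salary_millions']
print(catchers[['catcher', 'total_runs', 'war',
                'market_value_millions', 'surplus_millions']])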
Summary
Catcher evaluation has evolved dramatically over the past two decades, shifting from a focus on offensive production to a framework in which defensive skills, particularly pitch framing, drive valuation. Modern catcher metrics quantify:
- Pitch Framing: Elite framers add 15-25 runs per season through receiving
- Blocking: Top blockers save 5-10 runs by preventing wild pitches and passed balls
- Throwing: Elite arms add 5-10 runs through caught stealing and deterrence
- Game Calling: The least quantifiable but still valued skill in pitcher management
The integration of these components into comprehensive value models allows teams to properly value defensive specialists like Tucker Barnhart alongside offensive-minded catchers like Salvador Perez. The position continues to evolve with technological changes (PitchCom, potential automated strike zones) and rule modifications (larger bases, pickoff restrictions) that alter the relative importance of different skills.
Understanding catcher analytics requires both quantitative rigor in measurement and appreciation for the subtleties of receiving technique, pitcher relationships, and game management that resist easy quantification. Teams that excel at identifying, developing, and deploying catcher talent gain meaningful competitive advantages in player acquisition and roster construction.
References and Further Reading
- Fast, M. (2011). "Spinning Yarn: The Art of Pitching." The Hardball Times Baseball Annual
- Judge, J., Pavlidis, H., & Brooks, D. (2015). "Moving Beyond WOWY: A Mixed Approach to Measuring Catcher Framing." Baseball Prospectus
- Turkenkopf, M. (2008). "Evaluating Catchers: Framing Pitches." The Hardball Times
- Lindbergh, B. & Miller, S. (2016). The Only Rule Is It Has to Work. Chapter on catcher framing
- Mills, B. & Braun, S. (2019). "The Effect of Pitch Framing on the Strike Zone." Journal of Sports Analytics
Data Sources:
- Statcast (Baseball Savant): Pop time, arm strength, framing metrics
- FanGraphs: Comprehensive catcher defense metrics
- Baseball Prospectus: Framing runs, blocking runs, FRAA
- Baseball Reference: Traditional catcher statistics and WAR