Chapter 8: Fielding & Baserunning Analytics

8.1 The Challenge of Measuring Defense

8.1.1 Why Defense Is Hard to Quantify {#defense-challenge}

Defense presents unique measurement challenges that make it far more difficult to evaluate than offense or pitching:

Limited sample sizes: Even full-time players get 200-400 defensive chances per season, compared to 600-700 plate appearances. A shortstop might get three difficult ground balls in one game, none in the next. This variance makes defensive metrics noisier than offensive metrics—even a full season might not provide enough data for confident conclusions about true talent.
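
A rough way to see the size of this effect is the binomial standard error of an observed success rate. The sketch below assumes a hypothetical true rate of 0.5 for both samples so that only the number of opportunities differs; the 300 and 650 counts are midpoints of the ranges above, and `rate_standard_error` is an illustrative helper, not part of any library.

```python
import math


def rate_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an observed success rate over n trials."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical true success rate, shared by both samples so that only
# the number of opportunities differs.
p = 0.5
se_defense = rate_standard_error(p, 300)  # roughly 300 defensive chances
se_offense = rate_standard_error(p, 650)  # roughly 650 plate appearances

print(f"SE over 300 chances: {se_defense:.4f}")
print(f"SE over 650 PA:      {se_offense:.4f}")
print(f"Noise ratio:         {se_defense / se_offense:.2f}")
```

With the same underlying rate, the 300-chance sample is about 1.5 times as noisy as the 650-PA sample, which is one reason a single season of defensive data supports weaker conclusions.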

Context dependency: Defensive opportunities depend heavily on factors beyond the fielder's control. A pitcher who induces many ground balls gives his infielders more chances, which can inflate their counting-based defensive metrics. A fly-ball pitcher does the same for his outfielders. Park dimensions matter too: Fenway Park's Green Monster creates defensive challenges that don't exist in more conventional ballparks.

Team effects: Unlike hitting where each plate appearance is largely independent, defense involves coordination. A slow first baseman might make the second baseman's job harder by requiring wider throws. A catcher's pitch framing affects how umpires call pitches for all the team's pitchers. Disentangling individual contributions from team effects requires sophisticated modeling.

Opportunity variation: Not all defensive chances are created equal. A ground ball hit 70 mph directly at a fielder is easy; one hit 100 mph in the 4-hole requires elite reaction time and range. Traditional statistics like fielding percentage treated all chances equally: a routine play and a spectacular diving catch both counted as a single out.

The counterfactual problem: To evaluate defense, we need to know what would have happened with an average defender at that position. If a shortstop makes a diving stop on a ball up the middle, should that be worth +0.75 outs because an average shortstop would have let it through 75% of the time? We can't observe the counterfactual directly; we only see what this specific defender did, so the average-defender baseline has to be estimated from comparable plays.

8.1.2 Evolution of Defensive Metrics {#defense-evolution}

Baseball's attempts to quantify defense have evolved through several generations:

Traditional Era (1900s-1990s): Defense was measured almost exclusively by errors and fielding percentage (chances handled cleanly / total chances). These metrics were deeply flawed. They penalized defenders with great range (who got to balls others couldn't reach, creating more error opportunities) and credited defenders with poor range (who never touched balls they couldn't reach). A shortstop who made 10 errors but reached 50 balls others couldn't was rated worse than one who made 5 errors but only reached routine plays.
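
The arithmetic behind that comparison can be sketched as follows. Only the error counts and the 50 extra balls reached come from the example above; the 400 routine plays per shortstop are an assumed baseline, and the sketch assumes each of the 50 extra balls was converted into an out.

```python
def fielding_percentage(clean_chances: int, errors: int) -> float:
    """Chances handled cleanly divided by total chances."""
    return clean_chances / (clean_chances + errors)


# Assumed baseline: both shortstops convert the same 400 routine plays.
routine_plays = 400

# Rangy shortstop: 50 extra balls reached and converted, 10 errors.
rangy_clean, rangy_errors = routine_plays + 50, 10
# Limited shortstop: routine plays only, 5 errors.
limited_clean, limited_errors = routine_plays, 5

rangy_pct = fielding_percentage(rangy_clean, rangy_errors)
limited_pct = fielding_percentage(limited_clean, limited_errors)

print(f"Rangy:   {rangy_clean} outs, fielding pct {rangy_pct:.3f}")
print(f"Limited: {limited_clean} outs, fielding pct {limited_pct:.3f}")
```

The rangier shortstop records 50 more outs yet posts the lower fielding percentage, which is exactly the distortion described above.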

Zone Rating Era (1990s-2000s): STATS, Inc. introduced zone rating, which divided the field into zones and measured how often fielders made plays on balls hit into their zone. It was one of the first widely used metrics to measure range directly: players were credited for making plays, not just penalized for errors. However, zone definitions were somewhat arbitrary, and the metric didn't account for batted ball velocity or exact positioning.

UZR/DRS Era (2000s-2010s): Ultimate Zone Rating (UZR) and Defensive Runs Saved (DRS) represented major advances. Both metrics:

Divided the field into much finer zones

Accounted for batted ball type (ground ball, line drive, fly ball)

Considered game situation (runners on base, shift positioning)

Converted defensive plays to runs saved/cost relative to average

These metrics became the gold standard for defensive evaluation and remain widely used. We'll examine them in detail later in this chapter.

Statcast Era (2015-present): MLB's Statcast system uses high-speed cameras and radar to track every player's position and movement at 30 frames per second. This enables unprecedented precision in defensive measurement:

Exact positioning when the ball is hit

Route efficiency to the ball

Sprint speed and burst

Jump/reaction time

Catch probability based on distance, hang time, and direction

Arm strength and exchange time for throws

Statcast's Outs Above Average (OAA) has become the premier defensive metric because it's based on objective tracking data rather than subjective zone assignments.

8.1.3 What Good Defensive Metrics Should Measure {#good-metrics}

An ideal defensive metric should:

Account for opportunity: Credit fielders for attempting difficult plays, not just making routine ones
Adjust for context: Consider ballpark, pitcher tendencies, and positioning
Be repeatable: Capture true talent rather than random variance, as reflected in strong year-to-year correlation
Translate to runs: Express defensive value in runs saved/cost for easy comparison to offense
Be granular: Break down into components (range, arm, hands) so we understand what drives value
Update with information: Incorporate new tracking data as it becomes available

Statcast-era metrics meet these criteria better than any previous system, though imperfections remain. Sample size issues persist, and certain aspects of defense (positioning, communication, relay throws) remain difficult to fully capture.

8.2 Outs Above Average (OAA)

Outs Above Average (OAA) is Statcast's flagship defensive metric. It estimates how many outs a player made above or below what an average fielder at their position would have made on the same set of batted balls.

8.2.1 Understanding OAA {#understanding-oaa}

OAA works by calculating the probability that an average fielder would successfully convert each batted ball into an out, then comparing actual results to these probabilities:

OAA = Σ (Actual Outcome - Expected Outcome)

For each batted ball:

If a fielder makes a play with 30% catch probability: +0.7 outs (1.0 actual - 0.3 expected)

If a fielder misses a play with 80% catch probability: -0.8 outs (0.0 actual - 0.8 expected)

If a fielder makes a routine play with 99% catch probability: +0.01 outs

Over a full season, these credits and debits accumulate. A player with +10 OAA made 10 more outs than an average fielder would have made on the same batted balls. A player with -5 OAA made 5 fewer outs.
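
The three plays above can be totaled in a few lines. This is a minimal sketch of the summation only; `outs_above_average` is an illustrative helper name, and the catch probabilities are taken as given rather than modeled.

```python
def outs_above_average(plays):
    """Sum of (actual outcome - expected outcome) over a list of plays.

    Each play is a (made_play, catch_probability) pair.
    """
    return sum((1.0 if made else 0.0) - prob for made, prob in plays)


# The three example plays from the text
plays = [
    (True, 0.30),   # difficult play made:   +0.70
    (False, 0.80),  # likely out missed:     -0.80
    (True, 0.99),   # routine play made:     +0.01
]

oaa = outs_above_average(plays)
print(f"OAA over these plays: {oaa:+.2f}")  # prints -0.09
```

One missed play that most fielders make wipes out the credit from a difficult catch, which is why consistency on routine chances matters as much as highlight plays.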

Why OAA is valuable:

Based on objective tracking data, not subjective scorer decisions

Accounts for difficulty of each play via catch probability

Updates continuously as more data accumulates

Breaks down by component (range, arm, blocking for catchers)

Publicly available for all players at Baseball Savant

Limitations of OAA:

Small sample sizes create noise (even full-season metrics have uncertainty)

Doesn't account for all defensive contributions (positioning, communication)

Catch probability models can struggle with extreme plays (diving catches, wall plays)

Positioning data isn't always perfect, affecting opportunity calculations

8.2.2 OAA Components {#oaa-components}

OAA breaks down into several components depending on position:

For outfielders:

Outs Above Average: Total outfield OAA (range + jumps + routes)

Arm component: Baserunner advancement prevented (not fully integrated into total OAA)

For infielders:

Outs Above Average: Total infield OAA (range + reaction)

Arm component: Evaluated separately for some positions

For catchers:

Framing runs: Extra strikes gained through receiving

Throwing runs: Preventing stolen bases

Blocking runs: Preventing wild pitches/passed balls

Let's examine how to calculate OAA-like metrics using Statcast data:

R Implementation:

library(tidyverse)
library(baseballr)

# Get Statcast data for outfield plays (example: July 2024)
outfield_plays <- statcast_search(
  start_date = "2024-07-01",
  end_date = "2024-07-31",
  player_type = "batter"
) %>%
  filter(
    !is.na(hit_distance_sc),
    !is.na(launch_speed),  # used below; an NA here would make sum() return NA
    !is.na(hit_location),
    !is.na(hc_x),
    !is.na(hc_y)
  )

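# Calculate catch probability based on hang time and distance
# This is simplified - actual Statcast model is more complex.
# The coefficients are illustrative, not fitted. Because hit_distance_sc is
# the ball's distance from home plate (not the distance the fielder ran),
# most probabilities land on the 0.01 floor; treat the output as a
# demonstration of the workflow only.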
calc_catch_probability <- function(hang_time, distance, direction) {
  # Baseline probability
  base_prob <- plogis(3 - 0.15 * distance - 0.5 * abs(direction) + 2 * hang_time)
  # Bound between 0 and 1
  pmax(0.01, pmin(0.99, base_prob))
}

# Process outfield catches
outfield_defense <- outfield_plays %>%
  filter(hit_location %in% c(7, 8, 9)) %>%  # Outfield positions
  mutate(
    was_caught = ifelse(events %in% c("field_out", "double_play",
                                       "triple_play", "sac_fly"), 1, 0),
    hang_time = launch_speed / 100,  # Simplified hang time calculation
    direction = hc_x - 125.42,  # Distance from center line
    catch_prob = calc_catch_probability(hang_time, hit_distance_sc, direction),
    outs_above_avg = was_caught - catch_prob
  )

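# Aggregate by fielder
# fielder_2 is the catcher; the outfielder who handled the ball is in
# fielder_7, fielder_8, or fielder_9, matching hit_location
player_oaa <- outfield_defense %>%
  mutate(fielder_id = case_when(
    hit_location == 7 ~ fielder_7,
    hit_location == 8 ~ fielder_8,
    hit_location == 9 ~ fielder_9
  )) %>%
  filter(!is.na(fielder_id)) %>%
  group_by(fielder_id) %>%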
  summarize(
    opportunities = n(),
    catches = sum(was_caught),
    expected_catches = sum(catch_prob),
    oaa = sum(outs_above_avg),
    .groups = "drop"
  ) %>%
  filter(opportunities >= 20) %>%  # Minimum sample size
  arrange(desc(oaa))

# Display top defenders
print(player_oaa %>% head(10))

# Visualize OAA distribution
ggplot(player_oaa, aes(x = oaa)) +
  geom_histogram(binwidth = 0.5, fill = "#2b8cbe", color = "white", alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red", linewidth = 1) +
  labs(
    title = "Distribution of Outfielder OAA - July 2024",
    subtitle = "One month of data shows wide variance in defensive value",
    x = "Outs Above Average (OAA)",
    y = "Number of Players",
    caption = "Data: MLB Statcast via baseballr\nNote: Simplified catch probability model"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Python Implementation:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pybaseball import statcast
from scipy.special import expit  # Logistic function

# Set style
sns.set_style("whitegrid")

# Get Statcast data for July 2024
outfield_plays = statcast(start_dt='2024-07-01', end_dt='2024-07-31')

# Filter for valid outfield plays
outfield_plays = outfield_plays[
    outfield_plays['hit_distance_sc'].notna() &
    outfield_plays['hc_x'].notna() &
    outfield_plays['hc_y'].notna() &
    outfield_plays['hit_location'].isin([7, 8, 9])  # Outfield
].copy()

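# Calculate catch probability (simplified model)
# Coefficients are illustrative, not fitted. hit_distance_sc is the ball's
# distance from home plate, not the distance the fielder ran, so most
# probabilities land on the 0.01 floor; this demonstrates the workflow only.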
def calc_catch_probability(row):
    hang_time = row['launch_speed'] / 100  # Simplified
    distance = row['hit_distance_sc']
    direction = abs(row['hc_x'] - 125.42)  # Distance from center

    # Logistic regression model (simplified)
    logit = 3 - 0.15 * distance - 0.5 * direction + 2 * hang_time
    prob = expit(logit)  # Sigmoid function

    return np.clip(prob, 0.01, 0.99)

# Apply catch probability calculation
outfield_plays['catch_prob'] = outfield_plays.apply(calc_catch_probability, axis=1)

# Determine if ball was caught
caught_events = ['field_out', 'double_play', 'triple_play', 'sac_fly']
outfield_plays['was_caught'] = outfield_plays['events'].isin(caught_events).astype(int)

# Calculate outs above average
outfield_plays['outs_above_avg'] = outfield_plays['was_caught'] - outfield_plays['catch_prob']

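# Aggregate by fielder
# fielder_2 is the catcher; the outfielder is in fielder_7, fielder_8,
# or fielder_9, matching hit_location (already filtered to 7, 8, 9)
loc = outfield_plays['hit_location']
outfield_plays['fielder_id'] = outfield_plays['fielder_7'].where(
    loc == 7,
    outfield_plays['fielder_8'].where(loc == 8, outfield_plays['fielder_9'])
)

player_oaa = outfield_plays.groupby('fielder_id').agg(
    opportunities=('outs_above_avg', 'count'),
    catches=('was_caught', 'sum'),
    expected_catches=('catch_prob', 'sum'),
    oaa=('outs_above_avg', 'sum')
).reset_index()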

# Filter minimum sample size
player_oaa = player_oaa[player_oaa['opportunities'] >= 20].sort_values('oaa', ascending=False)

print("\nTop 10 Defenders by OAA (July 2024):")
print(player_oaa.head(10))

# Visualize OAA distribution
fig, ax = plt.subplots(figsize=(12, 6))

ax.hist(player_oaa['oaa'], bins=20, color='#2b8cbe', alpha=0.8, edgecolor='white')
ax.axvline(x=0, color='red', linestyle='--', linewidth=2, label='Average (0 OAA)')

ax.set_xlabel('Outs Above Average (OAA)', fontsize=12)
ax.set_ylabel('Number of Players', fontsize=12)
ax.set_title('Distribution of Outfielder OAA - July 2024\nOne month of data shows wide variance in defensive value',
             fontsize=14, fontweight='bold', pad=20)
ax.legend()

plt.figtext(0.99, 0.01, 'Data: MLB Statcast via pybaseball\nNote: Simplified catch probability model',
            ha='right', fontsize=9, style='italic')

plt.tight_layout()
plt.show()

# Calculate summary statistics
print(f"\nOAA Summary Statistics:")
print(f"Mean: {player_oaa['oaa'].mean():.2f}")
print(f"Median: {player_oaa['oaa'].median():.2f}")
print(f"Std Dev: {player_oaa['oaa'].std():.2f}")
print(f"Range: {player_oaa['oaa'].min():.2f} to {player_oaa['oaa'].max():.2f}")

8.2.3 Catch Probability Explained {#catch-probability}

Catch probability is the foundation of OAA. For each batted ball, Statcast estimates the likelihood an average defender would convert it to an out based on:

Distance to ball: How far the fielder must travel from their starting position
Hang time: How long the ball is in the air (for fly balls)
Direction: Whether the ball is directly in front, to the side, or behind
Initial velocity: How hard the ball was hit
Spin and trajectory: Ball flight characteristics

The actual Statcast model uses machine learning trained on hundreds of thousands of batted balls. For each play, it identifies similar plays from the historical database and calculates what percentage resulted in outs.
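
The "similar plays" idea can be sketched with a nearest-neighbor lookup. This is not Statcast's actual model: the historical plays below are synthetic, the rule generating them (outs become less likely as the feet per second the fielder must cover rises) is an assumption made for illustration, and `similar_play_catch_prob` is a hypothetical helper.

```python
import numpy as np


def similar_play_catch_prob(history, distance, hang_time, k=50):
    """Share of the k most similar historical plays that became outs.

    history: array with columns [distance_ft, hang_time_s, was_out].
    """
    features = history[:, :2]
    # Scale each feature so feet and seconds contribute comparably
    scale = features.std(axis=0)
    diffs = (features - np.array([distance, hang_time])) / scale
    nearest = np.argsort((diffs ** 2).sum(axis=1))[:k]
    return history[nearest, 2].mean()


# Synthetic "historical database" (assumed, for illustration only)
rng = np.random.default_rng(0)
n = 5000
dist = rng.uniform(10, 120, n)   # feet the fielder must cover
hang = rng.uniform(2.5, 6.0, n)  # seconds the ball is in the air
required_speed = dist / hang     # feet per second needed to arrive in time
p_out = 1 / (1 + np.exp((required_speed - 20) / 2))
was_out = (rng.random(n) < p_out).astype(float)
history = np.column_stack([dist, hang, was_out])

# Two hypothetical query plays
hard = similar_play_catch_prob(history, distance=90, hang_time=4.0)
easy = similar_play_catch_prob(history, distance=35, hang_time=5.0)
print(f"Long run, short hang time: {hard:.0%}")
print(f"Short run, long hang time: {easy:.0%}")
```

A production model would use more features (direction, wall proximity) and far more data, but the principle is the same: the estimate for a new play is the observed out rate among comparable past plays.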

Real example - Kevin Kiermaier diving catch (2019):

Distance to ball: 92 feet

Hang time: 4.2 seconds

Direction: Directly to right

Catch probability: 28%

Credit: +0.72 OAA

Real example - Byron Buxton routine fly (2023):

Distance to ball: 35 feet

Hang time: 5.1 seconds

Direction: Slightly to left

Catch probability: 97%

Credit: +0.03 OAA

The beauty of this system is that spectacular plays on low-probability balls are properly valued, while routine plays on high-probability balls receive minimal credit. This solves the long-standing problem of defenders getting equal credit for vastly different difficulty levels.

8.3 Route Efficiency and Jump

Beyond just making plays, how efficiently fielders reach batted balls matters enormously. Two outfielders might both catch the same fly ball, but one might take an optimal route while the other wastes precious seconds on a roundabout path.

8.3.1 Outfielder Routes and Metrics {#outfield-routes}

Route efficiency measures how direct a fielder's path was to the ball, expressed as a percentage:

Route Efficiency = (Direct Distance to Ball) / (Actual Distance Covered) × 100%

A perfectly direct route has 100% efficiency. A route that covers extra distance (due to poor reads, hesitation, or avoiding obstacles) scores lower. Elite outfielders consistently achieve 95%+ efficiency on routine flies.
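
The formula above can be applied directly to a tracked path. The sketch below assumes the fielder's positions are available as (x, y) coordinates in feet and treats the final position as the spot where the ball was caught; both paths are hypothetical.

```python
import math


def path_length(points):
    """Total length of a path given as a sequence of (x, y) positions in feet."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def route_efficiency(points):
    """Direct start-to-end distance over distance actually covered, in percent."""
    direct = math.dist(points[0], points[-1])
    return 100.0 * direct / path_length(points)


# Hypothetical tracked positions (feet), both ending at the same catch point
straight_route = [(0, 0), (30, 40), (60, 80)]
curved_route = [(0, 0), (40, 30), (60, 80)]

print(f"Straight route: {route_efficiency(straight_route):.1f}%")
print(f"Curved route:   {route_efficiency(curved_route):.1f}%")
```

The curved route covers only a few extra feet, yet that is enough to pull efficiency below 100%; on a ball with little hang time to spare, those feet can decide the play.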

Jump measures how quickly a fielder reacts to the batted ball, calculated as:

Jump = Distance Covered in First 3 Seconds After Contact

A good jump requires:

Quick reaction to ball off bat

Correct read of ball trajectory

Explosive first step

Proper positioning for the play

Real-world examples:

Mookie Betts (2023 average):

Route efficiency: 96.8% (elite)

Average jump: 11.2 feet

Sprint speed: 28.9 ft/s

OAA: +12 (excellent)

Kevin Kiermaier (career average):

Route efficiency: 97.1% (historically elite)

Average jump: 12.1 feet (exceptional)

Sprint speed: 28.2 ft/s

OAA: +67 from 2015-2022 (Gold Glove caliber)

Let's analyze route efficiency with code:

R Implementation:

library(tidyverse)
library(baseballr)

# Get player tracking data (conceptual - actual API varies)
# In practice, you'd use Baseball Savant's sprint speed and route data

# Simulated route data for illustration
set.seed(42)
route_data <- data.frame(
  player = rep(c("Mookie Betts", "Harrison Bader", "Randy Arozarena",
                  "Kyle Tucker", "Byron Buxton"), each = 50),
  play_id = 1:250
) %>%
  mutate(
    direct_distance = runif(250, 20, 100),
    # rnorm(n(), ...) draws one value per row; case_when() requires each
    # right-hand side to have length 1 or the full column length
    route_efficiency = case_when(
      player == "Mookie Betts" ~ rnorm(n(), 0.968, 0.02),
      player == "Harrison Bader" ~ rnorm(n(), 0.965, 0.025),
      player == "Byron Buxton" ~ rnorm(n(), 0.972, 0.018),
      player == "Randy Arozarena" ~ rnorm(n(), 0.955, 0.03),
      player == "Kyle Tucker" ~ rnorm(n(), 0.958, 0.028)
    ),
    route_efficiency = pmin(1.0, pmax(0.85, route_efficiency)),
    actual_distance = direct_distance / route_efficiency,
    extra_distance = actual_distance - direct_distance,
    jump_distance = rnorm(250, 11, 1.5)
  )

# Summarize by player
player_routes <- route_data %>%
  group_by(player) %>%
  summarize(
    plays = n(),
    avg_route_efficiency = mean(route_efficiency),
    avg_jump = mean(jump_distance),
    avg_direct_distance = mean(direct_distance),
    avg_extra_distance = mean(extra_distance),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_route_efficiency))

print(player_routes)

# Visualize route efficiency vs jump
ggplot(player_routes, aes(x = avg_route_efficiency, y = avg_jump)) +
  geom_point(size = 4, color = "#2b8cbe", alpha = 0.7) +
  geom_text(aes(label = player), vjust = -0.8, size = 3.5) +
  geom_vline(xintercept = mean(player_routes$avg_route_efficiency),
             linetype = "dashed", color = "gray50", alpha = 0.7) +
  geom_hline(yintercept = mean(player_routes$avg_jump),
             linetype = "dashed", color = "gray50", alpha = 0.7) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(
    title = "Outfielder Route Efficiency vs Jump",
    subtitle = "Top right quadrant shows elite combination of reads and reactions",
    x = "Average Route Efficiency (%)",
    y = "Average Jump (feet in first 3 seconds)",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    panel.grid.minor = element_blank()
  )

# Show distribution of route efficiency
ggplot(route_data, aes(x = route_efficiency, fill = player)) +
  geom_density(alpha = 0.5) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Route Efficiency Distribution by Player",
    subtitle = "Tighter distributions indicate more consistent routes",
    x = "Route Efficiency",
    y = "Density",
    fill = "Player"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )

Python Implementation:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)
sns.set_style("whitegrid")

# Simulate route data for top outfielders
players = ['Mookie Betts', 'Harrison Bader', 'Randy Arozarena', 'Kyle Tucker', 'Byron Buxton']
n_plays = 50

route_data = []
for player in players:
    # Generate player-specific route efficiency
    if player == 'Mookie Betts':
        efficiency = np.random.normal(0.968, 0.02, n_plays)
    elif player == 'Harrison Bader':
        efficiency = np.random.normal(0.965, 0.025, n_plays)
    elif player == 'Byron Buxton':
        efficiency = np.random.normal(0.972, 0.018, n_plays)
    elif player == 'Randy Arozarena':
        efficiency = np.random.normal(0.955, 0.03, n_plays)
    else:  # Kyle Tucker
        efficiency = np.random.normal(0.958, 0.028, n_plays)

    # Clip to realistic range
    efficiency = np.clip(efficiency, 0.85, 1.0)

    # Generate other metrics
    direct_distance = np.random.uniform(20, 100, n_plays)
    actual_distance = direct_distance / efficiency
    jump_distance = np.random.normal(11, 1.5, n_plays)

    for i in range(n_plays):
        route_data.append({
            'player': player,
            'play_id': len(route_data) + 1,
            'direct_distance': direct_distance[i],
            'route_efficiency': efficiency[i],
            'actual_distance': actual_distance[i],
            'extra_distance': actual_distance[i] - direct_distance[i],
            'jump_distance': jump_distance[i]
        })

route_df = pd.DataFrame(route_data)

# Summarize by player
player_routes = route_df.groupby('player').agg({
    'play_id': 'count',
    'route_efficiency': 'mean',
    'jump_distance': 'mean',
    'direct_distance': 'mean',
    'extra_distance': 'mean'
}).round(3)

player_routes.columns = ['plays', 'avg_route_efficiency', 'avg_jump',
                          'avg_direct_distance', 'avg_extra_distance']
player_routes = player_routes.sort_values('avg_route_efficiency', ascending=False)

print("\nPlayer Route Metrics:")
print(player_routes)

# Visualize route efficiency vs jump
fig, ax = plt.subplots(figsize=(12, 8))

for player in players:
    player_data = player_routes.loc[player]
    ax.scatter(player_data['avg_route_efficiency'], player_data['avg_jump'],
               s=150, alpha=0.7, label=player)
    ax.text(player_data['avg_route_efficiency'], player_data['avg_jump'] + 0.15,
            player, fontsize=9, ha='center')

# Add average lines
ax.axvline(player_routes['avg_route_efficiency'].mean(),
           color='gray', linestyle='--', alpha=0.5, label='Avg Efficiency')
ax.axhline(player_routes['avg_jump'].mean(),
           color='gray', linestyle='--', alpha=0.5, label='Avg Jump')

ax.set_xlabel('Average Route Efficiency', fontsize=12)
ax.set_ylabel('Average Jump (feet in first 3 seconds)', fontsize=12)
ax.set_title('Outfielder Route Efficiency vs Jump\nTop right quadrant shows elite combination of reads and reactions',
             fontsize=14, fontweight='bold', pad=20)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.1%}'))

plt.figtext(0.99, 0.01, 'Data: Simulated for illustration',
            ha='right', fontsize=9, style='italic')

plt.tight_layout()
plt.show()

# Show distribution of route efficiency by player
fig, ax = plt.subplots(figsize=(12, 6))

for player in players:
    player_data = route_df[route_df['player'] == player]
    ax.hist(player_data['route_efficiency'], bins=15, alpha=0.4, label=player)

ax.set_xlabel('Route Efficiency', fontsize=12)
ax.set_ylabel('Frequency', fontsize=12)
ax.set_title('Route Efficiency Distribution by Player\nTighter distributions indicate more consistent routes',
             fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.1%}'))
ax.legend()

plt.tight_layout()
plt.show()

8.3.2 Infielder Metrics {#infield-metrics}

Infield defense requires different skills than outfield defense. Rather than long routes to fly balls, infielders need explosive reactions to hard-hit grounders and line drives, plus quick exchanges and accurate throws.

Key infield metrics:

Reaction time: Time from ball contact to first movement (typical: 0.15-0.25 seconds)

Exchange time: Time from ball hitting glove to release of throw (elite: <0.6 seconds, average: 0.7-0.8 seconds)

Arm strength: Throwing velocity across the infield (varies by position)

Third basemen: 80-90 mph

Shortstops: 80-88 mph

Second basemen: 75-85 mph

First basemen: 70-80 mph (less relevant)

Range: Lateral movement and area covered (measured via OAA)

Hands/Fielding: Converting batted balls into controlled outs (measured via error rate on contacted balls)
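
To see how exchange time and arm strength combine, the sketch below converts the figures above into seconds from glove to glove. It assumes a hypothetical 120-foot throw and constant ball velocity (no drag or arc), so the results are optimistic lower bounds; the function names are illustrative.

```python
MPH_TO_FPS = 5280 / 3600  # feet per second in one mile per hour


def throw_time(distance_ft: float, velocity_mph: float) -> float:
    """Flight time in seconds, assuming constant velocity (no drag or arc)."""
    return distance_ft / (velocity_mph * MPH_TO_FPS)


def glove_to_glove(exchange_s: float, distance_ft: float, velocity_mph: float) -> float:
    """Seconds from the ball hitting the fielder's glove to the throw arriving."""
    return exchange_s + throw_time(distance_ft, velocity_mph)


# Hypothetical 120-foot throw across the infield
elite = glove_to_glove(exchange_s=0.60, distance_ft=120, velocity_mph=88)
average = glove_to_glove(exchange_s=0.75, distance_ft=120, velocity_mph=80)

print(f"Elite:      {elite:.2f} s")
print(f"Average:    {average:.2f} s")
print(f"Difference: {average - elite:.2f} s")
```

Under these assumptions the faster exchange and stronger arm together save roughly a quarter of a second, a meaningful margin on close plays at first base.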

Real-world examples:

Dansby Swanson (2024):

Reaction time: 0.18 seconds (elite)

Median exchange: 0.65 seconds

Arm strength: 86.2 mph (strong for SS)

OAA: +11 (excellent)

Matt Chapman (2024):

Reaction time: 0.19 seconds

Median exchange: 0.63 seconds

Arm strength: 89.1 mph (elite for 3B)

OAA: +14 (Gold Glove caliber)

Infielders also benefit from good positioning and pitch anticipation. Knowing a pitcher tends to induce ground balls to the pull side allows infielders to shade accordingly, improving their effective range.

print(player_routes)

# Visualize route efficiency vs jump
fig, ax = plt.subplots(figsize=(12, 8))

for player in players:
    player_data = player_routes.loc[player]
    ax.scatter(player_data['avg_route_efficiency'], player_data['avg_jump'],
               s=150, alpha=0.7, label=player)
    ax.text(player_data['avg_route_efficiency'], player_data['avg_jump'] + 0.15,
            player, fontsize=9, ha='center')

# Add average lines
ax.axvline(player_routes['avg_route_efficiency'].mean(),
           color='gray', linestyle='--', alpha=0.5, label='Avg Efficiency')
ax.axhline(player_routes['avg_jump'].mean(),
           color='gray', linestyle='--', alpha=0.5, label='Avg Jump')

ax.set_xlabel('Average Route Efficiency', fontsize=12)
ax.set_ylabel('Average Jump (feet in first 3 seconds)', fontsize=12)
ax.set_title('Outfielder Route Efficiency vs Jump\nTop right quadrant shows elite combination of reads and reactions',
             fontsize=14, fontweight='bold', pad=20)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.1%}'))

plt.figtext(0.99, 0.01, 'Data: Simulated for illustration',
            ha='right', fontsize=9, style='italic')

plt.tight_layout()
plt.show()

# Show distribution of route efficiency by player
fig, ax = plt.subplots(figsize=(12, 6))

for player in players:
    player_data = route_df[route_df['player'] == player]
    ax.hist(player_data['route_efficiency'], bins=15, alpha=0.4, label=player)

ax.set_xlabel('Route Efficiency', fontsize=12)
ax.set_ylabel('Frequency', fontsize=12)
ax.set_title('Route Efficiency Distribution by Player\nTighter distributions indicate more consistent routes',
             fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.1%}'))
ax.legend()

plt.tight_layout()
plt.show()

8.4 Arm Strength and Throwing

8.4.1 Outfield Arm Metrics {#outfield-arm}

An outfielder's arm affects the game both by throwing out baserunners and by deterring them from attempting extra bases. A strong-armed outfielder might only record 8 assists per season, but prevent 20+ runners from trying to advance.

Outfield arm metrics:

Arm strength: Maximum throwing velocity (measured on hardest throw)

Elite: 95+ mph

Above average: 90-95 mph

Average: 85-90 mph

Below average: <85 mph

Assists: Direct outs via throws (traditional stat, limited sample)

Baserunner kills: Runners thrown out attempting to advance

Extra bases prevented: Runners who held up due to arm reputation (harder to measure directly)

Real examples (2024 leaders):

Jorge Soler (OF, San Francisco/Atlanta):

Max arm strength: 97.4 mph

MLB rank: Top 5%

Assists: 7

Randy Arozarena (OF, Tampa Bay/Seattle):

Max arm strength: 95.8 mph

MLB rank: Top 10%

Assists: 9

Aaron Judge (OF, New York Yankees):

Max arm strength: 94.1 mph

MLB rank: Top 15%

Assists: 6

Let's analyze outfield arms:

R Implementation:

library(tidyverse)
library(baseballr)

# Simulated outfield arm data
set.seed(123)
n_players <- 30

arm_data <- data.frame(
  player = paste("Player", 1:n_players),
  max_arm_strength = rnorm(n_players, 88, 4),
  assists = rpois(n_players, 5),
  opportunities = rpois(n_players, 25)
) %>%
  mutate(
    max_arm_strength = pmax(78, pmin(98, max_arm_strength)),
    assist_rate = assists / opportunities,
    arm_tier = case_when(
      max_arm_strength >= 95 ~ "Elite (95+)",
      max_arm_strength >= 90 ~ "Above Avg (90-95)",
      max_arm_strength >= 85 ~ "Average (85-90)",
      TRUE ~ "Below Avg (<85)"
    ),
    arm_tier = factor(arm_tier, levels = c("Elite (95+)", "Above Avg (90-95)",
                                             "Average (85-90)", "Below Avg (<85)"))
  )

# Analyze arm strength vs assists
arm_summary <- arm_data %>%
  group_by(arm_tier) %>%
  summarize(
    players = n(),
    avg_arm_strength = mean(max_arm_strength),
    avg_assists = mean(assists),
    avg_assist_rate = mean(assist_rate),
    .groups = "drop"
  )

print(arm_summary)

# Visualize relationship between arm strength and assists
ggplot(arm_data, aes(x = max_arm_strength, y = assists)) +
  geom_point(aes(color = arm_tier), size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "#2b8cbe", linetype = "dashed") +
  scale_color_manual(
    values = c("Elite (95+)" = "#d7191c",
               "Above Avg (90-95)" = "#fdae61",
               "Average (85-90)" = "#abd9e9",
               "Below Avg (<85)" = "#2c7bb6")
  ) +
  labs(
    title = "Outfield Arm Strength vs Assists",
    subtitle = "Stronger arms correlate with more assists, though opportunities matter",
    x = "Max Arm Strength (mph)",
    y = "Assists",
    color = "Arm Tier",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

# Box plot of arm strength by tier
ggplot(arm_data, aes(x = arm_tier, y = max_arm_strength, fill = arm_tier)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 16, outlier.size = 2) +
  scale_fill_manual(
    values = c("Elite (95+)" = "#d7191c",
               "Above Avg (90-95)" = "#fdae61",
               "Average (85-90)" = "#abd9e9",
               "Below Avg (<85)" = "#2c7bb6")
  ) +
  labs(
    title = "Distribution of Arm Strength by Tier",
    x = NULL,
    y = "Max Arm Strength (mph)",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none",
    axis.text.x = element_text(angle = 15, hjust = 1)
  )

Python Implementation:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set random seed
np.random.seed(123)
sns.set_style("whitegrid")

# Simulate outfield arm data
n_players = 30

arm_data = pd.DataFrame({
    'player': [f'Player {i}' for i in range(1, n_players + 1)],
    'max_arm_strength': np.random.normal(88, 4, n_players),
    'assists': np.random.poisson(5, n_players),
    'opportunities': np.random.poisson(25, n_players)
})

# Clip arm strength to realistic range
arm_data['max_arm_strength'] = arm_data['max_arm_strength'].clip(78, 98)
arm_data['assist_rate'] = arm_data['assists'] / arm_data['opportunities']

# Create arm tier categories
def categorize_arm(strength):
    if strength >= 95:
        return 'Elite (95+)'
    elif strength >= 90:
        return 'Above Avg (90-95)'
    elif strength >= 85:
        return 'Average (85-90)'
    else:
        return 'Below Avg (<85)'

arm_data['arm_tier'] = arm_data['max_arm_strength'].apply(categorize_arm)

# Order categories
tier_order = ['Elite (95+)', 'Above Avg (90-95)', 'Average (85-90)', 'Below Avg (<85)']
arm_data['arm_tier'] = pd.Categorical(arm_data['arm_tier'], categories=tier_order, ordered=True)

# Summary statistics
# observed=True keeps only tiers that actually contain players
arm_summary = arm_data.groupby('arm_tier', observed=True).agg({
    'player': 'count',
    'max_arm_strength': 'mean',
    'assists': 'mean',
    'assist_rate': 'mean'
}).round(2)

arm_summary.columns = ['players', 'avg_arm_strength', 'avg_assists', 'avg_assist_rate']
print("\nArm Strength Summary by Tier:")
print(arm_summary)

# Visualize arm strength vs assists
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Scatter plot with regression
colors = {'Elite (95+)': '#d7191c', 'Above Avg (90-95)': '#fdae61',
          'Average (85-90)': '#abd9e9', 'Below Avg (<85)': '#2c7bb6'}

for tier in tier_order:
    tier_data = arm_data[arm_data['arm_tier'] == tier]
    ax1.scatter(tier_data['max_arm_strength'], tier_data['assists'],
                label=tier, color=colors[tier], s=80, alpha=0.7)

# Add regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(
    arm_data['max_arm_strength'], arm_data['assists'])
x_line = np.array([arm_data['max_arm_strength'].min(), arm_data['max_arm_strength'].max()])
y_line = slope * x_line + intercept
ax1.plot(x_line, y_line, 'b--', alpha=0.5, linewidth=2,
         label=f'Regression (R² = {r_value**2:.3f})')

ax1.set_xlabel('Max Arm Strength (mph)', fontsize=12)
ax1.set_ylabel('Assists', fontsize=12)
ax1.set_title('Outfield Arm Strength vs Assists\nStronger arms correlate with more assists',
              fontsize=13, fontweight='bold')
ax1.legend(loc='upper left', fontsize=9)
ax1.grid(True, alpha=0.3)

# Box plot of arm strength by tier
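positions = range(len(tier_order))
bp = ax2.boxplot([arm_data[arm_data['arm_tier'] == tier]['max_arm_strength'].values
                   for tier in tier_order],
                  positions=positions,
                  patch_artist=True,
                  widths=0.6)
# Label ticks explicitly; boxplot's `labels` argument is deprecated in newer matplotlib
ax2.set_xticks(list(positions))
ax2.set_xticklabels(tier_order)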

# Color the boxes
for patch, tier in zip(bp['boxes'], tier_order):
    patch.set_facecolor(colors[tier])
    patch.set_alpha(0.7)

ax2.set_ylabel('Max Arm Strength (mph)', fontsize=12)
ax2.set_title('Distribution of Arm Strength by Tier',
              fontsize=13, fontweight='bold')
ax2.tick_params(axis='x', rotation=15)
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.figtext(0.99, 0.01, 'Data: Simulated for illustration',
            ha='right', fontsize=9, style='italic')
plt.show()

8.4.2 Infield Arm Considerations {#infield-arm}

Infield arms work differently than outfield arms. The premium is on quick release and accuracy rather than pure velocity, though elite infielders combine all three.

Position-specific arm requirements:

Third base: Longest throw across the diamond (127 feet to first). Needs elite arm strength and quick release. Third basemen routinely make 85-90 mph throws.

Shortstop: Slightly shorter throw than 3B (105-120 feet depending on positioning). Must combine arm strength with the ability to throw from multiple arm angles and off-balance positions.

Second base: Shortest throws (90-100 feet). Quick release matters more than raw strength. However, second basemen turning double plays need enough arm to complete the relay throw.

First base: Rarely makes long throws. Arm strength is least important at this position.

The exchange time (from catch to release) often matters more than velocity for infielders. A shortstop with a 0.6-second exchange and an 85 mph throw will get more outs than one with a 0.8-second exchange and an 88 mph throw: over a 120-foot throw, the extra 3 mph saves only about 0.03 seconds of flight time, far less than the 0.2 seconds lost on the slower exchange.
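
The exchange-versus-velocity trade-off can be checked with quick arithmetic. The sketch below is illustrative only: it assumes a 120-foot throw, constant ball velocity (no drag), and a 27 ft/s runner, none of which are measured values.

```python
FT_PER_SEC_PER_MPH = 5280 / 3600  # 1 mph is about 1.467 ft/s


def time_to_first(exchange_s, throw_mph, distance_ft=120.0):
    """Seconds from fielding the ball until the throw arrives at first base.

    Assumes constant ball velocity (no drag), so real flight times
    would be slightly longer than this estimate.
    """
    flight_s = distance_ft / (throw_mph * FT_PER_SEC_PER_MPH)
    return exchange_s + flight_s


# The two shortstops described in the text
quick_release = time_to_first(exchange_s=0.6, throw_mph=85)
strong_arm = time_to_first(exchange_s=0.8, throw_mph=88)

advantage_s = strong_arm - quick_release
runner_speed_ft_s = 27.0  # assumed runner speed, for illustration
advantage_ft = advantage_s * runner_speed_ft_s

print(f"Quick release: {quick_release:.3f} s")
print(f"Strong arm:    {strong_arm:.3f} s")
print(f"Quick release wins by {advantage_s:.3f} s ({advantage_ft:.1f} ft of running)")
```

Under these assumptions the quicker exchange delivers the ball roughly 0.17 seconds sooner, which is about 4.5 feet of running for the batter.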

8.5 Catcher Defense

Catchers are baseball's most complex defensive position. They handle every pitch, frame borderline calls, block balls in the dirt, and control the running game. Modern analytics has revealed that elite catchers provide enormous value—sometimes 20-30 runs per season—that was completely invisible in traditional statistics.

8.5.1 Framing {#framing}

Pitch framing is the art of receiving pitches to maximize the likelihood that borderline pitches are called strikes. Elite framers can "steal" an extra 2-3 strikes per game, which adds up to roughly 10-15 runs over a full season.

How framing works:

Subtle glove movement to bring pitches toward the strike zone

"Sticking" pitches rather than stabbing at them

Quiet receiving without excessive movement

Understanding each umpire's zone tendencies

Framing value calculation:

Determine the expected called strike probability for each pitch based on location

Compare actual calls to expected calls

Convert extra strikes to runs using run expectancy framework

Sum over all pitches caught

Top framers (2024):

Tyler Stephenson (Cincinnati):

Framing runs: +11

Extra strikes per game: ~2.5

Strike rate on edge pitches: 55% (league avg: 50%)

Salvador Perez (Kansas City):

Framing runs: +9

Extra strikes per game: ~2.2

Career framing: +45 runs (2015-2024)

How much is framing worth? A catcher who provides +15 framing runs adds about 1.5 wins above average through receiving alone (at roughly 10 runs per win), roughly equivalent to a hitter providing a .280/.340/.450 slash line. This is an enormous skill that was completely unmeasured before pitch-tracking technology.
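
The arithmetic behind that conversion is short. The sketch below assumes about 0.13 runs per extra called strike and the common rule of thumb of roughly 10 runs per win; both constants are approximations that vary with the run environment, and the function names are hypothetical.

```python
RUNS_PER_STRIKE = 0.13  # approximate run value of one extra called strike
RUNS_PER_WIN = 10.0     # rule-of-thumb conversion, not an exact constant


def runs_to_wins(runs):
    """Convert runs above average to wins above average."""
    return runs / RUNS_PER_WIN


def strikes_needed(target_runs):
    """Extra called strikes required to accumulate a given framing-run total."""
    return target_runs / RUNS_PER_STRIKE


wins = runs_to_wins(15)
strikes = strikes_needed(15)

print(f"+15 framing runs is about {wins:.1f} wins")
print(f"That requires about {strikes:.0f} extra called strikes over the season")
```

Seen this way, a +15 season comes down to a little over a hundred extra strike calls, which is why framing went unnoticed before every pitch location was tracked.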

R Implementation:

library(tidyverse)

# Simulated framing data
set.seed(456)

# Generate pitch locations and outcomes
framing_data <- data.frame(
  catcher = rep(c("Elite Framer", "Average Framer", "Poor Framer"), each = 1000),
  pitch_id = 1:3000
) %>%
  mutate(
    # Distance from zone edge (0 = edge, negative = inside zone, positive = outside)
    edge_distance = rnorm(3000, 0, 0.3),
    # Expected strike probability falls as the pitch moves outward:
    # 50% at the edge, higher inside the zone, lower outside
    expected_strike_prob = plogis(-6 * edge_distance),
    # Actual call depends on catcher skill
    framing_adjustment = case_when(
      catcher == "Elite Framer" ~ 0.15,
      catcher == "Average Framer" ~ 0.0,
      catcher == "Poor Framer" ~ -0.12
    ),
    actual_strike_prob = pmin(0.95, pmax(0.05,
                                          expected_strike_prob + framing_adjustment)),
    called_strike = rbinom(3000, 1, actual_strike_prob),
    # Each strike worth ~0.13 runs
    runs_value = (called_strike - expected_strike_prob) * 0.13
  )

# Summarize by catcher
catcher_framing <- framing_data %>%
  group_by(catcher) %>%
  summarize(
    pitches = n(),
    strikes = sum(called_strike),
    expected_strikes = sum(expected_strike_prob),
    extra_strikes = strikes - expected_strikes,
    framing_runs = sum(runs_value),
    strikes_per_game = (extra_strikes / pitches) * 140,  # ~140 pitches/game
    .groups = "drop"
  ) %>%
  arrange(desc(framing_runs))

print(catcher_framing)

# Visualize framing runs
ggplot(catcher_framing, aes(x = reorder(catcher, framing_runs), y = framing_runs,
                             fill = framing_runs)) +
  geom_col(width = 0.7) +
  geom_text(aes(label = sprintf("%+.1f", framing_runs)),
            hjust = ifelse(catcher_framing$framing_runs > 0, -0.2, 1.2),
            size = 5, fontface = "bold") +
  scale_fill_gradient2(low = "#d7191c", mid = "gray90", high = "#2c7bb6",
                       midpoint = 0) +
  coord_flip() +
  labs(
    title = "Catcher Framing Runs Above Average",
    subtitle = "Elite framers add 10+ runs per season through receiving",
    x = NULL,
    y = "Framing Runs Above Average",
    caption = "Data: Simulated | Each strike worth ~0.13 runs"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "none",
    panel.grid.major.y = element_blank()
  )

# Show strike rate by pitch location
ggplot(framing_data, aes(x = edge_distance, y = called_strike, color = catcher)) +
  geom_smooth(method = "loess", se = FALSE, linewidth = 1.3) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
  annotate("text", x = -0.2, y = 0.9, label = "Inside Zone", size = 3) +
  annotate("text", x = 0.2, y = 0.9, label = "Outside Zone", size = 3) +
  scale_color_manual(values = c("Elite Framer" = "#2c7bb6",
                                  "Average Framer" = "gray50",
                                  "Poor Framer" = "#d7191c")) +
  labs(
    title = "Called Strike Rate by Pitch Location and Framer Quality",
    subtitle = "Elite framers get more strikes on borderline pitches",
    x = "Distance from Zone Edge (feet, negative = inside)",
    y = "Called Strike Rate",
    color = "Catcher Type"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Python Implementation:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.special import expit

# Set random seed
np.random.seed(456)
sns.set_style("whitegrid")

# Simulate framing data
catchers = ['Elite Framer', 'Average Framer', 'Poor Framer']
n_pitches = 1000

framing_data = []
for catcher in catchers:
    # Generate pitch locations relative to zone edge
    edge_distance = np.random.normal(0, 0.3, n_pitches)

    # Expected strike probability falls as the pitch moves outward:
    # 50% at the edge, higher inside the zone, lower outside
    expected_strike_prob = expit(-6 * edge_distance)

    # Framing adjustment
    if catcher == 'Elite Framer':
        adjustment = 0.15
    elif catcher == 'Average Framer':
        adjustment = 0.0
    else:
        adjustment = -0.12

    # Actual strike probability with framing
    actual_strike_prob = np.clip(expected_strike_prob + adjustment, 0.05, 0.95)

    # Generate actual calls
    called_strike = np.random.binomial(1, actual_strike_prob)

    # Calculate runs value (each strike worth ~0.13 runs)
    runs_value = (called_strike - expected_strike_prob) * 0.13

    for i in range(n_pitches):
        framing_data.append({
            'catcher': catcher,
            'edge_distance': edge_distance[i],
            'expected_strike_prob': expected_strike_prob[i],
            'actual_strike_prob': actual_strike_prob[i],
            'called_strike': called_strike[i],
            'runs_value': runs_value[i]
        })

framing_df = pd.DataFrame(framing_data)

# Summarize by catcher
catcher_framing = framing_df.groupby('catcher').agg({
    'edge_distance': 'count',
    'called_strike': 'sum',
    'expected_strike_prob': 'sum',
    'runs_value': 'sum'
}).round(2)

catcher_framing.columns = ['pitches', 'strikes', 'expected_strikes', 'framing_runs']
catcher_framing['extra_strikes'] = catcher_framing['strikes'] - catcher_framing['expected_strikes']
catcher_framing['strikes_per_game'] = (catcher_framing['extra_strikes'] /
                                         catcher_framing['pitches']) * 140

catcher_framing = catcher_framing.sort_values('framing_runs', ascending=False)
print("\nCatcher Framing Summary:")
print(catcher_framing)

# Visualize framing runs
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Bar chart of framing runs
colors_map = {'Elite Framer': '#2c7bb6', 'Average Framer': 'gray', 'Poor Framer': '#d7191c'}
colors = [colors_map[c] for c in catcher_framing.index]

bars = ax1.barh(range(len(catcher_framing)), catcher_framing['framing_runs'], color=colors, alpha=0.8)
ax1.set_yticks(range(len(catcher_framing)))
ax1.set_yticklabels(catcher_framing.index)
ax1.set_xlabel('Framing Runs Above Average', fontsize=12)
ax1.set_title('Catcher Framing Runs Above Average\nElite framers add 10+ runs per season through receiving',
              fontsize=13, fontweight='bold')
ax1.axvline(x=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
ax1.grid(True, alpha=0.3, axis='x')

# Add value labels
for i, (bar, value) in enumerate(zip(bars, catcher_framing['framing_runs'])):
    x_pos = value + (0.5 if value > 0 else -0.5)
    ax1.text(x_pos, bar.get_y() + bar.get_height()/2, f'{value:+.1f}',
             va='center', ha='left' if value > 0 else 'right', fontweight='bold')

# Strike rate by location
for catcher in catchers:
    catcher_data = framing_df[framing_df['catcher'] == catcher].sort_values('edge_distance')

    # Smooth with rolling average
    window = 100
    x_smooth = catcher_data['edge_distance'].rolling(window, center=True).mean()
    y_smooth = catcher_data['called_strike'].rolling(window, center=True).mean()

    ax2.plot(x_smooth, y_smooth, linewidth=2.5, label=catcher, color=colors_map[catcher])

ax2.axvline(x=0, color='black', linestyle='--', linewidth=1.5, alpha=0.7)
ax2.text(-0.2, 0.9, 'Inside Zone', fontsize=10, ha='center')
ax2.text(0.2, 0.9, 'Outside Zone', fontsize=10, ha='center')
ax2.set_xlabel('Distance from Zone Edge (feet, negative = inside)', fontsize=12)
ax2.set_ylabel('Called Strike Rate', fontsize=12)
ax2.set_title('Called Strike Rate by Pitch Location\nElite framers get more strikes on borderline pitches',
              fontsize=13, fontweight='bold')
ax2.legend(loc='lower right')
ax2.grid(True, alpha=0.3)
ax2.set_ylim(0, 1)

plt.tight_layout()
plt.figtext(0.99, 0.01, 'Data: Simulated | Each strike worth ~0.13 runs',
            ha='right', fontsize=9, style='italic')
plt.show()

8.5.2 Pop Time {#pop-time}

Pop time measures how quickly a catcher can throw to second base on a steal attempt: the elapsed time from the moment the pitch hits the catcher's glove to the moment the throw reaches the fielder's glove at second base.

Pop time benchmarks:

Elite: <1.90 seconds

Above average: 1.90-1.95 seconds

Average: 1.95-2.00 seconds

Below average: 2.00-2.05 seconds

Poor: >2.05 seconds

Components of pop time:

Exchange (transfer) time: From glove to throwing hand (elite: 0.6-0.7 seconds)

Arm strength: Throwing velocity (elite: 85+ mph to second base)

Accuracy: On-target throws allow quicker tags

Pop time leaders (2024):

J.T. Realmuto (Philadelphia):

Average pop time: 1.87 seconds

Exchange time: 0.64 seconds

Arm strength: 86.1 mph

CS%: 38% (elite)

Adley Rutschman (Baltimore):

Average pop time: 1.94 seconds

Exchange time: 0.68 seconds

Arm strength: 83.7 mph

CS%: 32% (above average)

Salvador Perez (Kansas City):

Average pop time: 1.98 seconds

Exchange time: 0.71 seconds

Arm strength: 82.4 mph

CS%: 26% (average)

Why pop time matters: First to second is 90 feet. At a typical sprint speed of 28 ft/s, covering that distance takes about 3.2 seconds from first movement (a simplification: the runner's lead shortens the distance, while accelerating from a standstill lengthens the time). The pitcher's delivery uses up roughly 1.3-1.5 seconds of that, from leg lift to the ball reaching the catcher, which leaves about 1.7-1.9 seconds for the catcher to get the ball to second. A catcher with a 1.85-second pop time has a realistic chance; one with a 2.10-second pop time has almost none.
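
The timing budget above can be laid out as a small calculation. This is a simplified sketch using the same assumptions as the text (a full 90 feet at a constant 28 ft/s, no allowance for the tag); the function names are hypothetical.

```python
def catcher_time_budget(runner_time_s, pitcher_delivery_s):
    """Seconds left for the catcher once the pitcher's delivery is accounted for."""
    return runner_time_s - pitcher_delivery_s


def throw_margin(pop_time_s, runner_time_s, pitcher_delivery_s):
    """Positive margin: the ball arrives before the runner (tag time ignored)."""
    return catcher_time_budget(runner_time_s, pitcher_delivery_s) - pop_time_s


# Simplification from the text: 90 ft at a constant 28 ft/s, about 3.2 s
runner_time_s = 90 / 28

for delivery_s in (1.3, 1.5):
    for pop_s in (1.85, 2.10):
        margin = throw_margin(pop_s, runner_time_s, delivery_s)
        verdict = "ball beats runner" if margin > 0 else "runner beats ball"
        print(f"delivery {delivery_s:.2f} s, pop {pop_s:.2f} s: "
              f"margin {margin:+.2f} s ({verdict})")
```

In this simplified model the 1.85-second catcher only wins when the pitcher is quick to the plate, and the 2.10-second catcher loses in every combination, which illustrates how much of the running game depends on the pitcher as well as the catcher.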

8.5.3 Blocking {#blocking}

Blocking measures a catcher's ability to prevent wild pitches and passed balls on pitches in the dirt. While less valuable than framing (fewer opportunities, smaller run value per event), good blocking still saves 3-5 runs per season.

Blocking metrics:

Block rate: Percentage of pitches in the dirt that are blocked (average: ~85%)

Blocking runs: Runs saved by preventing wild pitches/passed balls

What makes good blocking:

Quick drop to knees

Wide base to cover maximum area

Proper glove position (down, creating a backstop)

Anticipation of breaking balls in the dirt

Top blockers (2024 estimated):

Willson Contreras: +3 blocking runs

Tyler Stephenson: +2 blocking runs

Sean Murphy: +2 blocking runs

Poor blockers can cost their team 5+ runs through passed balls that advance runners or allow scoring.
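
As a rough sketch, block rate and blocking runs could be computed as follows. The function names, the sample counts, and the run cost of about 0.27 runs per wild pitch or passed ball are illustrative assumptions rather than values from this chapter; the 85% league-average block rate is the figure listed under the blocking metrics.

```python
LEAGUE_BLOCK_RATE = 0.85       # average block rate cited in the metrics list
RUNS_PER_MISSED_BLOCK = 0.27   # assumed run cost of a wild pitch / passed ball


def block_rate(blocked, chances):
    """Share of pitches in the dirt that the catcher kept in front of him."""
    if chances <= 0:
        raise ValueError("chances must be positive")
    return blocked / chances


def blocking_runs(blocked, chances,
                  league_rate=LEAGUE_BLOCK_RATE,
                  run_value=RUNS_PER_MISSED_BLOCK):
    """Runs saved relative to a league-average blocker on the same chances."""
    expected_blocked = chances * league_rate
    return (blocked - expected_blocked) * run_value


# Hypothetical catcher: 400 pitches in the dirt, 350 of them blocked
rate = block_rate(350, 400)
runs = blocking_runs(350, 400)

print(f"Block rate: {rate:.1%}")
print(f"Blocking runs above average: {runs:+.1f}")
```

A production metric would replace the flat league rate with a per-pitch block probability based on pitch location and type, in the same way the framing calculation uses an expected strike probability for each pitch.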

Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.special import expit

# Set random seed
np.random.seed(456)
sns.set_style("whitegrid")

# Simulate framing data
catchers = ['Elite Framer', 'Average Framer', 'Poor Framer']
n_pitches = 1000

framing_data = []
for catcher in catchers:
    # Generate pitch locations relative to zone edge
    edge_distance = np.random.normal(0, 0.3, n_pitches)

    # Expected strike probability falls from inside to outside the zone (50% on the edge)
    expected_strike_prob = expit(-6 * edge_distance)

    # Framing adjustment
    if catcher == 'Elite Framer':
        adjustment = 0.15
    elif catcher == 'Average Framer':
        adjustment = 0.0
    else:
        adjustment = -0.12

    # Actual strike probability with framing
    actual_strike_prob = np.clip(expected_strike_prob + adjustment, 0.05, 0.95)

    # Generate actual calls
    called_strike = np.random.binomial(1, actual_strike_prob)

    # Calculate runs value (each strike worth ~0.13 runs)
    runs_value = (called_strike - expected_strike_prob) * 0.13

    for i in range(n_pitches):
        framing_data.append({
            'catcher': catcher,
            'edge_distance': edge_distance[i],
            'expected_strike_prob': expected_strike_prob[i],
            'actual_strike_prob': actual_strike_prob[i],
            'called_strike': called_strike[i],
            'runs_value': runs_value[i]
        })

framing_df = pd.DataFrame(framing_data)

# Summarize by catcher
catcher_framing = framing_df.groupby('catcher').agg({
    'edge_distance': 'count',
    'called_strike': 'sum',
    'expected_strike_prob': 'sum',
    'runs_value': 'sum'
}).round(2)

catcher_framing.columns = ['pitches', 'strikes', 'expected_strikes', 'framing_runs']
catcher_framing['extra_strikes'] = catcher_framing['strikes'] - catcher_framing['expected_strikes']
catcher_framing['strikes_per_game'] = (catcher_framing['extra_strikes'] /
                                         catcher_framing['pitches']) * 140

catcher_framing = catcher_framing.sort_values('framing_runs', ascending=False)
print("\nCatcher Framing Summary:")
print(catcher_framing)

# Visualize framing runs
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Bar chart of framing runs
colors_map = {'Elite Framer': '#2c7bb6', 'Average Framer': 'gray', 'Poor Framer': '#d7191c'}
colors = [colors_map[c] for c in catcher_framing.index]

bars = ax1.barh(range(len(catcher_framing)), catcher_framing['framing_runs'], color=colors, alpha=0.8)
ax1.set_yticks(range(len(catcher_framing)))
ax1.set_yticklabels(catcher_framing.index)
ax1.set_xlabel('Framing Runs Above Average', fontsize=12)
ax1.set_title('Catcher Framing Runs Above Average\nElite framers add 10+ runs per season through receiving',
              fontsize=13, fontweight='bold')
ax1.axvline(x=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
ax1.grid(True, alpha=0.3, axis='x')

# Add value labels
for i, (bar, value) in enumerate(zip(bars, catcher_framing['framing_runs'])):
    x_pos = value + (0.5 if value > 0 else -0.5)
    ax1.text(x_pos, bar.get_y() + bar.get_height()/2, f'{value:+.1f}',
             va='center', ha='left' if value > 0 else 'right', fontweight='bold')

# Strike rate by location
for catcher in catchers:
    catcher_data = framing_df[framing_df['catcher'] == catcher].sort_values('edge_distance')

    # Smooth with rolling average
    window = 100
    x_smooth = catcher_data['edge_distance'].rolling(window, center=True).mean()
    y_smooth = catcher_data['called_strike'].rolling(window, center=True).mean()

    ax2.plot(x_smooth, y_smooth, linewidth=2.5, label=catcher, color=colors_map[catcher])

ax2.axvline(x=0, color='black', linestyle='--', linewidth=1.5, alpha=0.7)
ax2.text(-0.2, 0.9, 'Inside Zone', fontsize=10, ha='center')
ax2.text(0.2, 0.9, 'Outside Zone', fontsize=10, ha='center')
ax2.set_xlabel('Distance from Zone Edge (feet, negative = inside)', fontsize=12)
ax2.set_ylabel('Called Strike Rate', fontsize=12)
ax2.set_title('Called Strike Rate by Pitch Location\nElite framers get more strikes on borderline pitches',
              fontsize=13, fontweight='bold')
ax2.legend(loc='lower right')
ax2.grid(True, alpha=0.3)
ax2.set_ylim(0, 1)

plt.tight_layout()
plt.figtext(0.99, 0.01, 'Data: Simulated | Each strike worth ~0.13 runs',
            ha='right', fontsize=9, style='italic')
plt.show()

8.6 Pre-Statcast Defensive Metrics

Before Statcast, two metrics dominated defensive evaluation: Ultimate Zone Rating (UZR) and Defensive Runs Saved (DRS). Both remain widely used and valuable, especially for historical comparisons.

8.6.1 Ultimate Zone Rating (UZR) {#uzr}

UZR, developed by Mitchel Lichtman and published by FanGraphs, divides the field into zones and credits fielders for plays made within those zones relative to league average.

UZR components:

Range runs: Value from getting to balls (most important component)

Error runs: Cost of errors made

Double play runs: Value from turning double plays (infielders only)

Arm runs: Value from assists and preventing advancement (outfielders)

Calculation approach:

Classify each batted ball by type (GB, FB, LD), location, and velocity

Determine league-average out rate for similar balls

Credit fielder for outs above/below expectation

Convert to runs using run expectancy

Sum over full season
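
The five steps above can be sketched in a few lines of Python. This is a toy illustration rather than the actual UZR implementation: the batted-ball buckets, the column names, the league out rates, and the flat 0.75 runs-per-play value are all assumptions made for the example.

```python
import pandas as pd

# Step 1: toy batted-ball log, already classified into buckets.
# Buckets, fielders, and outcomes are invented for illustration.
plays = pd.DataFrame({
    "fielder": ["A", "A", "A", "B", "B", "B"],
    "bucket": ["GB_hole_hard", "GB_hole_hard", "GB_at_soft",
               "GB_hole_hard", "GB_at_soft", "GB_at_soft"],
    "out_made": [1, 1, 1, 0, 1, 0],
})

# Step 2: league-average out rate for each bucket (assumed values).
league_out_rate = {"GB_hole_hard": 0.30, "GB_at_soft": 0.95}

# Step 4: assumed run value of converting a would-be hit into an out.
RUNS_PER_PLAY = 0.75

def range_runs(plays, league_out_rate, runs_per_play=RUNS_PER_PLAY):
    """Credit each fielder for outs above expectation and convert to runs."""
    df = plays.copy()
    df["expected_out"] = df["bucket"].map(league_out_rate)
    # Step 3: credit = actual outcome minus league expectation.
    df["outs_above_avg"] = df["out_made"] - df["expected_out"]
    # Step 5: sum over all of a fielder's chances.
    summary = df.groupby("fielder")["outs_above_avg"].sum().to_frame()
    summary["range_runs"] = summary["outs_above_avg"] * runs_per_play
    return summary

result = range_runs(plays, league_out_rate)
print(result.round(2))
```

Fielder A converts two difficult chances and earns most of his credit there, while the routine play adds almost nothing. Fielder B's missed routine play costs nearly a full out. That asymmetry is what separates zone-based metrics from fielding percentage.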

UZR interpretation:

+15 or higher: Elite (Gold Glove caliber)

+5 to +15: Above average

-5 to +5: Average

-5 to -15: Below average

<-15: Poor

UZR strengths:

Long track record (data back to 2002)

Publicly available at FanGraphs

Breaks down into understandable components

Generally stable year-to-year for true talent

UZR limitations:

Relies on STATS Inc. zone data (less precise than Statcast)

Doesn't account for exact positioning

Sample size issues persist (a single season carries wide uncertainty relative to the scale above, so multi-year samples, commonly about three seasons, are preferred)

Can be affected by team defensive shifts

8.6.2 Defensive Runs Saved (DRS) {#drs}

DRS, developed by John Dewan and published by Sports Info Solutions, uses a similar zone-based approach but with some methodological differences.

DRS components:

Plus/Minus runs: Range and fielding (similar to UZR's range runs)

Outfield Arm runs: Assists and baserunner kills

Double Play runs: Value from turning DPs

Bunt defense runs: Infielder ability on bunts

Good Play/Misplay runs: Subjective evaluation of technique

DRS vs UZR differences:

DRS includes some subjective "good play/misplay" evaluations

DRS and UZR often agree directionally but differ in magnitude

DRS has separate metrics for specific skills (bunt defense)

Both use proprietary data, limiting replication

DRS interpretation (same scale as UZR):

+15 or higher: Elite

+5 to +15: Above average

-5 to +5: Average

-5 to -15: Below average

<-15: Poor

8.6.3 Comparison Table of Metrics {#metrics-comparison}

Metric	Data Source	Availability	Strengths	Weaknesses	Best Use
OAA	Statcast tracking	2016-present	Most precise, objective	Limited history, small samples	Modern player evaluation
UZR	STATS zones	2002-present	Long history, public	Zone-based approximation	Historical comparisons
DRS	SIS zones	2003-present	Detailed components	Some subjectivity	Comprehensive evaluation
Fielding %	Official stats	1876-present	Long history, simple	Ignores range completely	Only for historical context

Correlation between metrics (typical season):

OAA vs UZR: r ≈ 0.70

OAA vs DRS: r ≈ 0.72

UZR vs DRS: r ≈ 0.85

The metrics generally agree on who are elite/poor defenders but differ on exact magnitudes. For modern players (2016+), prefer OAA. For historical analysis or when OAA isn't available, use UZR or DRS.
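
One way to see why the two zone-based metrics would track each other more closely than either tracks OAA is a small simulation. The sketch below assumes each metric equals a player's true talent plus noise, and that UZR and DRS share an extra error term from their zone-based inputs. The noise sizes are assumptions picked to land near the correlations quoted above, not estimates from real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_players = 500

# Assumed true talent in runs above average.
talent = rng.normal(0, 6, n_players)
# Assumed error component shared by the two zone-based metrics.
zone_noise = rng.normal(0, 3, n_players)

metrics = pd.DataFrame({
    "OAA_runs": talent + rng.normal(0, 4, n_players),
    "UZR": talent + zone_noise + rng.normal(0, 3, n_players),
    "DRS": talent + zone_noise + rng.normal(0, 3, n_players),
})

# Pairwise Pearson correlations between the simulated metrics.
corr = metrics.corr()
print(corr.round(2))
```

Under these assumptions all three metrics rank the same players at the extremes, but the shared error term pushes the UZR-DRS correlation above the others.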

8.7 Defensive Positioning

Where fielders stand before the pitch dramatically affects defensive outcomes. The shift era revolutionized this understanding, and rule changes in 2023 created a natural experiment in positioning's impact.

8.7.1 The Shift Era (2010-2022) {#shift-era}

What was "the shift"? In traditional defensive alignment, two infielders play on each side of second base. The shift involved placing three infielders on one side—typically against left-handed pull hitters who hit ground balls to the right side.

Evolution of shift usage:

2010: ~2% of plate appearances

2015: ~13% of plate appearances

2020: ~30% of plate appearances

2022: ~35% of plate appearances

Why did shifting increase?

Data availability: Spray charts showed extreme pull tendencies

Analytical acceptance: Teams trusted data over tradition

Proven effectiveness: Shifts reduced BABIP by 20-30 points for shifted batters

Competitive pressure: If opponents shifted, you had to as well

Real examples of shift effectiveness:

Corey Seager (2022, LHH):

PA faced with shift: 482 (79% of PA)

BABIP with shift: .254

BABIP without shift: .312

Estimated hits prevented: ~18

Brian Anderson (2022, RHH):

PA faced with shift: 301 (56% of PA)

BABIP with shift: .241

BABIP without shift: .289

Estimated hits prevented: ~10

Teams collectively prevented an estimated 2,000-3,000 hits per season through shifting in the peak years (2020-2022).
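
The "estimated hits prevented" figures come from multiplying the BABIP gap by the number of balls in play hit into the shift. The examples above report plate appearances rather than balls in play, so the ball-in-play count in this sketch is a hypothetical value chosen for illustration.

```python
def hits_prevented(babip_no_shift, babip_shift, balls_in_play_vs_shift):
    """Estimate hits lost to the shift: BABIP gap times balls in play."""
    return (babip_no_shift - babip_shift) * balls_in_play_vs_shift

# BABIP values from the first example above; 310 balls in play is an
# assumed count, not a figure from the text.
example = hits_prevented(0.312, 0.254, 310)
print(round(example, 1))
```

This is a naive estimate. Teams chose when to shift, so the shifted and unshifted samples differ in pitcher, count, and situation, and the raw BABIP gap can overstate or understate the shift's true effect.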

8.7.2 Post-Shift Rules (2023+) {#post-shift-rules}

Starting in 2023, MLB banned most shifts by requiring:

Two infielders on each side of second base when the pitch is delivered

Four infielders on the infield dirt (no more infielders stationed in the shallow outfield grass)

Feet must be touching dirt when pitch is released
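
These restrictions translate naturally into a rule check. The sketch below is a simplified model, not MLB's official definition: the `Infielder` class, the lateral-offset coordinate, and the `on_dirt` flag are hypothetical constructs for illustration.

```python
from dataclasses import dataclass

@dataclass
class Infielder:
    name: str
    x_ft: float    # lateral offset from second base; negative = third-base side
    on_dirt: bool  # True if both feet are inside the infield boundary

def is_legal_alignment(infielders):
    """Check the 2023 restrictions as summarized in the list above."""
    if len(infielders) != 4:
        return False
    left = sum(1 for f in infielders if f.x_ft < 0)
    right = sum(1 for f in infielders if f.x_ft > 0)
    return left == 2 and right == 2 and all(f.on_dirt for f in infielders)

# A standard alignment and a pre-2023 overshift (positions are invented).
standard = [Infielder("3B", -60, True), Infielder("SS", -25, True),
            Infielder("2B", 25, True), Infielder("1B", 60, True)]
overshift = [Infielder("3B", -30, True), Infielder("SS", 10, True),
             Infielder("2B", 40, False), Infielder("1B", 65, True)]

print(is_legal_alignment(standard), is_legal_alignment(overshift))
```

The overshift fails on two counts: three infielders stand on the first-base side, and one of them is off the dirt.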

Impact of shift ban (2023 vs 2022):

League-wide batting average:

2022: .243

2023: .248 (+5 points)

Ground ball hit rate (left-handed batters):

2022: .235

2023: .245 (+10 points, +4.3%)

Biggest beneficiaries (2023 improvements):

Pull-heavy left-handed hitters: +.015 AVG

Extreme ground ball hitters: +.020 AVG

Power hitters who grounded out frequently: +.012 AVG

Strategic adjustments:
Teams now optimize positioning within legal constraints:

Deeper/shallower positioning by batter tendency

Shade toward pull side while keeping two on each side

More aggressive outfield positioning (can still shift OF)

Let's analyze shift ban impact:

R Implementation:

library(tidyverse)

# Simulate 2022 vs 2023 ground ball outcomes for left-handed batters
set.seed(789)

shift_comparison <- data.frame(
  year = rep(c(2022, 2023), each = 5000),
  batter_id = rep(1:5000, 2)
) %>%
  mutate(
    # 2022: Heavy shift usage
    shift_rate = ifelse(year == 2022, 0.75, 0.0),
    was_shifted = rbinom(n(), 1, shift_rate),
    # Ground ball hit rate depends on shift
    base_hit_rate = rnorm(n(), 0.240, 0.05),
    shift_penalty = ifelse(was_shifted == 1, -0.03, 0),
    year_boost = ifelse(year == 2023, 0.01, 0),  # Overall rule changes
    hit_rate = pmax(0.1, pmin(0.4, base_hit_rate + shift_penalty + year_boost)),
    was_hit = rbinom(n(), 1, hit_rate)
  )

# Compare years
year_summary <- shift_comparison %>%
  group_by(year) %>%
  summarize(
    ground_balls = n(),
    hits = sum(was_hit),
    hit_rate = hits / ground_balls,
    avg_shift_rate = mean(shift_rate),
    .groups = "drop"
  )

print(year_summary)

# Statistical test
gb_2022 <- shift_comparison %>% filter(year == 2022)
gb_2023 <- shift_comparison %>% filter(year == 2023)

test_result <- prop.test(
  x = c(sum(gb_2022$was_hit), sum(gb_2023$was_hit)),
  n = c(nrow(gb_2022), nrow(gb_2023))
)

cat("\nProportion test for hit rate difference:\n")
print(test_result)

# Visualize shift impact
ggplot(year_summary, aes(x = factor(year), y = hit_rate, fill = factor(year))) +
  geom_col(width = 0.6) +
  geom_text(aes(label = sub("^0", "", sprintf("%.3f", hit_rate))),
            vjust = -0.5, size = 6, fontface = "bold") +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.001),
                     limits = c(0, 0.30)) +
  scale_fill_manual(values = c("2022" = "#E03A3E", "2023" = "#4A90E2")) +
  labs(
    title = "Ground Ball Hit Rate: Left-Handed Batters",
    subtitle = "Comparing 2022 (Shifts Allowed) vs 2023 (Shifts Banned)",
    x = "Season",
    y = "Ground Ball Hit Rate",
    caption = "Data: Simulated based on actual MLB trends\nP-value < 0.001"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    legend.position = "none",
    panel.grid.major.x = element_blank()
  )

# Show shift usage decline
shift_usage <- data.frame(
  year = 2015:2023,
  shift_pct = c(0.13, 0.17, 0.21, 0.26, 0.28, 0.31, 0.33, 0.35, 0.01)
)

ggplot(shift_usage, aes(x = year, y = shift_pct)) +
  geom_line(size = 1.5, color = "#2b8cbe") +
  geom_point(size = 4, color = "#08519c") +
  geom_vline(xintercept = 2022.5, linetype = "dashed", color = "red", size = 1) +
  annotate("text", x = 2020, y = 0.32, label = "Shift Era", size = 5, fontface = "bold") +
  annotate("text", x = 2023, y = 0.15, label = "Ban Implemented",
           size = 4, color = "red") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "The Rise and Fall of Defensive Shifting",
    subtitle = "Percentage of plate appearances with defensive shift",
    x = "Year",
    y = "Shift Usage Rate",
    caption = "Data: MLB.com"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    panel.grid.minor = element_blank()
  )

Python Implementation:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.stats.proportion import proportions_ztest  # lives in statsmodels, not scipy

# Set random seed
np.random.seed(789)
sns.set_style("whitegrid")

# Simulate 2022 vs 2023 ground ball outcomes
years = [2022, 2023]
n_batters = 5000

shift_data = []
for year in years:
    shift_rate = 0.75 if year == 2022 else 0.0
    year_boost = 0.01 if year == 2023 else 0.0

    for batter in range(n_batters):
        was_shifted = np.random.binomial(1, shift_rate)
        base_hit_rate = np.random.normal(0.240, 0.05)
        shift_penalty = -0.03 if was_shifted else 0

        hit_rate = np.clip(base_hit_rate + shift_penalty + year_boost, 0.1, 0.4)
        was_hit = np.random.binomial(1, hit_rate)

        shift_data.append({
            'year': year,
            'was_shifted': was_shifted,
            'hit_rate': hit_rate,
            'was_hit': was_hit
        })

shift_df = pd.DataFrame(shift_data)

# Compare years
year_summary = shift_df.groupby('year').agg({
    'was_hit': ['count', 'sum', 'mean'],
    'was_shifted': 'mean'
}).round(4)

year_summary.columns = ['ground_balls', 'hits', 'hit_rate', 'avg_shift_rate']
print("\nYearly Comparison:")
print(year_summary)

# Statistical test
gb_2022 = shift_df[shift_df['year'] == 2022]['was_hit']
gb_2023 = shift_df[shift_df['year'] == 2023]['was_hit']

count = np.array([gb_2022.sum(), gb_2023.sum()])
nobs = np.array([len(gb_2022), len(gb_2023)])
stat, pval = proportions_ztest(count, nobs)

print(f"\nProportions Z-test:")
print(f"Z-statistic: {stat:.4f}")
print(f"P-value: {pval:.4e}")

# Visualize comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Bar chart comparing hit rates
years_plot = year_summary.index.astype(str)
colors = ['#E03A3E', '#4A90E2']

bars = ax1.bar(years_plot, year_summary['hit_rate'], color=colors, width=0.6, alpha=0.8)

for bar, rate in zip(bars, year_summary['hit_rate']):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
             f'{rate:.3f}'.lstrip('0'),
             ha='center', va='bottom', fontsize=14, fontweight='bold')

ax1.set_ylabel('Ground Ball Hit Rate', fontsize=12)
ax1.set_xlabel('Season', fontsize=12)
ax1.set_title('Ground Ball Hit Rate: Left-Handed Batters\nComparing 2022 (Shifts Allowed) vs 2023 (Shifts Banned)',
              fontsize=13, fontweight='bold', pad=15)
ax1.set_ylim(0, 0.30)
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.3f}'))

# Line chart showing shift usage over time
shift_usage = pd.DataFrame({
    'year': range(2015, 2024),
    'shift_pct': [0.13, 0.17, 0.21, 0.26, 0.28, 0.31, 0.33, 0.35, 0.01]
})

ax2.plot(shift_usage['year'], shift_usage['shift_pct'],
         linewidth=2.5, color='#2b8cbe', marker='o', markersize=8, markerfacecolor='#08519c')
ax2.axvline(x=2022.5, color='red', linestyle='--', linewidth=2, alpha=0.7)
ax2.text(2020, 0.32, 'Shift Era', fontsize=12, fontweight='bold', ha='center')
ax2.text(2023, 0.15, 'Ban Implemented', fontsize=10, color='red', ha='center')

ax2.set_xlabel('Year', fontsize=12)
ax2.set_ylabel('Shift Usage Rate', fontsize=12)
ax2.set_title('The Rise and Fall of Defensive Shifting\nPercentage of plate appearances with defensive shift',
              fontsize=13, fontweight='bold', pad=15)
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.figtext(0.99, 0.01, 'Data: Simulated based on actual MLB trends | P-value < 0.001',
            ha='right', fontsize=9, style='italic')
plt.show()

8.7.3 Optimal Positioning Analysis {#optimal-positioning}

Even without extreme shifts, positioning matters. Studies show that optimal positioning (within the new rules) can save 10-15 runs per team per season compared to traditional positioning.

Positioning optimization factors:

Batter spray tendencies: Pull%, opposite field%

Pitch type: Fastballs pulled more than off-speed

Count: Behind in count = more pull

Ballpark: Dimensions affect optimal positioning

Score/situation: Late innings, close games affect approach

Modern teams use algorithmic positioning that updates based on pitch type and count, maximizing defensive coverage within legal constraints.
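
A minimal sketch of the idea, assuming a single fielder and a one-dimensional model of the infield: given a batter's ground-ball spray angles, search the legal positions for the one that puts the most balls within reach. The spray distribution, the fielder's range, and the "traditional" starting angle are all invented for illustration. Real systems also account for exit velocity, depth, and the other three infielders.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed ground-ball spray angles in degrees for a pull-heavy left-handed
# batter: 0 = directly over second base, 45 = first-base line.
spray = rng.normal(28, 10, 5000)

RANGE_DEG = 7.0  # assumed half-width of the angle a fielder can cover

def coverage(position_deg, spray_angles, range_deg=RANGE_DEG):
    """Fraction of ground balls within the fielder's reach."""
    return np.mean(np.abs(spray_angles - position_deg) <= range_deg)

# Legal positions: the fielder must stay on the first-base side of
# second base, so only angles greater than 0 are considered.
candidates = np.arange(1.0, 45.0, 0.5)
scores = np.array([coverage(p, spray) for p in candidates])
best = candidates[scores.argmax()]

traditional = 18.0  # assumed straight-up second-base positioning
print(f"traditional ({traditional:.1f} deg): {coverage(traditional, spray):.3f}")
print(f"optimized   ({best:.1f} deg): {scores.max():.3f}")
```

Rerunning the search with a spray distribution conditioned on pitch type or count is how a pitch-by-pitch positioning update could work in principle.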

8.8 Baserunning Analytics

Baserunning is the forgotten skill of baseball analytics. Compared with hitting or fielding, it provides smaller but consistent value: elite baserunners are worth 5-10 runs per season, or roughly 0.5-1.0 WAR at the common rule of thumb of about 10 runs per win.

8.8.1 Sprint Speed {#sprint-speed}

Sprint speed measures a player's maximum running velocity, expressed in feet per second (ft/s). Statcast calculates it using the player's fastest one-second window during competitive plays.

Sprint speed benchmarks (2024):

Elite (Top 10%): 30+ ft/s

Above average: 28-30 ft/s

Average: 27-28 ft/s

Below average: 26-27 ft/s

Poor (Bottom 10%): <26 ft/s
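
The "fastest one-second window" idea and the benchmark tiers can be sketched as follows. The input format (cumulative distance sampled at a fixed frame rate) and both function names are assumptions for illustration, not Statcast's actual data model. The sketch handles a single run, and where adjacent tiers share a boundary value it assigns the boundary to the faster tier.

```python
import numpy as np

def fastest_one_second_window(cum_distance_ft, hz):
    """Largest distance (ft) covered in any one-second span of frames."""
    d = np.asarray(cum_distance_ft, dtype=float)
    if d.size <= hz:
        raise ValueError("need more than one second of tracking data")
    # Distance covered between each frame and the frame one second later.
    return float(np.max(d[hz:] - d[:-hz]))

def speed_tier(ft_per_s):
    """Map a sprint speed to the benchmark tiers listed above."""
    if ft_per_s >= 30:
        return "Elite"
    if ft_per_s >= 28:
        return "Above average"
    if ft_per_s >= 27:
        return "Average"
    if ft_per_s >= 26:
        return "Below average"
    return "Poor"

# Hypothetical run sampled at 10 frames per second: two seconds of
# acceleration from rest, then two seconds held at 29 ft/s.
hz = 10
frame_speeds = np.concatenate([np.linspace(0, 29, 20), np.full(20, 29.0)])
cum_distance = np.concatenate([[0.0], np.cumsum(frame_speeds / hz)])

speed = fastest_one_second_window(cum_distance, hz)
print(f"{speed:.1f} ft/s -> {speed_tier(speed)}")
```

Using the best one-second window rather than the average over the whole run keeps the acceleration phase from dragging the number down.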

Sprint speed leaders (2024):

Elly De La Cruz (Cincinnati):

Sprint speed: 30.8 ft/s (fastest in MLB)

Age: 22

Position: Shortstop

Bobby Witt Jr. (Kansas City):

Sprint speed: 30.5 ft/s

Age: 24

Position: Shortstop

Corbin Carroll (Arizona):

Sprint speed: 30.3 ft/s

Age: 24

Position: Outfield

Why sprint speed matters:

Stolen base success: Faster runners steal more successfully

Infield hits: Speed creates hits on ground balls

Extra bases: Fast runners stretch singles to doubles, doubles to triples

Defensive value: Speed improves range in outfield

R Implementation:

library(tidyverse)

# Simulated sprint speed data
set.seed(321)

sprint_data <- data.frame(
  player = paste("Player", 1:150),
  position = sample(c("IF", "OF", "C"), 150, replace = TRUE, prob = c(0.45, 0.45, 0.1))
) %>%
  mutate(
    # Sprint speed varies by position
    sprint_speed = case_when(
      position == "IF" ~ rnorm(n(), 28.0, 1.8),
      position == "OF" ~ rnorm(n(), 28.5, 1.7),
      position == "C" ~ rnorm(n(), 26.2, 1.4)
    ),
    sprint_speed = pmax(24, pmin(31, sprint_speed)),
    # Speed affects baserunning outcomes
    sb_attempts = rpois(n(), 8),
    sb_success_rate = plogis(-1 + 0.15 * sprint_speed + rnorm(n(), 0, 0.3)),
    stolen_bases = rbinom(n(), sb_attempts, sb_success_rate),
    caught_stealing = sb_attempts - stolen_bases,
    # Extra bases taken
    opportunities_1b_to_3b = rpois(n(), 12),
    extra_base_rate = plogis(-2 + 0.12 * sprint_speed),
    extra_bases_taken = rbinom(n(), opportunities_1b_to_3b, extra_base_rate)
  )

# Sprint speed by position
position_speed <- sprint_data %>%
  group_by(position) %>%
  summarize(
    players = n(),
    avg_speed = mean(sprint_speed),
    sd_speed = sd(sprint_speed),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_speed))

print(position_speed)

# Visualize sprint speed distribution by position
ggplot(sprint_data, aes(x = sprint_speed, fill = position)) +
  geom_density(alpha = 0.6) +
  geom_vline(xintercept = 27, linetype = "dashed", color = "black", size = 1) +
  annotate("text", x = 27.5, y = 0.25, label = "MLB Average (27 ft/s)", size = 3.5) +
  scale_fill_manual(
    values = c("IF" = "#2b8cbe", "OF" = "#2ca25f", "C" = "#de2d26"),
    # Set breaks explicitly so each label lines up with its position code
    breaks = c("IF", "OF", "C"),
    labels = c("Infielders", "Outfielders", "Catchers")
  ) +
  labs(
    title = "Sprint Speed Distribution by Position",
    subtitle = "Outfielders and infielders are faster than catchers",
    x = "Sprint Speed (ft/s)",
    y = "Density",
    fill = "Position",
    caption = "Data: Simulated | MLB average ≈ 27 ft/s"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

# Sprint speed vs stolen base success
ggplot(sprint_data %>% filter(sb_attempts >= 5),
       aes(x = sprint_speed, y = sb_success_rate)) +
  geom_point(aes(color = position, size = sb_attempts), alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "black", linetype = "dashed") +
  scale_color_manual(
    values = c("IF" = "#2b8cbe", "OF" = "#2ca25f", "C" = "#de2d26"),
    # Set breaks explicitly so each label lines up with its position code
    breaks = c("IF", "OF", "C"),
    labels = c("Infielders", "Outfielders", "Catchers")
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Sprint Speed vs Stolen Base Success Rate",
    subtitle = "Faster players steal bases more successfully",
    x = "Sprint Speed (ft/s)",
    y = "Stolen Base Success Rate",
    color = "Position",
    size = "SB Attempts",
    caption = "Data: Simulated | Minimum 5 SB attempts"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

Python Implementation:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.special import expit

# Set random seed
np.random.seed(321)
sns.set_style("whitegrid")

# Simulate sprint speed data
n_players = 150
positions = np.random.choice(['IF', 'OF', 'C'], n_players, p=[0.45, 0.45, 0.1])

sprint_data = []
for i, pos in enumerate(positions):
    # Sprint speed varies by position
    if pos == 'IF':
        speed = np.random.normal(28.0, 1.8)
    elif pos == 'OF':
        speed = np.random.normal(28.5, 1.7)
    else:  # C
        speed = np.random.normal(26.2, 1.4)

    speed = np.clip(speed, 24, 31)

    # Speed affects baserunning outcomes
    sb_attempts = np.random.poisson(8)
    sb_success_rate = expit(-1 + 0.15 * speed + np.random.normal(0, 0.3))
    stolen_bases = np.random.binomial(sb_attempts, sb_success_rate)

    opportunities_1b_to_3b = np.random.poisson(12)
    extra_base_rate = expit(-2 + 0.12 * speed)
    extra_bases_taken = np.random.binomial(opportunities_1b_to_3b, extra_base_rate)

    sprint_data.append({
        'player': f'Player {i+1}',
        'position': pos,
        'sprint_speed': speed,
        'sb_attempts': sb_attempts,
        'sb_success_rate': sb_success_rate,
        'stolen_bases': stolen_bases,
        'caught_stealing': sb_attempts - stolen_bases,
        'opportunities_1b_to_3b': opportunities_1b_to_3b,
        'extra_bases_taken': extra_bases_taken
    })

sprint_df = pd.DataFrame(sprint_data)

# Sprint speed by position
position_speed = sprint_df.groupby('position')['sprint_speed'].agg(['count', 'mean', 'std']).round(2)
position_speed.columns = ['players', 'avg_speed', 'sd_speed']
position_speed = position_speed.sort_values('avg_speed', ascending=False)

print("\nSprint Speed by Position:")
print(position_speed)

# Visualize sprint speed distribution by position
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Density plot
colors = {'IF': '#2b8cbe', 'OF': '#2ca25f', 'C': '#de2d26'}
for pos in ['IF', 'OF', 'C']:
    pos_data = sprint_df[sprint_df['position'] == pos]['sprint_speed']
    pos_data.plot(kind='density', ax=ax1, label=pos, color=colors[pos], alpha=0.6, linewidth=2.5)

ax1.axvline(x=27, color='black', linestyle='--', linewidth=2, alpha=0.7)
ax1.text(27.3, 0.25, 'MLB Average\n(27 ft/s)', fontsize=10, ha='left')
ax1.set_xlabel('Sprint Speed (ft/s)', fontsize=12)
ax1.set_ylabel('Density', fontsize=12)
ax1.set_title('Sprint Speed Distribution by Position\nOutfielders and infielders are faster than catchers',
              fontsize=13, fontweight='bold')
# Labels must follow the plotting order above (IF, OF, C)
ax1.legend(title='Position', labels=['Infielders', 'Outfielders', 'Catchers'])
ax1.grid(True, alpha=0.3)

# Sprint speed vs SB success
sprint_sb = sprint_df[sprint_df['sb_attempts'] >= 5].copy()

for pos in ['IF', 'OF', 'C']:
    pos_data = sprint_sb[sprint_sb['position'] == pos]
    ax2.scatter(pos_data['sprint_speed'], pos_data['sb_success_rate'],
                s=pos_data['sb_attempts'] * 10, alpha=0.6, color=colors[pos], label=pos)

# Add regression line
from scipy.stats import linregress
slope, intercept, r_value, p_value, std_err = linregress(
    sprint_sb['sprint_speed'], sprint_sb['sb_success_rate'])
x_line = np.array([sprint_sb['sprint_speed'].min(), sprint_sb['sprint_speed'].max()])
y_line = slope * x_line + intercept
ax2.plot(x_line, y_line, 'k--', alpha=0.7, linewidth=2, label=f'Trend (R² = {r_value**2:.3f})')

ax2.set_xlabel('Sprint Speed (ft/s)', fontsize=12)
ax2.set_ylabel('Stolen Base Success Rate', fontsize=12)
ax2.set_title('Sprint Speed vs Stolen Base Success Rate\nFaster players steal bases more successfully',
              fontsize=13, fontweight='bold')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
# Use the labels set on each artist so groups stay matched and the R² label is kept
ax2.legend(title='Position')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.figtext(0.99, 0.01, 'Data: Simulated | MLB average ≈ 27 ft/s | Minimum 5 SB attempts for scatter plot',
            ha='right', fontsize=9, style='italic')
plt.show()

# Print top speedsters
print("\nTop 10 Fastest Players:")
print(sprint_df.nlargest(10, 'sprint_speed')[['player', 'position', 'sprint_speed', 'stolen_bases']])

8.8.2 Baserunning Value {#baserunning-value}

Beyond sprint speed, actual baserunning decisions matter enormously. A fast player who takes bad risks provides little value; a smart player with average speed can be highly valuable.

Components of baserunning value:

Stolen bases (SB) and caught stealing (CS):

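Break-even success rate: roughly 70-75%, depending on the run environment (with the run values below, 0.45 / (0.2 + 0.45) ≈ 69%, a little over two successes per failure)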

Value of SB: +0.2 runs

Cost of CS: -0.45 runs

Net value: (SB × 0.2) - (CS × 0.45)

Extra bases taken:

First to third on single: +0.27 runs

Second to home on single: +0.53 runs

First to home on double: +0.40 runs

Outs on bases:

Getting thrown out on basepaths: -0.50 runs (varies by situation)
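The stolen base arithmetic above can be sketched in a few lines of Python. This is a minimal illustration using the run values quoted in the list (+0.2 per SB, -0.45 per CS); the function names and the example runner are hypothetical. With these particular values the break-even rate comes out near 69%, a little below the round-number rule of thumb, because the exact threshold depends on the run values assumed.

```python
# Run values quoted in the list above; they shift with the run environment.
SB_RUN_VALUE = 0.20
CS_RUN_COST = 0.45


def stolen_base_runs(sb, cs, sb_value=SB_RUN_VALUE, cs_cost=CS_RUN_COST):
    """Net runs from steal attempts: (SB x value) - (CS x cost)."""
    return sb * sb_value - cs * cs_cost


def break_even_rate(sb_value=SB_RUN_VALUE, cs_cost=CS_RUN_COST):
    """Success rate p at which p * value equals (1 - p) * cost."""
    return cs_cost / (sb_value + cs_cost)


print(f"Break-even success rate: {break_even_rate():.1%}")

# Hypothetical runner: 30 steals, caught 6 times
print(f"Net stolen base runs: {stolen_base_runs(30, 6):+.1f}")
```

Passing different `sb_value` and `cs_cost` arguments shows how sensitive the threshold is: a higher cost of an out pushes the break-even rate up toward 75%.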

Going first to third (example):

Opportunities: 35 per season (varies by lineup position)

League average rate: 28%

Elite baserunner rate: 40-45%

Runs above average: (Your rate - 0.28) × 35 × 0.27 ≈ 1-2 runs
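The first-to-third calculation can be checked directly. The sketch below simply encodes the formula above with its stated defaults (35 opportunities, a 28% league rate, 0.27 runs per extra base); the function name and the sample rates are illustrative assumptions.

```python
def first_to_third_runs(rate, opportunities=35, league_rate=0.28, run_value=0.27):
    """Runs above average from going first to third on singles.

    (player rate - league rate) x opportunities x run value per extra base
    """
    return (rate - league_rate) * opportunities * run_value


# Compare a below-average runner, a league-average runner, and elite runners
for rate in (0.20, 0.28, 0.40, 0.45):
    print(f"Rate {rate:.0%}: {first_to_third_runs(rate):+.2f} runs above average")
```

Even an elite rate of 40-45% adds only about 1 to 1.6 runs over 35 opportunities, which is why this component is small on its own and matters mainly in combination with the others.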

Real examples (2024 estimated):

Bobby Witt Jr. (Kansas City):

Sprint speed: 30.5 ft/s

Stolen bases: 31 (86% success rate)

Extra bases taken: Above average

Baserunning value: +8 runs

Ronald Acuna Jr. (Atlanta, when healthy):

Sprint speed: 30.1 ft/s

Stolen bases: Elite success rate (typically 85%+)

Extra bases taken: Elite

Baserunning value: +10-12 runs (full season)

Salvador Perez (Kansas City):

Sprint speed: 25.1 ft/s

Stolen bases: 1 (minimal attempts)

Extra bases taken: Below average

Baserunning value: -3 runs

8.8.3 BsR (Baserunning Runs) {#bsr}

BsR (Baserunning Runs Above Average) is FanGraphs' comprehensive baserunning metric. It combines all baserunning contributions into a single runs number:

BsR = wSB + UBR + wGDP

Where:

wSB: Weighted stolen base runs (SB value - CS cost)

UBR: Ultimate Base Running (extra bases, outs on bases)

wGDP: Grounded into double play runs (avoiding GIDP is valuable)

BsR interpretation:

>+5: Elite baserunner (provides real value)

+2 to +5: Above average

-2 to +2: Average

-2 to -5: Below average

<-5: Poor baserunner (costs team runs)
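As a rough sketch, the BsR sum and the interpretation tiers above can be written as two small functions. The component values in the example are made up for illustration, and because the listed ranges share their endpoints (for example, +2 appears in two tiers), the boundary handling below is an assumption rather than a FanGraphs rule.

```python
def bsr(wsb, ubr, wgdp):
    """Baserunning runs above average: BsR = wSB + UBR + wGDP."""
    return wsb + ubr + wgdp


def bsr_tier(value):
    """Map a BsR value to the tiers listed above.

    Boundary assignment (e.g. exactly +2 counts as above average)
    is an assumption; the source ranges overlap at their endpoints.
    """
    if value > 5:
        return "Elite"
    if value >= 2:
        return "Above average"
    if value > -2:
        return "Average"
    if value >= -5:
        return "Below average"
    return "Poor"


# Hypothetical player: strong base stealer who also avoids double plays
total = bsr(wsb=3.1, ubr=2.4, wgdp=0.8)
print(f"BsR: {total:+.1f} ({bsr_tier(total)})")
```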

BsR leaders often combine:

Speed to take extra bases and steal successfully

Aggressiveness to attempt steals and advances

Instincts to avoid outs on bases

Speed to avoid double plays

BsR shows that baserunning, while less valuable than hitting or pitching, still matters. The gap between elite and poor baserunners is 10-15 runs per season—equivalent to 30-40 points of wOBA or 1.0-1.5 WAR.

library(tidyverse)
library(baseballr)

# Simulated sprint speed data
set.seed(321)

sprint_data <- data.frame(
  player = paste("Player", 1:150),
  position = sample(c("IF", "OF", "C"), 150, replace = TRUE, prob = c(0.45, 0.45, 0.1))
) %>%
  mutate(
    # Sprint speed varies by position
    sprint_speed = case_when(
      position == "IF" ~ rnorm(n(), 28.0, 1.8),
      position == "OF" ~ rnorm(n(), 28.5, 1.7),
      position == "C" ~ rnorm(n(), 26.2, 1.4)
    ),
    sprint_speed = pmax(24, pmin(31, sprint_speed)),
    # Speed affects baserunning outcomes
    sb_attempts = rpois(n(), 8),
    sb_success_rate = plogis(-1 + 0.15 * sprint_speed + rnorm(n(), 0, 0.3)),
    stolen_bases = rbinom(n(), sb_attempts, sb_success_rate),
    caught_stealing = sb_attempts - stolen_bases,
    # Extra bases taken
    opportunities_1b_to_3b = rpois(n(), 12),
    extra_base_rate = plogis(-2 + 0.12 * sprint_speed),
    extra_bases_taken = rbinom(n(), opportunities_1b_to_3b, extra_base_rate)
  )

# Sprint speed by position
position_speed <- sprint_data %>%
  group_by(position) %>%
  summarize(
    players = n(),
    avg_speed = mean(sprint_speed),
    sd_speed = sd(sprint_speed),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_speed))

print(position_speed)

# Visualize sprint speed distribution by position
ggplot(sprint_data, aes(x = sprint_speed, fill = position)) +
  geom_density(alpha = 0.6) +
  geom_vline(xintercept = 27, linetype = "dashed", color = "black", size = 1) +
  annotate("text", x = 27.5, y = 0.25, label = "MLB Average (27 ft/s)", size = 3.5) +
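  scale_fill_manual(
    values = c("IF" = "#2b8cbe", "OF" = "#2ca25f", "C" = "#de2d26"),
    # Set breaks explicitly so labels match; the default order is alphabetical (C, IF, OF)
    breaks = c("IF", "OF", "C"),
    labels = c("Infielders", "Outfielders", "Catchers")
  ) +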
  labs(
    title = "Sprint Speed Distribution by Position",
    subtitle = "Outfielders and infielders are faster than catchers",
    x = "Sprint Speed (ft/s)",
    y = "Density",
    fill = "Position",
    caption = "Data: Simulated | MLB average ≈ 27 ft/s"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

# Sprint speed vs stolen base success
ggplot(sprint_data %>% filter(sb_attempts >= 5),
       aes(x = sprint_speed, y = sb_success_rate)) +
  geom_point(aes(color = position, size = sb_attempts), alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "black", linetype = "dashed") +
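  scale_color_manual(
    values = c("IF" = "#2b8cbe", "OF" = "#2ca25f", "C" = "#de2d26"),
    # Set breaks explicitly so labels match; the default order is alphabetical (C, IF, OF)
    breaks = c("IF", "OF", "C"),
    labels = c("Infielders", "Outfielders", "Catchers")
  ) +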
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Sprint Speed vs Stolen Base Success Rate",
    subtitle = "Faster players steal bases more successfully",
    x = "Sprint Speed (ft/s)",
    y = "Stolen Base Success Rate",
    color = "Position",
    size = "SB Attempts",
    caption = "Data: Simulated | Minimum 5 SB attempts"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

8.9 Interactive Fielding Visualizations

Interactive visualizations turn fielding analysis from static reporting into exploration: coaches, analysts, and fans can investigate defensive patterns that aggregated metrics hide. Traditional defensive statistics reduce complex spatial and temporal data to single numbers, whereas interactive tools keep the underlying detail available. This section introduces three interactive visualization techniques built with Plotly, which provides the zoom, filtering, hover, and animation features that this kind of analysis relies on.

Interactive fielding visualizations serve several audiences. Player development staff identify positioning adjustments and skill gaps that call for targeted training. Front office personnel evaluate trade and free agent targets by exploring defensive performance in different contexts. Opposing teams scout defensive tendencies to plan baserunning and hit placement. In each case, moving from static to interactive views changes how defensive data feeds into decisions.

8.9.1 Interactive Catch Probability Heat Map

Catch probability heat maps visualize where fielders make plays and which plays they convert at rates above or below expectation. Making these maps interactive enables filtering by game situation (close vs. blowout), batted ball type (fly ball vs. line drive), or time period (early season vs. late season). Users can hover over specific zones to see conversion rates, expected rates, and outs above average for that region. This granularity reveals positioning inefficiencies and range limitations that summary statistics obscure.

R Implementation:

library(tidyverse)
library(plotly)
library(baseballr)

create_catch_probability_heatmap <- function(fielding_data, player_name = "Fielder",
                                             position = "OF") {
  # Filter and prepare data
  plays <- fielding_data %>%
    filter(!is.na(hc_x), !is.na(hc_y), !is.na(hit_distance_sc)) %>%
    mutate(
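      # Adjust coordinates for plotting (home plate at origin).
      # hc_x / hc_y are not in feet; ~2.5 ft per unit is a commonly used approximation.
      x_coord = 2.5 * (hc_x - 125.42),  # Center on home plate
      y_coord = 2.5 * (208.48 - hc_y),  # Flip y-axis so the outfield points up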
      # Simplified catch probability based on distance
      distance = hit_distance_sc,
      expected_catch_prob = case_when(
        distance < 50 ~ 0.98,
        distance < 100 ~ 0.85,
        distance < 150 ~ 0.65,
        distance < 200 ~ 0.40,
        distance < 250 ~ 0.20,
        TRUE ~ 0.05
      ),
      was_caught = ifelse(events %in% c("field_out", "double_play",
                                         "sac_fly"), 1, 0),
      outs_above_avg = was_caught - expected_catch_prob
    )

  # Aggregate plays into square grid cells
  # (20 x 20 ft cells, keyed by the nearest multiple of 20)
  plays <- plays %>%
    mutate(
      x_bin = round(x_coord / 20) * 20,
      y_bin = round(y_coord / 20) * 20,
      hover_text = paste0(
        "Location: (", round(x_coord, 0), ", ", round(y_coord, 0), ")<br>",
        "Distance: ", round(distance, 0), " ft<br>",
        "Expected: ", round(expected_catch_prob * 100, 0), "%<br>",
        "Result: ", ifelse(was_caught == 1, "Out", "Hit")
      )
    )

  # Aggregate by bins
  binned_data <- plays %>%
    group_by(x_bin, y_bin) %>%
    summarize(
      plays = n(),
      catches = sum(was_caught),
      expected_catches = sum(expected_catch_prob),
      catch_rate = catches / plays,
      oaa_zone = sum(outs_above_avg),
      avg_distance = mean(distance),
      .groups = "drop"
    ) %>%
    filter(plays >= 3) %>%  # Minimum plays per zone
    mutate(
      hover_info = paste0(
        "<b>Zone Performance</b><br>",
        "Plays: ", plays, "<br>",
        "Catch Rate: ", round(catch_rate * 100, 0), "%<br>",
        "Expected: ", round((expected_catches/plays) * 100, 0), "%<br>",
        "OAA (zone): ", sprintf("%+.1f", oaa_zone), "<br>",
        "Avg Distance: ", round(avg_distance, 0), " ft"
      )
    )

  # Create interactive heatmap
  p <- plot_ly(
    data = binned_data,
    x = ~x_bin,
    y = ~y_bin,
    z = ~oaa_zone,
    type = "contour",
    colorscale = list(
      c(0, "rgb(215, 48, 39)"),      # Red for negative OAA
      c(0.5, "rgb(255, 255, 191)"),  # Yellow for average
      c(1, "rgb(44, 123, 182)")      # Blue for positive OAA
    ),
    colorbar = list(
      title = "<b>Outs Above<br>Average</b>",
      tickformat = "+.1f"
    ),
    text = ~hover_info,
    hoverinfo = "text",
    contours = list(
      showlabels = TRUE,
      labelfont = list(size = 10, color = 'white')
    )
  ) %>%
    add_trace(
      data = plays,
      inherit = FALSE,  # plays has no oaa_zone/hover_info columns mapped in plot_ly()
      x = ~x_coord,
      y = ~y_coord,
      type = "scatter",
      mode = "markers",
      marker = list(
        size = 4,
        color = ~was_caught,
        colorscale = list(c(0, "red"), c(1, "green")),
        opacity = 0.3,
        line = list(width = 0.5, color = 'black')
      ),
      text = ~hover_text,
      hoverinfo = "text",
      showlegend = FALSE
    ) %>%
    layout(
      title = list(
        text = paste0("<b>", player_name, " Catch Probability Heat Map</b><br>",
                     "<sub>", position, " - Red zones = below average, Blue = above average</sub>"),
        font = list(size = 16)
      ),
      xaxis = list(
        title = "<b>Horizontal Distance from Home Plate (ft)</b><br>← LF | CF | RF →",
        range = c(-250, 250),
        zeroline = TRUE,
        zerolinewidth = 2,
        zerolinecolor = 'black',
        gridcolor = 'lightgray'
      ),
      yaxis = list(
        title = "<b>Distance from Home Plate (ft)</b>",
        range = c(0, 450),
        scaleanchor = "x",
        scaleratio = 1,
        gridcolor = 'lightgray'
      ),
      hovermode = 'closest',
      showlegend = FALSE,
      margin = list(l = 80, r = 120, t = 100, b = 80)
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage
# fielding_data <- statcast_search(
#   start_date = "2024-04-01",
#   end_date = "2024-10-01",
#   player_type = "batter"
# ) %>%
#   filter(fielder_9 == player_id_of_interest)  # fielder_9 = right fielder (fielder_2 is the catcher)
#
# heatmap <- create_catch_probability_heatmap(
#   fielding_data,
#   "Mookie Betts",
#   "RF"
# )
# heatmap

Python Implementation:

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from pybaseball import statcast

def create_catch_probability_heatmap(fielding_data, player_name="Fielder",
                                     position="OF"):
    """
    Create interactive catch probability heat map.

    Parameters:
    fielding_data: DataFrame with Statcast batted ball data
    player_name: Name for chart title
    position: Fielder position

    Returns:
    Plotly figure object
    """
    # Filter and prepare data
    plays = fielding_data[
        fielding_data['hc_x'].notna() &
        fielding_data['hc_y'].notna() &
        fielding_data['hit_distance_sc'].notna()
    ].copy()

    # Adjust coordinates (home plate at origin).
    # hc_x / hc_y are not in feet; ~2.5 ft per unit is a commonly used approximation.
    plays['x_coord'] = 2.5 * (plays['hc_x'] - 125.42)
    plays['y_coord'] = 2.5 * (208.48 - plays['hc_y'])
    plays['distance'] = plays['hit_distance_sc']

    # Simplified catch probability model
    def calc_expected_catch(distance):
        if distance < 50:
            return 0.98
        elif distance < 100:
            return 0.85
        elif distance < 150:
            return 0.65
        elif distance < 200:
            return 0.40
        elif distance < 250:
            return 0.20
        else:
            return 0.05

    plays['expected_catch_prob'] = plays['distance'].apply(calc_expected_catch)

    # Determine outcomes
    caught_events = ['field_out', 'double_play', 'sac_fly']
    plays['was_caught'] = plays['events'].isin(caught_events).astype(int)
    plays['outs_above_avg'] = plays['was_caught'] - plays['expected_catch_prob']

    # Create hover text for individual plays
    plays['hover_text'] = plays.apply(
        lambda row: f"Location: ({row['x_coord']:.0f}, {row['y_coord']:.0f})<br>" +
                   f"Distance: {row['distance']:.0f} ft<br>" +
                   f"Expected: {row['expected_catch_prob']*100:.0f}%<br>" +
                   f"Result: {'Out' if row['was_caught'] else 'Hit'}",
        axis=1
    )

    # Create grid bins (20x20 feet)
    plays['x_bin'] = (plays['x_coord'] / 20).round() * 20
    plays['y_bin'] = (plays['y_coord'] / 20).round() * 20

    # Aggregate by bins
    binned_data = plays.groupby(['x_bin', 'y_bin']).agg({
        'was_caught': ['count', 'sum'],
        'expected_catch_prob': 'sum',
        'outs_above_avg': 'sum',
        'distance': 'mean'
    }).reset_index()

    binned_data.columns = ['x_bin', 'y_bin', 'plays', 'catches',
                           'expected_catches', 'oaa_zone', 'avg_distance']

    # Filter minimum plays
    binned_data = binned_data[binned_data['plays'] >= 3]

    binned_data['catch_rate'] = binned_data['catches'] / binned_data['plays']
    binned_data['expected_rate'] = binned_data['expected_catches'] / binned_data['plays']

    # Create hover info
    binned_data['hover_info'] = binned_data.apply(
        lambda row: f"<b>Zone Performance</b><br>" +
                   f"Plays: {row['plays']:.0f}<br>" +  # rows are float after apply(axis=1)
                   f"Catch Rate: {row['catch_rate']*100:.0f}%<br>" +
                   f"Expected: {row['expected_rate']*100:.0f}%<br>" +
                   f"OAA (zone): {row['oaa_zone']:+.1f}<br>" +
                   f"Avg Distance: {row['avg_distance']:.0f} ft",
        axis=1
    )

    # Create figure
    fig = go.Figure()

    # Add contour heat map
    # Create grid for contour
    x_range = np.arange(binned_data['x_bin'].min(), binned_data['x_bin'].max() + 20, 20)
    y_range = np.arange(binned_data['y_bin'].min(), binned_data['y_bin'].max() + 20, 20)

    # Create z-values matrix
    z_matrix = np.full((len(y_range), len(x_range)), np.nan)

    for _, row in binned_data.iterrows():
        x_idx = np.where(x_range == row['x_bin'])[0]
        y_idx = np.where(y_range == row['y_bin'])[0]
        if len(x_idx) > 0 and len(y_idx) > 0:
            z_matrix[y_idx[0], x_idx[0]] = row['oaa_zone']

    fig.add_trace(go.Contour(
        x=x_range,
        y=y_range,
        z=z_matrix,
        colorscale=[
            [0, 'rgb(215, 48, 39)'],      # Red for negative
            [0.5, 'rgb(255, 255, 191)'],  # Yellow for average
            [1, 'rgb(44, 123, 182)']      # Blue for positive
        ],
        colorbar=dict(
            title="<b>Outs Above<br>Average</b>",
            tickformat="+.1f"
        ),
        hoverinfo='skip',
        contours=dict(
            showlabels=True,
            labelfont=dict(size=10, color='white')
        )
    ))

    # Add scatter points for individual plays
    fig.add_trace(go.Scatter(
        x=plays['x_coord'],
        y=plays['y_coord'],
        mode='markers',
        marker=dict(
            size=4,
            color=plays['was_caught'],
            colorscale=[[0, 'red'], [1, 'green']],
            opacity=0.3,
            line=dict(width=0.5, color='black'),
            showscale=False
        ),
        text=plays['hover_text'],
        hoverinfo='text',
        showlegend=False
    ))

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Catch Probability Heat Map</b><br>" +
                 f"<sub>{position} - Red zones = below average, Blue = above average</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Horizontal Distance from Home Plate (ft)</b><br>← LF | CF | RF →",
            range=[-250, 250],
            zeroline=True,
            zerolinewidth=2,
            zerolinecolor='black',
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Distance from Home Plate (ft)</b>",
            range=[0, 450],
            scaleanchor="x",
            scaleratio=1,
            gridcolor='lightgray',
            showgrid=True
        ),
        hovermode='closest',
        showlegend=False,
        width=1000,
        height=1000,
        margin=dict(l=80, r=150, t=100, b=80),
        template='plotly_white'
    )

    return fig

# Example usage
# fielding_data = statcast(start_dt='2024-04-01', end_dt='2024-10-01')
# fielding_data = fielding_data[fielding_data['fielder_9'] == player_id]  # fielder_9 = right fielder
# fig = create_catch_probability_heatmap(fielding_data, "Mookie Betts", "RF")
# fig.show()

The interactive catch probability heat map reveals positioning and range patterns that single-number metrics cannot capture. A right fielder might have excellent OAA in straightaway right field but struggle in the gap, suggesting either positioning adjustments or specific training on balls hit to that zone. The ability to click and filter by batted ball characteristics or game situations enables highly targeted developmental interventions.

8.9.2 Interactive Sprint Speed Comparison Chart

Sprint speed affects both offensive (baserunning) and defensive (range) performance, making it a crucial athletic tool for position players. An interactive sprint speed comparison chart allows filtering by position, age, or team while displaying relationships between speed and performance outcomes. Users can identify speed-dependent skills, track aging curves, and benchmark players against positional peers. Hover details reveal sprint speed rankings, percentiles, and associated performance metrics.

R Implementation:

library(plotly)
library(dplyr)

create_sprint_speed_comparison <- function(player_data) {
  # Prepare data
  sprint_data <- player_data %>%
    filter(!is.na(sprint_speed)) %>%
    mutate(
      # Create position groups
      position_group = case_when(
        position %in% c("LF", "CF", "RF") ~ "Outfield",
        position %in% c("SS", "2B", "3B") ~ "Infield",
        position == "C" ~ "Catcher",
        position == "1B" ~ "First Base",
        TRUE ~ "Other"
      ),
      # Calculate percentile
      speed_percentile = percent_rank(sprint_speed) * 100,
      hover_text = paste0(
        "<b>", player_name, "</b><br>",
        "Position: ", position, "<br>",
        "Sprint Speed: ", round(sprint_speed, 2), " ft/s<br>",
        "Percentile: ", round(speed_percentile, 0), "%<br>",
        "Age: ", age, "<br>",
        "SB: ", stolen_bases, " (", round(sb_success_rate * 100, 0), "% success)"
      )
    )

  # Create interactive scatter plot
  p <- plot_ly(
    data = sprint_data,
    x = ~age,
    y = ~sprint_speed,
    color = ~position_group,
    colors = "Set2",
    size = ~stolen_bases,
    sizes = c(10, 100),
    text = ~hover_text,
    hoverinfo = "text",
    type = "scatter",
    mode = "markers",
    marker = list(
      opacity = 0.7,
      line = list(width = 1, color = 'black')
    )
  ) %>%
    layout(
      title = list(
        # R has no `+` for strings; concatenate with paste0()
        text = paste0("<b>Sprint Speed by Age and Position</b><br>",
                      "<sub>Size = Stolen Bases | Hover for Details</sub>"),
        font = list(size = 16)
      ),
      xaxis = list(
        title = "<b>Age (years)</b>",
        range = c(20, 42),
        gridcolor = 'lightgray'
      ),
      yaxis = list(
        title = "<b>Sprint Speed (ft/s)</b>",
        range = c(24, 32),
        gridcolor = 'lightgray'
      ),
      hovermode = 'closest',
      showlegend = TRUE,
      legend = list(
        title = list(text = '<b>Position Group</b>'),
        orientation = 'v',
        x = 1.02,
        y = 1
      ),
      margin = list(l = 80, r = 150, t = 100, b = 80)
    ) %>%
    add_lines(
      data = sprint_data %>%
        group_by(age) %>%
        summarize(avg_speed = mean(sprint_speed, na.rm = TRUE)),
      x = ~age,
      y = ~avg_speed,
      line = list(color = 'black', width = 2, dash = 'dash'),
      name = 'Age Curve',
      hoverinfo = 'skip',
      showlegend = TRUE,
      inherit = FALSE  # summary data lacks the color/size/text columns mapped in plot_ly()
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage with simulated data
# set.seed(123)
# player_data <- data.frame(
#   player_name = paste("Player", 1:200),
#   position = sample(c("CF", "RF", "LF", "SS", "2B", "3B", "1B", "C"),
#                    200, replace = TRUE),
#   age = sample(22:40, 200, replace = TRUE),
#   sprint_speed = rnorm(200, 27.5, 2),
#   stolen_bases = rpois(200, 10),
#   sb_success_rate = rbeta(200, 7, 3)
# )
#
# sprint_chart <- create_sprint_speed_comparison(player_data)
# sprint_chart

Python Implementation:

import plotly.graph_objects as go
import pandas as pd
import numpy as np

def create_sprint_speed_comparison(player_data):
    """
    Create interactive sprint speed comparison chart.

    Parameters:
    player_data: DataFrame with player sprint speed and performance data

    Returns:
    Plotly figure object
    """
    # Prepare data
    sprint_data = player_data[player_data['sprint_speed'].notna()].copy()

    # Create position groups
    def position_group(pos):
        if pos in ['LF', 'CF', 'RF']:
            return 'Outfield'
        elif pos in ['SS', '2B', '3B']:
            return 'Infield'
        elif pos == 'C':
            return 'Catcher'
        elif pos == '1B':
            return 'First Base'
        else:
            return 'Other'

    sprint_data['position_group'] = sprint_data['position'].apply(position_group)

    # Calculate percentiles
    sprint_data['speed_percentile'] = sprint_data['sprint_speed'].rank(pct=True) * 100

    # Create hover text
    sprint_data['hover_text'] = sprint_data.apply(
        lambda row: f"<b>{row['player_name']}</b><br>" +
                   f"Position: {row['position']}<br>" +
                   f"Sprint Speed: {row['sprint_speed']:.2f} ft/s<br>" +
                   f"Percentile: {row['speed_percentile']:.0f}%<br>" +
                   f"Age: {row['age']}<br>" +
                   f"SB: {row['stolen_bases']} " +
                   f"({row['sb_success_rate']*100:.0f}% success)",
        axis=1
    )

    # Create figure
    fig = go.Figure()

    # Color mapping for position groups
    color_map = {
        'Outfield': '#1f77b4',
        'Infield': '#ff7f0e',
        'Catcher': '#2ca02c',
        'First Base': '#d62728',
        'Other': '#9467bd'
    }

    # Add scatter for each position group
    for pos_group in sprint_data['position_group'].unique():
        group_data = sprint_data[sprint_data['position_group'] == pos_group]

        fig.add_trace(go.Scatter(
            x=group_data['age'],
            y=group_data['sprint_speed'],
            mode='markers',
            name=pos_group,
            text=group_data['hover_text'],
            hoverinfo='text',
            marker=dict(
                size=group_data['stolen_bases'].clip(upper=30),  # Cap size
                sizemode='diameter',
                sizeref=0.5,
                color=color_map.get(pos_group, '#7f7f7f'),
                opacity=0.7,
                line=dict(width=1, color='black')
            )
        ))

    # Add age curve trend line
    age_curve = sprint_data.groupby('age')['sprint_speed'].mean().reset_index()

    fig.add_trace(go.Scatter(
        x=age_curve['age'],
        y=age_curve['sprint_speed'],
        mode='lines',
        name='Age Curve',
        line=dict(color='black', width=2, dash='dash'),
        hoverinfo='skip'
    ))

    # Update layout
    fig.update_layout(
        title=dict(
            text="<b>Sprint Speed by Age and Position</b><br>" +
                 "<sub>Size = Stolen Bases | Hover for Details</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Age (years)</b>",
            range=[20, 42],
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Sprint Speed (ft/s)</b>",
            range=[24, 32],
            gridcolor='lightgray',
            showgrid=True
        ),
        hovermode='closest',
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Position Group</b>'),
            orientation='v',
            x=1.02,
            y=1
        ),
        width=1100,
        height=700,
        margin=dict(l=80, r=150, t=100, b=80),
        template='plotly_white'
    )

    return fig

# Example usage with simulated data
# np.random.seed(123)
# player_data = pd.DataFrame({
#     'player_name': [f'Player {i}' for i in range(1, 201)],
#     'position': np.random.choice(['CF', 'RF', 'LF', 'SS', '2B', '3B', '1B', 'C'],
#                                  200),
#     'age': np.random.randint(22, 41, 200),
#     'sprint_speed': np.random.normal(27.5, 2, 200),
#     'stolen_bases': np.random.poisson(10, 200),
#     'sb_success_rate': np.random.beta(7, 3, 200)
# })
#
# fig = create_sprint_speed_comparison(player_data)
# fig.show()

The interactive sprint speed chart puts age, position group, sprint speed, and stolen base volume on a single plot. Analysts can quickly spot players who maintain elite speed into their 30s (potential trade targets), young players with poor speed who might not age well defensively, and position players whose speed doesn't translate into stolen base success (shown on hover), which suggests poor baserunning instincts despite the physical tools.
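
The three profiles described above can also be expressed as simple filters. The following is a minimal sketch, assuming a DataFrame with the same `age`, `sprint_speed`, and `sb_success_rate` columns as the simulated `player_data`. The function name `flag_speed_profiles` and every threshold are hypothetical choices made for illustration, not established cutoffs.

```python
import pandas as pd

def flag_speed_profiles(player_data, elite_speed=29.0, slow_speed=26.5,
                        veteran_age=30, young_age=25, poor_success=0.65):
    """Flag three speed profiles; all thresholds are illustrative assumptions."""
    df = player_data.copy()
    is_fast = df['sprint_speed'] >= elite_speed

    # Fast players in their 30s (possible trade targets)
    df['elite_veteran'] = is_fast & (df['age'] >= veteran_age)
    # Young players who are already slow
    df['slow_young'] = (df['sprint_speed'] <= slow_speed) & (df['age'] <= young_age)
    # Fast players with a low stolen base success rate
    df['fast_low_success'] = is_fast & (df['sb_success_rate'] < poor_success)
    return df

# Small made-up sample, for demonstration only
demo = pd.DataFrame({
    'player_name': ['A', 'B', 'C', 'D'],
    'age': [33, 23, 27, 31],
    'sprint_speed': [29.4, 25.8, 29.1, 27.0],
    'sb_success_rate': [0.85, 0.70, 0.55, 0.75],
})
flags = flag_speed_profiles(demo)
print(flags[['player_name', 'elite_veteran', 'slow_young', 'fast_low_success']])
```

Success rates built on a handful of attempts are noisy, so in practice a minimum-attempts filter would probably be applied before reading much into the last flag.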

8.8.3 Animated Play Trajectory Visualization

Defensive plays unfold over time and space, making them ideal candidates for animated visualization. An animated play trajectory shows fielder movement from initial positioning through ball contact to the catch or miss, overlaid with a reference route; the implementations here approximate that optimal route as a straight line from the fielder's starting position to the ball's final location. This reveals jump efficiency, route selection, and closing speed in ways that static metrics cannot. Analysts can compare actual routes to the reference path, identify hesitation or misdirection, and evaluate recovery ability after a poor initial read.

R Implementation:

library(plotly)
library(dplyr)

create_animated_play_trajectory <- function(play_tracking_data, play_description = "Defensive Play") {
  # Prepare tracking data
  # Expected format: frame-by-frame position data
  tracking <- play_tracking_data %>%
    arrange(frame_time) %>%
    mutate(
      time_elapsed = frame_time - min(frame_time),
      distance_from_ball = sqrt((x_position - ball_x)^2 + (y_position - ball_y)^2),
      speed = sqrt(velocity_x^2 + velocity_y^2),
      frame_label = paste0(
        "Time: ", round(time_elapsed, 2), " sec<br>",
        "Fielder Position: (", round(x_position, 1), ", ", round(y_position, 1), ")<br>",
        "Distance to Ball: ", round(distance_from_ball, 1), " ft<br>",
        "Speed: ", round(speed, 2), " ft/s"
      )
    )

  # Calculate optimal route (straight line from start to catch point)
  start_x <- tracking$x_position[1]
  start_y <- tracking$y_position[1]
  end_x <- tracking$ball_x[nrow(tracking)]
  end_y <- tracking$ball_y[nrow(tracking)]

  optimal_route <- data.frame(
    x = seq(start_x, end_x, length.out = 50),
    y = seq(start_y, end_y, length.out = 50)
  )

  # Create animated plot
  p <- plot_ly() %>%
    # Add field outline (simplified)
    add_trace(
      type = "scatter",
      mode = "lines",
      x = c(-200, 200, 200, -200, -200),
      y = c(0, 0, 400, 400, 0),
      line = list(color = "green", width = 2),
      showlegend = FALSE,
      hoverinfo = "skip"
    ) %>%
    # Add optimal route
    add_trace(
      data = optimal_route,
      x = ~x,
      y = ~y,
      type = "scatter",
      mode = "lines",
      line = list(color = "blue", width = 2, dash = "dash"),
      name = "Optimal Route",
      hoverinfo = "skip"
    ) %>%
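    # Add fielder position (animated). Each frame holds a single row, so only
    # the marker is drawn; a cumulative path would need accumulated frames.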
    add_trace(
      data = tracking,
      x = ~x_position,
      y = ~y_position,
      frame = ~frame_time,
      type = "scatter",
      mode = "markers+lines",
      marker = list(
        color = "red",
        size = 12,
        symbol = "circle"
      ),
      line = list(color = "red", width = 2),
      text = ~frame_label,
      hoverinfo = "text",
      name = "Fielder Path"
    ) %>%
    # Add ball position (animated)
    add_trace(
      data = tracking,
      x = ~ball_x,
      y = ~ball_y,
      frame = ~frame_time,
      type = "scatter",
      mode = "markers",
      marker = list(
        color = "orange",
        size = 10,
        symbol = "star"
      ),
      name = "Ball",
      hoverinfo = "skip"
    ) %>%
    layout(
      title = list(
        text = paste0("<b>", play_description, "</b><br>",
                     "<sub>Red = Fielder | Orange = Ball | Blue Dash = Optimal Route</sub>"),
        font = list(size = 16)
      ),
      xaxis = list(
        title = "<b>Distance from Home (ft)</b>",
        range = c(-250, 250),
        zeroline = TRUE,
        zerolinecolor = 'black'
      ),
      yaxis = list(
        title = "<b>Distance from Home (ft)</b>",
        range = c(0, 450),
        scaleanchor = "x",
        scaleratio = 1
      ),
      showlegend = TRUE,
      legend = list(x = 1.02, y = 1)
    ) %>%
    animation_opts(
      frame = 100,  # 100ms per frame
      transition = 50,
      redraw = FALSE
    ) %>%
    animation_slider(
      currentvalue = list(
        prefix = "Time: ",
        suffix = " sec",
        font = list(size = 14)
      )
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage with simulated tracking data
# (fielder covers roughly 58 ft in 3 seconds; the ball travels out from home plate)
# n_frames <- 30
# play_tracking <- data.frame(
#   frame_time = seq(0, 3, length.out = n_frames),
#   x_position = seq(150, 180, length.out = n_frames) + rnorm(n_frames, 0, 0.5),
#   y_position = seq(200, 250, length.out = n_frames) + rnorm(n_frames, 0, 0.5),
#   ball_x = seq(0, 180, length.out = n_frames),
#   ball_y = seq(0, 250, length.out = n_frames),
#   velocity_x = rep(10, n_frames),      # 30 ft over 3 s
#   velocity_y = rep(50 / 3, n_frames)   # 50 ft over 3 s
# )
#
# play_viz <- create_animated_play_trajectory(play_tracking, "Byron Buxton Diving Catch")
# play_viz

Python Implementation:

import plotly.graph_objects as go
import pandas as pd
import numpy as np

def create_animated_play_trajectory(play_tracking_data, play_description="Defensive Play"):
    """
    Create animated defensive play trajectory visualization.

    Parameters:
    play_tracking_data: DataFrame with frame-by-frame tracking data
        Required columns: frame_time, x_position, y_position, ball_x, ball_y,
                         velocity_x, velocity_y
    play_description: Description for chart title

    Returns:
    Plotly figure object
    """
    # Prepare data
    tracking = play_tracking_data.sort_values('frame_time').copy()

    tracking['time_elapsed'] = tracking['frame_time'] - tracking['frame_time'].min()
    tracking['distance_from_ball'] = np.sqrt(
        (tracking['x_position'] - tracking['ball_x'])**2 +
        (tracking['y_position'] - tracking['ball_y'])**2
    )
    tracking['speed'] = np.sqrt(
        tracking['velocity_x']**2 + tracking['velocity_y']**2
    )

    # Create hover labels
    tracking['frame_label'] = tracking.apply(
        lambda row: f"Time: {row['time_elapsed']:.2f} sec<br>" +
                   f"Fielder Position: ({row['x_position']:.1f}, {row['y_position']:.1f})<br>" +
                   f"Distance to Ball: {row['distance_from_ball']:.1f} ft<br>" +
                   f"Speed: {row['speed']:.2f} ft/s",
        axis=1
    )

    # Calculate optimal route
    start_x = tracking['x_position'].iloc[0]
    start_y = tracking['y_position'].iloc[0]
    end_x = tracking['ball_x'].iloc[-1]
    end_y = tracking['ball_y'].iloc[-1]

    optimal_x = np.linspace(start_x, end_x, 50)
    optimal_y = np.linspace(start_y, end_y, 50)

    # Create figure
    fig = go.Figure()

    # Add field outline
    fig.add_trace(go.Scatter(
        x=[-200, 200, 200, -200, -200],
        y=[0, 0, 400, 400, 0],
        mode='lines',
        line=dict(color='green', width=2),
        showlegend=False,
        hoverinfo='skip'
    ))

    # Add optimal route
    fig.add_trace(go.Scatter(
        x=optimal_x,
        y=optimal_y,
        mode='lines',
        line=dict(color='blue', width=2, dash='dash'),
        name='Optimal Route',
        hoverinfo='skip'
    ))

    # Create frames for animation
    frames = []
    for i, frame_time in enumerate(tracking['frame_time'].unique()):
        frame_data = tracking[tracking['frame_time'] <= frame_time]

        frame_traces = [
            # Field outline (static)
            go.Scatter(
                x=[-200, 200, 200, -200, -200],
                y=[0, 0, 400, 400, 0],
                mode='lines',
                line=dict(color='green', width=2),
                showlegend=False,
                hoverinfo='skip'
            ),
            # Optimal route (static)
            go.Scatter(
                x=optimal_x,
                y=optimal_y,
                mode='lines',
                line=dict(color='blue', width=2, dash='dash'),
                name='Optimal Route',
                hoverinfo='skip',
                showlegend=(i == 0)
            ),
            # Fielder path
            go.Scatter(
                x=frame_data['x_position'],
                y=frame_data['y_position'],
                mode='markers+lines',
                marker=dict(color='red', size=12),
                line=dict(color='red', width=2),
                text=frame_data['frame_label'],
                hoverinfo='text',
                name='Fielder Path',
                showlegend=(i == 0)
            ),
            # Current ball position
            go.Scatter(
                x=[frame_data['ball_x'].iloc[-1]],
                y=[frame_data['ball_y'].iloc[-1]],
                mode='markers',
                marker=dict(color='orange', size=15, symbol='star'),
                name='Ball',
                hoverinfo='skip',
                showlegend=(i == 0)
            )
        ]

        frames.append(go.Frame(data=frame_traces, name=str(frame_time)))

    # Add initial frame data
    initial_data = tracking.iloc[:1]
    fig.add_trace(go.Scatter(
        x=initial_data['x_position'],
        y=initial_data['y_position'],
        mode='markers+lines',
        marker=dict(color='red', size=12),
        line=dict(color='red', width=2),
        text=initial_data['frame_label'],
        hoverinfo='text',
        name='Fielder Path'
    ))

    fig.add_trace(go.Scatter(
        x=initial_data['ball_x'],
        y=initial_data['ball_y'],
        mode='markers',
        marker=dict(color='orange', size=15, symbol='star'),
        name='Ball',
        hoverinfo='skip'
    ))

    fig.frames = frames

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{play_description}</b><br>" +
                 "<sub>Red = Fielder | Orange = Ball | Blue Dash = Optimal Route</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Distance from Home (ft)</b>",
            range=[-250, 250],
            zeroline=True,
            zerolinecolor='black',
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Distance from Home (ft)</b>",
            range=[0, 450],
            scaleanchor="x",
            scaleratio=1,
            gridcolor='lightgray',
            showgrid=True
        ),
        showlegend=True,
        legend=dict(x=1.02, y=1),
        updatemenus=[{
            'type': 'buttons',
            'showactive': False,
            'buttons': [
                {
                    'label': 'Play',
                    'method': 'animate',
                    'args': [None, {
                        'frame': {'duration': 100, 'redraw': True},
                        'fromcurrent': True,
                        'transition': {'duration': 50}
                    }]
                },
                {
                    'label': 'Pause',
                    'method': 'animate',
                    'args': [[None], {
                        'frame': {'duration': 0, 'redraw': False},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }]
                }
            ],
            'x': 0.1,
            'y': 0
        }],
        sliders=[{
            'active': 0,
            'steps': [
                {
                    'args': [[f.name], {
                        'frame': {'duration': 0, 'redraw': True},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }],
                    'label': f'{float(f.name):.2f}',
                    'method': 'animate'
                }
                for f in frames
            ],
            'currentvalue': {
                'prefix': 'Time: ',
                'suffix': ' sec',
                'font': {'size': 14}
            },
            'x': 0.1,
            'len': 0.9,
            'xanchor': 'left',
            'y': 0,
            'yanchor': 'top'
        }],
        width=1000,
        height=1000,
        template='plotly_white'
    )

    return fig

# Example usage with simulated tracking data
# (fielder covers roughly 58 ft in 3 seconds; the ball travels out from home plate)
# n_frames = 30
# play_tracking = pd.DataFrame({
#     'frame_time': np.linspace(0, 3, n_frames),
#     'x_position': np.linspace(150, 180, n_frames) + np.random.normal(0, 0.5, n_frames),
#     'y_position': np.linspace(200, 250, n_frames) + np.random.normal(0, 0.5, n_frames),
#     'ball_x': np.linspace(0, 180, n_frames),
#     'ball_y': np.linspace(0, 250, n_frames),
#     'velocity_x': np.repeat(10.0, n_frames),      # 30 ft over 3 s
#     'velocity_y': np.repeat(50 / 3, n_frames)     # 50 ft over 3 s
# })
#
# fig = create_animated_play_trajectory(play_tracking, "Byron Buxton Diving Catch")
# fig.show()
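
The function above expects `velocity_x` and `velocity_y` columns, and the simulated example simply hard-codes them. When a tracking feed provides only positions, the velocities can be estimated by finite differences. The sketch below assumes strictly increasing `frame_time` values and at least two frames; the helper name `add_velocity_columns` is hypothetical.

```python
import numpy as np
import pandas as pd

def add_velocity_columns(tracking):
    """Estimate fielder velocity (ft/s) from positions via finite differences."""
    out = tracking.sort_values('frame_time').copy()
    t = out['frame_time'].to_numpy(dtype=float)

    # np.gradient uses central differences in the interior and one-sided
    # differences at the endpoints; passing t handles uneven frame spacing.
    out['velocity_x'] = np.gradient(out['x_position'].to_numpy(dtype=float), t)
    out['velocity_y'] = np.gradient(out['y_position'].to_numpy(dtype=float), t)
    return out

# Made-up positions: constant motion of 10 ft/s in x and 5 ft/s in y
demo = pd.DataFrame({
    'frame_time': [0.0, 0.1, 0.2, 0.3],
    'x_position': [0.0, 1.0, 2.0, 3.0],
    'y_position': [0.0, 0.5, 1.0, 1.5],
})
print(add_velocity_columns(demo))
```

Differencing amplifies measurement jitter, so raw positions may need smoothing before the derived speeds are trustworthy.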

The animated play trajectory brings defensive plays to life in ways that highlight-reel video and OAA totals alone cannot. Coaches can pinpoint exactly when a fielder hesitated, took a suboptimal angle, or compensated brilliantly for an initial misstep. Comparing the actual route against the optimal one expresses route efficiency in intuitive visual terms, making technical feedback accessible to players regardless of their analytical background.
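
The visual comparison can be paired with a single number. The sketch below computes a simple route efficiency: the straight-line distance between the fielder's first and last tracked positions divided by the distance actually run. This is one possible definition chosen for illustration, not necessarily the formula any tracking provider uses, and the function name `route_efficiency` is hypothetical. It measures to the fielder's final position rather than the ball's, so it remains meaningful when the fielder never reaches the ball.

```python
import numpy as np
import pandas as pd

def route_efficiency(tracking):
    """Return path length, direct distance, and their ratio for one play."""
    t = tracking.sort_values('frame_time')
    x = t['x_position'].to_numpy(dtype=float)
    y = t['y_position'].to_numpy(dtype=float)

    # Distance actually covered: sum of frame-to-frame segment lengths
    path_length = float(np.hypot(np.diff(x), np.diff(y)).sum())
    # Straight line from the first to the last tracked position
    direct_distance = float(np.hypot(x[-1] - x[0], y[-1] - y[0]))

    efficiency = direct_distance / path_length if path_length > 0 else float('nan')
    return {
        'path_length': path_length,
        'direct_distance': direct_distance,
        'efficiency': efficiency,
    }

# Made-up route with a detour: 3 ft sideways, then 4 ft back
demo = pd.DataFrame({
    'frame_time': [0.0, 1.0, 2.0],
    'x_position': [0.0, 3.0, 3.0],
    'y_position': [0.0, 0.0, 4.0],
})
print(route_efficiency(demo))
```

A perfectly straight route scores 1.0, and detours lower the score. Frame-to-frame jitter in the tracking data inflates the path length, so noisy positions bias the score downward unless they are smoothed first.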

These three interactive fielding visualizations (catch probability heat maps, sprint speed comparisons, and animated play trajectories) preserve the spatial and temporal detail that summary metrics discard, and they support exploration and filtering that static charts cannot. Combined with traditional defensive metrics, they give analysts a more complete framework for understanding, evaluating, and improving defensive performance in modern baseball.

Python

import plotly.graph_objects as go
import pandas as pd
import numpy as np

def create_sprint_speed_comparison(player_data):
    """
    Create interactive sprint speed comparison chart.

    Parameters:
    player_data: DataFrame with player sprint speed and performance data

    Returns:
    Plotly figure object
    """
    # Prepare data
    sprint_data = player_data[player_data['sprint_speed'].notna()].copy()

    # Create position groups
    def position_group(pos):
        if pos in ['LF', 'CF', 'RF']:
            return 'Outfield'
        elif pos in ['SS', '2B', '3B']:
            return 'Infield'
        elif pos == 'C':
            return 'Catcher'
        elif pos == '1B':
            return 'First Base'
        else:
            return 'Other'

    sprint_data['position_group'] = sprint_data['position'].apply(position_group)

    # Calculate percentiles
    sprint_data['speed_percentile'] = sprint_data['sprint_speed'].rank(pct=True) * 100

    # Create hover text
    sprint_data['hover_text'] = sprint_data.apply(
        lambda row: f"<b>{row['player_name']}</b><br>" +
                   f"Position: {row['position']}<br>" +
                   f"Sprint Speed: {row['sprint_speed']:.2f} ft/s<br>" +
                   f"Percentile: {row['speed_percentile']:.0f}%<br>" +
                   f"Age: {row['age']}<br>" +
                   f"SB: {row['stolen_bases']} " +
                   f"({row['sb_success_rate']*100:.0f}% success)",
        axis=1
    )

    # Create figure
    fig = go.Figure()

    # Color mapping for position groups
    color_map = {
        'Outfield': '#1f77b4',
        'Infield': '#ff7f0e',
        'Catcher': '#2ca02c',
        'First Base': '#d62728',
        'Other': '#9467bd'
    }

    # Add scatter for each position group
    for pos_group in sprint_data['position_group'].unique():
        group_data = sprint_data[sprint_data['position_group'] == pos_group]

        fig.add_trace(go.Scatter(
            x=group_data['age'],
            y=group_data['sprint_speed'],
            mode='markers',
            name=pos_group,
            text=group_data['hover_text'],
            hoverinfo='text',
            marker=dict(
                size=group_data['stolen_bases'].clip(lower=2, upper=30),  # Cap size; floor keeps 0-SB players visible
                sizemode='diameter',
                sizeref=0.5,
                color=color_map.get(pos_group, '#7f7f7f'),
                opacity=0.7,
                line=dict(width=1, color='black')
            )
        ))

    # Add age curve trend line
    age_curve = sprint_data.groupby('age')['sprint_speed'].mean().reset_index()

    fig.add_trace(go.Scatter(
        x=age_curve['age'],
        y=age_curve['sprint_speed'],
        mode='lines',
        name='Age Curve',
        line=dict(color='black', width=2, dash='dash'),
        hoverinfo='skip'
    ))

    # Update layout
    fig.update_layout(
        title=dict(
            text="<b>Sprint Speed by Age and Position</b><br>" +
                 "<sub>Size = Stolen Bases | Hover for Details</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Age (years)</b>",
            range=[20, 42],
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Sprint Speed (ft/s)</b>",
            range=[24, 32],
            gridcolor='lightgray',
            showgrid=True
        ),
        hovermode='closest',
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Position Group</b>'),
            orientation='v',
            x=1.02,
            y=1
        ),
        width=1100,
        height=700,
        margin=dict(l=80, r=150, t=100, b=80),
        template='plotly_white'
    )

    return fig

# Example usage with simulated data
# np.random.seed(123)
# player_data = pd.DataFrame({
#     'player_name': [f'Player {i}' for i in range(1, 201)],
#     'position': np.random.choice(['CF', 'RF', 'LF', 'SS', '2B', '3B', '1B', 'C'],
#                                  200),
#     'age': np.random.randint(22, 41, 200),
#     'sprint_speed': np.random.normal(27.5, 2, 200),
#     'stolen_bases': np.random.poisson(10, 200),
#     'sb_success_rate': np.random.beta(7, 3, 200)
# })
#
# fig = create_sprint_speed_comparison(player_data)
# fig.show()

Python

import plotly.graph_objects as go
import pandas as pd
import numpy as np

def create_animated_play_trajectory(play_tracking_data, play_description="Defensive Play"):
    """
    Create animated defensive play trajectory visualization.

    Parameters:
    play_tracking_data: DataFrame with frame-by-frame tracking data
        Required columns: frame_time, x_position, y_position, ball_x, ball_y,
                         velocity_x, velocity_y
    play_description: Description for chart title

    Returns:
    Plotly figure object
    """
    # Prepare data
    tracking = play_tracking_data.sort_values('frame_time').copy()

    tracking['time_elapsed'] = tracking['frame_time'] - tracking['frame_time'].min()
    tracking['distance_from_ball'] = np.sqrt(
        (tracking['x_position'] - tracking['ball_x'])**2 +
        (tracking['y_position'] - tracking['ball_y'])**2
    )
    tracking['speed'] = np.sqrt(
        tracking['velocity_x']**2 + tracking['velocity_y']**2
    )

    # Create hover labels
    tracking['frame_label'] = tracking.apply(
        lambda row: f"Time: {row['time_elapsed']:.2f} sec<br>" +
                   f"Fielder Position: ({row['x_position']:.1f}, {row['y_position']:.1f})<br>" +
                   f"Distance to Ball: {row['distance_from_ball']:.1f} ft<br>" +
                   f"Speed: {row['speed']:.2f} ft/s",
        axis=1
    )

    # Calculate optimal route
    start_x = tracking['x_position'].iloc[0]
    start_y = tracking['y_position'].iloc[0]
    end_x = tracking['ball_x'].iloc[-1]
    end_y = tracking['ball_y'].iloc[-1]

    optimal_x = np.linspace(start_x, end_x, 50)
    optimal_y = np.linspace(start_y, end_y, 50)

    # Create figure
    fig = go.Figure()

    # Add field outline
    fig.add_trace(go.Scatter(
        x=[-200, 200, 200, -200, -200],
        y=[0, 0, 400, 400, 0],
        mode='lines',
        line=dict(color='green', width=2),
        showlegend=False,
        hoverinfo='skip'
    ))

    # Add optimal route
    fig.add_trace(go.Scatter(
        x=optimal_x,
        y=optimal_y,
        mode='lines',
        line=dict(color='blue', width=2, dash='dash'),
        name='Optimal Route',
        hoverinfo='skip'
    ))

    # Create frames for animation
    frames = []
    for i, frame_time in enumerate(tracking['frame_time'].unique()):
        frame_data = tracking[tracking['frame_time'] <= frame_time]

        frame_traces = [
            # Field outline (static)
            go.Scatter(
                x=[-200, 200, 200, -200, -200],
                y=[0, 0, 400, 400, 0],
                mode='lines',
                line=dict(color='green', width=2),
                showlegend=False,
                hoverinfo='skip'
            ),
            # Optimal route (static)
            go.Scatter(
                x=optimal_x,
                y=optimal_y,
                mode='lines',
                line=dict(color='blue', width=2, dash='dash'),
                name='Optimal Route',
                hoverinfo='skip',
                showlegend=True  # keep the legend entry on every frame
            ),
            # Fielder path
            go.Scatter(
                x=frame_data['x_position'],
                y=frame_data['y_position'],
                mode='markers+lines',
                marker=dict(color='red', size=12),
                line=dict(color='red', width=2),
                text=frame_data['frame_label'],
                hoverinfo='text',
                name='Fielder Path',
                showlegend=True  # keep the legend entry on every frame
            ),
            # Current ball position
            go.Scatter(
                x=[frame_data['ball_x'].iloc[-1]],
                y=[frame_data['ball_y'].iloc[-1]],
                mode='markers',
                marker=dict(color='orange', size=15, symbol='star'),
                name='Ball',
                hoverinfo='skip',
                showlegend=True  # keep the legend entry on every frame
            )
        ]

        frames.append(go.Frame(data=frame_traces, name=str(frame_time)))

    # Add initial frame data
    initial_data = tracking.iloc[:1]
    fig.add_trace(go.Scatter(
        x=initial_data['x_position'],
        y=initial_data['y_position'],
        mode='markers+lines',
        marker=dict(color='red', size=12),
        line=dict(color='red', width=2),
        text=initial_data['frame_label'],
        hoverinfo='text',
        name='Fielder Path'
    ))

    fig.add_trace(go.Scatter(
        x=initial_data['ball_x'],
        y=initial_data['ball_y'],
        mode='markers',
        marker=dict(color='orange', size=15, symbol='star'),
        name='Ball',
        hoverinfo='skip'
    ))

    fig.frames = frames

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{play_description}</b><br>" +
                 "<sub>Red = Fielder | Orange = Ball | Blue Dash = Optimal Route</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Horizontal Distance from Home Plate (ft)</b>",
            range=[-250, 250],
            zeroline=True,
            zerolinecolor='black',
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Distance from Home (ft)</b>",
            range=[0, 450],
            scaleanchor="x",
            scaleratio=1,
            gridcolor='lightgray',
            showgrid=True
        ),
        showlegend=True,
        legend=dict(x=1.02, y=1),
        updatemenus=[{
            'type': 'buttons',
            'showactive': False,
            'buttons': [
                {
                    'label': 'Play',
                    'method': 'animate',
                    'args': [None, {
                        'frame': {'duration': 100, 'redraw': True},
                        'fromcurrent': True,
                        'transition': {'duration': 50}
                    }]
                },
                {
                    'label': 'Pause',
                    'method': 'animate',
                    'args': [[None], {
                        'frame': {'duration': 0, 'redraw': False},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }]
                }
            ],
            'x': 0.1,
            'y': 0
        }],
        sliders=[{
            'active': 0,
            'steps': [
                {
                    'args': [[f.name], {
                        'frame': {'duration': 0, 'redraw': True},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }],
                    'label': f'{float(f.name):.2f}',
                    'method': 'animate'
                }
                for f in frames
            ],
            'currentvalue': {
                'prefix': 'Time: ',
                'suffix': ' sec',
                'font': {'size': 14}
            },
            'x': 0.1,
            'len': 0.9,
            'xanchor': 'left',
            'y': 0,
            'yanchor': 'top'
        }],
        width=1000,
        height=1000,
        template='plotly_white'
    )

    return fig

# Example usage with simulated tracking data
# n_frames = 30
# play_tracking = pd.DataFrame({
#     'frame_time': np.linspace(0, 3, n_frames),
#     'x_position': np.linspace(50, 180, n_frames) + np.random.normal(0, 5, n_frames),
#     'y_position': np.linspace(100, 250, n_frames) + np.random.normal(0, 5, n_frames),
#     'ball_x': np.repeat(180, n_frames),
#     'ball_y': np.linspace(0, 250, n_frames),
#     'velocity_x': np.repeat(4, n_frames),
#     'velocity_y': np.repeat(5, n_frames)
# })
#
# fig = create_animated_play_trajectory(play_tracking, "Byron Buxton Diving Catch")
# fig.show()
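
The dashed "optimal route" in the animation is simply the straight line from the fielder's starting position to the ball's final position. One way to put a number on how closely the fielder followed it is a route-efficiency ratio: straight-line distance divided by distance actually covered. The sketch below is a minimal illustration that reuses the column names from the function above; the helper name `route_efficiency` is hypothetical, and the official Statcast calculation may differ in its details.

```python
import numpy as np
import pandas as pd

def route_efficiency(tracking: pd.DataFrame) -> float:
    """Straight-line distance to the ball divided by distance actually run.

    Assumes the same columns as create_animated_play_trajectory:
    frame_time, x_position, y_position, ball_x, ball_y (all in feet).
    """
    t = tracking.sort_values("frame_time")

    # Length of each frame-to-frame step along the fielder's path
    steps = np.hypot(t["x_position"].diff(), t["y_position"].diff()).iloc[1:]
    path_length = steps.sum()

    # Straight line from the starting position to the ball's final position
    direct = np.hypot(
        t["ball_x"].iloc[-1] - t["x_position"].iloc[0],
        t["ball_y"].iloc[-1] - t["y_position"].iloc[0],
    )
    return float(direct / path_length) if path_length > 0 else float("nan")

# Toy example: an L-shaped route (3 ft right, then 4 ft up) to a ball at (3, 4)
toy = pd.DataFrame({
    "frame_time": [0.0, 1.0, 2.0],
    "x_position": [0.0, 3.0, 3.0],
    "y_position": [0.0, 0.0, 4.0],
    "ball_x": [3.0, 3.0, 3.0],
    "ball_y": [4.0, 4.0, 4.0],
})
efficiency = route_efficiency(toy)
print(f"Route efficiency: {efficiency:.3f}")
```

A perfectly straight route gives 1.0; the L-shaped toy route covers 7 ft to reach a point 5 ft away, so its ratio is 5/7.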

8.10 Exercises

Exercise 8.1: Calculating Simple OAA

Using Statcast data for a single month, calculate a simplified version of OAA for outfielders:

1. Get batted ball data for one month (July 2024 recommended)
2. Filter for outfield fly balls and line drives
3. Calculate catch probability based on distance and hang time (use simplified model from section 8.2.2)
4. Determine actual outcomes (caught or not)
5. Calculate OAA for each outfielder as sum of (actual - expected)
6. Identify the top 5 and bottom 5 defenders

Bonus: Compare your simplified OAA to Baseball Savant's official OAA for the same players and time period. How close are they?
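
As a starting point, the "sum of (actual - expected)" step can be sketched on a tiny hand-made table. This is only an illustration: the column names `fielder_id`, `catch_prob`, and `was_caught` are hypothetical (they are not Statcast field names), the six plays are invented, and the catch probabilities are assumed to have been computed already by whatever model you choose.

```python
import pandas as pd

def simple_oaa(plays: pd.DataFrame) -> pd.Series:
    """Sum (actual - expected) outs per fielder, best fielder first."""
    # Each play earns (1 - catch_prob) if caught and (-catch_prob) if not
    credit = plays["was_caught"].astype(float) - plays["catch_prob"]
    return credit.groupby(plays["fielder_id"]).sum().sort_values(ascending=False)

# Invented data: both fielders see the same three chances
plays = pd.DataFrame({
    "fielder_id": ["A", "A", "A", "B", "B", "B"],
    "catch_prob": [0.95, 0.50, 0.10, 0.95, 0.50, 0.10],
    "was_caught": [1, 1, 1, 1, 0, 0],
})
oaa = simple_oaa(plays)
print(oaa)
```

Fielder A converts all three chances, including the 10% play, and finishes at +1.45; fielder B makes only the routine play and finishes at -0.55. Calling `oaa.head(5)` and `oaa.tail(5)` on a full month of data would give the top and bottom defenders.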

Exercise 8.2: Shift Impact Analysis

Replicate the shift ban analysis from section 8.7.2 using real data:

1. Get Statcast data for ground balls hit by left-handed batters for:
   - May 2022 (shifts allowed)
   - May 2023 (shifts banned)
2. Calculate ground ball hit rates for each month
3. Perform a statistical test for the difference
4. Create a visualization comparing the two periods
5. Calculate how many extra hits occurred in 2023 vs expected based on 2022 rates

Challenge: Identify which individual players benefited most from the shift ban by comparing their 2022 vs 2023 ground ball BABIP.
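
One possible shape for the statistical test and the extra-hits calculation is a two-proportion z-test, sketched below with only the standard library. The function names are hypothetical and the counts are placeholders chosen for easy arithmetic, not real 2022 or 2023 totals; substitute the counts you obtain from Statcast.

```python
import math

def two_proportion_ztest(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference in hit rates (period B minus period A)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def extra_hits(hits_a, n_a, hits_b, n_b):
    """Hits in period B beyond what period A's rate would predict."""
    return hits_b - n_b * hits_a / n_a

# Placeholder counts, NOT real data
hits_2022, gb_2022 = 220, 1000
hits_2023, gb_2023 = 250, 1000

z, p_value = two_proportion_ztest(hits_2022, gb_2022, hits_2023, gb_2023)
extra = extra_hits(hits_2022, gb_2022, hits_2023, gb_2023)
print(f"z = {z:.2f}, p = {p_value:.3f}, extra hits = {extra:.0f}")
```

With these placeholder counts, a three-point rise in hit rate on 1,000 ground balls per month does not reach conventional significance, which is a useful reminder of how much data a single month provides.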

Exercise 8.3: Sprint Speed and Stolen Base Efficiency

Analyze the relationship between sprint speed and stolen base success:

1. Get sprint speed data for all qualified players (2024)
2. Get stolen base attempts and success rates
3. Calculate stolen base success rate for players with 10+ attempts
4. Create a scatter plot of sprint speed vs SB success rate
5. Fit a regression model and interpret the relationship
6. Identify players who over/under-perform their expected SB rate based on speed

Question: What sprint speed corresponds to 75% stolen base success (break-even point)?
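
A rough sketch of the regression step and the 75% question follows. It fits a straight line with `np.polyfit` and inverts it to find the speed at which the fitted success rate equals the target. The data are simulated (built so that the true answer is 28 ft/s), the helper name `speed_for_target` is hypothetical, and a straight line is a simplification: a logistic model would respect the 0 to 100% bounds better.

```python
import numpy as np

def speed_for_target(speed, rate, target=0.75):
    """Fit rate = intercept + slope * speed and solve for the target rate."""
    slope, intercept = np.polyfit(speed, rate, 1)
    return (target - intercept) / slope, slope, intercept

# Simulated players: success rate rises 4 points per ft/s, plus noise
rng = np.random.default_rng(42)
speed = np.linspace(25.0, 31.0, 50)
true_rate = 0.75 + 0.04 * (speed - 28.0)
sb_rate = np.clip(true_rate + rng.normal(0, 0.03, speed.size), 0, 1)

speed_at_75, slope, intercept = speed_for_target(speed, sb_rate)

# Residuals flag players who beat (positive) or trail (negative) the fit
residuals = sb_rate - (intercept + slope * speed)

print(f"Slope: {slope:.3f} per ft/s")
print(f"Speed at 75% success: {speed_at_75:.2f} ft/s")
```

The residuals give a simple answer to the over/under-performance step: sort players by residual and inspect both ends of the list.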

Exercise 8.4: Defensive Value Comparison

Compare defensive metrics across different systems:

1. Select 20 players across multiple positions (2024 season)
2. Collect their OAA, UZR, and DRS values
3. Standardize all metrics to same scale (z-scores)
4. Calculate correlation between metrics
5. Identify players where metrics disagree significantly
6. Create a visualization showing agreement/disagreement

Question: For which positions do the metrics agree most? Where do they diverge most? Why might this be?
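
The standardization, correlation, and disagreement steps might look like the sketch below. The five players and their values are invented purely to show the mechanics (player P5 is deliberately given a DRS that contradicts his OAA and UZR), and the "spread" measure of disagreement is just one reasonable choice among several.

```python
import pandas as pd

# Invented values for five hypothetical players, NOT real data
metrics = pd.DataFrame(
    {
        "OAA": [10, 5, 0, -5, -10],
        "UZR": [8, 4, 0, -4, -8],
        "DRS": [12, 6, 0, -6, 12],
    },
    index=["P1", "P2", "P3", "P4", "P5"],
)

# Standardize each metric to mean 0 and standard deviation 1
z_scores = (metrics - metrics.mean()) / metrics.std(ddof=0)

# Pairwise correlation between the three systems
correlations = z_scores.corr()

# Disagreement: gap between a player's highest and lowest z-score
spread = z_scores.max(axis=1) - z_scores.min(axis=1)

print(correlations.round(2))
print(spread.sort_values(ascending=False).round(2))
```

Here OAA and UZR are perfectly correlated by construction, and P5 tops the disagreement list because DRS rates him well above average while the other two systems rate him worst.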

You've now completed your introduction to fielding and baserunning analytics. You understand why defense is challenging to measure, how modern metrics like OAA work, the value of positioning and shifts, and how to evaluate baserunning contribution. Defense and baserunning combined can account for 2-3 WAR per season for elite players—real, measurable value that traditional statistics completely missed.

The Statcast revolution has transformed defensive evaluation from subjective ("he looks good") to objective ("he converted 73% of his chances where an average fielder would convert 65%"). We can now properly credit players like Kevin Kiermaier, Yadier Molina, and Matt Chapman for defensive excellence that was previously unrecognized in conventional statistics.

In Chapter 9, we'll turn to win probability and leveraged situations, understanding how context affects player and managerial decisions. The technical skills you've developed throughout this book will combine to help you evaluate complete players—offense, defense, baserunning, and situational performance—using the full arsenal of modern analytics.
