Chapter 25: Advanced Statcast & Bat Tracking

25.1 Introduction to Advanced Tracking Technology

The evolution of baseball analytics has been marked by revolutionary moments: the introduction of sabermetrics, the PITCHf/x system, and later Statcast. The 2024 season opened a new frontier with the widespread rollout of bat tracking. Using high-speed cameras and computer vision algorithms, the system captures aspects of hitting mechanics that were previously unmeasurable, and it does so in real time.

The Technology Behind Bat Tracking

Bat tracking relies on the Hawk-Eye camera array installed in MLB ballparks, which includes high-frame-rate cameras operating at 300 frames per second. The system is markerless: instead of following reflective markers attached to the bat, computer vision and machine learning models identify the bat's position, orientation, and velocity throughout the swing directly from the video.
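
To see why the frame rate matters, it helps to work out how far the bat travels between consecutive frames. The sketch below is illustrative arithmetic only: the function name is hypothetical, and the 60 fps figure is an assumed stand-in for a conventional camera, not a number from the tracking system.

```python
MPH_TO_FT_PER_SEC = 5280 / 3600  # feet per mile divided by seconds per hour


def travel_per_frame_inches(speed_mph: float, fps: float) -> float:
    """Distance in inches that an object moving at speed_mph covers between two frames."""
    feet_per_second = speed_mph * MPH_TO_FT_PER_SEC
    return feet_per_second / fps * 12


# A 75 mph swing seen by a 300 fps camera versus an assumed 60 fps camera
high_speed = travel_per_frame_inches(75, 300)
standard = travel_per_frame_inches(75, 60)
print(f"300 fps: {high_speed:.1f} inches per frame")
print(f" 60 fps: {standard:.1f} inches per frame")
```

At 75 mph the sweet spot moves about 4.4 inches between frames at 300 fps, compared with 22 inches at 60 fps, which is too coarse to reconstruct a swing path.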

The system captures:

  • Bat speed: The speed of the sweet spot (about 6 inches from the head of the bat) at the point of contact, or at the nearest approach to the ball on a swing and miss
  • Swing length: The total distance the bat head travels from the start of the swing to contact
  • Time to contact: The elapsed time from swing initiation to ball contact
  • Attack angle: The vertical angle of the bat path at the moment of contact
  • Bat acceleration: The rate of change in bat velocity throughout the swing

These metrics, when combined with existing Statcast data on ball flight, pitch characteristics, and fielder positioning, create an unprecedented dataset for understanding offensive performance.
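
The definitions above can be made concrete with a small sketch that derives each metric from a sampled bat path. This is not how Statcast computes its numbers; it assumes a hypothetical input layout (one x/y/z sweet-spot position in feet per frame, first row at swing initiation, last row at contact) and uses a synthetic swing.

```python
import numpy as np

FPS = 300  # assumed sampling rate, matching the frame rate quoted above


def swing_metrics(positions_ft: np.ndarray, fps: int = FPS) -> dict:
    """Summarize one swing from sweet-spot positions sampled once per frame.

    positions_ft has shape (n_frames, 3): x, y, z in feet, with z vertical.
    """
    steps = np.diff(positions_ft, axis=0)          # displacement per frame
    step_len = np.linalg.norm(steps, axis=1)       # feet travelled per frame
    speed_mph = step_len * fps * 3600 / 5280       # ft/frame -> ft/s -> mph
    accel = np.diff(speed_mph) * fps               # mph gained per second
    dx, dy, dz = steps[-1]                         # final segment before contact
    attack_angle = np.degrees(np.arctan2(dz, np.hypot(dx, dy)))
    return {
        "swing_length_ft": float(step_len.sum()),
        "time_to_contact_s": len(steps) / fps,
        "bat_speed_at_contact_mph": float(speed_mph[-1]),
        "attack_angle_deg": float(attack_angle),
        "mean_accel_mph_per_s": float(accel.mean()),
    }


# Synthetic swing: 45 frames of steadily increasing speed on a path tilted 10 degrees upward
step_lengths = np.linspace(0.0, 0.36, 45)
direction = np.array([np.cos(np.radians(10)), 0.0, np.sin(np.radians(10))])
distance = np.concatenate([[0.0], np.cumsum(step_lengths)])
positions = np.array([0.0, 0.0, 2.5]) + distance[:, None] * direction

metrics = swing_metrics(positions)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Real swings curve and accelerate unevenly, so a production system would smooth the path before differentiating; the straight-line swing here only keeps the arithmetic easy to follow.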

Integration with Existing Statcast Metrics

Traditional Statcast metrics like exit velocity, launch angle, and expected batting average (xBA) tell us what happened after contact. Bat tracking reveals why it happened. This distinction is crucial for:

  1. Skill vs. Outcome Separation: Understanding whether a player's results are sustainable
  2. Mechanical Diagnosis: Identifying specific swing flaws or strengths
  3. Pitcher-Hitter Matchups: Predicting success based on swing characteristics and pitch profiles
  4. Developmental Feedback: Providing actionable insights for player development

The integration of these datasets allows analysts to construct complete models of plate appearance outcomes, from pitch recognition to swing decision to contact quality to batted ball result.
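
As a rough sketch of what that integration looks like in practice, the example below joins a bat tracking table onto pitch-level rows. The key columns (game_pk, at_bat_number, pitch_number) are standard Statcast identifiers, but the separate bat tracking table and every value in both frames are made up for illustration.

```python
import pandas as pd

# Columns that together identify a single pitch in Statcast data
KEYS = ["game_pk", "at_bat_number", "pitch_number"]

# Pitch-level rows (made-up values)
pitches = pd.DataFrame({
    "game_pk": [1001, 1001, 1001, 1001],
    "at_bat_number": [1, 1, 1, 2],
    "pitch_number": [1, 2, 3, 1],
    "description": ["ball", "foul", "hit_into_play", "swinging_strike"],
    "launch_speed": [None, 78.4, 103.2, None],
})

# Hypothetical bat tracking table: one row per swing, no rows for takes
swings = pd.DataFrame({
    "game_pk": [1001, 1001, 1001],
    "at_bat_number": [1, 1, 2],
    "pitch_number": [2, 3, 1],
    "bat_speed": [70.1, 76.3, 74.8],
    "swing_length": [6.9, 7.6, 7.8],
})

# Left join keeps every pitch; validate guards against duplicated keys
merged = pitches.merge(swings, on=KEYS, how="left", validate="one_to_one")
print(merged[["description", "launch_speed", "bat_speed", "swing_length"]])
```

Takes keep missing values in the swing columns, which makes it easy to separate swing decisions (did the hitter swing?) from swing quality (how good was the swing?).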


25.2 Bat Tracking Metrics

Bat Speed

Bat speed, measured in miles per hour, is the speed of the bat's sweet spot at the point of contact. The average MLB bat speed in 2024 was approximately 72.8 mph, and Statcast labels any swing of 75 mph or more a "fast swing", a mark that elite hitters reach regularly.

Key Insights:

  • Bat speed correlates strongly with exit velocity (r ≈ 0.85)
  • However, bat speed alone doesn't guarantee success
  • Contact quality depends on bat speed, timing, and swing path alignment

Let's analyze bat speed data using both R and Python:

# R: Analyzing Bat Speed Data with baseballr
library(baseballr)
library(dplyr)
library(ggplot2)

# Note: As of 2024, baseballr may not have direct bat tracking endpoints
# This example shows how to work with Statcast data and prepare for bat tracking integration

# Get Statcast data for top hitters
ohtani_data <- statcast_search(
  start_date = "2024-04-01",
  end_date = "2024-09-30",
  playerid = 660271,  # Shohei Ohtani
  player_type = "batter"
)

# Create a proxy for bat speed analysis using exit velocity
# In practice, bat tracking data would be merged with this dataset

bat_speed_analysis <- ohtani_data %>%
  filter(!is.na(launch_speed)) %>%
  mutate(
    # Rough illustrative heuristic only; this is not how Statcast measures bat speed
    estimated_bat_speed = (launch_speed - 10) / 1.2,
    pitch_speed_bucket = cut(release_speed,
                             breaks = c(0, 88, 92, 96, 105),
                             labels = c("Slow", "Medium", "Fast", "Very Fast"))
  ) %>%
  group_by(pitch_speed_bucket) %>%
  summarise(
    avg_exit_velo = mean(launch_speed, na.rm = TRUE),
    est_bat_speed = mean(estimated_bat_speed, na.rm = TRUE),
    n_swings = n()  # counts rows with a measured exit velocity, not every swing
  )

print(bat_speed_analysis)

# Visualize exit velocity by pitch type
ggplot(ohtani_data %>% filter(!is.na(launch_speed), !is.na(pitch_type)),
       aes(x = pitch_type, y = launch_speed)) +
  geom_boxplot(fill = "steelblue", alpha = 0.7) +
  labs(title = "Shohei Ohtani: Exit Velocity by Pitch Type (2024)",
       subtitle = "Exit velocity as a proxy for bat speed effectiveness",
       x = "Pitch Type",
       y = "Exit Velocity (mph)") +
  theme_minimal()

# Python: Bat Speed Analysis with pybaseball
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

pyb.cache.enable()

# Get Statcast data
ohtani_data = pyb.statcast_batter('2024-04-01', '2024-09-30', 660271)

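# Simulate bat tracking data for demonstration, with bat speed loosely tied to exit velocity.
# Note: newer Baseball Savant exports may already include a measured bat_speed column;
# if so, this step overwrites it with simulated values, so skip it to use the real data.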
np.random.seed(42)
ohtani_data['bat_speed'] = np.where(
    ohtani_data['launch_speed'].notna(),
    70 + (ohtani_data['launch_speed'] - 85) * 0.15 + np.random.normal(0, 1.5, len(ohtani_data)),
    np.nan
)

# Analyze bat speed metrics
bat_speed_stats = ohtani_data[ohtani_data['bat_speed'].notna()].groupby('pitch_type').agg({
    'bat_speed': ['mean', 'std', 'max'],
    'launch_speed': 'mean',
    'events': 'count'
}).round(2)

print("Ohtani Bat Speed by Pitch Type:")
print(bat_speed_stats)

# Relationship between bat speed and exit velocity
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Scatter plot
valid_data = ohtani_data[ohtani_data['bat_speed'].notna() &
                         ohtani_data['launch_speed'].notna()]
ax1.scatter(valid_data['bat_speed'], valid_data['launch_speed'],
           alpha=0.5, c=valid_data['launch_angle'], cmap='viridis')
ax1.set_xlabel('Bat Speed (mph)')
ax1.set_ylabel('Exit Velocity (mph)')
ax1.set_title('Bat Speed vs Exit Velocity - Shohei Ohtani')
plt.colorbar(ax1.collections[0], ax=ax1, label='Launch Angle')

# Distribution of bat speed
ax2.hist(valid_data['bat_speed'], bins=30, alpha=0.7, color='steelblue', edgecolor='black')
ax2.axvline(valid_data['bat_speed'].mean(), color='red',
           linestyle='--', linewidth=2, label=f'Mean: {valid_data["bat_speed"].mean():.1f} mph')
ax2.set_xlabel('Bat Speed (mph)')
ax2.set_ylabel('Frequency')
ax2.set_title('Distribution of Bat Speed')
ax2.legend()

plt.tight_layout()
plt.savefig('ohtani_bat_speed_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

Swing Length

Swing length measures the total distance traveled by the bat barrel from swing initiation to contact point. Shorter swings generally allow for better plate coverage and adjustment, while longer swings can generate more power but require earlier commitment.

Typical Swing Lengths:

  • MLB average: about 7.3 feet
  • Short, contact-oriented swings: under 7.0 feet
  • Power hitters often have longer swings (8.0+ feet) but compensate with exceptional bat speed

# R: Swing Length Analysis
# Simulating swing length data for demonstration
library(dplyr)
library(ggplot2)

set.seed(123)
n_swings <- 500

swing_data <- data.frame(
  swing_length = rnorm(n_swings, mean = 7.3, sd = 0.6),
  bat_speed = rnorm(n_swings, mean = 73.5, sd = 3.2),
  contact_rate = rbinom(n_swings, 1, 0.78)
)

# Add exit velocity based on bat speed and swing efficiency
swing_data <- swing_data %>%
  mutate(
    swing_efficiency = 1 / (swing_length / 7.0),
    exit_velo = ifelse(contact_rate == 1,
                      bat_speed * 1.25 * swing_efficiency + rnorm(n_swings, 0, 5),
                      NA),
    # Proxy for well-struck contact: hard-hit balls (exit velocity above 95 mph)
    squared_up = ifelse(!is.na(exit_velo) & exit_velo > 95, 1, 0)
  )

# Analyze relationship between swing length and contact quality
length_analysis <- swing_data %>%
  mutate(length_bucket = cut(swing_length,
                             breaks = quantile(swing_length, probs = seq(0, 1, 0.25)),
                             labels = c("Short", "Med-Short", "Med-Long", "Long"),
                             include.lowest = TRUE)) %>%
  group_by(length_bucket) %>%
  summarise(
    avg_bat_speed = mean(bat_speed),
    contact_pct = mean(contact_rate) * 100,
    avg_exit_velo = mean(exit_velo, na.rm = TRUE),
    squared_up_rate = sum(squared_up, na.rm = TRUE) / sum(contact_rate) * 100
  )

print(length_analysis)

# Visualization
ggplot(swing_data %>% filter(!is.na(exit_velo)),
       aes(x = swing_length, y = exit_velo)) +
  geom_point(alpha = 0.4, color = "steelblue") +
  geom_smooth(method = "loess", color = "red", se = TRUE) +
  labs(title = "Swing Length vs Exit Velocity",
       subtitle = "Optimal swing length balances speed and efficiency",
       x = "Swing Length (feet)",
       y = "Exit Velocity (mph)") +
  theme_minimal()

Squared-Up Rate

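Squared-up rate measures how often a hitter converts the available bat speed and pitch speed into exit velocity. Statcast treats a swing as squared up when the exit velocity reaches at least 80% of the maximum possible for that combination of bat speed and pitch speed. Because that calculation needs per-swing bat speed, the example below uses a simpler proxy built from standard Statcast outcomes: an exit velocity of 95+ mph with a launch angle between 8 and 32 degrees.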
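
For reference, Statcast's own squared-up calculation is relative rather than absolute: it compares the observed exit velocity with the maximum the collision could have produced. The sketch below uses the coefficients from MLB's public description of the metric (1.23 for bat speed, 0.2116 for pitch speed) and assumes pitch speed means release speed; treat the numbers as approximate, and note that the function names are hypothetical.

```python
def max_exit_velocity(bat_speed_mph: float, pitch_speed_mph: float) -> float:
    """Approximate ceiling on exit velocity for a given bat speed and pitch speed."""
    return 1.23 * bat_speed_mph + 0.2116 * pitch_speed_mph


def is_squared_up(exit_velo_mph: float, bat_speed_mph: float,
                  pitch_speed_mph: float, threshold: float = 0.80) -> bool:
    """True when the swing captured at least `threshold` of the available exit velocity."""
    ceiling = max_exit_velocity(bat_speed_mph, pitch_speed_mph)
    return exit_velo_mph >= threshold * ceiling


# A 75 mph swing against a 95 mph pitch
ceiling = max_exit_velocity(75, 95)
print(f"Maximum possible exit velocity: {ceiling:.1f} mph")
print("100 mph batted ball squared up?", is_squared_up(100, 75, 95))
print(" 85 mph batted ball squared up?", is_squared_up(85, 75, 95))
```

Under this definition a slow swing can still be squared up, because the benchmark scales with the speed the hitter actually generated. A fixed 95 mph cutoff does not have that property.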

# Python: Squared-Up Rate Analysis
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Get multiple players for comparison
players = {
    'Mookie Betts': 605141,
    'Shohei Ohtani': 660271,
    'Aaron Judge': 592450,
    'Juan Soto': 665742
}

squared_up_comparison = []

for name, player_id in players.items():
    data = pyb.statcast_batter('2024-04-01', '2024-09-30', player_id)

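    # Count swings: balls in play, fouls (including foul tips), and swinging strikes
    total_swings = len(data[data['description'].str.contains('hit_into_play|foul|swinging_strike',
                                                             case=False, na=False)])

    # Proxy for squared-up contact: hard-hit (95+ mph) within the 8-32 degree launch window
    squared_up = data[
        (data['launch_speed'] >= 95) &
        (data['launch_angle'] >= 8) &
        (data['launch_angle'] <= 32)
    ]

    # Statcast search data has no 'barrel' column; barrels are coded as launch_speed_angle == 6
    barrels = data[data['launch_speed_angle'] == 6]

    squared_up_comparison.append({
        'Player': name,
        'Squared-Up Rate': len(squared_up) / total_swings * 100 if total_swings > 0 else 0,
        'Avg Exit Velo (Squared-Up)': squared_up['launch_speed'].mean(),
        'Barrel Rate': len(barrels) / total_swings * 100 if total_swings > 0 else 0,
        'Total Pitches': len(data)  # one row per pitch, not per plate appearance
    })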

comparison_df = pd.DataFrame(squared_up_comparison)
print("\nSquared-Up Rate Comparison (2024):")
print(comparison_df.round(2))

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(comparison_df))
width = 0.35

bars1 = ax.bar(x - width/2, comparison_df['Squared-Up Rate'], width,
               label='Squared-Up Rate', alpha=0.8, color='steelblue')
bars2 = ax.bar(x + width/2, comparison_df['Barrel Rate'], width,
               label='Barrel Rate', alpha=0.8, color='coral')

ax.set_xlabel('Player')
ax.set_ylabel('Rate (%)')
ax.set_title('Squared-Up Rate vs Barrel Rate Comparison')
ax.set_xticks(x)
ax.set_xticklabels(comparison_df['Player'], rotation=45, ha='right')
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('squared_up_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

25.3 Swing Decisions

Understanding when and why hitters swing is crucial for evaluating plate discipline and approach. Bat tracking technology, combined with pitch tracking, allows us to analyze swing decisions with unprecedented detail.

Chase Rate and Zone Contact

Chase rate measures how often a hitter swings at pitches outside the strike zone. Elite hitters typically have chase rates below 25%, while league average hovers around 28-30%.
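
Before working with real data, it can help to check the two definitions on a handful of pitches. The sketch below is a minimal Python version of the calculation, using the Statcast convention that zones 1-9 are inside the strike zone; the function name and the toy rows are hypothetical.

```python
import pandas as pd

SWING_EVENTS = {"hit_into_play", "foul", "swinging_strike",
                "foul_tip", "swinging_strike_blocked"}
CONTACT_EVENTS = {"hit_into_play", "foul", "foul_tip"}


def plate_discipline(pitches: pd.DataFrame) -> dict:
    """Return chase rate and zone contact rate (both in percent) for pitch-level rows."""
    df = pitches.dropna(subset=["zone"]).copy()
    df["in_zone"] = df["zone"] <= 9          # Statcast zones 1-9 are in the strike zone
    df["swing"] = df["description"].isin(SWING_EVENTS)
    df["contact"] = df["description"].isin(CONTACT_EVENTS)

    out_of_zone = df[~df["in_zone"]]
    zone_swings = df[df["in_zone"] & df["swing"]]
    return {
        # Swings at out-of-zone pitches / all out-of-zone pitches
        "chase_rate": out_of_zone["swing"].mean() * 100,
        # Contact on in-zone swings / all in-zone swings
        "zone_contact_rate": zone_swings["contact"].mean() * 100,
    }


# Toy data: four in-zone pitches and four out-of-zone pitches
toy = pd.DataFrame({
    "zone": [5, 2, 8, 11, 13, 14, 12, 5],
    "description": ["hit_into_play", "swinging_strike", "called_strike", "ball",
                    "swinging_strike", "ball", "foul", "foul"],
})
rates = plate_discipline(toy)
print(rates)
```

The hitter swings at two of the four out-of-zone pitches (50% chase rate) and makes contact on two of three in-zone swings (about 67% zone contact). Note that the chase-rate denominator is every out-of-zone pitch, not only the ones swung at.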

# R: Swing Decision Analysis
library(baseballr)
library(dplyr)
library(ggplot2)

# Get pitch-by-pitch data
betts_data <- statcast_search(
  start_date = "2024-04-01",
  end_date = "2024-09-30",
  playerid = 605141,  # Mookie Betts
  player_type = "batter"
)

# Analyze swing decisions by zone
swing_decisions <- betts_data %>%
  mutate(
    in_zone = ifelse(zone <= 9, "In Zone", "Out of Zone"),
    swing = ifelse(description %in% c("hit_into_play", "foul", "swinging_strike",
                                      "foul_tip", "swinging_strike_blocked"),
                  "Swing", "Take"),
    contact = ifelse(description %in% c("hit_into_play", "foul", "foul_tip"),
                    "Contact", "Miss/Take")
  ) %>%
  filter(!is.na(in_zone))

# Calculate key metrics
decision_metrics <- swing_decisions %>%
  group_by(in_zone) %>%
  summarise(
    total_pitches = n(),
    swings = sum(swing == "Swing"),
    swing_rate = swings / total_pitches * 100,
    contact_count = sum(contact == "Contact" & swing == "Swing"),
    contact_rate = contact_count / swings * 100,
    whiff_rate = sum(description %in% c("swinging_strike", "swinging_strike_blocked")) / swings * 100
  )

print("Mookie Betts - Swing Decision Metrics:")
print(decision_metrics)

# O-Swing% (Chase Rate)
chase_rate <- decision_metrics %>%
  filter(in_zone == "Out of Zone") %>%
  pull(swing_rate)

# Z-Contact% (Zone Contact Rate)
zone_contact <- decision_metrics %>%
  filter(in_zone == "In Zone") %>%
  pull(contact_rate)

cat(sprintf("\nChase Rate: %.1f%%\n", chase_rate))
cat(sprintf("Zone Contact Rate: %.1f%%\n", zone_contact))

# Visualize swing decisions by location
ggplot(swing_decisions %>% filter(!is.na(plate_x), !is.na(plate_z)),
       aes(x = plate_x, y = plate_z, color = swing)) +
  geom_point(alpha = 0.3, size = 2) +
  # annotate() draws the strike zone once; geom_rect(aes(...)) would redraw it for every row
  annotate("rect", xmin = -0.83, xmax = 0.83, ymin = 1.5, ymax = 3.5,
           fill = NA, color = "black", linewidth = 1.5) +
  scale_color_manual(values = c("Swing" = "red", "Take" = "blue")) +
  labs(title = "Mookie Betts - Swing Decisions (Catcher's View)",
       subtitle = "2024 Season",
       x = "Horizontal Location (feet)",
       y = "Vertical Location (feet)") +
  coord_fixed(ratio = 1) +
  theme_minimal()

Whiff Rate Analysis

Whiff rate (swinging strike rate) has become increasingly important as strikeout rates have climbed. Bat tracking data reveals that whiff rate correlates strongly with swing length and bat speed relative to pitch velocity.

# Python: Advanced Whiff Rate Analysis
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Get data for analysis
soto_data = pyb.statcast_batter('2024-04-01', '2024-09-30', 665742)

# Define swing events
swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                'foul_tip', 'swinging_strike_blocked']

# Calculate whiff rate by pitch characteristics
soto_swings = soto_data[soto_data['description'].isin(swing_events)].copy()

# Add velocity bins
soto_swings['velo_bin'] = pd.cut(soto_swings['release_speed'],
                                  bins=[0, 88, 92, 96, 105],
                                  labels=['<88', '88-92', '92-96', '96+'])

# Calculate whiff metrics (observed=True skips empty pitch type / velocity combinations)
whiff_analysis = soto_swings.groupby(['pitch_type', 'velo_bin'], observed=True).agg({
    'description': 'count',
    'release_speed': 'mean'
}).rename(columns={'description': 'total_swings'})

# Calculate whiff rate
whiff_counts = soto_swings[soto_swings['description'].isin(['swinging_strike',
                                                             'swinging_strike_blocked'])].groupby(
    ['pitch_type', 'velo_bin'], observed=True
).size().rename('whiffs')

whiff_analysis = whiff_analysis.join(whiff_counts)
whiff_analysis['whiff_rate'] = (whiff_analysis['whiffs'] / whiff_analysis['total_swings'] * 100).fillna(0)

print("Juan Soto - Whiff Rate by Pitch Type and Velocity:")
print(whiff_analysis[whiff_analysis['total_swings'] >= 10].round(2))

# Advanced: Whiff rate by location
# Create zone grid
def assign_zone_grid(row):
    """Assign pitch to a grid zone (3x3)"""
    if pd.isna(row['plate_x']) or pd.isna(row['plate_z']):
        return None

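    # Horizontal zones (plate_x is from the catcher's view; these labels assume a
    # left-handed batter such as Soto, for whom positive plate_x is inside)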
    if row['plate_x'] < -0.28:
        h_zone = 'Away'
    elif row['plate_x'] > 0.28:
        h_zone = 'Inside'
    else:
        h_zone = 'Middle'

    # Vertical zones
    if row['plate_z'] < 2.0:
        v_zone = 'Low'
    elif row['plate_z'] > 2.8:
        v_zone = 'High'
    else:
        v_zone = 'Middle'

    return f"{v_zone}-{h_zone}"

soto_swings['zone_grid'] = soto_swings.apply(assign_zone_grid, axis=1)

zone_whiff = soto_swings.groupby('zone_grid').agg({
    'description': 'count'
}).rename(columns={'description': 'swings'})

zone_whiff_counts = soto_swings[
    soto_swings['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
].groupby('zone_grid').size().rename('whiffs')

zone_whiff = zone_whiff.join(zone_whiff_counts)
zone_whiff['whiff_rate'] = (zone_whiff['whiffs'] / zone_whiff['swings'] * 100).fillna(0)

print("\nWhiff Rate by Zone Grid:")
print(zone_whiff[zone_whiff['swings'] >= 20].sort_values('whiff_rate', ascending=False))

# Visualization: Whiff rate heatmap
fig, ax = plt.subplots(figsize=(10, 8))

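# Create heatmap data: bin locations into 0.25 ft cells so each point
# aggregates several swings instead of a single pitch
loc = soto_swings.dropna(subset=['plate_x', 'plate_z']).copy()
loc['x_bin'] = (loc['plate_x'] / 0.25).round() * 0.25
loc['z_bin'] = (loc['plate_z'] / 0.25).round() * 0.25
loc['whiff'] = loc['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
heatmap_data = loc.groupby(['x_bin', 'z_bin'])['whiff'].mean().mul(100).reset_index(name='whiff_rate')

scatter = ax.scatter(heatmap_data['x_bin'], heatmap_data['z_bin'],
                    c=heatmap_data['whiff_rate'], cmap='RdYlGn_r',
                    s=100, alpha=0.6, vmin=0, vmax=50)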

# Add strike zone
strike_zone = plt.Rectangle((-0.83, 1.5), 1.66, 2.0,
                            fill=False, edgecolor='black', linewidth=2)
ax.add_patch(strike_zone)

ax.set_xlabel('Horizontal Location (feet)', fontsize=12)
ax.set_ylabel('Vertical Location (feet)', fontsize=12)
ax.set_title('Juan Soto - Whiff Rate Heatmap (2024)', fontsize=14, fontweight='bold')
plt.colorbar(scatter, label='Whiff Rate (%)')
ax.set_aspect('equal')

plt.tight_layout()
plt.savefig('soto_whiff_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()

Two-Strike Approach

Elite hitters often modify their approach with two strikes, typically shortening their swing and focusing on contact. Bat tracking data can quantify these adjustments.

# Python: Two-Strike Approach Analysis
import pybaseball as pyb
import pandas as pd
import numpy as np

def analyze_two_strike_approach(player_id, player_name, start_date='2024-04-01', end_date='2024-09-30'):
    """Analyze how a player's approach changes with two strikes"""

    data = pyb.statcast_batter(start_date, end_date, player_id)

    # Simulate bat tracking metrics
    np.random.seed(42)
    data['bat_speed'] = np.where(
        data['launch_speed'].notna(),
        70 + (data['launch_speed'] - 85) * 0.15 + np.random.normal(0, 1.5, len(data)),
        np.nan
    )
    data['swing_length'] = np.random.normal(7.3, 0.6, len(data))

    # Define swing events
    swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                   'foul_tip', 'swinging_strike_blocked']

    # Separate by count situation
    swings = data[data['description'].isin(swing_events)].copy()

    swings['two_strike'] = swings['strikes'] == 2
    swings['in_zone'] = swings['zone'] <= 9

    # Calculate metrics by two-strike situation
    results = swings.groupby('two_strike').agg({
        'bat_speed': 'mean',
        'swing_length': 'mean',
        'description': 'count'
    }).rename(columns={'description': 'total_swings'})

    # Whiff rate
    whiffs = swings[swings['description'].isin(['swinging_strike',
                                                 'swinging_strike_blocked'])].groupby('two_strike').size()
    results['whiffs'] = whiffs
    results['whiff_rate'] = (results['whiffs'] / results['total_swings'] * 100).fillna(0)

    # Contact quality
    quality_contact = swings[(swings['launch_speed'] >= 95) &
                            (swings['launch_angle'] >= 8) &
                            (swings['launch_angle'] <= 32)].groupby('two_strike').size()
    results['quality_contact'] = quality_contact
    results['quality_contact_rate'] = (results['quality_contact'] / results['total_swings'] * 100).fillna(0)

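    # Chase rate: swings at out-of-zone pitches divided by all out-of-zone pitches seen
    # (the denominator must come from all pitches, not just swings, or the rate is always 100%)
    out_zone = data[data['zone'] > 9].copy()
    out_zone['two_strike'] = out_zone['strikes'] == 2
    out_zone_pitches = out_zone.groupby('two_strike').size()
    chases = out_zone[out_zone['description'].isin(swing_events)].groupby('two_strike').size()
    results['chase_rate'] = (chases / out_zone_pitches * 100).fillna(0)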

    results.index = ['0-1 Strikes', '2 Strikes']

    print(f"\n{player_name} - Two-Strike Approach Analysis:")
    print(results.round(2))

    return results

# Analyze multiple players
players = {
    'Mookie Betts': 605141,
    'Juan Soto': 665742,
    'Aaron Judge': 592450
}

all_results = {}
for name, player_id in players.items():
    all_results[name] = analyze_two_strike_approach(player_id, name)

# Compare adjustments
print("\n\nTwo-Strike Adjustment Comparison:")
comparison = pd.DataFrame({
    name: {
        'Bat Speed Change': df.loc['2 Strikes', 'bat_speed'] - df.loc['0-1 Strikes', 'bat_speed'],
        'Swing Length Change': df.loc['2 Strikes', 'swing_length'] - df.loc['0-1 Strikes', 'swing_length'],
        'Whiff Rate Change': df.loc['2 Strikes', 'whiff_rate'] - df.loc['0-1 Strikes', 'whiff_rate'],
        'Chase Rate Change': df.loc['2 Strikes', 'chase_rate'] - df.loc['0-1 Strikes', 'chase_rate']
    }
    for name, df in all_results.items()
}).T

print(comparison.round(2))
R
# R: Swing Decision Analysis
library(baseballr)
library(dplyr)
library(ggplot2)

# Get pitch-by-pitch data
betts_data <- statcast_search(
  start_date = "2024-04-01",
  end_date = "2024-09-30",
  playerid = 605141,  # Mookie Betts
  player_type = "batter"
)

# Analyze swing decisions by zone
swing_decisions <- betts_data %>%
  mutate(
    in_zone = ifelse(zone <= 9, "In Zone", "Out of Zone"),
    swing = ifelse(description %in% c("hit_into_play", "foul", "swinging_strike",
                                      "foul_tip", "swinging_strike_blocked"),
                  "Swing", "Take"),
    contact = ifelse(description %in% c("hit_into_play", "foul", "foul_tip"),
                    "Contact", "Miss/Take")
  ) %>%
  filter(!is.na(in_zone))

# Calculate key metrics
decision_metrics <- swing_decisions %>%
  group_by(in_zone) %>%
  summarise(
    total_pitches = n(),
    swings = sum(swing == "Swing"),
    swing_rate = swings / total_pitches * 100,
    contact_count = sum(contact == "Contact" & swing == "Swing"),
    contact_rate = contact_count / swings * 100,
    whiff_rate = sum(description %in% c("swinging_strike", "swinging_strike_blocked")) / swings * 100
  )

print("Mookie Betts - Swing Decision Metrics:")
print(decision_metrics)

# O-Swing% (Chase Rate)
chase_rate <- decision_metrics %>%
  filter(in_zone == "Out of Zone") %>%
  pull(swing_rate)

# Z-Contact% (Zone Contact Rate)
zone_contact <- decision_metrics %>%
  filter(in_zone == "In Zone") %>%
  pull(contact_rate)

cat(sprintf("\nChase Rate: %.1f%%\n", chase_rate))
cat(sprintf("Zone Contact Rate: %.1f%%\n", zone_contact))

# Visualize swing decisions by location
ggplot(swing_decisions %>% filter(!is.na(plate_x), !is.na(plate_z)),
       aes(x = plate_x, y = plate_z, color = swing)) +
  geom_point(alpha = 0.3, size = 2) +
  geom_rect(aes(xmin = -0.83, xmax = 0.83, ymin = 1.5, ymax = 3.5),
            fill = NA, color = "black", linewidth = 1.5, inherit.aes = FALSE) +
  scale_color_manual(values = c("Swing" = "red", "Take" = "blue")) +
  labs(title = "Mookie Betts - Swing Decisions (Catcher's View)",
       subtitle = "2024 Season",
       x = "Horizontal Location (feet)",
       y = "Vertical Location (feet)") +
  coord_fixed(ratio = 1) +
  theme_minimal()
Python
# Python: Advanced Whiff Rate Analysis
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Get data for analysis
soto_data = pyb.statcast_batter('2024-04-01', '2024-09-30', 665742)

# Define swing events
swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                'foul_tip', 'swinging_strike_blocked']

# Calculate whiff rate by pitch characteristics
soto_swings = soto_data[soto_data['description'].isin(swing_events)].copy()

# Add velocity bins
soto_swings['velo_bin'] = pd.cut(soto_swings['release_speed'],
                                  bins=[0, 88, 92, 96, 105],
                                  labels=['<88', '88-92', '92-96', '96+'])

# Calculate whiff metrics
whiff_analysis = soto_swings.groupby(['pitch_type', 'velo_bin']).agg({
    'description': 'count',
    'release_speed': 'mean'
}).rename(columns={'description': 'total_swings'})

# Calculate whiff rate
whiff_counts = soto_swings[soto_swings['description'].isin(['swinging_strike',
                                                             'swinging_strike_blocked'])].groupby(
    ['pitch_type', 'velo_bin']
).size().rename('whiffs')

whiff_analysis = whiff_analysis.join(whiff_counts)
whiff_analysis['whiff_rate'] = (whiff_analysis['whiffs'] / whiff_analysis['total_swings'] * 100).fillna(0)

print("Juan Soto - Whiff Rate by Pitch Type and Velocity:")
print(whiff_analysis[whiff_analysis['total_swings'] >= 10].round(2))

# Advanced: Whiff rate by location
# Create zone grid
def assign_zone_grid(row):
    """Assign pitch to a grid zone (3x3)"""
    if pd.isna(row['plate_x']) or pd.isna(row['plate_z']):
        return None

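    # Horizontal zones (plate_x is measured from the catcher's view; the
    # Away/Inside labels below assume a left-handed batter such as Soto)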
    if row['plate_x'] < -0.28:
        h_zone = 'Away'
    elif row['plate_x'] > 0.28:
        h_zone = 'Inside'
    else:
        h_zone = 'Middle'

    # Vertical zones
    if row['plate_z'] < 2.0:
        v_zone = 'Low'
    elif row['plate_z'] > 2.8:
        v_zone = 'High'
    else:
        v_zone = 'Middle'

    return f"{v_zone}-{h_zone}"

soto_swings['zone_grid'] = soto_swings.apply(assign_zone_grid, axis=1)

zone_whiff = soto_swings.groupby('zone_grid').agg({
    'description': 'count'
}).rename(columns={'description': 'swings'})

zone_whiff_counts = soto_swings[
    soto_swings['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
].groupby('zone_grid').size().rename('whiffs')

zone_whiff = zone_whiff.join(zone_whiff_counts)
zone_whiff['whiff_rate'] = (zone_whiff['whiffs'] / zone_whiff['swings'] * 100).fillna(0)

print("\nWhiff Rate by Zone Grid:")
print(zone_whiff[zone_whiff['swings'] >= 20].sort_values('whiff_rate', ascending=False))

# Visualization: Whiff rate heatmap
fig, ax = plt.subplots(figsize=(10, 8))

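# Create heatmap data: bin locations into a 0.25 ft grid so each cell
# aggregates several swings (raw coordinates are nearly unique per pitch,
# which would make every point either 0% or 100%)
bin_size = 0.25
located = soto_swings.dropna(subset=['plate_x', 'plate_z']).copy()
located['x_bin'] = (located['plate_x'] / bin_size).round() * bin_size
located['z_bin'] = (located['plate_z'] / bin_size).round() * bin_size
located['is_whiff'] = located['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
heatmap_data = (located.groupby(['x_bin', 'z_bin'])['is_whiff']
                .mean().mul(100).rename('whiff_rate').reset_index())

scatter = ax.scatter(heatmap_data['x_bin'], heatmap_data['z_bin'],
                    c=heatmap_data['whiff_rate'], cmap='RdYlGn_r',
                    s=100, alpha=0.6, vmin=0, vmax=50)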

# Add strike zone
strike_zone = plt.Rectangle((-0.83, 1.5), 1.66, 2.0,
                            fill=False, edgecolor='black', linewidth=2)
ax.add_patch(strike_zone)

ax.set_xlabel('Horizontal Location (feet)', fontsize=12)
ax.set_ylabel('Vertical Location (feet)', fontsize=12)
ax.set_title('Juan Soto - Whiff Rate Heatmap (2024)', fontsize=14, fontweight='bold')
plt.colorbar(scatter, label='Whiff Rate (%)')
ax.set_aspect('equal')

plt.tight_layout()
plt.savefig('soto_whiff_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()
Python
# Python: Two-Strike Approach Analysis
import pybaseball as pyb
import pandas as pd
import numpy as np

def analyze_two_strike_approach(player_id, player_name, start_date='2024-04-01', end_date='2024-09-30'):
    """Analyze how a player's approach changes with two strikes"""

    data = pyb.statcast_batter(start_date, end_date, player_id)

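    # Simulate bat tracking metrics (placeholder values for illustration only:
    # bat_speed is derived from exit velocity plus noise, and swing_length is
    # pure random noise, so two-strike differences in it are not meaningful)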
    np.random.seed(42)
    data['bat_speed'] = np.where(
        data['launch_speed'].notna(),
        70 + (data['launch_speed'] - 85) * 0.15 + np.random.normal(0, 1.5, len(data)),
        np.nan
    )
    data['swing_length'] = np.random.normal(7.3, 0.6, len(data))

    # Define swing events
    swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                   'foul_tip', 'swinging_strike_blocked']

    # Separate by count situation
    swings = data[data['description'].isin(swing_events)].copy()

    swings['two_strike'] = swings['strikes'] == 2
    swings['in_zone'] = swings['zone'] <= 9

    # Calculate metrics by two-strike situation
    results = swings.groupby('two_strike').agg({
        'bat_speed': 'mean',
        'swing_length': 'mean',
        'description': 'count'
    }).rename(columns={'description': 'total_swings'})

    # Whiff rate
    whiffs = swings[swings['description'].isin(['swinging_strike',
                                                 'swinging_strike_blocked'])].groupby('two_strike').size()
    results['whiffs'] = whiffs
    results['whiff_rate'] = (results['whiffs'] / results['total_swings'] * 100).fillna(0)

    # Contact quality
    quality_contact = swings[(swings['launch_speed'] >= 95) &
                            (swings['launch_angle'] >= 8) &
                            (swings['launch_angle'] <= 32)].groupby('two_strike').size()
    results['quality_contact'] = quality_contact
    results['quality_contact_rate'] = (results['quality_contact'] / results['total_swings'] * 100).fillna(0)

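    # Chase rate: swings at out-of-zone pitches / all out-of-zone pitches seen.
    # The denominator must come from every pitch (data), not only from swings;
    # otherwise the rate is 100% by construction.
    out_zone = data[data['zone'] > 9].copy()
    out_zone['two_strike'] = out_zone['strikes'] == 2
    out_zone['chased'] = out_zone['description'].isin(swing_events)
    results['chase_rate'] = (out_zone.groupby('two_strike')['chased'].mean() * 100)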

    results.index = ['0-1 Strikes', '2 Strikes']

    print(f"\n{player_name} - Two-Strike Approach Analysis:")
    print(results.round(2))

    return results

# Analyze multiple players
players = {
    'Mookie Betts': 605141,
    'Juan Soto': 665742,
    'Aaron Judge': 592450
}

all_results = {}
for name, player_id in players.items():
    all_results[name] = analyze_two_strike_approach(player_id, name)

# Compare adjustments
print("\n\nTwo-Strike Adjustment Comparison:")
comparison = pd.DataFrame({
    name: {
        'Bat Speed Change': df.loc['2 Strikes', 'bat_speed'] - df.loc['0-1 Strikes', 'bat_speed'],
        'Swing Length Change': df.loc['2 Strikes', 'swing_length'] - df.loc['0-1 Strikes', 'swing_length'],
        'Whiff Rate Change': df.loc['2 Strikes', 'whiff_rate'] - df.loc['0-1 Strikes', 'whiff_rate'],
        'Chase Rate Change': df.loc['2 Strikes', 'chase_rate'] - df.loc['0-1 Strikes', 'chase_rate']
    }
    for name, df in all_results.items()
}).T

print(comparison.round(2))
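
Chase rate is defined over every out-of-zone pitch a hitter sees, not only the ones he swings at, so the denominator has to come from the full pitch log. The following is a minimal sketch of that definition on a hand-made pitch list. The `chase_rate` helper and the sample pitches are hypothetical; the zone convention (1-9 inside the strike zone, larger values outside) matches the `zone <= 9` test used in the function above.

```python
# Descriptions that count as a swing (same list as the chapter's swing_events)
SWING_EVENTS = {"hit_into_play", "foul", "swinging_strike",
                "foul_tip", "swinging_strike_blocked"}

def chase_rate(pitches):
    """Percent of out-of-zone pitches that drew a swing.

    pitches: iterable of (zone, description) pairs (hypothetical input shape).
    """
    out_of_zone = [desc for zone, desc in pitches
                   if zone is not None and zone > 9]
    if not out_of_zone:
        return float("nan")
    chases = sum(desc in SWING_EVENTS for desc in out_of_zone)
    return chases / len(out_of_zone) * 100

# Hand-made sample: five pitches outside the zone, two of them swung at
sample = [
    (5, "foul"), (13, "ball"), (14, "swinging_strike"), (11, "ball"),
    (12, "ball"), (2, "called_strike"), (14, "foul"),
]

all_pitches_rate = chase_rate(sample)

# Filtering to swings first removes every take from the denominator
swings_only = [p for p in sample if p[1] in SWING_EVENTS]
swings_only_rate = chase_rate(swings_only)

print(f"Chase rate from all pitches: {all_pitches_rate:.1f}%")
print(f"Chase rate from swings only: {swings_only_rate:.1f}%")
```

The second number is always 100% whenever at least one chase exists, which is why the rate should be computed before the data is filtered down to swings.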

25.4 Expected Run Values

Expected run value (xRV) is the expected change in run expectancy produced by each plate appearance outcome. By combining bat tracking data with Statcast outcomes, we can build predictive models that estimate a player's offensive value in runs.
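
As a worked illustration of "change in run expectancy", the sketch below computes the run value of a single event as the run expectancy after the event, minus the run expectancy before it, plus any runs that scored. The table values are copied from the run expectancy matrix used later in this section; the `run_value` helper and the three example plays are illustrative assumptions, not part of any library.

```python
# Run expectancy by base state; tuple index = outs (0, 1, 2).
# Values copied from this section's run expectancy matrix.
RUN_EXPECTANCY = {
    "Empty":   (0.481, 0.254, 0.098),
    "1st":     (0.831, 0.489, 0.214),
    "2nd":     (1.068, 0.644, 0.305),
    "3rd":     (1.298, 0.897, 0.353),
    "1st_2nd": (1.373, 0.908, 0.343),
    "1st_3rd": (1.798, 1.140, 0.559),
    "2nd_3rd": (1.946, 1.352, 0.578),
    "Loaded":  (2.282, 1.520, 0.736),
}

def run_value(start_bases, start_outs, end_bases, end_outs, runs_scored=0):
    """Run value = RE(end state) - RE(start state) + runs scored on the play."""
    re_start = RUN_EXPECTANCY[start_bases][start_outs]
    # After the third out the inning is over, so remaining expectancy is zero
    re_end = 0.0 if end_outs >= 3 else RUN_EXPECTANCY[end_bases][end_outs]
    return re_end - re_start + runs_scored

# Leadoff single: bases empty, 0 outs -> runner on 1st, 0 outs
single_rv = run_value("Empty", 0, "1st", 0)

# Solo home run with 1 out: the base-out state is unchanged, one run scores
homer_rv = run_value("Empty", 1, "Empty", 1, runs_scored=1)

# Inning-ending strikeout with a runner on 2nd
strikeout_rv = run_value("2nd", 2, "Empty", 3)

print(f"Leadoff single:           {single_rv:+.3f}")
print(f"Solo home run (1 out):    {homer_rv:+.3f}")
print(f"Inning-ending strikeout:  {strikeout_rv:+.3f}")
```

The same event is worth different amounts in different base-out states, which is why context-neutral linear weights are averages over all the situations in which an event occurs.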

Calculating Pitch Values

Every pitch has an expected run value based on the count, base-out state, and pitch outcome. These values accumulate throughout a plate appearance to determine overall offensive contribution.

# R: Expected Run Values and Pitch Values
library(dplyr)
library(ggplot2)

# Run expectancy matrix (based on 2024 MLB data)
run_expectancy <- data.frame(
  bases = c("Empty", "1st", "2nd", "3rd", "1st_2nd", "1st_3rd", "2nd_3rd", "Loaded"),
  zero_outs = c(0.481, 0.831, 1.068, 1.298, 1.373, 1.798, 1.946, 2.282),
  one_out = c(0.254, 0.489, 0.644, 0.897, 0.908, 1.140, 1.352, 1.520),
  two_outs = c(0.098, 0.214, 0.305, 0.353, 0.343, 0.559, 0.578, 0.736)
)

print("Run Expectancy Matrix (2024):")
print(run_expectancy)

# Simulate pitch-by-pitch data with outcomes
set.seed(456)
n_pitches <- 1000

pitch_data <- data.frame(
  pitch_num = 1:n_pitches,
  balls = sample(0:3, n_pitches, replace = TRUE),
  strikes = sample(0:2, n_pitches, replace = TRUE),
  outs = sample(0:2, n_pitches, replace = TRUE),
  pitch_type = sample(c("FF", "SL", "CH", "CU", "SI"), n_pitches, replace = TRUE,
                     prob = c(0.35, 0.25, 0.15, 0.15, 0.10)),
  pitch_speed = rnorm(n_pitches, 92, 4),
  in_zone = sample(c(TRUE, FALSE), n_pitches, replace = TRUE, prob = c(0.48, 0.52))
)

# Simulate outcomes based on count and location
pitch_data <- pitch_data %>%
  mutate(
    # Swing probability based on zone and count
    swing_prob = case_when(
      in_zone & strikes == 2 ~ 0.85,
      in_zone ~ 0.70,
      !in_zone & strikes == 2 ~ 0.50,
      !in_zone ~ 0.25
    ),
    swing = rbinom(n_pitches, 1, swing_prob),

    # Contact probability
    contact_prob = case_when(
      in_zone ~ 0.82,
      TRUE ~ 0.60
    ),
    contact = ifelse(swing == 1, rbinom(n_pitches, 1, contact_prob), 0),

    # Outcome
    outcome = case_when(
      swing == 0 & in_zone ~ "Called Strike",
      swing == 0 & !in_zone ~ "Ball",
      swing == 1 & contact == 0 ~ "Swinging Strike",
      swing == 1 & contact == 1 & runif(n_pitches) > 0.15 ~ "Foul",
      swing == 1 & contact == 1 ~ "In Play"
    )
  )

# Calculate count values (all 12 counts from 0-0 through 3-2; the full
# count is a valid state and must not be filtered out)
count_values <- data.frame(
  balls = rep(0:3, each = 3),
  strikes = rep(0:2, 4)
) %>%
  mutate(
    run_value = case_when(
      balls == 3 ~ 0.12,  # Walk imminent
      strikes == 2 ~ -0.08,  # Strikeout risk
      balls > strikes ~ 0.04,  # Hitter's count
      TRUE ~ -0.02  # Even or pitcher's count
    )
  )

# Calculate pitch values
pitch_values <- pitch_data %>%
  mutate(
    new_balls = case_when(
      outcome == "Ball" ~ pmin(balls + 1, 4),
      TRUE ~ balls
    ),
    new_strikes = case_when(
      outcome %in% c("Called Strike", "Swinging Strike") ~ pmin(strikes + 1, 3),
      outcome == "Foul" & strikes < 2 ~ strikes + 1,
      TRUE ~ strikes
    ),

    # Simplified run value change
    pitch_value = case_when(
      outcome == "Ball" ~ 0.04,
      outcome %in% c("Called Strike", "Swinging Strike") ~ -0.05,
      outcome == "Foul" ~ -0.02,
      outcome == "In Play" ~ rnorm(n_pitches, 0.1, 0.3)  # Simplified
    )
  )

# Analyze by pitch type
pitch_type_values <- pitch_values %>%
  group_by(pitch_type) %>%
  summarise(
    total_pitches = n(),
    avg_pitch_value = mean(pitch_value),
    swing_rate = mean(swing) * 100,
    whiff_rate = sum(outcome == "Swinging Strike") / sum(swing) * 100,
    in_zone_rate = mean(in_zone) * 100
  )

print("\nPitch Type Values:")
print(pitch_type_values)

# Visualization
ggplot(pitch_values %>% filter(outcome == "In Play"),
       aes(x = pitch_type, y = pitch_value)) +
  geom_boxplot(fill = "steelblue", alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Pitch Values by Type (In-Play Outcomes)",
       subtitle = "Positive values favor the hitter",
       x = "Pitch Type",
       y = "Pitch Value (Runs)") +
  theme_minimal()
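
To make the idea of values accumulating through a plate appearance concrete, here is a small sketch that values each non-terminal pitch as the change in count value it causes. The count table reuses the simplified values from the player model later in this section; the `next_count` and `pitch_values` helpers and the pitch sequence are hypothetical.

```python
# Simplified run value of each (balls, strikes) count, as used in this chapter
COUNT_RV = {
    (0, 0): 0.00, (1, 0): 0.04, (2, 0): 0.08, (3, 0): 0.12,
    (0, 1): -0.03, (1, 1): 0.01, (2, 1): 0.05, (3, 1): 0.09,
    (0, 2): -0.06, (1, 2): -0.02, (2, 2): 0.02, (3, 2): 0.06,
}

def next_count(count, result):
    """Advance the count for one pitch result."""
    balls, strikes = count
    if result == "ball":
        return (balls + 1, strikes)
    if result in ("called_strike", "swinging_strike"):
        return (balls, strikes + 1)
    if result == "foul":
        # A foul cannot produce the third strike
        return (balls, min(strikes + 1, 2))
    raise ValueError(f"unknown pitch result: {result}")

def pitch_values(results, start=(0, 0)):
    """Value each pitch as COUNT_RV[new count] - COUNT_RV[old count]."""
    values = []
    count = start
    for result in results:
        new_count = next_count(count, result)
        if new_count not in COUNT_RV:
            # Walk or strikeout: a terminal event, valued by outcome weights
            break
        values.append(round(COUNT_RV[new_count] - COUNT_RV[count], 2))
        count = new_count
    return values, count

sequence = ["ball", "called_strike", "foul", "foul", "ball"]
values, final_count = pitch_values(sequence)

print("Per-pitch values:", values)
print("Final count:", final_count)
print("Sum of pitch values:", round(sum(values), 2))
```

The per-pitch values telescope: their sum equals the value of the final count minus the value of the starting count, so the pitch-level and count-level views of a plate appearance stay consistent.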

Player Expected Run Value Model

By aggregating pitch values across all plate appearances, we can calculate a player's total run contribution above average.

# Python: Player Expected Run Value Model
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed for the charts at the end of this block

def calculate_player_xrv(player_id, player_name, start_date='2024-04-01', end_date='2024-09-30'):
    """Calculate expected run value for a player using Statcast data"""

    # Get Statcast data
    data = pyb.statcast_batter(start_date, end_date, player_id)

    # Define count-based run values (simplified model)
    count_rv = {
        (0, 0): 0.00, (1, 0): 0.04, (2, 0): 0.08, (3, 0): 0.12,
        (0, 1): -0.03, (1, 1): 0.01, (2, 1): 0.05, (3, 1): 0.09,
        (0, 2): -0.06, (1, 2): -0.02, (2, 2): 0.02, (3, 2): 0.06
    }

    # Outcome run values (based on linear weights)
    outcome_rv = {
        'single': 0.47,
        'double': 0.77,
        'triple': 1.04,
        'home_run': 1.40,
        'walk': 0.31,
        'hit_by_pitch': 0.33,
        'strikeout': -0.30,
        'field_out': -0.27,
        'force_out': -0.27,
        'grounded_into_double_play': -0.42,
        'fielders_choice_out': -0.27,
        'sac_fly': -0.03,
        'double_play': -0.42
    }

    # Calculate pitch-by-pitch xRV
    data['count_rv'] = data.apply(
        lambda x: count_rv.get((x['balls'], x['strikes']), 0), axis=1
    )

    data['outcome_rv'] = data['events'].map(outcome_rv).fillna(0)

    # For balls in play, adjust by exit velocity and launch angle.
    # type == 'X' excludes fouls, which can also carry launch data.
    data['xrv_adjustment'] = 0.0

    batted_balls = data[(data['type'] == 'X') &
                        data['launch_speed'].notna() &
                        data['launch_angle'].notna()].copy()
    if len(batted_balls) > 0:
        # Simple model: higher EV and optimal LA = higher xRV
        batted_balls['la_optimal'] = np.abs(batted_balls['launch_angle'] - 20)
        batted_balls['xrv_adjustment'] = (
            (batted_balls['launch_speed'] - 88) * 0.01 -
            batted_balls['la_optimal'] * 0.002
        )
        data.loc[batted_balls.index, 'xrv_adjustment'] = batted_balls['xrv_adjustment']

    data['total_xrv'] = data['outcome_rv'] + data['xrv_adjustment']

    # Aggregate results. Statcast rows are pitches, so count plate
    # appearances from the rows that carry a terminal event.
    n_pa = int(data['events'].notna().sum())
    total_xrv = data['total_xrv'].sum()
    results = {
        'Player': player_name,
        'PA': n_pa,
        'Total xRV': total_xrv,
        'xRV per PA': total_xrv / n_pa if n_pa else np.nan,
        'xRV per 100 PA': total_xrv / n_pa * 100 if n_pa else np.nan,
        'Batted Ball xRV': data[data['type'] == 'X']['total_xrv'].sum(),
        'BB/HBP xRV': data[data['events'].isin(['walk', 'hit_by_pitch'])]['total_xrv'].sum(),
        'K xRV': data[data['events'] == 'strikeout']['total_xrv'].sum()
    }

    return results, data

# Analyze multiple elite hitters
players = {
    'Shohei Ohtani': 660271,
    'Mookie Betts': 605141,
    'Aaron Judge': 592450,
    'Juan Soto': 665742,
    'Freddie Freeman': 518692
}

xrv_comparison = []
all_player_data = {}

for name, player_id in players.items():
    results, player_data = calculate_player_xrv(player_id, name)
    xrv_comparison.append(results)
    all_player_data[name] = player_data

xrv_df = pd.DataFrame(xrv_comparison)
xrv_df = xrv_df.sort_values('Total xRV', ascending=False)

print("Expected Run Value Leaderboard (2024):")
print(xrv_df.round(2))

# Visualization: xRV breakdown
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Bar chart of total xRV
ax1.barh(xrv_df['Player'], xrv_df['Total xRV'], color='steelblue', alpha=0.8)
ax1.set_xlabel('Total Expected Run Value', fontsize=12)
ax1.set_title('Total xRV by Player (2024)', fontsize=14, fontweight='bold')
ax1.grid(axis='x', alpha=0.3)

# Stacked bar chart of xRV components
components = xrv_df[['Player', 'Batted Ball xRV', 'BB/HBP xRV', 'K xRV']]
components = components.set_index('Player')

components.plot(kind='barh', stacked=True, ax=ax2,
                color=['steelblue', 'green', 'red'], alpha=0.8)
ax2.set_xlabel('Expected Run Value', fontsize=12)
ax2.set_title('xRV Components by Player', fontsize=14, fontweight='bold')
ax2.legend(title='Component', bbox_to_anchor=(1.05, 1), loc='upper left')
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.savefig('xrv_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
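
One detail worth making explicit when aggregating: Statcast data has one row per pitch, and the `events` field is filled in only on the pitch that ends a plate appearance. A per-PA rate therefore has to divide by the number of terminal pitches, not by the number of rows. The sketch below shows this on a hand-made pitch log using a subset of the linear weights above; the `summarize` helper and the sample data are hypothetical. In a pandas frame the equivalent test is `data['events'].notna()`.

```python
# Subset of the outcome run values used in this section
OUTCOME_RV = {"single": 0.47, "home_run": 1.40, "walk": 0.31,
              "strikeout": -0.30, "field_out": -0.27}

# Ten pitches spread over four plate appearances; events is None
# on every pitch that does not end the plate appearance
pitch_log = [
    {"events": None}, {"events": None}, {"events": "single"},
    {"events": None}, {"events": "strikeout"},
    {"events": "home_run"},
    {"events": None}, {"events": None}, {"events": None}, {"events": "walk"},
]

def summarize(pitches):
    """Total and per-PA run value from a pitch-level log."""
    terminal = [p["events"] for p in pitches if p["events"] is not None]
    total = sum(OUTCOME_RV.get(event, 0.0) for event in terminal)
    n_pa = len(terminal)
    return {
        "pitches": len(pitches),
        "PA": n_pa,
        "total_rv": total,
        "rv_per_pa": total / n_pa if n_pa else float("nan"),
    }

summary = summarize(pitch_log)
print(summary)
```

Dividing the same total by the ten pitches instead of the four plate appearances would understate the per-PA rate by a factor of 2.5 in this example.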

25.5 Attack Angle & Launch Optimization

Attack angle—the vertical angle of the bat path at contact—has become a focal point in modern hitting instruction. The goal is to match the bat path to the pitch plane, maximizing the window for solid contact.

Understanding Attack Angle

The optimal attack angle depends on pitch location:


  • High pitches: Steeper attack angle (15-20 degrees)
  • Middle pitches: Moderate attack angle (10-15 degrees)
  • Low pitches: Shallow attack angle (5-10 degrees)

Elite hitters demonstrate "bat-to-ball adaptability"—the ability to adjust attack angle based on pitch height.
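
The height-dependent target can be sketched as a simple linear rule. The version below borrows the toy relationship from this section's R simulation, target = (pitch height - 1.5) × 8 + 5, along with its Low/Middle/High height cut points. Both are illustrative assumptions rather than measured values, and the function names are hypothetical.

```python
# Attack angle ranges quoted in the list above, in degrees
ANGLE_RANGES = {"Low": (5, 10), "Middle": (10, 15), "High": (15, 20)}

def pitch_band(height_ft):
    """Bucket pitch height using the simulation's cut points (assumed)."""
    if height_ft <= 1.8:
        return "Low"
    if height_ft <= 2.5:
        return "Middle"
    return "High"

def target_attack_angle(height_ft):
    """Toy linear target from the chapter's simulation, not a fitted model."""
    return (height_ft - 1.5) * 8 + 5

def angle_deviation(attack_angle, height_ft):
    """Absolute gap between an actual swing and the toy target."""
    return abs(attack_angle - target_attack_angle(height_ft))

for height in (1.8, 2.3, 3.0):
    band = pitch_band(height)
    low, high = ANGLE_RANGES[band]
    target = target_attack_angle(height)
    print(f"{height:.1f} ft ({band}): target {target:.1f} deg, "
          f"quoted range {low}-{high} deg")

# A 12-degree swing on a pitch at 3.0 ft sits 5 degrees below the target
print(f"Deviation: {angle_deviation(12, 3.0):.1f} deg")
```

For these three sample heights the linear rule lands inside the quoted range for the matching band, which is all the sketch is meant to show; a real analysis would estimate the relationship from tracked swings.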

# R: Attack Angle and Launch Optimization
library(dplyr)
library(ggplot2)
library(tidyr)

# Simulate attack angle and outcome data
set.seed(789)
n_swings <- 800

attack_data <- data.frame(
  swing_id = 1:n_swings,
  pitch_height = rnorm(n_swings, 2.3, 0.5),  # Feet above ground
  pitch_speed = rnorm(n_swings, 92, 4),
  attack_angle = rnorm(n_swings, 12, 5)
) %>%
  mutate(
    # Optimal attack angle based on pitch height
    optimal_attack = (pitch_height - 1.5) * 8 + 5,

    # Angle deviation from optimal
    angle_deviation = abs(attack_angle - optimal_attack),

    # Contact quality based on deviation
    contact_quality = pmax(0, 100 - angle_deviation * 8 + rnorm(n_swings, 0, 10)),

    # Exit velocity (affected by contact quality and bat speed)
    bat_speed = rnorm(n_swings, 73, 3),
    exit_velo = pmin(120, bat_speed * 1.2 * (contact_quality / 100) + 20 + rnorm(n_swings, 0, 3)),

    # Launch angle (related to attack angle and pitch height)
    launch_angle = attack_angle + (pitch_height - 2.3) * 5 + rnorm(n_swings, 0, 5),

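    # Outcome quality (simplified: a real Statcast barrel widens its
    # launch-angle window as exit velocity rises above 98 mph)
    barrel = ifelse(exit_velo >= 98 & launch_angle >= 26 & launch_angle <= 30, 1, 0),
    solid_contact = ifelse(exit_velo >= 95 & launch_angle >= 8 & launch_angle <= 32, 1, 0),

    # Pitch location bins (open-ended so extreme simulated heights are not dropped as NA)
    pitch_location = cut(pitch_height,
                        breaks = c(-Inf, 1.8, 2.5, Inf),
                        labels = c("Low", "Middle", "High"))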
  )

# Analyze optimal attack angles by pitch location
location_analysis <- attack_data %>%
  group_by(pitch_location) %>%
  summarise(
    avg_attack_angle = mean(attack_angle),
    optimal_attack_angle = mean(optimal_attack),
    avg_exit_velo = mean(exit_velo),
    barrel_rate = mean(barrel) * 100,
    solid_contact_rate = mean(solid_contact) * 100,
    n_swings = n()
  )

print("Attack Angle Analysis by Pitch Location:")
print(location_analysis)

# Find ideal attack angle for each zone
ideal_analysis <- attack_data %>%
  mutate(angle_bucket = cut(attack_angle,
                            breaks = seq(0, 30, 5),
                            labels = c("0-5", "5-10", "10-15", "15-20", "20-25", "25-30"))) %>%
  filter(!is.na(angle_bucket)) %>%  # drop swings outside the 0-30 degree buckets
  group_by(pitch_location, angle_bucket) %>%
  summarise(
    avg_exit_velo = mean(exit_velo),
    solid_contact_rate = mean(solid_contact) * 100,
    n = n(),
    .groups = "drop"
  ) %>%
  filter(n >= 10)

print("\nExit Velocity by Attack Angle and Pitch Location:")
print(ideal_analysis)

# Visualization 1: Attack angle vs exit velocity by pitch location
ggplot(attack_data, aes(x = attack_angle, y = exit_velo, color = pitch_location)) +
  geom_point(alpha = 0.4, size = 2) +
  geom_smooth(method = "loess", se = TRUE, linewidth = 1.2) +
  labs(title = "Attack Angle Optimization by Pitch Location",
       subtitle = "Optimal attack angle varies by pitch height",
       x = "Attack Angle (degrees)",
       y = "Exit Velocity (mph)",
       color = "Pitch Location") +
  theme_minimal() +
  theme(legend.position = "bottom")

# Visualization 2: Heat map of outcomes
ggplot(attack_data, aes(x = attack_angle, y = launch_angle)) +
  geom_bin2d(bins = 20) +
  geom_density2d(color = "white", alpha = 0.5) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Attack Angle vs Launch Angle Distribution",
       subtitle = "Relationship between swing path and batted ball trajectory",
       x = "Attack Angle (degrees)",
       y = "Launch Angle (degrees)") +
  theme_minimal()

Launch Angle Optimization

While attack angle describes the swing path, launch angle describes the trajectory of the batted ball. The relationship between the two is mediated by pitch angle and contact point.
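
As a rough illustration of how swing path and pitch height combine, the sketch below reuses the toy relationship from this section's R simulation, launch angle ≈ attack angle + 5 × (pitch height - 2.3), with the noise term omitted, and checks the result against the 8-32 degree sweet spot used in the code that follows. This is a simulation assumption, not a physical model of the collision, and the function names are hypothetical.

```python
SWEET_SPOT = (8, 32)  # degrees, the sweet-spot window used in this chapter

def expected_launch_angle(attack_angle, pitch_height_ft):
    """Toy relationship from the chapter's simulation (noise omitted)."""
    return attack_angle + (pitch_height_ft - 2.3) * 5

def in_sweet_spot(launch_angle):
    """True when the launch angle falls inside the sweet-spot window."""
    low, high = SWEET_SPOT
    return low <= launch_angle <= high

# (attack angle in degrees, pitch height in feet)
swings = [(12, 2.3), (5, 1.5), (20, 3.5)]

for attack_angle, height in swings:
    launch = expected_launch_angle(attack_angle, height)
    print(f"attack {attack_angle:>2} deg at {height:.1f} ft -> "
          f"launch {launch:5.1f} deg, sweet spot: {in_sweet_spot(launch)}")
```

Under this toy rule a flat swing on a low pitch produces a launch angle near zero, while the same hitter's steeper swing on a high pitch stays in the sweet spot, which is the adaptability argument from the attack angle discussion expressed in launch angle terms.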

# Python: Launch Angle Optimization Model
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

def analyze_launch_angle_outcomes(player_id, player_name):
    """Comprehensive launch angle analysis"""

    data = pyb.statcast_batter('2024-04-01', '2024-09-30', player_id)

    # Filter to balls in play (type == 'X'); fouls can also carry launch data
    batted_balls = data[(data['type'] == 'X') &
                        data['launch_speed'].notna() &
                        data['launch_angle'].notna()].copy()

    # Statcast search data has no 'barrel' column; derive it from
    # launch_speed_angle, where category 6 denotes a barrel
    batted_balls['barrel'] = (batted_balls['launch_speed_angle'] == 6).astype(int)

    # Create launch angle bins
    batted_balls['la_bin'] = pd.cut(batted_balls['launch_angle'],
                                     bins=[-90, -10, 10, 25, 40, 90],
                                     labels=['Ground Ball', 'Low LD', 'Optimal', 'High FB', 'Pop Up'])

    # Analyze outcomes by launch angle bin
    la_outcomes = batted_balls.groupby('la_bin').agg({
        'launch_speed': ['mean', 'std'],
        'estimated_woba_using_speedangle': 'mean',
        'launch_angle': 'count',
        'barrel': lambda x: x.mean() * 100
    }).round(3)

    la_outcomes.columns = ['Avg EV', 'EV Std', 'Avg xwOBA', 'Count', 'Barrel Rate']

    print(f"\n{player_name} - Launch Angle Outcomes:")
    print(la_outcomes)

    # Calculate sweet spot percentage (8-32 degrees)
    sweet_spot_pct = len(batted_balls[(batted_balls['launch_angle'] >= 8) &
                                     (batted_balls['launch_angle'] <= 32)]) / len(batted_balls) * 100

    # Calculate average launch angle on different pitch types
    pitch_type_la = batted_balls.groupby('pitch_type').agg({
        'launch_angle': 'mean',
        'launch_speed': 'mean',
        'estimated_woba_using_speedangle': 'mean',
        'events': 'count'
    }).round(2)
    pitch_type_la.columns = ['Avg LA', 'Avg EV', 'xwOBA', 'Count']
    pitch_type_la = pitch_type_la[pitch_type_la['Count'] >= 10].sort_values('xwOBA', ascending=False)

    print(f"\nSweet Spot %: {sweet_spot_pct:.1f}%")
    print(f"\nLaunch Angle by Pitch Type:")
    print(pitch_type_la)

    return batted_balls, la_outcomes

# Analyze multiple players
players = {
    'Aaron Judge': 592450,
    'Juan Soto': 665742,
    'Freddie Freeman': 518692
}

fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()

for idx, (name, player_id) in enumerate(players.items()):
    batted_balls, _ = analyze_launch_angle_outcomes(player_id, name)

    # Plot 1: Launch angle distribution
    if idx < len(axes):
        ax = axes[idx]

        # Histogram
        ax.hist(batted_balls['launch_angle'], bins=40, alpha=0.6,
               color='steelblue', edgecolor='black')

        # Add optimal zone
        ax.axvspan(8, 32, alpha=0.2, color='green', label='Sweet Spot (8-32°)')
        ax.axvline(batted_balls['launch_angle'].mean(), color='red',
                  linestyle='--', linewidth=2, label=f'Mean: {batted_balls["launch_angle"].mean():.1f}°')

        ax.set_xlabel('Launch Angle (degrees)', fontsize=11)
        ax.set_ylabel('Frequency', fontsize=11)
        ax.set_title(f'{name} - Launch Angle Distribution', fontsize=12, fontweight='bold')
        ax.legend()
        ax.grid(axis='y', alpha=0.3)

# Plot 4: Combined spray chart colored by launch angle
ax = axes[3]

for name, player_id in players.items():
    data = pyb.statcast_batter('2024-04-01', '2024-09-30', player_id)
    batted_balls = data[data['launch_speed'].notna() & data['launch_angle'].notna()]

    scatter = ax.scatter(batted_balls['hc_x'], batted_balls['hc_y'],
                        c=batted_balls['launch_angle'], cmap='RdYlGn',
                        s=20, alpha=0.3, vmin=-20, vmax=50)

ax.set_xlim(0, 250)
ax.set_ylim(0, 250)
ax.set_xlabel('Horizontal Position', fontsize=11)
ax.set_ylabel('Vertical Position', fontsize=11)
ax.set_title('Combined Spray Chart (Color = Launch Angle)', fontsize=12, fontweight='bold')
plt.colorbar(scatter, ax=ax, label='Launch Angle (degrees)')

plt.tight_layout()
plt.savefig('launch_angle_optimization.png', dpi=300, bbox_inches='tight')
plt.show()

# Advanced: Optimal launch angle by exit velocity
def find_optimal_la_by_ev():
    """Determine optimal launch angle for different exit velocities"""

    # Combine data from multiple players
    all_data = []
    for player_id in [592450, 665742, 518692, 660271, 605141]:
        data = pyb.statcast_batter('2024-04-01', '2024-09-30', player_id)
        all_data.append(data)

    combined = pd.concat(all_data, ignore_index=True)
    batted_balls = combined[combined['launch_speed'].notna() &
                           combined['launch_angle'].notna()].copy()

    # Create EV bins
    batted_balls['ev_bin'] = pd.cut(batted_balls['launch_speed'],
                                     bins=[0, 90, 95, 100, 105, 125],
                                     labels=['<90', '90-95', '95-100', '100-105', '105+'])

    # For each EV bin, find launch angles with highest xwOBA
    optimal_results = []

    # Iterate the bin categories directly: fixed low-to-high order, no NaN entry
    for ev_bin in batted_balls['ev_bin'].cat.categories:
        bin_data = batted_balls[batted_balls['ev_bin'] == ev_bin].copy()
        bin_data['la_bucket'] = pd.cut(bin_data['launch_angle'],
                                        bins=range(-20, 60, 5))

        la_analysis = bin_data.groupby('la_bucket').agg({
            'estimated_woba_using_speedangle': 'mean',
            'launch_angle': 'count'
        }).reset_index()

        la_analysis = la_analysis[la_analysis['launch_angle'] >= 10]

        if len(la_analysis) > 0:
            optimal_la = la_analysis.loc[la_analysis['estimated_woba_using_speedangle'].idxmax()]
            optimal_results.append({
                'EV Bin': ev_bin,
                'Optimal LA Range': optimal_la['la_bucket'],
                'xwOBA': optimal_la['estimated_woba_using_speedangle']
            })

    optimal_df = pd.DataFrame(optimal_results)
    print("\nOptimal Launch Angle by Exit Velocity:")
    print(optimal_df)

    return batted_balls

batted_balls_combined = find_optimal_la_by_ev()

25.6 Combining Tracking Data for Player Evaluation

The true power of advanced tracking data emerges when multiple metrics are combined into a single player evaluation. This section builds a holistic hitting profile from bat tracking, Statcast, and traditional statistics. Because the first example simulates its bat tracking columns, treat its output as an illustration of the workflow rather than as real measurements.
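Before the full profile class, the core idea of combining metrics can be sketched in a few lines. The hitters and values below are made up for illustration, and the equal-weight composite is an assumption of this sketch, not an established metric: each metric is standardized across the group, metrics where lower is better are sign-flipped, and the z-scores are averaged.

```python
import pandas as pd

# Hypothetical season-level metrics for three made-up hitters (not real data)
metrics = pd.DataFrame(
    {
        "avg_bat_speed": [75.0, 72.0, 73.5],
        "contact_rate": [72.0, 84.0, 79.0],
        "barrel_rate": [16.0, 9.0, 12.0],
        "chase_rate": [29.0, 19.0, 24.0],
    },
    index=["Hitter A", "Hitter B", "Hitter C"],
)

# Standardize each metric across the group (z-scores, population std)
z_scores = (metrics - metrics.mean()) / metrics.std(ddof=0)

# A lower chase rate is better, so flip its sign before averaging
z_scores["chase_rate"] = -z_scores["chase_rate"]

# Equal-weight composite: an illustrative choice, not an established metric
composite = z_scores.mean(axis=1).sort_values(ascending=False)
print(composite.round(2))
```

Equal weights keep the sketch simple; in practice the weights would need to reflect how strongly each metric relates to run production.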

Building a Comprehensive Hitting Profile

# Python: Comprehensive Player Evaluation System
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

class HitterProfile:
    """Comprehensive hitter evaluation using multiple data sources"""

    def __init__(self, player_id, player_name, season='2024'):
        self.player_id = player_id
        self.player_name = player_name
        self.season = season
        self.data = None
        self.profile = {}

    def load_data(self, start_date='2024-04-01', end_date='2024-09-30'):
        """Load Statcast data"""
        self.data = pyb.statcast_batter(start_date, end_date, self.player_id)

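        # Simulate bat tracking metrics (placeholder values built from exit
        # velocity, launch angle, and random noise; not measured swings)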
        np.random.seed(self.player_id)
        self.data['bat_speed'] = np.where(
            self.data['launch_speed'].notna(),
            70 + (self.data['launch_speed'] - 85) * 0.15 + np.random.normal(0, 1.5, len(self.data)),
            np.nan
        )
        self.data['swing_length'] = np.random.normal(7.3, 0.6, len(self.data))
        self.data['attack_angle'] = np.where(
            self.data['launch_angle'].notna(),
            self.data['launch_angle'] - 8 + np.random.normal(0, 3, len(self.data)),
            np.nan
        )

    def calculate_bat_tracking_metrics(self):
        """Calculate bat tracking metrics"""
        swings = self.data[self.data['bat_speed'].notna()]

        if len(swings) > 0:
            self.profile['avg_bat_speed'] = swings['bat_speed'].mean()
            self.profile['max_bat_speed'] = swings['bat_speed'].max()
            self.profile['avg_swing_length'] = swings['swing_length'].mean()
            self.profile['bat_speed_90th'] = swings['bat_speed'].quantile(0.9)
        else:
            self.profile['avg_bat_speed'] = np.nan
            self.profile['max_bat_speed'] = np.nan
            self.profile['avg_swing_length'] = np.nan
            self.profile['bat_speed_90th'] = np.nan

    def calculate_contact_metrics(self):
        """Calculate contact quality metrics"""
        swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                       'foul_tip', 'swinging_strike_blocked']
        swings = self.data[self.data['description'].isin(swing_events)]

        if len(swings) > 0:
            # Contact rate
            contact = swings[swings['description'].isin(['hit_into_play', 'foul', 'foul_tip'])]
            self.profile['contact_rate'] = len(contact) / len(swings) * 100

            # Whiff rate
            whiffs = swings[swings['description'].isin(['swinging_strike', 'swinging_strike_blocked'])]
            self.profile['whiff_rate'] = len(whiffs) / len(swings) * 100

            # Zone metrics (zones 1-9 are inside the strike zone)
            zone_pitches = self.data[self.data['zone'] <= 9]
            out_pitches = self.data[self.data['zone'] > 9]
            in_zone = swings[swings['zone'] <= 9]
            out_zone = swings[swings['zone'] > 9]

            # Swings divided by pitches seen in each region; NaN when no pitches were seen
            self.profile['z_swing_rate'] = len(in_zone) / len(zone_pitches) * 100 if len(zone_pitches) > 0 else np.nan
            self.profile['o_swing_rate'] = len(out_zone) / len(out_pitches) * 100 if len(out_pitches) > 0 else np.nan
            self.profile['chase_rate'] = self.profile['o_swing_rate']
        else:
            self.profile['contact_rate'] = np.nan
            self.profile['whiff_rate'] = np.nan
            self.profile['z_swing_rate'] = np.nan
            self.profile['o_swing_rate'] = np.nan
            self.profile['chase_rate'] = np.nan

    def calculate_batted_ball_metrics(self):
        """Calculate batted ball quality metrics"""
        batted_balls = self.data[self.data['launch_speed'].notna()]

        if len(batted_balls) > 0:
            self.profile['avg_exit_velo'] = batted_balls['launch_speed'].mean()
            self.profile['max_exit_velo'] = batted_balls['launch_speed'].max()
            self.profile['avg_launch_angle'] = batted_balls['launch_angle'].mean()

            # Sweet spot % (8-32 degrees)
            sweet_spot = batted_balls[(batted_balls['launch_angle'] >= 8) &
                                     (batted_balls['launch_angle'] <= 32)]
            self.profile['sweet_spot_pct'] = len(sweet_spot) / len(batted_balls) * 100

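            # Barrel rate: the Statcast pitch-level export has no 'barrel' column;
            # launch_speed_angle == 6 is its barrel classification
            barrels = batted_balls[batted_balls['launch_speed_angle'] == 6]
            self.profile['barrel_rate'] = len(barrels) / len(batted_balls) * 100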

            # Hard hit rate (95+ mph)
            hard_hit = batted_balls[batted_balls['launch_speed'] >= 95]
            self.profile['hard_hit_rate'] = len(hard_hit) / len(batted_balls) * 100

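            # Expected stats on contact only (walks and strikeouts are excluded,
            # so these are closer to xBAcon/xwOBAcon than to the full-season figures)
            self.profile['xBA'] = batted_balls['estimated_ba_using_speedangle'].mean()
            self.profile['xwOBA'] = batted_balls['estimated_woba_using_speedangle'].mean()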
        else:
            for metric in ['avg_exit_velo', 'max_exit_velo', 'avg_launch_angle',
                          'sweet_spot_pct', 'barrel_rate', 'hard_hit_rate', 'xBA', 'xwOBA']:
                self.profile[metric] = np.nan

    def calculate_approach_metrics(self):
        """Calculate plate approach metrics"""
        # Swing decisions by count
        ahead_counts = self.data[self.data['balls'] > self.data['strikes']]
        behind_counts = self.data[self.data['balls'] < self.data['strikes']]

        swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                       'foul_tip', 'swinging_strike_blocked']

        if len(ahead_counts) > 0:
            ahead_swings = ahead_counts[ahead_counts['description'].isin(swing_events)]
            self.profile['swing_rate_ahead'] = len(ahead_swings) / len(ahead_counts) * 100
        else:
            self.profile['swing_rate_ahead'] = np.nan

        if len(behind_counts) > 0:
            behind_swings = behind_counts[behind_counts['description'].isin(swing_events)]
            self.profile['swing_rate_behind'] = len(behind_swings) / len(behind_counts) * 100
        else:
            self.profile['swing_rate_behind'] = np.nan

    def generate_full_profile(self):
        """Generate complete player profile"""
        self.calculate_bat_tracking_metrics()
        self.calculate_contact_metrics()
        self.calculate_batted_ball_metrics()
        self.calculate_approach_metrics()

        return pd.Series(self.profile, name=self.player_name)

# Generate profiles for multiple players
players = {
    'Shohei Ohtani': 660271,
    'Aaron Judge': 592450,
    'Juan Soto': 665742,
    'Mookie Betts': 605141,
    'Freddie Freeman': 518692,
    'Bobby Witt Jr': 677951
}

profiles = []

for name, player_id in players.items():
    print(f"Processing {name}...")
    profiler = HitterProfile(player_id, name)
    profiler.load_data()
    profile = profiler.generate_full_profile()
    profiles.append(profile)

# Combine into dataframe
profiles_df = pd.DataFrame(profiles)
profiles_df = profiles_df.round(2)

print("\nComprehensive Hitter Profiles:")
print(profiles_df.T)

# Standardize metrics for comparison
scaler = StandardScaler()
metrics_to_scale = ['avg_bat_speed', 'avg_exit_velo', 'contact_rate',
                   'barrel_rate', 'hard_hit_rate', 'sweet_spot_pct']
profiles_df_scaled = profiles_df.copy()
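# Fill gaps with the column mean so a missing value lands at z = 0
# instead of acting as an extreme outlier that distorts the scale
profiles_df_scaled[metrics_to_scale] = scaler.fit_transform(
    profiles_df[metrics_to_scale].fillna(profiles_df[metrics_to_scale].mean()))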

# Create radar chart for comparison
def create_radar_chart(data, players, metrics, title):
    """Create radar chart comparing players"""

    angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
    angles += angles[:1]

    fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(projection='polar'))

    colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']

    for idx, player in enumerate(players):
        # Look up each row by player name (the DataFrame index) rather than by position
        values = data.loc[player, metrics].tolist()
        values += values[:1]

        ax.plot(angles, values, 'o-', linewidth=2, label=player, color=colors[idx % len(colors)])
        ax.fill(angles, values, alpha=0.15, color=colors[idx % len(colors)])

    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(metrics, size=10)
    ax.set_ylim(-2, 2)
    ax.set_title(title, size=14, fontweight='bold', pad=20)
    ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    ax.grid(True)

    return fig

# Create radar chart
radar_metrics = ['avg_bat_speed', 'avg_exit_velo', 'contact_rate',
                'barrel_rate', 'hard_hit_rate', 'sweet_spot_pct']
fig = create_radar_chart(profiles_df_scaled, list(players.keys()), radar_metrics,
                        'Comprehensive Hitter Comparison (Standardized)')
plt.tight_layout()
plt.savefig('hitter_radar_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

# Correlation matrix
fig, ax = plt.subplots(figsize=(12, 10))
correlation_metrics = ['avg_bat_speed', 'avg_exit_velo', 'contact_rate',
                      'whiff_rate', 'chase_rate', 'barrel_rate',
                      'hard_hit_rate', 'sweet_spot_pct', 'xwOBA']
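# Pairwise correlations between metrics (with only six hitters these are very noisy)
corr_data = profiles_df[correlation_metrics].corr()
sns.heatmap(corr_data, annot=True, fmt='.2f', cmap='RdYlGn', center=0,
           vmin=-1, vmax=1, square=True, linewidths=1,
           cbar_kws={"shrink": 0.8}, ax=ax)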
ax.set_title('Metric Correlation Across Elite Hitters', fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.savefig('hitter_metrics_correlation.png', dpi=300, bbox_inches='tight')
plt.show()

Creating a Player Similarity Index

# R: Player Similarity Index
library(dplyr)
library(ggplot2)
library(cluster)

# Create sample player data (in practice, use real Statcast/bat tracking data)
set.seed(100)

players <- data.frame(
  player = c("Shohei Ohtani", "Aaron Judge", "Juan Soto", "Mookie Betts",
             "Freddie Freeman", "Bobby Witt Jr", "Jose Ramirez", "Kyle Tucker",
             "Ronald Acuna Jr", "Fernando Tatis Jr"),
  bat_speed = c(75.2, 74.8, 72.1, 73.5, 71.8, 74.2, 72.9, 73.8, 74.5, 75.0),
  swing_length = c(7.4, 7.8, 7.1, 7.2, 7.0, 7.5, 7.3, 7.4, 7.6, 7.7),
  contact_rate = c(77.2, 72.5, 83.1, 81.5, 85.2, 75.8, 82.3, 79.1, 74.2, 71.8),
  chase_rate = c(24.5, 28.3, 18.9, 22.1, 19.5, 26.8, 23.5, 25.2, 29.1, 30.5),
  avg_exit_velo = c(92.1, 95.2, 91.3, 91.8, 90.5, 93.1, 90.8, 92.5, 93.8, 94.2),
  barrel_rate = c(12.8, 17.2, 10.5, 11.9, 9.8, 13.5, 10.2, 12.1, 14.8, 15.5),
  sweet_spot_pct = c(38.5, 34.2, 42.1, 40.3, 43.8, 36.5, 41.2, 39.1, 35.8, 33.9),
  xwOBA = c(.385, .398, .392, .388, .378, .391, .375, .383, .395, .401)
)

# Standardize metrics
player_names <- players$player
players_scaled <- scale(players[, -1])
rownames(players_scaled) <- player_names

# Calculate distance matrix
dist_matrix <- dist(players_scaled, method = "euclidean")

# Hierarchical clustering
hc <- hclust(dist_matrix, method = "ward.D2")

# Plot dendrogram
plot(hc, main = "Player Similarity Dendrogram",
     xlab = "Player", ylab = "Distance",
     sub = "Based on bat tracking and Statcast metrics")
rect.hclust(hc, k = 4, border = "red")

# Create similarity function
find_similar_players <- function(player_name, n = 3) {
  player_idx <- which(rownames(players_scaled) == player_name)

  if (length(player_idx) == 0) {
    stop("Player not found")
  }

  # Calculate distances to all other players
  distances <- apply(players_scaled, 1, function(x) {
    sqrt(sum((players_scaled[player_idx, ] - x)^2))
  })

  # Sort and get top n similar players (excluding the player themselves)
  similar <- sort(distances)[-1][1:n]

  result <- data.frame(
    similar_player = names(similar),
    similarity_score = 100 - (similar / max(distances) * 100)
  )

  return(result)
}

# Find similar players
cat("\nPlayers most similar to Juan Soto:\n")
print(find_similar_players("Juan Soto"))

cat("\nPlayers most similar to Aaron Judge:\n")
print(find_similar_players("Aaron Judge"))

# PCA for visualization
pca <- prcomp(players_scaled)

# Create PCA plot
pca_data <- data.frame(
  player = rownames(players_scaled),
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2]
)

ggplot(pca_data, aes(x = PC1, y = PC2, label = player)) +
  geom_point(size = 4, color = "steelblue", alpha = 0.7) +
  geom_text(vjust = -1, size = 3) +
  labs(title = "Player Similarity Map (PCA)",
       subtitle = paste0("PC1 explains ", round(summary(pca)$importance[2, 1] * 100, 1),
                        "% variance, PC2 explains ",
                        round(summary(pca)$importance[2, 2] * 100, 1), "%"),
       x = paste0("PC1 (", round(summary(pca)$importance[2, 1] * 100, 1), "%)"),
       y = paste0("PC2 (", round(summary(pca)$importance[2, 2] * 100, 1), "%)")) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

# Interpret principal components
cat("\nPrincipal Component Loadings:\n")
print(pca$rotation[, 1:2])
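For readers working in Python, the similarity lookup above can be approximated as follows. This is a minimal sketch, not a port of the full listing: it reuses only three columns and the first four rows of the illustrative R table, and it standardizes with the sample standard deviation to match what R's scale() does.

```python
import numpy as np
import pandas as pd

# Illustrative values copied from the first four rows of the R sample table
players = pd.DataFrame(
    {
        "bat_speed": [75.2, 74.8, 72.1, 73.5],
        "contact_rate": [77.2, 72.5, 83.1, 81.5],
        "barrel_rate": [12.8, 17.2, 10.5, 11.9],
    },
    index=["Shohei Ohtani", "Aaron Judge", "Juan Soto", "Mookie Betts"],
)

# Match R's scale(): center, then divide by the sample standard deviation
scaled = (players - players.mean()) / players.std(ddof=1)


def find_similar_players(name, n=2):
    """Return the n nearest players by Euclidean distance in z-score space."""
    distances = np.sqrt(((scaled - scaled.loc[name]) ** 2).sum(axis=1))
    nearest = distances.drop(name).nsmallest(n)
    # Rescale so the farthest player in the pool would score 0
    return (100 - nearest / distances.max() * 100).round(1)


print(find_similar_players("Juan Soto"))
```

Because the score is scaled by the largest distance in the pool, it is only comparable within one pool of players; adding or removing a hitter changes every score.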
Python
# Python: Comprehensive Player Evaluation System
import pybaseball as pyb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

class HitterProfile:
    """Comprehensive hitter evaluation using multiple data sources"""

    def __init__(self, player_id, player_name, season='2024'):
        self.player_id = player_id
        self.player_name = player_name
        self.season = season
        self.data = None
        self.profile = {}

    def load_data(self, start_date='2024-04-01', end_date='2024-09-30'):
        """Load Statcast data"""
        self.data = pyb.statcast_batter(start_date, end_date, self.player_id)

        # Simulate bat tracking metrics
        np.random.seed(self.player_id)
        self.data['bat_speed'] = np.where(
            self.data['launch_speed'].notna(),
            70 + (self.data['launch_speed'] - 85) * 0.15 + np.random.normal(0, 1.5, len(self.data)),
            np.nan
        )
        self.data['swing_length'] = np.random.normal(7.3, 0.6, len(self.data))
        self.data['attack_angle'] = np.where(
            self.data['launch_angle'].notna(),
            self.data['launch_angle'] - 8 + np.random.normal(0, 3, len(self.data)),
            np.nan
        )

    def calculate_bat_tracking_metrics(self):
        """Calculate bat tracking metrics"""
        swings = self.data[self.data['bat_speed'].notna()]

        if len(swings) > 0:
            self.profile['avg_bat_speed'] = swings['bat_speed'].mean()
            self.profile['max_bat_speed'] = swings['bat_speed'].max()
            self.profile['avg_swing_length'] = swings['swing_length'].mean()
            self.profile['bat_speed_90th'] = swings['bat_speed'].quantile(0.9)
        else:
            self.profile['avg_bat_speed'] = np.nan
            self.profile['max_bat_speed'] = np.nan
            self.profile['avg_swing_length'] = np.nan
            self.profile['bat_speed_90th'] = np.nan

    def calculate_contact_metrics(self):
        """Calculate contact quality metrics"""
        swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                       'foul_tip', 'swinging_strike_blocked']
        swings = self.data[self.data['description'].isin(swing_events)]

        if len(swings) > 0:
            # Contact rate
            contact = swings[swings['description'].isin(['hit_into_play', 'foul', 'foul_tip'])]
            self.profile['contact_rate'] = len(contact) / len(swings) * 100

            # Whiff rate
            whiffs = swings[swings['description'].isin(['swinging_strike', 'swinging_strike_blocked'])]
            self.profile['whiff_rate'] = len(whiffs) / len(swings) * 100

            # Zone metrics
            in_zone = swings[swings['zone'] <= 9]
            out_zone = swings[swings['zone'] > 9]

            self.profile['z_swing_rate'] = len(in_zone) / len(self.data[self.data['zone'] <= 9]) * 100 if len(self.data[self.data['zone'] <= 9]) > 0 else 0
            self.profile['o_swing_rate'] = len(out_zone) / len(self.data[self.data['zone'] > 9]) * 100 if len(self.data[self.data['zone'] > 9]) > 0 else 0
            self.profile['chase_rate'] = self.profile['o_swing_rate']
        else:
            self.profile['contact_rate'] = np.nan
            self.profile['whiff_rate'] = np.nan
            self.profile['z_swing_rate'] = np.nan
            self.profile['o_swing_rate'] = np.nan
            self.profile['chase_rate'] = np.nan

    def calculate_batted_ball_metrics(self):
        """Calculate batted ball quality metrics"""
        batted_balls = self.data[self.data['launch_speed'].notna()]

        if len(batted_balls) > 0:
            self.profile['avg_exit_velo'] = batted_balls['launch_speed'].mean()
            self.profile['max_exit_velo'] = batted_balls['launch_speed'].max()
            self.profile['avg_launch_angle'] = batted_balls['launch_angle'].mean()

            # Sweet spot % (8-32 degrees)
            sweet_spot = batted_balls[(batted_balls['launch_angle'] >= 8) &
                                     (batted_balls['launch_angle'] <= 32)]
            self.profile['sweet_spot_pct'] = len(sweet_spot) / len(batted_balls) * 100

            # Barrel rate
            barrels = batted_balls[batted_balls['barrel'] == 1]
            self.profile['barrel_rate'] = len(barrels) / len(batted_balls) * 100

            # Hard hit rate (95+ mph)
            hard_hit = batted_balls[batted_balls['launch_speed'] >= 95]
            self.profile['hard_hit_rate'] = len(hard_hit) / len(batted_balls) * 100

            # Expected stats
            self.profile['xBA'] = batted_balls['estimated_ba_using_speedangle'].mean()
            self.profile['xwOBA'] = batted_balls['estimated_woba_using_speedangle'].mean()
        else:
            for metric in ['avg_exit_velo', 'max_exit_velo', 'avg_launch_angle',
                          'sweet_spot_pct', 'barrel_rate', 'hard_hit_rate', 'xBA', 'xwOBA']:
                self.profile[metric] = np.nan

    def calculate_approach_metrics(self):
        """Calculate plate approach metrics"""
        # Swing decisions by count
        ahead_counts = self.data[self.data['balls'] > self.data['strikes']]
        behind_counts = self.data[self.data['balls'] < self.data['strikes']]

        swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                       'foul_tip', 'swinging_strike_blocked']

        if len(ahead_counts) > 0:
            ahead_swings = ahead_counts[ahead_counts['description'].isin(swing_events)]
            self.profile['swing_rate_ahead'] = len(ahead_swings) / len(ahead_counts) * 100
        else:
            self.profile['swing_rate_ahead'] = np.nan

        if len(behind_counts) > 0:
            behind_swings = behind_counts[behind_counts['description'].isin(swing_events)]
            self.profile['swing_rate_behind'] = len(behind_swings) / len(behind_counts) * 100
        else:
            self.profile['swing_rate_behind'] = np.nan

    def generate_full_profile(self):
        """Generate complete player profile"""
        self.calculate_bat_tracking_metrics()
        self.calculate_contact_metrics()
        self.calculate_batted_ball_metrics()
        self.calculate_approach_metrics()

        return pd.Series(self.profile, name=self.player_name)

# Generate profiles for multiple players
players = {
    'Shohei Ohtani': 660271,
    'Aaron Judge': 592450,
    'Juan Soto': 665742,
    'Mookie Betts': 605141,
    'Freddie Freeman': 518692,
    'Bobby Witt Jr': 677951
}

profiles = []

for name, player_id in players.items():
    print(f"Processing {name}...")
    profiler = HitterProfile(player_id, name)
    profiler.load_data()
    profile = profiler.generate_full_profile()
    profiles.append(profile)

# Combine into dataframe
profiles_df = pd.DataFrame(profiles)
profiles_df = profiles_df.round(2)

print("\nComprehensive Hitter Profiles:")
print(profiles_df.T)

# Standardize metrics for comparison
scaler = StandardScaler()
metrics_to_scale = ['avg_bat_speed', 'avg_exit_velo', 'contact_rate',
                   'barrel_rate', 'hard_hit_rate', 'sweet_spot_pct']
profiles_df_scaled = profiles_df.copy()
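# Fill missing values with the column mean so they map to a z-score of 0
# (filling with 0 before scaling would register as an extreme low value)
scaled_input = profiles_df[metrics_to_scale].fillna(profiles_df[metrics_to_scale].mean())
profiles_df_scaled[metrics_to_scale] = scaler.fit_transform(scaled_input)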

# Create radar chart for comparison
def create_radar_chart(data, players, metrics, title):
    """Create radar chart comparing players"""

    angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
    angles += angles[:1]

    fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(projection='polar'))

    colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']

    for idx, player in enumerate(players):
        # Look the row up by player name (the DataFrame index) rather than by position
        values = data.loc[player, metrics].tolist()
        values += values[:1]

        ax.plot(angles, values, 'o-', linewidth=2, label=player, color=colors[idx % len(colors)])
        ax.fill(angles, values, alpha=0.15, color=colors[idx % len(colors)])

    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(metrics, size=10)
    ax.set_ylim(-2, 2)
    ax.set_title(title, size=14, fontweight='bold', pad=20)
    ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    ax.grid(True)

    return fig

# Create radar chart
radar_metrics = ['avg_bat_speed', 'avg_exit_velo', 'contact_rate',
                'barrel_rate', 'hard_hit_rate', 'sweet_spot_pct']
fig = create_radar_chart(profiles_df_scaled, list(players.keys()), radar_metrics,
                        'Comprehensive Hitter Comparison (Standardized)')
plt.tight_layout()
plt.savefig('hitter_radar_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

# Correlation matrix: pairwise correlations between metrics across the profiled hitters
# (with only six players these estimates are noisy and purely illustrative)
fig, ax = plt.subplots(figsize=(12, 10))
correlation_metrics = ['avg_bat_speed', 'avg_exit_velo', 'contact_rate',
                      'whiff_rate', 'chase_rate', 'barrel_rate',
                      'hard_hit_rate', 'sweet_spot_pct', 'xwOBA']
corr_data = profiles_df[correlation_metrics].corr()
sns.heatmap(corr_data, annot=True, fmt='.2f', cmap='RdYlGn', center=0,
           vmin=-1, vmax=1, square=True, linewidths=1,
           cbar_kws={"shrink": 0.8}, ax=ax)
ax.set_title('Metric Correlation Across Elite Hitters', fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.savefig('hitter_metrics_correlation.png', dpi=300, bbox_inches='tight')
plt.show()

25.7 Interactive Visualizations

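Interactive visualizations let analysts and coaches explore multidimensional data in real time: hovering over individual batted balls, filtering by pitch type or exit velocity, and switching between views without rerunning code. This section demonstrates how to build interactive dashboards with Python Plotly and R Shiny.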

Python: Interactive Plotly Dashboard

# Python: Interactive Bat Tracking Dashboard with Plotly
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import pybaseball as pyb
import pandas as pd
import numpy as np

def create_interactive_dashboard(player_id, player_name):
    """Create comprehensive interactive dashboard for a player"""

    # Load data
    data = pyb.statcast_batter('2024-04-01', '2024-09-30', player_id)

    # Simulate bat tracking (illustrative values derived from exit velocity plus noise,
    # not measured bat speed)
    np.random.seed(player_id)
    data['bat_speed'] = np.where(
        data['launch_speed'].notna(),
        70 + (data['launch_speed'] - 85) * 0.15 + np.random.normal(0, 1.5, len(data)),
        np.nan
    )

    # Create figure with subplots
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Exit Velocity vs Launch Angle',
                       'Bat Speed Distribution',
                       'Swing Decisions by Zone',
                       'Performance by Count'),
        specs=[[{'type': 'scatter'}, {'type': 'histogram'}],
               [{'type': 'scatter'}, {'type': 'bar'}]]
    )

    # Plot 1: Exit velocity vs launch angle
    batted_balls = data[data['launch_speed'].notna()]
    fig.add_trace(
        go.Scatter(
            x=batted_balls['launch_angle'],
            y=batted_balls['launch_speed'],
            mode='markers',
            marker=dict(
                size=8,
                color=batted_balls['estimated_woba_using_speedangle'],
                colorscale='Viridis',
                showscale=True,
                colorbar=dict(title="xwOBA", x=0.46)
            ),
            text=batted_balls['events'],
            name='Batted Balls',
            hovertemplate='<b>LA:</b> %{x:.1f}°<br>' +
                         '<b>EV:</b> %{y:.1f} mph<br>' +
                         '<b>Outcome:</b> %{text}<br>' +
                         '<extra></extra>'
        ),
        row=1, col=1
    )

    # Add barrel zone (simplified: 26-30 degrees at 98+ mph; the true barrel
    # window widens as exit velocity rises)
    fig.add_shape(
        type="rect",
        x0=26, y0=98, x1=30, y1=120,
        line=dict(color="Red", width=2, dash="dash"),
        fillcolor="red",
        opacity=0.1,
        row=1, col=1
    )

    # Plot 2: Bat speed distribution
    bat_speeds = data[data['bat_speed'].notna()]['bat_speed']
    fig.add_trace(
        go.Histogram(
            x=bat_speeds,
            nbinsx=30,
            name='Bat Speed',
            marker_color='steelblue',
            hovertemplate='<b>Bat Speed:</b> %{x:.1f} mph<br>' +
                         '<b>Count:</b> %{y}<br>' +
                         '<extra></extra>'
        ),
        row=1, col=2
    )

    # Add average line
    avg_bat_speed = bat_speeds.mean()
    fig.add_vline(
        x=avg_bat_speed,
        line_dash="dash",
        line_color="red",
        annotation_text=f"Avg: {avg_bat_speed:.1f}",
        row=1, col=2
    )

    # Plot 3: Swing decisions by zone
    swing_events = ['hit_into_play', 'foul', 'swinging_strike',
                   'foul_tip', 'swinging_strike_blocked']
    swings = data[data['description'].isin(swing_events)]

    fig.add_trace(
        go.Scatter(
            x=swings['plate_x'],
            y=swings['plate_z'],
            mode='markers',
            marker=dict(
                size=5,
                color='red',
                opacity=0.3
            ),
            name='Swings',
            hovertemplate='<b>X:</b> %{x:.2f}<br>' +
                         '<b>Z:</b> %{y:.2f}<br>' +
                         '<extra></extra>'
        ),
        row=2, col=1
    )

    # Add strike zone
    fig.add_shape(
        type="rect",
        x0=-0.83, y0=1.5, x1=0.83, y1=3.5,
        line=dict(color="Black", width=2),
        fillcolor="lightgray",
        opacity=0.2,
        row=2, col=1
    )

    # Plot 4: Performance by count
    # Group by balls and strikes
    count_groups = data.groupby(['balls', 'strikes']).agg({
        'launch_speed': 'mean',
        'estimated_woba_using_speedangle': 'mean'
    }).reset_index()
    count_groups['count'] = count_groups['balls'].astype(str) + '-' + count_groups['strikes'].astype(str)

    fig.add_trace(
        go.Bar(
            x=count_groups['count'],
            y=count_groups['estimated_woba_using_speedangle'],
            name='xwOBA by Count',
            marker_color='steelblue',
            hovertemplate='<b>Count:</b> %{x}<br>' +
                         '<b>xwOBA:</b> %{y:.3f}<br>' +
                         '<extra></extra>'
        ),
        row=2, col=2
    )

    # Update layout
    fig.update_xaxes(title_text="Launch Angle (degrees)", row=1, col=1)
    fig.update_yaxes(title_text="Exit Velocity (mph)", row=1, col=1)

    fig.update_xaxes(title_text="Bat Speed (mph)", row=1, col=2)
    fig.update_yaxes(title_text="Frequency", row=1, col=2)

    fig.update_xaxes(title_text="Horizontal Location (ft)", row=2, col=1)
    fig.update_yaxes(title_text="Vertical Location (ft)", row=2, col=1)

    fig.update_xaxes(title_text="Count", row=2, col=2)
    fig.update_yaxes(title_text="xwOBA", row=2, col=2)

    fig.update_layout(
        title_text=f"{player_name} - Advanced Tracking Dashboard (2024)",
        title_font_size=20,
        showlegend=False,
        height=900,
        width=1400
    )

    return fig

# Create dashboard for Shohei Ohtani
dashboard = create_interactive_dashboard(660271, "Shohei Ohtani")
dashboard.write_html("ohtani_dashboard.html")
dashboard.show()

print("Interactive dashboard saved to 'ohtani_dashboard.html'")
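
The dashed rectangle in the first panel covers only the narrowest part of the barrel zone. As a hedged sketch, the function below uses a linear approximation of the barrel definition in which the launch-angle window widens as exit velocity rises. The name `is_barrel` and the exact cutoffs are assumptions for illustration, not Statcast's official implementation; when the `barrel` column is available in your data, prefer it.

```python
def is_barrel(launch_speed, launch_angle):
    """Approximate barrel check (illustrative, not the official Statcast code).

    At 98 mph the window is 26-30 degrees. Each additional mph widens it,
    and the upper bound is capped at 50 degrees.
    """
    if launch_speed is None or launch_angle is None:
        return False
    return (
        launch_speed >= 98
        and launch_angle <= 50
        and launch_speed * 1.5 - launch_angle >= 117  # upper edge of the window
        and launch_speed + launch_angle >= 124        # lower edge of the window
    )


if __name__ == "__main__":
    # A few hand-picked (exit velocity, launch angle) pairs
    for ev, la in [(98, 28), (98, 31), (100, 33), (116, 8)]:
        print(ev, la, is_barrel(ev, la))
```

Plotting the points that pass this check in a different color is one way to show the full wedge-shaped zone instead of a fixed rectangle.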

R Shiny: Interactive Application

# R: Shiny App for Bat Tracking Analysis
# Save this as app.R and run with: shiny::runApp()

library(shiny)
library(dplyr)
library(ggplot2)
library(plotly)

# Note: This is a template. In production, connect to real Statcast API

# UI
ui <- fluidPage(
  titlePanel("Advanced Bat Tracking Analysis"),

  sidebarLayout(
    sidebarPanel(
      selectInput("player", "Select Player:",
                 choices = c("Shohei Ohtani", "Aaron Judge", "Juan Soto",
                           "Mookie Betts", "Freddie Freeman")),

      dateRangeInput("dates", "Date Range:",
                    start = "2024-04-01",
                    end = "2024-09-30"),

      sliderInput("min_ev", "Minimum Exit Velocity:",
                 min = 50, max = 120, value = 80),

      checkboxGroupInput("pitch_types", "Pitch Types:",
                        choices = c("Fastball" = "FF",
                                  "Slider" = "SL",
                                  "Changeup" = "CH",
                                  "Curveball" = "CU"),
                        selected = c("FF", "SL", "CH", "CU")),

      actionButton("refresh", "Refresh Data",
                  class = "btn-primary")
    ),

    mainPanel(
      tabsetPanel(
        tabPanel("Bat Speed Analysis",
                plotlyOutput("bat_speed_plot", height = "500px"),
                verbatimTextOutput("bat_speed_stats")),

        tabPanel("Launch Optimization",
                plotlyOutput("launch_plot", height = "500px"),
                verbatimTextOutput("launch_stats")),

        tabPanel("Swing Decisions",
                plotlyOutput("swing_decisions_plot", height = "500px"),
                verbatimTextOutput("swing_stats")),

        tabPanel("Summary Metrics",
                tableOutput("summary_table"))
      )
    )
  )
)

# Server
server <- function(input, output) {

  # Reactive data loading
  player_data <- reactive({
    # In production, load from Statcast API
    # This is simulated data
    set.seed(123)
    n <- 500

    data.frame(
      bat_speed = rnorm(n, 73, 3),
      exit_velo = rnorm(n, 91, 8),
      launch_angle = rnorm(n, 12, 15),
      pitch_type = sample(c("FF", "SL", "CH", "CU"), n, replace = TRUE),
      in_zone = sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.5, 0.5)),
      swing = sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.6, 0.4))
    ) %>%
      filter(exit_velo >= input$min_ev,
            pitch_type %in% input$pitch_types)
  })

  # Bat speed plot
  output$bat_speed_plot <- renderPlotly({
    data <- player_data()

    p <- ggplot(data, aes(x = bat_speed, y = exit_velo, color = pitch_type)) +
      geom_point(alpha = 0.5, size = 2) +
      geom_smooth(method = "lm", se = FALSE) +
      labs(title = paste(input$player, "- Bat Speed vs Exit Velocity"),
           x = "Bat Speed (mph)",
           y = "Exit Velocity (mph)",
           color = "Pitch Type") +
      theme_minimal()

    ggplotly(p)
  })

  # Bat speed stats
  output$bat_speed_stats <- renderPrint({
    data <- player_data()

    cat("Bat Speed Statistics\n")
    cat("====================\n")
    cat(sprintf("Mean: %.2f mph\n", mean(data$bat_speed)))
    cat(sprintf("Median: %.2f mph\n", median(data$bat_speed)))
    cat(sprintf("90th Percentile: %.2f mph\n", quantile(data$bat_speed, 0.9)))
    cat(sprintf("Max: %.2f mph\n", max(data$bat_speed)))
  })

  # Launch optimization plot
  output$launch_plot <- renderPlotly({
    data <- player_data()

    p <- ggplot(data, aes(x = launch_angle, y = exit_velo)) +
      geom_point(alpha = 0.4, color = "steelblue") +
      geom_density_2d(color = "red") +
      geom_vline(xintercept = c(8, 32), linetype = "dashed", color = "green") +
      annotate("rect", xmin = 8, xmax = 32, ymin = -Inf, ymax = Inf,
              alpha = 0.1, fill = "green") +
      labs(title = paste(input$player, "- Launch Angle Optimization"),
           x = "Launch Angle (degrees)",
           y = "Exit Velocity (mph)") +
      theme_minimal()

    ggplotly(p)
  })

  # Launch stats
  output$launch_stats <- renderPrint({
    data <- player_data()
    sweet_spot <- sum(data$launch_angle >= 8 & data$launch_angle <= 32)

    cat("Launch Angle Statistics\n")
    cat("=======================\n")
    cat(sprintf("Mean LA: %.2f degrees\n", mean(data$launch_angle)))
    cat(sprintf("Sweet Spot %%: %.1f%%\n", sweet_spot / nrow(data) * 100))
    cat(sprintf("Avg EV in Sweet Spot: %.2f mph\n",
               mean(data$exit_velo[data$launch_angle >= 8 & data$launch_angle <= 32])))
  })

  # Swing decisions plot
  output$swing_decisions_plot <- renderPlotly({
    data <- player_data()

    decision_summary <- data %>%
      group_by(in_zone, swing) %>%
      summarise(count = n(), .groups = "drop") %>%
      mutate(zone = ifelse(in_zone, "In Zone", "Out of Zone"),
            decision = ifelse(swing, "Swing", "Take"))

    p <- ggplot(decision_summary, aes(x = zone, y = count, fill = decision)) +
      geom_bar(stat = "identity", position = "fill") +
      scale_y_continuous(labels = scales::percent) +
      labs(title = paste(input$player, "- Swing Decisions by Zone"),
           x = "Zone",
           y = "Percentage",
           fill = "Decision") +
      theme_minimal()

    ggplotly(p)
  })

  # Swing stats
  output$swing_stats <- renderPrint({
    data <- player_data()

    in_zone_swing <- mean(data$swing[data$in_zone])
    out_zone_swing <- mean(data$swing[!data$in_zone])

    cat("Swing Decision Statistics\n")
    cat("==========================\n")
    cat(sprintf("Z-Swing%%: %.1f%%\n", in_zone_swing * 100))
    cat(sprintf("O-Swing%% (Chase): %.1f%%\n", out_zone_swing * 100))
    cat(sprintf("Overall Swing%%: %.1f%%\n", mean(data$swing) * 100))
  })

  # Summary table
  output$summary_table <- renderTable({
    data <- player_data()

    data.frame(
      Metric = c("Avg Bat Speed", "Avg Exit Velocity", "Avg Launch Angle",
                "Sweet Spot %", "Chase Rate", "Sample Size"),
      Value = c(
        sprintf("%.2f mph", mean(data$bat_speed)),
        sprintf("%.2f mph", mean(data$exit_velo)),
        sprintf("%.2f degrees", mean(data$launch_angle)),
        sprintf("%.1f%%", sum(data$launch_angle >= 8 & data$launch_angle <= 32) / nrow(data) * 100),
        sprintf("%.1f%%", mean(data$swing[!data$in_zone]) * 100),
        nrow(data)
      )
    )
  }, striped = TRUE, hover = TRUE)
}

# Run the app. shinyApp() must be the last expression in app.R so that
# shiny::runApp() receives the app object.
shinyApp(ui = ui, server = server)

25.8 Exercises

Easy Exercises

Exercise 1: Basic Bat Speed Analysis (Easy)

Using the pybaseball package, load Statcast data for your favorite player from the 2024 season. Calculate their average exit velocity on pitches in the top third of the strike zone vs. the bottom third. What does this tell you about their bat path?

<details>
<summary>Hint</summary>
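Filter the data by plate_z values. The strike zone typically spans about 1.5 to 3.5 feet above the ground, so the top third is roughly plate_z between 2.8 and 3.5 and the bottom third roughly plate_z between 1.5 and 2.2. For batter-specific limits, use the sz_top and sz_bot columns instead of fixed values.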
</details>
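
As a starting point, the split described in the hint can be sketched without any dependencies. The helper names `zone_third` and `mean_exit_velo_by_third` are hypothetical, the default zone limits are the rough 1.5 and 3.5 feet from the hint, and the sample rows are made up for illustration.

```python
def zone_third(plate_z, sz_bot=1.5, sz_top=3.5):
    """Return 'bottom', 'middle', or 'top'; None if the pitch is above or below the zone."""
    if plate_z < sz_bot or plate_z > sz_top:
        return None
    third = (sz_top - sz_bot) / 3
    if plate_z < sz_bot + third:
        return "bottom"
    if plate_z < sz_bot + 2 * third:
        return "middle"
    return "top"


def mean_exit_velo_by_third(pitches):
    """Average launch_speed per vertical third, skipping pitches not put in play."""
    totals = {}
    for pitch in pitches:
        if pitch["launch_speed"] is None:
            continue  # no batted ball on this pitch
        third = zone_third(pitch["plate_z"])
        if third is None:
            continue  # outside the zone vertically
        speed_sum, n = totals.get(third, (0.0, 0))
        totals[third] = (speed_sum + pitch["launch_speed"], n + 1)
    return {third: speed_sum / n for third, (speed_sum, n) in totals.items()}


# Made-up rows, for illustration only
sample = [
    {"plate_z": 3.0, "launch_speed": 100.0},
    {"plate_z": 3.2, "launch_speed": 90.0},
    {"plate_z": 1.8, "launch_speed": 95.0},
    {"plate_z": 2.5, "launch_speed": None},
    {"plate_z": 4.1, "launch_speed": 88.0},
]
by_third = mean_exit_velo_by_third(sample)
print(by_third)
```

With real Statcast data the same logic maps onto a pandas filter on `plate_z`, and passing each pitch's `sz_bot` and `sz_top` values replaces the fixed defaults.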


Exercise 2: Chase Rate Calculation (Easy)

Write a function in R that takes Statcast data and calculates:


  • O-Swing% (chase rate)

  • Z-Swing% (zone swing rate)

  • Overall swing rate

Test it on data from Aaron Judge and Juan Soto. Who has better plate discipline?
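
For reference when checking an R solution, the same three rates can be sketched in plain Python. This assumes the Statcast `zone` column, where values 1-9 are inside the strike zone, and reuses the swing descriptions from this chapter's examples. The function name `swing_rates` and the sample pitches are hypothetical.

```python
# Swing descriptions as used in this chapter's examples
SWING_EVENTS = {"hit_into_play", "foul", "swinging_strike",
                "foul_tip", "swinging_strike_blocked"}


def swing_rates(pitches):
    """Return Z-Swing%, O-Swing%, and overall Swing% for a list of pitch dicts.

    Each pitch needs 'zone' (Statcast zone number) and 'description'.
    Zones 1-9 are treated as in the strike zone, anything else as outside.
    """
    swings = {"in": 0, "out": 0}
    seen = {"in": 0, "out": 0}
    for pitch in pitches:
        if pitch["zone"] is None:
            continue  # skip pitches with no tracked location
        where = "in" if 1 <= pitch["zone"] <= 9 else "out"
        seen[where] += 1
        if pitch["description"] in SWING_EVENTS:
            swings[where] += 1

    def pct(numerator, denominator):
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "z_swing_pct": pct(swings["in"], seen["in"]),
        "o_swing_pct": pct(swings["out"], seen["out"]),
        "swing_pct": pct(sum(swings.values()), sum(seen.values())),
    }


# Made-up pitches, for illustration only
sample = [
    {"zone": 5, "description": "hit_into_play"},
    {"zone": 2, "description": "foul"},
    {"zone": 8, "description": "called_strike"},
    {"zone": 4, "description": "swinging_strike"},
    {"zone": 11, "description": "ball"},
    {"zone": 13, "description": "swinging_strike"},
    {"zone": 14, "description": "ball"},
    {"zone": 12, "description": "ball"},
]
rates = swing_rates(sample)
print(rates)
```

A lower O-Swing% paired with a healthy Z-Swing% is the usual signature of good plate discipline, so compare both numbers rather than the overall swing rate alone.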


Exercise 3: Sweet Spot Percentage (Easy)

Calculate the sweet spot percentage (launch angles between 8-32 degrees) for the following players using 2024 data:


  • Mookie Betts

  • Ronald Acuña Jr.

  • Freddie Freeman

Create a bar chart comparing their sweet spot percentages.


Medium Exercises

Exercise 4: Two-Strike Adjustment Analysis (Medium)

Analyze how elite hitters modify their approach with two strikes. For at least 3 players:

  1. Calculate their whiff rate with 0-1 strikes vs. 2 strikes
  2. Compare their average exit velocity in both situations
  3. Analyze their chase rate in both situations
  4. Create visualizations showing these adjustments

Which player makes the best two-strike adjustments?


Exercise 5: Optimal Attack Angle Model (Medium)

Build a simple linear model that predicts exit velocity based on:


  • Bat speed

  • Attack angle

  • Pitch location (height)

  • Pitch velocity

Use simulated or real data. Interpret the coefficients. What attack angle maximizes exit velocity for pitches at different heights?


Exercise 6: Pitch Type Performance Dashboard (Medium)

Create an interactive visualization (using Plotly or ggplot2) that shows a player's performance against different pitch types. Include:

  1. Exit velocity by pitch type
  2. Whiff rate by pitch type
  3. Launch angle distribution by pitch type
  4. xwOBA by pitch type

Make it filterable by count situation (ahead/behind/even).


Hard Exercises

Exercise 7: Expected Run Value Model (Hard)

Build a comprehensive expected run value model that:

  1. Assigns run values to different count states
  2. Calculates pitch values based on count changes
  3. Incorporates batted ball outcomes with Statcast expected stats
  4. Aggregates to player-level total run contribution

Compare your model's player rankings to traditional stats like wOBA or wRC+. How well do they correlate?


Exercise 8: Player Clustering Analysis (Hard)

Perform a clustering analysis on MLB hitters using the following metrics:


  • Bat speed

  • Swing length

  • Contact rate

  • Chase rate

  • Average exit velocity

  • Launch angle

  • Barrel rate

Tasks:


  1. Determine the optimal number of clusters using the elbow method

  2. Perform k-means clustering

  3. Interpret what each cluster represents (e.g., "power hitters," "contact hitters," etc.)

  4. Create visualizations showing the clusters

  5. Identify which players are most similar to each other


Exercise 9: Swing Decision Machine Learning Model (Hard)

Build a machine learning model that predicts whether a hitter will swing based on:


  • Count (balls, strikes)

  • Pitch location (plate_x, plate_z)

  • Pitch type

  • Pitch velocity

  • Previous pitch outcome

  • Player characteristics (bat speed, typical chase rate)

Tasks:


  1. Collect and prepare training data
  2. Engineer relevant features
  3. Train multiple models (logistic regression, random forest, XGBoost)
  4. Evaluate model performance
  5. Interpret feature importance
  6. Identify which hitters are most predictable and which are most unpredictable
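
A baseline for tasks 2 through 4 could look like the sketch below, which fits a logistic regression with scikit-learn. The pitches are simulated: the assumed zone center at (0, 2.5) feet, the distance feature, and the coefficients used to generate swing labels are all illustrative choices, not estimates from real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 4000

# Simulated pitches (NOT real data).
plate_x = rng.normal(0.0, 0.9, n)   # feet from the middle of the plate
plate_z = rng.normal(2.5, 0.9, n)   # feet above the ground
strikes = rng.integers(0, 3, n)     # 0, 1, or 2 strikes

# Engineered feature: distance from an assumed zone center at (0, 2.5).
dist = np.hypot(plate_x, plate_z - 2.5)

# Assumed "true" swing behavior, used only to generate labels:
# swings become less likely far from the center and more likely with strikes.
logit = 1.5 - 2.0 * dist + 0.6 * strikes
swing = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([dist, strikes])
X_train, X_test, y_train, y_test = train_test_split(
    X, swing, test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"Held-out AUC: {auc:.3f}")
print("Coefficients (dist, strikes):", model.coef_[0])
```

Starting with a simple, interpretable model gives a reference AUC that the random forest and XGBoost models need to beat to justify their added complexity.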


Exercise 10: Comprehensive Hitter Evaluation System (Hard)

Create a complete hitter evaluation system that:

  1. Combines bat tracking, Statcast, and traditional statistics
  2. Weights each component appropriately
  3. Produces a single overall rating (0-100 scale)
  4. Breaks down the rating into sub-components:
     • Bat speed & strength
     • Contact ability
     • Plate discipline
     • Quality of contact
     • Launch optimization
  5. Validates your system by comparing to actual performance (wOBA, wRC+)
  6. Creates a comprehensive report for any player that includes:
     • Overall rating and percentile rank
     • Radar chart of skills
     • Comparisons to similar players
     • Strengths and weaknesses
     • Actionable development recommendations

Apply your system to at least 50 qualified hitters from 2024 and present your top 10.
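
One simple way to get a 0-100 scale is to convert each metric to a percentile rank and take a weighted average, as sketched below. The four hitters, the three metrics, the weights, and the function name overall_rating are hypothetical; choosing and justifying the real components and weights is the point of the exercise.

```python
import pandas as pd

# Made-up hitters, for illustration only.
hitters = pd.DataFrame(
    {
        "bat_speed": [75.0, 72.0, 70.0, 68.0],
        "contact_rate": [0.80, 0.78, 0.76, 0.70],
        "chase_rate": [0.20, 0.25, 0.30, 0.35],
    },
    index=["Hitter A", "Hitter B", "Hitter C", "Hitter D"],
)

# Hypothetical weights (summing to 1) and the metrics where a lower value is better.
WEIGHTS = {"bat_speed": 0.4, "contact_rate": 0.3, "chase_rate": 0.3}
LOWER_IS_BETTER = {"chase_rate"}


def overall_rating(df: pd.DataFrame) -> pd.Series:
    """Weighted average of per-metric percentile ranks, on a 0-100 scale."""
    scores = pd.DataFrame(index=df.index)
    for col in WEIGHTS:
        # Rank so that the best value in each metric receives the top percentile.
        ascending = col not in LOWER_IS_BETTER
        scores[col] = df[col].rank(pct=True, ascending=ascending) * 100
    weighted = sum(scores[col] * weight for col, weight in WEIGHTS.items())
    return weighted.rename("rating")


print(overall_rating(hitters).sort_values(ascending=False))
```

Percentile ranks are robust to outliers but discard the size of the gaps between players; z-scores are a common alternative when those gaps matter.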


Summary

This chapter has explored the cutting edge of baseball analytics through advanced Statcast and bat tracking technology. We've covered:

  • The technical foundations of bat tracking systems
  • Key metrics including bat speed, swing length, and squared-up rate
  • Swing decision analysis through chase rate, zone contact, and whiff rate
  • Expected run value models for quantifying offensive contribution
  • Attack angle and launch optimization for contact quality
  • Comprehensive player evaluation frameworks
  • Interactive visualization techniques for multidimensional data

As tracking technology continues to evolve, the ability to combine multiple data sources into holistic player evaluations will become increasingly important. The skills developed in this chapter (data integration, statistical modeling, and interactive visualization) form the foundation for modern baseball analysis.

The exercises provide opportunities to apply these concepts, from basic calculations to sophisticated machine learning models. By working through them, you'll develop the practical skills needed to perform advanced analytics in professional baseball operations.


Further Reading:

  • Baseball Savant Statcast Leaderboards: https://baseballsavant.mlb.com
  • Driveline Baseball Research: https://www.drivelinebaseball.com/research/
  • FanGraphs Advanced Statistics Library: https://library.fangraphs.com
  • MLB's Official Statcast Documentation
  • Academic papers on biomechanics and hitting optimization

Next Chapter Preview:

Chapter 26 will explore pitch design and arsenal optimization, using Statcast tracking data to help pitchers develop new pitches and optimize their existing repertoire for maximum effectiveness.

Chapter Summary

In this chapter, you learned about advanced Statcast and bat tracking. Key topics covered:

  • Introduction to Advanced Tracking Technology
  • Bat Tracking Metrics
  • Swing Decisions
  • Expected Run Values
  • Attack Angle & Launch Optimization
  • Combining Tracking Data for Player Evaluation