6.1.1 What Statcast Measures
In 2015, Major League Baseball changed how the game is analyzed. Statcast, a ball- and player-tracking system, was installed in all 30 MLB ballparks, making it possible to measure how each play unfolds rather than only how it ends. The original system combined two technologies: TrackMan Doppler radar (for ball tracking) and ChyronHego cameras (for player tracking). Since 2020, Statcast has run on Hawk-Eye optical cameras.
For hitting analysis, Statcast captures an unprecedented level of detail on every batted ball:
- Exit Velocity: The speed of the ball as it comes off the bat (measured in mph)
- Launch Angle: The vertical angle at which the ball leaves the bat (measured in degrees)
- Hit Distance: The projected distance a batted ball would travel
- Hit Direction: The horizontal spray angle of the batted ball
- Hang Time: How long the ball remains in the air
For player movement, Statcast tracks:
- Sprint Speed: A runner's maximum speed on competitive plays (measured in feet/second)
- Home to First Time: Time elapsed from contact to reaching first base
- Baserunning Routes: Efficiency and path optimization
This wealth of data has enabled the creation of expected statistics (xStats) - what a player's outcomes should be based purely on contact quality, independent of defensive positioning, park factors, or luck.
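As a rough illustration of the idea behind expected statistics, the sketch below averages per-ball hit probabilities instead of counting actual hits. The probabilities are made-up values standing in for Statcast's modeled hit probability (exposed in pitch-level data as `estimated_ba_using_speedangle`, the column used in the code later in this chapter). The function name `expected_batting_average` is hypothetical, and counting strikeouts as zero-probability at-bats is a simplifying assumption rather than Statcast's exact method.

```python
def expected_batting_average(hit_probabilities, strikeouts=0):
    """Average modeled hit probability over at-bats (batted balls + strikeouts)."""
    at_bats = len(hit_probabilities) + strikeouts
    if at_bats == 0:
        return None
    # Each batted ball contributes its hit probability; strikeouts contribute 0
    return sum(hit_probabilities) / at_bats


# Made-up hit probabilities for five batted balls
sample_probabilities = [0.90, 0.10, 0.55, 0.05, 0.40]
xba = expected_batting_average(sample_probabilities, strikeouts=3)
print(f"Expected batting average: {xba:.3f}")
```

Because the inputs are probabilities assigned to contact quality, a scorched line drive caught at the wall still counts for most of a hit, which is why xStats can diverge from actual results.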
6.1.2 Key Hitting Metrics Overview
Here's a comprehensive table of the most important Statcast hitting metrics:
| Metric | Definition | MLB Average | Elite Threshold | What It Reveals |
|---|---|---|---|---|
| Exit Velocity (EV) | Speed off bat | ~89 mph | 93+ mph | Raw power and contact quality |
| Max Exit Velocity | Hardest contact | ~110 mph | 115+ mph | Peak power capability |
| Hard-Hit Rate | % of batted balls ≥95 mph | ~35% | 45%+ | Consistency of hard contact |
| Launch Angle | Vertical angle | ~10-15° | Varies by approach | Ball flight trajectory |
| Sweet Spot % | % hit at 8-32° | ~33% | 40%+ | Optimal contact rate |
| Barrel % | % of "perfect" contact | ~6-8% | 12%+ | Elite contact quality |
| xBA | Expected batting avg | .245 | .280+ | True contact quality |
| xwOBA | Expected wOBA | .320 | .360+ | Comprehensive hitting value |
| Sprint Speed | Max speed (ft/s) | ~27 ft/s | 30+ ft/s | Athleticism and baserunning |
These metrics form the foundation of modern hitting analysis. Unlike traditional statistics that only tell us what happened, Statcast metrics tell us how it happened and often what should have happened.
6.2.1 Understanding Exit Velocity
Exit Velocity (EV) is perhaps the single most important Statcast metric. It measures the speed of the baseball immediately after contact with the bat, before air resistance affects the ball's flight. Think of it as the "power" measurement - harder hit balls are more likely to become hits and extra-base hits.
The physics is straightforward: exit velocity is determined by three factors:
- Bat speed: How fast the bat is moving at contact
- Pitch velocity: The incoming speed of the pitch (energy transfer)
- Contact quality: Where on the bat and ball the collision occurs
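One widely cited simplification of the bat-ball collision, associated with physicist Alan Nathan's work, ties these three factors together as EV ≈ q × (pitch speed) + (1 + q) × (bat speed), where q is a collision efficiency of roughly 0.2 for contact near the sweet spot of a wood bat. The sketch below applies that relation. The function name, the sample speeds, and the "mishit" value of q are illustrative assumptions, not Statcast outputs.

```python
def estimate_exit_velocity(bat_speed_mph, pitch_speed_mph, q=0.2):
    """Approximate exit velocity from bat speed, pitch speed at the plate,
    and collision efficiency q (contact quality)."""
    return q * pitch_speed_mph + (1 + q) * bat_speed_mph


# Same swing and pitch, two assumed contact qualities
for label, q in [("sweet spot", 0.20), ("mishit", 0.10)]:
    ev = estimate_exit_velocity(bat_speed_mph=72, pitch_speed_mph=93, q=q)
    print(f"{label:>10}: {ev:.1f} mph")
```

Under this model each extra mph of bat speed adds about 1.2 mph of exit velocity, while each extra mph of pitch speed adds only about 0.2 mph, which is why bat speed and contact quality dominate.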
The relationship between exit velocity and outcomes is remarkably strong:
- < 85 mph: Low probability of becoming a hit (~.100 BA)
- 85-95 mph: Moderate hit probability (~.250 BA)
- 95-105 mph: High hit probability (~.500 BA)
- 105+ mph: Very high hit probability (~.700+ BA)
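The bands above can be checked against any set of batted balls by bucketing exit velocity and computing the share of hits in each bucket. The following is a minimal sketch on a made-up sample. The helper names `ev_bucket` and `hit_rate_by_bucket` are hypothetical, and the event labels follow the Statcast-style strings used elsewhere in this chapter.

```python
from collections import defaultdict

HIT_EVENTS = {"single", "double", "triple", "home_run"}


def ev_bucket(exit_velocity):
    """Map an exit velocity (mph) to one of the four bands listed above."""
    if exit_velocity < 85:
        return "<85"
    if exit_velocity < 95:
        return "85-95"
    if exit_velocity < 105:
        return "95-105"
    return "105+"


def hit_rate_by_bucket(batted_balls):
    """Return {bucket: hits / batted balls} for (exit_velocity, event) pairs."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [hits, batted balls]
    for exit_velocity, event in batted_balls:
        bucket = ev_bucket(exit_velocity)
        counts[bucket][1] += 1
        if event in HIT_EVENTS:
            counts[bucket][0] += 1
    return {bucket: hits / total for bucket, (hits, total) in counts.items()}


# Made-up sample: (exit velocity in mph, event label)
sample = [
    (78.2, "field_out"), (83.0, "field_out"), (84.1, "single"),
    (88.5, "field_out"), (91.5, "single"), (93.1, "field_out"),
    (99.4, "double"), (101.7, "field_out"),
    (108.3, "home_run"), (110.9, "single"),
]
rates = hit_rate_by_bucket(sample)
for bucket in ["<85", "85-95", "95-105", "105+"]:
    print(f"{bucket:>7} mph: {rates[bucket]:.3f}")
```

With ten batted balls the rates are far too noisy to compare with the league-wide figures; the point is only the bucketing pattern, which scales directly to a full season of data.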
The MLB average exit velocity hovers around 88-89 mph on all batted balls. Elite power hitters consistently average 92-94 mph, with the very best reaching 95+ mph. As of 2024, players like Aaron Judge, Giancarlo Stanton, and Kyle Schwarber regularly lead in average exit velocity.
6.2.2 Exit Velocity Metrics with Code
Let's explore how to calculate and analyze various exit velocity metrics using both R and Python.
Python Implementation
import pandas as pd
from pybaseball import statcast_batter, playerid_lookup
import numpy as np
from datetime import datetime

# Look up the player's MLBAM ID (Aaron Judge example)
judge_lookup = playerid_lookup('judge', 'aaron')
player_id = int(judge_lookup['key_mlbam'].iloc[0])  # 592450 for Aaron Judge

# Fetch Statcast data for 2024 season
start_date = '2024-04-01'
end_date = '2024-10-01'
statcast_data = statcast_batter(start_date, end_date, player_id)

# Calculate comprehensive exit velocity metrics
def calculate_ev_metrics(df):
    """
    Calculate comprehensive exit velocity metrics from Statcast data.

    Parameters:
        df: DataFrame with Statcast data including 'launch_speed' column
    Returns:
        Dictionary of exit velocity metrics, or None if no batted balls
    """
    # Keep only rows with a measured exit velocity (exclude nulls)
    batted_balls = df[df['launch_speed'].notna()].copy()
    if len(batted_balls) == 0:
        return None

    metrics = {
        'avg_ev': batted_balls['launch_speed'].mean(),
        'max_ev': batted_balls['launch_speed'].max(),
        'min_ev': batted_balls['launch_speed'].min(),
        'ev_90th_percentile': batted_balls['launch_speed'].quantile(0.90),
        'ev_50th_percentile': batted_balls['launch_speed'].median(),
        'hard_hit_count': (batted_balls['launch_speed'] >= 95).sum(),
        'hard_hit_pct': (batted_balls['launch_speed'] >= 95).mean() * 100,
        'soft_contact_pct': (batted_balls['launch_speed'] < 85).mean() * 100,
        'medium_contact_pct': ((batted_balls['launch_speed'] >= 85) &
                               (batted_balls['launch_speed'] < 95)).mean() * 100,
        'batted_balls': len(batted_balls)
    }
    return metrics

# Calculate metrics
ev_metrics = calculate_ev_metrics(statcast_data)

# Display results
print("Aaron Judge 2024 Exit Velocity Profile")
print("=" * 50)
print(f"Average Exit Velocity: {ev_metrics['avg_ev']:.1f} mph")
print(f"Maximum Exit Velocity: {ev_metrics['max_ev']:.1f} mph")
print(f"90th Percentile EV: {ev_metrics['ev_90th_percentile']:.1f} mph")
print(f"Median Exit Velocity: {ev_metrics['ev_50th_percentile']:.1f} mph")
print(f"\nContact Distribution:")
print(f"Hard Hit Rate (≥95 mph): {ev_metrics['hard_hit_pct']:.1f}%")
print(f"Medium Contact (85-94 mph): {ev_metrics['medium_contact_pct']:.1f}%")
print(f"Soft Contact (<85 mph): {ev_metrics['soft_contact_pct']:.1f}%")
print(f"\nTotal Batted Balls: {ev_metrics['batted_balls']}")
R Implementation
library(baseballr)
library(dplyr)
library(tidyr)

# Fetch Statcast data for Aaron Judge (2024)
judge_data <- statcast_search_batters(
  start_date = "2024-04-01",
  end_date = "2024-10-01",
  batterid = 592450  # Aaron Judge
)

# Calculate comprehensive exit velocity metrics
calculate_ev_metrics <- function(df) {
  # Filter for batted balls only
  batted_balls <- df %>%
    filter(!is.na(launch_speed))

  if (nrow(batted_balls) == 0) {
    return(NULL)
  }

  metrics <- list(
    avg_ev = mean(batted_balls$launch_speed, na.rm = TRUE),
    max_ev = max(batted_balls$launch_speed, na.rm = TRUE),
    min_ev = min(batted_balls$launch_speed, na.rm = TRUE),
    ev_90th = quantile(batted_balls$launch_speed, 0.90, na.rm = TRUE),
    ev_50th = median(batted_balls$launch_speed, na.rm = TRUE),
    hard_hit_count = sum(batted_balls$launch_speed >= 95, na.rm = TRUE),
    hard_hit_pct = mean(batted_balls$launch_speed >= 95, na.rm = TRUE) * 100,
    soft_contact_pct = mean(batted_balls$launch_speed < 85, na.rm = TRUE) * 100,
    medium_contact_pct = mean(batted_balls$launch_speed >= 85 &
                                batted_balls$launch_speed < 95, na.rm = TRUE) * 100,
    batted_balls = nrow(batted_balls)
  )
  return(metrics)
}

# Calculate metrics
ev_metrics <- calculate_ev_metrics(judge_data)

# Display results
cat("Aaron Judge 2024 Exit Velocity Profile\n")
cat(strrep("=", 50), "\n")
cat(sprintf("Average Exit Velocity: %.1f mph\n", ev_metrics$avg_ev))
cat(sprintf("Maximum Exit Velocity: %.1f mph\n", ev_metrics$max_ev))
cat(sprintf("90th Percentile EV: %.1f mph\n", ev_metrics$ev_90th))
cat(sprintf("Median Exit Velocity: %.1f mph\n", ev_metrics$ev_50th))
cat("\nContact Distribution:\n")
cat(sprintf("Hard Hit Rate (≥95 mph): %.1f%%\n", ev_metrics$hard_hit_pct))
cat(sprintf("Medium Contact (85-94 mph): %.1f%%\n", ev_metrics$medium_contact_pct))
cat(sprintf("Soft Contact (<85 mph): %.1f%%\n", ev_metrics$soft_contact_pct))
cat(sprintf("\nTotal Batted Balls: %d\n", ev_metrics$batted_balls))
6.2.3 Hard-Hit Rate: The 95 mph Threshold
Hard-Hit Rate is defined as the percentage of batted balls with an exit velocity of 95 mph or greater. This threshold isn't arbitrary - research shows that 95 mph represents a meaningful breakpoint where hit probability increases dramatically.
Why 95 mph matters:
- Balls hit 95+ mph have a batting average around .500
- They're more likely to find gaps and fall for hits
- They're harder for fielders to react to and convert into outs
- They correlate strongly with power output (HR, XBH)
The MLB average hard-hit rate is approximately 35-37%. Elite hitters consistently post hard-hit rates of 45%+, with the best in baseball reaching 50%+.
Here's code to analyze hard-hit rate trends:
# Analyze hard-hit rate by outcome
def analyze_hard_hit_outcomes(df):
    """Analyze outcomes of hard-hit balls vs. other contact."""
    batted_balls = df[df['launch_speed'].notna()].copy()
    batted_balls['is_hard_hit'] = batted_balls['launch_speed'] >= 95

    # Group by hard-hit status
    outcomes = batted_balls.groupby('is_hard_hit').agg({
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean(),
        'estimated_ba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'launch_angle': 'mean'
    }).round(3)
    outcomes.columns = ['Hit_Rate', 'xBA', 'Avg_EV', 'Avg_LA']
    # Rename by key so labels stay correct even if one group is missing
    outcomes = outcomes.rename(index={False: 'Not Hard Hit (<95)',
                                      True: 'Hard Hit (95+)'})
    return outcomes

hard_hit_analysis = analyze_hard_hit_outcomes(statcast_data)
print("\nHard-Hit vs. Non-Hard-Hit Outcomes:")
print(hard_hit_analysis)
# R version: Analyze hard-hit rate by outcome
analyze_hard_hit_outcomes <- function(df) {
  batted_balls <- df %>%
    filter(!is.na(launch_speed)) %>%
    mutate(is_hard_hit = launch_speed >= 95)

  outcomes <- batted_balls %>%
    group_by(is_hard_hit) %>%
    summarise(
      hit_rate = mean(events %in% c('single', 'double', 'triple', 'home_run'),
                      na.rm = TRUE),
      xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
      avg_ev = mean(launch_speed, na.rm = TRUE),
      avg_la = mean(launch_angle, na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    # Lambda form: passing extra arguments through across() is deprecated in dplyr 1.1+
    mutate(across(where(is.numeric), ~ round(.x, 3)))
  return(outcomes)
}

hard_hit_analysis <- analyze_hard_hit_outcomes(judge_data)
print("Hard-Hit vs. Non-Hard-Hit Outcomes:")
print(hard_hit_analysis)
6.2.4 Case Study: Exit Velocity Leaders
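Let's look at an exit velocity leaderboard to understand what elite power looks like. The values in the code that follows are illustrative placeholders for demonstration, not official 2024 figures; real numbers should come from Baseball Savant's leaderboards.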
# Build a league-leader table for qualifying hitters (sample approach)
# Note: Real data would require aggregating Statcast data for all players
def get_league_leaders_ev(year=2024, min_pa=200):
    """
    Get exit velocity leaders for a given season.

    This is a conceptual example - full implementation would require
    iterating through all players or using Baseball Savant's leaderboards.
    The `year` and `min_pa` arguments are not used yet, and the values
    below are illustrative placeholders, not official figures.
    """
    # Example data structure for demonstration
    leaders_data = {
        'Player': ['Aaron Judge', 'Giancarlo Stanton', 'Kyle Schwarber',
                   'Yordan Alvarez', 'Marcell Ozuna'],
        'Avg_EV': [95.2, 94.8, 93.9, 93.5, 93.2],
        'Max_EV': [122.4, 121.8, 119.5, 118.9, 118.2],
        'Hard_Hit_Pct': [58.2, 55.7, 52.3, 51.8, 50.9],
        'Barrel_Pct': [18.5, 17.2, 15.8, 15.1, 14.6],
        'xwOBA': [.412, .385, .368, .372, .361]
    }
    df = pd.DataFrame(leaders_data)
    return df

ev_leaders = get_league_leaders_ev(2024)
print("\nExit Velocity Leaders (illustrative values)")
print("=" * 70)
print(ev_leaders.to_string(index=False))

# Calculate the difference from MLB average
mlb_avg_ev = 88.5
ev_leaders['EV_Above_Avg'] = ev_leaders['Avg_EV'] - mlb_avg_ev
print(f"\nMLB Average Exit Velocity: {mlb_avg_ev} mph")
print("\nDifference from League Average:")
print(ev_leaders[['Player', 'EV_Above_Avg']].to_string(index=False))
Key Insights from Exit Velocity Leaders:
- Aaron Judge consistently ranks among the top exit velocity producers, typically averaging 94-95 mph
- Elite exit velocity correlates with high Barrel% and xwOBA
- Players with 93+ mph average EV are almost exclusively power threats
- The gap between elite (95 mph) and average (89 mph) is significant: a difference of 6+ mph represents a massive power disparity
6.3.1 Understanding Launch Angle
Launch Angle measures the vertical angle at which the ball leaves the bat, measured in degrees from horizontal. A ball hit straight into the ground would have a negative launch angle, while a ball hit straight up would be 90°.
Launch angle is categorized into four primary types:
| Category | Launch Angle Range | Expected Outcome | Typical BA |
|---|---|---|---|
| Ground Ball (GB) | < 10° | Mostly singles, some outs | .240 |
| Line Drive (LD) | 10° to 25° | High hit rate, XBH | .600+ |
| Fly Ball (FB) | 25° to 50° | Home runs or fly outs | .200-.250 |
| Pop-up (PU) | > 50° | Almost always outs | ~.020 |
Line drives (10-25°) have the highest batting average because they stay in the air long enough to get past infielders but not long enough for outfielders to comfortably track them down. However, they rarely result in home runs.
Fly balls (25-50°) are where home run power comes from. When combined with high exit velocity, fly balls in the 25-35° range become home runs. Without sufficient exit velocity, they become routine fly outs.
Ground balls (< 10°) can be effective for speedy players who can beat out infield hits, but generally result in lower production. The shift era (2015-2022) made pull-side ground balls particularly ineffective.
Pop-ups (> 50°) are almost universally negative outcomes, giving fielders ample time to position themselves.
6.3.2 The Optimal Launch Angle Debate
The "launch angle revolution" began around 2015-2016 when coaches and analysts realized that players were optimizing for the wrong outcomes. Traditionally, hitting coaches emphasized "staying on top of the ball" and hitting line drives. However, Statcast data revealed that slight uppercut swings producing launch angles of 25-35° with high exit velocity were the most valuable.
The Home Run Peak: Home runs are most common with launch angles between 25-35 degrees when exit velocity exceeds 95 mph. This discovery led to a league-wide increase in home runs from 2015-2019.
The Line Drive Counter-Argument: While 25-35° produces homers, line drives (10-25°) still have the highest BABIP (Batting Average on Balls In Play). Players who can consistently hit line drives with authority remain extremely valuable.
The Modern Approach: Elite hitters aim for the "sweet spot" - launch angles between 8-32 degrees - which balances the high BABIP of line drives with the power of fly balls.
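The three ranges in this debate overlap, so it can help to see them applied in a fixed order. The sketch below labels a single batted ball using the thresholds quoted above (25-35° with at least 95 mph for the home run window, 10-25° for line drives, 8-32° for the remaining sweet spot). The function name, the label strings, and the order of the checks are assumptions made for illustration, not a Statcast classification.

```python
def classify_contact(exit_velocity, launch_angle):
    """Label a batted ball using the launch angle ranges discussed above."""
    # Home run window: elevated contact that is also hit hard
    if 25 <= launch_angle <= 35 and exit_velocity >= 95:
        return "home_run_window"
    # Line drive range: highest BABIP, regardless of exit velocity
    if 10 <= launch_angle < 25:
        return "line_drive"
    # Rest of the 8-32 degree sweet spot (8-10 degrees, or 25-32 without enough EV)
    if 8 <= launch_angle <= 32:
        return "sweet_spot_other"
    return "other"


# Made-up batted balls: (exit velocity in mph, launch angle in degrees)
for ev, la in [(103, 28), (88, 28), (100, 15), (90, 9), (100, -5), (96, 45)]:
    print(f"EV {ev:>3} mph, LA {la:>3}°: {classify_contact(ev, la)}")
```

The second sample ball shows why launch angle alone is not enough: 28° at 88 mph sits inside the sweet spot but falls short of the home run window.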
6.3.3 Launch Angle Metrics with Code
def calculate_launch_angle_metrics(df):
    """
    Calculate comprehensive launch angle distribution metrics.

    Parameters:
        df: Statcast DataFrame with 'launch_angle' column
    Returns:
        Dictionary of launch angle metrics
    """
    batted_balls = df[df['launch_angle'].notna()].copy()
    if len(batted_balls) == 0:
        return None

    # Categorize each batted ball
    def categorize_launch_angle(la):
        if la < 10:
            return 'ground_ball'
        elif la < 25:
            return 'line_drive'
        elif la < 50:
            return 'fly_ball'
        else:
            return 'popup'

    batted_balls['la_category'] = batted_balls['launch_angle'].apply(categorize_launch_angle)

    metrics = {
        'avg_la': batted_balls['launch_angle'].mean(),
        'median_la': batted_balls['launch_angle'].median(),
        'gb_count': (batted_balls['launch_angle'] < 10).sum(),
        'gb_pct': (batted_balls['launch_angle'] < 10).mean() * 100,
        'ld_count': ((batted_balls['launch_angle'] >= 10) &
                     (batted_balls['launch_angle'] < 25)).sum(),
        'ld_pct': ((batted_balls['launch_angle'] >= 10) &
                   (batted_balls['launch_angle'] < 25)).mean() * 100,
        'fb_count': ((batted_balls['launch_angle'] >= 25) &
                     (batted_balls['launch_angle'] < 50)).sum(),
        'fb_pct': ((batted_balls['launch_angle'] >= 25) &
                   (batted_balls['launch_angle'] < 50)).mean() * 100,
        'popup_count': (batted_balls['launch_angle'] >= 50).sum(),
        'popup_pct': (batted_balls['launch_angle'] >= 50).mean() * 100,
        'sweet_spot_count': ((batted_balls['launch_angle'] >= 8) &
                             (batted_balls['launch_angle'] <= 32)).sum(),
        'sweet_spot_pct': ((batted_balls['launch_angle'] >= 8) &
                           (batted_balls['launch_angle'] <= 32)).mean() * 100
    }

    # Calculate performance by category
    category_performance = batted_balls.groupby('la_category').agg({
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean(),
        'estimated_ba_using_speedangle': 'mean'
    }).round(3)
    metrics['category_performance'] = category_performance
    return metrics

# Calculate and display
la_metrics = calculate_launch_angle_metrics(statcast_data)
print("\nLaunch Angle Distribution")
print("=" * 50)
print(f"Average Launch Angle: {la_metrics['avg_la']:.1f}°")
print(f"Median Launch Angle: {la_metrics['median_la']:.1f}°")
print(f"\nBatted Ball Distribution:")
print(f"Ground Balls (<10°): {la_metrics['gb_pct']:.1f}%")
print(f"Line Drives (10-25°): {la_metrics['ld_pct']:.1f}%")
print(f"Fly Balls (25-50°): {la_metrics['fb_pct']:.1f}%")
print(f"Pop-ups (>50°): {la_metrics['popup_pct']:.1f}%")
print(f"\nSweet Spot % (8-32°): {la_metrics['sweet_spot_pct']:.1f}%")
print("\nPerformance by Batted Ball Type:")
print(la_metrics['category_performance'])
# R version: Launch angle metrics
calculate_launch_angle_metrics <- function(df) {
  batted_balls <- df %>%
    filter(!is.na(launch_angle))

  if (nrow(batted_balls) == 0) {
    return(NULL)
  }

  # Categorize launch angles
  batted_balls <- batted_balls %>%
    mutate(
      la_category = case_when(
        launch_angle < 10 ~ 'ground_ball',
        launch_angle < 25 ~ 'line_drive',
        launch_angle < 50 ~ 'fly_ball',
        TRUE ~ 'popup'
      ),
      in_sweet_spot = launch_angle >= 8 & launch_angle <= 32
    )

  # Calculate metrics
  metrics <- list(
    avg_la = mean(batted_balls$launch_angle, na.rm = TRUE),
    median_la = median(batted_balls$launch_angle, na.rm = TRUE),
    gb_pct = mean(batted_balls$launch_angle < 10, na.rm = TRUE) * 100,
    ld_pct = mean(batted_balls$launch_angle >= 10 &
                    batted_balls$launch_angle < 25, na.rm = TRUE) * 100,
    fb_pct = mean(batted_balls$launch_angle >= 25 &
                    batted_balls$launch_angle < 50, na.rm = TRUE) * 100,
    popup_pct = mean(batted_balls$launch_angle >= 50, na.rm = TRUE) * 100,
    sweet_spot_pct = mean(batted_balls$in_sweet_spot, na.rm = TRUE) * 100
  )

  # Performance by category
  category_performance <- batted_balls %>%
    group_by(la_category) %>%
    summarise(
      hit_rate = mean(events %in% c('single', 'double', 'triple', 'home_run'),
                      na.rm = TRUE),
      xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    )
  metrics$category_performance <- category_performance
  return(metrics)
}

la_metrics <- calculate_launch_angle_metrics(judge_data)
cat("\nLaunch Angle Distribution\n")
cat(strrep("=", 50), "\n")
cat(sprintf("Average Launch Angle: %.1f°\n", la_metrics$avg_la))
cat(sprintf("Median Launch Angle: %.1f°\n", la_metrics$median_la))
cat("\nBatted Ball Distribution:\n")
cat(sprintf("Ground Balls (<10°): %.1f%%\n", la_metrics$gb_pct))
cat(sprintf("Line Drives (10-25°): %.1f%%\n", la_metrics$ld_pct))
cat(sprintf("Fly Balls (25-50°): %.1f%%\n", la_metrics$fb_pct))
cat(sprintf("Pop-ups (>50°): %.1f%%\n", la_metrics$popup_pct))
cat(sprintf("\nSweet Spot %% (8-32°): %.1f%%\n", la_metrics$sweet_spot_pct))
cat("\nPerformance by Batted Ball Type:\n")
print(la_metrics$category_performance)
6.3.4 Sweet Spot Percentage
Sweet Spot Percentage represents the proportion of batted balls hit at launch angles between 8 and 32 degrees. This range combines the high BABIP of line drives with the power potential of fly balls.
Why 8-32°?
- Below 8°: Too many ground balls, lower BA
- 8-25°: Line drive range, highest BABIP
- 25-32°: Power range, home run potential with high EV
- Above 32°: Decreasing hit probability, more fly outs
The MLB average sweet spot percentage is approximately 33-35%. Elite hitters often achieve 40%+, demonstrating exceptional bat-to-ball skills and optimal swing planes.
Sweet Spot leaders tend to be complete hitters who combine contact skills with power. Players like Freddie Freeman, Mookie Betts, and Ronald Acuña Jr. consistently rank among the leaders in this metric.
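To see how the four bands behind the 8-32° definition add up, the sketch below splits a list of launch angles into those bands and reports each as a percentage, with the two middle bands summing to Sweet Spot %. The function name `sweet_spot_breakdown`, the band labels, and the sample angles are hypothetical and chosen for illustration.

```python
def sweet_spot_breakdown(launch_angles):
    """Return the percentage of batted balls in each band around the sweet spot."""
    if not launch_angles:
        return None
    counts = {"below_8": 0, "8_to_25": 0, "25_to_32": 0, "above_32": 0}
    for la in launch_angles:
        if la < 8:
            counts["below_8"] += 1
        elif la < 25:
            counts["8_to_25"] += 1   # line drive range
        elif la <= 32:
            counts["25_to_32"] += 1  # power range
        else:
            counts["above_32"] += 1
    total = len(launch_angles)
    shares = {band: 100 * n / total for band, n in counts.items()}
    # Sweet Spot % is the combined share of the two middle bands (8-32 degrees)
    shares["sweet_spot_pct"] = shares["8_to_25"] + shares["25_to_32"]
    return shares


# Made-up launch angles (degrees) for ten batted balls
sample_angles = [-12, 3, 9, 14, 22, 27, 31, 38, 45, 61]
shares = sweet_spot_breakdown(sample_angles)
for band, pct in shares.items():
    print(f"{band:>15}: {pct:.1f}%")
```

Splitting the sweet spot this way separates hitters who reach 8-32° mainly through line drives from those who get there through elevated, power-oriented contact.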
6.4.1 What is a Barrel?
A Barrel is Statcast's definition of "perfect contact" - a batted ball with the ideal combination of exit velocity and launch angle to produce the highest expected outcomes. The exact definition is complex because the optimal launch angle varies with exit velocity.
The Barrel Formula:
- At 98 mph exit velocity (the minimum): launch angle must be within 26-30°
- At 99 mph: the acceptable range widens to 25-31°
- At 100 mph: the range widens to 24-33°
- Above 100 mph: each additional mph adds roughly 2-3° to the window
- At 116+ mph: any launch angle from 8° to 50° qualifies
The key insight: harder hit balls are more forgiving of launch angle. At 116 mph, anything from a low line drive to a high fly ball counts as a barrel, while a 98 mph ball needs near-perfect elevation.
Barrel outcomes are exceptional:
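- By definition, barrels are exit velocity and launch angle combinations that have historically produced a batting average of at least .500
- Those same combinations have produced a slugging percentage of at least 1.500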
- They result in home runs approximately 30-40% of the time
- They're nearly impossible to defend
Here's the mathematical relationship:
def is_barrel(exit_velocity, launch_angle):
"""
Determine if a batted ball qualifies as a barrel based on MLB's definition.
Parameters:
exit_velocity: Exit velocity in mph
launch_angle: Launch angle in degrees
Returns:
Boolean indicating barrel status
"""
# Must be at least 98 mph
if exit_velocity < 98:
return False
# Define the acceptable launch angle range based on exit velocity
# These bands approximate MLB's formula; the flat 104-116 mph band is a simplification, since the real window keeps widening with each mph
if 98 <= exit_velocity < 99:
return 26 <= launch_angle <= 30
elif 99 <= exit_velocity < 100:
return 25 <= launch_angle <= 31
elif 100 <= exit_velocity < 101:
return 24 <= launch_angle <= 33
elif 101 <= exit_velocity < 102:
return 23 <= launch_angle <= 34
elif 102 <= exit_velocity < 103:
return 22 <= launch_angle <= 35
elif 103 <= exit_velocity < 104:
return 21 <= launch_angle <= 36
elif 104 <= exit_velocity < 116:
return 20 <= launch_angle <= 37
else: # 116+ mph
return 8 <= launch_angle <= 50 # Very forgiving range
# Example usage
print(is_barrel(98, 28)) # True - perfect barrel
print(is_barrel(105, 29)) # True - high EV barrel
print(is_barrel(98, 40)) # False - too steep despite good EV
print(is_barrel(92, 28)) # False - EV too low
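The stepwise bands above can also be collapsed into a pair of linear bounds. The sketch below uses a closed-form approximation that circulates in the public analytics community; it is not MLB's official implementation, and the name `is_barrel_linear` is ours. It reproduces the endpoints quoted for 98 and 100 mph and the 8-50° window at 116 mph, and it widens smoothly in between.

```python
def is_barrel_linear(exit_velocity, launch_angle):
    """Approximate barrel check using two linear bounds (an assumption, not MLB's code)."""
    return (
        exit_velocity >= 98                            # minimum exit velocity
        and launch_angle <= 50                         # hard ceiling on launch angle
        and 1.5 * exit_velocity - launch_angle >= 117  # upper edge of the window
        and exit_velocity + launch_angle >= 124        # lower edge of the window
    )

# Spot-check a few exit velocity / launch angle pairs
for ev, la in [(98, 26), (98, 30), (98, 31), (100, 33), (110, 15), (116, 8)]:
    print(ev, la, is_barrel_linear(ev, la))
```

Because the bounds are linear, this version avoids the long `elif` chain and is easy to vectorize over a DataFrame, at the cost of being only an approximation of the official classification.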
6.4.2 Barrel Rates with Code
def calculate_barrel_metrics(df):
    """
    Calculate barrel-related metrics from Statcast data.
    Note: some Statcast exports include a 'barrel' flag column. When it is
    missing, barrels are classified manually with is_barrel().
    """
    batted_balls = df[(df['launch_speed'].notna()) &
                      (df['launch_angle'].notna())].copy()
    if len(batted_balls) == 0:
        return None
    # Calculate barrels (using Statcast's column if available)
    if 'barrel' in batted_balls.columns:
        batted_balls['is_barrel'] = batted_balls['barrel'] == 1
    else:
        # Calculate manually
        batted_balls['is_barrel'] = batted_balls.apply(
            lambda row: is_barrel(row['launch_speed'], row['launch_angle']),
            axis=1
        )
    barrel_balls = batted_balls[batted_balls['is_barrel']]
    # Plate appearances: count rows where a PA-ending event is recorded.
    # Assumes pitch-level data in which 'events' is null on all other pitches;
    # dividing by len(df) would count pitches, not plate appearances.
    plate_appearances = df['events'].notna().sum()
    metrics = {
        'barrel_count': len(barrel_balls),
        'barrel_pct': (len(barrel_balls) / len(batted_balls)) * 100,
        'barrel_pa_pct': (len(barrel_balls) / plate_appearances) * 100
                         if plate_appearances > 0 else 0,  # Per plate appearance
        'avg_barrel_ev': barrel_balls['launch_speed'].mean() if len(barrel_balls) > 0 else 0,
        'avg_barrel_la': barrel_balls['launch_angle'].mean() if len(barrel_balls) > 0 else 0,
    }
    # Barrel outcomes
    if len(barrel_balls) > 0:
        barrel_outcomes = barrel_balls['events'].value_counts()
        metrics['barrel_outcomes'] = barrel_outcomes
        # Calculate barrel performance
        hits = barrel_balls['events'].isin(['single', 'double', 'triple', 'home_run']).sum()
        hr = barrel_balls['events'].eq('home_run').sum()
        metrics['barrel_ba'] = hits / len(barrel_balls)
        metrics['barrel_hr_pct'] = (hr / len(barrel_balls)) * 100
    return metrics
barrel_metrics = calculate_barrel_metrics(statcast_data)
print("\nBarrel Analysis")
print("=" * 50)
print(f"Barrel Count: {barrel_metrics['barrel_count']}")
print(f"Barrel% (of batted balls): {barrel_metrics['barrel_pct']:.1f}%")
print(f"Barrel/PA%: {barrel_metrics['barrel_pa_pct']:.1f}%")
print(f"\nAverage Barrel Exit Velocity: {barrel_metrics['avg_barrel_ev']:.1f} mph")
print(f"Average Barrel Launch Angle: {barrel_metrics['avg_barrel_la']:.1f}°")
print(f"\nBarrel Batting Average: {barrel_metrics['barrel_ba']:.3f}")
print(f"Barrel HR Rate: {barrel_metrics['barrel_hr_pct']:.1f}%")
print("\nBarrel Outcomes:")
print(barrel_metrics['barrel_outcomes'])
# R version: Barrel metrics
calculate_barrel_metrics <- function(df) {
  batted_balls <- df %>%
    filter(!is.na(launch_speed), !is.na(launch_angle))
  if (nrow(batted_balls) == 0) {
    return(NULL)
  }
  # Use Statcast's barrel column if available; otherwise fall back to
  # launch_speed_angle, where a value of 6 denotes a barrel
  if ('barrel' %in% colnames(batted_balls)) {
    batted_balls <- batted_balls %>%
      mutate(is_barrel = barrel == 1)
  } else if ('launch_speed_angle' %in% colnames(batted_balls)) {
    batted_balls <- batted_balls %>%
      mutate(is_barrel = launch_speed_angle == 6)
  } else {
    stop("No 'barrel' or 'launch_speed_angle' column found")
  }
  barrel_balls <- batted_balls %>%
    filter(is_barrel)
  # Plate appearances: rows with a PA-ending event (assumes pitch-level data
  # where 'events' is NA or empty on all other pitches)
  plate_appearances <- sum(!is.na(df$events) & df$events != "")
  metrics <- list(
    barrel_count = nrow(barrel_balls),
    barrel_pct = (nrow(barrel_balls) / nrow(batted_balls)) * 100,
    barrel_pa_pct = (nrow(barrel_balls) / plate_appearances) * 100
  )
  if (nrow(barrel_balls) > 0) {
    metrics$avg_barrel_ev <- mean(barrel_balls$launch_speed, na.rm = TRUE)
    metrics$avg_barrel_la <- mean(barrel_balls$launch_angle, na.rm = TRUE)
    barrel_hits <- barrel_balls %>%
      filter(events %in% c('single', 'double', 'triple', 'home_run'))
    barrel_hr <- barrel_balls %>%
      filter(events == 'home_run')
    metrics$barrel_ba <- nrow(barrel_hits) / nrow(barrel_balls)
    metrics$barrel_hr_pct <- (nrow(barrel_hr) / nrow(barrel_balls)) * 100
    metrics$barrel_outcomes <- barrel_balls %>%
      count(events) %>%
      arrange(desc(n))
  }
  return(metrics)
}
barrel_metrics <- calculate_barrel_metrics(judge_data)
cat("\nBarrel Analysis\n")
cat(strrep("=", 50), "\n")
cat(sprintf("Barrel Count: %d\n", barrel_metrics$barrel_count))
cat(sprintf("Barrel%% (of batted balls): %.1f%%\n", barrel_metrics$barrel_pct))
cat(sprintf("Barrel/PA%%: %.1f%%\n", barrel_metrics$barrel_pa_pct))
cat(sprintf("\nAverage Barrel Exit Velocity: %.1f mph\n", barrel_metrics$avg_barrel_ev))
cat(sprintf("Average Barrel Launch Angle: %.1f°\n", barrel_metrics$avg_barrel_la))
cat(sprintf("\nBarrel Batting Average: %.3f\n", barrel_metrics$barrel_ba))
cat(sprintf("Barrel HR Rate: %.1f%%\n", barrel_metrics$barrel_hr_pct))
cat("\nBarrel Outcomes:\n")
print(barrel_metrics$barrel_outcomes)
6.4.3 Barrel vs. Hard Hit Comparison
While related, Barrels and Hard-Hit Balls are distinct metrics:
| Metric | Definition | MLB Average | What It Measures |
|---|---|---|---|
| Hard-Hit % | % of BBE ≥95 mph | ~35-37% | Raw power, contact strength |
| Barrel % | % of BBE with optimal EV/LA combo | ~6-8% | Perfect contact quality |
Key Differences:
- All barrels are hard-hit (by definition ≥98 mph)
- NOT all hard-hit balls are barrels (many have poor launch angles)
- A 105 mph ground ball is hard-hit but NOT a barrel
- A 99 mph ball at 28° is both hard-hit AND a barrel
# Compare hard-hit vs. barrel rates
def compare_hard_hit_barrel(df):
    """Compare hard-hit and barrel classifications."""
    batted_balls = df[(df['launch_speed'].notna()) &
                      (df['launch_angle'].notna())].copy()
    batted_balls['is_hard_hit'] = batted_balls['launch_speed'] >= 95
    if 'barrel' in batted_balls.columns:
        batted_balls['is_barrel'] = batted_balls['barrel'] == 1
    else:
        # Fall back to the manual classifier so 'is_barrel' always exists
        batted_balls['is_barrel'] = [
            is_barrel(ev, la)
            for ev, la in zip(batted_balls['launch_speed'],
                              batted_balls['launch_angle'])
        ]
    # Create comparison categories (barrels overwrite 'Hard-Hit Only')
    batted_balls['category'] = 'Neither'
    batted_balls.loc[batted_balls['is_hard_hit'], 'category'] = 'Hard-Hit Only'
    batted_balls.loc[batted_balls['is_barrel'], 'category'] = 'Barrel'
    # Analyze outcomes by category
    comparison = batted_balls.groupby('category').agg({
        'launch_speed': ['count', 'mean'],
        'launch_angle': 'mean',
        'estimated_ba_using_speedangle': 'mean',
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean()
    }).round(3)
    comparison.columns = ['Count', 'Avg_EV', 'Avg_LA', 'xBA', 'Hit_Rate']
    return comparison
comparison = compare_hard_hit_barrel(statcast_data)
print("\nHard-Hit vs. Barrel Comparison:")
print(comparison)
Insight: Barrels represent the intersection of power (high EV) and optimal trajectory (ideal LA). A player can have a high hard-hit rate with a low barrel rate if they consistently hit balls too low (ground balls) or too high (pop-ups).
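To make this concrete, here is a minimal sketch with two hypothetical hitters who share the same hard-hit rate but differ sharply in barrel rate. The batted balls are invented for illustration, and `approx_barrel` is our own helper that uses a linear approximation of the barrel window rather than MLB's official classification.

```python
def approx_barrel(ev, la):
    """Linear approximation of the barrel window (an assumption, not MLB's code)."""
    return ev >= 98 and la <= 50 and 1.5 * ev - la >= 117 and ev + la >= 124

# Hypothetical batted balls as (exit velocity in mph, launch angle in degrees)
hitters = {
    "ground_ball_hitter": [(104, -2), (101, 3), (99, 6), (88, 15)],
    "fly_ball_hitter": [(104, 27), (101, 29), (99, 12), (88, 15)],
}

def contact_rates(batted_balls):
    """Return hard-hit% (EV >= 95 mph) and barrel% for a list of batted balls."""
    n = len(batted_balls)
    hard_hit = sum(ev >= 95 for ev, _ in batted_balls)
    barrels = sum(approx_barrel(ev, la) for ev, la in batted_balls)
    return {"hard_hit_pct": 100 * hard_hit / n, "barrel_pct": 100 * barrels / n}

rates = {name: contact_rates(balls) for name, balls in hitters.items()}
for name, r in rates.items():
    print(f"{name}: hard-hit {r['hard_hit_pct']:.0f}%, barrel {r['barrel_pct']:.0f}%")
```

Both hitters square up three of four balls at 95+ mph, but only the one who elevates them converts any of that hard contact into barrels.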
6.5.1 The Philosophy: Removing Luck and Defense
Traditional batting statistics like batting average, slugging percentage, and wOBA tell us what happened. Expected statistics (xStats) tell us what should have happened based purely on the quality of contact, independent of:
- Defense: A great play by a fielder shouldn't penalize the hitter
- Luck: A bloop single and a line drive out have very different contact quality
- Park factors: In the moment of contact, the ballpark shouldn't matter
- Weather: Wind, temperature, humidity affect actual but not expected outcomes
The xStats Philosophy:
Every batted ball tracked by Statcast since 2015 with a similar exit velocity and launch angle has produced a certain average outcome. By looking at thousands of comparable batted balls, we can estimate the expected outcome of any new batted ball.
For example:
- Suppose balls hit 105 mph at 28° have historically resulted in hits approximately 75% of the time
- Then a new ball hit 105 mph at 28° has an xBA of .750 for that batted ball
- Sum these hit probabilities across a player's batted balls, count each strikeout as zero, and divide by at-bats to get the player's season xBA
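The aggregation step can be sketched in a few lines. The hit probabilities below are invented, the helper name `season_xba` is ours, and the sketch assumes every batted ball came in an official at-bat (no sacrifice flies) and that each strikeout counts as an at-bat with zero hit probability.

```python
# Hypothetical per-batted-ball hit probabilities for one hitter
batted_ball_xba = [0.750, 0.120, 0.480, 0.050, 0.910, 0.300]
strikeouts = 2  # each strikeout is an at-bat with zero hit probability

def season_xba(hit_probs, strikeouts):
    """Sum hit probabilities and divide by at-bats (batted balls + strikeouts)."""
    at_bats = len(hit_probs) + strikeouts
    if at_bats == 0:
        return None
    return sum(hit_probs) / at_bats

xba = season_xba(batted_ball_xba, strikeouts)
print(f"xBA: {xba:.3f}")
```

Leaving the strikeouts out of the denominator would instead give an expected average on contact, which runs noticeably higher than a batting-average-scale xBA.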
6.5.2 xBA (Expected Batting Average)
Expected Batting Average (xBA) estimates what a player's batting average should be based solely on the quality of contact, removing defensive plays and luck.
def calculate_xba_metrics(df):
"""
Calculate expected batting average metrics and compare to actual.
Uses Statcast's 'estimated_ba_using_speedangle' which is calculated
using exit velocity and launch angle comparisons to historical data.
"""
# Filter for balls in play (exclude strikeouts, walks, etc.)
batted_balls = df[df['estimated_ba_using_speedangle'].notna()].copy()
# Calculate actual outcomes
batted_balls['is_hit'] = batted_balls['events'].isin([
'single', 'double', 'triple', 'home_run'
])
metrics = {
'xBA': batted_balls['estimated_ba_using_speedangle'].mean(),
'actual_BA_on_contact': batted_balls['is_hit'].mean(),
'ba_diff': batted_balls['is_hit'].mean() -
batted_balls['estimated_ba_using_speedangle'].mean(),
'batted_balls': len(batted_balls)
}
# Calculate xBA by exit velocity bins
batted_balls['ev_bin'] = pd.cut(
batted_balls['launch_speed'],
bins=[0, 85, 95, 105, 125],
labels=['<85', '85-95', '95-105', '105+']
)
xba_by_ev = batted_balls.groupby('ev_bin').agg({
'estimated_ba_using_speedangle': 'mean',
'is_hit': 'mean',
'launch_speed': 'count'
}).round(3)
xba_by_ev.columns = ['xBA', 'Actual_BA', 'Count']
metrics['xba_by_ev'] = xba_by_ev
# Identify over/under-performers (individual batted balls)
batted_balls['xba_diff'] = (batted_balls['is_hit'].astype(int) -
batted_balls['estimated_ba_using_speedangle'])
# Find biggest outperformers (hits with low xBA)
lucky_hits = batted_balls[
(batted_balls['is_hit'] == True) &
(batted_balls['estimated_ba_using_speedangle'] < 0.300)
].nsmallest(5, 'estimated_ba_using_speedangle')[
['game_date', 'events', 'launch_speed', 'launch_angle',
'estimated_ba_using_speedangle']
]
metrics['lucky_hits'] = lucky_hits
return metrics
xba_metrics = calculate_xba_metrics(statcast_data)
print("\nExpected Batting Average (xBA) Analysis")
print("=" * 60)
print(f"Expected BA (xBA): {xba_metrics['xBA']:.3f}")
print(f"Actual BA on Contact: {xba_metrics['actual_BA_on_contact']:.3f}")
print(f"Difference (Actual - Expected): {xba_metrics['ba_diff']:+.3f}")
if xba_metrics['ba_diff'] > 0.020:
print(" → Player is outperforming contact quality (lucky or good speed)")
elif xba_metrics['ba_diff'] < -0.020:
print(" → Player is underperforming contact quality (unlucky or poor speed)")
else:
print(" → Performance matches contact quality")
print(f"\nxBA by Exit Velocity:")
print(xba_metrics['xba_by_ev'])
print("\nLuckiest Hits (Low xBA but resulted in hit):")
print(xba_metrics['lucky_hits'].to_string(index=False))
# R version: xBA metrics
calculate_xba_metrics <- function(df) {
batted_balls <- df %>%
filter(!is.na(estimated_ba_using_speedangle)) %>%
mutate(
is_hit = events %in% c('single', 'double', 'triple', 'home_run'),
ev_bin = cut(
launch_speed,
breaks = c(0, 85, 95, 105, 125),
labels = c('<85', '85-95', '95-105', '105+')
)
)
metrics <- list(
xBA = mean(batted_balls$estimated_ba_using_speedangle, na.rm = TRUE),
actual_BA = mean(batted_balls$is_hit, na.rm = TRUE),
batted_balls = nrow(batted_balls)
)
metrics$ba_diff <- metrics$actual_BA - metrics$xBA
# xBA by exit velocity
xba_by_ev <- batted_balls %>%
group_by(ev_bin) %>%
summarise(
xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
actual_BA = mean(is_hit, na.rm = TRUE),
count = n(),
.groups = 'drop'
) %>%
mutate(across(c(xBA, actual_BA), ~ round(.x, 3)))
metrics$xba_by_ev <- xba_by_ev
# Lucky hits
lucky_hits <- batted_balls %>%
filter(is_hit == TRUE, estimated_ba_using_speedangle < 0.300) %>%
arrange(estimated_ba_using_speedangle) %>%
select(game_date, events, launch_speed, launch_angle,
estimated_ba_using_speedangle) %>%
head(5)
metrics$lucky_hits <- lucky_hits
return(metrics)
}
xba_metrics <- calculate_xba_metrics(judge_data)
cat("\nExpected Batting Average (xBA) Analysis\n")
cat(strrep("=", 60), "\n")
cat(sprintf("Expected BA (xBA): %.3f\n", xba_metrics$xBA))
cat(sprintf("Actual BA on Contact: %.3f\n", xba_metrics$actual_BA))
cat(sprintf("Difference (Actual - Expected): %+.3f\n", xba_metrics$ba_diff))
if (xba_metrics$ba_diff > 0.020) {
cat(" → Player is outperforming contact quality\n")
} else if (xba_metrics$ba_diff < -0.020) {
cat(" → Player is underperforming contact quality\n")
} else {
cat(" → Performance matches contact quality\n")
}
cat("\nxBA by Exit Velocity:\n")
print(xba_metrics$xba_by_ev)
cat("\nLuckiest Hits (Low xBA but resulted in hit):\n")
print(xba_metrics$lucky_hits)
Interpreting xBA Differences:
- xBA > Actual BA: Player has been unlucky or facing strong defensive positioning. Expect positive regression.
- Actual BA > xBA: Player has been lucky, has exceptional speed, or benefits from weak defensive positioning. Expect negative regression.
- |Difference| < .020: Performance matches contact quality - what you see is what you get.
6.5.3 xwOBA (Expected Weighted On-Base Average)
Expected wOBA (xwOBA) is the most comprehensive expected statistic. While xBA only considers hits vs. outs, xwOBA accounts for the type of hit (single, double, triple, home run) expected based on exit velocity and launch angle.
xwOBA is calculated similarly to wOBA (covered in Chapter 3), but uses expected outcomes:
- Each batted ball gets an expected wOBA value based on its EV/LA combination
- These are averaged across all plate appearances
- Walks and strikeouts are included at their actual wOBA values
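As a worked illustration of these three steps, the sketch below combines invented per-batted-ball expected values with actual walks, hit-by-pitches, and strikeouts. The weights are illustrative constants (real linear weights shift slightly each season), and the helper name `xwoba` is ours.

```python
# Hypothetical season line for one hitter
batted_ball_xwoba = [0.95, 0.10, 0.45, 1.60, 0.05]  # expected wOBA per batted ball
walks, hit_by_pitch, strikeouts = 2, 0, 3

# Illustrative linear weights; actual values vary by season
W_BB, W_HBP = 0.69, 0.72

def xwoba(batted, walks, hbp, strikeouts):
    """Expected values on contact plus actual values for walks, HBP, and strikeouts."""
    numerator = sum(batted) + W_BB * walks + W_HBP * hbp  # strikeouts contribute 0
    denominator = len(batted) + walks + hbp + strikeouts
    return numerator / denominator if denominator else None

value = xwoba(batted_ball_xwoba, walks, hit_by_pitch, strikeouts)
print(f"xwOBA: {value:.3f}")
```

Averaging only the batted-ball values would give expected wOBA on contact, which is not comparable to a full-season wOBA because it ignores strikeouts and walks.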
Why xwOBA is more informative than xBA:
- xBA treats all hits equally
- xwOBA distinguishes between expected singles and expected home runs
- xwOBA provides a complete picture of offensive value
- xwOBA correlates more strongly with future performance
def calculate_xwoba_metrics(df):
    """
    Analyze expected wOBA (xwOBA) compared to actual wOBA.
    Note: Statcast provides 'estimated_woba_using_speedangle', which
    represents the expected wOBA for each batted ball. Walks, hit-by-pitches,
    and strikeouts keep their actual wOBA values so that actual and expected
    wOBA share the same plate-appearance denominator.
    """
    # Calculate actual wOBA (simplified - using typical weights)
    woba_weights = {
        'walk': 0.69,
        'hit_by_pitch': 0.72,
        'single': 0.88,
        'double': 1.24,
        'triple': 1.56,
        'home_run': 2.08
    }
    df_calc = df.copy()
    df_calc['woba_value'] = df_calc['events'].map(woba_weights).fillna(0)
    # Count plate appearances: keep rows with a recorded event (pitch-level
    # data leaves 'events' null mid-PA) and drop baserunning events
    pa_events = df_calc['events'].notna() & ~df_calc['events'].isin([
        'caught_stealing_2b', 'caught_stealing_3b', 'caught_stealing_home',
        'pickoff_1b', 'pickoff_2b', 'pickoff_3b'
    ])
    pa = df_calc[pa_events]
    actual_woba = pa['woba_value'].sum() / len(pa) if len(pa) > 0 else None
    # Expected wOBA: Statcast's estimate on batted balls, actual value otherwise.
    # Averaging batted balls alone would give xwOBA on contact, which is not
    # comparable to a wOBA computed over all plate appearances.
    batted_balls_xwoba = pa[pa['estimated_woba_using_speedangle'].notna()]
    if len(batted_balls_xwoba) > 0:
        xwoba = (pa['estimated_woba_using_speedangle']
                 .fillna(pa['woba_value'])
                 .mean())
    else:
        xwoba = None
    metrics = {
        'actual_wOBA': actual_woba,
        'xwOBA': xwoba,
        'woba_diff': actual_woba - xwoba if xwoba is not None else None,
        'batted_balls': len(batted_balls_xwoba)
    }
    # xwOBA by launch angle category
    if len(batted_balls_xwoba) > 0:
        batted_balls_xwoba_copy = batted_balls_xwoba.copy()
        batted_balls_xwoba_copy['la_category'] = pd.cut(
            batted_balls_xwoba_copy['launch_angle'],
            bins=[-90, 10, 25, 50, 90],
            labels=['Ground Ball', 'Line Drive', 'Fly Ball', 'Pop-up']
        )
        xwoba_by_la = batted_balls_xwoba_copy.groupby(
            'la_category', observed=False
        ).agg({
            'estimated_woba_using_speedangle': 'mean',
            'launch_speed': ['mean', 'count']
        }).round(3)
        xwoba_by_la.columns = ['xwOBA', 'Avg_EV', 'Count']
        metrics['xwoba_by_la'] = xwoba_by_la
    return metrics
xwoba_metrics = calculate_xwoba_metrics(statcast_data)
print("\nExpected wOBA (xwOBA) Analysis")
print("=" * 60)
print(f"Actual wOBA: {xwoba_metrics['actual_wOBA']:.3f}")
print(f"Expected wOBA (xwOBA): {xwoba_metrics['xwOBA']:.3f}")
print(f"Difference (Actual - Expected): {xwoba_metrics['woba_diff']:+.3f}")
if xwoba_metrics['woba_diff'] > 0.020:
print(" → Outperforming expected - possibly lucky or elite speed")
elif xwoba_metrics['woba_diff'] < -0.020:
print(" → Underperforming expected - regression likely upcoming")
else:
print(" → Performance matches expectations")
print("\nxwOBA by Batted Ball Type:")
print(xwoba_metrics['xwoba_by_la'])
# Interpretation guide
print("\nxwOBA Scale:")
print(" Excellent: .390+")
print(" Great: .360 - .389")
print(" Above Average: .330 - .359")
print(" Average: .310 - .329")
print(" Below Average: .290 - .309")
print(" Poor: < .290")
6.5.4 Interpreting xStat Differences
The difference between actual and expected stats is incredibly valuable for predictive analysis:
Large Positive Difference (Actual >> Expected):
- Player has been lucky with batted ball outcomes
- Weak defensive positioning by opponents
- Exceptional speed creating extra hits
- Prediction: Expect decline toward xStat level
Large Negative Difference (Actual << Expected):
- Player has been unlucky with batted ball outcomes
- Facing strong defensive positioning (shift effectiveness)
- Poor speed limiting infield hits
- Prediction: Expect improvement toward xStat level
Small Difference (|Actual - Expected| < .020):
- Performance matches underlying contact quality
- Prediction: Expect similar future performance
def identify_regression_candidates(df, threshold=0.030):
"""
Identify players likely to regress based on xwOBA difference.
Parameters:
df: Statcast data
threshold: Minimum difference to flag (default .030)
Returns:
Regression prediction and analysis
"""
xwoba_metrics = calculate_xwoba_metrics(df)
if xwoba_metrics['xwOBA'] is None:
return {'prediction': 'Insufficient data for analysis'}  # dict, so callers can still iterate .items()
diff = xwoba_metrics['woba_diff']
actual = xwoba_metrics['actual_wOBA']
expected = xwoba_metrics['xwOBA']
analysis = {
'actual_wOBA': actual,
'expected_wOBA': expected,
'difference': diff,
'regression_likely': abs(diff) >= threshold
}
if diff >= threshold:
analysis['prediction'] = 'NEGATIVE REGRESSION LIKELY'
analysis['reason'] = f'Actual wOBA ({actual:.3f}) significantly exceeds xwOBA ({expected:.3f})'
analysis['action'] = 'SELL HIGH - Performance likely unsustainable'
elif diff <= -threshold:
analysis['prediction'] = 'POSITIVE REGRESSION LIKELY'
analysis['reason'] = f'Actual wOBA ({actual:.3f}) significantly below xwOBA ({expected:.3f})'
analysis['action'] = 'BUY LOW - Improvement expected'
else:
analysis['prediction'] = 'PERFORMANCE SUSTAINABLE'
analysis['reason'] = f'Actual and expected wOBA closely aligned'
analysis['action'] = 'HOLD - What you see is what you get'
return analysis
regression_analysis = identify_regression_candidates(statcast_data)
print("\n" + "=" * 60)
print("REGRESSION ANALYSIS")
print("=" * 60)
for key, value in regression_analysis.items():
print(f"{key}: {value}")
# R version: xBA metrics
calculate_xba_metrics <- function(df) {
batted_balls <- df %>%
filter(!is.na(estimated_ba_using_speedangle)) %>%
mutate(
is_hit = events %in% c('single', 'double', 'triple', 'home_run'),
ev_bin = cut(
launch_speed,
breaks = c(0, 85, 95, 105, 125),
labels = c('<85', '85-95', '95-105', '105+')
)
)
metrics <- list(
xBA = mean(batted_balls$estimated_ba_using_speedangle, na.rm = TRUE),
actual_BA = mean(batted_balls$is_hit, na.rm = TRUE),
batted_balls = nrow(batted_balls)
)
metrics$ba_diff <- metrics$actual_BA - metrics$xBA
# xBA by exit velocity
xba_by_ev <- batted_balls %>%
group_by(ev_bin) %>%
summarise(
xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
actual_BA = mean(is_hit, na.rm = TRUE),
count = n(),
.groups = 'drop'
) %>%
mutate(across(c(xBA, actual_BA), round, 3))
metrics$xba_by_ev <- xba_by_ev
# Lucky hits
lucky_hits <- batted_balls %>%
filter(is_hit == TRUE, estimated_ba_using_speedangle < 0.300) %>%
arrange(estimated_ba_using_speedangle) %>%
select(game_date, events, launch_speed, launch_angle,
estimated_ba_using_speedangle) %>%
head(5)
metrics$lucky_hits <- lucky_hits
return(metrics)
}
xba_metrics <- calculate_xba_metrics(judge_data)
cat("\nExpected Batting Average (xBA) Analysis\n")
cat(strrep("=", 60), "\n")
cat(sprintf("Expected BA (xBA): %.3f\n", xba_metrics$xBA))
cat(sprintf("Actual BA on Contact: %.3f\n", xba_metrics$actual_BA))
cat(sprintf("Difference (Actual - Expected): %+.3f\n", xba_metrics$ba_diff))
if (xba_metrics$ba_diff > 0.020) {
cat(" → Player is outperforming contact quality\n")
} else if (xba_metrics$ba_diff < -0.020) {
cat(" → Player is underperforming contact quality\n")
} else {
cat(" → Performance matches contact quality\n")
}
cat("\nxBA by Exit Velocity:\n")
print(xba_metrics$xba_by_ev)
cat("\nLuckiest Hits (Low xBA but resulted in hit):\n")
print(xba_metrics$lucky_hits)
def calculate_xba_metrics(df):
"""
Calculate expected batting average metrics and compare to actual.
Uses Statcast's 'estimated_ba_using_speedangle' which is calculated
using exit velocity and launch angle comparisons to historical data.
"""
# Filter for balls in play (exclude strikeouts, walks, etc.)
batted_balls = df[df['estimated_ba_using_speedangle'].notna()].copy()
# Calculate actual outcomes
batted_balls['is_hit'] = batted_balls['events'].isin([
'single', 'double', 'triple', 'home_run'
])
metrics = {
'xBA': batted_balls['estimated_ba_using_speedangle'].mean(),
'actual_BA_on_contact': batted_balls['is_hit'].mean(),
'ba_diff': batted_balls['is_hit'].mean() -
batted_balls['estimated_ba_using_speedangle'].mean(),
'batted_balls': len(batted_balls)
}
# Calculate xBA by exit velocity bins
batted_balls['ev_bin'] = pd.cut(
batted_balls['launch_speed'],
bins=[0, 85, 95, 105, 125],
labels=['<85', '85-95', '95-105', '105+']
)
xba_by_ev = batted_balls.groupby('ev_bin').agg({
'estimated_ba_using_speedangle': 'mean',
'is_hit': 'mean',
'launch_speed': 'count'
}).round(3)
xba_by_ev.columns = ['xBA', 'Actual_BA', 'Count']
metrics['xba_by_ev'] = xba_by_ev
# Identify over/under-performers (individual batted balls)
batted_balls['xba_diff'] = (batted_balls['is_hit'].astype(int) -
batted_balls['estimated_ba_using_speedangle'])
# Find biggest outperformers (hits with low xBA)
lucky_hits = batted_balls[
(batted_balls['is_hit'] == True) &
(batted_balls['estimated_ba_using_speedangle'] < 0.300)
].nsmallest(5, 'estimated_ba_using_speedangle')[
['game_date', 'events', 'launch_speed', 'launch_angle',
'estimated_ba_using_speedangle']
]
metrics['lucky_hits'] = lucky_hits
return metrics
xba_metrics = calculate_xba_metrics(statcast_data)
print("\nExpected Batting Average (xBA) Analysis")
print("=" * 60)
print(f"Expected BA (xBA): {xba_metrics['xBA']:.3f}")
print(f"Actual BA on Contact: {xba_metrics['actual_BA_on_contact']:.3f}")
print(f"Difference (Actual - Expected): {xba_metrics['ba_diff']:+.3f}")
if xba_metrics['ba_diff'] > 0.020:
print(" → Player is outperforming contact quality (lucky or good speed)")
elif xba_metrics['ba_diff'] < -0.020:
print(" → Player is underperforming contact quality (unlucky or poor speed)")
else:
print(" → Performance matches contact quality")
print(f"\nxBA by Exit Velocity:")
print(xba_metrics['xba_by_ev'])
print("\nLuckiest Hits (Low xBA but resulted in hit):")
print(xba_metrics['lucky_hits'].to_string(index=False))
def calculate_xwoba_metrics(df):
"""
Analyze expected wOBA (xwOBA) compared to actual wOBA.
Note: Statcast provides 'estimated_woba_using_speedangle' which
represents the expected wOBA for each batted ball.
"""
# Calculate actual wOBA (simplified - using typical weights)
woba_weights = {
'walk': 0.69,
'hit_by_pitch': 0.72,
'single': 0.88,
'double': 1.24,
'triple': 1.56,
'home_run': 2.08
}
df_calc = df.copy()
df_calc['woba_value'] = df_calc['events'].map(woba_weights).fillna(0)
# Count plate appearances (excluding certain events)
pa_events = ~df_calc['events'].isin(['caught_stealing_2b', 'caught_stealing_3b',
'caught_stealing_home', 'pickoff_1b',
'pickoff_2b', 'pickoff_3b'])
actual_woba = df_calc.loc[pa_events, 'woba_value'].sum() / pa_events.sum()
# Expected wOBA from Statcast
batted_balls_xwoba = df_calc[df_calc['estimated_woba_using_speedangle'].notna()]
if len(batted_balls_xwoba) > 0:
xwoba = batted_balls_xwoba['estimated_woba_using_speedangle'].mean()
else:
xwoba = None
metrics = {
'actual_wOBA': actual_woba,
'xwOBA': xwoba,
'woba_diff': actual_woba - xwoba if xwoba else None,
'batted_balls': len(batted_balls_xwoba)
}
# xwOBA by launch angle category
if len(batted_balls_xwoba) > 0:
batted_balls_xwoba_copy = batted_balls_xwoba.copy()
batted_balls_xwoba_copy['la_category'] = pd.cut(
batted_balls_xwoba_copy['launch_angle'],
bins=[-90, 10, 25, 50, 90],
labels=['Ground Ball', 'Line Drive', 'Fly Ball', 'Pop-up']
)
xwoba_by_la = batted_balls_xwoba_copy.groupby('la_category').agg({
'estimated_woba_using_speedangle': 'mean',
'launch_speed': ['mean', 'count']
}).round(3)
xwoba_by_la.columns = ['xwOBA', 'Avg_EV', 'Count']
metrics['xwoba_by_la'] = xwoba_by_la
return metrics
xwoba_metrics = calculate_xwoba_metrics(statcast_data)
print("\nExpected wOBA (xwOBA) Analysis")
print("=" * 60)
print(f"Actual wOBA: {xwoba_metrics['actual_wOBA']:.3f}")
print(f"Expected wOBA (xwOBA): {xwoba_metrics['xwOBA']:.3f}")
print(f"Difference (Actual - Expected): {xwoba_metrics['woba_diff']:+.3f}")
if xwoba_metrics['woba_diff'] > 0.020:
print(" → Outperforming expected - possibly lucky or elite speed")
elif xwoba_metrics['woba_diff'] < -0.020:
print(" → Underperforming expected - regression likely upcoming")
else:
print(" → Performance matches expectations")
print("\nxwOBA by Batted Ball Type:")
print(xwoba_metrics['xwoba_by_la'])
# Interpretation guide
print("\nxwOBA Scale:")
print(" Excellent: .390+")
print(" Great: .360 - .389")
print(" Above Average: .330 - .359")
print(" Average: .310 - .329")
print(" Below Average: .290 - .309")
print(" Poor: < .290")
def identify_regression_candidates(df, threshold=0.030):
    """
    Identify players likely to regress based on xwOBA difference.
    Parameters:
        df: Statcast data
        threshold: Minimum difference to flag (default .030)
    Returns:
        Regression prediction and analysis (a dict), or a message string
        when there is no xwOBA data to compare against
    """
    xwoba_metrics = calculate_xwoba_metrics(df)
    if xwoba_metrics['xwOBA'] is None:
        return "Insufficient data for analysis"
    diff = xwoba_metrics['woba_diff']
    actual = xwoba_metrics['actual_wOBA']
    expected = xwoba_metrics['xwOBA']
    analysis = {
        'actual_wOBA': actual,
        'expected_wOBA': expected,
        'difference': diff,
        'regression_likely': abs(diff) >= threshold
    }
    if diff >= threshold:
        analysis['prediction'] = 'NEGATIVE REGRESSION LIKELY'
        analysis['reason'] = f'Actual wOBA ({actual:.3f}) significantly exceeds xwOBA ({expected:.3f})'
        analysis['action'] = 'SELL HIGH - Performance likely unsustainable'
    elif diff <= -threshold:
        analysis['prediction'] = 'POSITIVE REGRESSION LIKELY'
        analysis['reason'] = f'Actual wOBA ({actual:.3f}) significantly below xwOBA ({expected:.3f})'
        analysis['action'] = 'BUY LOW - Improvement expected'
    else:
        analysis['prediction'] = 'PERFORMANCE SUSTAINABLE'
        analysis['reason'] = 'Actual and expected wOBA closely aligned'
        analysis['action'] = 'HOLD - What you see is what you get'
    return analysis

regression_analysis = identify_regression_candidates(statcast_data)
print("\n" + "=" * 60)
print("REGRESSION ANALYSIS")
print("=" * 60)
if isinstance(regression_analysis, dict):
    for key, value in regression_analysis.items():
        print(f"{key}: {value}")
else:
    # The function returns a plain message when no xwOBA data is available
    print(regression_analysis)
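The xwOBA scale printed above can also be applied programmatically. The following is a minimal sketch: the helper name `classify_xwoba` is hypothetical, and the cutoffs are simply the ones listed in the interpretation guide, not official Statcast tiers.

```python
def classify_xwoba(xwoba):
    """Map an xwOBA value to the tier labels from the interpretation guide."""
    # (lower bound, label), checked from the highest tier down
    tiers = [
        (0.390, "Excellent"),
        (0.360, "Great"),
        (0.330, "Above Average"),
        (0.310, "Average"),
        (0.290, "Below Average"),
    ]
    for lower_bound, label in tiers:
        if xwoba >= lower_bound:
            return label
    return "Poor"


# Illustrative values only
for value in (0.405, 0.345, 0.315, 0.270):
    print(f"{value:.3f} -> {classify_xwoba(value)}")
```

Keeping the cutoffs in one list makes it easy to swap in season-specific thresholds if the league run environment changes.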
6.6.1 Calculating Spray Angle
Spray Angle (also called Hit Direction) measures the horizontal angle at which a ball is hit:
- Negative angles: Opposite field (right field for RHH, left field for LHH)
- Zero degrees: Straightaway center field
- Positive angles: Pull side (left field for RHH, right field for LHH)
Statcast provides the hc_x and hc_y coordinates of where the ball landed or was fielded. These are image-style coordinates: hc_x increases toward right field and hc_y decreases as the ball travels toward the outfield. We can calculate spray angle from these coordinates:
import numpy as np
import pandas as pd

def calculate_spray_angle(hc_x, hc_y, batter_side):
    """
    Calculate spray angle from hit coordinates.
    Parameters:
        hc_x: Horizontal coordinate (Statcast coordinate system)
        hc_y: Vertical coordinate (Statcast coordinate system)
        batter_side: 'R' for right-handed, 'L' for left-handed
    Returns:
        Spray angle in degrees (positive = pull side, negative = opposite field)
    """
    # Convert Statcast coordinates to spray angle
    # Home plate is roughly at (125, 205) in Statcast coordinates
    home_x, home_y = 125, 205
    # Calculate relative position. hc_y shrinks as the ball travels toward
    # the outfield, so subtract it from home_y to get a positive depth.
    rel_x = hc_x - home_x
    rel_y = home_y - hc_y
    # Calculate angle in radians, then convert to degrees.
    # The raw angle is 0 toward center field and positive toward right field.
    angle_rad = np.arctan2(rel_x, rel_y)
    angle_deg = np.degrees(angle_rad)
    # Adjust for batter handedness so positive always means pull side.
    # A RHH pulls to left field (negative raw angle), so flip the sign for RHH.
    if batter_side == 'R':
        angle_deg = -angle_deg
    return angle_deg

def categorize_spray_direction(spray_angle):
    """
    Categorize spray angle into pull, center, opposite field.
    Standard definitions:
    - Pull: > 15 degrees
    - Center: -15 to 15 degrees
    - Opposite: < -15 degrees
    """
    if spray_angle > 15:
        return 'Pull'
    elif spray_angle < -15:
        return 'Opposite'
    else:
        return 'Center'

# Example: Add spray metrics to dataframe
def add_spray_metrics(df):
    """Add spray angle and direction to Statcast dataframe."""
    df_spray = df.copy()
    # Calculate spray angle for each batted ball with known hit coordinates
    df_spray['spray_angle'] = df_spray.apply(
        lambda row: calculate_spray_angle(
            row['hc_x'], row['hc_y'], row['stand']
        ) if pd.notna(row['hc_x']) and pd.notna(row['hc_y']) else None,
        axis=1
    )
    # Categorize direction
    df_spray['spray_direction'] = df_spray['spray_angle'].apply(
        lambda x: categorize_spray_direction(x) if pd.notna(x) else None
    )
    return df_spray

# Apply to our data
statcast_with_spray = add_spray_metrics(statcast_data)
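Row-wise `apply` is easy to read but slow on a full season of pitches. As a sketch of a vectorized alternative, the hypothetical helper `spray_angle_vectorized` below works on whole arrays at once. It assumes the same approximate home-plate location of (125, 205), that `hc_y` decreases toward the outfield, and that positive angles should mean the pull side; the coordinates in the usage lines are toy values, not real batted balls.

```python
import numpy as np

def spray_angle_vectorized(hc_x, hc_y, stand, home_x=125.0, home_y=205.0):
    """Vectorized spray angle in degrees; positive = pull side (assumed convention)."""
    hc_x = np.asarray(hc_x, dtype=float)
    hc_y = np.asarray(hc_y, dtype=float)
    stand = np.asarray(stand)
    # Raw angle from the home-to-center-field line; positive points to right field
    raw = np.degrees(np.arctan2(hc_x - home_x, home_y - hc_y))
    # Right-handed hitters pull to left field, so flip their sign
    return np.where(stand == 'R', -raw, raw)

# Toy coordinates: straightaway center, right field, left field
toy_x = [125.0, 175.0, 75.0]
toy_y = [105.0, 155.0, 155.0]
print("RHH:", spray_angle_vectorized(toy_x, toy_y, ['R', 'R', 'R']))
print("LHH:", spray_angle_vectorized(toy_x, toy_y, ['L', 'L', 'L']))
```

Missing coordinates propagate as NaN, so the result can be assigned straight to a DataFrame column and categorized afterwards.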
6.6.2 Pull, Center, Opposite Field Breakdown
Understanding a hitter's spray tendencies is crucial for:
- Defensive positioning: Extreme pull hitters invite shifts
- Power assessment: Most home runs are pulled
- Pitch approach: Pull-heavy hitters struggle with away pitches
- Development: Learning to use the whole field
def analyze_spray_tendencies(df):
    """
    Comprehensive spray chart analysis.
    Analyzes distribution and performance by spray direction.
    """
    # Add spray metrics if not already present
    if 'spray_direction' not in df.columns:
        df = add_spray_metrics(df)
    spray_data = df[df['spray_direction'].notna()].copy()
    if len(spray_data) == 0:
        return None
    # Distribution analysis
    distribution = spray_data['spray_direction'].value_counts(normalize=True) * 100
    # Performance by direction
    performance = spray_data.groupby('spray_direction').agg({
        'events': lambda x: (x.isin(['single', 'double', 'triple', 'home_run'])).mean(),
        'estimated_ba_using_speedangle': 'mean',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'launch_angle': 'mean'
    }).round(3)
    performance.columns = ['BA', 'xBA', 'xwOBA', 'Avg_EV', 'Avg_LA']
    performance['Percentage'] = distribution.round(1)
    # Home runs by direction
    hr_data = spray_data[spray_data['events'] == 'home_run']
    hr_distribution = hr_data['spray_direction'].value_counts()
    metrics = {
        'distribution': distribution.to_dict(),
        'performance': performance,
        'hr_distribution': hr_distribution.to_dict(),
        'total_batted_balls': len(spray_data),
        'total_hr': len(hr_data)
    }
    # Pull tendency score (-100 to +100)
    # +100 = extreme pull, -100 = extreme opposite field
    pull_pct = distribution.get('Pull', 0)
    oppo_pct = distribution.get('Opposite', 0)
    metrics['pull_tendency_score'] = pull_pct - oppo_pct
    return metrics

spray_analysis = analyze_spray_tendencies(statcast_with_spray)
print("\nSpray Chart Analysis")
print("=" * 60)
print("\nBatted Ball Distribution:")
for direction, pct in spray_analysis['distribution'].items():
    print(f" {direction}: {pct:.1f}%")
print(f"\nPull Tendency Score: {spray_analysis['pull_tendency_score']:.1f}")
if spray_analysis['pull_tendency_score'] > 20:
    print(" → PULL-HEAVY hitter (vulnerable to shifts)")
elif spray_analysis['pull_tendency_score'] < -20:
    print(" → OPPOSITE FIELD hitter (uses whole field)")
else:
    print(" → BALANCED spray approach")
print("\nPerformance by Direction:")
print(spray_analysis['performance'])
print("\nHome Run Distribution:")
for direction, count in spray_analysis['hr_distribution'].items():
    pct = (count / spray_analysis['total_hr']) * 100
    print(f" {direction}: {count} ({pct:.1f}%)")
# R version: Spray analysis
library(dplyr)    # provides %>%, filter(), mutate(), case_when(), summarise()
library(ggplot2)
calculate_spray_angle <- function(hc_x, hc_y, batter_side) {
  home_x <- 125
  home_y <- 205
  rel_x <- hc_x - home_x
  # hc_y decreases toward the outfield, so depth is home_y - hc_y
  rel_y <- home_y - hc_y
  angle_rad <- atan2(rel_x, rel_y)
  angle_deg <- angle_rad * 180 / pi
  # Raw angle is positive toward right field; flip for RHH so positive = pull
  if (batter_side == 'R') {
    angle_deg <- -angle_deg
  }
  return(angle_deg)
}
analyze_spray_tendencies <- function(df) {
  spray_data <- df %>%
    filter(!is.na(hc_x), !is.na(hc_y)) %>%
    rowwise() %>%
    mutate(
      spray_angle = calculate_spray_angle(hc_x, hc_y, stand),
      spray_direction = case_when(
        spray_angle > 15 ~ 'Pull',
        spray_angle < -15 ~ 'Opposite',
        TRUE ~ 'Center'
      )
    ) %>%
    ungroup()
  # Distribution
  distribution <- spray_data %>%
    count(spray_direction) %>%
    mutate(percentage = n / sum(n) * 100)
  # Performance by direction
  performance <- spray_data %>%
    group_by(spray_direction) %>%
    summarise(
      BA = mean(events %in% c('single', 'double', 'triple', 'home_run'),
                na.rm = TRUE),
      xBA = mean(estimated_ba_using_speedangle, na.rm = TRUE),
      xwOBA = mean(estimated_woba_using_speedangle, na.rm = TRUE),
      avg_ev = mean(launch_speed, na.rm = TRUE),
      avg_la = mean(launch_angle, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    ) %>%
    # Use a lambda: passing extra arguments through across() is deprecated
    mutate(across(c(BA, xBA, xwOBA, avg_ev, avg_la), ~ round(.x, 3)))
  # Pull tendency score
  pull_pct <- distribution %>%
    filter(spray_direction == 'Pull') %>%
    pull(percentage)
  oppo_pct <- distribution %>%
    filter(spray_direction == 'Opposite') %>%
    pull(percentage)
  pull_tendency <- ifelse(length(pull_pct) > 0, pull_pct, 0) -
    ifelse(length(oppo_pct) > 0, oppo_pct, 0)
  list(
    distribution = distribution,
    performance = performance,
    pull_tendency_score = pull_tendency,
    spray_data = spray_data
  )
}
spray_analysis <- analyze_spray_tendencies(judge_data)
cat("\nSpray Chart Analysis\n")
cat(strrep("=", 60), "\n")
cat("\nBatted Ball Distribution:\n")
print(spray_analysis$distribution)
cat(sprintf("\nPull Tendency Score: %.1f\n", spray_analysis$pull_tendency_score))
cat("\nPerformance by Direction:\n")
print(spray_analysis$performance)
6.6.3 The Shift Era and Its End
From approximately 2015-2022, MLB experienced the "Shift Era" where defensive positioning became increasingly extreme, particularly against pull-heavy left-handed hitters. Teams would position three or even four infielders on the pull side, creating a massive disadvantage for hitters who couldn't adjust.
Impact of Shifts:
- Pull-heavy hitters saw BABIP drops of 20-40 points
- Ground ball pull hitters were most affected
- Created incentive for launch angle revolution (hit over the shift)
- Some hitters learned to hit opposite field, others refused to adjust
2023 Rule Change:
MLB banned extreme shifts starting in 2023, requiring:
- Two infielders on each side of second base
- All infielders on the infield dirt when pitch is released
Post-Shift Results:
- BABIP increased league-wide by ~10 points
- Pull-heavy hitters benefited most
- Batting averages rose across the board
- Reduced the penalty for being pull-dominant
For historical analysis (2015-2022 data), spray tendencies were crucial for understanding player value. Post-2023, they're less impactful but still relevant for hitting approach and pitch coverage.
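One way to check the rule change against your own data is to compare the hit rate on pulled ground balls before and after 2023. The sketch below assumes a Statcast-style DataFrame with `game_year`, `launch_angle`, and `events` columns plus a `spray_direction` column like the one built in Section 6.6.1. The function name `pulled_gb_hit_rate_by_era` and the toy rows are hypothetical, and the 10-degree ground ball cutoff mirrors the one used elsewhere in this chapter.

```python
import pandas as pd

HIT_EVENTS = ['single', 'double', 'triple', 'home_run']

def pulled_gb_hit_rate_by_era(df, ban_year=2023):
    """Hit rate on pulled ground balls, split at the shift-restriction season."""
    gb = df[(df['launch_angle'] < 10) & (df['spray_direction'] == 'Pull')].copy()
    gb['era'] = gb['game_year'].apply(
        lambda year: 'Shift restricted' if year >= ban_year else 'Shift allowed'
    )
    gb['is_hit'] = gb['events'].isin(HIT_EVENTS)
    return gb.groupby('era')['is_hit'].agg(['mean', 'count']).rename(
        columns={'mean': 'hit_rate', 'count': 'pulled_gb'}
    )

# Toy rows for illustration only (not real results)
toy = pd.DataFrame({
    'game_year': [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
    'launch_angle': [-5, 2, 8, 30, -3, 4, 6, 1],
    'spray_direction': ['Pull'] * 8,
    'events': ['field_out', 'field_out', 'single', 'home_run',
               'single', 'field_out', 'single', 'field_out'],
})
print(pulled_gb_hit_rate_by_era(toy))
```

For a single hitter the samples on each side of the split are small, so differences of a few points are not meaningful on their own.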
6.7.1 Understanding Sprint Speed
Sprint Speed measures a player's maximum running speed in feet per second (ft/s). Unlike stolen base totals (which depend on opportunity and decision-making), sprint speed is a pure athleticism metric.
Statcast defines sprint speed as feet per second in a player's fastest one-second window on competitive runs.
Competitive runs include:
- Home to first on weakly hit or topped balls (ground balls, bunt hits)
- First to third on singles
- Second to home on singles
- First to home on doubles
- Other advances of two or more bases on balls that are not home runs
Sprint Speed Scale:
| Category | Sprint Speed (ft/s) | Examples |
|---|---|---|
| Elite | 30+ | Bobby Witt Jr., Elly De La Cruz, Ronald Acuña Jr. |
| Plus | 28.5 - 29.9 | Trea Turner, Jazz Chisholm, CJ Abrams |
| Above Average | 27.5 - 28.4 | Mookie Betts, Francisco Lindor |
| Average | 27.0 - 27.4 | League average |
| Below Average | 26.0 - 26.9 | Many DHs and corner players |
| Poor | < 26.0 | Slow-footed power hitters |
The MLB average sprint speed is approximately 27 ft/s (about 18.4 mph).
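The unit conversion and the scale table are easy to reproduce in code. This is a small sketch: the function names `ft_per_sec_to_mph` and `classify_sprint_speed` are hypothetical, and the cutoffs are the lower bounds from the table above.

```python
def ft_per_sec_to_mph(ft_per_sec):
    """Convert feet per second to miles per hour (5280 ft per mile, 3600 s per hour)."""
    return ft_per_sec * 3600 / 5280

def classify_sprint_speed(ft_per_sec):
    """Map sprint speed to the categories in the sprint speed scale table."""
    # (lower bound in ft/s, label), checked from the fastest tier down
    tiers = [
        (30.0, "Elite"),
        (28.5, "Plus"),
        (27.5, "Above Average"),
        (27.0, "Average"),
        (26.0, "Below Average"),
    ]
    for lower_bound, label in tiers:
        if ft_per_sec >= lower_bound:
            return label
    return "Poor"

# Illustrative speeds only
for speed in (30.4, 27.0, 25.5):
    mph = ft_per_sec_to_mph(speed)
    print(f"{speed:.1f} ft/s = {mph:.1f} mph -> {classify_sprint_speed(speed)}")
```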
6.7.2 Sprint Speed Impact
Sprint speed affects baseball outcomes in multiple ways:
- Infield Hits: Fast runners beat out more ground balls
- BABIP: Higher speed = higher BABIP, especially on ground balls
- Extra Bases: Speed allows taking extra bases on hits
- Stolen Bases: Prerequisite for successful base stealing
- Defensive Range: Fast players cover more ground (for position players)
def analyze_sprint_speed_impact(df):
    """
    Analyze how sprint speed correlates with offensive outcomes.
    Note: Sprint speed data requires full season aggregation.
    This example shows the analytical approach.
    """
    # Filter for ground balls (most affected by speed)
    ground_balls = df[
        (df['launch_angle'].notna()) &
        (df['launch_angle'] < 10)
    ].copy()
    if len(ground_balls) == 0:
        return None
    # Analyze ground ball outcomes
    ground_balls['is_hit'] = ground_balls['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])
    gb_analysis = {
        'total_ground_balls': len(ground_balls),
        'gb_hits': ground_balls['is_hit'].sum(),
        'gb_hit_rate': ground_balls['is_hit'].mean(),
        'avg_ev_on_gb': ground_balls['launch_speed'].mean(),
        'infield_singles': len(ground_balls[
            (ground_balls['events'] == 'single') &
            (ground_balls['hit_distance_sc'] < 150)
        ])
    }
    # Calculate expected vs actual on ground balls
    if 'estimated_ba_using_speedangle' in ground_balls.columns:
        gb_xba = ground_balls['estimated_ba_using_speedangle'].mean()
        gb_actual = ground_balls['is_hit'].mean()
        gb_analysis['gb_xBA'] = gb_xba
        gb_analysis['gb_actual_BA'] = gb_actual
        gb_analysis['speed_boost'] = gb_actual - gb_xba
    return gb_analysis

speed_impact = analyze_sprint_speed_impact(statcast_data)
print("\nSprint Speed Impact Analysis")
print("=" * 60)
print(f"Total Ground Balls: {speed_impact['total_ground_balls']}")
print(f"Ground Ball Hit Rate: {speed_impact['gb_hit_rate']:.3f}")
print(f"Average EV on GB: {speed_impact['avg_ev_on_gb']:.1f} mph")
print(f"Infield Singles: {speed_impact['infield_singles']}")
if 'speed_boost' in speed_impact:
    print(f"\nGround Ball xBA: {speed_impact['gb_xBA']:.3f}")
    print(f"Ground Ball Actual BA: {speed_impact['gb_actual_BA']:.3f}")
    print(f"Speed Boost: {speed_impact['speed_boost']:+.3f}")
    if speed_impact['speed_boost'] > 0.030:
        print(" → Elite speed creating extra value on ground balls")
    elif speed_impact['speed_boost'] < -0.030:
        print(" → Poor speed costing value on ground balls")
6.7.3 xBA Adjustment for Speed
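One limitation of standard xBA is that it accounts for player speed only narrowly. It was originally built from exit velocity and launch angle alone, and since 2019 it has factored in sprint speed only on weakly hit and topped balls. Speed matters across ground balls more broadly. As a rough illustration, a 70 mph ground ball might carry different hit probabilities for:
- Elite speed (30 ft/s): ~.350 BA
- Average speed (27 ft/s): ~.250 BA
- Poor speed (25 ft/s): ~.180 BA
A simple way to experiment with a broader speed adjustment is to add a flat bonus or penalty to ground ball xBA: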
def calculate_speed_adjusted_xba(df, sprint_speed):
    """
    Adjust xBA based on player's sprint speed.
    This is a simplified, illustrative model. Statcast's xBA only factors in
    sprint speed on weakly hit and topped balls; this sketch applies a flat
    adjustment to every ground ball instead.
    Parameters:
        df: Statcast dataframe
        sprint_speed: Player's sprint speed in ft/s
    """
    batted_balls = df[df['estimated_ba_using_speedangle'].notna()].copy()
    # Speed adjustment factor
    # Average sprint speed is ~27 ft/s
    # Each 1 ft/s above/below average adds/subtracts ~.015 to GB xBA
    speed_adjustment = (sprint_speed - 27.0) * 0.015
    # Apply adjustment only to ground balls where speed matters most
    batted_balls['speed_adjusted_xba'] = batted_balls['estimated_ba_using_speedangle']
    ground_ball_mask = batted_balls['launch_angle'] < 10
    batted_balls.loc[ground_ball_mask, 'speed_adjusted_xba'] += speed_adjustment
    # Keep the adjusted values valid probabilities
    batted_balls['speed_adjusted_xba'] = batted_balls['speed_adjusted_xba'].clip(0, 1)
    # Calculate overall adjusted xBA
    standard_xba = batted_balls['estimated_ba_using_speedangle'].mean()
    adjusted_xba = batted_balls['speed_adjusted_xba'].mean()
    results = {
        'sprint_speed': sprint_speed,
        'standard_xBA': standard_xba,
        'speed_adjusted_xBA': adjusted_xba,
        'adjustment': adjusted_xba - standard_xba,
        'ground_ball_pct': ground_ball_mask.mean() * 100
    }
    return results

# Example for a fast player (e.g., Bobby Witt Jr. at 30.4 ft/s)
fast_player_xba = calculate_speed_adjusted_xba(statcast_data, sprint_speed=30.4)
print("\nSpeed-Adjusted xBA Analysis")
print("=" * 60)
print(f"Player Sprint Speed: {fast_player_xba['sprint_speed']:.1f} ft/s")
print(f"Standard xBA: {fast_player_xba['standard_xBA']:.3f}")
print(f"Speed-Adjusted xBA: {fast_player_xba['speed_adjusted_xBA']:.3f}")
print(f"Speed Value: {fast_player_xba['adjustment']:+.3f}")
print(f"Ground Ball Rate: {fast_player_xba['ground_ball_pct']:.1f}%")
Key Insight: Speed is most valuable for ground ball hitters. Fly ball power hitters gain minimal benefit from elite speed on batted balls (though it helps in baserunning).
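The arithmetic behind this insight is short enough to work through directly. Under the simplified flat-bonus model used in this section (an assumption of the chapter's example, not Statcast's method), the overall xBA change is the ground ball share multiplied by the per-ground-ball bonus. The helper name `overall_xba_shift` and the two ground ball shares below are hypothetical.

```python
def overall_xba_shift(sprint_speed, ground_ball_share,
                      league_avg_speed=27.0, per_ft_per_sec=0.015):
    """Overall xBA change when a flat bonus is added to ground balls only."""
    per_ground_ball = (sprint_speed - league_avg_speed) * per_ft_per_sec
    return ground_ball_share * per_ground_ball

# Same 30 ft/s runner, two different batted ball profiles
for label, gb_share in [("Ground ball hitter", 0.55), ("Fly ball hitter", 0.30)]:
    shift = overall_xba_shift(30.0, gb_share)
    print(f"{label} ({gb_share:.0%} ground balls): {shift:+.4f} xBA")
```

With identical speed, the hitter who puts more balls on the ground collects nearly twice the bonus, which is the point of the key insight above.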
6.8.1 Plate Coverage Analysis
Understanding where a hitter performs best in the strike zone reveals:
- Pitch coverage: Can they handle inside? Outside? Up? Down?
- Weaknesses: Where do they struggle? (Pitcher targeting zones)
- Approach adjustments: How have they adapted?
def analyze_plate_coverage(df):
    """
    Analyze performance by pitch location zones.
    Uses Statcast's zone classification:
    - Zones 1-9: Inside the strike zone
    - Zones 11-14: Outside the strike zone
    """
    pitch_data = df[df['zone'].notna()].copy()
    # Pitch descriptions that count as a swing
    swing_descriptions = ['hit_into_play', 'foul', 'foul_tip',
                          'swinging_strike', 'swinging_strike_blocked']
    hit_events = ['single', 'double', 'triple', 'home_run']
    pitch_data['is_swing'] = pitch_data['description'].isin(swing_descriptions)
    pitch_data['is_in_play'] = pitch_data['type'] == 'X'
    pitch_data['is_hit'] = pitch_data['events'].isin(hit_events)

    # Categorize zones
    def categorize_zone(zone):
        if zone in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
            return 'In Zone'
        elif zone in [11, 12, 13, 14]:
            return 'Out of Zone'
        else:
            return 'Other'

    pitch_data['zone_category'] = pitch_data['zone'].apply(categorize_zone)
    # Overall zone performance (counts)
    zone_performance = pitch_data.groupby('zone_category').agg(
        Pitches=('is_swing', 'size'),
        Swings=('is_swing', 'sum'),
        Balls_In_Play=('is_in_play', 'sum'),
        Hits=('is_hit', 'sum')
    )
    # More detailed zone breakdown
    detailed_zones = pitch_data.groupby('zone').agg(
        Pitches=('is_swing', 'size'),
        Swing_Rate=('is_swing', 'mean'),
        Swings=('is_swing', 'sum'),
        Hits=('is_hit', 'sum')
    )
    # Hit rate when swinging = hits / swings (NaN where the hitter never swung)
    detailed_zones['Hit_Rate'] = detailed_zones['Hits'] / detailed_zones['Swings']
    detailed_zones = detailed_zones[['Pitches', 'Swing_Rate', 'Hit_Rate']].round(3)
    # High/Low/Middle breakdown
    pitch_data['vertical_location'] = pd.cut(
        pitch_data['plate_z'],
        bins=[0, 2.0, 3.5, 5.0],
        labels=['Low', 'Middle', 'High']
    )
    # plate_x is measured from the catcher's view, so negative values are
    # inside to a right-handed hitter but outside to a left-handed one.
    # Mirror plate_x for LHH so the Inside/Outside labels hold for both.
    batter_plate_x = pitch_data['plate_x'].where(
        pitch_data['stand'] == 'R', -pitch_data['plate_x']
    )
    pitch_data['horizontal_location'] = pd.cut(
        batter_plate_x,
        bins=[-2.5, -0.5, 0.5, 2.5],
        labels=['Inside', 'Middle', 'Outside']
    )
    location_performance = pitch_data.groupby(
        ['vertical_location', 'horizontal_location'], observed=False
    ).agg({
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'type': 'count'
    }).round(3)
    location_performance.columns = ['xwOBA', 'Avg_EV', 'Pitches']
    return {
        'zone_performance': zone_performance,
        'detailed_zones': detailed_zones,
        'location_performance': location_performance
    }

coverage_analysis = analyze_plate_coverage(statcast_data)
print("\nPlate Coverage Analysis")
print("=" * 60)
print("\nPerformance by Zone Category:")
print(coverage_analysis['zone_performance'])
print("\nDetailed Zone Breakdown:")
print(coverage_analysis['detailed_zones'])
print("\nPerformance by Location (Vertical × Horizontal):")
print(coverage_analysis['location_performance'])
# R version: Plate coverage
analyze_plate_coverage <- function(df) {
  # Pitch descriptions that count as a swing
  swing_descriptions <- c('hit_into_play', 'foul', 'foul_tip',
                          'swinging_strike', 'swinging_strike_blocked')
  pitch_data <- df %>%
    filter(!is.na(zone)) %>%
    mutate(
      zone_category = case_when(
        zone %in% 1:9 ~ 'In Zone',
        zone %in% 11:14 ~ 'Out of Zone',
        TRUE ~ 'Other'
      ),
      vertical_location = cut(
        plate_z,
        breaks = c(0, 2.0, 3.5, 5.0),
        labels = c('Low', 'Middle', 'High')
      ),
      # plate_x is from the catcher's view; mirror it for LHH so that
      # negative values always mean inside
      batter_plate_x = if_else(stand == 'R', plate_x, -plate_x),
      horizontal_location = cut(
        batter_plate_x,
        breaks = c(-2.5, -0.5, 0.5, 2.5),
        labels = c('Inside', 'Middle', 'Outside')
      )
    )
  # Zone performance
  zone_performance <- pitch_data %>%
    group_by(zone_category) %>%
    summarise(
      pitches = n(),
      swing_rate = mean(description %in% swing_descriptions, na.rm = TRUE),
      # Share of all pitches put in play (not contact per swing)
      in_play_rate = mean(type == 'X', na.rm = TRUE),
      .groups = 'drop'
    )
  # Location performance
  location_performance <- pitch_data %>%
    filter(!is.na(vertical_location), !is.na(horizontal_location)) %>%
    group_by(vertical_location, horizontal_location) %>%
    summarise(
      xwOBA = mean(estimated_woba_using_speedangle, na.rm = TRUE),
      avg_ev = mean(launch_speed, na.rm = TRUE),
      pitches = n(),
      .groups = 'drop'
    ) %>%
    # Use a lambda: passing extra arguments through across() is deprecated
    mutate(across(c(xwOBA, avg_ev), ~ round(.x, 3)))
  list(
    zone_performance = zone_performance,
    location_performance = location_performance
  )
}
coverage <- analyze_plate_coverage(judge_data)
print("Plate Coverage Analysis:")
print(coverage$zone_performance)
print(coverage$location_performance)
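To locate the weaknesses mentioned at the start of this section, a whiff rate (swinging strikes per swing) by zone is often more telling than swing rate alone. The sketch below is one way to compute it. `whiff_rate_by_zone` is a hypothetical helper, the description labels are assumed to follow the Baseball Savant naming used elsewhere in this chapter, treating foul tips as contact is a choice rather than a standard, and the toy rows are made up.

```python
import pandas as pd

WHIFFS = ['swinging_strike', 'swinging_strike_blocked']
SWINGS = WHIFFS + ['foul', 'foul_tip', 'hit_into_play']

def whiff_rate_by_zone(df):
    """Whiffs per swing for each Statcast zone, worst zones first."""
    swings = df[df['description'].isin(SWINGS)].copy()
    swings['is_whiff'] = swings['description'].isin(WHIFFS)
    out = swings.groupby('zone')['is_whiff'].agg(['sum', 'count'])
    out.columns = ['whiffs', 'swings']
    out['whiff_rate'] = out['whiffs'] / out['swings']
    return out.sort_values('whiff_rate', ascending=False)

# Toy pitches for illustration only
toy = pd.DataFrame({
    'zone': [5, 5, 5, 5, 13, 13, 13, 14],
    'description': ['hit_into_play', 'foul', 'swinging_strike', 'called_strike',
                    'swinging_strike', 'swinging_strike_blocked', 'foul', 'ball'],
})
print(whiff_rate_by_zone(toy))
```

Zones where the hitter never swung drop out of the result, so a missing zone means no swings rather than a zero whiff rate.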
6.8.2 Performance by Pitch Type
Different hitters have different strengths against pitch types:
def analyze_pitch_type_performance(df):
    """
    Analyze performance against different pitch types.
    Common pitch types:
    - FF: Four-seam fastball
    - SI: Sinker
    - SL: Slider
    - CH: Changeup
    - CU: Curveball
    - FC: Cutter
    """
    pitch_data = df[df['pitch_type'].notna()].copy()
    # Pitch descriptions that count as a swing
    swing_descriptions = ['swinging_strike', 'swinging_strike_blocked',
                          'foul', 'foul_tip', 'hit_into_play']

    def strikeout_rate(events):
        # events is only filled in on the last pitch of a plate appearance,
        # so divide strikeouts by completed plate appearances, not by pitches
        plate_appearances = events.notna().sum()
        if plate_appearances == 0:
            return float('nan')
        return (events == 'strikeout').sum() / plate_appearances

    performance_by_pitch = pitch_data.groupby('pitch_type').agg({
        'type': 'count',  # Total pitches
        'description': lambda x: x.isin(swing_descriptions).mean(),  # Swing%
        'events': strikeout_rate,  # K rate per PA ending on this pitch type
        'estimated_woba_using_speedangle': 'mean',  # xwOBA
        'launch_speed': 'mean',  # Avg EV
        'launch_angle': 'mean'  # Avg LA
    }).round(3)
    performance_by_pitch.columns = ['Pitches', 'Swing%', 'K%', 'xwOBA',
                                    'Avg_EV', 'Avg_LA']
    performance_by_pitch = performance_by_pitch.sort_values('Pitches',
                                                            ascending=False)
    # Fastball vs. Offspeed
    # 'Offspeed' here lumps breaking balls and offspeed pitches together
    pitch_data['pitch_category'] = pitch_data['pitch_type'].apply(
        lambda x: 'Fastball' if x in ['FF', 'SI', 'FC'] else 'Offspeed'
    )
    category_performance = pitch_data.groupby('pitch_category').agg({
        'type': 'count',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'events': strikeout_rate
    }).round(3)
    category_performance.columns = ['Pitches', 'xwOBA', 'Avg_EV', 'K%']
    return {
        'by_pitch_type': performance_by_pitch,
        'by_category': category_performance
    }

pitch_performance = analyze_pitch_type_performance(statcast_data)
print("\nPitch Type Performance Analysis")
print("=" * 60)
print("\nPerformance by Specific Pitch Type:")
print(pitch_performance['by_pitch_type'])
print("\nFastball vs. Offspeed:")
print(pitch_performance['by_category'])
6.8.3 Count-Based Performance
How hitters perform in different counts reveals approach and discipline:
def analyze_count_performance(df):
    """
    Analyze performance by ball-strike count.
    Key counts:
    - Hitter's counts: 1-0, 2-0, 3-0, 2-1, 3-1
    - Pitcher's counts: 0-1, 0-2, 1-2, 2-2
    - Even counts: 0-0, 1-1 (the 3-2 full count also falls in this bucket)
    """
    count_data = df[df['balls'].notna() & df['strikes'].notna()].copy()
    count_data['count_string'] = (count_data['balls'].astype(int).astype(str) +
                                  '-' +
                                  count_data['strikes'].astype(int).astype(str))
    # Pitch descriptions that count as a swing
    swing_descriptions = ['swinging_strike', 'swinging_strike_blocked',
                          'foul', 'foul_tip', 'hit_into_play']

    # Categorize counts
    def categorize_count(count_str):
        hitters_counts = ['1-0', '2-0', '3-0', '2-1', '3-1']
        pitchers_counts = ['0-1', '0-2', '1-2', '2-2']
        if count_str in hitters_counts:
            return "Hitter's Count"
        elif count_str in pitchers_counts:
            return "Pitcher's Count"
        else:
            return "Even Count"

    count_data['count_category'] = count_data['count_string'].apply(categorize_count)
    # Performance by count category
    category_performance = count_data.groupby('count_category').agg({
        'type': 'count',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean',
        'description': lambda x: x.isin(swing_descriptions).mean()
    }).round(3)
    category_performance.columns = ['Pitches', 'xwOBA', 'Avg_EV', 'Swing%']
    # Detailed count performance
    detailed_counts = count_data.groupby('count_string').agg({
        'type': 'count',
        'estimated_woba_using_speedangle': 'mean',
        'launch_speed': 'mean'
    }).round(3)
    detailed_counts.columns = ['Pitches', 'xwOBA', 'Avg_EV']
    detailed_counts = detailed_counts.sort_values('Pitches', ascending=False)
    return {
        'by_category': category_performance,
        'by_count': detailed_counts
    }

count_performance = analyze_count_performance(statcast_data)
print("\nCount-Based Performance Analysis")
print("=" * 60)
print("\nPerformance by Count Category:")
print(count_performance['by_category'])
print("\nTop 10 Counts by Frequency:")
print(count_performance['by_count'].head(10))
# R version: Plate coverage
analyze_plate_coverage <- function(df) {
pitch_data <- df %>%
filter(!is.na(zone)) %>%
mutate(
zone_category = case_when(
zone %in% 1:9 ~ 'In Zone',
zone %in% 11:14 ~ 'Out of Zone',
TRUE ~ 'Other'
),
vertical_location = cut(
plate_z,
breaks = c(0, 2.0, 3.5, 5.0),
labels = c('Low', 'Middle', 'High')
),
horizontal_location = cut(
plate_x,
breaks = c(-2.5, -0.5, 0.5, 2.5),
labels = c('Inside', 'Middle', 'Outside')
)
)
# Zone performance
zone_performance <- pitch_data %>%
group_by(zone_category) %>%
summarise(
pitches = n(),
swing_rate = mean(description %in% c('hit_into_play', 'foul',
'swinging_strike'), na.rm = TRUE),
contact_rate = mean(type == 'X', na.rm = TRUE),
.groups = 'drop'
)
# Location performance
location_performance <- pitch_data %>%
filter(!is.na(vertical_location), !is.na(horizontal_location)) %>%
group_by(vertical_location, horizontal_location) %>%
summarise(
xwOBA = mean(estimated_woba_using_speedangle, na.rm = TRUE),
avg_ev = mean(launch_speed, na.rm = TRUE),
pitches = n(),
.groups = 'drop'
) %>%
    # Use a lambda; passing extra arguments through across() is deprecated in recent dplyr
    mutate(across(c(xwOBA, avg_ev), ~ round(.x, 3)))
list(
zone_performance = zone_performance,
location_performance = location_performance
)
}
coverage <- analyze_plate_coverage(judge_data)
print("Plate Coverage Analysis:")
print(coverage$zone_performance)
print(coverage$location_performance)
6.9.1 The Essential Stats for a Profile
A complete Statcast hitter profile should include:
- Power Metrics: Exit velocity, hard-hit%, barrel%, max EV
- Contact Quality: xBA, xwOBA, sweet spot%
- Approach: Launch angle, GB/LD/FB distribution
- Speed: Sprint speed, impact on BABIP
- Spray: Pull tendency, field usage
- Discipline: Plate coverage, pitch type splits
- Context: Expected vs. actual stats (luck/regression indicators)
def create_complete_hitter_profile(df, player_name, sprint_speed=27.0):
    """
    Generate a comprehensive Statcast hitter profile.

    Parameters:
        df: Statcast dataframe for the player (one row per pitch)
        player_name: Player's name
        sprint_speed: Player's sprint speed (if available)

    Returns:
        Dictionary containing complete profile
    """
    profile = {'player_name': player_name}

    # 1. Power Metrics
    ev_metrics = calculate_ev_metrics(df)
    profile['power'] = {
        'avg_exit_velocity': ev_metrics['avg_ev'],
        'max_exit_velocity': ev_metrics['max_ev'],
        'hard_hit_rate': ev_metrics['hard_hit_pct'],
        'ev_90th_percentile': ev_metrics['ev_90th_percentile']
    }

    # 2. Barrel Metrics
    barrel_metrics = calculate_barrel_metrics(df)
    profile['barrels'] = {
        'barrel_rate': barrel_metrics['barrel_pct'],
        'barrel_pa_rate': barrel_metrics['barrel_pa_pct'],
        'avg_barrel_ev': barrel_metrics['avg_barrel_ev']
    }

    # 3. Launch Angle / Contact Type
    la_metrics = calculate_launch_angle_metrics(df)
    profile['batted_ball_profile'] = {
        'avg_launch_angle': la_metrics['avg_la'],
        'gb_rate': la_metrics['gb_pct'],
        'ld_rate': la_metrics['ld_pct'],
        'fb_rate': la_metrics['fb_pct'],
        'sweet_spot_rate': la_metrics['sweet_spot_pct']
    }

    # 4. Expected Stats
    xba_metrics = calculate_xba_metrics(df)
    xwoba_metrics = calculate_xwoba_metrics(df)
    profile['expected_stats'] = {
        'xBA': xba_metrics['xBA'],
        'xwOBA': xwoba_metrics['xwOBA'],
        'actual_wOBA': xwoba_metrics['actual_wOBA'],
        'woba_diff': xwoba_metrics['woba_diff']
    }

    # 5. Speed
    profile['speed'] = {
        'sprint_speed': sprint_speed,
        'speed_rating': ('Elite' if sprint_speed >= 30 else
                         'Plus' if sprint_speed >= 28.5 else
                         'Above Avg' if sprint_speed >= 27.5 else
                         'Average' if sprint_speed >= 27 else
                         'Below Avg')
    }

    # 6. Summary Statistics
    # Statcast data has one row per pitch, so len(df) would count pitches.
    # A plate appearance ends on the row where 'events' is populated.
    total_pa = int(df['events'].notna().sum())
    # Balls in play (type == 'X') with a tracked exit velocity; this excludes fouls
    batted_balls = int(((df['type'] == 'X') & df['launch_speed'].notna()).sum())
    profile['summary'] = {
        'total_plate_appearances': total_pa,
        'total_batted_balls': batted_balls,
        'batted_ball_rate': (batted_balls / total_pa * 100) if total_pa > 0 else 0
    }
    return profile
def print_hitter_profile(profile):
    """Pretty print a hitter profile."""
    print("\n" + "=" * 70)
    print(f"STATCAST HITTER PROFILE: {profile['player_name']}")
    print("=" * 70)

    print("\n>>> POWER METRICS <<<")
    power = profile['power']
    print(f" Average Exit Velocity: {power['avg_exit_velocity']:.1f} mph")
    print(f" Maximum Exit Velocity: {power['max_exit_velocity']:.1f} mph")
    print(f" Hard-Hit Rate (95+ mph): {power['hard_hit_rate']:.1f}%")
    print(f" 90th Percentile EV: {power['ev_90th_percentile']:.1f} mph")

    print("\n>>> BARREL METRICS <<<")
    barrels = profile['barrels']
    print(f" Barrel Rate: {barrels['barrel_rate']:.1f}%")
    print(f" Barrels per PA: {barrels['barrel_pa_rate']:.1f}%")
    print(f" Avg Barrel Exit Velo: {barrels['avg_barrel_ev']:.1f} mph")

    print("\n>>> BATTED BALL PROFILE <<<")
    bb = profile['batted_ball_profile']
    print(f" Average Launch Angle: {bb['avg_launch_angle']:.1f}°")
    print(f" Ground Ball Rate: {bb['gb_rate']:.1f}%")
    print(f" Line Drive Rate: {bb['ld_rate']:.1f}%")
    print(f" Fly Ball Rate: {bb['fb_rate']:.1f}%")
    print(f" Sweet Spot Rate (8-32°): {bb['sweet_spot_rate']:.1f}%")

    print("\n>>> EXPECTED STATISTICS <<<")
    xstats = profile['expected_stats']
    print(f" Expected BA (xBA): {xstats['xBA']:.3f}")
    print(f" Expected wOBA (xwOBA): {xstats['xwOBA']:.3f}")
    print(f" Actual wOBA: {xstats['actual_wOBA']:.3f}")
    print(f" wOBA Difference: {xstats['woba_diff']:+.3f}", end="")
    if abs(xstats['woba_diff']) > 0.025:
        if xstats['woba_diff'] > 0:
            print(" (OUTPERFORMING - regression risk)")
        else:
            print(" (UNDERPERFORMING - positive regression likely)")
    else:
        print(" (sustainable)")

    print("\n>>> SPEED METRICS <<<")
    speed = profile['speed']
    print(f" Sprint Speed: {speed['sprint_speed']:.1f} ft/s ({speed['speed_rating']})")

    print("\n>>> SUMMARY <<<")
    summary = profile['summary']
    print(f" Total Plate Appearances: {summary['total_plate_appearances']}")
    print(f" Total Batted Balls: {summary['total_batted_balls']}")
    print(f" Batted Ball Rate: {summary['batted_ball_rate']:.1f}%")
    print("\n" + "=" * 70)

# Create and display profile
player_profile = create_complete_hitter_profile(
    statcast_data,
    "Aaron Judge",
    sprint_speed=27.5  # Judge's approximate sprint speed
)
print_hitter_profile(player_profile)
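This profile gives scouts, analysts, and fantasy players a fuller picture of a hitter's underlying contact quality. Because it is built on how the ball was hit rather than on what happened afterward, it is less sensitive to defense, park, and luck than results-based statistics, though no single season of data isolates true talent completely.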
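The regression indicator in the printed profile comes down to a simple comparison. As a minimal sketch, assuming the difference is defined as actual wOBA minus xwOBA and reusing the 0.025 cutoff from the print function (the helper name `regression_flag` is hypothetical):

```python
def regression_flag(actual_woba: float, xwoba: float, threshold: float = 0.025) -> str:
    """Label a hitter by the gap between actual and expected wOBA.

    Assumes diff = actual - expected, so a positive gap means results
    have run ahead of contact quality.
    """
    diff = actual_woba - xwoba
    if diff > threshold:
        return "outperforming"
    if diff < -threshold:
        return "underperforming"
    return "in line"


# Hypothetical stat lines, for illustration only
print(regression_flag(0.400, 0.360))
print(regression_flag(0.320, 0.350))
print(regression_flag(0.350, 0.340))
```

The threshold is a judgment call rather than a statistical test; with small samples, gaps well beyond 0.025 can arise by chance.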
6.9.2 Visualizing a Complete Profile
While we can't generate actual plots in this text format, here's what a complete visualization suite should include:
1. Exit Velocity Distribution Histogram
- Shows the distribution of all exit velocities
- Highlights hard-hit balls (95+ mph) and the 98 mph minimum required for a barrel
- Includes percentile markers
2. Launch Angle Distribution
- Histogram of launch angles
- Color-coded by outcome (HR, hit, out)
- Shows sweet spot zone (8-32°)
3. Spray Chart
- Visual representation of where balls are hit
- Sized by exit velocity
- Colored by outcome
- Shows pull/center/opposite tendencies
4. xwOBA vs. Actual wOBA Scatter
- Each point is a batted ball
- Shows over/underperformance
- Distance from the diagonal (wOBA = xwOBA) shows where results diverged from contact quality
5. Heat Map by Pitch Location
- Strike zone divided into grid
- Color represents xwOBA or BA by zone
- Reveals weaknesses and strengths
6. Performance by Count
- Bar chart showing xwOBA in different counts
- Separates hitter's counts, pitcher's counts, even
7. Radar Chart
- Multi-dimensional profile
- Axes: Exit Velo, Barrel%, Sweet Spot%, xwOBA, Sprint Speed, etc.
- Compare to league average
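The radar chart's axes need a common scale before they can share one plot. A minimal sketch, assuming each metric is expressed as a ratio to the approximate league averages listed in Section 6.1.2; the function name and the hitter's numbers are hypothetical:

```python
# Approximate league averages, taken from the table in Section 6.1.2
LEAGUE_AVG = {
    "exit_velocity": 89.0,   # mph
    "hard_hit_pct": 35.0,
    "sweet_spot_pct": 33.0,
    "barrel_pct": 7.0,       # midpoint of the ~6-8% range
}


def radar_axes(player: dict, league: dict = LEAGUE_AVG) -> dict:
    """Return each metric as a ratio to league average (1.0 = average)."""
    return {metric: round(player[metric] / league[metric], 2) for metric in league}


# Hypothetical hitter, made up for this example
hypothetical_hitter = {
    "exit_velocity": 93.0,
    "hard_hit_pct": 49.0,
    "sweet_spot_pct": 33.0,
    "barrel_pct": 14.0,
}
axes_values = radar_axes(hypothetical_hitter)
print(axes_values)
```

Ratio scaling is easy to read but compresses metrics with a narrow range: a 93 mph average exit velocity is elite yet sits only about 4% above average on this scale. Percentile ranks are usually a better choice when league-wide data is available.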
# Example code structure for visualizations (requires matplotlib)
import matplotlib.pyplot as plt

def visualize_hitter_profile(df, player_name):
    """
    Create comprehensive visualization suite for a hitter.
    Requires the matplotlib library.
    """
    fig, axes = plt.subplots(3, 3, figsize=(18, 15))
    fig.suptitle(f'{player_name} - Complete Statcast Profile', fontsize=16)

    # Keep rows with both measurements; missing values would break the histograms
    batted_balls = df.dropna(subset=['launch_speed', 'launch_angle']).copy()

    # 1. Exit Velocity Distribution
    axes[0, 0].hist(batted_balls['launch_speed'], bins=30, edgecolor='black')
    axes[0, 0].axvline(95, color='red', linestyle='--', label='Hard Hit (95mph)')
    axes[0, 0].set_title('Exit Velocity Distribution')
    axes[0, 0].set_xlabel('Exit Velocity (mph)')
    axes[0, 0].legend()

    # 2. Launch Angle Distribution
    axes[0, 1].hist(batted_balls['launch_angle'], bins=40, edgecolor='black')
    axes[0, 1].axvline(8, color='green', linestyle='--', label='Sweet Spot')
    axes[0, 1].axvline(32, color='green', linestyle='--')
    axes[0, 1].set_title('Launch Angle Distribution')
    axes[0, 1].set_xlabel('Launch Angle (degrees)')
    axes[0, 1].legend()

    # 3. EV vs LA Scatter (colored by outcome)
    scatter_data = batted_balls.copy()
    scatter_data['is_hit'] = scatter_data['events'].isin([
        'single', 'double', 'triple', 'home_run'
    ])
    colors = scatter_data['is_hit'].map({True: 'green', False: 'red'})
    axes[0, 2].scatter(scatter_data['launch_speed'],
                       scatter_data['launch_angle'],
                       c=colors, alpha=0.5, s=10)
    axes[0, 2].set_title('Exit Velocity vs Launch Angle')
    axes[0, 2].set_xlabel('Exit Velocity (mph)')
    axes[0, 2].set_ylabel('Launch Angle (degrees)')

    # Additional plots would follow...
    # (Sprint speed gauge, xwOBA comparison, zone heat map, etc.)
    # Until then, hide the six panels that are still empty
    for ax in axes.ravel()[3:]:
        ax.set_visible(False)

    plt.tight_layout()
    return fig

# Note: This is example code structure - actual implementation would require
# data and proper visualization library setup
Statcast data's richness demands interactive visualization to fully explore the multidimensional relationships between exit velocity, launch angle, hit distance, spray direction, and outcomes. While static plots provide snapshots, interactive visualizations enable analysts to rotate 3D perspectives, filter by outcome types, and discover patterns that would otherwise remain hidden. This section demonstrates advanced interactive Statcast visualizations using Plotly in both Python and R.
6.10.1 3D Scatter Plot: Exit Velocity, Launch Angle, and Distance
The relationship between exit velocity, launch angle, and hit distance forms the foundation of batted ball physics. A 3D interactive scatter plot allows us to explore this relationship dynamically, rotating the view to understand optimal launch conditions and identify barrels visually.
Python Implementation with Plotly:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from pybaseball import statcast_batter, playerid_lookup
import numpy as np
# Fetch Statcast data for a power hitter (Aaron Judge example)
player_id = 592450 # Aaron Judge
start_date = '2024-04-01'
end_date = '2024-10-01'
statcast_data = statcast_batter(start_date, end_date, player_id)
# Filter for balls in play with complete data
# (type == 'X' drops foul balls, which can also carry tracked values)
batted_balls = statcast_data[
    (statcast_data['type'] == 'X') &
    (statcast_data['launch_speed'].notna()) &
    (statcast_data['launch_angle'].notna()) &
    (statcast_data['hit_distance_sc'].notna())
].copy()

# Classify outcomes for color coding
def classify_outcome(event):
    if event in ['home_run']:
        return 'Home Run'
    elif event in ['triple', 'double']:
        return 'Extra-Base Hit'
    elif event in ['single']:
        return 'Single'
    else:
        # Everything else (including errors and fielder's choices) is shown as an out
        return 'Out'

batted_balls['outcome_type'] = batted_balls['events'].apply(classify_outcome)
# Create color mapping
color_map = {
'Home Run': '#FF0000',
'Extra-Base Hit': '#FFA500',
'Single': '#00FF00',
'Out': '#808080'
}
batted_balls['color'] = batted_balls['outcome_type'].map(color_map)
# Create 3D scatter plot
fig = go.Figure()
for outcome in ['Out', 'Single', 'Extra-Base Hit', 'Home Run']:
    df_subset = batted_balls[batted_balls['outcome_type'] == outcome]
    fig.add_trace(go.Scatter3d(
        x=df_subset['launch_speed'],
        y=df_subset['launch_angle'],
        z=df_subset['hit_distance_sc'],
        mode='markers',
        name=outcome,
        marker=dict(
            size=4,
            color=color_map[outcome],
            opacity=0.7,
            line=dict(width=0.5, color='DarkSlateGray')
        ),
        text=[
            f"<b>{outcome}</b><br>" +
            f"EV: {ev:.1f} mph<br>" +
            f"LA: {la:.1f}°<br>" +
            f"Distance: {dist:.1f} ft<br>" +
            f"Date: {date}"
            for ev, la, dist, date in zip(
                df_subset['launch_speed'],
                df_subset['launch_angle'],
                df_subset['hit_distance_sc'],
                df_subset['game_date']
            )
        ],
        hoverinfo='text'
    ))
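# Add barrel zone reference (simplified)
# Barrels start at 98 mph with launch angles of 26-30 degrees; the qualifying
# angle range widens as exit velocity increases. The markers below trace an
# illustrative path through that zone; the distances are rough placeholders,
# not measured values.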
barrel_ev = np.linspace(98, 120, 10)
barrel_la = np.linspace(26, 30, 10)
barrel_dist = np.linspace(375, 450, 10)
fig.add_trace(go.Scatter3d(
x=barrel_ev,
y=barrel_la,
z=barrel_dist,
mode='markers',
name='Barrel Zone Reference',
marker=dict(size=8, color='gold', symbol='diamond', opacity=0.5),
showlegend=True
))
# Update layout
fig.update_layout(
title='3D Batted Ball Profile: Exit Velocity × Launch Angle × Distance',
scene=dict(
xaxis_title='Exit Velocity (mph)',
yaxis_title='Launch Angle (degrees)',
zaxis_title='Hit Distance (feet)',
camera=dict(
eye=dict(x=1.5, y=1.5, z=1.3)
)
),
width=1000,
height=800,
showlegend=True,
legend=dict(
x=0.02,
y=0.98,
bgcolor='rgba(255, 255, 255, 0.8)'
),
font=dict(size=12)
)
fig.show()
# fig.write_html('3d_batted_ball_profile.html')
R Implementation:
library(plotly)
library(dplyr)
library(baseballr)
# Fetch Statcast data
player_id <- 592450 # Aaron Judge
statcast_data <- statcast_search_batters(
start_date = "2024-04-01",
end_date = "2024-10-01",
batterid = player_id
)
# Filter and prepare data
batted_balls <- statcast_data %>%
  filter(
    type == "X",  # balls in play only; foul balls can also carry tracked values
    !is.na(launch_speed),
    !is.na(launch_angle),
    !is.na(hit_distance_sc)
  ) %>%
mutate(
outcome_type = case_when(
events == "home_run" ~ "Home Run",
events %in% c("triple", "double") ~ "Extra-Base Hit",
events == "single" ~ "Single",
TRUE ~ "Out"
),
color = case_when(
outcome_type == "Home Run" ~ "#FF0000",
outcome_type == "Extra-Base Hit" ~ "#FFA500",
outcome_type == "Single" ~ "#00FF00",
outcome_type == "Out" ~ "#808080"
)
)
# Create 3D scatter plot
fig <- plot_ly(
data = batted_balls,
x = ~launch_speed,
y = ~launch_angle,
z = ~hit_distance_sc,
color = ~outcome_type,
colors = c(
"Home Run" = "#FF0000",
"Extra-Base Hit" = "#FFA500",
"Single" = "#00FF00",
"Out" = "#808080"
),
type = 'scatter3d',
mode = 'markers',
marker = list(
size = 4,
opacity = 0.7,
line = list(width = 0.5, color = 'rgba(50, 50, 50, 0.5)')
),
text = ~paste0(
"<b>", outcome_type, "</b><br>",
"EV: ", round(launch_speed, 1), " mph<br>",
"LA: ", round(launch_angle, 1), "°<br>",
"Distance: ", round(hit_distance_sc, 1), " ft<br>",
"Date: ", game_date
),
hoverinfo = 'text'
) %>%
layout(
title = list(
text = "3D Batted Ball Profile: Exit Velocity × Launch Angle × Distance",
font = list(size = 14)
),
scene = list(
xaxis = list(title = "Exit Velocity (mph)"),
yaxis = list(title = "Launch Angle (degrees)"),
zaxis = list(title = "Hit Distance (feet)"),
camera = list(
eye = list(x = 1.5, y = 1.5, z = 1.3)
)
),
width = 1000,
height = 800,
showlegend = TRUE,
legend = list(
x = 0.02,
y = 0.98,
bgcolor = 'rgba(255, 255, 255, 0.8)'
)
)
fig
# htmlwidgets::saveWidget(fig, "3d_batted_ball_profile.html")
Key Insights from 3D Visualization:
- Optimal Launch Windows: Visually identify the exit velocity and launch angle combinations that produce the longest distances
- Barrel Recognition: Home runs cluster in specific regions of the 3D space
- Outcome Patterns: Outs dominate at extreme launch angles (too high or too low) regardless of exit velocity
- Power Threshold: A clear velocity threshold exists below which home runs become extremely rare
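The barrel region referred to above is not a fixed box: it starts at 98 mph with a 26-30° launch-angle window and widens as exit velocity increases. A minimal sketch of a classifier, assuming a commonly used linear approximation of that widening window (the official classification is Statcast's own, and this function name is hypothetical):

```python
def is_barrel(exit_velocity: float, launch_angle: float) -> bool:
    """Approximate barrel test using linear bounds on the launch-angle window.

    At 98 mph the bounds reduce to 26-30 degrees; each extra mph widens
    the window, and the upper bound is capped at 50 degrees.
    """
    return (
        exit_velocity >= 98
        and launch_angle <= 50
        and 1.5 * exit_velocity - launch_angle >= 117  # upper edge of the window
        and exit_velocity + launch_angle >= 124        # lower edge of the window
    )


# Hypothetical batted balls as (exit velocity in mph, launch angle in degrees)
for ev, la in [(98, 28), (98, 31), (100, 33), (110, 10), (116, 8)]:
    print(ev, la, is_barrel(ev, la))
```

A check that only tests for 98+ mph and 26-30° would miss most barrels hit harder than 98 mph, which is why the widening bounds matter when computing barrel rates for power hitters.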
6.10.2 Interactive Spray Chart with Hit Data
Spray charts visualize where batted balls land on the field, revealing hitter tendencies, shift vulnerabilities, and approach patterns. An interactive spray chart with hover data transforms this classic visualization into a powerful analytical tool.
Python Implementation:
import plotly.graph_objects as go
import numpy as np
# Filter for balls in play
balls_in_play = statcast_data[
(statcast_data['hc_x'].notna()) &
(statcast_data['hc_y'].notna())
].copy()
# Classify hit outcomes
def get_hit_value(event):
    """Assign numeric value to hit outcomes"""
    if event == 'home_run':
        return 4
    elif event == 'triple':
        return 3
    elif event == 'double':
        return 2
    elif event == 'single':
        return 1
    else:
        return 0

balls_in_play['hit_value'] = balls_in_play['events'].apply(get_hit_value)
balls_in_play['is_hit'] = balls_in_play['hit_value'] > 0
# Create spray chart
fig = go.Figure()
# Add outs
outs = balls_in_play[~balls_in_play['is_hit']]
fig.add_trace(go.Scatter(
x=outs['hc_x'],
y=outs['hc_y'],
mode='markers',
name='Outs',
marker=dict(
size=8,
color='lightgray',
symbol='circle',
opacity=0.5,
line=dict(width=0.5, color='gray')
),
text=[
f"<b>Out</b><br>" +
f"EV: {ev:.1f} mph<br>" +
f"LA: {la:.1f}°<br>" +
f"Distance: {dist:.0f} ft"
for ev, la, dist in zip(
outs['launch_speed'],
outs['launch_angle'],
outs['hit_distance_sc'].fillna(0)
)
],
hoverinfo='text'
))
# Add hits by type
hit_colors = {1: '#90EE90', 2: '#4169E1', 3: '#FF8C00', 4: '#FF0000'}
hit_names = {1: 'Single', 2: 'Double', 3: 'Triple', 4: 'Home Run'}
for hit_val in [1, 2, 3, 4]:
    hits = balls_in_play[balls_in_play['hit_value'] == hit_val]
    if len(hits) > 0:
        fig.add_trace(go.Scatter(
            x=hits['hc_x'],
            y=hits['hc_y'],
            mode='markers',
            name=hit_names[hit_val],
            marker=dict(
                size=10,
                color=hit_colors[hit_val],
                symbol='circle',
                opacity=0.8,
                line=dict(width=1, color='black')
            ),
            text=[
                f"<b>{hit_names[hit_val]}</b><br>" +
                f"EV: {ev:.1f} mph<br>" +
                f"LA: {la:.1f}°<br>" +
                f"Distance: {dist:.0f} ft<br>" +
                f"Date: {date}"
                for ev, la, dist, date in zip(
                    hits['launch_speed'],
                    hits['launch_angle'],
                    hits['hit_distance_sc'].fillna(0),
                    hits['game_date']
                )
            ],
            hoverinfo='text'
        ))
# Add field dimensions (simplified diamond)
# Home plate at approximately (125, 200) in Statcast coordinates
fig.add_shape(type="line", x0=125, y0=200, x1=125, y1=50,
line=dict(color="green", width=2)) # Center field line
# Update layout for baseball field appearance
fig.update_layout(
    title='Interactive Spray Chart - 2024 Season',
    xaxis=dict(
        title='Horizontal Position (hc_x)',
        range=[0, 250],
        showgrid=False,
        zeroline=False
    ),
    yaxis=dict(
        # hc_y decreases toward the outfield, so the axis is reversed
        # to draw home plate at the bottom of the chart
        title='Field Depth (hc_y, axis reversed)',
        range=[250, 0],
        showgrid=False,
        zeroline=False,
        scaleanchor="x",
        scaleratio=1
    ),
    plot_bgcolor='rgba(34, 139, 34, 0.1)',  # Light green background
    width=900,
    height=900,
    showlegend=True,
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    hovermode='closest'
)
fig.show()
R Implementation:
library(plotly)
library(dplyr)
# Filter for balls in play
balls_in_play <- statcast_data %>%
filter(!is.na(hc_x), !is.na(hc_y)) %>%
mutate(
hit_value = case_when(
events == "home_run" ~ 4,
events == "triple" ~ 3,
events == "double" ~ 2,
events == "single" ~ 1,
TRUE ~ 0
),
is_hit = hit_value > 0,
outcome_label = case_when(
hit_value == 4 ~ "Home Run",
hit_value == 3 ~ "Triple",
hit_value == 2 ~ "Double",
hit_value == 1 ~ "Single",
TRUE ~ "Out"
)
)
# Create spray chart
fig <- plot_ly()
# Add outs
outs <- balls_in_play %>% filter(!is_hit)
fig <- fig %>%
add_trace(
data = outs,
x = ~hc_x,
y = ~hc_y,
type = 'scatter',
mode = 'markers',
name = 'Outs',
marker = list(
size = 8,
color = 'lightgray',
opacity = 0.5,
line = list(width = 0.5, color = 'gray')
),
text = ~paste0(
"<b>Out</b><br>",
"EV: ", round(launch_speed, 1), " mph<br>",
"LA: ", round(launch_angle, 1), "°<br>",
"Distance: ", round(hit_distance_sc, 0), " ft"
),
hoverinfo = 'text'
)
# Add hits by type
hit_data <- list(
list(value = 1, name = "Single", color = "#90EE90"),
list(value = 2, name = "Double", color = "#4169E1"),
list(value = 3, name = "Triple", color = "#FF8C00"),
list(value = 4, name = "Home Run", color = "#FF0000")
)
for (hit_type in hit_data) {
hits <- balls_in_play %>% filter(hit_value == hit_type$value)
if (nrow(hits) > 0) {
fig <- fig %>%
add_trace(
data = hits,
x = ~hc_x,
y = ~hc_y,
type = 'scatter',
mode = 'markers',
name = hit_type$name,
marker = list(
size = 10,
color = hit_type$color,
opacity = 0.8,
line = list(width = 1, color = 'black')
),
text = ~paste0(
"<b>", outcome_label, "</b><br>",
"EV: ", round(launch_speed, 1), " mph<br>",
"LA: ", round(launch_angle, 1), "°<br>",
"Distance: ", round(hit_distance_sc, 0), " ft<br>",
"Date: ", game_date
),
hoverinfo = 'text'
)
}
}
# Update layout
fig <- fig %>%
layout(
title = "Interactive Spray Chart - 2024 Season",
xaxis = list(
title = "Horizontal Position",
range = c(0, 250),
showgrid = FALSE,
zeroline = FALSE
),
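    yaxis = list(
      # hc_y decreases toward the outfield, so the axis is reversed
      # to draw home plate at the bottom of the chart
      title = "Field Depth (hc_y, axis reversed)",
      range = c(250, 0),
      showgrid = FALSE,
      zeroline = FALSE,
      scaleanchor = "x",
      scaleratio = 1
    ),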
plot_bgcolor = 'rgba(34, 139, 34, 0.1)',
width = 900,
height = 900,
showlegend = TRUE,
legend = list(
orientation = "h",
yanchor = "bottom",
y = 1.02,
xanchor = "right",
x = 1
),
hovermode = 'closest'
)
fig
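The pull/center/opposite tendencies that a spray chart reveals can also be computed directly from the hit coordinates. A minimal sketch, assuming home plate sits near (125.42, 198.27) in `hc_x`/`hc_y` units (a widely used approximation, not an official constant), that smaller `hc_x` values are toward left field, and that the field is split into thirds at ±15°; both function names are hypothetical:

```python
import math

# Assumed home-plate location in hc_x / hc_y units (approximation)
HOME_X, HOME_Y = 125.42, 198.27


def spray_angle(hc_x: float, hc_y: float) -> float:
    """Degrees from dead center: negative = left-field side, positive = right-field side."""
    # hc_y decreases toward the outfield, so depth is HOME_Y - hc_y
    return math.degrees(math.atan2(hc_x - HOME_X, HOME_Y - hc_y))


def field_third(hc_x: float, hc_y: float, stand: str) -> str:
    """Classify a batted ball as Pull, Center, or Opposite for a batter who stands 'R' or 'L'."""
    angle = spray_angle(hc_x, hc_y)
    if abs(angle) <= 15:
        return "Center"
    hit_to_left = angle < 0
    pulled = hit_to_left if stand == "R" else not hit_to_left
    return "Pull" if pulled else "Opposite"


# Hypothetical coordinates, for illustration only
print(spray_angle(125.42, 100.0))
print(field_third(60.0, 120.0, "R"))
print(field_third(60.0, 120.0, "L"))
```

Counting the share of batted balls in each third gives the pull rate summarized elsewhere in the profile; the ±15° cutoffs are a convention and can be adjusted.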
6.10.3 Animated Barrel Rate Trends Over Time
Barrel rate is one of the most stable and predictive Statcast metrics. Visualizing how a player's barrel rate evolves throughout a season can reveal hot streaks, mechanical adjustments, or fatigue patterns. Animated visualizations effectively communicate temporal trends.
Python Implementation:
import plotly.express as px
# Calculate rolling barrel rate
def calculate_rolling_metrics(df, window=50):
    """Calculate rolling Statcast metrics over the last `window` batted balls."""
    df = df.sort_values('game_date').copy()
    # Identify barrels (simplified: 98+ mph at 26-30 degrees only;
    # the official zone widens as exit velocity rises)
    df['is_barrel'] = (
        (df['launch_speed'] >= 98) &
        (df['launch_angle'] >= 26) &
        (df['launch_angle'] <= 30)
    )
    # Calculate rolling metrics (no value until 10 batted balls are available)
    df['rolling_barrel_rate'] = (
        df['is_barrel'].rolling(window=window, min_periods=10).mean() * 100
    )
    df['rolling_ev'] = df['launch_speed'].rolling(
        window=window, min_periods=10
    ).mean()
    df['rolling_hard_hit'] = (
        (df['launch_speed'] >= 95).rolling(window=window, min_periods=10).mean() * 100
    )
    return df
# Apply rolling calculations
batted_balls_rolling = calculate_rolling_metrics(batted_balls, window=50)
# Season average comes from the returned frame: is_barrel is added to a copy,
# so the original batted_balls frame does not have that column
season_avg_barrel = batted_balls_rolling['is_barrel'].mean() * 100
batted_balls_rolling = batted_balls_rolling.dropna(
    subset=['rolling_barrel_rate']
)
# Create interactive line plot
fig = px.line(
    batted_balls_rolling,
    x='game_date',
    y='rolling_barrel_rate',
    title='Rolling 50-Batted Ball Barrel Rate - 2024 Season',
    labels={
        'game_date': 'Date',
        'rolling_barrel_rate': 'Barrel Rate (%)'
    },
    markers=True
)
# Add season average reference line
fig.add_hline(
    y=season_avg_barrel,
    line_dash="dash",
    line_color="red",
    annotation_text=f"Season Avg: {season_avg_barrel:.1f}%",
    annotation_position="right"
)
# Add MLB average reference
fig.add_hline(
    y=8.0,
    line_dash="dash",
    line_color="gray",
    annotation_text="MLB Avg: 8.0%",
    annotation_position="left"
)
fig.update_layout(
    width=1100,
    height=600,
    hovermode='x unified',
    template='plotly_white',
    font=dict(size=12)
)
fig.update_traces(
    line=dict(width=3, color='#1f77b4'),
    marker=dict(size=6)
)
fig.show()
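To make the `window` and `min_periods` arguments concrete, here is a dependency-free sketch of a trailing-window mean. The helper name `rolling_rate` is hypothetical; it is intended to mirror what the pandas `rolling(window=..., min_periods=...).mean()` call above does on a 0/1 series. Positions with fewer than `min_periods` observations yield no value, which is why the earliest batted balls of the season drop out of the chart.

```python
from collections import deque


def rolling_rate(flags, window=50, min_periods=10):
    """Trailing-window mean of 0/1 flags; None until min_periods values are seen."""
    recent = deque(maxlen=window)  # the oldest flag falls out automatically
    rates = []
    for flag in flags:
        recent.append(1 if flag else 0)
        if len(recent) < min_periods:
            rates.append(None)
        else:
            rates.append(sum(recent) / len(recent))
    return rates


if __name__ == "__main__":
    # Toy sequence: 1 = barrel, 0 = not a barrel
    print(rolling_rate([1, 0, 1, 1], window=3, min_periods=2))
```

Until the window fills, each value averages fewer than `window` batted balls, so the earliest points on the chart are the noisiest.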
Alternative: Multi-Metric Dashboard (Python):
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# Create subplot figure with multiple metrics
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=(
        'Rolling Barrel Rate (%)',
        'Rolling Average Exit Velocity (mph)',
        'Rolling Hard-Hit Rate (%)'
    ),
    vertical_spacing=0.1
)
# Barrel Rate
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_barrel_rate'],
        mode='lines+markers',
        name='Barrel Rate',
        line=dict(color='#FF4444', width=2),
        marker=dict(size=4)
    ),
    row=1, col=1
)
# Exit Velocity
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_ev'],
        mode='lines+markers',
        name='Avg Exit Velocity',
        line=dict(color='#4444FF', width=2),
        marker=dict(size=4)
    ),
    row=2, col=1
)
# Hard-Hit Rate
fig.add_trace(
    go.Scatter(
        x=batted_balls_rolling['game_date'],
        y=batted_balls_rolling['rolling_hard_hit'],
        mode='lines+markers',
        name='Hard-Hit Rate',
        line=dict(color='#44FF44', width=2),
        marker=dict(size=4)
    ),
    row=3, col=1
)
fig.update_xaxes(title_text="Date", row=3, col=1)
fig.update_yaxes(title_text="%", row=1, col=1)
fig.update_yaxes(title_text="mph", row=2, col=1)
fig.update_yaxes(title_text="%", row=3, col=1)
fig.update_layout(
    title_text='Statcast Metrics Trends - Rolling 50 Batted Balls',
    height=900,
    width=1100,
    showlegend=False,
    hovermode='x unified',
    template='plotly_white'
)
fig.show()
R Implementation with Time Series:
library(plotly)
library(dplyr)
library(zoo)
# Calculate rolling metrics
batted_balls_rolling <- batted_balls %>%
  arrange(game_date) %>%
  mutate(
    # Simplified barrel flag: 98+ mph at 26-30 degrees only
    is_barrel = (launch_speed >= 98 & launch_angle >= 26 & launch_angle <= 30),
    rolling_barrel_rate = rollapply(
      is_barrel, width = 50, FUN = mean, fill = NA, align = "right"
    ) * 100,
    rolling_ev = rollapply(
      launch_speed, width = 50, FUN = mean, fill = NA, align = "right", na.rm = TRUE
    )
  ) %>%
  filter(!is.na(rolling_barrel_rate))
# Repeat each reference value across the x vector so it draws as a horizontal line
n_points <- nrow(batted_balls_rolling)
season_avg <- mean(batted_balls_rolling$is_barrel) * 100
# Create interactive line plot
fig <- plot_ly(
  data = batted_balls_rolling,
  x = ~game_date,
  y = ~rolling_barrel_rate,
  type = 'scatter',
  mode = 'lines+markers',
  name = 'Rolling Barrel Rate',
  line = list(color = '#1f77b4', width = 3),
  marker = list(size = 6),
  text = ~paste0(
    "Date: ", game_date, "<br>",
    "Barrel Rate: ", round(rolling_barrel_rate, 1), "%<br>",
    "Avg EV: ", round(rolling_ev, 1), " mph"
  ),
  hoverinfo = 'text'
) %>%
  add_trace(
    y = rep(season_avg, n_points),
    type = 'scatter',
    mode = 'lines',
    line = list(dash = 'dash', color = 'red', width = 2),
    name = 'Season Average',
    showlegend = TRUE
  ) %>%
  add_trace(
    y = rep(8.0, n_points),
    type = 'scatter',
    mode = 'lines',
    line = list(dash = 'dash', color = 'gray', width = 2),
    name = 'MLB Average',
    showlegend = TRUE
  ) %>%
  layout(
    title = "Rolling 50-Batted Ball Barrel Rate - 2024 Season",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Barrel Rate (%)"),
    width = 1100,
    height = 600,
    hovermode = 'x unified',
    template = 'plotly_white'
  )
fig
Key Insights from Temporal Visualizations:
- Consistency: Stable barrel rates indicate reliable power production
- Trend Identification: Upward or downward trends may signal mechanical changes or injury
- Volatility: Large swings suggest inconsistent contact quality, though a 50-batted-ball window is noisy even for a hitter whose skill is unchanged
- Comparative Context: Reference lines provide immediate context against league and personal averages
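When reading volatility off a rolling chart, it helps to know how much movement a 50-batted-ball window produces by chance alone. Treating each batted ball as an independent trial with a fixed barrel probability (a simplifying assumption), the binomial standard error gives a rough noise band. The function name below is hypothetical.

```python
import math


def barrel_rate_standard_error(true_rate, window):
    """Binomial standard error of a rolling rate, in percentage points."""
    return math.sqrt(true_rate * (1 - true_rate) / window) * 100


if __name__ == "__main__":
    # An 8% barrel-rate hitter observed over a 50-batted-ball window
    se = barrel_rate_standard_error(0.08, 50)
    print(f"Standard error: {se:.1f} percentage points")
```

For an 8% hitter over 50 batted balls this comes to roughly 3.8 percentage points, so swings of several points in either direction are expected even with no change in underlying skill.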
Best Practices for Interactive Statcast Visualizations:
- Always Include Context: Add league average reference lines or comparison groups
- Rich Hover Data: Include date, outcome, exit velocity, launch angle, and distance
- Color Encoding: Use intuitive colors (red for home runs, gray for outs)
- Export Options: Save as HTML for sharing with non-technical stakeholders
- Performance: Limit data points to reasonable numbers (< 5000) for responsive interactions
- Accessibility: Ensure sufficient color contrast and alternative text descriptions
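The performance guideline above can be applied before plotting by thinning the data to a fixed budget of points. This is a minimal sketch using only the standard library; the helper name `downsample` and its defaults are assumptions, and a uniform random sample is just one reasonable strategy.

```python
import random


def downsample(records, max_points=5000, seed=0):
    """Return at most max_points records, keeping their original order."""
    if len(records) <= max_points:
        return list(records)
    rng = random.Random(seed)  # fixed seed so the chart is reproducible
    keep = sorted(rng.sample(range(len(records)), max_points))
    return [records[i] for i in keep]


if __name__ == "__main__":
    batted_ball_ids = list(range(12000))
    print(len(downsample(batted_ball_ids)))
```

With pandas, `df.sample(n=5000, random_state=0).sort_index()` does the same job in one line. For the export guideline, a Plotly figure can be written to a standalone file with `fig.write_html(...)` in Python or `htmlwidgets::saveWidget(...)` in R.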
Interactive Statcast visualizations empower analysts to explore complex multidimensional data efficiently, uncover hidden patterns, and communicate findings compellingly to diverse audiences.
Exercise 1: xwOBA Investigation
Task: Identify the three biggest "regression candidates" from the 2024 season - players whose actual wOBA significantly differs from xwOBA (minimum 300 PA).
Steps:
- Fetch Statcast data for multiple players or use Baseball Savant leaderboards
- Calculate actual wOBA and xwOBA for each player
- Identify players with largest positive difference (overperforming)
- Identify players with largest negative difference (underperforming)
- Analyze why - look at their batted ball profile, sprint speed, etc.
Questions to answer:
- Who is most likely to decline next season?
- Who is most likely to improve?
- What's causing the discrepancy for each player?
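One possible starting point for this exercise, assuming you have already assembled one record per player with `pa`, `woba`, and `xwoba` fields (for example from a Baseball Savant leaderboard export). The field names, the helper name `regression_candidates`, and the sample rows are placeholders for illustration, not real 2024 figures.

```python
def regression_candidates(players, min_pa=300, top_n=3):
    """Rank qualified players by wOBA minus xwOBA.

    Returns (overperformers, underperformers), each a list of (name, diff).
    """
    qualified = [p for p in players if p["pa"] >= min_pa]
    diffs = sorted(
        ((p["name"], round(p["woba"] - p["xwoba"], 3)) for p in qualified),
        key=lambda item: item[1],
    )
    underperformers = diffs[:top_n]       # most negative gaps first
    overperformers = diffs[::-1][:top_n]  # most positive gaps first
    return overperformers, underperformers


if __name__ == "__main__":
    # Placeholder rows for illustration only
    sample = [
        {"name": "Player A", "pa": 600, "woba": 0.370, "xwoba": 0.330},
        {"name": "Player B", "pa": 550, "woba": 0.310, "xwoba": 0.345},
        {"name": "Player C", "pa": 120, "woba": 0.420, "xwoba": 0.300},
        {"name": "Player D", "pa": 480, "woba": 0.335, "xwoba": 0.332},
    ]
    over, under = regression_candidates(sample, top_n=1)
    print("Overperforming:", over)
    print("Underperforming:", under)
```

The ranking only surfaces candidates; explaining each gap still requires looking at batted ball profile, sprint speed, and park.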
Exercise 2: Barrel Rate vs. Home Run Analysis
Task: Analyze the relationship between barrel rate and home run totals.
Steps:
- Collect barrel% and HR data for 20+ qualified hitters
- Create a scatter plot: Barrel% (x-axis) vs. HR (y-axis)
- Calculate correlation coefficient
- Identify outliers - high barrel%, low HR and vice versa
- Investigate why outliers exist (park factors, launch angle within barrels)
Questions to answer:
- How strong is the correlation between Barrel% and HR?
- What barrel% typically produces 30+ HR?
- Which players have high barrels but low HR? Why?
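For the correlation step, the Pearson coefficient can be computed by hand to see what it measures. This is a minimal standard-library sketch; the function name `pearson_r` is an assumption, and in practice `numpy.corrcoef` or `pandas.Series.corr` would give the same number.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Pass the Barrel% values as `xs` and the HR totals as `ys`. The function divides by zero if either input is constant, which should not happen with 20+ real hitters.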
Exercise 3: Complete Hitter Profile
Task: Create a complete Statcast profile for two contrasting hitters - one power hitter and one contact/speed hitter.
Suggested players:
- Power: Aaron Judge, Kyle Schwarber, Pete Alonso
- Contact/Speed: Luis Arraez, Steven Kwan, Elly De La Cruz
Requirements:
- Use the `create_complete_hitter_profile()` function
- Calculate all key metrics for both players
- Compare and contrast their profiles
- Explain how their different approaches produce value
- Predict future performance based on Statcast data
Exercise 4: Launch Angle Revolution Analysis
Task: Analyze how launch angles have changed over time and their impact on home runs.
Steps:
- Fetch league-wide data from 2015, 2018, and 2024
- Calculate average launch angle for each year
- Calculate FB% (25-50°) for each year
- Compare to league-wide HR totals
- Identify when the "launch angle revolution" peaked
Questions to answer:
- How much has average launch angle increased since 2015?
- What year had the highest fly ball rate?
- Has the revolution plateaued or reversed?
- What's the relationship between league-wide launch angle and HR totals?
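The per-year calculations in the steps above reduce to two numbers for each season. This is a minimal sketch, assuming you already have a list of launch angles for every batted ball in a given year; the helper name `launch_angle_summary` is hypothetical, and the 25-50° fly ball band is the one given in the exercise.

```python
def launch_angle_summary(launch_angles, fb_min=25, fb_max=50):
    """Average launch angle and fly-ball share (fb_min..fb_max degrees, inclusive)."""
    angles = [a for a in launch_angles if a is not None]  # skip untracked balls
    if not angles:
        raise ValueError("no launch angles supplied")
    avg = sum(angles) / len(angles)
    fb_pct = 100 * sum(fb_min <= a <= fb_max for a in angles) / len(angles)
    return avg, fb_pct


if __name__ == "__main__":
    # Toy values for illustration only
    avg, fb_pct = launch_angle_summary([10, 30, None, 50, -10])
    print(f"Average LA: {avg:.1f} degrees, FB%: {fb_pct:.1f}")
```

Running it once per season (2015, 2018, 2024) gives the values to set against league-wide HR totals.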
Chapter Summary
In this chapter, we've explored the revolutionary world of Statcast Analytics, focusing specifically on hitting metrics. We've learned:
- Statcast's Impact: Since 2015, ball and player tracking has provided unprecedented insight into player performance
- Exit Velocity: The foundational power metric, with 95+ mph representing "hard hit" contact and elite hitters averaging 93+ mph
- Launch Angle: The vertical component of batted balls, with the "sweet spot" (8-32°) combining the high BABIP of line drives with home run power
- Barrels: The perfect combination of exit velocity and launch angle, representing elite contact quality with .500+ BA and 1.500+ SLG
- Expected Statistics (xStats): Metrics like xBA and xwOBA that isolate contact quality from luck and defense, crucial for identifying regression candidates
- Spray Analysis: Understanding pull/center/opposite field tendencies and their implications for defensive positioning
- Sprint Speed: The pure athleticism metric that impacts BABIP, especially on ground balls, and baserunning value
- Advanced Analysis: Plate coverage, pitch type splits, and count-based performance reveal the complete picture of a hitter's approach
- Complete Profiles: Combining all Statcast metrics creates a comprehensive view of hitter talent, separating skill from circumstance
The Statcast revolution has fundamentally changed baseball analysis. We no longer need to rely solely on outcome-based statistics that blend skill, luck, defense, and park factors. Instead, we can isolate the hitter's contribution - the quality of contact - and build predictive models that identify future performance more accurately than ever before.
In the next chapter, we'll shift our focus to Statcast Pitching Analytics, exploring how similar tracking technologies have transformed our understanding of pitchers.