7.1.1 What Statcast Measures for Pitchers
Just as Statcast revolutionized hitting analysis, it has fundamentally transformed how we evaluate pitchers. Before Statcast, pitching evaluation relied heavily on outcome-based statistics like ERA, WHIP, and strikeout rate. These metrics tell us what happened, but not how or why. Statcast peels back the layers, measuring the physical properties of every pitch thrown in Major League Baseball.
Statcast's tracking system (TrackMan Doppler radar from 2015 through 2019, Hawk-Eye optical cameras since 2020) follows each pitch from the moment it leaves the pitcher's hand until it crosses home plate (or is put into play). This tracking provides unprecedented insight into pitch characteristics that were previously invisible or could only be estimated:
Velocity: Measured at the release point and as the ball crosses the plate. This isn't just "how hard does he throw" - it's precise measurement of pitch speed throughout its flight path.
Spin Rate: Measured in revolutions per minute (RPM), spin rate quantifies how fast the ball rotates. Higher spin can produce more movement, though how much depends on the spin axis and how efficiently the spin is oriented; on fastballs it is associated with more "carry."
Movement: Broken down into horizontal break (side-to-side movement) and induced vertical break (how much the pitch "rises" or drops relative to a spinless pitch).
Release Point: The three-dimensional coordinates (height, side, extension) where the ball leaves the pitcher's hand.
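Vertical Approach Angle (VAA): The angle at which the pitch enters the hitting zone, usually derived from Statcast's trajectory measurements rather than read directly. It influences swing decisions and contact quality, especially on fastballs at the top of the zone.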
Pitch Location: Precise coordinates of where the pitch crosses the front of home plate, enabling detailed command analysis.
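Because VAA is usually derived rather than read directly, here is a minimal sketch of one widely used way to compute it. It applies constant-acceleration kinematics to Statcast's trajectory fields vy0, vz0, ay and az, and makes two assumptions: that these fields describe the pitch at y = 50 feet from home plate, and that the front of the plate sits 17/12 feet from its back tip. The function name is hypothetical. Treat the result as an approximation, not Statcast's official figure.

```python
import math

# Assumed geometry: trajectory fields describe the pitch at y = 50 ft, and the
# front of home plate is 17 inches (17/12 ft) from the back tip.
Y_START_FT = 50.0
Y_FRONT_OF_PLATE_FT = 17 / 12

def vertical_approach_angle(vy0, vz0, ay, az):
    """Approximate VAA in degrees at the front of the plate.

    vy0, vz0: velocity components (ft/s) at y = 50 ft; vy0 is negative because
    the ball travels toward the plate. ay, az: constant accelerations (ft/s^2).
    """
    # Velocity toward the plate when the ball reaches the front of the plate
    vy_f = -math.sqrt(vy0 ** 2 - 2 * ay * (Y_START_FT - Y_FRONT_OF_PLATE_FT))
    # Time needed to get there under constant acceleration
    t = (vy_f - vy0) / ay
    # Vertical velocity at that moment
    vz_f = vz0 + az * t
    # A negative angle means the pitch is descending as it enters the zone
    return -math.degrees(math.atan(vz_f / vy_f))

# Made-up fastball trajectory values, for illustration only
print(round(vertical_approach_angle(vy0=-135.0, vz0=-5.0, ay=28.0, az=-15.0), 2))
```

A pitch with a more negative starting vertical velocity (vz0) enters on a steeper angle, so the function returns a more negative VAA.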
This wealth of data has enabled new evaluation frameworks. We can now identify why one pitcher's fastball generates more swings-and-misses than another's despite similar velocity. We can understand why certain breaking balls are more effective. We can diagnose mechanical issues and design new pitches based on data-driven principles.
7.1.2 Key Statcast Pitching Metrics
Here's a comprehensive overview of the most important Statcast pitching metrics:
| Metric | Definition | Typical Range | Elite Threshold | What It Reveals |
|---|---|---|---|---|
| 4-Seam Velocity | Release point velocity | 92-95 mph | 97+ mph | Fastball power |
| Spin Rate (4-Seam) | Fastball spin | 2200-2400 RPM | 2500+ RPM | Fastball "rise" and whiff ability |
| Induced Vertical Break (IVB) | Vertical movement caused by spin (gravity removed) | 15-17 inches (4-seam) | 18+ inches | Fastball carry/riding action |
| Horizontal Break (HB) | Side-to-side movement | Varies by pitch | Context-dependent | Lateral movement |
| Vertical Approach Angle | Entry angle into zone | -4° to -6° (4-seam) | Varies by pitch type | Perceived rise/deception |
| Extension | Release distance in front of the rubber | 6.0-6.5 feet | 6.5+ feet | Effective velocity boost |
| Release Height | Vertical release point | 5.5-6.5 feet | Context-dependent | Angle and deception |
| Whiff Rate | Swings and misses / swings | 20-25% | 30%+ | Swing-and-miss ability |
| Chase Rate | Swings at pitches outside the zone / pitches outside the zone | 25-30% | 35%+ | Deception effectiveness |
| xwOBA | Expected wOBA allowed | .310-.330 | <.300 | Contact quality allowed |
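To make the Whiff Rate and Chase Rate definitions concrete, here is a small sketch that computes both from Statcast-style description and zone values, where zones 1-9 are inside the strike zone and 11-14 are outside it. The choice of which descriptions count as swings and whiffs is an assumption: bunts are excluded and foul tips count as contact. Its numbers may therefore not match Baseball Savant exactly. The function and the sample data are hypothetical.

```python
# Assumed classification of Statcast `description` values (bunts excluded).
SWING_DESCRIPTIONS = {
    "swinging_strike", "swinging_strike_blocked", "foul", "foul_tip", "hit_into_play",
}
WHIFF_DESCRIPTIONS = {"swinging_strike", "swinging_strike_blocked"}

def whiff_and_chase_rates(pitches):
    """pitches: iterable of (description, zone) pairs.

    Returns (whiff_rate, chase_rate) as fractions; None when a rate has no
    denominator (no swings, or no pitches outside the zone).
    """
    swings = whiffs = outside = chases = 0
    for description, zone in pitches:
        swung = description in SWING_DESCRIPTIONS
        swings += swung                      # bools add as 0/1
        whiffs += description in WHIFF_DESCRIPTIONS
        if zone > 9:                         # zones 11-14 are outside the zone
            outside += 1
            chases += swung
    whiff_rate = whiffs / swings if swings else None
    chase_rate = chases / outside if outside else None
    return whiff_rate, chase_rate

# Hypothetical six-pitch sample
sample = [
    ("swinging_strike", 14), ("ball", 13), ("foul", 5),
    ("called_strike", 2), ("hit_into_play", 12), ("swinging_strike", 8),
]
whiff, chase = whiff_and_chase_rates(sample)
print(f"Whiff rate: {whiff:.1%}, chase rate: {chase:.1%}")
```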
Understanding these metrics and their interactions is essential for modern pitching analysis. All else being equal (similar spin axis and spin efficiency), a 95 mph fastball with 2600 RPM will behave very differently from a 95 mph fastball with 2200 RPM. The higher-spin pitch will generally have more "ride" and tends to generate more swings-and-misses when thrown up in the zone.
7.1.3 The Physics of Pitching
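To understand Statcast metrics, we need a basic understanding of pitching physics. Once the ball leaves the pitcher's hand, two primary forces shape its path (air drag also slows it down):
Gravity pulls the ball downward. Without any spin, a pitch would fall roughly 2.5 feet (hard fastball) to more than 4 feet (slow curveball) under gravity alone on its way to home plate. Slower pitches fall farther because they spend more time in the air. The rubber is 60.5 feet from the plate, but the ball is released several feet in front of it.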
Magnus Force is created by spin. A spinning ball deflects the air around it unevenly, creating a pressure difference that pushes the ball perpendicular to both its spin axis and its direction of travel. A ball spinning with backspin (like a four-seam fastball) experiences an upward Magnus force, causing the pitch to "rise." More accurately, the pitch drops less than gravity alone would cause.
Induced Vertical Break (IVB) measures this Magnus effect. A fastball with 16 inches of IVB drops 16 inches less than a spinless pitch would. High-spin fastballs can have 18-20 inches of IVB, creating the perception that the pitch is "rising" as it reaches the plate.
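The gravity and IVB figures combine by simple subtraction. A minimal sketch follows, assuming a constant-speed flight of about 54 feet after release and ignoring drag and the pitch's release angle. Drag slows the ball, so it spends longer in the air and falls a little farther than this model says. The helper names are hypothetical. The result is drop measured from the pitch's initial straight-line path.

```python
G_FT_PER_S2 = 32.174            # gravitational acceleration
MPH_TO_FT_PER_S = 5280 / 3600   # 1 mph is about 1.467 ft/s

def gravity_drop_inches(release_velo_mph, flight_distance_ft=54.0):
    """Free-fall drop over the flight, assuming constant speed and no drag."""
    flight_time_s = flight_distance_ft / (release_velo_mph * MPH_TO_FT_PER_S)
    return 0.5 * G_FT_PER_S2 * flight_time_s ** 2 * 12  # feet -> inches

def net_drop_inches(release_velo_mph, ivb_inches, flight_distance_ft=54.0):
    """Drop remaining after spin-induced movement: gravity drop minus IVB."""
    return gravity_drop_inches(release_velo_mph, flight_distance_ft) - ivb_inches

# Hypothetical pitches: a typical fastball, a high-IVB fastball, a curveball
for velo, ivb in [(95, 16), (95, 20), (78, -8)]:
    print(f"{velo} mph, IVB {ivb:+} in: gravity drop "
          f"{gravity_drop_inches(velo):.1f} in, net drop {net_drop_inches(velo, ivb):.1f} in")
```

Negative IVB (topspin) adds to the gravity drop, which is why the curveball's net drop exceeds its gravity drop.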
Horizontal Break arises the same way, perpendicular to vertical movement. When the spin axis is tilted, part of the Magnus force acts sideways; a slider's tilted axis, for example, pushes the ball toward the pitcher's glove side, away from his arm side.
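The direction rule (perpendicular to both the spin axis and the direction of travel) is the direction of a cross product. This sketch shows direction only, not magnitude. It assumes a simple coordinate frame: x toward the catcher's right, y toward the pitcher, z up, with the pitch travelling in the -y direction. The function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-D vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def magnus_direction(spin_axis_vector, velocity_vector):
    """Unnormalized Magnus force direction, proportional to spin x velocity."""
    return cross(spin_axis_vector, velocity_vector)

toward_plate = (0, -1, 0)  # the pitch moves toward home plate (-y)

# Axis along -x: the top of the ball moves backward (backspin), force points up
print("backspin:", magnus_direction((-1, 0, 0), toward_plate))
# Vertical axis: the force points sideways (pure horizontal break)
print("sidespin:", magnus_direction((0, 0, 1), toward_plate))
# Axis along the flight path (gyro spin): no Magnus force at all
print("gyro:    ", magnus_direction((0, 1, 0), toward_plate))
```

A real spin axis is a blend of these cases, which is why a tilted axis produces a mix of vertical and horizontal break.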
These principles help explain why certain pitch characteristics work. High-spin fastballs up in the zone tend to draw swings underneath the ball. Sliders with tight spin and lateral movement tend to induce weak contact. Curveballs with high spin and downward movement tend to produce ground balls.
7.2.1 Release Velocity vs. Perceived Velocity
Release velocity is the speed of the pitch as it leaves the pitcher's hand; Statcast reports it as an out-of-hand value. This is the "official" velocity shown on stadium radar boards and broadcasts.
Perceived velocity (or effective velocity) is what matters to the hitter. Two factors change how a pitch plays to a hitter beyond its radar reading:
- Extension: Pitchers who release the ball farther in front of the rubber shorten the distance to the plate. A pitcher with 7 feet of extension releases the ball about 53.5 feet from the plate (60.5 - 7) rather than 60.5 feet. This gives the hitter less reaction time, so the pitch can "play" 1-2 mph faster than the same pitch from a pitcher with average extension.
- Vertical Approach Angle (VAA): A flatter approach angle does not shorten reaction time. However, fastballs that enter the top of the zone on a flatter path than hitters expect appear to "rise" and tend to draw swings underneath the ball.
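A simple way to express the extension effect is to scale release velocity by the ratio of the full 60.5-foot distance to the actual release distance:
Perceived Velocity = Release Velocity × (60.5 / (60.5 - Extension))
For example, a 95 mph fastball with 7 feet of extension:
- Perceived Velocity = 95 × (60.5 / 53.5) = 95 × 1.131 = 107.4 mph
Read this as the speed a pitch released at the rubber would need to reach the plate in the same time. Every pitcher releases the ball well in front of the rubber, so this ratio inflates every pitch. Compare pitchers with one another rather than against the radar reading: measured against a pitcher with average extension, the 7-foot pitcher's edge shrinks to the 1-2 mph noted above. Statcast's own effective_speed column is an extension-adjusted speed on a scale close to radar velocity. This edge helps explain why some pitchers' "average" fastballs generate more swings-and-misses than expected: their extension makes the pitch play faster.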
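The formula above measures the extension benefit against a release at the rubber. Here is a minimal sketch of that ratio alongside a comparison against a reference extension instead. Both ignore drag. The reference value of 6.3 feet is an assumption (roughly league average), not an official Statcast constant, and both helper names are hypothetical.

```python
PITCHING_DISTANCE_FT = 60.5  # front of the rubber to the back point of home plate

def rubber_relative_velo(release_velo_mph, extension_ft):
    """The chapter's ratio: speed a pitch released at the rubber would need to
    cover 60.5 ft in the time this pitch covers (60.5 - extension) ft."""
    return release_velo_mph * PITCHING_DISTANCE_FT / (PITCHING_DISTANCE_FT - extension_ft)

def extension_adjusted_velo(release_velo_mph, extension_ft, reference_extension_ft=6.3):
    """Same time-equivalence idea, relative to a pitcher who releases at an
    assumed reference extension (6.3 ft is an assumption, not an official value)."""
    return (release_velo_mph * (PITCHING_DISTANCE_FT - reference_extension_ft)
            / (PITCHING_DISTANCE_FT - extension_ft))

for ext in (5.8, 6.3, 7.0):
    print(f"95 mph, {ext} ft extension: rubber-relative "
          f"{rubber_relative_velo(95, ext):.1f} mph, vs. reference "
          f"{extension_adjusted_velo(95, ext):.1f} mph")
```

Against the reference, a 7-foot extension turns 95 mph into roughly 96.2 mph, while a short 5.8-foot extension plays slightly slower than the radar reading. That spread matches the 1-2 mph scale discussed in the text.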
7.2.2 Velocity Metrics by Pitch Type
Different pitch types have characteristic velocity ranges. Understanding these helps with pitch classification and arsenal evaluation:
Four-Seam Fastball (FF): 90-98 mph (elite: 97+)
Two-Seam Fastball/Sinker (SI): 89-96 mph (typically 1-2 mph slower than four-seam)
Cutter (FC): 87-94 mph (typically 2-5 mph slower than fastball)
Slider (SL): 83-90 mph (can be slower for "sweepers")
Curveball (CU): 75-82 mph (larger break, slower velocity)
Changeup (CH): 82-88 mph (typically 6-10 mph slower than fastball)
Splitter (FS): 84-90 mph (2-6 mph slower than fastball)
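Several of the ranges above are stated as gaps from the fastball, and gaps compare more cleanly across pitchers than raw speeds. Here is a minimal sketch that turns a pitcher's per-pitch-type average velocities into differentials from his primary fastball. The function name and the arsenal numbers are hypothetical.

```python
def velocity_gaps(avg_velo_by_pitch, primary="FF"):
    """Return each pitch type's average velocity minus the primary fastball's."""
    base = avg_velo_by_pitch[primary]
    return {pitch: round(velo - base, 1)
            for pitch, velo in avg_velo_by_pitch.items() if pitch != primary}

# Hypothetical arsenal averages (mph), for illustration only
arsenal = {"FF": 95.0, "SL": 87.5, "CU": 80.0, "CH": 86.0}
print(velocity_gaps(arsenal))
```

Here the changeup sits 9 mph below the four-seamer, inside the 6-10 mph gap listed above.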
Let's code functions to analyze velocity by pitch type:
Python Implementation
import pandas as pd
import numpy as np
from pybaseball import statcast_pitcher
import matplotlib.pyplot as plt
import seaborn as sns
def calculate_velocity_metrics(df):
    """
    Calculate comprehensive velocity metrics by pitch type.
    Parameters:
        df: Statcast DataFrame with pitch-level data
    Returns:
        DataFrame with velocity metrics by pitch type
    """
    # Filter for valid pitches
    pitches = df[df['release_speed'].notna()].copy()
    # Group by pitch type and calculate metrics
    velo_metrics = pitches.groupby('pitch_type').agg({
        'release_speed': ['mean', 'std', 'min', 'max', 'count'],
        'release_extension': 'mean'
    }).round(2)
    # Flatten the MultiIndex column names
    velo_metrics.columns = ['_'.join(col).strip() for col in velo_metrics.columns]
    velo_metrics = velo_metrics.reset_index()
    # Rename for clarity
    velo_metrics.columns = ['pitch_type', 'avg_velo', 'velo_std', 'min_velo',
                            'max_velo', 'count', 'avg_extension']
    # Perceived velocity relative to a release at the rubber (60.5 ft)
    velo_metrics['perceived_velo'] = (
        velo_metrics['avg_velo'] * (60.5 / (60.5 - velo_metrics['avg_extension']))
    ).round(2)
    # Share of all pitches thrown
    velo_metrics['pitch_pct'] = (
        velo_metrics['count'] / velo_metrics['count'].sum() * 100
    ).round(1)
    # Sort by average velocity, fastest first
    velo_metrics = velo_metrics.sort_values('avg_velo', ascending=False)
    return velo_metrics
# Example: Analyze Gerrit Cole's velocity
# Gerrit Cole MLBAM player_id: 543037
start_date = '2024-04-01'
end_date = '2024-10-01'
cole_pitches = statcast_pitcher(start_date, end_date, 543037)
if cole_pitches is not None and len(cole_pitches) > 0:
    cole_velo = calculate_velocity_metrics(cole_pitches)
    print("Gerrit Cole 2024 Velocity Profile")
    print("=" * 80)
    print(cole_velo.to_string(index=False))
    # Visualize velocity by pitch type (pitch types with 50+ tracked pitches)
    pitch_counts = cole_pitches.groupby('pitch_type')['release_speed'].count()
    qualifying_pitches = pitch_counts[pitch_counts >= 50].index
    fig, ax = plt.subplots(figsize=(12, 6))
    data_to_plot = cole_pitches[cole_pitches['pitch_type'].isin(qualifying_pitches)]
    sns.boxplot(data=data_to_plot, x='pitch_type', y='release_speed',
                palette='Set2', ax=ax)
    ax.set_xlabel('Pitch Type', fontsize=12, fontweight='bold')
    ax.set_ylabel('Release Velocity (mph)', fontsize=12, fontweight='bold')
    ax.set_title('Gerrit Cole 2024: Velocity Distribution by Pitch Type',
                 fontsize=14, fontweight='bold', pad=15)
    ax.grid(axis='y', alpha=0.3, linestyle='--')
    plt.tight_layout()
    plt.show()
R Implementation
library(baseballr)
library(dplyr)
library(tidyr)
library(ggplot2)
calculate_velocity_metrics <- function(df) {
  # Keep pitches with a tracked velocity and a pitch type
  pitches <- df %>%
    filter(!is.na(release_speed), !is.na(pitch_type))
  # Group by pitch type and calculate metrics
  velo_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      avg_velo = mean(release_speed, na.rm = TRUE),
      velo_std = sd(release_speed, na.rm = TRUE),
      min_velo = min(release_speed, na.rm = TRUE),
      max_velo = max(release_speed, na.rm = TRUE),
      count = n(),
      avg_extension = mean(release_extension, na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    mutate(
      # Perceived velocity relative to a release at the rubber (60.5 ft)
      perceived_velo = avg_velo * (60.5 / (60.5 - avg_extension)),
      # Share of all pitches thrown
      pitch_pct = count / sum(count) * 100,
      # Round for display
      across(where(is.numeric), ~round(.x, 2))
    ) %>%
    arrange(desc(avg_velo))
  return(velo_metrics)
}
# Fetch Gerrit Cole's 2024 data
# Gerrit Cole MLBAM ID: 543037
cole_pitches <- statcast_search_pitchers(
  start_date = "2024-04-01",
  end_date = "2024-10-01",
  pitcherid = 543037
)
cole_velo <- calculate_velocity_metrics(cole_pitches)
cat("Gerrit Cole 2024 Velocity Profile\n")
cat(strrep("=", 80), "\n")
print(cole_velo)
# Visualize velocity by pitch type (pitch types with 50+ pitches)
qualifying_pitches <- cole_pitches %>%
  count(pitch_type) %>%
  filter(n >= 50) %>%
  pull(pitch_type)
cole_pitches %>%
  filter(pitch_type %in% qualifying_pitches) %>%
  ggplot(aes(x = pitch_type, y = release_speed, fill = pitch_type)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Gerrit Cole 2024: Velocity Distribution by Pitch Type",
    x = "Pitch Type",
    y = "Release Velocity (mph)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    axis.title = element_text(face = "bold", size = 12),
    legend.position = "none",
    panel.grid.major.x = element_blank()
  )
7.2.3 Velocity Decline and Fatigue Analysis
Velocity is one of the first indicators of pitcher fatigue or injury. Monitoring velocity trends throughout games and across seasons provides valuable information for player health and performance management.
def analyze_velocity_by_pitch_number(df):
    """
    Analyze how fastball velocity changes as the game pitch count rises.
    Parameters:
        df: Statcast DataFrame with one pitcher's pitch-level data
    Returns:
        DataFrame showing velocity trends by pitch-count bins
    """
    # Statcast's pitch_number counts pitches within a plate appearance, so
    # first build a running pitch count within each game
    df = df.sort_values(['game_pk', 'at_bat_number', 'pitch_number'])
    df['game_pitch_count'] = df.groupby('game_pk').cumcount() + 1
    # Focus on four-seam fastballs for consistency
    fastballs = df[(df['pitch_type'] == 'FF') & df['release_speed'].notna()].copy()
    if len(fastballs) == 0:
        return None
    # Create pitch count bins
    fastballs['pitch_bin'] = pd.cut(
        fastballs['game_pitch_count'],
        bins=[0, 25, 50, 75, 100, np.inf],
        labels=['1-25', '26-50', '51-75', '76-100', '101+']
    )
    # Calculate velocity by bin (observed=True skips empty bins)
    velo_by_count = fastballs.groupby('pitch_bin', observed=True).agg({
        'release_speed': ['mean', 'count']
    }).round(2)
    velo_by_count.columns = ['avg_velo', 'pitch_count']
    velo_by_count = velo_by_count.reset_index()
    # Calculate velocity change relative to the first bin
    baseline_velo = velo_by_count.iloc[0]['avg_velo']
    velo_by_count['velo_drop'] = (
        velo_by_count['avg_velo'] - baseline_velo
    ).round(2)
    return velo_by_count
# Analyze Cole's velocity by pitch count
cole_velo_trend = analyze_velocity_by_pitch_number(cole_pitches)
if cole_velo_trend is not None:
    print("\nVelocity by Pitch Count (Four-Seam Fastball)")
    print("=" * 60)
    print(cole_velo_trend.to_string(index=False))
    # Visualize
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(range(len(cole_velo_trend)), cole_velo_trend['avg_velo'],
            marker='o', linewidth=2, markersize=8, color='#003087')
    ax.set_xticks(range(len(cole_velo_trend)))
    ax.set_xticklabels(cole_velo_trend['pitch_bin'].astype(str))
    ax.set_xlabel('Pitch Count Range', fontsize=12, fontweight='bold')
    ax.set_ylabel('Average Velocity (mph)', fontsize=12, fontweight='bold')
    ax.set_title('Fastball Velocity by Pitch Count\nGerrit Cole 2024',
                 fontsize=14, fontweight='bold')
    ax.grid(axis='y', alpha=0.3, linestyle='--')
    # Annotate each bin's change from the first bin
    for i, row in cole_velo_trend.iterrows():
        if row['velo_drop'] != 0:
            ax.annotate(f"{row['velo_drop']:+.1f}",
                        xy=(i, row['avg_velo']),
                        xytext=(0, 10), textcoords='offset points',
                        ha='center', fontsize=9, color='red')
    plt.tight_layout()
    plt.show()
# R version: Velocity by pitch count
analyze_velocity_by_pitch_number <- function(df) {
  # Statcast's pitch_number counts pitches within a plate appearance, so build
  # a running count per game first (assumes df holds one pitcher's pitches)
  fastballs <- df %>%
    arrange(game_pk, at_bat_number, pitch_number) %>%
    group_by(game_pk) %>%
    mutate(game_pitch_count = row_number()) %>%
    ungroup() %>%
    # Focus on four-seam fastballs
    filter(pitch_type == 'FF', !is.na(release_speed))
  if (nrow(fastballs) == 0) {
    return(NULL)
  }
  # Bin by game pitch count and summarise each bin
  velo_by_count <- fastballs %>%
    mutate(
      pitch_bin = cut(game_pitch_count,
                      breaks = c(0, 25, 50, 75, 100, Inf),
                      labels = c('1-25', '26-50', '51-75', '76-100', '101+'))
    ) %>%
    group_by(pitch_bin) %>%
    summarise(
      avg_velo = mean(release_speed, na.rm = TRUE),
      pitch_count = n(),
      .groups = 'drop'
    ) %>%
    mutate(
      # Change relative to the first (earliest) bin
      velo_drop = avg_velo - first(avg_velo),
      across(c(avg_velo, velo_drop), ~round(.x, 2))
    )
  return(velo_by_count)
}
cole_velo_trend <- analyze_velocity_by_pitch_number(cole_pitches)
cat("\nVelocity by Pitch Count (Four-Seam Fastball)\n")
cat(strrep("=", 60), "\n")
print(cole_velo_trend)
# Visualize
ggplot(cole_velo_trend, aes(x = pitch_bin, y = avg_velo, group = 1)) +
  geom_line(color = '#003087', linewidth = 1.2) +
  geom_point(color = '#003087', size = 4) +
  geom_text(aes(label = sprintf("%+.1f", velo_drop)),
            vjust = -1.5, color = 'red', fontface = 'bold', size = 3.5) +
  labs(
    title = "Fastball Velocity by Pitch Count",
    subtitle = "Gerrit Cole 2024",
    x = "Pitch Count Range",
    y = "Average Velocity (mph)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5),
    axis.title = element_text(face = "bold", size = 11)
  )
7.2.4 Key Insights on Velocity
- Velocity matters, but it's not everything: A 98 mph fastball with poor command or predictable sequencing can underperform a well-located 94 mph fastball with deception.
- Extension is underrated: Pitchers with plus extension (6.5+ feet) can succeed with "average" velocity because the ball plays faster.
- Velocity decline signals: Drops of 2+ mph within a game warrant attention - they may indicate fatigue or injury. Season-over-season declines require investigation.
- Pitch-to-pitch variation: Elite pitchers show minimal velocity variation on their fastball, suggesting consistent mechanics. Excessive variation may indicate mechanical inconsistency.
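The decline and variation signals above can be checked per game with a few lines. This minimal sketch compares a game's first and last fastballs and reports the game's velocity standard deviation. Three choices are assumptions: the 10-pitch window, the input format (one game's four-seam velocities in pitch order), and the function name, which is hypothetical. The 2 mph threshold follows the guideline above.

```python
from statistics import mean, stdev

def fastball_velocity_flags(velos, window=10, drop_threshold_mph=2.0):
    """velos: one game's four-seam velocities (mph) in pitch order."""
    if len(velos) < 2 * window:
        raise ValueError("need at least 2 * window fastballs")
    early, late = mean(velos[:window]), mean(velos[-window:])
    change = late - early
    return {
        "early_avg": round(early, 2),
        "late_avg": round(late, 2),
        "change": round(change, 2),
        "flag_drop": change <= -drop_threshold_mph,  # 2+ mph decline
        "velo_std": round(stdev(velos), 2),          # pitch-to-pitch variation
    }

# Hypothetical game in which the fastball fades late
game = [96.0] * 10 + [95.0] * 10 + [93.5] * 10
print(fastball_velocity_flags(game))
```

A flag here is a prompt for closer review, not a diagnosis; weather, game situation, and pitch-type mix can also move velocity.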
7.3.1 Understanding Spin Rate and Spin Axis
Spin rate measures how many times per minute the ball rotates. For fastballs, higher spin rates generally correlate with more "rise" (or less drop from gravity). For breaking balls, spin rate affects the magnitude of break.
However, not all spin is created equal. The spin axis (the orientation of the ball's rotation) determines the direction of movement. A pure-backspin four-seam fastball (spin direction at 12:00 on a clock face, or 180° in Statcast's spin_axis field) produces maximum vertical movement. Tilt the axis to 1:00, and you get some horizontal movement mixed with vertical rise.
Active Spin (or useful spin) is the percentage of spin that contributes to movement. A pitch with 2500 RPM and 85% active spin has 2125 RPM of spin actually creating movement. The rest of the rotation is gyro spin, with the ball spinning about its direction of travel like a spiraling football, which produces no Magnus force.
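Two conversions come up constantly when reading spin data. Here is a minimal sketch of both. The first turns spin_axis degrees into a clock-face tilt, assuming the convention that 180° is 12:00 (pure backspin), 0°/360° is 6:00 (pure topspin), and each hour spans 30°. The second splits total spin into transverse (movement-producing) and gyro parts. Because spin efficiency is the transverse share of a spin vector, the two parts add as vector components rather than as percentages. The function names are hypothetical.

```python
import math

def spin_axis_to_clock(spin_axis_deg):
    """Convert spin_axis degrees to a clock-face tilt such as '1:30'
    (assumed convention: 180 deg = 12:00, 0/360 deg = 6:00, 30 deg per hour)."""
    hours = (spin_axis_deg / 30.0 + 6.0) % 12.0
    total_minutes = round(hours * 60) % (12 * 60)
    h, m = divmod(total_minutes, 60)
    return f"{h or 12}:{m:02d}"

def spin_components(total_spin_rpm, spin_efficiency):
    """Split total spin into transverse and gyro components (rpm).
    spin_efficiency = transverse / total, so transverse**2 + gyro**2 == total**2."""
    transverse = total_spin_rpm * spin_efficiency
    gyro = total_spin_rpm * math.sqrt(1.0 - spin_efficiency ** 2)
    return transverse, gyro

print(spin_axis_to_clock(180), spin_axis_to_clock(210), spin_axis_to_clock(0))
transverse, gyro = spin_components(2500, 0.85)
print(f"transverse: {transverse:.0f} rpm, gyro: {gyro:.0f} rpm")
```

For the 2500 RPM, 85%-efficient pitch, the gyro component comes out near 1,300 RPM, far more than 15% of the total, even though it contributes no movement.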
Key concepts:
- Four-Seam Fastball: High spin (typically 2200-2600 RPM), high active spin percentage (often 90%+), creates vertical "rise"
- Two-Seam/Sinker: Lower spin (2000-2300 RPM), often has natural arm-side run, less vertical rise
- Slider: High spin (2400-2800 RPM), tilted axis creates lateral break
- Curveball: High spin (2500-3000 RPM), topspin creates downward break
- Changeup: Lower spin (1500-2000 RPM), mimics fastball arm action but generates less Magnus force
7.3.2 Movement Profiles by Pitch Type
Different pitches have characteristic movement profiles. Understanding these helps with pitch design and arsenal optimization:
| Pitch Type | Spin Rate (RPM) | Induced Vertical Break | Horizontal Break | Primary Action |
|---|---|---|---|---|
| Four-Seam FB | 2200-2600 | 14-18 inches | 4-10 inches (arm-side) | Rising action |
| Sinker | 2000-2300 | 8-14 inches | 10-16 inches (arm-side) | Sinking, running |
| Cutter | 2400-2700 | 10-14 inches | 2-6 inches (glove-side) | Late break away |
| Slider | 2400-2800 | 2-8 inches | 4-10 inches (glove-side) | Sweeping break |
| Curveball | 2500-3000 | -6 to -12 inches | 4-12 inches | Big vertical drop |
| Changeup | 1500-2000 | 8-14 inches | 12-18 inches (arm-side) | Fading action |
| Splitter | 1500-1900 | 4-10 inches | -2 to +4 inches | Late tumble |
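Note: Negative IVB means the pitch drops more than a spinless pitch would (topspin adds to gravity); positive IVB means it drops less than gravity alone would cause.
In this table, horizontal break is expressed relative to the pitcher's arm: positive = arm side, negative = glove side. Statcast's raw pfx_x is measured from the catcher's perspective, so the sign of arm-side movement depends on the pitcher's handedness (negative for right-handers).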
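Converting raw Statcast movement into the table's convention takes one unit change and one sign flip. This minimal sketch assumes the usual convention in which positive pfx_x points toward the catcher's right (the first-base side), so a right-hander's arm-side run shows up as negative pfx_x. It also relies on pfx_z already excluding gravity. Both helper names are hypothetical.

```python
def arm_side_break_inches(pfx_x_ft, p_throws):
    """Convert pfx_x (feet, catcher's perspective) to inches with arm side
    positive, flipping the sign for right-handed pitchers (p_throws == 'R')."""
    sign = -1 if p_throws == "R" else 1
    return pfx_x_ft * 12 * sign

def induced_vertical_break_inches(pfx_z_ft):
    """pfx_z is spin-induced vertical movement in feet; convert to inches."""
    return pfx_z_ft * 12

# Hypothetical right-hander: fastball runs arm side, slider breaks glove side
print(round(arm_side_break_inches(-0.7, "R"), 1))   # fastball
print(round(arm_side_break_inches(0.5, "R"), 1))    # slider
print(round(induced_vertical_break_inches(1.4), 1))
```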
7.3.3 Pitch Movement Visualization
Understanding pitch movement is easier with visualization. Let's create movement charts:
Python Implementation
def plot_pitch_movement(df, player_name="Pitcher"):
    """
    Create a pitch movement chart showing horizontal vs. induced vertical break.
    Parameters:
        df: Statcast DataFrame with pitch-level data
        player_name: Name for chart title
    """
    # Filter for pitches with movement data
    pitches = df[
        df['pfx_x'].notna() &
        df['pfx_z'].notna() &
        df['pitch_type'].notna()
    ].copy()
    # Convert pfx (in feet) to inches. Statcast's pfx_x is measured from the
    # catcher's perspective, so flip it for right-handers to plot arm side as positive
    arm_side_sign = np.where(pitches['p_throws'] == 'R', -1, 1)
    pitches['horizontal_break'] = pitches['pfx_x'] * 12 * arm_side_sign
    pitches['induced_vertical_break'] = pitches['pfx_z'] * 12
    # Keep pitch types with at least 30 pitches
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]
    # Create figure
    fig, ax = plt.subplots(figsize=(12, 10))
    # Define colors for pitch types
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }
    # Plot each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]
        ax.scatter(
            pitch_subset['horizontal_break'],
            pitch_subset['induced_vertical_break'],
            c=pitch_colors.get(pitch_type, '#7f7f7f'),
            label=f"{pitch_type} (n={len(pitch_subset)})",
            alpha=0.6,
            s=30,
            edgecolors='black',
            linewidth=0.5
        )
    # Add reference lines
    ax.axhline(y=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
    ax.axvline(x=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
    # Labels and formatting
    ax.set_xlabel('Horizontal Break (inches)\n← Glove Side | Arm Side →',
                  fontsize=12, fontweight='bold')
    ax.set_ylabel('Induced Vertical Break (inches)\n↓ Drop | Rise ↑',
                  fontsize=12, fontweight='bold')
    ax.set_title(f'{player_name} Pitch Movement Profile\nArm-Side Break Plotted as Positive',
                 fontsize=14, fontweight='bold', pad=20)
    # Add grid
    ax.grid(True, alpha=0.3, linestyle='--')
    # Legend
    ax.legend(loc='upper left', framealpha=0.9, fontsize=10)
    # Equal aspect ratio so an inch of break looks the same on both axes
    ax.set_aspect('equal', adjustable='box')
    plt.tight_layout()
    plt.show()
# Create movement chart for Gerrit Cole
if cole_pitches is not None and len(cole_pitches) > 0:
    plot_pitch_movement(cole_pitches, "Gerrit Cole 2024")
# R version: Pitch movement visualization
plot_pitch_movement <- function(df, player_name = "Pitcher") {
  # Filter for pitches with movement data; convert feet to inches and flip
  # pfx_x for right-handers so arm-side break plots as positive
  pitches <- df %>%
    filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
    mutate(
      horizontal_break = if_else(p_throws == "R", -pfx_x, pfx_x) * 12,
      induced_vertical_break = pfx_z * 12
    )
  # Filter for pitch types with sufficient counts
  pitch_counts <- pitches %>%
    count(pitch_type) %>%
    filter(n >= 30)
  plot_data <- pitches %>%
    filter(pitch_type %in% pitch_counts$pitch_type)
  # Create movement chart
  ggplot(plot_data, aes(x = horizontal_break, y = induced_vertical_break,
                        color = pitch_type)) +
    geom_point(alpha = 0.6, size = 2) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
    geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
    scale_color_brewer(palette = "Set1",
                       name = "Pitch Type",
                       labels = function(x) {
                         # Look up each pitch type's count by name
                         counts <- pitch_counts$n[match(x, pitch_counts$pitch_type)]
                         paste0(x, " (n=", counts, ")")
                       }) +
    labs(
      title = paste(player_name, "Pitch Movement Profile"),
      subtitle = "Arm-side break plotted as positive",
      x = "Horizontal Break (inches)\n← Glove Side | Arm Side →",
      y = "Induced Vertical Break (inches)\n↓ Drop | Rise ↑"
    ) +
    coord_fixed() + # Equal aspect ratio
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(face = "bold", size = 11),
      legend.position = "right",
      panel.grid.major = element_line(color = "gray90"),
      panel.grid.minor = element_line(color = "gray95")
    )
}
# Create movement chart for Gerrit Cole
plot_pitch_movement(cole_pitches, "Gerrit Cole 2024")
7.3.4 Analyzing Spin Efficiency
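Spin efficiency measures how much of a pitch's spin creates movement versus gyro spin (which doesn't). Statcast's pitch-level search data includes total spin rate (release_spin_rate) and spin axis (spin_axis) but no spin-efficiency column. The functions below therefore summarize those inputs along with average movement and the spin-to-velocity ratio: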
def circular_mean_deg(angles):
    """Mean of angles in degrees that respects the 0/360 wraparound."""
    radians = np.deg2rad(angles)
    return np.rad2deg(np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())) % 360
def calculate_spin_metrics(df):
    """
    Calculate spin metrics by pitch type.
    Parameters:
        df: Statcast DataFrame with pitch-level data
    Returns:
        DataFrame with spin metrics by pitch type
    """
    # Filter for pitches with spin data
    pitches = df[
        df['release_spin_rate'].notna() &
        df['pitch_type'].notna()
    ].copy()
    # Group by pitch type
    spin_metrics = pitches.groupby('pitch_type').agg({
        'release_spin_rate': ['mean', 'std', 'min', 'max'],
        'spin_axis': circular_mean_deg,        # axis wraps at 0/360 degrees
        'release_speed': 'mean',
        'pfx_x': lambda x: (x * 12).mean(),    # Convert feet to inches
        'pfx_z': lambda x: (x * 12).mean()
    }).round(1)
    spin_metrics.columns = ['avg_spin', 'spin_std', 'min_spin', 'max_spin',
                            'avg_spin_axis', 'avg_velo', 'avg_h_break', 'avg_v_break']
    spin_metrics = spin_metrics.reset_index()
    # Add pitch counts
    pitch_counts = pitches.groupby('pitch_type').size().reset_index(name='count')
    spin_metrics = spin_metrics.merge(pitch_counts, on='pitch_type')
    # Spin-to-velocity ratio ("Bauer units"), a rough indicator of movement potential
    spin_metrics['spin_velo_ratio'] = (
        spin_metrics['avg_spin'] / spin_metrics['avg_velo']
    ).round(1)
    return spin_metrics.sort_values('avg_spin', ascending=False)
# Analyze Cole's spin rates
cole_spin = calculate_spin_metrics(cole_pitches)
print("\nGerrit Cole 2024 Spin Rate Profile")
print("=" * 90)
print(cole_spin.to_string(index=False))
# R version: Spin metrics
calculate_spin_metrics <- function(df) {
pitches <- df %>%
filter(!is.na(release_spin_rate), !is.na(pitch_type))
spin_metrics <- pitches %>%
group_by(pitch_type) %>%
summarise(
avg_spin = mean(release_spin_rate, na.rm = TRUE),
spin_std = sd(release_spin_rate, na.rm = TRUE),
min_spin = min(release_spin_rate, na.rm = TRUE),
max_spin = max(release_spin_rate, na.rm = TRUE),
avg_spin_axis = mean(spin_axis, na.rm = TRUE),
avg_velo = mean(release_speed, na.rm = TRUE),
avg_h_break = mean(pfx_x * 12, na.rm = TRUE), # Convert to inches
avg_v_break = mean(pfx_z * 12, na.rm = TRUE),
count = n(),
.groups = 'drop'
) %>%
mutate(
spin_velo_ratio = avg_spin / avg_velo,
across(where(is.numeric), ~round(.x, 1))
) %>%
arrange(desc(avg_spin))
return(spin_metrics)
}
cole_spin <- calculate_spin_metrics(cole_pitches)
cat("\nGerrit Cole 2024 Spin Rate Profile\n")
cat(strrep("=", 90), "\n")
print(cole_spin)
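One caveat on the spin metrics above: `spin_axis` is an angle on a 0-360 degree scale, and a plain arithmetic mean misbehaves when a pitch's axes straddle 0°/360° (axes of 350° and 10° average to 180°, which points the opposite way). A circular mean avoids this. The sketch below is a hypothetical helper, `circular_mean_deg`, not part of Statcast or pandas; it averages unit vectors and converts the resulting direction back to degrees.

```python
import numpy as np
import pandas as pd

def circular_mean_deg(angles):
    """Mean of angles given in degrees on a 0-360 circle, ignoring NaN values."""
    radians = np.deg2rad(pd.Series(angles, dtype=float).dropna())
    if radians.empty:
        return np.nan
    # Average the sine and cosine components, then recover the mean direction
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    # Map the result from (-180, 180] back onto the 0-360 scale
    return float(np.rad2deg(mean_angle) % 360)

print(circular_mean_deg([200, 220]))
```

It can replace the `'mean'` aggregation for this one column, for example `pitches.groupby('pitch_type')['spin_axis'].agg(circular_mean_deg)`. For pitch types whose axes cluster well away from 0°/360°, the two means will be nearly identical.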
# R version: Pitch movement visualization
plot_pitch_movement <- function(df, player_name = "Pitcher") {
# Filter for pitches with movement data
pitches <- df %>%
filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
mutate(
horizontal_break = pfx_x * 12, # Convert feet to inches
induced_vertical_break = pfx_z * 12
)
# Filter for pitch types with sufficient counts
pitch_counts <- pitches %>%
count(pitch_type) %>%
filter(n >= 30)
plot_data <- pitches %>%
filter(pitch_type %in% pitch_counts$pitch_type)
# Create movement chart
ggplot(plot_data, aes(x = horizontal_break, y = induced_vertical_break,
color = pitch_type)) +
geom_point(alpha = 0.6, size = 2) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
scale_color_brewer(palette = "Set1",
name = "Pitch Type",
labels = function(x) {
counts <- pitch_counts %>%
filter(pitch_type %in% x) %>%
pull(n)
paste0(x, " (n=", counts, ")")
}) +
labs(
title = paste(player_name, "Pitch Movement Profile"),
subtitle = "Catcher's Perspective",
x = "Horizontal Break (inches)\n← Third-Base Side | First-Base Side →",
y = "Induced Vertical Break (inches)\n↓ Drop | Rise ↑"
) +
coord_fixed() + # Equal aspect ratio
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(face = "bold", size = 11),
legend.position = "right",
panel.grid.major = element_line(color = "gray90"),
panel.grid.minor = element_line(color = "gray95")
)
}
# Create movement chart for Gerrit Cole
plot_pitch_movement(cole_pitches, "Gerrit Cole 2024")
import matplotlib.pyplot as plt

def plot_pitch_movement(df, player_name="Pitcher"):
    """
    Create a pitch movement chart showing horizontal vs. vertical break.
    Parameters:
        df: Statcast DataFrame with pitch-level data
        player_name: Name for chart title
    """
    # Filter for pitches with movement data
    pitches = df[
        df['pfx_x'].notna() &
        df['pfx_z'].notna() &
        df['pitch_type'].notna()
    ].copy()
    # Convert pfx (in feet) to inches for easier interpretation
    pitches['horizontal_break'] = pitches['pfx_x'] * 12
    pitches['induced_vertical_break'] = pitches['pfx_z'] * 12
    # Get pitch counts for filtering
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]
    # Create figure
    fig, ax = plt.subplots(figsize=(12, 10))
    # Define colors for pitch types
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }
    # Plot each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]
        ax.scatter(
            pitch_subset['horizontal_break'],
            pitch_subset['induced_vertical_break'],
            c=pitch_colors.get(pitch_type, '#7f7f7f'),
            label=f"{pitch_type} (n={len(pitch_subset)})",
            alpha=0.6,
            s=30,
            edgecolors='black',
            linewidths=0.5
        )
    # Add reference lines
    ax.axhline(y=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
    ax.axvline(x=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
    # Labels and formatting. pfx_x is measured from the catcher's perspective,
    # so a right-hander's arm-side run is negative and plots on the left.
    ax.set_xlabel('Horizontal Break (inches)\n← Third-Base Side | First-Base Side →',
                  fontsize=12, fontweight='bold')
    ax.set_ylabel('Induced Vertical Break (inches)\n↓ Drop | Rise ↑',
                  fontsize=12, fontweight='bold')
    ax.set_title(f'{player_name} Pitch Movement Profile\nCatcher\'s Perspective',
                 fontsize=14, fontweight='bold', pad=20)
    # Add grid
    ax.grid(True, alpha=0.3, linestyle='--')
    # Legend
    ax.legend(loc='upper left', framealpha=0.9, fontsize=10)
    # Equal aspect ratio for proper representation
    ax.set_aspect('equal', adjustable='box')
    plt.tight_layout()
    plt.show()

# Create movement chart for Gerrit Cole
if cole_pitches is not None and len(cole_pitches) > 0:
    plot_pitch_movement(cole_pitches, "Gerrit Cole 2024")
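Because Statcast's `pfx_x` is measured from the catcher's perspective, a right-hander's arm-side run comes out negative while a left-hander's comes out positive, which makes movement charts for pitchers of opposite handedness awkward to compare. A common workaround is to flip the sign for right-handers so that arm side is always positive. This sketch assumes the Statcast `p_throws` column ('R' or 'L'); the helper name `add_arm_side_break` and the output column `hb_arm_side` are hypothetical.

```python
import pandas as pd

def add_arm_side_break(df):
    """Return a copy of df with hb_arm_side: horizontal break in inches, arm side positive.

    Assumes Statcast's pfx_x (feet, catcher's perspective) and p_throws ('R' or 'L').
    """
    out = df.copy()
    # A right-hander's arm-side run is negative from the catcher's view, so flip it;
    # rows with missing or unexpected handedness map to NaN
    sign = out['p_throws'].map({'R': -1, 'L': 1})
    out['hb_arm_side'] = out['pfx_x'] * 12 * sign
    return out

sample = pd.DataFrame({'p_throws': ['R', 'L'], 'pfx_x': [-0.5, 0.5]})
print(add_arm_side_break(sample)['hb_arm_side'].tolist())
```

Plotting `hb_arm_side` in place of `horizontal_break` puts arm-side movement on the right for every pitcher, at the cost of no longer being a pure catcher's-perspective chart.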
7.4.1 What VAA Is and Why It Matters
Vertical Approach Angle (VAA) is the angle at which a pitch enters the strike zone, measured in degrees from horizontal. A pitch coming in at -6° is dropping at a steeper angle than one at -4°.
VAA matters because it affects both hitter perception and contact quality:
- Perception: Flatter VAA (closer to 0°) makes pitches harder to track. The ball appears to "hop" or stay flat longer before dropping.
- Contact Quality: Steeper VAA means the ball is dropping more sharply as it reaches the hitting zone, which often results in ground balls. A flatter VAA on fastballs up in the zone tends to produce swings underneath the ball.
- Deception: Large VAA differences between a pitcher's fastball and breaking ball create deception. If a fastball enters at -4.5° and a curveball at -7.5°, hitters struggle to identify pitches early.
Typical VAA ranges:
- Four-Seam Fastball: -4° to -5.5° (flatter = better for high fastballs)
- Sinker: -5° to -6.5° (steeper helps induce ground balls)
- Curveball: -6.5° to -9° (very steep, dramatic drop)
- Changeup: -5° to -6.5° (similar to sinker, induces weak contact)
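Pitchers who release the ball from a relatively low height and generate a lot of induced vertical break tend to have flatter fastball VAA: the ball has less height to lose on the way to the plate, and backspin offsets more of gravity's pull. That combination is what makes their fastballs effective up in the zone.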
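The link between release height, ride, and VAA can be illustrated with a deliberately simplified kinematic model. Holding the flight time, the forward velocity at the plate, and the target height fixed, the hypothetical helper `approach_angle_deg` solves for the vertical velocity needed to reach the target and returns the resulting angle. The inputs are illustrative rather than measured, and the model ignores how extension and drag change flight time.

```python
import math

def approach_angle_deg(release_height, plate_height, flight_time, az, vy_plate):
    """Rough VAA (degrees) for a pitch released at release_height that crosses at plate_height.

    Simplified model: constant vertical acceleration az (ft/s^2, negative = net downward),
    fixed flight_time (s), and fixed forward velocity at the plate vy_plate (ft/s, negative).
    """
    # z(T) = z_release + vz_release*T + 0.5*az*T^2, solved for vz_release
    vz_release = (plate_height - release_height - 0.5 * az * flight_time ** 2) / flight_time
    # Vertical velocity when the ball arrives at the plate
    vz_plate = vz_release + az * flight_time
    # Both velocities are negative for a descending pitch, so negate to get a negative angle
    return -math.degrees(math.atan(vz_plate / vy_plate))

# Same upper-zone target (3.0 ft), flight time, and forward speed in every case
high_release = approach_angle_deg(6.0, 3.0, 0.40, -15.0, -125.0)
low_release = approach_angle_deg(5.5, 3.0, 0.40, -15.0, -125.0)
more_ride = approach_angle_deg(6.0, 3.0, 0.40, -10.0, -125.0)
print(f"{high_release:.2f} {low_release:.2f} {more_ride:.2f}")
```

In this model, lowering the release point by half a foot flattens the angle, and so does making the net vertical acceleration less negative (more ride), for the same target height.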
7.4.2 Calculating VAA
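VAA can be calculated from Statcast's trajectory data. Statcast reports each pitch's velocity components (vx0, vy0, vz0) and accelerations (ax, ay, az) at a reference point 50 feet from home plate, not at the plate itself, so the forward (vy0) and vertical (vz0) velocities must first be projected to the front of the plate, 17/12 feet from its back tip:
vy_f = -√(vy0² - 2 × ay × (50 - 17/12))
t = (vy_f - vy0) / ay
vz_f = vz0 + az × t
VAA = -arctan(vz_f / vy_f) × (180 / π)
Where:
- vy_f = forward velocity at the front of the plate (feet/second; negative because the ball travels toward home)
- vz_f = vertical velocity at the front of the plate (feet/second; negative when the ball is descending)
- t = time to travel from the 50-foot mark to the front of the plate (seconds)
Because vy_f and vz_f are both negative for a descending pitch, the leading minus sign makes VAA negative, matching the ranges above.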
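Tracing the calculation for a single pitch makes the steps concrete. The component values below are invented for illustration rather than taken from a real pitch, and `vaa_from_statcast` is a hypothetical helper name; it assumes the Statcast convention that vy0, vz0, ay, and az are reported at 50 feet from the plate and that the front of the plate sits 17/12 feet from its back tip.

```python
import math

def vaa_from_statcast(vy0, vz0, ay, az, y0=50.0, yf=17 / 12):
    """VAA in degrees from velocity/acceleration components measured at y0 feet."""
    # Forward velocity at the front of the plate (negative: moving toward home)
    vy_f = -math.sqrt(vy0 ** 2 - 2 * ay * (y0 - yf))
    # Time to travel from y0 to the front of the plate
    t = (vy_f - vy0) / ay
    # Vertical velocity at the front of the plate
    vz_f = vz0 + az * t
    return -math.degrees(math.atan(vz_f / vy_f))

# Illustrative four-seam fastball components (not a real pitch)
print(round(vaa_from_statcast(vy0=-135.0, vz0=-5.0, ay=28.0, az=-15.0), 2))
```

For these inputs the result lands inside the typical four-seam range listed earlier; making az less negative (more ride) flattens it further.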
Let's implement VAA calculation:
Python Implementation
import numpy as np
import matplotlib.pyplot as plt

def calculate_vaa(df):
    """
    Calculate Vertical Approach Angle for each pitch.
    Parameters:
        df: Statcast DataFrame with velocity (vy0, vz0) and acceleration (ay, az) components
    Returns:
        DataFrame with VAA added
    """
    pitches = df.copy()
    # vy0, vz0, ay and az are measured at y = 50 ft, not at the plate,
    # so project the velocities to the front of the plate (y = 17/12 ft)
    y0, yf = 50.0, 17 / 12
    vy_f = -np.sqrt(pitches['vy0'] ** 2 - 2 * pitches['ay'] * (y0 - yf))
    t = (vy_f - pitches['vy0']) / pitches['ay']
    vz_f = pitches['vz0'] + pitches['az'] * t
    # Negate so that a descending pitch has a negative VAA
    pitches['vaa'] = -np.degrees(np.arctan(vz_f / vy_f))
    return pitches

def analyze_vaa_by_pitch_type(df):
    """
    Analyze VAA metrics by pitch type.
    Parameters:
        df: Statcast DataFrame with velocity and acceleration components
    Returns:
        DataFrame with VAA metrics by pitch type
    """
    # Calculate VAA
    pitches = calculate_vaa(df)
    # Filter for valid VAA values
    pitches = pitches[pitches['vaa'].notna()]
    # Group by pitch type
    vaa_metrics = pitches.groupby('pitch_type').agg({
        'vaa': ['mean', 'std', 'min', 'max'],
        'release_speed': 'mean',
        'release_extension': 'mean',
        'release_pos_z': 'mean'  # Release height
    }).round(2)
    vaa_metrics.columns = ['avg_vaa', 'vaa_std', 'min_vaa', 'max_vaa',
                           'avg_velo', 'avg_extension', 'release_height']
    vaa_metrics = vaa_metrics.reset_index()
    # Add pitch counts
    pitch_counts = pitches.groupby('pitch_type').size().reset_index(name='count')
    vaa_metrics = vaa_metrics.merge(pitch_counts, on='pitch_type')
    return vaa_metrics.sort_values('avg_vaa')

# Analyze VAA for Gerrit Cole
cole_vaa = analyze_vaa_by_pitch_type(cole_pitches)
print("\nGerrit Cole 2024 Vertical Approach Angle Profile")
print("=" * 80)
print(cole_vaa.to_string(index=False))

# Visualize VAA by pitch type
fig, ax = plt.subplots(figsize=(12, 6))
qualifying_pitches = cole_vaa[cole_vaa['count'] >= 50]['pitch_type']
plot_data = cole_vaa[cole_vaa['pitch_type'].isin(qualifying_pitches)]
bars = ax.bar(range(len(plot_data)), plot_data['avg_vaa'],
              color='steelblue', alpha=0.7, edgecolor='black')
# Color bars by category (set_facecolor keeps the black edge)
for i, (_, row) in enumerate(plot_data.iterrows()):
    if row['avg_vaa'] > -5:
        bars[i].set_facecolor('#2ca02c')  # Green for flat
    elif row['avg_vaa'] > -6:
        bars[i].set_facecolor('#ff7f0e')  # Orange for medium
    else:
        bars[i].set_facecolor('#d62728')  # Red for steep
ax.set_xticks(range(len(plot_data)))
ax.set_xticklabels(plot_data['pitch_type'])
ax.set_xlabel('Pitch Type', fontsize=12, fontweight='bold')
ax.set_ylabel('Average VAA (degrees)', fontsize=12, fontweight='bold')
ax.set_title('Vertical Approach Angle by Pitch Type\nGerrit Cole 2024\n' +
             'Green = Flat | Orange = Medium | Red = Steep',
             fontsize=13, fontweight='bold')
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.8)
ax.grid(axis='y', alpha=0.3, linestyle='--')
# Add value labels just past the end of each bar
for i, (_, row) in enumerate(plot_data.iterrows()):
    ax.text(i, row['avg_vaa'], f"{row['avg_vaa']:.1f}°",
            ha='center', va='bottom' if row['avg_vaa'] > 0 else 'top',
            fontweight='bold', fontsize=10)
plt.tight_layout()
plt.show()
R Implementation
calculate_vaa <- function(df) {
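# Statcast's vy0, vz0, ay, az are measured at y = 50 ft, so project the
# velocities to the front of the plate (y = 17/12 ft) before taking the angle
df <- df %>%
mutate(
vy_f = -sqrt(vy0^2 - 2 * ay * (50 - 17 / 12)),
t_plate = (vy_f - vy0) / ay,
vz_f = vz0 + az * t_plate,
# Negate so that a descending pitch has a negative VAA
vaa = -atan(vz_f / vy_f) * (180 / pi)
) %>%
select(-vy_f, -t_plate, -vz_f)  # drop the helper columns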
return(df)
}
analyze_vaa_by_pitch_type <- function(df) {
# Calculate VAA
pitches <- calculate_vaa(df) %>%
filter(!is.na(vaa))
# Group by pitch type
vaa_metrics <- pitches %>%
group_by(pitch_type) %>%
summarise(
avg_vaa = mean(vaa, na.rm = TRUE),
vaa_std = sd(vaa, na.rm = TRUE),
min_vaa = min(vaa, na.rm = TRUE),
max_vaa = max(vaa, na.rm = TRUE),
avg_velo = mean(release_speed, na.rm = TRUE),
avg_extension = mean(release_extension, na.rm = TRUE),
release_height = mean(release_pos_z, na.rm = TRUE),
count = n(),
.groups = 'drop'
) %>%
mutate(across(where(is.numeric), ~round(.x, 2))) %>%
arrange(avg_vaa)
return(vaa_metrics)
}
# Analyze VAA for Gerrit Cole
cole_vaa <- analyze_vaa_by_pitch_type(cole_pitches)
cat("\nGerrit Cole 2024 Vertical Approach Angle Profile\n")
cat(strrep("=", 80), "\n")
print(cole_vaa)
# Visualize VAA by pitch type
cole_vaa %>%
filter(count >= 50) %>%
mutate(
vaa_category = case_when(
avg_vaa > -5 ~ "Flat",
avg_vaa > -6 ~ "Medium",
TRUE ~ "Steep"
),
vaa_category = factor(vaa_category, levels = c("Flat", "Medium", "Steep"))
) %>%
ggplot(aes(x = reorder(pitch_type, avg_vaa), y = avg_vaa, fill = vaa_category)) +
geom_bar(stat = "identity", color = "black", alpha = 0.8) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_text(aes(label = sprintf("%.1f°", avg_vaa),
vjust = ifelse(avg_vaa > 0, -0.5, 1.5)),  # just past the end of each bar
fontface = "bold", size = 4) +
scale_fill_manual(values = c("Flat" = "#2ca02c", "Medium" = "#ff7f0e", "Steep" = "#d62728")) +
labs(
title = "Vertical Approach Angle by Pitch Type",
subtitle = "Gerrit Cole 2024",
x = "Pitch Type",
y = "Average VAA (degrees)",
fill = "VAA Category"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(face = "bold", size = 11),
legend.position = "top"
)
7.4.3 VAA and Performance
Flat VAA on fastballs correlates with higher whiff rates when thrown in the upper third of the zone. Pitchers like Spencer Strider generate extreme whiff rates partly due to their flat VAA combined with elite velocity and spin.
Conversely, pitches with steep VAA work better low in the zone, generating ground balls and weak contact. Sinkerball pitchers leverage steep VAA to induce ground balls.
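One way to check the high-fastball relationship in your own data is to restrict to fastballs in the upper third of each batter's strike zone and compare whiff rates across VAA bins. This sketch assumes a `vaa` column has already been added (for example with `calculate_vaa`) and uses Statcast's `plate_z`, `sz_top`, `sz_bot`, and `description` columns; the function name, the swing and whiff description sets, and the bin edges are illustrative choices, not a standard.

```python
import pandas as pd

SWING_DESCRIPTIONS = {'swinging_strike', 'swinging_strike_blocked',
                      'foul', 'foul_tip', 'hit_into_play'}
WHIFF_DESCRIPTIONS = {'swinging_strike', 'swinging_strike_blocked'}

def high_fastball_whiffs_by_vaa(df, pitch_type='FF', bins=(-9.0, -5.5, -5.0, -4.5, 0.0)):
    """Whiff rate on one pitch type in the upper third of the zone, grouped by VAA bin."""
    pitches = df[df['pitch_type'] == pitch_type].copy()
    # Upper third of each batter's zone: from 2/3 of the zone height up to sz_top
    upper_cut = pitches['sz_bot'] + (pitches['sz_top'] - pitches['sz_bot']) * 2 / 3
    high = pitches[(pitches['plate_z'] >= upper_cut) &
                   (pitches['plate_z'] <= pitches['sz_top'])].copy()
    high['swing'] = high['description'].isin(SWING_DESCRIPTIONS)
    high['whiff'] = high['description'].isin(WHIFF_DESCRIPTIONS)
    # Right-closed bins, e.g. (-4.5, 0.0] holds the flattest pitches
    high['vaa_bin'] = pd.cut(high['vaa'], bins=list(bins))
    summary = high.groupby('vaa_bin', observed=True).agg(
        pitches=('whiff', 'count'),
        swings=('swing', 'sum'),
        whiffs=('whiff', 'sum'),
    )
    summary['whiff_rate'] = summary['whiffs'] / summary['swings']
    return summary.reset_index()
```

Bins with only a handful of swings will be noisy, so it is worth reading `whiff_rate` alongside the `swings` column rather than on its own.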
7.5.1 Release Point Consistency
Release point refers to where the ball leaves the pitcher's hand, measured in three dimensions:
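- X-axis: Side-to-side, from the catcher's perspective (negative = third-base side, positive = first-base side; for a right-hander like Cole, negative is his arm side)
- Y-axis: Distance from home plate; the code below uses Statcast's extension instead, which measures how far in front of the pitching rubber the ball is released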
- Z-axis: Height above ground
Consistency in release point is crucial for deception. If a pitcher releases his fastball from 6 feet high and his curveball from 5.5 feet, hitters can identify pitches early. Elite pitchers maintain nearly identical release points across their arsenal.
import matplotlib.pyplot as plt

def analyze_release_point_consistency(df):
    """
    Analyze release point consistency by pitch type.
    Parameters:
        df: Statcast DataFrame with release point data
    Returns:
        DataFrame with release point metrics and consistency measures
    """
    pitches = df[
        df['release_pos_x'].notna() &
        df['release_pos_z'].notna() &
        df['pitch_type'].notna()
    ].copy()
    # Calculate metrics by pitch type
    release_metrics = pitches.groupby('pitch_type').agg({
        'release_pos_x': ['mean', 'std'],
        'release_pos_z': ['mean', 'std'],
        'release_extension': ['mean', 'std']
    }).round(3)
    release_metrics.columns = ['x_mean', 'x_std', 'z_mean', 'z_std',
                               'ext_mean', 'ext_std']
    release_metrics = release_metrics.reset_index()
    # Add pitch counts
    pitch_counts = pitches.groupby('pitch_type').size().reset_index(name='count')
    release_metrics = release_metrics.merge(pitch_counts, on='pitch_type')
    # Calculate total variability (consistency score - lower is better)
    release_metrics['consistency_score'] = (
        release_metrics['x_std'] + release_metrics['z_std']
    ).round(3)
    return release_metrics.sort_values('consistency_score')

# Visualize release point scatter
def plot_release_points(df, player_name="Pitcher"):
    """Create scatter plot of release points by pitch type."""
    pitches = df[
        df['release_pos_x'].notna() &
        df['release_pos_z'].notna() &
        df['pitch_type'].notna()
    ].copy()
    # Filter for pitch types with sufficient counts
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]
    fig, ax = plt.subplots(figsize=(10, 12))
    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }
    # Plot each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]
        ax.scatter(
            pitch_subset['release_pos_x'],
            pitch_subset['release_pos_z'],
            c=pitch_colors.get(pitch_type, '#7f7f7f'),
            label=f"{pitch_type} (n={len(pitch_subset)})",
            alpha=0.4,
            s=20,
            edgecolors='none'
        )
        # Add mean release point
        mean_x = pitch_subset['release_pos_x'].mean()
        mean_z = pitch_subset['release_pos_z'].mean()
        ax.scatter(mean_x, mean_z,
                   c=pitch_colors.get(pitch_type, '#7f7f7f'),
                   marker='X', s=200, edgecolors='black', linewidths=2,
                   zorder=10)
    # release_pos_x is from the catcher's perspective, so a right-hander
    # releases on the negative (third-base) side
    ax.set_xlabel('Horizontal Release Point (feet)\n← Third-Base Side | First-Base Side →',
                  fontsize=12, fontweight='bold')
    ax.set_ylabel('Vertical Release Point (feet)',
                  fontsize=12, fontweight='bold')
    ax.set_title(f'{player_name} Release Point Consistency\n' +
                 'X = Mean Release Point by Pitch Type',
                 fontsize=14, fontweight='bold', pad=20)
    ax.grid(True, alpha=0.3, linestyle='--')
    ax.legend(loc='upper right', framealpha=0.9, fontsize=9)
    plt.tight_layout()
    plt.show()

# Analyze Cole's release point consistency
cole_release = analyze_release_point_consistency(cole_pitches)
print("\nRelease Point Consistency Analysis")
print("=" * 80)
print(cole_release.to_string(index=False))
print("\nNote: Lower consistency_score indicates better release point consistency")
# Visualize
plot_release_points(cole_pitches, "Gerrit Cole 2024")
# R version: Release point analysis
analyze_release_point_consistency <- function(df) {
pitches <- df %>%
filter(!is.na(release_pos_x), !is.na(release_pos_z), !is.na(pitch_type))
release_metrics <- pitches %>%
group_by(pitch_type) %>%
summarise(
x_mean = mean(release_pos_x, na.rm = TRUE),
x_std = sd(release_pos_x, na.rm = TRUE),
z_mean = mean(release_pos_z, na.rm = TRUE),
z_std = sd(release_pos_z, na.rm = TRUE),
ext_mean = mean(release_extension, na.rm = TRUE),
ext_std = sd(release_extension, na.rm = TRUE),
count = n(),
.groups = 'drop'
) %>%
mutate(
consistency_score = x_std + z_std,
across(where(is.numeric), ~round(.x, 3))
) %>%
arrange(consistency_score)
return(release_metrics)
}
plot_release_points <- function(df, player_name = "Pitcher") {
pitches <- df %>%
filter(!is.na(release_pos_x), !is.na(release_pos_z), !is.na(pitch_type))
# Filter for sufficient pitch counts
pitch_counts <- pitches %>%
count(pitch_type) %>%
filter(n >= 30)
plot_data <- pitches %>%
filter(pitch_type %in% pitch_counts$pitch_type)
# Calculate means for each pitch type
means <- plot_data %>%
group_by(pitch_type) %>%
summarise(
mean_x = mean(release_pos_x),
mean_z = mean(release_pos_z),
.groups = 'drop'
)
ggplot(plot_data, aes(x = release_pos_x, y = release_pos_z, color = pitch_type)) +
geom_point(alpha = 0.4, size = 1.5) +
geom_point(data = means, aes(x = mean_x, y = mean_z, color = pitch_type),
shape = 4, size = 8, stroke = 2) +
scale_color_brewer(palette = "Set1", name = "Pitch Type") +
labs(
title = paste(player_name, "Release Point Consistency"),
subtitle = "X = Mean Release Point by Pitch Type",
x = "Horizontal Release Point (feet)\n← Third-Base Side | First-Base Side →",
y = "Vertical Release Point (feet)"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(face = "bold", size = 11),
legend.position = "right"
)
}
cole_release <- analyze_release_point_consistency(cole_pitches)
cat("\nRelease Point Consistency Analysis\n")
cat(strrep("=", 80), "\n")
print(cole_release)
cat("\nNote: Lower consistency_score indicates better release point consistency\n")
plot_release_points(cole_pitches, "Gerrit Cole 2024")
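The consistency score above measures scatter within each pitch type, but deception also depends on how far apart the pitch types' average release points sit. A minimal sketch, assuming a four-seam fastball ('FF') as the reference pitch and the same 30-pitch minimum used in the plots: the hypothetical helper `release_gap_from_reference` returns the straight-line distance, in inches, between each pitch type's mean release point (`release_pos_x`, `release_pos_z`) and the reference pitch's.

```python
import numpy as np
import pandas as pd

def release_gap_from_reference(df, reference='FF', min_pitches=30):
    """Inches between each pitch type's mean release point and the reference pitch's."""
    pitches = df.dropna(subset=['release_pos_x', 'release_pos_z', 'pitch_type'])
    # Keep pitch types with enough pitches for a stable mean
    counts = pitches['pitch_type'].value_counts()
    keep = counts[counts >= min_pitches].index
    means = (pitches[pitches['pitch_type'].isin(keep)]
             .groupby('pitch_type')[['release_pos_x', 'release_pos_z']].mean())
    if reference not in means.index:
        raise ValueError(f"reference pitch type {reference!r} has fewer than {min_pitches} pitches")
    ref = means.loc[reference]
    # Euclidean distance in the x-z plane, converted from feet to inches
    gap_ft = np.hypot(means['release_pos_x'] - ref['release_pos_x'],
                      means['release_pos_z'] - ref['release_pos_z'])
    return (gap_ft * 12).round(2).rename('release_gap_in').reset_index()
```

Averages can hide pitch-to-pitch scatter, so this gap is best read together with the per-pitch standard deviations from `analyze_release_point_consistency`.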
7.5.2 Pitch Tunneling
Pitch tunneling refers to the concept that effective pitch sequences make pitches look identical for as long as possible before breaking sharply in different directions. When a fastball and changeup travel through the same "tunnel" (similar trajectory) for most of the flight path, hitters can't distinguish them until it's too late to adjust.
Tunneling metrics:
- Tunnel Point: The location where pitch paths diverge measurably
- Release Point Similarity: How close release points are across pitch types
- Early Flight Path: Trajectory similarity in the first 20-30 feet
Elite tunneling creates decision-making problems for hitters. They must commit to a swing based on the early trajectory, and elite tunneling makes that early trajectory look nearly identical across multiple pitch types.
Example: Gerrit Cole's four-seam fastball and slider come from nearly identical release points with similar early trajectories. By the time the slider breaks away, the hitter has already committed to the fastball location.
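Tunnel metrics can be approximated from Statcast's trajectory fields. The sketch below assumes constant acceleration over the whole flight, uses vx0/vy0/vz0 and ax/ay/az (reported at 50 feet from the plate), and anchors each path at `plate_x`/`plate_z`, treated as the location at the front of the plate. It then steps backward to find where two pitches were at a chosen distance from the plate. The helper names are hypothetical, and `tunnel_y=25.0` feet is an illustrative tunnel point, not an established standard.

```python
import numpy as np

Y_REF = 50.0        # Statcast reports vx0/vy0/vz0 and ax/ay/az at y = 50 ft
Y_PLATE = 17 / 12   # front of home plate, in feet from the back tip


def _time_to_y(p, y):
    """Seconds from the y = 50 ft reference to distance y (negative if y > 50)."""
    # Constant ay: vy(y)^2 = vy0^2 - 2*ay*(Y_REF - y); vy stays negative (toward home)
    vy = -np.sqrt(p['vy0'] ** 2 - 2 * p['ay'] * (Y_REF - y))
    return (vy - p['vy0']) / p['ay']


def position_at_y(p, y):
    """(x, z) in feet when the pitch is y feet from the plate, anchored at plate_x/plate_z."""
    t_plate = _time_to_y(p, Y_PLATE)
    t = _time_to_y(p, y)
    # Step back from the measured plate location under constant acceleration
    dt, dt2 = t_plate - t, t_plate ** 2 - t ** 2
    x = p['plate_x'] - p['vx0'] * dt - 0.5 * p['ax'] * dt2
    z = p['plate_z'] - p['vz0'] * dt - 0.5 * p['az'] * dt2
    return x, z


def tunnel_separation(p1, p2, tunnel_y=25.0):
    """Inches between two pitches at tunnel_y feet from the plate and at the plate."""
    x1, z1 = position_at_y(p1, tunnel_y)
    x2, z2 = position_at_y(p2, tunnel_y)
    at_tunnel = 12 * np.hypot(x1 - x2, z1 - z2)
    at_plate = 12 * np.hypot(p1['plate_x'] - p2['plate_x'], p1['plate_z'] - p2['plate_z'])
    return at_tunnel, at_plate
```

Rows of a Statcast DataFrame (pandas Series support the same key lookups) can be passed directly to compare two individual pitches. A small separation at the tunnel point paired with a large separation at the plate is the pattern the tunneling idea describes.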
7.6.1 Building an Arsenal Report
A comprehensive arsenal report summarizes a pitcher's full repertoire, including usage rates, velocity, movement, and outcome metrics for each pitch type:
def create_arsenal_report(df, player_name="Pitcher"):
    """
    Create comprehensive pitch arsenal report.
    Parameters:
        df: Statcast DataFrame with pitch-level data
        player_name: Name for report header
    Returns:
        DataFrame with complete arsenal metrics
    """
    pitches = df[df['pitch_type'].notna()].copy()
    # Calculate whiff rate (swings and misses / swings)
    pitches['was_swing'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked',
        'foul', 'hit_into_play', 'foul_tip'
    ])
    pitches['was_whiff'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked'
    ])
    # Build arsenal report with named aggregation
    grouped = pitches.groupby('pitch_type')
    arsenal = grouped.agg(
        avg_velo=('release_speed', 'mean'),
        avg_spin=('release_spin_rate', 'mean'),
        ivb=('pfx_z', lambda x: (x * 12).mean()),  # IVB in inches
        hb=('pfx_x', lambda x: (x * 12).mean()),   # HB in inches, catcher's perspective
        swings=('was_swing', 'sum'),
        whiffs=('was_whiff', 'sum')
    ).round(1)
    arsenal.insert(0, 'count', grouped.size())  # Total pitches
    arsenal = arsenal.reset_index()
    # Calculate rates
    total_pitches = arsenal['count'].sum()
    arsenal['usage_pct'] = (arsenal['count'] / total_pitches * 100).round(1)
    arsenal['whiff_rate'] = (arsenal['whiffs'] / arsenal['swings'] * 100).round(1)
    # Get xwOBA by pitch type (if available). Statcast fills this column only
    # on batted balls, so the average is effectively xwOBA on contact.
    if 'estimated_woba_using_speedangle' in pitches.columns:
        xwoba = pitches.groupby('pitch_type')['estimated_woba_using_speedangle'].mean()
        arsenal = arsenal.merge(
            xwoba.round(3).reset_index().rename(columns={'estimated_woba_using_speedangle': 'xwOBA'}),
            on='pitch_type',
            how='left'
        )
    # Sort by usage
    arsenal = arsenal.sort_values('usage_pct', ascending=False)
    return arsenal

# Generate arsenal report
cole_arsenal = create_arsenal_report(cole_pitches, "Gerrit Cole")
print("\n" + "=" * 90)
print("PITCH ARSENAL REPORT: Gerrit Cole 2024")
print("=" * 90)
print(cole_arsenal.to_string(index=False))
print("\nKey:")
print(" IVB = Induced Vertical Break (inches)")
print(" HB = Horizontal Break (inches, catcher's perspective;")
print("      for a right-hander like Cole, - = arm side, + = glove side)")
print(" Whiff Rate = Swinging strikes / Total swings")
# R version: Arsenal report
create_arsenal_report <- function(df, player_name = "Pitcher") {
pitches <- df %>%
filter(!is.na(pitch_type))
# Identify swings and whiffs
pitches <- pitches %>%
mutate(
was_swing = description %in% c('swinging_strike', 'swinging_strike_blocked',
'foul', 'hit_into_play', 'foul_tip'),
was_whiff = description %in% c('swinging_strike', 'swinging_strike_blocked')
)
# Build arsenal report
arsenal <- pitches %>%
group_by(pitch_type) %>%
summarise(
count = n(),
avg_velo = mean(release_speed, na.rm = TRUE),
avg_spin = mean(release_spin_rate, na.rm = TRUE),
ivb = mean(pfx_z * 12, na.rm = TRUE),
hb = mean(pfx_x * 12, na.rm = TRUE),
swings = sum(was_swing, na.rm = TRUE),
whiffs = sum(was_whiff, na.rm = TRUE),
xwOBA = mean(estimated_woba_using_speedangle, na.rm = TRUE),
.groups = 'drop'
) %>%
mutate(
usage_pct = count / sum(count) * 100,
whiff_rate = whiffs / swings * 100,
across(c(avg_velo, avg_spin, ivb, hb, usage_pct, whiff_rate), ~round(.x, 1)),
xwOBA = round(xwOBA, 3)  # keep three decimals, as for other wOBA-scale stats
) %>%
arrange(desc(usage_pct))
return(arsenal)
}
cole_arsenal <- create_arsenal_report(cole_pitches, "Gerrit Cole")
cat("\n", strrep("=", 90), "\n")
cat("PITCH ARSENAL REPORT: Gerrit Cole 2024\n")
cat(strrep("=", 90), "\n")
print(cole_arsenal)
cat("\nKey:\n")
cat(" IVB = Induced Vertical Break (inches)\n")
cat(" HB = Horizontal Break (inches, catcher's view: + = toward 1B side;\n")
cat("      arm side is - for a RHP like Cole, + for a LHP)\n")
cat(" Whiff Rate = Swinging strikes / Total swings\n")
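Statcast records `pfx_x` from the catcher's perspective. The same arm-side run therefore appears with opposite signs for right- and left-handed pitchers. The sketch below flips the sign for right-handers so that positive horizontal break always means arm side. It assumes the Statcast `p_throws` column is present, and the helper name `arm_side_break` is hypothetical.

```python
import pandas as pd

def arm_side_break(df: pd.DataFrame) -> pd.Series:
    """Horizontal break in inches, signed so that + = arm side for either hand.

    Hypothetical helper; assumes Statcast columns pfx_x (feet, catcher's view)
    and p_throws ('R' or 'L').
    """
    # RHP arm-side run is toward the 3B side, which is negative pfx_x
    sign = df["p_throws"].map({"R": -1, "L": 1})
    return df["pfx_x"] * 12 * sign
```

Applied as `pitches['hb_arm_side'] = arm_side_break(pitches)`, this lets arsenals from pitchers of either hand be compared on the same axis.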
7.6.2 Understanding Stuff+
Stuff+ is a metric that evaluates the "quality" of a pitch in isolation, independent of results. It's scaled to 100, where:
- 100 = MLB average
- 110 = 10% better than average
- 90 = 10% worse than average
Stuff+ models incorporate:
- Velocity
- Spin rate
- Movement (vertical and horizontal)
- Release characteristics
- Platoon matchup
Stuff+ helps identify pitchers who might be underperforming their arsenal quality (unlucky results or command issues) or overperforming (likely to regress).
Location+ and Pitching+ complement Stuff+:
- Location+: Quality of pitch location/command
- Pitching+: Combined metric (stuff + location + context)
While we can't replicate the proprietary Stuff+ model exactly, we can approximate it using Statcast data and understand its principles.
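The sketch below illustrates the idea. It is not the proprietary model. It fits an ordinary least-squares model of whiff probability per pitch using only physical traits, then rescales the predictions so that the average pitch scores 100. Under that scaling, a score of 110 means a 10% higher predicted whiff rate, matching the reading of the scale above.

The following choices are assumptions made for illustration:
- the feature list
- the linear model
- the function name `toy_stuff_plus`
- the required boolean `was_whiff` column, built the same way as inside `create_arsenal_report`

Production Stuff+ models are nonlinear, are fit separately by pitch group, and use far larger samples. This toy version also uses raw `pfx_x`, so pitchers of different handedness would need their horizontal break normalized first.

```python
import numpy as np
import pandas as pd

def toy_stuff_plus(pitches: pd.DataFrame,
                   features=("release_speed", "release_spin_rate",
                             "pfx_x", "pfx_z", "release_extension")) -> pd.Series:
    """Illustrative 'Stuff+-like' score from physical pitch traits only.

    Fits whiff probability with linear least squares and scales the
    predictions so the average pitch scores 100 (hypothetical approach).
    """
    cols = list(features)
    data = pitches.dropna(subset=cols + ["was_whiff"])
    X = data[cols].to_numpy(dtype=float)
    # Standardize features so the fit is well conditioned
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    y = data["was_whiff"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Linear predictions can dip below zero; clip before taking ratios
    predicted = np.clip(X @ coef, 1e-6, None)
    # Plus scale: 100 = average predicted whiff rate, 110 = 10% higher
    return pd.Series(100 * predicted / predicted.mean(),
                     index=data.index, name="toy_stuff_plus")
```

Because the score depends only on velocity, spin, movement, and extension, it stays the same whether a given pitch was actually hit or missed. That independence from results is the property Stuff+ is built around.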
7.7.1 Zone Analysis Metrics
Pitch location is as important as pitch quality. The strike zone can be divided into regions for analysis:
Zone Metrics:
- Zone%: Percentage of pitches in the strike zone
- Edge%: Percentage of pitches on the edges (borders of zone)
- Heart%: Percentage of pitches in the middle of the zone (most hittable)
- Chase%: Percentage of pitches outside zone that generate swings
- Waste%: Percentage of pitches far enough outside the zone that hitters rarely swing at them
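Statcast's numbered zones (1-9 inside the strike zone, 11-14 outside) cannot separate a pitch just off the plate from one far outside it. The zone-based code therefore approximates Edge% and cannot compute Waste% at all.

The sketch below works from the raw coordinates instead (`plate_x`, `plate_z`, `sz_top`, `sz_bot`). It measures each pitch's distance to the nearest strike-zone edge and bins that distance into heart, edge, chase, and waste. The 0.83 ft half-width (half the plate plus roughly a ball radius), the 0.25 ft band, and both function names are assumptions for illustration. They are not Baseball Savant's official attack-zone boundaries.

```python
import numpy as np
import pandas as pd

HALF_PLATE_FT = 0.83  # assumed effective half-width of the zone, in feet

def classify_attack_region(df: pd.DataFrame, band_ft: float = 0.25) -> pd.Series:
    """Label each pitch heart / edge / chase / waste (hypothetical thresholds)."""
    dx = df["plate_x"].abs() - HALF_PLATE_FT
    dz = np.maximum(df["sz_bot"] - df["plate_z"], df["plate_z"] - df["sz_top"])
    # Positive = outside the zone, negative = inside (feet from the nearest edge)
    dist = np.maximum(dx, dz)
    return pd.cut(dist, bins=[-np.inf, -band_ft, band_ft, 3 * band_ft, np.inf],
                  labels=["heart", "edge", "chase", "waste"])

def region_shares(df: pd.DataFrame, band_ft: float = 0.25) -> pd.Series:
    """Percentage of located pitches in each region (rows with missing coordinates are dropped)."""
    region = classify_attack_region(df, band_ft).dropna()
    return (region.value_counts(normalize=True) * 100).round(1)
```

Because the zone's top and bottom come from each pitch's own `sz_top` and `sz_bot`, the classification adjusts to every batter's height.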
def analyze_location_metrics(df):
    """
    Calculate pitch location and command metrics.
    Parameters:
        df: Statcast DataFrame with location data
    Returns:
        DataFrame with location metrics by pitch type
    """
    pitches = df[df['zone'].notna()].copy()
    # Zone definitions (Statcast uses zones 1-9 for strike zone, 11-14 for outside)
    pitches['in_zone'] = pitches['zone'] <= 9
    pitches['heart'] = pitches['zone'] == 5  # Zone 5 is the middle-middle cell
    # Approximation: every in-zone cell other than zone 5 counts as "edge"
    pitches['edge'] = pitches['zone'].isin([1, 2, 3, 4, 6, 7, 8, 9])
    pitches['outside_zone'] = pitches['zone'] > 9
    # Identify chases (swings on pitches outside zone)
    pitches['is_swing'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked',
        'foul', 'hit_into_play', 'foul_tip'
    ])
    pitches['is_chase'] = pitches['is_swing'] & pitches['outside_zone']
    # Count each category by pitch type
    location_metrics = pitches.groupby('pitch_type').agg(
        total=('zone', 'size'),
        zone_count=('in_zone', 'sum'),
        heart_count=('heart', 'sum'),
        edge_count=('edge', 'sum'),
        outside_count=('outside_zone', 'sum'),
        chase_count=('is_chase', 'sum'),
    ).reset_index()
    # Calculate percentages of all pitches
    location_metrics['zone_pct'] = (
        location_metrics['zone_count'] / location_metrics['total'] * 100
    ).round(1)
    location_metrics['heart_pct'] = (
        location_metrics['heart_count'] / location_metrics['total'] * 100
    ).round(1)
    location_metrics['edge_pct'] = (
        location_metrics['edge_count'] / location_metrics['total'] * 100
    ).round(1)
    # Chase rate: chases / pitches outside zone (NaN if none were outside)
    outside = location_metrics['outside_count'].where(location_metrics['outside_count'] > 0)
    location_metrics['chase_rate'] = (
        location_metrics['chase_count'] / outside * 100
    ).round(1)
    # Select final columns
    result = location_metrics[[
        'pitch_type', 'total', 'zone_pct', 'heart_pct',
        'edge_pct', 'chase_rate'
    ]].copy()
    return result.sort_values('total', ascending=False)

# Analyze location metrics
cole_location = analyze_location_metrics(cole_pitches)
print("\nLocation & Command Metrics by Pitch Type")
print("=" * 70)
print(cole_location.to_string(index=False))
# R version: Location metrics
analyze_location_metrics <- function(df) {
pitches <- df %>%
filter(!is.na(zone))
# Define zone categories
pitches <- pitches %>%
mutate(
in_zone = zone <= 9,
heart = zone == 5,
edge = zone %in% c(1, 2, 3, 4, 6, 7, 8, 9),
outside_zone = zone > 9,
is_swing = description %in% c('swinging_strike', 'swinging_strike_blocked',
'foul', 'hit_into_play', 'foul_tip'),
is_chase = is_swing & outside_zone
)
# Calculate metrics by pitch type
location_metrics <- pitches %>%
group_by(pitch_type) %>%
summarise(
total = n(),
zone_count = sum(in_zone),
heart_count = sum(heart),
edge_count = sum(edge),
outside_count = sum(outside_zone),
chase_count = sum(is_chase),
.groups = 'drop'
) %>%
mutate(
zone_pct = zone_count / total * 100,
heart_pct = heart_count / total * 100,
edge_pct = edge_count / total * 100,
chase_rate = chase_count / outside_count * 100,
across(c(zone_pct, heart_pct, edge_pct, chase_rate), ~round(.x, 1))
) %>%
select(pitch_type, total, zone_pct, heart_pct, edge_pct, chase_rate) %>%
arrange(desc(total))
return(location_metrics)
}
cole_location <- analyze_location_metrics(cole_pitches)
cat("\nLocation & Command Metrics by Pitch Type\n")
cat(strrep("=", 70), "\n")
print(cole_location)
7.7.2 CSW% and Chase Rate
Called Strikes + Whiffs (CSW%) is a simple but powerful metric for evaluating pitcher performance:
CSW% = (Called Strikes + Swinging Strikes) / Total Pitches
CSW% captures two essential pitcher skills:
- Command: Throwing strikes that hitters don't swing at
- Stuff: Getting whiffs when hitters do swing
League average CSW% is approximately 28-30%. Elite pitchers exceed 33%.
Chase Rate measures deception - how often hitters swing at pitches outside the zone:
Chase Rate = Swings Outside Zone / Pitches Outside Zone
League average chase rate is approximately 28-30%. Elite pitches/pitchers exceed 35%.
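Both formulas can be checked on a pitcher's full sample before splitting by pitch type. The helper below is a minimal sketch; its name, `pitcher_csw_and_chase`, is hypothetical. It applies the two formulas directly to the Statcast `description` and `zone` columns, using the same swing and whiff definitions as the code in this chapter.

```python
import pandas as pd

WHIFF = {"swinging_strike", "swinging_strike_blocked"}
SWING = WHIFF | {"foul", "foul_tip", "hit_into_play"}

def pitcher_csw_and_chase(df: pd.DataFrame):
    """Return (CSW%, chase rate %) for all pitches in df (hypothetical helper)."""
    desc = df["description"]
    # CSW%: (called strikes + swinging strikes) / total pitches
    csw_pct = ((desc == "called_strike") | desc.isin(WHIFF)).mean() * 100
    # Chase rate: swings on pitches outside the zone / pitches outside the zone
    outside = df["zone"] > 9
    chase_rate = desc[outside].isin(SWING).mean() * 100
    return round(csw_pct, 1), round(chase_rate, 1)
```

For example, consider six pitches: a called strike, a swinging strike, two balls, a foul, and a ball in play. Two of the six are called or swinging strikes, giving a CSW% of 33.3. If four of the six were outside the zone and hitters swung at two of those, the chase rate is 50.0.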
def calculate_csw_metrics(df):
    """
    Calculate CSW% and related metrics.
    Parameters:
        df: Statcast DataFrame
    Returns:
        DataFrame with CSW metrics by pitch type
    """
    pitches = df.copy()
    # Identify called strikes and whiffs
    pitches['called_strike'] = pitches['description'] == 'called_strike'
    pitches['swinging_strike'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked'
    ])
    pitches['csw'] = pitches['called_strike'] | pitches['swinging_strike']
    # Count by pitch type (rows with a missing pitch_type are dropped by groupby)
    csw_metrics = pitches.groupby('pitch_type').agg(
        total=('description', 'size'),
        called_strikes=('called_strike', 'sum'),
        whiffs=('swinging_strike', 'sum'),
        csw_count=('csw', 'sum'),
    ).reset_index()
    # Percentages of all pitches (whiff_pct here is per pitch, not per swing)
    csw_metrics['called_strike_pct'] = (
        csw_metrics['called_strikes'] / csw_metrics['total'] * 100
    ).round(1)
    csw_metrics['whiff_pct'] = (
        csw_metrics['whiffs'] / csw_metrics['total'] * 100
    ).round(1)
    csw_metrics['csw_pct'] = (
        csw_metrics['csw_count'] / csw_metrics['total'] * 100
    ).round(1)
    result = csw_metrics[[
        'pitch_type', 'total', 'called_strike_pct',
        'whiff_pct', 'csw_pct'
    ]].copy()
    return result.sort_values('csw_pct', ascending=False)

cole_csw = calculate_csw_metrics(cole_pitches)
print("\nCSW% Analysis by Pitch Type")
print("=" * 70)
print(cole_csw.to_string(index=False))
print("\nMLB Average CSW%: ~29%")
print("Elite Threshold: 33%+")
# R version: CSW metrics
calculate_csw_metrics <- function(df) {
pitches <- df %>%
mutate(
called_strike = description == 'called_strike',
swinging_strike = description %in% c('swinging_strike', 'swinging_strike_blocked'),
csw = called_strike | swinging_strike
)
csw_metrics <- pitches %>%
group_by(pitch_type) %>%
summarise(
total = n(),
called_strikes = sum(called_strike),
whiffs = sum(swinging_strike),
csw_count = sum(csw),
.groups = 'drop'
) %>%
mutate(
called_strike_pct = called_strikes / total * 100,
whiff_pct = whiffs / total * 100,
csw_pct = csw_count / total * 100,
across(c(called_strike_pct, whiff_pct, csw_pct), ~round(.x, 1))
) %>%
select(pitch_type, total, called_strike_pct, whiff_pct, csw_pct) %>%
arrange(desc(csw_pct))
return(csw_metrics)
}
cole_csw <- calculate_csw_metrics(cole_pitches)
cat("\nCSW% Analysis by Pitch Type\n")
cat(strrep("=", 70), "\n")
print(cole_csw)
cat("\nMLB Average CSW%: ~29%\n")
cat("Elite Threshold: 33%+\n")
7.8.1 xERA and xwOBA Against
Just as hitters have expected statistics, pitchers have expected stats against based on contact quality allowed:
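xwOBA Against: Expected weighted on-base average allowed, estimated from the exit velocity and launch angle of each batted ball. Because it ignores where the ball actually landed, it strips out most of the effect of defense and batted-ball luck, isolating the pitcher's share of responsibility more cleanly. Baseball Savant's full xwOBA also counts strikeouts, walks, and hit-by-pitches at their actual values; a version computed on batted balls alone is often called xwOBAcon.
xERA (Expected ERA): An ERA-scale estimate built from expected rather than actual outcomes; Baseball Savant derives it directly from xwOBA against.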
These metrics help identify:
- Unlucky pitchers: High ERA but low xERA (likely to improve)
- Lucky pitchers: Low ERA but high xERA (likely to regress)
- Contact management: Pitchers who limit hard contact even when allowing hits
import pandas as pd

def analyze_expected_stats(df, player_name="Pitcher"):
    """
    Summarize contact quality allowed by a pitcher.
    Parameters:
        df: Statcast DataFrame with expected stats
        player_name: Name for report
    Returns:
        pandas Series of contact-quality metrics, or None if there are no batted balls
    """
    # Batted balls are the rows that carry an expected-wOBA value
    batted_balls = df[
        df['estimated_woba_using_speedangle'].notna()
    ].copy()
    if len(batted_balls) == 0:
        return None
    # Aggregate metrics (this xwOBA is on contact only, i.e. xwOBAcon)
    xwOBA_against = batted_balls['estimated_woba_using_speedangle'].mean()
    avg_ev = batted_balls['launch_speed'].mean()
    avg_la = batted_balls['launch_angle'].mean()
    # Statcast codes barrels as launch_speed_angle == 6
    barrel_pct = (batted_balls['launch_speed_angle'] == 6).mean() * 100
    hard_hit_pct = (batted_balls['launch_speed'] >= 95).mean() * 100
    results = {
        'Player': player_name,
        'xwOBA_Against': round(xwOBA_against, 3),
        'Avg_EV_Against': round(avg_ev, 1),
        'Avg_LA_Against': round(avg_la, 1),
        'Barrel%_Against': round(barrel_pct, 1),
        'HardHit%_Against': round(hard_hit_pct, 1),
        'Batted_Balls': len(batted_balls)
    }
    return pd.Series(results)

cole_xstats = analyze_expected_stats(cole_pitches, "Gerrit Cole")
if cole_xstats is not None:
    print("\nExpected Stats Against (2024)")
    print("=" * 60)
    for key, value in cole_xstats.items():
        print(f"{key:.<30} {value}")
    print("\nNote: xwOBA here covers batted balls only, so it runs well above")
    print("overall xwOBA against (MLB average ~.315; elite < .300).")
# R version: Expected stats
analyze_expected_stats <- function(df, player_name = "Pitcher") {
batted_balls <- df %>%
filter(!is.na(estimated_woba_using_speedangle))
if (nrow(batted_balls) == 0) {
return(NULL)
}
results <- batted_balls %>%
summarise(
Player = player_name,
xwOBA_Against = mean(estimated_woba_using_speedangle, na.rm = TRUE),
Avg_EV_Against = mean(launch_speed, na.rm = TRUE),
Avg_LA_Against = mean(launch_angle, na.rm = TRUE),
Barrel_Pct_Against = mean(launch_speed_angle == 6, na.rm = TRUE) * 100,  # 6 = barrel
HardHit_Pct_Against = mean(launch_speed >= 95, na.rm = TRUE) * 100,
Batted_Balls = n()
) %>%
mutate(across(where(is.numeric), ~round(.x, 3)))
return(results)
}
cole_xstats <- analyze_expected_stats(cole_pitches, "Gerrit Cole")
if (!is.null(cole_xstats)) {
cat("\nExpected Stats Against (2024)\n")
cat(strrep("=", 60), "\n")
print(t(cole_xstats), quote = FALSE)
cat("\nNote: xwOBA here covers batted balls only, so it runs well above\n")
cat("overall xwOBA against (MLB average ~.315; elite < .300).\n")
}
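`analyze_expected_stats` averages xwOBA over batted balls only. A common community reconstruction of full xwOBA against is sketched below. It assumes that `estimated_woba_using_speedangle` is populated only on batted balls, and it works over every plate-appearance-ending pitch (`woba_denom == 1`):
- batted balls contribute their expected value;
- strikeouts, walks, and hit-by-pitches contribute their actual `woba_value`.

The function name is hypothetical, and Baseball Savant's published figures may differ slightly from this reconstruction.

```python
import pandas as pd

def full_xwoba_against(df: pd.DataFrame) -> float:
    """xwOBA over all plate appearances (hypothetical reconstruction)."""
    # Keep only pitches that end a plate appearance counted in wOBA
    pa = df[df["woba_denom"] == 1]
    # Expected value on batted balls; actual wOBA value on K, BB, and HBP
    per_pa = pa["estimated_woba_using_speedangle"].fillna(pa["woba_value"])
    return round(per_pa.sum() / pa["woba_denom"].sum(), 3)
```

Unlike the contact-only figure, this value can be compared directly with the ~.315 league benchmark.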
7.8.2 Interpreting Expected Stats
When a pitcher's actual ERA significantly exceeds his xERA, several factors might be at play:
- Poor defense: Fielders not converting outs at expected rates
- Sequencing luck: Hits clustered together, leading to big innings
- Runners on base: Performance worse with runners on (pitch selection, pressure)
- Sample size: Early season stats can show large gaps that normalize
Conversely, when actual ERA is much lower than xERA:
- Good defense: Excellent fielding behind the pitcher
- Sequencing luck: Hits scattered, preventing big innings
- Strand rate: Above-average at stranding runners
- Unsustainable: Likely to see ERA rise toward xERA over time
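These rules of thumb can be turned into a simple screen. The sketch below flags pitchers whose ERA sits well above or below their xERA. The 0.50-run threshold, the column names `era` and `xera`, and the pitchers in the example are all hypothetical. A real screen should also require a minimum innings sample, given how large early-season gaps can be.

```python
import pandas as pd

def flag_era_gap(df: pd.DataFrame, threshold: float = 0.50) -> pd.DataFrame:
    """Label ERA vs. xERA gaps (hypothetical threshold, in runs)."""
    gap = df["era"] - df["xera"]
    out = df.assign(era_minus_xera=gap.round(2))
    out["flag"] = "in line"
    # ERA well above xERA: results worse than contact quality suggests
    out.loc[gap >= threshold, "flag"] = "possibly unlucky (ERA above xERA)"
    # ERA well below xERA: results better than contact quality suggests
    out.loc[gap <= -threshold, "flag"] = "possibly lucky (ERA below xERA)"
    return out
```

With made-up inputs, Pitcher A (4.50 ERA, 3.50 xERA) is flagged as possibly unlucky and Pitcher B (2.80 ERA, 3.90 xERA) as possibly lucky.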
Modern pitch analysis benefits from interactive visualizations that let analysts, coaches, and fans explore multi-dimensional data dynamically. Static charts communicate specific insights well, but interactive tools support deeper exploration of pitch arsenals, movement profiles, and sequencing patterns. This section introduces three interactive visualization approaches built with Plotly, whose charts offer zooming, panning, hover details, and legend-based filtering that static charts cannot.
Interactive pitch analysis tools serve multiple audiences. Player development staff use them to identify mechanical adjustments that could improve pitch characteristics. Opposing teams use them for scouting and game planning. Broadcasters use them to show viewers what makes certain pitches effective. Paired with regularly updated Statcast data, interactive exploration makes these questions much faster to investigate.
7.9.1 Interactive Pitch Movement Chart
The pitch movement chart, which plots horizontal break against vertical break, is fundamental to understanding a pitcher's arsenal. Making it interactive turns it from a static description into a tool for exploration. Users can hover over individual pitches to see exact velocity, spin rate, and outcome data, and can click legend entries to show or hide pitch types. With additional filtering of the underlying data, the same chart can be split by game situation, count, or pitch result. It can also surface outlier pitches that deviate from a pitch type's usual movement, which may point to mechanical issues or grip adjustments.
R Implementation:
library(tidyverse)
library(plotly)
library(baseballr)
create_interactive_pitch_movement <- function(pitcher_data, player_name = "Pitcher") {
# Filter for pitches with complete movement data
pitches <- pitcher_data %>%
filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
mutate(
horizontal_break = pfx_x * 12, # Convert to inches
vertical_break = pfx_z * 12, # Induced vertical break
hover_text = paste0(
"<b>", pitch_type, "</b><br>",
"Velocity: ", round(release_speed, 1), " mph<br>",
"Spin: ", round(release_spin_rate, 0), " rpm<br>",
"H-Break: ", round(horizontal_break, 1), " in<br>",
"V-Break: ", round(vertical_break, 1), " in<br>",
"Result: ", events
)
)
# Filter for pitch types with sufficient samples
pitch_counts <- pitches %>% count(pitch_type)
qualifying_pitches <- pitch_counts %>% filter(n >= 30) %>% pull(pitch_type)
plot_data <- pitches %>% filter(pitch_type %in% qualifying_pitches)
# Define color palette for pitch types
pitch_colors <- c(
'FF' = '#d62728', # Four-seam: Red
'SI' = '#ff7f0e', # Sinker: Orange
'FC' = '#2ca02c', # Cutter: Green
'SL' = '#9467bd', # Slider: Purple
'CU' = '#8c564b', # Curve: Brown
'CH' = '#e377c2', # Change: Pink
'FS' = '#17becf', # Splitter: Cyan
'KC' = '#bcbd22' # Knuckle-curve: Yellow-green
)
# Create interactive scatter plot
p <- plot_ly(data = plot_data) %>%
add_markers(
x = ~horizontal_break,
y = ~vertical_break,
color = ~pitch_type,
colors = pitch_colors,
text = ~hover_text,
hoverinfo = "text",
marker = list(
size = 8,
opacity = 0.6,
line = list(width = 0.5, color = 'black')
)
) %>%
layout(
title = list(
text = paste0("<b>", player_name, " Pitch Movement Profile</b><br>",
"<sub>Catcher's Perspective - Hover for Details</sub>"),
font = list(size = 16)
),
xaxis = list(
title = "<b>Horizontal Break (inches)</b><br>← Third-Base Side | First-Base Side →",
zeroline = TRUE,
zerolinewidth = 2,
zerolinecolor = 'gray',
gridcolor = 'lightgray'
),
yaxis = list(
title = "<b>Induced Vertical Break (inches)</b><br>↓ Drop | Rise ↑",
zeroline = TRUE,
zerolinewidth = 2,
zerolinecolor = 'gray',
gridcolor = 'lightgray',
scaleanchor = "x", # Equal aspect ratio
scaleratio = 1
),
hovermode = 'closest',
showlegend = TRUE,
legend = list(
title = list(text = '<b>Pitch Type</b>'),
orientation = 'v',
x = 1.02,
y = 1
),
margin = list(l = 80, r = 120, t = 100, b = 80)
) %>%
config(displayModeBar = TRUE, displaylogo = FALSE)
return(p)
}
# Example usage with Gerrit Cole's data
# cole_pitches <- statcast_search_pitchers(
# start_date = "2024-04-01",
# end_date = "2024-10-01",
# pitcherid = 543037
# )
#
# interactive_movement_plot <- create_interactive_pitch_movement(
# cole_pitches,
# "Gerrit Cole 2024"
# )
# interactive_movement_plot
Python Implementation:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from pybaseball import statcast_pitcher

def create_interactive_pitch_movement(pitcher_data, player_name="Pitcher"):
    """
    Create interactive pitch movement visualization using Plotly.
    Parameters:
        pitcher_data: DataFrame with Statcast pitch data
        player_name: Name for chart title
    Returns:
        Plotly figure object
    """
    # Filter for complete movement data
    pitches = pitcher_data[
        pitcher_data['pfx_x'].notna() &
        pitcher_data['pfx_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()
    # Convert feet to inches (pfx_x is measured from the catcher's perspective)
    pitches['horizontal_break'] = pitches['pfx_x'] * 12
    pitches['vertical_break'] = pitches['pfx_z'] * 12
    # Create hover text
    pitches['hover_text'] = pitches.apply(
        lambda row: f"<b>{row['pitch_type']}</b><br>" +
        f"Velocity: {row['release_speed']:.1f} mph<br>" +
        f"Spin: {row['release_spin_rate']:.0f} rpm<br>" +
        f"H-Break: {row['horizontal_break']:.1f} in<br>" +
        f"V-Break: {row['vertical_break']:.1f} in<br>" +
        f"Result: {row['events']}",
        axis=1
    )
    # Filter for pitch types with at least 30 pitches
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]
    # Pitch type colors (gray fallback for types not listed)
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }
    # Create figure
    fig = go.Figure()
    # Add one scatter trace per pitch type so each can be toggled in the legend
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]
        fig.add_trace(go.Scatter(
            x=pitch_subset['horizontal_break'],
            y=pitch_subset['vertical_break'],
            mode='markers',
            name=f"{pitch_type} (n={len(pitch_subset)})",
            text=pitch_subset['hover_text'],
            hoverinfo='text',
            marker=dict(
                color=pitch_colors.get(pitch_type, '#7f7f7f'),
                size=8,
                opacity=0.6,
                line=dict(width=0.5, color='black')
            )
        ))
    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Pitch Movement Profile</b><br>" +
            "<sub>Catcher's Perspective - Hover for Details</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Horizontal Break (inches)</b><br>← Third-Base Side | First-Base Side →",
            zeroline=True,
            zerolinewidth=2,
            zerolinecolor='gray',
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Induced Vertical Break (inches)</b><br>↓ Drop | Rise ↑",
            zeroline=True,
            zerolinewidth=2,
            zerolinecolor='gray',
            gridcolor='lightgray',
            showgrid=True,
            scaleanchor="x",  # equal aspect ratio
            scaleratio=1
        ),
        hovermode='closest',
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Pitch Type</b>'),
            orientation='v',
            x=1.02,
            y=1
        ),
        width=1000,
        height=900,
        margin=dict(l=80, r=150, t=100, b=80),
        template='plotly_white'
    )
    return fig

# Example usage
# cole_pitches = statcast_pitcher('2024-04-01', '2024-10-01', 543037)
# fig = create_interactive_pitch_movement(cole_pitches, "Gerrit Cole 2024")
# fig.show()
This interactive movement chart makes pitch clustering, outliers, and the separation between pitch types visible at a glance. Keep in mind that it shows total movement only: tunneling (pitches that look alike early in flight and diverge late) also depends on release point and trajectory, so the chart can show how far apart pitches finish but not whether they look alike out of the hand. The hover functionality enables quick identification of specific pitches for video review or further analysis.
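Because Statcast reports pfx_x from the catcher's perspective, a right-hander's arm-side run shows up as negative horizontal break while a left-hander's shows up as positive, which makes arsenals awkward to compare across handedness. The sketch below is one minimal way to normalize and summarize the clusters, assuming each pitch is a plain dict carrying Statcast's pitch_type, p_throws, pfx_x, and pfx_z fields (feet). It flips horizontal break so arm side is always positive, summarizes each pitch type by its mean movement, and measures the straight-line distance between two cluster centers. The helper names (arm_side_movement, movement_centroids, separation) are hypothetical, and a centroid distance is only one simple way to quantify separation.

```python
from math import hypot
from statistics import mean


def arm_side_movement(pitch):
    """Return (horizontal, vertical) break in inches with arm side positive.

    Assumes Statcast conventions: pfx_x and pfx_z in feet, pfx_x measured
    from the catcher's perspective, p_throws equal to 'R' or 'L'.
    """
    sign = -1 if pitch["p_throws"] == "R" else 1  # RHP arm side is negative x
    return sign * pitch["pfx_x"] * 12, pitch["pfx_z"] * 12


def movement_centroids(pitches):
    """Mean arm-side horizontal break and induced vertical break per pitch type."""
    groups = {}
    for p in pitches:
        groups.setdefault(p["pitch_type"], []).append(arm_side_movement(p))
    return {
        pt: (mean([h for h, _ in moves]), mean([v for _, v in moves]))
        for pt, moves in groups.items()
    }


def separation(centroids, a, b):
    """Distance in inches between the mean movement of pitch types a and b."""
    (ha, va), (hb, vb) = centroids[a], centroids[b]
    return hypot(ha - hb, va - vb)


# Made-up right-hander: fastball with arm-side run and carry, glove-side slider
sample = [
    {"pitch_type": "FF", "p_throws": "R", "pfx_x": -0.75, "pfx_z": 1.50},
    {"pitch_type": "FF", "p_throws": "R", "pfx_x": -0.85, "pfx_z": 1.40},
    {"pitch_type": "SL", "p_throws": "R", "pfx_x": 0.50, "pfx_z": 0.10},
    {"pitch_type": "SL", "p_throws": "R", "pfx_x": 0.40, "pfx_z": 0.00},
]
cents = movement_centroids(sample)
print({pt: (round(h, 1), round(v, 1)) for pt, (h, v) in cents.items()})
# {'FF': (9.6, 17.4), 'SL': (-5.4, 0.6)}
print(round(separation(cents, "FF", "SL"), 1))  # 22.5
```

Centroids ignore the spread within each cluster, so two pitch types with distant centers but wide, overlapping clouds can still be hard to tell apart; pairing the distance with each cluster's standard deviation is a natural extension.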
7.9.2 Interactive Release Point Visualization
Release point consistency is critical for deception, but static visualizations can obscure patterns that emerge when examining pitches interactively. An interactive 3D release point chart allows rotation to examine side, height, and extension from multiple angles. Filtering by pitch type reveals whether a pitcher "tips" pitches through inconsistent release points. Color-coding by velocity or outcome adds another analytical dimension.
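Rotating the chart shows spread qualitatively; a per-pitch-type standard deviation makes release consistency comparable across pitchers and seasons. A minimal sketch, assuming each pitch is a dict with Statcast's pitch_type, release_pos_x, and release_pos_z fields (feet); the function name release_spread and its min_count parameter are hypothetical, with the default mirroring the 30-pitch threshold the charts use.

```python
from statistics import stdev


def release_spread(pitches, min_count=30):
    """Per-pitch-type standard deviation of release side and height, in inches.

    Assumes dicts with 'pitch_type', 'release_pos_x', 'release_pos_z' (feet).
    Pitch types thrown fewer than min_count times are skipped.
    """
    groups = {}
    for p in pitches:
        groups.setdefault(p["pitch_type"], []).append(
            (p["release_pos_x"], p["release_pos_z"])
        )
    spread = {}
    for pitch_type, points in groups.items():
        if len(points) < max(min_count, 2):  # stdev needs at least two values
            continue
        spread[pitch_type] = {
            "n": len(points),
            "sd_side_in": stdev([x for x, _ in points]) * 12,
            "sd_height_in": stdev([z for _, z in points]) * 12,
        }
    return spread


sample = [
    {"pitch_type": "FF", "release_pos_x": -2.0, "release_pos_z": 6.0},
    {"pitch_type": "FF", "release_pos_x": -2.1, "release_pos_z": 6.0},
    {"pitch_type": "FF", "release_pos_x": -1.9, "release_pos_z": 6.0},
    {"pitch_type": "CU", "release_pos_x": -2.0, "release_pos_z": 6.3},
]
# min_count lowered only so the tiny sample produces output
print(release_spread(sample, min_count=2))
```

Reporting the spread in inches rather than feet keeps the numbers readable, since release variation for a consistent pitcher is usually a small fraction of a foot.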
R Implementation:
library(plotly)
library(dplyr)
create_interactive_release_points <- function(pitcher_data, player_name = "Pitcher") {
# Filter for complete release point data
pitches <- pitcher_data %>%
filter(
!is.na(release_pos_x),
!is.na(release_extension), # plotted as depth; release_pos_y is distance from home plate
!is.na(release_pos_z),
!is.na(pitch_type)
) %>%
mutate(
hover_text = paste0(
"<b>", pitch_type, "</b><br>",
"X (side): ", round(release_pos_x, 2), " ft<br>",
"Extension: ", round(release_extension, 2), " ft<br>",
"Z (height): ", round(release_pos_z, 2), " ft<br>",
"Velocity: ", round(release_speed, 1), " mph<br>",
"Result: ", events
)
)
# Filter for qualifying pitch types
pitch_counts <- pitches %>% count(pitch_type)
qualifying <- pitch_counts %>% filter(n >= 30) %>% pull(pitch_type)
plot_data <- pitches %>% filter(pitch_type %in% qualifying)
# Color palette
pitch_colors <- c(
'FF' = '#d62728', 'SI' = '#ff7f0e', 'FC' = '#2ca02c',
'SL' = '#9467bd', 'CU' = '#8c564b', 'CH' = '#e377c2',
'FS' = '#17becf', 'KC' = '#bcbd22'
)
# Create 3D scatter plot
p <- plot_ly(data = plot_data) %>%
add_markers(
x = ~release_pos_x,
y = ~release_extension, # Use extension for depth
z = ~release_pos_z,
color = ~pitch_type,
colors = pitch_colors,
text = ~hover_text,
hoverinfo = "text",
marker = list(
size = 5,
opacity = 0.7,
line = list(width = 0.3, color = 'black')
)
) %>%
layout(
title = list(
text = paste0("<b>", player_name, " Release Point Consistency</b><br>",
"<sub>3D View - Rotate to Explore</sub>"),
font = list(size = 16)
),
scene = list(
xaxis = list(
title = "<b>Horizontal Position (ft)</b><br>← 3B Side | 1B Side → (catcher's view)",
gridcolor = 'lightgray',
backgroundcolor = 'white'
),
yaxis = list(
title = '<b>Extension (ft)</b>',
gridcolor = 'lightgray',
backgroundcolor = 'white'
),
zaxis = list(
title = '<b>Release Height (ft)</b>',
gridcolor = 'lightgray',
backgroundcolor = 'white'
),
camera = list(
eye = list(x = 1.5, y = 1.5, z = 1.3)
),
aspectmode = 'cube'
),
showlegend = TRUE,
legend = list(
title = list(text = '<b>Pitch Type</b>'),
x = 1.02,
y = 0.9
)
) %>%
config(displayModeBar = TRUE, displaylogo = FALSE)
return(p)
}
# Example usage
# release_plot <- create_interactive_release_points(
# cole_pitches,
# "Gerrit Cole 2024"
# )
# release_plot
Python Implementation:
import plotly.graph_objects as go
def create_interactive_release_points(pitcher_data, player_name="Pitcher"):
    """
    Create 3D interactive release point visualization.
    Parameters:
        pitcher_data: DataFrame with Statcast pitch data
        player_name: Name for chart title
    Returns:
        Plotly figure object
    """
    # Filter for complete data; release_extension is plotted as depth
    # (release_pos_y is the release distance from home plate, not extension)
    pitches = pitcher_data[
        pitcher_data['release_pos_x'].notna() &
        pitcher_data['release_extension'].notna() &
        pitcher_data['release_pos_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()
    # Create hover text
    pitches['hover_text'] = pitches.apply(
        lambda row: f"<b>{row['pitch_type']}</b><br>" +
                    f"X (side): {row['release_pos_x']:.2f} ft<br>" +
                    f"Extension: {row['release_extension']:.2f} ft<br>" +
                    f"Z (height): {row['release_pos_z']:.2f} ft<br>" +
                    f"Velocity: {row['release_speed']:.1f} mph<br>" +
                    f"Result: {row['events']}",
        axis=1
    )
    # Filter for qualifying pitch types (at least 30 pitches)
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying)]
    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }
    # Create figure
    fig = go.Figure()
    # Add trace for each pitch type
    for pitch_type in qualifying:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]
        fig.add_trace(go.Scatter3d(
            x=pitch_subset['release_pos_x'],
            y=pitch_subset['release_extension'],  # Use extension for Y-axis
            z=pitch_subset['release_pos_z'],
            mode='markers',
            name=f"{pitch_type} (n={len(pitch_subset)})",
            text=pitch_subset['hover_text'],
            hoverinfo='text',
            marker=dict(
                color=pitch_colors.get(pitch_type, '#7f7f7f'),
                size=5,
                opacity=0.7,
                line=dict(width=0.3, color='black')
            )
        ))
    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Release Point Consistency</b><br>" +
                 "<sub>3D View - Rotate to Explore</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        scene=dict(
            xaxis=dict(
                # release_pos_x is measured from the catcher's perspective
                title="<b>Horizontal Position (ft)</b><br>← 3B Side | 1B Side → (catcher's view)",
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            yaxis=dict(
                title='<b>Extension (ft)</b>',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            zaxis=dict(
                title='<b>Release Height (ft)</b>',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            camera=dict(
                eye=dict(x=1.5, y=1.5, z=1.3)
            ),
            aspectmode='cube'
        ),
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Pitch Type</b>'),
            x=1.02,
            y=0.9
        ),
        width=1000,
        height=800,
        template='plotly_white'
    )
    return fig
# Example usage
# fig = create_interactive_release_points(cole_pitches, "Gerrit Cole 2024")
# fig.show()
The 3D release point visualization is particularly valuable for identifying "tipping" issues. If a pitcher's curveball consistently releases from a different height or arm slot than their fastball, hitters can pick up the pitch type early. The interactive rotation capability allows coaches to examine release points from the hitter's perspective, revealing subtle differences that might not be apparent in 2D projections.
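The tipping check described above boils down to comparing each pitch type's average release point with the fastball's. A minimal sketch under the same dict-per-pitch assumption (Statcast's pitch_type, release_pos_x, release_pos_z in feet); the function name release_offsets and its baseline parameter are hypothetical. It only reports offsets in inches; how large an offset hitters can actually pick up is not something this sketch decides.

```python
from math import hypot
from statistics import mean


def release_offsets(pitches, baseline="FF"):
    """Offset (inches) of each pitch type's mean release point from a baseline.

    Assumes dicts with 'pitch_type', 'release_pos_x', 'release_pos_z' (feet).
    Positive height_in means the pitch is released higher than the baseline.
    """
    groups = {}
    for p in pitches:
        groups.setdefault(p["pitch_type"], []).append(
            (p["release_pos_x"], p["release_pos_z"])
        )
    means = {
        pt: (mean([x for x, _ in pts]), mean([z for _, z in pts]))
        for pt, pts in groups.items()
    }
    if baseline not in means:
        raise ValueError(f"no {baseline} pitches to compare against")
    base_x, base_z = means[baseline]
    return {
        pt: {
            "side_in": (x - base_x) * 12,
            "height_in": (z - base_z) * 12,
            "total_in": hypot(x - base_x, z - base_z) * 12,
        }
        for pt, (x, z) in means.items()
        if pt != baseline
    }


sample = [
    {"pitch_type": "FF", "release_pos_x": -2.0, "release_pos_z": 6.0},
    {"pitch_type": "FF", "release_pos_x": -2.0, "release_pos_z": 6.0},
    {"pitch_type": "CU", "release_pos_x": -2.0, "release_pos_z": 6.5},
    {"pitch_type": "CU", "release_pos_x": -2.0, "release_pos_z": 6.5},
]
print(release_offsets(sample))
# {'CU': {'side_in': 0.0, 'height_in': 6.0, 'total_in': 6.0}}
```

Comparing means is deliberately simple; a pitch type whose average matches the fastball but whose individual releases scatter widely would still need the spread view to diagnose.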
7.9.3 Animated Pitch Sequence Explorer
Understanding pitch sequencing requires seeing how pitches relate to each other temporally. An animated pitch sequence explorer shows each pitch in order, tracking location, velocity, and movement while maintaining context of game situation, count, and previous pitches. This creates a narrative view of how a pitcher attacks hitters, revealing patterns in pitch selection and execution.
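Both implementations rest on the same ordering step: sort pitches by game_date, at_bat_number, and pitch_number, then number them. The stdlib sketch below shows that step together with cumulative frames, where frame k holds pitches 1 through k so earlier pitches stay on screen as context. It assumes each pitch is a dict with those three Statcast fields and that game_date is an ISO "YYYY-MM-DD" string (so string order equals date order); the function name build_cumulative_frames is hypothetical.

```python
def build_cumulative_frames(pitches, max_pitches=200):
    """Order pitches chronologically and return cumulative animation frames.

    Frame k (1-based) contains pitches 1..k, each tagged with sequence_num.
    """
    ordered = sorted(
        pitches,
        key=lambda p: (p["game_date"], p["at_bat_number"], p["pitch_number"]),
    )[:max_pitches]
    # Copy each dict rather than mutating the caller's data
    numbered = [dict(p, sequence_num=i) for i, p in enumerate(ordered, start=1)]
    return [numbered[:k] for k in range(1, len(numbered) + 1)]


pitches = [
    {"game_date": "2024-04-07", "at_bat_number": 1, "pitch_number": 1, "pitch_type": "FF"},
    {"game_date": "2024-04-01", "at_bat_number": 2, "pitch_number": 1, "pitch_type": "CH"},
    {"game_date": "2024-04-01", "at_bat_number": 1, "pitch_number": 2, "pitch_type": "SL"},
    {"game_date": "2024-04-01", "at_bat_number": 1, "pitch_number": 1, "pitch_type": "FF"},
]
frames = build_cumulative_frames(pitches)
print([p["pitch_type"] for p in frames[-1]])  # ['FF', 'SL', 'CH', 'FF']
print([len(f) for f in frames])  # [1, 2, 3, 4]
```

The total number of points across all cumulative frames grows quadratically with the number of pitches, which is one reason to cap the sequence with max_pitches.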
R Implementation:
library(plotly)
library(dplyr)
create_animated_pitch_sequence <- function(pitcher_data, player_name = "Pitcher",
max_pitches = 200) {
# Prepare pitch sequence data
pitches <- pitcher_data %>%
filter(!is.na(plate_x), !is.na(plate_z), !is.na(pitch_type)) %>%
arrange(game_date, at_bat_number, pitch_number) %>%
mutate(
sequence_num = row_number(),
count_state = paste0(balls, "-", strikes),
frame_label = paste0("Pitch ", sequence_num, ": ", pitch_type,
" @ ", round(release_speed, 1), " mph<br>",
"Count: ", count_state, " | ",
description)
) %>%
head(max_pitches) # Limit for performance
# Define strike zone boundaries
sz_top <- 3.5 # Approximate top of zone
sz_bottom <- 1.5 # Approximate bottom
sz_left <- -0.83 # Left edge (catcher's view)
sz_right <- 0.83 # Right edge
# Pitch colors
pitch_colors <- c(
'FF' = '#d62728', 'SI' = '#ff7f0e', 'FC' = '#2ca02c',
'SL' = '#9467bd', 'CU' = '#8c564b', 'CH' = '#e377c2',
'FS' = '#17becf', 'KC' = '#bcbd22'
)
# Create animated scatter plot
p <- plot_ly(
data = pitches,
x = ~plate_x,
y = ~plate_z,
frame = ~sequence_num,
color = ~pitch_type,
colors = pitch_colors,
text = ~frame_label,
hoverinfo = "text",
type = 'scatter',
mode = 'markers',
marker = list(
size = 12,
opacity = 0.8,
line = list(width = 1, color = 'black')
)
) %>%
layout(
title = list(
text = paste0("<b>", player_name, " Pitch Sequence</b><br>",
"<sub>Catcher's View - Press Play</sub>"),
font = list(size = 16)
),
xaxis = list(
title = "<b>Horizontal Location (ft)</b><br>← 3B Side | 1B Side → (catcher's view)",
range = c(-2, 2),
zeroline = TRUE,
zerolinecolor = 'lightgray'
),
yaxis = list(
title = "<b>Vertical Location (ft)</b>",
range = c(0, 5),
zeroline = FALSE
),
shapes = list(
# Strike zone rectangle
list(
type = "rect",
x0 = sz_left, x1 = sz_right,
y0 = sz_bottom, y1 = sz_top,
line = list(color = "black", width = 2),
fillcolor = "rgba(200, 200, 200, 0.1)"
)
),
showlegend = TRUE,
legend = list(title = list(text = '<b>Pitch Type</b>'))
) %>%
animation_opts(
frame = 1000, # 1 second per pitch
transition = 500,
redraw = FALSE
) %>%
animation_slider(
currentvalue = list(
prefix = "Pitch: ",
font = list(size = 14, color = "black")
)
) %>%
config(displayModeBar = TRUE, displaylogo = FALSE)
return(p)
}
# Example usage
# sequence_plot <- create_animated_pitch_sequence(
# cole_pitches,
# "Gerrit Cole 2024",
# max_pitches = 150
# )
# sequence_plot
Python Implementation:
import plotly.graph_objects as go
def create_animated_pitch_sequence(pitcher_data, player_name="Pitcher",
                                   max_pitches=200):
    """
    Create animated pitch sequence visualization.
    Parameters:
        pitcher_data: DataFrame with Statcast pitch data
        player_name: Name for chart title
        max_pitches: Maximum pitches to include (for performance)
    Returns:
        Plotly figure object
    """
    # Prepare sequence data
    pitches = pitcher_data[
        pitcher_data['plate_x'].notna() &
        pitcher_data['plate_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()
    # Sort chronologically, then keep the first max_pitches
    pitches = pitches.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
    pitches = pitches.head(max_pitches)  # Limit for performance
    pitches['sequence_num'] = range(1, len(pitches) + 1)
    # Create count state and labels
    pitches['count_state'] = pitches['balls'].astype(str) + '-' + pitches['strikes'].astype(str)
    pitches['frame_label'] = pitches.apply(
        lambda row: f"Pitch {row['sequence_num']}: {row['pitch_type']} " +
                    f"@ {row['release_speed']:.1f} mph<br>" +
                    f"Count: {row['count_state']} | {row['description']}",
        axis=1
    )
    # Approximate strike zone boundaries (feet, catcher's view)
    sz_left, sz_right = -0.83, 0.83
    sz_bottom, sz_top = 1.5, 3.5
    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }
    # Fixed trace order shared by the base figure and every frame
    pitch_types = list(pitches['pitch_type'].unique())
    def build_traces(subset):
        # Plotly matches frame traces to figure traces by position, so every
        # pitch type gets a trace even before it is first thrown (empty trace)
        traces = []
        for pitch_type in pitch_types:
            pt_data = subset[subset['pitch_type'] == pitch_type]
            traces.append(go.Scatter(
                x=pt_data['plate_x'],
                y=pt_data['plate_z'],
                mode='markers',
                name=pitch_type,
                text=pt_data['frame_label'],
                hoverinfo='text',
                marker=dict(
                    color=pitch_colors.get(pitch_type, '#7f7f7f'),
                    size=12,
                    opacity=0.8,
                    line=dict(width=1, color='black')
                )
            ))
        return traces
    # Frame k shows pitches 1..k, so earlier pitches stay visible as context
    frames = [
        go.Frame(data=build_traces(pitches[pitches['sequence_num'] <= seq_num]),
                 name=str(seq_num))
        for seq_num in pitches['sequence_num']
    ]
    # Base figure starts at the first pitch and carries every frame
    fig = go.Figure(data=build_traces(pitches[pitches['sequence_num'] <= 1]),
                    frames=frames)
    # Add strike zone
    fig.add_shape(
        type="rect",
        x0=sz_left, x1=sz_right,
        y0=sz_bottom, y1=sz_top,
        line=dict(color="black", width=2),
        fillcolor="rgba(200, 200, 200, 0.1)"
    )
    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Pitch Sequence</b><br>" +
                 "<sub>Catcher's View - Press Play</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Horizontal Location (ft)</b><br>← 3B Side | 1B Side → (catcher's view)",
            range=[-2, 2],
            zeroline=True,
            zerolinecolor='lightgray',
            gridcolor='lightgray'
        ),
        yaxis=dict(
            title="<b>Vertical Location (ft)</b>",
            range=[0, 5],
            zeroline=False,
            gridcolor='lightgray'
        ),
        showlegend=True,
        legend=dict(title=dict(text='<b>Pitch Type</b>')),
        updatemenus=[{
            'type': 'buttons',
            'showactive': False,
            'buttons': [
                {
                    'label': 'Play',
                    'method': 'animate',
                    'args': [None, {
                        'frame': {'duration': 1000, 'redraw': True},
                        'fromcurrent': True,
                        'transition': {'duration': 500}
                    }]
                },
                {
                    'label': 'Pause',
                    'method': 'animate',
                    'args': [[None], {
                        'frame': {'duration': 0, 'redraw': False},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }]
                }
            ],
            'x': 0.1,
            'y': 0
        }],
        sliders=[{
            'active': 0,
            'steps': [
                {
                    'args': [[f.name], {
                        'frame': {'duration': 0, 'redraw': True},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }],
                    'label': f.name,
                    'method': 'animate'
                }
                for f in frames
            ],
            'currentvalue': {
                'prefix': 'Pitch: ',
                'font': {'size': 14, 'color': 'black'}
            },
            'x': 0.1,
            'len': 0.9,
            'xanchor': 'left',
            'y': 0,
            'yanchor': 'top'
        }],
        width=1000,
        height=800,
        template='plotly_white'
    )
    return fig
# Example usage
# fig = create_animated_pitch_sequence(cole_pitches, "Gerrit Cole 2024", max_pitches=150)
# fig.show()
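Both implementations hard-code the strike zone as ±0.83 ft wide and 1.5 to 3.5 ft tall. One common rationale for 0.83 ft is half the 17-inch plate (8.5 in) plus roughly one baseball radius (about 1.45 in), since a pitch is a strike if any part of the ball passes through the zone. The vertical limits are a fixed approximation; Statcast also supplies per-pitch sz_top and sz_bot columns for the batter's zone, which a refinement could pass in. A sketch of that arithmetic and a simple zone check; the constant names and the in_zone function are hypothetical.

```python
PLATE_WIDTH_IN = 17.0
BALL_RADIUS_IN = 1.45  # approximate: regulation circumference is 9 to 9.25 in

# Half the plate plus one ball radius, converted from inches to feet
HALF_WIDTH_FT = (PLATE_WIDTH_IN / 2 + BALL_RADIUS_IN) / 12


def in_zone(plate_x, plate_z, sz_bot=1.5, sz_top=3.5, half_width=HALF_WIDTH_FT):
    """Approximate strike-zone check on Statcast plate_x / plate_z (feet).

    Horizontal edges include the ball radius, mirroring the 0.83 ft constant;
    vertical edges are used as given (fixed defaults or per-pitch sz_bot/sz_top).
    """
    return abs(plate_x) <= half_width and sz_bot <= plate_z <= sz_top


print(round(HALF_WIDTH_FT, 2))  # 0.83
print(in_zone(0.8, 2.5), in_zone(0.9, 2.5), in_zone(0.0, 1.2))  # True False False
```

The same check, applied per pitch with the batter-specific sz_bot and sz_top values, is a reasonable starting point for zone% calculations.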
The animated pitch sequence explorer is particularly useful for understanding how pitchers set up hitters. Watch an elite pitcher work an at-bat: a fastball up and in to establish the inner half, a slider down and away to show something soft, another fastball up to get ahead, then a slider off the plate that the hitter chases for the putaway. The sequential view reveals patterns that aggregated statistics miss entirely. Analysts can identify whether a pitcher becomes predictable in certain counts or situations, which offers actionable insights for both pitcher development and opponent preparation.
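To put a number on "predictable in certain counts," one simple option is to tabulate pitch-type usage within each ball-strike count and report how often the most common pitch is thrown. A minimal sketch, assuming each pitch is a dict with Statcast's balls, strikes, and pitch_type fields; the function name usage_by_count and its min_pitches cutoff are hypothetical, and a high share in a small sample may simply be noise.

```python
from collections import Counter, defaultdict


def usage_by_count(pitches, min_pitches=1):
    """Summarize pitch-type usage within each ball-strike count.

    Assumes dicts with 'balls', 'strikes', 'pitch_type'. Counts with fewer
    than min_pitches pitches are skipped.
    """
    by_count = defaultdict(Counter)
    for p in pitches:
        by_count[f"{p['balls']}-{p['strikes']}"][p["pitch_type"]] += 1
    summary = {}
    for count, tally in sorted(by_count.items()):
        total = sum(tally.values())
        if total < min_pitches:
            continue
        top_pitch, top_n = tally.most_common(1)[0]
        summary[count] = {"pitches": total, "top_pitch": top_pitch,
                          "top_share": top_n / total}
    return summary


sample = (
    [{"balls": 0, "strikes": 0, "pitch_type": t} for t in ["FF", "FF", "SL"]]
    + [{"balls": 0, "strikes": 2, "pitch_type": t} for t in ["SL", "SL", "SL", "FF"]]
)
for count, info in usage_by_count(sample).items():
    print(count, info["top_pitch"], round(info["top_share"], 2))
# 0-0 FF 0.67
# 0-2 SL 0.75
```

A top-pitch share says nothing about whether hitters exploit it; pairing it with outcomes (whiff rate or xwOBA by count) shows whether the predictability actually costs the pitcher.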
These interactive tools turn static data into something an analyst can explore. Hover details, legend filtering, 3D rotation, and animation each expose a different slice of multi-dimensional pitching data, which helps both in finding patterns in a pitcher's arsenal and in communicating those findings to coaches and players.
Exercise 7.1: Pitcher Comparison Analysis
Using Statcast data, compare two starting pitchers from different teams:
- Pull data for both pitchers for the 2024 season
- Calculate velocity, spin rate, and movement profiles for each pitch type
- Compare their arsenals: usage rates, average velocity, and whiff rates
- Create a pitch movement chart for each pitcher
- Write a brief scouting report comparing their arsenals
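Suggested pitchers: Spencer Strider (ATL) and Shota Imanaga (CHC). Strider made only two starts in 2024 before season-ending elbow surgery, so use his 2023 season (or substitute another starter) to get a meaningful sample.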
Exercise 7.2: Command and Location Analysis
Analyze pitch location patterns for a pitcher of your choice:
- Calculate zone%, edge%, heart%, and chase rate by pitch type
- Analyze CSW% overall and by pitch type
- Create a heatmap showing pitch locations for their primary pitch (typically the four-seam fastball)
- Compare location patterns when the pitcher is ahead vs. behind in the count
- Assess: Is this pitcher's success driven more by stuff or command?
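Most of the rate stats in this exercise reduce to boolean flags averaged by pitch type. The sketch below assumes Statcast's `zone` convention (1 to 9 inside the strike zone, 11 to 14 outside) and a hypothetical set of `description` values for swings and whiffs. Edge% and heart% are left out because their boundaries depend on which zone definition you adopt. The `count_state` helper is likewise an assumption: it calls any count with more strikes than balls "ahead" and more balls than strikes "behind", so a full count lands in "behind".

```python
import pandas as pd

# Assumed Statcast `description` values; adjust to your own definitions.
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}
SWINGS = WHIFFS | {"foul", "foul_tip", "hit_into_play"}


def command_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Zone%, chase rate, and CSW% by pitch type, as percentages.

    Assumes Statcast's `zone` convention: 1-9 inside the strike zone,
    11-14 outside it.
    """
    d = df.dropna(subset=["pitch_type", "zone"]).copy()
    d["in_zone"] = d["zone"].between(1, 9)
    d["out_zone"] = ~d["in_zone"]
    d["swing"] = d["description"].isin(SWINGS)
    # CSW = called strikes plus whiffs, divided by all pitches.
    d["csw"] = d["description"].isin(WHIFFS | {"called_strike"})
    # A chase is a swing at a pitch outside the zone.
    d["chase"] = d["swing"] & d["out_zone"]
    g = d.groupby("pitch_type")
    out = pd.DataFrame({
        "pitches": g.size(),
        "zone_pct": 100 * g["in_zone"].mean(),
        "csw_pct": 100 * g["csw"].mean(),
        # Chase rate = out-of-zone swings / out-of-zone pitches.
        "chase_pct": 100 * g["chase"].sum() / g["out_zone"].sum(),
    })
    return out.round(1)


def count_state(df: pd.DataFrame) -> pd.Series:
    """Label each pitch's count from the pitcher's point of view.

    Simple convention (an assumption): more strikes than balls is
    "ahead", more balls than strikes is "behind", equal is "even".
    """
    state = pd.Series("even", index=df.index)
    state.loc[df["strikes"] > df["balls"]] = "ahead"
    state.loc[df["balls"] > df["strikes"]] = "behind"
    return state
```

For the count comparison, filter the fastballs by `count_state(df)` and compare their `plate_x`/`plate_z` distributions (or heatmaps) across the "ahead" and "behind" groups.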
Exercise 7.3: Arsenal Effectiveness Study
Investigate which pitches in a pitcher's arsenal are most and least effective:
- For each pitch type, calculate:
  - Whiff rate
  - xwOBA against
  - Hard-hit rate against
  - CSW%
  - Usage rate
- Identify the best and worst pitches in the arsenal
- Analyze whether usage rate aligns with effectiveness (do they throw their best pitches most often?)
- Calculate pitch values: Run Value per 100 pitches for each pitch type
- Make a recommendation: Should they adjust their pitch usage?
Challenge Extension: Compare the pitcher's arsenal effectiveness against left-handed vs. right-handed batters. Do they have platoon splits? Which pitches drive those splits?
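The run-value, hard-hit, and xwOBA pieces of this study can be sketched as one grouped table. This sketch assumes Statcast columns `delta_run_exp`, `type`, `launch_speed`, `woba_value`, `woba_denom`, and `estimated_woba_using_speedangle`, and it makes two assumptions worth checking against your data source: `delta_run_exp` is treated as being from the batting team's perspective (so its sign is flipped to make positive values good for the pitcher), and xwOBA is approximated by using expected wOBA on balls in play and actual wOBA on other plate-appearance-ending events such as strikeouts and walks. Hard-hit uses Statcast's 95 mph exit-velocity threshold. The function name `effectiveness` is hypothetical.

```python
import pandas as pd


def effectiveness(df: pd.DataFrame, by=("pitch_type",)) -> pd.DataFrame:
    """Run value per 100 pitches, hard-hit%, and approximate xwOBA by group.

    Assumed Statcast columns: delta_run_exp (batting-team perspective),
    type ('X' = ball in play), launch_speed, woba_value, woba_denom,
    estimated_woba_using_speedangle.
    """
    keys = list(by)
    d = df.dropna(subset=keys).copy()
    # Flip the sign so that positive run value favors the pitcher.
    d["pitcher_rv"] = -d["delta_run_exp"]
    d["bip"] = d["type"] == "X"
    # Hard-hit = ball in play with exit velocity of 95+ mph.
    d["hard_hit"] = d["bip"] & (d["launch_speed"] >= 95)
    # xwOBA numerator (approximation): expected wOBA on balls in play,
    # actual wOBA on other PA-ending events (K, BB, HBP).
    est = d["estimated_woba_using_speedangle"]
    num = est.where(d["bip"] & est.notna(), d["woba_value"])
    # Keep only pitches that end a PA counted in the wOBA denominator.
    d["xwoba_num"] = num.where(d["woba_denom"] == 1)
    g = d.groupby(keys)
    out = pd.DataFrame({
        "pitches": g.size(),
        "rv_per_100": 100 * g["pitcher_rv"].sum() / g.size(),
        "hard_hit_pct": 100 * g["hard_hit"].sum() / g["bip"].sum(),
        "xwoba": g["xwoba_num"].sum() / g["woba_denom"].sum(),
    })
    return out.round(3)
```

Passing `by=("stand", "pitch_type")` returns the same table split by batter handedness, which is the starting point for the platoon-split extension.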
You've now completed your deep dive into Statcast pitching analytics. You understand how modern tracking systems measure every pitch, what those measurements reveal about pitcher performance, and how to analyze arsenals, command, and expected outcomes. These skills form the foundation for evaluating pitchers in the modern game, whether you're building projection models, designing development plans, or making strategic decisions.
The next chapter will explore park factors and environmental effects - understanding how context affects the statistics we've been analyzing throughout this book.