Pitch tunneling represents one of the most sophisticated concepts in modern pitching analytics. At its core, tunneling refers to the ability of a pitcher to make different pitches appear identical for as long as possible before they diverge toward their final destinations. This optical illusion creates a decision-making crisis for hitters, who must commit to a swing before they can reliably distinguish between pitches.
The Neuroscience of Hitting
Understanding pitch tunneling requires appreciation of the severe time constraints faced by hitters. A 95 mph fastball travels from the pitcher's hand to home plate in approximately 400 milliseconds. However, the visual processing and motor execution required to swing a bat takes roughly 200-250 milliseconds. This means hitters must begin their swing decision when the ball is roughly halfway to the plate—around 30 feet from release.
Research by Dr. Peter Fadde and others has shown that elite hitters rely heavily on early ball flight cues to predict pitch location and type. They track the ball out of the pitcher's hand and use the first 15-20 feet of flight to make their swing decision. This is where tunneling becomes devastating: if two pitches occupy the same visual space during this critical recognition window, the hitter cannot reliably distinguish between them until it's too late to adjust.
The Tunnel Point Concept
The "tunnel point" is the location along the pitch trajectory where different pitches begin to separate visibly. For effective tunneling, this point should occur as late as possible—ideally beyond the point where hitters must commit to their swing. A typical tunnel point for elite pitchers occurs around 20-25 feet from home plate, though some exceptional pitchers achieve separation points even closer to the plate.
Consider a pitcher throwing a four-seam fastball and a slider with identical release points and similar trajectories for the first 25 feet. At the tunnel point, the slider begins to break sharply away while the fastball continues on its straighter path. The hitter, having committed to one pitch type based on early flight cues, finds themselves helpless against the other.
Key Components of Effective Tunneling
Several factors contribute to successful pitch tunneling:
Release Point Consistency: Pitches must emerge from the same release point (within inches) to appear identical initially. Inconsistent release points give hitters early visual cues about pitch type.
Matching Early Trajectories: Tunneling pairs should follow nearly identical paths through the recognition zone (first 15-25 feet).
Sufficient Late Differentiation: After the tunnel point, pitches must separate enough to cover different zones, forcing the hitter to commit to one or the other.
Complementary Movement Profiles: The best tunneling pairs have movement that diverges in different planes—fastball with vertical rise paired with a slider breaking horizontally and down.
Historical Context
While pitchers have intuitively understood deception for over a century, the analytical framework for tunneling emerged only recently. The term "tunneling" was popularized by baseball analyst Mike Fast in the early 2010s, leveraging pitch tracking data from PITCHf/x and later Statcast. This data revolution allowed quantification of concepts that pitchers and coaches had only vaguely articulated previously.
Old-school pitching coaches would tell their pitchers to "make your changeup look like your fastball" or "throw everything from the same slot." These were intuitive approximations of tunneling principles. Modern technology allows us to measure exactly how well pitchers execute these concepts and identify specific improvements.
Calculating tunnel points requires three-dimensional pitch trajectory data, which Statcast provides for every pitch in MLB. The analysis involves comparing the flight paths of different pitch types and identifying where they diverge beyond a perceptual threshold.
Mathematical Framework
For two pitches with trajectories described by position vectors p₁(t) and p₂(t), the tunnel point occurs at the earliest time t_tunnel where:
||p₁(t) - p₂(t)|| > δ_perceptual
Where δ_perceptual represents the minimum distance (typically 1-3 inches) that a hitter can reliably detect given the ball's speed and distance from the plate.
Implementing Tunnel Point Calculation in R
Let's analyze Spencer Strider's exceptional fastball-slider tunnel using pitch-by-pitch Statcast data:
# Load required libraries
library(baseballr)
library(tidyverse)
library(plotly)
# Function to calculate pitch trajectory at any point
calculate_trajectory <- function(pitch_data, distance_from_plate) {
# Extract initial conditions
vx0 <- pitch_data$vx0
vy0 <- pitch_data$vy0
vz0 <- pitch_data$vz0
ax <- pitch_data$ax
ay <- pitch_data$ay
az <- pitch_data$az
x0 <- pitch_data$release_pos_x
y0 <- pitch_data$release_pos_y
z0 <- pitch_data$release_pos_z
# Calculate time to reach the distance
# Using quadratic formula for: distance = y0 + vy0*t + 0.5*ay*t^2
y_target <- distance_from_plate
a <- 0.5 * ay
b <- vy0
c <- y0 - y_target
# Take the positive root (forward in time)
t <- (-b - sqrt(b^2 - 4*a*c)) / (2*a)
# Calculate x, y, z positions at this time
x <- x0 + vx0*t + 0.5*ax*t^2
y <- y0 + vy0*t + 0.5*ay*t^2
z <- z0 + vz0*t + 0.5*az*t^2
return(c(x = x, y = y, z = z))
}
# Function to find tunnel point between two pitch types
find_tunnel_point <- function(pitch1_data, pitch2_data,
threshold_inches = 2,
start_dist = 50,
end_dist = 5) {
distances <- seq(start_dist, end_dist, by = -0.5)
for (dist in distances) {
pos1 <- calculate_trajectory(pitch1_data, dist)
pos2 <- calculate_trajectory(pitch2_data, dist)
# Calculate 3D Euclidean distance in inches
separation <- sqrt((pos1['x'] - pos2['x'])^2 +
(pos1['z'] - pos2['z'])^2) * 12
if (separation > threshold_inches) {
return(list(
tunnel_distance = dist,
separation_inches = separation,
position1 = pos1,
position2 = pos2
))
}
}
return(NULL)
}
# Get Spencer Strider's 2023 data
strider_id <- 675911
# Fetch Statcast data for 2023 season
strider_data <- statcast_search_pitchers(
start_date = "2023-04-01",
end_date = "2023-09-30",
pitcherid = strider_id
)
# Filter for fastballs and sliders
ff_slider <- strider_data %>%
filter(pitch_type %in% c("FF", "SL")) %>%
filter(!is.na(release_pos_x) & !is.na(vx0))
# Get average pitch characteristics
avg_fastball <- ff_slider %>%
filter(pitch_type == "FF") %>%
summarise(across(c(vx0, vy0, vz0, ax, ay, az,
release_pos_x, release_pos_y, release_pos_z,
release_speed, release_spin_rate,
pfx_x, pfx_z), mean, na.rm = TRUE))
avg_slider <- ff_slider %>%
filter(pitch_type == "SL") %>%
summarise(across(c(vx0, vy0, vz0, ax, ay, az,
release_pos_x, release_pos_y, release_pos_z,
release_speed, release_spin_rate,
pfx_x, pfx_z), mean, na.rm = TRUE))
# Calculate tunnel point
tunnel_result <- find_tunnel_point(avg_fastball, avg_slider)
cat("Spencer Strider FF-SL Tunnel Analysis\n")
cat("=====================================\n")
cat(sprintf("Tunnel Point: %.1f feet from home plate\n",
tunnel_result$tunnel_distance))
cat(sprintf("Separation at tunnel: %.2f inches\n",
tunnel_result$separation_inches))
cat(sprintf("Fastball velocity: %.1f mph\n", avg_fastball$release_speed))
cat(sprintf("Slider velocity: %.1f mph\n", avg_slider$release_speed))
cat(sprintf("Velocity differential: %.1f mph\n",
avg_fastball$release_speed - avg_slider$release_speed))
# Visualize the tunnel
visualize_tunnel <- function(pitch1, pitch2, pitch1_name, pitch2_name) {
distances <- seq(55, 0, by = -0.5)
trajectory_data <- map_df(distances, function(d) {
if (d > 0) {
pos1 <- calculate_trajectory(pitch1, d)
pos2 <- calculate_trajectory(pitch2, d)
tibble(
distance = d,
x_ff = pos1['x'],
z_ff = pos1['z'],
x_sl = pos2['x'],
z_sl = pos2['z']
)
}
})
p <- ggplot(trajectory_data) +
geom_path(aes(x = distance, y = x_ff * 12, color = pitch1_name),
size = 1.2) +
geom_path(aes(x = distance, y = x_sl * 12, color = pitch2_name),
size = 1.2) +
scale_color_manual(values = c("FF" = "#d62728", "SL" = "#1f77b4")) +
labs(title = "Pitch Tunneling: Horizontal Movement",
subtitle = "Spencer Strider Fastball vs Slider",
x = "Distance from Home Plate (feet)",
y = "Horizontal Position (inches from center)",
color = "Pitch Type") +
theme_minimal() +
theme(legend.position = "bottom")
return(p)
}
Python Implementation with Pybaseball
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pybaseball import statcast_pitcher
from pybaseball import playerid_lookup
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
# Configure plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)
class PitchTrajectory:
"""Calculate and analyze pitch trajectories from Statcast data"""
def __init__(self, pitch_data):
"""
Initialize with pitch data containing:
vx0, vy0, vz0, ax, ay, az, release_pos_x, release_pos_y, release_pos_z
"""
self.vx0 = pitch_data['vx0']
self.vy0 = pitch_data['vy0']
self.vz0 = pitch_data['vz0']
self.ax = pitch_data['ax']
self.ay = pitch_data['ay']
self.az = pitch_data['az']
self.x0 = pitch_data['release_pos_x']
self.y0 = pitch_data['release_pos_y']
self.z0 = pitch_data['release_pos_z']
def position_at_distance(self, y_distance):
"""
Calculate x, y, z position when ball is at y_distance from plate
Parameters:
-----------
y_distance : float
Distance from home plate in feet (positive = toward pitcher)
Returns:
--------
tuple : (x, y, z) position in feet
"""
# Solve for time using quadratic formula
# y(t) = y0 + vy0*t + 0.5*ay*t^2
a = 0.5 * self.ay
b = self.vy0
c = self.y0 - y_distance
discriminant = b**2 - 4*a*c
if discriminant < 0:
return None
t = (-b - np.sqrt(discriminant)) / (2*a)
if t < 0:
return None
x = self.x0 + self.vx0*t + 0.5*self.ax*t**2
y = self.y0 + self.vy0*t + 0.5*self.ay*t**2
z = self.z0 + self.vz0*t + 0.5*self.az*t**2
return (x, y, z)
def full_trajectory(self, num_points=100):
"""Generate complete trajectory from release to plate"""
distances = np.linspace(self.y0, 1.417, num_points) # 1.417 ft = 17 inches
trajectory = []
for d in distances:
pos = self.position_at_distance(d)
if pos:
trajectory.append(pos)
return np.array(trajectory)
def find_tunnel_point(traj1, traj2, threshold_inches=2.0):
"""
Find where two pitch trajectories separate beyond threshold
Parameters:
-----------
traj1, traj2 : PitchTrajectory objects
threshold_inches : float
Separation threshold in inches
Returns:
--------
dict : tunnel point information
"""
# Check distances from 50 feet to 5 feet
distances = np.arange(50, 5, -0.5)
for dist in distances:
pos1 = traj1.position_at_distance(dist)
pos2 = traj2.position_at_distance(dist)
if pos1 is None or pos2 is None:
continue
# Calculate horizontal and vertical separation
horiz_sep = np.sqrt((pos1[0] - pos2[0])**2) * 12 # Convert to inches
vert_sep = abs(pos1[2] - pos2[2]) * 12
total_sep = np.sqrt(horiz_sep**2 + vert_sep**2)
if total_sep > threshold_inches:
return {
'tunnel_distance': dist,
'total_separation': total_sep,
'horizontal_separation': horiz_sep,
'vertical_separation': vert_sep,
'pos1': pos1,
'pos2': pos2
}
return None
# Analyze Dylan Cease's tunneling
def analyze_pitcher_tunnel(pitcher_name, year=2023):
"""Comprehensive tunnel analysis for a pitcher"""
# Get pitcher ID
# Dylan Cease ID: 656302
pitcher_id = 656302
# Fetch data
print(f"Fetching data for {pitcher_name} ({year})...")
data = statcast_pitcher(f'{year}-04-01', f'{year}-09-30', pitcher_id)
if data is None or len(data) == 0:
print("No data found")
return
# Filter for primary pitches
pitches = data[data['pitch_type'].isin(['FF', 'SL', 'CU'])].copy()
# Remove null values
required_cols = ['vx0', 'vy0', 'vz0', 'ax', 'ay', 'az',
'release_pos_x', 'release_pos_y', 'release_pos_z']
pitches = pitches.dropna(subset=required_cols)
print(f"\nAnalyzing {len(pitches)} pitches")
print(f"Pitch mix: {pitches['pitch_type'].value_counts().to_dict()}")
# Calculate average trajectory for each pitch type
pitch_types = {}
for ptype in ['FF', 'SL', 'CU']:
subset = pitches[pitches['pitch_type'] == ptype]
if len(subset) > 0:
avg_data = subset[required_cols].mean()
pitch_types[ptype] = PitchTrajectory(avg_data)
# Analyze all pitch pairs
results = {}
pairs = [('FF', 'SL'), ('FF', 'CU'), ('SL', 'CU')]
for p1, p2 in pairs:
if p1 in pitch_types and p2 in pitch_types:
tunnel = find_tunnel_point(pitch_types[p1], pitch_types[p2])
if tunnel:
results[f"{p1}-{p2}"] = tunnel
# Print results
print(f"\n{'='*60}")
print(f"TUNNEL ANALYSIS: {pitcher_name} ({year})")
print(f"{'='*60}\n")
for pair, result in results.items():
print(f"{pair} Tunnel:")
print(f" Distance from plate: {result['tunnel_distance']:.1f} feet")
print(f" Total separation: {result['total_separation']:.2f} inches")
print(f" Horizontal: {result['horizontal_separation']:.2f} inches")
print(f" Vertical: {result['vertical_separation']:.2f} inches")
print()
return pitches, pitch_types, results
# Example usage
pitches, trajectories, tunnel_results = analyze_pitcher_tunnel("Dylan Cease", 2023)
# Visualize 3D trajectories
def plot_3d_trajectories(pitch_types, pitcher_name):
"""Create 3D visualization of pitch trajectories"""
fig = plt.figure(figsize=(14, 10))
ax = fig.add_subplot(111, projection='3d')
colors = {'FF': '#d62728', 'SL': '#1f77b4', 'CU': '#2ca02c', 'CH': '#ff7f0e'}
labels = {'FF': 'Four-Seam FB', 'SL': 'Slider',
'CU': 'Curveball', 'CH': 'Changeup'}
for ptype, trajectory in pitch_types.items():
path = trajectory.full_trajectory(num_points=200)
ax.plot(path[:, 0], path[:, 1], path[:, 2],
color=colors.get(ptype, 'gray'),
linewidth=2.5,
label=labels.get(ptype, ptype),
alpha=0.8)
# Add strike zone
sz_x = [-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2] # 17 inches wide
sz_z = [1.5, 1.5, 3.5, 3.5, 1.5] # Approximate zone
sz_y = [1.417] * 5 # Front of plate
ax.plot(sz_x, sz_y, sz_z, 'k-', linewidth=2, alpha=0.5)
ax.set_xlabel('Horizontal Position (feet)', fontsize=11)
ax.set_ylabel('Distance from Plate (feet)', fontsize=11)
ax.set_zlabel('Height (feet)', fontsize=11)
ax.set_title(f'{pitcher_name} - Pitch Trajectories (3D View)',
fontsize=14, fontweight='bold')
ax.legend(loc='upper left', fontsize=10)
ax.view_init(elev=15, azim=45)
plt.tight_layout()
return fig
# Create visualization
fig = plot_3d_trajectories(trajectories, "Dylan Cease")
plt.savefig('cease_tunnel_3d.png', dpi=300, bbox_inches='tight')
plt.show()
||p₁(t) - p₂(t)|| > δ_perceptual
# Load required libraries
library(baseballr)
library(tidyverse)
library(plotly)
# Function to calculate pitch trajectory at any point
calculate_trajectory <- function(pitch_data, distance_from_plate) {
# Extract initial conditions
vx0 <- pitch_data$vx0
vy0 <- pitch_data$vy0
vz0 <- pitch_data$vz0
ax <- pitch_data$ax
ay <- pitch_data$ay
az <- pitch_data$az
x0 <- pitch_data$release_pos_x
y0 <- pitch_data$release_pos_y
z0 <- pitch_data$release_pos_z
# Calculate time to reach the distance
# Using quadratic formula for: distance = y0 + vy0*t + 0.5*ay*t^2
y_target <- distance_from_plate
a <- 0.5 * ay
b <- vy0
c <- y0 - y_target
# Take the positive root (forward in time)
t <- (-b - sqrt(b^2 - 4*a*c)) / (2*a)
# Calculate x, y, z positions at this time
x <- x0 + vx0*t + 0.5*ax*t^2
y <- y0 + vy0*t + 0.5*ay*t^2
z <- z0 + vz0*t + 0.5*az*t^2
return(c(x = x, y = y, z = z))
}
# Function to find tunnel point between two pitch types
find_tunnel_point <- function(pitch1_data, pitch2_data,
threshold_inches = 2,
start_dist = 50,
end_dist = 5) {
distances <- seq(start_dist, end_dist, by = -0.5)
for (dist in distances) {
pos1 <- calculate_trajectory(pitch1_data, dist)
pos2 <- calculate_trajectory(pitch2_data, dist)
# Calculate 3D Euclidean distance in inches
separation <- sqrt((pos1['x'] - pos2['x'])^2 +
(pos1['z'] - pos2['z'])^2) * 12
if (separation > threshold_inches) {
return(list(
tunnel_distance = dist,
separation_inches = separation,
position1 = pos1,
position2 = pos2
))
}
}
return(NULL)
}
# Get Spencer Strider's 2023 data
strider_id <- 675911
# Fetch Statcast data for 2023 season
strider_data <- statcast_search_pitchers(
start_date = "2023-04-01",
end_date = "2023-09-30",
pitcherid = strider_id
)
# Filter for fastballs and sliders
ff_slider <- strider_data %>%
filter(pitch_type %in% c("FF", "SL")) %>%
filter(!is.na(release_pos_x) & !is.na(vx0))
# Get average pitch characteristics
avg_fastball <- ff_slider %>%
filter(pitch_type == "FF") %>%
summarise(across(c(vx0, vy0, vz0, ax, ay, az,
release_pos_x, release_pos_y, release_pos_z,
release_speed, release_spin_rate,
pfx_x, pfx_z), mean, na.rm = TRUE))
avg_slider <- ff_slider %>%
filter(pitch_type == "SL") %>%
summarise(across(c(vx0, vy0, vz0, ax, ay, az,
release_pos_x, release_pos_y, release_pos_z,
release_speed, release_spin_rate,
pfx_x, pfx_z), mean, na.rm = TRUE))
# Calculate tunnel point
tunnel_result <- find_tunnel_point(avg_fastball, avg_slider)
cat("Spencer Strider FF-SL Tunnel Analysis\n")
cat("=====================================\n")
cat(sprintf("Tunnel Point: %.1f feet from home plate\n",
tunnel_result$tunnel_distance))
cat(sprintf("Separation at tunnel: %.2f inches\n",
tunnel_result$separation_inches))
cat(sprintf("Fastball velocity: %.1f mph\n", avg_fastball$release_speed))
cat(sprintf("Slider velocity: %.1f mph\n", avg_slider$release_speed))
cat(sprintf("Velocity differential: %.1f mph\n",
avg_fastball$release_speed - avg_slider$release_speed))
# Visualize the tunnel
visualize_tunnel <- function(pitch1, pitch2, pitch1_name, pitch2_name) {
distances <- seq(55, 0, by = -0.5)
trajectory_data <- map_df(distances, function(d) {
if (d > 0) {
pos1 <- calculate_trajectory(pitch1, d)
pos2 <- calculate_trajectory(pitch2, d)
tibble(
distance = d,
x_ff = pos1['x'],
z_ff = pos1['z'],
x_sl = pos2['x'],
z_sl = pos2['z']
)
}
})
p <- ggplot(trajectory_data) +
geom_path(aes(x = distance, y = x_ff * 12, color = pitch1_name),
size = 1.2) +
geom_path(aes(x = distance, y = x_sl * 12, color = pitch2_name),
size = 1.2) +
scale_color_manual(values = c("FF" = "#d62728", "SL" = "#1f77b4")) +
labs(title = "Pitch Tunneling: Horizontal Movement",
subtitle = "Spencer Strider Fastball vs Slider",
x = "Distance from Home Plate (feet)",
y = "Horizontal Position (inches from center)",
color = "Pitch Type") +
theme_minimal() +
theme(legend.position = "bottom")
return(p)
}
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pybaseball import statcast_pitcher
from pybaseball import playerid_lookup
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
# Configure plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)
class PitchTrajectory:
"""Calculate and analyze pitch trajectories from Statcast data"""
def __init__(self, pitch_data):
"""
Initialize with pitch data containing:
vx0, vy0, vz0, ax, ay, az, release_pos_x, release_pos_y, release_pos_z
"""
self.vx0 = pitch_data['vx0']
self.vy0 = pitch_data['vy0']
self.vz0 = pitch_data['vz0']
self.ax = pitch_data['ax']
self.ay = pitch_data['ay']
self.az = pitch_data['az']
self.x0 = pitch_data['release_pos_x']
self.y0 = pitch_data['release_pos_y']
self.z0 = pitch_data['release_pos_z']
def position_at_distance(self, y_distance):
"""
Calculate x, y, z position when ball is at y_distance from plate
Parameters:
-----------
y_distance : float
Distance from home plate in feet (positive = toward pitcher)
Returns:
--------
tuple : (x, y, z) position in feet
"""
# Solve for time using quadratic formula
# y(t) = y0 + vy0*t + 0.5*ay*t^2
a = 0.5 * self.ay
b = self.vy0
c = self.y0 - y_distance
discriminant = b**2 - 4*a*c
if discriminant < 0:
return None
t = (-b - np.sqrt(discriminant)) / (2*a)
if t < 0:
return None
x = self.x0 + self.vx0*t + 0.5*self.ax*t**2
y = self.y0 + self.vy0*t + 0.5*self.ay*t**2
z = self.z0 + self.vz0*t + 0.5*self.az*t**2
return (x, y, z)
def full_trajectory(self, num_points=100):
"""Generate complete trajectory from release to plate"""
distances = np.linspace(self.y0, 1.417, num_points) # 1.417 ft = 17 inches
trajectory = []
for d in distances:
pos = self.position_at_distance(d)
if pos:
trajectory.append(pos)
return np.array(trajectory)
def find_tunnel_point(traj1, traj2, threshold_inches=2.0):
"""
Find where two pitch trajectories separate beyond threshold
Parameters:
-----------
traj1, traj2 : PitchTrajectory objects
threshold_inches : float
Separation threshold in inches
Returns:
--------
dict : tunnel point information
"""
# Check distances from 50 feet to 5 feet
distances = np.arange(50, 5, -0.5)
for dist in distances:
pos1 = traj1.position_at_distance(dist)
pos2 = traj2.position_at_distance(dist)
if pos1 is None or pos2 is None:
continue
# Calculate horizontal and vertical separation
horiz_sep = np.sqrt((pos1[0] - pos2[0])**2) * 12 # Convert to inches
vert_sep = abs(pos1[2] - pos2[2]) * 12
total_sep = np.sqrt(horiz_sep**2 + vert_sep**2)
if total_sep > threshold_inches:
return {
'tunnel_distance': dist,
'total_separation': total_sep,
'horizontal_separation': horiz_sep,
'vertical_separation': vert_sep,
'pos1': pos1,
'pos2': pos2
}
return None
# Analyze Dylan Cease's tunneling
def analyze_pitcher_tunnel(pitcher_name, year=2023):
"""Comprehensive tunnel analysis for a pitcher"""
# Get pitcher ID
# Dylan Cease ID: 656302
pitcher_id = 656302
# Fetch data
print(f"Fetching data for {pitcher_name} ({year})...")
data = statcast_pitcher(f'{year}-04-01', f'{year}-09-30', pitcher_id)
if data is None or len(data) == 0:
print("No data found")
return
# Filter for primary pitches
pitches = data[data['pitch_type'].isin(['FF', 'SL', 'CU'])].copy()
# Remove null values
required_cols = ['vx0', 'vy0', 'vz0', 'ax', 'ay', 'az',
'release_pos_x', 'release_pos_y', 'release_pos_z']
pitches = pitches.dropna(subset=required_cols)
print(f"\nAnalyzing {len(pitches)} pitches")
print(f"Pitch mix: {pitches['pitch_type'].value_counts().to_dict()}")
# Calculate average trajectory for each pitch type
pitch_types = {}
for ptype in ['FF', 'SL', 'CU']:
subset = pitches[pitches['pitch_type'] == ptype]
if len(subset) > 0:
avg_data = subset[required_cols].mean()
pitch_types[ptype] = PitchTrajectory(avg_data)
# Analyze all pitch pairs
results = {}
pairs = [('FF', 'SL'), ('FF', 'CU'), ('SL', 'CU')]
for p1, p2 in pairs:
if p1 in pitch_types and p2 in pitch_types:
tunnel = find_tunnel_point(pitch_types[p1], pitch_types[p2])
if tunnel:
results[f"{p1}-{p2}"] = tunnel
# Print results
print(f"\n{'='*60}")
print(f"TUNNEL ANALYSIS: {pitcher_name} ({year})")
print(f"{'='*60}\n")
for pair, result in results.items():
print(f"{pair} Tunnel:")
print(f" Distance from plate: {result['tunnel_distance']:.1f} feet")
print(f" Total separation: {result['total_separation']:.2f} inches")
print(f" Horizontal: {result['horizontal_separation']:.2f} inches")
print(f" Vertical: {result['vertical_separation']:.2f} inches")
print()
return pitches, pitch_types, results
# Example usage
pitches, trajectories, tunnel_results = analyze_pitcher_tunnel("Dylan Cease", 2023)
# Visualize 3D trajectories
def plot_3d_trajectories(pitch_types, pitcher_name):
"""Create 3D visualization of pitch trajectories"""
fig = plt.figure(figsize=(14, 10))
ax = fig.add_subplot(111, projection='3d')
colors = {'FF': '#d62728', 'SL': '#1f77b4', 'CU': '#2ca02c', 'CH': '#ff7f0e'}
labels = {'FF': 'Four-Seam FB', 'SL': 'Slider',
'CU': 'Curveball', 'CH': 'Changeup'}
for ptype, trajectory in pitch_types.items():
path = trajectory.full_trajectory(num_points=200)
ax.plot(path[:, 0], path[:, 1], path[:, 2],
color=colors.get(ptype, 'gray'),
linewidth=2.5,
label=labels.get(ptype, ptype),
alpha=0.8)
# Add strike zone
sz_x = [-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2] # 17 inches wide
sz_z = [1.5, 1.5, 3.5, 3.5, 1.5] # Approximate zone
sz_y = [1.417] * 5 # Front of plate
ax.plot(sz_x, sz_y, sz_z, 'k-', linewidth=2, alpha=0.5)
ax.set_xlabel('Horizontal Position (feet)', fontsize=11)
ax.set_ylabel('Distance from Plate (feet)', fontsize=11)
ax.set_zlabel('Height (feet)', fontsize=11)
ax.set_title(f'{pitcher_name} - Pitch Trajectories (3D View)',
fontsize=14, fontweight='bold')
ax.legend(loc='upper left', fontsize=10)
ax.view_init(elev=15, azim=45)
plt.tight_layout()
return fig
# Create visualization
fig = plot_3d_trajectories(trajectories, "Dylan Cease")
plt.savefig('cease_tunnel_3d.png', dpi=300, bbox_inches='tight')
plt.show()
Release point consistency is the foundation of effective pitch tunneling. Even slight variations in where the ball leaves the pitcher's hand can telegraph pitch type to observant hitters. Elite pitchers maintain release points within a 2-3 inch window across all pitch types.
Measuring Release Point Variance
Statcast tracks three-dimensional release position (x, y, z coordinates) for every pitch. We can analyze the standard deviation and range of these positions to quantify consistency.
# Analyze release point consistency
analyze_release_consistency <- function(pitcher_data) {
release_analysis <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
n_pitches = n(),
# Release position means
mean_x = mean(release_pos_x, na.rm = TRUE),
mean_y = mean(release_pos_y, na.rm = TRUE),
mean_z = mean(release_pos_z, na.rm = TRUE),
# Release position standard deviations (in inches)
sd_x = sd(release_pos_x, na.rm = TRUE) * 12,
sd_y = sd(release_pos_y, na.rm = TRUE) * 12,
sd_z = sd(release_pos_z, na.rm = TRUE) * 12,
# Total positional variance
total_variance = sqrt(sd_x^2 + sd_z^2),
.groups = 'drop'
) %>%
arrange(desc(n_pitches))
return(release_analysis)
}
# Analyze Shohei Ohtani's release consistency
ohtani_id <- 660271
ohtani_data <- statcast_search_pitchers(
start_date = "2023-04-01",
end_date = "2023-09-30",
pitcherid = ohtani_id
)
ohtani_release <- analyze_release_consistency(ohtani_data)
print("Shohei Ohtani Release Point Consistency (2023)")
print(ohtani_release)
# Visualize release points
plot_release_consistency <- function(pitcher_data, pitcher_name) {
# Calculate deviations from mean release point
mean_release <- pitcher_data %>%
summarise(
mean_x = mean(release_pos_x, na.rm = TRUE),
mean_z = mean(release_pos_z, na.rm = TRUE)
)
plot_data <- pitcher_data %>%
filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
mutate(
rel_x_inches = (release_pos_x - mean_release$mean_x) * 12,
rel_z_inches = (release_pos_z - mean_release$mean_z) * 12
)
p <- ggplot(plot_data, aes(x = rel_x_inches, y = rel_z_inches,
color = pitch_type)) +
geom_point(alpha = 0.3, size = 1.5) +
stat_ellipse(level = 0.95, size = 1) +
facet_wrap(~pitch_type, ncol = 3) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
labs(
title = paste(pitcher_name, "- Release Point Consistency"),
subtitle = "95% confidence ellipses by pitch type",
x = "Horizontal Deviation from Mean (inches)",
y = "Vertical Deviation from Mean (inches)",
color = "Pitch Type"
) +
theme_minimal() +
theme(legend.position = "bottom") +
coord_fixed(ratio = 1)
return(p)
}
# Create visualization
p_release <- plot_release_consistency(ohtani_data, "Shohei Ohtani")
print(p_release)
# Calculate cross-pitch release consistency
calculate_release_distance_matrix <- function(pitcher_data) {
pitch_types <- unique(pitcher_data$pitch_type)
pitch_types <- pitch_types[!is.na(pitch_types)]
# Calculate mean release point for each pitch type
means <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
mean_x = mean(release_pos_x, na.rm = TRUE),
mean_y = mean(release_pos_y, na.rm = TRUE),
mean_z = mean(release_pos_z, na.rm = TRUE),
.groups = 'drop'
)
# Create distance matrix
n <- nrow(means)
dist_matrix <- matrix(0, nrow = n, ncol = n)
rownames(dist_matrix) <- means$pitch_type
colnames(dist_matrix) <- means$pitch_type
for (i in 1:n) {
for (j in 1:n) {
if (i != j) {
dx <- means$mean_x[i] - means$mean_x[j]
dy <- means$mean_y[i] - means$mean_y[j]
dz <- means$mean_z[i] - means$mean_z[j]
# Distance in inches
dist_matrix[i, j] <- sqrt(dx^2 + dy^2 + dz^2) * 12
}
}
}
return(dist_matrix)
}
ohtani_distances <- calculate_release_distance_matrix(ohtani_data)
print("Release Point Distances Between Pitch Types (inches):")
print(round(ohtani_distances, 2))
Python Release Point Analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import multivariate_normal
from matplotlib.patches import Ellipse
def analyze_release_point_consistency(pitcher_data):
"""
Analyze release point consistency across pitch types
Returns detailed statistics and visualizations
"""
# Filter valid data
valid_data = pitcher_data.dropna(
subset=['release_pos_x', 'release_pos_y', 'release_pos_z', 'pitch_type']
)
# Calculate statistics by pitch type
stats = valid_data.groupby('pitch_type').agg({
'release_pos_x': ['mean', 'std', 'count'],
'release_pos_y': ['mean', 'std'],
'release_pos_z': ['mean', 'std']
})
# Convert standard deviations to inches
stats[('release_pos_x', 'std')] *= 12
stats[('release_pos_y', 'std')] *= 12
stats[('release_pos_z', 'std')] *= 12
# Calculate total variance in horizontal/vertical plane
stats['total_variance'] = np.sqrt(
stats[('release_pos_x', 'std')]**2 +
stats[('release_pos_z', 'std')]**2
)
return stats
def plot_release_point_comparison(pitcher_data, pitcher_name):
"""
Create comprehensive release point visualization
"""
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
valid_data = pitcher_data.dropna(
subset=['release_pos_x', 'release_pos_z', 'pitch_type']
)
# Get pitch types and colors
pitch_types = valid_data['pitch_type'].unique()
colors = plt.cm.Set2(np.linspace(0, 1, len(pitch_types)))
color_map = dict(zip(pitch_types, colors))
# Plot 1: Scatter plot with density
ax1 = axes[0, 0]
for ptype in pitch_types:
subset = valid_data[valid_data['pitch_type'] == ptype]
ax1.scatter(subset['release_pos_x'] * 12,
subset['release_pos_z'] * 12,
c=[color_map[ptype]],
label=ptype,
alpha=0.3,
s=20)
ax1.set_xlabel('Horizontal Position (inches)', fontsize=11)
ax1.set_ylabel('Vertical Position (inches)', fontsize=11)
ax1.set_title('Release Point Scatter', fontsize=12, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Plot 2: Confidence ellipses
ax2 = axes[0, 1]
for ptype in pitch_types:
subset = valid_data[valid_data['pitch_type'] == ptype]
if len(subset) < 3:
continue
x = subset['release_pos_x'].values * 12
z = subset['release_pos_z'].values * 12
# Calculate covariance and mean
mean_x, mean_z = np.mean(x), np.mean(z)
cov = np.cov(x, z)
# Create confidence ellipse
eigenvalues, eigenvectors = np.linalg.eig(cov)
angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))
# 95% confidence interval
width, height = 2 * 2.448 * np.sqrt(eigenvalues)
ellipse = Ellipse(xy=(mean_x, mean_z),
width=width,
height=height,
angle=angle,
facecolor=color_map[ptype],
alpha=0.3,
edgecolor=color_map[ptype],
linewidth=2,
label=ptype)
ax2.add_patch(ellipse)
ax2.plot(mean_x, mean_z, 'o', color=color_map[ptype],
markersize=8, markeredgecolor='black', markeredgewidth=1)
ax2.set_xlabel('Horizontal Position (inches)', fontsize=11)
ax2.set_ylabel('Vertical Position (inches)', fontsize=11)
ax2.set_title('95% Confidence Ellipses', fontsize=12, fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.axis('equal')
# Plot 3: Standard deviation by pitch type
ax3 = axes[1, 0]
stats = analyze_release_point_consistency(pitcher_data)
x_std = stats[('release_pos_x', 'std')].values
z_std = stats[('release_pos_z', 'std')].values
pitch_labels = stats.index.tolist()
x_pos = np.arange(len(pitch_labels))
width = 0.35
ax3.bar(x_pos - width/2, x_std, width, label='Horizontal SD', alpha=0.8)
ax3.bar(x_pos + width/2, z_std, width, label='Vertical SD', alpha=0.8)
ax3.set_xlabel('Pitch Type', fontsize=11)
ax3.set_ylabel('Standard Deviation (inches)', fontsize=11)
ax3.set_title('Release Point Variability', fontsize=12, fontweight='bold')
ax3.set_xticks(x_pos)
ax3.set_xticklabels(pitch_labels)
ax3.legend()
ax3.grid(True, alpha=0.3, axis='y')
# Plot 4: Time series (by game date)
ax4 = axes[1, 1]
if 'game_date' in valid_data.columns:
valid_data['game_date'] = pd.to_datetime(valid_data['game_date'])
for ptype in pitch_types[:3]: # Limit to top 3 pitch types
subset = valid_data[valid_data['pitch_type'] == ptype]
# Calculate rolling mean of release height
subset_sorted = subset.sort_values('game_date')
subset_sorted['release_z_inches'] = subset_sorted['release_pos_z'] * 12
# Group by game date and calculate mean
daily_avg = subset_sorted.groupby('game_date')['release_z_inches'].mean()
ax4.plot(daily_avg.index, daily_avg.values,
label=ptype, marker='o', alpha=0.7)
ax4.set_xlabel('Date', fontsize=11)
ax4.set_ylabel('Release Height (inches)', fontsize=11)
ax4.set_title('Release Point Consistency Over Season',
fontsize=12, fontweight='bold')
ax4.legend()
ax4.grid(True, alpha=0.3)
plt.setp(ax4.xaxis.get_majorticklabels(), rotation=45)
plt.suptitle(f'{pitcher_name} - Release Point Analysis',
fontsize=14, fontweight='bold', y=1.00)
plt.tight_layout()
return fig
# Example usage with Ohtani's data
# Assuming we have ohtani_data loaded
# fig = plot_release_point_comparison(pitches, "Shohei Ohtani")
# plt.savefig('ohtani_release_analysis.png', dpi=300, bbox_inches='tight')
# Analyze release point consistency
analyze_release_consistency <- function(pitcher_data) {
release_analysis <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
n_pitches = n(),
# Release position means
mean_x = mean(release_pos_x, na.rm = TRUE),
mean_y = mean(release_pos_y, na.rm = TRUE),
mean_z = mean(release_pos_z, na.rm = TRUE),
# Release position standard deviations (in inches)
sd_x = sd(release_pos_x, na.rm = TRUE) * 12,
sd_y = sd(release_pos_y, na.rm = TRUE) * 12,
sd_z = sd(release_pos_z, na.rm = TRUE) * 12,
# Total positional variance
total_variance = sqrt(sd_x^2 + sd_z^2),
.groups = 'drop'
) %>%
arrange(desc(n_pitches))
return(release_analysis)
}
# Analyze Shohei Ohtani's release consistency
ohtani_id <- 660271
ohtani_data <- statcast_search_pitchers(
start_date = "2023-04-01",
end_date = "2023-09-30",
pitcherid = ohtani_id
)
ohtani_release <- analyze_release_consistency(ohtani_data)
print("Shohei Ohtani Release Point Consistency (2023)")
print(ohtani_release)
# Visualize release points
plot_release_consistency <- function(pitcher_data, pitcher_name) {
# Calculate deviations from mean release point
mean_release <- pitcher_data %>%
summarise(
mean_x = mean(release_pos_x, na.rm = TRUE),
mean_z = mean(release_pos_z, na.rm = TRUE)
)
plot_data <- pitcher_data %>%
filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
mutate(
rel_x_inches = (release_pos_x - mean_release$mean_x) * 12,
rel_z_inches = (release_pos_z - mean_release$mean_z) * 12
)
p <- ggplot(plot_data, aes(x = rel_x_inches, y = rel_z_inches,
color = pitch_type)) +
geom_point(alpha = 0.3, size = 1.5) +
stat_ellipse(level = 0.95, size = 1) +
facet_wrap(~pitch_type, ncol = 3) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
labs(
title = paste(pitcher_name, "- Release Point Consistency"),
subtitle = "95% confidence ellipses by pitch type",
x = "Horizontal Deviation from Mean (inches)",
y = "Vertical Deviation from Mean (inches)",
color = "Pitch Type"
) +
theme_minimal() +
theme(legend.position = "bottom") +
coord_fixed(ratio = 1)
return(p)
}
# Create visualization
p_release <- plot_release_consistency(ohtani_data, "Shohei Ohtani")
print(p_release)
# Calculate cross-pitch release consistency
calculate_release_distance_matrix <- function(pitcher_data) {
pitch_types <- unique(pitcher_data$pitch_type)
pitch_types <- pitch_types[!is.na(pitch_types)]
# Calculate mean release point for each pitch type
means <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
mean_x = mean(release_pos_x, na.rm = TRUE),
mean_y = mean(release_pos_y, na.rm = TRUE),
mean_z = mean(release_pos_z, na.rm = TRUE),
.groups = 'drop'
)
# Create distance matrix
n <- nrow(means)
dist_matrix <- matrix(0, nrow = n, ncol = n)
rownames(dist_matrix) <- means$pitch_type
colnames(dist_matrix) <- means$pitch_type
for (i in 1:n) {
for (j in 1:n) {
if (i != j) {
dx <- means$mean_x[i] - means$mean_x[j]
dy <- means$mean_y[i] - means$mean_y[j]
dz <- means$mean_z[i] - means$mean_z[j]
# Distance in inches
dist_matrix[i, j] <- sqrt(dx^2 + dy^2 + dz^2) * 12
}
}
}
return(dist_matrix)
}
ohtani_distances <- calculate_release_distance_matrix(ohtani_data)
print("Release Point Distances Between Pitch Types (inches):")
print(round(ohtani_distances, 2))
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import multivariate_normal
from matplotlib.patches import Ellipse
def analyze_release_point_consistency(pitcher_data):
"""
Analyze release point consistency across pitch types
Returns detailed statistics and visualizations
"""
# Filter valid data
valid_data = pitcher_data.dropna(
subset=['release_pos_x', 'release_pos_y', 'release_pos_z', 'pitch_type']
)
# Calculate statistics by pitch type
stats = valid_data.groupby('pitch_type').agg({
'release_pos_x': ['mean', 'std', 'count'],
'release_pos_y': ['mean', 'std'],
'release_pos_z': ['mean', 'std']
})
# Convert standard deviations to inches
stats[('release_pos_x', 'std')] *= 12
stats[('release_pos_y', 'std')] *= 12
stats[('release_pos_z', 'std')] *= 12
# Calculate total variance in horizontal/vertical plane
stats['total_variance'] = np.sqrt(
stats[('release_pos_x', 'std')]**2 +
stats[('release_pos_z', 'std')]**2
)
return stats
def plot_release_point_comparison(pitcher_data, pitcher_name):
"""
Create comprehensive release point visualization
"""
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
valid_data = pitcher_data.dropna(
subset=['release_pos_x', 'release_pos_z', 'pitch_type']
)
# Get pitch types and colors
pitch_types = valid_data['pitch_type'].unique()
colors = plt.cm.Set2(np.linspace(0, 1, len(pitch_types)))
color_map = dict(zip(pitch_types, colors))
# Plot 1: Scatter plot with density
ax1 = axes[0, 0]
for ptype in pitch_types:
subset = valid_data[valid_data['pitch_type'] == ptype]
ax1.scatter(subset['release_pos_x'] * 12,
subset['release_pos_z'] * 12,
c=[color_map[ptype]],
label=ptype,
alpha=0.3,
s=20)
ax1.set_xlabel('Horizontal Position (inches)', fontsize=11)
ax1.set_ylabel('Vertical Position (inches)', fontsize=11)
ax1.set_title('Release Point Scatter', fontsize=12, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Plot 2: Confidence ellipses
ax2 = axes[0, 1]
for ptype in pitch_types:
subset = valid_data[valid_data['pitch_type'] == ptype]
if len(subset) < 3:
continue
x = subset['release_pos_x'].values * 12
z = subset['release_pos_z'].values * 12
# Calculate covariance and mean
mean_x, mean_z = np.mean(x), np.mean(z)
cov = np.cov(x, z)
# Create confidence ellipse
eigenvalues, eigenvectors = np.linalg.eig(cov)
angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))
# 95% confidence interval
width, height = 2 * 2.448 * np.sqrt(eigenvalues)
ellipse = Ellipse(xy=(mean_x, mean_z),
width=width,
height=height,
angle=angle,
facecolor=color_map[ptype],
alpha=0.3,
edgecolor=color_map[ptype],
linewidth=2,
label=ptype)
ax2.add_patch(ellipse)
ax2.plot(mean_x, mean_z, 'o', color=color_map[ptype],
markersize=8, markeredgecolor='black', markeredgewidth=1)
ax2.set_xlabel('Horizontal Position (inches)', fontsize=11)
ax2.set_ylabel('Vertical Position (inches)', fontsize=11)
ax2.set_title('95% Confidence Ellipses', fontsize=12, fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.axis('equal')
# Plot 3: Standard deviation by pitch type
ax3 = axes[1, 0]
stats = analyze_release_point_consistency(pitcher_data)
x_std = stats[('release_pos_x', 'std')].values
z_std = stats[('release_pos_z', 'std')].values
pitch_labels = stats.index.tolist()
x_pos = np.arange(len(pitch_labels))
width = 0.35
ax3.bar(x_pos - width/2, x_std, width, label='Horizontal SD', alpha=0.8)
ax3.bar(x_pos + width/2, z_std, width, label='Vertical SD', alpha=0.8)
ax3.set_xlabel('Pitch Type', fontsize=11)
ax3.set_ylabel('Standard Deviation (inches)', fontsize=11)
ax3.set_title('Release Point Variability', fontsize=12, fontweight='bold')
ax3.set_xticks(x_pos)
ax3.set_xticklabels(pitch_labels)
ax3.legend()
ax3.grid(True, alpha=0.3, axis='y')
# Plot 4: Time series (by game date)
ax4 = axes[1, 1]
if 'game_date' in valid_data.columns:
valid_data['game_date'] = pd.to_datetime(valid_data['game_date'])
for ptype in pitch_types[:3]: # Limit to top 3 pitch types
subset = valid_data[valid_data['pitch_type'] == ptype]
# Calculate rolling mean of release height
subset_sorted = subset.sort_values('game_date')
subset_sorted['release_z_inches'] = subset_sorted['release_pos_z'] * 12
# Group by game date and calculate mean
daily_avg = subset_sorted.groupby('game_date')['release_z_inches'].mean()
ax4.plot(daily_avg.index, daily_avg.values,
label=ptype, marker='o', alpha=0.7)
ax4.set_xlabel('Date', fontsize=11)
ax4.set_ylabel('Release Height (inches)', fontsize=11)
ax4.set_title('Release Point Consistency Over Season',
fontsize=12, fontweight='bold')
ax4.legend()
ax4.grid(True, alpha=0.3)
plt.setp(ax4.xaxis.get_majorticklabels(), rotation=45)
plt.suptitle(f'{pitcher_name} - Release Point Analysis',
fontsize=14, fontweight='bold', y=1.00)
plt.tight_layout()
return fig
# Example usage with Ohtani's data
# Assuming we have ohtani_data loaded
# fig = plot_release_point_comparison(pitches, "Shohei Ohtani")
# plt.savefig('ohtani_release_analysis.png', dpi=300, bbox_inches='tight')
The effectiveness of pitch tunneling depends not just on similar early trajectories, but on meaningful late separation. Pitches that tunnel well together must ultimately break to different locations, forcing the hitter to commit to one zone or another.
Understanding Induced Vertical Break (IVB)
Induced Vertical Break (IVB) represents the vertical movement of a pitch relative to a spinless pitch affected only by gravity. A positive IVB means the pitch "rises" (actually, it drops less than gravity alone would cause), while negative IVB indicates the pitch drops more than gravity alone.
For reference:
- Elite four-seam fastballs: +15 to +20 inches IVB
- Sliders: -5 to +5 inches IVB
- Curveballs: -15 to -20 inches IVB
- Changeups: +5 to +10 inches IVB
The IVB difference between tunneling pairs creates vertical separation at the plate.
Horizontal Break and Arm-Side vs. Glove-Side Movement
Horizontal break describes side-to-side movement. For right-handed pitchers:
- Positive horizontal break = movement toward first base (arm side)
- Negative horizontal break = movement toward third base (glove side)
Effective tunneling often pairs pitches with opposite horizontal movement (e.g., a fastball with arm-side run paired with a glove-side slider).
Calculating Break Differentials
# Analyze break differentials for tunneling pairs
analyze_break_differentials <- function(pitcher_data) {
# Calculate average break by pitch type
break_summary <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
n = n(),
avg_speed = mean(release_speed, na.rm = TRUE),
# Induced vertical break (pfx_z in feet, convert to inches)
ivb = mean(pfx_z, na.rm = TRUE) * 12,
ivb_sd = sd(pfx_z, na.rm = TRUE) * 12,
# Horizontal break (pfx_x in feet, convert to inches)
hb = mean(pfx_x, na.rm = TRUE) * 12,
hb_sd = sd(pfx_x, na.rm = TRUE) * 12,
# Total break magnitude
total_break = sqrt(ivb^2 + hb^2),
# Spin rate
spin_rate = mean(release_spin_rate, na.rm = TRUE),
.groups = 'drop'
) %>%
arrange(desc(n))
return(break_summary)
}
# Analyze Spencer Strider's arsenal
strider_breaks <- analyze_break_differentials(strider_data)
print("Spencer Strider Break Profile (2023)")
print(strider_breaks)
# Calculate pairwise break differentials
calculate_break_differential_matrix <- function(pitcher_data) {
break_data <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
ivb = mean(pfx_z, na.rm = TRUE) * 12,
hb = mean(pfx_x, na.rm = TRUE) * 12,
.groups = 'drop'
)
n <- nrow(break_data)
# Create matrices for IVB and HB differentials
ivb_diff <- matrix(0, nrow = n, ncol = n)
hb_diff <- matrix(0, nrow = n, ncol = n)
total_diff <- matrix(0, nrow = n, ncol = n)
rownames(ivb_diff) <- rownames(hb_diff) <- rownames(total_diff) <- break_data$pitch_type
colnames(ivb_diff) <- colnames(hb_diff) <- colnames(total_diff) <- break_data$pitch_type
for (i in 1:n) {
for (j in 1:n) {
ivb_diff[i, j] <- break_data$ivb[i] - break_data$ivb[j]
hb_diff[i, j] <- break_data$hb[i] - break_data$hb[j]
total_diff[i, j] <- sqrt(ivb_diff[i, j]^2 + hb_diff[i, j]^2)
}
}
return(list(
ivb_differential = ivb_diff,
hb_differential = hb_diff,
total_differential = total_diff
))
}
strider_differentials <- calculate_break_differential_matrix(strider_data)
print("\nInduced Vertical Break Differentials (inches):")
print(round(strider_differentials$ivb_differential, 1))
print("\nHorizontal Break Differentials (inches):")
print(round(strider_differentials$hb_differential, 1))
print("\nTotal Break Differentials (inches):")
print(round(strider_differentials$total_differential, 1))
# Visualize break profiles
plot_break_profile <- function(pitcher_data, pitcher_name) {
plot_data <- pitcher_data %>%
filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
mutate(
hb_inches = pfx_x * 12,
ivb_inches = pfx_z * 12
)
# Calculate means for labels
means <- plot_data %>%
group_by(pitch_type) %>%
summarise(
hb = mean(hb_inches),
ivb = mean(ivb_inches),
n = n(),
.groups = 'drop'
)
p <- ggplot(plot_data, aes(x = hb_inches, y = ivb_inches,
color = pitch_type)) +
geom_point(alpha = 0.2, size = 2) +
stat_ellipse(level = 0.75, size = 1.2) +
geom_point(data = means, aes(x = hb, y = ivb),
size = 5, shape = 17) +
geom_text(data = means,
aes(x = hb, y = ivb,
label = paste0(pitch_type, "\n(n=", n, ")")),
vjust = -1.5, size = 3.5, fontface = "bold") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
labs(
title = paste(pitcher_name, "- Pitch Movement Profile"),
subtitle = "Induced vertical break vs. horizontal break (catcher's perspective)",
x = "Horizontal Break (inches, negative = glove side)",
y = "Induced Vertical Break (inches, positive = rise)",
color = "Pitch Type",
caption = "Triangles represent mean movement | Ellipses show 75% confidence intervals"
) +
theme_minimal() +
theme(
legend.position = "right",
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 10)
) +
coord_fixed(ratio = 1)
return(p)
}
p_breaks <- plot_break_profile(strider_data, "Spencer Strider")
print(p_breaks)
Advanced Break Analysis in Python
def analyze_pitch_break_profiles(pitcher_data):
"""
Comprehensive analysis of pitch movement profiles
Returns break statistics and optimal tunneling pairs
"""
# Filter valid data
valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])
# Calculate break statistics by pitch type
break_stats = valid.groupby('pitch_type').agg({
'pfx_x': ['mean', 'std'],
'pfx_z': ['mean', 'std'],
'release_speed': 'mean',
'release_spin_rate': 'mean',
'pitch_type': 'count'
})
# Convert to inches
break_stats[('pfx_x', 'mean')] *= 12
break_stats[('pfx_x', 'std')] *= 12
break_stats[('pfx_z', 'mean')] *= 12
break_stats[('pfx_z', 'std')] *= 12
# Rename columns for clarity
break_stats.columns = ['hb_mean', 'hb_std', 'ivb_mean', 'ivb_std',
'velo', 'spin', 'count']
# Calculate total break
break_stats['total_break'] = np.sqrt(
break_stats['hb_mean']**2 + break_stats['ivb_mean']**2
)
# Calculate break angle
break_stats['break_angle'] = np.degrees(
np.arctan2(break_stats['ivb_mean'], break_stats['hb_mean'])
)
return break_stats
def find_optimal_tunnel_pairs(pitcher_data, min_break_diff=6, max_velo_diff=10):
"""
Identify optimal pitch pairs for tunneling based on:
1. Sufficient break differential (creates deception)
2. Similar velocity (similar initial trajectory)
3. Different movement profiles
"""
break_stats = analyze_pitch_break_profiles(pitcher_data)
pitch_types = break_stats.index.tolist()
tunnel_pairs = []
for i, p1 in enumerate(pitch_types):
for p2 in pitch_types[i+1:]:
# Calculate differentials
velo_diff = abs(break_stats.loc[p1, 'velo'] -
break_stats.loc[p2, 'velo'])
ivb_diff = abs(break_stats.loc[p1, 'ivb_mean'] -
break_stats.loc[p2, 'ivb_mean'])
hb_diff = abs(break_stats.loc[p1, 'hb_mean'] -
break_stats.loc[p2, 'hb_mean'])
total_break_diff = np.sqrt(ivb_diff**2 + hb_diff**2)
# Check if pair meets criteria
if total_break_diff >= min_break_diff and velo_diff <= max_velo_diff:
# Calculate tunnel quality score
# Higher score = better tunneling potential
tunnel_score = (total_break_diff / 20) * (1 - velo_diff / 15)
tunnel_pairs.append({
'pitch1': p1,
'pitch2': p2,
'velo_diff': velo_diff,
'ivb_diff': ivb_diff,
'hb_diff': hb_diff,
'total_break_diff': total_break_diff,
'tunnel_score': tunnel_score
})
return pd.DataFrame(tunnel_pairs).sort_values('tunnel_score', ascending=False)
def visualize_break_profiles_advanced(pitcher_data, pitcher_name):
"""
Create advanced visualization of pitch break profiles
"""
fig, axes = plt.subplots(2, 2, figsize=(15, 14))
valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])
# Convert to inches
valid['hb'] = valid['pfx_x'] * 12
valid['ivb'] = valid['pfx_z'] * 12
pitch_types = valid['pitch_type'].value_counts().head(6).index.tolist()
colors = plt.cm.tab10(np.linspace(0, 1, len(pitch_types)))
color_map = dict(zip(pitch_types, colors))
# Plot 1: Movement profile (traditional view)
ax1 = axes[0, 0]
for ptype in pitch_types:
subset = valid[valid['pitch_type'] == ptype]
ax1.scatter(subset['hb'], subset['ivb'],
c=[color_map[ptype]], label=ptype,
alpha=0.3, s=15)
# Add mean marker
mean_hb = subset['hb'].mean()
mean_ivb = subset['ivb'].mean()
ax1.scatter(mean_hb, mean_ivb, c=[color_map[ptype]],
s=200, marker='*', edgecolors='black', linewidths=1.5)
ax1.axhline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
ax1.axvline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
ax1.set_xlabel('Horizontal Break (inches)', fontsize=11)
ax1.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
ax1.set_title('Movement Profile (Catcher\'s View)', fontsize=12, fontweight='bold')
ax1.legend(loc='best', fontsize=9)
ax1.grid(True, alpha=0.3)
ax1.set_aspect('equal', adjustable='box')
# Plot 2: Break vs. Velocity
ax2 = axes[0, 1]
for ptype in pitch_types:
subset = valid[valid['pitch_type'] == ptype]
total_break = np.sqrt(subset['hb']**2 + subset['ivb']**2)
ax2.scatter(subset['release_speed'], total_break,
c=[color_map[ptype]], label=ptype,
alpha=0.4, s=20)
ax2.set_xlabel('Release Speed (mph)', fontsize=11)
ax2.set_ylabel('Total Break (inches)', fontsize=11)
ax2.set_title('Break vs. Velocity', fontsize=12, fontweight='bold')
ax2.legend(loc='best', fontsize=9)
ax2.grid(True, alpha=0.3)
# Plot 3: Break angle distribution
ax3 = axes[1, 0]
for ptype in pitch_types:
subset = valid[valid['pitch_type'] == ptype]
break_angle = np.degrees(np.arctan2(subset['ivb'], subset['hb']))
ax3.hist(break_angle, bins=20, alpha=0.5,
label=ptype, color=color_map[ptype])
ax3.set_xlabel('Break Angle (degrees)', fontsize=11)
ax3.set_ylabel('Frequency', fontsize=11)
ax3.set_title('Break Angle Distribution', fontsize=12, fontweight='bold')
ax3.legend(loc='best', fontsize=9)
ax3.grid(True, alpha=0.3, axis='y')
# Plot 4: Spin rate vs. IVB
ax4 = axes[1, 1]
valid_spin = valid.dropna(subset=['release_spin_rate'])
for ptype in pitch_types:
subset = valid_spin[valid_spin['pitch_type'] == ptype]
if len(subset) > 0:
ax4.scatter(subset['release_spin_rate'], subset['ivb'],
c=[color_map[ptype]], label=ptype,
alpha=0.4, s=20)
ax4.set_xlabel('Spin Rate (rpm)', fontsize=11)
ax4.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
ax4.set_title('Spin Rate vs. IVB', fontsize=12, fontweight='bold')
ax4.legend(loc='best', fontsize=9)
ax4.grid(True, alpha=0.3)
plt.suptitle(f'{pitcher_name} - Advanced Break Analysis',
fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
return fig
# Example usage
# break_analysis = analyze_pitch_break_profiles(pitches)
# optimal_pairs = find_optimal_tunnel_pairs(pitches)
# print("\nOptimal Tunneling Pairs:")
# print(optimal_pairs)
# Analyze break differentials for tunneling pairs
analyze_break_differentials <- function(pitcher_data) {
# Calculate average break by pitch type
break_summary <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
n = n(),
avg_speed = mean(release_speed, na.rm = TRUE),
# Induced vertical break (pfx_z in feet, convert to inches)
ivb = mean(pfx_z, na.rm = TRUE) * 12,
ivb_sd = sd(pfx_z, na.rm = TRUE) * 12,
# Horizontal break (pfx_x in feet, convert to inches)
hb = mean(pfx_x, na.rm = TRUE) * 12,
hb_sd = sd(pfx_x, na.rm = TRUE) * 12,
# Total break magnitude
total_break = sqrt(ivb^2 + hb^2),
# Spin rate
spin_rate = mean(release_spin_rate, na.rm = TRUE),
.groups = 'drop'
) %>%
arrange(desc(n))
return(break_summary)
}
# Analyze Spencer Strider's arsenal
strider_breaks <- analyze_break_differentials(strider_data)
print("Spencer Strider Break Profile (2023)")
print(strider_breaks)
# Calculate pairwise break differentials
calculate_break_differential_matrix <- function(pitcher_data) {
break_data <- pitcher_data %>%
group_by(pitch_type) %>%
summarise(
ivb = mean(pfx_z, na.rm = TRUE) * 12,
hb = mean(pfx_x, na.rm = TRUE) * 12,
.groups = 'drop'
)
n <- nrow(break_data)
# Create matrices for IVB and HB differentials
ivb_diff <- matrix(0, nrow = n, ncol = n)
hb_diff <- matrix(0, nrow = n, ncol = n)
total_diff <- matrix(0, nrow = n, ncol = n)
rownames(ivb_diff) <- rownames(hb_diff) <- rownames(total_diff) <- break_data$pitch_type
colnames(ivb_diff) <- colnames(hb_diff) <- colnames(total_diff) <- break_data$pitch_type
for (i in 1:n) {
for (j in 1:n) {
ivb_diff[i, j] <- break_data$ivb[i] - break_data$ivb[j]
hb_diff[i, j] <- break_data$hb[i] - break_data$hb[j]
total_diff[i, j] <- sqrt(ivb_diff[i, j]^2 + hb_diff[i, j]^2)
}
}
return(list(
ivb_differential = ivb_diff,
hb_differential = hb_diff,
total_differential = total_diff
))
}
strider_differentials <- calculate_break_differential_matrix(strider_data)
print("\nInduced Vertical Break Differentials (inches):")
print(round(strider_differentials$ivb_differential, 1))
print("\nHorizontal Break Differentials (inches):")
print(round(strider_differentials$hb_differential, 1))
print("\nTotal Break Differentials (inches):")
print(round(strider_differentials$total_differential, 1))
# Visualize break profiles
plot_break_profile <- function(pitcher_data, pitcher_name) {
plot_data <- pitcher_data %>%
filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
mutate(
hb_inches = pfx_x * 12,
ivb_inches = pfx_z * 12
)
# Calculate means for labels
means <- plot_data %>%
group_by(pitch_type) %>%
summarise(
hb = mean(hb_inches),
ivb = mean(ivb_inches),
n = n(),
.groups = 'drop'
)
p <- ggplot(plot_data, aes(x = hb_inches, y = ivb_inches,
color = pitch_type)) +
geom_point(alpha = 0.2, size = 2) +
stat_ellipse(level = 0.75, size = 1.2) +
geom_point(data = means, aes(x = hb, y = ivb),
size = 5, shape = 17) +
geom_text(data = means,
aes(x = hb, y = ivb,
label = paste0(pitch_type, "\n(n=", n, ")")),
vjust = -1.5, size = 3.5, fontface = "bold") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
labs(
title = paste(pitcher_name, "- Pitch Movement Profile"),
subtitle = "Induced vertical break vs. horizontal break (catcher's perspective)",
x = "Horizontal Break (inches, negative = glove side)",
y = "Induced Vertical Break (inches, positive = rise)",
color = "Pitch Type",
caption = "Triangles represent mean movement | Ellipses show 75% confidence intervals"
) +
theme_minimal() +
theme(
legend.position = "right",
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 10)
) +
coord_fixed(ratio = 1)
return(p)
}
p_breaks <- plot_break_profile(strider_data, "Spencer Strider")
print(p_breaks)
def analyze_pitch_break_profiles(pitcher_data):
"""
Comprehensive analysis of pitch movement profiles
Returns break statistics and optimal tunneling pairs
"""
# Filter valid data
valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])
# Calculate break statistics by pitch type
break_stats = valid.groupby('pitch_type').agg({
'pfx_x': ['mean', 'std'],
'pfx_z': ['mean', 'std'],
'release_speed': 'mean',
'release_spin_rate': 'mean',
'pitch_type': 'count'
})
# Convert to inches
break_stats[('pfx_x', 'mean')] *= 12
break_stats[('pfx_x', 'std')] *= 12
break_stats[('pfx_z', 'mean')] *= 12
break_stats[('pfx_z', 'std')] *= 12
# Rename columns for clarity
break_stats.columns = ['hb_mean', 'hb_std', 'ivb_mean', 'ivb_std',
'velo', 'spin', 'count']
# Calculate total break
break_stats['total_break'] = np.sqrt(
break_stats['hb_mean']**2 + break_stats['ivb_mean']**2
)
# Calculate break angle
break_stats['break_angle'] = np.degrees(
np.arctan2(break_stats['ivb_mean'], break_stats['hb_mean'])
)
return break_stats
def find_optimal_tunnel_pairs(pitcher_data, min_break_diff=6, max_velo_diff=10):
"""
Identify optimal pitch pairs for tunneling based on:
1. Sufficient break differential (creates deception)
2. Similar velocity (similar initial trajectory)
3. Different movement profiles
"""
break_stats = analyze_pitch_break_profiles(pitcher_data)
pitch_types = break_stats.index.tolist()
tunnel_pairs = []
for i, p1 in enumerate(pitch_types):
for p2 in pitch_types[i+1:]:
# Calculate differentials
velo_diff = abs(break_stats.loc[p1, 'velo'] -
break_stats.loc[p2, 'velo'])
ivb_diff = abs(break_stats.loc[p1, 'ivb_mean'] -
break_stats.loc[p2, 'ivb_mean'])
hb_diff = abs(break_stats.loc[p1, 'hb_mean'] -
break_stats.loc[p2, 'hb_mean'])
total_break_diff = np.sqrt(ivb_diff**2 + hb_diff**2)
# Check if pair meets criteria
if total_break_diff >= min_break_diff and velo_diff <= max_velo_diff:
# Calculate tunnel quality score
# Higher score = better tunneling potential
tunnel_score = (total_break_diff / 20) * (1 - velo_diff / 15)
tunnel_pairs.append({
'pitch1': p1,
'pitch2': p2,
'velo_diff': velo_diff,
'ivb_diff': ivb_diff,
'hb_diff': hb_diff,
'total_break_diff': total_break_diff,
'tunnel_score': tunnel_score
})
return pd.DataFrame(tunnel_pairs).sort_values('tunnel_score', ascending=False)
def visualize_break_profiles_advanced(pitcher_data, pitcher_name):
"""
Create advanced visualization of pitch break profiles
"""
fig, axes = plt.subplots(2, 2, figsize=(15, 14))
valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])
# Convert to inches
valid['hb'] = valid['pfx_x'] * 12
valid['ivb'] = valid['pfx_z'] * 12
pitch_types = valid['pitch_type'].value_counts().head(6).index.tolist()
colors = plt.cm.tab10(np.linspace(0, 1, len(pitch_types)))
color_map = dict(zip(pitch_types, colors))
# Plot 1: Movement profile (traditional view)
ax1 = axes[0, 0]
for ptype in pitch_types:
subset = valid[valid['pitch_type'] == ptype]
ax1.scatter(subset['hb'], subset['ivb'],
c=[color_map[ptype]], label=ptype,
alpha=0.3, s=15)
# Add mean marker
mean_hb = subset['hb'].mean()
mean_ivb = subset['ivb'].mean()
ax1.scatter(mean_hb, mean_ivb, c=[color_map[ptype]],
s=200, marker='*', edgecolors='black', linewidths=1.5)
ax1.axhline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
ax1.axvline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
ax1.set_xlabel('Horizontal Break (inches)', fontsize=11)
ax1.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
ax1.set_title('Movement Profile (Catcher\'s View)', fontsize=12, fontweight='bold')
ax1.legend(loc='best', fontsize=9)
ax1.grid(True, alpha=0.3)
ax1.set_aspect('equal', adjustable='box')
# Plot 2: Break vs. Velocity
ax2 = axes[0, 1]
for ptype in pitch_types:
subset = valid[valid['pitch_type'] == ptype]
total_break = np.sqrt(subset['hb']**2 + subset['ivb']**2)
ax2.scatter(subset['release_speed'], total_break,
c=[color_map[ptype]], label=ptype,
alpha=0.4, s=20)
ax2.set_xlabel('Release Speed (mph)', fontsize=11)
ax2.set_ylabel('Total Break (inches)', fontsize=11)
ax2.set_title('Break vs. Velocity', fontsize=12, fontweight='bold')
ax2.legend(loc='best', fontsize=9)
ax2.grid(True, alpha=0.3)
# Plot 3: Break angle distribution
ax3 = axes[1, 0]
for ptype in pitch_types:
subset = valid[valid['pitch_type'] == ptype]
break_angle = np.degrees(np.arctan2(subset['ivb'], subset['hb']))
ax3.hist(break_angle, bins=20, alpha=0.5,
label=ptype, color=color_map[ptype])
ax3.set_xlabel('Break Angle (degrees)', fontsize=11)
ax3.set_ylabel('Frequency', fontsize=11)
ax3.set_title('Break Angle Distribution', fontsize=12, fontweight='bold')
ax3.legend(loc='best', fontsize=9)
ax3.grid(True, alpha=0.3, axis='y')
# Plot 4: Spin rate vs. IVB
ax4 = axes[1, 1]
valid_spin = valid.dropna(subset=['release_spin_rate'])
for ptype in pitch_types:
subset = valid_spin[valid_spin['pitch_type'] == ptype]
if len(subset) > 0:
ax4.scatter(subset['release_spin_rate'], subset['ivb'],
c=[color_map[ptype]], label=ptype,
alpha=0.4, s=20)
ax4.set_xlabel('Spin Rate (rpm)', fontsize=11)
ax4.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
ax4.set_title('Spin Rate vs. IVB', fontsize=12, fontweight='bold')
ax4.legend(loc='best', fontsize=9)
ax4.grid(True, alpha=0.3)
plt.suptitle(f'{pitcher_name} - Advanced Break Analysis',
fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
return fig
# Example usage
# break_analysis = analyze_pitch_break_profiles(pitches)
# optimal_pairs = find_optimal_tunnel_pairs(pitches)
# print("\nOptimal Tunneling Pairs:")
# print(optimal_pairs)
Effective tunneling isn't just about having pitches that can tunnel—it's about using them strategically in sequences that maximize deception and exploit hitter weaknesses.
Common Tunneling Sequences
The Classic Fastball-Breaking Ball Tunnel
The most fundamental tunneling sequence pairs a high-velocity fastball with a breaking ball that has similar early trajectory. Spencer Strider's fastball-slider combination exemplifies this:
- Fastball up in zone (98+ mph, +18 inches IVB)
- Slider same release, drops below zone (88 mph, +2 inches IVB, -8 inches HB)
Hitters must commit to the high fastball or low slider before the tunnel point around 20-25 feet. Elite execution makes this a 50-50 guess.
The Velocity Ladder
This sequence uses multiple pitches of decreasing velocity, all tunneling through the same release point:
- Four-seam fastball (97 mph)
- Two-seam fastball/sinker (94 mph)
- Changeup (87 mph)
- Curveball (82 mph)
Each pitch has similar initial trajectory but different break and speed, creating a cascading effect where hitters struggle to time any pitch correctly.
Same-Side/Opposite-Side Break Patterns
Advanced sequencing alternates between glove-side and arm-side movement:
- Pitch 1: Fastball with arm-side run (positive HB)
- Pitch 2: Slider with glove-side break (negative HB)
- Pitch 3: Back to arm-side (changeup or two-seam)
This horizontal variation forces hitters to cover the full width of the plate while managing vertical uncertainty.
Analyzing Pitch Sequences in R
# Analyze pitch sequencing patterns
library(tidyverse)
analyze_pitch_sequences <- function(pitcher_data, sequence_length = 2) {
# Arrange by game and at-bat
sequenced_data <- pitcher_data %>%
filter(!is.na(pitch_type)) %>%
arrange(game_date, at_bat_number, pitch_number) %>%
group_by(game_pk, at_bat_number) %>%
mutate(
prev_pitch = lag(pitch_type, 1),
prev_prev_pitch = lag(pitch_type, 2),
result = description
) %>%
ungroup()
# Analyze two-pitch sequences
if (sequence_length == 2) {
sequences <- sequenced_data %>%
filter(!is.na(prev_pitch)) %>%
group_by(prev_pitch, pitch_type) %>%
summarise(
n_sequences = n(),
# Outcomes
swing_pct = mean(description %in% c("swinging_strike",
"foul",
"hit_into_play"),
na.rm = TRUE) * 100,
whiff_pct = mean(description == "swinging_strike",
na.rm = TRUE) * 100,
contact_pct = mean(description %in% c("foul", "hit_into_play"),
na.rm = TRUE) * 100,
chase_pct = mean(description == "swinging_strike" &
zone < 1, na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n_sequences))
return(sequences)
}
# Analyze three-pitch sequences
if (sequence_length == 3) {
sequences <- sequenced_data %>%
filter(!is.na(prev_pitch) & !is.na(prev_prev_pitch)) %>%
group_by(prev_prev_pitch, prev_pitch, pitch_type) %>%
summarise(
n_sequences = n(),
whiff_pct = mean(description == "swinging_strike",
na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n_sequences))
return(sequences)
}
}
# Analyze Spencer Strider's sequences
strider_sequences_2 <- analyze_pitch_sequences(strider_data, sequence_length = 2)
print("Spencer Strider Two-Pitch Sequences (2023)")
print(strider_sequences_2 %>% head(10))
# Focus on tunneling pairs
tunnel_sequences <- strider_sequences_2 %>%
filter((prev_pitch == "FF" & pitch_type == "SL") |
(prev_pitch == "SL" & pitch_type == "FF"))
print("\nFastball-Slider Tunneling Sequences:")
print(tunnel_sequences)
# Visualize sequence effectiveness
plot_sequence_heatmap <- function(sequence_data, metric = "whiff_pct") {
# Create matrix for heatmap
plot_matrix <- sequence_data %>%
select(prev_pitch, pitch_type, all_of(metric)) %>%
pivot_wider(
names_from = pitch_type,
values_from = all_of(metric),
values_fill = 0
) %>%
column_to_rownames("prev_pitch") %>%
as.matrix()
# Create heatmap
library(reshape2)
melted <- melt(plot_matrix)
p <- ggplot(melted, aes(x = Var2, y = Var1, fill = value)) +
geom_tile(color = "white", size = 1) +
geom_text(aes(label = sprintf("%.1f%%", value)),
color = "white", fontface = "bold", size = 4) +
scale_fill_gradient2(
low = "#3182bd", mid = "#9ecae1", high = "#de2d26",
midpoint = median(melted$value, na.rm = TRUE),
name = str_replace_all(metric, "_", " ") %>% str_to_title()
) +
labs(
title = "Pitch Sequencing Effectiveness",
subtitle = paste("Metric:", str_replace_all(metric, "_", " ") %>% str_to_title()),
x = "Current Pitch",
y = "Previous Pitch"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 14, face = "bold")
) +
coord_fixed()
return(p)
}
p_sequence <- plot_sequence_heatmap(strider_sequences_2, "whiff_pct")
print(p_sequence)
# Analyze count-based sequencing
analyze_count_sequences <- function(pitcher_data) {
sequence_by_count <- pitcher_data %>%
filter(!is.na(pitch_type) & !is.na(balls) & !is.na(strikes)) %>%
mutate(count = paste0(balls, "-", strikes)) %>%
group_by(count, pitch_type) %>%
summarise(
n = n(),
usage_pct = n() / nrow(pitcher_data) * 100,
whiff_rate = mean(description == "swinging_strike", na.rm = TRUE),
csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
na.rm = TRUE),
.groups = 'drop'
) %>%
arrange(count, desc(n))
return(sequence_by_count)
}
strider_by_count <- analyze_count_sequences(strider_data)
print("\nPitch Usage by Count:")
print(strider_by_count %>%
filter(n >= 20) %>%
head(15))
Python Sequence Analysis
def analyze_pitch_sequencing(pitcher_data):
"""
Comprehensive pitch sequencing analysis
Returns sequence patterns, effectiveness metrics, and recommendations
"""
# Sort data chronologically within at-bats
data = pitcher_data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
# Create lagged pitch types
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
data['next_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(-1)
# Analyze two-pitch sequences
sequences = data.dropna(subset=['prev_pitch', 'pitch_type']).groupby(
['prev_pitch', 'pitch_type']
).agg({
'pitch_type': 'count',
'description': lambda x: (x == 'swinging_strike').mean() * 100,
'zone': lambda x: ((x < 1) & (data.loc[x.index, 'description'] == 'swinging_strike')).mean() * 100
}).rename(columns={
'pitch_type': 'count',
'description': 'whiff_rate',
'zone': 'chase_rate'
})
sequences['usage_pct'] = sequences['count'] / sequences['count'].sum() * 100
return sequences.sort_values('count', ascending=False)
def calculate_sequence_run_value(pitcher_data):
"""
Calculate run value of pitch sequences using linear weights
"""
# Linear weights for pitch outcomes
weights = {
'swinging_strike': -0.10,
'swinging_strike_blocked': -0.10,
'called_strike': -0.05,
'foul': -0.03,
'ball': 0.03,
'hit_by_pitch': 0.08,
'hit_into_play': 0.02 # Average, varies by batted ball
}
data = pitcher_data.copy()
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
data['run_value'] = data['description'].map(weights).fillna(0)
sequence_values = data.dropna(subset=['prev_pitch']).groupby(
['prev_pitch', 'pitch_type']
).agg({
'run_value': ['mean', 'count'],
'estimated_woba_using_speedangle': 'mean'
})
return sequence_values
def visualize_sequence_network(pitcher_data, pitcher_name, min_sequences=20):
"""
Create network graph showing pitch sequence flow
"""
import networkx as nx
# Get sequence data
sequences = analyze_pitch_sequencing(pitcher_data)
sequences = sequences[sequences['count'] >= min_sequences]
# Create directed graph
G = nx.DiGraph()
for (prev_pitch, curr_pitch), row in sequences.iterrows():
G.add_edge(prev_pitch, curr_pitch,
weight=row['count'],
whiff_rate=row['whiff_rate'])
# Create visualization
fig, ax = plt.subplots(figsize=(12, 10))
pos = nx.spring_layout(G, k=2, iterations=50)
# Draw nodes
node_sizes = [G.degree(node) * 300 for node in G.nodes()]
nx.draw_networkx_nodes(G, pos, node_size=node_sizes,
node_color='lightblue',
edgecolors='black',
linewidths=2,
ax=ax)
# Draw edges with width proportional to frequency
edges = G.edges()
weights = [G[u][v]['weight'] for u, v in edges]
max_weight = max(weights)
nx.draw_networkx_edges(G, pos,
width=[w/max_weight * 5 for w in weights],
alpha=0.6,
edge_color='gray',
arrows=True,
arrowsize=20,
arrowstyle='->',
connectionstyle='arc3,rad=0.1',
ax=ax)
# Draw labels
nx.draw_networkx_labels(G, pos, font_size=12,
font_weight='bold', ax=ax)
# Add edge labels for high-frequency sequences
edge_labels = {(u, v): f"{G[u][v]['weight']}\n{G[u][v]['whiff_rate']:.1f}%"
for u, v in edges if G[u][v]['weight'] > min_sequences * 2}
nx.draw_networkx_edge_labels(G, pos, edge_labels,
font_size=8, ax=ax)
ax.set_title(f'{pitcher_name} - Pitch Sequence Network\n' +
'Node size = degree | Edge width = frequency | Labels = count & whiff%',
fontsize=14, fontweight='bold', pad=20)
ax.axis('off')
plt.tight_layout()
return fig
def identify_high_leverage_sequences(pitcher_data):
"""
Identify most effective sequences in high-leverage situations
"""
data = pitcher_data.copy()
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
# Define high leverage: 2-strike counts
high_leverage = data[data['strikes'] == 2].copy()
sequences = high_leverage.dropna(subset=['prev_pitch']).groupby(
['prev_pitch', 'pitch_type']
).agg({
'description': [
('count', 'count'),
('strikeout_rate', lambda x: (x == 'swinging_strike').sum() / len(x)),
('whiff_rate', lambda x: (x == 'swinging_strike').mean()),
],
'events': lambda x: x.notna().sum()
})
sequences.columns = ['_'.join(col).strip('_') for col in sequences.columns.values]
return sequences.sort_values('description_count', ascending=False)
# Example usage
# sequences = analyze_pitch_sequencing(pitches)
# print("\nMost Common Sequences:")
# print(sequences.head(15))
#
# high_lev = identify_high_leverage_sequences(pitches)
# print("\nTwo-Strike Sequences:")
# print(high_lev.head(10))
# Analyze pitch sequencing patterns
library(tidyverse)
analyze_pitch_sequences <- function(pitcher_data, sequence_length = 2) {
# Arrange by game and at-bat
sequenced_data <- pitcher_data %>%
filter(!is.na(pitch_type)) %>%
arrange(game_date, at_bat_number, pitch_number) %>%
group_by(game_pk, at_bat_number) %>%
mutate(
prev_pitch = lag(pitch_type, 1),
prev_prev_pitch = lag(pitch_type, 2),
result = description
) %>%
ungroup()
# Analyze two-pitch sequences
if (sequence_length == 2) {
sequences <- sequenced_data %>%
filter(!is.na(prev_pitch)) %>%
group_by(prev_pitch, pitch_type) %>%
summarise(
n_sequences = n(),
# Outcomes
swing_pct = mean(description %in% c("swinging_strike",
"foul",
"hit_into_play"),
na.rm = TRUE) * 100,
whiff_pct = mean(description == "swinging_strike",
na.rm = TRUE) * 100,
contact_pct = mean(description %in% c("foul", "hit_into_play"),
na.rm = TRUE) * 100,
chase_pct = mean(description == "swinging_strike" &
zone < 1, na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n_sequences))
return(sequences)
}
# Analyze three-pitch sequences
if (sequence_length == 3) {
sequences <- sequenced_data %>%
filter(!is.na(prev_pitch) & !is.na(prev_prev_pitch)) %>%
group_by(prev_prev_pitch, prev_pitch, pitch_type) %>%
summarise(
n_sequences = n(),
whiff_pct = mean(description == "swinging_strike",
na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n_sequences))
return(sequences)
}
}
# Analyze Spencer Strider's sequences
strider_sequences_2 <- analyze_pitch_sequences(strider_data, sequence_length = 2)
print("Spencer Strider Two-Pitch Sequences (2023)")
print(strider_sequences_2 %>% head(10))
# Focus on tunneling pairs
tunnel_sequences <- strider_sequences_2 %>%
filter((prev_pitch == "FF" & pitch_type == "SL") |
(prev_pitch == "SL" & pitch_type == "FF"))
print("\nFastball-Slider Tunneling Sequences:")
print(tunnel_sequences)
# Visualize sequence effectiveness
plot_sequence_heatmap <- function(sequence_data, metric = "whiff_pct") {
# Create matrix for heatmap
plot_matrix <- sequence_data %>%
select(prev_pitch, pitch_type, all_of(metric)) %>%
pivot_wider(
names_from = pitch_type,
values_from = all_of(metric),
values_fill = 0
) %>%
column_to_rownames("prev_pitch") %>%
as.matrix()
# Create heatmap
library(reshape2)
melted <- melt(plot_matrix)
p <- ggplot(melted, aes(x = Var2, y = Var1, fill = value)) +
geom_tile(color = "white", size = 1) +
geom_text(aes(label = sprintf("%.1f%%", value)),
color = "white", fontface = "bold", size = 4) +
scale_fill_gradient2(
low = "#3182bd", mid = "#9ecae1", high = "#de2d26",
midpoint = median(melted$value, na.rm = TRUE),
name = str_replace_all(metric, "_", " ") %>% str_to_title()
) +
labs(
title = "Pitch Sequencing Effectiveness",
subtitle = paste("Metric:", str_replace_all(metric, "_", " ") %>% str_to_title()),
x = "Current Pitch",
y = "Previous Pitch"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 14, face = "bold")
) +
coord_fixed()
return(p)
}
p_sequence <- plot_sequence_heatmap(strider_sequences_2, "whiff_pct")
print(p_sequence)
# Analyze count-based sequencing
analyze_count_sequences <- function(pitcher_data) {
sequence_by_count <- pitcher_data %>%
filter(!is.na(pitch_type) & !is.na(balls) & !is.na(strikes)) %>%
mutate(count = paste0(balls, "-", strikes)) %>%
group_by(count, pitch_type) %>%
summarise(
n = n(),
usage_pct = n() / nrow(pitcher_data) * 100,
whiff_rate = mean(description == "swinging_strike", na.rm = TRUE),
csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
na.rm = TRUE),
.groups = 'drop'
) %>%
arrange(count, desc(n))
return(sequence_by_count)
}
strider_by_count <- analyze_count_sequences(strider_data)
print("\nPitch Usage by Count:")
print(strider_by_count %>%
filter(n >= 20) %>%
head(15))
def analyze_pitch_sequencing(pitcher_data):
"""
Comprehensive pitch sequencing analysis
Returns sequence patterns, effectiveness metrics, and recommendations
"""
# Sort data chronologically within at-bats
data = pitcher_data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
# Create lagged pitch types
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
data['next_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(-1)
# Analyze two-pitch sequences
sequences = data.dropna(subset=['prev_pitch', 'pitch_type']).groupby(
['prev_pitch', 'pitch_type']
).agg({
'pitch_type': 'count',
'description': lambda x: (x == 'swinging_strike').mean() * 100,
'zone': lambda x: ((x < 1) & (data.loc[x.index, 'description'] == 'swinging_strike')).mean() * 100
}).rename(columns={
'pitch_type': 'count',
'description': 'whiff_rate',
'zone': 'chase_rate'
})
sequences['usage_pct'] = sequences['count'] / sequences['count'].sum() * 100
return sequences.sort_values('count', ascending=False)
def calculate_sequence_run_value(pitcher_data):
"""
Calculate run value of pitch sequences using linear weights
"""
# Linear weights for pitch outcomes
weights = {
'swinging_strike': -0.10,
'swinging_strike_blocked': -0.10,
'called_strike': -0.05,
'foul': -0.03,
'ball': 0.03,
'hit_by_pitch': 0.08,
'hit_into_play': 0.02 # Average, varies by batted ball
}
data = pitcher_data.copy()
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
data['run_value'] = data['description'].map(weights).fillna(0)
sequence_values = data.dropna(subset=['prev_pitch']).groupby(
['prev_pitch', 'pitch_type']
).agg({
'run_value': ['mean', 'count'],
'estimated_woba_using_speedangle': 'mean'
})
return sequence_values
def visualize_sequence_network(pitcher_data, pitcher_name, min_sequences=20):
"""
Create network graph showing pitch sequence flow
"""
import networkx as nx
# Get sequence data
sequences = analyze_pitch_sequencing(pitcher_data)
sequences = sequences[sequences['count'] >= min_sequences]
# Create directed graph
G = nx.DiGraph()
for (prev_pitch, curr_pitch), row in sequences.iterrows():
G.add_edge(prev_pitch, curr_pitch,
weight=row['count'],
whiff_rate=row['whiff_rate'])
# Create visualization
fig, ax = plt.subplots(figsize=(12, 10))
pos = nx.spring_layout(G, k=2, iterations=50)
# Draw nodes
node_sizes = [G.degree(node) * 300 for node in G.nodes()]
nx.draw_networkx_nodes(G, pos, node_size=node_sizes,
node_color='lightblue',
edgecolors='black',
linewidths=2,
ax=ax)
# Draw edges with width proportional to frequency
edges = G.edges()
weights = [G[u][v]['weight'] for u, v in edges]
max_weight = max(weights)
nx.draw_networkx_edges(G, pos,
width=[w/max_weight * 5 for w in weights],
alpha=0.6,
edge_color='gray',
arrows=True,
arrowsize=20,
arrowstyle='->',
connectionstyle='arc3,rad=0.1',
ax=ax)
# Draw labels
nx.draw_networkx_labels(G, pos, font_size=12,
font_weight='bold', ax=ax)
# Add edge labels for high-frequency sequences
edge_labels = {(u, v): f"{G[u][v]['weight']}\n{G[u][v]['whiff_rate']:.1f}%"
for u, v in edges if G[u][v]['weight'] > min_sequences * 2}
nx.draw_networkx_edge_labels(G, pos, edge_labels,
font_size=8, ax=ax)
ax.set_title(f'{pitcher_name} - Pitch Sequence Network\n' +
'Node size = degree | Edge width = frequency | Labels = count & whiff%',
fontsize=14, fontweight='bold', pad=20)
ax.axis('off')
plt.tight_layout()
return fig
def identify_high_leverage_sequences(pitcher_data):
"""
Identify most effective sequences in high-leverage situations
"""
data = pitcher_data.copy()
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
# Define high leverage: 2-strike counts
high_leverage = data[data['strikes'] == 2].copy()
sequences = high_leverage.dropna(subset=['prev_pitch']).groupby(
['prev_pitch', 'pitch_type']
).agg({
'description': [
('count', 'count'),
('strikeout_rate', lambda x: (x == 'swinging_strike').sum() / len(x)),
('whiff_rate', lambda x: (x == 'swinging_strike').mean()),
],
'events': lambda x: x.notna().sum()
})
sequences.columns = ['_'.join(col).strip('_') for col in sequences.columns.values]
return sequences.sort_values('description_count', ascending=False)
# Example usage
# sequences = analyze_pitch_sequencing(pitches)
# print("\nMost Common Sequences:")
# print(sequences.head(15))
#
# high_lev = identify_high_leverage_sequences(pitches)
# print("\nTwo-Strike Sequences:")
# print(high_lev.head(10))
Quantifying pitch deception requires metrics that capture hitter decision-making quality. Traditional stats like strikeout rate don't isolate the tunneling effect. We need metrics that measure how often hitters are "fooled" by the pitch presentation.
Key Deception Metrics
Swing Decision Value (SDV)
This metric evaluates whether hitters make correct swing decisions based on pitch location. It compares actual swing/take decisions against optimal decisions (swing at strikes, take balls).
SDV = (Correct Swings + Correct Takes) / Total Pitches
Lower SDV indicates more deception—hitters are making poor decisions.
Chase Rate on Tunneled Pitches
Measures how often hitters swing at pitches outside the zone when they follow a tunneling setup pitch.
Early Swing Rate
Hitters who are fooled by tunneling often commit to their swing earlier, visible in higher swing rates on non-competitive pitches.
Expected vs. Actual Contact Quality
When tunneling works, hitters make poor contact even when they swing. This can be measured by comparing expected wOBA (xwOBA) on contact against actual results.
Implementing Deception Metrics in R
# Calculate comprehensive deception metrics
calculate_deception_metrics <- function(pitcher_data) {
# Add swing/take information
pitch_metrics <- pitcher_data %>%
mutate(
swing = description %in% c("swinging_strike", "swinging_strike_blocked",
"foul", "foul_tip", "hit_into_play",
"foul_bunt", "missed_bunt"),
whiff = description %in% c("swinging_strike", "swinging_strike_blocked"),
chase = swing & (zone > 9 | is.na(zone)),
in_zone = zone <= 9 & !is.na(zone),
# Swing decision value components
correct_swing = swing & in_zone,
correct_take = !swing & !in_zone,
incorrect_swing = swing & !in_zone,
incorrect_take = !swing & in_zone
)
# Overall metrics
overall <- pitch_metrics %>%
summarise(
total_pitches = n(),
# Swing metrics
swing_rate = mean(swing, na.rm = TRUE) * 100,
whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
zone_swing_rate = sum(swing[in_zone], na.rm = TRUE) /
sum(in_zone, na.rm = TRUE) * 100,
# Decision quality
swing_decision_value = mean(c(correct_swing, correct_take),
na.rm = TRUE) * 100,
# Called strike plus whiff rate
csw_rate = mean(description %in% c("called_strike", "swinging_strike",
"swinging_strike_blocked"),
na.rm = TRUE) * 100
)
# By pitch type
by_pitch <- pitch_metrics %>%
group_by(pitch_type) %>%
summarise(
n = n(),
swing_rate = mean(swing, na.rm = TRUE) * 100,
whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n))
# Tunnel-specific metrics (pitches following fastballs)
tunnel_metrics <- pitch_metrics %>%
arrange(game_date, at_bat_number, pitch_number) %>%
group_by(game_pk, at_bat_number) %>%
mutate(prev_pitch = lag(pitch_type)) %>%
ungroup() %>%
filter(prev_pitch == "FF" & pitch_type %in% c("SL", "CU", "CH")) %>%
group_by(pitch_type) %>%
summarise(
n_after_ff = n(),
chase_rate_after_ff = sum(chase, na.rm = TRUE) /
sum(!in_zone, na.rm = TRUE) * 100,
whiff_rate_after_ff = sum(whiff, na.rm = TRUE) /
sum(swing, na.rm = TRUE) * 100,
.groups = 'drop'
)
return(list(
overall = overall,
by_pitch_type = by_pitch,
tunnel_metrics = tunnel_metrics
))
}
# Analyze Spencer Strider's deception
strider_deception <- calculate_deception_metrics(strider_data)
print("Spencer Strider Deception Metrics (2023)")
print("========================================\n")
print("Overall:")
print(strider_deception$overall)
print("\nBy Pitch Type:")
print(strider_deception$by_pitch_type)
print("\nAfter Fastball Setup:")
print(strider_deception$tunnel_metrics)
# Compare deception across counts
deception_by_count <- function(pitcher_data) {
pitcher_data %>%
filter(!is.na(balls) & !is.na(strikes)) %>%
mutate(
count = paste0(balls, "-", strikes),
swing = description %in% c("swinging_strike", "foul",
"hit_into_play", "swinging_strike_blocked"),
chase = swing & (zone > 9 | is.na(zone)),
in_zone = zone <= 9 & !is.na(zone)
) %>%
group_by(count) %>%
summarise(
n = n(),
chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
swing_rate = mean(swing, na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n))
}
strider_count_deception <- deception_by_count(strider_data)
print("\nDeception by Count:")
print(strider_count_deception)
# Visualize deception metrics
plot_deception_comparison <- function(pitcher_data, pitcher_name) {
metrics <- calculate_deception_metrics(pitcher_data)
plot_data <- metrics$by_pitch_type %>%
filter(n >= 50) %>%
select(pitch_type, whiff_rate, chase_rate, csw_rate) %>%
pivot_longer(cols = c(whiff_rate, chase_rate, csw_rate),
names_to = "metric",
values_to = "rate")
p <- ggplot(plot_data, aes(x = pitch_type, y = rate, fill = metric)) +
geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
geom_text(aes(label = sprintf("%.1f%%", rate)),
position = position_dodge(width = 0.9),
vjust = -0.5, size = 3) +
scale_fill_brewer(palette = "Set2",
labels = c("Chase Rate", "CSW Rate", "Whiff Rate")) +
labs(
title = paste(pitcher_name, "- Deception Effectiveness by Pitch"),
subtitle = "Key metrics for measuring pitch deception",
x = "Pitch Type",
y = "Rate (%)",
fill = "Metric"
) +
theme_minimal() +
theme(
legend.position = "bottom",
plot.title = element_text(face = "bold", size = 14)
)
return(p)
}
p_deception <- plot_deception_comparison(strider_data, "Spencer Strider")
print(p_deception)
Advanced Deception Analysis in Python
def calculate_tunneling_effectiveness_score(pitcher_data):
"""
Calculate comprehensive tunneling effectiveness score (TES)
TES combines multiple factors:
- Chase rate on breaking balls after fastballs
- Whiff rate on tunneled pitches
- Swing decision errors
- Contact quality degradation
Returns score from 0-100 (higher = better tunneling)
"""
data = pitcher_data.copy()
# Identify swings and chases
data['swing'] = data['description'].isin([
'swinging_strike', 'swinging_strike_blocked',
'foul', 'foul_tip', 'hit_into_play'
])
data['whiff'] = data['description'].isin([
'swinging_strike', 'swinging_strike_blocked'
])
data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
data['chase'] = data['swing'] & ~data['in_zone']
# Add previous pitch info
data = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
# Calculate component scores
# 1. Chase rate on breaking balls after fastballs (weight: 35%)
tunnel_pitches = data[
(data['prev_pitch'] == 'FF') &
(data['pitch_type'].isin(['SL', 'CU', 'CH']))
]
if len(tunnel_pitches) > 0:
chase_score = (tunnel_pitches['chase'].sum() /
(~tunnel_pitches['in_zone']).sum() * 100)
else:
chase_score = 0
# 2. Whiff rate on swings (weight: 30%)
if data['swing'].sum() > 0:
whiff_score = (data['whiff'].sum() / data['swing'].sum() * 100)
else:
whiff_score = 0
# 3. CSW rate (weight: 20%)
csw = data['description'].isin(['called_strike', 'swinging_strike'])
csw_score = csw.mean() * 100
# 4. Swing decision errors (weight: 15%)
correct_decisions = (
(data['swing'] & data['in_zone']) |
(~data['swing'] & ~data['in_zone'])
)
decision_error_rate = (1 - correct_decisions.mean()) * 100
# Combine scores with weights
tes = (
chase_score * 0.35 +
whiff_score * 0.30 +
csw_score * 0.20 +
decision_error_rate * 0.15
)
# Normalize to 0-100 scale
tes = min(100, max(0, tes))
return {
'tunneling_effectiveness_score': tes,
'chase_rate': chase_score,
'whiff_rate': whiff_score,
'csw_rate': csw_score,
'decision_error_rate': decision_error_rate
}
def compare_deception_to_league(pitcher_data, league_data=None):
"""
Compare pitcher's deception metrics to league average
Returns percentile rankings for key metrics
"""
pitcher_metrics = calculate_tunneling_effectiveness_score(pitcher_data)
# If no league data provided, use typical MLB averages (2023)
if league_data is None:
league_avg = {
'chase_rate': 28.5,
'whiff_rate': 24.2,
'csw_rate': 28.1,
'tunneling_effectiveness_score': 27.0
}
league_std = {
'chase_rate': 4.5,
'whiff_rate': 3.8,
'csw_rate': 3.2,
'tunneling_effectiveness_score': 5.5
}
else:
# Calculate from provided league data
league_avg = calculate_tunneling_effectiveness_score(league_data)
league_std = {k: v * 0.15 for k, v in league_avg.items()} # Estimate
# Calculate z-scores and percentiles
comparison = {}
for metric in ['chase_rate', 'whiff_rate', 'csw_rate',
'tunneling_effectiveness_score']:
z_score = ((pitcher_metrics[metric] - league_avg[metric]) /
league_std[metric])
# Convert to percentile (approximate)
from scipy import stats
percentile = stats.norm.cdf(z_score) * 100
comparison[metric] = {
'value': pitcher_metrics[metric],
'league_avg': league_avg[metric],
'z_score': z_score,
'percentile': percentile
}
return comparison
def visualize_deception_dashboard(pitcher_data, pitcher_name):
"""
Create comprehensive deception analysis dashboard
"""
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)
# Prepare data
data = pitcher_data.copy()
data['swing'] = data['description'].isin([
'swinging_strike', 'foul', 'hit_into_play', 'swinging_strike_blocked'
])
data['whiff'] = data['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
data['chase'] = data['swing'] & ~data['in_zone']
# 1. Overall TES Score (large)
ax1 = fig.add_subplot(gs[0, :2])
tes_result = calculate_tunneling_effectiveness_score(data)
tes = tes_result['tunneling_effectiveness_score']
# Create gauge chart
theta = np.linspace(0, np.pi, 100)
r = np.ones(100)
# Color segments
ax1.fill_between(theta[:33], 0, r[:33], color='#d62728', alpha=0.3, label='Below Avg')
ax1.fill_between(theta[33:66], 0, r[33:66], color='#ff7f0e', alpha=0.3, label='Average')
ax1.fill_between(theta[66:], 0, r[66:], color='#2ca02c', alpha=0.3, label='Elite')
# Add needle
angle = (tes / 100) * np.pi
ax1.plot([0, np.cos(angle)], [0, np.sin(angle)], 'k-', linewidth=4)
ax1.plot(0, 0, 'ko', markersize=15)
ax1.set_xlim(-0.1, 1.1)
ax1.set_ylim(0, 1.1)
ax1.axis('off')
ax1.set_title(f'Tunneling Effectiveness Score: {tes:.1f}',
fontsize=16, fontweight='bold', pad=20)
ax1.text(0.5, -0.15, f'Percentile: {tes:.0f}th',
ha='center', fontsize=12, transform=ax1.transData)
# 2. Component breakdown
ax2 = fig.add_subplot(gs[0, 2])
components = ['Chase\nRate', 'Whiff\nRate', 'CSW\nRate', 'Decision\nErrors']
values = [
tes_result['chase_rate'],
tes_result['whiff_rate'],
tes_result['csw_rate'],
tes_result['decision_error_rate']
]
colors_comp = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
ax2.barh(components, values, color=colors_comp, alpha=0.7)
ax2.set_xlabel('Rate (%)', fontsize=10)
ax2.set_title('Component Metrics', fontsize=11, fontweight='bold')
ax2.grid(axis='x', alpha=0.3)
# 3. Chase rate by pitch type after FF
ax3 = fig.add_subplot(gs[1, 0])
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
tunnel_data = data[data['prev_pitch'] == 'FF']
chase_by_pitch = tunnel_data.groupby('pitch_type').apply(
lambda x: x['chase'].sum() / (~x['in_zone']).sum() * 100 if (~x['in_zone']).sum() > 0 else 0
).sort_values(ascending=False)
ax3.bar(range(len(chase_by_pitch)), chase_by_pitch.values,
color='steelblue', alpha=0.7)
ax3.set_xticks(range(len(chase_by_pitch)))
ax3.set_xticklabels(chase_by_pitch.index, rotation=45)
ax3.set_ylabel('Chase Rate (%)', fontsize=10)
ax3.set_title('Chase Rate After Fastball', fontsize=11, fontweight='bold')
ax3.grid(axis='y', alpha=0.3)
ax3.axhline(28.5, color='red', linestyle='--', alpha=0.5, label='MLB Avg')
ax3.legend(fontsize=8)
# 4. Whiff rate by count
ax4 = fig.add_subplot(gs[1, 1])
data['count'] = data['balls'].astype(str) + '-' + data['strikes'].astype(str)
whiff_by_count = data.groupby('count').apply(
lambda x: x['whiff'].sum() / x['swing'].sum() * 100 if x['swing'].sum() > 0 else 0
)
# Order counts logically
count_order = ['0-0', '0-1', '0-2', '1-0', '1-1', '1-2',
'2-0', '2-1', '2-2', '3-0', '3-1', '3-2']
whiff_by_count = whiff_by_count.reindex(
[c for c in count_order if c in whiff_by_count.index]
)
ax4.plot(range(len(whiff_by_count)), whiff_by_count.values,
marker='o', linewidth=2, markersize=8, color='darkgreen')
ax4.set_xticks(range(len(whiff_by_count)))
ax4.set_xticklabels(whiff_by_count.index, rotation=45, fontsize=8)
ax4.set_ylabel('Whiff Rate (%)', fontsize=10)
ax4.set_title('Whiff Rate by Count', fontsize=11, fontweight='bold')
ax4.grid(alpha=0.3)
ax4.axhline(24.2, color='red', linestyle='--', alpha=0.5)
# 5. Heat map: swing rate by location after FF
ax5 = fig.add_subplot(gs[1, 2])
tunnel_data_valid = tunnel_data.dropna(subset=['plate_x', 'plate_z'])
# Create 2D histogram
x_bins = np.linspace(-1.5, 1.5, 10)
z_bins = np.linspace(1.0, 4.0, 10)
H, xedges, zedges = np.histogram2d(
tunnel_data_valid['plate_x'],
tunnel_data_valid['plate_z'],
bins=[x_bins, z_bins],
weights=tunnel_data_valid['swing'].astype(float)
)
H_count, _, _ = np.histogram2d(
tunnel_data_valid['plate_x'],
tunnel_data_valid['plate_z'],
bins=[x_bins, z_bins]
)
swing_rate_grid = np.divide(H, H_count, where=H_count > 0,
out=np.zeros_like(H)) * 100
im = ax5.imshow(swing_rate_grid.T, origin='lower',
extent=[-1.5, 1.5, 1.0, 4.0],
aspect='auto', cmap='RdYlGn', vmin=0, vmax=100)
# Add strike zone
from matplotlib.patches import Rectangle
sz = Rectangle((-17/24, 1.5), 17/12, 2.0,
linewidth=2, edgecolor='black',
facecolor='none')
ax5.add_patch(sz)
ax5.set_xlabel('Horizontal (ft)', fontsize=9)
ax5.set_ylabel('Height (ft)', fontsize=9)
ax5.set_title('Swing Rate After FB', fontsize=11, fontweight='bold')
plt.colorbar(im, ax=ax5, label='Swing %')
# 6. Pitch sequence network (simplified)
ax6 = fig.add_subplot(gs[2, :])
data_seq = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
sequences = data_seq.groupby(['prev_pitch', 'pitch_type']).size()
sequences = sequences[sequences >= 20].sort_values(ascending=False).head(10)
labels = [f"{prev} → {curr}" for prev, curr in sequences.index]
values = sequences.values
bars = ax6.barh(range(len(sequences)), values, color='coral', alpha=0.7)
ax6.set_yticks(range(len(sequences)))
ax6.set_yticklabels(labels, fontsize=9)
ax6.set_xlabel('Frequency', fontsize=10)
ax6.set_title('Most Common Pitch Sequences', fontsize=11, fontweight='bold')
ax6.grid(axis='x', alpha=0.3)
# Add value labels
for i, (bar, val) in enumerate(zip(bars, values)):
ax6.text(val + 5, i, str(int(val)), va='center', fontsize=9)
plt.suptitle(f'{pitcher_name} - Deception Analytics Dashboard',
fontsize=16, fontweight='bold', y=0.995)
return fig
# Example usage
# tes_score = calculate_tunneling_effectiveness_score(pitches)
# print(f"\nTunneling Effectiveness Score: {tes_score['tunneling_effectiveness_score']:.1f}")
# print(f"Component Breakdown:")
# for k, v in tes_score.items():
# if k != 'tunneling_effectiveness_score':
# print(f" {k}: {v:.1f}%")
#
# fig_dashboard = visualize_deception_dashboard(pitches, "Dylan Cease")
# plt.savefig('cease_deception_dashboard.png', dpi=300, bbox_inches='tight')
SDV = (Correct Swings + Correct Takes) / Total Pitches
# Calculate comprehensive deception metrics
calculate_deception_metrics <- function(pitcher_data) {
# Add swing/take information
pitch_metrics <- pitcher_data %>%
mutate(
swing = description %in% c("swinging_strike", "swinging_strike_blocked",
"foul", "foul_tip", "hit_into_play",
"foul_bunt", "missed_bunt"),
whiff = description %in% c("swinging_strike", "swinging_strike_blocked"),
chase = swing & (zone > 9 | is.na(zone)),
in_zone = zone <= 9 & !is.na(zone),
# Swing decision value components
correct_swing = swing & in_zone,
correct_take = !swing & !in_zone,
incorrect_swing = swing & !in_zone,
incorrect_take = !swing & in_zone
)
# Overall metrics
overall <- pitch_metrics %>%
summarise(
total_pitches = n(),
# Swing metrics
swing_rate = mean(swing, na.rm = TRUE) * 100,
whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
zone_swing_rate = sum(swing[in_zone], na.rm = TRUE) /
sum(in_zone, na.rm = TRUE) * 100,
# Decision quality
swing_decision_value = mean(c(correct_swing, correct_take),
na.rm = TRUE) * 100,
# Called strike plus whiff rate
csw_rate = mean(description %in% c("called_strike", "swinging_strike",
"swinging_strike_blocked"),
na.rm = TRUE) * 100
)
# By pitch type
by_pitch <- pitch_metrics %>%
group_by(pitch_type) %>%
summarise(
n = n(),
swing_rate = mean(swing, na.rm = TRUE) * 100,
whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n))
# Tunnel-specific metrics (pitches following fastballs)
tunnel_metrics <- pitch_metrics %>%
arrange(game_date, at_bat_number, pitch_number) %>%
group_by(game_pk, at_bat_number) %>%
mutate(prev_pitch = lag(pitch_type)) %>%
ungroup() %>%
filter(prev_pitch == "FF" & pitch_type %in% c("SL", "CU", "CH")) %>%
group_by(pitch_type) %>%
summarise(
n_after_ff = n(),
chase_rate_after_ff = sum(chase, na.rm = TRUE) /
sum(!in_zone, na.rm = TRUE) * 100,
whiff_rate_after_ff = sum(whiff, na.rm = TRUE) /
sum(swing, na.rm = TRUE) * 100,
.groups = 'drop'
)
return(list(
overall = overall,
by_pitch_type = by_pitch,
tunnel_metrics = tunnel_metrics
))
}
# Analyze Spencer Strider's deception
strider_deception <- calculate_deception_metrics(strider_data)
print("Spencer Strider Deception Metrics (2023)")
print("========================================\n")
print("Overall:")
print(strider_deception$overall)
print("\nBy Pitch Type:")
print(strider_deception$by_pitch_type)
print("\nAfter Fastball Setup:")
print(strider_deception$tunnel_metrics)
# Compare deception across counts
deception_by_count <- function(pitcher_data) {
pitcher_data %>%
filter(!is.na(balls) & !is.na(strikes)) %>%
mutate(
count = paste0(balls, "-", strikes),
swing = description %in% c("swinging_strike", "foul",
"hit_into_play", "swinging_strike_blocked"),
chase = swing & (zone > 9 | is.na(zone)),
in_zone = zone <= 9 & !is.na(zone)
) %>%
group_by(count) %>%
summarise(
n = n(),
chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
swing_rate = mean(swing, na.rm = TRUE) * 100,
.groups = 'drop'
) %>%
arrange(desc(n))
}
strider_count_deception <- deception_by_count(strider_data)
print("\nDeception by Count:")
print(strider_count_deception)
# Visualize deception metrics
plot_deception_comparison <- function(pitcher_data, pitcher_name) {
metrics <- calculate_deception_metrics(pitcher_data)
plot_data <- metrics$by_pitch_type %>%
filter(n >= 50) %>%
select(pitch_type, whiff_rate, chase_rate, csw_rate) %>%
pivot_longer(cols = c(whiff_rate, chase_rate, csw_rate),
names_to = "metric",
values_to = "rate")
p <- ggplot(plot_data, aes(x = pitch_type, y = rate, fill = metric)) +
geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
geom_text(aes(label = sprintf("%.1f%%", rate)),
position = position_dodge(width = 0.9),
vjust = -0.5, size = 3) +
scale_fill_brewer(palette = "Set2",
labels = c("Chase Rate", "CSW Rate", "Whiff Rate")) +
labs(
title = paste(pitcher_name, "- Deception Effectiveness by Pitch"),
subtitle = "Key metrics for measuring pitch deception",
x = "Pitch Type",
y = "Rate (%)",
fill = "Metric"
) +
theme_minimal() +
theme(
legend.position = "bottom",
plot.title = element_text(face = "bold", size = 14)
)
return(p)
}
p_deception <- plot_deception_comparison(strider_data, "Spencer Strider")
print(p_deception)
def calculate_tunneling_effectiveness_score(pitcher_data):
"""
Calculate comprehensive tunneling effectiveness score (TES)
TES combines multiple factors:
- Chase rate on breaking balls after fastballs
- Whiff rate on tunneled pitches
- Swing decision errors
- Contact quality degradation
Returns score from 0-100 (higher = better tunneling)
"""
data = pitcher_data.copy()
# Identify swings and chases
data['swing'] = data['description'].isin([
'swinging_strike', 'swinging_strike_blocked',
'foul', 'foul_tip', 'hit_into_play'
])
data['whiff'] = data['description'].isin([
'swinging_strike', 'swinging_strike_blocked'
])
data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
data['chase'] = data['swing'] & ~data['in_zone']
# Add previous pitch info
data = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
# Calculate component scores
# 1. Chase rate on breaking balls after fastballs (weight: 35%)
tunnel_pitches = data[
(data['prev_pitch'] == 'FF') &
(data['pitch_type'].isin(['SL', 'CU', 'CH']))
]
if len(tunnel_pitches) > 0:
chase_score = (tunnel_pitches['chase'].sum() /
(~tunnel_pitches['in_zone']).sum() * 100)
else:
chase_score = 0
# 2. Whiff rate on swings (weight: 30%)
if data['swing'].sum() > 0:
whiff_score = (data['whiff'].sum() / data['swing'].sum() * 100)
else:
whiff_score = 0
# 3. CSW rate (weight: 20%)
csw = data['description'].isin(['called_strike', 'swinging_strike'])
csw_score = csw.mean() * 100
# 4. Swing decision errors (weight: 15%)
correct_decisions = (
(data['swing'] & data['in_zone']) |
(~data['swing'] & ~data['in_zone'])
)
decision_error_rate = (1 - correct_decisions.mean()) * 100
# Combine scores with weights
tes = (
chase_score * 0.35 +
whiff_score * 0.30 +
csw_score * 0.20 +
decision_error_rate * 0.15
)
# Normalize to 0-100 scale
tes = min(100, max(0, tes))
return {
'tunneling_effectiveness_score': tes,
'chase_rate': chase_score,
'whiff_rate': whiff_score,
'csw_rate': csw_score,
'decision_error_rate': decision_error_rate
}
def compare_deception_to_league(pitcher_data, league_data=None):
"""
Compare pitcher's deception metrics to league average
Returns percentile rankings for key metrics
"""
pitcher_metrics = calculate_tunneling_effectiveness_score(pitcher_data)
# If no league data provided, use typical MLB averages (2023)
if league_data is None:
league_avg = {
'chase_rate': 28.5,
'whiff_rate': 24.2,
'csw_rate': 28.1,
'tunneling_effectiveness_score': 27.0
}
league_std = {
'chase_rate': 4.5,
'whiff_rate': 3.8,
'csw_rate': 3.2,
'tunneling_effectiveness_score': 5.5
}
else:
# Calculate from provided league data
league_avg = calculate_tunneling_effectiveness_score(league_data)
league_std = {k: v * 0.15 for k, v in league_avg.items()} # Estimate
# Calculate z-scores and percentiles
comparison = {}
for metric in ['chase_rate', 'whiff_rate', 'csw_rate',
'tunneling_effectiveness_score']:
z_score = ((pitcher_metrics[metric] - league_avg[metric]) /
league_std[metric])
# Convert to percentile (approximate)
from scipy import stats
percentile = stats.norm.cdf(z_score) * 100
comparison[metric] = {
'value': pitcher_metrics[metric],
'league_avg': league_avg[metric],
'z_score': z_score,
'percentile': percentile
}
return comparison
def visualize_deception_dashboard(pitcher_data, pitcher_name):
"""
Create comprehensive deception analysis dashboard
"""
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)
# Prepare data
data = pitcher_data.copy()
data['swing'] = data['description'].isin([
'swinging_strike', 'foul', 'hit_into_play', 'swinging_strike_blocked'
])
data['whiff'] = data['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
data['chase'] = data['swing'] & ~data['in_zone']
# 1. Overall TES Score (large)
ax1 = fig.add_subplot(gs[0, :2])
tes_result = calculate_tunneling_effectiveness_score(data)
tes = tes_result['tunneling_effectiveness_score']
# Create gauge chart
theta = np.linspace(0, np.pi, 100)
r = np.ones(100)
# Color segments
ax1.fill_between(theta[:33], 0, r[:33], color='#d62728', alpha=0.3, label='Below Avg')
ax1.fill_between(theta[33:66], 0, r[33:66], color='#ff7f0e', alpha=0.3, label='Average')
ax1.fill_between(theta[66:], 0, r[66:], color='#2ca02c', alpha=0.3, label='Elite')
# Add needle
angle = (tes / 100) * np.pi
ax1.plot([0, np.cos(angle)], [0, np.sin(angle)], 'k-', linewidth=4)
ax1.plot(0, 0, 'ko', markersize=15)
ax1.set_xlim(-0.1, 1.1)
ax1.set_ylim(0, 1.1)
ax1.axis('off')
ax1.set_title(f'Tunneling Effectiveness Score: {tes:.1f}',
fontsize=16, fontweight='bold', pad=20)
ax1.text(0.5, -0.15, f'Percentile: {tes:.0f}th',
ha='center', fontsize=12, transform=ax1.transData)
# 2. Component breakdown
ax2 = fig.add_subplot(gs[0, 2])
components = ['Chase\nRate', 'Whiff\nRate', 'CSW\nRate', 'Decision\nErrors']
values = [
tes_result['chase_rate'],
tes_result['whiff_rate'],
tes_result['csw_rate'],
tes_result['decision_error_rate']
]
colors_comp = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
ax2.barh(components, values, color=colors_comp, alpha=0.7)
ax2.set_xlabel('Rate (%)', fontsize=10)
ax2.set_title('Component Metrics', fontsize=11, fontweight='bold')
ax2.grid(axis='x', alpha=0.3)
# 3. Chase rate by pitch type after FF
ax3 = fig.add_subplot(gs[1, 0])
data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
tunnel_data = data[data['prev_pitch'] == 'FF']
chase_by_pitch = tunnel_data.groupby('pitch_type').apply(
lambda x: x['chase'].sum() / (~x['in_zone']).sum() * 100 if (~x['in_zone']).sum() > 0 else 0
).sort_values(ascending=False)
ax3.bar(range(len(chase_by_pitch)), chase_by_pitch.values,
color='steelblue', alpha=0.7)
ax3.set_xticks(range(len(chase_by_pitch)))
ax3.set_xticklabels(chase_by_pitch.index, rotation=45)
ax3.set_ylabel('Chase Rate (%)', fontsize=10)
ax3.set_title('Chase Rate After Fastball', fontsize=11, fontweight='bold')
ax3.grid(axis='y', alpha=0.3)
ax3.axhline(28.5, color='red', linestyle='--', alpha=0.5, label='MLB Avg')
ax3.legend(fontsize=8)
# 4. Whiff rate by count
ax4 = fig.add_subplot(gs[1, 1])
data['count'] = data['balls'].astype(str) + '-' + data['strikes'].astype(str)
whiff_by_count = data.groupby('count').apply(
lambda x: x['whiff'].sum() / x['swing'].sum() * 100 if x['swing'].sum() > 0 else 0
)
# Order counts logically
count_order = ['0-0', '0-1', '0-2', '1-0', '1-1', '1-2',
'2-0', '2-1', '2-2', '3-0', '3-1', '3-2']
whiff_by_count = whiff_by_count.reindex(
[c for c in count_order if c in whiff_by_count.index]
)
ax4.plot(range(len(whiff_by_count)), whiff_by_count.values,
marker='o', linewidth=2, markersize=8, color='darkgreen')
ax4.set_xticks(range(len(whiff_by_count)))
ax4.set_xticklabels(whiff_by_count.index, rotation=45, fontsize=8)
ax4.set_ylabel('Whiff Rate (%)', fontsize=10)
ax4.set_title('Whiff Rate by Count', fontsize=11, fontweight='bold')
ax4.grid(alpha=0.3)
ax4.axhline(24.2, color='red', linestyle='--', alpha=0.5)
# 5. Heat map: swing rate by location after FF
ax5 = fig.add_subplot(gs[1, 2])
tunnel_data_valid = tunnel_data.dropna(subset=['plate_x', 'plate_z'])
# Create 2D histogram
x_bins = np.linspace(-1.5, 1.5, 10)
z_bins = np.linspace(1.0, 4.0, 10)
H, xedges, zedges = np.histogram2d(
tunnel_data_valid['plate_x'],
tunnel_data_valid['plate_z'],
bins=[x_bins, z_bins],
weights=tunnel_data_valid['swing'].astype(float)
)
H_count, _, _ = np.histogram2d(
tunnel_data_valid['plate_x'],
tunnel_data_valid['plate_z'],
bins=[x_bins, z_bins]
)
swing_rate_grid = np.divide(H, H_count, where=H_count > 0,
out=np.zeros_like(H)) * 100
im = ax5.imshow(swing_rate_grid.T, origin='lower',
extent=[-1.5, 1.5, 1.0, 4.0],
aspect='auto', cmap='RdYlGn', vmin=0, vmax=100)
# Add strike zone
from matplotlib.patches import Rectangle
sz = Rectangle((-17/24, 1.5), 17/12, 2.0,
linewidth=2, edgecolor='black',
facecolor='none')
ax5.add_patch(sz)
ax5.set_xlabel('Horizontal (ft)', fontsize=9)
ax5.set_ylabel('Height (ft)', fontsize=9)
ax5.set_title('Swing Rate After FB', fontsize=11, fontweight='bold')
plt.colorbar(im, ax=ax5, label='Swing %')
# 6. Pitch sequence network (simplified)
ax6 = fig.add_subplot(gs[2, :])
data_seq = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
sequences = data_seq.groupby(['prev_pitch', 'pitch_type']).size()
sequences = sequences[sequences >= 20].sort_values(ascending=False).head(10)
labels = [f"{prev} → {curr}" for prev, curr in sequences.index]
values = sequences.values
bars = ax6.barh(range(len(sequences)), values, color='coral', alpha=0.7)
ax6.set_yticks(range(len(sequences)))
ax6.set_yticklabels(labels, fontsize=9)
ax6.set_xlabel('Frequency', fontsize=10)
ax6.set_title('Most Common Pitch Sequences', fontsize=11, fontweight='bold')
ax6.grid(axis='x', alpha=0.3)
# Add value labels
for i, (bar, val) in enumerate(zip(bars, values)):
ax6.text(val + 5, i, str(int(val)), va='center', fontsize=9)
plt.suptitle(f'{pitcher_name} - Deception Analytics Dashboard',
fontsize=16, fontweight='bold', y=0.995)
return fig
# Example usage
# tes_score = calculate_tunneling_effectiveness_score(pitches)
# print(f"\nTunneling Effectiveness Score: {tes_score['tunneling_effectiveness_score']:.1f}")
# print(f"Component Breakdown:")
# for k, v in tes_score.items():
# if k != 'tunneling_effectiveness_score':
# print(f" {k}: {v:.1f}%")
#
# fig_dashboard = visualize_deception_dashboard(pitches, "Dylan Cease")
# plt.savefig('cease_deception_dashboard.png', dpi=300, bbox_inches='tight')
Modern pitch tunneling analysis benefits immensely from interactive visualizations that allow exploration of trajectories, release points, and movement profiles from multiple angles.
R Interactive Visualizations with Plotly
library(plotly)
library(htmlwidgets)
# Create interactive 3D pitch trajectory visualization
create_3d_tunnel_plot <- function(pitcher_data, pitcher_name,
pitch_types = c("FF", "SL")) {
# Filter for selected pitch types
plot_data <- pitcher_data %>%
filter(pitch_type %in% pitch_types) %>%
filter(!is.na(vx0) & !is.na(vy0) & !is.na(vz0))
# Calculate trajectories for representative pitches
# (Use median characteristics for each pitch type)
trajectories <- list()
for (ptype in pitch_types) {
subset <- plot_data %>% filter(pitch_type == ptype)
median_pitch <- subset %>%
summarise(across(c(vx0, vy0, vz0, ax, ay, az,
release_pos_x, release_pos_y, release_pos_z),
median, na.rm = TRUE))
# Generate trajectory points
distances <- seq(median_pitch$release_pos_y, 1.417, length.out = 100)
traj_points <- map_df(distances, function(d) {
pos <- calculate_trajectory(median_pitch, d)
tibble(
distance = d,
x = pos['x'],
y = pos['y'],
z = pos['z'],
pitch_type = ptype
)
})
trajectories[[ptype]] <- traj_points
}
all_trajectories <- bind_rows(trajectories)
# Create 3D plot
colors <- c("FF" = "#d62728", "SL" = "#1f77b4", "CU" = "#2ca02c",
"CH" = "#ff7f0e")
p <- plot_ly()
for (ptype in pitch_types) {
traj <- all_trajectories %>% filter(pitch_type == ptype)
p <- p %>% add_trace(
data = traj,
x = ~x,
y = ~y,
z = ~z,
type = "scatter3d",
mode = "lines",
line = list(width = 6, color = colors[ptype]),
name = ptype,
hovertemplate = paste(
"<b>", ptype, "</b><br>",
"Distance from plate: %{y:.1f} ft<br>",
"Horizontal: %{x:.2f} ft<br>",
"Height: %{z:.2f} ft<br>",
"<extra></extra>"
)
)
}
# Add strike zone
sz_x <- c(-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2)
sz_z <- c(1.5, 1.5, 3.5, 3.5, 1.5)
sz_y <- rep(1.417, 5)
p <- p %>% add_trace(
x = sz_x,
y = sz_y,
z = sz_z,
type = "scatter3d",
mode = "lines",
line = list(color = "black", width = 4),
name = "Strike Zone",
showlegend = TRUE,
hoverinfo = "skip"
)
# Add home plate
plate_x <- c(0, -0.708/2, -0.708/2, 0, 0.708/2, 0.708/2, 0)
plate_y <- c(1.417 + 0.2, 1.417, 1.417 - 0.15, 1.417 - 0.2,
1.417 - 0.15, 1.417, 1.417 + 0.2)
plate_z <- rep(0, 7)
p <- p %>% add_trace(
x = plate_x,
y = plate_y,
z = plate_z,
type = "scatter3d",
mode = "lines",
fill = "toself",
line = list(color = "black", width = 2),
name = "Home Plate",
surfacecolor = "white",
hoverinfo = "skip"
)
# Layout
p <- p %>% layout(
title = list(
text = paste0("<b>", pitcher_name, " - 3D Pitch Tunneling</b><br>",
"<sup>Interactive trajectory visualization</sup>"),
font = list(size = 18)
),
scene = list(
xaxis = list(title = "Horizontal Position (ft)", range = c(-2, 2)),
yaxis = list(title = "Distance from Plate (ft)", range = c(0, 60)),
zaxis = list(title = "Height (ft)", range = c(0, 8)),
camera = list(
eye = list(x = 1.5, y = -1.5, z = 0.8)
),
aspectmode = "manual",
aspectratio = list(x = 1, y = 3, z = 1)
),
legend = list(x = 0.7, y = 0.9),
hovermode = "closest"
)
return(p)
}
# Create interactive plot
p_3d <- create_3d_tunnel_plot(strider_data, "Spencer Strider", c("FF", "SL"))
p_3d
# Save as HTML file
# htmlwidgets::saveWidget(p_3d, "strider_3d_tunnel.html")
# Create interactive release point comparison
create_interactive_release_plot <- function(pitcher_data, pitcher_name) {
plot_data <- pitcher_data %>%
filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
mutate(
release_x_inches = release_pos_x * 12,
release_z_inches = release_pos_z * 12
)
# Calculate statistics for hover info
stats_by_pitch <- plot_data %>%
group_by(pitch_type) %>%
summarise(
n = n(),
mean_x = mean(release_x_inches),
mean_z = mean(release_z_inches),
sd_x = sd(release_x_inches),
sd_z = sd(release_z_inches),
mean_velo = mean(release_speed, na.rm = TRUE),
.groups = 'drop'
)
p <- plot_ly()
# Add scatter points for each pitch type
for (ptype in unique(plot_data$pitch_type)) {
subset <- plot_data %>% filter(pitch_type == ptype)
stats <- stats_by_pitch %>% filter(pitch_type == ptype)
p <- p %>% add_trace(
data = subset,
x = ~release_x_inches,
y = ~release_z_inches,
type = "scatter",
mode = "markers",
name = ptype,
marker = list(size = 5, opacity = 0.3),
hovertemplate = paste(
"<b>", ptype, "</b><br>",
"X: %{x:.1f} in<br>",
"Z: %{y:.1f} in<br>",
"Velo: ", round(subset$release_speed, 1), " mph<br>",
"<extra></extra>"
)
)
# Add mean marker
p <- p %>% add_trace(
x = stats$mean_x,
y = stats$mean_z,
type = "scatter",
mode = "markers",
marker = list(size = 15, symbol = "star",
line = list(color = "black", width = 2)),
name = paste(ptype, "mean"),
showlegend = FALSE,
hovertemplate = paste0(
"<b>", ptype, " Average</b><br>",
"X: ", round(stats$mean_x, 2), " ± ", round(stats$sd_x, 2), " in<br>",
"Z: ", round(stats$mean_z, 2), " ± ", round(stats$sd_z, 2), " in<br>",
"Avg Velo: ", round(stats$mean_velo, 1), " mph<br>",
"Count: ", stats$n, "<br>",
"<extra></extra>"
)
)
}
p <- p %>% layout(
title = list(
text = paste0("<b>", pitcher_name, " - Release Point Consistency</b><br>",
"<sup>Stars indicate average release point for each pitch</sup>"),
font = list(size = 16)
),
xaxis = list(
title = "Horizontal Release Position (inches from center)",
zeroline = TRUE
),
yaxis = list(
title = "Vertical Release Position (inches)",
zeroline = FALSE
),
hovermode = "closest",
legend = list(x = 0.02, y = 0.98)
)
return(p)
}
# Create release point plot
p_release_interactive <- create_interactive_release_plot(strider_data,
"Spencer Strider")
p_release_interactive
# Create interactive movement profile plot
create_interactive_movement_plot <- function(pitcher_data, pitcher_name) {
plot_data <- pitcher_data %>%
filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
mutate(
hb_inches = pfx_x * 12,
ivb_inches = pfx_z * 12
)
p <- plot_ly()
for (ptype in unique(plot_data$pitch_type)) {
subset <- plot_data %>% filter(pitch_type == ptype)
p <- p %>% add_trace(
data = subset,
x = ~hb_inches,
y = ~ivb_inches,
type = "scatter",
mode = "markers",
name = ptype,
marker = list(
size = ~release_speed / 5, # Size by velocity
opacity = 0.4,
line = list(width = 0.5, color = "white")
),
text = ~paste0(
"Pitch: ", pitch_type, "<br>",
"Velo: ", round(release_speed, 1), " mph<br>",
"HB: ", round(hb_inches, 1), " in<br>",
"IVB: ", round(ivb_inches, 1), " in<br>",
"Spin: ", round(release_spin_rate, 0), " rpm"
),
hovertemplate = "%{text}<extra></extra>"
)
}
# Add reference lines
p <- p %>%
add_segments(x = 0, xend = 0, y = -25, yend = 25,
line = list(dash = "dash", color = "gray"),
showlegend = FALSE, hoverinfo = "skip") %>%
add_segments(x = -25, xend = 25, y = 0, yend = 0,
line = list(dash = "dash", color = "gray"),
showlegend = FALSE, hoverinfo = "skip")
p <- p %>% layout(
title = list(
text = paste0("<b>", pitcher_name, " - Pitch Movement Profile</b><br>",
"<sup>Marker size scaled by velocity</sup>"),
font = list(size = 16)
),
xaxis = list(
title = "Horizontal Break (inches, - = glove side)",
range = c(-25, 25)
),
yaxis = list(
title = "Induced Vertical Break (inches, + = rise)",
range = c(-25, 25)
),
hovermode = "closest"
)
return(p)
}
p_movement_interactive <- create_interactive_movement_plot(strider_data,
"Spencer Strider")
p_movement_interactive
Python Interactive Visualizations
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
def create_interactive_tunnel_viz(pitcher_data, pitcher_name):
"""
Create comprehensive interactive tunneling dashboard
"""
# Filter valid data
valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])
# Create subplots
fig = make_subplots(
rows=2, cols=2,
subplot_titles=('Movement Profile', 'Release Points',
'Velocity Distribution', 'Break by Count'),
specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
[{'type': 'box'}, {'type': 'bar'}]]
)
pitch_types = valid['pitch_type'].value_counts().head(5).index.tolist()
colors = px.colors.qualitative.Set2
# Plot 1: Movement profile
for i, ptype in enumerate(pitch_types):
subset = valid[valid['pitch_type'] == ptype]
fig.add_trace(
go.Scatter(
x=subset['pfx_x'] * 12,
y=subset['pfx_z'] * 12,
mode='markers',
name=ptype,
marker=dict(size=5, opacity=0.4, color=colors[i]),
hovertemplate=(
f'<b>{ptype}</b><br>' +
'HB: %{x:.1f} in<br>' +
'IVB: %{y:.1f} in<br>' +
'<extra></extra>'
),
legendgroup=ptype
),
row=1, col=1
)
# Add reference lines
fig.add_hline(y=0, line_dash="dash", line_color="gray",
opacity=0.5, row=1, col=1)
fig.add_vline(x=0, line_dash="dash", line_color="gray",
opacity=0.5, row=1, col=1)
# Plot 2: Release points
valid_release = valid.dropna(subset=['release_pos_x', 'release_pos_z'])
for i, ptype in enumerate(pitch_types):
subset = valid_release[valid_release['pitch_type'] == ptype]
fig.add_trace(
go.Scatter(
x=subset['release_pos_x'] * 12,
y=subset['release_pos_z'] * 12,
mode='markers',
name=ptype,
marker=dict(size=4, opacity=0.3, color=colors[i]),
showlegend=False,
hovertemplate=(
f'<b>{ptype}</b><br>' +
'X: %{x:.1f} in<br>' +
'Z: %{y:.1f} in<br>' +
'<extra></extra>'
),
legendgroup=ptype
),
row=1, col=2
)
# Plot 3: Velocity distribution
for i, ptype in enumerate(pitch_types):
subset = valid[valid['pitch_type'] == ptype]
fig.add_trace(
go.Box(
y=subset['release_speed'],
name=ptype,
marker_color=colors[i],
showlegend=False,
legendgroup=ptype
),
row=2, col=1
)
# Plot 4: Break by count
valid['count'] = valid['balls'].astype(str) + '-' + valid['strikes'].astype(str)
# Calculate total break by count for primary pitch
primary_pitch = pitch_types[0]
primary_data = valid[valid['pitch_type'] == primary_pitch]
break_by_count = primary_data.groupby('count').apply(
lambda x: np.sqrt((x['pfx_x'] * 12) ** 2 + (x['pfx_z'] * 12) ** 2).mean()
).sort_index()
fig.add_trace(
go.Bar(
x=break_by_count.index,
y=break_by_count.values,
marker_color=colors[0],
name=f'{primary_pitch} Break',
showlegend=False,
hovertemplate=(
'Count: %{x}<br>' +
'Avg Break: %{y:.1f} in<br>' +
'<extra></extra>'
)
),
row=2, col=2
)
# Update axes
fig.update_xaxes(title_text="Horizontal Break (in)", row=1, col=1)
fig.update_yaxes(title_text="Induced Vertical Break (in)", row=1, col=1)
fig.update_xaxes(title_text="Release X (in)", row=1, col=2)
fig.update_yaxes(title_text="Release Z (in)", row=1, col=2)
fig.update_yaxes(title_text="Velocity (mph)", row=2, col=1)
fig.update_xaxes(title_text="Count", row=2, col=2)
fig.update_yaxes(title_text="Total Break (in)", row=2, col=2)
# Update layout
fig.update_layout(
title_text=f"{pitcher_name} - Interactive Tunneling Dashboard",
title_font_size=18,
height=900,
showlegend=True,
hovermode='closest'
)
return fig
# Example: create interactive visualization
# fig_interactive = create_interactive_tunnel_viz(pitches, "Dylan Cease")
# fig_interactive.show()
# fig_interactive.write_html("cease_interactive_dashboard.html")
library(plotly)
library(htmlwidgets)
# Create interactive 3D pitch trajectory visualization
create_3d_tunnel_plot <- function(pitcher_data, pitcher_name,
pitch_types = c("FF", "SL")) {
# Filter for selected pitch types
plot_data <- pitcher_data %>%
filter(pitch_type %in% pitch_types) %>%
filter(!is.na(vx0) & !is.na(vy0) & !is.na(vz0))
# Calculate trajectories for representative pitches
# (Use median characteristics for each pitch type)
trajectories <- list()
for (ptype in pitch_types) {
subset <- plot_data %>% filter(pitch_type == ptype)
median_pitch <- subset %>%
summarise(across(c(vx0, vy0, vz0, ax, ay, az,
release_pos_x, release_pos_y, release_pos_z),
median, na.rm = TRUE))
# Generate trajectory points
distances <- seq(median_pitch$release_pos_y, 1.417, length.out = 100)
traj_points <- map_df(distances, function(d) {
pos <- calculate_trajectory(median_pitch, d)
tibble(
distance = d,
x = pos['x'],
y = pos['y'],
z = pos['z'],
pitch_type = ptype
)
})
trajectories[[ptype]] <- traj_points
}
all_trajectories <- bind_rows(trajectories)
# Create 3D plot
colors <- c("FF" = "#d62728", "SL" = "#1f77b4", "CU" = "#2ca02c",
"CH" = "#ff7f0e")
p <- plot_ly()
for (ptype in pitch_types) {
traj <- all_trajectories %>% filter(pitch_type == ptype)
p <- p %>% add_trace(
data = traj,
x = ~x,
y = ~y,
z = ~z,
type = "scatter3d",
mode = "lines",
line = list(width = 6, color = colors[ptype]),
name = ptype,
hovertemplate = paste(
"<b>", ptype, "</b><br>",
"Distance from plate: %{y:.1f} ft<br>",
"Horizontal: %{x:.2f} ft<br>",
"Height: %{z:.2f} ft<br>",
"<extra></extra>"
)
)
}
# Add strike zone
sz_x <- c(-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2)
sz_z <- c(1.5, 1.5, 3.5, 3.5, 1.5)
sz_y <- rep(1.417, 5)
p <- p %>% add_trace(
x = sz_x,
y = sz_y,
z = sz_z,
type = "scatter3d",
mode = "lines",
line = list(color = "black", width = 4),
name = "Strike Zone",
showlegend = TRUE,
hoverinfo = "skip"
)
# Add home plate
plate_x <- c(0, -0.708/2, -0.708/2, 0, 0.708/2, 0.708/2, 0)
plate_y <- c(1.417 + 0.2, 1.417, 1.417 - 0.15, 1.417 - 0.2,
1.417 - 0.15, 1.417, 1.417 + 0.2)
plate_z <- rep(0, 7)
p <- p %>% add_trace(
x = plate_x,
y = plate_y,
z = plate_z,
type = "scatter3d",
mode = "lines",
fill = "toself",
line = list(color = "black", width = 2),
name = "Home Plate",
surfacecolor = "white",
hoverinfo = "skip"
)
# Layout
p <- p %>% layout(
title = list(
text = paste0("<b>", pitcher_name, " - 3D Pitch Tunneling</b><br>",
"<sup>Interactive trajectory visualization</sup>"),
font = list(size = 18)
),
scene = list(
xaxis = list(title = "Horizontal Position (ft)", range = c(-2, 2)),
yaxis = list(title = "Distance from Plate (ft)", range = c(0, 60)),
zaxis = list(title = "Height (ft)", range = c(0, 8)),
camera = list(
eye = list(x = 1.5, y = -1.5, z = 0.8)
),
aspectmode = "manual",
aspectratio = list(x = 1, y = 3, z = 1)
),
legend = list(x = 0.7, y = 0.9),
hovermode = "closest"
)
return(p)
}
# Create interactive plot
p_3d <- create_3d_tunnel_plot(strider_data, "Spencer Strider", c("FF", "SL"))
p_3d
# Save as HTML file
# htmlwidgets::saveWidget(p_3d, "strider_3d_tunnel.html")
# Create interactive release point comparison
create_interactive_release_plot <- function(pitcher_data, pitcher_name) {
plot_data <- pitcher_data %>%
filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
mutate(
release_x_inches = release_pos_x * 12,
release_z_inches = release_pos_z * 12
)
# Calculate statistics for hover info
stats_by_pitch <- plot_data %>%
group_by(pitch_type) %>%
summarise(
n = n(),
mean_x = mean(release_x_inches),
mean_z = mean(release_z_inches),
sd_x = sd(release_x_inches),
sd_z = sd(release_z_inches),
mean_velo = mean(release_speed, na.rm = TRUE),
.groups = 'drop'
)
p <- plot_ly()
# Add scatter points for each pitch type
for (ptype in unique(plot_data$pitch_type)) {
subset <- plot_data %>% filter(pitch_type == ptype)
stats <- stats_by_pitch %>% filter(pitch_type == ptype)
p <- p %>% add_trace(
data = subset,
x = ~release_x_inches,
y = ~release_z_inches,
type = "scatter",
mode = "markers",
name = ptype,
marker = list(size = 5, opacity = 0.3),
hovertemplate = paste(
"<b>", ptype, "</b><br>",
"X: %{x:.1f} in<br>",
"Z: %{y:.1f} in<br>",
"Velo: ", round(subset$release_speed, 1), " mph<br>",
"<extra></extra>"
)
)
# Add mean marker
p <- p %>% add_trace(
x = stats$mean_x,
y = stats$mean_z,
type = "scatter",
mode = "markers",
marker = list(size = 15, symbol = "star",
line = list(color = "black", width = 2)),
name = paste(ptype, "mean"),
showlegend = FALSE,
hovertemplate = paste0(
"<b>", ptype, " Average</b><br>",
"X: ", round(stats$mean_x, 2), " ± ", round(stats$sd_x, 2), " in<br>",
"Z: ", round(stats$mean_z, 2), " ± ", round(stats$sd_z, 2), " in<br>",
"Avg Velo: ", round(stats$mean_velo, 1), " mph<br>",
"Count: ", stats$n, "<br>",
"<extra></extra>"
)
)
}
p <- p %>% layout(
title = list(
text = paste0("<b>", pitcher_name, " - Release Point Consistency</b><br>",
"<sup>Stars indicate average release point for each pitch</sup>"),
font = list(size = 16)
),
xaxis = list(
title = "Horizontal Release Position (inches from center)",
zeroline = TRUE
),
yaxis = list(
title = "Vertical Release Position (inches)",
zeroline = FALSE
),
hovermode = "closest",
legend = list(x = 0.02, y = 0.98)
)
return(p)
}
# Create release point plot
p_release_interactive <- create_interactive_release_plot(strider_data,
"Spencer Strider")
p_release_interactive
# Create interactive movement profile plot
create_interactive_movement_plot <- function(pitcher_data, pitcher_name) {
plot_data <- pitcher_data %>%
filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
mutate(
hb_inches = pfx_x * 12,
ivb_inches = pfx_z * 12
)
p <- plot_ly()
for (ptype in unique(plot_data$pitch_type)) {
subset <- plot_data %>% filter(pitch_type == ptype)
p <- p %>% add_trace(
data = subset,
x = ~hb_inches,
y = ~ivb_inches,
type = "scatter",
mode = "markers",
name = ptype,
marker = list(
size = ~release_speed / 5, # Size by velocity
opacity = 0.4,
line = list(width = 0.5, color = "white")
),
text = ~paste0(
"Pitch: ", pitch_type, "<br>",
"Velo: ", round(release_speed, 1), " mph<br>",
"HB: ", round(hb_inches, 1), " in<br>",
"IVB: ", round(ivb_inches, 1), " in<br>",
"Spin: ", round(release_spin_rate, 0), " rpm"
),
hovertemplate = "%{text}<extra></extra>"
)
}
# Add reference lines
p <- p %>%
add_segments(x = 0, xend = 0, y = -25, yend = 25,
line = list(dash = "dash", color = "gray"),
showlegend = FALSE, hoverinfo = "skip") %>%
add_segments(x = -25, xend = 25, y = 0, yend = 0,
line = list(dash = "dash", color = "gray"),
showlegend = FALSE, hoverinfo = "skip")
p <- p %>% layout(
title = list(
text = paste0("<b>", pitcher_name, " - Pitch Movement Profile</b><br>",
"<sup>Marker size scaled by velocity</sup>"),
font = list(size = 16)
),
xaxis = list(
title = "Horizontal Break (inches, - = glove side)",
range = c(-25, 25)
),
yaxis = list(
title = "Induced Vertical Break (inches, + = rise)",
range = c(-25, 25)
),
hovermode = "closest"
)
return(p)
}
p_movement_interactive <- create_interactive_movement_plot(strider_data,
"Spencer Strider")
p_movement_interactive
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
def create_interactive_tunnel_viz(pitcher_data, pitcher_name):
"""
Create comprehensive interactive tunneling dashboard
"""
# Filter valid data
valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])
# Create subplots
fig = make_subplots(
rows=2, cols=2,
subplot_titles=('Movement Profile', 'Release Points',
'Velocity Distribution', 'Break by Count'),
specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
[{'type': 'box'}, {'type': 'bar'}]]
)
pitch_types = valid['pitch_type'].value_counts().head(5).index.tolist()
colors = px.colors.qualitative.Set2
# Plot 1: Movement profile
for i, ptype in enumerate(pitch_types):
subset = valid[valid['pitch_type'] == ptype]
fig.add_trace(
go.Scatter(
x=subset['pfx_x'] * 12,
y=subset['pfx_z'] * 12,
mode='markers',
name=ptype,
marker=dict(size=5, opacity=0.4, color=colors[i]),
hovertemplate=(
f'<b>{ptype}</b><br>' +
'HB: %{x:.1f} in<br>' +
'IVB: %{y:.1f} in<br>' +
'<extra></extra>'
),
legendgroup=ptype
),
row=1, col=1
)
# Add reference lines
fig.add_hline(y=0, line_dash="dash", line_color="gray",
opacity=0.5, row=1, col=1)
fig.add_vline(x=0, line_dash="dash", line_color="gray",
opacity=0.5, row=1, col=1)
# Plot 2: Release points
valid_release = valid.dropna(subset=['release_pos_x', 'release_pos_z'])
for i, ptype in enumerate(pitch_types):
subset = valid_release[valid_release['pitch_type'] == ptype]
fig.add_trace(
go.Scatter(
x=subset['release_pos_x'] * 12,
y=subset['release_pos_z'] * 12,
mode='markers',
name=ptype,
marker=dict(size=4, opacity=0.3, color=colors[i]),
showlegend=False,
hovertemplate=(
f'<b>{ptype}</b><br>' +
'X: %{x:.1f} in<br>' +
'Z: %{y:.1f} in<br>' +
'<extra></extra>'
),
legendgroup=ptype
),
row=1, col=2
)
# Plot 3: Velocity distribution
for i, ptype in enumerate(pitch_types):
subset = valid[valid['pitch_type'] == ptype]
fig.add_trace(
go.Box(
y=subset['release_speed'],
name=ptype,
marker_color=colors[i],
showlegend=False,
legendgroup=ptype
),
row=2, col=1
)
# Plot 4: Break by count
valid['count'] = valid['balls'].astype(str) + '-' + valid['strikes'].astype(str)
# Calculate total break by count for primary pitch
primary_pitch = pitch_types[0]
primary_data = valid[valid['pitch_type'] == primary_pitch]
break_by_count = primary_data.groupby('count').apply(
lambda x: np.sqrt((x['pfx_x'] * 12) ** 2 + (x['pfx_z'] * 12) ** 2).mean()
).sort_index()
fig.add_trace(
go.Bar(
x=break_by_count.index,
y=break_by_count.values,
marker_color=colors[0],
name=f'{primary_pitch} Break',
showlegend=False,
hovertemplate=(
'Count: %{x}<br>' +
'Avg Break: %{y:.1f} in<br>' +
'<extra></extra>'
)
),
row=2, col=2
)
# Update axes
fig.update_xaxes(title_text="Horizontal Break (in)", row=1, col=1)
fig.update_yaxes(title_text="Induced Vertical Break (in)", row=1, col=1)
fig.update_xaxes(title_text="Release X (in)", row=1, col=2)
fig.update_yaxes(title_text="Release Z (in)", row=1, col=2)
fig.update_yaxes(title_text="Velocity (mph)", row=2, col=1)
fig.update_xaxes(title_text="Count", row=2, col=2)
fig.update_yaxes(title_text="Total Break (in)", row=2, col=2)
# Update layout
fig.update_layout(
title_text=f"{pitcher_name} - Interactive Tunneling Dashboard",
title_font_size=18,
height=900,
showlegend=True,
hovermode='closest'
)
return fig
# Example: create interactive visualization
# fig_interactive = create_interactive_tunnel_viz(pitches, "Dylan Cease")
# fig_interactive.show()
# fig_interactive.write_html("cease_interactive_dashboard.html")
Exercise 1: Calculate Basic Tunnel Points (Easy)
Using Statcast data for any pitcher of your choice:
- Download one game's worth of pitch data
- Calculate the tunnel point for their fastball-slider combination
- Identify the separation distance at the tunnel point
- Create a simple 2D visualization of the horizontal separation over distance
Expected output: A tunnel point distance (in feet from plate) and a plot showing pitch convergence/divergence.
Exercise 2: Release Point Consistency Analysis (Easy)
For a pitcher with at least 500 pitches in 2023:
- Calculate the standard deviation of release points (x, y, z) for each pitch type
- Create a release point consistency matrix showing distances between average release points
- Identify which pitch types have the most consistent release points
- Determine if release consistency correlates with whiff rate
Hint: Lower standard deviation indicates better consistency.
Exercise 3: Movement Profile Comparison (Medium)
Compare the movement profiles of three elite pitchers (e.g., Strider, Cease, Ohtani):
- Calculate average horizontal and vertical break for each pitcher's primary pitches
- Create a break differential matrix for each pitcher
- Identify which pitcher has the greatest total break differential between their fastball and slider
- Visualize all three pitchers' movement profiles on the same plot
Deliverable: A comparison table and a multi-pitcher movement plot.
Exercise 4: Pitch Sequencing Effectiveness (Medium)
Analyze pitch sequencing patterns for an elite strikeout pitcher:
- Calculate two-pitch sequence frequencies
- Determine which sequences have the highest whiff rates
- Identify count-specific sequencing patterns (e.g., two-strike approaches)
- Create a sequence flow diagram showing the most common transitions
Challenge: Weight sequences by game leverage to find high-pressure patterns.
Exercise 5: Custom Tunneling Score (Hard)
Develop your own tunneling effectiveness metric that combines:
- Tunnel point distance (later = better)
- Release point consistency (tighter = better)
- Late movement differential (greater = better)
- Chase rate on breaking balls after fastballs
Create a scoring system that produces a single tunneling grade (0-100) and apply it to 10 starting pitchers. Validate your metric by correlating it with strikeout rate or ERA.
Bonus: Create percentile rankings and identify which pitchers over/underperform their tunneling score.
Exercise 6: Trajectory Reconstruction (Hard)
Using the physics equations and Statcast's initial conditions:
- Write a complete trajectory calculator that takes vx0, vy0, vz0, ax, ay, az, and release position
- Generate 100 points along a pitch's trajectory from release to plate
- Calculate the exact point where two pitches diverge by more than 3 inches
- Create an animated visualization showing both pitches traveling toward the plate
Advanced: Add spin-based trajectory calculations to account for Magnus force effects.
Exercise 7: League-Wide Tunneling Analysis (Hard)
Perform a comprehensive analysis of tunneling across MLB:
- Download pitch data for all starting pitchers with 100+ innings in 2023
- Calculate tunneling metrics for each pitcher's primary pitch pairs
- Identify cluster groups of pitchers with similar tunneling profiles
- Determine if tunneling effectiveness correlates with:
- Strikeout rate
- Walk rate
- ERA
- Opponent batting average
Create a research report with visualizations showing the relationship between tunneling and pitcher success.
Expected deliverable: A 1000-word analysis with at least 5 visualizations and statistical significance tests.
Exercise 8: Real-Time Tunnel Detector (Expert)
Build a system that processes live game data and identifies exceptional tunneling sequences:
- Set up a data pipeline that ingests pitch-by-pitch data
- Calculate tunnel points in real-time for consecutive pitches
- Flag sequences where tunnel points occur within 15 feet of the plate
- Generate automatic alerts for "elite tunneling moments"
- Create a dashboard that displays these sequences with trajectory visualizations
Technologies: Consider using Streamlit, Dash, or Shiny for the dashboard.
Summary
Pitch tunneling represents the cutting edge of pitching analytics, combining physics, psychology, and data science to understand one of baseball's most subtle arts. Elite pitchers like Spencer Strider, Dylan Cease, and Shohei Ohtani succeed not just because they throw hard or spin the ball well, but because they create impossible decision points for hitters.
Key takeaways from this chapter:
- Tunneling is about time: Hitters must decide before they can distinguish pitches
- Release point consistency is foundational: Even slight variations telegraph pitch type
- Movement differentials create deception: Similar early paths, divergent endpoints
- Sequencing amplifies tunneling: Strategic pitch ordering maximizes deception
- Modern analytics quantify the unquantifiable: What was once "feel" is now measurable
As you apply these concepts, remember that data illuminates but doesn't dictate. The best pitchers blend analytics with artistry, using numbers to refine their craft while maintaining the creativity that makes baseball beautiful.
The code examples in this chapter provide a starting point for your own tunneling research. Whether you're a coach seeking competitive advantages, an analyst building predictive models, or a fan deepening your appreciation of the game, understanding pitch tunneling opens new dimensions in baseball analysis.
Further Reading:
- Mike Fast's original tunneling research (Baseball Prospectus, 2011)
- Dr. Peter Fadde's work on pitch recognition and decision-making
- MLB's Statcast research on pitch tracking accuracy
- Research papers on visual perception in sports
Continue exploring, questioning, and analyzing. The future of pitching analytics is limited only by imagination and computing power.