Chapter 26: Pitch Tunneling & Deception Analytics

Pitch tunneling represents one of the most sophisticated concepts in modern pitching analytics. At its core, tunneling refers to the ability of a pitcher to make different pitches appear identical for as long as possible before they diverge toward their final destinations. This optical illusion creates a decision-making crisis for hitters, who must commit to a swing before they can reliably distinguish between pitches.

Advanced ~5 min read 8 sections 14 code examples
Book Progress
50%
Chapter 27 of 54
What You'll Learn
  • The Science of Pitch Deception
  • Tunnel Point Analysis
  • Release Point Consistency
  • Induced Vertical and Horizontal Break Differences
  • And 4 more topics...
Languages in This Chapter
R (8) Python (6)

All code examples can be copied and run in your environment.

26.1 The Science of Pitch Deception

Pitch tunneling represents one of the most sophisticated concepts in modern pitching analytics. At its core, tunneling refers to the ability of a pitcher to make different pitches appear identical for as long as possible before they diverge toward their final destinations. This optical illusion creates a decision-making crisis for hitters, who must commit to a swing before they can reliably distinguish between pitches.

The Neuroscience of Hitting

Understanding pitch tunneling requires appreciation of the severe time constraints faced by hitters. A 95 mph fastball travels from the pitcher's hand to home plate in approximately 400 milliseconds. However, the visual processing and motor execution required to swing a bat takes roughly 200-250 milliseconds. This means hitters must begin their swing decision when the ball is roughly halfway to the plate—around 30 feet from release.

Research by Dr. Peter Fadde and others has shown that elite hitters rely heavily on early ball flight cues to predict pitch location and type. They track the ball out of the pitcher's hand and use the first 15-20 feet of flight to make their swing decision. This is where tunneling becomes devastating: if two pitches occupy the same visual space during this critical recognition window, the hitter cannot reliably distinguish between them until it's too late to adjust.

The Tunnel Point Concept

The "tunnel point" is the location along the pitch trajectory where different pitches begin to separate visibly. For effective tunneling, this point should occur as late as possible—ideally beyond the point where hitters must commit to their swing. A typical tunnel point for elite pitchers occurs around 20-25 feet from home plate, though some exceptional pitchers achieve separation points even closer to the plate.

Consider a pitcher throwing a four-seam fastball and a slider with identical release points and similar trajectories for the first 25 feet. At the tunnel point, the slider begins to break sharply away while the fastball continues on its straighter path. The hitter, having committed to one pitch type based on early flight cues, finds themselves helpless against the other.

Key Components of Effective Tunneling

Several factors contribute to successful pitch tunneling:

Release Point Consistency: Pitches must emerge from the same release point (within inches) to appear identical initially. Inconsistent release points give hitters early visual cues about pitch type.

Matching Early Trajectories: Tunneling pairs should follow nearly identical paths through the recognition zone (first 15-25 feet).

Sufficient Late Differentiation: After the tunnel point, pitches must separate enough to cover different zones, forcing the hitter to commit to one or the other.

Complementary Movement Profiles: The best tunneling pairs have movement that diverges in different planes—fastball with vertical rise paired with a slider breaking horizontally and down.

Historical Context

While pitchers have intuitively understood deception for over a century, the analytical framework for tunneling emerged only recently. The term "tunneling" was popularized by baseball analyst Mike Fast in the early 2010s, leveraging pitch tracking data from PITCHf/x and later Statcast. This data revolution allowed quantification of concepts that pitchers and coaches had only vaguely articulated previously.

Old-school pitching coaches would tell their pitchers to "make your changeup look like your fastball" or "throw everything from the same slot." These were intuitive approximations of tunneling principles. Modern technology allows us to measure exactly how well pitchers execute these concepts and identify specific improvements.


26.2 Tunnel Point Analysis

Calculating tunnel points requires three-dimensional pitch trajectory data, which Statcast provides for every pitch in MLB. The analysis involves comparing the flight paths of different pitch types and identifying where they diverge beyond a perceptual threshold.

Mathematical Framework

For two pitches with trajectories described by position vectors p₁(t) and p₂(t), the tunnel point occurs at the earliest time t_tunnel where:

||p₁(t) - p₂(t)|| > δ_perceptual

Where δ_perceptual represents the minimum distance (typically 1-3 inches) that a hitter can reliably detect given the ball's speed and distance from the plate.

Implementing Tunnel Point Calculation in R

Let's analyze Spencer Strider's exceptional fastball-slider tunnel using pitch-by-pitch Statcast data:

# Load required libraries
library(baseballr)
library(tidyverse)
library(plotly)

# Function to calculate pitch trajectory at any point
calculate_trajectory <- function(pitch_data, distance_from_plate) {
  # Extract initial conditions
  vx0 <- pitch_data$vx0
  vy0 <- pitch_data$vy0
  vz0 <- pitch_data$vz0

  ax <- pitch_data$ax
  ay <- pitch_data$ay
  az <- pitch_data$az

  x0 <- pitch_data$release_pos_x
  y0 <- pitch_data$release_pos_y
  z0 <- pitch_data$release_pos_z

  # Calculate time to reach the distance
  # Using quadratic formula for: distance = y0 + vy0*t + 0.5*ay*t^2
  y_target <- distance_from_plate
  a <- 0.5 * ay
  b <- vy0
  c <- y0 - y_target

  # Take the positive root (forward in time)
  t <- (-b - sqrt(b^2 - 4*a*c)) / (2*a)

  # Calculate x, y, z positions at this time
  x <- x0 + vx0*t + 0.5*ax*t^2
  y <- y0 + vy0*t + 0.5*ay*t^2
  z <- z0 + vz0*t + 0.5*az*t^2

  return(c(x = x, y = y, z = z))
}

# Function to find tunnel point between two pitch types
find_tunnel_point <- function(pitch1_data, pitch2_data,
                               threshold_inches = 2,
                               start_dist = 50,
                               end_dist = 5) {

  distances <- seq(start_dist, end_dist, by = -0.5)

  for (dist in distances) {
    pos1 <- calculate_trajectory(pitch1_data, dist)
    pos2 <- calculate_trajectory(pitch2_data, dist)

    # Calculate 3D Euclidean distance in inches
    separation <- sqrt((pos1['x'] - pos2['x'])^2 +
                      (pos1['z'] - pos2['z'])^2) * 12

    if (separation > threshold_inches) {
      return(list(
        tunnel_distance = dist,
        separation_inches = separation,
        position1 = pos1,
        position2 = pos2
      ))
    }
  }

  return(NULL)
}

# Get Spencer Strider's 2023 data
strider_id <- 675911

# Fetch Statcast data for 2023 season
strider_data <- statcast_search_pitchers(
  start_date = "2023-04-01",
  end_date = "2023-09-30",
  pitcherid = strider_id
)

# Filter for fastballs and sliders
ff_slider <- strider_data %>%
  filter(pitch_type %in% c("FF", "SL")) %>%
  filter(!is.na(release_pos_x) & !is.na(vx0))

# Get average pitch characteristics
avg_fastball <- ff_slider %>%
  filter(pitch_type == "FF") %>%
  summarise(across(c(vx0, vy0, vz0, ax, ay, az,
                     release_pos_x, release_pos_y, release_pos_z,
                     release_speed, release_spin_rate,
                     pfx_x, pfx_z), mean, na.rm = TRUE))

avg_slider <- ff_slider %>%
  filter(pitch_type == "SL") %>%
  summarise(across(c(vx0, vy0, vz0, ax, ay, az,
                     release_pos_x, release_pos_y, release_pos_z,
                     release_speed, release_spin_rate,
                     pfx_x, pfx_z), mean, na.rm = TRUE))

# Calculate tunnel point
tunnel_result <- find_tunnel_point(avg_fastball, avg_slider)

cat("Spencer Strider FF-SL Tunnel Analysis\n")
cat("=====================================\n")
cat(sprintf("Tunnel Point: %.1f feet from home plate\n",
            tunnel_result$tunnel_distance))
cat(sprintf("Separation at tunnel: %.2f inches\n",
            tunnel_result$separation_inches))
cat(sprintf("Fastball velocity: %.1f mph\n", avg_fastball$release_speed))
cat(sprintf("Slider velocity: %.1f mph\n", avg_slider$release_speed))
cat(sprintf("Velocity differential: %.1f mph\n",
            avg_fastball$release_speed - avg_slider$release_speed))

# Visualize the tunnel
visualize_tunnel <- function(pitch1, pitch2, pitch1_name, pitch2_name) {
  distances <- seq(55, 0, by = -0.5)

  trajectory_data <- map_df(distances, function(d) {
    if (d > 0) {
      pos1 <- calculate_trajectory(pitch1, d)
      pos2 <- calculate_trajectory(pitch2, d)

      tibble(
        distance = d,
        x_ff = pos1['x'],
        z_ff = pos1['z'],
        x_sl = pos2['x'],
        z_sl = pos2['z']
      )
    }
  })

  p <- ggplot(trajectory_data) +
    geom_path(aes(x = distance, y = x_ff * 12, color = pitch1_name),
              size = 1.2) +
    geom_path(aes(x = distance, y = x_sl * 12, color = pitch2_name),
              size = 1.2) +
    scale_color_manual(values = c("FF" = "#d62728", "SL" = "#1f77b4")) +
    labs(title = "Pitch Tunneling: Horizontal Movement",
         subtitle = "Spencer Strider Fastball vs Slider",
         x = "Distance from Home Plate (feet)",
         y = "Horizontal Position (inches from center)",
         color = "Pitch Type") +
    theme_minimal() +
    theme(legend.position = "bottom")

  return(p)
}

Python Implementation with Pybaseball

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pybaseball import statcast_pitcher
from pybaseball import playerid_lookup
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns

# Configure plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)

class PitchTrajectory:
    """Calculate and analyze pitch trajectories from Statcast data"""

    def __init__(self, pitch_data):
        """
        Initialize with pitch data containing:
        vx0, vy0, vz0, ax, ay, az, release_pos_x, release_pos_y, release_pos_z
        """
        self.vx0 = pitch_data['vx0']
        self.vy0 = pitch_data['vy0']
        self.vz0 = pitch_data['vz0']

        self.ax = pitch_data['ax']
        self.ay = pitch_data['ay']
        self.az = pitch_data['az']

        self.x0 = pitch_data['release_pos_x']
        self.y0 = pitch_data['release_pos_y']
        self.z0 = pitch_data['release_pos_z']

    def position_at_distance(self, y_distance):
        """
        Calculate x, y, z position when ball is at y_distance from plate

        Parameters:
        -----------
        y_distance : float
            Distance from home plate in feet (positive = toward pitcher)

        Returns:
        --------
        tuple : (x, y, z) position in feet
        """
        # Solve for time using quadratic formula
        # y(t) = y0 + vy0*t + 0.5*ay*t^2
        a = 0.5 * self.ay
        b = self.vy0
        c = self.y0 - y_distance

        discriminant = b**2 - 4*a*c
        if discriminant < 0:
            return None

        t = (-b - np.sqrt(discriminant)) / (2*a)

        if t < 0:
            return None

        x = self.x0 + self.vx0*t + 0.5*self.ax*t**2
        y = self.y0 + self.vy0*t + 0.5*self.ay*t**2
        z = self.z0 + self.vz0*t + 0.5*self.az*t**2

        return (x, y, z)

    def full_trajectory(self, num_points=100):
        """Generate complete trajectory from release to plate"""
        distances = np.linspace(self.y0, 1.417, num_points)  # 1.417 ft = 17 inches
        trajectory = []

        for d in distances:
            pos = self.position_at_distance(d)
            if pos:
                trajectory.append(pos)

        return np.array(trajectory)


def find_tunnel_point(traj1, traj2, threshold_inches=2.0):
    """
    Find where two pitch trajectories separate beyond threshold

    Parameters:
    -----------
    traj1, traj2 : PitchTrajectory objects
    threshold_inches : float
        Separation threshold in inches

    Returns:
    --------
    dict : tunnel point information
    """
    # Check distances from 50 feet to 5 feet
    distances = np.arange(50, 5, -0.5)

    for dist in distances:
        pos1 = traj1.position_at_distance(dist)
        pos2 = traj2.position_at_distance(dist)

        if pos1 is None or pos2 is None:
            continue

        # Calculate horizontal and vertical separation
        horiz_sep = np.sqrt((pos1[0] - pos2[0])**2) * 12  # Convert to inches
        vert_sep = abs(pos1[2] - pos2[2]) * 12
        total_sep = np.sqrt(horiz_sep**2 + vert_sep**2)

        if total_sep > threshold_inches:
            return {
                'tunnel_distance': dist,
                'total_separation': total_sep,
                'horizontal_separation': horiz_sep,
                'vertical_separation': vert_sep,
                'pos1': pos1,
                'pos2': pos2
            }

    return None


# Analyze Dylan Cease's tunneling
def analyze_pitcher_tunnel(pitcher_name, year=2023):
    """Comprehensive tunnel analysis for a pitcher"""

    # Get pitcher ID
    # Dylan Cease ID: 656302
    pitcher_id = 656302

    # Fetch data
    print(f"Fetching data for {pitcher_name} ({year})...")
    data = statcast_pitcher(f'{year}-04-01', f'{year}-09-30', pitcher_id)

    if data is None or len(data) == 0:
        print("No data found")
        return

    # Filter for primary pitches
    pitches = data[data['pitch_type'].isin(['FF', 'SL', 'CU'])].copy()

    # Remove null values
    required_cols = ['vx0', 'vy0', 'vz0', 'ax', 'ay', 'az',
                     'release_pos_x', 'release_pos_y', 'release_pos_z']
    pitches = pitches.dropna(subset=required_cols)

    print(f"\nAnalyzing {len(pitches)} pitches")
    print(f"Pitch mix: {pitches['pitch_type'].value_counts().to_dict()}")

    # Calculate average trajectory for each pitch type
    pitch_types = {}
    for ptype in ['FF', 'SL', 'CU']:
        subset = pitches[pitches['pitch_type'] == ptype]
        if len(subset) > 0:
            avg_data = subset[required_cols].mean()
            pitch_types[ptype] = PitchTrajectory(avg_data)

    # Analyze all pitch pairs
    results = {}
    pairs = [('FF', 'SL'), ('FF', 'CU'), ('SL', 'CU')]

    for p1, p2 in pairs:
        if p1 in pitch_types and p2 in pitch_types:
            tunnel = find_tunnel_point(pitch_types[p1], pitch_types[p2])
            if tunnel:
                results[f"{p1}-{p2}"] = tunnel

    # Print results
    print(f"\n{'='*60}")
    print(f"TUNNEL ANALYSIS: {pitcher_name} ({year})")
    print(f"{'='*60}\n")

    for pair, result in results.items():
        print(f"{pair} Tunnel:")
        print(f"  Distance from plate: {result['tunnel_distance']:.1f} feet")
        print(f"  Total separation: {result['total_separation']:.2f} inches")
        print(f"  Horizontal: {result['horizontal_separation']:.2f} inches")
        print(f"  Vertical: {result['vertical_separation']:.2f} inches")
        print()

    return pitches, pitch_types, results


# Example usage
pitches, trajectories, tunnel_results = analyze_pitcher_tunnel("Dylan Cease", 2023)


# Visualize 3D trajectories
def plot_3d_trajectories(pitch_types, pitcher_name):
    """Create 3D visualization of pitch trajectories"""

    fig = plt.figure(figsize=(14, 10))
    ax = fig.add_subplot(111, projection='3d')

    colors = {'FF': '#d62728', 'SL': '#1f77b4', 'CU': '#2ca02c', 'CH': '#ff7f0e'}
    labels = {'FF': 'Four-Seam FB', 'SL': 'Slider',
              'CU': 'Curveball', 'CH': 'Changeup'}

    for ptype, trajectory in pitch_types.items():
        path = trajectory.full_trajectory(num_points=200)

        ax.plot(path[:, 0], path[:, 1], path[:, 2],
                color=colors.get(ptype, 'gray'),
                linewidth=2.5,
                label=labels.get(ptype, ptype),
                alpha=0.8)

    # Add strike zone
    sz_x = [-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2]  # 17 inches wide
    sz_z = [1.5, 1.5, 3.5, 3.5, 1.5]  # Approximate zone
    sz_y = [1.417] * 5  # Front of plate

    ax.plot(sz_x, sz_y, sz_z, 'k-', linewidth=2, alpha=0.5)

    ax.set_xlabel('Horizontal Position (feet)', fontsize=11)
    ax.set_ylabel('Distance from Plate (feet)', fontsize=11)
    ax.set_zlabel('Height (feet)', fontsize=11)
    ax.set_title(f'{pitcher_name} - Pitch Trajectories (3D View)',
                 fontsize=14, fontweight='bold')

    ax.legend(loc='upper left', fontsize=10)
    ax.view_init(elev=15, azim=45)

    plt.tight_layout()
    return fig

# Create visualization
fig = plot_3d_trajectories(trajectories, "Dylan Cease")
plt.savefig('cease_tunnel_3d.png', dpi=300, bbox_inches='tight')
plt.show()
R
||p₁(t) - p₂(t)|| > δ_perceptual
R
# Load required libraries
library(baseballr)
library(tidyverse)
library(plotly)

# Function to calculate pitch trajectory at any point
calculate_trajectory <- function(pitch_data, distance_from_plate) {
  # Extract initial conditions
  vx0 <- pitch_data$vx0
  vy0 <- pitch_data$vy0
  vz0 <- pitch_data$vz0

  ax <- pitch_data$ax
  ay <- pitch_data$ay
  az <- pitch_data$az

  x0 <- pitch_data$release_pos_x
  y0 <- pitch_data$release_pos_y
  z0 <- pitch_data$release_pos_z

  # Calculate time to reach the distance
  # Using quadratic formula for: distance = y0 + vy0*t + 0.5*ay*t^2
  y_target <- distance_from_plate
  a <- 0.5 * ay
  b <- vy0
  c <- y0 - y_target

  # Take the positive root (forward in time)
  t <- (-b - sqrt(b^2 - 4*a*c)) / (2*a)

  # Calculate x, y, z positions at this time
  x <- x0 + vx0*t + 0.5*ax*t^2
  y <- y0 + vy0*t + 0.5*ay*t^2
  z <- z0 + vz0*t + 0.5*az*t^2

  return(c(x = x, y = y, z = z))
}

# Function to find tunnel point between two pitch types
find_tunnel_point <- function(pitch1_data, pitch2_data,
                               threshold_inches = 2,
                               start_dist = 50,
                               end_dist = 5) {

  distances <- seq(start_dist, end_dist, by = -0.5)

  for (dist in distances) {
    pos1 <- calculate_trajectory(pitch1_data, dist)
    pos2 <- calculate_trajectory(pitch2_data, dist)

    # Calculate 3D Euclidean distance in inches
    separation <- sqrt((pos1['x'] - pos2['x'])^2 +
                      (pos1['z'] - pos2['z'])^2) * 12

    if (separation > threshold_inches) {
      return(list(
        tunnel_distance = dist,
        separation_inches = separation,
        position1 = pos1,
        position2 = pos2
      ))
    }
  }

  return(NULL)
}

# Get Spencer Strider's 2023 data
strider_id <- 675911

# Fetch Statcast data for 2023 season
strider_data <- statcast_search_pitchers(
  start_date = "2023-04-01",
  end_date = "2023-09-30",
  pitcherid = strider_id
)

# Filter for fastballs and sliders
ff_slider <- strider_data %>%
  filter(pitch_type %in% c("FF", "SL")) %>%
  filter(!is.na(release_pos_x) & !is.na(vx0))

# Get average pitch characteristics
avg_fastball <- ff_slider %>%
  filter(pitch_type == "FF") %>%
  summarise(across(c(vx0, vy0, vz0, ax, ay, az,
                     release_pos_x, release_pos_y, release_pos_z,
                     release_speed, release_spin_rate,
                     pfx_x, pfx_z), mean, na.rm = TRUE))

avg_slider <- ff_slider %>%
  filter(pitch_type == "SL") %>%
  summarise(across(c(vx0, vy0, vz0, ax, ay, az,
                     release_pos_x, release_pos_y, release_pos_z,
                     release_speed, release_spin_rate,
                     pfx_x, pfx_z), mean, na.rm = TRUE))

# Calculate tunnel point
tunnel_result <- find_tunnel_point(avg_fastball, avg_slider)

cat("Spencer Strider FF-SL Tunnel Analysis\n")
cat("=====================================\n")
cat(sprintf("Tunnel Point: %.1f feet from home plate\n",
            tunnel_result$tunnel_distance))
cat(sprintf("Separation at tunnel: %.2f inches\n",
            tunnel_result$separation_inches))
cat(sprintf("Fastball velocity: %.1f mph\n", avg_fastball$release_speed))
cat(sprintf("Slider velocity: %.1f mph\n", avg_slider$release_speed))
cat(sprintf("Velocity differential: %.1f mph\n",
            avg_fastball$release_speed - avg_slider$release_speed))

# Visualize the tunnel
visualize_tunnel <- function(pitch1, pitch2, pitch1_name, pitch2_name) {
  distances <- seq(55, 0, by = -0.5)

  trajectory_data <- map_df(distances, function(d) {
    if (d > 0) {
      pos1 <- calculate_trajectory(pitch1, d)
      pos2 <- calculate_trajectory(pitch2, d)

      tibble(
        distance = d,
        x_ff = pos1['x'],
        z_ff = pos1['z'],
        x_sl = pos2['x'],
        z_sl = pos2['z']
      )
    }
  })

  p <- ggplot(trajectory_data) +
    geom_path(aes(x = distance, y = x_ff * 12, color = pitch1_name),
              size = 1.2) +
    geom_path(aes(x = distance, y = x_sl * 12, color = pitch2_name),
              size = 1.2) +
    scale_color_manual(values = c("FF" = "#d62728", "SL" = "#1f77b4")) +
    labs(title = "Pitch Tunneling: Horizontal Movement",
         subtitle = "Spencer Strider Fastball vs Slider",
         x = "Distance from Home Plate (feet)",
         y = "Horizontal Position (inches from center)",
         color = "Pitch Type") +
    theme_minimal() +
    theme(legend.position = "bottom")

  return(p)
}
Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pybaseball import statcast_pitcher
from pybaseball import playerid_lookup
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns

# Configure plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)

class PitchTrajectory:
    """Calculate and analyze pitch trajectories from Statcast data"""

    def __init__(self, pitch_data):
        """
        Initialize with pitch data containing:
        vx0, vy0, vz0, ax, ay, az, release_pos_x, release_pos_y, release_pos_z
        """
        self.vx0 = pitch_data['vx0']
        self.vy0 = pitch_data['vy0']
        self.vz0 = pitch_data['vz0']

        self.ax = pitch_data['ax']
        self.ay = pitch_data['ay']
        self.az = pitch_data['az']

        self.x0 = pitch_data['release_pos_x']
        self.y0 = pitch_data['release_pos_y']
        self.z0 = pitch_data['release_pos_z']

    def position_at_distance(self, y_distance):
        """
        Calculate x, y, z position when ball is at y_distance from plate

        Parameters:
        -----------
        y_distance : float
            Distance from home plate in feet (positive = toward pitcher)

        Returns:
        --------
        tuple : (x, y, z) position in feet
        """
        # Solve for time using quadratic formula
        # y(t) = y0 + vy0*t + 0.5*ay*t^2
        a = 0.5 * self.ay
        b = self.vy0
        c = self.y0 - y_distance

        discriminant = b**2 - 4*a*c
        if discriminant < 0:
            return None

        t = (-b - np.sqrt(discriminant)) / (2*a)

        if t < 0:
            return None

        x = self.x0 + self.vx0*t + 0.5*self.ax*t**2
        y = self.y0 + self.vy0*t + 0.5*self.ay*t**2
        z = self.z0 + self.vz0*t + 0.5*self.az*t**2

        return (x, y, z)

    def full_trajectory(self, num_points=100):
        """Generate complete trajectory from release to plate"""
        distances = np.linspace(self.y0, 1.417, num_points)  # 1.417 ft = 17 inches
        trajectory = []

        for d in distances:
            pos = self.position_at_distance(d)
            if pos:
                trajectory.append(pos)

        return np.array(trajectory)


def find_tunnel_point(traj1, traj2, threshold_inches=2.0):
    """
    Find where two pitch trajectories separate beyond threshold

    Parameters:
    -----------
    traj1, traj2 : PitchTrajectory objects
    threshold_inches : float
        Separation threshold in inches

    Returns:
    --------
    dict : tunnel point information
    """
    # Check distances from 50 feet to 5 feet
    distances = np.arange(50, 5, -0.5)

    for dist in distances:
        pos1 = traj1.position_at_distance(dist)
        pos2 = traj2.position_at_distance(dist)

        if pos1 is None or pos2 is None:
            continue

        # Calculate horizontal and vertical separation
        horiz_sep = np.sqrt((pos1[0] - pos2[0])**2) * 12  # Convert to inches
        vert_sep = abs(pos1[2] - pos2[2]) * 12
        total_sep = np.sqrt(horiz_sep**2 + vert_sep**2)

        if total_sep > threshold_inches:
            return {
                'tunnel_distance': dist,
                'total_separation': total_sep,
                'horizontal_separation': horiz_sep,
                'vertical_separation': vert_sep,
                'pos1': pos1,
                'pos2': pos2
            }

    return None


# Analyze Dylan Cease's tunneling
def analyze_pitcher_tunnel(pitcher_name, year=2023):
    """Comprehensive tunnel analysis for a pitcher"""

    # Get pitcher ID
    # Dylan Cease ID: 656302
    pitcher_id = 656302

    # Fetch data
    print(f"Fetching data for {pitcher_name} ({year})...")
    data = statcast_pitcher(f'{year}-04-01', f'{year}-09-30', pitcher_id)

    if data is None or len(data) == 0:
        print("No data found")
        return

    # Filter for primary pitches
    pitches = data[data['pitch_type'].isin(['FF', 'SL', 'CU'])].copy()

    # Remove null values
    required_cols = ['vx0', 'vy0', 'vz0', 'ax', 'ay', 'az',
                     'release_pos_x', 'release_pos_y', 'release_pos_z']
    pitches = pitches.dropna(subset=required_cols)

    print(f"\nAnalyzing {len(pitches)} pitches")
    print(f"Pitch mix: {pitches['pitch_type'].value_counts().to_dict()}")

    # Calculate average trajectory for each pitch type
    pitch_types = {}
    for ptype in ['FF', 'SL', 'CU']:
        subset = pitches[pitches['pitch_type'] == ptype]
        if len(subset) > 0:
            avg_data = subset[required_cols].mean()
            pitch_types[ptype] = PitchTrajectory(avg_data)

    # Analyze all pitch pairs
    results = {}
    pairs = [('FF', 'SL'), ('FF', 'CU'), ('SL', 'CU')]

    for p1, p2 in pairs:
        if p1 in pitch_types and p2 in pitch_types:
            tunnel = find_tunnel_point(pitch_types[p1], pitch_types[p2])
            if tunnel:
                results[f"{p1}-{p2}"] = tunnel

    # Print results
    print(f"\n{'='*60}")
    print(f"TUNNEL ANALYSIS: {pitcher_name} ({year})")
    print(f"{'='*60}\n")

    for pair, result in results.items():
        print(f"{pair} Tunnel:")
        print(f"  Distance from plate: {result['tunnel_distance']:.1f} feet")
        print(f"  Total separation: {result['total_separation']:.2f} inches")
        print(f"  Horizontal: {result['horizontal_separation']:.2f} inches")
        print(f"  Vertical: {result['vertical_separation']:.2f} inches")
        print()

    return pitches, pitch_types, results


# Example usage
pitches, trajectories, tunnel_results = analyze_pitcher_tunnel("Dylan Cease", 2023)


# Visualize 3D trajectories
def plot_3d_trajectories(pitch_types, pitcher_name):
    """Create 3D visualization of pitch trajectories"""

    fig = plt.figure(figsize=(14, 10))
    ax = fig.add_subplot(111, projection='3d')

    colors = {'FF': '#d62728', 'SL': '#1f77b4', 'CU': '#2ca02c', 'CH': '#ff7f0e'}
    labels = {'FF': 'Four-Seam FB', 'SL': 'Slider',
              'CU': 'Curveball', 'CH': 'Changeup'}

    for ptype, trajectory in pitch_types.items():
        path = trajectory.full_trajectory(num_points=200)

        ax.plot(path[:, 0], path[:, 1], path[:, 2],
                color=colors.get(ptype, 'gray'),
                linewidth=2.5,
                label=labels.get(ptype, ptype),
                alpha=0.8)

    # Add strike zone
    sz_x = [-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2]  # 17 inches wide
    sz_z = [1.5, 1.5, 3.5, 3.5, 1.5]  # Approximate zone
    sz_y = [1.417] * 5  # Front of plate

    ax.plot(sz_x, sz_y, sz_z, 'k-', linewidth=2, alpha=0.5)

    ax.set_xlabel('Horizontal Position (feet)', fontsize=11)
    ax.set_ylabel('Distance from Plate (feet)', fontsize=11)
    ax.set_zlabel('Height (feet)', fontsize=11)
    ax.set_title(f'{pitcher_name} - Pitch Trajectories (3D View)',
                 fontsize=14, fontweight='bold')

    ax.legend(loc='upper left', fontsize=10)
    ax.view_init(elev=15, azim=45)

    plt.tight_layout()
    return fig

# Create visualization
fig = plot_3d_trajectories(trajectories, "Dylan Cease")
plt.savefig('cease_tunnel_3d.png', dpi=300, bbox_inches='tight')
plt.show()

26.3 Release Point Consistency

Release point consistency is the foundation of effective pitch tunneling. Even slight variations in where the ball leaves the pitcher's hand can telegraph pitch type to observant hitters. Elite pitchers maintain release points within a 2-3 inch window across all pitch types.

Measuring Release Point Variance

Statcast tracks three-dimensional release position (x, y, z coordinates) for every pitch. We can analyze the standard deviation and range of these positions to quantify consistency.

# Analyze release point consistency
analyze_release_consistency <- function(pitcher_data) {

  release_analysis <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      n_pitches = n(),

      # Release position means
      mean_x = mean(release_pos_x, na.rm = TRUE),
      mean_y = mean(release_pos_y, na.rm = TRUE),
      mean_z = mean(release_pos_z, na.rm = TRUE),

      # Release position standard deviations (in inches)
      sd_x = sd(release_pos_x, na.rm = TRUE) * 12,
      sd_y = sd(release_pos_y, na.rm = TRUE) * 12,
      sd_z = sd(release_pos_z, na.rm = TRUE) * 12,

      # Total positional variance
      total_variance = sqrt(sd_x^2 + sd_z^2),

      .groups = 'drop'
    ) %>%
    arrange(desc(n_pitches))

  return(release_analysis)
}

# Analyze Shohei Ohtani's release consistency
ohtani_id <- 660271

ohtani_data <- statcast_search_pitchers(
  start_date = "2023-04-01",
  end_date = "2023-09-30",
  pitcherid = ohtani_id
)

ohtani_release <- analyze_release_consistency(ohtani_data)

print("Shohei Ohtani Release Point Consistency (2023)")
print(ohtani_release)

# Visualize release points
plot_release_consistency <- function(pitcher_data, pitcher_name) {

  # Calculate deviations from mean release point
  mean_release <- pitcher_data %>%
    summarise(
      mean_x = mean(release_pos_x, na.rm = TRUE),
      mean_z = mean(release_pos_z, na.rm = TRUE)
    )

  plot_data <- pitcher_data %>%
    filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
    mutate(
      rel_x_inches = (release_pos_x - mean_release$mean_x) * 12,
      rel_z_inches = (release_pos_z - mean_release$mean_z) * 12
    )

  p <- ggplot(plot_data, aes(x = rel_x_inches, y = rel_z_inches,
                              color = pitch_type)) +
    geom_point(alpha = 0.3, size = 1.5) +
    stat_ellipse(level = 0.95, size = 1) +
    facet_wrap(~pitch_type, ncol = 3) +
    geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
    geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
    labs(
      title = paste(pitcher_name, "- Release Point Consistency"),
      subtitle = "95% confidence ellipses by pitch type",
      x = "Horizontal Deviation from Mean (inches)",
      y = "Vertical Deviation from Mean (inches)",
      color = "Pitch Type"
    ) +
    theme_minimal() +
    theme(legend.position = "bottom") +
    coord_fixed(ratio = 1)

  return(p)
}

# Create visualization
p_release <- plot_release_consistency(ohtani_data, "Shohei Ohtani")
print(p_release)

# Calculate cross-pitch release consistency
calculate_release_distance_matrix <- function(pitcher_data) {

  pitch_types <- unique(pitcher_data$pitch_type)
  pitch_types <- pitch_types[!is.na(pitch_types)]

  # Calculate mean release point for each pitch type
  means <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      mean_x = mean(release_pos_x, na.rm = TRUE),
      mean_y = mean(release_pos_y, na.rm = TRUE),
      mean_z = mean(release_pos_z, na.rm = TRUE),
      .groups = 'drop'
    )

  # Create distance matrix
  n <- nrow(means)
  dist_matrix <- matrix(0, nrow = n, ncol = n)
  rownames(dist_matrix) <- means$pitch_type
  colnames(dist_matrix) <- means$pitch_type

  for (i in 1:n) {
    for (j in 1:n) {
      if (i != j) {
        dx <- means$mean_x[i] - means$mean_x[j]
        dy <- means$mean_y[i] - means$mean_y[j]
        dz <- means$mean_z[i] - means$mean_z[j]

        # Distance in inches
        dist_matrix[i, j] <- sqrt(dx^2 + dy^2 + dz^2) * 12
      }
    }
  }

  return(dist_matrix)
}

ohtani_distances <- calculate_release_distance_matrix(ohtani_data)
print("Release Point Distances Between Pitch Types (inches):")
print(round(ohtani_distances, 2))

Python Release Point Analysis

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import multivariate_normal
from matplotlib.patches import Ellipse

def analyze_release_point_consistency(pitcher_data):
    """
    Analyze release point consistency across pitch types

    Returns detailed statistics and visualizations
    """

    # Filter valid data
    valid_data = pitcher_data.dropna(
        subset=['release_pos_x', 'release_pos_y', 'release_pos_z', 'pitch_type']
    )

    # Calculate statistics by pitch type
    stats = valid_data.groupby('pitch_type').agg({
        'release_pos_x': ['mean', 'std', 'count'],
        'release_pos_y': ['mean', 'std'],
        'release_pos_z': ['mean', 'std']
    })

    # Convert standard deviations to inches
    stats[('release_pos_x', 'std')] *= 12
    stats[('release_pos_y', 'std')] *= 12
    stats[('release_pos_z', 'std')] *= 12

    # Calculate total variance in horizontal/vertical plane
    stats['total_variance'] = np.sqrt(
        stats[('release_pos_x', 'std')]**2 +
        stats[('release_pos_z', 'std')]**2
    )

    return stats


def plot_release_point_comparison(pitcher_data, pitcher_name):
    """
    Create comprehensive release point visualization
    """

    fig, axes = plt.subplots(2, 2, figsize=(14, 12))

    valid_data = pitcher_data.dropna(
        subset=['release_pos_x', 'release_pos_z', 'pitch_type']
    )

    # Get pitch types and colors
    pitch_types = valid_data['pitch_type'].unique()
    colors = plt.cm.Set2(np.linspace(0, 1, len(pitch_types)))
    color_map = dict(zip(pitch_types, colors))

    # Plot 1: Scatter plot with density
    ax1 = axes[0, 0]
    for ptype in pitch_types:
        subset = valid_data[valid_data['pitch_type'] == ptype]
        ax1.scatter(subset['release_pos_x'] * 12,
                   subset['release_pos_z'] * 12,
                   c=[color_map[ptype]],
                   label=ptype,
                   alpha=0.3,
                   s=20)

    ax1.set_xlabel('Horizontal Position (inches)', fontsize=11)
    ax1.set_ylabel('Vertical Position (inches)', fontsize=11)
    ax1.set_title('Release Point Scatter', fontsize=12, fontweight='bold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)

    # Plot 2: Confidence ellipses
    ax2 = axes[0, 1]
    for ptype in pitch_types:
        subset = valid_data[valid_data['pitch_type'] == ptype]

        if len(subset) < 3:
            continue

        x = subset['release_pos_x'].values * 12
        z = subset['release_pos_z'].values * 12

        # Calculate covariance and mean
        mean_x, mean_z = np.mean(x), np.mean(z)
        cov = np.cov(x, z)

        # Create confidence ellipse
        eigenvalues, eigenvectors = np.linalg.eig(cov)
        angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))

        # 95% confidence interval
        width, height = 2 * 2.448 * np.sqrt(eigenvalues)

        ellipse = Ellipse(xy=(mean_x, mean_z),
                         width=width,
                         height=height,
                         angle=angle,
                         facecolor=color_map[ptype],
                         alpha=0.3,
                         edgecolor=color_map[ptype],
                         linewidth=2,
                         label=ptype)

        ax2.add_patch(ellipse)
        ax2.plot(mean_x, mean_z, 'o', color=color_map[ptype],
                markersize=8, markeredgecolor='black', markeredgewidth=1)

    ax2.set_xlabel('Horizontal Position (inches)', fontsize=11)
    ax2.set_ylabel('Vertical Position (inches)', fontsize=11)
    ax2.set_title('95% Confidence Ellipses', fontsize=12, fontweight='bold')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    ax2.axis('equal')

    # Plot 3: Standard deviation by pitch type
    ax3 = axes[1, 0]
    stats = analyze_release_point_consistency(pitcher_data)

    x_std = stats[('release_pos_x', 'std')].values
    z_std = stats[('release_pos_z', 'std')].values
    pitch_labels = stats.index.tolist()

    x_pos = np.arange(len(pitch_labels))
    width = 0.35

    ax3.bar(x_pos - width/2, x_std, width, label='Horizontal SD', alpha=0.8)
    ax3.bar(x_pos + width/2, z_std, width, label='Vertical SD', alpha=0.8)

    ax3.set_xlabel('Pitch Type', fontsize=11)
    ax3.set_ylabel('Standard Deviation (inches)', fontsize=11)
    ax3.set_title('Release Point Variability', fontsize=12, fontweight='bold')
    ax3.set_xticks(x_pos)
    ax3.set_xticklabels(pitch_labels)
    ax3.legend()
    ax3.grid(True, alpha=0.3, axis='y')

    # Plot 4: Time series (by game date)
    ax4 = axes[1, 1]

    if 'game_date' in valid_data.columns:
        valid_data['game_date'] = pd.to_datetime(valid_data['game_date'])

        for ptype in pitch_types[:3]:  # Limit to top 3 pitch types
            subset = valid_data[valid_data['pitch_type'] == ptype]

            # Calculate rolling mean of release height
            subset_sorted = subset.sort_values('game_date')
            subset_sorted['release_z_inches'] = subset_sorted['release_pos_z'] * 12

            # Group by game date and calculate mean
            daily_avg = subset_sorted.groupby('game_date')['release_z_inches'].mean()

            ax4.plot(daily_avg.index, daily_avg.values,
                    label=ptype, marker='o', alpha=0.7)

        ax4.set_xlabel('Date', fontsize=11)
        ax4.set_ylabel('Release Height (inches)', fontsize=11)
        ax4.set_title('Release Point Consistency Over Season',
                     fontsize=12, fontweight='bold')
        ax4.legend()
        ax4.grid(True, alpha=0.3)
        plt.setp(ax4.xaxis.get_majorticklabels(), rotation=45)

    plt.suptitle(f'{pitcher_name} - Release Point Analysis',
                fontsize=14, fontweight='bold', y=1.00)
    plt.tight_layout()

    return fig

# Example usage with Ohtani's data
# Assuming we have ohtani_data loaded
# fig = plot_release_point_comparison(pitches, "Shohei Ohtani")
# plt.savefig('ohtani_release_analysis.png', dpi=300, bbox_inches='tight')
R
# Analyze release point consistency
analyze_release_consistency <- function(pitcher_data) {

  release_analysis <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      n_pitches = n(),

      # Release position means
      mean_x = mean(release_pos_x, na.rm = TRUE),
      mean_y = mean(release_pos_y, na.rm = TRUE),
      mean_z = mean(release_pos_z, na.rm = TRUE),

      # Release position standard deviations (in inches)
      sd_x = sd(release_pos_x, na.rm = TRUE) * 12,
      sd_y = sd(release_pos_y, na.rm = TRUE) * 12,
      sd_z = sd(release_pos_z, na.rm = TRUE) * 12,

      # Total positional variance
      total_variance = sqrt(sd_x^2 + sd_z^2),

      .groups = 'drop'
    ) %>%
    arrange(desc(n_pitches))

  return(release_analysis)
}

# Analyze Shohei Ohtani's release consistency
ohtani_id <- 660271

ohtani_data <- statcast_search_pitchers(
  start_date = "2023-04-01",
  end_date = "2023-09-30",
  pitcherid = ohtani_id
)

ohtani_release <- analyze_release_consistency(ohtani_data)

print("Shohei Ohtani Release Point Consistency (2023)")
print(ohtani_release)

# Visualize release points
plot_release_consistency <- function(pitcher_data, pitcher_name) {

  # Calculate deviations from mean release point
  mean_release <- pitcher_data %>%
    summarise(
      mean_x = mean(release_pos_x, na.rm = TRUE),
      mean_z = mean(release_pos_z, na.rm = TRUE)
    )

  plot_data <- pitcher_data %>%
    filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
    mutate(
      rel_x_inches = (release_pos_x - mean_release$mean_x) * 12,
      rel_z_inches = (release_pos_z - mean_release$mean_z) * 12
    )

  p <- ggplot(plot_data, aes(x = rel_x_inches, y = rel_z_inches,
                              color = pitch_type)) +
    geom_point(alpha = 0.3, size = 1.5) +
    stat_ellipse(level = 0.95, size = 1) +
    facet_wrap(~pitch_type, ncol = 3) +
    geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
    geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
    labs(
      title = paste(pitcher_name, "- Release Point Consistency"),
      subtitle = "95% confidence ellipses by pitch type",
      x = "Horizontal Deviation from Mean (inches)",
      y = "Vertical Deviation from Mean (inches)",
      color = "Pitch Type"
    ) +
    theme_minimal() +
    theme(legend.position = "bottom") +
    coord_fixed(ratio = 1)

  return(p)
}

# Create visualization
p_release <- plot_release_consistency(ohtani_data, "Shohei Ohtani")
print(p_release)

# Calculate cross-pitch release consistency
calculate_release_distance_matrix <- function(pitcher_data) {

  pitch_types <- unique(pitcher_data$pitch_type)
  pitch_types <- pitch_types[!is.na(pitch_types)]

  # Calculate mean release point for each pitch type
  means <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      mean_x = mean(release_pos_x, na.rm = TRUE),
      mean_y = mean(release_pos_y, na.rm = TRUE),
      mean_z = mean(release_pos_z, na.rm = TRUE),
      .groups = 'drop'
    )

  # Create distance matrix
  n <- nrow(means)
  dist_matrix <- matrix(0, nrow = n, ncol = n)
  rownames(dist_matrix) <- means$pitch_type
  colnames(dist_matrix) <- means$pitch_type

  for (i in 1:n) {
    for (j in 1:n) {
      if (i != j) {
        dx <- means$mean_x[i] - means$mean_x[j]
        dy <- means$mean_y[i] - means$mean_y[j]
        dz <- means$mean_z[i] - means$mean_z[j]

        # Distance in inches
        dist_matrix[i, j] <- sqrt(dx^2 + dy^2 + dz^2) * 12
      }
    }
  }

  return(dist_matrix)
}

ohtani_distances <- calculate_release_distance_matrix(ohtani_data)
print("Release Point Distances Between Pitch Types (inches):")
print(round(ohtani_distances, 2))
Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import multivariate_normal
from matplotlib.patches import Ellipse

def analyze_release_point_consistency(pitcher_data):
    """
    Analyze release point consistency across pitch types

    Returns detailed statistics and visualizations
    """

    # Filter valid data
    valid_data = pitcher_data.dropna(
        subset=['release_pos_x', 'release_pos_y', 'release_pos_z', 'pitch_type']
    )

    # Calculate statistics by pitch type
    stats = valid_data.groupby('pitch_type').agg({
        'release_pos_x': ['mean', 'std', 'count'],
        'release_pos_y': ['mean', 'std'],
        'release_pos_z': ['mean', 'std']
    })

    # Convert standard deviations to inches
    stats[('release_pos_x', 'std')] *= 12
    stats[('release_pos_y', 'std')] *= 12
    stats[('release_pos_z', 'std')] *= 12

    # Calculate total variance in horizontal/vertical plane
    stats['total_variance'] = np.sqrt(
        stats[('release_pos_x', 'std')]**2 +
        stats[('release_pos_z', 'std')]**2
    )

    return stats


def plot_release_point_comparison(pitcher_data, pitcher_name):
    """
    Create comprehensive release point visualization
    """

    fig, axes = plt.subplots(2, 2, figsize=(14, 12))

    valid_data = pitcher_data.dropna(
        subset=['release_pos_x', 'release_pos_z', 'pitch_type']
    )

    # Get pitch types and colors
    pitch_types = valid_data['pitch_type'].unique()
    colors = plt.cm.Set2(np.linspace(0, 1, len(pitch_types)))
    color_map = dict(zip(pitch_types, colors))

    # Plot 1: Scatter plot with density
    ax1 = axes[0, 0]
    for ptype in pitch_types:
        subset = valid_data[valid_data['pitch_type'] == ptype]
        ax1.scatter(subset['release_pos_x'] * 12,
                   subset['release_pos_z'] * 12,
                   c=[color_map[ptype]],
                   label=ptype,
                   alpha=0.3,
                   s=20)

    ax1.set_xlabel('Horizontal Position (inches)', fontsize=11)
    ax1.set_ylabel('Vertical Position (inches)', fontsize=11)
    ax1.set_title('Release Point Scatter', fontsize=12, fontweight='bold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)

    # Plot 2: Confidence ellipses
    ax2 = axes[0, 1]
    for ptype in pitch_types:
        subset = valid_data[valid_data['pitch_type'] == ptype]

        if len(subset) < 3:
            continue

        x = subset['release_pos_x'].values * 12
        z = subset['release_pos_z'].values * 12

        # Calculate covariance and mean
        mean_x, mean_z = np.mean(x), np.mean(z)
        cov = np.cov(x, z)

        # Create confidence ellipse
        eigenvalues, eigenvectors = np.linalg.eig(cov)
        angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))

        # 95% confidence interval
        width, height = 2 * 2.448 * np.sqrt(eigenvalues)

        ellipse = Ellipse(xy=(mean_x, mean_z),
                         width=width,
                         height=height,
                         angle=angle,
                         facecolor=color_map[ptype],
                         alpha=0.3,
                         edgecolor=color_map[ptype],
                         linewidth=2,
                         label=ptype)

        ax2.add_patch(ellipse)
        ax2.plot(mean_x, mean_z, 'o', color=color_map[ptype],
                markersize=8, markeredgecolor='black', markeredgewidth=1)

    ax2.set_xlabel('Horizontal Position (inches)', fontsize=11)
    ax2.set_ylabel('Vertical Position (inches)', fontsize=11)
    ax2.set_title('95% Confidence Ellipses', fontsize=12, fontweight='bold')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    ax2.axis('equal')

    # Plot 3: Standard deviation by pitch type
    ax3 = axes[1, 0]
    stats = analyze_release_point_consistency(pitcher_data)

    x_std = stats[('release_pos_x', 'std')].values
    z_std = stats[('release_pos_z', 'std')].values
    pitch_labels = stats.index.tolist()

    x_pos = np.arange(len(pitch_labels))
    width = 0.35

    ax3.bar(x_pos - width/2, x_std, width, label='Horizontal SD', alpha=0.8)
    ax3.bar(x_pos + width/2, z_std, width, label='Vertical SD', alpha=0.8)

    ax3.set_xlabel('Pitch Type', fontsize=11)
    ax3.set_ylabel('Standard Deviation (inches)', fontsize=11)
    ax3.set_title('Release Point Variability', fontsize=12, fontweight='bold')
    ax3.set_xticks(x_pos)
    ax3.set_xticklabels(pitch_labels)
    ax3.legend()
    ax3.grid(True, alpha=0.3, axis='y')

    # Plot 4: Time series (by game date)
    ax4 = axes[1, 1]

    if 'game_date' in valid_data.columns:
        valid_data['game_date'] = pd.to_datetime(valid_data['game_date'])

        for ptype in pitch_types[:3]:  # Limit to top 3 pitch types
            subset = valid_data[valid_data['pitch_type'] == ptype]

            # Calculate rolling mean of release height
            subset_sorted = subset.sort_values('game_date')
            subset_sorted['release_z_inches'] = subset_sorted['release_pos_z'] * 12

            # Group by game date and calculate mean
            daily_avg = subset_sorted.groupby('game_date')['release_z_inches'].mean()

            ax4.plot(daily_avg.index, daily_avg.values,
                    label=ptype, marker='o', alpha=0.7)

        ax4.set_xlabel('Date', fontsize=11)
        ax4.set_ylabel('Release Height (inches)', fontsize=11)
        ax4.set_title('Release Point Consistency Over Season',
                     fontsize=12, fontweight='bold')
        ax4.legend()
        ax4.grid(True, alpha=0.3)
        plt.setp(ax4.xaxis.get_majorticklabels(), rotation=45)

    plt.suptitle(f'{pitcher_name} - Release Point Analysis',
                fontsize=14, fontweight='bold', y=1.00)
    plt.tight_layout()

    return fig

# Example usage with Ohtani's data
# Assuming we have ohtani_data loaded
# fig = plot_release_point_comparison(pitches, "Shohei Ohtani")
# plt.savefig('ohtani_release_analysis.png', dpi=300, bbox_inches='tight')

26.4 Induced Vertical and Horizontal Break Differences

The effectiveness of pitch tunneling depends not just on similar early trajectories, but on meaningful late separation. Pitches that tunnel well together must ultimately break to different locations, forcing the hitter to commit to one zone or another.

Understanding Induced Vertical Break (IVB)

Induced Vertical Break (IVB) represents the vertical movement of a pitch relative to a spinless pitch affected only by gravity. A positive IVB means the pitch "rises" (actually, it drops less than gravity alone would cause), while negative IVB indicates the pitch drops more than gravity alone.

For reference:


  • Elite four-seam fastballs: +15 to +20 inches IVB

  • Sliders: -5 to +5 inches IVB

  • Curveballs: -15 to -20 inches IVB

  • Changeups: +5 to +10 inches IVB

The IVB difference between tunneling pairs creates vertical separation at the plate.

Horizontal Break and Arm-Side vs. Glove-Side Movement

Horizontal break describes side-to-side movement. For right-handed pitchers:


  • Positive horizontal break = movement toward first base (arm side)

  • Negative horizontal break = movement toward third base (glove side)

Effective tunneling often pairs pitches with opposite horizontal movement (e.g., a fastball with arm-side run paired with a glove-side slider).

Calculating Break Differentials

# Analyze break differentials for tunneling pairs
analyze_break_differentials <- function(pitcher_data) {

  # Calculate average break by pitch type
  break_summary <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      n = n(),
      avg_speed = mean(release_speed, na.rm = TRUE),

      # Induced vertical break (pfx_z in feet, convert to inches)
      ivb = mean(pfx_z, na.rm = TRUE) * 12,
      ivb_sd = sd(pfx_z, na.rm = TRUE) * 12,

      # Horizontal break (pfx_x in feet, convert to inches)
      hb = mean(pfx_x, na.rm = TRUE) * 12,
      hb_sd = sd(pfx_x, na.rm = TRUE) * 12,

      # Total break magnitude
      total_break = sqrt(ivb^2 + hb^2),

      # Spin rate
      spin_rate = mean(release_spin_rate, na.rm = TRUE),

      .groups = 'drop'
    ) %>%
    arrange(desc(n))

  return(break_summary)
}

# Analyze Spencer Strider's arsenal
strider_breaks <- analyze_break_differentials(strider_data)

print("Spencer Strider Break Profile (2023)")
print(strider_breaks)

# Calculate pairwise break differentials
calculate_break_differential_matrix <- function(pitcher_data) {

  break_data <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      ivb = mean(pfx_z, na.rm = TRUE) * 12,
      hb = mean(pfx_x, na.rm = TRUE) * 12,
      .groups = 'drop'
    )

  n <- nrow(break_data)

  # Create matrices for IVB and HB differentials
  ivb_diff <- matrix(0, nrow = n, ncol = n)
  hb_diff <- matrix(0, nrow = n, ncol = n)
  total_diff <- matrix(0, nrow = n, ncol = n)

  rownames(ivb_diff) <- rownames(hb_diff) <- rownames(total_diff) <- break_data$pitch_type
  colnames(ivb_diff) <- colnames(hb_diff) <- colnames(total_diff) <- break_data$pitch_type

  for (i in 1:n) {
    for (j in 1:n) {
      ivb_diff[i, j] <- break_data$ivb[i] - break_data$ivb[j]
      hb_diff[i, j] <- break_data$hb[i] - break_data$hb[j]
      total_diff[i, j] <- sqrt(ivb_diff[i, j]^2 + hb_diff[i, j]^2)
    }
  }

  return(list(
    ivb_differential = ivb_diff,
    hb_differential = hb_diff,
    total_differential = total_diff
  ))
}

strider_differentials <- calculate_break_differential_matrix(strider_data)

print("\nInduced Vertical Break Differentials (inches):")
print(round(strider_differentials$ivb_differential, 1))

print("\nHorizontal Break Differentials (inches):")
print(round(strider_differentials$hb_differential, 1))

print("\nTotal Break Differentials (inches):")
print(round(strider_differentials$total_differential, 1))

# Visualize break profiles
plot_break_profile <- function(pitcher_data, pitcher_name) {

  plot_data <- pitcher_data %>%
    filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
    mutate(
      hb_inches = pfx_x * 12,
      ivb_inches = pfx_z * 12
    )

  # Calculate means for labels
  means <- plot_data %>%
    group_by(pitch_type) %>%
    summarise(
      hb = mean(hb_inches),
      ivb = mean(ivb_inches),
      n = n(),
      .groups = 'drop'
    )

  p <- ggplot(plot_data, aes(x = hb_inches, y = ivb_inches,
                              color = pitch_type)) +
    geom_point(alpha = 0.2, size = 2) +
    stat_ellipse(level = 0.75, size = 1.2) +
    geom_point(data = means, aes(x = hb, y = ivb),
               size = 5, shape = 17) +
    geom_text(data = means,
              aes(x = hb, y = ivb,
                  label = paste0(pitch_type, "\n(n=", n, ")")),
              vjust = -1.5, size = 3.5, fontface = "bold") +
    geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
    geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
    labs(
      title = paste(pitcher_name, "- Pitch Movement Profile"),
      subtitle = "Induced vertical break vs. horizontal break (catcher's perspective)",
      x = "Horizontal Break (inches, negative = glove side)",
      y = "Induced Vertical Break (inches, positive = rise)",
      color = "Pitch Type",
      caption = "Triangles represent mean movement | Ellipses show 75% confidence intervals"
    ) +
    theme_minimal() +
    theme(
      legend.position = "right",
      plot.title = element_text(size = 14, face = "bold"),
      plot.subtitle = element_text(size = 10)
    ) +
    coord_fixed(ratio = 1)

  return(p)
}

p_breaks <- plot_break_profile(strider_data, "Spencer Strider")
print(p_breaks)

Advanced Break Analysis in Python

def analyze_pitch_break_profiles(pitcher_data):
    """
    Comprehensive analysis of pitch movement profiles

    Returns break statistics and optimal tunneling pairs
    """

    # Filter valid data
    valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])

    # Calculate break statistics by pitch type
    break_stats = valid.groupby('pitch_type').agg({
        'pfx_x': ['mean', 'std'],
        'pfx_z': ['mean', 'std'],
        'release_speed': 'mean',
        'release_spin_rate': 'mean',
        'pitch_type': 'count'
    })

    # Convert to inches
    break_stats[('pfx_x', 'mean')] *= 12
    break_stats[('pfx_x', 'std')] *= 12
    break_stats[('pfx_z', 'mean')] *= 12
    break_stats[('pfx_z', 'std')] *= 12

    # Rename columns for clarity
    break_stats.columns = ['hb_mean', 'hb_std', 'ivb_mean', 'ivb_std',
                           'velo', 'spin', 'count']

    # Calculate total break
    break_stats['total_break'] = np.sqrt(
        break_stats['hb_mean']**2 + break_stats['ivb_mean']**2
    )

    # Calculate break angle
    break_stats['break_angle'] = np.degrees(
        np.arctan2(break_stats['ivb_mean'], break_stats['hb_mean'])
    )

    return break_stats


def find_optimal_tunnel_pairs(pitcher_data, min_break_diff=6, max_velo_diff=10):
    """
    Identify optimal pitch pairs for tunneling based on:
    1. Sufficient break differential (creates deception)
    2. Similar velocity (similar initial trajectory)
    3. Different movement profiles
    """

    break_stats = analyze_pitch_break_profiles(pitcher_data)

    pitch_types = break_stats.index.tolist()
    tunnel_pairs = []

    for i, p1 in enumerate(pitch_types):
        for p2 in pitch_types[i+1:]:

            # Calculate differentials
            velo_diff = abs(break_stats.loc[p1, 'velo'] -
                          break_stats.loc[p2, 'velo'])

            ivb_diff = abs(break_stats.loc[p1, 'ivb_mean'] -
                         break_stats.loc[p2, 'ivb_mean'])

            hb_diff = abs(break_stats.loc[p1, 'hb_mean'] -
                        break_stats.loc[p2, 'hb_mean'])

            total_break_diff = np.sqrt(ivb_diff**2 + hb_diff**2)

            # Check if pair meets criteria
            if total_break_diff >= min_break_diff and velo_diff <= max_velo_diff:

                # Calculate tunnel quality score
                # Higher score = better tunneling potential
                tunnel_score = (total_break_diff / 20) * (1 - velo_diff / 15)

                tunnel_pairs.append({
                    'pitch1': p1,
                    'pitch2': p2,
                    'velo_diff': velo_diff,
                    'ivb_diff': ivb_diff,
                    'hb_diff': hb_diff,
                    'total_break_diff': total_break_diff,
                    'tunnel_score': tunnel_score
                })

    return pd.DataFrame(tunnel_pairs).sort_values('tunnel_score', ascending=False)


def visualize_break_profiles_advanced(pitcher_data, pitcher_name):
    """
    Create advanced visualization of pitch break profiles
    """

    fig, axes = plt.subplots(2, 2, figsize=(15, 14))

    valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])

    # Convert to inches
    valid['hb'] = valid['pfx_x'] * 12
    valid['ivb'] = valid['pfx_z'] * 12

    pitch_types = valid['pitch_type'].value_counts().head(6).index.tolist()
    colors = plt.cm.tab10(np.linspace(0, 1, len(pitch_types)))
    color_map = dict(zip(pitch_types, colors))

    # Plot 1: Movement profile (traditional view)
    ax1 = axes[0, 0]

    for ptype in pitch_types:
        subset = valid[valid['pitch_type'] == ptype]

        ax1.scatter(subset['hb'], subset['ivb'],
                   c=[color_map[ptype]], label=ptype,
                   alpha=0.3, s=15)

        # Add mean marker
        mean_hb = subset['hb'].mean()
        mean_ivb = subset['ivb'].mean()
        ax1.scatter(mean_hb, mean_ivb, c=[color_map[ptype]],
                   s=200, marker='*', edgecolors='black', linewidths=1.5)

    ax1.axhline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
    ax1.axvline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
    ax1.set_xlabel('Horizontal Break (inches)', fontsize=11)
    ax1.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
    ax1.set_title('Movement Profile (Catcher\'s View)', fontsize=12, fontweight='bold')
    ax1.legend(loc='best', fontsize=9)
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal', adjustable='box')

    # Plot 2: Break vs. Velocity
    ax2 = axes[0, 1]

    for ptype in pitch_types:
        subset = valid[valid['pitch_type'] == ptype]
        total_break = np.sqrt(subset['hb']**2 + subset['ivb']**2)

        ax2.scatter(subset['release_speed'], total_break,
                   c=[color_map[ptype]], label=ptype,
                   alpha=0.4, s=20)

    ax2.set_xlabel('Release Speed (mph)', fontsize=11)
    ax2.set_ylabel('Total Break (inches)', fontsize=11)
    ax2.set_title('Break vs. Velocity', fontsize=12, fontweight='bold')
    ax2.legend(loc='best', fontsize=9)
    ax2.grid(True, alpha=0.3)

    # Plot 3: Break angle distribution
    ax3 = axes[1, 0]

    for ptype in pitch_types:
        subset = valid[valid['pitch_type'] == ptype]
        break_angle = np.degrees(np.arctan2(subset['ivb'], subset['hb']))

        ax3.hist(break_angle, bins=20, alpha=0.5,
                label=ptype, color=color_map[ptype])

    ax3.set_xlabel('Break Angle (degrees)', fontsize=11)
    ax3.set_ylabel('Frequency', fontsize=11)
    ax3.set_title('Break Angle Distribution', fontsize=12, fontweight='bold')
    ax3.legend(loc='best', fontsize=9)
    ax3.grid(True, alpha=0.3, axis='y')

    # Plot 4: Spin rate vs. IVB
    ax4 = axes[1, 1]

    valid_spin = valid.dropna(subset=['release_spin_rate'])

    for ptype in pitch_types:
        subset = valid_spin[valid_spin['pitch_type'] == ptype]

        if len(subset) > 0:
            ax4.scatter(subset['release_spin_rate'], subset['ivb'],
                       c=[color_map[ptype]], label=ptype,
                       alpha=0.4, s=20)

    ax4.set_xlabel('Spin Rate (rpm)', fontsize=11)
    ax4.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
    ax4.set_title('Spin Rate vs. IVB', fontsize=12, fontweight='bold')
    ax4.legend(loc='best', fontsize=9)
    ax4.grid(True, alpha=0.3)

    plt.suptitle(f'{pitcher_name} - Advanced Break Analysis',
                fontsize=14, fontweight='bold', y=0.995)
    plt.tight_layout()

    return fig


# Example usage
# break_analysis = analyze_pitch_break_profiles(pitches)
# optimal_pairs = find_optimal_tunnel_pairs(pitches)
# print("\nOptimal Tunneling Pairs:")
# print(optimal_pairs)
R
# Analyze break differentials for tunneling pairs
analyze_break_differentials <- function(pitcher_data) {

  # Calculate average break by pitch type
  break_summary <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      n = n(),
      avg_speed = mean(release_speed, na.rm = TRUE),

      # Induced vertical break (pfx_z in feet, convert to inches)
      ivb = mean(pfx_z, na.rm = TRUE) * 12,
      ivb_sd = sd(pfx_z, na.rm = TRUE) * 12,

      # Horizontal break (pfx_x in feet, convert to inches)
      hb = mean(pfx_x, na.rm = TRUE) * 12,
      hb_sd = sd(pfx_x, na.rm = TRUE) * 12,

      # Total break magnitude
      total_break = sqrt(ivb^2 + hb^2),

      # Spin rate
      spin_rate = mean(release_spin_rate, na.rm = TRUE),

      .groups = 'drop'
    ) %>%
    arrange(desc(n))

  return(break_summary)
}

# Analyze Spencer Strider's arsenal
strider_breaks <- analyze_break_differentials(strider_data)

print("Spencer Strider Break Profile (2023)")
print(strider_breaks)

# Calculate pairwise break differentials
calculate_break_differential_matrix <- function(pitcher_data) {

  break_data <- pitcher_data %>%
    group_by(pitch_type) %>%
    summarise(
      ivb = mean(pfx_z, na.rm = TRUE) * 12,
      hb = mean(pfx_x, na.rm = TRUE) * 12,
      .groups = 'drop'
    )

  n <- nrow(break_data)

  # Create matrices for IVB and HB differentials
  ivb_diff <- matrix(0, nrow = n, ncol = n)
  hb_diff <- matrix(0, nrow = n, ncol = n)
  total_diff <- matrix(0, nrow = n, ncol = n)

  rownames(ivb_diff) <- rownames(hb_diff) <- rownames(total_diff) <- break_data$pitch_type
  colnames(ivb_diff) <- colnames(hb_diff) <- colnames(total_diff) <- break_data$pitch_type

  for (i in 1:n) {
    for (j in 1:n) {
      ivb_diff[i, j] <- break_data$ivb[i] - break_data$ivb[j]
      hb_diff[i, j] <- break_data$hb[i] - break_data$hb[j]
      total_diff[i, j] <- sqrt(ivb_diff[i, j]^2 + hb_diff[i, j]^2)
    }
  }

  return(list(
    ivb_differential = ivb_diff,
    hb_differential = hb_diff,
    total_differential = total_diff
  ))
}

strider_differentials <- calculate_break_differential_matrix(strider_data)

print("\nInduced Vertical Break Differentials (inches):")
print(round(strider_differentials$ivb_differential, 1))

print("\nHorizontal Break Differentials (inches):")
print(round(strider_differentials$hb_differential, 1))

print("\nTotal Break Differentials (inches):")
print(round(strider_differentials$total_differential, 1))

# Visualize break profiles
plot_break_profile <- function(pitcher_data, pitcher_name) {

  plot_data <- pitcher_data %>%
    filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
    mutate(
      hb_inches = pfx_x * 12,
      ivb_inches = pfx_z * 12
    )

  # Calculate means for labels
  means <- plot_data %>%
    group_by(pitch_type) %>%
    summarise(
      hb = mean(hb_inches),
      ivb = mean(ivb_inches),
      n = n(),
      .groups = 'drop'
    )

  p <- ggplot(plot_data, aes(x = hb_inches, y = ivb_inches,
                              color = pitch_type)) +
    geom_point(alpha = 0.2, size = 2) +
    stat_ellipse(level = 0.75, size = 1.2) +
    geom_point(data = means, aes(x = hb, y = ivb),
               size = 5, shape = 17) +
    geom_text(data = means,
              aes(x = hb, y = ivb,
                  label = paste0(pitch_type, "\n(n=", n, ")")),
              vjust = -1.5, size = 3.5, fontface = "bold") +
    geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
    geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
    labs(
      title = paste(pitcher_name, "- Pitch Movement Profile"),
      subtitle = "Induced vertical break vs. horizontal break (catcher's perspective)",
      x = "Horizontal Break (inches, negative = glove side)",
      y = "Induced Vertical Break (inches, positive = rise)",
      color = "Pitch Type",
      caption = "Triangles represent mean movement | Ellipses show 75% confidence intervals"
    ) +
    theme_minimal() +
    theme(
      legend.position = "right",
      plot.title = element_text(size = 14, face = "bold"),
      plot.subtitle = element_text(size = 10)
    ) +
    coord_fixed(ratio = 1)

  return(p)
}

p_breaks <- plot_break_profile(strider_data, "Spencer Strider")
print(p_breaks)
Python
def analyze_pitch_break_profiles(pitcher_data):
    """
    Comprehensive analysis of pitch movement profiles

    Returns break statistics and optimal tunneling pairs
    """

    # Filter valid data
    valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])

    # Calculate break statistics by pitch type
    break_stats = valid.groupby('pitch_type').agg({
        'pfx_x': ['mean', 'std'],
        'pfx_z': ['mean', 'std'],
        'release_speed': 'mean',
        'release_spin_rate': 'mean',
        'pitch_type': 'count'
    })

    # Convert to inches
    break_stats[('pfx_x', 'mean')] *= 12
    break_stats[('pfx_x', 'std')] *= 12
    break_stats[('pfx_z', 'mean')] *= 12
    break_stats[('pfx_z', 'std')] *= 12

    # Rename columns for clarity
    break_stats.columns = ['hb_mean', 'hb_std', 'ivb_mean', 'ivb_std',
                           'velo', 'spin', 'count']

    # Calculate total break
    break_stats['total_break'] = np.sqrt(
        break_stats['hb_mean']**2 + break_stats['ivb_mean']**2
    )

    # Calculate break angle
    break_stats['break_angle'] = np.degrees(
        np.arctan2(break_stats['ivb_mean'], break_stats['hb_mean'])
    )

    return break_stats


def find_optimal_tunnel_pairs(pitcher_data, min_break_diff=6, max_velo_diff=10):
    """
    Identify optimal pitch pairs for tunneling based on:
    1. Sufficient break differential (creates deception)
    2. Similar velocity (similar initial trajectory)
    3. Different movement profiles
    """

    break_stats = analyze_pitch_break_profiles(pitcher_data)

    pitch_types = break_stats.index.tolist()
    tunnel_pairs = []

    for i, p1 in enumerate(pitch_types):
        for p2 in pitch_types[i+1:]:

            # Calculate differentials
            velo_diff = abs(break_stats.loc[p1, 'velo'] -
                          break_stats.loc[p2, 'velo'])

            ivb_diff = abs(break_stats.loc[p1, 'ivb_mean'] -
                         break_stats.loc[p2, 'ivb_mean'])

            hb_diff = abs(break_stats.loc[p1, 'hb_mean'] -
                        break_stats.loc[p2, 'hb_mean'])

            total_break_diff = np.sqrt(ivb_diff**2 + hb_diff**2)

            # Check if pair meets criteria
            if total_break_diff >= min_break_diff and velo_diff <= max_velo_diff:

                # Calculate tunnel quality score
                # Higher score = better tunneling potential
                tunnel_score = (total_break_diff / 20) * (1 - velo_diff / 15)

                tunnel_pairs.append({
                    'pitch1': p1,
                    'pitch2': p2,
                    'velo_diff': velo_diff,
                    'ivb_diff': ivb_diff,
                    'hb_diff': hb_diff,
                    'total_break_diff': total_break_diff,
                    'tunnel_score': tunnel_score
                })

    return pd.DataFrame(tunnel_pairs).sort_values('tunnel_score', ascending=False)


def visualize_break_profiles_advanced(pitcher_data, pitcher_name):
    """
    Create advanced visualization of pitch break profiles
    """

    fig, axes = plt.subplots(2, 2, figsize=(15, 14))

    valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])

    # Convert to inches
    valid['hb'] = valid['pfx_x'] * 12
    valid['ivb'] = valid['pfx_z'] * 12

    pitch_types = valid['pitch_type'].value_counts().head(6).index.tolist()
    colors = plt.cm.tab10(np.linspace(0, 1, len(pitch_types)))
    color_map = dict(zip(pitch_types, colors))

    # Plot 1: Movement profile (traditional view)
    ax1 = axes[0, 0]

    for ptype in pitch_types:
        subset = valid[valid['pitch_type'] == ptype]

        ax1.scatter(subset['hb'], subset['ivb'],
                   c=[color_map[ptype]], label=ptype,
                   alpha=0.3, s=15)

        # Add mean marker
        mean_hb = subset['hb'].mean()
        mean_ivb = subset['ivb'].mean()
        ax1.scatter(mean_hb, mean_ivb, c=[color_map[ptype]],
                   s=200, marker='*', edgecolors='black', linewidths=1.5)

    ax1.axhline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
    ax1.axvline(0, color='black', linestyle='--', alpha=0.3, linewidth=1)
    ax1.set_xlabel('Horizontal Break (inches)', fontsize=11)
    ax1.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
    ax1.set_title('Movement Profile (Catcher\'s View)', fontsize=12, fontweight='bold')
    ax1.legend(loc='best', fontsize=9)
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal', adjustable='box')

    # Plot 2: Break vs. Velocity
    ax2 = axes[0, 1]

    for ptype in pitch_types:
        subset = valid[valid['pitch_type'] == ptype]
        total_break = np.sqrt(subset['hb']**2 + subset['ivb']**2)

        ax2.scatter(subset['release_speed'], total_break,
                   c=[color_map[ptype]], label=ptype,
                   alpha=0.4, s=20)

    ax2.set_xlabel('Release Speed (mph)', fontsize=11)
    ax2.set_ylabel('Total Break (inches)', fontsize=11)
    ax2.set_title('Break vs. Velocity', fontsize=12, fontweight='bold')
    ax2.legend(loc='best', fontsize=9)
    ax2.grid(True, alpha=0.3)

    # Plot 3: Break angle distribution
    ax3 = axes[1, 0]

    for ptype in pitch_types:
        subset = valid[valid['pitch_type'] == ptype]
        break_angle = np.degrees(np.arctan2(subset['ivb'], subset['hb']))

        ax3.hist(break_angle, bins=20, alpha=0.5,
                label=ptype, color=color_map[ptype])

    ax3.set_xlabel('Break Angle (degrees)', fontsize=11)
    ax3.set_ylabel('Frequency', fontsize=11)
    ax3.set_title('Break Angle Distribution', fontsize=12, fontweight='bold')
    ax3.legend(loc='best', fontsize=9)
    ax3.grid(True, alpha=0.3, axis='y')

    # Plot 4: Spin rate vs. IVB
    ax4 = axes[1, 1]

    valid_spin = valid.dropna(subset=['release_spin_rate'])

    for ptype in pitch_types:
        subset = valid_spin[valid_spin['pitch_type'] == ptype]

        if len(subset) > 0:
            ax4.scatter(subset['release_spin_rate'], subset['ivb'],
                       c=[color_map[ptype]], label=ptype,
                       alpha=0.4, s=20)

    ax4.set_xlabel('Spin Rate (rpm)', fontsize=11)
    ax4.set_ylabel('Induced Vertical Break (inches)', fontsize=11)
    ax4.set_title('Spin Rate vs. IVB', fontsize=12, fontweight='bold')
    ax4.legend(loc='best', fontsize=9)
    ax4.grid(True, alpha=0.3)

    plt.suptitle(f'{pitcher_name} - Advanced Break Analysis',
                fontsize=14, fontweight='bold', y=0.995)
    plt.tight_layout()

    return fig


# Example usage
# break_analysis = analyze_pitch_break_profiles(pitches)
# optimal_pairs = find_optimal_tunnel_pairs(pitches)
# print("\nOptimal Tunneling Pairs:")
# print(optimal_pairs)

26.5 Pitch Sequencing Strategies

Effective tunneling isn't just about having pitches that can tunnel—it's about using them strategically in sequences that maximize deception and exploit hitter weaknesses.

Common Tunneling Sequences

The Classic Fastball-Breaking Ball Tunnel

The most fundamental tunneling sequence pairs a high-velocity fastball with a breaking ball that has similar early trajectory. Spencer Strider's fastball-slider combination exemplifies this:

  1. Fastball up in zone (98+ mph, +18 inches IVB)
  2. Slider same release, drops below zone (88 mph, +2 inches IVB, -8 inches HB)

Hitters must commit to the high fastball or low slider before the tunnel point around 20-25 feet. Elite execution makes this a 50-50 guess.

The Velocity Ladder

This sequence uses multiple pitches of decreasing velocity, all tunneling through the same release point:

  1. Four-seam fastball (97 mph)
  2. Two-seam fastball/sinker (94 mph)
  3. Changeup (87 mph)
  4. Curveball (82 mph)

Each pitch has similar initial trajectory but different break and speed, creating a cascading effect where hitters struggle to time any pitch correctly.

Same-Side/Opposite-Side Break Patterns

Advanced sequencing alternates between glove-side and arm-side movement:

  • Pitch 1: Fastball with arm-side run (positive HB)
  • Pitch 2: Slider with glove-side break (negative HB)
  • Pitch 3: Back to arm-side (changeup or two-seam)

This horizontal variation forces hitters to cover the full width of the plate while managing vertical uncertainty.

Analyzing Pitch Sequences in R

# Analyze pitch sequencing patterns
library(tidyverse)

analyze_pitch_sequences <- function(pitcher_data, sequence_length = 2) {

  # Arrange by game and at-bat
  sequenced_data <- pitcher_data %>%
    filter(!is.na(pitch_type)) %>%
    arrange(game_date, at_bat_number, pitch_number) %>%
    group_by(game_pk, at_bat_number) %>%
    mutate(
      prev_pitch = lag(pitch_type, 1),
      prev_prev_pitch = lag(pitch_type, 2),
      result = description
    ) %>%
    ungroup()

  # Analyze two-pitch sequences
  if (sequence_length == 2) {
    sequences <- sequenced_data %>%
      filter(!is.na(prev_pitch)) %>%
      group_by(prev_pitch, pitch_type) %>%
      summarise(
        n_sequences = n(),

        # Outcomes
        swing_pct = mean(description %in% c("swinging_strike",
                                            "foul",
                                            "hit_into_play"),
                        na.rm = TRUE) * 100,

        whiff_pct = mean(description == "swinging_strike",
                        na.rm = TRUE) * 100,

        contact_pct = mean(description %in% c("foul", "hit_into_play"),
                          na.rm = TRUE) * 100,

        chase_pct = mean(description == "swinging_strike" &
                        zone < 1, na.rm = TRUE) * 100,

        .groups = 'drop'
      ) %>%
      arrange(desc(n_sequences))

    return(sequences)
  }

  # Analyze three-pitch sequences
  if (sequence_length == 3) {
    sequences <- sequenced_data %>%
      filter(!is.na(prev_pitch) & !is.na(prev_prev_pitch)) %>%
      group_by(prev_prev_pitch, prev_pitch, pitch_type) %>%
      summarise(
        n_sequences = n(),
        whiff_pct = mean(description == "swinging_strike",
                        na.rm = TRUE) * 100,
        .groups = 'drop'
      ) %>%
      arrange(desc(n_sequences))

    return(sequences)
  }
}

# Analyze Spencer Strider's sequences
strider_sequences_2 <- analyze_pitch_sequences(strider_data, sequence_length = 2)

print("Spencer Strider Two-Pitch Sequences (2023)")
print(strider_sequences_2 %>% head(10))

# Focus on tunneling pairs
tunnel_sequences <- strider_sequences_2 %>%
  filter((prev_pitch == "FF" & pitch_type == "SL") |
         (prev_pitch == "SL" & pitch_type == "FF"))

print("\nFastball-Slider Tunneling Sequences:")
print(tunnel_sequences)

# Visualize sequence effectiveness
plot_sequence_heatmap <- function(sequence_data, metric = "whiff_pct") {

  # Create matrix for heatmap
  plot_matrix <- sequence_data %>%
    select(prev_pitch, pitch_type, all_of(metric)) %>%
    pivot_wider(
      names_from = pitch_type,
      values_from = all_of(metric),
      values_fill = 0
    ) %>%
    column_to_rownames("prev_pitch") %>%
    as.matrix()

  # Create heatmap
  library(reshape2)
  melted <- melt(plot_matrix)

  p <- ggplot(melted, aes(x = Var2, y = Var1, fill = value)) +
    geom_tile(color = "white", size = 1) +
    geom_text(aes(label = sprintf("%.1f%%", value)),
              color = "white", fontface = "bold", size = 4) +
    scale_fill_gradient2(
      low = "#3182bd", mid = "#9ecae1", high = "#de2d26",
      midpoint = median(melted$value, na.rm = TRUE),
      name = str_replace_all(metric, "_", " ") %>% str_to_title()
    ) +
    labs(
      title = "Pitch Sequencing Effectiveness",
      subtitle = paste("Metric:", str_replace_all(metric, "_", " ") %>% str_to_title()),
      x = "Current Pitch",
      y = "Previous Pitch"
    ) +
    theme_minimal() +
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1),
      plot.title = element_text(size = 14, face = "bold")
    ) +
    coord_fixed()

  return(p)
}

p_sequence <- plot_sequence_heatmap(strider_sequences_2, "whiff_pct")
print(p_sequence)

# Analyze count-based sequencing
analyze_count_sequences <- function(pitcher_data) {

  sequence_by_count <- pitcher_data %>%
    filter(!is.na(pitch_type) & !is.na(balls) & !is.na(strikes)) %>%
    mutate(count = paste0(balls, "-", strikes)) %>%
    group_by(count, pitch_type) %>%
    summarise(
      n = n(),
      usage_pct = n() / nrow(pitcher_data) * 100,
      whiff_rate = mean(description == "swinging_strike", na.rm = TRUE),
      csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
                     na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    arrange(count, desc(n))

  return(sequence_by_count)
}

strider_by_count <- analyze_count_sequences(strider_data)

print("\nPitch Usage by Count:")
print(strider_by_count %>%
      filter(n >= 20) %>%
      head(15))

Python Sequence Analysis

def analyze_pitch_sequencing(pitcher_data):
    """
    Comprehensive pitch sequencing analysis

    Returns sequence patterns, effectiveness metrics, and recommendations
    """

    # Sort data chronologically within at-bats
    data = pitcher_data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])

    # Create lagged pitch types
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
    data['next_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(-1)

    # Analyze two-pitch sequences
    sequences = data.dropna(subset=['prev_pitch', 'pitch_type']).groupby(
        ['prev_pitch', 'pitch_type']
    ).agg({
        'pitch_type': 'count',
        'description': lambda x: (x == 'swinging_strike').mean() * 100,
        'zone': lambda x: ((x < 1) & (data.loc[x.index, 'description'] == 'swinging_strike')).mean() * 100
    }).rename(columns={
        'pitch_type': 'count',
        'description': 'whiff_rate',
        'zone': 'chase_rate'
    })

    sequences['usage_pct'] = sequences['count'] / sequences['count'].sum() * 100

    return sequences.sort_values('count', ascending=False)


def calculate_sequence_run_value(pitcher_data):
    """
    Calculate run value of pitch sequences using linear weights
    """

    # Linear weights for pitch outcomes
    weights = {
        'swinging_strike': -0.10,
        'swinging_strike_blocked': -0.10,
        'called_strike': -0.05,
        'foul': -0.03,
        'ball': 0.03,
        'hit_by_pitch': 0.08,
        'hit_into_play': 0.02  # Average, varies by batted ball
    }

    data = pitcher_data.copy()
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
    data['run_value'] = data['description'].map(weights).fillna(0)

    sequence_values = data.dropna(subset=['prev_pitch']).groupby(
        ['prev_pitch', 'pitch_type']
    ).agg({
        'run_value': ['mean', 'count'],
        'estimated_woba_using_speedangle': 'mean'
    })

    return sequence_values


def visualize_sequence_network(pitcher_data, pitcher_name, min_sequences=20):
    """
    Create network graph showing pitch sequence flow
    """

    import networkx as nx

    # Get sequence data
    sequences = analyze_pitch_sequencing(pitcher_data)
    sequences = sequences[sequences['count'] >= min_sequences]

    # Create directed graph
    G = nx.DiGraph()

    for (prev_pitch, curr_pitch), row in sequences.iterrows():
        G.add_edge(prev_pitch, curr_pitch,
                  weight=row['count'],
                  whiff_rate=row['whiff_rate'])

    # Create visualization
    fig, ax = plt.subplots(figsize=(12, 10))

    pos = nx.spring_layout(G, k=2, iterations=50)

    # Draw nodes
    node_sizes = [G.degree(node) * 300 for node in G.nodes()]
    nx.draw_networkx_nodes(G, pos, node_size=node_sizes,
                          node_color='lightblue',
                          edgecolors='black',
                          linewidths=2,
                          ax=ax)

    # Draw edges with width proportional to frequency
    edges = G.edges()
    weights = [G[u][v]['weight'] for u, v in edges]
    max_weight = max(weights)

    nx.draw_networkx_edges(G, pos,
                          width=[w/max_weight * 5 for w in weights],
                          alpha=0.6,
                          edge_color='gray',
                          arrows=True,
                          arrowsize=20,
                          arrowstyle='->',
                          connectionstyle='arc3,rad=0.1',
                          ax=ax)

    # Draw labels
    nx.draw_networkx_labels(G, pos, font_size=12,
                           font_weight='bold', ax=ax)

    # Add edge labels for high-frequency sequences
    edge_labels = {(u, v): f"{G[u][v]['weight']}\n{G[u][v]['whiff_rate']:.1f}%"
                  for u, v in edges if G[u][v]['weight'] > min_sequences * 2}

    nx.draw_networkx_edge_labels(G, pos, edge_labels,
                                font_size=8, ax=ax)

    ax.set_title(f'{pitcher_name} - Pitch Sequence Network\n' +
                'Node size = degree | Edge width = frequency | Labels = count & whiff%',
                fontsize=14, fontweight='bold', pad=20)
    ax.axis('off')

    plt.tight_layout()
    return fig


def identify_high_leverage_sequences(pitcher_data):
    """
    Identify most effective sequences in high-leverage situations
    """

    data = pitcher_data.copy()
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)

    # Define high leverage: 2-strike counts
    high_leverage = data[data['strikes'] == 2].copy()

    sequences = high_leverage.dropna(subset=['prev_pitch']).groupby(
        ['prev_pitch', 'pitch_type']
    ).agg({
        'description': [
            ('count', 'count'),
            ('strikeout_rate', lambda x: (x == 'swinging_strike').sum() / len(x)),
            ('whiff_rate', lambda x: (x == 'swinging_strike').mean()),
        ],
        'events': lambda x: x.notna().sum()
    })

    sequences.columns = ['_'.join(col).strip('_') for col in sequences.columns.values]

    return sequences.sort_values('description_count', ascending=False)

# Example usage
# sequences = analyze_pitch_sequencing(pitches)
# print("\nMost Common Sequences:")
# print(sequences.head(15))
#
# high_lev = identify_high_leverage_sequences(pitches)
# print("\nTwo-Strike Sequences:")
# print(high_lev.head(10))
R
# Analyze pitch sequencing patterns
library(tidyverse)

analyze_pitch_sequences <- function(pitcher_data, sequence_length = 2) {

  # Arrange by game and at-bat
  sequenced_data <- pitcher_data %>%
    filter(!is.na(pitch_type)) %>%
    arrange(game_date, at_bat_number, pitch_number) %>%
    group_by(game_pk, at_bat_number) %>%
    mutate(
      prev_pitch = lag(pitch_type, 1),
      prev_prev_pitch = lag(pitch_type, 2),
      result = description
    ) %>%
    ungroup()

  # Analyze two-pitch sequences
  if (sequence_length == 2) {
    sequences <- sequenced_data %>%
      filter(!is.na(prev_pitch)) %>%
      group_by(prev_pitch, pitch_type) %>%
      summarise(
        n_sequences = n(),

        # Outcomes
        swing_pct = mean(description %in% c("swinging_strike",
                                            "foul",
                                            "hit_into_play"),
                        na.rm = TRUE) * 100,

        whiff_pct = mean(description == "swinging_strike",
                        na.rm = TRUE) * 100,

        contact_pct = mean(description %in% c("foul", "hit_into_play"),
                          na.rm = TRUE) * 100,

        chase_pct = mean(description == "swinging_strike" &
                        zone < 1, na.rm = TRUE) * 100,

        .groups = 'drop'
      ) %>%
      arrange(desc(n_sequences))

    return(sequences)
  }

  # Analyze three-pitch sequences
  if (sequence_length == 3) {
    sequences <- sequenced_data %>%
      filter(!is.na(prev_pitch) & !is.na(prev_prev_pitch)) %>%
      group_by(prev_prev_pitch, prev_pitch, pitch_type) %>%
      summarise(
        n_sequences = n(),
        whiff_pct = mean(description == "swinging_strike",
                        na.rm = TRUE) * 100,
        .groups = 'drop'
      ) %>%
      arrange(desc(n_sequences))

    return(sequences)
  }
}

# Analyze Spencer Strider's sequences
strider_sequences_2 <- analyze_pitch_sequences(strider_data, sequence_length = 2)

print("Spencer Strider Two-Pitch Sequences (2023)")
print(strider_sequences_2 %>% head(10))

# Focus on tunneling pairs
tunnel_sequences <- strider_sequences_2 %>%
  filter((prev_pitch == "FF" & pitch_type == "SL") |
         (prev_pitch == "SL" & pitch_type == "FF"))

print("\nFastball-Slider Tunneling Sequences:")
print(tunnel_sequences)

# Visualize sequence effectiveness
plot_sequence_heatmap <- function(sequence_data, metric = "whiff_pct") {

  # Create matrix for heatmap
  plot_matrix <- sequence_data %>%
    select(prev_pitch, pitch_type, all_of(metric)) %>%
    pivot_wider(
      names_from = pitch_type,
      values_from = all_of(metric),
      values_fill = 0
    ) %>%
    column_to_rownames("prev_pitch") %>%
    as.matrix()

  # Create heatmap
  library(reshape2)
  melted <- melt(plot_matrix)

  p <- ggplot(melted, aes(x = Var2, y = Var1, fill = value)) +
    geom_tile(color = "white", size = 1) +
    geom_text(aes(label = sprintf("%.1f%%", value)),
              color = "white", fontface = "bold", size = 4) +
    scale_fill_gradient2(
      low = "#3182bd", mid = "#9ecae1", high = "#de2d26",
      midpoint = median(melted$value, na.rm = TRUE),
      name = str_replace_all(metric, "_", " ") %>% str_to_title()
    ) +
    labs(
      title = "Pitch Sequencing Effectiveness",
      subtitle = paste("Metric:", str_replace_all(metric, "_", " ") %>% str_to_title()),
      x = "Current Pitch",
      y = "Previous Pitch"
    ) +
    theme_minimal() +
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1),
      plot.title = element_text(size = 14, face = "bold")
    ) +
    coord_fixed()

  return(p)
}

p_sequence <- plot_sequence_heatmap(strider_sequences_2, "whiff_pct")
print(p_sequence)

# Analyze count-based sequencing
analyze_count_sequences <- function(pitcher_data) {

  sequence_by_count <- pitcher_data %>%
    filter(!is.na(pitch_type) & !is.na(balls) & !is.na(strikes)) %>%
    mutate(count = paste0(balls, "-", strikes)) %>%
    group_by(count, pitch_type) %>%
    summarise(
      n = n(),
      usage_pct = n() / nrow(pitcher_data) * 100,
      whiff_rate = mean(description == "swinging_strike", na.rm = TRUE),
      csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
                     na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    arrange(count, desc(n))

  return(sequence_by_count)
}

strider_by_count <- analyze_count_sequences(strider_data)

print("\nPitch Usage by Count:")
print(strider_by_count %>%
      filter(n >= 20) %>%
      head(15))
Python
def analyze_pitch_sequencing(pitcher_data):
    """
    Comprehensive pitch sequencing analysis

    Returns sequence patterns, effectiveness metrics, and recommendations
    """

    # Sort data chronologically within at-bats
    data = pitcher_data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])

    # Create lagged pitch types
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
    data['next_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(-1)

    # Analyze two-pitch sequences
    sequences = data.dropna(subset=['prev_pitch', 'pitch_type']).groupby(
        ['prev_pitch', 'pitch_type']
    ).agg({
        'pitch_type': 'count',
        'description': lambda x: (x == 'swinging_strike').mean() * 100,
        'zone': lambda x: ((x < 1) & (data.loc[x.index, 'description'] == 'swinging_strike')).mean() * 100
    }).rename(columns={
        'pitch_type': 'count',
        'description': 'whiff_rate',
        'zone': 'chase_rate'
    })

    sequences['usage_pct'] = sequences['count'] / sequences['count'].sum() * 100

    return sequences.sort_values('count', ascending=False)


def calculate_sequence_run_value(pitcher_data):
    """
    Calculate run value of pitch sequences using linear weights
    """

    # Linear weights for pitch outcomes
    weights = {
        'swinging_strike': -0.10,
        'swinging_strike_blocked': -0.10,
        'called_strike': -0.05,
        'foul': -0.03,
        'ball': 0.03,
        'hit_by_pitch': 0.08,
        'hit_into_play': 0.02  # Average, varies by batted ball
    }

    data = pitcher_data.copy()
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
    data['run_value'] = data['description'].map(weights).fillna(0)

    sequence_values = data.dropna(subset=['prev_pitch']).groupby(
        ['prev_pitch', 'pitch_type']
    ).agg({
        'run_value': ['mean', 'count'],
        'estimated_woba_using_speedangle': 'mean'
    })

    return sequence_values


def visualize_sequence_network(pitcher_data, pitcher_name, min_sequences=20):
    """
    Create network graph showing pitch sequence flow
    """

    import networkx as nx

    # Get sequence data
    sequences = analyze_pitch_sequencing(pitcher_data)
    sequences = sequences[sequences['count'] >= min_sequences]

    # Create directed graph
    G = nx.DiGraph()

    for (prev_pitch, curr_pitch), row in sequences.iterrows():
        G.add_edge(prev_pitch, curr_pitch,
                  weight=row['count'],
                  whiff_rate=row['whiff_rate'])

    # Create visualization
    fig, ax = plt.subplots(figsize=(12, 10))

    pos = nx.spring_layout(G, k=2, iterations=50)

    # Draw nodes
    node_sizes = [G.degree(node) * 300 for node in G.nodes()]
    nx.draw_networkx_nodes(G, pos, node_size=node_sizes,
                          node_color='lightblue',
                          edgecolors='black',
                          linewidths=2,
                          ax=ax)

    # Draw edges with width proportional to frequency
    edges = G.edges()
    weights = [G[u][v]['weight'] for u, v in edges]
    max_weight = max(weights)

    nx.draw_networkx_edges(G, pos,
                          width=[w/max_weight * 5 for w in weights],
                          alpha=0.6,
                          edge_color='gray',
                          arrows=True,
                          arrowsize=20,
                          arrowstyle='->',
                          connectionstyle='arc3,rad=0.1',
                          ax=ax)

    # Draw labels
    nx.draw_networkx_labels(G, pos, font_size=12,
                           font_weight='bold', ax=ax)

    # Add edge labels for high-frequency sequences
    edge_labels = {(u, v): f"{G[u][v]['weight']}\n{G[u][v]['whiff_rate']:.1f}%"
                  for u, v in edges if G[u][v]['weight'] > min_sequences * 2}

    nx.draw_networkx_edge_labels(G, pos, edge_labels,
                                font_size=8, ax=ax)

    ax.set_title(f'{pitcher_name} - Pitch Sequence Network\n' +
                'Node size = degree | Edge width = frequency | Labels = count & whiff%',
                fontsize=14, fontweight='bold', pad=20)
    ax.axis('off')

    plt.tight_layout()
    return fig


def identify_high_leverage_sequences(pitcher_data):
    """
    Identify most effective sequences in high-leverage situations
    """

    data = pitcher_data.copy()
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)

    # Define high leverage: 2-strike counts
    high_leverage = data[data['strikes'] == 2].copy()

    sequences = high_leverage.dropna(subset=['prev_pitch']).groupby(
        ['prev_pitch', 'pitch_type']
    ).agg({
        'description': [
            ('count', 'count'),
            ('strikeout_rate', lambda x: (x == 'swinging_strike').sum() / len(x)),
            ('whiff_rate', lambda x: (x == 'swinging_strike').mean()),
        ],
        'events': lambda x: x.notna().sum()
    })

    sequences.columns = ['_'.join(col).strip('_') for col in sequences.columns.values]

    return sequences.sort_values('description_count', ascending=False)

# Example usage
# sequences = analyze_pitch_sequencing(pitches)
# print("\nMost Common Sequences:")
# print(sequences.head(15))
#
# high_lev = identify_high_leverage_sequences(pitches)
# print("\nTwo-Strike Sequences:")
# print(high_lev.head(10))

26.6 Measuring Deception Effectiveness

Quantifying pitch deception requires metrics that capture hitter decision-making quality. Traditional stats like strikeout rate don't isolate the tunneling effect. We need metrics that measure how often hitters are "fooled" by the pitch presentation.

Key Deception Metrics

Swing Decision Value (SDV)

This metric evaluates whether hitters make correct swing decisions based on pitch location. It compares actual swing/take decisions against optimal decisions (swing at strikes, take balls).

SDV = (Correct Swings + Correct Takes) / Total Pitches

Lower SDV indicates more deception—hitters are making poor decisions.

Chase Rate on Tunneled Pitches

Measures how often hitters swing at pitches outside the zone when they follow a tunneling setup pitch.

Early Swing Rate

Hitters who are fooled by tunneling often commit to their swing earlier, visible in higher swing rates on non-competitive pitches.

Expected vs. Actual Contact Quality

When tunneling works, hitters make poor contact even when they swing. This can be measured by comparing expected wOBA (xwOBA) on contact against actual results.

Implementing Deception Metrics in R

# Calculate comprehensive deception metrics
calculate_deception_metrics <- function(pitcher_data) {

  # Add swing/take information
  pitch_metrics <- pitcher_data %>%
    mutate(
      swing = description %in% c("swinging_strike", "swinging_strike_blocked",
                                 "foul", "foul_tip", "hit_into_play",
                                 "foul_bunt", "missed_bunt"),

      whiff = description %in% c("swinging_strike", "swinging_strike_blocked"),

      chase = swing & (zone > 9 | is.na(zone)),

      in_zone = zone <= 9 & !is.na(zone),

      # Swing decision value components
      correct_swing = swing & in_zone,
      correct_take = !swing & !in_zone,
      incorrect_swing = swing & !in_zone,
      incorrect_take = !swing & in_zone
    )

  # Overall metrics
  overall <- pitch_metrics %>%
    summarise(
      total_pitches = n(),

      # Swing metrics
      swing_rate = mean(swing, na.rm = TRUE) * 100,
      whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
      chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
      zone_swing_rate = sum(swing[in_zone], na.rm = TRUE) /
                        sum(in_zone, na.rm = TRUE) * 100,

      # Decision quality
      swing_decision_value = mean(c(correct_swing, correct_take),
                                 na.rm = TRUE) * 100,

      # Called strike plus whiff rate
      csw_rate = mean(description %in% c("called_strike", "swinging_strike",
                                        "swinging_strike_blocked"),
                     na.rm = TRUE) * 100
    )

  # By pitch type
  by_pitch <- pitch_metrics %>%
    group_by(pitch_type) %>%
    summarise(
      n = n(),
      swing_rate = mean(swing, na.rm = TRUE) * 100,
      whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
      chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
      csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
                     na.rm = TRUE) * 100,
      .groups = 'drop'
    ) %>%
    arrange(desc(n))

  # Tunnel-specific metrics (pitches following fastballs)
  tunnel_metrics <- pitch_metrics %>%
    arrange(game_date, at_bat_number, pitch_number) %>%
    group_by(game_pk, at_bat_number) %>%
    mutate(prev_pitch = lag(pitch_type)) %>%
    ungroup() %>%
    filter(prev_pitch == "FF" & pitch_type %in% c("SL", "CU", "CH")) %>%
    group_by(pitch_type) %>%
    summarise(
      n_after_ff = n(),
      chase_rate_after_ff = sum(chase, na.rm = TRUE) /
                            sum(!in_zone, na.rm = TRUE) * 100,
      whiff_rate_after_ff = sum(whiff, na.rm = TRUE) /
                           sum(swing, na.rm = TRUE) * 100,
      .groups = 'drop'
    )

  return(list(
    overall = overall,
    by_pitch_type = by_pitch,
    tunnel_metrics = tunnel_metrics
  ))
}

# Analyze Spencer Strider's deception
strider_deception <- calculate_deception_metrics(strider_data)

print("Spencer Strider Deception Metrics (2023)")
print("========================================\n")
print("Overall:")
print(strider_deception$overall)
print("\nBy Pitch Type:")
print(strider_deception$by_pitch_type)
print("\nAfter Fastball Setup:")
print(strider_deception$tunnel_metrics)

# Compare deception across counts
deception_by_count <- function(pitcher_data) {

  pitcher_data %>%
    filter(!is.na(balls) & !is.na(strikes)) %>%
    mutate(
      count = paste0(balls, "-", strikes),
      swing = description %in% c("swinging_strike", "foul",
                                 "hit_into_play", "swinging_strike_blocked"),
      chase = swing & (zone > 9 | is.na(zone)),
      in_zone = zone <= 9 & !is.na(zone)
    ) %>%
    group_by(count) %>%
    summarise(
      n = n(),
      chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
      swing_rate = mean(swing, na.rm = TRUE) * 100,
      .groups = 'drop'
    ) %>%
    arrange(desc(n))
}

strider_count_deception <- deception_by_count(strider_data)

print("\nDeception by Count:")
print(strider_count_deception)

# Visualize deception metrics
plot_deception_comparison <- function(pitcher_data, pitcher_name) {

  metrics <- calculate_deception_metrics(pitcher_data)

  plot_data <- metrics$by_pitch_type %>%
    filter(n >= 50) %>%
    select(pitch_type, whiff_rate, chase_rate, csw_rate) %>%
    pivot_longer(cols = c(whiff_rate, chase_rate, csw_rate),
                names_to = "metric",
                values_to = "rate")

  p <- ggplot(plot_data, aes(x = pitch_type, y = rate, fill = metric)) +
    geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
    geom_text(aes(label = sprintf("%.1f%%", rate)),
              position = position_dodge(width = 0.9),
              vjust = -0.5, size = 3) +
    scale_fill_brewer(palette = "Set2",
                     labels = c("Chase Rate", "CSW Rate", "Whiff Rate")) +
    labs(
      title = paste(pitcher_name, "- Deception Effectiveness by Pitch"),
      subtitle = "Key metrics for measuring pitch deception",
      x = "Pitch Type",
      y = "Rate (%)",
      fill = "Metric"
    ) +
    theme_minimal() +
    theme(
      legend.position = "bottom",
      plot.title = element_text(face = "bold", size = 14)
    )

  return(p)
}

p_deception <- plot_deception_comparison(strider_data, "Spencer Strider")
print(p_deception)

Advanced Deception Analysis in Python

def calculate_tunneling_effectiveness_score(pitcher_data):
    """
    Calculate comprehensive tunneling effectiveness score (TES)

    TES combines multiple factors:
    - Chase rate on breaking balls after fastballs
    - Whiff rate on tunneled pitches
    - Swing decision errors
    - Contact quality degradation

    Returns score from 0-100 (higher = better tunneling)
    """

    data = pitcher_data.copy()

    # Identify swings and chases
    data['swing'] = data['description'].isin([
        'swinging_strike', 'swinging_strike_blocked',
        'foul', 'foul_tip', 'hit_into_play'
    ])

    data['whiff'] = data['description'].isin([
        'swinging_strike', 'swinging_strike_blocked'
    ])

    data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
    data['chase'] = data['swing'] & ~data['in_zone']

    # Add previous pitch info
    data = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)

    # Calculate component scores

    # 1. Chase rate on breaking balls after fastballs (weight: 35%)
    tunnel_pitches = data[
        (data['prev_pitch'] == 'FF') &
        (data['pitch_type'].isin(['SL', 'CU', 'CH']))
    ]

    if len(tunnel_pitches) > 0:
        chase_score = (tunnel_pitches['chase'].sum() /
                      (~tunnel_pitches['in_zone']).sum() * 100)
    else:
        chase_score = 0

    # 2. Whiff rate on swings (weight: 30%)
    if data['swing'].sum() > 0:
        whiff_score = (data['whiff'].sum() / data['swing'].sum() * 100)
    else:
        whiff_score = 0

    # 3. CSW rate (weight: 20%)
    csw = data['description'].isin(['called_strike', 'swinging_strike'])
    csw_score = csw.mean() * 100

    # 4. Swing decision errors (weight: 15%)
    correct_decisions = (
        (data['swing'] & data['in_zone']) |
        (~data['swing'] & ~data['in_zone'])
    )
    decision_error_rate = (1 - correct_decisions.mean()) * 100

    # Combine scores with weights
    tes = (
        chase_score * 0.35 +
        whiff_score * 0.30 +
        csw_score * 0.20 +
        decision_error_rate * 0.15
    )

    # Normalize to 0-100 scale
    tes = min(100, max(0, tes))

    return {
        'tunneling_effectiveness_score': tes,
        'chase_rate': chase_score,
        'whiff_rate': whiff_score,
        'csw_rate': csw_score,
        'decision_error_rate': decision_error_rate
    }


def compare_deception_to_league(pitcher_data, league_data=None):
    """
    Compare pitcher's deception metrics to league average

    Returns percentile rankings for key metrics
    """

    pitcher_metrics = calculate_tunneling_effectiveness_score(pitcher_data)

    # If no league data provided, use typical MLB averages (2023)
    if league_data is None:
        league_avg = {
            'chase_rate': 28.5,
            'whiff_rate': 24.2,
            'csw_rate': 28.1,
            'tunneling_effectiveness_score': 27.0
        }
        league_std = {
            'chase_rate': 4.5,
            'whiff_rate': 3.8,
            'csw_rate': 3.2,
            'tunneling_effectiveness_score': 5.5
        }
    else:
        # Calculate from provided league data
        league_avg = calculate_tunneling_effectiveness_score(league_data)
        league_std = {k: v * 0.15 for k, v in league_avg.items()}  # Estimate

    # Calculate z-scores and percentiles
    comparison = {}
    for metric in ['chase_rate', 'whiff_rate', 'csw_rate',
                   'tunneling_effectiveness_score']:
        z_score = ((pitcher_metrics[metric] - league_avg[metric]) /
                  league_std[metric])

        # Convert to percentile (approximate)
        from scipy import stats
        percentile = stats.norm.cdf(z_score) * 100

        comparison[metric] = {
            'value': pitcher_metrics[metric],
            'league_avg': league_avg[metric],
            'z_score': z_score,
            'percentile': percentile
        }

    return comparison


def visualize_deception_dashboard(pitcher_data, pitcher_name):
    """
    Create comprehensive deception analysis dashboard
    """

    fig = plt.figure(figsize=(16, 12))
    gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

    # Prepare data
    data = pitcher_data.copy()
    data['swing'] = data['description'].isin([
        'swinging_strike', 'foul', 'hit_into_play', 'swinging_strike_blocked'
    ])
    data['whiff'] = data['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
    data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
    data['chase'] = data['swing'] & ~data['in_zone']

    # 1. Overall TES Score (large)
    ax1 = fig.add_subplot(gs[0, :2])

    tes_result = calculate_tunneling_effectiveness_score(data)
    tes = tes_result['tunneling_effectiveness_score']

    # Create gauge chart
    theta = np.linspace(0, np.pi, 100)
    r = np.ones(100)

    # Color segments
    ax1.fill_between(theta[:33], 0, r[:33], color='#d62728', alpha=0.3, label='Below Avg')
    ax1.fill_between(theta[33:66], 0, r[33:66], color='#ff7f0e', alpha=0.3, label='Average')
    ax1.fill_between(theta[66:], 0, r[66:], color='#2ca02c', alpha=0.3, label='Elite')

    # Add needle
    angle = (tes / 100) * np.pi
    ax1.plot([0, np.cos(angle)], [0, np.sin(angle)], 'k-', linewidth=4)
    ax1.plot(0, 0, 'ko', markersize=15)

    ax1.set_xlim(-0.1, 1.1)
    ax1.set_ylim(0, 1.1)
    ax1.axis('off')
    ax1.set_title(f'Tunneling Effectiveness Score: {tes:.1f}',
                 fontsize=16, fontweight='bold', pad=20)
    ax1.text(0.5, -0.15, f'Percentile: {tes:.0f}th',
            ha='center', fontsize=12, transform=ax1.transData)

    # 2. Component breakdown
    ax2 = fig.add_subplot(gs[0, 2])

    components = ['Chase\nRate', 'Whiff\nRate', 'CSW\nRate', 'Decision\nErrors']
    values = [
        tes_result['chase_rate'],
        tes_result['whiff_rate'],
        tes_result['csw_rate'],
        tes_result['decision_error_rate']
    ]

    colors_comp = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
    ax2.barh(components, values, color=colors_comp, alpha=0.7)
    ax2.set_xlabel('Rate (%)', fontsize=10)
    ax2.set_title('Component Metrics', fontsize=11, fontweight='bold')
    ax2.grid(axis='x', alpha=0.3)

    # 3. Chase rate by pitch type after FF
    ax3 = fig.add_subplot(gs[1, 0])

    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
    tunnel_data = data[data['prev_pitch'] == 'FF']

    chase_by_pitch = tunnel_data.groupby('pitch_type').apply(
        lambda x: x['chase'].sum() / (~x['in_zone']).sum() * 100 if (~x['in_zone']).sum() > 0 else 0
    ).sort_values(ascending=False)

    ax3.bar(range(len(chase_by_pitch)), chase_by_pitch.values,
           color='steelblue', alpha=0.7)
    ax3.set_xticks(range(len(chase_by_pitch)))
    ax3.set_xticklabels(chase_by_pitch.index, rotation=45)
    ax3.set_ylabel('Chase Rate (%)', fontsize=10)
    ax3.set_title('Chase Rate After Fastball', fontsize=11, fontweight='bold')
    ax3.grid(axis='y', alpha=0.3)
    ax3.axhline(28.5, color='red', linestyle='--', alpha=0.5, label='MLB Avg')
    ax3.legend(fontsize=8)

    # 4. Whiff rate by count
    ax4 = fig.add_subplot(gs[1, 1])

    data['count'] = data['balls'].astype(str) + '-' + data['strikes'].astype(str)
    whiff_by_count = data.groupby('count').apply(
        lambda x: x['whiff'].sum() / x['swing'].sum() * 100 if x['swing'].sum() > 0 else 0
    )

    # Order counts logically
    count_order = ['0-0', '0-1', '0-2', '1-0', '1-1', '1-2',
                   '2-0', '2-1', '2-2', '3-0', '3-1', '3-2']
    whiff_by_count = whiff_by_count.reindex(
        [c for c in count_order if c in whiff_by_count.index]
    )

    ax4.plot(range(len(whiff_by_count)), whiff_by_count.values,
            marker='o', linewidth=2, markersize=8, color='darkgreen')
    ax4.set_xticks(range(len(whiff_by_count)))
    ax4.set_xticklabels(whiff_by_count.index, rotation=45, fontsize=8)
    ax4.set_ylabel('Whiff Rate (%)', fontsize=10)
    ax4.set_title('Whiff Rate by Count', fontsize=11, fontweight='bold')
    ax4.grid(alpha=0.3)
    ax4.axhline(24.2, color='red', linestyle='--', alpha=0.5)

    # 5. Heat map: swing rate by location after FF
    ax5 = fig.add_subplot(gs[1, 2])

    tunnel_data_valid = tunnel_data.dropna(subset=['plate_x', 'plate_z'])

    # Create 2D histogram
    x_bins = np.linspace(-1.5, 1.5, 10)
    z_bins = np.linspace(1.0, 4.0, 10)

    H, xedges, zedges = np.histogram2d(
        tunnel_data_valid['plate_x'],
        tunnel_data_valid['plate_z'],
        bins=[x_bins, z_bins],
        weights=tunnel_data_valid['swing'].astype(float)
    )

    H_count, _, _ = np.histogram2d(
        tunnel_data_valid['plate_x'],
        tunnel_data_valid['plate_z'],
        bins=[x_bins, z_bins]
    )

    swing_rate_grid = np.divide(H, H_count, where=H_count > 0,
                               out=np.zeros_like(H)) * 100

    im = ax5.imshow(swing_rate_grid.T, origin='lower',
                   extent=[-1.5, 1.5, 1.0, 4.0],
                   aspect='auto', cmap='RdYlGn', vmin=0, vmax=100)

    # Add strike zone
    from matplotlib.patches import Rectangle
    sz = Rectangle((-17/24, 1.5), 17/12, 2.0,
                  linewidth=2, edgecolor='black',
                  facecolor='none')
    ax5.add_patch(sz)

    ax5.set_xlabel('Horizontal (ft)', fontsize=9)
    ax5.set_ylabel('Height (ft)', fontsize=9)
    ax5.set_title('Swing Rate After FB', fontsize=11, fontweight='bold')
    plt.colorbar(im, ax=ax5, label='Swing %')

    # 6. Pitch sequence network (simplified)
    ax6 = fig.add_subplot(gs[2, :])

    data_seq = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
    sequences = data_seq.groupby(['prev_pitch', 'pitch_type']).size()
    sequences = sequences[sequences >= 20].sort_values(ascending=False).head(10)

    labels = [f"{prev} → {curr}" for prev, curr in sequences.index]
    values = sequences.values

    bars = ax6.barh(range(len(sequences)), values, color='coral', alpha=0.7)
    ax6.set_yticks(range(len(sequences)))
    ax6.set_yticklabels(labels, fontsize=9)
    ax6.set_xlabel('Frequency', fontsize=10)
    ax6.set_title('Most Common Pitch Sequences', fontsize=11, fontweight='bold')
    ax6.grid(axis='x', alpha=0.3)

    # Add value labels
    for i, (bar, val) in enumerate(zip(bars, values)):
        ax6.text(val + 5, i, str(int(val)), va='center', fontsize=9)

    plt.suptitle(f'{pitcher_name} - Deception Analytics Dashboard',
                fontsize=16, fontweight='bold', y=0.995)

    return fig

# Example usage
# tes_score = calculate_tunneling_effectiveness_score(pitches)
# print(f"\nTunneling Effectiveness Score: {tes_score['tunneling_effectiveness_score']:.1f}")
# print(f"Component Breakdown:")
# for k, v in tes_score.items():
#     if k != 'tunneling_effectiveness_score':
#         print(f"  {k}: {v:.1f}%")
#
# fig_dashboard = visualize_deception_dashboard(pitches, "Dylan Cease")
# plt.savefig('cease_deception_dashboard.png', dpi=300, bbox_inches='tight')
R
SDV = (Correct Swings + Correct Takes) / Total Pitches
R
# Calculate comprehensive deception metrics
calculate_deception_metrics <- function(pitcher_data) {

  # Add swing/take information
  pitch_metrics <- pitcher_data %>%
    mutate(
      swing = description %in% c("swinging_strike", "swinging_strike_blocked",
                                 "foul", "foul_tip", "hit_into_play",
                                 "foul_bunt", "missed_bunt"),

      whiff = description %in% c("swinging_strike", "swinging_strike_blocked"),

      chase = swing & (zone > 9 | is.na(zone)),

      in_zone = zone <= 9 & !is.na(zone),

      # Swing decision value components
      correct_swing = swing & in_zone,
      correct_take = !swing & !in_zone,
      incorrect_swing = swing & !in_zone,
      incorrect_take = !swing & in_zone
    )

  # Overall metrics
  overall <- pitch_metrics %>%
    summarise(
      total_pitches = n(),

      # Swing metrics
      swing_rate = mean(swing, na.rm = TRUE) * 100,
      whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
      chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
      zone_swing_rate = sum(swing[in_zone], na.rm = TRUE) /
                        sum(in_zone, na.rm = TRUE) * 100,

      # Decision quality
      swing_decision_value = mean(c(correct_swing, correct_take),
                                 na.rm = TRUE) * 100,

      # Called strike plus whiff rate
      csw_rate = mean(description %in% c("called_strike", "swinging_strike",
                                        "swinging_strike_blocked"),
                     na.rm = TRUE) * 100
    )

  # By pitch type
  by_pitch <- pitch_metrics %>%
    group_by(pitch_type) %>%
    summarise(
      n = n(),
      swing_rate = mean(swing, na.rm = TRUE) * 100,
      whiff_rate = sum(whiff, na.rm = TRUE) / sum(swing, na.rm = TRUE) * 100,
      chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
      csw_rate = mean(description %in% c("called_strike", "swinging_strike"),
                     na.rm = TRUE) * 100,
      .groups = 'drop'
    ) %>%
    arrange(desc(n))

  # Tunnel-specific metrics (pitches following fastballs)
  tunnel_metrics <- pitch_metrics %>%
    arrange(game_date, at_bat_number, pitch_number) %>%
    group_by(game_pk, at_bat_number) %>%
    mutate(prev_pitch = lag(pitch_type)) %>%
    ungroup() %>%
    filter(prev_pitch == "FF" & pitch_type %in% c("SL", "CU", "CH")) %>%
    group_by(pitch_type) %>%
    summarise(
      n_after_ff = n(),
      chase_rate_after_ff = sum(chase, na.rm = TRUE) /
                            sum(!in_zone, na.rm = TRUE) * 100,
      whiff_rate_after_ff = sum(whiff, na.rm = TRUE) /
                           sum(swing, na.rm = TRUE) * 100,
      .groups = 'drop'
    )

  return(list(
    overall = overall,
    by_pitch_type = by_pitch,
    tunnel_metrics = tunnel_metrics
  ))
}

# Analyze Spencer Strider's deception
strider_deception <- calculate_deception_metrics(strider_data)

print("Spencer Strider Deception Metrics (2023)")
print("========================================\n")
print("Overall:")
print(strider_deception$overall)
print("\nBy Pitch Type:")
print(strider_deception$by_pitch_type)
print("\nAfter Fastball Setup:")
print(strider_deception$tunnel_metrics)

# Compare deception across counts
deception_by_count <- function(pitcher_data) {

  pitcher_data %>%
    filter(!is.na(balls) & !is.na(strikes)) %>%
    mutate(
      count = paste0(balls, "-", strikes),
      swing = description %in% c("swinging_strike", "foul",
                                 "hit_into_play", "swinging_strike_blocked"),
      chase = swing & (zone > 9 | is.na(zone)),
      in_zone = zone <= 9 & !is.na(zone)
    ) %>%
    group_by(count) %>%
    summarise(
      n = n(),
      chase_rate = sum(chase, na.rm = TRUE) / sum(!in_zone, na.rm = TRUE) * 100,
      swing_rate = mean(swing, na.rm = TRUE) * 100,
      .groups = 'drop'
    ) %>%
    arrange(desc(n))
}

strider_count_deception <- deception_by_count(strider_data)

print("\nDeception by Count:")
print(strider_count_deception)

# Visualize deception metrics
plot_deception_comparison <- function(pitcher_data, pitcher_name) {

  metrics <- calculate_deception_metrics(pitcher_data)

  plot_data <- metrics$by_pitch_type %>%
    filter(n >= 50) %>%
    select(pitch_type, whiff_rate, chase_rate, csw_rate) %>%
    pivot_longer(cols = c(whiff_rate, chase_rate, csw_rate),
                names_to = "metric",
                values_to = "rate")

  p <- ggplot(plot_data, aes(x = pitch_type, y = rate, fill = metric)) +
    geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
    geom_text(aes(label = sprintf("%.1f%%", rate)),
              position = position_dodge(width = 0.9),
              vjust = -0.5, size = 3) +
    scale_fill_brewer(palette = "Set2",
                     labels = c("Chase Rate", "CSW Rate", "Whiff Rate")) +
    labs(
      title = paste(pitcher_name, "- Deception Effectiveness by Pitch"),
      subtitle = "Key metrics for measuring pitch deception",
      x = "Pitch Type",
      y = "Rate (%)",
      fill = "Metric"
    ) +
    theme_minimal() +
    theme(
      legend.position = "bottom",
      plot.title = element_text(face = "bold", size = 14)
    )

  return(p)
}

p_deception <- plot_deception_comparison(strider_data, "Spencer Strider")
print(p_deception)
Python
def calculate_tunneling_effectiveness_score(pitcher_data):
    """
    Calculate comprehensive tunneling effectiveness score (TES)

    TES combines multiple factors:
    - Chase rate on breaking balls after fastballs
    - Whiff rate on tunneled pitches
    - Swing decision errors
    - Contact quality degradation

    Returns score from 0-100 (higher = better tunneling)
    """

    data = pitcher_data.copy()

    # Identify swings and chases
    data['swing'] = data['description'].isin([
        'swinging_strike', 'swinging_strike_blocked',
        'foul', 'foul_tip', 'hit_into_play'
    ])

    data['whiff'] = data['description'].isin([
        'swinging_strike', 'swinging_strike_blocked'
    ])

    data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
    data['chase'] = data['swing'] & ~data['in_zone']

    # Add previous pitch info
    data = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)

    # Calculate component scores

    # 1. Chase rate on breaking balls after fastballs (weight: 35%)
    tunnel_pitches = data[
        (data['prev_pitch'] == 'FF') &
        (data['pitch_type'].isin(['SL', 'CU', 'CH']))
    ]

    if len(tunnel_pitches) > 0:
        chase_score = (tunnel_pitches['chase'].sum() /
                      (~tunnel_pitches['in_zone']).sum() * 100)
    else:
        chase_score = 0

    # 2. Whiff rate on swings (weight: 30%)
    if data['swing'].sum() > 0:
        whiff_score = (data['whiff'].sum() / data['swing'].sum() * 100)
    else:
        whiff_score = 0

    # 3. CSW rate (weight: 20%)
    csw = data['description'].isin(['called_strike', 'swinging_strike'])
    csw_score = csw.mean() * 100

    # 4. Swing decision errors (weight: 15%)
    correct_decisions = (
        (data['swing'] & data['in_zone']) |
        (~data['swing'] & ~data['in_zone'])
    )
    decision_error_rate = (1 - correct_decisions.mean()) * 100

    # Combine scores with weights
    tes = (
        chase_score * 0.35 +
        whiff_score * 0.30 +
        csw_score * 0.20 +
        decision_error_rate * 0.15
    )

    # Normalize to 0-100 scale
    tes = min(100, max(0, tes))

    return {
        'tunneling_effectiveness_score': tes,
        'chase_rate': chase_score,
        'whiff_rate': whiff_score,
        'csw_rate': csw_score,
        'decision_error_rate': decision_error_rate
    }


def compare_deception_to_league(pitcher_data, league_data=None):
    """
    Compare pitcher's deception metrics to league average

    Returns percentile rankings for key metrics
    """

    pitcher_metrics = calculate_tunneling_effectiveness_score(pitcher_data)

    # If no league data provided, use typical MLB averages (2023)
    if league_data is None:
        league_avg = {
            'chase_rate': 28.5,
            'whiff_rate': 24.2,
            'csw_rate': 28.1,
            'tunneling_effectiveness_score': 27.0
        }
        league_std = {
            'chase_rate': 4.5,
            'whiff_rate': 3.8,
            'csw_rate': 3.2,
            'tunneling_effectiveness_score': 5.5
        }
    else:
        # Calculate from provided league data
        league_avg = calculate_tunneling_effectiveness_score(league_data)
        league_std = {k: v * 0.15 for k, v in league_avg.items()}  # Estimate

    # Calculate z-scores and percentiles
    comparison = {}
    for metric in ['chase_rate', 'whiff_rate', 'csw_rate',
                   'tunneling_effectiveness_score']:
        z_score = ((pitcher_metrics[metric] - league_avg[metric]) /
                  league_std[metric])

        # Convert to percentile (approximate)
        from scipy import stats
        percentile = stats.norm.cdf(z_score) * 100

        comparison[metric] = {
            'value': pitcher_metrics[metric],
            'league_avg': league_avg[metric],
            'z_score': z_score,
            'percentile': percentile
        }

    return comparison


def visualize_deception_dashboard(pitcher_data, pitcher_name):
    """
    Create comprehensive deception analysis dashboard
    """

    fig = plt.figure(figsize=(16, 12))
    gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

    # Prepare data
    data = pitcher_data.copy()
    data['swing'] = data['description'].isin([
        'swinging_strike', 'foul', 'hit_into_play', 'swinging_strike_blocked'
    ])
    data['whiff'] = data['description'].isin(['swinging_strike', 'swinging_strike_blocked'])
    data['in_zone'] = (data['zone'] <= 9) & (data['zone'].notna())
    data['chase'] = data['swing'] & ~data['in_zone']

    # 1. Overall TES Score (large)
    ax1 = fig.add_subplot(gs[0, :2])

    tes_result = calculate_tunneling_effectiveness_score(data)
    tes = tes_result['tunneling_effectiveness_score']

    # Create gauge chart
    theta = np.linspace(0, np.pi, 100)
    r = np.ones(100)

    # Color segments
    ax1.fill_between(theta[:33], 0, r[:33], color='#d62728', alpha=0.3, label='Below Avg')
    ax1.fill_between(theta[33:66], 0, r[33:66], color='#ff7f0e', alpha=0.3, label='Average')
    ax1.fill_between(theta[66:], 0, r[66:], color='#2ca02c', alpha=0.3, label='Elite')

    # Add needle
    angle = (tes / 100) * np.pi
    ax1.plot([0, np.cos(angle)], [0, np.sin(angle)], 'k-', linewidth=4)
    ax1.plot(0, 0, 'ko', markersize=15)

    ax1.set_xlim(-0.1, 1.1)
    ax1.set_ylim(0, 1.1)
    ax1.axis('off')
    ax1.set_title(f'Tunneling Effectiveness Score: {tes:.1f}',
                 fontsize=16, fontweight='bold', pad=20)
    ax1.text(0.5, -0.15, f'Percentile: {tes:.0f}th',
            ha='center', fontsize=12, transform=ax1.transData)

    # 2. Component breakdown
    ax2 = fig.add_subplot(gs[0, 2])

    components = ['Chase\nRate', 'Whiff\nRate', 'CSW\nRate', 'Decision\nErrors']
    values = [
        tes_result['chase_rate'],
        tes_result['whiff_rate'],
        tes_result['csw_rate'],
        tes_result['decision_error_rate']
    ]

    colors_comp = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
    ax2.barh(components, values, color=colors_comp, alpha=0.7)
    ax2.set_xlabel('Rate (%)', fontsize=10)
    ax2.set_title('Component Metrics', fontsize=11, fontweight='bold')
    ax2.grid(axis='x', alpha=0.3)

    # 3. Chase rate by pitch type after FF
    ax3 = fig.add_subplot(gs[1, 0])

    data['prev_pitch'] = data.groupby(['game_pk', 'at_bat_number'])['pitch_type'].shift(1)
    tunnel_data = data[data['prev_pitch'] == 'FF']

    chase_by_pitch = tunnel_data.groupby('pitch_type').apply(
        lambda x: x['chase'].sum() / (~x['in_zone']).sum() * 100 if (~x['in_zone']).sum() > 0 else 0
    ).sort_values(ascending=False)

    ax3.bar(range(len(chase_by_pitch)), chase_by_pitch.values,
           color='steelblue', alpha=0.7)
    ax3.set_xticks(range(len(chase_by_pitch)))
    ax3.set_xticklabels(chase_by_pitch.index, rotation=45)
    ax3.set_ylabel('Chase Rate (%)', fontsize=10)
    ax3.set_title('Chase Rate After Fastball', fontsize=11, fontweight='bold')
    ax3.grid(axis='y', alpha=0.3)
    ax3.axhline(28.5, color='red', linestyle='--', alpha=0.5, label='MLB Avg')
    ax3.legend(fontsize=8)

    # 4. Whiff rate by count
    ax4 = fig.add_subplot(gs[1, 1])

    data['count'] = data['balls'].astype(str) + '-' + data['strikes'].astype(str)
    whiff_by_count = data.groupby('count').apply(
        lambda x: x['whiff'].sum() / x['swing'].sum() * 100 if x['swing'].sum() > 0 else 0
    )

    # Order counts logically
    count_order = ['0-0', '0-1', '0-2', '1-0', '1-1', '1-2',
                   '2-0', '2-1', '2-2', '3-0', '3-1', '3-2']
    whiff_by_count = whiff_by_count.reindex(
        [c for c in count_order if c in whiff_by_count.index]
    )

    ax4.plot(range(len(whiff_by_count)), whiff_by_count.values,
            marker='o', linewidth=2, markersize=8, color='darkgreen')
    ax4.set_xticks(range(len(whiff_by_count)))
    ax4.set_xticklabels(whiff_by_count.index, rotation=45, fontsize=8)
    ax4.set_ylabel('Whiff Rate (%)', fontsize=10)
    ax4.set_title('Whiff Rate by Count', fontsize=11, fontweight='bold')
    ax4.grid(alpha=0.3)
    ax4.axhline(24.2, color='red', linestyle='--', alpha=0.5)

    # 5. Heat map: swing rate by location after FF
    ax5 = fig.add_subplot(gs[1, 2])

    tunnel_data_valid = tunnel_data.dropna(subset=['plate_x', 'plate_z'])

    # Create 2D histogram
    x_bins = np.linspace(-1.5, 1.5, 10)
    z_bins = np.linspace(1.0, 4.0, 10)

    H, xedges, zedges = np.histogram2d(
        tunnel_data_valid['plate_x'],
        tunnel_data_valid['plate_z'],
        bins=[x_bins, z_bins],
        weights=tunnel_data_valid['swing'].astype(float)
    )

    H_count, _, _ = np.histogram2d(
        tunnel_data_valid['plate_x'],
        tunnel_data_valid['plate_z'],
        bins=[x_bins, z_bins]
    )

    swing_rate_grid = np.divide(H, H_count, where=H_count > 0,
                               out=np.zeros_like(H)) * 100

    im = ax5.imshow(swing_rate_grid.T, origin='lower',
                   extent=[-1.5, 1.5, 1.0, 4.0],
                   aspect='auto', cmap='RdYlGn', vmin=0, vmax=100)

    # Add strike zone
    from matplotlib.patches import Rectangle
    sz = Rectangle((-17/24, 1.5), 17/12, 2.0,
                  linewidth=2, edgecolor='black',
                  facecolor='none')
    ax5.add_patch(sz)

    ax5.set_xlabel('Horizontal (ft)', fontsize=9)
    ax5.set_ylabel('Height (ft)', fontsize=9)
    ax5.set_title('Swing Rate After FB', fontsize=11, fontweight='bold')
    plt.colorbar(im, ax=ax5, label='Swing %')

    # 6. Pitch sequence network (simplified)
    ax6 = fig.add_subplot(gs[2, :])

    data_seq = data.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
    sequences = data_seq.groupby(['prev_pitch', 'pitch_type']).size()
    sequences = sequences[sequences >= 20].sort_values(ascending=False).head(10)

    labels = [f"{prev} → {curr}" for prev, curr in sequences.index]
    values = sequences.values

    bars = ax6.barh(range(len(sequences)), values, color='coral', alpha=0.7)
    ax6.set_yticks(range(len(sequences)))
    ax6.set_yticklabels(labels, fontsize=9)
    ax6.set_xlabel('Frequency', fontsize=10)
    ax6.set_title('Most Common Pitch Sequences', fontsize=11, fontweight='bold')
    ax6.grid(axis='x', alpha=0.3)

    # Add value labels
    for i, (bar, val) in enumerate(zip(bars, values)):
        ax6.text(val + 5, i, str(int(val)), va='center', fontsize=9)

    plt.suptitle(f'{pitcher_name} - Deception Analytics Dashboard',
                fontsize=16, fontweight='bold', y=0.995)

    return fig

# Example usage
# tes_score = calculate_tunneling_effectiveness_score(pitches)
# print(f"\nTunneling Effectiveness Score: {tes_score['tunneling_effectiveness_score']:.1f}")
# print(f"Component Breakdown:")
# for k, v in tes_score.items():
#     if k != 'tunneling_effectiveness_score':
#         print(f"  {k}: {v:.1f}%")
#
# fig_dashboard = visualize_deception_dashboard(pitches, "Dylan Cease")
# plt.savefig('cease_deception_dashboard.png', dpi=300, bbox_inches='tight')

26.7 Interactive Tunnel Visualizations

Modern pitch tunneling analysis benefits immensely from interactive visualizations that allow exploration of trajectories, release points, and movement profiles from multiple angles.

R Interactive Visualizations with Plotly

library(plotly)
library(htmlwidgets)

# Create interactive 3D pitch trajectory visualization
create_3d_tunnel_plot <- function(pitcher_data, pitcher_name,
                                  pitch_types = c("FF", "SL")) {

  # Filter for selected pitch types
  plot_data <- pitcher_data %>%
    filter(pitch_type %in% pitch_types) %>%
    filter(!is.na(vx0) & !is.na(vy0) & !is.na(vz0))

  # Calculate trajectories for representative pitches
  # (Use median characteristics for each pitch type)
  trajectories <- list()

  for (ptype in pitch_types) {
    subset <- plot_data %>% filter(pitch_type == ptype)

    median_pitch <- subset %>%
      summarise(across(c(vx0, vy0, vz0, ax, ay, az,
                        release_pos_x, release_pos_y, release_pos_z),
                      median, na.rm = TRUE))

    # Generate trajectory points
    distances <- seq(median_pitch$release_pos_y, 1.417, length.out = 100)

    traj_points <- map_df(distances, function(d) {
      pos <- calculate_trajectory(median_pitch, d)
      tibble(
        distance = d,
        x = pos['x'],
        y = pos['y'],
        z = pos['z'],
        pitch_type = ptype
      )
    })

    trajectories[[ptype]] <- traj_points
  }

  all_trajectories <- bind_rows(trajectories)

  # Create 3D plot
  colors <- c("FF" = "#d62728", "SL" = "#1f77b4", "CU" = "#2ca02c",
              "CH" = "#ff7f0e")

  p <- plot_ly()

  for (ptype in pitch_types) {
    traj <- all_trajectories %>% filter(pitch_type == ptype)

    p <- p %>% add_trace(
      data = traj,
      x = ~x,
      y = ~y,
      z = ~z,
      type = "scatter3d",
      mode = "lines",
      line = list(width = 6, color = colors[ptype]),
      name = ptype,
      hovertemplate = paste(
        "<b>", ptype, "</b><br>",
        "Distance from plate: %{y:.1f} ft<br>",
        "Horizontal: %{x:.2f} ft<br>",
        "Height: %{z:.2f} ft<br>",
        "<extra></extra>"
      )
    )
  }

  # Add strike zone
  sz_x <- c(-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2)
  sz_z <- c(1.5, 1.5, 3.5, 3.5, 1.5)
  sz_y <- rep(1.417, 5)

  p <- p %>% add_trace(
    x = sz_x,
    y = sz_y,
    z = sz_z,
    type = "scatter3d",
    mode = "lines",
    line = list(color = "black", width = 4),
    name = "Strike Zone",
    showlegend = TRUE,
    hoverinfo = "skip"
  )

  # Add home plate
  plate_x <- c(0, -0.708/2, -0.708/2, 0, 0.708/2, 0.708/2, 0)
  plate_y <- c(1.417 + 0.2, 1.417, 1.417 - 0.15, 1.417 - 0.2,
               1.417 - 0.15, 1.417, 1.417 + 0.2)
  plate_z <- rep(0, 7)

  p <- p %>% add_trace(
    x = plate_x,
    y = plate_y,
    z = plate_z,
    type = "scatter3d",
    mode = "lines",
    fill = "toself",
    line = list(color = "black", width = 2),
    name = "Home Plate",
    surfacecolor = "white",
    hoverinfo = "skip"
  )

  # Layout
  p <- p %>% layout(
    title = list(
      text = paste0("<b>", pitcher_name, " - 3D Pitch Tunneling</b><br>",
                   "<sup>Interactive trajectory visualization</sup>"),
      font = list(size = 18)
    ),
    scene = list(
      xaxis = list(title = "Horizontal Position (ft)", range = c(-2, 2)),
      yaxis = list(title = "Distance from Plate (ft)", range = c(0, 60)),
      zaxis = list(title = "Height (ft)", range = c(0, 8)),
      camera = list(
        eye = list(x = 1.5, y = -1.5, z = 0.8)
      ),
      aspectmode = "manual",
      aspectratio = list(x = 1, y = 3, z = 1)
    ),
    legend = list(x = 0.7, y = 0.9),
    hovermode = "closest"
  )

  return(p)
}

# Create interactive plot
p_3d <- create_3d_tunnel_plot(strider_data, "Spencer Strider", c("FF", "SL"))
p_3d

# Save as HTML file
# htmlwidgets::saveWidget(p_3d, "strider_3d_tunnel.html")


# Create interactive release point comparison
create_interactive_release_plot <- function(pitcher_data, pitcher_name) {

  plot_data <- pitcher_data %>%
    filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
    mutate(
      release_x_inches = release_pos_x * 12,
      release_z_inches = release_pos_z * 12
    )

  # Calculate statistics for hover info
  stats_by_pitch <- plot_data %>%
    group_by(pitch_type) %>%
    summarise(
      n = n(),
      mean_x = mean(release_x_inches),
      mean_z = mean(release_z_inches),
      sd_x = sd(release_x_inches),
      sd_z = sd(release_z_inches),
      mean_velo = mean(release_speed, na.rm = TRUE),
      .groups = 'drop'
    )

  p <- plot_ly()

  # Add scatter points for each pitch type
  for (ptype in unique(plot_data$pitch_type)) {
    subset <- plot_data %>% filter(pitch_type == ptype)
    stats <- stats_by_pitch %>% filter(pitch_type == ptype)

    p <- p %>% add_trace(
      data = subset,
      x = ~release_x_inches,
      y = ~release_z_inches,
      type = "scatter",
      mode = "markers",
      name = ptype,
      marker = list(size = 5, opacity = 0.3),
      hovertemplate = paste(
        "<b>", ptype, "</b><br>",
        "X: %{x:.1f} in<br>",
        "Z: %{y:.1f} in<br>",
        "Velo: ", round(subset$release_speed, 1), " mph<br>",
        "<extra></extra>"
      )
    )

    # Add mean marker
    p <- p %>% add_trace(
      x = stats$mean_x,
      y = stats$mean_z,
      type = "scatter",
      mode = "markers",
      marker = list(size = 15, symbol = "star",
                   line = list(color = "black", width = 2)),
      name = paste(ptype, "mean"),
      showlegend = FALSE,
      hovertemplate = paste0(
        "<b>", ptype, " Average</b><br>",
        "X: ", round(stats$mean_x, 2), " ± ", round(stats$sd_x, 2), " in<br>",
        "Z: ", round(stats$mean_z, 2), " ± ", round(stats$sd_z, 2), " in<br>",
        "Avg Velo: ", round(stats$mean_velo, 1), " mph<br>",
        "Count: ", stats$n, "<br>",
        "<extra></extra>"
      )
    )
  }

  p <- p %>% layout(
    title = list(
      text = paste0("<b>", pitcher_name, " - Release Point Consistency</b><br>",
                   "<sup>Stars indicate average release point for each pitch</sup>"),
      font = list(size = 16)
    ),
    xaxis = list(
      title = "Horizontal Release Position (inches from center)",
      zeroline = TRUE
    ),
    yaxis = list(
      title = "Vertical Release Position (inches)",
      zeroline = FALSE
    ),
    hovermode = "closest",
    legend = list(x = 0.02, y = 0.98)
  )

  return(p)
}

# Create release point plot
p_release_interactive <- create_interactive_release_plot(strider_data,
                                                         "Spencer Strider")
p_release_interactive


# Create interactive movement profile plot
create_interactive_movement_plot <- function(pitcher_data, pitcher_name) {

  plot_data <- pitcher_data %>%
    filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
    mutate(
      hb_inches = pfx_x * 12,
      ivb_inches = pfx_z * 12
    )

  p <- plot_ly()

  for (ptype in unique(plot_data$pitch_type)) {
    subset <- plot_data %>% filter(pitch_type == ptype)

    p <- p %>% add_trace(
      data = subset,
      x = ~hb_inches,
      y = ~ivb_inches,
      type = "scatter",
      mode = "markers",
      name = ptype,
      marker = list(
        size = ~release_speed / 5,  # Size by velocity
        opacity = 0.4,
        line = list(width = 0.5, color = "white")
      ),
      text = ~paste0(
        "Pitch: ", pitch_type, "<br>",
        "Velo: ", round(release_speed, 1), " mph<br>",
        "HB: ", round(hb_inches, 1), " in<br>",
        "IVB: ", round(ivb_inches, 1), " in<br>",
        "Spin: ", round(release_spin_rate, 0), " rpm"
      ),
      hovertemplate = "%{text}<extra></extra>"
    )
  }

  # Add reference lines
  p <- p %>%
    add_segments(x = 0, xend = 0, y = -25, yend = 25,
                line = list(dash = "dash", color = "gray"),
                showlegend = FALSE, hoverinfo = "skip") %>%
    add_segments(x = -25, xend = 25, y = 0, yend = 0,
                line = list(dash = "dash", color = "gray"),
                showlegend = FALSE, hoverinfo = "skip")

  p <- p %>% layout(
    title = list(
      text = paste0("<b>", pitcher_name, " - Pitch Movement Profile</b><br>",
                   "<sup>Marker size scaled by velocity</sup>"),
      font = list(size = 16)
    ),
    xaxis = list(
      title = "Horizontal Break (inches, - = glove side)",
      range = c(-25, 25)
    ),
    yaxis = list(
      title = "Induced Vertical Break (inches, + = rise)",
      range = c(-25, 25)
    ),
    hovermode = "closest"
  )

  return(p)
}

p_movement_interactive <- create_interactive_movement_plot(strider_data,
                                                           "Spencer Strider")
p_movement_interactive

Python Interactive Visualizations

import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

def create_interactive_tunnel_viz(pitcher_data, pitcher_name):
    """
    Create comprehensive interactive tunneling dashboard
    """

    # Filter valid data
    valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])

    # Create subplots
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Movement Profile', 'Release Points',
                       'Velocity Distribution', 'Break by Count'),
        specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
               [{'type': 'box'}, {'type': 'bar'}]]
    )

    pitch_types = valid['pitch_type'].value_counts().head(5).index.tolist()
    colors = px.colors.qualitative.Set2

    # Plot 1: Movement profile
    for i, ptype in enumerate(pitch_types):
        subset = valid[valid['pitch_type'] == ptype]

        fig.add_trace(
            go.Scatter(
                x=subset['pfx_x'] * 12,
                y=subset['pfx_z'] * 12,
                mode='markers',
                name=ptype,
                marker=dict(size=5, opacity=0.4, color=colors[i]),
                hovertemplate=(
                    f'<b>{ptype}</b><br>' +
                    'HB: %{x:.1f} in<br>' +
                    'IVB: %{y:.1f} in<br>' +
                    '<extra></extra>'
                ),
                legendgroup=ptype
            ),
            row=1, col=1
        )

    # Add reference lines
    fig.add_hline(y=0, line_dash="dash", line_color="gray",
                 opacity=0.5, row=1, col=1)
    fig.add_vline(x=0, line_dash="dash", line_color="gray",
                 opacity=0.5, row=1, col=1)

    # Plot 2: Release points
    valid_release = valid.dropna(subset=['release_pos_x', 'release_pos_z'])

    for i, ptype in enumerate(pitch_types):
        subset = valid_release[valid_release['pitch_type'] == ptype]

        fig.add_trace(
            go.Scatter(
                x=subset['release_pos_x'] * 12,
                y=subset['release_pos_z'] * 12,
                mode='markers',
                name=ptype,
                marker=dict(size=4, opacity=0.3, color=colors[i]),
                showlegend=False,
                hovertemplate=(
                    f'<b>{ptype}</b><br>' +
                    'X: %{x:.1f} in<br>' +
                    'Z: %{y:.1f} in<br>' +
                    '<extra></extra>'
                ),
                legendgroup=ptype
            ),
            row=1, col=2
        )

    # Plot 3: Velocity distribution
    for i, ptype in enumerate(pitch_types):
        subset = valid[valid['pitch_type'] == ptype]

        fig.add_trace(
            go.Box(
                y=subset['release_speed'],
                name=ptype,
                marker_color=colors[i],
                showlegend=False,
                legendgroup=ptype
            ),
            row=2, col=1
        )

    # Plot 4: Break by count
    valid['count'] = valid['balls'].astype(str) + '-' + valid['strikes'].astype(str)

    # Calculate total break by count for primary pitch
    primary_pitch = pitch_types[0]
    primary_data = valid[valid['pitch_type'] == primary_pitch]

    break_by_count = primary_data.groupby('count').apply(
        lambda x: np.sqrt((x['pfx_x'] * 12) ** 2 + (x['pfx_z'] * 12) ** 2).mean()
    ).sort_index()

    fig.add_trace(
        go.Bar(
            x=break_by_count.index,
            y=break_by_count.values,
            marker_color=colors[0],
            name=f'{primary_pitch} Break',
            showlegend=False,
            hovertemplate=(
                'Count: %{x}<br>' +
                'Avg Break: %{y:.1f} in<br>' +
                '<extra></extra>'
            )
        ),
        row=2, col=2
    )

    # Update axes
    fig.update_xaxes(title_text="Horizontal Break (in)", row=1, col=1)
    fig.update_yaxes(title_text="Induced Vertical Break (in)", row=1, col=1)

    fig.update_xaxes(title_text="Release X (in)", row=1, col=2)
    fig.update_yaxes(title_text="Release Z (in)", row=1, col=2)

    fig.update_yaxes(title_text="Velocity (mph)", row=2, col=1)

    fig.update_xaxes(title_text="Count", row=2, col=2)
    fig.update_yaxes(title_text="Total Break (in)", row=2, col=2)

    # Update layout
    fig.update_layout(
        title_text=f"{pitcher_name} - Interactive Tunneling Dashboard",
        title_font_size=18,
        height=900,
        showlegend=True,
        hovermode='closest'
    )

    return fig

# Example: create interactive visualization
# fig_interactive = create_interactive_tunnel_viz(pitches, "Dylan Cease")
# fig_interactive.show()
# fig_interactive.write_html("cease_interactive_dashboard.html")
R
library(plotly)
library(htmlwidgets)

# Create interactive 3D pitch trajectory visualization
create_3d_tunnel_plot <- function(pitcher_data, pitcher_name,
                                  pitch_types = c("FF", "SL")) {

  # Filter for selected pitch types
  plot_data <- pitcher_data %>%
    filter(pitch_type %in% pitch_types) %>%
    filter(!is.na(vx0) & !is.na(vy0) & !is.na(vz0))

  # Calculate trajectories for representative pitches
  # (Use median characteristics for each pitch type)
  trajectories <- list()

  for (ptype in pitch_types) {
    subset <- plot_data %>% filter(pitch_type == ptype)

    median_pitch <- subset %>%
      summarise(across(c(vx0, vy0, vz0, ax, ay, az,
                        release_pos_x, release_pos_y, release_pos_z),
                      median, na.rm = TRUE))

    # Generate trajectory points
    distances <- seq(median_pitch$release_pos_y, 1.417, length.out = 100)

    traj_points <- map_df(distances, function(d) {
      pos <- calculate_trajectory(median_pitch, d)
      tibble(
        distance = d,
        x = pos['x'],
        y = pos['y'],
        z = pos['z'],
        pitch_type = ptype
      )
    })

    trajectories[[ptype]] <- traj_points
  }

  all_trajectories <- bind_rows(trajectories)

  # Create 3D plot
  colors <- c("FF" = "#d62728", "SL" = "#1f77b4", "CU" = "#2ca02c",
              "CH" = "#ff7f0e")

  p <- plot_ly()

  for (ptype in pitch_types) {
    traj <- all_trajectories %>% filter(pitch_type == ptype)

    p <- p %>% add_trace(
      data = traj,
      x = ~x,
      y = ~y,
      z = ~z,
      type = "scatter3d",
      mode = "lines",
      line = list(width = 6, color = colors[ptype]),
      name = ptype,
      hovertemplate = paste(
        "<b>", ptype, "</b><br>",
        "Distance from plate: %{y:.1f} ft<br>",
        "Horizontal: %{x:.2f} ft<br>",
        "Height: %{z:.2f} ft<br>",
        "<extra></extra>"
      )
    )
  }

  # Add strike zone
  sz_x <- c(-0.708/2, 0.708/2, 0.708/2, -0.708/2, -0.708/2)
  sz_z <- c(1.5, 1.5, 3.5, 3.5, 1.5)
  sz_y <- rep(1.417, 5)

  p <- p %>% add_trace(
    x = sz_x,
    y = sz_y,
    z = sz_z,
    type = "scatter3d",
    mode = "lines",
    line = list(color = "black", width = 4),
    name = "Strike Zone",
    showlegend = TRUE,
    hoverinfo = "skip"
  )

  # Add home plate
  plate_x <- c(0, -0.708/2, -0.708/2, 0, 0.708/2, 0.708/2, 0)
  plate_y <- c(1.417 + 0.2, 1.417, 1.417 - 0.15, 1.417 - 0.2,
               1.417 - 0.15, 1.417, 1.417 + 0.2)
  plate_z <- rep(0, 7)

  p <- p %>% add_trace(
    x = plate_x,
    y = plate_y,
    z = plate_z,
    type = "scatter3d",
    mode = "lines",
    fill = "toself",
    line = list(color = "black", width = 2),
    name = "Home Plate",
    surfacecolor = "white",
    hoverinfo = "skip"
  )

  # Layout
  p <- p %>% layout(
    title = list(
      text = paste0("<b>", pitcher_name, " - 3D Pitch Tunneling</b><br>",
                   "<sup>Interactive trajectory visualization</sup>"),
      font = list(size = 18)
    ),
    scene = list(
      xaxis = list(title = "Horizontal Position (ft)", range = c(-2, 2)),
      yaxis = list(title = "Distance from Plate (ft)", range = c(0, 60)),
      zaxis = list(title = "Height (ft)", range = c(0, 8)),
      camera = list(
        eye = list(x = 1.5, y = -1.5, z = 0.8)
      ),
      aspectmode = "manual",
      aspectratio = list(x = 1, y = 3, z = 1)
    ),
    legend = list(x = 0.7, y = 0.9),
    hovermode = "closest"
  )

  return(p)
}

# Create interactive plot
p_3d <- create_3d_tunnel_plot(strider_data, "Spencer Strider", c("FF", "SL"))
p_3d

# Save as HTML file
# htmlwidgets::saveWidget(p_3d, "strider_3d_tunnel.html")


# Create interactive release point comparison
create_interactive_release_plot <- function(pitcher_data, pitcher_name) {

  plot_data <- pitcher_data %>%
    filter(!is.na(release_pos_x) & !is.na(release_pos_z)) %>%
    mutate(
      release_x_inches = release_pos_x * 12,
      release_z_inches = release_pos_z * 12
    )

  # Calculate statistics for hover info
  stats_by_pitch <- plot_data %>%
    group_by(pitch_type) %>%
    summarise(
      n = n(),
      mean_x = mean(release_x_inches),
      mean_z = mean(release_z_inches),
      sd_x = sd(release_x_inches),
      sd_z = sd(release_z_inches),
      mean_velo = mean(release_speed, na.rm = TRUE),
      .groups = 'drop'
    )

  p <- plot_ly()

  # Add scatter points for each pitch type
  for (ptype in unique(plot_data$pitch_type)) {
    subset <- plot_data %>% filter(pitch_type == ptype)
    stats <- stats_by_pitch %>% filter(pitch_type == ptype)

    p <- p %>% add_trace(
      data = subset,
      x = ~release_x_inches,
      y = ~release_z_inches,
      type = "scatter",
      mode = "markers",
      name = ptype,
      marker = list(size = 5, opacity = 0.3),
      hovertemplate = paste(
        "<b>", ptype, "</b><br>",
        "X: %{x:.1f} in<br>",
        "Z: %{y:.1f} in<br>",
        "Velo: ", round(subset$release_speed, 1), " mph<br>",
        "<extra></extra>"
      )
    )

    # Add mean marker
    p <- p %>% add_trace(
      x = stats$mean_x,
      y = stats$mean_z,
      type = "scatter",
      mode = "markers",
      marker = list(size = 15, symbol = "star",
                   line = list(color = "black", width = 2)),
      name = paste(ptype, "mean"),
      showlegend = FALSE,
      hovertemplate = paste0(
        "<b>", ptype, " Average</b><br>",
        "X: ", round(stats$mean_x, 2), " ± ", round(stats$sd_x, 2), " in<br>",
        "Z: ", round(stats$mean_z, 2), " ± ", round(stats$sd_z, 2), " in<br>",
        "Avg Velo: ", round(stats$mean_velo, 1), " mph<br>",
        "Count: ", stats$n, "<br>",
        "<extra></extra>"
      )
    )
  }

  p <- p %>% layout(
    title = list(
      text = paste0("<b>", pitcher_name, " - Release Point Consistency</b><br>",
                   "<sup>Stars indicate average release point for each pitch</sup>"),
      font = list(size = 16)
    ),
    xaxis = list(
      title = "Horizontal Release Position (inches from center)",
      zeroline = TRUE
    ),
    yaxis = list(
      title = "Vertical Release Position (inches)",
      zeroline = FALSE
    ),
    hovermode = "closest",
    legend = list(x = 0.02, y = 0.98)
  )

  return(p)
}

# Create release point plot
p_release_interactive <- create_interactive_release_plot(strider_data,
                                                         "Spencer Strider")
p_release_interactive


# Create interactive movement profile plot
create_interactive_movement_plot <- function(pitcher_data, pitcher_name) {

  plot_data <- pitcher_data %>%
    filter(!is.na(pfx_x) & !is.na(pfx_z)) %>%
    mutate(
      hb_inches = pfx_x * 12,
      ivb_inches = pfx_z * 12
    )

  p <- plot_ly()

  for (ptype in unique(plot_data$pitch_type)) {
    subset <- plot_data %>% filter(pitch_type == ptype)

    p <- p %>% add_trace(
      data = subset,
      x = ~hb_inches,
      y = ~ivb_inches,
      type = "scatter",
      mode = "markers",
      name = ptype,
      marker = list(
        size = ~release_speed / 5,  # Size by velocity
        opacity = 0.4,
        line = list(width = 0.5, color = "white")
      ),
      text = ~paste0(
        "Pitch: ", pitch_type, "<br>",
        "Velo: ", round(release_speed, 1), " mph<br>",
        "HB: ", round(hb_inches, 1), " in<br>",
        "IVB: ", round(ivb_inches, 1), " in<br>",
        "Spin: ", round(release_spin_rate, 0), " rpm"
      ),
      hovertemplate = "%{text}<extra></extra>"
    )
  }

  # Add reference lines
  p <- p %>%
    add_segments(x = 0, xend = 0, y = -25, yend = 25,
                line = list(dash = "dash", color = "gray"),
                showlegend = FALSE, hoverinfo = "skip") %>%
    add_segments(x = -25, xend = 25, y = 0, yend = 0,
                line = list(dash = "dash", color = "gray"),
                showlegend = FALSE, hoverinfo = "skip")

  p <- p %>% layout(
    title = list(
      text = paste0("<b>", pitcher_name, " - Pitch Movement Profile</b><br>",
                   "<sup>Marker size scaled by velocity</sup>"),
      font = list(size = 16)
    ),
    xaxis = list(
      title = "Horizontal Break (inches, - = glove side)",
      range = c(-25, 25)
    ),
    yaxis = list(
      title = "Induced Vertical Break (inches, + = rise)",
      range = c(-25, 25)
    ),
    hovermode = "closest"
  )

  return(p)
}

p_movement_interactive <- create_interactive_movement_plot(strider_data,
                                                           "Spencer Strider")
p_movement_interactive
Python
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

def create_interactive_tunnel_viz(pitcher_data, pitcher_name):
    """
    Create comprehensive interactive tunneling dashboard
    """

    # Filter valid data
    valid = pitcher_data.dropna(subset=['pfx_x', 'pfx_z', 'pitch_type'])

    # Create subplots
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Movement Profile', 'Release Points',
                       'Velocity Distribution', 'Break by Count'),
        specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
               [{'type': 'box'}, {'type': 'bar'}]]
    )

    pitch_types = valid['pitch_type'].value_counts().head(5).index.tolist()
    colors = px.colors.qualitative.Set2

    # Plot 1: Movement profile
    for i, ptype in enumerate(pitch_types):
        subset = valid[valid['pitch_type'] == ptype]

        fig.add_trace(
            go.Scatter(
                x=subset['pfx_x'] * 12,
                y=subset['pfx_z'] * 12,
                mode='markers',
                name=ptype,
                marker=dict(size=5, opacity=0.4, color=colors[i]),
                hovertemplate=(
                    f'<b>{ptype}</b><br>' +
                    'HB: %{x:.1f} in<br>' +
                    'IVB: %{y:.1f} in<br>' +
                    '<extra></extra>'
                ),
                legendgroup=ptype
            ),
            row=1, col=1
        )

    # Add reference lines
    fig.add_hline(y=0, line_dash="dash", line_color="gray",
                 opacity=0.5, row=1, col=1)
    fig.add_vline(x=0, line_dash="dash", line_color="gray",
                 opacity=0.5, row=1, col=1)

    # Plot 2: Release points
    valid_release = valid.dropna(subset=['release_pos_x', 'release_pos_z'])

    for i, ptype in enumerate(pitch_types):
        subset = valid_release[valid_release['pitch_type'] == ptype]

        fig.add_trace(
            go.Scatter(
                x=subset['release_pos_x'] * 12,
                y=subset['release_pos_z'] * 12,
                mode='markers',
                name=ptype,
                marker=dict(size=4, opacity=0.3, color=colors[i]),
                showlegend=False,
                hovertemplate=(
                    f'<b>{ptype}</b><br>' +
                    'X: %{x:.1f} in<br>' +
                    'Z: %{y:.1f} in<br>' +
                    '<extra></extra>'
                ),
                legendgroup=ptype
            ),
            row=1, col=2
        )

    # Plot 3: Velocity distribution
    for i, ptype in enumerate(pitch_types):
        subset = valid[valid['pitch_type'] == ptype]

        fig.add_trace(
            go.Box(
                y=subset['release_speed'],
                name=ptype,
                marker_color=colors[i],
                showlegend=False,
                legendgroup=ptype
            ),
            row=2, col=1
        )

    # Plot 4: Break by count
    valid['count'] = valid['balls'].astype(str) + '-' + valid['strikes'].astype(str)

    # Calculate total break by count for primary pitch
    primary_pitch = pitch_types[0]
    primary_data = valid[valid['pitch_type'] == primary_pitch]

    break_by_count = primary_data.groupby('count').apply(
        lambda x: np.sqrt((x['pfx_x'] * 12) ** 2 + (x['pfx_z'] * 12) ** 2).mean()
    ).sort_index()

    fig.add_trace(
        go.Bar(
            x=break_by_count.index,
            y=break_by_count.values,
            marker_color=colors[0],
            name=f'{primary_pitch} Break',
            showlegend=False,
            hovertemplate=(
                'Count: %{x}<br>' +
                'Avg Break: %{y:.1f} in<br>' +
                '<extra></extra>'
            )
        ),
        row=2, col=2
    )

    # Update axes
    fig.update_xaxes(title_text="Horizontal Break (in)", row=1, col=1)
    fig.update_yaxes(title_text="Induced Vertical Break (in)", row=1, col=1)

    fig.update_xaxes(title_text="Release X (in)", row=1, col=2)
    fig.update_yaxes(title_text="Release Z (in)", row=1, col=2)

    fig.update_yaxes(title_text="Velocity (mph)", row=2, col=1)

    fig.update_xaxes(title_text="Count", row=2, col=2)
    fig.update_yaxes(title_text="Total Break (in)", row=2, col=2)

    # Update layout
    fig.update_layout(
        title_text=f"{pitcher_name} - Interactive Tunneling Dashboard",
        title_font_size=18,
        height=900,
        showlegend=True,
        hovermode='closest'
    )

    return fig

# Example: create interactive visualization
# fig_interactive = create_interactive_tunnel_viz(pitches, "Dylan Cease")
# fig_interactive.show()
# fig_interactive.write_html("cease_interactive_dashboard.html")

26.8 Exercises

Exercise 1: Calculate Basic Tunnel Points (Easy)

Using Statcast data for any pitcher of your choice:

  1. Download one game's worth of pitch data
  2. Calculate the tunnel point for their fastball-slider combination
  3. Identify the separation distance at the tunnel point
  4. Create a simple 2D visualization of the horizontal separation over distance

Expected output: A tunnel point distance (in feet from plate) and a plot showing pitch convergence/divergence.

Exercise 2: Release Point Consistency Analysis (Easy)

For a pitcher with at least 500 pitches in 2023:

  1. Calculate the standard deviation of release points (x, y, z) for each pitch type
  2. Create a release point consistency matrix showing distances between average release points
  3. Identify which pitch types have the most consistent release points
  4. Determine if release consistency correlates with whiff rate

Hint: Lower standard deviation indicates better consistency.

Exercise 3: Movement Profile Comparison (Medium)

Compare the movement profiles of three elite pitchers (e.g., Strider, Cease, Ohtani):

  1. Calculate average horizontal and vertical break for each pitcher's primary pitches
  2. Create a break differential matrix for each pitcher
  3. Identify which pitcher has the greatest total break differential between their fastball and slider
  4. Visualize all three pitchers' movement profiles on the same plot

Deliverable: A comparison table and a multi-pitcher movement plot.

Exercise 4: Pitch Sequencing Effectiveness (Medium)

Analyze pitch sequencing patterns for an elite strikeout pitcher:

  1. Calculate two-pitch sequence frequencies
  2. Determine which sequences have the highest whiff rates
  3. Identify count-specific sequencing patterns (e.g., two-strike approaches)
  4. Create a sequence flow diagram showing the most common transitions

Challenge: Weight sequences by game leverage to find high-pressure patterns.

Exercise 5: Custom Tunneling Score (Hard)

Develop your own tunneling effectiveness metric that combines:

  1. Tunnel point distance (later = better)
  2. Release point consistency (tighter = better)
  3. Late movement differential (greater = better)
  4. Chase rate on breaking balls after fastballs

Create a scoring system that produces a single tunneling grade (0-100) and apply it to 10 starting pitchers. Validate your metric by correlating it with strikeout rate or ERA.

Bonus: Create percentile rankings and identify which pitchers over/underperform their tunneling score.

Exercise 6: Trajectory Reconstruction (Hard)

Using the physics equations and Statcast's initial conditions:

  1. Write a complete trajectory calculator that takes vx0, vy0, vz0, ax, ay, az, and release position
  2. Generate 100 points along a pitch's trajectory from release to plate
  3. Calculate the exact point where two pitches diverge by more than 3 inches
  4. Create an animated visualization showing both pitches traveling toward the plate

Advanced: Add spin-based trajectory calculations to account for Magnus force effects.

Exercise 7: League-Wide Tunneling Analysis (Hard)

Perform a comprehensive analysis of tunneling across MLB:

  1. Download pitch data for all starting pitchers with 100+ innings in 2023
  2. Calculate tunneling metrics for each pitcher's primary pitch pairs
  3. Identify cluster groups of pitchers with similar tunneling profiles
  4. Determine if tunneling effectiveness correlates with:
  • Strikeout rate
  • Walk rate
  • ERA
  • Opponent batting average

Create a research report with visualizations showing the relationship between tunneling and pitcher success.

Expected deliverable: A 1000-word analysis with at least 5 visualizations and statistical significance tests.

Exercise 8: Real-Time Tunnel Detector (Expert)

Build a system that processes live game data and identifies exceptional tunneling sequences:

  1. Set up a data pipeline that ingests pitch-by-pitch data
  2. Calculate tunnel points in real-time for consecutive pitches
  3. Flag sequences where tunnel points occur within 15 feet of the plate
  4. Generate automatic alerts for "elite tunneling moments"
  5. Create a dashboard that displays these sequences with trajectory visualizations

Technologies: Consider using Streamlit, Dash, or Shiny for the dashboard.


Summary

Pitch tunneling represents the cutting edge of pitching analytics, combining physics, psychology, and data science to understand one of baseball's most subtle arts. Elite pitchers like Spencer Strider, Dylan Cease, and Shohei Ohtani succeed not just because they throw hard or spin the ball well, but because they create impossible decision points for hitters.

Key takeaways from this chapter:

  1. Tunneling is about time: Hitters must decide before they can distinguish pitches
  2. Release point consistency is foundational: Even slight variations telegraph pitch type
  3. Movement differentials create deception: Similar early paths, divergent endpoints
  4. Sequencing amplifies tunneling: Strategic pitch ordering maximizes deception
  5. Modern analytics quantify the unquantifiable: What was once "feel" is now measurable

As you apply these concepts, remember that data illuminates but doesn't dictate. The best pitchers blend analytics with artistry, using numbers to refine their craft while maintaining the creativity that makes baseball beautiful.

The code examples in this chapter provide a starting point for your own tunneling research. Whether you're a coach seeking competitive advantages, an analyst building predictive models, or a fan deepening your appreciation of the game, understanding pitch tunneling opens new dimensions in baseball analysis.

Further Reading:


  • Mike Fast's original tunneling research (Baseball Prospectus, 2011)

  • Dr. Peter Fadde's work on pitch recognition and decision-making

  • MLB's Statcast research on pitch tracking accuracy

  • Research papers on visual perception in sports

Continue exploring, questioning, and analyzing. The future of pitching analytics is limited only by imagination and computing power.

Chapter Summary

In this chapter, you learned about pitch tunneling & deception analytics. Key topics covered:

  • The Science of Pitch Deception
  • Tunnel Point Analysis
  • Release Point Consistency
  • Induced Vertical and Horizontal Break Differences
  • Pitch Sequencing Strategies
  • Measuring Deception Effectiveness