Chapter 7: Statcast Analytics - Pitching

7.1 The Modern Pitching Analytics Revolution

7.1.1 What Statcast Measures for Pitchers

Just as Statcast revolutionized hitting analysis, it has fundamentally transformed how we evaluate pitchers. Before Statcast, pitching evaluation relied heavily on outcome-based statistics like ERA, WHIP, and strikeout rate. These metrics tell us what happened, but not how or why. Statcast peels back the layers, measuring the physical properties of every pitch thrown in Major League Baseball.

Statcast's tracking hardware (TrackMan radar from 2015 through 2019, Hawk-Eye optical cameras since 2020) follows each pitch from the moment it leaves the pitcher's hand until it crosses home plate (or is put into play). This tracking provides insight into pitch characteristics that were previously invisible or only estimated:

Velocity: Measured at the release point and as the ball crosses the plate. This isn't just "how hard does he throw" - it's precise measurement of pitch speed throughout its flight path.

Spin Rate: Measured in revolutions per minute (RPM), spin rate quantifies how much the ball rotates. Higher spin typically means more movement and better "carry" on fastballs.

Movement: Broken down into horizontal break (side-to-side movement) and induced vertical break (how much the pitch "rises" or drops relative to a spinless pitch).

Release Point: The three-dimensional coordinates (height, side, extension) where the ball leaves the pitcher's hand.

Vertical Approach Angle (VAA): The angle at which the pitch enters the hitting zone - a critical factor in swing decisions and contact quality.

Pitch Location: Precise coordinates of where the pitch crosses the front of home plate, enabling detailed command analysis.

This wealth of data has enabled new evaluation frameworks. We can now identify why one pitcher's fastball generates more swings-and-misses than another's despite similar velocity. We can understand why certain breaking balls are more effective. We can diagnose mechanical issues and design new pitches based on data-driven principles.

7.1.2 Key Statcast Pitching Metrics

Here's a comprehensive overview of the most important Statcast pitching metrics:

Metric	Definition	Typical Range	Elite Threshold	What It Reveals
4-Seam Velocity	Release point velocity	92-95 mph	97+ mph	Fastball power
Spin Rate (4-Seam)	Fastball spin	2200-2400 RPM	2500+ RPM	Fastball "rise" and whiff ability
Induced Vertical Break (IVB)	Vertical movement vs. gravity	15-17 inches	18+ inches	Fastball carry/riding action
Horizontal Break (HB)	Side-to-side movement	Varies by pitch	Context-dependent	Lateral movement
Vertical Approach Angle	Entry angle into zone	-4° to -6°	Varies by pitch type	Perceived rise/deception
Extension	Distance in front of the rubber at release	6.0-6.5 feet	6.5+ feet	Effective velocity boost
Release Height	Vertical release point	5.5-6.5 feet	Context-dependent	Angle and deception
Whiff Rate	Swings and misses / swings	20-25%	30%+	Swing-and-miss ability
Chase Rate	Swings outside zone / pitches outside	25-30%	35%+	Deception effectiveness
xwOBA	Expected wOBA allowed	.310-.330	<.300	Contact quality allowed

Understanding these metrics and their interactions is essential for modern pitching analysis. A 95 mph fastball with 2600 RPM spin will behave very differently from a 95 mph fastball with 2200 RPM spin. The former will have more "ride" and generate more swings-and-misses on high fastballs.
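The two rate definitions in the table (whiff rate and chase rate) can be sketched in a few lines. This is a minimal illustration on hand-made records, not real Statcast output: the label strings are assumed to match Statcast's `description` column and should be checked against real data, and the `in_zone` flag is a hypothetical field standing in for a strike-zone test.

```python
# Assumed label sets, modeled on Statcast's `description` column
SWING_DESCRIPTIONS = {
    "swinging_strike", "swinging_strike_blocked",
    "foul", "foul_tip", "hit_into_play",
}
WHIFF_DESCRIPTIONS = {"swinging_strike", "swinging_strike_blocked"}


def whiff_rate(pitches):
    """Swings and misses divided by total swings, as a percentage."""
    swings = [p for p in pitches if p["description"] in SWING_DESCRIPTIONS]
    if not swings:
        return None  # no swings, so the rate is undefined
    whiffs = sum(p["description"] in WHIFF_DESCRIPTIONS for p in swings)
    return 100 * whiffs / len(swings)


def chase_rate(pitches):
    """Swings at out-of-zone pitches divided by out-of-zone pitches."""
    outside = [p for p in pitches if not p["in_zone"]]
    if not outside:
        return None  # no pitches outside the zone
    chases = sum(p["description"] in SWING_DESCRIPTIONS for p in outside)
    return 100 * chases / len(outside)


# Made-up pitches for illustration only
sample = [
    {"description": "swinging_strike", "in_zone": False},
    {"description": "foul", "in_zone": True},
    {"description": "ball", "in_zone": False},
    {"description": "hit_into_play", "in_zone": True},
    {"description": "called_strike", "in_zone": True},
    {"description": "ball", "in_zone": False},
]
print(f"Whiff rate: {whiff_rate(sample):.1f}%")
print(f"Chase rate: {chase_rate(sample):.1f}%")
```

Note that the two rates use different denominators: whiff rate divides by swings, chase rate by pitches thrown outside the zone.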

7.1.3 The Physics of Pitching

To understand Statcast metrics, we need a basic understanding of pitching physics. When a pitcher throws a ball, two primary forces affect its flight:

Gravity pulls the ball downward. Without any spin, a pitch thrown from 6 feet high would drop roughly 3-4 feet by the time it reaches home plate (60.5 feet from the rubber). The exact figure depends on velocity, because a slower pitch spends more time in the air.

Magnus Force is created by spin. As the ball rotates, it creates pressure differentials that cause movement perpendicular to the spin axis. A ball spinning with backspin (like a four-seam fastball) creates upward Magnus force, causing the pitch to "rise" - or more accurately, drop less than gravity alone would cause.

Induced Vertical Break (IVB) measures this Magnus effect. A fastball with 16 inches of IVB drops 16 inches less than a spinless pitch would. High-spin fastballs can have 18-20 inches of IVB, creating the perception that the pitch is "rising" as it reaches the plate.

Horizontal Break works similarly but perpendicular to vertical movement. A slider's spin axis tilted sideways creates lateral movement away from the pitcher's arm side.

Understanding these principles helps explain why certain pitch characteristics work: high-spin fastballs up in the zone generate swings underneath the ball; sliders with tight spin and lateral movement induce weak contact; and curveballs with high spin and downward movement produce ground balls.
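The relationship between gravity and IVB can be made concrete with a small calculation. The sketch below ignores air drag and assumes a constant speed over a flight distance of 54.5 feet (60.5 feet minus an assumed 6.0 feet of extension), so it is a rough approximation rather than a model of real ball flight. The function names are hypothetical.

```python
G_FT_PER_S2 = 32.174             # gravitational acceleration, ft/s^2
MPH_TO_FT_PER_S = 5280 / 3600    # unit conversion


def gravity_drop_inches(velocity_mph, flight_distance_ft=54.5):
    """Drop of a spinless, drag-free pitch over the flight distance."""
    speed_ft_per_s = velocity_mph * MPH_TO_FT_PER_S
    flight_time_s = flight_distance_ft / speed_ft_per_s
    drop_ft = 0.5 * G_FT_PER_S2 * flight_time_s ** 2
    return drop_ft * 12


def actual_drop_inches(velocity_mph, ivb_inches, flight_distance_ft=54.5):
    """Gravity-only drop minus the lift expressed as induced vertical break."""
    return gravity_drop_inches(velocity_mph, flight_distance_ft) - ivb_inches


for velo in (80, 95):
    spinless = gravity_drop_inches(velo)
    with_ivb = actual_drop_inches(velo, ivb_inches=16)
    print(f"{velo} mph: spinless drop {spinless:.1f} in, "
          f"with 16 in IVB {with_ivb:.1f} in")
```

Even with 16 inches of IVB, the result stays positive: the pitch still drops, only less than a spinless pitch would.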

7.2 Velocity Analysis

7.2.1 Release Velocity vs. Perceived Velocity

Release velocity is the speed of the pitch as it leaves the pitcher's hand, typically measured a few feet after release. This is the "official" velocity shown on stadium radar guns and broadcasts.

Perceived velocity (or effective velocity) is what matters to the hitter. Two factors modify how fast a pitch "feels" to a hitter:

Extension: Pitchers who release the ball further from the rubber effectively shorten the distance to the plate. A pitcher with 7 feet of extension releases the ball from 53.5 feet away (60.5 - 7) rather than 60.5 feet. This gives the hitter less reaction time, making the pitch "play" 1-2 mph faster.

Vertical Approach Angle (VAA): Pitches entering the zone on flatter angles are harder to track and give hitters less time to adjust.

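The formula for the perceived velocity adjustment from extension scales release velocity by the ratio of the full pitching distance to the actual flight distance:

Perceived Velocity = Release Velocity × (60.5 / (60.5 - Extension))

For example, a 95 mph fastball with 7 feet of extension:

Perceived Velocity = 95 × (60.5 / 53.5) = 95 × 1.131 = 107.4 mph

This figure is measured against a hypothetical release from the rubber itself (zero extension), so it inflates every pitcher's number and is only useful for comparing pitchers with each other. Measured against a pitcher with a typical 6.0 feet of extension, the same pitch plays at 95 × (54.5 / 53.5) ≈ 96.8 mph, which matches the 1-2 mph boost described above.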

This explains why some pitchers' "average" fastballs generate more swings-and-misses than expected - their extension makes the pitch play faster.
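The extension adjustment above can be sketched as a small function. The `baseline_extension_ft` parameter is an assumption added for illustration: at its default of 0 the function reproduces the formula exactly as written, while a baseline near a typical extension (6.0 feet here) expresses the result relative to an ordinary pitcher.

```python
RUBBER_TO_PLATE_FT = 60.5


def perceived_velocity(release_mph, extension_ft, baseline_extension_ft=0.0):
    """Scale release velocity by the ratio of baseline to actual flight distance."""
    flight_ft = RUBBER_TO_PLATE_FT - extension_ft
    if flight_ft <= 0:
        raise ValueError("extension must be shorter than the pitching distance")
    baseline_flight_ft = RUBBER_TO_PLATE_FT - baseline_extension_ft
    return release_mph * baseline_flight_ft / flight_ft


# Relative to a release from the rubber (the formula as written)
print(f"{perceived_velocity(95, 7.0):.1f}")                             # 107.4

# Relative to an assumed typical extension of 6.0 feet
print(f"{perceived_velocity(95, 7.0, baseline_extension_ft=6.0):.1f}")  # 96.8
```

The second call is the one comparable to a radar reading: a foot of extra extension is worth a little under 2 mph at this velocity.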

7.2.2 Velocity Metrics by Pitch Type

Different pitch types have characteristic velocity ranges. Understanding these helps with pitch classification and arsenal evaluation:

Four-Seam Fastball (FF): 90-98 mph (elite: 97+)
Two-Seam Fastball/Sinker (SI): 89-96 mph (typically 1-2 mph slower than four-seam)
Cutter (FC): 87-94 mph (typically 2-5 mph slower than fastball)
Slider (SL): 83-90 mph (can be slower for "sweepers")
Curveball (CU): 75-82 mph (larger break, slower velocity)
Changeup (CH): 82-88 mph (typically 6-10 mph slower than fastball)
Splitter (FS): 84-90 mph (2-6 mph slower than fastball)
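As a quick illustration of why velocity alone is not enough for pitch classification, the ranges above can be stored in a lookup table and queried. The dictionary and function names are hypothetical, and the ranges are simply the ones listed in this section.

```python
# Typical velocity ranges (mph) by Statcast pitch code, from the list above
TYPICAL_VELOCITY_MPH = {
    "FF": (90, 98),
    "SI": (89, 96),
    "FC": (87, 94),
    "SL": (83, 90),
    "CU": (75, 82),
    "CH": (82, 88),
    "FS": (84, 90),
}


def candidate_pitch_types(velocity_mph):
    """Return every pitch code whose typical range contains the velocity."""
    return [
        code
        for code, (low, high) in TYPICAL_VELOCITY_MPH.items()
        if low <= velocity_mph <= high
    ]


print(candidate_pitch_types(97))  # ['FF']
print(candidate_pitch_types(85))  # ['SL', 'CH', 'FS']
```

An 85 mph pitch fits three different pitch types, which is why classification also relies on spin and movement.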

Let's code functions to analyze velocity by pitch type:

Python Implementation

import pandas as pd
import numpy as np
from pybaseball import statcast_pitcher, playerid_lookup
import matplotlib.pyplot as plt
import seaborn as sns

def calculate_velocity_metrics(df):
    """
    Calculate comprehensive velocity metrics by pitch type.

    Parameters:
    df: Statcast DataFrame with pitch-level data

    Returns:
    DataFrame with velocity metrics by pitch type
    """
    # Filter for valid pitches
    pitches = df[df['release_speed'].notna()].copy()

    # Group by pitch type and calculate metrics
    velo_metrics = pitches.groupby('pitch_type').agg({
        'release_speed': ['mean', 'std', 'min', 'max', 'count'],
        'release_extension': 'mean'
    }).round(2)

    # Flatten column names
    velo_metrics.columns = ['_'.join(col).strip() for col in velo_metrics.columns]
    velo_metrics = velo_metrics.reset_index()

    # Rename for clarity
    velo_metrics.columns = ['pitch_type', 'avg_velo', 'velo_std', 'min_velo',
                            'max_velo', 'count', 'avg_extension']

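    # Calculate perceived velocity (scaled against a release from the rubber,
    # so values run well above radar readings; use them to compare pitches)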
    velo_metrics['perceived_velo'] = (
        velo_metrics['avg_velo'] * (60.5 / (60.5 - velo_metrics['avg_extension']))
    ).round(2)

    # Calculate percentage of pitches
    velo_metrics['pitch_pct'] = (
        velo_metrics['count'] / velo_metrics['count'].sum() * 100
    ).round(1)

    # Sort by average velocity descending
    velo_metrics = velo_metrics.sort_values('avg_velo', ascending=False)

    return velo_metrics

# Example: Analyze Gerrit Cole's velocity
# Gerrit Cole player_id: 543037
start_date = '2024-04-01'
end_date = '2024-10-01'

cole_pitches = statcast_pitcher(start_date, end_date, 543037)

if cole_pitches is not None and len(cole_pitches) > 0:
    cole_velo = calculate_velocity_metrics(cole_pitches)

    print("Gerrit Cole 2024 Velocity Profile")
    print("=" * 80)
    print(cole_velo.to_string(index=False))

    # Visualize velocity by pitch type
    pitch_counts = cole_pitches.groupby('pitch_type')['release_speed'].count()
    qualifying_pitches = pitch_counts[pitch_counts >= 50].index

    fig, ax = plt.subplots(figsize=(12, 6))

    data_to_plot = cole_pitches[cole_pitches['pitch_type'].isin(qualifying_pitches)]

    sns.boxplot(data=data_to_plot, x='pitch_type', y='release_speed',
                palette='Set2', ax=ax)

    ax.set_xlabel('Pitch Type', fontsize=12, fontweight='bold')
    ax.set_ylabel('Release Velocity (mph)', fontsize=12, fontweight='bold')
    ax.set_title('Gerrit Cole 2024: Velocity Distribution by Pitch Type',
                 fontsize=14, fontweight='bold', pad=15)
    ax.grid(axis='y', alpha=0.3, linestyle='--')

    plt.tight_layout()
    plt.show()

R Implementation

library(baseballr)
library(dplyr)
library(tidyr)
library(ggplot2)

calculate_velocity_metrics <- function(df) {
  # Filter for valid pitches
  pitches <- df %>%
    filter(!is.na(release_speed))

  # Group by pitch type and calculate metrics
  velo_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      avg_velo = mean(release_speed, na.rm = TRUE),
      velo_std = sd(release_speed, na.rm = TRUE),
      min_velo = min(release_speed, na.rm = TRUE),
      max_velo = max(release_speed, na.rm = TRUE),
      count = n(),
      avg_extension = mean(release_extension, na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    mutate(
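      # Calculate perceived velocity (scaled against a release from the rubber,
      # so values run well above radar readings; use them to compare pitches)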
      perceived_velo = avg_velo * (60.5 / (60.5 - avg_extension)),
      # Calculate pitch percentage
      pitch_pct = count / sum(count) * 100,
      # Round for display
      across(where(is.numeric), ~round(.x, 2))
    ) %>%
    arrange(desc(avg_velo))

  return(velo_metrics)
}

# Fetch Gerrit Cole's 2024 data
# Gerrit Cole MLBAM ID: 543037
cole_pitches <- statcast_search_pitchers(
  start_date = "2024-04-01",
  end_date = "2024-10-01",
  pitcherid = 543037
)

cole_velo <- calculate_velocity_metrics(cole_pitches)

cat("Gerrit Cole 2024 Velocity Profile\n")
cat(strrep("=", 80), "\n")
print(cole_velo)

# Visualize velocity by pitch type
qualifying_pitches <- cole_pitches %>%
  count(pitch_type) %>%
  filter(n >= 50) %>%
  pull(pitch_type)

cole_pitches %>%
  filter(pitch_type %in% qualifying_pitches) %>%
  ggplot(aes(x = pitch_type, y = release_speed, fill = pitch_type)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Gerrit Cole 2024: Velocity Distribution by Pitch Type",
    x = "Pitch Type",
    y = "Release Velocity (mph)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    axis.title = element_text(face = "bold", size = 12),
    legend.position = "none",
    panel.grid.major.x = element_blank()
  )

7.2.3 Velocity Decline and Fatigue Analysis

Velocity is one of the first indicators of pitcher fatigue or injury. Monitoring velocity trends throughout games and across seasons provides valuable information for player health and performance management.

def analyze_velocity_by_pitch_number(df):
    """
    Analyze how velocity changes as pitch count increases within games.

    Parameters:
    df: Statcast DataFrame with pitch-level data

    Returns:
    DataFrame showing velocity trends by pitch number bins
    """
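    # Statcast's pitch_number restarts with every plate appearance, so build
    # a running pitch count for each game before isolating fastballs
    ordered = df.sort_values(['game_pk', 'at_bat_number', 'pitch_number']).copy()
    ordered['game_pitch_count'] = ordered.groupby('game_pk').cumcount() + 1

    # Focus on four-seam fastballs for consistency
    fastballs = ordered[
        (ordered['pitch_type'] == 'FF') & ordered['release_speed'].notna()
    ].copy()

    if len(fastballs) == 0:
        return None

    # Create pitch count bins (the last bin is open-ended)
    fastballs['pitch_bin'] = pd.cut(
        fastballs['game_pitch_count'],
        bins=[0, 25, 50, 75, 100, float('inf')],
        labels=['1-25', '26-50', '51-75', '76-100', '100+']
    )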

    # Calculate velocity by bin
    velo_by_count = fastballs.groupby('pitch_bin', observed=True).agg({
        'release_speed': ['mean', 'count']
    }).round(2)

    velo_by_count.columns = ['avg_velo', 'pitch_count']
    velo_by_count = velo_by_count.reset_index()

    # Calculate velocity drop from first bin
    baseline_velo = velo_by_count.iloc[0]['avg_velo']
    velo_by_count['velo_drop'] = (
        velo_by_count['avg_velo'] - baseline_velo
    ).round(2)

    return velo_by_count

# Analyze Cole's velocity by pitch count
cole_velo_trend = analyze_velocity_by_pitch_number(cole_pitches)

if cole_velo_trend is not None:
    print("\nVelocity by Pitch Count (Four-Seam Fastball)")
    print("=" * 60)
    print(cole_velo_trend.to_string(index=False))

    # Visualize
    fig, ax = plt.subplots(figsize=(10, 6))

    ax.plot(range(len(cole_velo_trend)), cole_velo_trend['avg_velo'],
            marker='o', linewidth=2, markersize=8, color='#003087')

    ax.set_xticks(range(len(cole_velo_trend)))
    ax.set_xticklabels(cole_velo_trend['pitch_bin'])
    ax.set_xlabel('Pitch Count Range', fontsize=12, fontweight='bold')
    ax.set_ylabel('Average Velocity (mph)', fontsize=12, fontweight='bold')
    ax.set_title('Fastball Velocity by Pitch Count\nGerrit Cole 2024',
                 fontsize=14, fontweight='bold')
    ax.grid(axis='y', alpha=0.3, linestyle='--')

    # Add velocity drop annotations
    for i, row in cole_velo_trend.iterrows():
        if row['velo_drop'] != 0:
            ax.annotate(f"{row['velo_drop']:+.1f}",
                       xy=(i, row['avg_velo']),
                       xytext=(0, 10), textcoords='offset points',
                       ha='center', fontsize=9, color='red')

    plt.tight_layout()
    plt.show()

# R version: Velocity by pitch count
analyze_velocity_by_pitch_number <- function(df) {
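  # Statcast's pitch_number restarts with every plate appearance, so build
  # a running pitch count for each game before isolating fastballs
  fastballs <- df %>%
    arrange(game_pk, at_bat_number, pitch_number) %>%
    group_by(game_pk) %>%
    mutate(game_pitch_count = row_number()) %>%
    ungroup() %>%
    filter(pitch_type == 'FF', !is.na(release_speed))

  if (nrow(fastballs) == 0) {
    return(NULL)
  }

  # Create pitch count bins (the last bin is open-ended)
  velo_by_count <- fastballs %>%
    mutate(
      pitch_bin = cut(game_pitch_count,
                     breaks = c(0, 25, 50, 75, 100, Inf),
                     labels = c('1-25', '26-50', '51-75', '76-100', '100+'))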
    ) %>%
    group_by(pitch_bin, .drop = FALSE) %>%
    summarise(
      avg_velo = mean(release_speed, na.rm = TRUE),
      pitch_count = n(),
      .groups = 'drop'
    ) %>%
    mutate(
      velo_drop = avg_velo - first(avg_velo),
      across(c(avg_velo, velo_drop), ~round(.x, 2))
    )

  return(velo_by_count)
}

cole_velo_trend <- analyze_velocity_by_pitch_number(cole_pitches)

cat("\nVelocity by Pitch Count (Four-Seam Fastball)\n")
cat(strrep("=", 60), "\n")
print(cole_velo_trend)

# Visualize
ggplot(cole_velo_trend, aes(x = pitch_bin, y = avg_velo, group = 1)) +
  geom_line(color = '#003087', linewidth = 1.2) +
  geom_point(color = '#003087', size = 4) +
  geom_text(aes(label = sprintf("%+.1f", velo_drop)),
            vjust = -1.5, color = 'red', fontface = 'bold', size = 3.5) +
  labs(
    title = "Fastball Velocity by Pitch Count",
    subtitle = "Gerrit Cole 2024",
    x = "Pitch Count Range",
    y = "Average Velocity (mph)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5),
    axis.title = element_text(face = "bold", size = 11)
  )

7.2.4 Key Insights on Velocity

Velocity matters, but it's not everything: A 98 mph fastball with poor command or predictable sequencing will underperform a well-located 94 mph fastball with deception.

Extension is underrated: Pitchers with plus extension (6.5+ feet) can succeed with "average" velocity because the ball plays faster.

Velocity decline signals: Drops of 2+ mph within a game warrant attention - they may indicate fatigue or injury. Season-over-season declines require investigation.

Pitch-to-pitch variation: Elite pitchers show minimal velocity variation on their fastball, suggesting consistent mechanics. Excessive variation may indicate mechanical inconsistency.
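The last two points (in-game velocity drops and pitch-to-pitch variation) can be checked with a short helper. This is a sketch on made-up numbers: the 10-pitch window is a hypothetical choice, the 2.0 mph default mirrors the threshold mentioned above, and the input is assumed to be one game's fastball velocities in the order they were thrown.

```python
from statistics import mean, stdev


def summarize_game_fastballs(velos_mph, window=10, drop_threshold_mph=2.0):
    """Compare a game's first and last `window` fastballs and report spread."""
    if len(velos_mph) < 2 * window:
        return None  # too few fastballs for a stable comparison
    early = mean(velos_mph[:window])
    late = mean(velos_mph[-window:])
    drop = early - late
    return {
        "early_avg": round(early, 2),
        "late_avg": round(late, 2),
        "drop": round(drop, 2),
        "spread": round(stdev(velos_mph), 2),  # pitch-to-pitch variation
        "flagged": drop >= drop_threshold_mph,
    }


# Made-up velocities in pitch order, for illustration only
game = [96.0] * 10 + [95.0] * 10 + [93.5] * 10
print(summarize_game_fastballs(game))
```

A flag from a helper like this is a prompt to look closer, not a diagnosis; weather, game situation, and deliberate changes of speed can all move the numbers.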

7.3 Spin Rate and Movement

7.3.1 Understanding Spin Rate and Spin Axis

Spin rate measures how many times per minute the ball rotates. For fastballs, higher spin rates generally correlate with more "rise" (or less drop from gravity). For breaking balls, spin rate affects the magnitude of break.

However, not all spin is created equal. The spin axis - the orientation of the ball's rotation - determines the direction of movement. A four-seam fastball with pure backspin (spin axis at 12:00 on a clock face) produces maximum vertical movement. Tilt the axis to 1:00, and you get some horizontal movement mixed with vertical rise.
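Statcast reports spin axis in degrees rather than clock-face notation, so a conversion helps when reading the data. The sketch below assumes Statcast's `spin_axis` convention, in which 180° is pure backspin (12:00) and 0° is pure topspin (6:00), with each clock hour spanning 30°. The function name is hypothetical and the mapping should be verified against known pitches before relying on it.

```python
def spin_axis_to_clock(axis_deg):
    """Convert a spin axis in degrees to clock-face tilt, to the nearest 15 minutes."""
    hours_float = (axis_deg / 30.0 + 6.0) % 12.0   # 180 degrees maps to 12:00
    quarter_hours = round(hours_float * 4)          # nearest quarter hour
    hours, quarters = divmod(quarter_hours, 4)
    hours %= 12
    if hours == 0:
        hours = 12                                  # clock faces show 12, not 0
    return f"{hours}:{quarters * 15:02d}"


print(spin_axis_to_clock(180))    # 12:00
print(spin_axis_to_clock(210))    # 1:00
print(spin_axis_to_clock(217.5))  # 1:15
print(spin_axis_to_clock(0))      # 6:00
```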

Active Spin (or useful spin) is the percentage of spin contributing to movement. A pitch with 2500 RPM but only 85% active spin has 2125 RPM actually creating movement. The rest of the rotation is "gyro spin" - spinning like a football about the direction of travel - which doesn't create Magnus force.
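The active-spin arithmetic can be written out as a short function. The active component follows the definition above directly. The gyro component is computed under an added assumption: that active and gyro spin are perpendicular components of the total spin vector, so they combine by the Pythagorean theorem rather than by simple subtraction. The function name is hypothetical.

```python
import math


def spin_components(total_rpm, active_spin_pct):
    """Split total spin into active and gyro components, in RPM."""
    if not 0 <= active_spin_pct <= 100:
        raise ValueError("active_spin_pct must be between 0 and 100")
    efficiency = active_spin_pct / 100
    active_rpm = total_rpm * efficiency
    # Assumes perpendicular components: total^2 = active^2 + gyro^2
    gyro_rpm = total_rpm * math.sqrt(1 - efficiency ** 2)
    return active_rpm, gyro_rpm


active, gyro = spin_components(2500, 85)
print(f"Active spin: {active:.0f} RPM")
print(f"Gyro spin:   {gyro:.0f} RPM")
```

Under that assumption, a pitch with 85% active spin still carries a sizable gyro component, which is why the two figures do not add up to the total.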

Key concepts:

Four-Seam Fastball: High spin (2400+ RPM), high active spin percentage (95%+), creates vertical "rise"
Two-Seam/Sinker: Lower spin (2100-2300 RPM), often has natural arm-side run, less vertical rise
Slider: High spin (2400-2700 RPM), tilted axis creates lateral break
Curveball: High spin (2500-3000 RPM), forward-tilted axis creates vertical drop
Changeup: Lower spin (1500-1900 RPM), mimics fastball arm action but generates less Magnus force

7.3.2 Movement Profiles by Pitch Type

Different pitches have characteristic movement profiles. Understanding these helps with pitch design and arsenal optimization:

Pitch Type	Spin Rate (RPM)	Induced Vertical Break	Horizontal Break	Primary Action
Four-Seam FB	2200-2600	14-18 inches	-2 to +2 inches	Rising action
Sinker	2000-2300	8-14 inches	10-16 inches (arm-side)	Sinking, running
Cutter	2400-2700	10-14 inches	2-6 inches (glove-side)	Late break away
Slider	2400-2800	2-8 inches	4-10 inches (glove-side)	Sweeping break
Curveball	2500-3000	-6 to -12 inches	4-12 inches	Big vertical drop
Changeup	1500-2000	8-14 inches	12-18 inches (arm-side)	Fading action
Splitter	1500-1900	4-10 inches	-2 to +4 inches	Late tumble

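Note: IVB is measured relative to a spinless pitch. Positive IVB means the pitch drops less than gravity alone would cause; negative IVB means it drops more.

Horizontal break (Statcast's pfx_x) is measured from the catcher's perspective, with positive values moving toward the catcher's right (the first-base side). Arm-side movement is therefore negative for a right-handed pitcher and positive for a left-hander. The table above sidesteps the sign by labeling each range as arm-side or glove-side.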

7.3.3 Pitch Movement Visualization

Understanding pitch movement is easier with visualization. Let's create movement charts:

Python Implementation

def plot_pitch_movement(df, player_name="Pitcher"):
    """
    Create a pitch movement chart showing horizontal vs. vertical break.

    Parameters:
    df: Statcast DataFrame with pitch-level data
    player_name: Name for chart title
    """
    # Filter for pitches with movement data
    pitches = df[
        df['pfx_x'].notna() &
        df['pfx_z'].notna() &
        df['pitch_type'].notna()
    ].copy()

    # Convert pfx (in feet) to inches for easier interpretation
    pitches['horizontal_break'] = pitches['pfx_x'] * 12
    pitches['induced_vertical_break'] = pitches['pfx_z'] * 12

    # Get pitch counts for filtering
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index

    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]

    # Create figure
    fig, ax = plt.subplots(figsize=(12, 10))

    # Define colors for pitch types
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }

    # Plot each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]

        ax.scatter(
            pitch_subset['horizontal_break'],
            pitch_subset['induced_vertical_break'],
            c=pitch_colors.get(pitch_type, '#7f7f7f'),
            label=f"{pitch_type} (n={len(pitch_subset)})",
            alpha=0.6,
            s=30,
            edgecolors='black',
            linewidth=0.5
        )

    # Add reference lines
    ax.axhline(y=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
    ax.axvline(x=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)

    # Labels and formatting
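    # Raw pfx_x is signed from the catcher's view, so arm side is on the
    # left for a right-hander and on the right for a left-hander
    ax.set_xlabel('Horizontal Break (inches)\n← Third-Base Side | First-Base Side →',
                  fontsize=12, fontweight='bold')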
    ax.set_ylabel('Induced Vertical Break (inches)\n↓ Drop | Rise ↑',
                  fontsize=12, fontweight='bold')
    ax.set_title(f'{player_name} Pitch Movement Profile\nCatcher\'s Perspective',
                 fontsize=14, fontweight='bold', pad=20)

    # Add grid
    ax.grid(True, alpha=0.3, linestyle='--')

    # Legend
    ax.legend(loc='upper left', framealpha=0.9, fontsize=10)

    # Equal aspect ratio for proper representation
    ax.set_aspect('equal', adjustable='box')

    plt.tight_layout()
    plt.show()

# Create movement chart for Gerrit Cole
if cole_pitches is not None and len(cole_pitches) > 0:
    plot_pitch_movement(cole_pitches, "Gerrit Cole 2024")

# R version: Pitch movement visualization
plot_pitch_movement <- function(df, player_name = "Pitcher") {
  # Filter for pitches with movement data
  pitches <- df %>%
    filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
    mutate(
      horizontal_break = pfx_x * 12,  # Convert feet to inches
      induced_vertical_break = pfx_z * 12
    )

  # Filter for pitch types with sufficient counts
  pitch_counts <- pitches %>%
    count(pitch_type) %>%
    filter(n >= 30)

  plot_data <- pitches %>%
    filter(pitch_type %in% pitch_counts$pitch_type)

  # Create movement chart
  ggplot(plot_data, aes(x = horizontal_break, y = induced_vertical_break,
                        color = pitch_type)) +
    geom_point(alpha = 0.6, size = 2) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
    geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
    scale_color_brewer(palette = "Set1",
                       name = "Pitch Type",
                       labels = function(x) {
                         counts <- pitch_counts %>%
                           filter(pitch_type %in% x) %>%
                           pull(n)
                         paste0(x, " (n=", counts, ")")
                       }) +
    labs(
      title = paste(player_name, "Pitch Movement Profile"),
      subtitle = "Catcher's Perspective",
      x = "Horizontal Break (inches)\n← Third-Base Side | First-Base Side →",
      y = "Induced Vertical Break (inches)\n↓ Drop | Rise ↑"
    ) +
    coord_fixed() +  # Equal aspect ratio
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(face = "bold", size = 11),
      legend.position = "right",
      panel.grid.major = element_line(color = "gray90"),
      panel.grid.minor = element_line(color = "gray95")
    )
}

# Create movement chart for Gerrit Cole
plot_pitch_movement(cole_pitches, "Gerrit Cole 2024")

7.3.4 Analyzing Spin Efficiency

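Spin efficiency is the share of a pitch's total spin that produces movement (transverse spin) rather than gyro spin, which does not. The pitch-level columns used below do not include spin efficiency itself, so this function summarizes the inputs that bear on it: spin rate, spin axis, velocity, and movement by pitch type: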

def calculate_spin_metrics(df):
    """
    Calculate comprehensive spin metrics by pitch type.

    Parameters:
    df: Statcast DataFrame with pitch-level data

    Returns:
    DataFrame with spin metrics by pitch type
    """
    # Filter for pitches with spin data
    pitches = df[
        df['release_spin_rate'].notna() &
        df['pitch_type'].notna()
    ].copy()

    # Group by pitch type
    spin_metrics = pitches.groupby('pitch_type').agg({
        'release_spin_rate': ['mean', 'std', 'min', 'max'],
        'spin_axis': 'mean',
        'release_speed': 'mean',
        'pfx_x': lambda x: (x * 12).mean(),  # Convert to inches
        'pfx_z': lambda x: (x * 12).mean()
    }).round(1)

    spin_metrics.columns = ['avg_spin', 'spin_std', 'min_spin', 'max_spin',
                            'avg_spin_axis', 'avg_velo', 'avg_h_break', 'avg_v_break']
    spin_metrics = spin_metrics.reset_index()

    # Add pitch counts
    pitch_counts = pitches.groupby('pitch_type').size().reset_index(name='count')
    spin_metrics = spin_metrics.merge(pitch_counts, on='pitch_type')

    # Calculate spin-to-velocity ratio (indicator of movement potential)
    spin_metrics['spin_velo_ratio'] = (
        spin_metrics['avg_spin'] / spin_metrics['avg_velo']
    ).round(1)

    return spin_metrics.sort_values('avg_spin', ascending=False)

# Analyze Cole's spin rates
cole_spin = calculate_spin_metrics(cole_pitches)

print("\nGerrit Cole 2024 Spin Rate Profile")
print("=" * 90)
print(cole_spin.to_string(index=False))

# R version: Spin metrics
calculate_spin_metrics <- function(df) {
  pitches <- df %>%
    filter(!is.na(release_spin_rate), !is.na(pitch_type))

  spin_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      avg_spin = mean(release_spin_rate, na.rm = TRUE),
      spin_std = sd(release_spin_rate, na.rm = TRUE),
      min_spin = min(release_spin_rate, na.rm = TRUE),
      max_spin = max(release_spin_rate, na.rm = TRUE),
      avg_spin_axis = mean(spin_axis, na.rm = TRUE),
      avg_velo = mean(release_speed, na.rm = TRUE),
      avg_h_break = mean(pfx_x * 12, na.rm = TRUE),  # Convert to inches
      avg_v_break = mean(pfx_z * 12, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    ) %>%
    mutate(
      spin_velo_ratio = avg_spin / avg_velo,
      across(where(is.numeric), ~round(.x, 1))
    ) %>%
    arrange(desc(avg_spin))

  return(spin_metrics)
}

cole_spin <- calculate_spin_metrics(cole_pitches)

cat("\nGerrit Cole 2024 Spin Rate Profile\n")
cat(strrep("=", 90), "\n")
print(cole_spin)
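
One caveat about the averages above: spin_axis is an angle in degrees (0 to 360), so an arithmetic mean can mislead when a pitch type's axes straddle the 0°/360° boundary. The following is a minimal sketch of a circular mean, assuming spin_axis is recorded in degrees. The helper name circular_mean_deg and the sample values are illustrative only and are not part of Statcast or any library.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees, returned in [0, 360)."""
    radians = np.radians(np.asarray(angles_deg, dtype=float))
    # Average the unit vectors, then convert the mean vector back to an angle
    mean_sin = np.nanmean(np.sin(radians))
    mean_cos = np.nanmean(np.cos(radians))
    return float(np.degrees(np.arctan2(mean_sin, mean_cos)) % 360)

# Hypothetical spin axes clustered around the 0/360 boundary
sample_axes = [350.0, 10.0, 20.0]
print("Arithmetic mean:", round(float(np.mean(sample_axes)), 1))  # lands far from the cluster
print("Circular mean:  ", round(circular_mean_deg(sample_axes), 1))  # stays inside the cluster
```

With a pitch-level DataFrame, the same helper could be applied per pitch type, for example with pitches.groupby('pitch_type')['spin_axis'].apply(circular_mean_deg).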

7.4 Vertical Approach Angle (VAA)

7.4.1 What VAA Is and Why It Matters

Vertical Approach Angle (VAA) is the angle at which a pitch enters the strike zone, measured in degrees from horizontal. A pitch coming in at -6° is dropping at a steeper angle than one at -4°.

VAA matters because it affects both hitter perception and contact quality:

Perception: Flatter VAA (closer to 0°) makes pitches harder to track. The ball appears to "hop" or stay flat longer before dropping.

Contact Quality: Steeper VAA means the ball is dropping more as it reaches the hitting zone, often resulting in ground balls. A flatter VAA on elevated fastballs tends to produce swings underneath the ball.

Deception: Large VAA differences between a pitcher's fastball and breaking ball create deception. If a fastball enters at -4.5° and a curveball at -7.5°, hitters struggle to identify pitches early.

Typical VAA ranges:

Four-Seam Fastball: -4° to -5.5° (flatter = better for high fastballs)

Sinker: -5° to -6.5° (steeper helps induce ground balls)

Curveball: -6.5° to -9° (very steep, dramatic drop)

Changeup: -5° to -6.5° (similar to sinker, induces weak contact)

Pitchers with long extension and a low release height tend to have flatter VAA on fastballs, which makes those fastballs more effective up in the zone.

7.4.2 Calculating VAA

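VAA can be calculated from Statcast's trajectory data. Statcast reports each pitch's velocity components (vx0, vy0, vz0) and acceleration components (ax, ay, az) at y = 50 feet from home plate, not at the plate itself, so the plate-crossing velocities have to be projected forward assuming constant acceleration:

vy_plate = -√(vy0² - 2 × ay × (50 - 17/12))

t = (vy_plate - vy0) / ay

vz_plate = vz0 + az × t

VAA = -arctan(vz_plate / vy_plate) × (180 / π)

Where:

vy_plate = forward velocity at the front of the plate, y = 17/12 feet (feet/second; negative, because y decreases toward the plate)

vz_plate = vertical velocity at the front of the plate (feet/second)

t = flight time from y = 50 feet to the front of the plate (seconds)

The leading minus sign offsets the negative vy_plate, so a descending pitch gets a negative VAA.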
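
To make the calculation concrete, the sketch below works through a single pitch, projecting Statcast's y = 50 ft values (vy0, vz0, ay, az) to the front of the plate (y = 17/12 ft) under constant acceleration. The input values are invented for illustration, and the function name vaa_from_statcast is hypothetical.

```python
import math

def vaa_from_statcast(vy0, vz0, ay, az, y0=50.0, y_plate=17.0 / 12.0):
    """Vertical approach angle in degrees (negative = descending)."""
    # Forward velocity at the plate from v^2 = v0^2 + 2*a*(y - y0);
    # vy is negative because y decreases toward the plate
    vy_plate = -math.sqrt(vy0**2 - 2.0 * ay * (y0 - y_plate))
    # Time to travel from y0 to the front of the plate
    flight_time = (vy_plate - vy0) / ay
    # Vertical velocity at the plate
    vz_plate = vz0 + az * flight_time
    # Negate so that a descending pitch reports a negative angle
    return -math.degrees(math.atan(vz_plate / vy_plate))

# Invented four-seam fastball values (ft/s and ft/s^2), for illustration only
example_vaa = vaa_from_statcast(vy0=-135.0, vz0=-5.0, ay=28.0, az=-15.0)
print(f"VAA: {example_vaa:.2f} degrees")
```

Taking the angle directly from vz0 and vy0 would instead describe the pitch's direction near release, which is noticeably flatter than its direction at the plate.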

Let's implement VAA calculation:

Python Implementation

def calculate_vaa(df):
    """
    Calculate Vertical Approach Angle for each pitch.

    Parameters:
    df: Statcast DataFrame with velocity components

    Returns:
    DataFrame with VAA added
    """
    pitches = df.copy()

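    # Statcast reports velocity (vy0, vz0) and acceleration (ay, az) at
    # y = 50 ft, not at the plate, so project them to the front of the
    # plate (y = 17/12 ft) assuming constant acceleration
    y0, y_plate = 50.0, 17.0 / 12.0
    vy_plate = -np.sqrt(pitches['vy0']**2 - 2 * pitches['ay'] * (y0 - y_plate))
    flight_time = (vy_plate - pitches['vy0']) / pitches['ay']
    vz_plate = pitches['vz0'] + pitches['az'] * flight_time

    # vy is negative (toward the plate), so negate the angle to report a
    # descending pitch as a negative VAA
    pitches['vaa'] = -np.degrees(np.arctan(vz_plate / vy_plate))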

    return pitches

def analyze_vaa_by_pitch_type(df):
    """
    Analyze VAA metrics by pitch type.

    Parameters:
    df: Statcast DataFrame with pitch-level data (VAA is computed here)

    Returns:
    DataFrame with VAA metrics by pitch type
    """
    # Calculate VAA
    pitches = calculate_vaa(df)

    # Filter for valid VAA values
    pitches = pitches[pitches['vaa'].notna()]

    # Group by pitch type
    vaa_metrics = pitches.groupby('pitch_type').agg({
        'vaa': ['mean', 'std', 'min', 'max'],
        'release_speed': 'mean',
        'release_extension': 'mean',
        'release_pos_z': 'mean'  # Release height
    }).round(2)

    vaa_metrics.columns = ['avg_vaa', 'vaa_std', 'min_vaa', 'max_vaa',
                          'avg_velo', 'avg_extension', 'release_height']
    vaa_metrics = vaa_metrics.reset_index()

    # Add pitch counts
    pitch_counts = pitches.groupby('pitch_type').size().reset_index(name='count')
    vaa_metrics = vaa_metrics.merge(pitch_counts, on='pitch_type')

    return vaa_metrics.sort_values('avg_vaa')

# Analyze VAA for Gerrit Cole
cole_vaa = analyze_vaa_by_pitch_type(cole_pitches)

print("\nGerrit Cole 2024 Vertical Approach Angle Profile")
print("=" * 80)
print(cole_vaa.to_string(index=False))

# Visualize VAA by pitch type
fig, ax = plt.subplots(figsize=(12, 6))

qualifying_pitches = cole_vaa[cole_vaa['count'] >= 50]['pitch_type']
plot_data = cole_vaa[cole_vaa['pitch_type'].isin(qualifying_pitches)]

bars = ax.bar(range(len(plot_data)), plot_data['avg_vaa'],
              color='steelblue', alpha=0.7, edgecolor='black')

# Color bars by category (face color only, so the black edges are kept)
for i, (idx, row) in enumerate(plot_data.iterrows()):
    if row['avg_vaa'] > -5:
        bars[i].set_facecolor('#2ca02c')  # Green for flat
    elif row['avg_vaa'] > -6:
        bars[i].set_facecolor('#ff7f0e')  # Orange for medium
    else:
        bars[i].set_facecolor('#d62728')  # Red for steep

ax.set_xticks(range(len(plot_data)))
ax.set_xticklabels(plot_data['pitch_type'])
ax.set_xlabel('Pitch Type', fontsize=12, fontweight='bold')
ax.set_ylabel('Average VAA (degrees)', fontsize=12, fontweight='bold')
ax.set_title('Vertical Approach Angle by Pitch Type\nGerrit Cole 2024\n' +
             'Green = Flat | Orange = Medium | Red = Steep',
             fontsize=13, fontweight='bold')
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.8)
ax.grid(axis='y', alpha=0.3, linestyle='--')

# Add value labels
for i, (idx, row) in enumerate(plot_data.iterrows()):
    ax.text(i, row['avg_vaa'], f"{row['avg_vaa']:.1f}°",
           ha='center', va='bottom' if row['avg_vaa'] > 0 else 'top',
           fontweight='bold', fontsize=10)

plt.tight_layout()
plt.show()

R Implementation

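calculate_vaa <- function(df) {
  # Statcast reports velocity and acceleration at y = 50 ft, so project
  # them to the front of the plate (y = 17/12 ft) before taking the angle
  y0 <- 50
  y_plate <- 17 / 12

  df <- df %>%
    mutate(
      vy_plate = -sqrt(vy0^2 - 2 * ay * (y0 - y_plate)),
      flight_time = (vy_plate - vy0) / ay,
      vz_plate = vz0 + az * flight_time,
      # vy is negative (toward the plate); negate so descending pitches are negative
      vaa = -atan(vz_plate / vy_plate) * (180 / pi)
    )

  return(df)
}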

analyze_vaa_by_pitch_type <- function(df) {
  # Calculate VAA
  pitches <- calculate_vaa(df) %>%
    filter(!is.na(vaa))

  # Group by pitch type
  vaa_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      avg_vaa = mean(vaa, na.rm = TRUE),
      vaa_std = sd(vaa, na.rm = TRUE),
      min_vaa = min(vaa, na.rm = TRUE),
      max_vaa = max(vaa, na.rm = TRUE),
      avg_velo = mean(release_speed, na.rm = TRUE),
      avg_extension = mean(release_extension, na.rm = TRUE),
      release_height = mean(release_pos_z, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    ) %>%
    mutate(across(where(is.numeric), ~round(.x, 2))) %>%
    arrange(avg_vaa)

  return(vaa_metrics)
}

# Analyze VAA for Gerrit Cole
cole_vaa <- analyze_vaa_by_pitch_type(cole_pitches)

cat("\nGerrit Cole 2024 Vertical Approach Angle Profile\n")
cat(strrep("=", 80), "\n")
print(cole_vaa)

# Visualize VAA by pitch type
cole_vaa %>%
  filter(count >= 50) %>%
  mutate(
    vaa_category = case_when(
      avg_vaa > -5 ~ "Flat",
      avg_vaa > -6 ~ "Medium",
      TRUE ~ "Steep"
    ),
    vaa_category = factor(vaa_category, levels = c("Flat", "Medium", "Steep"))
  ) %>%
  ggplot(aes(x = reorder(pitch_type, avg_vaa), y = avg_vaa, fill = vaa_category)) +
  geom_bar(stat = "identity", color = "black", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_text(aes(label = sprintf("%.1f°", avg_vaa),
                vjust = ifelse(avg_vaa > 0, -0.2, 1.2)),
            fontface = "bold", size = 4) +
  scale_fill_manual(values = c("Flat" = "#2ca02c", "Medium" = "#ff7f0e", "Steep" = "#d62728")) +
  labs(
    title = "Vertical Approach Angle by Pitch Type",
    subtitle = "Gerrit Cole 2024",
    x = "Pitch Type",
    y = "Average VAA (degrees)",
    fill = "VAA Category"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5),
    axis.title = element_text(face = "bold", size = 11),
    legend.position = "top"
  )

7.4.3 VAA and Performance

Flat VAA on fastballs correlates with higher whiff rates when thrown in the upper third of the zone. Pitchers like Spencer Strider generate extreme whiff rates partly due to their flat VAA combined with elite velocity and spin.

Conversely, pitches with steep VAA work better low in the zone, generating ground balls and weak contact. Sinkerball pitchers leverage steep VAA to induce ground balls.
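
One way to check this relationship in your own data is to compare whiff rates on elevated four-seamers across VAA buckets. The sketch below assumes a DataFrame that already has a vaa column plus Statcast's pitch_type, plate_z, sz_top, sz_bot, and description columns. The definition of "upper third", the bucket edges, the swing and whiff event lists, and the function name are illustrative choices. The tiny demo frame is made up and says nothing about real pitchers.

```python
import pandas as pd

# Assumed description values that count as swings and as whiffs
SWING_EVENTS = {
    'swinging_strike', 'swinging_strike_blocked',
    'foul', 'foul_tip', 'hit_into_play',
}
WHIFF_EVENTS = {'swinging_strike', 'swinging_strike_blocked'}

def whiff_rate_by_vaa(df, bins=(-8.0, -5.5, -5.0, -4.5, -3.0)):
    """Whiff rate (whiffs / swings) on upper-third four-seamers by VAA bucket."""
    ff = df[(df['pitch_type'] == 'FF') & df['vaa'].notna()]

    # Keep pitches in the upper third of each batter's strike zone
    upper_cut = ff['sz_top'] - (ff['sz_top'] - ff['sz_bot']) / 3.0
    ff = ff[(ff['plate_z'] >= upper_cut) & (ff['plate_z'] <= ff['sz_top'])].copy()

    ff['swing'] = ff['description'].isin(SWING_EVENTS)
    ff['whiff'] = ff['description'].isin(WHIFF_EVENTS)
    ff['vaa_bucket'] = pd.cut(ff['vaa'], bins=list(bins))

    # Whiff rate is computed over swings only
    swings = ff[ff['swing']]
    out = swings.groupby('vaa_bucket', observed=True).agg(
        swings=('swing', 'size'),
        whiffs=('whiff', 'sum'),
    )
    out['whiff_rate'] = (out['whiffs'] / out['swings']).round(3)
    return out.reset_index()

# Made-up pitches, for demonstration only
demo = pd.DataFrame({
    'pitch_type': ['FF', 'FF', 'FF', 'FF', 'SL'],
    'vaa': [-4.2, -4.3, -5.2, -5.3, -7.0],
    'plate_z': [3.3, 3.4, 3.3, 3.4, 2.0],
    'sz_top': [3.5] * 5,
    'sz_bot': [1.6] * 5,
    'description': ['swinging_strike', 'foul', 'hit_into_play',
                    'swinging_strike', 'ball'],
})
print(whiff_rate_by_vaa(demo))
```

On real data, each bucket needs a healthy number of swings before the rates mean much, and velocity and location should be held roughly constant before crediting VAA for any difference.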

7.5 Release Point and Tunneling

7.5.1 Release Point Consistency

Release point refers to where the ball leaves the pitcher's hand, measured in three dimensions:

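X-axis: Side-to-side (from the catcher's view; for a right-handed pitcher, negative = arm side, positive = glove side)

Y-axis: Distance toward home plate (Statcast reports this as extension, the distance in front of the pitching rubber at which the ball is released)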

Z-axis: Height above ground

Consistency in release point is crucial for deception. If a pitcher releases his fastball from 6 feet high and his curveball from 5.5 feet, hitters can identify pitches early. Elite pitchers maintain nearly identical release points across their arsenal.

def analyze_release_point_consistency(df):
    """
    Analyze release point consistency by pitch type.

    Parameters:
    df: Statcast DataFrame with release point data

    Returns:
    DataFrame with release point metrics and consistency measures
    """
    pitches = df[
        df['release_pos_x'].notna() &
        df['release_pos_z'].notna() &
        df['pitch_type'].notna()
    ].copy()

    # Calculate metrics by pitch type
    release_metrics = pitches.groupby('pitch_type').agg({
        'release_pos_x': ['mean', 'std'],
        'release_pos_z': ['mean', 'std'],
        'release_extension': ['mean', 'std']
    }).round(3)

    release_metrics.columns = ['x_mean', 'x_std', 'z_mean', 'z_std',
                               'ext_mean', 'ext_std']
    release_metrics = release_metrics.reset_index()

    # Add pitch counts
    pitch_counts = pitches.groupby('pitch_type').size().reset_index(name='count')
    release_metrics = release_metrics.merge(pitch_counts, on='pitch_type')

    # Calculate total variability (consistency score - lower is better)
    release_metrics['consistency_score'] = (
        release_metrics['x_std'] + release_metrics['z_std']
    ).round(3)

    return release_metrics.sort_values('consistency_score')

# Visualize release point scatter
def plot_release_points(df, player_name="Pitcher"):
    """Create scatter plot of release points by pitch type."""
    pitches = df[
        df['release_pos_x'].notna() &
        df['release_pos_z'].notna() &
        df['pitch_type'].notna()
    ].copy()

    # Filter for pitch types with sufficient counts
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]

    fig, ax = plt.subplots(figsize=(10, 12))

    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }

    # Plot each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]

        ax.scatter(
            pitch_subset['release_pos_x'],
            pitch_subset['release_pos_z'],
            c=pitch_colors.get(pitch_type, '#7f7f7f'),
            label=f"{pitch_type} (n={len(pitch_subset)})",
            alpha=0.4,
            s=20,
            edgecolors='none'
        )

        # Add mean release point
        mean_x = pitch_subset['release_pos_x'].mean()
        mean_z = pitch_subset['release_pos_z'].mean()
        ax.scatter(mean_x, mean_z,
                  c=pitch_colors.get(pitch_type, '#7f7f7f'),
                  marker='X', s=200, edgecolors='black', linewidth=2,
                  zorder=10)

    ax.set_xlabel('Horizontal Release Point (feet)\n← Arm Side | Glove Side →',
                  fontsize=12, fontweight='bold')
    ax.set_ylabel('Vertical Release Point (feet)',
                  fontsize=12, fontweight='bold')
    ax.set_title(f'{player_name} Release Point Consistency\n' +
                 'X = Mean Release Point by Pitch Type',
                 fontsize=14, fontweight='bold', pad=20)

    ax.grid(True, alpha=0.3, linestyle='--')
    ax.legend(loc='upper right', framealpha=0.9, fontsize=9)

    plt.tight_layout()
    plt.show()

# Analyze Cole's release point consistency
cole_release = analyze_release_point_consistency(cole_pitches)

print("\nRelease Point Consistency Analysis")
print("=" * 80)
print(cole_release.to_string(index=False))
print("\nNote: Lower consistency_score indicates better release point consistency")

# Visualize
plot_release_points(cole_pitches, "Gerrit Cole 2024")

# R version: Release point analysis
analyze_release_point_consistency <- function(df) {
  pitches <- df %>%
    filter(!is.na(release_pos_x), !is.na(release_pos_z), !is.na(pitch_type))

  release_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      x_mean = mean(release_pos_x, na.rm = TRUE),
      x_std = sd(release_pos_x, na.rm = TRUE),
      z_mean = mean(release_pos_z, na.rm = TRUE),
      z_std = sd(release_pos_z, na.rm = TRUE),
      ext_mean = mean(release_extension, na.rm = TRUE),
      ext_std = sd(release_extension, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    ) %>%
    mutate(
      consistency_score = x_std + z_std,
      across(where(is.numeric), ~round(.x, 3))
    ) %>%
    arrange(consistency_score)

  return(release_metrics)
}

plot_release_points <- function(df, player_name = "Pitcher") {
  pitches <- df %>%
    filter(!is.na(release_pos_x), !is.na(release_pos_z), !is.na(pitch_type))

  # Filter for sufficient pitch counts
  pitch_counts <- pitches %>%
    count(pitch_type) %>%
    filter(n >= 30)

  plot_data <- pitches %>%
    filter(pitch_type %in% pitch_counts$pitch_type)

  # Calculate means for each pitch type
  means <- plot_data %>%
    group_by(pitch_type) %>%
    summarise(
      mean_x = mean(release_pos_x),
      mean_z = mean(release_pos_z),
      .groups = 'drop'
    )

  ggplot(plot_data, aes(x = release_pos_x, y = release_pos_z, color = pitch_type)) +
    geom_point(alpha = 0.4, size = 1.5) +
    geom_point(data = means, aes(x = mean_x, y = mean_z, color = pitch_type),
               shape = 4, size = 8, stroke = 2) +
    scale_color_brewer(palette = "Set1", name = "Pitch Type") +
    labs(
      title = paste(player_name, "Release Point Consistency"),
      subtitle = "X = Mean Release Point by Pitch Type",
      x = "Horizontal Release Point (feet)\n← Arm Side | Glove Side →",
      y = "Vertical Release Point (feet)"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(face = "bold", size = 11),
      legend.position = "right"
    )
}

cole_release <- analyze_release_point_consistency(cole_pitches)

cat("\nRelease Point Consistency Analysis\n")
cat(strrep("=", 80), "\n")
print(cole_release)
cat("\nNote: Lower consistency_score indicates better release point consistency\n")

plot_release_points(cole_pitches, "Gerrit Cole 2024")
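
The consistency score measures scatter within each pitch type. The fastball-versus-curveball example at the start of this section concerns a different quantity: the gap between pitch types' average release points. The following is a minimal sketch of that comparison, assuming the same release_pos_x, release_pos_z, and pitch_type columns. The function name, the choice of the four-seamer ('FF') as the reference pitch, and the demo values are illustrative.

```python
import numpy as np
import pandas as pd

def release_point_separation(df, reference='FF'):
    """Distance (inches) from each pitch type's mean release point to the reference pitch's."""
    pitches = df.dropna(subset=['release_pos_x', 'release_pos_z', 'pitch_type'])
    means = pitches.groupby('pitch_type')[['release_pos_x', 'release_pos_z']].mean()
    if reference not in means.index:
        raise ValueError(f"No pitches of type {reference!r} found")

    ref_x = means.loc[reference, 'release_pos_x']
    ref_z = means.loc[reference, 'release_pos_z']

    # Euclidean distance in the x-z plane, converted from feet to inches
    means['separation_in'] = (
        np.hypot(means['release_pos_x'] - ref_x, means['release_pos_z'] - ref_z) * 12
    ).round(2)
    return means.reset_index().sort_values('separation_in')

# Made-up release points: fastball at 6.0 ft, curveball at 5.5 ft
demo_release = pd.DataFrame({
    'pitch_type': ['FF', 'FF', 'CU', 'CU'],
    'release_pos_x': [-2.0, -2.0, -2.0, -2.0],
    'release_pos_z': [6.0, 6.0, 5.5, 5.5],
})
print(release_point_separation(demo_release))
```

Reading the two measures together is more informative than either alone: a pitcher can repeat each pitch tightly and still release different pitch types from visibly different slots.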

7.5.2 Pitch Tunneling

Pitch tunneling refers to the concept that effective pitch sequences make pitches look identical for as long as possible before breaking sharply in different directions. When a fastball and changeup travel through the same "tunnel" (similar trajectory) for most of the flight path, hitters can't distinguish them until it's too late to adjust.

Tunneling metrics:

Tunnel Point: The location where pitch paths diverge measurably

Release Point Similarity: How close release points are across pitch types

Early Flight Path: Trajectory similarity in the first 20-30 feet

Elite tunneling creates decision-making problems for hitters. They must commit to a swing based on early trajectory, but when pitches tunnel well, that early trajectory is nearly identical for multiple pitch types.

Example: Gerrit Cole's four-seam fastball and slider come from nearly identical release points with similar early trajectories. By the time the slider breaks away, the hitter has already committed to the fastball location.
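
The idea of a tunnel point can be sketched numerically by tracking how far apart two pitch paths are at several distances from the plate. The sketch below assumes each pitch is described by a position, velocity, and acceleration at a common starting distance y (feet from the plate) and follows constant acceleration, which is a simplification. The dictionary keys, function names, and all numbers are hypothetical, not Statcast fields or measured values.

```python
import math

def position_at_y(state, y_target):
    """Position (x, z) in feet when a constant-acceleration pitch reaches y_target."""
    # Solve y + vy*t + 0.5*ay*t^2 = y_target for the first (physical) time t
    a, b, c = 0.5 * state['ay'], state['vy'], state['y'] - y_target
    if abs(a) < 1e-12:
        t = -c / b
    else:
        t = (-b - math.sqrt(b**2 - 4 * a * c)) / (2 * a)
    x = state['x'] + state['vx'] * t + 0.5 * state['ax'] * t**2
    z = state['z'] + state['vz'] * t + 0.5 * state['az'] * t**2
    return x, z

def tunnel_separation(pitch_a, pitch_b, y_values):
    """Separation in inches between two pitch paths at each distance from the plate."""
    rows = []
    for y in y_values:
        xa, za = position_at_y(pitch_a, y)
        xb, zb = position_at_y(pitch_b, y)
        rows.append((y, round(math.hypot(xa - xb, za - zb) * 12, 2)))
    return rows

# Invented fastball and slider sharing the same starting point at y = 50 ft
fastball = dict(x=-2.0, y=50.0, z=6.0, vx=6.0, vy=-135.0, vz=-5.0,
                ax=-12.0, ay=28.0, az=-15.0)
slider = dict(x=-2.0, y=50.0, z=6.0, vx=5.0, vy=-124.0, vz=-3.5,
              ax=4.0, ay=24.0, az=-30.0)

separations = tunnel_separation(fastball, slider, [50.0, 30.0, 17.0 / 12.0])
for y, inches in separations:
    print(f"{y:5.1f} ft from plate: {inches} in apart")
```

A tunnel point could then be defined as the first distance at which the separation exceeds a chosen threshold. The threshold is itself a modeling choice rather than a fixed standard.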

# R version: Release point analysis
analyze_release_point_consistency <- function(df) {
  pitches <- df %>%
    filter(!is.na(release_pos_x), !is.na(release_pos_z), !is.na(pitch_type))

  release_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      x_mean = mean(release_pos_x, na.rm = TRUE),
      x_std = sd(release_pos_x, na.rm = TRUE),
      z_mean = mean(release_pos_z, na.rm = TRUE),
      z_std = sd(release_pos_z, na.rm = TRUE),
      ext_mean = mean(release_extension, na.rm = TRUE),
      ext_std = sd(release_extension, na.rm = TRUE),
      count = n(),
      .groups = 'drop'
    ) %>%
    mutate(
      consistency_score = x_std + z_std,
      across(where(is.numeric), ~round(.x, 3))
    ) %>%
    arrange(consistency_score)

  return(release_metrics)
}

plot_release_points <- function(df, player_name = "Pitcher") {
  pitches <- df %>%
    filter(!is.na(release_pos_x), !is.na(release_pos_z), !is.na(pitch_type))

  # Filter for sufficient pitch counts
  pitch_counts <- pitches %>%
    count(pitch_type) %>%
    filter(n >= 30)

  plot_data <- pitches %>%
    filter(pitch_type %in% pitch_counts$pitch_type)

  # Calculate means for each pitch type
  means <- plot_data %>%
    group_by(pitch_type) %>%
    summarise(
      mean_x = mean(release_pos_x),
      mean_z = mean(release_pos_z),
      .groups = 'drop'
    )

  ggplot(plot_data, aes(x = release_pos_x, y = release_pos_z, color = pitch_type)) +
    geom_point(alpha = 0.4, size = 1.5) +
    geom_point(data = means, aes(x = mean_x, y = mean_z, color = pitch_type),
               shape = 4, size = 8, stroke = 2) +
    scale_color_brewer(palette = "Set1", name = "Pitch Type") +
    labs(
      title = paste(player_name, "Release Point Consistency"),
      subtitle = "X = Mean Release Point by Pitch Type",
      x = "Horizontal Release Point (feet, catcher's view)\n← Third-Base Side | First-Base Side →",
      y = "Vertical Release Point (feet)"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(face = "bold", size = 11),
      legend.position = "right"
    )
}

cole_release <- analyze_release_point_consistency(cole_pitches)

cat("\nRelease Point Consistency Analysis\n")
cat(strrep("=", 80), "\n")
print(cole_release)
cat("\nNote: Lower consistency_score indicates better release point consistency\n")

plot_release_points(cole_pitches, "Gerrit Cole 2024")

Python

def analyze_release_point_consistency(df):
    """
    Analyze release point consistency by pitch type.

    Parameters:
    df: Statcast DataFrame with release point data

    Returns:
    DataFrame with release point metrics and consistency measures
    """
    pitches = df[
        df['release_pos_x'].notna() &
        df['release_pos_z'].notna() &
        df['pitch_type'].notna()
    ].copy()

    # Calculate metrics by pitch type
    release_metrics = pitches.groupby('pitch_type').agg({
        'release_pos_x': ['mean', 'std'],
        'release_pos_z': ['mean', 'std'],
        'release_extension': ['mean', 'std']
    }).round(3)

    release_metrics.columns = ['x_mean', 'x_std', 'z_mean', 'z_std',
                               'ext_mean', 'ext_std']
    release_metrics = release_metrics.reset_index()

    # Add pitch counts
    pitch_counts = pitches.groupby('pitch_type').size().reset_index(name='count')
    release_metrics = release_metrics.merge(pitch_counts, on='pitch_type')

    # Calculate total variability (consistency score - lower is better)
    release_metrics['consistency_score'] = (
        release_metrics['x_std'] + release_metrics['z_std']
    ).round(3)

    return release_metrics.sort_values('consistency_score')

# Visualize release point scatter
def plot_release_points(df, player_name="Pitcher"):
    """Create scatter plot of release points by pitch type."""
    pitches = df[
        df['release_pos_x'].notna() &
        df['release_pos_z'].notna() &
        df['pitch_type'].notna()
    ].copy()

    # Filter for pitch types with sufficient counts
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]

    fig, ax = plt.subplots(figsize=(10, 12))

    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }

    # Plot each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]

        ax.scatter(
            pitch_subset['release_pos_x'],
            pitch_subset['release_pos_z'],
            c=pitch_colors.get(pitch_type, '#7f7f7f'),
            label=f"{pitch_type} (n={len(pitch_subset)})",
            alpha=0.4,
            s=20,
            edgecolors='none'
        )

        # Add mean release point
        mean_x = pitch_subset['release_pos_x'].mean()
        mean_z = pitch_subset['release_pos_z'].mean()
        ax.scatter(mean_x, mean_z,
                  c=pitch_colors.get(pitch_type, '#7f7f7f'),
                  marker='X', s=200, edgecolors='black', linewidth=2,
                  zorder=10)

    ax.set_xlabel("Horizontal Release Point (feet, catcher's view)\n← Third-Base Side | First-Base Side →",
                  fontsize=12, fontweight='bold')
    ax.set_ylabel('Vertical Release Point (feet)',
                  fontsize=12, fontweight='bold')
    ax.set_title(f'{player_name} Release Point Consistency\n' +
                 'X = Mean Release Point by Pitch Type',
                 fontsize=14, fontweight='bold', pad=20)

    ax.grid(True, alpha=0.3, linestyle='--')
    ax.legend(loc='upper right', framealpha=0.9, fontsize=9)

    plt.tight_layout()
    plt.show()

# Analyze Cole's release point consistency
cole_release = analyze_release_point_consistency(cole_pitches)

print("\nRelease Point Consistency Analysis")
print("=" * 80)
print(cole_release.to_string(index=False))
print("\nNote: Lower consistency_score indicates better release point consistency")

# Visualize
plot_release_points(cole_pitches, "Gerrit Cole 2024")

7.6 Pitch Arsenal Analysis

7.6.1 Building an Arsenal Report

A comprehensive arsenal report summarizes a pitcher's full repertoire, including usage rates, velocity, movement, and outcome metrics for each pitch type:

def create_arsenal_report(df, player_name="Pitcher"):
    """
    Create comprehensive pitch arsenal report.

    Parameters:
    df: Statcast DataFrame with pitch-level data
    player_name: Name for report header

    Returns:
    DataFrame with complete arsenal metrics
    """
    pitches = df[df['pitch_type'].notna()].copy()

    # Calculate whiff rate (swings and misses / swings)
    pitches['was_swing'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked',
        'foul', 'hit_into_play', 'foul_tip'
    ])
    pitches['was_whiff'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked'
    ])

    # Build arsenal report
    arsenal = pitches.groupby('pitch_type').agg({
        'pitch_type': 'count',  # Total pitches
        'release_speed': 'mean',
        'release_spin_rate': 'mean',
        'pfx_z': lambda x: (x * 12).mean(),  # IVB in inches
        'pfx_x': lambda x: (x * 12).mean(),  # HB in inches
        'was_swing': 'sum',
        'was_whiff': 'sum'
    }).round(1)

    arsenal.columns = ['count', 'avg_velo', 'avg_spin', 'ivb', 'hb',
                       'swings', 'whiffs']
    arsenal = arsenal.reset_index()

    # Calculate rates
    total_pitches = arsenal['count'].sum()
    arsenal['usage_pct'] = (arsenal['count'] / total_pitches * 100).round(1)
    arsenal['whiff_rate'] = (arsenal['whiffs'] / arsenal['swings'] * 100).round(1)

    # Get xwOBA by pitch type (if available)
    if 'estimated_woba_using_speedangle' in pitches.columns:
        xwoba = pitches.groupby('pitch_type')['estimated_woba_using_speedangle'].mean()
        arsenal = arsenal.merge(
            xwoba.round(3).reset_index().rename(columns={'estimated_woba_using_speedangle': 'xwOBA'}),
            on='pitch_type',
            how='left'
        )

    # Sort by usage
    arsenal = arsenal.sort_values('usage_pct', ascending=False)

    return arsenal

# Generate arsenal report
cole_arsenal = create_arsenal_report(cole_pitches, "Gerrit Cole")

print("\n" + "="*90)
print(f"PITCH ARSENAL REPORT: Gerrit Cole 2024")
print("="*90)
print(cole_arsenal.to_string(index=False))
print("\nKey:")
print("  IVB = Induced Vertical Break (inches)")
print("  HB = Horizontal Break (inches, catcher's view; for a right-hander, - = arm side, + = glove side)")
print("  Whiff Rate = Swinging strikes / Total swings")

# R version: Arsenal report
create_arsenal_report <- function(df, player_name = "Pitcher") {
  pitches <- df %>%
    filter(!is.na(pitch_type))

  # Identify swings and whiffs
  pitches <- pitches %>%
    mutate(
      was_swing = description %in% c('swinging_strike', 'swinging_strike_blocked',
                                      'foul', 'hit_into_play', 'foul_tip'),
      was_whiff = description %in% c('swinging_strike', 'swinging_strike_blocked')
    )

  # Build arsenal report
  arsenal <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      count = n(),
      avg_velo = mean(release_speed, na.rm = TRUE),
      avg_spin = mean(release_spin_rate, na.rm = TRUE),
      ivb = mean(pfx_z * 12, na.rm = TRUE),
      hb = mean(pfx_x * 12, na.rm = TRUE),
      swings = sum(was_swing, na.rm = TRUE),
      whiffs = sum(was_whiff, na.rm = TRUE),
      xwOBA = mean(estimated_woba_using_speedangle, na.rm = TRUE),
      .groups = 'drop'
    ) %>%
    mutate(
      usage_pct = count / sum(count) * 100,
      whiff_rate = whiffs / swings * 100,
      across(c(avg_velo, avg_spin, ivb, hb, usage_pct, whiff_rate), ~round(.x, 1)),
      xwOBA = round(xwOBA, 3)  # keep three decimals, matching the Python version
    ) %>%
    arrange(desc(usage_pct))

  return(arsenal)
}

cole_arsenal <- create_arsenal_report(cole_pitches, "Gerrit Cole")

cat("\n", strrep("=", 90), "\n")
cat("PITCH ARSENAL REPORT: Gerrit Cole 2024\n")
cat(strrep("=", 90), "\n")
print(cole_arsenal)
cat("\nKey:\n")
cat("  IVB = Induced Vertical Break (inches)\n")
cat("  HB = Horizontal Break (inches, catcher's view; for a right-hander, - = arm side, + = glove side)\n")
cat("  Whiff Rate = Swinging strikes / Total swings\n")

7.6.2 Understanding Stuff+

Stuff+ is a metric that evaluates the "quality" of a pitch in isolation, independent of results. It's scaled to 100, where:

100 = MLB average

110 = 10% better than average

90 = 10% worse than average

Stuff+ models incorporate:

Velocity

Spin rate

Movement (vertical and horizontal)

Release characteristics

Platoon matchup

Stuff+ helps identify pitchers who might be underperforming their arsenal quality (unlucky results or command issues) or overperforming (likely to regress).

Location+ and Pitching+ complement Stuff+:

Location+: Quality of pitch location/command

Pitching+: Combined metric (stuff + location + context)

While we can't replicate the proprietary Stuff+ model exactly, we can approximate it using Statcast data and understand its principles.
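
To make the scaling idea concrete, here is a deliberately crude sketch. It is not Stuff+ and not any published model. It assumes a hand-picked weighted sum of z-scores against league reference values, rescaled so that 100 is average and each weighted standard deviation is worth 10 points. The function name, the weights, the 10-point scale, and all league numbers are hypothetical.

```python
def crude_stuff_index(pitch, league_mean, league_sd, weights):
    """Weighted z-score composite centered on 100.

    Illustrative only: real Stuff+ models are fitted to outcomes, whereas the
    weights here are arbitrary assumptions supplied by the caller.
    """
    z_total = sum(
        w * (pitch[k] - league_mean[k]) / league_sd[k]
        for k, w in weights.items()
    )
    return 100 + 10 * z_total


# Made-up four-seam fastball and made-up league reference values
four_seam = {"velo": 97.0, "ivb": 18.0}
league_mean = {"velo": 94.0, "ivb": 16.0}
league_sd = {"velo": 2.0, "ivb": 2.0}
weights = {"velo": 0.6, "ivb": 0.4}

index = crude_stuff_index(four_seam, league_mean, league_sd, weights)
print(round(index, 1))
```

The useful part is the structure: physical characteristics go in, results stay out, and the output is read relative to 100. Any comparison should be made within a pitch type, since an average slider and an average fastball have very different raw characteristics.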

7.7 Location and Command

7.7.1 Zone Analysis Metrics

Pitch location is as important as pitch quality. The strike zone can be divided into regions for analysis:

Zone Metrics:

Zone%: Percentage of pitches in the strike zone

Edge%: Percentage of pitches on the edges (borders of zone)

Heart%: Percentage of pitches in the middle of the zone (most hittable)

Chase%: Percentage of pitches outside zone that generate swings

Waste%: Percentage of pitches well outside zone (intentional balls)
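
The regions above can also be assigned from raw pitch coordinates instead of the numbered zones. The sketch below is one possible approach under loudly stated assumptions: it scales each pitch's distance from the zone center so that 1.0 falls on the zone border, then applies cutoffs of 0.67, 1.33, and 2.0. Those cutoffs, the 0.83 ft half-width, and the function name classify_location are illustrative choices, not official definitions. The inputs correspond to Statcast's plate_x, plate_z, sz_top, and sz_bot fields.

```python
def classify_location(plate_x, plate_z, sz_top, sz_bot,
                      half_width_ft=0.83, heart=0.67, edge=1.33, chase=2.0):
    """Label a pitch location as 'heart', 'edge', 'chase', or 'waste'.

    Thresholds are multiples of the zone's half-width and half-height and are
    assumptions made for this example.
    """
    mid = (sz_top + sz_bot) / 2
    half_height = (sz_top - sz_bot) / 2

    # Scaled distance from the zone center; 1.0 sits on the zone border
    r = max(abs(plate_x) / half_width_ft, abs(plate_z - mid) / half_height)

    if r <= heart:
        return "heart"
    if r <= edge:
        return "edge"
    if r <= chase:
        return "chase"
    return "waste"


# Four made-up pitches at belt height, moving progressively off the plate
labels = [classify_location(x, 2.5, 3.5, 1.5) for x in (0.0, 0.83, 1.5, 2.0)]
print(labels)
```

Because the classification uses each batter's own sz_top and sz_bot, the same physical location can land in different regions for hitters of different heights.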

def analyze_location_metrics(df):
    """
    Calculate pitch location and command metrics.

    Parameters:
    df: Statcast DataFrame with location data

    Returns:
    DataFrame with location metrics by pitch type
    """
    pitches = df[df['zone'].notna()].copy()

    # Zone definitions (Statcast uses zones 1-9 for strike zone, 11-14 for outside)
    pitches['in_zone'] = pitches['zone'] <= 9
    pitches['heart'] = pitches['zone'].isin([5])  # Zone 5 is heart
    pitches['edge'] = pitches['zone'].isin([1, 2, 3, 4, 6, 7, 8, 9]) & ~pitches['heart']
    pitches['outside_zone'] = pitches['zone'] > 9

    # Identify chases (swings on pitches outside zone)
    pitches['is_swing'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked',
        'foul', 'hit_into_play', 'foul_tip'
    ])
    pitches['is_chase'] = pitches['is_swing'] & pitches['outside_zone']

    # Calculate metrics by pitch type
    location_metrics = pitches.groupby('pitch_type').agg({
        'in_zone': ['sum', 'count'],
        'heart': 'sum',
        'edge': 'sum',
        'outside_zone': 'sum',
        'is_chase': 'sum'
    })

    location_metrics.columns = ['zone_count', 'total', 'heart_count',
                                'edge_count', 'outside_count', 'chase_count']
    location_metrics = location_metrics.reset_index()

    # Calculate percentages
    location_metrics['zone_pct'] = (
        location_metrics['zone_count'] / location_metrics['total'] * 100
    ).round(1)
    location_metrics['heart_pct'] = (
        location_metrics['heart_count'] / location_metrics['total'] * 100
    ).round(1)
    location_metrics['edge_pct'] = (
        location_metrics['edge_count'] / location_metrics['total'] * 100
    ).round(1)

    # Chase rate: chases / pitches outside zone
    location_metrics['chase_rate'] = (
        location_metrics['chase_count'] / location_metrics['outside_count'] * 100
    ).round(1)

    # Select final columns
    result = location_metrics[[
        'pitch_type', 'total', 'zone_pct', 'heart_pct',
        'edge_pct', 'chase_rate'
    ]].copy()

    return result.sort_values('total', ascending=False)

# Analyze location metrics
cole_location = analyze_location_metrics(cole_pitches)

print("\nLocation & Command Metrics by Pitch Type")
print("=" * 70)
print(cole_location.to_string(index=False))

# R version: Location metrics
analyze_location_metrics <- function(df) {
  pitches <- df %>%
    filter(!is.na(zone))

  # Define zone categories
  pitches <- pitches %>%
    mutate(
      in_zone = zone <= 9,
      heart = zone == 5,
      edge = zone %in% c(1, 2, 3, 4, 6, 7, 8, 9),
      outside_zone = zone > 9,
      is_swing = description %in% c('swinging_strike', 'swinging_strike_blocked',
                                     'foul', 'hit_into_play', 'foul_tip'),
      is_chase = is_swing & outside_zone
    )

  # Calculate metrics by pitch type
  location_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      total = n(),
      zone_count = sum(in_zone),
      heart_count = sum(heart),
      edge_count = sum(edge),
      outside_count = sum(outside_zone),
      chase_count = sum(is_chase),
      .groups = 'drop'
    ) %>%
    mutate(
      zone_pct = zone_count / total * 100,
      heart_pct = heart_count / total * 100,
      edge_pct = edge_count / total * 100,
      chase_rate = chase_count / outside_count * 100,
      across(c(zone_pct, heart_pct, edge_pct, chase_rate), ~round(.x, 1))
    ) %>%
    select(pitch_type, total, zone_pct, heart_pct, edge_pct, chase_rate) %>%
    arrange(desc(total))

  return(location_metrics)
}

cole_location <- analyze_location_metrics(cole_pitches)

cat("\nLocation & Command Metrics by Pitch Type\n")
cat(strrep("=", 70), "\n")
print(cole_location)

7.7.2 CSW% and Chase Rate

Called Strikes + Whiffs (CSW%) is a simple but powerful metric for evaluating pitcher performance:

CSW% = (Called Strikes + Swinging Strikes) / Total Pitches

CSW% captures two essential pitcher skills:

Command: Throwing strikes that hitters don't swing at

Stuff: Getting whiffs when hitters do swing

League average CSW% is approximately 28-30%. Elite pitchers exceed 33%.

Chase Rate measures deception - how often hitters swing at pitches outside the zone:

Chase Rate = Swings Outside Zone / Pitches Outside Zone

League average chase rate is approximately 28-30%. Elite pitches/pitchers exceed 35%.
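
As a quick worked example of the two formulas above, the following sketch computes both rates from raw counts. The function names and the counts are made up for illustration.

```python
def csw_pct(called_strikes, whiffs, total_pitches):
    """Called strikes plus whiffs, as a percentage of all pitches."""
    if total_pitches == 0:
        return float("nan")
    return 100 * (called_strikes + whiffs) / total_pitches


def chase_rate(swings_outside, pitches_outside):
    """Swings at pitches outside the zone, as a percentage of such pitches."""
    if pitches_outside == 0:
        return float("nan")
    return 100 * swings_outside / pitches_outside


# Made-up outing: 100 pitches, 40 of them outside the zone
example_csw = csw_pct(called_strikes=17, whiffs=13, total_pitches=100)
example_chase = chase_rate(swings_outside=14, pitches_outside=40)
print(example_csw, example_chase)
```

Note the different denominators: CSW% divides by every pitch thrown, while chase rate divides only by pitches outside the zone.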

def calculate_csw_metrics(df):
    """
    Calculate CSW% and related metrics.

    Parameters:
    df: Statcast DataFrame

    Returns:
    DataFrame with CSW metrics by pitch type
    """
    pitches = df.copy()

    # Identify called strikes and whiffs
    pitches['called_strike'] = pitches['description'] == 'called_strike'
    pitches['swinging_strike'] = pitches['description'].isin([
        'swinging_strike', 'swinging_strike_blocked'
    ])
    pitches['csw'] = pitches['called_strike'] | pitches['swinging_strike']

    # Calculate by pitch type
    csw_metrics = pitches.groupby('pitch_type').agg({
        'pitch_type': 'count',
        'called_strike': 'sum',
        'swinging_strike': 'sum',
        'csw': 'sum'
    })

    csw_metrics.columns = ['total', 'called_strikes', 'whiffs', 'csw_count']
    csw_metrics = csw_metrics.reset_index()

    # Calculate percentages
    csw_metrics['called_strike_pct'] = (
        csw_metrics['called_strikes'] / csw_metrics['total'] * 100
    ).round(1)
    # Whiffs per pitch (swinging-strike rate); the arsenal report's whiff_rate is per swing
    csw_metrics['whiff_pct'] = (
        csw_metrics['whiffs'] / csw_metrics['total'] * 100
    ).round(1)
    csw_metrics['csw_pct'] = (
        csw_metrics['csw_count'] / csw_metrics['total'] * 100
    ).round(1)

    result = csw_metrics[[
        'pitch_type', 'total', 'called_strike_pct',
        'whiff_pct', 'csw_pct'
    ]].copy()

    return result.sort_values('csw_pct', ascending=False)

cole_csw = calculate_csw_metrics(cole_pitches)

print("\nCSW% Analysis by Pitch Type")
print("=" * 70)
print(cole_csw.to_string(index=False))
print("\nMLB Average CSW%: ~29%")
print("Elite Threshold: 33%+")

# R version: CSW metrics
calculate_csw_metrics <- function(df) {
  pitches <- df %>%
    mutate(
      called_strike = description == 'called_strike',
      swinging_strike = description %in% c('swinging_strike', 'swinging_strike_blocked'),
      csw = called_strike | swinging_strike
    )

  csw_metrics <- pitches %>%
    group_by(pitch_type) %>%
    summarise(
      total = n(),
      called_strikes = sum(called_strike),
      whiffs = sum(swinging_strike),
      csw_count = sum(csw),
      .groups = 'drop'
    ) %>%
    mutate(
      called_strike_pct = called_strikes / total * 100,
      # Whiffs per pitch (swinging-strike rate), not whiffs per swing
      whiff_pct = whiffs / total * 100,
      csw_pct = csw_count / total * 100,
      across(c(called_strike_pct, whiff_pct, csw_pct), ~round(.x, 1))
    ) %>%
    select(pitch_type, total, called_strike_pct, whiff_pct, csw_pct) %>%
    arrange(desc(csw_pct))

  return(csw_metrics)
}

cole_csw <- calculate_csw_metrics(cole_pitches)

cat("\nCSW% Analysis by Pitch Type\n")
cat(strrep("=", 70), "\n")
print(cole_csw)
cat("\nMLB Average CSW%: ~29%\n")
cat("Elite Threshold: 33%+\n")

7.8 Expected Stats for Pitchers

7.8.1 xERA and xwOBA Against

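Just as hitters have expected statistics, pitchers have expected statistics against, based on the quality of contact they allow:

xwOBA Against: Expected weighted on-base average allowed. Batted balls are valued by their exit velocity and launch angle rather than by what actually happened, which strips out defense and batted-ball luck and isolates the pitcher's responsibility. The published figure also counts strikeouts and walks at their actual values, so an average over batted balls alone (xwOBA on contact) runs noticeably higher.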

xERA (Expected ERA): Estimated ERA based on expected outcomes rather than actual outcomes.

These metrics help identify:

Unlucky pitchers: High ERA but low xERA (likely to improve)

Lucky pitchers: Low ERA but high xERA (likely to regress)

Contact management: Pitchers who limit hard contact even when allowing hits
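
A full xwOBA against needs strikeouts and walks as well as batted balls. One common way to assemble it from pitch-level rows, sketched here under assumptions, is to use the expected value where a batted ball has one, fall back to the actual wOBA value otherwise, and divide by the wOBA denominator. The field names follow Statcast's estimated_woba_using_speedangle, woba_value, and woba_denom columns. The function name and the four sample plate appearances (including the rounded 0.7 walk weight) are hypothetical.

```python
def xwoba_against(plate_appearances):
    """Expected wOBA allowed over a list of plate-appearance dicts."""
    numerator = 0.0
    denominator = 0.0
    for pa in plate_appearances:
        denom = pa.get("woba_denom") or 0
        if denom == 0:
            continue  # rows that do not count toward wOBA
        est = pa.get("estimated_woba_using_speedangle")
        # Batted balls use the expected value; strikeouts and walks use actual
        numerator += est if est is not None else pa["woba_value"]
        denominator += denom
    return numerator / denominator if denominator else None


# Made-up plate appearances: strikeout, walk, and two batted balls
sample_pas = [
    {"woba_value": 0.0, "woba_denom": 1, "estimated_woba_using_speedangle": None},
    {"woba_value": 0.7, "woba_denom": 1, "estimated_woba_using_speedangle": None},
    {"woba_value": 0.0, "woba_denom": 1, "estimated_woba_using_speedangle": 0.9},
    {"woba_value": 0.9, "woba_denom": 1, "estimated_woba_using_speedangle": 0.2},
]

full_xwoba = xwoba_against(sample_pas)
contact_only = (0.9 + 0.2) / 2
print(round(full_xwoba, 3), round(contact_only, 3))
```

In this toy sample the contact-only figure is higher than the full figure, which is the usual pattern for pitchers who record strikeouts. That is why the two numbers should not be compared against the same benchmark.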

def analyze_expected_stats(df, player_name="Pitcher"):
    """
    Analyze expected statistics for a pitcher.

    Parameters:
    df: Statcast DataFrame with expected stats
    player_name: Name for report

    Returns:
    Summary of expected vs actual stats
    """
    # Filter for batted balls with expected stats
    batted_balls = df[
        df['estimated_woba_using_speedangle'].notna()
    ].copy()

    if len(batted_balls) == 0:
        return None

    # Note: averaging over batted balls only gives xwOBA on contact (xwOBAcon).
    # It excludes strikeouts and walks, so it runs higher than full xwOBA against.
    xwOBA_against = batted_balls['estimated_woba_using_speedangle'].mean()
    avg_ev = batted_balls['launch_speed'].mean()
    avg_la = batted_balls['launch_angle'].mean()

    # Use a precomputed 'barrel' flag if present; otherwise fall back to
    # Statcast's launch_speed_angle classification, where 6 denotes a barrel.
    if 'barrel' in batted_balls.columns:
        is_barrel = batted_balls['barrel'] == 1
    else:
        is_barrel = batted_balls['launch_speed_angle'] == 6
    barrel_pct = is_barrel.mean() * 100
    hard_hit_pct = (batted_balls['launch_speed'] >= 95).mean() * 100

    results = {
        'Player': player_name,
        'xwOBA_Against': round(xwOBA_against, 3),
        'Avg_EV_Against': round(avg_ev, 1),
        'Avg_LA_Against': round(avg_la, 1),
        'Barrel%_Against': round(barrel_pct, 1),
        'HardHit%_Against': round(hard_hit_pct, 1),
        'Batted_Balls': len(batted_balls)
    }

    return pd.Series(results)

cole_xstats = analyze_expected_stats(cole_pitches, "Gerrit Cole")

if cole_xstats is not None:
    print("\nExpected Stats Against (2024)")
    print("=" * 60)
    for key, value in cole_xstats.items():
        print(f"{key:.<30} {value}")
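    print("\nNote: xwOBA_Against here is contact-only (xwOBAcon), so it runs higher than")
    print("full xwOBA against (MLB average ~.315, elite < .300), which also counts strikeouts and walks.")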

# R version: Expected stats
analyze_expected_stats <- function(df, player_name = "Pitcher") {
  batted_balls <- df %>%
    filter(!is.na(estimated_woba_using_speedangle))

  if (nrow(batted_balls) == 0) {
    return(NULL)
  }

  results <- batted_balls %>%
    summarise(
      Player = player_name,
      xwOBA_Against = mean(estimated_woba_using_speedangle, na.rm = TRUE),
      Avg_EV_Against = mean(launch_speed, na.rm = TRUE),
      Avg_LA_Against = mean(launch_angle, na.rm = TRUE),
      Barrel_Pct_Against = mean(barrel == 1, na.rm = TRUE) * 100,
      HardHit_Pct_Against = mean(launch_speed >= 95, na.rm = TRUE) * 100,
      Batted_Balls = n()
    ) %>%
    mutate(across(where(is.numeric), ~round(.x, 3)))

  return(results)
}

cole_xstats <- analyze_expected_stats(cole_pitches, "Gerrit Cole")

if (!is.null(cole_xstats)) {
  cat("\nExpected Stats Against (2024)\n")
  cat(strrep("=", 60), "\n")
  print(t(cole_xstats), quote = FALSE)
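  cat("\nNote: xwOBA_Against here is contact-only (xwOBAcon), so it runs higher than\n")
  cat("full xwOBA against (MLB average ~.315, elite < .300), which also counts strikeouts and walks.\n")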
}

7.8.2 Interpreting Expected Stats

When a pitcher's actual ERA significantly exceeds his xERA, several factors might be at play:

Poor defense: Fielders not converting outs at expected rates

Sequencing luck: Hits clustered together, leading to big innings

Runners on base: Performance worse with runners on (pitch selection, pressure)

Sample size: Early season stats can show large gaps that normalize

Conversely, when actual ERA is much lower than xERA:

Good defense: Excellent fielding behind the pitcher

Sequencing luck: Hits scattered, preventing big innings

Strand rate: Above-average at stranding runners

Unsustainable: Likely to see ERA rise toward xERA over time
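
As a rough way to apply these readings, the gap between ERA and xERA can be computed and labeled. The sketch below is illustrative only: the function name `classify_era_gap`, the 0.50-run threshold, and the input values are assumptions chosen for the example, not figures from Statcast or from any real pitcher.

```python
def classify_era_gap(era, xera, threshold=0.50):
    """Label the ERA - xERA gap. The threshold is an illustrative assumption."""
    gap = round(era - xera, 2)
    if gap >= threshold:
        label = "ERA above xERA: check defense, sequencing, runners-on splits, sample size"
    elif gap <= -threshold:
        label = "ERA below xERA: possible regression risk"
    else:
        label = "ERA roughly in line with xERA"
    return gap, label


# Hypothetical inputs, not real pitcher lines
for era, xera in [(4.60, 3.70), (2.90, 3.80), (3.50, 3.40)]:
    gap, label = classify_era_gap(era, xera)
    print(f"ERA {era:.2f} | xERA {xera:.2f} | gap {gap:+.2f} | {label}")
```

A label like this is only a prompt for further investigation. Which of the listed factors explains a given gap still has to be checked against defensive metrics, strand rate, and innings pitched.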

7.9 Interactive Pitch Analysis Tools

Modern pitch analysis benefits from interactive visualizations that let analysts, coaches, and fans explore multi-dimensional data dynamically. Static charts communicate specific insights well, but interactive tools support deeper exploration of pitch arsenals, movement profiles, and sequencing patterns. This section introduces three interactive visualizations built with Plotly, which provides zoom, pan, hover details, and legend-based filtering that static charts cannot match.

Interactive pitch analysis tools serve multiple audiences. Player development staff use them to identify mechanical adjustments that could improve pitch characteristics. Opposing teams use them for scouting and game-planning. Broadcasters use them to show viewers what makes certain pitches effective. Because each chart is generated from a Statcast query, rerunning the code with a new date range refreshes the view as new data becomes available.

7.9.1 Interactive Pitch Movement Chart

The pitch movement chart, which plots horizontal break against vertical break, is fundamental to understanding a pitcher's arsenal. Making it interactive turns a descriptive chart into an exploratory one. Users can hover over individual pitches to see exact velocity, spin rate, and outcome data. They can click legend entries to show or hide pitch types, and filtering the input data by game situation, count, or pitch result before plotting narrows the view further. They can also spot outlier pitches that deviate from typical movement patterns, potentially indicating mechanical issues or grip adjustments.

R Implementation:

library(tidyverse)
library(plotly)
library(baseballr)

create_interactive_pitch_movement <- function(pitcher_data, player_name = "Pitcher") {
  # Filter for pitches with complete movement data
  pitches <- pitcher_data %>%
    filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
    mutate(
      horizontal_break = pfx_x * 12,  # Convert to inches
      vertical_break = pfx_z * 12,    # Induced vertical break
      hover_text = paste0(
        "<b>", pitch_type, "</b><br>",
        "Velocity: ", round(release_speed, 1), " mph<br>",
        "Spin: ", round(release_spin_rate, 0), " rpm<br>",
        "H-Break: ", round(horizontal_break, 1), " in<br>",
        "V-Break: ", round(vertical_break, 1), " in<br>",
        # events is NA unless the pitch ended the plate appearance,
        # so fall back to the pitch-level description
        "Result: ", coalesce(events, description)
      )
    )

  # Filter for pitch types with sufficient samples
  pitch_counts <- pitches %>% count(pitch_type)
  qualifying_pitches <- pitch_counts %>% filter(n >= 30) %>% pull(pitch_type)

  plot_data <- pitches %>% filter(pitch_type %in% qualifying_pitches)

  # Define color palette for pitch types
  pitch_colors <- c(
    'FF' = '#d62728',  # Four-seam: Red
    'SI' = '#ff7f0e',  # Sinker: Orange
    'FC' = '#2ca02c',  # Cutter: Green
    'SL' = '#9467bd',  # Slider: Purple
    'CU' = '#8c564b',  # Curve: Brown
    'CH' = '#e377c2',  # Change: Pink
    'FS' = '#17becf',  # Splitter: Cyan
    'KC' = '#bcbd22'   # Knuckle-curve: Yellow-green
  )

  # Create interactive scatter plot
  p <- plot_ly(data = plot_data) %>%
    add_markers(
      x = ~horizontal_break,
      y = ~vertical_break,
      color = ~pitch_type,
      colors = pitch_colors,
      text = ~hover_text,
      hoverinfo = "text",
      marker = list(
        size = 8,
        opacity = 0.6,
        line = list(width = 0.5, color = 'black')
      )
    ) %>%
    layout(
      title = list(
        text = paste0("<b>", player_name, " Pitch Movement Profile</b><br>",
                     "<sub>Catcher's Perspective - Hover for Details</sub>"),
        font = list(size = 16)
      ),
      xaxis = list(
        title = "<b>Horizontal Break (inches)</b><br>← Arm Side | Glove Side → (RHP; reversed for LHP)",
        zeroline = TRUE,
        zerolinewidth = 2,
        zerolinecolor = 'gray',
        gridcolor = 'lightgray'
      ),
      yaxis = list(
        title = "<b>Induced Vertical Break (inches)</b><br>↓ Drop | Rise ↑",
        zeroline = TRUE,
        zerolinewidth = 2,
        zerolinecolor = 'gray',
        gridcolor = 'lightgray',
        scaleanchor = "x",  # Equal aspect ratio
        scaleratio = 1
      ),
      hovermode = 'closest',
      showlegend = TRUE,
      legend = list(
        title = list(text = '<b>Pitch Type</b>'),
        orientation = 'v',
        x = 1.02,
        y = 1
      ),
      margin = list(l = 80, r = 120, t = 100, b = 80)
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage with Gerrit Cole's data
# cole_pitches <- statcast_search_pitchers(
#   start_date = "2024-04-01",
#   end_date = "2024-10-01",
#   pitcherid = 543037
# )
#
# interactive_movement_plot <- create_interactive_pitch_movement(
#   cole_pitches,
#   "Gerrit Cole 2024"
# )
# interactive_movement_plot

Python Implementation:

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from pybaseball import statcast_pitcher

def create_interactive_pitch_movement(pitcher_data, player_name="Pitcher"):
    """
    Create interactive pitch movement visualization using Plotly.

    Parameters:
    pitcher_data: DataFrame with Statcast pitch data
    player_name: Name for chart title

    Returns:
    Plotly figure object
    """
    # Filter for complete movement data
    pitches = pitcher_data[
        pitcher_data['pfx_x'].notna() &
        pitcher_data['pfx_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()

    # Convert to inches
    pitches['horizontal_break'] = pitches['pfx_x'] * 12
    pitches['vertical_break'] = pitches['pfx_z'] * 12

    # Create hover text
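    pitches['hover_text'] = pitches.apply(
        lambda row: f"<b>{row['pitch_type']}</b><br>" +
                   f"Velocity: {row['release_speed']:.1f} mph<br>" +
                   f"Spin: {row['release_spin_rate']:.0f} rpm<br>" +
                   f"H-Break: {row['horizontal_break']:.1f} in<br>" +
                   f"V-Break: {row['vertical_break']:.1f} in<br>" +
                   # events is missing unless the pitch ended the plate
                   # appearance, so fall back to the pitch-level description
                   f"Result: {row['events'] if isinstance(row['events'], str) else row['description']}",
        axis=1
    )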

    # Filter for qualifying pitch types
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]

    # Pitch type colors
    pitch_colors = {
        'FF': '#d62728',  'SI': '#ff7f0e',  'FC': '#2ca02c',
        'SL': '#9467bd',  'CU': '#8c564b',  'CH': '#e377c2',
        'FS': '#17becf',  'KC': '#bcbd22'
    }

    # Create figure
    fig = go.Figure()

    # Add scatter trace for each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]

        fig.add_trace(go.Scatter(
            x=pitch_subset['horizontal_break'],
            y=pitch_subset['vertical_break'],
            mode='markers',
            name=f"{pitch_type} (n={len(pitch_subset)})",
            text=pitch_subset['hover_text'],
            hoverinfo='text',
            marker=dict(
                color=pitch_colors.get(pitch_type, '#7f7f7f'),
                size=8,
                opacity=0.6,
                line=dict(width=0.5, color='black')
            )
        ))

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Pitch Movement Profile</b><br>" +
                 "<sub>Catcher's Perspective - Hover for Details</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Horizontal Break (inches)</b><br>← Arm Side | Glove Side → (RHP; reversed for LHP)",
            zeroline=True,
            zerolinewidth=2,
            zerolinecolor='gray',
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Induced Vertical Break (inches)</b><br>↓ Drop | Rise ↑",
            zeroline=True,
            zerolinewidth=2,
            zerolinecolor='gray',
            gridcolor='lightgray',
            showgrid=True,
            scaleanchor="x",
            scaleratio=1
        ),
        hovermode='closest',
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Pitch Type</b>'),
            orientation='v',
            x=1.02,
            y=1
        ),
        width=1000,
        height=900,
        margin=dict(l=80, r=150, t=100, b=80),
        template='plotly_white'
    )

    return fig

# Example usage
# cole_pitches = statcast_pitcher('2024-04-01', '2024-10-01', 543037)
# fig = create_interactive_pitch_movement(cole_pitches, "Gerrit Cole 2024")
# fig.show()

This interactive movement chart lets users quickly identify pitch clustering, outliers, and the separation between pitch types. Because it plots only total break, it cannot confirm pitch tunneling by itself: tunneling depends on pitches sharing a release point and early trajectory before their movement diverges, so read this chart alongside the release point view in Section 7.9.2. The hover functionality enables quick identification of specific pitches for video review or further analysis.
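
One way to make the outlier check concrete is to flag pitches whose break sits far from the average for their pitch type. The sketch below uses plain Python dictionaries in place of the Statcast DataFrame so that it runs without downloading data. The function name `flag_movement_outliers`, the 2.5 standard deviation cutoff, the five-pitch minimum, and the sample values are all assumptions made for illustration, not real Statcast measurements.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_movement_outliers(pitches, z_cutoff=2.5, min_pitches=5):
    """Return positions of pitches whose break is unusual for their pitch type.

    pitches: list of dicts with 'pitch_type', 'pfx_x', 'pfx_z' (feet).
    z_cutoff and min_pitches are illustrative assumptions.
    """
    # Group pitch positions by pitch type
    by_type = defaultdict(list)
    for idx, pitch in enumerate(pitches):
        by_type[pitch["pitch_type"]].append(idx)

    flagged = set()
    for indices in by_type.values():
        if len(indices) < min_pitches:
            continue  # too few pitches for a stable average
        for axis in ("pfx_x", "pfx_z"):
            values = [pitches[i][axis] for i in indices]
            mu, sd = mean(values), stdev(values)
            if sd == 0:
                continue  # identical values, nothing to flag
            for i, value in zip(indices, values):
                if abs(value - mu) / sd > z_cutoff:
                    flagged.add(i)
    return sorted(flagged)


# Hypothetical four-seam fastballs; the last one has far less vertical break
ivb = [1.40, 1.42, 1.38, 1.41, 1.39, 1.40, 1.43, 1.37, 1.40, 1.41, 1.39, 0.60]
sample = [
    {"pitch_type": "FF", "pfx_x": -0.60 - 0.02 * (i % 2), "pfx_z": z}
    for i, z in enumerate(ivb)
]
print(flag_movement_outliers(sample))
```

The returned positions index into the input list, so a flagged pitch can be looked up for video review. With small per-type samples a single pitch cannot reach a very high z-score, so the cutoff would likely need tuning on real data.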

7.9.2 Interactive Release Point Visualization

Release point consistency is critical for deception, but static visualizations can obscure patterns that emerge when examining pitches interactively. An interactive 3D release point chart allows rotation to examine side, height, and extension from multiple angles. Filtering by pitch type reveals whether a pitcher "tips" pitches through inconsistent release points. Color-coding by velocity or outcome adds another analytical dimension.

R Implementation:

library(plotly)
library(dplyr)

create_interactive_release_points <- function(pitcher_data, player_name = "Pitcher") {
  # Filter for complete release point data
  pitches <- pitcher_data %>%
    filter(
      !is.na(release_pos_x),
      !is.na(release_extension),  # plotted on the depth axis below
      !is.na(release_pos_z),
      !is.na(pitch_type)
    ) %>%
    mutate(
      hover_text = paste0(
        "<b>", pitch_type, "</b><br>",
        "X (side): ", round(release_pos_x, 2), " ft<br>",
        "Y (extension): ", round(release_extension, 2), " ft<br>",
        "Z (height): ", round(release_pos_z, 2), " ft<br>",
        "Velocity: ", round(release_speed, 1), " mph<br>",
        # events is NA unless the pitch ended the plate appearance
        "Result: ", coalesce(events, description)
      )
    )

  # Filter for qualifying pitch types
  pitch_counts <- pitches %>% count(pitch_type)
  qualifying <- pitch_counts %>% filter(n >= 30) %>% pull(pitch_type)
  plot_data <- pitches %>% filter(pitch_type %in% qualifying)

  # Color palette
  pitch_colors <- c(
    'FF' = '#d62728', 'SI' = '#ff7f0e', 'FC' = '#2ca02c',
    'SL' = '#9467bd', 'CU' = '#8c564b', 'CH' = '#e377c2',
    'FS' = '#17becf', 'KC' = '#bcbd22'
  )

  # Create 3D scatter plot
  p <- plot_ly(data = plot_data) %>%
    add_markers(
      x = ~release_pos_x,
      y = ~release_extension,  # Use extension for depth
      z = ~release_pos_z,
      color = ~pitch_type,
      colors = pitch_colors,
      text = ~hover_text,
      hoverinfo = "text",
      marker = list(
        size = 5,
        opacity = 0.7,
        line = list(width = 0.3, color = 'black')
      )
    ) %>%
    layout(
      title = list(
        text = paste0("<b>", player_name, " Release Point Consistency</b><br>",
                     "<sub>3D View - Rotate to Explore</sub>"),
        font = list(size = 16)
      ),
      scene = list(
        xaxis = list(
          title = '<b>Horizontal Position (ft)</b><br>← Arm Side | Glove Side →',
          gridcolor = 'lightgray',
          backgroundcolor = 'white'
        ),
        yaxis = list(
          title = '<b>Extension (ft)</b>',
          gridcolor = 'lightgray',
          backgroundcolor = 'white'
        ),
        zaxis = list(
          title = '<b>Release Height (ft)</b>',
          gridcolor = 'lightgray',
          backgroundcolor = 'white'
        ),
        camera = list(
          eye = list(x = 1.5, y = 1.5, z = 1.3)
        ),
        aspectmode = 'cube'
      ),
      showlegend = TRUE,
      legend = list(
        title = list(text = '<b>Pitch Type</b>'),
        x = 1.02,
        y = 0.9
      )
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage
# release_plot <- create_interactive_release_points(
#   cole_pitches,
#   "Gerrit Cole 2024"
# )
# release_plot

Python Implementation:

import plotly.graph_objects as go

def create_interactive_release_points(pitcher_data, player_name="Pitcher"):
    """
    Create 3D interactive release point visualization.

    Parameters:
    pitcher_data: DataFrame with Statcast pitch data
    player_name: Name for chart title

    Returns:
    Plotly figure object
    """
    # Filter for complete data (release_extension is plotted on the Y-axis)
    pitches = pitcher_data[
        pitcher_data['release_pos_x'].notna() &
        pitcher_data['release_extension'].notna() &
        pitcher_data['release_pos_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()

    # Create hover text
    pitches['hover_text'] = pitches.apply(
        lambda row: f"<b>{row['pitch_type']}</b><br>" +
                   f"X (side): {row['release_pos_x']:.2f} ft<br>" +
                   f"Y (extension): {row['release_extension']:.2f} ft<br>" +
                   f"Z (height): {row['release_pos_z']:.2f} ft<br>" +
                   f"Velocity: {row['release_speed']:.1f} mph<br>" +
                   # events is missing unless the pitch ended the plate appearance
                   f"Result: {row['events'] if isinstance(row['events'], str) else row['description']}",
        axis=1
    )

    # Filter for qualifying pitch types
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying)]

    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }

    # Create figure
    fig = go.Figure()

    # Add trace for each pitch type
    for pitch_type in qualifying:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]

        fig.add_trace(go.Scatter3d(
            x=pitch_subset['release_pos_x'],
            y=pitch_subset['release_extension'],  # Use extension for Y-axis
            z=pitch_subset['release_pos_z'],
            mode='markers',
            name=f"{pitch_type} (n={len(pitch_subset)})",
            text=pitch_subset['hover_text'],
            hoverinfo='text',
            marker=dict(
                color=pitch_colors.get(pitch_type, '#7f7f7f'),
                size=5,
                opacity=0.7,
                line=dict(width=0.3, color='black')
            )
        ))

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Release Point Consistency</b><br>" +
                 "<sub>3D View - Rotate to Explore</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        scene=dict(
            xaxis=dict(
                title='<b>Horizontal Position (ft)</b><br>← Arm Side | Glove Side →',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            yaxis=dict(
                title='<b>Extension (ft)</b>',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            zaxis=dict(
                title='<b>Release Height (ft)</b>',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            camera=dict(
                eye=dict(x=1.5, y=1.5, z=1.3)
            ),
            aspectmode='cube'
        ),
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Pitch Type</b>'),
            x=1.02,
            y=0.9
        ),
        width=1000,
        height=800,
        template='plotly_white'
    )

    return fig

# Example usage
# fig = create_interactive_release_points(cole_pitches, "Gerrit Cole 2024")
# fig.show()

The 3D release point visualization is particularly valuable for identifying "tipping" issues. If a pitcher's curveball consistently releases from a different height or arm slot than their fastball, hitters can pick up the pitch type early. The interactive rotation capability allows coaches to examine release points from the hitter's perspective, revealing subtle differences that might not be apparent in 2D projections.
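
The tipping check can also be expressed as a number: the distance between each pitch type's average release point and the fastball's. The sketch below is a minimal illustration using plain Python dictionaries rather than the Statcast DataFrame. The function name `release_point_gaps`, the choice of the four-seam fastball as the reference, and the sample coordinates are assumptions for the example. No detection threshold is implied, since how large a gap hitters can actually pick up is not established here.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def release_point_gaps(pitches, reference="FF"):
    """Gap in inches between each pitch type's mean release point and the
    reference pitch's mean, using side (x) and height (z) only.

    pitches: list of dicts with 'pitch_type', 'release_pos_x',
    'release_pos_z' (feet).
    """
    groups = defaultdict(list)
    for pitch in pitches:
        groups[pitch["pitch_type"]].append(
            (pitch["release_pos_x"], pitch["release_pos_z"])
        )
    if reference not in groups:
        raise ValueError(f"no pitches of reference type {reference!r}")

    # Mean (x, z) release point for every pitch type
    centers = {
        pitch_type: (mean([x for x, _ in pts]), mean([z for _, z in pts]))
        for pitch_type, pts in groups.items()
    }
    ref_x, ref_z = centers[reference]
    return {
        pitch_type: round(12 * sqrt((cx - ref_x) ** 2 + (cz - ref_z) ** 2), 1)
        for pitch_type, (cx, cz) in centers.items()
        if pitch_type != reference
    }


# Hypothetical release points (feet), not real measurements
sample = [
    {"pitch_type": "FF", "release_pos_x": -1.80, "release_pos_z": 5.90},
    {"pitch_type": "FF", "release_pos_x": -1.82, "release_pos_z": 5.92},
    {"pitch_type": "FF", "release_pos_x": -1.78, "release_pos_z": 5.88},
    {"pitch_type": "SL", "release_pos_x": -1.81, "release_pos_z": 5.91},
    {"pitch_type": "SL", "release_pos_x": -1.79, "release_pos_z": 5.89},
    {"pitch_type": "CU", "release_pos_x": -1.50, "release_pos_z": 6.30},
    {"pitch_type": "CU", "release_pos_x": -1.50, "release_pos_z": 6.30},
]
gaps = release_point_gaps(sample)
print(gaps)
```

In this made-up sample the slider shares the fastball's release point while the curveball comes out about half a foot away, which is the kind of separation the 3D chart would show as a distinct cluster.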

7.9.3 Animated Pitch Sequence Explorer

Understanding pitch sequencing requires seeing how pitches relate to each other temporally. An animated pitch sequence explorer shows each pitch in order, tracking location, velocity, and movement while maintaining context of game situation, count, and previous pitches. This creates a narrative view of how a pitcher attacks hitters, revealing patterns in pitch selection and execution.

R Implementation:

library(plotly)
library(dplyr)

create_animated_pitch_sequence <- function(pitcher_data, player_name = "Pitcher",
                                          max_pitches = 200) {
  # Prepare pitch sequence data
  pitches <- pitcher_data %>%
    filter(!is.na(plate_x), !is.na(plate_z), !is.na(pitch_type)) %>%
    arrange(game_date, at_bat_number, pitch_number) %>%
    mutate(
      sequence_num = row_number(),
      count_state = paste0(balls, "-", strikes),
      frame_label = paste0("Pitch ", sequence_num, ": ", pitch_type,
                          " @ ", round(release_speed, 1), " mph<br>",
                          "Count: ", count_state, " | ",
                          description)
    ) %>%
    head(max_pitches)  # Limit for performance

  # Define strike zone boundaries
  sz_top <- 3.5    # Approximate top of zone
  sz_bottom <- 1.5 # Approximate bottom
  sz_left <- -0.83 # Left edge (catcher's view)
  sz_right <- 0.83 # Right edge

  # Pitch colors
  pitch_colors <- c(
    'FF' = '#d62728', 'SI' = '#ff7f0e', 'FC' = '#2ca02c',
    'SL' = '#9467bd', 'CU' = '#8c564b', 'CH' = '#e377c2',
    'FS' = '#17becf', 'KC' = '#bcbd22'
  )

  # Create animated scatter plot
  p <- plot_ly(
    data = pitches,
    x = ~plate_x,
    y = ~plate_z,
    frame = ~sequence_num,
    color = ~pitch_type,
    colors = pitch_colors,
    text = ~frame_label,
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers',
    marker = list(
      size = 12,
      opacity = 0.8,
      line = list(width = 1, color = 'black')
    )
  ) %>%
    layout(
      title = list(
        text = paste0("<b>", player_name, " Pitch Sequence</b><br>",
                     "<sub>Catcher's View - Press Play</sub>"),
        font = list(size = 16)
      ),
      xaxis = list(
        title = "<b>Horizontal Location (ft)</b><br>← Third-Base Side | First-Base Side →",
        range = c(-2, 2),
        zeroline = TRUE,
        zerolinecolor = 'lightgray'
      ),
      yaxis = list(
        title = "<b>Vertical Location (ft)</b>",
        range = c(0, 5),
        zeroline = FALSE
      ),
      shapes = list(
        # Strike zone rectangle
        list(
          type = "rect",
          x0 = sz_left, x1 = sz_right,
          y0 = sz_bottom, y1 = sz_top,
          line = list(color = "black", width = 2),
          fillcolor = "rgba(200, 200, 200, 0.1)"
        )
      ),
      showlegend = TRUE,
      legend = list(title = list(text = '<b>Pitch Type</b>'))
    ) %>%
    animation_opts(
      frame = 1000,  # 1 second per pitch
      transition = 500,
      redraw = FALSE
    ) %>%
    animation_slider(
      currentvalue = list(
        prefix = "Pitch: ",
        font = list(size = 14, color = "black")
      )
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage
# sequence_plot <- create_animated_pitch_sequence(
#   cole_pitches,
#   "Gerrit Cole 2024",
#   max_pitches = 150
# )
# sequence_plot

Python Implementation:

import plotly.graph_objects as go

def create_animated_pitch_sequence(pitcher_data, player_name="Pitcher",
                                   max_pitches=200):
    """
    Create animated pitch sequence visualization.

    Parameters:
    pitcher_data: DataFrame with Statcast pitch data
    player_name: Name for chart title
    max_pitches: Maximum pitches to include (for performance)

    Returns:
    Plotly figure object
    """
    # Prepare sequence data
    pitches = pitcher_data[
        pitcher_data['plate_x'].notna() &
        pitcher_data['plate_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()

    # Sort chronologically
    pitches = pitches.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
    pitches = pitches.head(max_pitches)  # Limit for performance
    pitches['sequence_num'] = range(1, len(pitches) + 1)

    # Create count state and labels
    pitches['count_state'] = pitches['balls'].astype(str) + '-' + pitches['strikes'].astype(str)
    pitches['frame_label'] = pitches.apply(
        lambda row: f"Pitch {row['sequence_num']}: {row['pitch_type']} " +
                   f"@ {row['release_speed']:.1f} mph<br>" +
                   f"Count: {row['count_state']} | {row['description']}",
        axis=1
    )

    # Strike zone boundaries
    sz_left, sz_right = -0.83, 0.83
    sz_bottom, sz_top = 1.5, 3.5

    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }

    # Create figure with frames
    fig = go.Figure()

    # Get unique pitch types for legend
    pitch_types = pitches['pitch_type'].unique()

    # Build one trace per pitch type, always in the same order. Plotly frames
    # update existing traces by position and cannot add new ones, so every
    # frame (and the base figure) must contain the same set of traces, even
    # when a pitch type has not been thrown yet.
    def build_traces(frame_data):
        traces = []
        for pitch_type in pitch_types:
            pt_data = frame_data[frame_data['pitch_type'] == pitch_type]
            traces.append(go.Scatter(
                x=pt_data['plate_x'],
                y=pt_data['plate_z'],
                mode='markers',
                name=pitch_type,
                text=pt_data['frame_label'],
                hoverinfo='text',
                marker=dict(
                    color=pitch_colors.get(pitch_type, '#7f7f7f'),
                    size=12,
                    opacity=0.8,
                    line=dict(width=1, color='black')
                )
            ))
        return traces

    # Create cumulative frames: frame N shows pitches 1 through N
    frames = [
        go.Frame(
            data=build_traces(pitches[pitches['sequence_num'] <= seq_num]),
            name=str(seq_num)
        )
        for seq_num in pitches['sequence_num']
    ]

    # Add initial frame data (first pitch only)
    fig.add_traces(build_traces(pitches[pitches['sequence_num'] == 1]))

    # Add strike zone
    fig.add_shape(
        type="rect",
        x0=sz_left, x1=sz_right,
        y0=sz_bottom, y1=sz_top,
        line=dict(color="black", width=2),
        fillcolor="rgba(200, 200, 200, 0.1)"
    )

    fig.frames = frames

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Pitch Sequence</b><br>" +
                 "<sub>Catcher's View - Press Play</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            title="<b>Horizontal Location (ft)</b><br>← Third-Base Side | First-Base Side →",
            range=[-2, 2],
            zeroline=True,
            zerolinecolor='lightgray',
            gridcolor='lightgray'
        ),
        yaxis=dict(
            title="<b>Vertical Location (ft)</b>",
            range=[0, 5],
            zeroline=False,
            gridcolor='lightgray'
        ),
        showlegend=True,
        legend=dict(title=dict(text='<b>Pitch Type</b>')),
        updatemenus=[{
            'type': 'buttons',
            'showactive': False,
            'buttons': [
                {
                    'label': 'Play',
                    'method': 'animate',
                    'args': [None, {
                        'frame': {'duration': 1000, 'redraw': True},
                        'fromcurrent': True,
                        'transition': {'duration': 500}
                    }]
                },
                {
                    'label': 'Pause',
                    'method': 'animate',
                    'args': [[None], {
                        'frame': {'duration': 0, 'redraw': False},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }]
                }
            ],
            'x': 0.1,
            'y': 0
        }],
        sliders=[{
            'active': 0,
            'steps': [
                {
                    'args': [[f.name], {
                        'frame': {'duration': 0, 'redraw': True},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }],
                    'label': f.name,
                    'method': 'animate'
                }
                for f in frames
            ],
            'currentvalue': {
                'prefix': 'Pitch: ',
                'font': {'size': 14, 'color': 'black'}
            },
            'x': 0.1,
            'len': 0.9,
            'xanchor': 'left',
            'y': 0,
            'yanchor': 'top'
        }],
        width=1000,
        height=800,
        template='plotly_white'
    )

    return fig

# Example usage
# fig = create_animated_pitch_sequence(cole_pitches, "Gerrit Cole 2024", max_pitches=150)
# fig.show()

The animated pitch sequence explorer is particularly powerful for understanding how pitchers set up hitters. Watch an elite pitcher work an at-bat: fastball up and in to establish the inner half, slider down and away to show something soft, another fastball up to get ahead, then put away with a slider off the plate that the hitter chases. The sequential nature reveals patterns that aggregated statistics miss entirely. Analysts can identify if a pitcher becomes predictable in certain counts or situations, offering actionable insights for both pitcher development and opponent preparation.
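
Predictability by count can be checked numerically alongside the animation. The sketch below tallies the most common pitch type in each ball-strike count using plain Python dictionaries in place of the Statcast DataFrame. The function name `pitch_mix_by_count`, the three-pitch minimum, and the sample pitches are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def pitch_mix_by_count(pitches, min_pitches=3):
    """Return {count: (most_common_pitch, share, n)} per ball-strike count.

    pitches: list of dicts with 'balls', 'strikes', 'pitch_type'.
    Counts with fewer than min_pitches pitches are skipped; that
    minimum is an illustrative assumption.
    """
    by_count = defaultdict(Counter)
    for pitch in pitches:
        count = f"{pitch['balls']}-{pitch['strikes']}"
        by_count[count][pitch["pitch_type"]] += 1

    summary = {}
    for count, mix in by_count.items():
        total = sum(mix.values())
        if total < min_pitches:
            continue
        pitch_type, n = mix.most_common(1)[0]
        summary[count] = (pitch_type, round(n / total, 2), total)
    return summary


# Hypothetical pitches, not real data
sample = (
    [{"balls": 0, "strikes": 0, "pitch_type": "FF"}] * 3
    + [{"balls": 0, "strikes": 0, "pitch_type": "SL"}]
    + [{"balls": 0, "strikes": 2, "pitch_type": "SL"}] * 3
    + [{"balls": 3, "strikes": 1, "pitch_type": "FF"}] * 2
)
for count, (pitch, share, n) in sorted(pitch_mix_by_count(sample).items()):
    print(f"{count}: {pitch} {share:.0%} of {n} pitches")
```

A share near 100% in a given count suggests a tendency hitters could anticipate, though with real data the per-count sample should be far larger than in this toy example before drawing conclusions.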

These interactive tools turn static data into something users can explore, which supports deeper analysis and clearer communication of multi-dimensional pitching data. Hover details, legend-based filtering, rotation (for the 3D view), and animation each expose a different aspect of pitcher performance and arsenal effectiveness.

library(tidyverse)
library(plotly)
library(baseballr)

create_interactive_pitch_movement <- function(pitcher_data, player_name = "Pitcher") {
  # Filter for pitches with complete movement data
  pitches <- pitcher_data %>%
    filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
    mutate(
      horizontal_break = pfx_x * 12,  # Convert to inches
      vertical_break = pfx_z * 12,    # Induced vertical break
      hover_text = paste0(
        "<b>", pitch_type, "</b><br>",
        "Velocity: ", round(release_speed, 1), " mph<br>",
        "Spin: ", round(release_spin_rate, 0), " rpm<br>",
        "H-Break: ", round(horizontal_break, 1), " in<br>",
        "V-Break: ", round(vertical_break, 1), " in<br>",
        "Result: ", events
      )
    )

  # Filter for pitch types with sufficient samples
  pitch_counts <- pitches %>% count(pitch_type)
  qualifying_pitches <- pitch_counts %>% filter(n >= 30) %>% pull(pitch_type)

  plot_data <- pitches %>% filter(pitch_type %in% qualifying_pitches)

  # Define color palette for pitch types
  pitch_colors <- c(
    'FF' = '#d62728',  # Four-seam: Red
    'SI' = '#ff7f0e',  # Sinker: Orange
    'FC' = '#2ca02c',  # Cutter: Green
    'SL' = '#9467bd',  # Slider: Purple
    'CU' = '#8c564b',  # Curve: Brown
    'CH' = '#e377c2',  # Change: Pink
    'FS' = '#17becf',  # Splitter: Cyan
    'KC' = '#bcbd22'   # Knuckle-curve: Yellow-green
  )

  # Create interactive scatter plot
  p <- plot_ly(data = plot_data) %>%
    add_markers(
      x = ~horizontal_break,
      y = ~vertical_break,
      color = ~pitch_type,
      colors = pitch_colors,
      text = ~hover_text,
      hoverinfo = "text",
      marker = list(
        size = 8,
        opacity = 0.6,
        line = list(width = 0.5, color = 'black')
      )
    ) %>%
    layout(
      title = list(
        text = paste0("<b>", player_name, " Pitch Movement Profile</b><br>",
                     "<sub>Catcher's Perspective - Hover for Details</sub>"),
        font = list(size = 16)
      ),
      xaxis = list(
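        # pfx_x is reported from the catcher's perspective, so arm side vs. glove side depends on handedness
        title = "<b>Horizontal Break (inches)</b><br>← Third-Base Side | First-Base Side →",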
        zeroline = TRUE,
        zerolinewidth = 2,
        zerolinecolor = 'gray',
        gridcolor = 'lightgray'
      ),
      yaxis = list(
        title = "<b>Induced Vertical Break (inches)</b><br>↓ Drop | Rise ↑",
        zeroline = TRUE,
        zerolinewidth = 2,
        zerolinecolor = 'gray',
        gridcolor = 'lightgray',
        scaleanchor = "x",  # Equal aspect ratio
        scaleratio = 1
      ),
      hovermode = 'closest',
      showlegend = TRUE,
      legend = list(
        title = list(text = '<b>Pitch Type</b>'),
        orientation = 'v',
        x = 1.02,
        y = 1
      ),
      margin = list(l = 80, r = 120, t = 100, b = 80)
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage with Gerrit Cole's data
# cole_pitches <- statcast_search_pitchers(
#   start_date = "2024-04-01",
#   end_date = "2024-10-01",
#   pitcherid = 543037
# )
#
# interactive_movement_plot <- create_interactive_pitch_movement(
#   cole_pitches,
#   "Gerrit Cole 2024"
# )
# interactive_movement_plot

library(plotly)
library(dplyr)

create_interactive_release_points <- function(pitcher_data, player_name = "Pitcher") {
  # Filter for complete release point data. The depth axis plots
  # release_extension; release_pos_y is the distance from home plate (ft),
  # not the extension, so it is not used here.
  pitches <- pitcher_data %>%
    filter(
      !is.na(release_pos_x),
      !is.na(release_extension),
      !is.na(release_pos_z),
      !is.na(pitch_type)
    ) %>%
    mutate(
      hover_text = paste0(
        "<b>", pitch_type, "</b><br>",
        "X (side): ", round(release_pos_x, 2), " ft<br>",
        "Extension: ", round(release_extension, 2), " ft<br>",
        "Z (height): ", round(release_pos_z, 2), " ft<br>",
        "Velocity: ", round(release_speed, 1), " mph<br>",
        "Result: ", description
      )
    )

  # Filter for qualifying pitch types
  pitch_counts <- pitches %>% count(pitch_type)
  qualifying <- pitch_counts %>% filter(n >= 30) %>% pull(pitch_type)
  plot_data <- pitches %>% filter(pitch_type %in% qualifying)

  # Color palette
  pitch_colors <- c(
    'FF' = '#d62728', 'SI' = '#ff7f0e', 'FC' = '#2ca02c',
    'SL' = '#9467bd', 'CU' = '#8c564b', 'CH' = '#e377c2',
    'FS' = '#17becf', 'KC' = '#bcbd22'
  )

  # Create 3D scatter plot
  p <- plot_ly(data = plot_data) %>%
    add_markers(
      x = ~release_pos_x,
      y = ~release_extension,  # Use extension for depth
      z = ~release_pos_z,
      color = ~pitch_type,
      colors = pitch_colors,
      text = ~hover_text,
      hoverinfo = "text",
      marker = list(
        size = 5,
        opacity = 0.7,
        line = list(width = 0.3, color = 'black')
      )
    ) %>%
    layout(
      title = list(
        text = paste0("<b>", player_name, " Release Point Consistency</b><br>",
                     "<sub>3D View - Rotate to Explore</sub>"),
        font = list(size = 16)
      ),
      scene = list(
        xaxis = list(
          # release_pos_x is from the catcher's perspective; arm side depends on handedness
          title = '<b>Horizontal Position (ft)</b><br>← Third-Base Side | First-Base Side →',
          gridcolor = 'lightgray',
          backgroundcolor = 'white'
        ),
        yaxis = list(
          title = '<b>Extension (ft)</b>',
          gridcolor = 'lightgray',
          backgroundcolor = 'white'
        ),
        zaxis = list(
          title = '<b>Release Height (ft)</b>',
          gridcolor = 'lightgray',
          backgroundcolor = 'white'
        ),
        camera = list(
          eye = list(x = 1.5, y = 1.5, z = 1.3)
        ),
        aspectmode = 'cube'
      ),
      showlegend = TRUE,
      legend = list(
        title = list(text = '<b>Pitch Type</b>'),
        x = 1.02,
        y = 0.9
      )
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage
# release_plot <- create_interactive_release_points(
#   cole_pitches,
#   "Gerrit Cole 2024"
# )
# release_plot

library(plotly)
library(dplyr)

create_animated_pitch_sequence <- function(pitcher_data, player_name = "Pitcher",
                                          max_pitches = 200) {
  # Prepare pitch sequence data
  pitches <- pitcher_data %>%
    filter(!is.na(plate_x), !is.na(plate_z), !is.na(pitch_type)) %>%
    arrange(game_date, at_bat_number, pitch_number) %>%
    mutate(
      sequence_num = row_number(),
      count_state = paste0(balls, "-", strikes),
      frame_label = paste0("Pitch ", sequence_num, ": ", pitch_type,
                          " @ ", round(release_speed, 1), " mph<br>",
                          "Count: ", count_state, " | ",
                          description)
    ) %>%
    head(max_pitches)  # Limit for performance

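  # Approximate strike zone boundaries in feet (catcher's view).
  # Top and bottom are generic values; Statcast reports per-batter sz_top and sz_bot.
  sz_top <- 3.5
  sz_bottom <- 1.5
  # Half the 17-inch plate (0.708 ft) plus roughly one ball radius (0.12 ft)
  sz_left <- -0.83
  sz_right <- 0.83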

  # Pitch colors
  pitch_colors <- c(
    'FF' = '#d62728', 'SI' = '#ff7f0e', 'FC' = '#2ca02c',
    'SL' = '#9467bd', 'CU' = '#8c564b', 'CH' = '#e377c2',
    'FS' = '#17becf', 'KC' = '#bcbd22'
  )

  # Create animated scatter plot
  p <- plot_ly(
    data = pitches,
    x = ~plate_x,
    y = ~plate_z,
    frame = ~sequence_num,
    color = ~pitch_type,
    colors = pitch_colors,
    text = ~frame_label,
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers',
    marker = list(
      size = 12,
      opacity = 0.8,
      line = list(width = 1, color = 'black')
    )
  ) %>%
    layout(
      title = list(
        text = paste0("<b>", player_name, " Pitch Sequence</b><br>",
                     "<sub>Catcher's View - Press Play</sub>"),
        font = list(size = 16)
      ),
      xaxis = list(
        # Inside vs. outside depends on batter handedness, so label by field side instead
        title = "<b>Horizontal Location (ft)</b><br>← Third-Base Side | First-Base Side →",
        range = c(-2, 2),
        zeroline = TRUE,
        zerolinecolor = 'lightgray'
      ),
      yaxis = list(
        title = "<b>Vertical Location (ft)</b>",
        range = c(0, 5),
        zeroline = FALSE
      ),
      shapes = list(
        # Strike zone rectangle
        list(
          type = "rect",
          x0 = sz_left, x1 = sz_right,
          y0 = sz_bottom, y1 = sz_top,
          line = list(color = "black", width = 2),
          fillcolor = "rgba(200, 200, 200, 0.1)"
        )
      ),
      showlegend = TRUE,
      legend = list(title = list(text = '<b>Pitch Type</b>'))
    ) %>%
    animation_opts(
      frame = 1000,  # 1 second per pitch
      transition = 500,
      redraw = FALSE
    ) %>%
    animation_slider(
      currentvalue = list(
        prefix = "Pitch: ",
        font = list(size = 14, color = "black")
      )
    ) %>%
    config(displayModeBar = TRUE, displaylogo = FALSE)

  return(p)
}

# Example usage
# sequence_plot <- create_animated_pitch_sequence(
#   cole_pitches,
#   "Gerrit Cole 2024",
#   max_pitches = 150
# )
# sequence_plot

Python

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from pybaseball import statcast_pitcher

def create_interactive_pitch_movement(pitcher_data, player_name="Pitcher"):
    """
    Create interactive pitch movement visualization using Plotly.

    Parameters:
    pitcher_data: DataFrame with Statcast pitch data
    player_name: Name for chart title

    Returns:
    Plotly figure object
    """
    # Filter for complete movement data
    pitches = pitcher_data[
        pitcher_data['pfx_x'].notna() &
        pitcher_data['pfx_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()

    # Convert to inches
    pitches['horizontal_break'] = pitches['pfx_x'] * 12
    pitches['vertical_break'] = pitches['pfx_z'] * 12

    # Create hover text
    pitches['hover_text'] = pitches.apply(
        lambda row: f"<b>{row['pitch_type']}</b><br>" +
                   f"Velocity: {row['release_speed']:.1f} mph<br>" +
                   f"Spin: {row['release_spin_rate']:.0f} rpm<br>" +
                   f"H-Break: {row['horizontal_break']:.1f} in<br>" +
                   f"V-Break: {row['vertical_break']:.1f} in<br>" +
                   # description is set on every pitch; events only on the pitch that ends a plate appearance
                   f"Result: {row['description']}",
        axis=1
    )

    # Filter for qualifying pitch types
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying_pitches = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying_pitches)]

    # Pitch type colors
    pitch_colors = {
        'FF': '#d62728',  'SI': '#ff7f0e',  'FC': '#2ca02c',
        'SL': '#9467bd',  'CU': '#8c564b',  'CH': '#e377c2',
        'FS': '#17becf',  'KC': '#bcbd22'
    }

    # Create figure
    fig = go.Figure()

    # Add scatter trace for each pitch type
    for pitch_type in qualifying_pitches:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]

        fig.add_trace(go.Scatter(
            x=pitch_subset['horizontal_break'],
            y=pitch_subset['vertical_break'],
            mode='markers',
            name=f"{pitch_type} (n={len(pitch_subset)})",
            text=pitch_subset['hover_text'],
            hoverinfo='text',
            marker=dict(
                color=pitch_colors.get(pitch_type, '#7f7f7f'),
                size=8,
                opacity=0.6,
                line=dict(width=0.5, color='black')
            )
        ))

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Pitch Movement Profile</b><br>" +
                 "<sub>Catcher's Perspective - Hover for Details</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
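            # pfx_x is reported from the catcher's perspective, so arm side vs. glove side depends on handedness
            title="<b>Horizontal Break (inches)</b><br>← Third-Base Side | First-Base Side →",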
            zeroline=True,
            zerolinewidth=2,
            zerolinecolor='gray',
            gridcolor='lightgray',
            showgrid=True
        ),
        yaxis=dict(
            title="<b>Induced Vertical Break (inches)</b><br>↓ Drop | Rise ↑",
            zeroline=True,
            zerolinewidth=2,
            zerolinecolor='gray',
            gridcolor='lightgray',
            showgrid=True,
            scaleanchor="x",
            scaleratio=1
        ),
        hovermode='closest',
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Pitch Type</b>'),
            orientation='v',
            x=1.02,
            y=1
        ),
        width=1000,
        height=900,
        margin=dict(l=80, r=150, t=100, b=80),
        template='plotly_white'
    )

    return fig

# Example usage
# cole_pitches = statcast_pitcher('2024-04-01', '2024-10-01', 543037)
# fig = create_interactive_pitch_movement(cole_pitches, "Gerrit Cole 2024")
# fig.show()

Python

import plotly.graph_objects as go

def create_interactive_release_points(pitcher_data, player_name="Pitcher"):
    """
    Create 3D interactive release point visualization.

    Parameters:
    pitcher_data: DataFrame with Statcast pitch data
    player_name: Name for chart title

    Returns:
    Plotly figure object
    """
    # Filter for complete data. The Y-axis plots release_extension;
    # release_pos_y is the distance from home plate (ft), not the extension,
    # so it is not used here.
    pitches = pitcher_data[
        pitcher_data['release_pos_x'].notna() &
        pitcher_data['release_extension'].notna() &
        pitcher_data['release_pos_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()

    # Create hover text
    pitches['hover_text'] = pitches.apply(
        lambda row: f"<b>{row['pitch_type']}</b><br>" +
                   f"X (side): {row['release_pos_x']:.2f} ft<br>" +
                   f"Extension: {row['release_extension']:.2f} ft<br>" +
                   f"Z (height): {row['release_pos_z']:.2f} ft<br>" +
                   f"Velocity: {row['release_speed']:.1f} mph<br>" +
                   f"Result: {row['description']}",
        axis=1
    )

    # Filter for qualifying pitch types
    pitch_counts = pitches['pitch_type'].value_counts()
    qualifying = pitch_counts[pitch_counts >= 30].index
    plot_data = pitches[pitches['pitch_type'].isin(qualifying)]

    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }

    # Create figure
    fig = go.Figure()

    # Add trace for each pitch type
    for pitch_type in qualifying:
        pitch_subset = plot_data[plot_data['pitch_type'] == pitch_type]

        fig.add_trace(go.Scatter3d(
            x=pitch_subset['release_pos_x'],
            y=pitch_subset['release_extension'],  # Use extension for Y-axis
            z=pitch_subset['release_pos_z'],
            mode='markers',
            name=f"{pitch_type} (n={len(pitch_subset)})",
            text=pitch_subset['hover_text'],
            hoverinfo='text',
            marker=dict(
                color=pitch_colors.get(pitch_type, '#7f7f7f'),
                size=5,
                opacity=0.7,
                line=dict(width=0.3, color='black')
            )
        ))

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Release Point Consistency</b><br>" +
                 "<sub>3D View - Rotate to Explore</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        scene=dict(
            xaxis=dict(
                # release_pos_x is from the catcher's perspective; arm side depends on handedness
                title='<b>Horizontal Position (ft)</b><br>← Third-Base Side | First-Base Side →',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            yaxis=dict(
                title='<b>Extension (ft)</b>',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            zaxis=dict(
                title='<b>Release Height (ft)</b>',
                gridcolor='lightgray',
                backgroundcolor='white'
            ),
            camera=dict(
                eye=dict(x=1.5, y=1.5, z=1.3)
            ),
            aspectmode='cube'
        ),
        showlegend=True,
        legend=dict(
            title=dict(text='<b>Pitch Type</b>'),
            x=1.02,
            y=0.9
        ),
        width=1000,
        height=800,
        template='plotly_white'
    )

    return fig

# Example usage
# fig = create_interactive_release_points(cole_pitches, "Gerrit Cole 2024")
# fig.show()

Python

import plotly.graph_objects as go

def create_animated_pitch_sequence(pitcher_data, player_name="Pitcher",
                                   max_pitches=200):
    """
    Create animated pitch sequence visualization.

    Parameters:
    pitcher_data: DataFrame with Statcast pitch data
    player_name: Name for chart title
    max_pitches: Maximum pitches to include (for performance)

    Returns:
    Plotly figure object
    """
    # Prepare sequence data
    pitches = pitcher_data[
        pitcher_data['plate_x'].notna() &
        pitcher_data['plate_z'].notna() &
        pitcher_data['pitch_type'].notna()
    ].copy()

    # Sort chronologically
    pitches = pitches.sort_values(['game_date', 'at_bat_number', 'pitch_number'])
    pitches = pitches.head(max_pitches)  # Limit for performance
    pitches['sequence_num'] = range(1, len(pitches) + 1)

    # Create count state and labels
    pitches['count_state'] = pitches['balls'].astype(str) + '-' + pitches['strikes'].astype(str)
    pitches['frame_label'] = pitches.apply(
        lambda row: f"Pitch {row['sequence_num']}: {row['pitch_type']} " +
                   f"@ {row['release_speed']:.1f} mph<br>" +
                   f"Count: {row['count_state']} | {row['description']}",
        axis=1
    )

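    # Approximate strike zone boundaries in feet (catcher's view).
    # Width: half the 17-inch plate (0.708 ft) plus roughly one ball radius (0.12 ft).
    sz_left, sz_right = -0.83, 0.83
    # Height: generic values; Statcast reports per-batter sz_top and sz_bot.
    sz_bottom, sz_top = 1.5, 3.5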

    # Pitch colors
    pitch_colors = {
        'FF': '#d62728', 'SI': '#ff7f0e', 'FC': '#2ca02c',
        'SL': '#9467bd', 'CU': '#8c564b', 'CH': '#e377c2',
        'FS': '#17becf', 'KC': '#bcbd22'
    }

    # Create figure with frames
    fig = go.Figure()

    # Pitch types in order of first appearance; each one gets a fixed trace slot
    pitch_types = pitches['pitch_type'].unique()

    def build_traces(frame_data):
        """Return one trace per pitch type so trace indices match in every frame."""
        traces = []
        for pitch_type in pitch_types:
            pt_data = frame_data[frame_data['pitch_type'] == pitch_type]
            traces.append(go.Scatter(
                x=pt_data['plate_x'],
                y=pt_data['plate_z'],
                mode='markers',
                name=pitch_type,
                text=pt_data['frame_label'],
                hoverinfo='text',
                marker=dict(
                    color=pitch_colors.get(pitch_type, '#7f7f7f'),
                    size=12,
                    opacity=0.8,
                    line=dict(width=1, color='black')
                )
            ))
        return traces

    # Each frame shows every pitch thrown up to and including seq_num
    frames = [
        go.Frame(
            data=build_traces(pitches[pitches['sequence_num'] <= seq_num]),
            name=str(seq_num)
        )
        for seq_num in pitches['sequence_num']
    ]

    # Initial view: the first pitch only. Pitch types not yet thrown get empty
    # traces, because frames update existing traces by index and cannot add
    # traces that are missing from the base figure.
    fig.add_traces(build_traces(pitches[pitches['sequence_num'] == 1]))

    # Add strike zone
    fig.add_shape(
        type="rect",
        x0=sz_left, x1=sz_right,
        y0=sz_bottom, y1=sz_top,
        line=dict(color="black", width=2),
        fillcolor="rgba(200, 200, 200, 0.1)"
    )

    fig.frames = frames

    # Update layout
    fig.update_layout(
        title=dict(
            text=f"<b>{player_name} Pitch Sequence</b><br>" +
                 "<sub>Catcher's View - Press Play</sub>",
            x=0.5,
            xanchor='center',
            font=dict(size=16)
        ),
        xaxis=dict(
            # Inside vs. outside depends on batter handedness, so label by field side instead
            title="<b>Horizontal Location (ft)</b><br>← Third-Base Side | First-Base Side →",
            range=[-2, 2],
            zeroline=True,
            zerolinecolor='lightgray',
            gridcolor='lightgray'
        ),
        yaxis=dict(
            title="<b>Vertical Location (ft)</b>",
            range=[0, 5],
            zeroline=False,
            gridcolor='lightgray'
        ),
        showlegend=True,
        legend=dict(title=dict(text='<b>Pitch Type</b>')),
        updatemenus=[{
            'type': 'buttons',
            'showactive': False,
            'buttons': [
                {
                    'label': 'Play',
                    'method': 'animate',
                    'args': [None, {
                        'frame': {'duration': 1000, 'redraw': True},
                        'fromcurrent': True,
                        'transition': {'duration': 500}
                    }]
                },
                {
                    'label': 'Pause',
                    'method': 'animate',
                    'args': [[None], {
                        'frame': {'duration': 0, 'redraw': False},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }]
                }
            ],
            'x': 0.1,
            'y': 0
        }],
        sliders=[{
            'active': 0,
            'steps': [
                {
                    'args': [[f.name], {
                        'frame': {'duration': 0, 'redraw': True},
                        'mode': 'immediate',
                        'transition': {'duration': 0}
                    }],
                    'label': f.name,
                    'method': 'animate'
                }
                for f in frames
            ],
            'currentvalue': {
                'prefix': 'Pitch: ',
                'font': {'size': 14, 'color': 'black'}
            },
            'x': 0.1,
            'len': 0.9,
            'xanchor': 'left',
            'y': 0,
            'yanchor': 'top'
        }],
        width=1000,
        height=800,
        template='plotly_white'
    )

    return fig

# Example usage
# fig = create_animated_pitch_sequence(cole_pitches, "Gerrit Cole 2024", max_pitches=150)
# fig.show()

7.10 Exercises

Exercise 7.1: Pitcher Comparison Analysis

Using Statcast data, compare two starting pitchers from different teams:

1. Pull data for both pitchers for the 2024 season
2. Calculate velocity, spin rate, and movement profiles for each pitch type
3. Compare their arsenals: usage rates, average velocity, and whiff rates
4. Create a pitch movement chart for each pitcher
5. Write a brief scouting report comparing their arsenals

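Suggested pitchers: Spencer Strider (ATL) and Shota Imanaga (CHC). Strider's 2024 season ended after two starts, so use his 2023 data or pick another starter if you need a full-season sample.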
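
As a possible starting point for steps 2 and 3, the sketch below summarizes usage, average velocity, and whiff rate per pitch type. It is not a full solution. The helper name `summarize_arsenal` is hypothetical, and the sets of `description` values counted as swings and whiffs are assumptions (bunt attempts are left out, and foul tips are counted as swings but not whiffs). The sample rows are made up to show the output shape; real input would be a Statcast DataFrame such as the one returned by `statcast_pitcher`.

```python
import pandas as pd

# Assumed groupings of Statcast `description` values; adjust to your own definitions.
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}
SWINGS = WHIFFS | {"foul", "foul_tip", "hit_into_play"}


def summarize_arsenal(pitches: pd.DataFrame) -> pd.DataFrame:
    """Usage, average velocity, and whiff rate per pitch type (hypothetical helper)."""
    df = pitches.dropna(subset=["pitch_type"]).copy()
    df["is_swing"] = df["description"].isin(SWINGS)
    df["is_whiff"] = df["description"].isin(WHIFFS)

    summary = df.groupby("pitch_type").agg(
        n=("description", "size"),
        avg_velo=("release_speed", "mean"),
        swings=("is_swing", "sum"),
        whiffs=("is_whiff", "sum"),
    )
    summary["usage_pct"] = 100 * summary["n"] / summary["n"].sum()
    # Whiff rate is whiffs per swing; where() yields NaN if a pitch drew no swings
    swings = summary["swings"].where(summary["swings"] > 0)
    summary["whiff_pct"] = 100 * summary["whiffs"] / swings
    return summary.sort_values("usage_pct", ascending=False)


# Made-up rows, only to illustrate the calculation
sample = pd.DataFrame({
    "pitch_type": ["FF", "FF", "FF", "SL", "SL"],
    "release_speed": [97.0, 98.0, 96.0, 86.0, 87.0],
    "description": ["swinging_strike", "foul", "ball",
                    "swinging_strike", "called_strike"],
})
arsenal = summarize_arsenal(sample)
print(arsenal[["n", "avg_velo", "usage_pct", "whiff_pct"]])
```

Running the same helper on each pitcher's data gives two tables that can be compared side by side for the scouting report.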

Exercise 7.2: Command and Location Analysis

Analyze pitch location patterns for a pitcher of your choice:

1. Calculate zone%, edge%, heart%, and chase rate by pitch type
2. Analyze CSW% overall and by pitch type
3. Create a heatmap showing pitch locations for their primary pitch (four-seam fastball)
4. Compare location patterns by count (ahead vs. behind in the count)
5. Assess: Is this pitcher's success driven more by stuff or command?
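
One way to approach steps 1 and 2 is sketched below: zone%, chase rate, and CSW% per pitch type. The helper name `command_metrics` is hypothetical. The sketch assumes the Statcast `zone` column uses 1 through 9 for in-zone pitches and 11 through 14 for pitches outside the zone, and that swings and whiffs are identified from `description` as listed in the code. Edge% and heart% are left out because they depend on which attack-zone definition you adopt. The sample rows are made up.

```python
import pandas as pd

# Assumed groupings of Statcast `description` values
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}
SWINGS = WHIFFS | {"foul", "foul_tip", "hit_into_play"}
CSW = WHIFFS | {"called_strike"}


def command_metrics(pitches: pd.DataFrame) -> pd.DataFrame:
    """Zone%, chase%, and CSW% per pitch type (hypothetical helper)."""
    df = pitches.dropna(subset=["pitch_type", "zone"]).copy()
    df["in_zone"] = df["zone"].between(1, 9)  # assumes zones 1-9 are in the strike zone
    df["out_zone"] = ~df["in_zone"]
    df["swing"] = df["description"].isin(SWINGS)
    df["csw"] = df["description"].isin(CSW)
    df["chase"] = df["out_zone"] & df["swing"]

    out = df.groupby("pitch_type").agg(
        n=("zone", "size"),
        zone_pct=("in_zone", "mean"),
        csw_pct=("csw", "mean"),
        out_zone=("out_zone", "sum"),
        chases=("chase", "sum"),
    )
    # Chase rate: swings at out-of-zone pitches divided by out-of-zone pitches
    out["chase_pct"] = out["chases"] / out["out_zone"].where(out["out_zone"] > 0)
    pct_cols = ["zone_pct", "csw_pct", "chase_pct"]
    out[pct_cols] = out[pct_cols] * 100
    return out


# Made-up rows, only to illustrate the calculation
sample = pd.DataFrame({
    "pitch_type": ["FF", "FF", "FF", "FF", "CH", "CH"],
    "zone": [5, 2, 12, 14, 13, 8],
    "description": ["called_strike", "swinging_strike", "swinging_strike",
                    "ball", "ball", "foul"],
})
metrics = command_metrics(sample)
print(metrics[["n", "zone_pct", "csw_pct", "chase_pct"]])
```

Grouping by `balls` and `strikes` in addition to `pitch_type` would extend the same idea to the count comparison in step 4.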

Exercise 7.3: Arsenal Effectiveness Study

Investigate which pitch in a pitcher's arsenal is most/least effective:

1. For each pitch type, calculate:
   - Whiff rate
   - xwOBA against
   - Hard hit rate against
   - CSW%
   - Usage rate
2. Identify the best and worst pitches in the arsenal
3. Analyze if usage rate aligns with effectiveness (do they throw their best pitches most?)
4. Calculate pitch values: Run Value per 100 pitches for each pitch type
5. Make a recommendation: Should they adjust their pitch usage?

Challenge Extension: Compare the pitcher's arsenal effectiveness against left-handed vs. right-handed batters. Do they have platoon splits? Which pitches drive those splits?
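
For the hard-hit and run-value parts of this exercise, including the platoon split, a rough sketch might look like the following. The helper name `pitch_value_by_side` is hypothetical. It assumes hard-hit balls are balls in play with `launch_speed` of at least 95 mph, that `stand` holds the batter's side ("L" or "R"), and that `delta_run_exp` is the per-pitch change in run expectancy from the batting team's perspective, so lower totals favor the pitcher. Check those assumptions against your data before relying on the numbers. The sample rows are made up.

```python
import pandas as pd


def pitch_value_by_side(pitches: pd.DataFrame) -> pd.DataFrame:
    """Hard-hit rate and run value per 100 pitches by pitch type and batter side
    (hypothetical helper)."""
    df = pitches.dropna(subset=["pitch_type", "stand"]).copy()
    df["batted"] = df["description"].eq("hit_into_play")
    # Missing launch_speed compares as False, so untracked balls are not counted as hard hit
    df["hard_hit"] = df["batted"] & (df["launch_speed"] >= 95)

    out = df.groupby(["pitch_type", "stand"]).agg(
        n=("description", "size"),
        batted=("batted", "sum"),
        hard_hit=("hard_hit", "sum"),
        run_value=("delta_run_exp", "sum"),
    )
    out["hard_hit_pct"] = 100 * out["hard_hit"] / out["batted"].where(out["batted"] > 0)
    # Assumed sign convention: positive favors the hitter
    out["rv_per_100"] = 100 * out["run_value"] / out["n"]
    return out


# Made-up rows, only to illustrate the calculation
sample = pd.DataFrame({
    "pitch_type": ["FF", "FF", "FF", "FF", "SL", "SL"],
    "stand": ["R", "R", "L", "L", "R", "R"],
    "description": ["hit_into_play", "ball", "hit_into_play",
                    "called_strike", "swinging_strike", "hit_into_play"],
    "launch_speed": [101.0, None, 88.0, None, None, 96.0],
    "delta_run_exp": [0.5, 0.03, -0.25, -0.05, -0.06, -0.24],
})
splits = pitch_value_by_side(sample)
print(splits[["n", "hard_hit_pct", "rv_per_100"]])
```

Comparing the `L` and `R` rows for each pitch type shows which pitches drive a platoon split.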

You've now completed your deep dive into Statcast pitching analytics. You understand how modern tracking systems measure every pitch, what those measurements reveal about pitcher performance, and how to analyze arsenals, command, and expected outcomes. These skills form the foundation for evaluating pitchers in the modern game, whether you're building projection models, designing development plans, or making strategic decisions.

The next chapter will explore park factors and environmental effects - understanding how context affects the statistics we've been analyzing throughout this book.

