Chapter 4: Data Visualization for Baseball

Data visualization transforms raw numbers into insights that can change how we understand baseball. While a batting average tells us something about a player's performance, a spray chart reveals where they hit the ball, against what pitches, and in what counts. A single number describing pitch movement pales in comparison to a plot showing how a pitcher's entire arsenal moves through the strike zone.


4.1 Visualization Principles for Baseball

Before diving into code, we need to understand what makes a visualization effective. The goal isn't just to make pretty pictures—it's to communicate information clearly and help viewers discover insights they couldn't see in raw data.

4.1.1 Choosing the Right Chart Type {#chart-types}

Different questions require different visualizations. Here's a framework for choosing the right chart type for baseball analytics:

Scatter Plots: Use when exploring relationships between two continuous variables. Perfect for:


  • Exit velocity vs. launch angle

  • Pitch velocity vs. spin rate

  • Barrel rate vs. hard-hit rate

  • Expected stats vs. actual stats

Bar Charts: Use for comparing values across categories. Ideal for:


  • Team statistics (HR, runs, ERA by team)

  • Player rankings

  • Pitch usage percentages

  • Situational splits (home/away, vs. LHP/RHP)

Histograms: Use to show distributions of a single variable. Great for:


  • Distribution of pitch velocities

  • Launch angle distributions

  • Exit velocity distributions

  • Spray angle distributions

Line Charts: Use to show trends over time. Essential for:


  • Performance over a season (game-by-game or rolling averages)

  • Career trajectories

  • In-game win probability

  • Pitch velocity by inning

Heat Maps: Use to show intensity across two dimensions. Perfect for:


  • Strike zone analysis (pitch location)

  • Spray charts (hit location)

  • Shift positioning

  • Platoon splits by zone

Box Plots: Use to compare distributions across groups. Useful for:


  • Comparing pitch velocities across pitchers

  • Exit velocity by player or team

  • Performance consistency metrics

The key is matching your question to the appropriate visualization. If you're asking "How do these two things relate?", use a scatter plot. If you're asking "How does this change over time?", use a line chart. If you're asking "How is this distributed?", use a histogram.
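The question-to-chart matching above can be written down as a small lookup. This is only a sketch: the question labels and the `suggest_chart` helper are hypothetical names chosen for illustration, not part of any library.

```python
# Hypothetical lookup: the kind of question you are asking -> chart type
# from the framework above.
CHART_FOR_QUESTION = {
    "relationship": "scatter plot",     # two continuous variables
    "comparison": "bar chart",          # values across categories
    "distribution": "histogram",        # one continuous variable
    "trend": "line chart",              # change over time
    "intensity": "heat map",            # density across two dimensions
    "group_distribution": "box plot",   # distributions across groups
}


def suggest_chart(question_type: str) -> str:
    """Return the suggested chart type, or raise if the question type is unknown."""
    if question_type not in CHART_FOR_QUESTION:
        known = ", ".join(sorted(CHART_FOR_QUESTION))
        raise ValueError(f"Unknown question type {question_type!r}; expected one of: {known}")
    return CHART_FOR_QUESTION[question_type]


print(suggest_chart("trend"))         # line chart
print(suggest_chart("distribution"))  # histogram
```

A table like this is mostly useful as a checklist: if a planned chart does not match the question you wrote down, one of the two probably needs to change.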

4.1.2 Color Palettes and Accessibility {#color-palettes}

Color is one of the most powerful tools in visualization, but it's also one of the most misused. Here are principles for effective color use:

Team Colors: When visualizing team data, use actual team colors when possible. This makes charts immediately recognizable and connects with fans' existing associations.

Colorblind-Friendly Palettes: Approximately 8% of men and 0.5% of women have some form of color blindness. Use palettes that work for everyone:


  • Viridis palettes (R: scale_color_viridis_d(), Python: cmap='viridis')

  • ColorBrewer palettes designed for accessibility

  • Avoid red-green combinations as the sole differentiator

Sequential vs. Diverging:


  • Sequential: Use for data that goes from low to high (e.g., pitch velocity, exit velocity)

  • Diverging: Use for data with a meaningful midpoint (e.g., difference from league average, positive/negative values)

Limiting Colors: Don't use too many colors in one chart. Beyond 7-8 categories, colors become hard to tell apart. Consider grouping categories or using facets instead.
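As a rough sketch of these rules, the snippet below assigns colors from the Okabe-Ito palette, a widely used colorblind-safe set of eight colors, and refuses to go past eight categories. The helper names `assign_colors` and `palette_kind` are hypothetical, introduced only for this example.

```python
# Okabe-Ito colorblind-safe qualitative palette (eight colors).
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def assign_colors(categories):
    """Map each distinct category to one palette color, in first-seen order."""
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    if len(unique) > len(OKABE_ITO):
        raise ValueError(
            f"{len(unique)} categories is too many to distinguish by color; "
            "group them or use facets instead"
        )
    return dict(zip(unique, OKABE_ITO))


def palette_kind(has_meaningful_midpoint: bool) -> str:
    """Diverging for data centered on a midpoint, sequential otherwise."""
    return "diverging" if has_meaningful_midpoint else "sequential"


pitch_colors = assign_colors(["FF", "SL", "CH", "FF", "CU"])
print(pitch_colors["SL"])   # #56B4E9
print(palette_kind(True))   # diverging (e.g., difference from league average)
print(palette_kind(False))  # sequential (e.g., exit velocity)
```

The resulting dictionary can be passed to whichever plotting library you use as an explicit category-to-color mapping, which also keeps colors stable across charts.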

4.1.3 Baseball-Specific Visual Conventions {#baseball-conventions}

Baseball has developed its own visual language over decades:

Strike Zone: Always orient from the catcher's perspective (right-handed batter's box on the left). The zone should match official dimensions: 17 inches wide (the width of home plate), with height varying by batter stance (roughly knees to mid-torso).

Spray Charts: Show the field from above with home plate at the bottom. Standard is to show from the catcher's perspective, though some use the batter's perspective. Be consistent and label clearly.

Pitch Movement: Convention is to show horizontal break (x-axis) vs. vertical break (y-axis). In Statcast's catcher's-perspective coordinates, positive horizontal break moves toward the catcher's right, away from a right-handed batter; some published plots flip the sign to show the pitcher's view, so label which one you use. Positive vertical break "rises" (actually drops less than gravity alone would cause).

Time-Based Charts: Baseball seasons flow left to right, with April on the left and October on the right. Career trajectories should also flow left to right chronologically.
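The strike zone convention translates into a simple rectangle. The sketch below assumes coordinates in feet with x = 0 at the center of the plate, viewed from the catcher's side; the parameter names mirror the Statcast columns `plate_x`, `plate_z`, `sz_bot`, and `sz_top`, while the helper functions themselves are hypothetical. It ignores the radius of the ball, so pitches that clip the edge are treated as outside.

```python
PLATE_WIDTH_FT = 17 / 12  # home plate is 17 inches wide


def strike_zone_rect(sz_bot: float, sz_top: float) -> dict:
    """Return the zone edges in feet, centered on the plate (catcher's view)."""
    if sz_top <= sz_bot:
        raise ValueError("sz_top must be greater than sz_bot")
    half_width = PLATE_WIDTH_FT / 2
    return {"left": -half_width, "right": half_width, "bottom": sz_bot, "top": sz_top}


def in_zone(plate_x: float, plate_z: float, zone: dict) -> bool:
    """True if the pitch location falls inside the rectangle (ball radius ignored)."""
    return (zone["left"] <= plate_x <= zone["right"]
            and zone["bottom"] <= plate_z <= zone["top"])


# Illustrative zone heights for one batter, in feet (made-up values)
zone = strike_zone_rect(sz_bot=1.6, sz_top=3.4)
print(round(zone["right"], 3))   # 0.708
print(in_zone(0.0, 2.5, zone))   # True  (middle of the zone)
print(in_zone(1.0, 2.5, zone))   # False (off the plate)
```

The four edges are exactly what a plotting library needs to draw the zone as a rectangle on top of pitch locations.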

4.1.4 Telling Stories with Data {#storytelling}

The best visualizations tell stories. They have:

  1. A Clear Focus: What's the one main point? Highlight it with color, size, or annotation.
  2. Context: Show league average lines, reference points, or comparison groups.
  3. Annotations: Add text labels for key players, events, or outliers.
  4. Titles and Labels: Use descriptive titles that state the finding, not just what's being shown.
  5. Source Attribution: Always cite your data source (Statcast, FanGraphs, Baseball Savant, etc.).

A scatter plot of exit velocity vs. launch angle is just data. But add a title like "Aaron Judge's Barrel Zone: High Exit Velocity Meets Optimal Launch Angle," highlight Judge's points in a distinct color, add reference lines around an approximate "barrel" region (95+ mph exit velocity, 15-35° launch angle; Statcast's official definition starts at 98 mph, with a launch-angle window that widens as exit velocity rises), and annotate his home runs. Now you're telling a story.
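To decide which points to highlight, you need a rule for what counts as inside the reference box. Here is a minimal sketch using the simplified box quoted above (95+ mph, 15-35°). Statcast's official barrel definition is stricter than this box. The function name `in_barrel_box` and the sample batted balls are made up for illustration.

```python
def in_barrel_box(exit_velocity_mph, launch_angle_deg,
                  min_ev=95.0, min_la=15.0, max_la=35.0):
    """True if a batted ball falls inside the simplified reference box."""
    return exit_velocity_mph >= min_ev and min_la <= launch_angle_deg <= max_la


# (exit velocity in mph, launch angle in degrees); made-up values
batted_balls = [(108.2, 27.0), (99.5, 8.0), (91.0, 22.0), (101.3, 31.5)]

flags = [in_barrel_box(ev, la) for ev, la in batted_balls]
box_rate = sum(flags) / len(flags)

print(flags)             # [True, False, False, True]
print(f"{box_rate:.0%}")  # 50%
```

The boolean flags map naturally onto a highlight color, and the rate makes a concrete number to put in the chart's subtitle.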


4.2 ggplot2 for Baseball (R)

The ggplot2 package, developed by Hadley Wickham, has become the gold standard for data visualization in R. It's based on "The Grammar of Graphics" by Leland Wilkinson, which provides a systematic way to build visualizations by combining independent components.

4.2.1 The Grammar of Graphics Philosophy {#grammar-of-graphics}

The grammar of graphics breaks down visualizations into layers:

  1. Data: The dataset you're visualizing
  2. Aesthetics (aes): How variables map to visual properties (x, y, color, size, shape)
  3. Geometries (geom): The type of plot (points, lines, bars, etc.)
  4. Scales: How aesthetic mappings are displayed (axis limits, color schemes)
  5. Facets: Splitting into multiple subplots
  6. Themes: Overall visual appearance (fonts, backgrounds, grid lines)

This might sound abstract, but it's incredibly powerful. You build plots by explicitly stating each component, which makes them both easy to understand and infinitely customizable.

4.2.2 Building Plots Layer by Layer {#building-plots}

Let's start with a simple example using a small, hand-entered illustrative dataset (not official single-season totals) to understand the syntax:

library(ggplot2)
library(dplyr)

# Create sample data
player_stats <- data.frame(
  player = c("Ohtani", "Judge", "Betts", "Acuna", "Trout"),
  hr = c(44, 62, 35, 41, 40),
  avg = c(.304, .311, .307, .337, .283)
)

# Build a plot layer by layer
ggplot(data = player_stats, aes(x = avg, y = hr)) +  # Data + aesthetics
  geom_point(size = 3) +                               # Geometry
  labs(title = "Home Runs vs Batting Average",         # Labels
       x = "Batting Average",
       y = "Home Runs") +
  theme_minimal()                                       # Theme

Every ggplot2 visualization starts with ggplot(), which creates a coordinate system. You then add layers with +. The aes() function defines aesthetic mappings—which variables control which visual properties.

4.2.3 Scatter Plots: Exit Velocity vs Launch Angle {#scatter-plots}

Scatter plots are the workhorse of baseball analytics. Let's create a sophisticated visualization showing the relationship between exit velocity and launch angle, colored by hit outcome.

library(ggplot2)
library(dplyr)
library(stringr)    # Provides str_detect()
library(baseballr)

# Get Statcast data for Shohei Ohtani (2024 season)
# Note: You'll need to install the baseballr package first
# This fetches data from Baseball Savant

ohtani_data <- statcast_search(
  start_date = "2024-04-01",
  end_date = "2024-09-30",
  playerid = 660271,  # Ohtani's MLBAM ID
  player_type = "batter"
)

# Keep pitches with tracked contact and label each by outcome
ohtani_batted <- ohtani_data %>%
  filter(!is.na(launch_speed), !is.na(launch_angle)) %>%
  mutate(
    outcome = case_when(
      events == "home_run" ~ "Home Run",
      events %in% c("double", "triple") ~ "Extra Base Hit",
      events == "single" ~ "Single",
      # Outs: field_out, force_out, double plays, sacrifice flies, etc.
      str_detect(events, "out|_play|sac_") ~ "Out",
      TRUE ~ "Other"  # Fouls, errors, and anything unlabeled
    )
  )

# Create the visualization
ggplot(ohtani_batted, aes(x = launch_angle, y = launch_speed, color = outcome)) +
  geom_point(alpha = 0.6, size = 2.5) +
  # Add approximate "barrel" zone reference box (drawn once via annotate)
  annotate("rect", xmin = 15, xmax = 35, ymin = 95, ymax = 115,
           fill = NA, color = "black", linetype = "dashed") +
  # Vertical line at 0 degrees separates ground balls from balls in the air
  geom_vline(xintercept = 0, linetype = "dotted", color = "gray50") +
  scale_color_manual(
    values = c("Home Run" = "#FF0000",
               "Extra Base Hit" = "#FF8C00",
               "Single" = "#FFD700",
               "Out" = "#1E90FF",
               "Other" = "#808080")
  ) +
  labs(
    title = "Shohei Ohtani's Batted Ball Profile - 2024 Season",
    subtitle = "Exit Velocity vs Launch Angle colored by outcome. Dashed box shows approximate 'barrel' zone.",
    x = "Launch Angle (degrees)",
    y = "Exit Velocity (mph)",
    color = "Outcome",
    caption = "Data: MLB Statcast via baseballr"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    legend.position = "right"
  )

With a full season of data, expect patterns like these:

  1. Home runs cluster at high exit velocity (100+ mph) and moderate-to-high launch angles (roughly 20-35°)
  2. The dashed box approximates the "barrel" sweet spot: 95+ mph exit velocity, 15-35° launch angle
  3. Ground balls (negative launch angles) rarely become extra-base hits, even with high exit velocity
  4. Weak contact (low exit velocity) produces far fewer hits, apart from the occasional bloop or infield single

4.2.4 Bar Charts: Team Home Run Leaders {#bar-charts}

Bar charts excel at comparing values across categories. Let's visualize the top home run hitting teams.

library(ggplot2)
library(dplyr)
library(Lahman)  # Historical baseball database

# Use Lahman database for team statistics
# (2023 requires a Lahman release that includes that season)
team_hr_2023 <- Teams %>%
  filter(yearID == 2023) %>%
  select(name, HR) %>%
  slice_max(HR, n = 10) %>%         # Top 10 teams by HR (ties kept)
  mutate(name = reorder(name, HR))  # Order bars by HR for plotting

ggplot(team_hr_2023, aes(x = name, y = HR, fill = HR)) +
  geom_col() +
  scale_fill_gradient(low = "#deebf7", high = "#08519c") +
  coord_flip() +  # Horizontal bars are easier to read with team names
  labs(
    title = "Top 10 MLB Teams by Home Runs - 2023 Season",
    x = NULL,
    y = "Home Runs",
    caption = "Data: Lahman Baseball Database"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none",  # Color gradient is self-explanatory
    panel.grid.major.y = element_blank()
  )

Design choices explained:


  • coord_flip(): Makes horizontal bars, which are easier to read with long team names

  • reorder(name, HR): Orders bars from lowest to highest, making comparisons easier

  • Gradient fill: Reinforces the magnitude differences visually

  • Removed horizontal grid lines: They add clutter without helping interpretation

For comparing multiple statistics across teams, give each statistic its own panel so that values in different units don't share an axis:

# Compare HR and K% for the top HR teams
team_comparison <- Teams %>%
  filter(yearID == 2023) %>%
  slice_max(HR, n = 8) %>%
  mutate(
    # Approximate plate appearances; K% is strikeouts per PA, not per AB
    PA = AB + BB + HBP + SF,
    K_pct = (SO / PA) * 100,
    # teamID is unique per team (the first 3 letters of the name are not:
    # "New York Yankees" and "New York Mets" would both become "New")
    team = reorder(teamID, HR)
  ) %>%
  select(team, HR, K_pct) %>%
  tidyr::pivot_longer(cols = c(HR, K_pct),
                      names_to = "stat",
                      values_to = "value")

ggplot(team_comparison, aes(x = value, y = team, fill = stat)) +
  geom_col() +
  # One panel per statistic, each with its own x scale
  facet_wrap(~ stat, scales = "free_x",
             labeller = as_labeller(c(HR = "Home Runs", K_pct = "Strikeout %"))) +
  scale_fill_manual(
    values = c("HR" = "#d7191c", "K_pct" = "#2b83ba"),
    guide = "none"  # Panel titles already identify each statistic
  ) +
  labs(
    title = "Power vs Contact: HR Leaders and Their Strikeout Rates",
    subtitle = "Each panel has its own scale: HR is a count, K% is a rate",
    x = NULL,
    y = NULL
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

4.2.5 Histograms: Distribution of Pitch Velocities {#histograms}

Histograms show the distribution of a single continuous variable. They're essential for understanding what's "normal" vs. exceptional.

library(ggplot2)
library(dplyr)      # Provides %>% and filter()
library(baseballr)

# Get one week of pitches from a date range
week_pitches <- statcast_search(
  start_date = "2024-06-01",
  end_date = "2024-06-07"
)

# Clean the data: filter on the pitch_type column rather than
# relying on a query argument
fastballs_clean <- week_pitches %>%
  filter(pitch_type == "FF",  # Four-seam fastball
         !is.na(release_speed), release_speed > 70, release_speed < 105)

# Compute summary statistics once so each reference line is drawn once
mean_velo <- mean(fastballs_clean$release_speed)
median_velo <- median(fastballs_clean$release_speed)

# Create histogram with density curve
# (linewidth replaces size for lines in ggplot2 >= 3.4.0)
ggplot(fastballs_clean, aes(x = release_speed)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 1,
                 fill = "#2b8cbe",
                 color = "white",
                 alpha = 0.7) +
  geom_density(color = "#08519c", linewidth = 1.2) +
  # Add reference lines
  geom_vline(xintercept = mean_velo,
             color = "#d7301f",
             linetype = "dashed",
             linewidth = 1) +
  geom_vline(xintercept = median_velo,
             color = "#fc8d59",
             linetype = "dashed",
             linewidth = 1) +
  annotate("text", x = mean_velo + 1, y = 0.08,
           label = paste0("Mean: ", round(mean_velo, 1), " mph"),
           color = "#d7301f", hjust = 0) +
  annotate("text", x = median_velo + 1, y = 0.075,
           label = paste0("Median: ", round(median_velo, 1), " mph"),
           color = "#fc8d59", hjust = 0) +
  labs(
    title = "Distribution of Four-Seam Fastball Velocities",
    subtitle = "MLB pitches from June 1-7, 2024",
    x = "Velocity (mph)",
    y = "Density",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )

With real data, expect the histogram to show roughly the following:


  • The distribution is approximately bell-shaped

  • Mean velocity around 94 mph

  • Most four-seam fastballs fall between 90 and 97 mph

  • Outliers on both ends: some pitchers throw 85 mph "fastballs", others reach 100+ mph
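For readers following the Python track, the idea behind fixed-width binning and the mean and median reference lines can be sketched with the standard library alone. The velocities below are made up, and `bin_counts` is a hypothetical helper; plotting libraries choose their own bin boundaries, so their exact counts can differ from this floor-based version.

```python
import math
import statistics
from collections import Counter

# Made-up velocities in mph, for illustration only
velocities = [92.4, 93.1, 93.8, 94.0, 94.6, 95.2, 95.9, 97.3, 88.7, 100.1]
BINWIDTH = 1.0


def bin_counts(values, binwidth=BINWIDTH):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    return Counter(math.floor(v / binwidth) * binwidth for v in values)


counts = bin_counts(velocities)

# Text histogram: one '#' per pitch in each occupied 1-mph bin
for lower_edge in sorted(counts):
    bar = "#" * counts[lower_edge]
    print(f"{lower_edge:5.1f}-{lower_edge + BINWIDTH:5.1f} mph: {bar}")

print("mean:", round(statistics.mean(velocities), 2))      # mean: 94.51
print("median:", round(statistics.median(velocities), 2))  # median: 94.3
```

Even in this tiny sample, the slow and fast outliers sit in bins far from the bulk of the data. That is the pattern the histogram is designed to show.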

Comparing distributions across groups:

# Compare fastball velocity by pitcher role
# You'd typically get pitcher role from another source.
# Here roles are assigned at random, purely to illustrate the plot.
set.seed(42)
pitcher_ids <- unique(fastballs_clean$pitcher)
pitcher_roles <- data.frame(
  pitcher = pitcher_ids,
  role = sample(c("Starter", "Reliever"), length(pitcher_ids), replace = TRUE)
)

pitcher_fastballs <- fastballs_clean %>%
  left_join(pitcher_roles, by = "pitcher") %>%
  filter(!is.na(role))

ggplot(pitcher_fastballs, aes(x = release_speed, fill = role)) +
  geom_histogram(alpha = 0.6, position = "identity", binwidth = 1) +
  scale_fill_manual(values = c("Starter" = "#2c7bb6", "Reliever" = "#d7191c")) +
  labs(
    title = "Fastball Velocity: Starters vs Relievers",
    x = "Velocity (mph)",
    y = "Count",
    fill = "Role"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

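In real data, relievers typically throw harder than starters because they only pitch 1-2 innings vs. 5-7, allowing them to maximize effort. Because the roles in this example are assigned at random, the two histograms will largely overlap; join real role data to see the gap.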

4.2.6 Line Charts: Season Performance Over Time {#line-charts}

Line charts track changes over time. They're perfect for showing player performance trends across a season.

library(ggplot2)
library(dplyr)
library(zoo)  # For rolling sums

# Simulate game-by-game batting data
# In practice, you'd get this from a database or API
set.seed(123)
games <- 150
ab <- sample(3:5, games, replace = TRUE)  # 3-5 AB per game
ohtani_games <- data.frame(
  game_num = 1:games,
  date = seq(as.Date("2024-04-01"), by = "day", length.out = games),
  ab = ab,
  hits = rbinom(games, ab, 0.31)  # ~.310 hitter; hits can never exceed AB
) %>%
  mutate(
    cumulative_hits = cumsum(hits),
    cumulative_ab = cumsum(ab),
    avg = cumulative_hits / cumulative_ab,
    # 10-game rolling average: total hits over total AB in the window
    rolling_avg = rollsum(hits, k = 10, fill = NA, align = "right") /
      rollsum(ab, k = 10, fill = NA, align = "right")
  )

# linewidth replaces size for lines in ggplot2 >= 3.4.0
ggplot(ohtani_games, aes(x = game_num)) +
  # Cumulative average (overall season average)
  geom_line(aes(y = avg), color = "#2b8cbe", linewidth = 1.2, alpha = 0.6) +
  # 10-game rolling average (recent form); NA for the first 9 games
  geom_line(aes(y = rolling_avg), color = "#d7301f", linewidth = 1.2, na.rm = TRUE) +
  # Illustrative league average reference line
  geom_hline(yintercept = 0.250, linetype = "dashed", color = "gray50") +
  annotate("text", x = 130, y = 0.255,
           label = "League Avg (.250)", color = "gray30", size = 3) +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.001)) +
  labs(
    title = "Batting Average Progression (Simulated Season)",
    subtitle = "Blue: Season average | Red: 10-game rolling average | Dashed: League average",
    x = "Game Number",
    y = "Batting Average",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )

This visualization shows:


  • Season average (blue): Gradually stabilizes over time (more stable with more games)

  • Rolling average (red): Shows hot and cold streaks

  • Comparison to league average: How the player performs relative to peers
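The difference between the two lines comes down to which games are summed. For readers following the Python track, here is a plain-Python sketch of both calculations, with each average computed as total hits over total at-bats. The function names and the six-game sample are made up for illustration, and the window is shortened to 3 games so the output stays readable.

```python
def cumulative_average(hits, at_bats):
    """Season-to-date average after each game (None until the first at-bat)."""
    averages = []
    total_hits = total_ab = 0
    for h, ab in zip(hits, at_bats):
        total_hits += h
        total_ab += ab
        averages.append(total_hits / total_ab if total_ab else None)
    return averages


def rolling_average(hits, at_bats, window=10):
    """Average over the most recent `window` games (None until the window fills)."""
    averages = []
    for i in range(len(hits)):
        if i + 1 < window:
            averages.append(None)
            continue
        window_hits = sum(hits[i + 1 - window:i + 1])
        window_ab = sum(at_bats[i + 1 - window:i + 1])
        averages.append(window_hits / window_ab if window_ab else None)
    return averages


# Made-up six-game sample
hits = [2, 0, 1, 3, 0, 1]
at_bats = [4, 4, 3, 5, 4, 4]

season = cumulative_average(hits, at_bats)
recent = rolling_average(hits, at_bats, window=3)

print([round(x, 3) for x in season])
# [0.5, 0.25, 0.273, 0.375, 0.3, 0.292]
print([None if x is None else round(x, 3) for x in recent])
# [None, None, 0.273, 0.333, 0.333, 0.308]
```

The season-to-date value moves less with each additional game because its denominator keeps growing. The rolling value always covers the same number of games, so it keeps reacting to recent results.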

Multiple players comparison:

# Compare multiple players' OPS over the season (simulated data)
# Reuses `games` from the previous example
set.seed(456)
players_ops <- data.frame(
  game_num = rep(1:games, 3),
  player = rep(c("Ohtani", "Judge", "Betts"), each = games)
) %>%
  group_by(player) %>%
  mutate(
    # Running mean of noisy game-level values, plus a little jitter
    ops = cumsum(rnorm(n(), mean = 0.9, sd = 0.1)) / game_num + rnorm(n(), 0, 0.02)
  ) %>%
  ungroup()

ggplot(players_ops, aes(x = game_num, y = ops, color = player)) +
  geom_line(linewidth = 1.2, alpha = 0.8) +
  # Colorblind-safe colors; team colors would not separate teammates
  scale_color_manual(
    values = c("Ohtani" = "#0072B2", "Judge" = "#D55E00", "Betts" = "#009E73")
  ) +
  labs(
    title = "OPS Race: Ohtani vs Judge vs Betts (Simulated)",
    x = "Game Number",
    y = "OPS (On-base Plus Slugging)",
    color = "Player",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )

4.2.7 Faceted Plots: Comparing Players/Teams {#faceted-plots}

Faceting creates multiple subplots based on a categorical variable. This is powerful for comparing patterns across groups without overplotting.

# Compare batted ball profiles for multiple players
players <- c(660271, 592450, 605141)  # Ohtani, Judge, Betts MLBAM IDs (for a real fetch)
player_names <- c("Shohei Ohtani", "Aaron Judge", "Mookie Betts")

# In practice, you would fetch Statcast data for each ID above
# For illustration, we'll create synthetic data
set.seed(2024)  # Make the random noise reproducible
batted_balls <- expand.grid(
  player = player_names,
  launch_angle = seq(-50, 60, length.out = 200)
) %>%
  mutate(
    launch_speed = case_when(
      player == "Aaron Judge" ~ 85 + 20 * exp(-((launch_angle - 15)^2) / 400) + rnorm(n(), 0, 5),
      player == "Mookie Betts" ~ 82 + 18 * exp(-((launch_angle - 12)^2) / 350) + rnorm(n(), 0, 5),
      player == "Shohei Ohtani" ~ 84 + 19 * exp(-((launch_angle - 14)^2) / 380) + rnorm(n(), 0, 5)
    )
  )

ggplot(batted_balls, aes(x = launch_angle, y = launch_speed)) +
  geom_point(alpha = 0.3, color = "#2b8cbe") +
  geom_smooth(method = "loess", color = "#d7301f", se = FALSE, linewidth = 1.2) +
  # Add approximate barrel zone (annotate draws it once in every panel)
  annotate("rect", xmin = 15, xmax = 35, ymin = 95, ymax = 115,
           fill = NA, color = "black", linetype = "dashed") +
  facet_wrap(~ player, ncol = 3) +
  labs(
    title = "Batted Ball Profiles: Elite Hitters Compared",
    subtitle = "Each panel shows launch angle vs exit velocity. Dashed box = approximate barrel zone.",
    x = "Launch Angle (degrees)",
    y = "Exit Velocity (mph)",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold", size = 12),
    strip.background = element_rect(fill = "gray90", color = NA)
  )

Facets also combine well with a color mapping to show a third variable. Here each panel is a player and line color encodes the strike count; to lay panels out by two variables instead, use facet_grid(strikes ~ player).

# Compare performance by count (balls-strikes) for different batters
set.seed(7)  # Make the simulated averages reproducible
count_performance <- expand.grid(
  player = c("Ohtani", "Judge", "Betts"),
  balls = 0:3,
  strikes = 0:2
) %>%
  mutate(
    count = paste0(balls, "-", strikes),
    avg = 0.25 + 0.08 * (balls / 3) - 0.05 * (strikes / 2) + rnorm(n(), 0, 0.03),
    avg = pmax(0.1, pmin(0.5, avg))  # Keep realistic bounds
  )

ggplot(count_performance, aes(x = balls, y = avg, color = factor(strikes))) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2.5) +
  facet_wrap(~ player, ncol = 3) +
  # Colorblind-safe colors instead of a red-green contrast
  scale_color_manual(
    values = c("0" = "#0072B2", "1" = "#E69F00", "2" = "#D55E00"),
    name = "Strikes"
  ) +
  labs(
    title = "Batting Average by Count",
    subtitle = "How does performance change with different ball-strike counts?",
    x = "Balls",
    y = "Batting Average",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold")
  )

4.2.8 Customizing Themes and Aesthetics {#themes}

ggplot2 offers extensive customization. Here's how to create a publication-ready, branded theme:

# Create a custom baseball theme
theme_baseball <- function(base_size = 12, base_family = "") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      # Text
      plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0),
      plot.subtitle = element_text(size = rel(0.9), color = "gray30",
                                    margin = margin(b = 10)),
      plot.caption = element_text(size = rel(0.7), color = "gray50",
                                  hjust = 1, margin = margin(t = 10)),

      # Axes
      axis.title = element_text(size = rel(1), face = "bold"),
      axis.text = element_text(size = rel(0.9)),

      # Legend
      legend.position = "right",
      legend.title = element_text(face = "bold", size = rel(1)),
      legend.text = element_text(size = rel(0.9)),

      # Panel
      panel.grid.minor = element_blank(),
      # linewidth replaces size for lines in ggplot2 >= 3.4.0
      panel.grid.major = element_line(color = "gray90", linewidth = 0.3),

      # Background
      plot.background = element_rect(fill = "white", color = NA),
      panel.background = element_rect(fill = "white", color = NA),

      # Margins
      plot.margin = margin(20, 20, 20, 20)
    )
}

# Use the custom theme
ggplot(ohtani_batted, aes(x = launch_angle, y = launch_speed, color = outcome)) +
  geom_point(alpha = 0.6, size = 2.5) +
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Extra Base Hit" = "#FF8C00",
               "Single" = "#FFD700", "Out" = "#1E90FF", "Other" = "#808080")
  ) +
  labs(
    title = "Elite Exit Velocity and Optimal Launch Angles",
    subtitle = "Shohei Ohtani's 2024 batted ball profile",
    x = "Launch Angle (degrees)",
    y = "Exit Velocity (mph)",
    color = "Outcome"
  ) +
  theme_baseball()
R
library(ggplot2)
library(dplyr)

# Create sample data
player_stats <- data.frame(
  player = c("Ohtani", "Judge", "Betts", "Acuna", "Trout"),
  hr = c(44, 62, 35, 41, 40),
  avg = c(.304, .311, .307, .337, .283)
)

# Build a plot layer by layer
ggplot(data = player_stats, aes(x = avg, y = hr)) +  # Data + aesthetics
  geom_point(size = 3) +                               # Geometry
  labs(title = "Home Runs vs Batting Average",         # Labels
       x = "Batting Average",
       y = "Home Runs") +
  theme_minimal()                                       # Theme
R
library(ggplot2)
library(dplyr)
library(baseballr)

# Get Statcast data for Shohei Ohtani (2024 season)
# Note: You'll need to install and load baseballr package
# This fetches data from Baseball Savant

ohtani_data <- statcast_search(
  start_date = "2024-04-01",
  end_date = "2024-09-30",
  playerid = 660271,  # Ohtani's MLBAM ID
  player_type = "batter"
)

# Filter for batted balls only
ohtani_batted <- ohtani_data %>%
  filter(!is.na(launch_speed), !is.na(launch_angle)) %>%
  mutate(
    outcome = case_when(
      events == "home_run" ~ "Home Run",
      events %in% c("double", "triple") ~ "Extra Base Hit",
      events == "single" ~ "Single",
      str_detect(events, "out") ~ "Out",
      TRUE ~ "Other"
    )
  )

# Create the visualization
ggplot(ohtani_batted, aes(x = launch_angle, y = launch_speed, color = outcome)) +
  geom_point(alpha = 0.6, size = 2.5) +
  # Add "barrel" zone reference box
  geom_rect(aes(xmin = 15, xmax = 35, ymin = 95, ymax = 115),
            fill = NA, color = "black", linetype = "dashed",
            inherit.aes = FALSE) +
  # Add vertical line at 0 degrees (line drives)
  geom_vline(xintercept = 0, linetype = "dotted", color = "gray50") +
  scale_color_manual(
    values = c("Home Run" = "#FF0000",
               "Extra Base Hit" = "#FF8C00",
               "Single" = "#FFD700",
               "Out" = "#1E90FF",
               "Other" = "#808080")
  ) +
  labs(
    title = "Shohei Ohtani's Batted Ball Profile - 2024 Season",
    subtitle = "Exit Velocity vs Launch Angle colored by outcome. Dashed box shows 'barrel' zone.",
    x = "Launch Angle (degrees)",
    y = "Exit Velocity (mph)",
    color = "Outcome",
    caption = "Data: MLB Statcast via baseballr"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 10, color = "gray30"),
    legend.position = "right"
  )
R
library(ggplot2)
library(dplyr)
library(Lahman)  # Historical baseball database

# Use Lahman database for team statistics
team_hr_2024 <- Teams %>%
  filter(yearID == 2023) %>%  # Use 2023 as most recent complete season
  select(name, HR) %>%
  arrange(desc(HR)) %>%
  top_n(10, HR) %>%
  mutate(name = reorder(name, HR))  # Reorder for plotting

ggplot(team_hr_2024, aes(x = name, y = HR, fill = HR)) +
  geom_col() +
  scale_fill_gradient(low = "#deebf7", high = "#08519c") +
  coord_flip() +  # Horizontal bars are easier to read with team names
  labs(
    title = "Top 10 MLB Teams by Home Runs - 2023 Season",
    x = NULL,
    y = "Home Runs",
    caption = "Data: Lahman Baseball Database"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none",  # Color gradient is self-explanatory
    panel.grid.major.y = element_blank()
  )
R
# Compare HR and K% for top teams
team_comparison <- Teams %>%
  filter(yearID == 2023) %>%
  top_n(8, HR) %>%
  mutate(
    K_pct = (SO / AB) * 100,
    team_abbr = substr(name, 1, 3)
  ) %>%
  select(team_abbr, HR, K_pct) %>%
  tidyr::pivot_longer(cols = c(HR, K_pct),
                      names_to = "stat",
                      values_to = "value")

ggplot(team_comparison, aes(x = reorder(team_abbr, value), y = value, fill = stat)) +
  geom_col(position = "dodge") +
  scale_fill_manual(
    values = c("HR" = "#d7191c", "K_pct" = "#2b83ba"),
    labels = c("Home Runs", "Strikeout %")
  ) +
  coord_flip() +
  labs(
    title = "Power vs Contact: HR Leaders and Their Strikeout Rates",
    subtitle = "Note: HR and K% are on different scales for visualization purposes",
    x = NULL,
    y = "Value",
    fill = "Statistic"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))
R
library(ggplot2)
library(baseballr)

# Get all four-seam fastballs from a date range
fastballs <- statcast_search(
  start_date = "2024-06-01",
  end_date = "2024-06-07",
  pitch_type = "FF"  # Four-seam fastball
)

# Clean the data
fastballs_clean <- fastballs %>%
  filter(!is.na(release_speed), release_speed > 70, release_speed < 105)

# Create histogram with density curve
ggplot(fastballs_clean, aes(x = release_speed)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 1,
                 fill = "#2b8cbe",
                 color = "white",
                 alpha = 0.7) +
  geom_density(color = "#08519c", size = 1.2) +
  # Add reference lines
  geom_vline(aes(xintercept = mean(release_speed)),
             color = "#d7301f",
             linetype = "dashed",
             size = 1) +
  geom_vline(aes(xintercept = median(release_speed)),
             color = "#fc8d59",
             linetype = "dashed",
             size = 1) +
  annotate("text", x = mean(fastballs_clean$release_speed) + 1, y = 0.08,
           label = paste0("Mean: ", round(mean(fastballs_clean$release_speed), 1), " mph"),
           color = "#d7301f", hjust = 0) +
  annotate("text", x = median(fastballs_clean$release_speed) + 1, y = 0.075,
           label = paste0("Median: ", round(median(fastballs_clean$release_speed), 1), " mph"),
           color = "#fc8d59", hjust = 0) +
  labs(
    title = "Distribution of Four-Seam Fastball Velocities",
    subtitle = "MLB pitches from June 1-7, 2024",
    x = "Velocity (mph)",
    y = "Density",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
R
# Compare fastball velocity by pitcher role
pitcher_fastballs <- fastballs_clean %>%
  left_join(
    # You'd typically get pitcher role from another source
    # This is illustrative
    data.frame(
      pitcher = unique(fastballs_clean$pitcher)[1:100],
      role = sample(c("Starter", "Reliever"), 100, replace = TRUE)
    ),
    by = "pitcher"
  ) %>%
  filter(!is.na(role))

ggplot(pitcher_fastballs, aes(x = release_speed, fill = role)) +
  geom_histogram(alpha = 0.6, position = "identity", binwidth = 1) +
  scale_fill_manual(values = c("Starter" = "#2c7bb6", "Reliever" = "#d7191c")) +
  labs(
    title = "Fastball Velocity: Starters vs Relievers",
    x = "Velocity (mph)",
    y = "Count",
    fill = "Role"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))
R
library(ggplot2)
library(dplyr)
library(zoo)  # For rolling averages

# Simulate game-by-game batting average data
# In practice, you'd get this from a database or API
set.seed(123)
games <- 150
ohtani_games <- data.frame(
  game_num = 1:games,
  date = seq(as.Date("2024-04-01"), by = "day", length.out = games),
  hits = rbinom(games, 4, 0.31),  # ~.310 avg, ~4 AB per game
  ab = sample(3:5, games, replace = TRUE)
) %>%
  mutate(
    cumulative_hits = cumsum(hits),
    cumulative_ab = cumsum(ab),
    avg = cumulative_hits / cumulative_ab,
    # 10-game rolling average
    rolling_avg = rollmean(hits / ab, k = 10, fill = NA, align = "right")
  )

ggplot(ohtani_games, aes(x = game_num)) +
  # Cumulative average (overall season average)
  geom_line(aes(y = avg), color = "#2b8cbe", size = 1.2, alpha = 0.6) +
  # 10-game rolling average (recent form)
  geom_line(aes(y = rolling_avg), color = "#d7301f", size = 1.2) +
  # League average reference line
  geom_hline(yintercept = 0.250, linetype = "dashed", color = "gray50") +
  annotate("text", x = 130, y = 0.255,
           label = "League Avg (.250)", color = "gray30", size = 3) +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.001)) +
  labs(
    title = "Shohei Ohtani's 2024 Batting Average Progression",
    subtitle = "Blue: Season average | Red: 10-game rolling average | Dashed: League average",
    x = "Game Number",
    y = "Batting Average",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )
R
# Compare multiple players' OPS over the season
players_ops <- data.frame(
  game_num = rep(1:games, 3),
  player = rep(c("Ohtani", "Judge", "Betts"), each = games)
) %>%
  group_by(player) %>%
  mutate(
    ops = cumsum(rnorm(games, mean = 0.9, sd = 0.1)) / game_num + rnorm(games, 0, 0.02)
  )

ggplot(players_ops, aes(x = game_num, y = ops, color = player)) +
  geom_line(linewidth = 1.2, alpha = 0.8) +
  scale_color_manual(
    values = c("Ohtani" = "#BA0021", "Judge" = "#003087", "Betts" = "#005A9C")
  ) +
  labs(
    title = "2024 OPS Race: Ohtani vs Judge vs Betts",
    x = "Game Number",
    y = "OPS (On-base Plus Slugging)",
    color = "Player"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )
R
# Compare batted ball profiles for multiple players
players <- c(660271, 592450, 605141)  # Ohtani, Judge, Betts MLBAM IDs
player_names <- c("Shohei Ohtani", "Aaron Judge", "Mookie Betts")

# Fetch data for all players (this would be done in practice)
# For illustration, we'll create synthetic data
batted_balls <- expand.grid(
  player = player_names,
  launch_angle = seq(-50, 60, length.out = 200)
) %>%
  mutate(
    launch_speed = case_when(
      player == "Aaron Judge" ~ 85 + 20 * exp(-((launch_angle - 15)^2) / 400) + rnorm(n(), 0, 5),
      player == "Mookie Betts" ~ 82 + 18 * exp(-((launch_angle - 12)^2) / 350) + rnorm(n(), 0, 5),
      player == "Shohei Ohtani" ~ 84 + 19 * exp(-((launch_angle - 14)^2) / 380) + rnorm(n(), 0, 5)
    )
  )

ggplot(batted_balls, aes(x = launch_angle, y = launch_speed)) +
  geom_point(alpha = 0.3, color = "#2b8cbe") +
  geom_smooth(method = "loess", color = "#d7301f", se = FALSE, linewidth = 1.2) +
  # Add barrel zone (annotate draws the box once per panel rather than once per row)
  annotate("rect", xmin = 15, xmax = 35, ymin = 95, ymax = 115,
           fill = NA, color = "black", linetype = "dashed") +
  facet_wrap(~ player, ncol = 3) +
  labs(
    title = "Batted Ball Profiles: Elite Hitters Compared",
    subtitle = "Each panel shows launch angle vs exit velocity. Dashed box = barrel zone.",
    x = "Launch Angle (degrees)",
    y = "Exit Velocity (mph)",
    caption = "Data: Simulated for illustration"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold", size = 12),
    strip.background = element_rect(fill = "gray90", color = NA)
  )
R
# Compare performance by count (balls-strikes) for different batters
count_performance <- expand.grid(
  player = c("Ohtani", "Judge", "Betts"),
  balls = 0:3,
  strikes = 0:2
) %>%
  mutate(
    count = paste0(balls, "-", strikes),
    avg = 0.25 + 0.08 * (balls / 3) - 0.05 * (strikes / 2) + rnorm(n(), 0, 0.03),
    avg = pmax(0.1, pmin(0.5, avg))  # Keep realistic bounds
  )

ggplot(count_performance, aes(x = balls, y = avg, color = factor(strikes))) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2.5) +
  facet_wrap(~ player, ncol = 3) +
  scale_color_manual(
    values = c("0" = "#31a354", "1" = "#fdae6b", "2" = "#e34a33"),
    name = "Strikes"
  ) +
  labs(
    title = "Batting Average by Count",
    subtitle = "How does performance change with different ball-strike counts?",
    x = "Balls",
    y = "Batting Average"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold")
  )
R
# Create a custom baseball theme
theme_baseball <- function(base_size = 12, base_family = "") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      # Text
      plot.title = element_text(face = "bold", size = rel(1.3), hjust = 0),
      plot.subtitle = element_text(size = rel(0.9), color = "gray30",
                                    margin = margin(b = 10)),
      plot.caption = element_text(size = rel(0.7), color = "gray50",
                                  hjust = 1, margin = margin(t = 10)),

      # Axes
      axis.title = element_text(size = rel(1), face = "bold"),
      axis.text = element_text(size = rel(0.9)),

      # Legend
      legend.position = "right",
      legend.title = element_text(face = "bold", size = rel(1)),
      legend.text = element_text(size = rel(0.9)),

      # Panel
      panel.grid.minor = element_blank(),
      # linewidth replaces size in element_line() as of ggplot2 3.4.0
      panel.grid.major = element_line(color = "gray90", linewidth = 0.3),

      # Background
      plot.background = element_rect(fill = "white", color = NA),
      panel.background = element_rect(fill = "white", color = NA),

      # Margins
      plot.margin = margin(20, 20, 20, 20)
    )
}

# Use the custom theme
ggplot(ohtani_batted, aes(x = launch_angle, y = launch_speed, color = outcome)) +
  geom_point(alpha = 0.6, size = 2.5) +
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Extra Base Hit" = "#FF8C00",
               "Single" = "#FFD700", "Out" = "#1E90FF", "Other" = "#808080")
  ) +
  labs(
    title = "Elite Exit Velocity and Optimal Launch Angles",
    subtitle = "Shohei Ohtani's 2024 batted ball profile",
    x = "Launch Angle (degrees)",
    y = "Exit Velocity (mph)",
    color = "Outcome"
  ) +
  theme_baseball()

4.3 Matplotlib and Seaborn for Baseball (Python)

Python's visualization ecosystem centers on Matplotlib (low-level, fine control) and Seaborn (high-level, statistical focus). Together they cover most of what ggplot2 offers, though you build plots by calling methods on figure and axes objects rather than by adding layers.

4.3.1 Matplotlib Basics {#matplotlib-basics}

Matplotlib gives you complete control but requires more code. The key concept is the figure/axes hierarchy:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create figure and axes
fig, ax = plt.subplots(figsize=(10, 6))

# Sample data
player_stats = pd.DataFrame({
    'player': ['Ohtani', 'Judge', 'Betts', 'Acuna', 'Trout'],
    'hr': [44, 62, 35, 41, 40],
    'avg': [.304, .311, .307, .337, .283]
})

# Create scatter plot
ax.scatter(player_stats['avg'], player_stats['hr'],
           s=100, alpha=0.6, color='#2b8cbe')

# Customize
ax.set_xlabel('Batting Average', fontsize=12, fontweight='bold')
ax.set_ylabel('Home Runs', fontsize=12, fontweight='bold')
ax.set_title('Home Runs vs Batting Average',
             fontsize=14, fontweight='bold', pad=20)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

The fig, ax = plt.subplots() pattern creates a figure (the overall canvas) and axes (the plot area). You then call methods on ax to add elements.
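
To make the hierarchy concrete, here is a minimal sketch of one figure holding two axes. It reuses a few of the sample values from the scatter plot above; the variable names (ax_hr, ax_scatter) are illustrative choices, not part of the Matplotlib API.

```python
import matplotlib.pyplot as plt

# A few of the sample values used above (illustration only)
players = ["Ohtani", "Judge", "Betts", "Acuna"]
home_runs = [44, 62, 35, 41]
batting_avg = [0.304, 0.311, 0.307, 0.337]

# One figure (the canvas) holding two axes (the plot areas) side by side
fig, (ax_hr, ax_scatter) = plt.subplots(1, 2, figsize=(12, 5))

# Each axes object is configured independently through its own methods
ax_hr.bar(players, home_runs, color="#2b8cbe")
ax_hr.set_title("Home Runs")

ax_scatter.scatter(batting_avg, home_runs, s=100, color="#d7301f")
ax_scatter.set_xlabel("Batting Average")
ax_scatter.set_title("HR vs AVG")

# Figure-level methods act on the whole canvas
fig.suptitle("One Figure, Two Axes", fontweight="bold")
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Anything that concerns a single panel (titles, labels, data) goes through its axes object, while layout and the overall title go through fig.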

4.3.2 Seaborn Statistical Plots {#seaborn-stats}

Seaborn builds on Matplotlib with a higher-level interface focused on statistical graphics. It works directly with pandas DataFrames and requires less code for complex visualizations.

import seaborn as sns
import matplotlib.pyplot as plt

# Set style
sns.set_style("whitegrid")
sns.set_palette("husl")

# The same scatter plot in Seaborn
plt.figure(figsize=(10, 6))
sns.scatterplot(data=player_stats, x='avg', y='hr', s=150, alpha=0.7)
plt.xlabel('Batting Average', fontweight='bold')
plt.ylabel('Home Runs', fontweight='bold')
plt.title('Home Runs vs Batting Average', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

Seaborn shines with statistical visualizations like regression plots, distribution plots, and categorical plots.
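
As one example of a statistical plot, the sketch below fits a regression line with sns.regplot. The data are simulated, and the spin_rate column name and the velocity/spin relationship are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Simulated pitch data for illustration only (not real Statcast values)
rng = np.random.default_rng(42)
velocity = rng.normal(94, 2, 200)
spin_rate = 2300 + 40 * (velocity - 94) + rng.normal(0, 120, 200)
pitches = pd.DataFrame({"release_speed": velocity, "spin_rate": spin_rate})

fig, ax = plt.subplots(figsize=(8, 5))

# regplot draws the scatter and fits a linear regression line in one call;
# ci=None skips the bootstrapped confidence band to keep the sketch fast
sns.regplot(data=pitches, x="release_speed", y="spin_rate", ci=None,
            scatter_kws={"alpha": 0.4}, line_kws={"color": "#d7301f"}, ax=ax)

ax.set_xlabel("Velocity (mph)")
ax.set_ylabel("Spin Rate (rpm)")
ax.set_title("Simulated Fastball Velocity vs Spin Rate")
fig.tight_layout()
```

Doing the same in plain Matplotlib would need a separate fit (for example with np.polyfit) plus a second plotting call for the line.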

4.3.3 Creating Baseball Visualizations in Python {#python-viz-examples}

Now let's recreate the R examples in Python.

Scatter Plot: Exit Velocity vs Launch Angle

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from pybaseball import statcast_batter

# Get Statcast data for Shohei Ohtani as a batter
# Statcast identifies players by MLBAM ID (660271 for Ohtani)
ohtani_data = statcast_batter(
    start_dt='2024-04-01',
    end_dt='2024-09-30',
    player_id=660271
)

# Keep only batted balls with tracked exit velocity and launch angle
# (In a league-wide statcast() pull, player_name generally refers to the
# pitcher, so filter on the batter ID column rather than on the name)
ohtani_batted = ohtani_data[
    (ohtani_data['launch_speed'].notna()) &
    (ohtani_data['launch_angle'].notna())
].copy()

# Categorize outcomes
def categorize_outcome(row):
    event = row['events']
    if pd.isna(event):
        return 'Other'
    elif event == 'home_run':
        return 'Home Run'
    elif event in ['double', 'triple']:
        return 'Extra Base Hit'
    elif event == 'single':
        return 'Single'
    elif 'out' in event or 'double_play' in event:  # also catches GIDP
        return 'Out'
    else:
        return 'Other'

ohtani_batted['outcome'] = ohtani_batted.apply(categorize_outcome, axis=1)

# Create the plot
fig, ax = plt.subplots(figsize=(12, 8))

# Define colors
colors = {
    'Home Run': '#FF0000',
    'Extra Base Hit': '#FF8C00',
    'Single': '#FFD700',
    'Out': '#1E90FF',
    'Other': '#808080'
}

# Plot each outcome type
for outcome, color in colors.items():
    data = ohtani_batted[ohtani_batted['outcome'] == outcome]
    ax.scatter(data['launch_angle'], data['launch_speed'],
               c=color, label=outcome, alpha=0.6, s=50, edgecolors='none')

# Add barrel zone rectangle
from matplotlib.patches import Rectangle
barrel_zone = Rectangle((15, 95), 20, 20,
                         fill=False, edgecolor='black',
                         linestyle='--', linewidth=2)
ax.add_patch(barrel_zone)

# Add vertical reference line at 0 degrees
ax.axvline(x=0, color='gray', linestyle=':', alpha=0.5)

# Customize plot
ax.set_xlabel('Launch Angle (degrees)', fontsize=12, fontweight='bold')
ax.set_ylabel('Exit Velocity (mph)', fontsize=12, fontweight='bold')
ax.set_title("Shohei Ohtani's Batted Ball Profile - 2024 Season",
             fontsize=14, fontweight='bold', pad=20)
ax.text(17, 92, 'Barrel Zone', fontsize=9, style='italic')
ax.legend(loc='upper right', frameon=True, framealpha=0.9)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.figtext(0.99, 0.01, 'Data: MLB Statcast via pybaseball',
            ha='right', fontsize=8, style='italic', color='gray')
plt.tight_layout()
plt.show()

Using Seaborn for the same plot (simpler syntax):

# Seaborn version
plt.figure(figsize=(12, 8))
sns.scatterplot(
    data=ohtani_batted,
    x='launch_angle',
    y='launch_speed',
    hue='outcome',
    palette=colors,
    alpha=0.6,
    s=80,
    edgecolor='none'
)

# Add barrel zone
from matplotlib.patches import Rectangle
ax = plt.gca()
barrel_zone = Rectangle((15, 95), 20, 20,
                         fill=False, edgecolor='black',
                         linestyle='--', linewidth=2)
ax.add_patch(barrel_zone)

plt.axvline(x=0, color='gray', linestyle=':', alpha=0.5)
plt.xlabel('Launch Angle (degrees)', fontweight='bold')
plt.ylabel('Exit Velocity (mph)', fontweight='bold')
plt.title("Shohei Ohtani's Batted Ball Profile - 2024 Season",
          fontweight='bold', fontsize=14, pad=20)
plt.legend(title='Outcome', loc='upper right', frameon=True)
plt.tight_layout()
plt.show()
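
The dashed rectangle in the plots above spans launch angles of 15-35 degrees and exit velocities of 95-115 mph. The same bounds can be used to count how many batted balls land inside it. The sketch below assumes a DataFrame with the launch_angle and launch_speed columns used above; the helper name in_barrel_box and the sample rows are hypothetical. Note that this box is a simplification of Statcast's barrel classification, which widens the qualifying launch-angle range as exit velocity rises.

```python
import pandas as pd

def in_barrel_box(df, la_col="launch_angle", ev_col="launch_speed"):
    """Return a boolean Series: True where a batted ball falls inside
    the simplified 15-35 degree, 95-115 mph box drawn on the plots."""
    return df[la_col].between(15, 35) & df[ev_col].between(95, 115)

# Hypothetical batted balls, for illustration only
sample = pd.DataFrame({
    "launch_angle": [10, 20, 30, 40, 25],
    "launch_speed": [100, 98, 110, 105, 90],
})

sample["in_box"] = in_barrel_box(sample)
box_rate = sample["in_box"].mean()  # share of batted balls inside the box

print(sample)
print(f"Share inside box: {box_rate:.1%}")
```

Applied to real data, the resulting boolean column could also drive marker color or size, so the points inside the box stand out without relying on the rectangle alone.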

Bar Chart: Team Home Run Leaders

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np  # needed for np.linspace below

# Sample team HR data (illustrative values; check official 2023 totals before reuse)
team_hr = pd.DataFrame({
    'team': ['Braves', 'Dodgers', 'Astros', 'Rangers', 'Blue Jays',
             'Yankees', 'Phillies', 'Rays', 'Orioles', 'Diamondbacks'],
    'hr': [307, 280, 239, 234, 231, 224, 223, 215, 214, 212]
})

# Sort for plotting
team_hr_sorted = team_hr.sort_values('hr')

# Create horizontal bar chart
fig, ax = plt.subplots(figsize=(10, 6))

# Create bars with gradient color
colors_grad = plt.cm.Blues(np.linspace(0.4, 0.9, len(team_hr_sorted)))
bars = ax.barh(team_hr_sorted['team'], team_hr_sorted['hr'], color=colors_grad)

# Add value labels
for i, (idx, row) in enumerate(team_hr_sorted.iterrows()):
    ax.text(row['hr'] + 3, i, str(row['hr']),
            va='center', fontweight='bold', fontsize=10)

# Customize
ax.set_xlabel('Home Runs', fontsize=12, fontweight='bold')
ax.set_title('Top 10 MLB Teams by Home Runs - 2023 Season',
             fontsize=14, fontweight='bold', pad=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(left=False)
ax.set_axisbelow(True)
ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()
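
The value labels above are placed with a manual loop. As an alternative sketch, Matplotlib 3.4 and later provide Axes.bar_label, which reads the values from the bar container directly. The team names and totals below are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical teams and totals, for illustration only
teams = ["Team C", "Team B", "Team A"]
home_runs = [212, 239, 307]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(teams, home_runs, color="#2b8cbe")

# bar_label takes each bar's value from the container and places the text
# at the end of the bar; padding is the gap in points
labels = ax.bar_label(bars, padding=3, fontweight="bold")

ax.set_xlabel("Home Runs")
fig.tight_layout()
```

Because the labels are tied to the container, they stay correct if the data are re-sorted, which is easy to get wrong when computing text positions by hand.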

Using Seaborn:

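plt.figure(figsize=(10, 6))
# Seaborn draws the first row at the top, so sort descending to put the leader first
sns.barplot(
    data=team_hr.sort_values('hr', ascending=False),
    y='team',
    x='hr',
    palette='Blues_r',  # seaborn >= 0.13 warns unless hue is also set (hue='team', legend=False)
    orient='h'
)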
plt.xlabel('Home Runs', fontweight='bold')
plt.title('Top 10 MLB Teams by Home Runs - 2023 Season',
          fontweight='bold', fontsize=14, pad=20)
plt.tight_layout()
plt.show()

Histogram: Distribution of Pitch Velocities

import matplotlib.pyplot as plt
import seaborn as sns
from pybaseball import statcast
import numpy as np

# Get fastball data
fastballs = statcast(start_dt='2024-06-01', end_dt='2024-06-07')
fastballs_ff = fastballs[
    (fastballs['pitch_type'] == 'FF') &
    (fastballs['release_speed'].notna()) &
    (fastballs['release_speed'] > 70) &
    (fastballs['release_speed'] < 105)
]

# Create histogram with KDE
fig, ax = plt.subplots(figsize=(12, 7))

# Histogram
ax.hist(fastballs_ff['release_speed'], bins=30,
        density=True, alpha=0.7, color='#2b8cbe', edgecolor='white')

# KDE (kernel density estimate)
from scipy import stats
density = stats.gaussian_kde(fastballs_ff['release_speed'])
xs = np.linspace(fastballs_ff['release_speed'].min(),
                 fastballs_ff['release_speed'].max(), 200)
ax.plot(xs, density(xs), color='#08519c', linewidth=2.5, label='Density')

# Add mean and median lines
mean_vel = fastballs_ff['release_speed'].mean()
median_vel = fastballs_ff['release_speed'].median()

ax.axvline(mean_vel, color='#d7301f', linestyle='--', linewidth=2,
           label=f'Mean: {mean_vel:.1f} mph')
ax.axvline(median_vel, color='#fc8d59', linestyle='--', linewidth=2,
           label=f'Median: {median_vel:.1f} mph')

# Customize
ax.set_xlabel('Velocity (mph)', fontsize=12, fontweight='bold')
ax.set_ylabel('Density', fontsize=12, fontweight='bold')
ax.set_title('Distribution of Four-Seam Fastball Velocities',
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper left', frameon=True)
ax.grid(True, alpha=0.3, axis='y')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.figtext(0.99, 0.01, 'Data: MLB Statcast | June 1-7, 2024',
            ha='right', fontsize=8, style='italic', color='gray')
plt.tight_layout()
plt.show()

Using Seaborn (much simpler):

plt.figure(figsize=(12, 7))
sns.histplot(data=fastballs_ff, x='release_speed',
             bins=30, kde=True, color='#2b8cbe')

# Add reference lines
mean_vel = fastballs_ff['release_speed'].mean()
median_vel = fastballs_ff['release_speed'].median()
plt.axvline(mean_vel, color='#d7301f', linestyle='--', linewidth=2,
            label=f'Mean: {mean_vel:.1f} mph')
plt.axvline(median_vel, color='#fc8d59', linestyle='--', linewidth=2,
            label=f'Median: {median_vel:.1f} mph')

plt.xlabel('Velocity (mph)', fontweight='bold')
plt.ylabel('Count', fontweight='bold')
plt.title('Distribution of Four-Seam Fastball Velocities',
          fontweight='bold', fontsize=14, pad=20)
plt.legend()
plt.tight_layout()
plt.show()
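
The Matplotlib version above plots density on the y-axis while the Seaborn version plots counts. The sketch below, using simulated velocities, shows how the two relate: density is the count divided by the total number of observations and the bin width, so the bars' total area equals 1.

```python
import numpy as np

# Simulated fastball velocities for illustration only
rng = np.random.default_rng(0)
velocities = rng.normal(94, 2.5, 1000)

# Same bins, two normalizations
counts, edges = np.histogram(velocities, bins=30)
density, _ = np.histogram(velocities, bins=30, density=True)

bin_widths = np.diff(edges)

# density = count / (total observations * bin width)
manual_density = counts / (counts.sum() * bin_widths)

# Area under the density histogram (sum of height * width)
total_area = float((density * bin_widths).sum())

print("Pitches counted:", counts.sum())
print("Area under density histogram:", round(total_area, 6))
```

Density is the better choice when overlaying a KDE curve or comparing groups of different sizes; counts are easier to read when sample size itself matters.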

Line Chart: Season Performance Over Time

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Simulate game-by-game data
np.random.seed(123)
games = 150
at_bats = np.random.choice([3, 4, 5], games)
ohtani_games = pd.DataFrame({
    'game_num': range(1, games + 1),
    'ab': at_bats,
    # Draw hits from each game's own AB so hits never exceed at-bats
    'hits': np.random.binomial(at_bats, 0.31)
})

# Calculate cumulative and rolling averages
ohtani_games['cumulative_hits'] = ohtani_games['hits'].cumsum()
ohtani_games['cumulative_ab'] = ohtani_games['ab'].cumsum()
ohtani_games['avg'] = ohtani_games['cumulative_hits'] / ohtani_games['cumulative_ab']
# Rolling average = total hits / total AB over the last 10 games
rolling = ohtani_games[['hits', 'ab']].rolling(window=10, min_periods=1).sum()
ohtani_games['rolling_avg'] = rolling['hits'] / rolling['ab']

# Create line chart
fig, ax = plt.subplots(figsize=(14, 7))

# Plot lines
ax.plot(ohtani_games['game_num'], ohtani_games['avg'],
        color='#2b8cbe', linewidth=2, alpha=0.6, label='Season Average')
ax.plot(ohtani_games['game_num'], ohtani_games['rolling_avg'],
        color='#d7301f', linewidth=2, label='10-Game Rolling Average')

# League average reference
ax.axhline(y=0.250, color='gray', linestyle='--', linewidth=1.5, alpha=0.7)
ax.text(130, 0.255, 'League Avg (.250)', color='gray', fontsize=10)

# Customize
ax.set_xlabel('Game Number', fontsize=12, fontweight='bold')
ax.set_ylabel('Batting Average', fontsize=12, fontweight='bold')
ax.set_title("Shohei Ohtani's 2024 Batting Average Progression",
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='lower right', frameon=True, fontsize=10)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylim([0.15, 0.45])

plt.figtext(0.99, 0.01, 'Data: Simulated for illustration',
            ha='right', fontsize=8, style='italic', color='gray')
plt.tight_layout()
plt.show()
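
A rolling batting average can be computed two ways: as the mean of per-game averages, or as total hits divided by total at-bats within the window. The two differ whenever at-bats vary from game to game, and only the second matches how batting average is defined. A small sketch with hypothetical game lines and a 2-game window:

```python
import pandas as pd

# Hypothetical game lines, for illustration only
games = pd.DataFrame({
    "hits": [0, 2, 1, 3],
    "ab":   [4, 4, 3, 5],
})
window = 2

# Mean of per-game averages: every game counts equally, regardless of AB
mean_of_ratios = (games["hits"] / games["ab"]).rolling(window).mean()

# Pooled: total hits / total AB in the window (the usual definition of AVG)
pooled = games["hits"].rolling(window).sum() / games["ab"].rolling(window).sum()

comparison = pd.DataFrame({"mean_of_ratios": mean_of_ratios, "pooled": pooled})
print(comparison.round(3))
```

In the last window the pooled value is 4 hits in 8 at-bats (.500), while the mean of the two per-game averages (.333 and .600) is about .467, because the 3-AB game is weighted as heavily as the 5-AB game.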

Faceted Plots: Comparing Multiple Players

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Create synthetic batted ball data for multiple players
np.random.seed(42)
players = ['Shohei Ohtani', 'Aaron Judge', 'Mookie Betts']
n_points = 200

data_list = []
for player in players:
    launch_angles = np.linspace(-50, 60, n_points)
    if player == 'Aaron Judge':
        exit_vels = 85 + 20 * np.exp(-((launch_angles - 15)**2) / 400) + np.random.normal(0, 5, n_points)
    elif player == 'Mookie Betts':
        exit_vels = 82 + 18 * np.exp(-((launch_angles - 12)**2) / 350) + np.random.normal(0, 5, n_points)
    else:  # Ohtani
        exit_vels = 84 + 19 * np.exp(-((launch_angles - 14)**2) / 380) + np.random.normal(0, 5, n_points)

    data_list.append(pd.DataFrame({
        'player': player,
        'launch_angle': launch_angles,
        'launch_speed': exit_vels
    }))

batted_balls = pd.concat(data_list, ignore_index=True)

# Create faceted plot using Seaborn
g = sns.FacetGrid(batted_balls, col='player', col_wrap=3, height=5, aspect=1.2)
g.map(sns.scatterplot, 'launch_angle', 'launch_speed', alpha=0.3, color='#2b8cbe')

# Add smooth line to each facet (lowess=True requires the statsmodels package)
g.map(sns.regplot, 'launch_angle', 'launch_speed',
      scatter=False, lowess=True, color='#d7301f', line_kws={'linewidth': 2})

# Add barrel zone to each facet
from matplotlib.patches import Rectangle
for ax in g.axes.flat:
    barrel = Rectangle((15, 95), 20, 20, fill=False,
                       edgecolor='black', linestyle='--', linewidth=1.5)
    ax.add_patch(barrel)

# Customize
g.set_axis_labels('Launch Angle (degrees)', 'Exit Velocity (mph)', fontweight='bold')
g.set_titles('{col_name}', fontweight='bold', fontsize=12)
g.fig.suptitle('Batted Ball Profiles: Elite Hitters Compared',
               fontsize=14, fontweight='bold', y=1.02)
g.fig.text(0.5, -0.02, 'Dashed box = approximate barrel zone (95-115 mph, 15-35°)',
           ha='center', fontsize=9, style='italic')

plt.tight_layout()
plt.show()

Alternative using Matplotlib subplots:

from matplotlib.patches import Rectangle
from scipy.signal import savgol_filter

fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)

for idx, (player, ax) in enumerate(zip(players, axes)):
    player_data = batted_balls[batted_balls['player'] == player]

    # Scatter plot
    ax.scatter(player_data['launch_angle'], player_data['launch_speed'],
               alpha=0.3, color='#2b8cbe', s=30)

    # Smooth line (Savitzky-Golay filter: 51-point window, cubic polynomial)
    sorted_data = player_data.sort_values('launch_angle')
    smoothed = savgol_filter(sorted_data['launch_speed'], 51, 3)
    ax.plot(sorted_data['launch_angle'], smoothed,
            color='#d7301f', linewidth=2)

    # Barrel zone
    barrel = Rectangle((15, 95), 20, 20, fill=False,
                       edgecolor='black', linestyle='--', linewidth=1.5)
    ax.add_patch(barrel)

    # Customize
    ax.set_xlabel('Launch Angle (degrees)', fontweight='bold')
    if idx == 0:
        ax.set_ylabel('Exit Velocity (mph)', fontweight='bold')
    ax.set_title(player, fontweight='bold', fontsize=12)
    ax.grid(True, alpha=0.3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

fig.suptitle('Batted Ball Profiles: Elite Hitters Compared',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create figure and axes
fig, ax = plt.subplots(figsize=(10, 6))

# Sample data
player_stats = pd.DataFrame({
    'player': ['Ohtani', 'Judge', 'Betts', 'Acuna', 'Trout'],
    'hr': [44, 62, 35, 41, 40],
    'avg': [.304, .311, .307, .337, .283]
})

# Create scatter plot
ax.scatter(player_stats['avg'], player_stats['hr'],
           s=100, alpha=0.6, color='#2b8cbe')

# Customize
ax.set_xlabel('Batting Average', fontsize=12, fontweight='bold')
ax.set_ylabel('Home Runs', fontsize=12, fontweight='bold')
ax.set_title('Home Runs vs Batting Average',
             fontsize=14, fontweight='bold', pad=20)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()
Python
import seaborn as sns
import matplotlib.pyplot as plt

# Set style
sns.set_style("whitegrid")
sns.set_palette("husl")

# The same scatter plot in Seaborn
plt.figure(figsize=(10, 6))
sns.scatterplot(data=player_stats, x='avg', y='hr', s=150, alpha=0.7)
plt.xlabel('Batting Average', fontweight='bold')
plt.ylabel('Home Runs', fontweight='bold')
plt.title('Home Runs vs Batting Average', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from pybaseball import statcast
from datetime import datetime

# Get Statcast data for Shohei Ohtani
# Note: pybaseball uses different ID system than R's baseballr
ohtani_data = statcast(
    start_dt='2024-04-01',
    end_dt='2024-09-30'
)

# Filter for Ohtani's batted balls
# You'd need to filter by player_name or use playerid_lookup
ohtani_batted = ohtani_data[
    (ohtani_data['player_name'] == 'Ohtani, Shohei') &
    (ohtani_data['launch_speed'].notna()) &
    (ohtani_data['launch_angle'].notna())
].copy()

# Categorize outcomes
def categorize_outcome(row):
    event = row['events']
    if pd.isna(event):
        return 'Other'
    elif event == 'home_run':
        return 'Home Run'
    elif event in ['double', 'triple']:
        return 'Extra Base Hit'
    elif event == 'single':
        return 'Single'
    elif 'out' in event:
        return 'Out'
    else:
        return 'Other'

ohtani_batted['outcome'] = ohtani_batted.apply(categorize_outcome, axis=1)

# Create the plot
fig, ax = plt.subplots(figsize=(12, 8))

# Define colors
colors = {
    'Home Run': '#FF0000',
    'Extra Base Hit': '#FF8C00',
    'Single': '#FFD700',
    'Out': '#1E90FF',
    'Other': '#808080'
}

# Plot each outcome type
for outcome, color in colors.items():
    data = ohtani_batted[ohtani_batted['outcome'] == outcome]
    ax.scatter(data['launch_angle'], data['launch_speed'],
               c=color, label=outcome, alpha=0.6, s=50, edgecolors='none')

# Add barrel zone rectangle
from matplotlib.patches import Rectangle
barrel_zone = Rectangle((15, 95), 20, 20,
                         fill=False, edgecolor='black',
                         linestyle='--', linewidth=2)
ax.add_patch(barrel_zone)

# Add vertical reference line at 0 degrees
ax.axvline(x=0, color='gray', linestyle=':', alpha=0.5)

# Customize plot
ax.set_xlabel('Launch Angle (degrees)', fontsize=12, fontweight='bold')
ax.set_ylabel('Exit Velocity (mph)', fontsize=12, fontweight='bold')
ax.set_title("Shohei Ohtani's Batted Ball Profile - 2024 Season",
             fontsize=14, fontweight='bold', pad=20)
ax.text(17, 92, 'Barrel Zone', fontsize=9, style='italic')
ax.legend(loc='upper right', frameon=True, framealpha=0.9)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.figtext(0.99, 0.01, 'Data: MLB Statcast via pybaseball',
            ha='right', fontsize=8, style='italic', color='gray')
plt.tight_layout()
plt.show()
Python
# Seaborn version
plt.figure(figsize=(12, 8))
sns.scatterplot(
    data=ohtani_batted,
    x='launch_angle',
    y='launch_speed',
    hue='outcome',
    palette=colors,
    alpha=0.6,
    s=80,
    edgecolor='none'
)

# Add barrel zone
from matplotlib.patches import Rectangle
ax = plt.gca()
barrel_zone = Rectangle((15, 95), 20, 20,
                         fill=False, edgecolor='black',
                         linestyle='--', linewidth=2)
ax.add_patch(barrel_zone)

plt.axvline(x=0, color='gray', linestyle=':', alpha=0.5)
plt.xlabel('Launch Angle (degrees)', fontweight='bold')
plt.ylabel('Exit Velocity (mph)', fontweight='bold')
plt.title("Shohei Ohtani's Batted Ball Profile - 2024 Season",
          fontweight='bold', fontsize=14, pad=20)
plt.legend(title='Outcome', loc='upper right', frameon=True)
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample team HR data (2023 season)
team_hr = pd.DataFrame({
    'team': ['Braves', 'Dodgers', 'Astros', 'Rangers', 'Blue Jays',
             'Yankees', 'Phillies', 'Rays', 'Orioles', 'Diamondbacks'],
    'hr': [307, 280, 239, 234, 231, 224, 223, 215, 214, 212]
})

# Sort for plotting
team_hr_sorted = team_hr.sort_values('hr')

# Create horizontal bar chart
fig, ax = plt.subplots(figsize=(10, 6))

# Create bars with gradient color
colors_grad = plt.cm.Blues(np.linspace(0.4, 0.9, len(team_hr_sorted)))
bars = ax.barh(team_hr_sorted['team'], team_hr_sorted['hr'], color=colors_grad)

# Add value labels
for i, (idx, row) in enumerate(team_hr_sorted.iterrows()):
    ax.text(row['hr'] + 3, i, str(row['hr']),
            va='center', fontweight='bold', fontsize=10)

# Customize
ax.set_xlabel('Home Runs', fontsize=12, fontweight='bold')
ax.set_title('Top 10 MLB Teams by Home Runs - 2023 Season',
             fontsize=14, fontweight='bold', pad=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(left=False)
ax.set_axisbelow(True)
ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()
Python
plt.figure(figsize=(10, 6))
sns.barplot(
    data=team_hr_sorted,
    y='team',
    x='hr',
    palette='Blues_r',
    orient='h'
)
plt.xlabel('Home Runs', fontweight='bold')
plt.title('Top 10 MLB Teams by Home Runs - 2023 Season',
          fontweight='bold', fontsize=14, pad=20)
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import seaborn as sns
from pybaseball import statcast
import numpy as np

# Get fastball data
fastballs = statcast(start_dt='2024-06-01', end_dt='2024-06-07')
fastballs_ff = fastballs[
    (fastballs['pitch_type'] == 'FF') &
    (fastballs['release_speed'].notna()) &
    (fastballs['release_speed'] > 70) &
    (fastballs['release_speed'] < 105)
]

# Create histogram with KDE
fig, ax = plt.subplots(figsize=(12, 7))

# Histogram
ax.hist(fastballs_ff['release_speed'], bins=30,
        density=True, alpha=0.7, color='#2b8cbe', edgecolor='white')

# KDE (kernel density estimate)
from scipy import stats
density = stats.gaussian_kde(fastballs_ff['release_speed'])
xs = np.linspace(fastballs_ff['release_speed'].min(),
                 fastballs_ff['release_speed'].max(), 200)
ax.plot(xs, density(xs), color='#08519c', linewidth=2.5, label='Density')

# Add mean and median lines
mean_vel = fastballs_ff['release_speed'].mean()
median_vel = fastballs_ff['release_speed'].median()

ax.axvline(mean_vel, color='#d7301f', linestyle='--', linewidth=2,
           label=f'Mean: {mean_vel:.1f} mph')
ax.axvline(median_vel, color='#fc8d59', linestyle='--', linewidth=2,
           label=f'Median: {median_vel:.1f} mph')

# Customize
ax.set_xlabel('Velocity (mph)', fontsize=12, fontweight='bold')
ax.set_ylabel('Density', fontsize=12, fontweight='bold')
ax.set_title('Distribution of Four-Seam Fastball Velocities',
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper left', frameon=True)
ax.grid(True, alpha=0.3, axis='y')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.figtext(0.99, 0.01, 'Data: MLB Statcast | June 1-7, 2024',
            ha='right', fontsize=8, style='italic', color='gray')
plt.tight_layout()
plt.show()
Python
plt.figure(figsize=(12, 7))
sns.histplot(data=fastballs_ff, x='release_speed',
             bins=30, kde=True, color='#2b8cbe')

# Add reference lines
mean_vel = fastballs_ff['release_speed'].mean()
median_vel = fastballs_ff['release_speed'].median()
plt.axvline(mean_vel, color='#d7301f', linestyle='--', linewidth=2,
            label=f'Mean: {mean_vel:.1f} mph')
plt.axvline(median_vel, color='#fc8d59', linestyle='--', linewidth=2,
            label=f'Median: {median_vel:.1f} mph')

plt.xlabel('Velocity (mph)', fontweight='bold')
plt.ylabel('Count', fontweight='bold')
plt.title('Distribution of Four-Seam Fastball Velocities',
          fontweight='bold', fontsize=14, pad=20)
plt.legend()
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Simulate game-by-game data
np.random.seed(123)
games = 150
ohtani_games = pd.DataFrame({
    'game_num': range(1, games + 1),
    'hits': np.random.binomial(4, 0.31, games),
    'ab': np.random.choice([3, 4, 5], games)
})

# Calculate cumulative and rolling averages
ohtani_games['cumulative_hits'] = ohtani_games['hits'].cumsum()
ohtani_games['cumulative_ab'] = ohtani_games['ab'].cumsum()
ohtani_games['avg'] = ohtani_games['cumulative_hits'] / ohtani_games['cumulative_ab']
ohtani_games['rolling_avg'] = (ohtani_games['hits'] / ohtani_games['ab']).rolling(
    window=10, min_periods=1
).mean()

# Create line chart
fig, ax = plt.subplots(figsize=(14, 7))

# Plot lines
ax.plot(ohtani_games['game_num'], ohtani_games['avg'],
        color='#2b8cbe', linewidth=2, alpha=0.6, label='Season Average')
ax.plot(ohtani_games['game_num'], ohtani_games['rolling_avg'],
        color='#d7301f', linewidth=2, label='10-Game Rolling Average')

# League average reference
ax.axhline(y=0.250, color='gray', linestyle='--', linewidth=1.5, alpha=0.7)
ax.text(130, 0.255, 'League Avg (.250)', color='gray', fontsize=10)

# Customize
ax.set_xlabel('Game Number', fontsize=12, fontweight='bold')
ax.set_ylabel('Batting Average', fontsize=12, fontweight='bold')
ax.set_title("Shohei Ohtani's 2024 Batting Average Progression",
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='lower right', frameon=True, fontsize=10)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylim([0.15, 0.45])

plt.figtext(0.99, 0.01, 'Data: Simulated for illustration',
            ha='right', fontsize=8, style='italic', color='gray')
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Create synthetic batted ball data for multiple players
np.random.seed(42)
players = ['Shohei Ohtani', 'Aaron Judge', 'Mookie Betts']
n_points = 200

data_list = []
for player in players:
    launch_angles = np.linspace(-50, 60, n_points)
    if player == 'Aaron Judge':
        exit_vels = 85 + 20 * np.exp(-((launch_angles - 15)**2) / 400) + np.random.normal(0, 5, n_points)
    elif player == 'Mookie Betts':
        exit_vels = 82 + 18 * np.exp(-((launch_angles - 12)**2) / 350) + np.random.normal(0, 5, n_points)
    else:  # Ohtani
        exit_vels = 84 + 19 * np.exp(-((launch_angles - 14)**2) / 380) + np.random.normal(0, 5, n_points)

    data_list.append(pd.DataFrame({
        'player': player,
        'launch_angle': launch_angles,
        'launch_speed': exit_vels
    }))

batted_balls = pd.concat(data_list, ignore_index=True)

# Create faceted plot using Seaborn
g = sns.FacetGrid(batted_balls, col='player', col_wrap=3, height=5, aspect=1.2)
g.map(sns.scatterplot, 'launch_angle', 'launch_speed', alpha=0.3, color='#2b8cbe')

# Add smooth line to each facet
g.map(sns.regplot, 'launch_angle', 'launch_speed',
      scatter=False, lowess=True, color='#d7301f', line_kws={'linewidth': 2})

# Add barrel zone to each facet
from matplotlib.patches import Rectangle

for ax in g.axes.flat:
    barrel = Rectangle((15, 95), 20, 20, fill=False,
                       edgecolor='black', linestyle='--', linewidth=1.5)
    ax.add_patch(barrel)

# Customize
g.set_axis_labels('Launch Angle (degrees)', 'Exit Velocity (mph)', fontweight='bold')
g.set_titles('{col_name}', fontweight='bold', fontsize=12)
g.fig.suptitle('Batted Ball Profiles: Elite Hitters Compared',
               fontsize=14, fontweight='bold', y=1.02)
g.fig.text(0.5, -0.02, 'Dashed box = barrel zone (95+ mph, 15-35°)',
           ha='center', fontsize=9, style='italic')

plt.tight_layout()
plt.show()
Python
from matplotlib.patches import Rectangle
from scipy.signal import savgol_filter

# Same comparison built directly with Matplotlib subplots
# (reuses `players` and `batted_balls` from the previous example)
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)

for idx, (player, ax) in enumerate(zip(players, axes)):
    player_data = batted_balls[batted_balls['player'] == player]

    # Scatter plot
    ax.scatter(player_data['launch_angle'], player_data['launch_speed'],
               alpha=0.3, color='#2b8cbe', s=30)

    # Smooth line (Savitzky-Golay filter: 51-point window, cubic fit)
    sorted_data = player_data.sort_values('launch_angle')
    smoothed = savgol_filter(sorted_data['launch_speed'], 51, 3)
    ax.plot(sorted_data['launch_angle'], smoothed,
            color='#d7301f', linewidth=2)

    # Barrel zone
    barrel = Rectangle((15, 95), 20, 20, fill=False,
                       edgecolor='black', linestyle='--', linewidth=1.5)
    ax.add_patch(barrel)

    # Customize
    ax.set_xlabel('Launch Angle (degrees)', fontweight='bold')
    if idx == 0:
        ax.set_ylabel('Exit Velocity (mph)', fontweight='bold')
    ax.set_title(player, fontweight='bold', fontsize=12)
    ax.grid(True, alpha=0.3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

fig.suptitle('Batted Ball Profiles: Elite Hitters Compared',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
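
The dashed rectangle in both versions is this chapter's simplified barrel zone (95+ mph, 15-35°), not Statcast's official barrel definition, which widens the launch-angle window as exit velocity rises. As a minimal sketch, the same box can be used to count how many batted balls fall inside it. `in_barrel_box` and `barrel_box_rate` are hypothetical helper names, and the sample values are made up for illustration.

```python
def in_barrel_box(launch_angle, exit_velocity,
                  min_ev=95.0, min_la=15.0, max_la=35.0):
    """Return True if a batted ball falls inside the simplified barrel box."""
    return exit_velocity >= min_ev and min_la <= launch_angle <= max_la


def barrel_box_rate(batted_balls):
    """Share of (launch_angle, exit_velocity) pairs inside the box."""
    if not batted_balls:
        return 0.0
    hits = sum(in_barrel_box(la, ev) for la, ev in batted_balls)
    return hits / len(batted_balls)


# Made-up (launch_angle, exit_velocity) pairs for illustration
sample = [(25.0, 104.2), (10.0, 101.0), (30.0, 92.5), (18.0, 97.1)]
rate = barrel_box_rate(sample)
print(f"{rate:.0%} of sample batted balls fall in the box")
```

Keeping the thresholds as keyword arguments makes it easy to swap in a stricter definition later without touching the plotting code.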

4.4 Interactive Visualizations

Static plots are excellent for publication and analysis, but interactive visualizations add another dimension: exploration. Users can hover for details, zoom into regions of interest, and filter data dynamically.

4.4.1 plotly in R and Python {#plotly-basics}

Plotly creates interactive, web-based visualizations that work in notebooks, R Markdown documents, and standalone HTML files.

R with plotly:

library(plotly)
library(dplyr)

# Convert ggplot to plotly (easiest method)
library(ggplot2)

# Create a ggplot
p <- ggplot(ohtani_batted, aes(x = launch_angle, y = launch_speed,
                                color = outcome, text = paste(
                                  "LA:", round(launch_angle, 1), "°<br>",
                                  "EV:", round(launch_speed, 1), "mph<br>",
                                  "Outcome:", outcome
                                ))) +
  geom_point(alpha = 0.6, size = 2) +
  scale_color_manual(values = c(
    "Home Run" = "#FF0000", "Extra Base Hit" = "#FF8C00",
    "Single" = "#FFD700", "Out" = "#1E90FF", "Other" = "#808080"
  )) +
  labs(title = "Shohei Ohtani's Batted Balls - 2024 (Interactive)",
       x = "Launch Angle (degrees)",
       y = "Exit Velocity (mph)") +
  theme_minimal()

# Convert to plotly
ggplotly(p, tooltip = "text")

Native plotly in R (more control):

library(plotly)

plot_ly(ohtani_batted,
        x = ~launch_angle,
        y = ~launch_speed,
        color = ~outcome,
        colors = c("Home Run" = "#FF0000", "Extra Base Hit" = "#FF8C00",
                   "Single" = "#FFD700", "Out" = "#1E90FF", "Other" = "#808080"),
        type = 'scatter',
        mode = 'markers',
        marker = list(size = 8, opacity = 0.6),
        hovertemplate = paste(
          '<b>%{customdata}</b><br>',
          'Launch Angle: %{x:.1f}°<br>',
          'Exit Velocity: %{y:.1f} mph<br>',
          '<extra></extra>'
        ),
        customdata = ~outcome) %>%
  layout(
    title = "Shohei Ohtani's Batted Ball Profile - 2024 (Interactive)",
    xaxis = list(title = "Launch Angle (degrees)", showgrid = TRUE),
    yaxis = list(title = "Exit Velocity (mph)", showgrid = TRUE),
    hovermode = 'closest',
    # Add barrel zone as shape
    shapes = list(
      list(
        type = "rect",
        x0 = 15, x1 = 35, y0 = 95, y1 = 115,
        line = list(color = "black", dash = "dash"),
        fillcolor = "transparent"
      )
    )
  )

Python with plotly:

import plotly.express as px
import plotly.graph_objects as go

# Using plotly express (high-level)
fig = px.scatter(
    ohtani_batted,
    x='launch_angle',
    y='launch_speed',
    color='outcome',
    color_discrete_map={
        'Home Run': '#FF0000',
        'Extra Base Hit': '#FF8C00',
        'Single': '#FFD700',
        'Out': '#1E90FF',
        'Other': '#808080'
    },
    hover_data={
        'launch_angle': ':.1f',
        'launch_speed': ':.1f',
        'outcome': True
    },
    title="Shohei Ohtani's Batted Ball Profile - 2024 (Interactive)",
    labels={
        'launch_angle': 'Launch Angle (degrees)',
        'launch_speed': 'Exit Velocity (mph)',
        'outcome': 'Outcome'
    }
)

# Add barrel zone
fig.add_shape(
    type="rect",
    x0=15, y0=95, x1=35, y1=115,
    line=dict(color="black", dash="dash", width=2),
    fillcolor="rgba(0,0,0,0)"
)

# Update layout
fig.update_layout(
    hovermode='closest',
    plot_bgcolor='white',
    font=dict(size=11)
)

fig.show()

Using plotly graph objects (more control):

import plotly.graph_objects as go

# Create traces for each outcome
fig = go.Figure()

colors = {
    'Home Run': '#FF0000',
    'Extra Base Hit': '#FF8C00',
    'Single': '#FFD700',
    'Out': '#1E90FF',
    'Other': '#808080'
}

for outcome, color in colors.items():
    data = ohtani_batted[ohtani_batted['outcome'] == outcome]
    fig.add_trace(go.Scatter(
        x=data['launch_angle'],
        y=data['launch_speed'],
        mode='markers',
        name=outcome,
        marker=dict(
            color=color,
            size=8,
            opacity=0.6,
            line=dict(width=0)
        ),
        hovertemplate=(
            '<b>%{fullData.name}</b><br>' +
            'Launch Angle: %{x:.1f}°<br>' +
            'Exit Velocity: %{y:.1f} mph<br>' +
            '<extra></extra>'
        )
    ))

# Add barrel zone
fig.add_shape(
    type="rect",
    x0=15, y0=95, x1=35, y1=115,
    line=dict(color="black", dash="dash", width=2),
    fillcolor="rgba(0,0,0,0)"
)

# Update layout
fig.update_layout(
    title="Shohei Ohtani's Batted Ball Profile - 2024 (Interactive)",
    xaxis_title="Launch Angle (degrees)",
    yaxis_title="Exit Velocity (mph)",
    hovermode='closest',
    plot_bgcolor='white',
    font=dict(size=11),
    legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="right",
        x=0.99
    )
)

fig.show()

4.4.2 Hover Tooltips for Player Data {#hover-tooltips}

Rich tooltips make interactive plots powerful. Here's an example with detailed player information:

import plotly.graph_objects as go
import pandas as pd

# Illustrative sample values with multiple attributes (not actual 2024 statistics)
player_season_stats = pd.DataFrame({
    'player': ['Ohtani', 'Judge', 'Betts', 'Acuna', 'Trout', 'Soto', 'Freeman'],
    'hr': [44, 62, 35, 41, 40, 35, 29],
    'avg': [.304, .311, .307, .337, .283, .275, .331],
    'ops': [.965, 1.111, .913, 1.012, .962, .932, .973],
    'sb': [20, 3, 23, 73, 8, 7, 13],
    'war': [8.1, 10.6, 7.9, 8.3, 7.2, 6.1, 6.5]
})

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=player_season_stats['avg'],
    y=player_season_stats['hr'],
    mode='markers',
    marker=dict(
        size=player_season_stats['war'] * 5,  # Size by WAR
        color=player_season_stats['ops'],     # Color by OPS
        colorscale='Viridis',
        showscale=True,
        colorbar=dict(title="OPS"),
        line=dict(width=1, color='white')
    ),
    text=player_season_stats['player'],
    customdata=player_season_stats[['player', 'hr', 'avg', 'ops', 'sb', 'war']],
    hovertemplate=(
        '<b>%{customdata[0]}</b><br><br>' +
        'AVG: %{customdata[2]:.3f}<br>' +
        'HR: %{customdata[1]}<br>' +
        'OPS: %{customdata[3]:.3f}<br>' +
        'SB: %{customdata[4]}<br>' +
        'WAR: %{customdata[5]:.1f}<br>' +
        '<extra></extra>'
    )
))

fig.update_layout(
    title="MLB Elite Hitters - Sample Data<br><sub>Bubble size = WAR, Color = OPS</sub>",
    xaxis_title="Batting Average",
    yaxis_title="Home Runs",
    hovermode='closest',
    plot_bgcolor='white',
    font=dict(size=12)
)

fig.show()

R equivalent:

library(plotly)

player_season_stats <- data.frame(
  player = c("Ohtani", "Judge", "Betts", "Acuna", "Trout", "Soto", "Freeman"),
  hr = c(44, 62, 35, 41, 40, 35, 29),
  avg = c(.304, .311, .307, .337, .283, .275, .331),
  ops = c(.965, 1.111, .913, 1.012, .962, .932, .973),
  sb = c(20, 3, 23, 73, 8, 7, 13),
  war = c(8.1, 10.6, 7.9, 8.3, 7.2, 6.1, 6.5)
)

plot_ly(player_season_stats,
        x = ~avg,
        y = ~hr,
        type = 'scatter',
        mode = 'markers',
        marker = list(
          size = ~war * 5,
          color = ~ops,
          colorscale = 'Viridis',
          showscale = TRUE,
          colorbar = list(title = "OPS"),
          line = list(width = 1, color = 'white')
        ),
        text = ~player,
        customdata = ~war,  # raw WAR for the tooltip; marker.size holds war * 5
        hovertemplate = paste(
          '<b>%{text}</b><br><br>',
          'AVG: %{x:.3f}<br>',
          'HR: %{y}<br>',
          'OPS: %{marker.color:.3f}<br>',
          'WAR: %{customdata:.1f}<br>',
          '<extra></extra>'
        )) %>%
  layout(
    title = "MLB Elite Hitters - Sample Data<br><sub>Bubble size = WAR, Color = OPS</sub>",
    xaxis = list(title = "Batting Average"),
    yaxis = list(title = "Home Runs"),
    hovermode = 'closest',
    plot_bgcolor = 'white'
  )
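
The `customdata[i]` indices in the Python hover template have to match the column order passed to `customdata`, which is easy to break when columns are added or reordered. One way to keep them in sync, sketched here with a hypothetical helper `build_hovertemplate`, is to generate the template string from the same column list:

```python
def build_hovertemplate(columns, title_col, formats=None):
    """Build a Plotly hovertemplate whose customdata indices match `columns`.

    columns   : column names in the order they are passed to `customdata`
    title_col : column shown in bold on the first line
    formats   : optional {column: d3-format string}, e.g. {'avg': '.3f'}
    """
    formats = formats or {}
    lines = [f"<b>%{{customdata[{columns.index(title_col)}]}}</b><br>"]
    for i, col in enumerate(columns):
        if col == title_col:
            continue
        fmt = f":{formats[col]}" if col in formats else ""
        lines.append(f"{col.upper()}: %{{customdata[{i}]{fmt}}}")
    # <extra></extra> hides the secondary box with the trace name
    return "<br>".join(lines) + "<extra></extra>"


cols = ['player', 'hr', 'avg', 'ops', 'sb', 'war']
template = build_hovertemplate(
    cols, title_col='player',
    formats={'avg': '.3f', 'ops': '.3f', 'war': '.1f'}
)
print(template)
```

The resulting string can be passed as `hovertemplate=template` alongside `customdata=player_season_stats[cols]`, so reordering `cols` updates both at once.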

4.4.3 Zooming and Filtering {#zooming-filtering}

Plotly automatically includes zoom, pan, and selection tools. You can also add custom controls:

import plotly.graph_objects as go

# Create figure with dropdown menu to filter by outcome
# (reuses `ohtani_batted` and the `colors` dict from the graph-objects example in 4.4.1)
fig = go.Figure()

# Add traces for all outcomes (initially visible)
for outcome, color in colors.items():
    data = ohtani_batted[ohtani_batted['outcome'] == outcome]
    fig.add_trace(go.Scatter(
        x=data['launch_angle'],
        y=data['launch_speed'],
        mode='markers',
        name=outcome,
        marker=dict(color=color, size=8, opacity=0.6),
        visible=True
    ))

# Create buttons for dropdown
buttons = [
    dict(
        label="All",
        method="update",
        args=[{"visible": [True] * len(colors)}]
    )
]

# Add button for each outcome
for i, outcome in enumerate(colors.keys()):
    visibility = [i == j for j in range(len(colors))]
    buttons.append(
        dict(
            label=outcome,
            method="update",
            args=[{"visible": visibility}]
        )
    )

# Update layout with dropdown
fig.update_layout(
    updatemenus=[
        dict(
            buttons=buttons,
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.11,
            xanchor="left",
            y=1.15,
            yanchor="top"
        )
    ],
    title="Ohtani's Batted Balls - Filter by Outcome",
    xaxis_title="Launch Angle (degrees)",
    yaxis_title="Exit Velocity (mph)"
)

fig.show()
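
The dropdown works because each button carries one boolean per trace, in the order the traces were added. A minimal sketch of that mask-building step on its own, with no Plotly dependency (`make_filter_buttons` is a hypothetical helper name), makes the logic easier to check:

```python
def make_filter_buttons(trace_names):
    """Return Plotly-style button dicts: one 'All' button plus one per trace."""
    n = len(trace_names)
    buttons = [{
        "label": "All",
        "method": "update",
        "args": [{"visible": [True] * n}],
    }]
    for i, name in enumerate(trace_names):
        # Only the i-th trace stays visible for this button
        mask = [j == i for j in range(n)]
        buttons.append({
            "label": name,
            "method": "update",
            "args": [{"visible": mask}],
        })
    return buttons


outcomes = ['Home Run', 'Extra Base Hit', 'Single', 'Out', 'Other']
buttons = make_filter_buttons(outcomes)
for b in buttons:
    print(b["label"], b["args"][0]["visible"])
```

The returned list can be passed as `buttons=` inside `updatemenus`. The masks only line up if the traces were added to the figure in the same order as `trace_names`.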

4.5 Baseball-Specific Visualizations

Now we'll tackle visualizations unique to baseball: spray charts, strike zone heat maps, pitch movement plots, and field diagrams.

4.5.1 Spray Charts: Hit Locations {#spray-charts}

Spray charts show where batted balls land on the field. They're essential for analyzing hitting tendencies and defensive positioning.
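
Statcast's `hc_x`/`hc_y` hit coordinates are not in feet and are not centered on home plate, so most spray charts start with a coordinate transform. The sketch below uses a commonly cited approximation (home plate near (125.42, 198.27) and roughly 2.5 feet per unit); treat those constants as assumptions rather than official values. `to_field_feet` and `spray_angle` are hypothetical helper names.

```python
import math

# Assumed constants: a widely used approximation, not an official spec
HOME_X, HOME_Y = 125.42, 198.27
FEET_PER_UNIT = 2.5


def to_field_feet(hc_x, hc_y):
    """Convert Statcast hit coordinates to feet with home plate at (0, 0).

    hc_y decreases toward the outfield, so it is flipped here to make
    positive y point toward center field.
    """
    x = FEET_PER_UNIT * (hc_x - HOME_X)
    y = FEET_PER_UNIT * (HOME_Y - hc_y)
    return x, y


def spray_angle(x, y):
    """Angle in degrees from straight-away center: negative = left field side."""
    return math.degrees(math.atan2(x, y))


x, y = to_field_feet(125.42, 98.27)   # a ball hit directly up the middle
print(round(x, 1), round(y, 1), round(spray_angle(x, y), 1))
```

Working in feet with home plate at the origin means field markings such as 90-foot basepaths can be drawn at their real dimensions.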

R with ggplot2:

library(ggplot2)
library(dplyr)
library(stringr)   # str_detect()
library(baseballr)

# Get batted ball data with coordinates
ohtani_spray <- statcast_search(
  start_date = "2024-04-01",
  end_date = "2024-09-30",
  playerid = 660271,
  player_type = "batter"
) %>%
  filter(!is.na(hc_x), !is.na(hc_y))  # Keep rows with hit coordinates

# Categorize hit types and convert coordinates
# Note: hc_x / hc_y are not in feet. Home plate sits near (125, 198) and
# hc_y decreases toward the outfield, so we re-center on home plate, flip y,
# and scale by ~2.5 feet per unit (a commonly used approximation).
ohtani_spray <- ohtani_spray %>%
  mutate(
    hit_type = case_when(
      events == "home_run" ~ "Home Run",
      events %in% c("single", "double", "triple") ~ "Hit",
      str_detect(events, "out") ~ "Out",
      TRUE ~ "Other"
    ),
    field_x = 2.5 * (hc_x - 125.42),
    field_y = 2.5 * (198.27 - hc_y)
  )

# Create spray chart
# View is from behind home plate: home plate at (0, 0), center field up,
# units are approximate feet
arc_angles <- seq(pi/4, 3*pi/4, length.out = 100)

ggplot(ohtani_spray, aes(x = field_x, y = field_y)) +
  # Field dimensions (approximations)
  # Infield arc
  annotate("path",
           x = 125 * cos(arc_angles),
           y = 125 * sin(arc_angles),
           color = "darkgreen", size = 1) +
  # Outfield arc (wall); radius 325 so it meets the foul-line endpoints
  annotate("path",
           x = 325 * cos(arc_angles),
           y = 325 * sin(arc_angles),
           color = "black", size = 1.5) +
  # Foul lines
  annotate("segment", x = 0, y = 0, xend = -230, yend = 230,
           color = "white", size = 1) +
  annotate("segment", x = 0, y = 0, xend = 230, yend = 230,
           color = "white", size = 1) +
  # Batted balls
  geom_point(aes(color = hit_type), alpha = 0.6, size = 3) +
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Hit" = "#FFD700",
               "Out" = "#4292C6", "Other" = "#969696")
  ) +
  # Formatting
  coord_fixed() +
  scale_x_continuous(limits = c(-250, 250)) +
  scale_y_continuous(limits = c(-50, 450)) +
  labs(
    title = "Shohei Ohtani Spray Chart - 2024 Season",
    subtitle = "Hit locations from catcher's perspective",
    color = "Outcome",
    caption = "Data: MLB Statcast"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 10, hjust = 0.5, color = "gray30"),
    plot.background = element_rect(fill = "#1e8449", color = NA),
    legend.position = "right"
  )

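Better approach using geom_mlb_stadium from the GeomMLBStadiums package:

library(GeomMLBStadiums)  # GitHub package (bdilday/GeomMLBStadiums), not part of baseballr
library(ggplot2)
library(dplyr)

# geom_mlb_stadium() draws stadium outlines; mlbam_xy_transformation() adds
# hc_x_ / hc_y_ columns in the same transformed coordinate system
ohtani_spray %>%
  mlbam_xy_transformation() %>%
  ggplot(aes(x = hc_x_, y = hc_y_)) +
  geom_mlb_stadium(stadium_ids = "generic",  # Generic stadium outline
                   stadium_transform_coords = TRUE,
                   stadium_segments = "all") +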
  geom_point(aes(color = hit_type, shape = hit_type),
             alpha = 0.7, size = 3.5) +
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Hit" = "#FFD700",
               "Out" = "#4292C6", "Other" = "#969696")
  ) +
  scale_shape_manual(
    values = c("Home Run" = 17, "Hit" = 16, "Out" = 16, "Other" = 4)
  ) +
  coord_fixed() +
  labs(
    title = "Shohei Ohtani Spray Chart - 2024 Season",
    color = "Outcome",
    shape = "Outcome"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "right"
  )

Python spray chart:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Arc, Rectangle, Circle

def plot_field():
    """Draw baseball field"""
    fig, ax = plt.subplots(figsize=(10, 10))

    # Set field color
    ax.set_facecolor('#1e8449')

    # Infield grass line (arc)
    infield = Arc((0, 0), 250, 250, angle=0, theta1=45, theta2=135,
                  color='darkgreen', linewidth=2)
    ax.add_patch(infield)

    # Outfield wall (arc); radius 325 ft so it meets the foul-line endpoints
    wall = Arc((0, 0), 650, 650, angle=0, theta1=45, theta2=135,
               color='black', linewidth=3)
    ax.add_patch(wall)

    # Foul lines
    ax.plot([0, -230], [0, 230], color='white', linewidth=2)
    ax.plot([0, 230], [0, 230], color='white', linewidth=2)

    # Infield dirt (diamond). Bases are 90 ft apart along the foul lines, so
    # first and third sit 90 / sqrt(2) (about 63.6 ft) out on each axis
    b = 90 / np.sqrt(2)
    ax.fill([0, b, 0, -b], [0, b, 2 * b, b], color='#CD853F', alpha=0.3)

    # Bases
    base_size = 8
    # First base
    ax.add_patch(Rectangle((b - base_size/2, b - base_size/2), base_size, base_size,
                           facecolor='white', edgecolor='black'))
    # Second base
    ax.add_patch(Rectangle((-base_size/2, 2*b - base_size/2), base_size, base_size,
                           facecolor='white', edgecolor='black'))
    # Third base
    ax.add_patch(Rectangle((-b - base_size/2, b - base_size/2), base_size, base_size,
                           facecolor='white', edgecolor='black'))
    # Home plate
    ax.add_patch(Circle((0, 0), 5, facecolor='white', edgecolor='black'))

    ax.set_xlim(-250, 250)
    ax.set_ylim(-50, 450)
    ax.set_aspect('equal')
    ax.axis('off')

    return fig, ax

# Create spray chart
fig, ax = plot_field()

# Transform Statcast coordinates to approximate feet with home plate at (0, 0):
# re-center on home plate (~125.42, 198.27), flip y (hc_y decreases toward the
# outfield), and scale by ~2.5 feet per unit (a commonly used approximation)
ohtani_spray_clean = ohtani_batted[
    ohtani_batted['hc_x'].notna() & ohtani_batted['hc_y'].notna()
].copy()
ohtani_spray_clean['field_x'] = 2.5 * (ohtani_spray_clean['hc_x'] - 125.42)
ohtani_spray_clean['field_y'] = 2.5 * (198.27 - ohtani_spray_clean['hc_y'])

# Plot by outcome
colors_spray = {
    'Home Run': '#FF0000',
    'Extra Base Hit': '#FFD700',
    'Single': '#FFFF00',
    'Out': '#4292C6',
    'Other': '#969696'
}

for outcome, color in colors_spray.items():
    data = ohtani_spray_clean[ohtani_spray_clean['outcome'] == outcome]
    marker = '^' if outcome == 'Home Run' else 'o'
    ax.scatter(data['field_x'], data['field_y'],
               c=color, label=outcome, alpha=0.7, s=80,
               edgecolors='black', linewidth=0.5, marker=marker)

ax.legend(loc='upper right', frameon=True, facecolor='white', edgecolor='black')
plt.title("Shohei Ohtani Spray Chart - 2024 Season",
          fontweight='bold', fontsize=14, pad=20)
plt.figtext(0.5, 0.02, 'Data: MLB Statcast | View from catcher\'s perspective',
            ha='center', fontsize=9, style='italic')
plt.tight_layout()
plt.show()

4.5.2 Strike Zone Heat Maps {#strike-zone-heatmap}

Strike zone heat maps show pitch locations and outcomes. They're crucial for understanding pitcher command and batter approach.
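
Before plotting, it can help to quantify how often pitches land in the zone. This sketch uses the same fixed rectangle as the plots in this section (±0.85 ft wide, 1.5 to 3.5 ft high); a real zone varies with each batter's height and stance, so the bounds are an approximation. `in_zone` and `zone_rate` are hypothetical helper names and the sample pitches are made up.

```python
def in_zone(plate_x, plate_z,
            half_width=0.85, bottom=1.5, top=3.5):
    """True if a pitch crosses the plate inside the fixed rectangular zone."""
    return abs(plate_x) <= half_width and bottom <= plate_z <= top


def zone_rate(pitches):
    """Fraction of (plate_x, plate_z) locations inside the zone."""
    if not pitches:
        return 0.0
    return sum(in_zone(x, z) for x, z in pitches) / len(pitches)


# Made-up pitch locations in feet (catcher's perspective)
sample_pitches = [(0.10, 2.50), (-0.90, 2.00), (0.40, 3.80),
                  (0.80, 1.60), (-0.30, 0.90)]
print(f"Zone rate: {zone_rate(sample_pitches):.0%}")
```

The same predicate can be applied to a pitch DataFrame to split a heat map into in-zone and out-of-zone panels.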

R strike zone heat map:

library(ggplot2)
library(dplyr)      # %>%, filter(), mutate(), case_when()
library(baseballr)

# Get pitch data for a pitcher
gerrit_cole_pitches <- statcast_search(
  start_date = "2024-06-01",
  end_date = "2024-08-31",
  playerid = 543037,  # Gerrit Cole
  player_type = "pitcher"
) %>%
  filter(!is.na(plate_x), !is.na(plate_z))

# Categorize pitch results
gerrit_cole_pitches <- gerrit_cole_pitches %>%
  mutate(
    result = case_when(
      description %in% c("swinging_strike", "swinging_strike_blocked") ~ "Whiff",
      description == "called_strike" ~ "Called Strike",
      description %in% c("ball", "blocked_ball") ~ "Ball",
      description %in% c("foul", "foul_tip") ~ "Foul",
      description == "hit_into_play" ~ "In Play",
      TRUE ~ "Other"
    )
  )

# Strike zone boundaries (approximate; the real zone varies by batter)
# plate_x and plate_z are in feet. The plate is 17 inches (about 1.42 ft) wide;
# x = -0.85 to 0.85 adds a small margin because a pitch that clips the edge
# of the plate still counts as a strike. Height is roughly 1.5 to 3.5 feet.

ggplot(gerrit_cole_pitches, aes(x = plate_x, y = plate_z)) +
  # Strike zone rectangle
  annotate("rect", xmin = -0.85, xmax = 0.85, ymin = 1.5, ymax = 3.5,
           fill = NA, color = "black", size = 1.2) +
  # Home plate (17 inches wide, drawn schematically)
  geom_polygon(
    data = data.frame(
      x = c(-0.71, 0.71, 0.71, 0, -0.71),
      y = c(0, 0, 0.15, 0.25, 0.15)
    ),
    aes(x = x, y = y),
    fill = "white", color = "black", size = 1
  ) +
  # Heat map
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                  alpha = 0.5, bins = 20) +
  scale_fill_gradientn(colors = c("blue", "green", "yellow", "red"),
                       name = "Pitch\nDensity") +
  # Points colored by result
  geom_point(aes(color = result), alpha = 0.4, size = 1.5) +
  scale_color_manual(
    values = c(
      "Whiff" = "#FF0000",
      "Called Strike" = "#FFA500",
      "Ball" = "#4292C6",
      "Foul" = "#969696",
      "In Play" = "#2ca25f",
      "Other" = "#000000"
    )
  ) +
  coord_fixed(ratio = 1) +
  scale_x_continuous(limits = c(-2, 2)) +
  scale_y_continuous(limits = c(0, 5)) +
  labs(
    title = "Gerrit Cole Strike Zone - 2024 Season",
    subtitle = "Pitch locations from catcher's perspective, all batters",
    x = "Horizontal Location (feet from center of plate)",
    y = "Vertical Location (feet from ground)",
    color = "Result",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid = element_blank(),
    plot.background = element_rect(fill = "gray95", color = NA)
  )

Faceted by pitch type:

# Filter to main pitch types
# (Statcast usually tags Cole's curveball as a knuckle curve, "KC",
# so both curveball codes are kept)
cole_main_pitches <- gerrit_cole_pitches %>%
  filter(pitch_type %in% c("FF", "SL", "CH", "CU", "KC"))

ggplot(cole_main_pitches, aes(x = plate_x, y = plate_z)) +
  geom_rect(aes(xmin = -0.85, xmax = 0.85, ymin = 1.5, ymax = 3.5),
            fill = NA, color = "black", size = 0.8) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                  alpha = 0.6, bins = 15) +
  scale_fill_gradientn(colors = c("blue", "cyan", "yellow", "red")) +
  facet_wrap(~ pitch_type, ncol = 2,
             labeller = labeller(pitch_type = c(
               "FF" = "Four-Seam Fastball",
               "SL" = "Slider",
               "CH" = "Changeup",
               "CU" = "Curveball",
               "KC" = "Knuckle Curve"
             ))) +
  coord_fixed(ratio = 1) +
  scale_x_continuous(limits = c(-1.5, 1.5)) +
  scale_y_continuous(limits = c(1, 4)) +
  labs(
    title = "Gerrit Cole Pitch Locations by Type",
    x = "Horizontal Location (feet)",
    y = "Vertical Location (feet)",
    fill = "Density"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold"),
    panel.grid = element_blank()
  )

Python strike zone heat map:

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Rectangle

# Assumes `gerrit_cole_data` is a pandas DataFrame of Statcast pitch data
# with plate_x and plate_z columns, loaded earlier
# Filter for valid pitch locations
cole_pitches = gerrit_cole_data[
    gerrit_cole_data['plate_x'].notna() &
    gerrit_cole_data['plate_z'].notna()
].copy()

# Create figure
fig, ax = plt.subplots(figsize=(10, 12))

# Create hexbin heat map
hexbin = ax.hexbin(
    cole_pitches['plate_x'],
    cole_pitches['plate_z'],
    gridsize=25,
    cmap='YlOrRd',
    alpha=0.7,
    mincnt=1
)

# Add strike zone
strike_zone = Rectangle(
    (-0.85, 1.5), 1.7, 2.0,
    fill=False, edgecolor='black', linewidth=2.5
)
ax.add_patch(strike_zone)

# Add home plate (17 inches, about 1.42 ft, wide; drawn schematically)
home_plate_x = [-0.71, 0.71, 0.71, 0, -0.71]
home_plate_y = [0, 0, 0.15, 0.25, 0.15]
ax.fill(home_plate_x, home_plate_y, color='white',
        edgecolor='black', linewidth=2, zorder=10)

# Customize
ax.set_xlim(-2, 2)
ax.set_ylim(0, 5)
ax.set_aspect('equal')
ax.set_xlabel('Horizontal Location (feet from center)', fontweight='bold')
ax.set_ylabel('Vertical Location (feet from ground)', fontweight='bold')
ax.set_title("Gerrit Cole Strike Zone Heat Map - 2024 Season",
             fontweight='bold', fontsize=14, pad=20)
ax.set_facecolor('#f0f0f0')

# Colorbar
cbar = plt.colorbar(hexbin, ax=ax)
cbar.set_label('Number of Pitches', fontweight='bold')

plt.figtext(0.5, 0.02, 'Data: MLB Statcast | Catcher\'s perspective, RHB',
            ha='center', fontsize=9, style='italic')
plt.tight_layout()
plt.show()

Using seaborn kdeplot for smoother heat map:

fig, ax = plt.subplots(figsize=(10, 12))

# KDE heat map
sns.kdeplot(
    data=cole_pitches,
    x='plate_x',
    y='plate_z',
    fill=True,
    cmap='YlOrRd',
    levels=20,
    alpha=0.7,
    ax=ax
)

# Strike zone
strike_zone = Rectangle((-0.85, 1.5), 1.7, 2.0,
                         fill=False, edgecolor='black', linewidth=2.5)
ax.add_patch(strike_zone)

# Overlay actual pitches
ax.scatter(cole_pitches['plate_x'], cole_pitches['plate_z'],
           s=10, alpha=0.2, color='black', edgecolors='none')

ax.set_xlim(-2, 2)
ax.set_ylim(0, 5)
ax.set_aspect('equal')
ax.set_xlabel('Horizontal Location (feet)', fontweight='bold')
ax.set_ylabel('Vertical Location (feet)', fontweight='bold')
ax.set_title("Gerrit Cole Strike Zone Density",
             fontweight='bold', fontsize=14, pad=20)

plt.tight_layout()
plt.show()

4.5.3 Pitch Movement Plots {#pitch-movement}

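Pitch movement plots show how each pitch type breaks horizontally and vertically. Plotting every pitch by its movement shows how distinct the pitches in a pitcher's arsenal are: well-separated clusters indicate pitches that move differently from one another, while overlapping clusters indicate pitches that move alike.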

R pitch movement plot:

library(ggplot2)
library(dplyr)      # provides %>%, filter(), mutate(), case_when(), summarize()
library(baseballr)

# Get pitch data with movement
cole_movement <- gerrit_cole_pitches %>%
  filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
  # Convert Statcast movement to inches (from feet)
  mutate(
    horizontal_break = pfx_x * 12,  # Convert feet to inches
    vertical_break = pfx_z * 12,
    pitch_name = case_when(
      pitch_type == "FF" ~ "Four-Seam FB",
      pitch_type == "SL" ~ "Slider",
      pitch_type == "CH" ~ "Changeup",
      pitch_type == "CU" ~ "Curveball",
      pitch_type == "SI" ~ "Sinker",
      TRUE ~ pitch_type
    )
  ) %>%
  filter(pitch_name %in% c("Four-Seam FB", "Slider", "Changeup", "Curveball"))

# Calculate average movement for each pitch type
pitch_avg_movement <- cole_movement %>%
  group_by(pitch_name) %>%
  summarize(
    avg_horizontal = mean(horizontal_break),
    avg_vertical = mean(vertical_break),
    avg_velocity = mean(release_speed, na.rm = TRUE),
    count = n()
  )

ggplot(cole_movement, aes(x = horizontal_break, y = vertical_break,
                          color = pitch_name)) +
  geom_point(alpha = 0.3, size = 2) +
  # Add average points
  geom_point(data = pitch_avg_movement,
             aes(x = avg_horizontal, y = avg_vertical, color = pitch_name),
             size = 8, shape = 17) +
  # Add labels for averages
  geom_text(data = pitch_avg_movement,
            aes(x = avg_horizontal, y = avg_vertical,
                label = paste0(pitch_name, "\n", round(avg_velocity, 1), " mph")),
            size = 3, fontface = "bold", vjust = -1.5, hjust = 0.5,
            color = "black") +
  # Reference lines
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Gerrit Cole Pitch Movement Profile - 2024",
    subtitle = "Triangles show average movement. Measured from catcher's perspective.",
    x = "Horizontal Break (inches, negative = arm side for a RHP)",
    y = "Vertical Break (inches, negative = drop)",
    color = "Pitch Type",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

Python pitch movement plot:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Prepare movement data
cole_movement = cole_pitches[
    cole_pitches['pfx_x'].notna() &
    cole_pitches['pfx_z'].notna() &
    cole_pitches['pitch_type'].notna()
].copy()

# Convert to inches
cole_movement['horizontal_break'] = cole_movement['pfx_x'] * 12
cole_movement['vertical_break'] = cole_movement['pfx_z'] * 12

# Map pitch types to names
pitch_names = {
    'FF': 'Four-Seam FB',
    'SL': 'Slider',
    'CH': 'Changeup',
    'CU': 'Curveball',
    'SI': 'Sinker'
}
cole_movement['pitch_name'] = cole_movement['pitch_type'].map(pitch_names)
cole_movement = cole_movement[cole_movement['pitch_name'].notna()]

# Calculate averages
pitch_avg = cole_movement.groupby('pitch_name').agg({
    'horizontal_break': 'mean',
    'vertical_break': 'mean',
    'release_speed': 'mean',
    'pitch_type': 'count'
}).reset_index()
pitch_avg.columns = ['pitch_name', 'avg_horizontal', 'avg_vertical',
                     'avg_velocity', 'count']

# Create plot
fig, ax = plt.subplots(figsize=(12, 10))

# Set color palette
colors = {'Four-Seam FB': '#e41a1c', 'Slider': '#377eb8',
          'Changeup': '#4daf4a', 'Curveball': '#984ea3', 'Sinker': '#ff7f00'}

# Plot individual pitches (skip pitch types with no data to keep the legend clean)
for pitch_type, color in colors.items():
    data = cole_movement[cole_movement['pitch_name'] == pitch_type]
    if data.empty:
        continue
    ax.scatter(data['horizontal_break'], data['vertical_break'],
               c=color, alpha=0.3, s=30, label=pitch_type, edgecolors='none')

# Plot averages
for _, row in pitch_avg.iterrows():
    color = colors.get(row['pitch_name'], '#000000')
    ax.scatter(row['avg_horizontal'], row['avg_vertical'],
               c=color, s=400, marker='^', edgecolors='black', linewidth=2,
               zorder=10)
    ax.annotate(f"{row['pitch_name']}\n{row['avg_velocity']:.1f} mph",
                (row['avg_horizontal'], row['avg_vertical']),
                textcoords="offset points", xytext=(0, 15),
                ha='center', fontweight='bold', fontsize=9)

# Reference lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# Customize
ax.set_xlabel('Horizontal Break (inches, negative = arm side for a RHP)',
              fontweight='bold', fontsize=12)
ax.set_ylabel('Vertical Break (inches, negative = drop)',
              fontweight='bold', fontsize=12)
ax.set_title("Gerrit Cole Pitch Movement Profile - 2024 Season",
             fontweight='bold', fontsize=14, pad=20)
ax.legend(loc='best', frameon=True, title='Pitch Type', title_fontsize=11)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.figtext(0.99, 0.01, 'Data: MLB Statcast | Catcher\'s perspective',
            ha='right', fontsize=9, style='italic')
plt.tight_layout()
plt.show()
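
The "negative = arm side" reading of the horizontal axis holds for a right-handed pitcher such as Cole; for a left-hander the sign is mirrored. As a minimal sketch of how movement from pitchers of both hands could be put on one scale, the helper below flips the sign for left-handers. The function name `arm_side_break_inches` is hypothetical, and the sketch assumes `pfx_x` is reported in feet from the catcher's perspective and that handedness is coded as 'R' or 'L' (as in the Statcast `p_throws` column).

```python
# Hypothetical helper: the name and sign convention are illustrative.
def arm_side_break_inches(pfx_x_ft, p_throws):
    """Convert pfx_x (feet, catcher's view) to inches,
    signed so that negative always means arm-side movement.

    Assumes p_throws is 'R' or 'L'.
    """
    if p_throws not in ("R", "L"):
        raise ValueError(f"unexpected p_throws value: {p_throws!r}")
    inches = pfx_x_ft * 12
    # Right-handers already have arm side on the negative axis;
    # mirror left-handers so the two populations line up.
    return inches if p_throws == "R" else -inches


print(arm_side_break_inches(-0.5, "R"))  # right-hander, arm-side run
print(arm_side_break_inches(0.5, "L"))   # left-hander, arm-side run
```

Both calls describe the same amount of arm-side run, so they come out with the same sign. Whether to normalize this way depends on the question: for a single pitcher's profile the raw catcher's-view values are usually easier to read.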

4.5.4 Field Diagrams with sportyR {#sportyr}

The sportyR package for R draws regulation-size fields and courts as ggplot2 objects, so you can layer your own data on top with ordinary geoms.

library(sportyR)
library(ggplot2)

# Create MLB field
field <- geom_baseball(league = "MLB")

# Basic field
field

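# Add spray chart data
# hc_x/hc_y are not in feet. This commonly used approximation shifts home
# plate to the origin and rescales by about 2.5 to match the field (in feet)
field +
  geom_point(data = ohtani_spray,
             aes(x = 2.5 * (hc_x - 125.42),
                 y = 2.5 * (198.27 - hc_y),
                 color = hit_type),
             alpha = 0.6, size = 3) +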
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Hit" = "#FFD700",
               "Out" = "#4292C6", "Other" = "#969696")
  ) +
  labs(
    title = "Shohei Ohtani Spray Chart",
    color = "Outcome"
  ) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "right"
  )

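# Zoom in on part of the field
# Note: geom_baseball() draws a regulation field for the chosen league. To our
# knowledge it has no stadium argument, so park-specific outfield walls need
# another source (for example, the GeomMLBStadiums package).
field_infield <- geom_baseball(league = "MLB", display_range = "infield")

field_infield +
  labs(title = "MLB Infield") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))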
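
Statcast's `hc_x` and `hc_y` are not measured in feet, so they need to be shifted and rescaled before they line up with a field drawn in feet. The Python sketch below shows one commonly used approximation. The constants (125.42, 198.27, and a scale of 2.5) are community conventions rather than official values, and the function name `hit_coords_to_feet` is hypothetical.

```python
# Commonly used approximate constants; treat them as assumptions.
HOME_PLATE_X = 125.42   # approximate hc_x value at home plate
HOME_PLATE_Y = 198.27   # approximate hc_y value at home plate
FEET_PER_UNIT = 2.5     # approximate scale from hc units to feet


def hit_coords_to_feet(hc_x, hc_y):
    """Map hc_x/hc_y to feet with home plate at (0, 0).

    x grows toward right field and y grows toward center field,
    as seen from behind home plate.
    """
    x_ft = FEET_PER_UNIT * (hc_x - HOME_PLATE_X)
    y_ft = FEET_PER_UNIT * (HOME_PLATE_Y - hc_y)
    return x_ft, y_ft


# A ball landing straight out from home plate, 160 hc units away
x_ft, y_ft = hit_coords_to_feet(125.42, 38.27)
print(round(x_ft, 1), round(y_ft, 1))
```

Because the scale is approximate, distances derived this way are good enough for placing points on a field diagram but should not be treated as measured hit distances.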

4.5.5 Statcast Zone Charts {#statcast-zones}

Statcast assigns every pitch a numbered zone. Zones 1-9 form a 3x3 grid inside the strike zone, and zones 11-14 cover the four quadrants outside it (there is no zone 10). We can visualize performance by zone.

library(ggplot2)
library(dplyr)

# Statcast zone numbering (catcher's perspective):
#  1   2   3
#  4   5   6   (strike zone)
#  7   8   9
# 11-14 are the four out-of-zone quadrants (upper-left, upper-right,
# lower-left, lower-right); they are left off this 3x3 chart

# Summarize in-zone pitches by zone
cole_zones <- gerrit_cole_pitches %>%
  filter(zone %in% 1:9) %>%
  group_by(zone) %>%
  summarize(
    pitches = n(),
    # Swinging strikes per pitch (whiffs per swing is another common definition)
    whiff_rate = sum(description %in% c("swinging_strike", "swinging_strike_blocked")) / n(),
    avg_velocity = mean(release_speed, na.rm = TRUE)
  ) %>%
  mutate(
    zone_x = (zone - 1) %% 3 + 1,    # column: 1 = left, 3 = right
    zone_y = 3 - (zone - 1) %/% 3    # row: 3 = top, 1 = bottom
  )

# Create zone chart
ggplot(cole_zones, aes(x = zone_x, y = zone_y, fill = whiff_rate)) +
  geom_tile(color = "white", linewidth = 2) +
  geom_text(aes(label = paste0(round(whiff_rate * 100, 1), "%\n",
                                "n=", pitches)),
            color = "black", fontface = "bold", size = 4) +
  scale_fill_gradient2(
    low = "#2166ac", mid = "#f7f7f7", high = "#b2182b",
    midpoint = 0.15, labels = scales::percent
  ) +
  coord_fixed() +
  scale_x_continuous(breaks = NULL) +
  scale_y_continuous(breaks = NULL) +
  labs(
    title = "Gerrit Cole Whiff Rate by Zone - 2024",
    subtitle = "Catcher's perspective. Zones 1-3 (top) to 7-9 (bottom)",
    fill = "Whiff Rate",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )

Python version:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Statcast zones 1-9 form the 3x3 strike-zone grid (1-3 top row, 7-9 bottom
# row, catcher's perspective); zones 11-14 are out-of-zone quadrants and are
# left off this chart
in_zone = cole_pitches[cole_pitches['zone'].between(1, 9)]

# Calculate zone statistics
zone_stats = in_zone.groupby('zone').agg({
    'pitch_type': 'count',
    'description': lambda x: (x.isin(['swinging_strike', 'swinging_strike_blocked'])).sum() / len(x),
    'release_speed': 'mean'
}).reset_index()
zone_stats.columns = ['zone', 'pitches', 'whiff_rate', 'avg_velocity']

# Map zones to heatmap cells (row 0 is the top of the zone)
zone_stats['row'] = (zone_stats['zone'].astype(int) - 1) // 3
zone_stats['col'] = (zone_stats['zone'].astype(int) - 1) % 3

# Create heatmap matrix (NaN leaves zones with no pitches blank)
heatmap_data = np.full((3, 3), np.nan)
for _, row in zone_stats.iterrows():
    heatmap_data[int(row['row']), int(row['col'])] = row['whiff_rate']

# Plot
fig, ax = plt.subplots(figsize=(8, 10))

sns.heatmap(
    heatmap_data,
    annot=False,
    cmap='RdYlBu_r',
    center=0.15,
    vmin=0, vmax=0.4,
    cbar_kws={'label': 'Whiff Rate'},
    linewidths=2,
    linecolor='white',
    square=True,
    ax=ax
)

# Add text annotations (cell centers sit at col + 0.5, row + 0.5)
for _, row in zone_stats.iterrows():
    ax.text(row['col'] + 0.5, row['row'] + 0.5,
            f"{row['whiff_rate']:.1%}\nn={int(row['pitches'])}",
            ha='center', va='center', color='black',
            fontweight='bold', fontsize=10)

ax.set_xticks([])
ax.set_yticks([])
ax.set_xlabel('')
ax.set_ylabel('')
ax.set_title("Gerrit Cole Whiff Rate by Zone - 2024 Season",
             fontweight='bold', fontsize=14, pad=20)

plt.tight_layout()
plt.show()
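
The zone charts compute "whiff rate" as swinging strikes divided by all pitches in the zone. Many sources instead reserve the term for swinging strikes divided by swings, which can give a quite different picture outside the zone, where batters swing less often. The sketch below computes both from plain (zone, description) pairs. The function name `whiff_rates_by_zone` is hypothetical, and the set of descriptions counted as swings is an assumption that ignores bunt attempts.

```python
from collections import defaultdict

WHIFFS = {"swinging_strike", "swinging_strike_blocked"}
# Assumed set of swing outcomes; bunt attempts are ignored here.
SWINGS = WHIFFS | {"foul", "foul_tip", "hit_into_play"}


def whiff_rates_by_zone(pitches):
    """pitches: iterable of (zone, description) pairs.

    Returns {zone: (whiffs_per_pitch, whiffs_per_swing)}; the per-swing
    value is None when no swings were recorded in that zone.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # pitches, swings, whiffs
    for zone, description in pitches:
        tally = counts[zone]
        tally[0] += 1
        tally[1] += description in SWINGS
        tally[2] += description in WHIFFS
    return {
        zone: (w / n, w / s if s else None)
        for zone, (n, s, w) in counts.items()
    }


# Made-up sample, for illustration only
sample = [
    (5, "swinging_strike"), (5, "foul"),
    (5, "called_strike"), (5, "called_strike"),
    (14, "swinging_strike_blocked"), (14, "ball"),
]
print(whiff_rates_by_zone(sample))
```

In the made-up sample, zone 14 has the same number of whiffs as zone 5 but half as many pitches and swings, so both of its rates are twice as high. Whichever definition you choose, state it in the chart's subtitle or caption.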
R
library(ggplot2)
library(dplyr)
library(baseballr)

# Get batted ball data with coordinates
ohtani_spray <- statcast_search(
  start_date = "2024-04-01",
  end_date = "2024-09-30",
  playerid = 660271,
  player_type = "batter"
) %>%
  filter(!is.na(hc_x), !is.na(hc_y))  # Statcast coordinates

# Categorize hit types
ohtani_spray <- ohtani_spray %>%
  mutate(
    hit_type = case_when(
      events == "home_run" ~ "Home Run",
      events %in% c("single", "double", "triple") ~ "Hit",
      str_detect(events, "out") ~ "Out",
      TRUE ~ "Other"
    )
  )

# Create spray chart
# Note: Statcast coordinates are from catcher's perspective
# X: 0 = center, negative = left, positive = right
# Y: Home plate is ~200, outfield wall ~400+

ggplot(ohtani_spray, aes(x = hc_x, y = hc_y)) +
  # Field dimensions (approximations)
  # Infield arc
  annotate("path",
           x = 125 * cos(seq(pi/4, 3*pi/4, length.out = 100)),
           y = 200 - 125 * sin(seq(pi/4, 3*pi/4, length.out = 100)),
           color = "darkgreen", size = 1) +
  # Outfield arc (wall)
  annotate("path",
           x = 230 * cos(seq(pi/4, 3*pi/4, length.out = 100)),
           y = 200 - 230 * sin(seq(pi/4, 3*pi/4, length.out = 100)),
           color = "black", size = 1.5) +
  # Foul lines
  geom_segment(aes(x = 0, y = 200, xend = -230, yend = 200 - 230),
               color = "white", size = 1) +
  geom_segment(aes(x = 0, y = 200, xend = 230, yend = 200 - 230),
               color = "white", size = 1) +
  # Batted balls
  geom_point(aes(color = hit_type), alpha = 0.6, size = 3) +
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Hit" = "#FFD700",
               "Out" = "#4292C6", "Other" = "#969696")
  ) +
  # Formatting
  coord_fixed() +
  scale_x_continuous(limits = c(-250, 250)) +
  scale_y_continuous(limits = c(-50, 450)) +
  labs(
    title = "Shohei Ohtani Spray Chart - 2024 Season",
    subtitle = "Hit locations from catcher's perspective",
    color = "Outcome",
    caption = "Data: MLB Statcast"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 10, hjust = 0.5, color = "gray30"),
    plot.background = element_rect(fill = "#1e8449", color = NA),
    legend.position = "right"
  )
R
library(baseballr)
library(ggplot2)

# GeomMLBStadium provides proper field dimensions
ggplot(ohtani_spray, aes(x = hc_x, y = hc_y - 200)) +  # Adjust Y to origin
  geom_mlb_stadium(stadium_ids = "generic") +  # Generic stadium outline
  geom_point(aes(color = hit_type, shape = hit_type),
             alpha = 0.7, size = 3.5) +
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Hit" = "#FFD700",
               "Out" = "#4292C6", "Other" = "#969696")
  ) +
  scale_shape_manual(
    values = c("Home Run" = 17, "Hit" = 16, "Out" = 16, "Other" = 4)
  ) +
  coord_fixed() +
  labs(
    title = "Shohei Ohtani Spray Chart - 2024 Season",
    color = "Outcome",
    shape = "Outcome"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "right"
  )
R
library(ggplot2)
library(baseballr)

# Get pitch data for a pitcher
gerrit_cole_pitches <- statcast_search(
  start_date = "2024-06-01",
  end_date = "2024-08-31",
  playerid = 543037,  # Gerrit Cole
  player_type = "pitcher"
) %>%
  filter(!is.na(plate_x), !is.na(plate_z))

# Categorize pitch results
gerrit_cole_pitches <- gerrit_cole_pitches %>%
  mutate(
    result = case_when(
      description %in% c("swinging_strike", "swinging_strike_blocked") ~ "Whiff",
      description == "called_strike" ~ "Called Strike",
      description %in% c("ball", "blocked_ball") ~ "Ball",
      description %in% c("foul", "foul_tip") ~ "Foul",
      description == "hit_into_play" ~ "In Play",
      TRUE ~ "Other"
    )
  )

# Strike zone boundaries (approximate, adjusts per batter)
# Standard zone: 17 inches wide (plate), roughly 1.5 to 3.5 feet in height
# In Statcast units: x = -0.85 to 0.85, z = 1.5 to 3.5

ggplot(gerrit_cole_pitches, aes(x = plate_x, y = plate_z)) +
  # Strike zone rectangle
  geom_rect(aes(xmin = -0.85, xmax = 0.85, ymin = 1.5, ymax = 3.5),
            fill = NA, color = "black", size = 1.2) +
  # Home plate (17 inches wide, 8.5 inches point to back)
  geom_polygon(
    data = data.frame(
      x = c(-0.85, 0.85, 0.85, 0, -0.85),
      y = c(0, 0, 0.15, 0.25, 0.15)
    ),
    aes(x = x, y = y),
    fill = "white", color = "black", size = 1
  ) +
  # Heat map
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                  alpha = 0.5, bins = 20) +
  scale_fill_gradientn(colors = c("blue", "green", "yellow", "red"),
                       name = "Pitch\nDensity") +
  # Points colored by result
  geom_point(aes(color = result), alpha = 0.4, size = 1.5) +
  scale_color_manual(
    values = c(
      "Whiff" = "#FF0000",
      "Called Strike" = "#FFA500",
      "Ball" = "#4292C6",
      "Foul" = "#969696",
      "In Play" = "#2ca25f",
      "Other" = "#000000"
    )
  ) +
  coord_fixed(ratio = 1) +
  scale_x_continuous(limits = c(-2, 2)) +
  scale_y_continuous(limits = c(0, 5)) +
  labs(
    title = "Gerrit Cole Strike Zone - 2024 Season",
    subtitle = "Pitch locations from catcher's perspective. Right-handed batter stance.",
    x = "Horizontal Location (feet from center of plate)",
    y = "Vertical Location (feet from ground)",
    color = "Result",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid = element_blank(),
    plot.background = element_rect(fill = "gray95", color = NA)
  )
R
# Filter to main pitch types
cole_main_pitches <- gerrit_cole_pitches %>%
  filter(pitch_type %in% c("FF", "SL", "CH", "CU"))

ggplot(cole_main_pitches, aes(x = plate_x, y = plate_z)) +
  geom_rect(aes(xmin = -0.85, xmax = 0.85, ymin = 1.5, ymax = 3.5),
            fill = NA, color = "black", size = 0.8) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                  alpha = 0.6, bins = 15) +
  scale_fill_gradientn(colors = c("blue", "cyan", "yellow", "red")) +
  facet_wrap(~ pitch_type, ncol = 2,
             labeller = labeller(pitch_type = c(
               "FF" = "Four-Seam Fastball",
               "SL" = "Slider",
               "CH" = "Changeup",
               "CU" = "Curveball"
             ))) +
  coord_fixed(ratio = 1) +
  scale_x_continuous(limits = c(-1.5, 1.5)) +
  scale_y_continuous(limits = c(1, 4)) +
  labs(
    title = "Gerrit Cole Pitch Locations by Type",
    x = "Horizontal Location (feet)",
    y = "Vertical Location (feet)",
    fill = "Density"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold"),
    panel.grid = element_blank()
  )
R
library(ggplot2)
library(baseballr)

# Get pitch data with movement
cole_movement <- gerrit_cole_pitches %>%
  filter(!is.na(pfx_x), !is.na(pfx_z), !is.na(pitch_type)) %>%
  # Convert Statcast movement to inches (from feet)
  mutate(
    horizontal_break = pfx_x * 12,  # Convert feet to inches
    vertical_break = pfx_z * 12,
    pitch_name = case_when(
      pitch_type == "FF" ~ "Four-Seam FB",
      pitch_type == "SL" ~ "Slider",
      pitch_type == "CH" ~ "Changeup",
      pitch_type == "CU" ~ "Curveball",
      pitch_type == "SI" ~ "Sinker",
      TRUE ~ pitch_type
    )
  ) %>%
  filter(pitch_name %in% c("Four-Seam FB", "Slider", "Changeup", "Curveball"))

# Calculate average movement for each pitch type
pitch_avg_movement <- cole_movement %>%
  group_by(pitch_name) %>%
  summarize(
    avg_horizontal = mean(horizontal_break),
    avg_vertical = mean(vertical_break),
    avg_velocity = mean(release_speed, na.rm = TRUE),
    count = n()
  )

ggplot(cole_movement, aes(x = horizontal_break, y = vertical_break,
                          color = pitch_name)) +
  geom_point(alpha = 0.3, size = 2) +
  # Add average points
  geom_point(data = pitch_avg_movement,
             aes(x = avg_horizontal, y = avg_vertical, color = pitch_name),
             size = 8, shape = 17) +
  # Add labels for averages
  geom_text(data = pitch_avg_movement,
            aes(x = avg_horizontal, y = avg_vertical,
                label = paste0(pitch_name, "\n", round(avg_velocity, 1), " mph")),
            size = 3, fontface = "bold", vjust = -1.5, hjust = 0.5,
            color = "black") +
  # Reference lines
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Gerrit Cole Pitch Movement Profile - 2024",
    subtitle = "Triangles show average movement. Measured from catcher's perspective.",
    x = "Horizontal Break (inches, negative = arm side)",
    y = "Vertical Break (inches, negative = drop)",
    color = "Pitch Type",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )
R
library(sportyR)
library(ggplot2)

# Create MLB field
field <- geom_baseball(league = "MLB")

# Basic field
field

# Add spray chart data
field +
  geom_point(data = ohtani_spray,
             aes(x = hc_x - 125, y = -(hc_y - 200), color = hit_type),
             alpha = 0.6, size = 3) +
  scale_color_manual(
    values = c("Home Run" = "#FF0000", "Hit" = "#FFD700",
               "Out" = "#4292C6", "Other" = "#969696")
  ) +
  labs(
    title = "Shohei Ohtani Spray Chart",
    color = "Outcome"
  ) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "right"
  )

# Specific stadium
field_yankee <- geom_baseball(league = "MLB", stadium = "yankee_stadium")

field_yankee +
  labs(title = "Yankee Stadium") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))
R
library(ggplot2)
library(dplyr)

# Statcast uses 14 zones (9 in strike zone, 5 outside)
# Zone numbering:
# 11  12  13
# 4   5   6   (strike zone)
# 7   8   9
# 14 (below zone)
# 1, 2, 3 (outside zone left, up, right)

# Get pitch data with zones
cole_zones <- gerrit_cole_pitches %>%
  filter(!is.na(zone)) %>%
  group_by(zone) %>%
  summarize(
    pitches = n(),
    whiff_rate = sum(description %in% c("swinging_strike", "swinging_strike_blocked")) / n(),
    avg_velocity = mean(release_speed, na.rm = TRUE)
  ) %>%
  mutate(
    zone_x = case_when(
      zone %in% c(1, 4, 7, 11) ~ 1,
      zone %in% c(2, 5, 8, 12, 14) ~ 2,
      zone %in% c(3, 6, 9, 13) ~ 3
    ),
    zone_y = case_when(
      zone %in% c(11, 12, 13) ~ 3,
      zone %in% c(4, 5, 6) ~ 2,
      zone %in% c(7, 8, 9) ~ 1,
      zone %in% c(1, 2, 3, 14) ~ 0
    )
  )

# Create zone chart
ggplot(cole_zones, aes(x = zone_x, y = zone_y, fill = whiff_rate)) +
  geom_tile(color = "white", size = 2) +
  geom_text(aes(label = paste0(round(whiff_rate * 100, 1), "%\n",
                                "n=", pitches)),
            color = "white", fontface = "bold", size = 4) +
  scale_fill_gradient2(
    low = "#2166ac", mid = "#f7f7f7", high = "#b2182b",
    midpoint = 0.15, labels = scales::percent
  ) +
  coord_fixed() +
  scale_x_continuous(breaks = NULL) +
  scale_y_continuous(breaks = NULL) +
  labs(
    title = "Gerrit Cole Whiff Rate by Zone - 2024",
    subtitle = "Catcher's perspective. Zones 11-13 (top) to 7-9 (bottom)",
    fill = "Whiff Rate",
    caption = "Data: MLB Statcast"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )
Python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Arc, Rectangle, Circle

def plot_field():
    """Draw baseball field"""
    fig, ax = plt.subplots(figsize=(10, 10))

    # Set field color
    ax.set_facecolor('#1e8449')

    # Infield grass line (arc)
    infield = Arc((0, 0), 250, 250, angle=0, theta1=45, theta2=135,
                  color='darkgreen', linewidth=2)
    ax.add_patch(infield)

    # Outfield wall (arc)
    wall = Arc((0, 0), 460, 460, angle=0, theta1=45, theta2=135,
               color='black', linewidth=3)
    ax.add_patch(wall)

    # Foul lines
    ax.plot([0, -230], [0, 230], color='white', linewidth=2)
    ax.plot([0, 230], [0, 230], color='white', linewidth=2)

    # Infield dirt (diamond)
    ax.fill([0, 90, 0, -90], [0, 90, 127, 90], color='#CD853F', alpha=0.3)

    # Bases
    base_size = 8
    # First base
    ax.add_patch(Rectangle((90-base_size/2, 90-base_size/2), base_size, base_size,
                           facecolor='white', edgecolor='black'))
    # Second base
    ax.add_patch(Rectangle((-base_size/2, 127-base_size/2), base_size, base_size,
                           facecolor='white', edgecolor='black'))
    # Third base
    ax.add_patch(Rectangle((-90-base_size/2, 90-base_size/2), base_size, base_size,
                           facecolor='white', edgecolor='black'))
    # Home plate
    ax.add_patch(Circle((0, 0), 5, facecolor='white', edgecolor='black'))

    ax.set_xlim(-250, 250)
    ax.set_ylim(-50, 450)
    ax.set_aspect('equal')
    ax.axis('off')

    return fig, ax

# Create spray chart
fig, ax = plot_field()

# Transform Statcast coordinates (y needs to be inverted and translated)
ohtani_spray_clean = ohtani_batted[
    ohtani_batted['hc_x'].notna() & ohtani_batted['hc_y'].notna()
].copy()
ohtani_spray_clean['field_x'] = ohtani_spray_clean['hc_x'] - 125
ohtani_spray_clean['field_y'] = 200 - ohtani_spray_clean['hc_y']

# Plot by outcome
colors_spray = {
    'Home Run': '#FF0000',
    'Extra Base Hit': '#FFD700',
    'Single': '#FFFF00',
    'Out': '#4292C6',
    'Other': '#969696'
}

for outcome, color in colors_spray.items():
    data = ohtani_spray_clean[ohtani_spray_clean['outcome'] == outcome]
    marker = '^' if outcome == 'Home Run' else 'o'
    ax.scatter(data['field_x'], data['field_y'],
               c=color, label=outcome, alpha=0.7, s=80,
               edgecolors='black', linewidth=0.5, marker=marker)

ax.legend(loc='upper right', frameon=True, facecolor='white', edgecolor='black')
plt.title("Shohei Ohtani Spray Chart - 2024 Season",
          fontweight='bold', fontsize=14, pad=20)
plt.figtext(0.5, 0.02, 'Data: MLB Statcast | View from catcher\'s perspective',
            ha='center', fontsize=9, style='italic')
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Rectangle

# Assuming we have pitcher data
# Filter for valid pitch locations
cole_pitches = gerrit_cole_data[
    gerrit_cole_data['plate_x'].notna() &
    gerrit_cole_data['plate_z'].notna()
].copy()

# Create figure
fig, ax = plt.subplots(figsize=(10, 12))

# Create hexbin heat map
hexbin = ax.hexbin(
    cole_pitches['plate_x'],
    cole_pitches['plate_z'],
    gridsize=25,
    cmap='YlOrRd',
    alpha=0.7,
    mincnt=1
)

# Add strike zone
strike_zone = Rectangle(
    (-0.85, 1.5), 1.7, 2.0,
    fill=False, edgecolor='black', linewidth=2.5
)
ax.add_patch(strike_zone)

# Add home plate
home_plate_x = [-0.85, 0.85, 0.85, 0, -0.85]
home_plate_y = [0, 0, 0.15, 0.25, 0.15]
ax.fill(home_plate_x, home_plate_y, color='white',
        edgecolor='black', linewidth=2, zorder=10)

# Customize
ax.set_xlim(-2, 2)
ax.set_ylim(0, 5)
ax.set_aspect('equal')
ax.set_xlabel('Horizontal Location (feet from center)', fontweight='bold')
ax.set_ylabel('Vertical Location (feet from ground)', fontweight='bold')
ax.set_title("Gerrit Cole Strike Zone Heat Map - 2024 Season",
             fontweight='bold', fontsize=14, pad=20)
ax.set_facecolor('#f0f0f0')

# Colorbar
cbar = plt.colorbar(hexbin, ax=ax)
cbar.set_label('Number of Pitches', fontweight='bold')

plt.figtext(0.5, 0.02, 'Data: MLB Statcast | Catcher\'s perspective, RHB',
            ha='center', fontsize=9, style='italic')
plt.tight_layout()
plt.show()
Python
fig, ax = plt.subplots(figsize=(10, 12))

# KDE heat map
sns.kdeplot(
    data=cole_pitches,
    x='plate_x',
    y='plate_z',
    fill=True,
    cmap='YlOrRd',
    levels=20,
    alpha=0.7,
    ax=ax
)

# Strike zone
strike_zone = Rectangle((-0.85, 1.5), 1.7, 2.0,
                         fill=False, edgecolor='black', linewidth=2.5)
ax.add_patch(strike_zone)

# Overlay actual pitches
ax.scatter(cole_pitches['plate_x'], cole_pitches['plate_z'],
           s=10, alpha=0.2, color='black', edgecolors='none')

ax.set_xlim(-2, 2)
ax.set_ylim(0, 5)
ax.set_aspect('equal')
ax.set_xlabel('Horizontal Location (feet)', fontweight='bold')
ax.set_ylabel('Vertical Location (feet)', fontweight='bold')
ax.set_title("Gerrit Cole Strike Zone Density",
             fontweight='bold', fontsize=14, pad=20)

plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Prepare movement data
cole_movement = cole_pitches[
    cole_pitches['pfx_x'].notna() &
    cole_pitches['pfx_z'].notna() &
    cole_pitches['pitch_type'].notna()
].copy()

# Convert to inches
cole_movement['horizontal_break'] = cole_movement['pfx_x'] * 12
cole_movement['vertical_break'] = cole_movement['pfx_z'] * 12

# Map pitch types to names
pitch_names = {
    'FF': 'Four-Seam FB',
    'SL': 'Slider',
    'CH': 'Changeup',
    'CU': 'Curveball',
    'SI': 'Sinker'
}
cole_movement['pitch_name'] = cole_movement['pitch_type'].map(pitch_names)
cole_movement = cole_movement[cole_movement['pitch_name'].notna()]

# Calculate averages
pitch_avg = cole_movement.groupby('pitch_name').agg({
    'horizontal_break': 'mean',
    'vertical_break': 'mean',
    'release_speed': 'mean',
    'pitch_type': 'count'
}).reset_index()
pitch_avg.columns = ['pitch_name', 'avg_horizontal', 'avg_vertical',
                     'avg_velocity', 'count']

# Create plot
fig, ax = plt.subplots(figsize=(12, 10))

# Set color palette
colors = {'Four-Seam FB': '#e41a1c', 'Slider': '#377eb8',
          'Changeup': '#4daf4a', 'Curveball': '#984ea3', 'Sinker': '#ff7f00'}

# Plot individual pitches
for pitch_type, color in colors.items():
    data = cole_movement[cole_movement['pitch_name'] == pitch_type]
    ax.scatter(data['horizontal_break'], data['vertical_break'],
               c=color, alpha=0.3, s=30, label=pitch_type, edgecolors='none')

# Plot averages
for _, row in pitch_avg.iterrows():
    color = colors.get(row['pitch_name'], '#000000')
    ax.scatter(row['avg_horizontal'], row['avg_vertical'],
               c=color, s=400, marker='^', edgecolors='black', linewidth=2,
               zorder=10)
    ax.annotate(f"{row['pitch_name']}\n{row['avg_velocity']:.1f} mph",
                (row['avg_horizontal'], row['avg_vertical']),
                textcoords="offset points", xytext=(0, 15),
                ha='center', fontweight='bold', fontsize=9)

# Reference lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# Customize
ax.set_xlabel('Horizontal Break (inches, negative = arm side)',
              fontweight='bold', fontsize=12)
ax.set_ylabel('Vertical Break (inches, negative = drop)',
              fontweight='bold', fontsize=12)
ax.set_title("Gerrit Cole Pitch Movement Profile - 2024 Season",
             fontweight='bold', fontsize=14, pad=20)
ax.legend(loc='best', frameon=True, title='Pitch Type', title_fontsize=11)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.figtext(0.99, 0.01, 'Data: MLB Statcast | Catcher\'s perspective',
            ha='right', fontsize=9, style='italic')
plt.tight_layout()
plt.show()
Python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Calculate zone statistics
zone_stats = cole_pitches.groupby('zone').agg({
    'pitch_type': 'count',
    'description': lambda x: (x.isin(['swinging_strike', 'swinging_strike_blocked'])).sum() / len(x),
    'release_speed': 'mean'
}).reset_index()
zone_stats.columns = ['zone', 'pitches', 'whiff_rate', 'avg_velocity']

# Map in-zone Statcast zones (1-9) to (column, row) grid positions, with
# row 0 at the bottom of the strike zone. Zones 11-14 are the out-of-zone
# quadrants and do not fit a rectangular grid, so they are excluded here.
zone_positions = {
    1: (0, 2), 2: (1, 2), 3: (2, 2),
    4: (0, 1), 5: (1, 1), 6: (2, 1),
    7: (0, 0), 8: (1, 0), 9: (2, 0)
}

zone_stats = zone_stats[zone_stats['zone'].isin(list(zone_positions))].copy()
zone_stats['x'] = zone_stats['zone'].map(lambda z: zone_positions[z][0])
zone_stats['y'] = zone_stats['zone'].map(lambda z: zone_positions[z][1])

# Create heatmap matrix (matrix row 0 is the top of the zone);
# zones with no pitches stay NaN and are left blank by seaborn
heatmap_data = np.full((3, 3), np.nan)
for _, row in zone_stats.iterrows():
    heatmap_data[int(2 - row['y']), int(row['x'])] = row['whiff_rate']

# Plot
fig, ax = plt.subplots(figsize=(8, 8))

sns.heatmap(
    heatmap_data,
    annot=False,
    cmap='RdYlBu_r',
    center=0.15,
    vmin=0, vmax=0.4,
    cbar_kws={'label': 'Whiff Rate'},
    linewidths=2,
    linecolor='white',
    square=True,
    ax=ax
)

# Add text annotations (heatmap cells are centered at half-integer coordinates)
for _, row in zone_stats.iterrows():
    ax.text(row['x'] + 0.5, 2.5 - row['y'],
            f"{row['whiff_rate']:.1%}\nn={int(row['pitches'])}",
            ha='center', va='center', color='white',
            fontweight='bold', fontsize=10)

ax.set_xticks([])
ax.set_yticks([])
ax.set_xlabel('')
ax.set_ylabel('')
ax.set_title("Gerrit Cole Whiff Rate by Zone - 2024 Season",
             fontweight='bold', fontsize=14, pad=20)

plt.tight_layout()
plt.show()
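
The zone statistics above divide swinging strikes by all pitches in a zone, a quantity often called swinging-strike rate, while whiff rate is frequently defined per swing instead. The following is a minimal sketch of both definitions. The helper `swing_rates` is hypothetical, and the set of `description` values counted as swings is an assumption that should be checked against your own data.

```python
# Statcast `description` values treated as whiffs and as swings.
# These sets are assumptions; verify them against the values in your data.
WHIFF_EVENTS = {"swinging_strike", "swinging_strike_blocked"}
SWING_EVENTS = WHIFF_EVENTS | {"foul", "foul_tip", "hit_into_play"}


def swing_rates(descriptions):
    """Return (swinging-strike rate per pitch, whiff rate per swing)."""
    descriptions = list(descriptions)
    whiffs = sum(d in WHIFF_EVENTS for d in descriptions)
    swings = sum(d in SWING_EVENTS for d in descriptions)
    per_pitch = whiffs / len(descriptions) if descriptions else float("nan")
    per_swing = whiffs / swings if swings else float("nan")
    return per_pitch, per_swing


# Ten illustrative pitch outcomes (made-up labels, not real data)
sample = ["ball", "called_strike", "swinging_strike", "foul", "hit_into_play",
          "ball", "swinging_strike_blocked", "foul", "ball", "called_strike"]
per_pitch, per_swing = swing_rates(sample)
print(f"Swinging-strike rate: {per_pitch:.1%}, whiff rate: {per_swing:.1%}")
```

Because a pandas Series is iterable, the same helper could be applied per zone with `cole_pitches.groupby('zone')['description'].apply(swing_rates)`. Whichever definition you plot, state it in the colorbar label so readers know the denominator.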

4.6 Exercises

These exercises will test your understanding of baseball visualization principles and techniques.

Exercise 1: Exit Velocity Distribution by Pitch Type

Create a visualization comparing the distribution of exit velocities for batted balls hit off different pitch types (fastball, slider, changeup, curveball).

Requirements:


  • Use histogram or density plot

  • Compare 3-4 pitch types

  • Add mean/median reference lines

  • Include proper labels and title

R Approach:

# Get batted ball data with pitch types
batted_pitches <- statcast_search(...) %>%
  filter(!is.na(launch_speed),
         pitch_type %in% c("FF", "SL", "CH", "CU"))

# Create overlapping histograms or faceted density plots
ggplot(batted_pitches, aes(x = launch_speed, fill = pitch_type)) +
  geom_density(alpha = 0.5)
  # Add your customization here (chain further layers with +)

Python Approach:

# Filter data
batted_pitches = data[data['launch_speed'].notna() & ...]

# Create overlapping density plots
plt.figure(figsize=(12, 6))
for pitch_type in ['FF', 'SL', 'CH', 'CU']:
    subset = batted_pitches[batted_pitches['pitch_type'] == pitch_type]
    sns.kdeplot(subset['launch_speed'], label=pitch_type, alpha=0.6)
# Add customization
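
For the mean/median reference lines, one possible starting point is sketched below. The helpers `reference_stats` and `add_reference_lines` are hypothetical names, the sample values are made up, and `ax` is assumed to be a Matplotlib `Axes` that already holds the density curves.

```python
from statistics import mean, median


def reference_stats(exit_velos_by_pitch):
    """Map each pitch type to the mean and median of its exit velocities."""
    return {
        pitch_type: {"mean": mean(values), "median": median(values)}
        for pitch_type, values in exit_velos_by_pitch.items()
        if values  # skip pitch types with no batted balls
    }


def add_reference_lines(ax, stats, colors):
    """Draw a dashed mean line and a dotted median line per pitch type."""
    for pitch_type, s in stats.items():
        ax.axvline(s["mean"], color=colors[pitch_type], linestyle="--",
                   label=f"{pitch_type} mean")
        ax.axvline(s["median"], color=colors[pitch_type], linestyle=":",
                   label=f"{pitch_type} median")


# Illustrative values only (not real Statcast data)
example = {"FF": [88.0, 92.0, 101.0, 95.0], "SL": [80.0, 85.0, 90.0]}
stats = reference_stats(example)
print(stats)
```

With the exercise data, the input dictionary could be built with `batted_pitches.groupby('pitch_type')['launch_speed'].apply(list).to_dict()`, and the lines added with `add_reference_lines(plt.gca(), stats, colors)` before calling `plt.legend()`. Using the same color for a pitch type's curve and its reference lines keeps the plot readable.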

Exercise 2: Team Performance Comparison

Create a grouped or faceted bar chart comparing multiple offensive statistics (HR, BA, OPS) across the top 8 teams.

Requirements:


  • Use actual 2023 team data

  • Include at least 3 statistics

  • Use appropriate color scheme

  • Make comparisons easy to see
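
A practical obstacle in this exercise is that HR, BA, and OPS live on very different scales, so raw values cannot share one axis. The sketch below shows one way around that, under stated assumptions: each statistic is rescaled to an index where 100 is the average of the teams shown, and bar positions are computed by hand. The function names and the team values are placeholders for illustration, not real data.

```python
def relative_index(values):
    """Scale values so that the average of the group equals 100."""
    avg = sum(values) / len(values)
    return [100 * v / avg for v in values]


def grouped_bar_positions(n_groups, n_series, group_width=0.8):
    """Return the bar width and the x positions of each series' bars."""
    bar_width = group_width / n_series
    positions = [
        [g - group_width / 2 + (s + 0.5) * bar_width for g in range(n_groups)]
        for s in range(n_series)
    ]
    return bar_width, positions


def plot_team_comparison(teams, stats):
    """Draw grouped bars of indexed statistics; `stats` maps name -> values."""
    import matplotlib.pyplot as plt  # imported here so the helpers stay dependency-free

    bar_width, positions = grouped_bar_positions(len(teams), len(stats))
    fig, ax = plt.subplots(figsize=(12, 6))
    for xs, (name, values) in zip(positions, stats.items()):
        ax.bar(xs, relative_index(values), width=bar_width, label=name)
    ax.axhline(100, color="gray", linestyle="--", linewidth=1)
    ax.set_xticks(range(len(teams)))
    ax.set_xticklabels(teams)
    ax.set_ylabel("Index (100 = average of teams shown)")
    ax.legend(title="Statistic")
    return fig, ax


# Placeholder teams and values for illustration only; substitute real team data
teams = ["Team A", "Team B", "Team C"]
stats = {"HR": [200, 250, 150],
         "BA": [0.250, 0.260, 0.240],
         "OPS": [0.700, 0.800, 0.750]}
bar_width, positions = grouped_bar_positions(len(teams), len(stats))
print(bar_width, positions[0])
```

Calling `plot_team_comparison(teams, stats)` followed by `plt.show()` would render the chart. Faceting, with one panel per statistic and independent y-axes, is an equally valid alternative that keeps the original units visible.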

Exercise 3: Interactive Player Comparison

Create an interactive scatter plot comparing players on two dimensions (e.g., barrel rate vs. hard-hit rate) with rich hover tooltips showing additional statistics.

Requirements:


  • Use plotly

  • Include hover tooltips with at least 5 statistics

  • Size points by a meaningful variable (PA, WAR)

  • Color by another variable (team, position)
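
One possible starting point for the tooltips is sketched below. It builds the hover text as an HTML string and passes it to a plotly `Scatter` trace through `hovertemplate`. The helper names, the dictionary keys, and the example player are all hypothetical, and the statistics shown are just one choice of five.

```python
def hover_text(player):
    """Build an HTML tooltip string from a dict of player statistics."""
    return (
        f"<b>{player['name']}</b> ({player['team']})<br>"
        f"PA: {player['pa']}<br>"
        f"Barrel rate: {player['barrel_rate']:.1%}<br>"
        f"Hard-hit rate: {player['hard_hit_rate']:.1%}<br>"
        f"Avg exit velocity: {player['avg_ev']:.1f} mph<br>"
        f"xwOBA: {player['xwoba']:.3f}"
    )


def marker_sizes(pa_values, min_size=6, max_size=30):
    """Linearly rescale plate appearances to marker diameters in pixels."""
    lo, hi = min(pa_values), max(pa_values)
    if hi == lo:
        return [(min_size + max_size) / 2] * len(pa_values)
    return [min_size + (pa - lo) / (hi - lo) * (max_size - min_size)
            for pa in pa_values]


def build_figure(players):
    """Create a plotly scatter of barrel rate vs. hard-hit rate."""
    import plotly.graph_objects as go  # imported lazily; requires plotly

    fig = go.Figure(go.Scatter(
        x=[p["barrel_rate"] for p in players],
        y=[p["hard_hit_rate"] for p in players],
        mode="markers",
        text=[hover_text(p) for p in players],
        hovertemplate="%{text}<extra></extra>",  # show only the custom text
        marker=dict(size=marker_sizes([p["pa"] for p in players])),
    ))
    fig.update_layout(
        title="Barrel Rate vs. Hard-Hit Rate",
        xaxis=dict(title="Barrel rate", tickformat=".0%"),
        yaxis=dict(title="Hard-hit rate", tickformat=".0%"),
    )
    return fig


# Made-up player for illustration only
example = {"name": "Player A", "team": "AAA", "pa": 600, "barrel_rate": 0.152,
           "hard_hit_rate": 0.481, "avg_ev": 92.3, "xwoba": 0.371}
print(hover_text(example))
```

This sketch leaves out the color requirement. Coloring by team or position could be handled by adding one trace per category, or by switching to `plotly.express.scatter`, which accepts `color`, `size`, `hover_name`, and `hover_data` arguments directly on a DataFrame.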

Exercise 4: Custom Spray Chart

Create a spray chart for a player of your choice, customizing it to show:


  • Different markers for different outcomes

  • Color by exit velocity or launch angle

  • Field dimensions (can be approximate)

Requirements:


  • Use actual Statcast data

  • Show clear field outline

  • Distinguish home runs visually

  • Add meaningful title and labels
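
Spray charts usually start by shifting the Statcast hit coordinates so that home plate sits at the origin. The sketch below uses offsets that are commonly applied to `hc_x` and `hc_y` for this purpose. Treat the constants as an assumption rather than an official specification, and note that the helper names are hypothetical.

```python
import math

# Commonly used offsets that place home plate at the origin.
# These constants are an assumption; check them against your plotted data.
HOME_X, HOME_Y = 125.42, 198.27


def to_field_coords(hc_x, hc_y):
    """Shift (hc_x, hc_y) so home plate is (0, 0) and center field points up."""
    return hc_x - HOME_X, HOME_Y - hc_y


def spray_angle(hc_x, hc_y):
    """Angle in degrees from straightaway center; negative is the left-field side."""
    x, y = to_field_coords(hc_x, hc_y)
    return math.degrees(math.atan2(x, y))


def foul_line_endpoints(length):
    """End points of the two foul lines (45 degrees either side of center)."""
    d = length * math.sin(math.radians(45))
    return (-d, d), (d, d)


# A ball hit straight up the middle, in the original coordinate units
print(to_field_coords(125.42, 98.27), spray_angle(125.42, 98.27))
```

After the transform, a field outline reduces to two line segments from the origin to the `foul_line_endpoints` plus an arc for the outfield fence, all drawn in the same shifted units. Home runs can then be given a distinct marker by filtering on the outcome column before plotting.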

Exercise 5: Pitch Arsenal Visualization

Create a comprehensive pitch arsenal visualization showing:


  • Pitch movement plot (horizontal vs vertical break)

  • Velocity distribution for each pitch type

  • Usage percentage

Requirements:


  • Can be multiple plots or combined into one figure

  • Use consistent color scheme across plots

  • Show both individual pitches and averages

  • Include pitch counts or percentages
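
The usage percentages and per-pitch averages that this exercise calls for can be computed before any plotting. Here is a minimal sketch, assuming each pitch is a dictionary with the Statcast-style keys `pitch_type`, `pfx_x`, `pfx_z`, and `release_speed`. The helper `summarize_arsenal` is hypothetical and the sample pitches are made up.

```python
from collections import defaultdict


def summarize_arsenal(pitches):
    """Return count, usage share, and average movement/velocity per pitch type."""
    groups = defaultdict(list)
    for p in pitches:
        groups[p["pitch_type"]].append(p)

    total = len(pitches)
    summary = {}
    for pitch_type, rows in groups.items():
        n = len(rows)
        summary[pitch_type] = {
            "count": n,
            "usage": n / total,
            "avg_pfx_x": sum(r["pfx_x"] for r in rows) / n,
            "avg_pfx_z": sum(r["pfx_z"] for r in rows) / n,
            "avg_velocity": sum(r["release_speed"] for r in rows) / n,
        }
    return summary


# Made-up pitches for illustration only
sample_pitches = [
    {"pitch_type": "FF", "pfx_x": -0.5, "pfx_z": 1.5, "release_speed": 97.0},
    {"pitch_type": "FF", "pfx_x": -0.7, "pfx_z": 1.3, "release_speed": 96.0},
    {"pitch_type": "FF", "pfx_x": -0.6, "pfx_z": 1.4, "release_speed": 98.0},
    {"pitch_type": "SL", "pfx_x": 0.4, "pfx_z": 0.1, "release_speed": 88.0},
]
for pitch_type, s in summarize_arsenal(sample_pitches).items():
    print(f"{pitch_type}: {s['usage']:.0%} usage, {s['avg_velocity']:.1f} mph")
```

The averages can be overlaid on the movement scatter as larger markers on top of the individual pitches, with legend labels built from `usage`. Reusing one color dictionary keyed by pitch type across all panels satisfies the consistent-color requirement.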

Bonus Challenge: Create a dashboard-style layout combining multiple visualizations for a single player or team. Use faceting, subplots, or a grid layout to show:


  1. Performance trend over season

  2. Situational splits (home/away, day/night)

  3. Distribution of key metrics

  4. Comparison to league average


Summary

In this chapter, we covered the core tools and techniques of baseball data visualization:

  1. Principles: Choosing appropriate chart types, using accessible color palettes, following baseball conventions, and telling stories with data
  2. ggplot2: Building visualizations layer by layer using the grammar of graphics
  3. Matplotlib/Seaborn: Python alternatives offering both low-level control and high-level convenience
  4. Interactive Visualizations: Creating explorable graphics with plotly
  5. Baseball-Specific Visualizations: Spray charts, strike zone heat maps, pitch movement plots, and zone analysis

The key to effective visualization is matching the graphic type to your question, keeping it simple enough to understand quickly, and adding enough context (reference lines, annotations, color) to tell a complete story. Practice these techniques with real data, experiment with different approaches, and always ask: "What insight am I trying to communicate?"

In Chapter 5, we'll move from description to prediction, exploring statistical models that forecast player performance and team outcomes.
