Understanding Run Expectancy
Run expectancy (RE) is the foundation of modern in-game strategy analysis. It answers a simple question: Given the current game state (inning, outs, runners on base), how many runs should we expect to score in the remainder of the inning?
The run expectancy matrix is constructed by examining thousands of games and calculating the average runs scored from each of the 24 possible base-out states (3 out states × 8 base configurations).
Here's the run expectancy matrix based on recent MLB data (2020-2024):
| Bases | 0 Outs | 1 Out | 2 Outs |
|---|---|---|---|
| Empty | 0.481 | 0.254 | 0.098 |
| 1st | 0.859 | 0.509 | 0.214 |
| 2nd | 1.100 | 0.664 | 0.315 |
| 3rd | 1.350 | 0.897 | 0.361 |
| 1st & 2nd | 1.437 | 0.908 | 0.430 |
| 1st & 3rd | 1.784 | 1.172 | 0.494 |
| 2nd & 3rd | 1.946 | 1.352 | 0.578 |
| Bases Loaded | 2.282 | 1.541 | 0.736 |
This table reveals important strategic insights:
- Each out is costly: Going from 0 to 1 out with bases empty reduces expected runs from 0.481 to 0.254—nearly a 50% reduction.
- Runners matter more with fewer outs: A runner on second with 0 outs is worth 1.100 runs, but only 0.315 with 2 outs.
- First and third is valuable: Despite the double-play risk that comes with the runner on first, first and third with 1 out (1.172 expected runs) is worth far more than a runner on third alone (0.897).
Calculating Run Expectancy
Let's build a run expectancy matrix from play-by-play data:
# R: Build run expectancy matrix
library(dplyr)
library(tidyr)  # separate(), pivot_wider()
# Function to create base-out state
create_state <- function(on_1b, on_2b, on_3b, outs) {
bases <- case_when(
on_1b & on_2b & on_3b ~ "123",
on_1b & on_2b ~ "12_",
on_1b & on_3b ~ "1_3",
on_2b & on_3b ~ "_23",
on_1b ~ "1__",
on_2b ~ "_2_",
on_3b ~ "__3",
TRUE ~ "___"
)
  paste0(bases, "-", outs)  # "-" separator so the underscores in the base code don't break separate()
}
# Calculate run expectancy
# Assumes pbp dataframe with: inning, outs, on_1b, on_2b, on_3b, runs_scored_rest_inning
run_expectancy <- pbp %>%
mutate(state = create_state(on_1b, on_2b, on_3b, outs)) %>%
group_by(state) %>%
summarise(
avg_runs = mean(runs_scored_rest_inning, na.rm = TRUE),
n_plays = n()
) %>%
arrange(state)
# Reshape into matrix format
re_matrix <- run_expectancy %>%
  separate(state, into = c("bases", "outs"), sep = "-") %>%
pivot_wider(names_from = outs, values_from = avg_runs, names_prefix = "out_")
print(re_matrix)
# Python: Build run expectancy matrix
import pandas as pd
import numpy as np
def create_state(row):
"""Create base-out state string"""
bases = ""
if row['on_1b']: bases += "1"
else: bases += "_"
if row['on_2b']: bases += "2"
else: bases += "_"
if row['on_3b']: bases += "3"
else: bases += "_"
return f"{bases}_{row['outs']}"
# Calculate run expectancy
# Assumes pbp dataframe with: inning, outs, on_1b, on_2b, on_3b, runs_scored_rest_inning
pbp['state'] = pbp.apply(create_state, axis=1)
run_expectancy = pbp.groupby('state').agg(
avg_runs=('runs_scored_rest_inning', 'mean'),
n_plays=('runs_scored_rest_inning', 'count')
).reset_index()
# Reshape into matrix format
run_expectancy[['bases', 'outs']] = run_expectancy['state'].str.split('-', expand=True)
re_matrix = run_expectancy.pivot(index='bases', columns='outs', values='avg_runs')
print(re_matrix)
Run Expectancy Added (RE24)
Run Expectancy Added (RE24) measures how much a player's actions changed their team's run expectancy. Each event changes the game state—the difference in expected runs between the new and old states, plus any runs that actually scored, equals the value of that play.
Formula: RE24 = Runs Scored + RE(new state) - RE(old state)
Example: Runner on first, 0 outs (RE = 0.859). Batter hits a single, advancing the runner to third. New state is first and third, 0 outs (RE = 1.784).
RE24 = 0 + 1.784 - 0.859 = +0.925 runs
If instead the batter grounded into a double play, ending the inning:
RE24 = 0 + 0.000 - 0.859 = -0.859 runs
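To check the arithmetic in code, here is a minimal Python sketch that hard-codes a few cells of the matrix above (the tuple state keys are just an illustrative encoding) and reproduces both examples:
# Python: Hand-calculate RE24 from the published matrix (illustrative)
RE = {
    ("1__", 0): 0.859,   # runner on 1st, 0 outs
    ("1_3", 0): 1.784,   # runners on 1st and 3rd, 0 outs
}
def re24(state_before, state_after, runs_scored, inning_end=False):
    """RE24 = runs scored + RE(new state) - RE(old state); RE is 0 once the inning ends."""
    re_after = 0.0 if inning_end else RE[state_after]
    return runs_scored + re_after - RE[state_before]
print(round(re24(("1__", 0), ("1_3", 0), runs_scored=0), 3))             # +0.925
print(round(re24(("1__", 0), None, runs_scored=0, inning_end=True), 3))  # -0.859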
# R: Calculate RE24 for a player's season
calculate_re24 <- function(plays, re_matrix) {
  # re_matrix: the long run_expectancy table with columns state and avg_runs
plays %>%
left_join(re_matrix, by = c("state_before" = "state")) %>%
rename(re_before = avg_runs) %>%
left_join(re_matrix, by = c("state_after" = "state")) %>%
rename(re_after = avg_runs) %>%
mutate(
re_after = ifelse(inning_end, 0, re_after), # Inning over = 0 RE
re24 = runs_scored + re_after - re_before
) %>%
group_by(batter_id, batter_name) %>%
summarise(
total_re24 = sum(re24, na.rm = TRUE),
avg_re24 = mean(re24, na.rm = TRUE),
plate_appearances = n()
) %>%
arrange(desc(total_re24))
}
# Python: Calculate RE24 for a player's season
def calculate_re24(plays, re_matrix):
"""Calculate RE24 for each play"""
# Merge RE values
plays = plays.merge(
re_matrix,
left_on='state_before',
right_index=True,
how='left'
).rename(columns={'avg_runs': 're_before'})
plays = plays.merge(
re_matrix,
left_on='state_after',
right_index=True,
how='left'
).rename(columns={'avg_runs': 're_after'})
# Inning over = 0 RE
plays['re_after'] = plays['re_after'].fillna(0)
plays.loc[plays['inning_end'], 're_after'] = 0
# Calculate RE24
plays['re24'] = plays['runs_scored'] + plays['re_after'] - plays['re_before']
# Aggregate by player
player_re24 = plays.groupby(['batter_id', 'batter_name']).agg({
're24': ['sum', 'mean', 'count']
}).reset_index()
player_re24.columns = ['batter_id', 'batter_name', 'total_re24', 'avg_re24', 'pa']
return player_re24.sort_values('total_re24', ascending=False)
Win Probability
While run expectancy focuses on runs in the current inning, win probability (WP) estimates the chance of winning the entire game based on the current situation. Win probability considers:
- Score differential: How many runs ahead or behind
- Inning: How much time remains
- Base-out state: Current run-scoring potential
- Home/away: The home team always bats last, an edge that matters most when the game is tied or close late
A win probability model is built by examining thousands of historical games and determining what percentage of teams in each situation went on to win.
Example Win Probability Scenarios:
- Home team, tied score, top of 1st, bases empty, 0 outs: 52% WP (slight home advantage)
- Home team, down 1 run, bottom of 9th, bases loaded, 1 out: 62% WP
- Home team, up 3 runs, top of 9th, bases empty, 2 outs: 98% WP
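Conceptually, building such a model is just a grouped average over historical game states. A minimal Python sketch, assuming a hypothetical game_states dataframe with one row per plate appearance and a home_win flag (all column names here are illustrative):
# Python: Empirical win probability table (sketch)
import pandas as pd
def build_wp_table(game_states: pd.DataFrame) -> pd.DataFrame:
    """Share of historical teams that went on to win from each game state.
    Assumes columns: inning, half, score_diff, base_state, outs, home_win (0/1)."""
    keys = ['inning', 'half', 'score_diff', 'base_state', 'outs']
    wp = (game_states.groupby(keys)
                     .agg(wp=('home_win', 'mean'), n=('home_win', 'size'))
                     .reset_index())
    return wp  # sparse states need smoothing or a fitted model in practice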
Win Probability Added (WPA)
Win Probability Added measures the change in win probability caused by a specific event. It's calculated as:
WPA = WP(after) - WP(before)
Example: Bottom of the 9th, score tied, bases loaded, 1 out. The home team's WP is roughly 85% (they win by scoring here or by surviving to extra innings). The batter hits a sacrifice fly to win the game. New WP = 100%.
WPA = 100% - 85% = +15% (or +0.15)
The most valuable plays in baseball have high WPA: walk-off home runs, go-ahead hits in late innings, or strikeouts that strand the bases loaded.
# R: Estimate win probability with a simple model (WPA = change in WP between events)
calculate_wp <- function(score_diff, inning, half, outs, runners) {
  # Simplified model - in practice, fit a logistic regression on historical data
  # score_diff: runs ahead (+) or behind (-) for the team whose WP we want
  # Base probability from score differential (logistic curve)
  base_wp <- 1 / (1 + exp(-0.14 * score_diff))
  # Share of the game already played (0 in the 1st, 1 by the 9th)
  inning_weight <- (inning - 1) / 8
  # Baserunner adjustment, scaled down as less of the game remains
  re_factor <- runners / 2.5 # runners on base, normalized to a typical max RE
  # Home team bonus in bottom of 9th or extras
  home_bonus <- ifelse(half == "bottom" & inning >= 9, 0.05, 0)
  # Combine factors (outs is ignored in this toy version)
  wp <- base_wp + (1 - inning_weight) * re_factor * 0.1 + home_bonus
  wp <- pmax(0.01, pmin(0.99, wp)) # Bound between 1% and 99%
  return(wp)
}
# Python: Estimate win probability with a simple model (WPA = change in WP between events)
import numpy as np
def calculate_wp(score_diff, inning, half, outs, runners):
    """
    Calculate win probability (toy model)
    Simplified - in practice, fit a logistic regression on historical data
    score_diff: runs ahead (+) or behind (-) for the team whose WP we want
    """
    # Base probability from score differential (logistic curve)
    base_wp = 1 / (1 + np.exp(-0.14 * score_diff))
    # Share of the game already played (0 in the 1st, 1 by the 9th)
    inning_weight = (inning - 1) / 8
    # Baserunner adjustment, scaled down as less of the game remains
    re_factor = runners / 2.5  # runners on base, normalized to a typical max RE
    # Home team bonus in bottom of 9th or extras
    home_bonus = 0.05 if (half == "bottom" and inning >= 9) else 0
    # Combine factors (outs is ignored in this toy version)
    wp = base_wp + (1 - inning_weight) * re_factor * 0.1 + home_bonus
    wp = np.clip(wp, 0.01, 0.99)  # Bound between 1% and 99%
    return wp
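WPA itself is then just the difference between two WP estimates. A quick illustration using the toy calculate_wp above (its output is crude, so these numbers will not match published win expectancy tables):
# Python: WPA as the change in the toy model's WP (illustrative only)
wp_before = calculate_wp(score_diff=0, inning=9, half="bottom", outs=1, runners=3)
wp_after = 0.99  # game-ending hit; the toy model caps WP at 99% rather than 100%
wpa = wp_after - wp_before
print(f"WP before: {wp_before:.2f}, WPA: {wpa:+.2f}")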
Leverage Index
Leverage Index (LI) measures how important a particular game situation is. High-leverage situations are those where the outcome significantly affects win probability; low-leverage situations have minimal impact.
Average leverage is defined as 1.0. A leverage index of 2.0 means the situation is twice as important as average; 0.5 means half as important.
High-leverage situations:
- Tied game, late innings, runners in scoring position
- One-run game, 7th inning or later
- Close games with bases loaded
Low-leverage situations:
- Large lead or deficit
- Early innings with bases empty
- Blowout games
Leverage Index helps evaluate relief pitchers, who typically enter in high-leverage situations, and provides the context needed to judge clutch performance.
# R: Calculate Leverage Index
calculate_leverage <- function(wp_before, wp_after_pos, wp_after_neg) {
  # Size of the WP swing this situation can produce, averaged over a
  # representative positive and a representative negative outcome
  swing_positive <- abs(wp_after_pos - wp_before)
  swing_negative <- abs(wp_after_neg - wp_before)
  leverage <- (swing_positive + swing_negative) / 2
  # To express this as a true Leverage Index, divide by the average swing
  # across all situations so that an average situation scores 1.0
  return(leverage)
}
# High leverage situations typically have LI > 2.0
# Bottom 9th, tie game, runner on 3rd, 2 outs: LI ~ 3.5
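The R function above returns a raw WP swing; turning it into a true index means dividing by the average swing across all situations. A small Python sketch of that normalization step (the swing values and league average below are assumed, not measured):
# Python: Leverage Index as a normalized WP swing (sketch)
import numpy as np
def leverage_index(situation_swing, avg_swing):
    """LI = situation's potential WP swing / league-average swing (so an average situation = 1.0)."""
    return np.asarray(situation_swing) / avg_swing
swings = [0.02, 0.09, 0.21]   # assumed WP swings for a low-, medium-, and high-leverage spot
league_avg_swing = 0.06       # assumed league-average swing
print(leverage_index(swings, league_avg_swing))  # roughly [0.33, 1.5, 3.5]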
Times Through the Order Penalty
One of the most important analytical discoveries of the past decade is the "times through the order penalty" (TTOP). Pitchers perform significantly worse each time they face the same batters in a game.
Average Performance by Time Through Order:
| Time | wOBA Against | ERA | K% | BB% |
|---|---|---|---|---|
| 1st | .308 | 3.68 | 22.5% | 7.8% |
| 2nd | .317 | 4.12 | 21.8% | 8.1% |
| 3rd | .330 | 4.58 | 20.9% | 8.6% |
| 4th+ | .341 | 4.95 | 20.1% | 9.2% |
By the third time through the order, pitchers allow a wOBA roughly .022 points higher than the first time, comparable to the gap between an average hitter and an All-Star. The penalty persists even after controlling for pitch count and game situation.
Why does this happen?
- Familiarity: Batters learn pitcher patterns and tendencies
- Sequencing exhaustion: Pitchers run out of new sequences to show
- Fatigue: Physical decline compounds across innings
- Stuff degradation: Velocity and movement decrease
This finding has revolutionized pitching management, leading to:
- Earlier hook for starting pitchers (fewer complete games)
- Increased use of "openers" and bulk relievers
- Strategic removal even when pitcher "looks fine"
# R: Analyze times through order penalty
analyze_ttop <- function(pitcher_data) {
pitcher_data %>%
  mutate(times_faced = factor(pmin(times_faced, 4), levels = 1:4,
                              labels = c("1st", "2nd", "3rd", "4th+"))) %>%
group_by(pitcher_id, pitcher_name, times_faced) %>%
summarise(
pa = n(),
woba = sum(woba_value, na.rm = TRUE) / sum(woba_denom, na.rm = TRUE),
k_rate = sum(strikeout) / n(),
bb_rate = sum(walk) / n(),
avg_velo = mean(release_speed, na.rm = TRUE)
) %>%
pivot_wider(
names_from = times_faced,
values_from = c(woba, k_rate, bb_rate, avg_velo)
) %>%
mutate(
ttop_penalty = woba_3rd - woba_1st
) %>%
arrange(desc(ttop_penalty))
}
# Python: Analyze times through order penalty
def analyze_ttop(pitcher_data):
"""Analyze times through the order penalty"""
pitcher_data['times_faced_cat'] = pitcher_data['times_faced'].clip(upper=4)
pitcher_data['times_faced_cat'] = pitcher_data['times_faced_cat'].map({
1: '1st', 2: '2nd', 3: '3rd', 4: '4th+'
})
# Calculate metrics by times faced
ttop_analysis = pitcher_data.groupby(['pitcher_id', 'pitcher_name', 'times_faced_cat']).agg({
'woba_value': 'sum',
'woba_denom': 'sum',
'strikeout': 'sum',
'walk': 'sum',
'release_speed': 'mean',
'pitcher_id': 'count'
}).rename(columns={'pitcher_id': 'pa'}).reset_index()
# Calculate rates
ttop_analysis['woba'] = ttop_analysis['woba_value'] / ttop_analysis['woba_denom']
ttop_analysis['k_rate'] = ttop_analysis['strikeout'] / ttop_analysis['pa']
ttop_analysis['bb_rate'] = ttop_analysis['walk'] / ttop_analysis['pa']
# Pivot to wide format
ttop_wide = ttop_analysis.pivot(
index=['pitcher_id', 'pitcher_name'],
columns='times_faced_cat',
values=['woba', 'k_rate', 'bb_rate', 'release_speed']
)
    # Calculate penalty (the pivot produces MultiIndex columns, so index with tuples)
    ttop_wide['ttop_penalty'] = ttop_wide[('woba', '3rd')] - ttop_wide[('woba', '1st')]
    return ttop_wide.sort_values(('ttop_penalty', ''), ascending=False)
Pitch Mix Optimization
Modern analytics allows pitchers to optimize their pitch mix based on what generates the best outcomes. Key considerations:
- Usage rate vs. effectiveness: Sometimes pitchers overuse their best pitch, making it predictable
- Platoon splits: Different pitches work better against same/opposite-handed batters
- Count leverage: Pitches that generate called strikes (fastballs) vs. swings-and-misses (breaking balls)
- Tunneling: Pitch combinations that look similar initially but diverge
# R: Analyze pitch effectiveness
analyze_pitch_mix <- function(pitcher_data) {
pitcher_data %>%
    group_by(pitcher_id, pitcher_name, pitch_type) %>%
    summarise(
      pitches = n(),
      whiff_rate = sum(description == "swinging_strike") /
        sum(description %in% c("swinging_strike", "foul", "in_play")),
      csw = sum(description %in% c("called_strike", "swinging_strike")) / n(),
      woba = sum(woba_value, na.rm = TRUE) / sum(woba_denom, na.rm = TRUE),
      avg_velo = mean(release_speed, na.rm = TRUE),
      avg_spin = mean(release_spin_rate, na.rm = TRUE)
    ) %>%
    # usage = share of the pitcher's own pitches (summarise leaves the result grouped by pitcher)
    mutate(usage = pitches / sum(pitches)) %>%
    arrange(pitcher_id, desc(pitches))
}
# Optimal mix balances usage with effectiveness
# Red flags: High-usage pitch with below-average results
# Opportunities: Underused pitch with strong results
# Python: Analyze pitch effectiveness
def analyze_pitch_mix(pitcher_data):
"""Analyze effectiveness of each pitch type"""
# Calculate swinging strikes
pitcher_data['is_whiff'] = pitcher_data['description'] == 'swinging_strike'
pitcher_data['is_swing'] = pitcher_data['description'].isin([
'swinging_strike', 'foul', 'in_play'
])
pitcher_data['is_csw'] = pitcher_data['description'].isin([
'called_strike', 'swinging_strike'
])
    # Aggregate by pitch type (named aggregation; counting rows gives the pitch total)
    pitch_analysis = pitcher_data.groupby(['pitcher_id', 'pitcher_name', 'pitch_type']).agg(
        pitches=('description', 'count'),
        is_whiff=('is_whiff', 'sum'),
        is_swing=('is_swing', 'sum'),
        is_csw=('is_csw', 'sum'),
        woba_value=('woba_value', 'sum'),
        woba_denom=('woba_denom', 'sum'),
        release_speed=('release_speed', 'mean'),
        release_spin_rate=('release_spin_rate', 'mean')
    ).reset_index()
# Calculate rates
total_pitches = pitch_analysis.groupby(['pitcher_id', 'pitcher_name'])['pitches'].transform('sum')
pitch_analysis['usage'] = pitch_analysis['pitches'] / total_pitches
pitch_analysis['whiff_rate'] = pitch_analysis['is_whiff'] / pitch_analysis['is_swing']
pitch_analysis['csw'] = pitch_analysis['is_csw'] / pitch_analysis['pitches']
pitch_analysis['woba'] = pitch_analysis['woba_value'] / pitch_analysis['woba_denom']
return pitch_analysis.sort_values(['pitcher_id', 'pitches'], ascending=[True, False])
Platoon Advantages
Pitchers typically perform better against same-handed batters due to:
- Better pitch visibility angles
- More effective breaking balls
- Traditional platoon advantage
Average Platoon Splits (RHP vs. RHB vs. LHB):
- RHP vs. RHB: .310 wOBA
- RHP vs. LHB: .325 wOBA
- LHP vs. LHB: .305 wOBA
- LHP vs. RHB: .330 wOBA
However, not all pitchers have significant platoon splits. Some are "platoon-neutral" and effective against both handedness, making them more valuable.
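Measuring a pitcher's own split is a straightforward grouped wOBA calculation. A minimal Python sketch, assuming Statcast-style p_throws and stand handedness columns alongside the woba_value and woba_denom fields used above:
# Python: Pitcher platoon splits (sketch)
import numpy as np
import pandas as pd
def platoon_splits(pitcher_data: pd.DataFrame) -> pd.DataFrame:
    """wOBA allowed versus same- and opposite-handed batters for each pitcher."""
    splits = pitcher_data.groupby(['pitcher_id', 'pitcher_name', 'p_throws', 'stand']).agg(
        pa=('woba_denom', 'sum'),
        woba_value=('woba_value', 'sum')
    ).reset_index()
    splits['woba'] = splits['woba_value'] / splits['pa']
    splits['matchup'] = np.where(splits['p_throws'] == splits['stand'], 'same', 'opposite')
    return splits[['pitcher_id', 'pitcher_name', 'matchup', 'pa', 'woba']]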
The Traditional Lineup
Traditional baseball wisdom dictates a specific batting order construction:
- Leadoff: Fast, high OBP
- Two-hole: Contact hitter who can hit-and-run
- Three-hole: Best hitter
- Cleanup: Power hitter
- Five-hole: Second-best power hitter
This construction is suboptimal. Analytics suggests a different approach.
Optimal Lineup Construction
The key insight: Your best hitters should get the most plate appearances. Lineup position dramatically affects PAs over a season:
| Lineup Spot | PA over 162 Games |
|---|---|
| 1st | 740 |
| 2nd | 735 |
| 3rd | 725 |
| 4th | 715 |
| 5th | 705 |
| 6th | 695 |
| 7th | 680 |
| 8th | 665 |
| 9th | 650 |
The leadoff hitter gets nearly 90 more PAs than the 9-hole hitter: roughly 20 games' worth of plate appearances!
Analytical Optimal Order:
- Best on-base hitter: Maximizes times on base
- Best overall hitter: Maximizes PAs for your best player
- Second-best overall hitter: Still gets high PA total
- Best power hitter: Drives in runners from top of order
- Next best hitter: Continued quality
The old wisdom of "saving" your best hitter for the 3-hole costs wins. The difference between batting 1st and 3rd is about 15 PAs—equivalent to 1-2 runs over a full season for an elite hitter.
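A rough back-of-envelope check on that figure, assuming the elite hitter is about .080 wOBA better than whoever would otherwise take those plate appearances and using the standard wRAA conversion (runs = wOBA gap / wOBA scale x PA, with a scale of roughly 1.25):
# Python: Rough run value of the extra plate appearances (illustrative assumptions)
extra_pa = 15          # PA difference between batting 1st and 3rd over a season
woba_gap = 0.080       # assumed gap between the elite hitter and the displaced hitter
woba_scale = 1.25      # typical wOBA scale
runs_gained = extra_pa * woba_gap / woba_scale
print(f"~{runs_gained:.1f} extra runs per season")  # about 1 run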
Advanced Lineup Optimization
Modern lineup optimization uses simulation to test thousands of possible orders:
# R: Simulate lineup performance
simulate_lineup <- function(lineup, n_games = 1000) {
  # lineup is a dataframe with: name, obp, slg, position
  # relies on simulate_pa() and update_bases() helpers analogous to the Python versions below
  results <- replicate(n_games, {
    runs <- 0
    batter_index <- 1  # the batting order carries over from inning to inning
    for (inning in 1:9) {
      outs <- 0
      bases <- c(0, 0, 0) # 1st, 2nd, 3rd
      while (outs < 3) {
        batter <- lineup[batter_index, ]
        # Simulate outcome based on batter stats
        outcome <- simulate_pa(batter$obp, batter$slg)
        # Update game state
        result <- update_bases(bases, outs, outcome)
        bases <- result$bases
        outs <- result$outs
        runs <- runs + result$runs_scored
        batter_index <- (batter_index %% 9) + 1
      }
    }
    runs
  })
  return(mean(results))
}
# Compare different lineup orders by simulating a random sample of batting orders
optimize_lineup <- function(players, n_orders = 1000) {
  best_runs <- -Inf
  best_lineup <- NULL
  for (i in seq_len(n_orders)) {
    candidate <- players[sample(nrow(players)), ]
    runs <- simulate_lineup(candidate)
    if (runs > best_runs) {
      best_runs <- runs
      best_lineup <- candidate
    }
  }
  list(lineup = best_lineup, runs_per_game = best_runs)
}
# Python: Simulate lineup performance
import numpy as np
from itertools import permutations
def simulate_pa(obp, slg):
"""Simulate a plate appearance outcome"""
rand = np.random.random()
if rand < obp: # On base
if np.random.random() < (slg - obp) / (1 - obp):
return 'extra_base' # Double or better
return 'single'
else:
return 'out'
def update_bases(bases, outs, outcome, runs):
"""Update base state after PA"""
if outcome == 'out':
return bases, outs + 1, runs
elif outcome == 'single':
runs += bases[2] # Score from 3rd
bases = [1, bases[0], bases[1]] # Advance
return bases, outs, runs
elif outcome == 'extra_base':
runs += sum(bases) # Score all runners
bases = [0, 1, 0] # Batter to second (simplified)
return bases, outs, runs
def simulate_lineup(lineup, n_games=1000):
"""Simulate lineup performance over many games"""
total_runs = []
for _ in range(n_games):
runs = 0
batter_index = 0
for inning in range(9):
outs = 0
bases = [0, 0, 0] # 1st, 2nd, 3rd
while outs < 3:
batter = lineup.iloc[batter_index]
outcome = simulate_pa(batter['obp'], batter['slg'])
bases, outs, runs = update_bases(bases, outs, outcome, runs)
batter_index = (batter_index + 1) % 9
total_runs.append(runs)
return np.mean(total_runs)
# Test multiple lineup orders to find optimal
def optimize_lineup(players):
"""Find optimal lineup order through simulation"""
best_score = 0
best_lineup = None
# Test sample of permutations (9! = 362,880 is too many)
for _ in range(1000):
lineup_order = np.random.permutation(len(players))
test_lineup = players.iloc[lineup_order]
score = simulate_lineup(test_lineup)
if score > best_score:
best_score = score
best_lineup = test_lineup
return best_lineup, best_score
Special Considerations
Pitcher's spot (NL, pre-DH): The pitcher batting 9th creates a "second leadoff" effect—the top of the order follows the pitcher. Some teams experimented with batting the pitcher 8th to avoid this.
Speed vs. power: Fast players are often placed leadoff, but a slow player with .380 OBP is better than a fast player with .320 OBP.
Lefty-righty balance: Alternating handedness prevents wholesale bullpen matchup hunting, though this is a minor consideration.
Stolen Base Analysis
The stolen base decision is one of baseball's most analyzable strategic choices. A successful steal improves run expectancy; a caught stealing damages it severely.
Run Expectancy Changes:
| Situation | Steal Success | Caught Stealing | Difference |
|---|---|---|---|
| 1st, 0 out | +0.241 | -0.605 | 0.846 |
| 1st, 1 out | +0.155 | -0.411 | 0.566 |
| 1st, 2 out | +0.101 | -0.214 | 0.315 |
Break-even success rate = CS penalty / (SB benefit + CS penalty)
For runner on first, 0 outs:
Break-even = 0.605 / (0.241 + 0.605) = 71.5%
You need to succeed at least about 72% of the time for a steal of second to pay off with 0 outs. The break-even stays near 70% in every out state (roughly 73% with 1 out and 68% with 2 outs), which is why teams demand high success rates before green-lighting runners.
Modern context: The break-even rate dips slightly with 2 outs, but the absolute runs at stake are smaller and fewer batters remain to drive the runner in, so the attempt matters less either way. With 0 outs, the large caught-stealing penalty makes steals risky unless the runner has an elite success rate (80%+).
# R: Calculate break-even SB rate
calculate_breakeven_sb <- function(re_matrix) {
  # re_matrix here is the long table with columns bases, outs, avg_runs
  breakeven_rates <- data.frame()
  # Runner on 1st scenarios
  for (out in 0:2) {
    re_1st <- re_matrix[re_matrix$bases == "1__" & re_matrix$outs == out, ]$avg_runs
    re_2nd <- re_matrix[re_matrix$bases == "_2_" & re_matrix$outs == out, ]$avg_runs
    # Caught stealing: the third out ends the inning, otherwise bases empty with one more out
    re_out <- if (out < 2) {
      re_matrix[re_matrix$bases == "___" & re_matrix$outs == (out + 1), ]$avg_runs
    } else {
      0
    }
    benefit <- re_2nd - re_1st
    penalty <- re_1st - re_out
    breakeven <- penalty / (benefit + penalty)
    breakeven_rates <- rbind(breakeven_rates, data.frame(
      situation = "1st base",
      outs = out,
      sb_benefit = benefit,
      cs_penalty = penalty,
      breakeven_rate = breakeven
    ))
  }
  return(breakeven_rates)
}
# Python: Calculate break-even SB rate
def calculate_breakeven_sb(re_matrix):
"""Calculate break-even stolen base success rate"""
breakeven_rates = []
# Runner on 1st scenarios
for outs in range(3):
re_1st = re_matrix.loc['1__', str(outs)]
re_2nd = re_matrix.loc['_2_', str(outs)]
if outs < 2:
re_out = re_matrix.loc['___', str(outs + 1)]
else:
re_out = 0 # Inning over
benefit = re_2nd - re_1st
penalty = re_1st - re_out
breakeven = penalty / (benefit + penalty) if (benefit + penalty) > 0 else 0
breakeven_rates.append({
'situation': '1st base',
'outs': outs,
'sb_benefit': benefit,
'cs_penalty': penalty,
'breakeven_rate': breakeven
})
return pd.DataFrame(breakeven_rates)
# Example output (using the 2020-2024 matrix above):
#   situation  outs  sb_benefit  cs_penalty  breakeven_rate
# 0  1st base     0       0.241       0.605           0.715
# 1  1st base     1       0.155       0.411           0.726
# 2  1st base     2       0.101       0.214           0.679
Going First-to-Third
Advancing from first to third on a single is one of the most valuable baserunning plays. It increases run expectancy significantly:
- Runner on 1st, 1 out: RE = 0.509
- Single, runner stops at 2nd: Runners on 1st and 2nd, 1 out = 0.908 (↑0.399)
- Single, runner to 3rd: Runners on 1st and 3rd, 1 out = 1.172 (↑0.663)
The extra base is worth +0.264 runs of expectancy, roughly two-thirds as much as the single itself.
Factors determining first-to-third success:
- Runner speed (sprint speed from Statcast)
- Ball location (shallow hit vs. deep)
- Outfielder arm strength
- Game situation (down late, need runs)
Taking Extra Bases
Similar analysis applies to all extra-base advancement decisions:
- Second to home on single
- First to home on double
- Second to third on ground ball out
The decision framework:
- Estimate success probability
- Calculate RE benefit if successful
- Calculate RE penalty if unsuccessful
- Compare expected value: (Success% × Benefit) - (Failure% × Penalty), as in the sketch below
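Using the matrix values quoted in the first-to-third example above (runner on 1st, 1 out), a minimal Python sketch of that comparison, with the success probability as an assumed input:
# Python: Expected value of sending a runner first-to-third (sketch)
def advancement_ev(p_success, re_success, re_hold, re_fail):
    """(Success% x Benefit) - (Failure% x Penalty), both measured against holding the runner."""
    benefit = re_success - re_hold
    penalty = re_hold - re_fail
    return p_success * benefit - (1 - p_success) * penalty
# Hold = 1st & 2nd, 1 out (0.908); advance = 1st & 3rd, 1 out (1.172);
# thrown out at 3rd = runner on 1st, 2 outs (0.214); the 90% success rate is assumed
ev = advancement_ev(p_success=0.90, re_success=1.172, re_hold=0.908, re_fail=0.214)
print(f"Expected value of sending the runner: {ev:+.3f} runs")  # about +0.17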
The Shift Era (2011-2022)
The defensive shift—positioning three infielders on one side of second base—was baseball's most visible analytical innovation of the 2010s.
Shift Usage Growth:
- 2011: 2,357 shifts (1.3% of PA)
- 2015: 13,299 shifts (6.7% of PA)
- 2019: 34,188 shifts (17.3% of PA)
- 2022: 50,720 shifts (25.7% of PA)
Effectiveness: Shifts reduced BABIP by approximately 20-30 points against shifted batters, saving roughly 0.5-1.0 runs per 100 PAs against extreme pull hitters.
How it worked: Spray chart analysis revealed that certain pull-heavy hitters hit 70%+ of ground balls to one side. Shifting three infielders to that side converted hits into outs.
The shift was banned in 2023 (requiring two infielders on each side of second base), creating a natural experiment in analytical value.
# R: Analyze shift effectiveness
analyze_shift_effectiveness <- function(batted_ball_data) {
batted_ball_data %>%
filter(type == "ground_ball" | type == "line_drive") %>%
group_by(batter_id, batter_name, shift_on = if_fielding_alignment == "Strategic") %>%
summarise(
balls_in_play = n(),
hits = sum(events %in% c("single", "double", "triple")),
outs = sum(events %in% c("field_out", "force_out", "double_play")),
babip = hits / balls_in_play,
woba = sum(woba_value, na.rm = TRUE) / sum(woba_denom, na.rm = TRUE)
) %>%
pivot_wider(
names_from = shift_on,
values_from = c(babip, woba, balls_in_play),
names_prefix = c("shift_", "no_shift_")
) %>%
mutate(
shift_penalty = babip_shift_TRUE - babip_shift_FALSE,
runs_saved_per_100 = shift_penalty * -100 * 1.3 # Convert BABIP to runs
) %>%
arrange(desc(runs_saved_per_100))
}
# Python: Analyze shift effectiveness
def analyze_shift_effectiveness(batted_ball_data):
"""Analyze how much shifts hurt different batters"""
# Filter to relevant batted balls
bb_data = batted_ball_data[
batted_ball_data['type'].isin(['ground_ball', 'line_drive'])
].copy()
# Define shift
bb_data['shift_on'] = bb_data['if_fielding_alignment'] == 'Strategic'
# Calculate by batter and shift status
shift_analysis = bb_data.groupby(['batter_id', 'batter_name', 'shift_on']).agg({
'events': 'count',
'woba_value': 'sum',
'woba_denom': 'sum'
}).rename(columns={'events': 'bip'}).reset_index()
# Calculate BABIP
hits = bb_data[bb_data['events'].isin(['single', 'double', 'triple'])]
hits_by_shift = hits.groupby(['batter_id', 'shift_on']).size()
shift_analysis = shift_analysis.merge(
hits_by_shift.rename('hits'),
on=['batter_id', 'shift_on'],
how='left'
)
shift_analysis['babip'] = shift_analysis['hits'] / shift_analysis['bip']
shift_analysis['woba'] = shift_analysis['woba_value'] / shift_analysis['woba_denom']
# Pivot to compare shift vs. no shift
shift_comparison = shift_analysis.pivot(
index=['batter_id', 'batter_name'],
columns='shift_on',
values=['babip', 'woba', 'bip']
)
    # Calculate penalty (the pivot produces MultiIndex columns, so index with tuples)
    shift_comparison['shift_penalty'] = (
        shift_comparison[('babip', True)] - shift_comparison[('babip', False)]
    )
    shift_comparison['runs_saved_per_100'] = shift_comparison[('shift_penalty', '')] * -100 * 1.3
    return shift_comparison.sort_values(('runs_saved_per_100', ''), ascending=False)
Outfield Positioning
While infield shifts grabbed headlines, outfield positioning is equally important. Spray chart analysis reveals:
- Pull rate: Percentage of balls hit to pull side vs. opposite field
- Depth: How far to play shallow vs. deep
- Shading: How much to shade toward gaps vs. lines
Modern teams position outfielders:
- Based on batter tendencies (spray charts)
- Based on pitcher tendencies (ground ball vs. fly ball)
- Based on count and situation
- Based on game state (prevent extra bases vs. accept singles)
No-doubles defense: Late in close games with runners on base, outfielders play deep to prevent extra-base hits, accepting that some singles will drop in front of them.
Catcher Framing
Pitch framing—catching pitches in a way that influences umpire calls—is worth 15-30 runs per season for elite framers.
What makes good framing:
- Minimal glove movement after catch
- Catching pitch at edge of zone
- "Presenting" pitch to umpire
- Consistent receiving technique
Value: Top framers turn 2-3% more borderline pitches into strikes. Across the 7,000+ taken pitches a regular catcher receives in a season, that adds up to significant run prevention.
# R: Calculate framing value
calculate_framing_value <- function(pitch_data) {
# Build model of strike probability based on location
strike_model <- glm(
called_strike ~ poly(plate_x, 2) + poly(plate_z, 2),
data = pitch_data,
family = binomial
)
# Predict expected strikes
pitch_data$expected_strike <- predict(strike_model, type = "response")
# Calculate framing runs above average
framing_runs <- pitch_data %>%
group_by(catcher_id, catcher_name) %>%
summarise(
pitches = n(),
strikes = sum(called_strike),
expected_strikes = sum(expected_strike),
extra_strikes = strikes - expected_strikes,
framing_runs = extra_strikes * 0.13 # ~0.13 runs per strike
) %>%
arrange(desc(framing_runs))
return(framing_runs)
}
When to Pinch Hit
Pinch hitting involves trading a better current at-bat for worse defensive play and potential lineup complications later. The decision depends on:
- Leverage: High-leverage situations justify using your best bench bat
- Player quality gap: Difference between starter and bench player
- Remaining innings: More innings remaining increases defensive/lineup cost
- Bench depth: How many quality bench players remain
General principle: Pinch hit in high-leverage situations (LI > 1.5) when the replacement provides significant upgrade (30+ wRC+ points) and sufficient game remains to matter.
Platoon Advantage
The most common pinch-hitting scenario is gaining platoon advantage:
Average Platoon Splits:
- Same-handed: .310 wOBA
- Opposite-handed: .325 wOBA
- Platoon advantage: ~15 points of wOBA
For extreme platoon hitters, this advantage can be 40+ points, making pinch-hitting highly valuable.
# R: Evaluate pinch-hit decision
evaluate_pinch_hit <- function(current_batter, bench_batter, leverage_index,
innings_remaining, defensive_penalty) {
# Expected runs added from hitting upgrade
woba_diff <- bench_batter$woba - current_batter$woba
runs_added_hitting <- woba_diff * 1.3 * leverage_index
# Expected runs lost from defensive downgrade (rest of game)
# Assuming ~4.5 plate appearances per 9 innings for position
innings_factor <- innings_remaining / 9
expected_pas <- 4.5 * innings_factor
  runs_lost_defense <- -defensive_penalty * expected_pas / 100  # flip sign: a negative penalty means runs given back on defense
# Net expected value
net_value <- runs_added_hitting - runs_lost_defense
decision <- ifelse(net_value > 0, "PINCH HIT", "LEAVE IN")
return(list(
runs_added_hitting = runs_added_hitting,
runs_lost_defense = runs_lost_defense,
net_value = net_value,
decision = decision
))
}
# Example: 7th inning, LI = 2.0, innings remaining = 3
# Current batter: .300 wOBA, bench bat: .340 wOBA (vs. RHP)
# Defensive penalty: -5 runs/100 PAs (bench player is worse defender)
# Result: +0.104 hitting, -0.075 defense = +0.029 net → PINCH HIT
# Python: Evaluate pinch-hit decision
def evaluate_pinch_hit(current_batter_woba, bench_batter_woba, leverage_index,
innings_remaining, defensive_penalty):
"""
Evaluate whether to pinch hit
defensive_penalty: runs per 100 PAs (negative = worse defender)
"""
# Expected runs added from hitting upgrade
woba_diff = bench_batter_woba - current_batter_woba
runs_added_hitting = woba_diff * 1.3 * leverage_index
# Expected runs lost from defensive downgrade
innings_factor = innings_remaining / 9
expected_pas = 4.5 * innings_factor
    runs_lost_defense = -defensive_penalty * expected_pas / 100  # flip sign: a negative penalty means runs given back on defense
# Net expected value
net_value = runs_added_hitting - runs_lost_defense
decision = "PINCH HIT" if net_value > 0 else "LEAVE IN"
return {
'runs_added_hitting': runs_added_hitting,
'runs_lost_defense': runs_lost_defense,
'net_value': net_value,
'decision': decision
}
# Example usage
result = evaluate_pinch_hit(
current_batter_woba=0.300,
bench_batter_woba=0.340,
leverage_index=2.0,
innings_remaining=3,
defensive_penalty=-5 # Bench player is worse defender
)
print(f"Decision: {result['decision']}")
print(f"Net value: {result['net_value']:.3f} runs")
Double Switch Strategy
In leagues without the DH, the double switch—simultaneously substituting a position player and pitcher while swapping their lineup positions—allows managers to optimize the pitcher's spot in the order.
Classic scenario: The pitcher's spot (batting 9th) is due up second next inning, and the No. 7 hitter has just made the final out of the current inning. The manager replaces both players: the new position player takes over the 9th spot, and the new pitcher goes into the 7th spot, which just batted.
Instead of coming up second next inning, the pitcher's spot is now eight hitters away, often sparing the manager a pinch-hitting decision his next time around.
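A tiny Python sketch of the bookkeeping behind that delay, using the scenario above (the lineup slots and helper name are illustrative):
# Python: How far away is the pitcher's next plate appearance? (illustrative)
def batters_until(leadoff_slot, target_slot):
    """Number of hitters due up before target_slot bats, given which slot (1-9) leads off next inning."""
    return (target_slot - leadoff_slot) % 9
leadoff_next = 8  # the 8th hitter leads off, so the 9th spot (pitcher) is due up second
print(batters_until(leadoff_next, 9))  # straight substitution: pitcher's spot is 1 batter away
print(batters_until(leadoff_next, 7))  # double switch into the 7th spot: 8 batters away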
Exercise 1: Run Expectancy Calculation
Using the run expectancy matrix provided in section 16.1:
a) Calculate the RE24 value for each of these plays:
- Runner on 2nd, 1 out. Batter doubles, scoring the runner.
- Bases loaded, 0 outs. Batter grounds into double play, runner scores from third.
- Runner on 1st, 2 outs. Batter strikes out.
b) A manager must decide: runner on 2nd, 0 outs, bottom of 9th, tie game. Should he:
- Bunt the runner to 3rd (assume 90% success rate)
- Swing away
Calculate the expected runs for each strategy and recommend the optimal decision.
c) Write code to calculate season-long RE24 for all batters in a dataset, identifying the players who added the most run expectancy above average.
Exercise 2: Stolen Base Break-Even Analysis
a) Using the run expectancy matrix:
- Calculate the break-even stolen base success rate for a runner on 1st with 0 outs
- Calculate the break-even rate for a runner on 2nd with 1 out (attempting to steal 3rd)
- Which situation requires a higher success rate? Why?
b) A baserunner has an 82% career stolen base success rate. In which base-out states should he attempt to steal? Consider all combinations of runner on 1st (with 0, 1, and 2 outs).
c) Download Statcast sprint speed data. Analyze the correlation between sprint speed and stolen base success rate. What is the minimum sprint speed typically required to maintain a 75% success rate?
d) Write a function that takes a runner's success rate and recommends whether to steal in different base-out states based on run expectancy analysis.
Exercise 3: Lineup Optimization Simulation
a) Create a dataset of 9 batters with realistic OBP and SLG values representing a team lineup.
b) Write a simulation function that:
- Simulates 1,000 games with a given lineup order
- Tracks total runs scored
- Returns average runs per game
c) Test at least 5 different lineup configurations:
- Traditional order (best hitter 3rd, power hitter 4th)
- Analytical optimal (best hitters 1st and 2nd)
- Random orders
d) Calculate how many additional runs per 162 games the optimal lineup generates compared to the traditional lineup. Convert this to wins using the Pythagorean expectation (typically 10 runs ≈ 1 win).
e) BONUS: Implement a genetic algorithm or simulated annealing approach to find the true optimal lineup order without testing all 362,880 possible permutations.
Exercise 4: Defensive Shift Analysis
a) Using Statcast data with spray angle and shift alignment information:
- Identify the 10 batters most hurt by defensive shifts (largest BABIP penalty)
- Identify the 10 batters least affected by shifts
b) Calculate the relationship between:
- Pull rate (percentage of balls hit to pull side)
- Ground ball rate
- Shift frequency faced
- BABIP penalty from shifts
c) Build a model to predict which batters would benefit most from the 2023 shift ban. Estimate total hits gained per season.
d) Analyze the 2023 season (first year of shift ban) compared to 2022:
- Did league-wide BABIP increase?
- Did pull-heavy hitters see larger BABIP increases?
- Were your predictions accurate?
e) Write a function that recommends whether to shift against a specific batter based on their spray chart data, ground ball rate, and speed. The function should consider that shifts must be legal (two infielders each side of second base as of 2023).
Summary
In-game strategy represents where analytics meets real-time decision-making. The frameworks we've explored—run expectancy, win probability, lineup optimization, baserunning analysis, and defensive positioning—transform gut decisions into quantifiable choices.
Key takeaways:
- Run expectancy matrices provide the foundation for valuing different game states and measuring player contributions beyond traditional stats.
- Win probability shifts focus from runs to actual winning, accounting for context and game situation.
- Times through the order penalty justifies earlier pitcher removal, revolutionizing bullpen usage.
- Optimal lineups maximize plate appearances for best hitters, not arbitrary traditional positions.
- Baserunning decisions require high success rates to justify the risk, with break-even rates often above 70%.
- Defensive positioning based on data can save dozens of hits per season, though the shift ban has limited some applications.
- Pinch-hitting balances immediate upgrade against future defensive/lineup costs.
The modern manager increasingly relies on these analytical tools, supported by front office analysts providing real-time recommendations. While baseball remains a game of human execution and judgment, the framework for making optimal decisions has never been more sophisticated.
In the next chapter, we'll explore player valuation and projection systems, learning how to forecast future performance and build comprehensive metrics like WAR that capture total player value.