Chapter 22: Weather, Ballpark & Environmental Factors

Baseball is played outdoors, exposed to the elements in ways that distinguish it from arena sports. A home run at Coors Field might be a routine fly ball at Oracle Park. A ball crushed in 90-degree July heat travels farther than the same swing in 50-degree April chill. Wind, humidity, altitude, and ballpark dimensions create substantial variance in outcomes that analytics must account for when evaluating performance, forecasting results, and making strategic decisions.

Intermediate ~8 min read 7 sections 22 code examples
Book Progress
43%
Chapter 23 of 54
What You'll Learn
  • Park Factors Deep Dive
  • Weather Effects on Ball Flight
  • Altitude & Air Density: Coors Field Analysis
  • Temperature & Humidity Effects
  • And 3 more topics...
Languages in This Chapter
R (12) Python (10)

All code examples can be copied and run in your environment.

22.1 Park Factors Deep Dive

The Need for Park Adjustment

Park factors quantify how much a ballpark increases or decreases various outcomes compared to a neutral environment. A park factor of 100 is neutral, above 100 favors a particular outcome (e.g., hitting), and below 100 suppresses it.

Why Park Factors Matter:


  • Player Evaluation: A .280 hitter in Colorado might be equivalent to a .260 hitter elsewhere

  • Projections: Future performance depends partly on home ballpark

  • Contract Negotiations: Teams discount stats accumulated in extreme parks

  • Strategic Decisions: Roster construction should account for home environment

  • Historical Comparison: Comparing players across eras requires park context

Calculating Basic Park Factors

The fundamental park factor formula compares outcomes at home versus road:

Park Factor = 100 × (Home Rate / Road Rate)

However, this naive approach has issues. A better method accounts for the fact that the team plays in both environments:

Park Factor = 100 × ((Home Games) / (Road Games)) / (((Home Games + Road Games) / 2) / Road Games)

Simplified:

Park Factor = 100 × (Home Rate × Road Rate) / ((Home Rate + Road Rate) / 2)

R Implementation: Multi-Year Park Factors

# R: Calculate Park Factors
library(tidyverse)
library(Lahman)

# Calculate park factors for runs scored (2019-2023)
calculate_park_factors <- function(start_year = 2019, end_year = 2023) {

  # Get team game logs
  team_stats <- Teams %>%
    filter(yearID >= start_year, yearID <= end_year) %>%
    select(yearID, teamID, G, R, RA, HR, H, BB, SO) %>%
    mutate(
      home_games = G / 2,  # Approximation
      road_games = G / 2
    )

  # Get park-specific data
  # For demonstration, we'll calculate from team splits
  # In practice, you'd use game-by-game data

  park_factors <- team_stats %>%
    group_by(teamID) %>%
    summarise(
      years = n(),
      total_games = sum(G),
      total_runs = sum(R),
      total_hr = sum(HR),
      runs_per_game = sum(R) / sum(G),
      hr_per_game = sum(HR) / sum(G),
      .groups = 'drop'
    )

  # Calculate league averages
  lg_avg_rpg <- sum(team_stats$R) / sum(team_stats$G)
  lg_avg_hrpg <- sum(team_stats$HR) / sum(team_stats$G)

  park_factors <- park_factors %>%
    mutate(
      runs_park_factor = 100 * (runs_per_game / lg_avg_rpg),
      hr_park_factor = 100 * (hr_per_game / lg_avg_hrpg)
    ) %>%
    arrange(desc(runs_park_factor))

  return(park_factors)
}

# Calculate and display park factors
pf <- calculate_park_factors()
print(pf %>% select(teamID, runs_park_factor, hr_park_factor))

# Visualize park factors
ggplot(pf, aes(x = reorder(teamID, runs_park_factor), y = runs_park_factor)) +
  geom_col(aes(fill = runs_park_factor > 100), alpha = 0.8) +
  geom_hline(yintercept = 100, linetype = "dashed", color = "black", size = 1) +
  scale_fill_manual(values = c("steelblue", "coral"), guide = "none") +
  coord_flip() +
  labs(title = "MLB Park Factors for Runs (2019-2023)",
       subtitle = "100 = League Average",
       x = "Team",
       y = "Park Factor") +
  theme_minimal() +
  theme(axis.text = element_text(size = 10))

Python Implementation: Comprehensive Park Factors

# Python: Advanced Park Factor Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pybaseball import statcast, team_batting, team_pitching

# Define park characteristics (approximate dimensions)
park_dimensions = pd.DataFrame({
    'park': ['Coors Field', 'Yankee Stadium', 'Fenway Park', 'Oracle Park',
             'Petco Park', 'Dodger Stadium', 'Wrigley Field', 'Minute Maid Park'],
    'team': ['COL', 'NYY', 'BOS', 'SFG', 'SDP', 'LAD', 'CHC', 'HOU'],
    'lf_distance': [347, 318, 310, 339, 334, 330, 355, 315],
    'cf_distance': [415, 408, 390, 399, 396, 395, 400, 409],
    'rf_distance': [350, 314, 302, 309, 322, 330, 353, 326],
    'lf_wall_height': [8, 8, 37, 8, 8, 3.75, 11, 19],
    'elevation': [5200, 55, 20, 0, 20, 340, 595, 50],
    'known_factor': ['Extreme hitter', 'RH hitter', 'LH hitter', 'Extreme pitcher',
                     'Pitcher friendly', 'Pitcher friendly', 'Neutral', 'LH hitter']
})

print("MLB Park Dimensions and Characteristics:")
print(park_dimensions.to_string(index=False))

# Simulate park factors based on real tendencies
np.random.seed(42)

park_factors_data = {
    'park': ['Coors Field', 'Globe Life Field', 'Great American Ball Park',
             'Rogers Centre', 'Yankee Stadium', 'Camden Yards', 'Fenway Park',
             'Wrigley Field', 'Chase Field', 'Minute Maid Park',
             'T-Mobile Park', 'Target Field', 'Guaranteed Rate Field',
             'Kauffman Stadium', 'Dodger Stadium', 'Petco Park', 'Oracle Park',
             'Truist Park', 'Nationals Park', 'Progressive Field'],
    'team': ['COL', 'TEX', 'CIN', 'TOR', 'NYY', 'BAL', 'BOS', 'CHC', 'ARI', 'HOU',
             'SEA', 'MIN', 'CHW', 'KCR', 'LAD', 'SDP', 'SFG', 'ATL', 'WSN', 'CLE'],
    'runs_pf': [115, 108, 107, 104, 103, 103, 102, 101, 100, 100,
                99, 98, 98, 97, 96, 94, 92, 100, 99, 97],
    'hr_pf': [122, 110, 112, 105, 108, 106, 104, 102, 98, 103,
              96, 97, 99, 95, 97, 89, 85, 101, 100, 98],
    'hits_pf': [112, 106, 104, 103, 102, 102, 103, 101, 101, 99,
                100, 99, 98, 98, 97, 95, 94, 100, 99, 97],
    'doubles_pf': [109, 105, 103, 104, 100, 103, 107, 102, 99, 98,
                   102, 98, 97, 99, 96, 96, 92, 101, 100, 99],
    'triples_pf': [118, 95, 98, 92, 88, 97, 85, 103, 105, 94,
                   108, 102, 101, 112, 99, 102, 104, 96, 98, 100]
}

park_factors_df = pd.DataFrame(park_factors_data)

# Calculate aggregate offensive park factor
park_factors_df['overall_pf'] = (
    park_factors_df['runs_pf'] * 0.4 +
    park_factors_df['hr_pf'] * 0.3 +
    park_factors_df['hits_pf'] * 0.3
)

park_factors_df = park_factors_df.sort_values('overall_pf', ascending=False)

print("\nPark Factors Summary (2019-2023):")
print(park_factors_df[['park', 'runs_pf', 'hr_pf', 'overall_pf']].to_string(index=False))

# Visualize multi-dimensional park factors
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Runs vs HR park factors
axes[0].scatter(park_factors_df['runs_pf'], park_factors_df['hr_pf'],
                s=100, alpha=0.6, c=park_factors_df['overall_pf'],
                cmap='RdYlGn_r')
axes[0].axhline(y=100, color='gray', linestyle='--', alpha=0.5)
axes[0].axvline(x=100, color='gray', linestyle='--', alpha=0.5)

# Annotate extreme parks
for idx, row in park_factors_df.iterrows():
    if row['runs_pf'] > 108 or row['runs_pf'] < 94:
        axes[0].annotate(row['team'], (row['runs_pf'], row['hr_pf']),
                        fontsize=8, alpha=0.7)

axes[0].set_xlabel('Runs Park Factor')
axes[0].set_ylabel('Home Runs Park Factor')
axes[0].set_title('Park Factors: Runs vs Home Runs')
axes[0].grid(True, alpha=0.3)

# Overall park factor ranking
park_factors_df_sorted = park_factors_df.sort_values('overall_pf')
colors = ['green' if x < 100 else 'red' for x in park_factors_df_sorted['overall_pf']]
axes[1].barh(range(len(park_factors_df_sorted)),
             park_factors_df_sorted['overall_pf'],
             color=colors, alpha=0.6)
axes[1].set_yticks(range(len(park_factors_df_sorted)))
axes[1].set_yticklabels(park_factors_df_sorted['team'], fontsize=8)
axes[1].axvline(x=100, color='black', linestyle='--', linewidth=2)
axes[1].set_xlabel('Overall Park Factor')
axes[1].set_title('Overall Offensive Park Factors')
axes[1].grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.savefig('park_factors_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

# Statistical significance of park effects
from scipy import stats

def assess_park_significance(home_rpg, road_rpg, games=81):
    """
    Test if park effect is statistically significant
    """
    # Assuming Poisson distribution for runs
    # Standard error for rate difference
    se = np.sqrt((home_rpg + road_rpg) / games)
    z_score = (home_rpg - road_rpg) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

    return {
        'z_score': z_score,
        'p_value': p_value,
        'significant': p_value < 0.05
    }

# Example: Coors Field
coors_home_rpg = 5.8
coors_road_rpg = 4.7

significance = assess_park_significance(coors_home_rpg, coors_road_rpg)
print(f"\nCoors Field Park Effect Significance:")
print(f"Z-score: {significance['z_score']:.2f}")
print(f"P-value: {significance['p_value']:.4f}")
print(f"Statistically significant: {significance['significant']}")

Advanced Park Factors: Handedness and Batted Ball Type

Modern park factors account for batter handedness and batted ball outcomes:

# Python: Handedness-Specific Park Factors
def calculate_handedness_park_factors():
    """
    Calculate park factors by batter handedness
    """
    # Example data structure
    handedness_pf = pd.DataFrame({
        'park': ['Yankee Stadium', 'Fenway Park', 'Oracle Park', 'Coors Field',
                 'Petco Park', 'Dodger Stadium'],
        'team': ['NYY', 'BOS', 'SFG', 'COL', 'SDP', 'LAD'],
        'rhb_hr_pf': [125, 95, 82, 118, 88, 95],
        'lhb_hr_pf': [88, 115, 90, 126, 92, 98],
        'rhb_overall_pf': [108, 98, 90, 113, 92, 97],
        'lhb_overall_pf': [95, 107, 93, 117, 95, 96]
    })

    print("Park Factors by Batter Handedness:")
    print(handedness_pf.to_string(index=False))

    # Visualize the handedness effect
    fig, ax = plt.subplots(figsize=(10, 6))

    x = np.arange(len(handedness_pf))
    width = 0.35

    ax.bar(x - width/2, handedness_pf['rhb_hr_pf'], width,
           label='RHB HR PF', alpha=0.8, color='steelblue')
    ax.bar(x + width/2, handedness_pf['lhb_hr_pf'], width,
           label='LHB HR PF', alpha=0.8, color='coral')

    ax.axhline(y=100, color='black', linestyle='--', linewidth=1)
    ax.set_xlabel('Ballpark')
    ax.set_ylabel('Home Run Park Factor')
    ax.set_title('Home Run Park Factors by Batter Handedness')
    ax.set_xticks(x)
    ax.set_xticklabels(handedness_pf['park'], rotation=45, ha='right')
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')

    plt.tight_layout()
    plt.show()

    return handedness_pf

handedness_pf = calculate_handedness_park_factors()

# Identify parks with largest handedness splits
handedness_pf['hr_split'] = abs(handedness_pf['rhb_hr_pf'] -
                                 handedness_pf['lhb_hr_pf'])
print("\nParks with Largest Handedness Effects:")
print(handedness_pf.nlargest(5, 'hr_split')[['park', 'rhb_hr_pf',
                                               'lhb_hr_pf', 'hr_split']])

Time-Varying Park Factors

Park factors change over time due to:


  • Ballpark modifications (fence distances, wall heights)

  • Climate changes

  • Surrounding construction affecting wind patterns

  • Ball specifications (deadened ball era 2021-2022)

# R: Track Park Factor Changes Over Time
library(tidyverse)

# Simulate park factor changes for select ballparks
years <- 2015:2023
parks <- c("COL", "BOS", "NYY", "SFG", "LAD")

set.seed(123)
park_trends <- expand_grid(year = years, park = parks) %>%
  mutate(
    base_pf = case_when(
      park == "COL" ~ 115,
      park == "BOS" ~ 102,
      park == "NYY" ~ 103,
      park == "SFG" ~ 92,
      park == "LAD" ~ 96
    ),
    # Add trend and noise
    trend = case_when(
      park == "COL" & year >= 2019 ~ -1.5,  # Humidor effect
      park == "SFG" & year >= 2020 ~ -2,     # Oracle Park settling
      TRUE ~ 0
    ),
    noise = rnorm(n(), 0, 2),
    runs_pf = base_pf + (year - 2015) * trend + noise
  )

# Visualize trends
ggplot(park_trends, aes(x = year, y = runs_pf, color = park)) +
  geom_line(size = 1.2) +
  geom_point(size = 2.5) +
  geom_hline(yintercept = 100, linetype = "dashed", color = "gray50") +
  scale_color_brewer(palette = "Set1", name = "Ballpark") +
  labs(title = "Park Factor Evolution Over Time",
       subtitle = "Notable: Coors Field humidor impact (2019)",
       x = "Year",
       y = "Runs Park Factor") +
  theme_minimal() +
  theme(legend.position = "right")

# Statistical test for trend
coors_trend <- park_trends %>%
  filter(park == "COL")

coors_model <- lm(runs_pf ~ year, data = coors_trend)
summary(coors_model)

cat("\nCoors Field Trend Analysis:\n")
cat(sprintf("Slope: %.2f runs PF per year\n", coef(coors_model)[2]))
cat(sprintf("P-value: %.4f\n", summary(coors_model)$coefficients[2, 4]))

R
Park Factor = 100 × (Home Rate / Road Rate)
R
Park Factor = 100 × ((Home Games) / (Road Games)) / (((Home Games + Road Games) / 2) / Road Games)
R
Park Factor = 100 × (Home Rate × Road Rate) / ((Home Rate + Road Rate) / 2)
R
# R: Calculate Park Factors
library(tidyverse)
library(Lahman)

# Calculate park factors for runs scored (2019-2023)
calculate_park_factors <- function(start_year = 2019, end_year = 2023) {

  # Get team game logs
  team_stats <- Teams %>%
    filter(yearID >= start_year, yearID <= end_year) %>%
    select(yearID, teamID, G, R, RA, HR, H, BB, SO) %>%
    mutate(
      home_games = G / 2,  # Approximation
      road_games = G / 2
    )

  # Get park-specific data
  # For demonstration, we'll calculate from team splits
  # In practice, you'd use game-by-game data

  park_factors <- team_stats %>%
    group_by(teamID) %>%
    summarise(
      years = n(),
      total_games = sum(G),
      total_runs = sum(R),
      total_hr = sum(HR),
      runs_per_game = sum(R) / sum(G),
      hr_per_game = sum(HR) / sum(G),
      .groups = 'drop'
    )

  # Calculate league averages
  lg_avg_rpg <- sum(team_stats$R) / sum(team_stats$G)
  lg_avg_hrpg <- sum(team_stats$HR) / sum(team_stats$G)

  park_factors <- park_factors %>%
    mutate(
      runs_park_factor = 100 * (runs_per_game / lg_avg_rpg),
      hr_park_factor = 100 * (hr_per_game / lg_avg_hrpg)
    ) %>%
    arrange(desc(runs_park_factor))

  return(park_factors)
}

# Calculate and display park factors
pf <- calculate_park_factors()
print(pf %>% select(teamID, runs_park_factor, hr_park_factor))

# Visualize park factors
ggplot(pf, aes(x = reorder(teamID, runs_park_factor), y = runs_park_factor)) +
  geom_col(aes(fill = runs_park_factor > 100), alpha = 0.8) +
  geom_hline(yintercept = 100, linetype = "dashed", color = "black", size = 1) +
  scale_fill_manual(values = c("steelblue", "coral"), guide = "none") +
  coord_flip() +
  labs(title = "MLB Park Factors for Runs (2019-2023)",
       subtitle = "100 = League Average",
       x = "Team",
       y = "Park Factor") +
  theme_minimal() +
  theme(axis.text = element_text(size = 10))
R
# R: Track Park Factor Changes Over Time
library(tidyverse)

# Simulate park factor changes for select ballparks
years <- 2015:2023
parks <- c("COL", "BOS", "NYY", "SFG", "LAD")

set.seed(123)
park_trends <- expand_grid(year = years, park = parks) %>%
  mutate(
    base_pf = case_when(
      park == "COL" ~ 115,
      park == "BOS" ~ 102,
      park == "NYY" ~ 103,
      park == "SFG" ~ 92,
      park == "LAD" ~ 96
    ),
    # Add trend and noise
    trend = case_when(
      park == "COL" & year >= 2019 ~ -1.5,  # Humidor effect
      park == "SFG" & year >= 2020 ~ -2,     # Oracle Park settling
      TRUE ~ 0
    ),
    noise = rnorm(n(), 0, 2),
    runs_pf = base_pf + (year - 2015) * trend + noise
  )

# Visualize trends
ggplot(park_trends, aes(x = year, y = runs_pf, color = park)) +
  geom_line(size = 1.2) +
  geom_point(size = 2.5) +
  geom_hline(yintercept = 100, linetype = "dashed", color = "gray50") +
  scale_color_brewer(palette = "Set1", name = "Ballpark") +
  labs(title = "Park Factor Evolution Over Time",
       subtitle = "Notable: Coors Field humidor impact (2019)",
       x = "Year",
       y = "Runs Park Factor") +
  theme_minimal() +
  theme(legend.position = "right")

# Statistical test for trend
coors_trend <- park_trends %>%
  filter(park == "COL")

coors_model <- lm(runs_pf ~ year, data = coors_trend)
summary(coors_model)

cat("\nCoors Field Trend Analysis:\n")
cat(sprintf("Slope: %.2f runs PF per year\n", coef(coors_model)[2]))
cat(sprintf("P-value: %.4f\n", summary(coors_model)$coefficients[2, 4]))
Python
# Python: Advanced Park Factor Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pybaseball import statcast, team_batting, team_pitching

# Define park characteristics (approximate dimensions)
park_dimensions = pd.DataFrame({
    'park': ['Coors Field', 'Yankee Stadium', 'Fenway Park', 'Oracle Park',
             'Petco Park', 'Dodger Stadium', 'Wrigley Field', 'Minute Maid Park'],
    'team': ['COL', 'NYY', 'BOS', 'SFG', 'SDP', 'LAD', 'CHC', 'HOU'],
    'lf_distance': [347, 318, 310, 339, 334, 330, 355, 315],
    'cf_distance': [415, 408, 390, 399, 396, 395, 400, 409],
    'rf_distance': [350, 314, 302, 309, 322, 330, 353, 326],
    'lf_wall_height': [8, 8, 37, 8, 8, 3.75, 11, 19],
    'elevation': [5200, 55, 20, 0, 20, 340, 595, 50],
    'known_factor': ['Extreme hitter', 'RH hitter', 'LH hitter', 'Extreme pitcher',
                     'Pitcher friendly', 'Pitcher friendly', 'Neutral', 'LH hitter']
})

print("MLB Park Dimensions and Characteristics:")
print(park_dimensions.to_string(index=False))

# Simulate park factors based on real tendencies
np.random.seed(42)

park_factors_data = {
    'park': ['Coors Field', 'Globe Life Field', 'Great American Ball Park',
             'Rogers Centre', 'Yankee Stadium', 'Camden Yards', 'Fenway Park',
             'Wrigley Field', 'Chase Field', 'Minute Maid Park',
             'T-Mobile Park', 'Target Field', 'Guaranteed Rate Field',
             'Kauffman Stadium', 'Dodger Stadium', 'Petco Park', 'Oracle Park',
             'Truist Park', 'Nationals Park', 'Progressive Field'],
    'team': ['COL', 'TEX', 'CIN', 'TOR', 'NYY', 'BAL', 'BOS', 'CHC', 'ARI', 'HOU',
             'SEA', 'MIN', 'CHW', 'KCR', 'LAD', 'SDP', 'SFG', 'ATL', 'WSN', 'CLE'],
    'runs_pf': [115, 108, 107, 104, 103, 103, 102, 101, 100, 100,
                99, 98, 98, 97, 96, 94, 92, 100, 99, 97],
    'hr_pf': [122, 110, 112, 105, 108, 106, 104, 102, 98, 103,
              96, 97, 99, 95, 97, 89, 85, 101, 100, 98],
    'hits_pf': [112, 106, 104, 103, 102, 102, 103, 101, 101, 99,
                100, 99, 98, 98, 97, 95, 94, 100, 99, 97],
    'doubles_pf': [109, 105, 103, 104, 100, 103, 107, 102, 99, 98,
                   102, 98, 97, 99, 96, 96, 92, 101, 100, 99],
    'triples_pf': [118, 95, 98, 92, 88, 97, 85, 103, 105, 94,
                   108, 102, 101, 112, 99, 102, 104, 96, 98, 100]
}

park_factors_df = pd.DataFrame(park_factors_data)

# Calculate aggregate offensive park factor
park_factors_df['overall_pf'] = (
    park_factors_df['runs_pf'] * 0.4 +
    park_factors_df['hr_pf'] * 0.3 +
    park_factors_df['hits_pf'] * 0.3
)

park_factors_df = park_factors_df.sort_values('overall_pf', ascending=False)

print("\nPark Factors Summary (2019-2023):")
print(park_factors_df[['park', 'runs_pf', 'hr_pf', 'overall_pf']].to_string(index=False))

# Visualize multi-dimensional park factors
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Runs vs HR park factors
axes[0].scatter(park_factors_df['runs_pf'], park_factors_df['hr_pf'],
                s=100, alpha=0.6, c=park_factors_df['overall_pf'],
                cmap='RdYlGn_r')
axes[0].axhline(y=100, color='gray', linestyle='--', alpha=0.5)
axes[0].axvline(x=100, color='gray', linestyle='--', alpha=0.5)

# Annotate extreme parks
for idx, row in park_factors_df.iterrows():
    if row['runs_pf'] > 108 or row['runs_pf'] < 94:
        axes[0].annotate(row['team'], (row['runs_pf'], row['hr_pf']),
                        fontsize=8, alpha=0.7)

axes[0].set_xlabel('Runs Park Factor')
axes[0].set_ylabel('Home Runs Park Factor')
axes[0].set_title('Park Factors: Runs vs Home Runs')
axes[0].grid(True, alpha=0.3)

# Overall park factor ranking
park_factors_df_sorted = park_factors_df.sort_values('overall_pf')
colors = ['green' if x < 100 else 'red' for x in park_factors_df_sorted['overall_pf']]
axes[1].barh(range(len(park_factors_df_sorted)),
             park_factors_df_sorted['overall_pf'],
             color=colors, alpha=0.6)
axes[1].set_yticks(range(len(park_factors_df_sorted)))
axes[1].set_yticklabels(park_factors_df_sorted['team'], fontsize=8)
axes[1].axvline(x=100, color='black', linestyle='--', linewidth=2)
axes[1].set_xlabel('Overall Park Factor')
axes[1].set_title('Overall Offensive Park Factors')
axes[1].grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.savefig('park_factors_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

# Statistical significance of park effects
from scipy import stats

def assess_park_significance(home_rpg, road_rpg, games=81):
    """
    Test if park effect is statistically significant
    """
    # Assuming Poisson distribution for runs
    # Standard error for rate difference
    se = np.sqrt((home_rpg + road_rpg) / games)
    z_score = (home_rpg - road_rpg) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

    return {
        'z_score': z_score,
        'p_value': p_value,
        'significant': p_value < 0.05
    }

# Example: Coors Field
coors_home_rpg = 5.8
coors_road_rpg = 4.7

significance = assess_park_significance(coors_home_rpg, coors_road_rpg)
print(f"\nCoors Field Park Effect Significance:")
print(f"Z-score: {significance['z_score']:.2f}")
print(f"P-value: {significance['p_value']:.4f}")
print(f"Statistically significant: {significance['significant']}")
Python
# Python: Handedness-Specific Park Factors
def calculate_handedness_park_factors():
    """
    Calculate park factors by batter handedness
    """
    # Example data structure
    handedness_pf = pd.DataFrame({
        'park': ['Yankee Stadium', 'Fenway Park', 'Oracle Park', 'Coors Field',
                 'Petco Park', 'Dodger Stadium'],
        'team': ['NYY', 'BOS', 'SFG', 'COL', 'SDP', 'LAD'],
        'rhb_hr_pf': [125, 95, 82, 118, 88, 95],
        'lhb_hr_pf': [88, 115, 90, 126, 92, 98],
        'rhb_overall_pf': [108, 98, 90, 113, 92, 97],
        'lhb_overall_pf': [95, 107, 93, 117, 95, 96]
    })

    print("Park Factors by Batter Handedness:")
    print(handedness_pf.to_string(index=False))

    # Visualize the handedness effect
    fig, ax = plt.subplots(figsize=(10, 6))

    x = np.arange(len(handedness_pf))
    width = 0.35

    ax.bar(x - width/2, handedness_pf['rhb_hr_pf'], width,
           label='RHB HR PF', alpha=0.8, color='steelblue')
    ax.bar(x + width/2, handedness_pf['lhb_hr_pf'], width,
           label='LHB HR PF', alpha=0.8, color='coral')

    ax.axhline(y=100, color='black', linestyle='--', linewidth=1)
    ax.set_xlabel('Ballpark')
    ax.set_ylabel('Home Run Park Factor')
    ax.set_title('Home Run Park Factors by Batter Handedness')
    ax.set_xticks(x)
    ax.set_xticklabels(handedness_pf['park'], rotation=45, ha='right')
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')

    plt.tight_layout()
    plt.show()

    return handedness_pf

handedness_pf = calculate_handedness_park_factors()

# Identify parks with largest handedness splits
handedness_pf['hr_split'] = abs(handedness_pf['rhb_hr_pf'] -
                                 handedness_pf['lhb_hr_pf'])
print("\nParks with Largest Handedness Effects:")
print(handedness_pf.nlargest(5, 'hr_split')[['park', 'rhb_hr_pf',
                                               'lhb_hr_pf', 'hr_split']])

22.2 Weather Effects on Ball Flight

Physics of Batted Ball Flight

A baseball in flight experiences three primary forces:


  1. Gravity: Constant downward acceleration (9.8 m/s²)

  2. Drag: Air resistance proportional to velocity squared

  3. Magnus Force: Spin-induced lift (or drop)

The drag force equation:

F_drag = 0.5 × ρ × C_d × A × v²

Where:


  • ρ = air density (kg/m³)

  • C_d = drag coefficient (~0.3-0.5 for baseball)

  • A = cross-sectional area (0.00426 m² for baseball)

  • v = velocity (m/s)

Air density depends on temperature, pressure, and humidity:

ρ = (P × M) / (R × T)

Where:


  • P = pressure (Pa)

  • M = molar mass of air (kg/mol)

  • R = gas constant (8.314 J/(mol·K))

  • T = temperature (K)

R Implementation: Ball Flight Simulator

# R: Baseball Flight Physics Simulator
library(tidyverse)

# Constants
g <- 9.8  # gravity (m/s^2)
R_gas <- 8.314  # gas constant
M_air <- 0.029  # molar mass of dry air (kg/mol)
baseball_area <- 0.00426  # cross-sectional area (m^2)
baseball_mass <- 0.145  # mass (kg)
C_d <- 0.35  # drag coefficient

# Calculate air density
calculate_air_density <- function(temp_f, pressure_mb, humidity_pct) {
  # Convert to SI units
  temp_k <- (temp_f - 32) * 5/9 + 273.15
  pressure_pa <- pressure_mb * 100

  # Calculate saturation vapor pressure (Clausius-Clapeyron)
  e_s <- 611.2 * exp(17.67 * (temp_k - 273.15) / (temp_k - 29.65))

  # Actual vapor pressure
  e <- (humidity_pct / 100) * e_s

  # Partial pressure of dry air
  p_d <- pressure_pa - e

  # Air density (accounting for humidity)
  rho <- (p_d * M_air + e * 0.018) / (R_gas * temp_k)

  return(rho)
}

# Simulate ball flight
simulate_batted_ball <- function(exit_velo_mph, launch_angle_deg,
                                 temp_f, pressure_mb = 1013,
                                 humidity_pct = 50, dt = 0.01) {
  # Convert inputs
  v0 <- exit_velo_mph * 0.44704  # mph to m/s
  angle <- launch_angle_deg * pi / 180

  # Initial conditions
  vx <- v0 * cos(angle)
  vy <- v0 * sin(angle)
  x <- 0
  y <- 1  # bat height

  # Air density
  rho <- calculate_air_density(temp_f, pressure_mb, humidity_pct)

  # Storage
  trajectory <- data.frame(t = 0, x = x, y = y, vx = vx, vy = vy)

  t <- 0
  while (y > 0 && t < 10) {
    # Velocity magnitude
    v <- sqrt(vx^2 + vy^2)

    # Drag force
    F_drag <- 0.5 * rho * C_d * baseball_area * v^2

    # Acceleration components
    ax <- -(F_drag / baseball_mass) * (vx / v)
    ay <- -g - (F_drag / baseball_mass) * (vy / v)

    # Update velocity
    vx <- vx + ax * dt
    vy <- vy + ay * dt

    # Update position
    x <- x + vx * dt
    y <- y + vy * dt
    t <- t + dt

    # Store
    trajectory <- rbind(trajectory, data.frame(t = t, x = x, y = y,
                                               vx = vx, vy = vy))
  }

  # Convert distance to feet
  trajectory$x_ft <- trajectory$x * 3.28084
  trajectory$y_ft <- trajectory$y * 3.28084

  return(trajectory)
}

# Simulate the same batted ball at different temperatures
exit_velo <- 105  # mph
launch_angle <- 30  # degrees

temps <- c(50, 60, 70, 80, 90, 100)
results <- list()

for (temp in temps) {
  traj <- simulate_batted_ball(exit_velo, launch_angle, temp)
  max_dist <- max(traj$x_ft)
  results[[as.character(temp)]] <- data.frame(
    temp = temp,
    distance = max_dist
  )
}

distance_vs_temp <- bind_rows(results)

print("Distance vs Temperature:")
print(distance_vs_temp)

# Visualize impact
ggplot(distance_vs_temp, aes(x = temp, y = distance)) +
  geom_line(color = "red", size = 1.5) +
  geom_point(size = 3, color = "darkred") +
  labs(title = "Impact of Temperature on Batted Ball Distance",
       subtitle = "105 mph exit velocity, 30° launch angle",
       x = "Temperature (°F)",
       y = "Distance (feet)") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

# Calculate temperature effect
temp_effect <- lm(distance ~ temp, data = distance_vs_temp)
cat(sprintf("\nEstimated effect: %.2f feet per 10°F increase\n",
            coef(temp_effect)[2] * 10))

Python Implementation: Comprehensive Weather Effects

# Python: Advanced Weather Impact Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

class BallFlightSimulator:
    """
    Simulate batted ball flight with weather effects
    """

    def __init__(self):
        self.g = 9.8  # m/s^2
        self.m = 0.145  # kg
        self.A = 0.00426  # m^2
        self.Cd = 0.35
        self.R = 8.314
        self.M_air = 0.029

    def air_density(self, temp_f, pressure_mb, humidity_pct):
        """Calculate air density accounting for temperature, pressure, humidity"""
        # Convert temperature
        temp_c = (temp_f - 32) * 5/9
        temp_k = temp_c + 273.15

        # Saturation vapor pressure (Magnus formula)
        e_s = 611.2 * np.exp(17.67 * temp_c / (temp_c + 243.5))

        # Actual vapor pressure
        e = (humidity_pct / 100) * e_s

        # Pressure in Pa
        pressure_pa = pressure_mb * 100

        # Partial pressure of dry air
        p_d = pressure_pa - e

        # Density (accounting for water vapor)
        rho = (p_d * self.M_air + e * 0.018) / (self.R * temp_k)

        return rho

    def simulate(self, exit_velo_mph, launch_angle_deg,
                 temp_f, pressure_mb=1013.25, humidity_pct=50,
                 wind_mph=0, backspin_rpm=2000):
        """
        Simulate ball flight trajectory

        Parameters:
        - exit_velo_mph: Exit velocity in mph
        - launch_angle_deg: Launch angle in degrees
        - temp_f: Temperature in Fahrenheit
        - pressure_mb: Atmospheric pressure in millibars
        - humidity_pct: Relative humidity percentage
        - wind_mph: Wind speed (positive = helping wind)
        - backspin_rpm: Backspin in RPM
        """
        # Convert units
        v0 = exit_velo_mph * 0.44704  # to m/s
        angle = np.radians(launch_angle_deg)
        wind = wind_mph * 0.44704  # to m/s
        omega = backspin_rpm * 2 * np.pi / 60  # to rad/s

        # Initial conditions
        vx0 = v0 * np.cos(angle)
        vy0 = v0 * np.sin(angle)

        # Air density
        rho = self.air_density(temp_f, pressure_mb, humidity_pct)

        # Time vector
        t = np.linspace(0, 10, 1000)

        # Differential equations
        def derivatives(state, t):
            x, y, vx, vy = state

            # Relative velocity (accounting for wind)
            vx_rel = vx - wind
            v = np.sqrt(vx_rel**2 + vy**2)

            if v == 0:
                return [vx, vy, 0, -self.g]

            # Drag force
            F_drag = 0.5 * rho * self.Cd * self.A * v**2

            # Magnus force (lift from backspin)
            # Simplified: F_magnus = S * omega * v
            S = 4.1e-4  # Magnus coefficient
            F_magnus = S * omega * v

            # Accelerations
            ax = -(F_drag / self.m) * (vx_rel / v)
            ay = -self.g - (F_drag / self.m) * (vy / v) + (F_magnus / self.m)

            return [vx, vy, ax, ay]

        # Solve ODE
        initial_state = [0, 1, vx0, vy0]
        solution = odeint(derivatives, initial_state, t)

        # Extract trajectory
        x = solution[:, 0] * 3.28084  # to feet
        y = solution[:, 1] * 3.28084  # to feet

        # Find landing point
        landing_idx = np.where(y < 0)[0]
        if len(landing_idx) > 0:
            landing_idx = landing_idx[0]
            distance = x[landing_idx]
            hang_time = t[landing_idx]
        else:
            distance = x[-1]
            hang_time = t[-1]

        return {
            'distance': distance,
            'hang_time': hang_time,
            'trajectory': pd.DataFrame({'x': x[:landing_idx+1],
                                       'y': y[:landing_idx+1],
                                       't': t[:landing_idx+1]})
        }

# Initialize simulator
sim = BallFlightSimulator()

# Standard batted ball: 105 mph, 30 degrees, 2000 rpm backspin
print("Weather Impact on Identical Batted Ball (105 mph, 30°, 2000 RPM):\n")

weather_conditions = [
    {'name': 'Cold April', 'temp': 45, 'pressure': 1020, 'humidity': 40},
    {'name': 'Cool Spring', 'temp': 60, 'pressure': 1015, 'humidity': 50},
    {'name': 'Mild Summer', 'temp': 75, 'pressure': 1013, 'humidity': 60},
    {'name': 'Hot & Humid', 'temp': 90, 'pressure': 1010, 'humidity': 80},
    {'name': 'Extreme Heat', 'temp': 105, 'pressure': 1005, 'humidity': 30}
]

results = []
for condition in weather_conditions:
    result = sim.simulate(105, 30, condition['temp'],
                         condition['pressure'], condition['humidity'])
    results.append({
        'condition': condition['name'],
        'temp': condition['temp'],
        'distance': result['distance'],
        'hang_time': result['hang_time']
    })

results_df = pd.DataFrame(results)
print(results_df.to_string(index=False))

# Calculate distance gained from cold to hot
cold_dist = results_df[results_df['temp'] == 45]['distance'].values[0]
hot_dist = results_df[results_df['temp'] == 105]['distance'].values[0]
print(f"\nDistance gained from 45°F to 105°F: {hot_dist - cold_dist:.1f} feet")
print(f"Percentage increase: {((hot_dist - cold_dist) / cold_dist * 100):.1f}%")

# Visualize results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Distance vs temperature
axes[0].plot(results_df['temp'], results_df['distance'],
             'o-', linewidth=2, markersize=8, color='orangered')
axes[0].set_xlabel('Temperature (°F)', fontsize=12)
axes[0].set_ylabel('Distance (feet)', fontsize=12)
axes[0].set_title('Ball Flight Distance vs Temperature', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Add reference line
axes[0].axhline(y=400, color='gray', linestyle='--', alpha=0.5, label='Home Run Distance')
axes[0].legend()

# Trajectory comparison (extreme conditions)
cold_traj = sim.simulate(105, 30, 45, 1020, 40)['trajectory']
hot_traj = sim.simulate(105, 30, 105, 1005, 30)['trajectory']

axes[1].plot(cold_traj['x'], cold_traj['y'], label='45°F (Cold)', linewidth=2)
axes[1].plot(hot_traj['x'], hot_traj['y'], label='105°F (Hot)', linewidth=2)
axes[1].set_xlabel('Distance (feet)', fontsize=12)
axes[1].set_ylabel('Height (feet)', fontsize=12)
axes[1].set_title('Trajectory Comparison', fontsize=14, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].set_xlim(0, 450)

plt.tight_layout()
plt.savefig('weather_ball_flight.png', dpi=300, bbox_inches='tight')
plt.show()

Real-World Weather Impact on Game Outcomes

# Python: Analyze actual weather effects on scoring
def analyze_weather_scoring_relationship():
    """
    Analyze relationship between weather and runs scored
    Using hypothetical but realistic data
    """

    # Simulate realistic game data with weather
    np.random.seed(42)
    n_games = 500

    game_data = pd.DataFrame({
        'game_id': range(n_games),
        'temperature': np.random.normal(72, 15, n_games),
        'humidity': np.random.normal(55, 20, n_games),
        'wind_speed': np.abs(np.random.normal(5, 5, n_games)),
        'pressure': np.random.normal(1013, 10, n_games)
    })

    # Clip to realistic ranges
    game_data['temperature'] = game_data['temperature'].clip(35, 105)
    game_data['humidity'] = game_data['humidity'].clip(10, 95)
    game_data['wind_speed'] = game_data['wind_speed'].clip(0, 25)

    # Generate runs with weather dependencies
    base_runs = 4.5
    temp_effect = (game_data['temperature'] - 70) * 0.03  # 0.3 runs per 10°F
    wind_effect = game_data['wind_speed'] * 0.05
    noise = np.random.normal(0, 1.5, n_games)

    game_data['total_runs'] = (base_runs + temp_effect +
                                wind_effect + noise).clip(0, None)
    game_data['total_runs'] = game_data['total_runs'].round().astype(int)

    # Statistical analysis
    from scipy.stats import pearsonr, spearmanr

    temp_corr, temp_p = pearsonr(game_data['temperature'], game_data['total_runs'])
    wind_corr, wind_p = pearsonr(game_data['wind_speed'], game_data['total_runs'])

    print("Weather-Runs Correlation Analysis:")
    print(f"Temperature: r = {temp_corr:.3f}, p = {temp_p:.4f}")
    print(f"Wind Speed: r = {wind_corr:.3f}, p = {wind_p:.4f}")

    # Regression model
    from sklearn.linear_model import LinearRegression

    X = game_data[['temperature', 'wind_speed', 'humidity', 'pressure']]
    y = game_data['total_runs']

    model = LinearRegression()
    model.fit(X, y)

    print("\nRegression Coefficients:")
    for feature, coef in zip(X.columns, model.coef_):
        print(f"{feature}: {coef:.4f}")
    print(f"R-squared: {model.score(X, y):.3f}")

    # Visualize
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    # Temperature vs runs
    axes[0, 0].scatter(game_data['temperature'], game_data['total_runs'],
                       alpha=0.5, s=20)
    axes[0, 0].plot(np.unique(game_data['temperature']),
                    np.poly1d(np.polyfit(game_data['temperature'],
                                        game_data['total_runs'], 1))(
                        np.unique(game_data['temperature'])),
                    color='red', linewidth=2)
    axes[0, 0].set_xlabel('Temperature (°F)')
    axes[0, 0].set_ylabel('Total Runs')
    axes[0, 0].set_title('Temperature vs Runs Scored')
    axes[0, 0].grid(True, alpha=0.3)

    # Wind vs runs
    axes[0, 1].scatter(game_data['wind_speed'], game_data['total_runs'],
                       alpha=0.5, s=20, color='green')
    axes[0, 1].plot(np.unique(game_data['wind_speed']),
                    np.poly1d(np.polyfit(game_data['wind_speed'],
                                        game_data['total_runs'], 1))(
                        np.unique(game_data['wind_speed'])),
                    color='darkgreen', linewidth=2)
    axes[0, 1].set_xlabel('Wind Speed (mph)')
    axes[0, 1].set_ylabel('Total Runs')
    axes[0, 1].set_title('Wind Speed vs Runs Scored')
    axes[0, 1].grid(True, alpha=0.3)

    # Humidity vs runs
    axes[1, 0].scatter(game_data['humidity'], game_data['total_runs'],
                       alpha=0.5, s=20, color='blue')
    axes[1, 0].set_xlabel('Humidity (%)')
    axes[1, 0].set_ylabel('Total Runs')
    axes[1, 0].set_title('Humidity vs Runs Scored')
    axes[1, 0].grid(True, alpha=0.3)

    # Temperature distribution by scoring level
    game_data['scoring_level'] = pd.cut(game_data['total_runs'],
                                         bins=[0, 3, 6, 100],
                                         labels=['Low (0-3)', 'Med (4-6)', 'High (7+)'])

    for level in ['Low (0-3)', 'Med (4-6)', 'High (7+)']:
        data = game_data[game_data['scoring_level'] == level]['temperature']
        axes[1, 1].hist(data, alpha=0.5, bins=20, label=level)

    axes[1, 1].set_xlabel('Temperature (°F)')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Temperature Distribution by Scoring Level')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('weather_scoring_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    return game_data, model

game_data, weather_model = analyze_weather_scoring_relationship()

R
F_drag = 0.5 × ρ × C_d × A × v²
R
ρ = (P × M) / (R × T)
R
# R: Baseball Flight Physics Simulator
library(tidyverse)

# Constants
g <- 9.8  # gravity (m/s^2)
R_gas <- 8.314  # gas constant
M_air <- 0.029  # molar mass of dry air (kg/mol)
baseball_area <- 0.00426  # cross-sectional area (m^2)
baseball_mass <- 0.145  # mass (kg)
C_d <- 0.35  # drag coefficient

# Calculate air density
calculate_air_density <- function(temp_f, pressure_mb, humidity_pct) {
  # Convert to SI units
  temp_k <- (temp_f - 32) * 5/9 + 273.15
  pressure_pa <- pressure_mb * 100

  # Calculate saturation vapor pressure (Clausius-Clapeyron)
  e_s <- 611.2 * exp(17.67 * (temp_k - 273.15) / (temp_k - 29.65))

  # Actual vapor pressure
  e <- (humidity_pct / 100) * e_s

  # Partial pressure of dry air
  p_d <- pressure_pa - e

  # Air density (accounting for humidity)
  rho <- (p_d * M_air + e * 0.018) / (R_gas * temp_k)

  return(rho)
}

# Simulate ball flight
simulate_batted_ball <- function(exit_velo_mph, launch_angle_deg,
                                 temp_f, pressure_mb = 1013,
                                 humidity_pct = 50, dt = 0.01) {
  # Convert inputs
  v0 <- exit_velo_mph * 0.44704  # mph to m/s
  angle <- launch_angle_deg * pi / 180

  # Initial conditions
  vx <- v0 * cos(angle)
  vy <- v0 * sin(angle)
  x <- 0
  y <- 1  # bat height

  # Air density
  rho <- calculate_air_density(temp_f, pressure_mb, humidity_pct)

  # Storage
  trajectory <- data.frame(t = 0, x = x, y = y, vx = vx, vy = vy)

  t <- 0
  while (y > 0 && t < 10) {
    # Velocity magnitude
    v <- sqrt(vx^2 + vy^2)

    # Drag force
    F_drag <- 0.5 * rho * C_d * baseball_area * v^2

    # Acceleration components
    ax <- -(F_drag / baseball_mass) * (vx / v)
    ay <- -g - (F_drag / baseball_mass) * (vy / v)

    # Update velocity
    vx <- vx + ax * dt
    vy <- vy + ay * dt

    # Update position
    x <- x + vx * dt
    y <- y + vy * dt
    t <- t + dt

    # Store
    trajectory <- rbind(trajectory, data.frame(t = t, x = x, y = y,
                                               vx = vx, vy = vy))
  }

  # Convert distance to feet
  trajectory$x_ft <- trajectory$x * 3.28084
  trajectory$y_ft <- trajectory$y * 3.28084

  return(trajectory)
}

# Simulate the same batted ball at different temperatures
exit_velo <- 105  # mph
launch_angle <- 30  # degrees

temps <- c(50, 60, 70, 80, 90, 100)
results <- list()

for (temp in temps) {
  traj <- simulate_batted_ball(exit_velo, launch_angle, temp)
  max_dist <- max(traj$x_ft)
  results[[as.character(temp)]] <- data.frame(
    temp = temp,
    distance = max_dist
  )
}

distance_vs_temp <- bind_rows(results)

print("Distance vs Temperature:")
print(distance_vs_temp)

# Visualize impact
ggplot(distance_vs_temp, aes(x = temp, y = distance)) +
  geom_line(color = "red", size = 1.5) +
  geom_point(size = 3, color = "darkred") +
  labs(title = "Impact of Temperature on Batted Ball Distance",
       subtitle = "105 mph exit velocity, 30° launch angle",
       x = "Temperature (°F)",
       y = "Distance (feet)") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

# Calculate temperature effect
temp_effect <- lm(distance ~ temp, data = distance_vs_temp)
cat(sprintf("\nEstimated effect: %.2f feet per 10°F increase\n",
            coef(temp_effect)[2] * 10))
Python
# Python: Advanced Weather Impact Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

class BallFlightSimulator:
    """
    Simulate batted ball flight with weather effects
    """

    def __init__(self):
        self.g = 9.8  # m/s^2
        self.m = 0.145  # kg
        self.A = 0.00426  # m^2
        self.Cd = 0.35
        self.R = 8.314
        self.M_air = 0.029

    def air_density(self, temp_f, pressure_mb, humidity_pct):
        """Calculate air density accounting for temperature, pressure, humidity"""
        # Convert temperature
        temp_c = (temp_f - 32) * 5/9
        temp_k = temp_c + 273.15

        # Saturation vapor pressure (Magnus formula)
        e_s = 611.2 * np.exp(17.67 * temp_c / (temp_c + 243.5))

        # Actual vapor pressure
        e = (humidity_pct / 100) * e_s

        # Pressure in Pa
        pressure_pa = pressure_mb * 100

        # Partial pressure of dry air
        p_d = pressure_pa - e

        # Density (accounting for water vapor)
        rho = (p_d * self.M_air + e * 0.018) / (self.R * temp_k)

        return rho

    def simulate(self, exit_velo_mph, launch_angle_deg,
                 temp_f, pressure_mb=1013.25, humidity_pct=50,
                 wind_mph=0, backspin_rpm=2000):
        """
        Simulate ball flight trajectory

        Parameters:
        - exit_velo_mph: Exit velocity in mph
        - launch_angle_deg: Launch angle in degrees
        - temp_f: Temperature in Fahrenheit
        - pressure_mb: Atmospheric pressure in millibars
        - humidity_pct: Relative humidity percentage
        - wind_mph: Wind speed (positive = helping wind)
        - backspin_rpm: Backspin in RPM
        """
        # Convert units
        v0 = exit_velo_mph * 0.44704  # to m/s
        angle = np.radians(launch_angle_deg)
        wind = wind_mph * 0.44704  # to m/s
        omega = backspin_rpm * 2 * np.pi / 60  # to rad/s

        # Initial conditions
        vx0 = v0 * np.cos(angle)
        vy0 = v0 * np.sin(angle)

        # Air density
        rho = self.air_density(temp_f, pressure_mb, humidity_pct)

        # Time vector
        t = np.linspace(0, 10, 1000)

        # Differential equations
        def derivatives(state, t):
            x, y, vx, vy = state

            # Relative velocity (accounting for wind)
            vx_rel = vx - wind
            v = np.sqrt(vx_rel**2 + vy**2)

            if v == 0:
                return [vx, vy, 0, -self.g]

            # Drag force
            F_drag = 0.5 * rho * self.Cd * self.A * v**2

            # Magnus force (lift from backspin)
            # Simplified: F_magnus = S * omega * v
            S = 4.1e-4  # Magnus coefficient
            F_magnus = S * omega * v

            # Accelerations
            ax = -(F_drag / self.m) * (vx_rel / v)
            ay = -self.g - (F_drag / self.m) * (vy / v) + (F_magnus / self.m)

            return [vx, vy, ax, ay]

        # Solve ODE
        initial_state = [0, 1, vx0, vy0]
        solution = odeint(derivatives, initial_state, t)

        # Extract trajectory
        x = solution[:, 0] * 3.28084  # to feet
        y = solution[:, 1] * 3.28084  # to feet

        # Find landing point
        landing_idx = np.where(y < 0)[0]
        if len(landing_idx) > 0:
            landing_idx = landing_idx[0]
            distance = x[landing_idx]
            hang_time = t[landing_idx]
        else:
            distance = x[-1]
            hang_time = t[-1]

        return {
            'distance': distance,
            'hang_time': hang_time,
            'trajectory': pd.DataFrame({'x': x[:landing_idx+1],
                                       'y': y[:landing_idx+1],
                                       't': t[:landing_idx+1]})
        }

# Initialize simulator
sim = BallFlightSimulator()

# Standard batted ball: 105 mph, 30 degrees, 2000 rpm backspin
print("Weather Impact on Identical Batted Ball (105 mph, 30°, 2000 RPM):\n")

weather_conditions = [
    {'name': 'Cold April', 'temp': 45, 'pressure': 1020, 'humidity': 40},
    {'name': 'Cool Spring', 'temp': 60, 'pressure': 1015, 'humidity': 50},
    {'name': 'Mild Summer', 'temp': 75, 'pressure': 1013, 'humidity': 60},
    {'name': 'Hot & Humid', 'temp': 90, 'pressure': 1010, 'humidity': 80},
    {'name': 'Extreme Heat', 'temp': 105, 'pressure': 1005, 'humidity': 30}
]

results = []
for condition in weather_conditions:
    result = sim.simulate(105, 30, condition['temp'],
                         condition['pressure'], condition['humidity'])
    results.append({
        'condition': condition['name'],
        'temp': condition['temp'],
        'distance': result['distance'],
        'hang_time': result['hang_time']
    })

results_df = pd.DataFrame(results)
print(results_df.to_string(index=False))

# Calculate distance gained from cold to hot
cold_dist = results_df[results_df['temp'] == 45]['distance'].values[0]
hot_dist = results_df[results_df['temp'] == 105]['distance'].values[0]
print(f"\nDistance gained from 45°F to 105°F: {hot_dist - cold_dist:.1f} feet")
print(f"Percentage increase: {((hot_dist - cold_dist) / cold_dist * 100):.1f}%")

# Visualize results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Distance vs temperature
axes[0].plot(results_df['temp'], results_df['distance'],
             'o-', linewidth=2, markersize=8, color='orangered')
axes[0].set_xlabel('Temperature (°F)', fontsize=12)
axes[0].set_ylabel('Distance (feet)', fontsize=12)
axes[0].set_title('Ball Flight Distance vs Temperature', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Add reference line
axes[0].axhline(y=400, color='gray', linestyle='--', alpha=0.5, label='Home Run Distance')
axes[0].legend()

# Trajectory comparison (extreme conditions)
cold_traj = sim.simulate(105, 30, 45, 1020, 40)['trajectory']
hot_traj = sim.simulate(105, 30, 105, 1005, 30)['trajectory']

axes[1].plot(cold_traj['x'], cold_traj['y'], label='45°F (Cold)', linewidth=2)
axes[1].plot(hot_traj['x'], hot_traj['y'], label='105°F (Hot)', linewidth=2)
axes[1].set_xlabel('Distance (feet)', fontsize=12)
axes[1].set_ylabel('Height (feet)', fontsize=12)
axes[1].set_title('Trajectory Comparison', fontsize=14, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].set_xlim(0, 450)

plt.tight_layout()
plt.savefig('weather_ball_flight.png', dpi=300, bbox_inches='tight')
plt.show()
Python
# Python: Analyze actual weather effects on scoring
def analyze_weather_scoring_relationship():
    """
    Analyze relationship between weather and runs scored
    Using hypothetical but realistic data
    """

    # Simulate realistic game data with weather
    np.random.seed(42)
    n_games = 500

    game_data = pd.DataFrame({
        'game_id': range(n_games),
        'temperature': np.random.normal(72, 15, n_games),
        'humidity': np.random.normal(55, 20, n_games),
        'wind_speed': np.abs(np.random.normal(5, 5, n_games)),
        'pressure': np.random.normal(1013, 10, n_games)
    })

    # Clip to realistic ranges
    game_data['temperature'] = game_data['temperature'].clip(35, 105)
    game_data['humidity'] = game_data['humidity'].clip(10, 95)
    game_data['wind_speed'] = game_data['wind_speed'].clip(0, 25)

    # Generate runs with weather dependencies
    base_runs = 4.5
    temp_effect = (game_data['temperature'] - 70) * 0.03  # 0.3 runs per 10°F
    wind_effect = game_data['wind_speed'] * 0.05
    noise = np.random.normal(0, 1.5, n_games)

    game_data['total_runs'] = (base_runs + temp_effect +
                                wind_effect + noise).clip(0, None)
    game_data['total_runs'] = game_data['total_runs'].round().astype(int)

    # Statistical analysis
    from scipy.stats import pearsonr, spearmanr

    temp_corr, temp_p = pearsonr(game_data['temperature'], game_data['total_runs'])
    wind_corr, wind_p = pearsonr(game_data['wind_speed'], game_data['total_runs'])

    print("Weather-Runs Correlation Analysis:")
    print(f"Temperature: r = {temp_corr:.3f}, p = {temp_p:.4f}")
    print(f"Wind Speed: r = {wind_corr:.3f}, p = {wind_p:.4f}")

    # Regression model
    from sklearn.linear_model import LinearRegression

    X = game_data[['temperature', 'wind_speed', 'humidity', 'pressure']]
    y = game_data['total_runs']

    model = LinearRegression()
    model.fit(X, y)

    print("\nRegression Coefficients:")
    for feature, coef in zip(X.columns, model.coef_):
        print(f"{feature}: {coef:.4f}")
    print(f"R-squared: {model.score(X, y):.3f}")

    # Visualize
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    # Temperature vs runs
    axes[0, 0].scatter(game_data['temperature'], game_data['total_runs'],
                       alpha=0.5, s=20)
    axes[0, 0].plot(np.unique(game_data['temperature']),
                    np.poly1d(np.polyfit(game_data['temperature'],
                                        game_data['total_runs'], 1))(
                        np.unique(game_data['temperature'])),
                    color='red', linewidth=2)
    axes[0, 0].set_xlabel('Temperature (°F)')
    axes[0, 0].set_ylabel('Total Runs')
    axes[0, 0].set_title('Temperature vs Runs Scored')
    axes[0, 0].grid(True, alpha=0.3)

    # Wind vs runs
    axes[0, 1].scatter(game_data['wind_speed'], game_data['total_runs'],
                       alpha=0.5, s=20, color='green')
    axes[0, 1].plot(np.unique(game_data['wind_speed']),
                    np.poly1d(np.polyfit(game_data['wind_speed'],
                                        game_data['total_runs'], 1))(
                        np.unique(game_data['wind_speed'])),
                    color='darkgreen', linewidth=2)
    axes[0, 1].set_xlabel('Wind Speed (mph)')
    axes[0, 1].set_ylabel('Total Runs')
    axes[0, 1].set_title('Wind Speed vs Runs Scored')
    axes[0, 1].grid(True, alpha=0.3)

    # Humidity vs runs
    axes[1, 0].scatter(game_data['humidity'], game_data['total_runs'],
                       alpha=0.5, s=20, color='blue')
    axes[1, 0].set_xlabel('Humidity (%)')
    axes[1, 0].set_ylabel('Total Runs')
    axes[1, 0].set_title('Humidity vs Runs Scored')
    axes[1, 0].grid(True, alpha=0.3)

    # Temperature distribution by scoring level
    game_data['scoring_level'] = pd.cut(game_data['total_runs'],
                                         bins=[0, 3, 6, 100],
                                         labels=['Low (0-3)', 'Med (4-6)', 'High (7+)'])

    for level in ['Low (0-3)', 'Med (4-6)', 'High (7+)']:
        data = game_data[game_data['scoring_level'] == level]['temperature']
        axes[1, 1].hist(data, alpha=0.5, bins=20, label=level)

    axes[1, 1].set_xlabel('Temperature (°F)')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Temperature Distribution by Scoring Level')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('weather_scoring_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    return game_data, model

game_data, weather_model = analyze_weather_scoring_relationship()

22.3 Altitude & Air Density: Coors Field Analysis

The Coors Field Effect

Coors Field in Denver sits at 5,200 feet elevation, where air density is approximately 17% lower than sea level. This dramatically affects baseball flight:

Physical Effects:


  • Reduced drag force (proportional to air density)

  • Balls travel 5-10% farther

  • Curveballs break less (reduced Magnus force)

  • Fastballs have less rise

  • Harder for pitchers to control breaking pitches

Historical Impact:


  • 1995-2001: Extreme offense (park factor ~125)

  • 2002: Humidor installed

  • 2002-2018: Reduced but still high offense (PF ~115)

  • 2019: Enhanced humidor

  • 2019-present: More suppressed offense (PF ~108-110)

R Analysis: Quantifying Coors Effect

# R: Coors Field Deep Dive
library(tidyverse)
library(Lahman)

# Analyze Coors Field impact across eras
coors_analysis <- Teams %>%
  filter(teamID == "COL") %>%
  select(yearID, teamID, G, R, RA, HR, H, BB, SO) %>%
  mutate(
    rpg = R / G,
    hrpg = HR / G,
    era = case_when(
      yearID >= 1993 & yearID <= 2001 ~ "Pre-Humidor (1993-2001)",
      yearID >= 2002 & yearID <= 2018 ~ "Humidor v1 (2002-2018)",
      yearID >= 2019 ~ "Enhanced Humidor (2019+)"
    )
  ) %>%
  filter(!is.na(era))

# Compare eras
era_summary <- coors_analysis %>%
  group_by(era) %>%
  summarise(
    years = n(),
    avg_rpg = mean(rpg),
    avg_hrpg = mean(hrpg),
    .groups = 'drop'
  )

print("Coors Field by Era:")
print(era_summary)

# Visualize the change
ggplot(coors_analysis, aes(x = yearID, y = rpg)) +
  geom_line(size = 1.2, color = "purple") +
  geom_point(size = 2.5, color = "purple") +
  geom_vline(xintercept = 2002, linetype = "dashed", color = "blue", size = 1) +
  geom_vline(xintercept = 2019, linetype = "dashed", color = "red", size = 1) +
  annotate("text", x = 2002, y = 6.5, label = "Humidor\nInstalled",
           hjust = -0.1, color = "blue") +
  annotate("text", x = 2019, y = 6.5, label = "Enhanced\nHumidor",
           hjust = -0.1, color = "red") +
  labs(title = "Colorado Rockies Runs Per Game Over Time",
       subtitle = "Impact of Humidor Installation",
       x = "Year",
       y = "Runs Per Game") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 14))

# Calculate altitude-adjusted metrics for Coors players
# Example: Player home/road splits

# Simulate player data at Coors
player_coors <- data.frame(
  player = c("Larry Walker", "Todd Helton", "Charlie Blackmon",
             "Nolan Arenado", "Trevor Story"),
  years_coors = c(10, 17, 9, 8, 7),
  home_avg = c(.381, .345, .331, .315, .294),
  road_avg = c(.282, .287, .281, .267, .251),
  home_hr_rate = c(.075, .062, .048, .055, .057),
  road_hr_rate = c(.048, .041, .029, .038, .039)
)

player_coors <- player_coors %>%
  mutate(
    avg_split = home_avg - road_avg,
    hr_rate_split = home_hr_rate - road_hr_rate,
    avg_ratio = home_avg / road_avg,
    hr_ratio = home_hr_rate / road_hr_rate
  )

print("\nCoors Field Player Splits:")
print(player_coors %>%
        select(player, avg_split, hr_rate_split, avg_ratio, hr_ratio))

# Visualize splits
player_coors_long <- player_coors %>%
  select(player, home_avg, road_avg) %>%
  pivot_longer(cols = c(home_avg, road_avg),
               names_to = "location",
               values_to = "batting_avg") %>%
  mutate(location = ifelse(location == "home_avg", "Home (Coors)", "Road"))

ggplot(player_coors_long, aes(x = player, y = batting_avg, fill = location)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("purple", "gray60"), name = "") +
  coord_flip() +
  labs(title = "Home vs Road Batting Average",
       subtitle = "Notable Rockies Hitters",
       x = "",
       y = "Batting Average") +
  theme_minimal() +
  theme(legend.position = "top")

Python Implementation: Physics-Based Altitude Modeling

# Python: Altitude Effects on Ball Flight
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

class AltitudeEffects:
    """
    Model altitude effects on baseball physics
    """

    def __init__(self):
        self.g = 9.8  # m/s^2
        self.sea_level_pressure = 101325  # Pa
        self.sea_level_temp = 288.15  # K
        self.lapse_rate = 0.0065  # K/m

    def pressure_at_altitude(self, altitude_ft):
        """Calculate atmospheric pressure at altitude"""
        altitude_m = altitude_ft * 0.3048

        # Barometric formula
        temp = self.sea_level_temp - self.lapse_rate * altitude_m
        pressure = self.sea_level_pressure * (
            (temp / self.sea_level_temp) ** 5.255
        )

        return pressure / 100  # Convert to mb

    def air_density_ratio(self, altitude_ft, temp_f=70):
        """
        Calculate air density relative to sea level
        """
        # Convert temperature
        temp_c = (temp_f - 32) * 5/9
        temp_k = temp_c + 273.15

        # Pressure at altitude
        pressure_alt = self.pressure_at_altitude(altitude_ft)

        # Density ratio (simplified)
        altitude_m = altitude_ft * 0.3048
        temp_alt = temp_k - self.lapse_rate * altitude_m

        rho_ratio = (pressure_alt / (self.sea_level_pressure / 100)) * (
            self.sea_level_temp / temp_alt
        )

        return rho_ratio

    def distance_multiplier(self, altitude_ft, temp_f=70):
        """
        Estimate distance multiplier due to altitude
        Empirical relationship: distance increases ~4% per 1000 ft
        """
        rho_ratio = self.air_density_ratio(altitude_ft, temp_f)

        # Distance is inversely related to drag, which is proportional to density
        # Approximately: distance_ratio = rho_ratio^(-0.5)
        # More accurate empirical fit: distance_ratio = 1 / (rho_ratio^0.55)

        distance_ratio = 1 / (rho_ratio ** 0.55)

        return distance_ratio

# Initialize
alt_model = AltitudeEffects()

# Compare major league ballparks by elevation
ballparks = pd.DataFrame({
    'park': ['Coors Field', 'Chase Field', 'Kauffman Stadium',
             'Dodger Stadium', 'Wrigley Field', 'Fenway Park',
             'Yankee Stadium', 'Oracle Park'],
    'city': ['Denver', 'Phoenix', 'Kansas City', 'Los Angeles',
             'Chicago', 'Boston', 'New York', 'San Francisco'],
    'elevation_ft': [5200, 1090, 750, 340, 595, 20, 55, 0],
    'avg_temp_summer': [73, 95, 79, 75, 73, 73, 77, 65]
})

# Calculate effects
ballparks['pressure_mb'] = ballparks['elevation_ft'].apply(
    alt_model.pressure_at_altitude
)
ballparks['density_ratio'] = ballparks.apply(
    lambda row: alt_model.air_density_ratio(row['elevation_ft'],
                                            row['avg_temp_summer']),
    axis=1
)
ballparks['distance_multiplier'] = ballparks.apply(
    lambda row: alt_model.distance_multiplier(row['elevation_ft'],
                                              row['avg_temp_summer']),
    axis=1
)

# Calculate distance for standard 400-foot fly ball
ballparks['actual_distance'] = 400 * ballparks['distance_multiplier']
ballparks['distance_gain'] = ballparks['actual_distance'] - 400

print("Altitude Effects by Ballpark:\n")
print(ballparks[['park', 'elevation_ft', 'density_ratio',
                 'distance_multiplier', 'actual_distance',
                 'distance_gain']].to_string(index=False))

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Elevation vs distance gain
axes[0, 0].scatter(ballparks['elevation_ft'], ballparks['distance_gain'],
                   s=100, alpha=0.6, c=ballparks['elevation_ft'],
                   cmap='terrain')
for idx, row in ballparks.iterrows():
    if row['elevation_ft'] > 500:
        axes[0, 0].annotate(row['city'],
                           (row['elevation_ft'], row['distance_gain']),
                           fontsize=9, alpha=0.8)
axes[0, 0].set_xlabel('Elevation (feet)')
axes[0, 0].set_ylabel('Distance Gained (feet)')
axes[0, 0].set_title('Altitude Effect on 400-ft Fly Ball')
axes[0, 0].grid(True, alpha=0.3)

# Air density ratio by park
parks_sorted = ballparks.sort_values('density_ratio')
colors = ['red' if x > 5000 else 'blue' if x > 1000 else 'green'
          for x in parks_sorted['elevation_ft']]
axes[0, 1].barh(range(len(parks_sorted)), parks_sorted['density_ratio'],
                color=colors, alpha=0.6)
axes[0, 1].set_yticks(range(len(parks_sorted)))
axes[0, 1].set_yticklabels(parks_sorted['city'])
axes[0, 1].set_xlabel('Air Density Ratio (vs sea level)')
axes[0, 1].set_title('Relative Air Density by Ballpark')
axes[0, 1].grid(True, alpha=0.3, axis='x')

# Distance multiplier across range of elevations
elevations = np.linspace(0, 6000, 100)
multipliers = [alt_model.distance_multiplier(e) for e in elevations]

axes[1, 0].plot(elevations, multipliers, linewidth=2, color='purple')
axes[1, 0].axhline(y=1.0, linestyle='--', color='gray', alpha=0.5)
axes[1, 0].scatter([5200], [alt_model.distance_multiplier(5200)],
                   s=200, c='red', zorder=5, label='Coors Field')
axes[1, 0].set_xlabel('Elevation (feet)')
axes[1, 0].set_ylabel('Distance Multiplier')
axes[1, 0].set_title('Ball Flight Distance vs Elevation')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Expected vs actual Coors park factor
# Theoretical prediction vs observed
years = np.arange(1995, 2024)
theoretical_pf = 100 * alt_model.distance_multiplier(5200)

# Simulate observed park factors (with humidor effects)
observed_pf = []
for year in years:
    if year < 2002:
        pf = np.random.normal(125, 3)  # Pre-humidor
    elif year < 2019:
        pf = np.random.normal(115, 3)  # Humidor v1
    else:
        pf = np.random.normal(110, 2)  # Enhanced humidor
    observed_pf.append(pf)

axes[1, 1].plot(years, observed_pf, label='Observed', linewidth=2)
axes[1, 1].axhline(y=theoretical_pf, linestyle='--', color='red',
                   linewidth=2, label=f'Theoretical ({theoretical_pf:.0f})')
axes[1, 1].axvline(x=2002, linestyle=':', color='blue', alpha=0.5)
axes[1, 1].axvline(x=2019, linestyle=':', color='green', alpha=0.5)
axes[1, 1].annotate('Humidor', xy=(2002, 120), fontsize=9)
axes[1, 1].annotate('Enhanced', xy=(2019, 120), fontsize=9)
axes[1, 1].set_xlabel('Year')
axes[1, 1].set_ylabel('Park Factor')
axes[1, 1].set_title('Coors Field Park Factor Over Time')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('altitude_effects_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

Breaking Ball Analysis at Altitude

# Python: Analyze breaking ball effectiveness at different elevations
def breaking_ball_altitude_effect():
    """
    Model breaking ball movement at different altitudes
    """

    # Magnus force depends on air density
    # F_magnus = S * rho * A * v * omega
    # Where omega is spin rate

    alt_model = AltitudeEffects()

    # Standard curveball parameters
    velocity_mph = 78
    spin_rpm = 2400

    # Convert units
    v = velocity_mph * 0.44704  # m/s
    omega = spin_rpm * 2 * np.pi / 60  # rad/s

    elevations = [0, 1000, 2000, 3000, 4000, 5200]

    results = []
    for elevation in elevations:
        rho_ratio = alt_model.air_density_ratio(elevation)

        # Break is proportional to Magnus force, which is proportional to density
        break_ratio = rho_ratio

        # Assuming sea level break is 12 inches
        sea_level_break = 12
        actual_break = sea_level_break * break_ratio
        break_reduction = sea_level_break - actual_break

        results.append({
            'elevation': elevation,
            'density_ratio': rho_ratio,
            'break_inches': actual_break,
            'break_reduction': break_reduction,
            'break_pct_reduction': (break_reduction / sea_level_break) * 100
        })

    results_df = pd.DataFrame(results)

    print("Breaking Ball Movement at Altitude:")
    print("(78 mph curveball with 2400 RPM)\n")
    print(results_df.to_string(index=False))

    # Visualize
    fig, ax = plt.subplots(figsize=(10, 6))

    ax.plot(results_df['elevation'], results_df['break_inches'],
            'o-', linewidth=2, markersize=8, color='red')
    ax.axhline(y=12, linestyle='--', color='gray', alpha=0.5,
               label='Sea Level Break')
    ax.scatter([5200], [results_df[results_df['elevation']==5200]['break_inches'].values[0]],
               s=300, c='darkred', zorder=5, marker='*', label='Coors Field')

    ax.set_xlabel('Elevation (feet)', fontsize=12)
    ax.set_ylabel('Curveball Break (inches)', fontsize=12)
    ax.set_title('Breaking Ball Movement vs Altitude', fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3)
    ax.legend()

    plt.tight_layout()
    plt.show()

    return results_df

breaking_df = breaking_ball_altitude_effect()

# Calculate pitcher disadvantage at Coors
coors_break = breaking_df[breaking_df['elevation']==5200]['break_inches'].values[0]
sea_level_break = breaking_df[breaking_df['elevation']==0]['break_inches'].values[0]
print(f"\nCurveball at Coors: {coors_break:.1f} inches break")
print(f"Curveball at sea level: {sea_level_break:.1f} inches break")
print(f"Reduction: {sea_level_break - coors_break:.1f} inches ({((sea_level_break - coors_break)/sea_level_break*100):.1f}%)")

R
# R: Coors Field Deep Dive
library(tidyverse)
library(Lahman)

# Analyze Coors Field impact across eras
coors_analysis <- Teams %>%
  filter(teamID == "COL") %>%
  select(yearID, teamID, G, R, RA, HR, H, BB, SO) %>%
  mutate(
    rpg = R / G,
    hrpg = HR / G,
    era = case_when(
      yearID >= 1993 & yearID <= 2001 ~ "Pre-Humidor (1993-2001)",
      yearID >= 2002 & yearID <= 2018 ~ "Humidor v1 (2002-2018)",
      yearID >= 2019 ~ "Enhanced Humidor (2019+)"
    )
  ) %>%
  filter(!is.na(era))

# Compare eras
era_summary <- coors_analysis %>%
  group_by(era) %>%
  summarise(
    years = n(),
    avg_rpg = mean(rpg),
    avg_hrpg = mean(hrpg),
    .groups = 'drop'
  )

print("Coors Field by Era:")
print(era_summary)

# Visualize the change
ggplot(coors_analysis, aes(x = yearID, y = rpg)) +
  geom_line(size = 1.2, color = "purple") +
  geom_point(size = 2.5, color = "purple") +
  geom_vline(xintercept = 2002, linetype = "dashed", color = "blue", size = 1) +
  geom_vline(xintercept = 2019, linetype = "dashed", color = "red", size = 1) +
  annotate("text", x = 2002, y = 6.5, label = "Humidor\nInstalled",
           hjust = -0.1, color = "blue") +
  annotate("text", x = 2019, y = 6.5, label = "Enhanced\nHumidor",
           hjust = -0.1, color = "red") +
  labs(title = "Colorado Rockies Runs Per Game Over Time",
       subtitle = "Impact of Humidor Installation",
       x = "Year",
       y = "Runs Per Game") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 14))

# Calculate altitude-adjusted metrics for Coors players
# Example: Player home/road splits

# Simulate player data at Coors
player_coors <- data.frame(
  player = c("Larry Walker", "Todd Helton", "Charlie Blackmon",
             "Nolan Arenado", "Trevor Story"),
  years_coors = c(10, 17, 9, 8, 7),
  home_avg = c(.381, .345, .331, .315, .294),
  road_avg = c(.282, .287, .281, .267, .251),
  home_hr_rate = c(.075, .062, .048, .055, .057),
  road_hr_rate = c(.048, .041, .029, .038, .039)
)

player_coors <- player_coors %>%
  mutate(
    avg_split = home_avg - road_avg,
    hr_rate_split = home_hr_rate - road_hr_rate,
    avg_ratio = home_avg / road_avg,
    hr_ratio = home_hr_rate / road_hr_rate
  )

print("\nCoors Field Player Splits:")
print(player_coors %>%
        select(player, avg_split, hr_rate_split, avg_ratio, hr_ratio))

# Visualize splits
player_coors_long <- player_coors %>%
  select(player, home_avg, road_avg) %>%
  pivot_longer(cols = c(home_avg, road_avg),
               names_to = "location",
               values_to = "batting_avg") %>%
  mutate(location = ifelse(location == "home_avg", "Home (Coors)", "Road"))

ggplot(player_coors_long, aes(x = player, y = batting_avg, fill = location)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("purple", "gray60"), name = "") +
  coord_flip() +
  labs(title = "Home vs Road Batting Average",
       subtitle = "Notable Rockies Hitters",
       x = "",
       y = "Batting Average") +
  theme_minimal() +
  theme(legend.position = "top")
Python
# Python: Altitude Effects on Ball Flight
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

class AltitudeEffects:
    """
    Model altitude effects on baseball physics
    """

    def __init__(self):
        self.g = 9.8  # m/s^2
        self.sea_level_pressure = 101325  # Pa
        self.sea_level_temp = 288.15  # K
        self.lapse_rate = 0.0065  # K/m

    def pressure_at_altitude(self, altitude_ft):
        """Calculate atmospheric pressure at altitude"""
        altitude_m = altitude_ft * 0.3048

        # Barometric formula
        temp = self.sea_level_temp - self.lapse_rate * altitude_m
        pressure = self.sea_level_pressure * (
            (temp / self.sea_level_temp) ** 5.255
        )

        return pressure / 100  # Convert to mb

    def air_density_ratio(self, altitude_ft, temp_f=70):
        """
        Calculate air density relative to sea level
        """
        # Convert temperature
        temp_c = (temp_f - 32) * 5/9
        temp_k = temp_c + 273.15

        # Pressure at altitude
        pressure_alt = self.pressure_at_altitude(altitude_ft)

        # Density ratio (simplified)
        altitude_m = altitude_ft * 0.3048
        temp_alt = temp_k - self.lapse_rate * altitude_m

        rho_ratio = (pressure_alt / (self.sea_level_pressure / 100)) * (
            self.sea_level_temp / temp_alt
        )

        return rho_ratio

    def distance_multiplier(self, altitude_ft, temp_f=70):
        """
        Estimate distance multiplier due to altitude
        Empirical relationship: distance increases ~4% per 1000 ft
        """
        rho_ratio = self.air_density_ratio(altitude_ft, temp_f)

        # Distance is inversely related to drag, which is proportional to density
        # Approximately: distance_ratio = rho_ratio^(-0.5)
        # More accurate empirical fit: distance_ratio = 1 / (rho_ratio^0.55)

        distance_ratio = 1 / (rho_ratio ** 0.55)

        return distance_ratio

# Initialize
alt_model = AltitudeEffects()

# Compare major league ballparks by elevation
ballparks = pd.DataFrame({
    'park': ['Coors Field', 'Chase Field', 'Kauffman Stadium',
             'Dodger Stadium', 'Wrigley Field', 'Fenway Park',
             'Yankee Stadium', 'Oracle Park'],
    'city': ['Denver', 'Phoenix', 'Kansas City', 'Los Angeles',
             'Chicago', 'Boston', 'New York', 'San Francisco'],
    'elevation_ft': [5200, 1090, 750, 340, 595, 20, 55, 0],
    'avg_temp_summer': [73, 95, 79, 75, 73, 73, 77, 65]
})

# Calculate effects
ballparks['pressure_mb'] = ballparks['elevation_ft'].apply(
    alt_model.pressure_at_altitude
)
ballparks['density_ratio'] = ballparks.apply(
    lambda row: alt_model.air_density_ratio(row['elevation_ft'],
                                            row['avg_temp_summer']),
    axis=1
)
ballparks['distance_multiplier'] = ballparks.apply(
    lambda row: alt_model.distance_multiplier(row['elevation_ft'],
                                              row['avg_temp_summer']),
    axis=1
)

# Calculate distance for standard 400-foot fly ball
ballparks['actual_distance'] = 400 * ballparks['distance_multiplier']
ballparks['distance_gain'] = ballparks['actual_distance'] - 400

print("Altitude Effects by Ballpark:\n")
print(ballparks[['park', 'elevation_ft', 'density_ratio',
                 'distance_multiplier', 'actual_distance',
                 'distance_gain']].to_string(index=False))

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Elevation vs distance gain
axes[0, 0].scatter(ballparks['elevation_ft'], ballparks['distance_gain'],
                   s=100, alpha=0.6, c=ballparks['elevation_ft'],
                   cmap='terrain')
for idx, row in ballparks.iterrows():
    if row['elevation_ft'] > 500:
        axes[0, 0].annotate(row['city'],
                           (row['elevation_ft'], row['distance_gain']),
                           fontsize=9, alpha=0.8)
axes[0, 0].set_xlabel('Elevation (feet)')
axes[0, 0].set_ylabel('Distance Gained (feet)')
axes[0, 0].set_title('Altitude Effect on 400-ft Fly Ball')
axes[0, 0].grid(True, alpha=0.3)

# Air density ratio by park
parks_sorted = ballparks.sort_values('density_ratio')
colors = ['red' if x > 5000 else 'blue' if x > 1000 else 'green'
          for x in parks_sorted['elevation_ft']]
axes[0, 1].barh(range(len(parks_sorted)), parks_sorted['density_ratio'],
                color=colors, alpha=0.6)
axes[0, 1].set_yticks(range(len(parks_sorted)))
axes[0, 1].set_yticklabels(parks_sorted['city'])
axes[0, 1].set_xlabel('Air Density Ratio (vs sea level)')
axes[0, 1].set_title('Relative Air Density by Ballpark')
axes[0, 1].grid(True, alpha=0.3, axis='x')

# Distance multiplier across range of elevations
elevations = np.linspace(0, 6000, 100)
multipliers = [alt_model.distance_multiplier(e) for e in elevations]

axes[1, 0].plot(elevations, multipliers, linewidth=2, color='purple')
axes[1, 0].axhline(y=1.0, linestyle='--', color='gray', alpha=0.5)
axes[1, 0].scatter([5200], [alt_model.distance_multiplier(5200)],
                   s=200, c='red', zorder=5, label='Coors Field')
axes[1, 0].set_xlabel('Elevation (feet)')
axes[1, 0].set_ylabel('Distance Multiplier')
axes[1, 0].set_title('Ball Flight Distance vs Elevation')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Expected vs actual Coors park factor
# Theoretical prediction vs observed
years = np.arange(1995, 2024)
theoretical_pf = 100 * alt_model.distance_multiplier(5200)

# Simulate observed park factors (with humidor effects)
observed_pf = []
for year in years:
    if year < 2002:
        pf = np.random.normal(125, 3)  # Pre-humidor
    elif year < 2019:
        pf = np.random.normal(115, 3)  # Humidor v1
    else:
        pf = np.random.normal(110, 2)  # Enhanced humidor
    observed_pf.append(pf)

axes[1, 1].plot(years, observed_pf, label='Observed', linewidth=2)
axes[1, 1].axhline(y=theoretical_pf, linestyle='--', color='red',
                   linewidth=2, label=f'Theoretical ({theoretical_pf:.0f})')
axes[1, 1].axvline(x=2002, linestyle=':', color='blue', alpha=0.5)
axes[1, 1].axvline(x=2019, linestyle=':', color='green', alpha=0.5)
axes[1, 1].annotate('Humidor', xy=(2002, 120), fontsize=9)
axes[1, 1].annotate('Enhanced', xy=(2019, 120), fontsize=9)
axes[1, 1].set_xlabel('Year')
axes[1, 1].set_ylabel('Park Factor')
axes[1, 1].set_title('Coors Field Park Factor Over Time')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('altitude_effects_analysis.png', dpi=300, bbox_inches='tight')
plt.show()
Python
# Python: Analyze breaking ball effectiveness at different elevations
def breaking_ball_altitude_effect():
    """
    Model breaking ball movement at different altitudes
    """

    # Magnus force depends on air density
    # F_magnus = S * rho * A * v * omega
    # Where omega is spin rate

    alt_model = AltitudeEffects()

    # Standard curveball parameters
    velocity_mph = 78
    spin_rpm = 2400

    # Convert units
    v = velocity_mph * 0.44704  # m/s
    omega = spin_rpm * 2 * np.pi / 60  # rad/s

    elevations = [0, 1000, 2000, 3000, 4000, 5200]

    results = []
    for elevation in elevations:
        rho_ratio = alt_model.air_density_ratio(elevation)

        # Break is proportional to Magnus force, which is proportional to density
        break_ratio = rho_ratio

        # Assuming sea level break is 12 inches
        sea_level_break = 12
        actual_break = sea_level_break * break_ratio
        break_reduction = sea_level_break - actual_break

        results.append({
            'elevation': elevation,
            'density_ratio': rho_ratio,
            'break_inches': actual_break,
            'break_reduction': break_reduction,
            'break_pct_reduction': (break_reduction / sea_level_break) * 100
        })

    results_df = pd.DataFrame(results)

    print("Breaking Ball Movement at Altitude:")
    print("(78 mph curveball with 2400 RPM)\n")
    print(results_df.to_string(index=False))

    # Visualize
    fig, ax = plt.subplots(figsize=(10, 6))

    ax.plot(results_df['elevation'], results_df['break_inches'],
            'o-', linewidth=2, markersize=8, color='red')
    ax.axhline(y=12, linestyle='--', color='gray', alpha=0.5,
               label='Sea Level Break')
    ax.scatter([5200], [results_df[results_df['elevation']==5200]['break_inches'].values[0]],
               s=300, c='darkred', zorder=5, marker='*', label='Coors Field')

    ax.set_xlabel('Elevation (feet)', fontsize=12)
    ax.set_ylabel('Curveball Break (inches)', fontsize=12)
    ax.set_title('Breaking Ball Movement vs Altitude', fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3)
    ax.legend()

    plt.tight_layout()
    plt.show()

    return results_df

breaking_df = breaking_ball_altitude_effect()

# Calculate pitcher disadvantage at Coors
coors_break = breaking_df[breaking_df['elevation']==5200]['break_inches'].values[0]
sea_level_break = breaking_df[breaking_df['elevation']==0]['break_inches'].values[0]
print(f"\nCurveball at Coors: {coors_break:.1f} inches break")
print(f"Curveball at sea level: {sea_level_break:.1f} inches break")
print(f"Reduction: {sea_level_break - coors_break:.1f} inches ({((sea_level_break - coors_break)/sea_level_break*100):.1f}%)")

22.4 Temperature & Humidity Effects

Temperature's Multi-Faceted Impact

Temperature affects baseball through several mechanisms:

  1. Ball Elasticity: Warmer balls are more elastic, bouncing off bats with higher exit velocity
  2. Air Density: Warmer air is less dense, reducing drag
  3. Player Physiology: Muscles function better in moderate warmth

Empirical Rule of Thumb: Every 10°F increase adds approximately 2-3 feet to fly ball distance.

R Analysis: Temperature and Offense

# R: Temperature Impact on MLB Offense
library(tidyverse)

# Simulate realistic temperature-performance data
set.seed(456)
n_games <- 1000

temp_data <- tibble(
  game_id = 1:n_games,
  temperature = round(rnorm(n_games, 75, 12)),
  # Add realistic constraints
  temperature = pmin(pmax(temperature, 40), 105),

  # Generate runs with temperature effect
  # Base: 4.5 runs, +0.25 runs per 10°F above 70
  base_runs = 4.5,
  temp_effect = (temperature - 70) * 0.025,
  random_effect = rnorm(n_games, 0, 1.5),
  total_runs = round(pmax(base_runs + temp_effect + random_effect, 0)),

  # Home runs with stronger temperature effect
  base_hr = 1.2,
  hr_temp_effect = (temperature - 70) * 0.015,
  hr_random = rnorm(n_games, 0, 0.5),
  home_runs = round(pmax(base_hr + hr_temp_effect + hr_random, 0))
)

# Statistical analysis
temp_model <- lm(total_runs ~ temperature, data = temp_data)
hr_model <- lm(home_runs ~ temperature, data = temp_data)

cat("Temperature Effect on Runs:\n")
cat(sprintf("Coefficient: %.4f runs per degree F\n", coef(temp_model)[2]))
cat(sprintf("Per 10°F: %.2f runs\n", coef(temp_model)[2] * 10))
cat(sprintf("R-squared: %.3f\n", summary(temp_model)$r.squared))
cat(sprintf("P-value: %.4f\n\n", summary(temp_model)$coefficients[2, 4]))

cat("Temperature Effect on Home Runs:\n")
cat(sprintf("Coefficient: %.4f HR per degree F\n", coef(hr_model)[2]))
cat(sprintf("Per 10°F: %.2f HR\n", coef(hr_model)[2] * 10))
cat(sprintf("R-squared: %.3f\n", summary(hr_model)$r.squared))

# Bin by temperature ranges
temp_bins <- temp_data %>%
  mutate(
    temp_category = cut(temperature,
                       breaks = c(0, 55, 65, 75, 85, 110),
                       labels = c("Cold (<55)", "Cool (55-65)",
                                 "Mild (65-75)", "Warm (75-85)",
                                 "Hot (85+)"))
  ) %>%
  group_by(temp_category) %>%
  summarise(
    n_games = n(),
    avg_runs = mean(total_runs),
    avg_hr = mean(home_runs),
    sd_runs = sd(total_runs),
    se_runs = sd_runs / sqrt(n_games),
    .groups = 'drop'
  )

print("\nScoring by Temperature Range:")
print(temp_bins)

# Visualizations
p1 <- ggplot(temp_data, aes(x = temperature, y = total_runs)) +
  geom_point(alpha = 0.3, color = "coral") +
  geom_smooth(method = "lm", se = TRUE, color = "darkred", size = 1.5) +
  labs(title = "Temperature vs Total Runs",
       x = "Temperature (°F)",
       y = "Total Runs") +
  theme_minimal()

p2 <- ggplot(temp_bins, aes(x = temp_category, y = avg_runs)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  geom_errorbar(aes(ymin = avg_runs - se_runs, ymax = avg_runs + se_runs),
                width = 0.2) +
  labs(title = "Average Runs by Temperature Category",
       x = "Temperature Range",
       y = "Average Runs per Game") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

library(patchwork)
p1 + p2

# Predict scoring for extreme temperatures
new_temps <- data.frame(temperature = c(45, 60, 75, 90, 100))
predictions <- predict(temp_model, new_temps, interval = "confidence")

cat("\nPredicted Runs at Different Temperatures:\n")
print(cbind(new_temps, predictions))

Python Implementation: Humidity Analysis

# Python: Humidity Effects on Baseball
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

class HumidityEffect:
    """
    Model humidity effects on air density and ball flight
    """

    def __init__(self):
        self.R = 8.314  # Gas constant
        self.M_dry = 0.029  # Molar mass dry air
        self.M_water = 0.018  # Molar mass water vapor

    def air_density_with_humidity(self, temp_f, pressure_mb, humidity_pct):
        """
        Calculate air density accounting for humidity

        Counterintuitive: Higher humidity = LOWER density
        Water molecules (H2O, M=18) displace nitrogen/oxygen (M=28/32)
        """
        # Convert temperature
        temp_c = (temp_f - 32) * 5/9
        temp_k = temp_c + 273.15

        # Saturation vapor pressure (Magnus formula)
        e_s = 611.2 * np.exp(17.67 * temp_c / (temp_c + 243.5))

        # Actual vapor pressure
        e = (humidity_pct / 100) * e_s

        # Pressure in Pa
        P = pressure_mb * 100

        # Partial pressure of dry air
        P_d = P - e

        # Air density
        rho = (P_d * self.M_dry + e * self.M_water) / (self.R * temp_k)

        return rho

    def humidity_distance_effect(self, temp_f, base_humidity=50):
        """
        Calculate distance change due to humidity
        Higher humidity = less dense air = farther distance
        """
        pressure = 1013.25  # Standard pressure

        # Base density at 50% humidity
        rho_base = self.air_density_with_humidity(temp_f, pressure, base_humidity)

        # Calculate for range of humidities
        humidities = np.arange(10, 100, 10)
        densities = [self.air_density_with_humidity(temp_f, pressure, h)
                     for h in humidities]

        # Distance is inversely proportional to density^0.55
        distance_ratios = [(rho_base / rho) ** 0.55 for rho in densities]

        return pd.DataFrame({
            'humidity': humidities,
            'density': densities,
            'distance_ratio': distance_ratios,
            'distance_400ft': [400 * ratio for ratio in distance_ratios]
        })

# Analysis
humidity_model = HumidityEffect()

# Compare different temperatures
temps = [50, 70, 90]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for idx, temp in enumerate(temps):
    df = humidity_model.humidity_distance_effect(temp)

    axes[idx].plot(df['humidity'], df['distance_400ft'], 'o-',
                   linewidth=2, markersize=6)
    axes[idx].axhline(y=400, linestyle='--', color='gray', alpha=0.5)
    axes[idx].set_xlabel('Humidity (%)', fontsize=11)
    axes[idx].set_ylabel('Distance (feet)', fontsize=11)
    axes[idx].set_title(f'Temperature: {temp}°F', fontsize=12, fontweight='bold')
    axes[idx].grid(True, alpha=0.3)
    axes[idx].set_ylim([399, 406])

plt.tight_layout()
plt.savefig('humidity_effects.png', dpi=300, bbox_inches='tight')
plt.show()

# Calculate typical effect size
df_70 = humidity_model.humidity_distance_effect(70)
dry_distance = df_70[df_70['humidity']==10]['distance_400ft'].values[0]
humid_distance = df_70[df_70['humidity']==90]['distance_400ft'].values[0]

print(f"At 70°F, 400-foot fly ball:")
print(f"At 10% humidity: {dry_distance:.1f} feet")
print(f"At 90% humidity: {humid_distance:.1f} feet")
print(f"Difference: {humid_distance - dry_distance:.1f} feet")
print(f"\nCounterIntuitive: Higher humidity HELPS hitting!")
print("Water vapor is lighter than air, reducing air density")

# Combined temperature-humidity effect
def combined_weather_effect():
    """Model combined temperature and humidity effects"""

    temps = np.linspace(50, 95, 10)
    humidities = np.linspace(20, 80, 10)

    T, H = np.meshgrid(temps, humidities)

    # Calculate distance for each combination
    distances = np.zeros_like(T)
    for i in range(len(humidities)):
        for j in range(len(temps)):
            rho = humidity_model.air_density_with_humidity(
                T[i, j], 1013.25, H[i, j]
            )
            # Reference: sea level, 70°F, 50% humidity
            rho_ref = humidity_model.air_density_with_humidity(
                70, 1013.25, 50
            )
            distances[i, j] = 400 * (rho_ref / rho) ** 0.55

    # Create contour plot
    fig, ax = plt.subplots(figsize=(10, 7))

    contour = ax.contourf(T, H, distances, levels=15, cmap='RdYlGn')
    contour_lines = ax.contour(T, H, distances, levels=10, colors='black',
                               alpha=0.3, linewidths=0.5)
    ax.clabel(contour_lines, inline=True, fontsize=8, fmt='%.0f ft')

    cbar = plt.colorbar(contour, ax=ax, label='Distance (feet)')
    ax.set_xlabel('Temperature (°F)', fontsize=12)
    ax.set_ylabel('Humidity (%)', fontsize=12)
    ax.set_title('Combined Temperature-Humidity Effect on 400-ft Fly Ball',
                 fontsize=14, fontweight='bold')

    # Mark some notable conditions
    ax.plot(50, 30, 'b*', markersize=15, label='Cold & Dry')
    ax.plot(95, 80, 'r*', markersize=15, label='Hot & Humid')
    ax.legend(loc='upper left')

    plt.tight_layout()
    plt.savefig('temp_humidity_combined.png', dpi=300, bbox_inches='tight')
    plt.show()

combined_weather_effect()

Real-World Application: Weather-Based Betting

# Python: Using weather data for run total predictions
def weather_betting_model():
    """
    Build model to predict over/under based on weather
    """

    # Simulate historical game data with weather
    np.random.seed(789)
    n = 2000

    data = pd.DataFrame({
        'temp': np.random.normal(72, 13, n),
        'humidity': np.random.normal(55, 18, n),
        'wind_speed': np.abs(np.random.normal(6, 4, n)),
        'wind_direction': np.random.choice(['In', 'Out', 'Cross', 'Calm'], n,
                                          p=[0.25, 0.25, 0.3, 0.2]),
        'park_factor': np.random.normal(100, 8, n)
    })

    # Generate runs with realistic relationships
    base_runs = 8.5
    temp_effect = (data['temp'] - 70) * 0.04
    humidity_effect = (data['humidity'] - 50) * 0.01
    wind_effect = np.where(data['wind_direction'] == 'Out',
                           data['wind_speed'] * 0.08,
                           -data['wind_speed'] * 0.03)
    park_effect = (data['park_factor'] - 100) * 0.03
    noise = np.random.normal(0, 1.8, n)

    data['total_runs'] = (base_runs + temp_effect + humidity_effect +
                          wind_effect + park_effect + noise).clip(2, 20)
    data['total_runs'] = data['total_runs'].round()

    # Build regression model
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error, r2_score

    # Encode wind direction
    data_encoded = pd.get_dummies(data, columns=['wind_direction'],
                                   drop_first=True)

    X = data_encoded.drop('total_runs', axis=1)
    y = data_encoded['total_runs']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LinearRegression()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    print("Weather-Based Run Prediction Model")
    print("=" * 50)
    print(f"R-squared: {r2_score(y_test, y_pred):.3f}")
    print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f} runs")
    print(f"\nFeature Coefficients:")

    for feature, coef in zip(X.columns, model.coef_):
        print(f"  {feature}: {coef:.4f}")

    # Example prediction
    example_game = pd.DataFrame({
        'temp': [85],
        'humidity': [70],
        'wind_speed': [10],
        'park_factor': [110],
        'wind_direction_Out': [1],
        'wind_direction_Calm': [0],
        'wind_direction_Cross': [0]
    })

    prediction = model.predict(example_game)[0]
    print(f"\nExample Prediction:")
    print(f"Conditions: 85°F, 70% humidity, 10mph wind OUT, PF=110")
    print(f"Predicted Total Runs: {prediction:.1f}")

    # Visualize predictions vs actual
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    axes[0].scatter(y_test, y_pred, alpha=0.3, s=20)
    axes[0].plot([y_test.min(), y_test.max()],
                 [y_test.min(), y_test.max()],
                 'r--', linewidth=2)
    axes[0].set_xlabel('Actual Total Runs')
    axes[0].set_ylabel('Predicted Total Runs')
    axes[0].set_title('Prediction Accuracy')
    axes[0].grid(True, alpha=0.3)

    residuals = y_test - y_pred
    axes[1].hist(residuals, bins=30, edgecolor='black', alpha=0.7)
    axes[1].set_xlabel('Prediction Error (runs)')
    axes[1].set_ylabel('Frequency')
    axes[1].set_title('Distribution of Errors')
    axes[1].grid(True, alpha=0.3, axis='y')

    plt.tight_layout()
    plt.show()

    return model, data_encoded

weather_model, weather_data = weather_betting_model()

R
# R: Temperature Impact on MLB Offense
library(tidyverse)

# Simulate realistic temperature-performance data
set.seed(456)
n_games <- 1000

temp_data <- tibble(
  game_id = 1:n_games,
  temperature = round(rnorm(n_games, 75, 12)),
  # Add realistic constraints
  temperature = pmin(pmax(temperature, 40), 105),

  # Generate runs with temperature effect
  # Base: 4.5 runs, +0.25 runs per 10°F above 70
  base_runs = 4.5,
  temp_effect = (temperature - 70) * 0.025,
  random_effect = rnorm(n_games, 0, 1.5),
  total_runs = round(pmax(base_runs + temp_effect + random_effect, 0)),

  # Home runs with stronger temperature effect
  base_hr = 1.2,
  hr_temp_effect = (temperature - 70) * 0.015,
  hr_random = rnorm(n_games, 0, 0.5),
  home_runs = round(pmax(base_hr + hr_temp_effect + hr_random, 0))
)

# Statistical analysis
temp_model <- lm(total_runs ~ temperature, data = temp_data)
hr_model <- lm(home_runs ~ temperature, data = temp_data)

cat("Temperature Effect on Runs:\n")
cat(sprintf("Coefficient: %.4f runs per degree F\n", coef(temp_model)[2]))
cat(sprintf("Per 10°F: %.2f runs\n", coef(temp_model)[2] * 10))
cat(sprintf("R-squared: %.3f\n", summary(temp_model)$r.squared))
cat(sprintf("P-value: %.4f\n\n", summary(temp_model)$coefficients[2, 4]))

cat("Temperature Effect on Home Runs:\n")
cat(sprintf("Coefficient: %.4f HR per degree F\n", coef(hr_model)[2]))
cat(sprintf("Per 10°F: %.2f HR\n", coef(hr_model)[2] * 10))
cat(sprintf("R-squared: %.3f\n", summary(hr_model)$r.squared))

# Bin by temperature ranges
temp_bins <- temp_data %>%
  mutate(
    temp_category = cut(temperature,
                       breaks = c(0, 55, 65, 75, 85, 110),
                       labels = c("Cold (<55)", "Cool (55-65)",
                                 "Mild (65-75)", "Warm (75-85)",
                                 "Hot (85+)"))
  ) %>%
  group_by(temp_category) %>%
  summarise(
    n_games = n(),
    avg_runs = mean(total_runs),
    avg_hr = mean(home_runs),
    sd_runs = sd(total_runs),
    se_runs = sd_runs / sqrt(n_games),
    .groups = 'drop'
  )

print("\nScoring by Temperature Range:")
print(temp_bins)

# Visualizations
p1 <- ggplot(temp_data, aes(x = temperature, y = total_runs)) +
  geom_point(alpha = 0.3, color = "coral") +
  geom_smooth(method = "lm", se = TRUE, color = "darkred", size = 1.5) +
  labs(title = "Temperature vs Total Runs",
       x = "Temperature (°F)",
       y = "Total Runs") +
  theme_minimal()

p2 <- ggplot(temp_bins, aes(x = temp_category, y = avg_runs)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  geom_errorbar(aes(ymin = avg_runs - se_runs, ymax = avg_runs + se_runs),
                width = 0.2) +
  labs(title = "Average Runs by Temperature Category",
       x = "Temperature Range",
       y = "Average Runs per Game") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

library(patchwork)
p1 + p2

# Predict scoring for extreme temperatures
new_temps <- data.frame(temperature = c(45, 60, 75, 90, 100))
predictions <- predict(temp_model, new_temps, interval = "confidence")

cat("\nPredicted Runs at Different Temperatures:\n")
print(cbind(new_temps, predictions))
Python
# Python: Humidity Effects on Baseball
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

class HumidityEffect:
    """
    Model humidity effects on air density and ball flight
    """

    def __init__(self):
        self.R = 8.314  # Gas constant
        self.M_dry = 0.029  # Molar mass dry air
        self.M_water = 0.018  # Molar mass water vapor

    def air_density_with_humidity(self, temp_f, pressure_mb, humidity_pct):
        """
        Calculate air density accounting for humidity

        Counterintuitive: Higher humidity = LOWER density
        Water molecules (H2O, M=18) displace nitrogen/oxygen (M=28/32)
        """
        # Convert temperature
        temp_c = (temp_f - 32) * 5/9
        temp_k = temp_c + 273.15

        # Saturation vapor pressure (Magnus formula)
        e_s = 611.2 * np.exp(17.67 * temp_c / (temp_c + 243.5))

        # Actual vapor pressure
        e = (humidity_pct / 100) * e_s

        # Pressure in Pa
        P = pressure_mb * 100

        # Partial pressure of dry air
        P_d = P - e

        # Air density
        rho = (P_d * self.M_dry + e * self.M_water) / (self.R * temp_k)

        return rho

    def humidity_distance_effect(self, temp_f, base_humidity=50):
        """
        Calculate distance change due to humidity
        Higher humidity = less dense air = farther distance
        """
        pressure = 1013.25  # Standard pressure

        # Base density at 50% humidity
        rho_base = self.air_density_with_humidity(temp_f, pressure, base_humidity)

        # Calculate for range of humidities
        humidities = np.arange(10, 100, 10)
        densities = [self.air_density_with_humidity(temp_f, pressure, h)
                     for h in humidities]

        # Distance is inversely proportional to density^0.55
        distance_ratios = [(rho_base / rho) ** 0.55 for rho in densities]

        return pd.DataFrame({
            'humidity': humidities,
            'density': densities,
            'distance_ratio': distance_ratios,
            'distance_400ft': [400 * ratio for ratio in distance_ratios]
        })

# Analysis
humidity_model = HumidityEffect()

# Compare different temperatures
temps = [50, 70, 90]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for idx, temp in enumerate(temps):
    df = humidity_model.humidity_distance_effect(temp)

    axes[idx].plot(df['humidity'], df['distance_400ft'], 'o-',
                   linewidth=2, markersize=6)
    axes[idx].axhline(y=400, linestyle='--', color='gray', alpha=0.5)
    axes[idx].set_xlabel('Humidity (%)', fontsize=11)
    axes[idx].set_ylabel('Distance (feet)', fontsize=11)
    axes[idx].set_title(f'Temperature: {temp}°F', fontsize=12, fontweight='bold')
    axes[idx].grid(True, alpha=0.3)
    axes[idx].set_ylim([399, 406])

plt.tight_layout()
plt.savefig('humidity_effects.png', dpi=300, bbox_inches='tight')
plt.show()

# Calculate typical effect size
df_70 = humidity_model.humidity_distance_effect(70)
dry_distance = df_70[df_70['humidity']==10]['distance_400ft'].values[0]
humid_distance = df_70[df_70['humidity']==90]['distance_400ft'].values[0]

print(f"At 70°F, 400-foot fly ball:")
print(f"At 10% humidity: {dry_distance:.1f} feet")
print(f"At 90% humidity: {humid_distance:.1f} feet")
print(f"Difference: {humid_distance - dry_distance:.1f} feet")
print(f"\nCounterIntuitive: Higher humidity HELPS hitting!")
print("Water vapor is lighter than air, reducing air density")

# Combined temperature-humidity effect
def combined_weather_effect():
    """Model combined temperature and humidity effects"""

    temps = np.linspace(50, 95, 10)
    humidities = np.linspace(20, 80, 10)

    T, H = np.meshgrid(temps, humidities)

    # Calculate distance for each combination
    distances = np.zeros_like(T)
    for i in range(len(humidities)):
        for j in range(len(temps)):
            rho = humidity_model.air_density_with_humidity(
                T[i, j], 1013.25, H[i, j]
            )
            # Reference: sea level, 70°F, 50% humidity
            rho_ref = humidity_model.air_density_with_humidity(
                70, 1013.25, 50
            )
            distances[i, j] = 400 * (rho_ref / rho) ** 0.55

    # Create contour plot
    fig, ax = plt.subplots(figsize=(10, 7))

    contour = ax.contourf(T, H, distances, levels=15, cmap='RdYlGn')
    contour_lines = ax.contour(T, H, distances, levels=10, colors='black',
                               alpha=0.3, linewidths=0.5)
    ax.clabel(contour_lines, inline=True, fontsize=8, fmt='%.0f ft')

    cbar = plt.colorbar(contour, ax=ax, label='Distance (feet)')
    ax.set_xlabel('Temperature (°F)', fontsize=12)
    ax.set_ylabel('Humidity (%)', fontsize=12)
    ax.set_title('Combined Temperature-Humidity Effect on 400-ft Fly Ball',
                 fontsize=14, fontweight='bold')

    # Mark some notable conditions
    ax.plot(50, 30, 'b*', markersize=15, label='Cold & Dry')
    ax.plot(95, 80, 'r*', markersize=15, label='Hot & Humid')
    ax.legend(loc='upper left')

    plt.tight_layout()
    plt.savefig('temp_humidity_combined.png', dpi=300, bbox_inches='tight')
    plt.show()

combined_weather_effect()
Python
# Python: Using weather data for run total predictions
def weather_betting_model():
    """
    Build model to predict over/under based on weather
    """

    # Simulate historical game data with weather
    np.random.seed(789)
    n = 2000

    data = pd.DataFrame({
        'temp': np.random.normal(72, 13, n),
        'humidity': np.random.normal(55, 18, n),
        'wind_speed': np.abs(np.random.normal(6, 4, n)),
        'wind_direction': np.random.choice(['In', 'Out', 'Cross', 'Calm'], n,
                                          p=[0.25, 0.25, 0.3, 0.2]),
        'park_factor': np.random.normal(100, 8, n)
    })

    # Generate runs with realistic relationships
    base_runs = 8.5
    temp_effect = (data['temp'] - 70) * 0.04
    humidity_effect = (data['humidity'] - 50) * 0.01
    wind_effect = np.where(data['wind_direction'] == 'Out',
                           data['wind_speed'] * 0.08,
                           -data['wind_speed'] * 0.03)
    park_effect = (data['park_factor'] - 100) * 0.03
    noise = np.random.normal(0, 1.8, n)

    data['total_runs'] = (base_runs + temp_effect + humidity_effect +
                          wind_effect + park_effect + noise).clip(2, 20)
    data['total_runs'] = data['total_runs'].round()

    # Build regression model
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error, r2_score

    # Encode wind direction
    data_encoded = pd.get_dummies(data, columns=['wind_direction'],
                                   drop_first=True)

    X = data_encoded.drop('total_runs', axis=1)
    y = data_encoded['total_runs']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LinearRegression()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    print("Weather-Based Run Prediction Model")
    print("=" * 50)
    print(f"R-squared: {r2_score(y_test, y_pred):.3f}")
    print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f} runs")
    print(f"\nFeature Coefficients:")

    for feature, coef in zip(X.columns, model.coef_):
        print(f"  {feature}: {coef:.4f}")

    # Example prediction
    example_game = pd.DataFrame({
        'temp': [85],
        'humidity': [70],
        'wind_speed': [10],
        'park_factor': [110],
        'wind_direction_Out': [1],
        'wind_direction_Calm': [0],
        'wind_direction_Cross': [0]
    })

    prediction = model.predict(example_game)[0]
    print(f"\nExample Prediction:")
    print(f"Conditions: 85°F, 70% humidity, 10mph wind OUT, PF=110")
    print(f"Predicted Total Runs: {prediction:.1f}")

    # Visualize predictions vs actual
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    axes[0].scatter(y_test, y_pred, alpha=0.3, s=20)
    axes[0].plot([y_test.min(), y_test.max()],
                 [y_test.min(), y_test.max()],
                 'r--', linewidth=2)
    axes[0].set_xlabel('Actual Total Runs')
    axes[0].set_ylabel('Predicted Total Runs')
    axes[0].set_title('Prediction Accuracy')
    axes[0].grid(True, alpha=0.3)

    residuals = y_test - y_pred
    axes[1].hist(residuals, bins=30, edgecolor='black', alpha=0.7)
    axes[1].set_xlabel('Prediction Error (runs)')
    axes[1].set_ylabel('Frequency')
    axes[1].set_title('Distribution of Errors')
    axes[1].grid(True, alpha=0.3, axis='y')

    plt.tight_layout()
    plt.show()

    return model, data_encoded

weather_model, weather_data = weather_betting_model()

22.5 Wind Analysis & Modeling

Wind's Directional Impact

Wind affects batted balls differently based on direction:


  • Blowing Out: Helps fly balls, especially to the outfield wind is blowing toward

  • Blowing In: Suppresses fly balls

  • Cross Wind: Affects lateral movement, can turn home runs foul

  • Swirling: Creates unpredictability, especially at Wrigley Field

Rule of Thumb: 10 mph wind blowing out adds ~15-25 feet to fly balls; blowing in subtracts similar amounts.

R Implementation: Wind Effect Analysis

# R: Analyze Wind Effects
library(tidyverse)

# Simulate wind and batted ball outcomes
set.seed(321)

wind_data <- expand_grid(
  wind_direction = c("Out to CF", "Out to LF", "Out to RF",
                    "In from CF", "Cross L-to-R", "Cross R-to-L", "Calm"),
  wind_speed = c(0, 5, 10, 15, 20),
  exit_velo = seq(95, 110, 5),
  launch_angle = seq(20, 35, 5)
) %>%
  rowwise() %>%
  mutate(
    # Base distance
    base_distance = -140 + 4.5 * exit_velo + 6 * launch_angle -
                   0.08 * launch_angle^2,

    # Wind effect
    wind_effect = case_when(
      wind_direction == "Calm" ~ 0,
      grepl("Out", wind_direction) ~ wind_speed * 1.8,
      grepl("In", wind_direction) ~ -wind_speed * 2.0,
      grepl("Cross", wind_direction) ~ wind_speed * 0.3
    ),

    # Actual distance
    actual_distance = base_distance + wind_effect + rnorm(1, 0, 5),

    # Outcome
    is_hr = actual_distance >= 400
  ) %>%
  ungroup()

# Analyze wind impact on HR rate
hr_by_wind <- wind_data %>%
  group_by(wind_direction, wind_speed) %>%
  summarise(
    batted_balls = n(),
    home_runs = sum(is_hr),
    hr_rate = mean(is_hr),
    avg_distance = mean(actual_distance),
    .groups = 'drop'
  )

# Focus on meaningful wind speeds
hr_summary <- wind_data %>%
  filter(wind_speed %in% c(0, 10, 20)) %>%
  mutate(wind_label = paste(wind_direction, wind_speed, "mph")) %>%
  group_by(wind_label) %>%
  summarise(
    hr_rate = mean(is_hr),
    avg_distance = mean(actual_distance),
    .groups = 'drop'
  ) %>%
  arrange(desc(hr_rate))

print("Home Run Rate by Wind Conditions:")
print(hr_summary, n = 25)

# Visualize
wind_out <- wind_data %>%
  filter(wind_direction %in% c("Out to CF", "In from CF", "Calm")) %>%
  mutate(wind_direction = factor(wind_direction,
                                levels = c("In from CF", "Calm", "Out to CF")))

ggplot(wind_out, aes(x = wind_speed, y = actual_distance,
                     color = wind_direction)) +
  geom_smooth(method = "lm", se = TRUE, size = 1.5) +
  scale_color_manual(values = c("blue", "gray", "red"),
                    name = "Wind Direction") +
  labs(title = "Wind Impact on Batted Ball Distance",
       subtitle = "All exit velocities and launch angles",
       x = "Wind Speed (mph)",
       y = "Distance (feet)") +
  theme_minimal() +
  theme(legend.position = "top")

# Statistical model
wind_model <- lm(actual_distance ~ wind_speed * wind_direction +
                exit_velo + launch_angle,
                data = wind_data)

cat("\nWind Effect Model:\n")
print(summary(wind_model)$coefficients[2:3, ])

# Calculate expected distance change
wind_predictions <- expand_grid(
  wind_speed = c(0, 10, 20),
  wind_direction = "Out to CF",
  exit_velo = 105,
  launch_angle = 28
)

wind_predictions$predicted_distance <- predict(wind_model, wind_predictions)

cat("\nPredicted Distance (105 mph EV, 28° LA):\n")
print(wind_predictions %>% select(wind_speed, wind_direction, predicted_distance))

Python Implementation: Wrigley Field Wind Analysis

# Python: Wrigley Field Wind Patterns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import seaborn as sns

# Wrigley Field is famous for wind effects
# Lake Michigan creates unique patterns

def wrigley_wind_analysis():
    """
    Analyze characteristic wind patterns at Wrigley Field
    """

    # Simulate Wrigley wind data by month
    months = ['April', 'May', 'June', 'July', 'August', 'September', 'October']

    # Wrigley: wind often blows IN in April/May, OUT in summer
    wind_patterns = pd.DataFrame({
        'month': months,
        'avg_wind_speed': [12, 11, 9, 8, 8, 9, 11],
        'pct_blowing_out': [25, 35, 45, 50, 48, 40, 30],
        'pct_blowing_in': [45, 40, 30, 25, 27, 35, 40],
        'pct_cross': [30, 25, 25, 25, 25, 25, 30],
        'avg_runs_per_game': [8.2, 8.9, 9.4, 9.7, 9.5, 9.1, 8.5],
        'avg_hr_per_game': [2.3, 2.7, 3.1, 3.4, 3.2, 2.9, 2.5]
    })

    print("Wrigley Field Wind Patterns by Month:")
    print(wind_patterns.to_string(index=False))

    # Correlation between wind direction and offense
    from scipy.stats import pearsonr

    corr_runs, p_runs = pearsonr(wind_patterns['pct_blowing_out'],
                                  wind_patterns['avg_runs_per_game'])
    corr_hr, p_hr = pearsonr(wind_patterns['pct_blowing_out'],
                             wind_patterns['avg_hr_per_game'])

    print(f"\nCorrelation Analysis:")
    print(f"Wind Out % vs Runs/Game: r = {corr_runs:.3f}, p = {p_runs:.3f}")
    print(f"Wind Out % vs HR/Game: r = {corr_hr:.3f}, p = {p_hr:.3f}")

    # Visualize
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # Wind direction by month
    x = np.arange(len(months))
    width = 0.25

    axes[0, 0].bar(x - width, wind_patterns['pct_blowing_out'], width,
                   label='Blowing Out', color='red', alpha=0.7)
    axes[0, 0].bar(x, wind_patterns['pct_blowing_in'], width,
                   label='Blowing In', color='blue', alpha=0.7)
    axes[0, 0].bar(x + width, wind_patterns['pct_cross'], width,
                   label='Cross Wind', color='gray', alpha=0.7)
    axes[0, 0].set_xlabel('Month')
    axes[0, 0].set_ylabel('Percentage of Games (%)')
    axes[0, 0].set_title('Wrigley Field Wind Direction by Month')
    axes[0, 0].set_xticks(x)
    axes[0, 0].set_xticklabels(months, rotation=45)
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3, axis='y')

    # Scoring vs wind out percentage
    axes[0, 1].scatter(wind_patterns['pct_blowing_out'],
                      wind_patterns['avg_runs_per_game'],
                      s=100, alpha=0.6)
    for idx, month in enumerate(months):
        axes[0, 1].annotate(month,
                           (wind_patterns.iloc[idx]['pct_blowing_out'],
                            wind_patterns.iloc[idx]['avg_runs_per_game']),
                           fontsize=9, alpha=0.7)

    # Add trend line
    z = np.polyfit(wind_patterns['pct_blowing_out'],
                   wind_patterns['avg_runs_per_game'], 1)
    p = np.poly1d(z)
    axes[0, 1].plot(wind_patterns['pct_blowing_out'],
                    p(wind_patterns['pct_blowing_out']),
                    "r--", alpha=0.8, linewidth=2)

    axes[0, 1].set_xlabel('% Games with Wind Blowing Out')
    axes[0, 1].set_ylabel('Average Runs per Game')
    axes[0, 1].set_title('Wind Direction vs Scoring')
    axes[0, 1].grid(True, alpha=0.3)

    # Home runs vs wind
    axes[1, 0].scatter(wind_patterns['pct_blowing_out'],
                      wind_patterns['avg_hr_per_game'],
                      s=100, alpha=0.6, color='orange')
    for idx, month in enumerate(months):
        axes[1, 0].annotate(month,
                           (wind_patterns.iloc[idx]['pct_blowing_out'],
                            wind_patterns.iloc[idx]['avg_hr_per_game']),
                           fontsize=9, alpha=0.7)

    z = np.polyfit(wind_patterns['pct_blowing_out'],
                   wind_patterns['avg_hr_per_game'], 1)
    p = np.poly1d(z)
    axes[1, 0].plot(wind_patterns['pct_blowing_out'],
                    p(wind_patterns['pct_blowing_out']),
                    "r--", alpha=0.8, linewidth=2)

    axes[1, 0].set_xlabel('% Games with Wind Blowing Out')
    axes[1, 0].set_ylabel('Average Home Runs per Game')
    axes[1, 0].set_title('Wind Direction vs Home Runs')
    axes[1, 0].grid(True, alpha=0.3)

    # Monthly trends
    axes[1, 1].plot(months, wind_patterns['avg_runs_per_game'],
                   'o-', linewidth=2, markersize=8, label='Runs/Game')
    ax2 = axes[1, 1].twinx()
    ax2.plot(months, wind_patterns['avg_hr_per_game'],
            'o-', linewidth=2, markersize=8, color='orange', label='HR/Game')

    axes[1, 1].set_xlabel('Month')
    axes[1, 1].set_ylabel('Runs per Game', color='blue')
    ax2.set_ylabel('Home Runs per Game', color='orange')
    axes[1, 1].set_title('Seasonal Offensive Trends at Wrigley')
    axes[1, 1].tick_params(axis='y', labelcolor='blue')
    ax2.tick_params(axis='y', labelcolor='orange')
    axes[1, 1].set_xticklabels(months, rotation=45)
    axes[1, 1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('wrigley_wind_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    return wind_patterns

wrigley_wind = wrigley_wind_analysis()

# Wind rose visualization
def create_wind_rose():
    """
    Create wind rose showing typical Wrigley wind patterns
    """

    # Simulate wind direction data (degrees from North)
    # Wrigley: Lake Michigan is to the east
    np.random.seed(42)

    # Common directions at Wrigley
    # North (0°): In from CF
    # South (180°): Out to CF
    # East (90°): From lake, cross wind
    # West (270°): Cross wind

    directions = []
    speeds = []

    # Generate realistic distribution
    for _ in range(1000):
        # More common: SW winds (blowing out), NE winds (blowing in)
        direction_type = np.random.choice(['SW', 'NE', 'E', 'W', 'Variable'],
                                         p=[0.35, 0.30, 0.15, 0.15, 0.05])

        if direction_type == 'SW':
            dir_deg = np.random.normal(225, 20)  # Southwest
            speed = np.random.normal(10, 3)
        elif direction_type == 'NE':
            dir_deg = np.random.normal(45, 20)  # Northeast
            speed = np.random.normal(12, 3)
        elif direction_type == 'E':
            dir_deg = np.random.normal(90, 15)  # East
            speed = np.random.normal(8, 2)
        elif direction_type == 'W':
            dir_deg = np.random.normal(270, 15)  # West
            speed = np.random.normal(8, 2)
        else:
            dir_deg = np.random.uniform(0, 360)
            speed = np.random.normal(5, 2)

        directions.append(dir_deg % 360)
        speeds.append(max(0, speed))

    # Create polar histogram
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='polar')

    # Convert to radians and bin
    dir_radians = np.array(directions) * np.pi / 180
    bins = np.linspace(0, 2*np.pi, 17)

    hist, bin_edges = np.histogram(dir_radians, bins=bins)

    # Plot
    width = 2*np.pi / 16
    bars = ax.bar(bin_edges[:-1], hist, width=width, alpha=0.7)

    # Color by impact (SW = red for wind out, NE = blue for wind in)
    colors = []
    for edge in bin_edges[:-1]:
        deg = edge * 180 / np.pi
        if 135 <= deg <= 315:  # SW quadrant (wind blowing out)
            colors.append('red')
        elif deg < 135 or deg > 315:  # NE quadrant (wind blowing in)
            colors.append('blue')
        else:
            colors.append('gray')

    for bar, color in zip(bars, colors):
        bar.set_facecolor(color)
        bar.set_alpha(0.6)

    ax.set_theta_zero_location('N')
    ax.set_theta_direction(-1)
    ax.set_title('Wrigley Field Wind Rose\\n(Red = Blowing Out, Blue = Blowing In)',
                 fontsize=14, fontweight='bold', pad=20)

    plt.tight_layout()
    plt.show()

create_wind_rose()

R
# R: Analyze Wind Effects
library(tidyverse)

# Simulate wind and batted ball outcomes
set.seed(321)

wind_data <- expand_grid(
  wind_direction = c("Out to CF", "Out to LF", "Out to RF",
                    "In from CF", "Cross L-to-R", "Cross R-to-L", "Calm"),
  wind_speed = c(0, 5, 10, 15, 20),
  exit_velo = seq(95, 110, 5),
  launch_angle = seq(20, 35, 5)
) %>%
  rowwise() %>%
  mutate(
    # Base distance
    base_distance = -140 + 4.5 * exit_velo + 6 * launch_angle -
                   0.08 * launch_angle^2,

    # Wind effect
    wind_effect = case_when(
      wind_direction == "Calm" ~ 0,
      grepl("Out", wind_direction) ~ wind_speed * 1.8,
      grepl("In", wind_direction) ~ -wind_speed * 2.0,
      grepl("Cross", wind_direction) ~ wind_speed * 0.3
    ),

    # Actual distance
    actual_distance = base_distance + wind_effect + rnorm(1, 0, 5),

    # Outcome
    is_hr = actual_distance >= 400
  ) %>%
  ungroup()

# Analyze wind impact on HR rate
hr_by_wind <- wind_data %>%
  group_by(wind_direction, wind_speed) %>%
  summarise(
    batted_balls = n(),
    home_runs = sum(is_hr),
    hr_rate = mean(is_hr),
    avg_distance = mean(actual_distance),
    .groups = 'drop'
  )

# Focus on meaningful wind speeds
hr_summary <- wind_data %>%
  filter(wind_speed %in% c(0, 10, 20)) %>%
  mutate(wind_label = paste(wind_direction, wind_speed, "mph")) %>%
  group_by(wind_label) %>%
  summarise(
    hr_rate = mean(is_hr),
    avg_distance = mean(actual_distance),
    .groups = 'drop'
  ) %>%
  arrange(desc(hr_rate))

print("Home Run Rate by Wind Conditions:")
print(hr_summary, n = 25)

# Visualize
wind_out <- wind_data %>%
  filter(wind_direction %in% c("Out to CF", "In from CF", "Calm")) %>%
  mutate(wind_direction = factor(wind_direction,
                                levels = c("In from CF", "Calm", "Out to CF")))

ggplot(wind_out, aes(x = wind_speed, y = actual_distance,
                     color = wind_direction)) +
  geom_smooth(method = "lm", se = TRUE, size = 1.5) +
  scale_color_manual(values = c("blue", "gray", "red"),
                    name = "Wind Direction") +
  labs(title = "Wind Impact on Batted Ball Distance",
       subtitle = "All exit velocities and launch angles",
       x = "Wind Speed (mph)",
       y = "Distance (feet)") +
  theme_minimal() +
  theme(legend.position = "top")

# Statistical model
wind_model <- lm(actual_distance ~ wind_speed * wind_direction +
                exit_velo + launch_angle,
                data = wind_data)

cat("\nWind Effect Model:\n")
print(summary(wind_model)$coefficients[2:3, ])

# Calculate expected distance change
wind_predictions <- expand_grid(
  wind_speed = c(0, 10, 20),
  wind_direction = "Out to CF",
  exit_velo = 105,
  launch_angle = 28
)

wind_predictions$predicted_distance <- predict(wind_model, wind_predictions)

cat("\nPredicted Distance (105 mph EV, 28° LA):\n")
print(wind_predictions %>% select(wind_speed, wind_direction, predicted_distance))
Python
# Python: Wrigley Field Wind Patterns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import seaborn as sns

# Wrigley Field is famous for wind effects
# Lake Michigan creates unique patterns

def wrigley_wind_analysis():
    """
    Analyze characteristic wind patterns at Wrigley Field
    """

    # Simulate Wrigley wind data by month
    months = ['April', 'May', 'June', 'July', 'August', 'September', 'October']

    # Wrigley: wind often blows IN in April/May, OUT in summer
    wind_patterns = pd.DataFrame({
        'month': months,
        'avg_wind_speed': [12, 11, 9, 8, 8, 9, 11],
        'pct_blowing_out': [25, 35, 45, 50, 48, 40, 30],
        'pct_blowing_in': [45, 40, 30, 25, 27, 35, 40],
        'pct_cross': [30, 25, 25, 25, 25, 25, 30],
        'avg_runs_per_game': [8.2, 8.9, 9.4, 9.7, 9.5, 9.1, 8.5],
        'avg_hr_per_game': [2.3, 2.7, 3.1, 3.4, 3.2, 2.9, 2.5]
    })

    print("Wrigley Field Wind Patterns by Month:")
    print(wind_patterns.to_string(index=False))

    # Correlation between wind direction and offense
    from scipy.stats import pearsonr

    corr_runs, p_runs = pearsonr(wind_patterns['pct_blowing_out'],
                                  wind_patterns['avg_runs_per_game'])
    corr_hr, p_hr = pearsonr(wind_patterns['pct_blowing_out'],
                             wind_patterns['avg_hr_per_game'])

    print(f"\nCorrelation Analysis:")
    print(f"Wind Out % vs Runs/Game: r = {corr_runs:.3f}, p = {p_runs:.3f}")
    print(f"Wind Out % vs HR/Game: r = {corr_hr:.3f}, p = {p_hr:.3f}")

    # Visualize
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # Wind direction by month
    x = np.arange(len(months))
    width = 0.25

    axes[0, 0].bar(x - width, wind_patterns['pct_blowing_out'], width,
                   label='Blowing Out', color='red', alpha=0.7)
    axes[0, 0].bar(x, wind_patterns['pct_blowing_in'], width,
                   label='Blowing In', color='blue', alpha=0.7)
    axes[0, 0].bar(x + width, wind_patterns['pct_cross'], width,
                   label='Cross Wind', color='gray', alpha=0.7)
    axes[0, 0].set_xlabel('Month')
    axes[0, 0].set_ylabel('Percentage of Games (%)')
    axes[0, 0].set_title('Wrigley Field Wind Direction by Month')
    axes[0, 0].set_xticks(x)
    axes[0, 0].set_xticklabels(months, rotation=45)
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3, axis='y')

    # Scoring vs wind out percentage
    axes[0, 1].scatter(wind_patterns['pct_blowing_out'],
                      wind_patterns['avg_runs_per_game'],
                      s=100, alpha=0.6)
    for idx, month in enumerate(months):
        axes[0, 1].annotate(month,
                           (wind_patterns.iloc[idx]['pct_blowing_out'],
                            wind_patterns.iloc[idx]['avg_runs_per_game']),
                           fontsize=9, alpha=0.7)

    # Add trend line
    z = np.polyfit(wind_patterns['pct_blowing_out'],
                   wind_patterns['avg_runs_per_game'], 1)
    p = np.poly1d(z)
    axes[0, 1].plot(wind_patterns['pct_blowing_out'],
                    p(wind_patterns['pct_blowing_out']),
                    "r--", alpha=0.8, linewidth=2)

    axes[0, 1].set_xlabel('% Games with Wind Blowing Out')
    axes[0, 1].set_ylabel('Average Runs per Game')
    axes[0, 1].set_title('Wind Direction vs Scoring')
    axes[0, 1].grid(True, alpha=0.3)

    # Home runs vs wind
    axes[1, 0].scatter(wind_patterns['pct_blowing_out'],
                      wind_patterns['avg_hr_per_game'],
                      s=100, alpha=0.6, color='orange')
    for idx, month in enumerate(months):
        axes[1, 0].annotate(month,
                           (wind_patterns.iloc[idx]['pct_blowing_out'],
                            wind_patterns.iloc[idx]['avg_hr_per_game']),
                           fontsize=9, alpha=0.7)

    z = np.polyfit(wind_patterns['pct_blowing_out'],
                   wind_patterns['avg_hr_per_game'], 1)
    p = np.poly1d(z)
    axes[1, 0].plot(wind_patterns['pct_blowing_out'],
                    p(wind_patterns['pct_blowing_out']),
                    "r--", alpha=0.8, linewidth=2)

    axes[1, 0].set_xlabel('% Games with Wind Blowing Out')
    axes[1, 0].set_ylabel('Average Home Runs per Game')
    axes[1, 0].set_title('Wind Direction vs Home Runs')
    axes[1, 0].grid(True, alpha=0.3)

    # Monthly trends
    axes[1, 1].plot(months, wind_patterns['avg_runs_per_game'],
                   'o-', linewidth=2, markersize=8, label='Runs/Game')
    ax2 = axes[1, 1].twinx()
    ax2.plot(months, wind_patterns['avg_hr_per_game'],
            'o-', linewidth=2, markersize=8, color='orange', label='HR/Game')

    axes[1, 1].set_xlabel('Month')
    axes[1, 1].set_ylabel('Runs per Game', color='blue')
    ax2.set_ylabel('Home Runs per Game', color='orange')
    axes[1, 1].set_title('Seasonal Offensive Trends at Wrigley')
    axes[1, 1].tick_params(axis='y', labelcolor='blue')
    ax2.tick_params(axis='y', labelcolor='orange')
    axes[1, 1].set_xticklabels(months, rotation=45)
    axes[1, 1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('wrigley_wind_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    return wind_patterns

wrigley_wind = wrigley_wind_analysis()

# Wind rose visualization
def create_wind_rose():
    """
    Create wind rose showing typical Wrigley wind patterns
    """

    # Simulate wind direction data (degrees from North)
    # Wrigley: Lake Michigan is to the east
    np.random.seed(42)

    # Common directions at Wrigley
    # North (0°): In from CF
    # South (180°): Out to CF
    # East (90°): From lake, cross wind
    # West (270°): Cross wind

    directions = []
    speeds = []

    # Generate realistic distribution
    for _ in range(1000):
        # More common: SW winds (blowing out), NE winds (blowing in)
        direction_type = np.random.choice(['SW', 'NE', 'E', 'W', 'Variable'],
                                         p=[0.35, 0.30, 0.15, 0.15, 0.05])

        if direction_type == 'SW':
            dir_deg = np.random.normal(225, 20)  # Southwest
            speed = np.random.normal(10, 3)
        elif direction_type == 'NE':
            dir_deg = np.random.normal(45, 20)  # Northeast
            speed = np.random.normal(12, 3)
        elif direction_type == 'E':
            dir_deg = np.random.normal(90, 15)  # East
            speed = np.random.normal(8, 2)
        elif direction_type == 'W':
            dir_deg = np.random.normal(270, 15)  # West
            speed = np.random.normal(8, 2)
        else:
            dir_deg = np.random.uniform(0, 360)
            speed = np.random.normal(5, 2)

        directions.append(dir_deg % 360)
        speeds.append(max(0, speed))

    # Create polar histogram
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='polar')

    # Convert to radians and bin
    dir_radians = np.array(directions) * np.pi / 180
    bins = np.linspace(0, 2*np.pi, 17)

    hist, bin_edges = np.histogram(dir_radians, bins=bins)

    # Plot
    width = 2*np.pi / 16
    bars = ax.bar(bin_edges[:-1], hist, width=width, alpha=0.7)

    # Color by impact (SW = red for wind out, NE = blue for wind in)
    colors = []
    for edge in bin_edges[:-1]:
        deg = edge * 180 / np.pi
        if 135 <= deg <= 315:  # SW quadrant (wind blowing out)
            colors.append('red')
        elif deg < 135 or deg > 315:  # NE quadrant (wind blowing in)
            colors.append('blue')
        else:
            colors.append('gray')

    for bar, color in zip(bars, colors):
        bar.set_facecolor(color)
        bar.set_alpha(0.6)

    ax.set_theta_zero_location('N')
    ax.set_theta_direction(-1)
    ax.set_title('Wrigley Field Wind Rose\\n(Red = Blowing Out, Blue = Blowing In)',
                 fontsize=14, fontweight='bold', pad=20)

    plt.tight_layout()
    plt.show()

create_wind_rose()

22.6 Day vs Night & Scheduling Effects

Day/Night Game Differences

Day games show systematically different outcomes than night games:

Day Game Characteristics:


  • Higher temperatures (especially afternoon games)

  • More wind variability

  • Sun interference (fielding difficulty)

  • Player fatigue (unusual game time for body clock)

Night Game Characteristics:


  • Cooler temperatures

  • Better lighting consistency

  • Standard game time for players

  • Potentially higher humidity

R Analysis: Day/Night Splits

# R: Day vs Night Game Analysis
library(tidyverse)
library(Lahman)

# Simulate realistic day/night data
set.seed(654)

games_data <- tibble(
  game_id = 1:2430,  # Full season of games
  game_type = sample(c("Day", "Night"), 2430, replace = TRUE,
                    prob = c(0.35, 0.65)),  # ~35% day games
  temperature = ifelse(game_type == "Day",
                      rnorm(2430, 78, 10),
                      rnorm(2430, 68, 8)),
  month = sample(c("April", "May", "June", "July",
                  "August", "September"), 2430, replace = TRUE)
) %>%
  mutate(
    # Day games have higher scoring due to temperature
    base_runs = 8.5,
    temp_effect = (temperature - 70) * 0.03,
    day_effect = ifelse(game_type == "Day", 0.4, 0),  # Extra boost
    noise = rnorm(n(), 0, 1.6),
    total_runs = pmax(base_runs + temp_effect + day_effect + noise, 2),
    total_runs = round(total_runs),

    # Home runs
    hr_base = 2.2,
    hr_temp = (temperature - 70) * 0.015,
    hr_day = ifelse(game_type == "Day", 0.25, 0),
    hr_noise = rnorm(n(), 0, 0.7),
    home_runs = pmax(hr_base + hr_temp + hr_day + hr_noise, 0),
    home_runs = round(home_runs),

    # Errors (sun field at day games)
    errors = rpois(n(), ifelse(game_type == "Day", 0.85, 0.65))
  )

# Compare day vs night
day_night_summary <- games_data %>%
  group_by(game_type) %>%
  summarise(
    n_games = n(),
    avg_temp = mean(temperature),
    avg_runs = mean(total_runs),
    avg_hr = mean(home_runs),
    avg_errors = mean(errors),
    .groups = 'drop'
  )

print("Day vs Night Game Statistics:")
print(day_night_summary)

# Statistical tests
t_test_runs <- t.test(total_runs ~ game_type, data = games_data)
t_test_hr <- t.test(home_runs ~ game_type, data = games_data)
t_test_errors <- t.test(errors ~ game_type, data = games_data)

cat("\nStatistical Significance Tests:\n")
cat(sprintf("Runs - Day vs Night: t = %.2f, p = %.4f\n",
           t_test_runs$statistic, t_test_runs$p.value))
cat(sprintf("HR - Day vs Night: t = %.2f, p = %.4f\n",
           t_test_hr$statistic, t_test_hr$p.value))
cat(sprintf("Errors - Day vs Night: t = %.2f, p = %.4f\n",
           t_test_errors$statistic, t_test_errors$p.value))

# Visualizations
p1 <- ggplot(games_data, aes(x = game_type, y = total_runs, fill = game_type)) +
  geom_boxplot(alpha = 0.7) +
  scale_fill_manual(values = c("Day" = "gold", "Night" = "navy")) +
  labs(title = "Total Runs: Day vs Night Games",
       x = "Game Type",
       y = "Total Runs") +
  theme_minimal() +
  theme(legend.position = "none")

p2 <- ggplot(games_data, aes(x = temperature, y = total_runs,
                             color = game_type)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = TRUE) +
  scale_color_manual(values = c("Day" = "orange", "Night" = "darkblue"),
                    name = "Game Type") +
  labs(title = "Temperature Effect by Game Type",
       x = "Temperature (°F)",
       y = "Total Runs") +
  theme_minimal()

library(patchwork)
p1 + p2

# Monthly breakdown
monthly_summary <- games_data %>%
  group_by(month, game_type) %>%
  summarise(
    avg_runs = mean(total_runs),
    n = n(),
    .groups = 'drop'
  ) %>%
  mutate(month = factor(month, levels = c("April", "May", "June",
                                          "July", "August", "September")))

ggplot(monthly_summary, aes(x = month, y = avg_runs,
                           fill = game_type, group = game_type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("Day" = "gold", "Night" = "navy"),
                   name = "Game Type") +
  labs(title = "Scoring by Month and Game Time",
       x = "Month",
       y = "Average Runs per Game") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Python Implementation: Comprehensive Scheduling Analysis

# Python: Advanced Day/Night and Scheduling Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def comprehensive_scheduling_analysis():
    """
    Analyze various scheduling factors affecting performance
    """

    np.random.seed(999)
    n_games = 2430

    # Generate comprehensive game data
    games = pd.DataFrame({
        'game_id': range(n_games),
        'game_type': np.random.choice(['Day', 'Night'], n_games, p=[0.35, 0.65]),
        'day_of_week': np.random.choice(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
                                       n_games, p=[0.10, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15]),
        'month': np.random.choice(range(4, 10), n_games),  # April-Sept (4-9)
        'is_weekend': np.random.choice([0, 1], n_games, p=[0.7, 0.3]),
        'is_doubleheader': np.random.choice([0, 1], n_games, p=[0.97, 0.03]),
        'rest_days': np.random.choice([0, 1, 2, 3], n_games, p=[0.7, 0.2, 0.08, 0.02])
    })

    # Temperature varies by time and month
    games['temperature'] = np.where(
        games['game_type'] == 'Day',
        np.random.normal(75 + games['month'] * 1.5, 11),
        np.random.normal(67 + games['month'] * 1.2, 9)
    )

    # Generate runs with multiple factors
    base_runs = 8.5
    day_effect = np.where(games['game_type'] == 'Day', 0.45, 0)
    temp_effect = (games['temperature'] - 70) * 0.03
    dh_effect = np.where(games['is_doubleheader'] == 1, -0.6, 0)  # Fatigue
    rest_effect = games['rest_days'] * 0.15  # Fresh teams score more
    noise = np.random.normal(0, 1.5, n_games)

    games['total_runs'] = (base_runs + day_effect + temp_effect +
                          dh_effect + rest_effect + noise).clip(1, 25)
    games['total_runs'] = games['total_runs'].round()

    # Analysis 1: Day vs Night
    day_night = games.groupby('game_type').agg({
        'total_runs': ['mean', 'std', 'count'],
        'temperature': 'mean'
    }).round(2)

    print("Day vs Night Game Analysis:")
    print(day_night)
    print()

    # Analysis 2: Day of week effects
    dow_effects = games.groupby('day_of_week').agg({
        'total_runs': ['mean', 'count']
    }).round(2)

    dow_effects = dow_effects.reindex(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
    print("Day of Week Effects:")
    print(dow_effects)
    print()

    # Analysis 3: Rest days impact
    rest_effects = games.groupby('rest_days').agg({
        'total_runs': ['mean', 'std', 'count']
    }).round(2)

    print("Rest Days Impact:")
    print(rest_effects)
    print()

    # Analysis 4: Doubleheader effects
    dh_effects = games.groupby('is_doubleheader').agg({
        'total_runs': ['mean', 'std', 'count']
    }).round(2)

    print("Doubleheader Effects:")
    print(dh_effects)
    print()

    # Statistical modeling
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import LabelEncoder

    # Encode categorical variables
    games_encoded = games.copy()
    games_encoded['game_type_encoded'] = (games_encoded['game_type'] == 'Day').astype(int)

    le_dow = LabelEncoder()
    games_encoded['dow_encoded'] = le_dow.fit_transform(games_encoded['day_of_week'])

    X = games_encoded[['game_type_encoded', 'temperature', 'month',
                       'is_doubleheader', 'rest_days', 'dow_encoded']]
    y = games_encoded['total_runs']

    model = LinearRegression()
    model.fit(X, y)

    print("\nRegression Model Coefficients:")
    for feature, coef in zip(X.columns, model.coef_):
        print(f"  {feature}: {coef:.4f}")
    print(f"\nR-squared: {model.score(X, y):.4f}")

    # Visualizations
    fig, axes = plt.subplots(2, 3, figsize=(16, 10))

    # 1. Day vs Night boxplot
    sns.boxplot(data=games, x='game_type', y='total_runs', ax=axes[0, 0],
                palette={'Day': 'gold', 'Night': 'navy'})
    axes[0, 0].set_title('Day vs Night Scoring', fontweight='bold')
    axes[0, 0].set_ylabel('Total Runs')

    # 2. Temperature distribution by game type
    games[games['game_type']=='Day']['temperature'].hist(
        bins=30, alpha=0.6, ax=axes[0, 1], label='Day', color='gold'
    )
    games[games['game_type']=='Night']['temperature'].hist(
        bins=30, alpha=0.6, ax=axes[0, 1], label='Night', color='navy'
    )
    axes[0, 1].set_xlabel('Temperature (°F)')
    axes[0, 1].set_ylabel('Frequency')
    axes[0, 1].set_title('Temperature Distribution', fontweight='bold')
    axes[0, 1].legend()

    # 3. Day of week scoring
    dow_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    dow_data = games.groupby('day_of_week')['total_runs'].mean().reindex(dow_order)
    axes[0, 2].bar(range(7), dow_data.values, color='steelblue', alpha=0.7)
    axes[0, 2].set_xticks(range(7))
    axes[0, 2].set_xticklabels(dow_order)
    axes[0, 2].set_ylabel('Average Runs')
    axes[0, 2].set_title('Scoring by Day of Week', fontweight='bold')
    axes[0, 2].grid(True, alpha=0.3, axis='y')

    # 4. Rest days effect
    rest_data = games.groupby('rest_days')['total_runs'].mean()
    axes[1, 0].plot(rest_data.index, rest_data.values, 'o-',
                    linewidth=2, markersize=10, color='green')
    axes[1, 0].set_xlabel('Rest Days Before Game')
    axes[1, 0].set_ylabel('Average Runs')
    axes[1, 0].set_title('Impact of Rest on Scoring', fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3)

    # 5. Monthly trends by game type
    monthly_by_type = games.groupby(['month', 'game_type'])['total_runs'].mean().unstack()
    monthly_by_type.plot(ax=axes[1, 1], marker='o', linewidth=2)
    axes[1, 1].set_xlabel('Month')
    axes[1, 1].set_ylabel('Average Runs')
    axes[1, 1].set_title('Monthly Trends by Game Type', fontweight='bold')
    axes[1, 1].set_xticks(range(4, 10))
    axes[1, 1].set_xticklabels(['Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep'])
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].legend(title='Game Type')

    # 6. Feature importance
    feature_importance = pd.DataFrame({
        'feature': ['Day Game', 'Temperature', 'Month',
                   'Doubleheader', 'Rest Days', 'Day of Week'],
        'coefficient': model.coef_
    }).sort_values('coefficient')

    colors = ['red' if x < 0 else 'green' for x in feature_importance['coefficient']]
    axes[1, 2].barh(range(len(feature_importance)),
                    feature_importance['coefficient'],
                    color=colors, alpha=0.6)
    axes[1, 2].set_yticks(range(len(feature_importance)))
    axes[1, 2].set_yticklabels(feature_importance['feature'])
    axes[1, 2].set_xlabel('Effect on Runs')
    axes[1, 2].set_title('Factor Impact on Scoring', fontweight='bold')
    axes[1, 2].axvline(x=0, color='black', linestyle='-', linewidth=0.8)
    axes[1, 2].grid(True, alpha=0.3, axis='x')

    plt.tight_layout()
    plt.savefig('scheduling_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    return games, model

games_df, sched_model = comprehensive_scheduling_analysis()

# Practical application: Predict expected runs for specific game
print("\n" + "="*60)
print("Example Prediction:")
print("="*60)

example_game = pd.DataFrame({
    'game_type_encoded': [1],  # Day game
    'temperature': [88],
    'month': [7],  # July
    'is_doubleheader': [0],
    'rest_days': [1],
    'dow_encoded': [5]  # Saturday
})

predicted_runs = sched_model.predict(example_game)[0]
print(f"Day game in July, 88°F, Saturday, 1 day rest:")
print(f"Expected total runs: {predicted_runs:.1f}")

R
# R: Day vs Night Game Analysis
library(tidyverse)
library(Lahman)

# Simulate realistic day/night data
set.seed(654)

games_data <- tibble(
  game_id = 1:2430,  # Full season of games
  game_type = sample(c("Day", "Night"), 2430, replace = TRUE,
                    prob = c(0.35, 0.65)),  # ~35% day games
  temperature = ifelse(game_type == "Day",
                      rnorm(2430, 78, 10),
                      rnorm(2430, 68, 8)),
  month = sample(c("April", "May", "June", "July",
                  "August", "September"), 2430, replace = TRUE)
) %>%
  mutate(
    # Day games have higher scoring due to temperature
    base_runs = 8.5,
    temp_effect = (temperature - 70) * 0.03,
    day_effect = ifelse(game_type == "Day", 0.4, 0),  # Extra boost
    noise = rnorm(n(), 0, 1.6),
    total_runs = pmax(base_runs + temp_effect + day_effect + noise, 2),
    total_runs = round(total_runs),

    # Home runs
    hr_base = 2.2,
    hr_temp = (temperature - 70) * 0.015,
    hr_day = ifelse(game_type == "Day", 0.25, 0),
    hr_noise = rnorm(n(), 0, 0.7),
    home_runs = pmax(hr_base + hr_temp + hr_day + hr_noise, 0),
    home_runs = round(home_runs),

    # Errors (sun field at day games)
    errors = rpois(n(), ifelse(game_type == "Day", 0.85, 0.65))
  )

# Compare day vs night
day_night_summary <- games_data %>%
  group_by(game_type) %>%
  summarise(
    n_games = n(),
    avg_temp = mean(temperature),
    avg_runs = mean(total_runs),
    avg_hr = mean(home_runs),
    avg_errors = mean(errors),
    .groups = 'drop'
  )

print("Day vs Night Game Statistics:")
print(day_night_summary)

# Statistical tests
t_test_runs <- t.test(total_runs ~ game_type, data = games_data)
t_test_hr <- t.test(home_runs ~ game_type, data = games_data)
t_test_errors <- t.test(errors ~ game_type, data = games_data)

cat("\nStatistical Significance Tests:\n")
cat(sprintf("Runs - Day vs Night: t = %.2f, p = %.4f\n",
           t_test_runs$statistic, t_test_runs$p.value))
cat(sprintf("HR - Day vs Night: t = %.2f, p = %.4f\n",
           t_test_hr$statistic, t_test_hr$p.value))
cat(sprintf("Errors - Day vs Night: t = %.2f, p = %.4f\n",
           t_test_errors$statistic, t_test_errors$p.value))

# Visualizations
p1 <- ggplot(games_data, aes(x = game_type, y = total_runs, fill = game_type)) +
  geom_boxplot(alpha = 0.7) +
  scale_fill_manual(values = c("Day" = "gold", "Night" = "navy")) +
  labs(title = "Total Runs: Day vs Night Games",
       x = "Game Type",
       y = "Total Runs") +
  theme_minimal() +
  theme(legend.position = "none")

p2 <- ggplot(games_data, aes(x = temperature, y = total_runs,
                             color = game_type)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = TRUE) +
  scale_color_manual(values = c("Day" = "orange", "Night" = "darkblue"),
                    name = "Game Type") +
  labs(title = "Temperature Effect by Game Type",
       x = "Temperature (°F)",
       y = "Total Runs") +
  theme_minimal()

library(patchwork)
p1 + p2

# Monthly breakdown
monthly_summary <- games_data %>%
  group_by(month, game_type) %>%
  summarise(
    avg_runs = mean(total_runs),
    n = n(),
    .groups = 'drop'
  ) %>%
  mutate(month = factor(month, levels = c("April", "May", "June",
                                          "July", "August", "September")))

ggplot(monthly_summary, aes(x = month, y = avg_runs,
                           fill = game_type, group = game_type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("Day" = "gold", "Night" = "navy"),
                   name = "Game Type") +
  labs(title = "Scoring by Month and Game Time",
       x = "Month",
       y = "Average Runs per Game") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Python
# Python: Advanced Day/Night and Scheduling Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def comprehensive_scheduling_analysis():
    """
    Analyze various scheduling factors affecting performance
    """

    np.random.seed(999)
    n_games = 2430

    # Generate comprehensive game data
    games = pd.DataFrame({
        'game_id': range(n_games),
        'game_type': np.random.choice(['Day', 'Night'], n_games, p=[0.35, 0.65]),
        'day_of_week': np.random.choice(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
                                       n_games, p=[0.10, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15]),
        'month': np.random.choice(range(4, 10), n_games),  # April-Sept (4-9)
        'is_weekend': np.random.choice([0, 1], n_games, p=[0.7, 0.3]),
        'is_doubleheader': np.random.choice([0, 1], n_games, p=[0.97, 0.03]),
        'rest_days': np.random.choice([0, 1, 2, 3], n_games, p=[0.7, 0.2, 0.08, 0.02])
    })

    # Temperature varies by time and month
    games['temperature'] = np.where(
        games['game_type'] == 'Day',
        np.random.normal(75 + games['month'] * 1.5, 11),
        np.random.normal(67 + games['month'] * 1.2, 9)
    )

    # Generate runs with multiple factors
    base_runs = 8.5
    day_effect = np.where(games['game_type'] == 'Day', 0.45, 0)
    temp_effect = (games['temperature'] - 70) * 0.03
    dh_effect = np.where(games['is_doubleheader'] == 1, -0.6, 0)  # Fatigue
    rest_effect = games['rest_days'] * 0.15  # Fresh teams score more
    noise = np.random.normal(0, 1.5, n_games)

    games['total_runs'] = (base_runs + day_effect + temp_effect +
                          dh_effect + rest_effect + noise).clip(1, 25)
    games['total_runs'] = games['total_runs'].round()

    # Analysis 1: Day vs Night
    day_night = games.groupby('game_type').agg({
        'total_runs': ['mean', 'std', 'count'],
        'temperature': 'mean'
    }).round(2)

    print("Day vs Night Game Analysis:")
    print(day_night)
    print()

    # Analysis 2: Day of week effects
    dow_effects = games.groupby('day_of_week').agg({
        'total_runs': ['mean', 'count']
    }).round(2)

    dow_effects = dow_effects.reindex(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
    print("Day of Week Effects:")
    print(dow_effects)
    print()

    # Analysis 3: Rest days impact
    rest_effects = games.groupby('rest_days').agg({
        'total_runs': ['mean', 'std', 'count']
    }).round(2)

    print("Rest Days Impact:")
    print(rest_effects)
    print()

    # Analysis 4: Doubleheader effects
    dh_effects = games.groupby('is_doubleheader').agg({
        'total_runs': ['mean', 'std', 'count']
    }).round(2)

    print("Doubleheader Effects:")
    print(dh_effects)
    print()

    # Statistical modeling
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import LabelEncoder

    # Encode categorical variables
    games_encoded = games.copy()
    games_encoded['game_type_encoded'] = (games_encoded['game_type'] == 'Day').astype(int)

    le_dow = LabelEncoder()
    games_encoded['dow_encoded'] = le_dow.fit_transform(games_encoded['day_of_week'])

    X = games_encoded[['game_type_encoded', 'temperature', 'month',
                       'is_doubleheader', 'rest_days', 'dow_encoded']]
    y = games_encoded['total_runs']

    model = LinearRegression()
    model.fit(X, y)

    print("\nRegression Model Coefficients:")
    for feature, coef in zip(X.columns, model.coef_):
        print(f"  {feature}: {coef:.4f}")
    print(f"\nR-squared: {model.score(X, y):.4f}")

    # Visualizations
    fig, axes = plt.subplots(2, 3, figsize=(16, 10))

    # 1. Day vs Night boxplot
    sns.boxplot(data=games, x='game_type', y='total_runs', ax=axes[0, 0],
                palette={'Day': 'gold', 'Night': 'navy'})
    axes[0, 0].set_title('Day vs Night Scoring', fontweight='bold')
    axes[0, 0].set_ylabel('Total Runs')

    # 2. Temperature distribution by game type
    games[games['game_type']=='Day']['temperature'].hist(
        bins=30, alpha=0.6, ax=axes[0, 1], label='Day', color='gold'
    )
    games[games['game_type']=='Night']['temperature'].hist(
        bins=30, alpha=0.6, ax=axes[0, 1], label='Night', color='navy'
    )
    axes[0, 1].set_xlabel('Temperature (°F)')
    axes[0, 1].set_ylabel('Frequency')
    axes[0, 1].set_title('Temperature Distribution', fontweight='bold')
    axes[0, 1].legend()

    # 3. Day of week scoring
    dow_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    dow_data = games.groupby('day_of_week')['total_runs'].mean().reindex(dow_order)
    axes[0, 2].bar(range(7), dow_data.values, color='steelblue', alpha=0.7)
    axes[0, 2].set_xticks(range(7))
    axes[0, 2].set_xticklabels(dow_order)
    axes[0, 2].set_ylabel('Average Runs')
    axes[0, 2].set_title('Scoring by Day of Week', fontweight='bold')
    axes[0, 2].grid(True, alpha=0.3, axis='y')

    # 4. Rest days effect
    rest_data = games.groupby('rest_days')['total_runs'].mean()
    axes[1, 0].plot(rest_data.index, rest_data.values, 'o-',
                    linewidth=2, markersize=10, color='green')
    axes[1, 0].set_xlabel('Rest Days Before Game')
    axes[1, 0].set_ylabel('Average Runs')
    axes[1, 0].set_title('Impact of Rest on Scoring', fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3)

    # 5. Monthly trends by game type
    monthly_by_type = games.groupby(['month', 'game_type'])['total_runs'].mean().unstack()
    monthly_by_type.plot(ax=axes[1, 1], marker='o', linewidth=2)
    axes[1, 1].set_xlabel('Month')
    axes[1, 1].set_ylabel('Average Runs')
    axes[1, 1].set_title('Monthly Trends by Game Type', fontweight='bold')
    axes[1, 1].set_xticks(range(4, 10))
    axes[1, 1].set_xticklabels(['Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep'])
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].legend(title='Game Type')

    # 6. Feature importance
    feature_importance = pd.DataFrame({
        'feature': ['Day Game', 'Temperature', 'Month',
                   'Doubleheader', 'Rest Days', 'Day of Week'],
        'coefficient': model.coef_
    }).sort_values('coefficient')

    colors = ['red' if x < 0 else 'green' for x in feature_importance['coefficient']]
    axes[1, 2].barh(range(len(feature_importance)),
                    feature_importance['coefficient'],
                    color=colors, alpha=0.6)
    axes[1, 2].set_yticks(range(len(feature_importance)))
    axes[1, 2].set_yticklabels(feature_importance['feature'])
    axes[1, 2].set_xlabel('Effect on Runs')
    axes[1, 2].set_title('Factor Impact on Scoring', fontweight='bold')
    axes[1, 2].axvline(x=0, color='black', linestyle='-', linewidth=0.8)
    axes[1, 2].grid(True, alpha=0.3, axis='x')

    plt.tight_layout()
    plt.savefig('scheduling_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    return games, model

games_df, sched_model = comprehensive_scheduling_analysis()

# Practical application: Predict expected runs for specific game
print("\n" + "="*60)
print("Example Prediction:")
print("="*60)

example_game = pd.DataFrame({
    'game_type_encoded': [1],  # Day game
    'temperature': [88],
    'month': [7],  # July
    'is_doubleheader': [0],
    'rest_days': [1],
    'dow_encoded': [5]  # Saturday
})

predicted_runs = sched_model.predict(example_game)[0]
print(f"Day game in July, 88°F, Saturday, 1 day rest:")
print(f"Expected total runs: {predicted_runs:.1f}")

22.7 Exercises

Exercise 1: Calculate Custom Park Factors

Using the Lahman database, calculate three-year park factors (2021-2023) for all MLB ballparks. Include separate factors for runs, home runs, hits, and strikeouts. Identify the top 5 most extreme parks in each category.

Steps:


  1. Extract team-level statistics for home and road games

  2. Calculate park factors using the proper formula

  3. Visualize the results with a multi-faceted plot

  4. Discuss which parks favor which types of outcomes

Exercise 2: Weather Impact Simulation

Build a physics-based simulator that predicts whether identical batted balls (105 mph exit velocity, 28-degree launch angle) result in home runs under different weather conditions.

Requirements:


  • Test combinations of temperature (50-100°F), humidity (20-90%), and wind (-15 to +15 mph)

  • Account for air density changes

  • Calculate the probability of a home run for each condition

  • Create a heat map showing HR probability across weather conditions

  • Determine the weather conditions that maximize and minimize HR probability

Exercise 3: Coors Field Player Adjustment

Analyze the career statistics of five players who spent significant time with the Colorado Rockies. Calculate their park-adjusted statistics and compare them to their raw numbers.

Analysis Points:


  • Home vs road splits for batting average, OBP, SLG, and HR rate

  • Career totals adjusted for Coors Field effects

  • Comparison to league-average park adjustments

  • Discussion of whether Coors Field affects player evaluation and contract value

Exercise 4: Day/Night Betting Model

Build a predictive model for game total runs that incorporates game time (day/night), weather conditions, park factors, and team strength.

Model Requirements:


  • Use regression or machine learning approach

  • Include interaction terms (e.g., day games × temperature)

  • Validate with train/test split

  • Calculate expected value for over/under bets

  • Backtest strategy: when would betting based on weather conditions be profitable?

Bonus: Identify specific situations where the model predicts the largest deviations from betting market totals.


Summary

Environmental factors substantially affect baseball outcomes in quantifiable ways:

Key Findings:


  • Park Factors: Coors Field (PF ~110-115) and other extreme parks require statistical adjustment

  • Temperature: Every 10°F adds ~2-3 feet to fly ball distance; correlates strongly with offense

  • Altitude: Coors Field's 5,200-foot elevation reduces air density 17%, adding 15-25 feet to fly balls and reducing breaking ball movement

  • Humidity: Counterintuitively helps offense (water vapor is lighter than air)

  • Wind: 10 mph helping wind adds ~20 feet; headwind subtracts similar amounts

  • Day vs Night: Day games average 0.4-0.5 more runs due to higher temperatures

  • Scheduling: Rest days, doubleheaders, and day-of-week show measurable effects

Practical Applications:


  • Player Evaluation: Adjust statistics for park and weather contexts

  • Projections: Account for home ballpark changes (trades, free agency)

  • Betting: Weather creates exploitable edges in run totals markets

  • Strategy: Lineup construction can optimize for home park characteristics

  • Historical Analysis: Era and park adjustments essential for fair comparisons

Understanding these environmental effects allows analysts to separate signal from noise, fairly evaluate players across contexts, and identify strategic opportunities. While we cannot control the weather, we can quantify its impact and make better-informed decisions accordingly.

Chapter Summary

In this chapter, you learned about weather, ballpark & environmental factors. Key topics covered:

  • Park Factors Deep Dive
  • Weather Effects on Ball Flight
  • Altitude & Air Density: Coors Field Analysis
  • Temperature & Humidity Effects
  • Wind Analysis & Modeling
  • Day vs Night & Scheduling Effects