Chapter 14: Team Building & Roster Construction

14.1 Economics of Baseball

Baseball operates under a unique economic structure that profoundly affects team building. Understanding player costs, market efficiency, and the relationship between payroll and wins is fundamental to roster construction.

The Salary Structure

MLB's salary structure creates distinct player markets based on service time:

Pre-Arbitration (0-3 years of service): Players earn near the league minimum (approximately $740,000 in 2024). Teams control these players completely and can pay near-minimum salaries regardless of performance. A pre-arbitration player producing 5 WAR in a season costs under $1 million in salary for that season, an extraordinary bargain.

Arbitration-Eligible (3-6 years): Players become eligible for salary arbitration, where neutral arbitrators determine fair salaries based on comparable players. Arbitration salaries increase with performance and service time but remain below open market rates. A 3-WAR arbitration-eligible player might earn $8-12 million—still below market value but substantially more than pre-arbitration.

Free Agency (6+ years): After six years of service, players can negotiate with any team. Free agent salaries reflect open market competition and typically exceed performance value, especially for multi-year contracts that pay for declining future seasons.

This structure creates enormous incentives to develop young talent and trade players before they reach free agency. A team built around pre-arbitration stars—like the 2020 Rays or 2023 Orioles—can win while maintaining low payrolls.
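The three service classes can be expressed as a simple lookup. The following is a minimal sketch: the function name `service_class` is hypothetical, and it deliberately ignores "Super Two" early arbitration and contract extensions, both of which shift these boundaries for some players.

```python
def service_class(service_years: float) -> str:
    """Map MLB service time (in years) to the salary class described above.

    Simplification: ignores "Super Two" early arbitration and extensions.
    """
    if service_years < 0:
        raise ValueError("service_years must be non-negative")
    if service_years < 3:
        return "Pre-Arbitration"
    if service_years < 6:
        return "Arbitration-Eligible"
    return "Free Agency"


for years in (0.5, 2.9, 3.0, 5.5, 6.0):
    print(f"{years:>4} years -> {service_class(years)}")
```

A front office would track service time in days rather than fractional years, but the thresholds behave the same way.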

Cost Per Win Analysis

Quantifying the relationship between spending and wins helps teams allocate resources efficiently. The fundamental metric is cost per marginal win ($/WAR), calculated by examining free agent contracts.

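Let's analyze a sample of notable 2023-24 free agent contracts to estimate cost per win (the contract terms and WAR projections below are approximate and meant for illustration):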

# 2023-24 Notable Free Agent Contracts
library(tidyverse)

free_agents <- tibble(
  player = c("Shohei Ohtani", "Yoshinobu Yamamoto", "Aaron Nola",
             "Jordan Montgomery", "Blake Snell", "Cody Bellinger",
             "Matt Chapman", "Jung Hoo Lee"),
  aav = c(70, 32.5, 25, 25, 32, 26.7, 25, 18.5),  # Average annual value (millions)
  years = c(10, 12, 7, 2, 2, 3, 3, 6),
  projected_war = c(5.5, 3.5, 3.0, 2.8, 3.2, 2.5, 3.5, 2.0)  # Annual WAR projection
)

# Calculate cost per WAR
free_agents <- free_agents %>%
  mutate(
    cost_per_war = aav / projected_war,
    total_value = aav * years
  )

# Summary statistics
cat("2023-24 Free Agent Market:\n")
cat("Median Cost per WAR: $", round(median(free_agents$cost_per_war), 1), "M\n", sep="")
cat("Mean Cost per WAR: $", round(mean(free_agents$cost_per_war), 1), "M\n", sep="")

# Visualize
ggplot(free_agents, aes(x = projected_war, y = aav, label = player)) +
  geom_point(size = 3, color = "steelblue") +
  geom_text(hjust = -0.1, size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed") +
  labs(title = "Free Agent Value: AAV vs Projected WAR",
       x = "Projected WAR per Season",
       y = "Average Annual Value ($M)") +
  theme_minimal() +
  xlim(1.5, 6)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# 2023-24 Notable Free Agent Contracts
free_agents = pd.DataFrame({
    'player': ['Shohei Ohtani', 'Yoshinobu Yamamoto', 'Aaron Nola',
               'Jordan Montgomery', 'Blake Snell', 'Cody Bellinger',
               'Matt Chapman', 'Jung Hoo Lee'],
    'aav': [70, 32.5, 25, 25, 32, 26.7, 25, 18.5],  # millions
    'years': [10, 12, 7, 2, 2, 3, 3, 6],
    'projected_war': [5.5, 3.5, 3.0, 2.8, 3.2, 2.5, 3.5, 2.0]
})

# Calculate cost per WAR
free_agents['cost_per_war'] = free_agents['aav'] / free_agents['projected_war']
free_agents['total_value'] = free_agents['aav'] * free_agents['years']

# Summary statistics
print("2023-24 Free Agent Market:")
print(f"Median Cost per WAR: ${free_agents['cost_per_war'].median():.1f}M")
print(f"Mean Cost per WAR: ${free_agents['cost_per_war'].mean():.1f}M")

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(free_agents['projected_war'], free_agents['aav'], s=100, alpha=0.6)

# Add labels
for idx, row in free_agents.iterrows():
    plt.annotate(row['player'], (row['projected_war'], row['aav']),
                xytext=(5, 5), textcoords='offset points', fontsize=8)

# Fit line
slope, intercept, r, p, se = stats.linregress(free_agents['projected_war'],
                                               free_agents['aav'])
x_line = np.array([1.5, 6])
y_line = slope * x_line + intercept
plt.plot(x_line, y_line, 'r--', alpha=0.7, label=f'Fit: ${slope:.1f}M per WAR')

plt.xlabel('Projected WAR per Season')
plt.ylabel('Average Annual Value ($M)')
plt.title('Free Agent Value: AAV vs Projected WAR')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

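The 2023-24 market showed cost per WAR around $9-12 million (the sample above has a median of about $9.3 million and a mean of about $9.5 million), though Ohtani's heavily deferred contract complicates the calculation because its present value is well below the nominal $70 million AAV. Historical analysis suggests free agent cost per win has grown approximately 5% annually, outpacing inflation.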
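To see what 5% annual growth implies for multi-year planning, the estimate can be compounded forward. This is a sketch under stated assumptions: the starting value of $9.5M per WAR is an assumed point inside the range quoted above, the growth rate is held constant, and `project_cost_per_war` is a hypothetical helper name.

```python
def project_cost_per_war(base_cost: float, years_ahead: int,
                         annual_growth: float = 0.05) -> float:
    """Compound a $/WAR estimate forward at a constant annual growth rate."""
    return base_cost * (1 + annual_growth) ** years_ahead


base = 9.5  # $M per WAR; assumed starting point within the $9-12M range
for n in range(0, 6):
    print(f"Year +{n}: ${project_cost_per_war(base, n):.1f}M per WAR")
```

Under these assumptions the market price of a win rises by more than a quarter over five years, which is one reason long contracts can look expensive at signing and reasonable later.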

Comparing Market Segments

The economic advantage of team-controlled players becomes clear when comparing costs across service classes:

# Cost comparison across service classes
salary_structure <- tibble(
  service_class = c("Pre-Arb", "Arb Year 1", "Arb Year 2",
                   "Arb Year 3", "Free Agent"),
  avg_salary = c(0.8, 2.5, 5.0, 8.5, 15.0),  # millions, for 3-WAR player
  war_value = rep(3.0, 5),
  market_value = c(36, 36, 36, 36, 36)  # 3 WAR * $12M
)

salary_structure <- salary_structure %>%
  mutate(
    surplus_value = market_value - avg_salary,
    efficiency = market_value / avg_salary
  )

print(salary_structure)

# Visualize surplus value
ggplot(salary_structure, aes(x = service_class, y = surplus_value)) +
  geom_col(fill = "darkgreen", alpha = 0.7) +
  geom_text(aes(label = paste0("$", surplus_value, "M")),
            vjust = -0.5, size = 4) +
  labs(title = "Surplus Value by Service Class",
       subtitle = "For a 3-WAR player at $12M/WAR market rate",
       x = "Service Class",
       y = "Surplus Value ($M)") +
  theme_minimal() +
  scale_x_discrete(limits = c("Pre-Arb", "Arb Year 1", "Arb Year 2",
                              "Arb Year 3", "Free Agent"))

# Cost comparison across service classes
salary_structure = pd.DataFrame({
    'service_class': ['Pre-Arb', 'Arb Year 1', 'Arb Year 2',
                     'Arb Year 3', 'Free Agent'],
    'avg_salary': [0.8, 2.5, 5.0, 8.5, 15.0],  # millions, for 3-WAR player
    'war_value': [3.0] * 5,
    'market_value': [36] * 5  # 3 WAR * $12M
})

salary_structure['surplus_value'] = (salary_structure['market_value'] -
                                     salary_structure['avg_salary'])
salary_structure['efficiency'] = (salary_structure['market_value'] /
                                  salary_structure['avg_salary'])

print(salary_structure)

# Visualize surplus value
plt.figure(figsize=(10, 6))
bars = plt.bar(salary_structure['service_class'],
               salary_structure['surplus_value'],
               color='darkgreen', alpha=0.7)

# Add value labels
for idx, bar in enumerate(bars):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
            f'${salary_structure.iloc[idx]["surplus_value"]:.1f}M',
            ha='center', va='bottom', fontsize=10)

plt.xlabel('Service Class')
plt.ylabel('Surplus Value ($M)')
plt.title('Surplus Value by Service Class\nFor a 3-WAR player at $12M/WAR market rate')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

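Pre-arbitration players deliver by far the best return. In this table the pre-arbitration player returns 45 times his salary in market value, against 2.4 times for the free agent, a nearly 19-fold gap in efficiency. This explains why teams like Tampa Bay, Cleveland, and Baltimore have remained competitive despite low payrolls: they have built rosters around team-controlled talent.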
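The single-season figures understate the advantage, because a team holds a player for six seasons before free agency. The sketch below accumulates surplus over the full control period. It assumes the table's illustrative salaries, a steady 3 WAR every season, and that the $0.8M pre-arbitration salary applies to each of the three pre-arbitration years.

```python
# Illustrative salaries ($M) for a steady 3-WAR player, taken from the table
# above: three pre-arbitration seasons followed by three arbitration seasons.
control_years = [
    ("Pre-Arb", 0.8), ("Pre-Arb", 0.8), ("Pre-Arb", 0.8),
    ("Arb Year 1", 2.5), ("Arb Year 2", 5.0), ("Arb Year 3", 8.5),
]
MARKET_VALUE_PER_SEASON = 36.0  # 3 WAR * $12M, as in the table

total_salary = sum(salary for _, salary in control_years)
total_market_value = MARKET_VALUE_PER_SEASON * len(control_years)
total_surplus = total_market_value - total_salary

print(f"Salary paid over six years: ${total_salary:.1f}M")
print(f"Market value produced:      ${total_market_value:.1f}M")
print(f"Cumulative surplus:         ${total_surplus:.1f}M")
```

No discounting or aging curve is applied here, so the result is an upper-bound illustration rather than a valuation.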

The Win Curve and Marginal Value

Not all wins have equal value. The 85th win (potentially making the playoffs) is worth far more than the 65th win (remaining well out of contention). This non-linear relationship affects optimal spending strategies.

# Estimate playoff probability by wins (based on historical data)
library(tidyverse)

win_curve <- tibble(
  wins = 75:95,
  playoff_prob = plogis((wins - 87) / 3)  # Logistic curve centered at 87 wins
) %>%
  mutate(
    marginal_playoff_prob = playoff_prob - lag(playoff_prob, default = first(playoff_prob)),
    win_value_multiplier = 1 + (marginal_playoff_prob * 20)  # Assumes a playoff berth is worth ~20 wins
  )

ggplot(win_curve, aes(x = wins, y = playoff_prob * 100)) +
  geom_line(linewidth = 1.5, color = "darkblue") +
  geom_vline(xintercept = 87, linetype = "dashed", color = "red") +
  labs(title = "Playoff Probability by Regular Season Wins",
       subtitle = "Each marginal win has different value",
       x = "Regular Season Wins",
       y = "Playoff Probability (%)") +
  theme_minimal() +
  annotate("text", x = 89, y = 25, label = "Steepest slope:\nHighest marginal value",
           hjust = 0, size = 3.5)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.special import expit  # logistic function

# Estimate playoff probability by wins
wins = np.arange(75, 96)
playoff_prob = expit((wins - 87) / 3)  # Logistic curve centered at 87 wins

win_curve = pd.DataFrame({
    'wins': wins,
    'playoff_prob': playoff_prob
})

win_curve['marginal_playoff_prob'] = win_curve['playoff_prob'].diff().fillna(0)
win_curve['win_value_multiplier'] = 1 + (win_curve['marginal_playoff_prob'] * 20)

# Plot
plt.figure(figsize=(10, 6))
plt.plot(win_curve['wins'], win_curve['playoff_prob'] * 100,
         linewidth=2.5, color='darkblue')
plt.axvline(x=87, linestyle='--', color='red', alpha=0.7, label='50% playoff odds (87 wins)')
plt.xlabel('Regular Season Wins')
plt.ylabel('Playoff Probability (%)')
plt.title('Playoff Probability by Regular Season Wins\nEach marginal win has different value')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

Teams near the playoff threshold should value wins more highly than teams far from contention. An 84-win team might rationally spend $15M per marginal win to reach 87 wins, while a 72-win team shouldn't exceed $10M per win.
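The gap in marginal value between those two teams can be read directly off the curve. The sketch below reuses the same logistic form (midpoint 87, scale 3), which is the chapter's illustrative model rather than a fitted one; the function names are hypothetical.

```python
import math


def playoff_prob(wins: float, midpoint: float = 87.0, scale: float = 3.0) -> float:
    """Logistic playoff-probability curve, same form as the win curve above."""
    return 1.0 / (1.0 + math.exp(-(wins - midpoint) / scale))


def marginal_playoff_prob(wins: float) -> float:
    """Playoff probability gained by adding one win to a team at `wins`."""
    return playoff_prob(wins + 1) - playoff_prob(wins)


for w in (72, 84, 87, 95):
    print(f"{w} wins: P(playoffs) = {playoff_prob(w):.3f}, "
          f"next win adds {marginal_playoff_prob(w):.3f}")
```

Under this model the next win adds far more playoff probability for the 84-win team than for the 72-win team, which is the quantitative basis for paying more per win near the threshold.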

Luxury Tax and Budget Constraints

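The Competitive Balance Tax (luxury tax) creates marginal cost increases for high-payroll teams. In 2024, the base threshold is $237M, with escalating marginal rates on payroll above each threshold ("repeat offenders" here are teams over the threshold for a third or later consecutive year):

First threshold ($237M): 20% tax (50% for repeat offenders)
Second threshold ($257M): 32% tax (62% for repeat offenders)
Third threshold ($277M): 62.5% tax (95% for repeat offenders)

Each rate applies only to the portion of payroll that falls within its tier, and a fourth tier above $297M carries still higher rates. Teams far above the threshold also face draft pick penalties. This creates a progressive cost structure where the Yankees pay $1.95 for every $1.00 in salary above $277M (as a repeat offender).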

# Calculate effective cost including luxury tax.
# Uses the three tiers listed above; the fourth tier (above $297M) is omitted.
cbt_tax <- function(payroll, repeat_offender = TRUE) {
  lower <- c(237, 257, 277)   # tier floors ($M)
  upper <- c(257, 277, Inf)   # tier ceilings ($M)
  rates <- if (repeat_offender) c(0.50, 0.62, 0.95) else c(0.20, 0.32, 0.625)
  # Tax each slice of payroll at its own tier's marginal rate
  sum(pmax(pmin(payroll, upper) - lower, 0) * rates)
}

calculate_tax_cost <- function(salary, current_payroll, repeat_offender = TRUE) {
  # Only the tax triggered by the new salary counts toward its cost;
  # tax already owed on the existing payroll is excluded.
  added_tax <- cbt_tax(current_payroll + salary, repeat_offender) -
    cbt_tax(current_payroll, repeat_offender)
  salary + added_tax
}

# Example: Yankees (repeat offender, $280M payroll) vs Rays ($90M payroll)
# Adding a $20M player
yankees_cost <- calculate_tax_cost(20, 280, repeat_offender = TRUE)
rays_cost <- calculate_tax_cost(20, 90, repeat_offender = FALSE)

cat("Cost to add $20M player:\n")
cat("Yankees: $", round(yankees_cost, 1), "M (", round(yankees_cost/20, 2), "x multiplier)\n", sep="")
cat("Rays: $", round(rays_cost, 1), "M (", round(rays_cost/20, 2), "x multiplier)\n", sep="")

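def cbt_tax(payroll, repeat_offender=True):
    """Total luxury tax owed on a payroll ($M), taxing each tier at its own rate."""
    # (floor, ceiling, first-time rate, repeat rate); fourth tier (> $297M) omitted
    tiers = [(237, 257, 0.20, 0.50),
             (257, 277, 0.32, 0.62),
             (277, float('inf'), 0.625, 0.95)]
    tax = 0.0
    for floor, ceiling, first_rate, repeat_rate in tiers:
        rate = repeat_rate if repeat_offender else first_rate
        tax += max(min(payroll, ceiling) - floor, 0) * rate
    return tax

def calculate_tax_cost(salary, current_payroll, repeat_offender=True):
    """Effective cost of a new salary: the salary plus the extra tax it triggers."""
    # Tax already owed on the existing payroll is excluded from the new player's cost
    added_tax = (cbt_tax(current_payroll + salary, repeat_offender)
                 - cbt_tax(current_payroll, repeat_offender))
    return salary + added_tax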

# Example: Yankees (repeat offender, $280M payroll) vs Rays ($90M payroll)
yankees_cost = calculate_tax_cost(20, 280, repeat_offender=True)
rays_cost = calculate_tax_cost(20, 90, repeat_offender=False)

print("Cost to add $20M player:")
print(f"Yankees: ${yankees_cost:.1f}M ({yankees_cost/20:.2f}x multiplier)")
print(f"Rays: ${rays_cost:.1f}M ({rays_cost/20:.2f}x multiplier)")

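High-payroll teams face dramatically higher effective costs, creating different optimization problems. Because each dollar of salary above the top threshold costs the Yankees nearly two dollars, an acquisition must deliver far more WAR per dollar of salary for them than for the Rays to be justified.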
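The same comparison can be put in cost-per-win terms. The sketch below is illustrative only: the $20M, 2.0-WAR free agent is hypothetical, `effective_cost_per_war` is a made-up helper, and it assumes the entire salary is taxed at a single marginal rate (that is, the signing does not cross a threshold).

```python
def effective_cost_per_war(salary: float, war: float,
                           marginal_tax_rate: float = 0.0) -> float:
    """Cost per WAR ($M) once tax on the added salary is included.

    Assumes the whole salary is taxed at one marginal rate, i.e. the
    addition does not cross a tax threshold.
    """
    if war <= 0:
        raise ValueError("war must be positive")
    return salary * (1 + marginal_tax_rate) / war


# Hypothetical $20M free agent projected for 2.0 WAR
untaxed = effective_cost_per_war(20, 2.0)          # team below the threshold
top_tier = effective_cost_per_war(20, 2.0, 0.95)   # repeat offender above $277M
print(f"Below threshold: ${untaxed:.2f}M per WAR")
print(f"95% tier:        ${top_tier:.2f}M per WAR")
```

The identical player costs $10M per win for the untaxed team and $19.5M per win for the team in the 95% tier.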

14.2 Positional Value

A shortstop hitting .240 with 15 home runs might be more valuable than a first baseman hitting .280 with 25 home runs. Defensive positions have different offensive standards and scarcity levels, requiring positional adjustments in player valuation.

The Defensive Spectrum

Positions fall along a defensive spectrum from most demanding (catcher, shortstop, center field) to least demanding (first base, designated hitter). More demanding positions have lower offensive standards because fewer players can handle the defensive requirements.

The standard positional adjustments (runs per 162 games, relative to average):

# Positional adjustments (runs per 162 games)
positional_value <- tibble(
  position = c("C", "SS", "2B", "CF", "3B", "LF", "RF", "1B", "DH"),
  adjustment = c(12.5, 7.5, 3.0, 2.5, 2.0, -7.5, -7.5, -12.5, -17.5),
  war_adjustment = adjustment / 10  # Convert runs to WAR (10 runs ≈ 1 WAR)
) %>%
  arrange(desc(adjustment))

print(positional_value)

# Visualize
ggplot(positional_value, aes(x = reorder(position, adjustment),
                             y = adjustment, fill = adjustment > 0)) +
  geom_col(alpha = 0.7) +
  geom_text(aes(label = sprintf("%+.1f", adjustment)),
            hjust = ifelse(positional_value$adjustment > 0, -0.2, 1.2),
            size = 4) +
  coord_flip() +
  scale_fill_manual(values = c("red", "blue"), guide = "none") +
  labs(title = "Positional Value Adjustments",
       subtitle = "Runs per 162 games, relative to average position",
       x = "Position",
       y = "Run Adjustment") +
  theme_minimal()

import pandas as pd
import matplotlib.pyplot as plt

# Positional adjustments (runs per 162 games)
positional_value = pd.DataFrame({
    'position': ['C', 'SS', '2B', 'CF', '3B', 'LF', 'RF', '1B', 'DH'],
    'adjustment': [12.5, 7.5, 3.0, 2.5, 2.0, -7.5, -7.5, -12.5, -17.5]
})

positional_value['war_adjustment'] = positional_value['adjustment'] / 10
positional_value = positional_value.sort_values('adjustment', ascending=False)

print(positional_value)

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
colors = ['blue' if x > 0 else 'red' for x in positional_value['adjustment']]
bars = ax.barh(positional_value['position'], positional_value['adjustment'],
               color=colors, alpha=0.7)

# Add value labels
for idx, (pos, adj) in enumerate(zip(positional_value['position'],
                                      positional_value['adjustment'])):
    ax.text(adj + (1 if adj > 0 else -1), idx, f'{adj:+.1f}',
            ha='left' if adj > 0 else 'right', va='center', fontsize=10)

ax.set_xlabel('Run Adjustment')
ax.set_ylabel('Position')
ax.set_title('Positional Value Adjustments\nRuns per 162 games, relative to average position')
ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

A catcher receives a +12.5 run adjustment (about +1.25 WAR) compared to an average position player, while a DH receives a -17.5 run penalty (-1.75 WAR). This 3-WAR difference means a catcher can produce equivalent value while posting significantly worse offensive numbers than a designated hitter.
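Players rarely spend a full season at one position, so the adjustment is usually prorated. The sketch below assumes simple proration by games played, which follows from the per-162-games basis of the table but is a simplification (innings would be more precise); `positional_war` is a hypothetical helper and the utility player is invented for illustration.

```python
# Runs per 162 games, copied from the positional adjustment table above
POSITIONAL_ADJ = {"C": 12.5, "SS": 7.5, "2B": 3.0, "CF": 2.5, "3B": 2.0,
                  "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5}
RUNS_PER_WIN = 10  # same 10 runs = 1 WAR rule of thumb used above


def positional_war(games_by_position: dict[str, int]) -> float:
    """Prorate each position's adjustment by games played there, in WAR."""
    runs = sum(POSITIONAL_ADJ[pos] * games / 162
               for pos, games in games_by_position.items())
    return runs / RUNS_PER_WIN


# Hypothetical utility player: 100 games at SS, 50 at 2B
print(f"{positional_war({'SS': 100, '2B': 50}):+.2f} WAR")

# Full seasons at C and DH recover the 3-WAR gap discussed above
gap = positional_war({"C": 162}) - positional_war({"DH": 162})
print(f"C vs DH gap: {gap:.2f} WAR")
```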

Market Inefficiency at Premium Positions

Teams often overpay for offense at premium defensive positions and underpay for defense. The 2023 Orioles exemplified efficient roster construction by acquiring strong defenders at premium positions:

Adley Rutschman (C): Elite defense, above-average offense
Gunnar Henderson (SS): Plus defense, excellent offense
Cedric Mullins (CF): Gold Glove defense, solid offense

Meanwhile, they accepted below-average defense at corner positions (1B, LF, RF) in exchange for offensive production. This construction maximized total value by allocating defensive resources where they matter most.

Calculating Position-Adjusted Value

Let's compare two 2023 players with similar raw offensive numbers:

# Compare Corey Seager (SS) vs Freddie Freeman (1B) - 2023
library(tidyverse)

players <- tibble(
  player = c("Corey Seager", "Freddie Freeman"),
  position = c("SS", "1B"),
  batting_war = c(4.2, 4.5),  # Offensive value only
  fielding_runs = c(0, 8),
  positional_adj = c(7.5, -12.5)  # From our table
)

players <- players %>%
  mutate(
    fielding_war = fielding_runs / 10,
    position_war = positional_adj / 10,
    total_war = batting_war + fielding_war + position_war,
    salary_2023 = c(33, 27),  # Millions
    war_per_dollar = total_war / salary_2023  # WAR per $1M of salary
  )

print(players)

# Visualization
players_long <- players %>%
  select(player, batting_war, fielding_war, position_war) %>%
  pivot_longer(cols = -player, names_to = "component", values_to = "war")

ggplot(players_long, aes(x = player, y = war, fill = component)) +
  geom_col(position = "stack") +
  geom_text(data = players, aes(x = player, y = total_war + 0.3,
                                label = sprintf("%.1f WAR", total_war)),
            inherit.aes = FALSE, size = 5) +
  scale_fill_manual(values = c("batting_war" = "darkblue",
                               "fielding_war" = "darkgreen",
                               "position_war" = "orange"),
                    labels = c("Batting", "Fielding", "Positional Adj")) +
  labs(title = "WAR Components: Premium vs Corner Position",
       subtitle = "2023 Season Comparison",
       x = NULL, y = "WAR", fill = "Component") +
  theme_minimal()

import pandas as pd
import matplotlib.pyplot as plt

# Compare Corey Seager (SS) vs Freddie Freeman (1B) - 2023
players = pd.DataFrame({
    'player': ['Corey Seager', 'Freddie Freeman'],
    'position': ['SS', '1B'],
    'batting_war': [4.2, 4.5],
    'fielding_runs': [0, 8],
    'positional_adj': [7.5, -12.5]
})

players['fielding_war'] = players['fielding_runs'] / 10
players['position_war'] = players['positional_adj'] / 10
players['total_war'] = (players['batting_war'] + players['fielding_war'] +
                        players['position_war'])
players['salary_2023'] = [33, 27]  # Millions
players['war_per_dollar'] = players['total_war'] / players['salary_2023']  # WAR per $1M of salary

print(players[['player', 'batting_war', 'fielding_war', 'position_war', 'total_war']])

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))

x = range(len(players))
width = 0.6

# Stacked bars
p1 = ax.bar(x, players['batting_war'], width, label='Batting', color='darkblue')
p2 = ax.bar(x, players['fielding_war'], width, bottom=players['batting_war'],
            label='Fielding', color='darkgreen')
p3 = ax.bar(x, players['position_war'], width,
            bottom=players['batting_war'] + players['fielding_war'],
            label='Positional Adj', color='orange')

# Add total WAR labels
for i, (player, war) in enumerate(zip(players['player'], players['total_war'])):
    ax.text(i, war + 0.3, f'{war:.1f} WAR', ha='center', fontsize=12, fontweight='bold')

ax.set_ylabel('WAR')
ax.set_title('WAR Components: Premium vs Corner Position\n2023 Season Comparison')
ax.set_xticks(x)
ax.set_xticklabels(players['player'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

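Despite similar batting contributions, Seager's shortstop premium makes him more valuable overall: 4.95 WAR against Freeman's 4.05 in this illustration. The 2-WAR swing in positional adjustment more than offsets Freeman's edge in batting and fielding. This explains why teams pay premiums for shortstops who can hit: they are scarce.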

Position Flexibility as Value

Players who can competently handle multiple positions provide roster flexibility worth 0.5-1.0 WAR beyond their performance. The 2016 Cubs utilized this extensively:

Ben Zobrist: 2B/OF/3B versatility
Javier Baez: SS/2B/3B capability
Kyle Schwarber: LF/C emergency option

This flexibility allowed manager Joe Maddon to optimize matchups, rest players, and navigate injuries without roster moves. Teams should value positional versatility in player acquisition and development.
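One simple way to see this flexibility is to count how many rostered players can cover each position. The sketch below uses only the three players and positions listed above, so it illustrates the bookkeeping rather than the full 2016 roster; the variable names are hypothetical.

```python
from collections import defaultdict

# Positions each player could cover, as listed above (not a full roster)
versatility = {
    "Ben Zobrist": ["2B", "OF", "3B"],
    "Javier Baez": ["SS", "2B", "3B"],
    "Kyle Schwarber": ["LF", "C"],
}

# Invert the mapping: position -> players who can play it
coverage = defaultdict(list)
for player, positions in versatility.items():
    for pos in positions:
        coverage[pos].append(player)

# List the deepest positions first
for pos in sorted(coverage, key=lambda p: (-len(coverage[p]), p)):
    print(f"{pos}: {len(coverage[pos])} option(s) - {', '.join(coverage[pos])}")
```

Positions with a single option are where an injury would force a roster move, so they are natural targets when adding a versatile bench player.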


import pandas as pd
import matplotlib.pyplot as plt

# Compare Corey Seager (SS) vs Freddie Freeman (1B) - 2023
players = pd.DataFrame({
    'player': ['Corey Seager', 'Freddie Freeman'],
    'position': ['SS', '1B'],
    'batting_war': [4.2, 4.5],
    'fielding_runs': [0, 8],
    'positional_adj': [7.5, -12.5]
})

players['fielding_war'] = players['fielding_runs'] / 10
players['position_war'] = players['positional_adj'] / 10
players['total_war'] = (players['batting_war'] + players['fielding_war'] +
                        players['position_war'])
players['salary_2023'] = [33, 27]  # Millions
players['war_per_dollar'] = players['total_war'] / players['salary_2023']  # WAR per $1M of salary

print(players[['player', 'batting_war', 'fielding_war', 'position_war', 'total_war']])

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))

x = range(len(players))
width = 0.6

# Stacked bars
p1 = ax.bar(x, players['batting_war'], width, label='Batting', color='darkblue')
p2 = ax.bar(x, players['fielding_war'], width, bottom=players['batting_war'],
            label='Fielding', color='darkgreen')
p3 = ax.bar(x, players['position_war'], width,
            bottom=players['batting_war'] + players['fielding_war'],
            label='Positional Adj', color='orange')

# Add total WAR labels
for i, (player, war) in enumerate(zip(players['player'], players['total_war'])):
    ax.text(i, war + 0.3, f'{war:.1f} WAR', ha='center', fontsize=12, fontweight='bold')

ax.set_ylabel('WAR')
ax.set_title('WAR Components: Premium vs Corner Position\n2023 Season Comparison')
ax.set_xticks(x)
ax.set_xticklabels(players['player'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

14.3 Free Agent Valuation

Free agency represents the most transparent player market but also the most inefficient. Teams routinely overpay for past performance, underestimate age-related decline, and fall victim to winner's curse dynamics. Analytical approaches improve free agent decision-making.

The Aging Curve Problem

Player performance follows a predictable aging curve: rapid improvement through age 26-27, a brief peak, then gradual decline. Free agents typically reach the market at age 28-30, meaning teams pay for declining future performance.

# Generic aging curve (based on research)
library(tidyverse)

aging_curve <- tibble(
  age = 22:38,
  war_multiplier = c(0.75, 0.85, 0.92, 0.97, 1.00, 1.00, 0.98,  # 22-28
                     0.95, 0.91, 0.86, 0.80, 0.73, 0.65, 0.56,  # 29-35
                     0.46, 0.36, 0.26)  # 36-38
)

# Example: Player produces 4 WAR at age 28
player_age_28_war <- 4

projections <- aging_curve %>%
  filter(age >= 28) %>%
  mutate(
    projected_war = player_age_28_war * war_multiplier,
    year = row_number()
  )

ggplot(projections, aes(x = age, y = projected_war)) +
  geom_line(linewidth = 1.5, color = "darkred") +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3, color = "darkred") +
  geom_hline(yintercept = 2, linetype = "dashed", color = "gray50") +
  annotate("text", x = 35, y = 2.2, label = "Average regular (~2 WAR)", size = 3.5) +
  labs(title = "Aging Curve: Projected Performance Decline",
       subtitle = "Starting from 4 WAR at age 28",
       x = "Age", y = "Projected WAR") +
  theme_minimal() +
  scale_x_continuous(breaks = seq(28, 38, 2))

# Calculate total value of 6-year contract
total_war_6yr <- sum(projections$projected_war[1:6])
cat("\nTotal projected WAR over 6-year deal (ages 28-33):", round(total_war_6yr, 1), "\n")
cat("Average WAR per season:", round(total_war_6yr / 6, 2), "\n")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generic aging curve
aging_curve = pd.DataFrame({
    'age': range(22, 39),
    'war_multiplier': [0.75, 0.85, 0.92, 0.97, 1.00, 1.00, 0.98,  # 22-28
                       0.95, 0.91, 0.86, 0.80, 0.73, 0.65, 0.56,  # 29-35
                       0.46, 0.36, 0.26]  # 36-38
})

# Example: Player produces 4 WAR at age 28
player_age_28_war = 4

projections = aging_curve[aging_curve['age'] >= 28].copy()
projections['projected_war'] = player_age_28_war * projections['war_multiplier']
projections['year'] = range(1, len(projections) + 1)

# Plot
plt.figure(figsize=(10, 6))
plt.plot(projections['age'], projections['projected_war'],
         linewidth=2.5, color='darkred', marker='o', markersize=6)
plt.axhline(y=2, linestyle='--', color='gray', alpha=0.7, label='Average regular (~2 WAR)')
plt.xlabel('Age')
plt.ylabel('Projected WAR')
plt.title('Aging Curve: Projected Performance Decline\nStarting from 4 WAR at age 28')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

# Calculate total value
total_war_6yr = projections['projected_war'].iloc[:6].sum()
print(f"\nTotal projected WAR over 6-year deal (ages 28-33): {total_war_6yr:.1f}")
print(f"Average WAR per season: {total_war_6yr/6:.2f}")

A 6-year deal for a 4-WAR player at age 28 yields roughly 20.9 total WAR, averaging about 3.5 WAR per season. But teams pay based on current performance (4 WAR), not average future performance (3.5 WAR), creating systematic overpayment.
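
The gap between paying for current production and paying for projected production can be put in dollar terms. The sketch below is a minimal illustration, assuming the age 28-33 multipliers from the aging curve above and a flat market rate of $12M per WAR (the rate this chapter uses in its contract and trade examples). The function name `contract_gap` is hypothetical.

```python
# Multipliers for ages 28-33, copied from the generic aging curve above
AGE_28_TO_33 = [0.98, 0.95, 0.91, 0.86, 0.80, 0.73]

def contract_gap(baseline_war, multipliers, rate_per_war=12.0):
    """Compare paying for flat baseline production vs. the aged projection.

    rate_per_war is in $ millions and is an assumed flat market rate.
    """
    years = len(multipliers)
    projected_war = sum(baseline_war * m for m in multipliers)
    paid_for_war = baseline_war * years  # priced as if there were no decline
    return {
        "projected_war": projected_war,
        "paid_for_war": paid_for_war,
        "overpayment_musd": (paid_for_war - projected_war) * rate_per_war,
    }

gap = contract_gap(4.0, AGE_28_TO_33)
print(f"Projected WAR: {gap['projected_war']:.1f}")
print(f"WAR paid for:  {gap['paid_for_war']:.1f}")
print(f"Overpayment:   ${gap['overpayment_musd']:.0f}M")
```

Changing `rate_per_war` or the multiplier list shows how sensitive the overpayment estimate is to the assumed market rate and the steepness of the decline.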

Discount Rate and Present Value

Future performance is worth less than current performance due to:

Uncertainty: Injury, unexpected decline, or external factors

Opportunity cost: Money spent today can't be deployed elsewhere

Competitive window: A win in 2025 might be worth more than a win in 2030

Financial analysis uses discount rates to value future wins:

# Calculate present value of multi-year contract
calculate_pv_war <- function(war_vector, discount_rate = 0.10) {
  years <- seq_along(war_vector)
  discount_factors <- 1 / ((1 + discount_rate) ^ (years - 1))
  pv_war <- sum(war_vector * discount_factors)
  return(pv_war)
}

# Example: 6-year contract, declining WAR
contract_war <- c(4.0, 3.8, 3.4, 3.0, 2.5, 2.0)

undiscounted <- sum(contract_war)
pv_5pct <- calculate_pv_war(contract_war, 0.05)
pv_10pct <- calculate_pv_war(contract_war, 0.10)

cat("Total WAR:\n")
cat("Undiscounted:", round(undiscounted, 1), "\n")
cat("PV at 5% discount:", round(pv_5pct, 1), "\n")
cat("PV at 10% discount:", round(pv_10pct, 1), "\n")

# Visualize
discount_rates <- seq(0, 0.15, 0.01)
pv_values <- sapply(discount_rates, function(r) calculate_pv_war(contract_war, r))

tibble(discount_rate = discount_rates, pv_war = pv_values) %>%
  ggplot(aes(x = discount_rate * 100, y = pv_war)) +
  geom_line(linewidth = 1.5, color = "darkblue") +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_hline(yintercept = undiscounted, linetype = "dashed", color = "red") +
  labs(title = "Present Value of WAR by Discount Rate",
       subtitle = "6-year contract with declining production",
       x = "Discount Rate (%)", y = "Present Value (WAR)") +
  theme_minimal()

def calculate_pv_war(war_vector, discount_rate=0.10):
    """Calculate present value of WAR stream"""
    years = np.arange(1, len(war_vector) + 1)
    discount_factors = 1 / ((1 + discount_rate) ** (years - 1))
    pv_war = np.sum(np.array(war_vector) * discount_factors)
    return pv_war

# Example: 6-year contract, declining WAR
contract_war = [4.0, 3.8, 3.4, 3.0, 2.5, 2.0]

undiscounted = sum(contract_war)
pv_5pct = calculate_pv_war(contract_war, 0.05)
pv_10pct = calculate_pv_war(contract_war, 0.10)

print("Total WAR:")
print(f"Undiscounted: {undiscounted:.1f}")
print(f"PV at 5% discount: {pv_5pct:.1f}")
print(f"PV at 10% discount: {pv_10pct:.1f}")

# Visualize
discount_rates = np.arange(0, 0.16, 0.01)
pv_values = [calculate_pv_war(contract_war, r) for r in discount_rates]

plt.figure(figsize=(10, 6))
plt.plot(discount_rates * 100, pv_values, linewidth=2.5, color='darkblue')
plt.axhline(y=undiscounted, linestyle='--', color='red', alpha=0.7,
            label=f'Undiscounted ({undiscounted:.1f} WAR)')
plt.xlabel('Discount Rate (%)')
plt.ylabel('Present Value (WAR)')
plt.title('Present Value of WAR by Discount Rate\n6-year contract with declining production')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

At a 10% discount rate, this 18.7 WAR contract is worth only about 15.5 present-value WAR. Contending teams should use higher discount rates, since a win inside the current window is worth more to them than a win several years out. Rebuilding teams should use lower rates, since their future wins are at least as valuable as current ones.
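
One way to see why the discount rate matters is to translate present-value WAR into a maximum contract offer. The following is a minimal sketch, assuming the same convention as `calculate_pv_war` above (the first season is not discounted) and an assumed flat market rate of $12M per WAR. `max_offer` is a hypothetical helper, not part of the original analysis.

```python
def pv_war(war_by_year, discount_rate):
    """Present value of a WAR stream; the first season is not discounted."""
    return sum(w / (1 + discount_rate) ** t for t, w in enumerate(war_by_year))

def max_offer(war_by_year, discount_rate, rate_per_war=12.0):
    """Largest total contract ($M) supported by discounted WAR at an assumed market rate."""
    return pv_war(war_by_year, discount_rate) * rate_per_war

contract_war = [4.0, 3.8, 3.4, 3.0, 2.5, 2.0]

for rate in (0.0, 0.05, 0.10):
    print(f"{rate:>4.0%}: {pv_war(contract_war, rate):5.1f} PV WAR "
          f"-> ${max_offer(contract_war, rate):.0f}M")
```

The same projected production supports a smaller offer as the discount rate rises, which is why two teams with different competitive timelines can rationally bid different amounts for the same player.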

Case Study: Evaluating the 2024 Free Agent Class

Let's analyze whether the Dodgers' signing of Shohei Ohtani made economic sense:

# Ohtani contract analysis
# $700M over 10 years, but $680M deferred (97% of contract)
# Actual payments: $2M/year for 10 years, then $68M/year for 10 years

# Calculate present value (using 5% discount rate)
calculate_contract_pv <- function(annual_payments, discount_rate = 0.05) {
  years <- seq_along(annual_payments)
  pv <- sum(annual_payments / ((1 + discount_rate) ^ years))
  return(pv)
}

# Ohtani's payment structure
ohtani_payments <- c(rep(2, 10), rep(68, 10))  # Millions
ohtani_pv <- calculate_contract_pv(ohtani_payments, 0.05)

cat("Ohtani Contract:\n")
cat("Nominal value: $700M\n")
cat("Present value (5%): $", round(ohtani_pv, 0), "M\n", sep="")
cat("Effective AAV: $", round(ohtani_pv / 10, 1), "M\n\n", sep="")

# Project WAR (hitting only in 2024, two-way from 2025)
ohtani_war_projection <- c(2.5, 8.0, 7.5, 7.0, 6.5, 6.0, 5.0, 4.0, 3.0, 2.0)
total_projected_war <- sum(ohtani_war_projection)
pv_war <- calculate_pv_war(ohtani_war_projection, 0.05)

cost_per_war <- (ohtani_pv / 10) / (total_projected_war / 10)

cat("Projected Performance:\n")
cat("Total WAR (10 years):", total_projected_war, "\n")
cat("Present Value WAR:", round(pv_war, 1), "\n")
cat("Cost per WAR: $", round(cost_per_war, 1), "M\n", sep="")
cat("Market rate: ~$12M/WAR\n")
cat("Value created: ", ifelse(cost_per_war < 12, "POSITIVE", "NEGATIVE"), "\n", sep="")

def calculate_contract_pv(annual_payments, discount_rate=0.05):
    """Calculate present value of contract"""
    years = np.arange(1, len(annual_payments) + 1)
    pv = np.sum(np.array(annual_payments) / ((1 + discount_rate) ** years))
    return pv

# Ohtani's payment structure
ohtani_payments = [2] * 10 + [68] * 10  # Millions
ohtani_pv = calculate_contract_pv(ohtani_payments, 0.05)

print("Ohtani Contract:")
print(f"Nominal value: $700M")
print(f"Present value (5%): ${ohtani_pv:.0f}M")
print(f"Effective AAV: ${ohtani_pv/10:.1f}M\n")

# Project WAR
ohtani_war_projection = [2.5, 8.0, 7.5, 7.0, 6.5, 6.0, 5.0, 4.0, 3.0, 2.0]
total_projected_war = sum(ohtani_war_projection)
pv_war = calculate_pv_war(ohtani_war_projection, 0.05)

cost_per_war = (ohtani_pv / 10) / (total_projected_war / 10)

print("Projected Performance:")
print(f"Total WAR (10 years): {total_projected_war}")
print(f"Present Value WAR: {pv_war:.1f}")
print(f"Cost per WAR: ${cost_per_war:.1f}M")
print(f"Market rate: ~$12M/WAR")
print(f"Value created: {'POSITIVE' if cost_per_war < 12 else 'NEGATIVE'}")

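The extreme deferral structure cuts the contract's present value sharply. At a 5% discount rate, the payment stream above is worth roughly $338M (an effective AAV of about $34M), less than half the nominal $700M, which makes the contract economically justifiable despite the eye-popping headline figure. The Dodgers also benefit because luxury tax calculations use a discounted AAV, widely reported at roughly $46M, rather than the nominal $70M.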
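
The present value of a deferred contract depends heavily on the discount rate chosen. The sketch below reprices the payment schedule described above ($2M per year for 10 years, then $68M per year for 10 years) at several rates, assuming end-of-year payments as in `calculate_contract_pv`. The rates and the undeferred comparison schedule are illustrative assumptions.

```python
def contract_pv(payments, discount_rate):
    """Present value ($M) of a stream of end-of-year payments."""
    return sum(p / (1 + discount_rate) ** t
               for t, p in enumerate(payments, start=1))

deferred = [2] * 10 + [68] * 10   # schedule described in the text
undeferred = [70] * 10            # hypothetical: the same $700M paid evenly, no deferral

for rate in (0.03, 0.05, 0.07):
    d = contract_pv(deferred, rate)
    u = contract_pv(undeferred, rate)
    print(f"{rate:.0%}: deferred ${d:.0f}M vs. undeferred ${u:.0f}M")
```

Because most of the money arrives 11 to 20 years out, small changes in the rate move the deferred valuation far more than they move the undeferred one.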

14.4 Trade Analysis

Trades involve exchanging present and future value across different competitive timelines. Analytical frameworks help evaluate whether trades align with organizational objectives.

Framework: Surplus Value

Trade analysis centers on surplus value—the difference between a player's expected production and cost. Teams trade surplus value across different time horizons.

# Calculate surplus value
calculate_surplus <- function(projected_war, salary, market_rate = 12) {
  market_value <- projected_war * market_rate
  surplus <- market_value - salary
  return(surplus)
}

# Example: Compare players in trade discussion
trade_evaluation <- tibble(
  player = c("Established Star", "Young Prospect", "Prospect 2", "Prospect 3"),
  years_control = c(2, 6, 5, 5),
  projected_war_annual = c(5.5, 2.5, 2.0, 1.5),
  salary = c(32, 0.8, 0.75, 0.75)  # Millions per year
)

trade_evaluation <- trade_evaluation %>%
  mutate(
    total_war = years_control * projected_war_annual,
    total_salary = years_control * salary,
    total_surplus = calculate_surplus(total_war, total_salary),
    surplus_per_year = total_surplus / years_control
  )

print(trade_evaluation)

cat("\nTrade Scenario: Star for 3 prospects\n")
cat("Team A receives: ", trade_evaluation$total_surplus[1], "M surplus (win-now mode)\n", sep="")
cat("Team B receives: ", sum(trade_evaluation$total_surplus[2:4]), "M surplus (rebuilding)\n", sep="")

def calculate_surplus(projected_war, salary, market_rate=12):
    """Calculate surplus value"""
    market_value = projected_war * market_rate
    surplus = market_value - salary
    return surplus

# Example: Compare players in trade discussion
trade_evaluation = pd.DataFrame({
    'player': ['Established Star', 'Young Prospect', 'Prospect 2', 'Prospect 3'],
    'years_control': [2, 6, 5, 5],
    'projected_war_annual': [5.5, 2.5, 2.0, 1.5],
    'salary': [32, 0.8, 0.75, 0.75]  # Millions per year
})

trade_evaluation['total_war'] = (trade_evaluation['years_control'] *
                                 trade_evaluation['projected_war_annual'])
trade_evaluation['total_salary'] = (trade_evaluation['years_control'] *
                                    trade_evaluation['salary'])
trade_evaluation['total_surplus'] = calculate_surplus(
    trade_evaluation['total_war'],
    trade_evaluation['total_salary']
)
trade_evaluation['surplus_per_year'] = (trade_evaluation['total_surplus'] /
                                        trade_evaluation['years_control'])

print(trade_evaluation)

print(f"\nTrade Scenario: Star for 3 prospects")
print(f"Team A receives: ${trade_evaluation['total_surplus'].iloc[0]:.0f}M surplus (win-now)")
print(f"Team B receives: ${trade_evaluation['total_surplus'].iloc[1:4].sum():.0f}M surplus (rebuilding)")

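The star provides $68M of surplus over 2 years (immediate value), while the prospects project to roughly $378M of combined surplus over 5-6 years (future value). The prospect figure assumes every projection is met and applies no discount for time or risk, so it overstates what the rebuilding team should expect. Both teams can win this trade if their timelines align appropriately.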
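
Raw surplus totals treat every projected win as certain and equally valuable regardless of timing. The sketch below shows one possible adjustment: it discounts each year's surplus and scales prospect production by a probability of reaching the projection. The 8% discount rate and the success probabilities are purely illustrative assumptions, and `adjusted_surplus` is a hypothetical helper.

```python
def adjusted_surplus(years, war_per_year, salary, success_prob=1.0,
                     discount_rate=0.08, rate_per_war=12.0):
    """Discounted, probability-weighted surplus in $M.

    Salary is treated as paid whether or not the projection is reached
    (a simplifying assumption); only production is probability-weighted.
    """
    total = 0.0
    for t in range(years):
        expected_value = success_prob * war_per_year * rate_per_war
        total += (expected_value - salary) / (1 + discount_rate) ** t
    return total

# (years of control, WAR/year, salary $M, assumed probability of hitting projection)
players = {
    "Established Star": (2, 5.5, 32.0, 1.0),
    "Young Prospect":   (6, 2.5, 0.80, 0.5),
    "Prospect 2":       (5, 2.0, 0.75, 0.4),
    "Prospect 3":       (5, 1.5, 0.75, 0.3),
}

for name, (yrs, war, sal, prob) in players.items():
    print(f"{name:<17} ${adjusted_surplus(yrs, war, sal, prob):6.1f}M")
```

Setting `success_prob=1.0` and `discount_rate=0.0` reproduces the unadjusted totals from `calculate_surplus`, so the two approaches can be compared directly.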

The Win-Now vs Rebuild Trade-off

Contending teams should trade future value for present value; rebuilding teams should do the opposite. The key is estimating competitive windows.

# Model competitive window
library(tidyverse)

# Scenario: Team is 83 wins, needs 87+ to contend
simulate_competitive_window <- function(current_wins = 83,
                                       core_aging_rate = -1.5,
                                       prospect_improvement = 0.5,
                                       years = 5) {

  tibble(
    year = 1:years,
    core_value = pmax(0, current_wins + (year - 1) * core_aging_rate),
    prospect_value = (year - 1) * prospect_improvement,
    total_wins = core_value + prospect_value
  )
}

# Two scenarios: trade for star vs keep prospects
trade_scenario <- simulate_competitive_window() %>%
  mutate(
    scenario = "Trade for Star",
    star_boost = c(7, 6, 4, 0, 0),  # Wins added by the star, fading to zero after year 3
    final_wins = total_wins + star_boost
  )

keep_scenario <- simulate_competitive_window() %>%
  mutate(
    scenario = "Keep Prospects",
    prospect_boost = year * 1.5,  # Prospects develop gradually
    final_wins = total_wins + prospect_boost
  )

scenarios <- bind_rows(trade_scenario, keep_scenario)

ggplot(scenarios, aes(x = year, y = final_wins, color = scenario)) +
  geom_line(linewidth = 1.5) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3) +
  geom_hline(yintercept = 87, linetype = "dashed", color = "red") +
  annotate("text", x = 4.5, y = 88, label = "Playoff threshold", size = 3.5) +
  labs(title = "Trade Decision: Competitive Window Analysis",
       subtitle = "Win-now trade vs prospect development",
       x = "Years from Now", y = "Projected Wins",
       color = "Strategy") +
  theme_minimal() +
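  # Name the values so colors match the intended scenario rather than alphabetical order
  scale_color_manual(values = c("Trade for Star" = "darkblue",
                                "Keep Prospects" = "darkgreen"))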

def simulate_competitive_window(current_wins=83, core_aging_rate=-1.5,
                               prospect_improvement=0.5, years=5):
    """Model competitive window over time"""
    year = np.arange(1, years + 1)
    core_value = np.maximum(0, current_wins + (year - 1) * core_aging_rate)
    prospect_value = (year - 1) * prospect_improvement
    total_wins = core_value + prospect_value

    return pd.DataFrame({
        'year': year,
        'core_value': core_value,
        'prospect_value': prospect_value,
        'total_wins': total_wins
    })

# Two scenarios
trade_scenario = simulate_competitive_window()
trade_scenario['scenario'] = 'Trade for Star'
trade_scenario['star_boost'] = [7, 6, 4, 0, 0]
trade_scenario['final_wins'] = trade_scenario['total_wins'] + trade_scenario['star_boost']

keep_scenario = simulate_competitive_window()
keep_scenario['scenario'] = 'Keep Prospects'
keep_scenario['prospect_boost'] = keep_scenario['year'] * 1.5
keep_scenario['final_wins'] = keep_scenario['total_wins'] + keep_scenario['prospect_boost']

# Plot
plt.figure(figsize=(10, 6))
plt.plot(trade_scenario['year'], trade_scenario['final_wins'],
         marker='o', linewidth=2.5, label='Trade for Star', color='darkblue')
plt.plot(keep_scenario['year'], keep_scenario['final_wins'],
         marker='o', linewidth=2.5, label='Keep Prospects', color='darkgreen')
plt.axhline(y=87, linestyle='--', color='red', alpha=0.7, label='Playoff threshold')
plt.xlabel('Years from Now')
plt.ylabel('Projected Wins')
plt.title('Trade Decision: Competitive Window Analysis\nWin-now trade vs prospect development')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

The trade creates a 2-year window above 87 wins but sacrifices future competitiveness. Keep prospects if you value sustained success; trade if you prioritize immediate contention.
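
The comparison can be reduced to a count of contending seasons under each strategy. This minimal sketch re-creates the model's default parameters (83 wins, a core losing 1.5 wins per year, 0.5 wins per year of baseline prospect growth) and the two boost schedules from the code above. The 87-win threshold is the same assumption used in the plots.

```python
def baseline_wins(current_wins=83, core_aging_rate=-1.5,
                  prospect_improvement=0.5, years=5):
    """Projected wins per year before any trade or prospect boost."""
    return [max(0, current_wins + t * core_aging_rate) + t * prospect_improvement
            for t in range(years)]

def seasons_at_or_above(wins, threshold=87):
    """Number of seasons meeting the assumed playoff threshold."""
    return sum(w >= threshold for w in wins)

base = baseline_wins()
trade = [b + boost for b, boost in zip(base, [7, 6, 4, 0, 0])]
keep = [b + 1.5 * year for year, b in enumerate(base, start=1)]

print("Trade for Star:", trade, "->", seasons_at_or_above(trade), "contending seasons")
print("Keep Prospects:", keep, "->", seasons_at_or_above(keep), "contending seasons")
```

Under these default inputs the trade scenario clears the threshold in years 1 and 2, while the keep scenario stays just below it through year 5, so the conclusion depends heavily on the assumed threshold and growth rates.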

Case Study: The 2017 Astros Trade for Verlander

In August 2017, the Houston Astros traded for Justin Verlander, sending 3 prospects to Detroit. Analysis:

What Houston Gave Up:

Franklin Perez (RHP): Mid-level prospect, 30% chance of 2+ WAR

Daz Cameron (OF): Mid-level prospect, 25% chance of 2+ WAR

Jake Rogers (C): Fringe prospect, 15% chance of 2+ WAR

Expected total value: ~6 WAR over 6 years, worth about $72M at the $12M/WAR market rate

What Houston Got:

Verlander (2017 playoffs): ~1 WAR, crucial in World Series run

Verlander (2018-19): ~13 WAR combined

Total value: ~14 WAR over 2.5 years

Result: Houston won the 2017 World Series and remained elite through 2019. Of the three prospects, only Rogers developed into a regular major leaguer, and none came close to matching Verlander's production. The trade succeeded because Houston correctly identified its championship window and maximized present value.
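
The prospect probabilities listed above can be combined to gauge how likely the package was to produce any solid contributor. This is a minimal sketch, assuming the three outcomes are independent (a simplification) and using the 30%, 25%, and 15% figures from the case study.

```python
from math import prod

# Chance each prospect produces 2+ WAR, as listed in the case study
success_prob = {"Franklin Perez": 0.30, "Daz Cameron": 0.25, "Jake Rogers": 0.15}

expected_successes = sum(success_prob.values())
p_none = prod(1 - p for p in success_prob.values())  # assumes independence

print(f"Expected number of 2+ WAR players: {expected_successes:.2f}")
print(f"Probability none reach 2+ WAR:     {p_none:.1%}")
print(f"Probability at least one does:     {1 - p_none:.1%}")
```

Framing the package this way makes the downside explicit: even three prospects together carry a substantial chance of returning no player at that level.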

14.5 Draft Strategy

The MLB draft differs fundamentally from other sports' drafts. There's no salary cap, most draft picks can't be traded (competitive balance picks are the exception), and amateur players typically take 3-5 years to reach MLB. These features create unique strategic considerations.

Expected Value by Draft Position

Draft picks have declining expected value, but unlike the NBA or NFL, even first overall picks are uncertain:

# Model expected value by draft position (based on historical research)
library(tidyverse)

draft_value <- tibble(
  pick = 1:100,
  prob_mlb = pmax(0.05, 0.85 - (pick - 1) * 0.008),  # Probability of reaching MLB
  expected_war = pmax(0.1, 12 - (pick - 1) * 0.11),  # Expected career WAR if reaches MLB
  overall_expected_war = prob_mlb * expected_war
) %>%
  mutate(
    surplus_value = overall_expected_war * 12 - 2,  # $12M/WAR, -$2M signing bonus
    round = case_when(
      pick <= 30 ~ "Round 1",
      pick <= 60 ~ "Round 2",
      pick <= 100 ~ "Round 3+"
    )
  )

# Visualize
ggplot(draft_value %>% filter(pick <= 60),
       aes(x = pick, y = surplus_value, color = round)) +
  geom_line(linewidth = 1.5) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 2) +
  scale_color_manual(values = c("darkblue", "steelblue", "lightblue")) +
  labs(title = "Expected Surplus Value by Draft Position",
       subtitle = "MLB Draft expected value declines gradually",
       x = "Draft Pick", y = "Expected Surplus Value ($M)",
       color = "Round") +
  theme_minimal()

# Compare value tiers
cat("Expected surplus value:\n")
cat("Pick 1: $", round(draft_value$surplus_value[1], 1), "M\n", sep="")
cat("Pick 10: $", round(draft_value$surplus_value[10], 1), "M\n", sep="")
cat("Pick 30: $", round(draft_value$surplus_value[30], 1), "M\n", sep="")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Model expected value by draft position
pick = np.arange(1, 101)
prob_mlb = np.maximum(0.05, 0.85 - (pick - 1) * 0.008)
expected_war = np.maximum(0.1, 12 - (pick - 1) * 0.11)
overall_expected_war = prob_mlb * expected_war

draft_value = pd.DataFrame({
    'pick': pick,
    'prob_mlb': prob_mlb,
    'expected_war': expected_war,
    'overall_expected_war': overall_expected_war
})

draft_value['surplus_value'] = draft_value['overall_expected_war'] * 12 - 2
draft_value['round'] = pd.cut(draft_value['pick'],
                              bins=[0, 30, 60, 100],
                              labels=['Round 1', 'Round 2', 'Round 3+'])

# Visualize
plt.figure(figsize=(10, 6))
for round_name, color in [('Round 1', 'darkblue'),
                          ('Round 2', 'steelblue'),
                          ('Round 3+', 'lightblue')]:
    data = draft_value[draft_value['round'] == round_name]
    if len(data) > 0 and data['pick'].max() <= 60:
        plt.plot(data['pick'], data['surplus_value'],
                label=round_name, color=color, linewidth=2)

plt.xlabel('Draft Pick')
plt.ylabel('Expected Surplus Value ($M)')
plt.title('Expected Surplus Value by Draft Position\nMLB Draft expected value declines gradually')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xlim(0, 60)
plt.tight_layout()
plt.show()

print("Expected surplus value:")
print(f"Pick 1: ${draft_value['surplus_value'].iloc[0]:.1f}M")
print(f"Pick 10: ${draft_value['surplus_value'].iloc[9]:.1f}M")
print(f"Pick 30: ${draft_value['surplus_value'].iloc[29]:.1f}M")

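Unlike the NBA, where the #1 pick has enormous expected value, MLB's value curve is comparatively flat: in this model the #1 pick carries about $120M of expected surplus versus about $101M for #10, a gap of roughly $20M. This flatter value curve reduces the incentive to tank.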
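To make the comparison concrete, the gap between any two picks can be read directly off the same illustrative model. The sketch below reuses the linear assumptions from the code above (85% MLB probability at pick 1, 12 expected WAR, $12M per WAR, a $2M bonus). The helper name `pick_surplus` is hypothetical, and the figures are model outputs rather than historical estimates.

```python
def pick_surplus(pick, dollars_per_war=12.0, bonus=2.0):
    """Expected surplus ($M) for a draft pick under the chapter's linear toy model."""
    prob_mlb = max(0.05, 0.85 - (pick - 1) * 0.008)   # chance of reaching MLB
    war_if_mlb = max(0.1, 12 - (pick - 1) * 0.11)     # career WAR if he does
    return prob_mlb * war_if_mlb * dollars_per_war - bonus

# Gaps between value tiers
gap_1_vs_10 = pick_surplus(1) - pick_surplus(10)
gap_10_vs_30 = pick_surplus(10) - pick_surplus(30)

print(f"Pick 1 vs pick 10:  ${gap_1_vs_10:.1f}M")
print(f"Pick 10 vs pick 30: ${gap_10_vs_30:.1f}M")
```

Changing `dollars_per_war` or `bonus` shows how sensitive the gap is to the market-rate assumption.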

College vs High School Players

Teams face a fundamental strategic choice: draft college players (safer, closer to MLB-ready) or high school players (riskier, higher upside).

College Players:

Higher floor: More developed, easier to evaluate

Lower ceiling: Limited remaining development time

Faster to MLB: Often 2-3 years vs 4-5 for high schoolers

Better for contending teams with shorter time horizons

High School Players:

Higher ceiling: More physical development remaining

Lower floor: Higher bust rate, harder to evaluate

Slower to MLB: Longer development time

Better for rebuilding teams with patient timelines

# Compare college vs high school success rates (based on research)
library(tidyverse)

draft_comparison <- tibble(
  pick_range = rep(c("1-10", "11-30", "31-60"), 2),
  player_type = rep(c("College", "High School"), each = 3),
  mlb_rate = c(0.75, 0.65, 0.52,    # College
               0.60, 0.48, 0.35),   # High School
  avg_war = c(15, 10, 6,             # College
              18, 13, 8),            # High School (among those who make it)
  expected_war = mlb_rate * avg_war
)

ggplot(draft_comparison, aes(x = pick_range, y = expected_war,
                             fill = player_type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("College" = "darkblue",
                               "High School" = "darkorange")) +
  labs(title = "College vs High School Draft Value",
       subtitle = "Expected career WAR by draft position",
       x = "Draft Pick Range", y = "Expected WAR",
       fill = "Player Type") +
  theme_minimal()

# Compare college vs high school success rates
draft_comparison = pd.DataFrame({
    'pick_range': ['1-10', '11-30', '31-60'] * 2,
    'player_type': ['College'] * 3 + ['High School'] * 3,
    'mlb_rate': [0.75, 0.65, 0.52,    # College
                 0.60, 0.48, 0.35],   # High School
    'avg_war': [15, 10, 6,             # College
                18, 13, 8],            # High School
})

draft_comparison['expected_war'] = (draft_comparison['mlb_rate'] *
                                    draft_comparison['avg_war'])

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(3)
width = 0.35

college = draft_comparison[draft_comparison['player_type'] == 'College']
hs = draft_comparison[draft_comparison['player_type'] == 'High School']

ax.bar(x - width/2, college['expected_war'], width,
       label='College', color='darkblue', alpha=0.8)
ax.bar(x + width/2, hs['expected_war'], width,
       label='High School', color='darkorange', alpha=0.8)

ax.set_xlabel('Draft Pick Range')
ax.set_ylabel('Expected WAR')
ax.set_title('College vs High School Draft Value\nExpected career WAR by draft position')
ax.set_xticks(x)
ax.set_xticklabels(['1-10', '11-30', '31-60'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

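High schoolers who reach MLB contribute more total WAR, but in these illustrative numbers their lower MLB rates more than offset that upside: college players carry slightly higher expected WAR in every range (11.25 vs 10.8 in the top 10, 6.5 vs 6.24 for picks 11-30, 3.12 vs 2.8 for picks 31-60). The gap is only about 4% through pick 30, where elite high school upside nearly compensates for the added risk, and widens to roughly 10% for picks 31-60, where college players' safer profiles dominate.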
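One way to read this comparison is to ask what MLB rate a high school pick would need to match a college pick's expected WAR. The sketch below computes that break-even rate from the illustrative table above; `break_even_mlb_rate` is a hypothetical helper, and the inputs are the chapter's assumed rates, not measured ones.

```python
def break_even_mlb_rate(college_rate, college_war, hs_war):
    """MLB rate a high schooler needs to match a college pick's expected WAR."""
    return college_rate * college_war / hs_war

# (college MLB rate, college WAR, HS MLB rate, HS WAR) from the table above
tiers = {
    "1-10": (0.75, 15, 0.60, 18),
    "11-30": (0.65, 10, 0.48, 13),
    "31-60": (0.52, 6, 0.35, 8),
}

for pick_range, (c_rate, c_war, hs_rate, hs_war) in tiers.items():
    needed = break_even_mlb_rate(c_rate, c_war, hs_war)
    print(f"Picks {pick_range}: HS needs {needed:.1%} to break even "
          f"(assumed {hs_rate:.0%})")
```

A team that believes its development system lifts high school success rates above the break-even figure would prefer the high schooler in that range.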

Signability and Slot Value

Unlike NBA or NFL draftees, MLB draft picks can credibly refuse to sign, typically by attending or returning to college. Teams must balance talent evaluation with signability within bonus pool constraints. This creates market inefficiencies that teams can exploit.

Strategy for Teams Exceeding Slot:

Target high-upside players who fell due to signability concerns

Offer over-slot bonuses to secure them

Accept penalties for exceeding the pool (a 75% tax on the overage up to 5% over; forfeited future draft picks beyond that)

Strategy for Cost-Conscious Teams:

Draft signable college seniors (limited leverage) early

Save slot money for later rounds

Target high schoolers with college commitments in later rounds, offer them saved money

The 2023 Pirates exemplified this: they drafted college-heavy early (signable), then used saved pool money on high-upside high schoolers in rounds 3-10.
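The under-slot approach is essentially a budgeting exercise. The sketch below uses made-up slot values and bonuses; every figure and the `pool_summary` helper are hypothetical. The 5% cushion and 75% overage tax are assumptions based on the rule as commonly described and should be checked against the current collective bargaining agreement.

```python
def pool_summary(slots, bonuses, tax_rate=0.75, cushion=0.05):
    """Summarize bonus-pool usage; slots and bonuses are in $M, pick by pick."""
    pool = sum(slots)
    spent = sum(bonuses)
    overage = max(0.0, spent - pool)
    return {
        "pool": pool,
        "spent": spent,
        # Positive = under-slot savings, negative = over-slot spending
        "savings_by_pick": [s - b for s, b in zip(slots, bonuses)],
        "overage": overage,
        "tax": overage * tax_rate,  # assumed tax on the overage only
        "within_cushion": spent <= pool * (1 + cushion),
    }

# Hypothetical class: under-slot college picks early, over-slot high schoolers later
slots = [5.0, 2.0, 1.0, 0.6, 0.4]
bonuses = [4.0, 1.5, 2.0, 1.2, 0.5]

summary = pool_summary(slots, bonuses)
for key, value in summary.items():
    print(key, value)
```

In this example the early savings do not fully cover the later over-slot bonuses, so the team lands slightly over its pool but inside the assumed cushion.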

Case Study: Astros' Draft Strategy (2011-2014)

Houston's rebuild featured aggressive draft strategy:

2012-2014 Strategy:

Accumulated top picks via losing (1st overall: 2012, 2013, 2014)

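Drafted premium talent at the top: Carlos Correa (2012, high school), Mark Appel (2013, college; did not pan out), Alex Bregman (2015, college; second overall)

Paid over-slot bonuses to later picks (funded in 2012 by signing Correa below slot) to secure upside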

Accepted competitive balance pick penalties

Results:

Correa: roughly 35-40 WAR through 2023 (depending on the WAR source), cornerstone player

Bregman: 35+ WAR, All-Star third baseman

Strong supporting cast from later picks

This aggressive strategy, combined with analytics-driven player development, turned Houston from a 100-loss team (2011-2013) into a World Series champion (2017).

14.6 Contender vs Rebuilding

Perhaps the most consequential front office decision is choosing a competitive timeline: compete now, rebuild, or attempt a middle ground. Analytics can inform but not fully answer this strategic question.

The Competitive Window

Teams have limited windows of contention based on:

Core player aging curves

Contract expirations and financial flexibility

Farm system strength

Division and league competition

# Model competitive window
library(tidyverse)

# Hypothetical team: strong core aging out, weak farm system
team_projection <- tibble(
  year = 2024:2033,
  core_war = c(35, 33, 30, 26, 22, 18, 14, 10, 8, 6),  # Aging stars
  young_players = c(5, 8, 10, 12, 12, 11, 10, 8, 6, 5),
  free_agents = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8),  # Veteran depth signings, held constant
  total_war = core_war + young_players + free_agents
) %>%
  mutate(
    expected_wins = 52 + (total_war * 0.96),  # Replacement-level baseline plus WAR
    playoff_prob = plogis((expected_wins - 87) / 3) * 100
  )

# Visualize window
ggplot(team_projection, aes(x = year, y = expected_wins)) +
  geom_line(linewidth = 1.5, color = "darkblue") +  # linewidth replaces size in ggplot2 >= 3.4.0
  geom_ribbon(aes(ymin = 52, ymax = expected_wins), alpha = 0.3, fill = "blue") +
  geom_hline(yintercept = 87, linetype = "dashed", color = "red") +
  annotate("text", x = 2028, y = 88, label = "Playoff threshold (~87 wins)",
           size = 4, hjust = 0) +
  annotate("rect", xmin = 2024, xmax = 2027, ymin = 50, ymax = 105,
           alpha = 0.1, fill = "green") +
  annotate("text", x = 2025.5, y = 103, label = "Competitive Window",
           size = 4.5, fontface = "bold") +
  labs(title = "Projected Competitive Window",
       subtitle = "Team with aging core, limited farm system",
       x = "Year", y = "Projected Wins") +
  theme_minimal() +
  scale_x_continuous(breaks = seq(2024, 2033, 1))

print(team_projection)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.special import expit

# Hypothetical team projection
team_projection = pd.DataFrame({
    'year': range(2024, 2034),
    'core_war': [35, 33, 30, 26, 22, 18, 14, 10, 8, 6],
    'young_players': [5, 8, 10, 12, 12, 11, 10, 8, 6, 5],
    'free_agents': [8] * 10
})

team_projection['total_war'] = (team_projection['core_war'] +
                                team_projection['young_players'] +
                                team_projection['free_agents'])
team_projection['expected_wins'] = 52 + (team_projection['total_war'] * 0.96)
team_projection['playoff_prob'] = (expit((team_projection['expected_wins'] - 87) / 3) * 100)

print(team_projection)

# Visualize
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(team_projection['year'], team_projection['expected_wins'],
        linewidth=2.5, color='darkblue', marker='o')
ax.fill_between(team_projection['year'], 52, team_projection['expected_wins'],
                alpha=0.3, color='blue')
ax.axhline(y=87, linestyle='--', color='red', linewidth=1.5, label='Playoff threshold')

# Highlight competitive window
ax.axvspan(2024, 2027, alpha=0.1, color='green')
ax.text(2025.5, 103, 'Competitive Window', fontsize=12,
        ha='center', fontweight='bold')

ax.set_xlabel('Year', fontsize=11)
ax.set_ylabel('Projected Wins', fontsize=11)
ax.set_title('Projected Competitive Window\nTeam with aging core, limited farm system',
             fontsize=13)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

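Under this projection the team is a clear contender (96 or more projected wins) for about four years, 2024 through 2027. Core decline then pulls it down to the 87-win playoff threshold by 2029 and below it from 2030 onward. That timeline shapes every roster decision.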
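The length of the window depends on where the bar is set. The sketch below finds the first unbroken run of seasons at or above a chosen win threshold, using the same hypothetical projection as above. `contention_window` is an illustrative helper, not part of any library.

```python
def contention_window(years, wins, threshold=87):
    """Return (first_year, last_year) of the first run at or above threshold, or None."""
    window = []
    for year, w in zip(years, wins):
        if w >= threshold:
            window.append(year)
        elif window:
            break  # the run has ended
    return (window[0], window[-1]) if window else None

years = list(range(2024, 2034))
core_war = [35, 33, 30, 26, 22, 18, 14, 10, 8, 6]
young_players = [5, 8, 10, 12, 12, 11, 10, 8, 6, 5]
free_agents = [8] * 10

# Same WAR-to-wins conversion as the projection above
wins = [52 + 0.96 * (c + y + f)
        for c, y, f in zip(core_war, young_players, free_agents)]

print("At 87 wins:", contention_window(years, wins))
print("At 95 wins:", contention_window(years, wins, threshold=95))
```

With these inputs the 87-win window runs from 2024 to 2029, while a stricter 95-win bar closes it after 2027.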

The Cost of Mediocrity

The "middle ground" strategy of trying to compete while rebuilding often fails. Mediocre teams (75-82 wins) get the worst of both worlds:

Miss playoffs (no postseason revenue or success)

Pick 15th-20th in the draft (lower talent acquisition)

Limited prospect surplus to trade

Reduced organizational energy and fan enthusiasm

# Compare strategies over 10 years
library(tidyverse)

simulate_strategy <- function(strategy = "compete", years = 10) {
  if (strategy == "compete") {
    wins <- c(92, 88, 91, 85, 82, 78, 75, 72, 70, 68)
    playoff_prob <- plogis((wins - 87) / 3)
    championships <- c(0, 0, 0.15, 0, 0, 0, 0, 0, 0, 0)  # 15% chance in year 3
  } else if (strategy == "rebuild") {
    wins <- c(68, 65, 72, 78, 84, 88, 91, 93, 89, 87)
    playoff_prob <- plogis((wins - 87) / 3)
    championships <- c(0, 0, 0, 0, 0.05, 0.10, 0.15, 0.18, 0.12, 0.10)
  } else {  # mediocre
    wins <- c(79, 77, 80, 78, 81, 79, 80, 82, 78, 77)
    playoff_prob <- plogis((wins - 87) / 3)
    championships <- rep(0.02, 10)
  }

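  # Truncate the hard-coded 10-year paths so years < 10 also works
  idx <- seq_len(min(years, length(wins)))
  tibble(
    year = idx,
    strategy = strategy,
    wins = wins[idx],
    playoff_prob = playoff_prob[idx],
    championship_prob = championships[idx],
    expected_championships = sum(championships[idx])
  )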
}

strategies <- bind_rows(
  simulate_strategy("compete"),
  simulate_strategy("rebuild"),
  simulate_strategy("mediocre")
)

# Compare outcomes
ggplot(strategies, aes(x = year, y = wins, color = strategy)) +
  geom_line(linewidth = 1.5) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 3) +
  geom_hline(yintercept = 87, linetype = "dashed", color = "black") +
  scale_color_manual(values = c("compete" = "darkgreen",
                                "rebuild" = "darkblue",
                                "mediocre" = "gray50")) +
  labs(title = "Strategic Approach Comparison",
       subtitle = "10-year outcomes by strategy",
       x = "Year", y = "Wins", color = "Strategy") +
  theme_minimal()

# Summary statistics
strategies %>%
  group_by(strategy) %>%
  summarise(
    total_wins = sum(wins),
    avg_wins = mean(wins),
    playoff_years = sum(playoff_prob > 0.5),
    expected_titles = sum(championship_prob)
  ) %>%
  print()

def simulate_strategy(strategy='compete', years=10):
    """Simulate 10-year outcomes by strategy"""
    if strategy == 'compete':
        wins = [92, 88, 91, 85, 82, 78, 75, 72, 70, 68]
        championships = [0, 0, 0.15, 0, 0, 0, 0, 0, 0, 0]
    elif strategy == 'rebuild':
        wins = [68, 65, 72, 78, 84, 88, 91, 93, 89, 87]
        championships = [0, 0, 0, 0, 0.05, 0.10, 0.15, 0.18, 0.12, 0.10]
    else:  # mediocre
        wins = [79, 77, 80, 78, 81, 79, 80, 82, 78, 77]
        championships = [0.02] * 10

    # Truncate the hard-coded 10-year paths so years < 10 also works
    n = min(years, len(wins))
    playoff_prob = expit((np.array(wins[:n]) - 87) / 3)

    return pd.DataFrame({
        'year': range(1, n + 1),
        'strategy': strategy,
        'wins': wins[:n],
        'playoff_prob': playoff_prob,
        'championship_prob': championships[:n]
    })

# Combine strategies
strategies = pd.concat([
    simulate_strategy('compete'),
    simulate_strategy('rebuild'),
    simulate_strategy('mediocre')
])

# Visualize
plt.figure(figsize=(12, 6))
for strat, color in [('compete', 'darkgreen'),
                     ('rebuild', 'darkblue'),
                     ('mediocre', 'gray')]:
    data = strategies[strategies['strategy'] == strat]
    plt.plot(data['year'], data['wins'], marker='o',
            linewidth=2.5, label=strat.capitalize(), color=color)

plt.axhline(y=87, linestyle='--', color='black', alpha=0.7, label='Playoff threshold')
plt.xlabel('Year')
plt.ylabel('Wins')
plt.title('Strategic Approach Comparison\n10-year outcomes by strategy')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Summary
summary = strategies.groupby('strategy').agg({
    'wins': ['sum', 'mean'],
    'playoff_prob': lambda x: (x > 0.5).sum(),
    'championship_prob': 'sum'
}).round(2)
print("\nStrategy Comparison:")
print(summary)

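In these illustrative numbers, the rebuild path sacrifices wins early but finishes with both the most total wins (815 versus 801) and by far the most expected championships (0.70 versus 0.15). The compete strategy wins early but fades. The mediocre approach accomplishes little: it has the fewest wins (791), never projects above the playoff threshold, and its 0.20 expected titles come entirely from the flat 2% per-season assumption.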
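Summing per-season title probabilities gives expected championships, but front offices often care about the chance of winning at least once. The sketch below converts the same assumed per-season probabilities into that figure. It assumes seasons are independent, which is a simplification, and `at_least_one` is a hypothetical helper.

```python
import math

def at_least_one(probs):
    """P(at least one title), assuming seasons are independent."""
    return 1 - math.prod(1 - p for p in probs)

# Per-season championship probabilities from the strategy comparison above
title_probs = {
    "compete": [0, 0, 0.15, 0, 0, 0, 0, 0, 0, 0],
    "rebuild": [0, 0, 0, 0, 0.05, 0.10, 0.15, 0.18, 0.12, 0.10],
    "mediocre": [0.02] * 10,
}

for strategy, probs in title_probs.items():
    print(f"{strategy:9s} expected titles: {sum(probs):.2f}  "
          f"P(at least one): {at_least_one(probs):.3f}")
```

The ranking of strategies is unchanged, but the rebuild path's edge looks smaller on this measure because several of its expected titles overlap in the same simulated futures.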

Signals for Rebuilding

Analytics can identify when rebuilding becomes optimal:

Declining core with expensive contracts: Core players 30+ with 2-4 years remaining
Weak farm system: Low prospect surplus value
Division competition: Strong division rivals with younger cores
Market inefficiency: High trade value for aging stars (can recoup surplus)
Expensive win curve: Would need to pay $15M+ per marginal win to contend

Signals for Competing

Compete when:

Strong core in prime (ages 26-30): Multiple stars under team control
Deep farm system: Can trade prospects without depleting
Weak competition: Division and league competitive balance favors you
Favorable contracts: Core players on team-friendly deals create budget flexibility
Efficient win curve: Can buy marginal wins at or below market rate
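The "win curve" items in both lists come down to one number: the price of each additional win needed to reach contention. The sketch below is a rough illustration with hypothetical inputs and helper names. The $12M and $15M reference points are the market rate and the rebuilding signal used in this chapter, and the labels it returns are only a first-pass screen.

```python
def cost_per_marginal_win(projected_wins, target_wins, added_payroll):
    """Average $M paid per win added to move from projected to target wins."""
    wins_needed = target_wins - projected_wins
    if wins_needed <= 0:
        return 0.0  # already at or above the target
    return added_payroll / wins_needed

def win_curve_signal(cost, market_rate=12.0, expensive_rate=15.0):
    """Map a cost per marginal win onto the chapter's two win-curve signals."""
    if cost <= market_rate:
        return "compete"
    if cost >= expensive_rate:
        return "rebuild"
    return "judgment call"

# Hypothetical teams: (projected wins, target wins, payroll added in $M)
teams = {"Team A": (84, 88, 40), "Team B": (76, 88, 200)}

for name, (projected, target, spend) in teams.items():
    cost = cost_per_marginal_win(projected, target, spend)
    print(f"{name}: ${cost:.1f}M per win -> {win_curve_signal(cost)}")
```

A real evaluation would weigh this alongside the other signals in the lists above rather than treat it as a decision rule.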

The Astros Rebuild Model

Houston's 2011-2014 rebuild became a template:

Phase 1 (2011-2013): Tear Down

Trade all valuable veterans for prospects (Hunter Pence, Wandy Rodriguez, etc.)

Accept 100+ loss seasons

Accumulate top draft picks

Install analytics-driven front office and development system

Phase 2 (2014-2015): Foundation

Build around emerging young talent (Jose Altuve, Dallas Keuchel) and promote top prospects (George Springer in 2014, Carlos Correa in 2015)

Continue high draft picks

Begin selective free agent signings (low-cost veterans)

Phase 3 (2016-2017): Compete

Trade prospects for proven talent (Ken Giles, Justin Verlander)

Sign impact free agents (Carlos Beltran)

Promote remaining top prospects (Alex Bregman in 2016)

Result:

4 World Series appearances (2017, 2019, 2021, 2022)

2 championships (2017, 2022)

Sustained excellence (a winning record in every full season from 2015 through 2023, with 100+ wins in 2017, 2018, 2019, and 2022)

The key: Clear strategy, organizational commitment, and analytics-driven execution at every phase.

14.7 Interactive Roster Tools

Building competitive rosters requires synthesizing complex financial, performance, and strategic data into actionable decisions. Interactive visualizations transform static spreadsheets into dynamic decision-support tools that help front offices evaluate trade-offs, identify market inefficiencies, and optimize resource allocation. This section introduces three essential interactive tools for modern roster construction: payroll efficiency scatter plots, WAR distribution sunburst charts, and free agent valuation calculators.

Interactive Payroll vs Wins Analysis

Understanding the relationship between spending and winning is fundamental to roster construction. An interactive scatter plot allows teams to benchmark their efficiency against competitors, identify outliers, and explore how different spending levels correlate with success.

Let's build an interactive payroll vs wins visualization with team-specific details:

library(tidyverse)
library(plotly)
library(Lahman)

# Create comprehensive payroll-wins dataset
# Note: This uses simulated 2023 data; replace with actual payroll data
create_payroll_analysis <- function(year = 2023) {

  # Get team wins for the year
  team_records <- Teams %>%
    filter(yearID == year) %>%
    select(teamID, name, W, L, G) %>%
    mutate(
      win_pct = W / (W + L),
      team_abbr = teamID
    )

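  # Simulated payroll data (in millions) - replace with actual data.
  # teamID values must match Lahman's codes (e.g., "SDN" for San Diego,
  # "LAA" for the Angels), or the join below silently drops those teams.
  payroll_data <- tibble(
    teamID = c("NYA", "LAN", "NYN", "PHI", "SDN", "BOS", "HOU", "ATL",
               "TOR", "SFN", "CHN", "TEX", "SEA", "LAA", "STL", "MIN",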
               "DET", "COL", "ARI", "MIA", "CHA", "CIN", "MIL", "CLE",
               "KCA", "PIT", "WAS", "TBA", "OAK", "BAL"),
    payroll = c(280, 265, 240, 235, 230, 215, 205, 195,
               190, 185, 180, 175, 170, 165, 160, 155,
               150, 145, 140, 135, 130, 125, 120, 115,
               110, 105, 100, 90, 85, 80)
  )

  # Combine datasets
  analysis_data <- team_records %>%
    left_join(payroll_data, by = "teamID") %>%
    filter(!is.na(payroll)) %>%
    mutate(
      cost_per_win = payroll / W,
      playoff_team = W >= 87,  # Approximate playoff threshold
      efficiency_category = case_when(
        cost_per_win < 1.5 ~ "High Efficiency",
        cost_per_win < 2.5 ~ "Average Efficiency",
        TRUE ~ "Low Efficiency"
      )
    )

  return(analysis_data)
}

# Generate data and create interactive plot
team_data <- create_payroll_analysis(2023)

# Create interactive scatter plot
p <- plot_ly(team_data,
             x = ~payroll,
             y = ~W,
             type = 'scatter',
             mode = 'markers',
             color = ~playoff_team,
             colors = c("FALSE" = "lightblue", "TRUE" = "darkgreen"),
             marker = list(size = 12, opacity = 0.7),
             text = ~paste("<b>", name, "</b>",
                          "<br>Payroll: $", round(payroll, 0), "M",
                          "<br>Wins:", W,
                          "<br>Cost/Win: $", sprintf("%.2f", cost_per_win), "M",
                          "<br>Win %:", sprintf("%.3f", win_pct)),
             hoverinfo = 'text') %>%
  add_trace(
    type = 'scatter',
    mode = 'lines',
    x = c(80, 280),
    y = c(70, 95),  # Approximate trend line (hard-coded, not fitted)
    line = list(color = 'red', dash = 'dash'),
    name = 'Expected Wins',
    showlegend = TRUE,
    hoverinfo = 'skip',
    inherit = FALSE  # Don't inherit the per-team color/text mappings
  ) %>%
  layout(
    title = "MLB Payroll vs Performance (2023)",
    xaxis = list(title = "Payroll ($M)"),
    yaxis = list(title = "Wins"),
    hovermode = 'closest',
    legend = list(title = list(text = "87+ Wins (playoff proxy)"))
  )

# Add annotations for outliers
best_efficiency <- team_data %>%
  filter(playoff_team) %>%
  slice_min(cost_per_win, n = 1)

worst_efficiency <- team_data %>%
  slice_max(cost_per_win, n = 1)

p <- p %>%
  add_annotations(
    x = best_efficiency$payroll,
    y = best_efficiency$W,
    text = paste0(best_efficiency$name, "<br>Best Value"),
    showarrow = TRUE,
    arrowhead = 2,
    ax = 30,
    ay = -40
  ) %>%
  add_annotations(
    x = worst_efficiency$payroll,
    y = worst_efficiency$W,
    text = paste0(worst_efficiency$name, "<br>Worst Value"),
    showarrow = TRUE,
    arrowhead = 2,
    ax = -30,
    ay = 40
  )

p

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from pybaseball import lahman

# Load team data
teams = lahman.teams()

# Filter for recent season
team_records = teams[teams['yearID'] == 2023][['teamID', 'name', 'W', 'L', 'G']].copy()
team_records['win_pct'] = team_records['W'] / (team_records['W'] + team_records['L'])

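# Simulated payroll data (replace with actual data)
# teamID values must match Lahman's codes (e.g., 'SDN', 'LAA'),
# or the merge below silently drops those teams
payroll_data = pd.DataFrame({
    'teamID': ['NYA', 'LAN', 'NYN', 'PHI', 'SDN', 'BOS', 'HOU', 'ATL',
               'TOR', 'SFN', 'CHN', 'TEX', 'SEA', 'LAA', 'STL', 'MIN',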
               'DET', 'COL', 'ARI', 'MIA', 'CHA', 'CIN', 'MIL', 'CLE',
               'KCA', 'PIT', 'WAS', 'TBA', 'OAK', 'BAL'],
    'payroll': [280, 265, 240, 235, 230, 215, 205, 195,
                190, 185, 180, 175, 170, 165, 160, 155,
                150, 145, 140, 135, 130, 125, 120, 115,
                110, 105, 100, 90, 85, 80]
})

# Merge datasets
team_data = team_records.merge(payroll_data, on='teamID', how='left')
team_data = team_data.dropna(subset=['payroll'])

# Calculate metrics
team_data['cost_per_win'] = team_data['payroll'] / team_data['W']
team_data['playoff_team'] = team_data['W'] >= 87
team_data['efficiency_category'] = pd.cut(
    team_data['cost_per_win'],
    bins=[0, 1.5, 2.5, np.inf],
    labels=['High Efficiency', 'Average Efficiency', 'Low Efficiency']
)

# Create hover text
team_data['hover_text'] = (
    '<b>' + team_data['name'] + '</b><br>' +
    'Payroll: $' + team_data['payroll'].round(0).astype(str) + 'M<br>' +
    'Wins: ' + team_data['W'].astype(str) + '<br>' +
    'Cost/Win: $' + team_data['cost_per_win'].round(2).astype(str) + 'M<br>' +
    'Win %: ' + team_data['win_pct'].round(3).astype(str)
)

# Create interactive scatter plot
# custom_data keeps each team's hover text aligned with its own marker,
# even though px.scatter splits the data into one trace per color group
fig = px.scatter(
    team_data,
    x='payroll',
    y='W',
    color='playoff_team',
    color_discrete_map={True: 'darkgreen', False: 'lightblue'},
    custom_data=['hover_text'],
    labels={'payroll': 'Payroll ($M)', 'W': 'Wins',
            'playoff_team': '87+ Wins (playoff proxy)'},
    title='MLB Payroll vs Performance (2023)'
)

# Style markers and show only the custom hover text
fig.update_traces(
    marker=dict(size=12, opacity=0.7),
    hovertemplate='%{customdata[0]}<extra></extra>'
)

# Add trend line
z = np.polyfit(team_data['payroll'], team_data['W'], 1)
p = np.poly1d(z)
x_trend = np.linspace(team_data['payroll'].min(), team_data['payroll'].max(), 100)
y_trend = p(x_trend)

fig.add_trace(
    go.Scatter(
        x=x_trend,
        y=y_trend,
        mode='lines',
        name='Expected Wins',
        line=dict(color='red', dash='dash'),
        showlegend=True,
        hoverinfo='skip'
    )
)

# Add annotations for best and worst value
best_value = team_data[team_data['playoff_team']].nsmallest(1, 'cost_per_win').iloc[0]
worst_value = team_data.nlargest(1, 'cost_per_win').iloc[0]

fig.add_annotation(
    x=best_value['payroll'],
    y=best_value['W'],
    text=f"{best_value['name']}<br>Best Value",
    showarrow=True,
    arrowhead=2,
    ax=30,
    ay=-40
)

fig.add_annotation(
    x=worst_value['payroll'],
    y=worst_value['W'],
    text=f"{worst_value['name']}<br>Worst Value",
    showarrow=True,
    arrowhead=2,
    ax=-30,
    ay=40
)

fig.update_layout(
    hovermode='closest',
    height=600,
    xaxis_title='Payroll ($M)',
    yaxis_title='Wins',
    legend_title='87+ Wins (playoff proxy)'
)

fig.show()

This interactive visualization makes payroll efficiency easy to scan. Teams above the trend line won more games than their payroll predicts, which points to strong roster construction or player development; teams below it got fewer wins than their spending would suggest. Hovering over a team shows its payroll, wins, and cost per win, context that a static chart cannot carry. Because the payroll figures above are simulated, treat the specific outliers as illustrative until actual payroll data is substituted.
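The "above or below the trend line" reading can be made explicit by computing each team's residual from the fitted line. The following is a minimal sketch, assuming a data frame with `payroll` and `W` columns like `team_data` above; the helper name `wins_above_expected` and the five-team `demo` table are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def wins_above_expected(df, payroll_col="payroll", wins_col="W"):
    """Fit wins ~ payroll with a straight line and return residuals.

    A positive residual means the team won more games than the line
    predicts for its payroll (it sits above the trend line).
    """
    slope, intercept = np.polyfit(df[payroll_col], df[wins_col], 1)
    out = df.copy()
    out["expected_wins"] = intercept + slope * out[payroll_col]
    out["wins_above_expected"] = out[wins_col] - out["expected_wins"]
    return out.sort_values("wins_above_expected", ascending=False)


# Hypothetical illustration data (not real payrolls or records)
demo = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "payroll": [250, 200, 150, 100, 80],
    "W": [92, 78, 88, 75, 90],
})

ranked = wins_above_expected(demo)
print(ranked[["team", "payroll", "W",
              "expected_wins", "wins_above_expected"]].round(1))
```

Sorting by the residual gives a ranking that does not depend on reading positions off the chart. With only 30 teams and one season, the fitted slope is noisy, so the ranking is best treated as a screening device rather than a verdict.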

WAR Distribution Sunburst Chart

Understanding how value is distributed across a roster is critical for identifying construction weaknesses and allocation opportunities. A sunburst chart provides hierarchical visualization of team WAR distributed by position and individual players.

library(tidyverse)
library(plotly)
library(Lahman)

# Create WAR distribution by team and position
create_war_sunburst <- function(team_abbr = "HOU", year = 2023) {

  # Get player batting WAR for the team
  # Note: Using a simplified WAR calculation - replace with actual WAR data
  batting_war <- Batting %>%
    filter(teamID == team_abbr, yearID == year, AB >= 50) %>%
    left_join(People %>% select(playerID, nameFirst, nameLast),
              by = "playerID") %>%
    mutate(
      player_name = paste(nameFirst, nameLast),
      # Simplified WAR calculation (replace with actual)
      war_approx = ((H + BB) / (AB + BB) - 0.320) * AB / 60 +
                   (HR * 0.5) + (SB * 0.2) - (CS * 0.4),
      war = pmax(0, war_approx)  # Keep positive only
    ) %>%
    select(playerID, player_name, war)

  # Get positional data (primary position = most games played)
  # Note: Lahman's Fielding table records all outfielders as "OF"
  fielding_pos <- Fielding %>%
    filter(teamID == team_abbr, yearID == year) %>%
    group_by(playerID) %>%
    slice_max(G, n = 1, with_ties = FALSE) %>%
    ungroup() %>%
    select(playerID, POS) %>%
    mutate(
      position_group = case_when(
        POS %in% c("C", "1B", "2B", "3B", "SS") ~ "Infield",
        POS %in% c("OF", "LF", "CF", "RF") ~ "Outfield",
        POS == "DH" ~ "DH",
        TRUE ~ "Pitcher"
      )
    )

  # Combine data
  war_data <- batting_war %>%
    left_join(fielding_pos, by = "playerID") %>%
    filter(!is.na(POS)) %>%
    group_by(position_group, POS) %>%
    arrange(desc(war)) %>%
    mutate(rank = row_number()) %>%
    filter(rank <= 3) %>%  # Top 3 per position
    ungroup()

  # Create hierarchical data for sunburst
  # Level 1: Position groups
  level1 <- war_data %>%
    group_by(position_group) %>%
    summarise(war = sum(war)) %>%
    mutate(
      labels = position_group,
      parents = "",
      values = war
    )

  # Level 2: Specific positions
  level2 <- war_data %>%
    group_by(position_group, POS) %>%
    summarise(war = sum(war), .groups = 'drop') %>%
    mutate(
      labels = POS,
      parents = position_group,
      values = war
    )

  # Level 3: Individual players
  level3 <- war_data %>%
    mutate(
      labels = player_name,
      parents = POS,
      values = war
    ) %>%
    select(labels, parents, values)

  # Combine all levels
  sunburst_data <- bind_rows(level1, level2, level3) %>%
    select(labels, parents, values)

  # Create sunburst plot
  fig <- plot_ly(
    labels = sunburst_data$labels,
    parents = sunburst_data$parents,
    values = sunburst_data$values,
    type = 'sunburst',
    branchvalues = "total",
    hovertemplate = '<b>%{label}</b><br>WAR: %{value:.1f}<extra></extra>'
  ) %>%
    layout(
      title = paste0(team_abbr, " WAR Distribution by Position (", year, ")"),
      margin = list(l = 0, r = 0, t = 50, b = 0)
    )

  return(fig)
}

# Create visualization
fig <- create_war_sunburst("HOU", 2023)
fig

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from pybaseball import lahman

def create_war_sunburst(team_abbr='HOU', year=2023):
    """
    Create sunburst chart showing WAR distribution by position

    Parameters:
    -----------
    team_abbr : str
        Lahman team ID (e.g., 'HOU', 'LAN')
    year : int
        Season year
    """

    # Load data
    batting = lahman.batting()
    fielding = lahman.fielding()
    people = lahman.people()

    # Filter batting data for team
    team_batting = batting[
        (batting['teamID'] == team_abbr) &
        (batting['yearID'] == year) &
        (batting['AB'] >= 50)
    ].copy()

    # Merge with player names
    team_batting = team_batting.merge(
        people[['playerID', 'nameFirst', 'nameLast']],
        on='playerID'
    )
    team_batting['player_name'] = (team_batting['nameFirst'] + ' ' +
                                   team_batting['nameLast'])

    # Calculate simplified WAR (replace with actual WAR data)
    team_batting['war_approx'] = (
        ((team_batting['H'] + team_batting['BB']) /
         (team_batting['AB'] + team_batting['BB']) - 0.320) *
        team_batting['AB'] / 60 +
        (team_batting['HR'] * 0.5) +
        (team_batting['SB'] * 0.2) -
        (team_batting['CS'] * 0.4)
    )
    team_batting['war'] = team_batting['war_approx'].clip(lower=0)

    # Get primary position for each player
    team_fielding = fielding[
        (fielding['teamID'] == team_abbr) &
        (fielding['yearID'] == year)
    ].copy()

    primary_pos = team_fielding.loc[
        team_fielding.groupby('playerID')['G'].idxmax()
    ][['playerID', 'POS']].copy()

    # Add position grouping
    # Note: Lahman's fielding table records all outfielders as 'OF'
    def classify_position(pos):
        if pos in ['C', '1B', '2B', '3B', 'SS']:
            return 'Infield'
        elif pos in ['OF', 'LF', 'CF', 'RF']:
            return 'Outfield'
        elif pos == 'DH':
            return 'DH'
        else:
            return 'Pitcher'

    primary_pos['position_group'] = primary_pos['POS'].apply(classify_position)

    # Merge with WAR data
    war_data = team_batting.merge(primary_pos, on='playerID', how='left')
    war_data = war_data.dropna(subset=['POS'])

    # Get top 3 players per position
    # method='first' breaks ties by row order so at most 3 players are kept
    war_data['rank'] = war_data.groupby('POS')['war'].rank(
        method='first', ascending=False
    )
    war_data = war_data[war_data['rank'] <= 3]

    # Build hierarchical data
    # Level 1: Position groups
    level1 = war_data.groupby('position_group')['war'].sum().reset_index()
    level1['labels'] = level1['position_group']
    level1['parents'] = ''
    level1['values'] = level1['war']

    # Level 2: Specific positions
    level2 = war_data.groupby(['position_group', 'POS'])['war'].sum().reset_index()
    level2['labels'] = level2['POS']
    level2['parents'] = level2['position_group']
    level2['values'] = level2['war']

    # Level 3: Individual players
    level3 = war_data[['player_name', 'POS', 'war']].copy()
    level3['labels'] = level3['player_name']
    level3['parents'] = level3['POS']
    level3['values'] = level3['war']

    # Combine all levels
    sunburst_data = pd.concat([
        level1[['labels', 'parents', 'values']],
        level2[['labels', 'parents', 'values']],
        level3[['labels', 'parents', 'values']]
    ], ignore_index=True)

    # Create sunburst chart
    fig = go.Figure(go.Sunburst(
        labels=sunburst_data['labels'],
        parents=sunburst_data['parents'],
        values=sunburst_data['values'],
        branchvalues="total",
        hovertemplate='<b>%{label}</b><br>WAR: %{value:.1f}<extra></extra>'
    ))

    fig.update_layout(
        title=f'{team_abbr} WAR Distribution by Position ({year})',
        margin=dict(l=0, r=0, t=50, b=0),
        height=600
    )

    return fig

# Create visualization
fig = create_war_sunburst('HOU', 2023)
fig.show()

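The sunburst chart gives quick visual feedback on roster balance. A balanced roster spreads value fairly evenly across position groups rather than leaning on one or two players. The hierarchical structure lets analysts drill down from position groups (Infield, Outfield) to specific positions (SS, OF) to individual contributors. Two caveats apply to the code above: it covers only hitters with at least 50 at-bats, so pitching value is not represented, and its WAR values are a rough placeholder that should be replaced with actual WAR data.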
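Roster balance can also be summarized numerically alongside the chart. The sketch below assumes a player-level table with `position_group` and `war` columns, like `war_data` above; the function `war_concentration` and the six-player `demo` roster are hypothetical and not part of the original analysis.

```python
import pandas as pd


def war_concentration(war_df, top_n=2):
    """Return (share of WAR by position group, share held by top_n players)."""
    total = war_df["war"].sum()
    if total <= 0:
        raise ValueError("Total WAR must be positive to compute shares")
    group_share = (
        war_df.groupby("position_group")["war"].sum() / total
    ).sort_values(ascending=False)
    top_share = war_df["war"].nlargest(top_n).sum() / total
    return group_share, top_share


# Hypothetical roster for illustration only
demo = pd.DataFrame({
    "player_name": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "position_group": ["Infield", "Infield", "Infield",
                       "Outfield", "Outfield", "DH"],
    "war": [6.0, 3.0, 1.0, 4.0, 2.0, 4.0],
})

group_share, top2_share = war_concentration(demo, top_n=2)
print(group_share.round(2))
print(f"Share of WAR from top 2 players: {top2_share:.0%}")
```

A high top-two share flags the over-reliance that the sunburst shows visually. What counts as "too concentrated" is a judgment call that this sketch does not settle.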

Free Agent Valuation Calculator

Front offices need tools to quickly evaluate whether free agent contracts represent good value. An interactive calculator allows users to input player projections and contract terms, instantly seeing the financial implications and comparing against market benchmarks.

library(tidyverse)
library(plotly)

# Free agent valuation calculator with visualization
create_fa_calculator <- function() {

  # Define market parameters
  market_rate_per_war <- 12  # Million dollars per WAR
  discount_rate <- 0.05      # 5% annual discount rate

  # Example contract scenarios
  contracts <- tibble(
    player = c("Star Player", "Mid-Tier Player", "Value Signing",
               "Overpay Example", "Young FA"),
    years = c(6, 4, 2, 5, 7),
    aav = c(35, 18, 8, 22, 28),  # Average annual value (millions)
    year_1_war = c(5.5, 3.5, 2.0, 2.5, 4.5),
    annual_decline = c(0.3, 0.25, 0.15, 0.35, 0.2)
  )

  # Calculate projected WAR for each year of contract
  calculate_contract_value <- function(years, year_1_war, annual_decline,
                                      aav, discount_rate, market_rate) {

    war_by_year <- numeric(years)
    for (i in 1:years) {
      war_by_year[i] <- max(0, year_1_war - (i - 1) * annual_decline)
    }

    # Calculate present value
    discount_factors <- 1 / ((1 + discount_rate) ^ (0:(years-1)))
    pv_war <- sum(war_by_year * discount_factors)
    pv_salary <- sum(rep(aav, years) * discount_factors)

    # Market value
    market_value <- pv_war * market_rate

    # Surplus/deficit
    surplus <- market_value - pv_salary

    return(list(
      total_war = sum(war_by_year),
      pv_war = pv_war,
      pv_salary = pv_salary,
      market_value = market_value,
      surplus = surplus,
      cost_per_war = pv_salary / pv_war
    ))
  }

  # Calculate for all contracts
  contract_analysis <- contracts %>%
    rowwise() %>%
    mutate(
      results = list(calculate_contract_value(
        years, year_1_war, annual_decline, aav, discount_rate, market_rate_per_war
      ))
    ) %>%
    unnest_wider(results) %>%
    mutate(
      total_salary = aav * years,
      value_category = case_when(
        surplus > 20 ~ "Excellent Value",
        surplus > 0 ~ "Good Value",
        surplus > -20 ~ "Market Rate",
        TRUE ~ "Overpay"
      )
    )

  # Create interactive bar chart
  fig <- plot_ly(contract_analysis,
                 x = ~player,
                 y = ~surplus,
                 type = 'bar',
                 color = ~value_category,
                 colors = c("Excellent Value" = "darkgreen",
                           "Good Value" = "lightgreen",
                           "Market Rate" = "yellow",
                           "Overpay" = "red"),
                 text = ~paste("<b>", player, "</b>",
                              "<br>Contract:", years, "years @", aav, "M/yr",
                              "<br>Total Salary: $", round(total_salary, 0), "M",
                              "<br>PV Salary: $", round(pv_salary, 1), "M",
                              "<br>Projected WAR:", round(total_war, 1),
                              "<br>PV WAR:", round(pv_war, 1),
                              "<br>Market Value: $", round(market_value, 1), "M",
                              "<br>Surplus: $", round(surplus, 1), "M",
                              "<br>Cost/WAR: $", round(cost_per_war, 1), "M"),
                 hoverinfo = 'text') %>%
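    layout(
      title = "Free Agent Contract Valuation Analysis",
      xaxis = list(title = ""),
      yaxis = list(title = "Surplus Value ($M)"),
      hovermode = 'closest',
      showlegend = TRUE,
      # Break-even reference line at surplus = 0
      # (drawn as a layout shape; R plotly has no add_hline())
      shapes = list(list(
        type = "line", xref = "paper", x0 = 0, x1 = 1,
        yref = "y", y0 = 0, y1 = 0,
        line = list(color = "black", dash = "dash")
      ))
    )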

  return(list(data = contract_analysis, plot = fig))
}

# Run calculator
results <- create_fa_calculator()
results$plot
print(results$data)

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px

def calculate_contract_value(years, year_1_war, annual_decline, aav,
                            discount_rate=0.05, market_rate=12):
    """
    Calculate present value and surplus for a free agent contract

    Parameters:
    -----------
    years : int
        Contract length
    year_1_war : float
        Projected WAR in first year
    annual_decline : float
        Expected annual WAR decline
    aav : float
        Average annual value (millions)
    discount_rate : float
        Annual discount rate
    market_rate : float
        Market rate per WAR (millions)
    """

    # Project WAR for each year
    war_by_year = [max(0, year_1_war - i * annual_decline) for i in range(years)]

    # Calculate present values
    discount_factors = [1 / ((1 + discount_rate) ** i) for i in range(years)]
    pv_war = sum(w * d for w, d in zip(war_by_year, discount_factors))
    pv_salary = sum(aav * d for d in discount_factors)

    # Calculate surplus
    market_value = pv_war * market_rate
    surplus = market_value - pv_salary

    return {
        'total_war': sum(war_by_year),
        'pv_war': pv_war,
        'pv_salary': pv_salary,
        'market_value': market_value,
        'surplus': surplus,
        'cost_per_war': pv_salary / pv_war if pv_war > 0 else 0
    }

def create_fa_calculator():
    """Create interactive free agent valuation calculator"""

    # Example contract scenarios
    contracts = pd.DataFrame({
        'player': ['Star Player', 'Mid-Tier Player', 'Value Signing',
                   'Overpay Example', 'Young FA'],
        'years': [6, 4, 2, 5, 7],
        'aav': [35, 18, 8, 22, 28],
        'year_1_war': [5.5, 3.5, 2.0, 2.5, 4.5],
        'annual_decline': [0.3, 0.25, 0.15, 0.35, 0.2]
    })

    # Calculate values for each contract
    results = []
    for _, row in contracts.iterrows():
        calc = calculate_contract_value(
            int(row['years']),  # range() needs an integer contract length
            row['year_1_war'], row['annual_decline'], row['aav']
        )
        results.append(calc)

    # Add results to dataframe
    results_df = pd.DataFrame(results)
    contract_analysis = pd.concat([contracts, results_df], axis=1)

    # Calculate total salary and categorize
    contract_analysis['total_salary'] = (contract_analysis['aav'] *
                                         contract_analysis['years'])

    def categorize_value(surplus):
        if surplus > 20:
            return 'Excellent Value'
        elif surplus > 0:
            return 'Good Value'
        elif surplus > -20:
            return 'Market Rate'
        else:
            return 'Overpay'

    contract_analysis['value_category'] = (contract_analysis['surplus']
                                           .apply(categorize_value))

    # Create hover text
    contract_analysis['hover_text'] = (
        '<b>' + contract_analysis['player'] + '</b><br>' +
        'Contract: ' + contract_analysis['years'].astype(str) + ' years @ $' +
        contract_analysis['aav'].astype(str) + 'M/yr<br>' +
        'Total Salary: $' + contract_analysis['total_salary'].round(0).astype(str) + 'M<br>' +
        'PV Salary: $' + contract_analysis['pv_salary'].round(1).astype(str) + 'M<br>' +
        'Projected WAR: ' + contract_analysis['total_war'].round(1).astype(str) + '<br>' +
        'PV WAR: ' + contract_analysis['pv_war'].round(1).astype(str) + '<br>' +
        'Market Value: $' + contract_analysis['market_value'].round(1).astype(str) + 'M<br>' +
        'Surplus: $' + contract_analysis['surplus'].round(1).astype(str) + 'M<br>' +
        'Cost/WAR: $' + contract_analysis['cost_per_war'].round(1).astype(str) + 'M'
    )

    # Create bar chart
    color_map = {
        'Excellent Value': 'darkgreen',
        'Good Value': 'lightgreen',
        'Market Rate': 'yellow',
        'Overpay': 'red'
    }

    fig = px.bar(
        contract_analysis,
        x='player',
        y='surplus',
        color='value_category',
        color_discrete_map=color_map,
        custom_data=['hover_text'],
        title='Free Agent Contract Valuation Analysis',
        labels={'player': '', 'surplus': 'Surplus Value ($M)'}
    )

    # px.bar creates one trace per value category; custom_data (set above)
    # keeps each bar's hover text matched to its own contract
    fig.update_traces(
        hovertemplate='%{customdata[0]}<extra></extra>'
    )

    fig.add_hline(
        y=0,
        line_dash="dash",
        line_color="black",
        annotation_text="Break-even"
    )

    fig.update_layout(
        hovermode='closest',
        height=600,
        showlegend=True,
        legend_title='Value Assessment'
    )

    return contract_analysis, fig

# Run calculator
contract_analysis, fig = create_fa_calculator()
fig.show()
print("\nContract Analysis Summary:")
print(contract_analysis[['player', 'years', 'aav', 'total_war', 'surplus',
                         'cost_per_war', 'value_category']])

This free agent calculator turns contract terms into a concrete value estimate. Surplus value is the present value of projected production (discounted WAR times the market rate per WAR) minus the present value of salary, so positive bars mark signings projected to create value and negative bars mark signings projected to destroy it. The tool approximates aging with a fixed annual WAR decline and discounts both WAR and salary at the same annual rate, so its results are sensitive to the assumed market rate, decline, and discount rate.
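The surplus calculation can be inverted to ask what salary a projection justifies. Setting surplus to zero gives a break-even AAV equal to the market rate times discounted WAR, divided by the sum of the discount factors. The sketch below uses the same linear-decline and discounting assumptions as the calculator; the function name `break_even_aav` is hypothetical, and the example inputs are the "Overpay Example" row from the scenarios above.

```python
def break_even_aav(years, year_1_war, annual_decline,
                   discount_rate=0.05, market_rate=12.0):
    """AAV (in $M) at which discounted surplus is exactly zero.

    Assumes a linear WAR decline floored at zero and a flat salary,
    with both WAR and salary discounted from year 1 (factor 1.0).
    """
    war_by_year = [max(0.0, year_1_war - i * annual_decline)
                   for i in range(years)]
    discount_factors = [1 / (1 + discount_rate) ** i for i in range(years)]
    pv_war = sum(w * d for w, d in zip(war_by_year, discount_factors))
    return market_rate * pv_war / sum(discount_factors)


# Inputs taken from the "Overpay Example" scenario: 5 years,
# 2.5 WAR in year 1, declining 0.35 WAR per year
overpay_break_even = break_even_aav(years=5, year_1_war=2.5,
                                    annual_decline=0.35)
print(f"Break-even AAV: ${overpay_break_even:.2f}M")

# Sanity check: with no decline and no discounting, the break-even AAV
# is simply market rate times annual WAR
flat_break_even = break_even_aav(years=3, year_1_war=2.0, annual_decline=0.0,
                                 discount_rate=0.0, market_rate=12.0)
print(f"Flat-case break-even AAV: ${flat_break_even:.2f}M")
```

Comparing an offered AAV against this figure gives the same verdict as the sign of the surplus, expressed in the units a negotiation actually uses.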

Together, these interactive tools make complex roster analyses easier to explore. Teams can test different scenarios, benchmark against competitors, and spot inefficiencies that create competitive advantages, which helps translate analytics into roster decisions.

R

library(tidyverse)
library(plotly)
library(Lahman)

# Create comprehensive payroll-wins dataset
# Note: This uses simulated 2023 data; replace with actual payroll data
create_payroll_analysis <- function(year = 2023) {

  # Get team wins for the year
  team_records <- Teams %>%
    filter(yearID == year) %>%
    select(teamID, name, W, L, G) %>%
    mutate(
      win_pct = W / (W + L),
      team_abbr = teamID
    )

  # Simulated payroll data (in millions) - replace with actual data
  payroll_data <- tibble(
    teamID = c("NYA", "LAN", "NYN", "PHI", "SDP", "BOS", "HOU", "ATL",
               "TOR", "SFN", "CHN", "TEX", "SEA", "ANA", "STL", "MIN",
               "DET", "COL", "ARI", "MIA", "CHA", "CIN", "MIL", "CLE",
               "KCA", "PIT", "WAS", "TBA", "OAK", "BAL"),
    payroll = c(280, 265, 240, 235, 230, 215, 205, 195,
               190, 185, 180, 175, 170, 165, 160, 155,
               150, 145, 140, 135, 130, 125, 120, 115,
               110, 105, 100, 90, 85, 80)
  )

  # Combine datasets
  analysis_data <- team_records %>%
    left_join(payroll_data, by = "teamID") %>%
    filter(!is.na(payroll)) %>%
    mutate(
      cost_per_win = payroll / W,
      playoff_team = W >= 87,  # Approximate playoff threshold
      efficiency_category = case_when(
        cost_per_win < 1.5 ~ "High Efficiency",
        cost_per_win < 2.5 ~ "Average Efficiency",
        TRUE ~ "Low Efficiency"
      )
    )

  return(analysis_data)
}

# Generate data and create interactive plot
team_data <- create_payroll_analysis(2023)

# Create interactive scatter plot
p <- plot_ly(team_data,
             x = ~payroll,
             y = ~W,
             type = 'scatter',
             mode = 'markers',
             color = ~playoff_team,
             colors = c("FALSE" = "lightblue", "TRUE" = "darkgreen"),
             marker = list(size = 12, opacity = 0.7),
             text = ~paste("<b>", name, "</b>",
                          "<br>Payroll: $", round(payroll, 0), "M",
                          "<br>Wins:", W,
                          "<br>Cost/Win: $", sprintf("%.2f", cost_per_win), "M",
                          "<br>Win %:", sprintf("%.3f", win_pct)),
             hoverinfo = 'text') %>%
  add_trace(
    type = 'scatter',
    mode = 'lines',
    x = c(80, 280),
    y = c(70, 95),  # Approximate trend line
    line = list(color = 'red', dash = 'dash'),
    name = 'Expected Wins',
    showlegend = TRUE,
    hoverinfo = 'skip'
  ) %>%
  layout(
    title = "MLB Payroll vs Performance (2023)",
    xaxis = list(title = "Payroll ($M)"),
    yaxis = list(title = "Wins"),
    hovermode = 'closest',
    legend = list(title = list(text = "Made Playoffs"))
  )

# Add annotations for outliers
best_efficiency <- team_data %>%
  filter(playoff_team) %>%
  slice_min(cost_per_win, n = 1)

worst_efficiency <- team_data %>%
  slice_max(cost_per_win, n = 1)

p <- p %>%
  add_annotations(
    x = best_efficiency$payroll,
    y = best_efficiency$W,
    text = paste0(best_efficiency$name, "<br>Best Value"),
    showarrow = TRUE,
    arrowhead = 2,
    ax = 30,
    ay = -40
  ) %>%
  add_annotations(
    x = worst_efficiency$payroll,
    y = worst_efficiency$W,
    text = paste0(worst_efficiency$name, "<br>Worst Value"),
    showarrow = TRUE,
    arrowhead = 2,
    ax = -30,
    ay = 40
  )

p

R

library(tidyverse)
library(plotly)
library(Lahman)

# Create WAR distribution by team and position
create_war_sunburst <- function(team_abbr = "HOU", year = 2023) {

  # Get player batting WAR for the team
  # Note: Using a simplified WAR calculation - replace with actual WAR data
  batting_war <- Batting %>%
    filter(teamID == team_abbr, yearID == year, AB >= 50) %>%
    left_join(People %>% select(playerID, nameFirst, nameLast),
              by = "playerID") %>%
    mutate(
      player_name = paste(nameFirst, nameLast),
      # Simplified WAR calculation (replace with actual)
      war_approx = ((H + BB) / (AB + BB) - 0.320) * AB / 60 +
                   (HR * 0.5) + (SB * 0.2) - (CS * 0.4),
      war = pmax(0, war_approx)  # Keep positive only
    ) %>%
    select(playerID, player_name, war)

  # Get positional data
  fielding_pos <- Fielding %>%
    filter(teamID == team_abbr, yearID == year) %>%
    group_by(playerID) %>%
    slice_max(G, n = 1) %>%
    select(playerID, POS) %>%
    mutate(
      position_group = case_when(
        POS %in% c("C", "1B", "2B", "3B", "SS") ~ "Infield",
        POS %in% c("LF", "CF", "RF") ~ "Outfield",
        POS == "DH" ~ "DH",
        TRUE ~ "Pitcher"
      )
    )

  # Combine data
  war_data <- batting_war %>%
    left_join(fielding_pos, by = "playerID") %>%
    filter(!is.na(POS)) %>%
    group_by(position_group, POS) %>%
    arrange(desc(war)) %>%
    mutate(rank = row_number()) %>%
    filter(rank <= 3) %>%  # Top 3 per position
    ungroup()

  # Create hierarchical data for sunburst
  # Level 1: Position groups
  level1 <- war_data %>%
    group_by(position_group) %>%
    summarise(war = sum(war)) %>%
    mutate(
      labels = position_group,
      parents = "",
      values = war
    )

  # Level 2: Specific positions
  level2 <- war_data %>%
    group_by(position_group, POS) %>%
    summarise(war = sum(war), .groups = 'drop') %>%
    mutate(
      labels = POS,
      parents = position_group,
      values = war
    )

  # Level 3: Individual players
  level3 <- war_data %>%
    mutate(
      labels = player_name,
      parents = POS,
      values = war
    ) %>%
    select(labels, parents, values)

  # Combine all levels
  sunburst_data <- bind_rows(level1, level2, level3) %>%
    select(labels, parents, values)

  # Create sunburst plot
  fig <- plot_ly(
    labels = sunburst_data$labels,
    parents = sunburst_data$parents,
    values = sunburst_data$values,
    type = 'sunburst',
    branchvalues = "total",
    hovertemplate = '<b>%{label}</b><br>WAR: %{value:.1f}<extra></extra>'
  ) %>%
    layout(
      title = paste0(team_abbr, " WAR Distribution by Position (", year, ")"),
      margin = list(l = 0, r = 0, t = 50, b = 0)
    )

  return(fig)
}

# Create visualization
fig <- create_war_sunburst("HOU", 2023)
fig

R

library(tidyverse)
library(plotly)

# Free agent valuation calculator with visualization
create_fa_calculator <- function() {

  # Define market parameters
  market_rate_per_war <- 12  # Million dollars per WAR
  discount_rate <- 0.05      # 5% annual discount rate

  # Example contract scenarios
  contracts <- tibble(
    player = c("Star Player", "Mid-Tier Player", "Value Signing",
               "Overpay Example", "Young FA"),
    years = c(6, 4, 2, 5, 7),
    aav = c(35, 18, 8, 22, 28),  # Average annual value (millions)
    year_1_war = c(5.5, 3.5, 2.0, 2.5, 4.5),
    annual_decline = c(0.3, 0.25, 0.15, 0.35, 0.2)
  )

  # Calculate projected WAR for each year of contract
  calculate_contract_value <- function(years, year_1_war, annual_decline,
                                      aav, discount_rate, market_rate) {

    war_by_year <- numeric(years)
    for (i in 1:years) {
      war_by_year[i] <- max(0, year_1_war - (i - 1) * annual_decline)
    }

    # Calculate present value
    discount_factors <- 1 / ((1 + discount_rate) ^ (0:(years-1)))
    pv_war <- sum(war_by_year * discount_factors)
    pv_salary <- sum(rep(aav, years) * discount_factors)

    # Market value
    market_value <- pv_war * market_rate

    # Surplus/deficit
    surplus <- market_value - pv_salary

    return(list(
      total_war = sum(war_by_year),
      pv_war = pv_war,
      pv_salary = pv_salary,
      market_value = market_value,
      surplus = surplus,
      cost_per_war = pv_salary / pv_war
    ))
  }

  # Calculate for all contracts
  contract_analysis <- contracts %>%
    rowwise() %>%
    mutate(
      results = list(calculate_contract_value(
        years, year_1_war, annual_decline, aav, discount_rate, market_rate_per_war
      ))
    ) %>%
    unnest_wider(results) %>%
    mutate(
      total_salary = aav * years,
      value_category = case_when(
        surplus > 20 ~ "Excellent Value",
        surplus > 0 ~ "Good Value",
        surplus > -20 ~ "Market Rate",
        TRUE ~ "Overpay"
      )
    )

  # Create interactive bar chart
  fig <- plot_ly(contract_analysis,
                 x = ~player,
                 y = ~surplus,
                 type = 'bar',
                 color = ~value_category,
                 colors = c("Excellent Value" = "darkgreen",
                           "Good Value" = "lightgreen",
                           "Market Rate" = "yellow",
                           "Overpay" = "red"),
                 text = ~paste("<b>", player, "</b>",
                              "<br>Contract:", years, "years @", aav, "M/yr",
                              "<br>Total Salary: $", round(total_salary, 0), "M",
                              "<br>PV Salary: $", round(pv_salary, 1), "M",
                              "<br>Projected WAR:", round(total_war, 1),
                              "<br>PV WAR:", round(pv_war, 1),
                              "<br>Market Value: $", round(market_value, 1), "M",
                              "<br>Surplus: $", round(surplus, 1), "M",
                              "<br>Cost/WAR: $", round(cost_per_war, 1), "M"),
                 hoverinfo = 'text') %>%
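    layout(
      title = "Free Agent Contract Valuation Analysis",
      xaxis = list(title = ""),
      yaxis = list(title = "Surplus Value ($M)"),
      hovermode = 'closest',
      showlegend = TRUE,
      # Dashed break-even line at y = 0. R plotly has no add_hline(),
      # so draw the line as a layout shape spanning the full plot width.
      shapes = list(list(
        type = "line",
        xref = "paper", x0 = 0, x1 = 1,
        yref = "y", y0 = 0, y1 = 0,
        line = list(color = "black", dash = "dash")
      ))
    )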

  return(list(data = contract_analysis, plot = fig))
}

# Run calculator
results <- create_fa_calculator()
results$plot
print(results$data)

Python

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from pybaseball import lahman

# Load team data
teams = lahman.teams()

# Filter for recent season
team_records = teams[teams['yearID'] == 2023][['teamID', 'name', 'W', 'L', 'G']].copy()
team_records['win_pct'] = team_records['W'] / (team_records['W'] + team_records['L'])

# Simulated payroll data in $M (replace with actual data).
# teamID values must match Lahman codes (e.g. SDN, LAA, SLN);
# unmatched teams get no payroll and are dropped after the merge.
payroll_data = pd.DataFrame({
    'teamID': ['NYA', 'LAN', 'NYN', 'PHI', 'SDN', 'BOS', 'HOU', 'ATL',
               'TOR', 'SFN', 'CHN', 'TEX', 'SEA', 'LAA', 'SLN', 'MIN',
               'DET', 'COL', 'ARI', 'MIA', 'CHA', 'CIN', 'MIL', 'CLE',
               'KCA', 'PIT', 'WAS', 'TBA', 'OAK', 'BAL'],
    'payroll': [280, 265, 240, 235, 230, 215, 205, 195,
                190, 185, 180, 175, 170, 165, 160, 155,
                150, 145, 140, 135, 130, 125, 120, 115,
                110, 105, 100, 90, 85, 80]
})

# Merge datasets
team_data = team_records.merge(payroll_data, on='teamID', how='left')
team_data = team_data.dropna(subset=['payroll'])

# Calculate metrics
team_data['cost_per_win'] = team_data['payroll'] / team_data['W']
# Proxy for playoff status: 87+ wins (an approximation, not actual postseason results)
team_data['playoff_team'] = team_data['W'] >= 87
team_data['efficiency_category'] = pd.cut(
    team_data['cost_per_win'],
    bins=[0, 1.5, 2.5, np.inf],
    labels=['High Efficiency', 'Average Efficiency', 'Low Efficiency']
)

# Create hover text
team_data['hover_text'] = (
    '<b>' + team_data['name'] + '</b><br>' +
    'Payroll: $' + team_data['payroll'].round(0).astype(int).astype(str) + 'M<br>' +
    'Wins: ' + team_data['W'].astype(str) + '<br>' +
    'Cost/Win: $' + team_data['cost_per_win'].round(2).astype(str) + 'M<br>' +
    'Win %: ' + team_data['win_pct'].round(3).astype(str)
)

# Create interactive scatter plot
fig = px.scatter(
    team_data,
    x='payroll',
    y='W',
    color='playoff_team',
    color_discrete_map={True: 'darkgreen', False: 'lightblue'},
    # Attach hover text per row so it stays aligned within each color trace
    custom_data=['hover_text'],
    labels={'payroll': 'Payroll ($M)', 'W': 'Wins', 'playoff_team': 'Made Playoffs'},
    title='MLB Payroll vs Performance (2023)'
)

# Style markers and show only the custom hover text
fig.update_traces(
    marker=dict(size=12, opacity=0.7),
    hovertemplate='%{customdata[0]}<extra></extra>'
)

# Add trend line
z = np.polyfit(team_data['payroll'], team_data['W'], 1)
p = np.poly1d(z)
x_trend = np.linspace(team_data['payroll'].min(), team_data['payroll'].max(), 100)
y_trend = p(x_trend)

fig.add_trace(
    go.Scatter(
        x=x_trend,
        y=y_trend,
        mode='lines',
        name='Expected Wins',
        line=dict(color='red', dash='dash'),
        showlegend=True,
        hoverinfo='skip'
    )
)

# Add annotations for best and worst value
best_value = team_data[team_data['playoff_team']].nsmallest(1, 'cost_per_win').iloc[0]
worst_value = team_data.nlargest(1, 'cost_per_win').iloc[0]

fig.add_annotation(
    x=best_value['payroll'],
    y=best_value['W'],
    text=f"{best_value['name']}<br>Best Value",
    showarrow=True,
    arrowhead=2,
    ax=30,
    ay=-40
)

fig.add_annotation(
    x=worst_value['payroll'],
    y=worst_value['W'],
    text=f"{worst_value['name']}<br>Worst Value",
    showarrow=True,
    arrowhead=2,
    ax=-30,
    ay=40
)

fig.update_layout(
    hovermode='closest',
    height=600,
    xaxis_title='Payroll ($M)',
    yaxis_title='Wins',
    legend_title='Made Playoffs'
)

fig.show()

Python

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from pybaseball import lahman

def create_war_sunburst(team_abbr='HOU', year=2023):
    """
    Create sunburst chart showing WAR distribution by position

    Parameters:
    -----------
    team_abbr : str
        Lahman team ID (e.g., 'HOU', 'LAN')
    year : int
        Season year
    """

    # Load data
    batting = lahman.batting()
    fielding = lahman.fielding()
    people = lahman.people()

    # Filter batting data for team
    team_batting = batting[
        (batting['teamID'] == team_abbr) &
        (batting['yearID'] == year) &
        (batting['AB'] >= 50)
    ].copy()

    # Merge with player names
    team_batting = team_batting.merge(
        people[['playerID', 'nameFirst', 'nameLast']],
        on='playerID'
    )
    team_batting['player_name'] = (team_batting['nameFirst'] + ' ' +
                                   team_batting['nameLast'])

    # Crude offensive proxy standing in for WAR; it is not on a true WAR scale
    # (replace with actual WAR data)
    team_batting['war_approx'] = (
        ((team_batting['H'] + team_batting['BB']) /
         (team_batting['AB'] + team_batting['BB']) - 0.320) *
        team_batting['AB'] / 60 +
        (team_batting['HR'] * 0.5) +
        (team_batting['SB'] * 0.2) -
        (team_batting['CS'] * 0.4)
    )
    team_batting['war'] = team_batting['war_approx'].clip(lower=0)

    # Get primary position for each player
    team_fielding = fielding[
        (fielding['teamID'] == team_abbr) &
        (fielding['yearID'] == year)
    ].copy()

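    # Keep the position where each player logged the most games;
    # .copy() avoids a SettingWithCopyWarning when adding columns below
    primary_pos = team_fielding.loc[
        team_fielding.groupby('playerID')['G'].idxmax()
    ][['playerID', 'POS']].copy()

    # Add position grouping
    def classify_position(pos):
        if pos in ['C', '1B', '2B', '3B', 'SS']:
            return 'Infield'
        elif pos in ['OF', 'LF', 'CF', 'RF']:  # Lahman fielding typically records outfielders as 'OF'
            return 'Outfield'
        elif pos == 'DH':
            return 'DH'
        else:
            return 'Pitcher'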

    primary_pos['position_group'] = primary_pos['POS'].apply(classify_position)

    # Merge with WAR data
    war_data = team_batting.merge(primary_pos, on='playerID', how='left')
    war_data = war_data.dropna(subset=['POS'])

    # Get top 3 players per position
    war_data['rank'] = war_data.groupby('POS')['war'].rank(
        method='dense', ascending=False
    )
    war_data = war_data[war_data['rank'] <= 3]

    # Build hierarchical data
    # Level 1: Position groups
    level1 = war_data.groupby('position_group')['war'].sum().reset_index()
    level1['labels'] = level1['position_group']
    level1['parents'] = ''
    level1['values'] = level1['war']

    # Level 2: Specific positions
    level2 = war_data.groupby(['position_group', 'POS'])['war'].sum().reset_index()
    level2['labels'] = level2['POS']
    level2['parents'] = level2['position_group']
    level2['values'] = level2['war']

    # Level 3: Individual players
    level3 = war_data[['player_name', 'POS', 'war']].copy()
    level3['labels'] = level3['player_name']
    level3['parents'] = level3['POS']
    level3['values'] = level3['war']

    # Combine all levels
    sunburst_data = pd.concat([
        level1[['labels', 'parents', 'values']],
        level2[['labels', 'parents', 'values']],
        level3[['labels', 'parents', 'values']]
    ], ignore_index=True)

    # Create sunburst chart
    fig = go.Figure(go.Sunburst(
        labels=sunburst_data['labels'],
        parents=sunburst_data['parents'],
        values=sunburst_data['values'],
        branchvalues="total",
        hovertemplate='<b>%{label}</b><br>WAR: %{value:.1f}<extra></extra>'
    ))

    fig.update_layout(
        title=f'{team_abbr} WAR Distribution by Position ({year})',
        margin=dict(l=0, r=0, t=50, b=0),
        height=600
    )

    return fig

# Create visualization
fig = create_war_sunburst('HOU', 2023)
fig.show()

Python

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px

def calculate_contract_value(years, year_1_war, annual_decline, aav,
                            discount_rate=0.05, market_rate=12):
    """
    Calculate present value and surplus for a free agent contract

    Parameters:
    -----------
    years : int
        Contract length
    year_1_war : float
        Projected WAR in first year
    annual_decline : float
        Expected annual WAR decline
    aav : float
        Average annual value (millions)
    discount_rate : float
        Annual discount rate
    market_rate : float
        Market rate per WAR (millions)
    """

    # Project WAR for each year
    war_by_year = [max(0, year_1_war - i * annual_decline) for i in range(years)]

    # Calculate present values
    discount_factors = [1 / ((1 + discount_rate) ** i) for i in range(years)]
    pv_war = sum(w * d for w, d in zip(war_by_year, discount_factors))
    pv_salary = sum(aav * d for d in discount_factors)

    # Calculate surplus
    market_value = pv_war * market_rate
    surplus = market_value - pv_salary

    return {
        'total_war': sum(war_by_year),
        'pv_war': pv_war,
        'pv_salary': pv_salary,
        'market_value': market_value,
        'surplus': surplus,
        # Undefined when no WAR is projected; NaN avoids implying a $0 cost per WAR
        'cost_per_war': pv_salary / pv_war if pv_war > 0 else float('nan')
    }

def create_fa_calculator():
    """Create interactive free agent valuation calculator"""

    # Example contract scenarios
    contracts = pd.DataFrame({
        'player': ['Star Player', 'Mid-Tier Player', 'Value Signing',
                   'Overpay Example', 'Young FA'],
        'years': [6, 4, 2, 5, 7],
        'aav': [35, 18, 8, 22, 28],
        'year_1_war': [5.5, 3.5, 2.0, 2.5, 4.5],
        'annual_decline': [0.3, 0.25, 0.15, 0.35, 0.2]
    })

    # Calculate values for each contract
    results = []
    for _, row in contracts.iterrows():
        calc = calculate_contract_value(
            row['years'], row['year_1_war'], row['annual_decline'], row['aav']
        )
        results.append(calc)

    # Add results to dataframe
    results_df = pd.DataFrame(results)
    contract_analysis = pd.concat([contracts, results_df], axis=1)

    # Calculate total salary and categorize
    contract_analysis['total_salary'] = (contract_analysis['aav'] *
                                         contract_analysis['years'])

    def categorize_value(surplus):
        if surplus > 20:
            return 'Excellent Value'
        elif surplus > 0:
            return 'Good Value'
        elif surplus > -20:
            return 'Market Rate'
        else:
            return 'Overpay'

    contract_analysis['value_category'] = (contract_analysis['surplus']
                                           .apply(categorize_value))

    # Create hover text
    contract_analysis['hover_text'] = (
        '<b>' + contract_analysis['player'] + '</b><br>' +
        'Contract: ' + contract_analysis['years'].astype(str) + ' years @ $' +
        contract_analysis['aav'].astype(str) + 'M/yr<br>' +
        'Total Salary: $' + contract_analysis['total_salary'].round(0).astype(str) + 'M<br>' +
        'PV Salary: $' + contract_analysis['pv_salary'].round(1).astype(str) + 'M<br>' +
        'Projected WAR: ' + contract_analysis['total_war'].round(1).astype(str) + '<br>' +
        'PV WAR: ' + contract_analysis['pv_war'].round(1).astype(str) + '<br>' +
        'Market Value: $' + contract_analysis['market_value'].round(1).astype(str) + 'M<br>' +
        'Surplus: $' + contract_analysis['surplus'].round(1).astype(str) + 'M<br>' +
        'Cost/WAR: $' + contract_analysis['cost_per_war'].round(1).astype(str) + 'M'
    )

    # Create bar chart
    color_map = {
        'Excellent Value': 'darkgreen',
        'Good Value': 'lightgreen',
        'Market Rate': 'yellow',
        'Overpay': 'red'
    }

    fig = px.bar(
        contract_analysis,
        x='player',
        y='surplus',
        color='value_category',
        color_discrete_map=color_map,
        # Attach hover text per row so it stays aligned within each color trace
        custom_data=['hover_text'],
        title='Free Agent Contract Valuation Analysis',
        labels={'player': '', 'surplus': 'Surplus Value ($M)'}
    )

    fig.update_traces(
        hovertemplate='%{customdata[0]}<extra></extra>'
    )

    fig.add_hline(
        y=0,
        line_dash="dash",
        line_color="black",
        annotation_text="Break-even"
    )

    fig.update_layout(
        hovermode='closest',
        height=600,
        showlegend=True,
        legend_title='Value Assessment'
    )

    return contract_analysis, fig

# Run calculator
contract_analysis, fig = create_fa_calculator()
fig.show()
print("\nContract Analysis Summary:")
print(contract_analysis[['player', 'years', 'aav', 'total_war', 'surplus',
                         'cost_per_war', 'value_category']])

14.8 Exercises

Exercise 14.1: Free Agent Cost Analysis

Using 2024 free agent data, calculate cost per WAR for at least 10 free agent signings. Then:

a) Compare cost per WAR across different position groups (pitchers vs hitters, premium positions vs corner positions)

b) Analyze whether older players (33+) cost more or less per WAR than younger free agents (28-30)

c) Identify which signing appears most efficient (best value) and least efficient (worst value)

Data to collect:

Player name, position, age

Contract terms (years, AAV)

Projected WAR for first year (use Steamer or ZiPS)

Hint: Check FanGraphs or Baseball Prospectus for free agent tracker and projections.
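
One possible starting point for this exercise is sketched below. The column names, the `add_cost_per_war` helper, and every row in `signings` are placeholders invented for illustration, not real contracts; the linear decline mirrors the assumption used in the valuation calculator above, and a decline of zero simply multiplies first-year WAR by contract length.

```python
import pandas as pd

# Placeholder rows only: replace with the signings and projections you collect.
signings = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "group": ["Pitcher", "Pitcher", "Hitter", "Hitter"],
    "age": [29, 34, 28, 33],
    "years": [5, 2, 6, 1],
    "aav": [20.0, 12.0, 25.0, 8.0],          # $ millions per year
    "proj_war_year1": [3.0, 1.5, 4.0, 1.0],  # e.g. from Steamer or ZiPS
})


def add_cost_per_war(df, annual_decline=0.0):
    """Add total projected WAR, total salary, and cost per WAR to a copy of df."""
    out = df.copy()
    total_war = []
    for _, row in out.iterrows():
        # Linear decline from the first-year projection, floored at zero
        yearly = [max(0.0, row["proj_war_year1"] - i * annual_decline)
                  for i in range(int(row["years"]))]
        total_war.append(sum(yearly))
    out["total_war"] = total_war
    out["total_salary"] = out["aav"] * out["years"]
    out["cost_per_war"] = out["total_salary"] / out["total_war"]
    return out


analysis = add_cost_per_war(signings)
by_group = analysis.groupby("group")["cost_per_war"].mean()

print(analysis[["player", "age", "cost_per_war"]])
print(by_group)
```

The same `groupby` pattern works for part (b) once you add an age-bracket column, and sorting `analysis` by `cost_per_war` gives a first pass at part (c).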

Exercise 14.2: Trade Surplus Value

Evaluate a recent blockbuster trade using surplus value analysis. Choose a trade from the past 2 years involving multiple players.

Your analysis should:

a) Calculate total surplus value for each side of the trade (projected WAR × market rate - expected salary over years of control)

b) Apply discount rates to future value (use 5-10%)

c) Determine which team "won" the trade based on surplus value

d) Discuss how competitive windows might make the trade beneficial for both sides despite unequal surplus value

Suggested trades:

Juan Soto to Padres (2022)

Tyler Glasnow to Dodgers (2023)

Dylan Cease to Padres (2024)
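
The surplus calculation in parts (a) and (b) could be sketched roughly as follows. The function names and the two example trade sides are hypothetical, and the default $12M per WAR and 5% discount rate are simply carried over from the valuation calculator earlier in the chapter; substitute your own projections and salary estimates.

```python
def surplus_value(proj_war, salaries, market_rate=12.0, discount_rate=0.05):
    """Discounted surplus ($M) for one player over his years of control.

    proj_war and salaries are per-season lists of equal length. Index 0 is
    the first season after the trade and is not discounted.
    """
    if len(proj_war) != len(salaries):
        raise ValueError("proj_war and salaries must cover the same seasons")
    total = 0.0
    for year, (war, salary) in enumerate(zip(proj_war, salaries)):
        total += (war * market_rate - salary) / (1 + discount_rate) ** year
    return total


def trade_side_surplus(players, **kwargs):
    """Sum discounted surplus across every player one team receives."""
    return sum(surplus_value(p["war"], p["salary"], **kwargs) for p in players)


# Hypothetical trade: one established star for two pre-arbitration prospects.
star_side = [{"war": [5.0, 5.0], "salary": [20.0, 30.0]}]
prospect_side = [
    {"war": [1.0, 2.0, 2.5], "salary": [0.7, 0.7, 0.8]},
    {"war": [0.5, 1.5, 2.0], "salary": [0.7, 0.7, 0.8]},
]

print(f"Star side surplus:     ${trade_side_surplus(star_side):.1f}M")
print(f"Prospect side surplus: ${trade_side_surplus(prospect_side):.1f}M")
```

Rerunning with `discount_rate=0.10` shows how sensitive the comparison is to the discount rate, which matters most for the side receiving value further in the future.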

Exercise 14.3: Draft Pick Value Curve

Using Baseball Reference or FanGraphs, collect data on draft picks from a single year (suggest 2014-2016 to allow development time):

For picks 1-30:

a) Calculate what percentage reached MLB (100+ PA or 50+ IP)

b) For those who reached MLB, calculate total career WAR through 2023

c) Build a value curve showing expected WAR by draft position

d) Identify which picks outperformed or underperformed expectations

Extension: Compare college vs high school players. Do high school picks have higher variance? Higher ceiling?
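
A minimal sketch of the value-curve step, assuming you have already assembled one row per pick. The `draft_value_curve` helper, its column names, and the ten example rows are invented for illustration and do not describe any real draft class. Picks are grouped into buckets because a single year gives only one player per draft slot.

```python
import pandas as pd


def draft_value_curve(picks, bucket_size=5):
    """Summarize reach rate and average career WAR by draft-position bucket.

    picks needs columns: pick (int), reached_mlb (bool), career_war (float).
    Players who never reached MLB count as 0 WAR in the bucket average.
    """
    df = picks.copy()
    df["career_war"] = df["career_war"].where(df["reached_mlb"], 0.0)
    # Picks 1-5 fall in bucket 1, picks 6-10 in bucket 6, and so on
    df["bucket_start"] = ((df["pick"] - 1) // bucket_size) * bucket_size + 1
    curve = df.groupby("bucket_start").agg(
        n_picks=("pick", "size"),
        pct_reached=("reached_mlb", "mean"),
        mean_war=("career_war", "mean"),
    )
    curve["pct_reached"] = curve["pct_reached"] * 100
    return curve


# Placeholder data only: replace with the draft class you collect.
example_picks = pd.DataFrame({
    "pick": list(range(1, 11)),
    "reached_mlb": [True, True, False, True, False,
                    False, True, False, False, True],
    "career_war": [20.0, 5.0, 0.0, 10.0, 0.0, 0.0, 3.0, 0.0, 0.0, 1.0],
})

curve = draft_value_curve(example_picks)
print(curve)
```

For part (d), subtracting each player's bucket `mean_war` from his own career WAR gives a simple over- or under-performance measure.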

Exercise 14.4: Competitive Window Modeling

Choose a current MLB team and project their competitive window:

a) Identify core players and project their WAR trajectory over next 5 years using aging curves

b) Estimate prospect contribution (consult top prospect lists)

c) Calculate total projected WAR and expected wins for each season

d) Determine the team's optimal strategy: compete now, rebuild, or middle ground

e) Recommend specific roster moves (trades, free agent signings, or sell-offs) that align with your recommended strategy

Teams with interesting situations:

Baltimore Orioles (young core, rising)

St. Louis Cardinals (aging core, crossroads)

Los Angeles Angels (Trout aging, weak farm)

Tampa Bay Rays (perennial contender, low payroll)
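
Parts (a) through (c) could be prototyped along these lines. Everything here is an assumption made for illustration: the step-function aging adjustments in `age_adjustment`, the 48-win replacement-level baseline, and the example core are placeholders, so swap in the aging curve and projections you trust.

```python
# Assumption: a replacement-level roster wins about 48 games in a 162-game season.
REPLACEMENT_LEVEL_WINS = 48


def age_adjustment(age):
    """Hypothetical WAR change for the season a player plays at `age`."""
    if age <= 27:
        return 0.25   # still improving
    if age <= 30:
        return -0.25  # early decline
    return -0.5       # steeper decline


def project_player(current_war, current_age, seasons=5):
    """Project WAR for each of the next `seasons` years, floored at zero."""
    projection = []
    war, age = current_war, current_age
    for _ in range(seasons):
        age += 1
        war = max(0.0, war + age_adjustment(age))
        projection.append(war)
    return projection


def project_team_wins(core, other_war_by_year, seasons=5):
    """Expected wins per season: baseline + core WAR + all other WAR.

    other_war_by_year holds your estimate for prospects and depth players.
    """
    totals = [0.0] * seasons
    for player in core:
        for i, war in enumerate(project_player(player["war"], player["age"], seasons)):
            totals[i] += war
    return [REPLACEMENT_LEVEL_WINS + core_war + other
            for core_war, other in zip(totals, other_war_by_year)]


# Placeholder core: one young player and one aging veteran.
core = [{"war": 4.0, "age": 25}, {"war": 3.0, "age": 32}]
wins = project_team_wins(core, other_war_by_year=[20, 21, 22, 22, 22])
print([round(w, 1) for w in wins])
```

Plotting the resulting wins by season against a playoff threshold of your choosing makes the opening and closing of the window easy to see.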

Chapter Summary

Team building combines economics, player valuation, strategic timing, and organizational philosophy. Key takeaways:

Economic Efficiency: Pre-arbitration players provide 40x ROI vs free agents; exploit this arbitrage
Positional Value: Premium defensive positions (C, SS, CF) allow lower offensive standards
Free Agent Markets: Account for aging curves, apply discount rates, avoid winner's curse
Trade Strategy: Exchange surplus value across different timelines; align with competitive windows
Draft Philosophy: Balance upside (high school) vs safety (college) based on organizational timeline
Strategic Clarity: Commit fully to competing or rebuilding; avoid mediocre middle ground

Successful team building requires analytical rigor, clear strategic vision, and disciplined execution. The best front offices combine quantitative analysis with qualitative evaluation, organizational development, and adaptive strategy. As analytics evolve, teams that integrate new methods while maintaining coherent long-term plans will sustain competitive advantage.
