Baseball operates under a unique economic structure that profoundly affects team building. Understanding player costs, market efficiency, and the relationship between payroll and wins is fundamental to roster construction.
The Salary Structure
MLB's salary structure creates distinct player markets based on service time:
Pre-Arbitration (0-3 years of service): Players earn near the league minimum (approximately $740,000 in 2024). Teams control these players completely, offering minimal salaries regardless of performance. A pre-arbitration player producing 5 WAR costs roughly $3-4 million total—an extraordinary bargain.
Arbitration-Eligible (3-6 years): Players become eligible for salary arbitration, where neutral arbitrators determine fair salaries based on comparable players. Arbitration salaries increase with performance and service time but remain below open market rates. A 3-WAR arbitration-eligible player might earn $8-12 million—still below market value but substantially more than pre-arbitration.
Free Agency (6+ years): After six years of service, players can negotiate with any team. Free agent salaries reflect open market competition and typically exceed performance value, especially for multi-year contracts that pay for declining future seasons.
This structure creates enormous incentives to develop young talent and trade players before they reach free agency. A team built around pre-arbitration stars—like the 2020 Rays or 2023 Orioles—can win while maintaining low payrolls.
Cost Per Win Analysis
Quantifying the relationship between spending and wins helps teams allocate resources efficiently. The fundamental metric is cost per marginal win ($/WAR), calculated by examining free agent contracts.
Let's analyze the 2023-24 free agent market to establish cost per win:
# 2023-24 Notable Free Agent Contracts
library(tidyverse)
free_agents <- tibble(
player = c("Shohei Ohtani", "Yoshinobu Yamamoto", "Aaron Nola",
"Jordan Montgomery", "Blake Snell", "Cody Bellinger",
"Matt Chapman", "Jung Hoo Lee"),
aav = c(70, 32.5, 25, 25, 32, 26.7, 25, 18.5), # Average annual value (millions)
years = c(10, 12, 7, 2, 2, 3, 3, 6),
projected_war = c(5.5, 3.5, 3.0, 2.8, 3.2, 2.5, 3.5, 2.0) # Annual WAR projection
)
# Calculate cost per WAR
free_agents <- free_agents %>%
mutate(
cost_per_war = aav / projected_war,
total_value = aav * years
)
# Summary statistics
cat("2023-24 Free Agent Market:\n")
cat("Median Cost per WAR: $", round(median(free_agents$cost_per_war), 1), "M\n", sep="")
cat("Mean Cost per WAR: $", round(mean(free_agents$cost_per_war), 1), "M\n", sep="")
# Visualize
ggplot(free_agents, aes(x = projected_war, y = aav, label = player)) +
geom_point(size = 3, color = "steelblue") +
geom_text(hjust = -0.1, size = 3) +
geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed") +
labs(title = "Free Agent Value: AAV vs Projected WAR",
x = "Projected WAR per Season",
y = "Average Annual Value ($M)") +
theme_minimal() +
xlim(1.5, 6)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# 2023-24 Notable Free Agent Contracts
free_agents = pd.DataFrame({
'player': ['Shohei Ohtani', 'Yoshinobu Yamamoto', 'Aaron Nola',
'Jordan Montgomery', 'Blake Snell', 'Cody Bellinger',
'Matt Chapman', 'Jung Hoo Lee'],
'aav': [70, 32.5, 25, 25, 32, 26.7, 25, 18.5], # millions
'years': [10, 12, 7, 2, 2, 3, 3, 6],
'projected_war': [5.5, 3.5, 3.0, 2.8, 3.2, 2.5, 3.5, 2.0]
})
# Calculate cost per WAR
free_agents['cost_per_war'] = free_agents['aav'] / free_agents['projected_war']
free_agents['total_value'] = free_agents['aav'] * free_agents['years']
# Summary statistics
print("2023-24 Free Agent Market:")
print(f"Median Cost per WAR: ${free_agents['cost_per_war'].median():.1f}M")
print(f"Mean Cost per WAR: ${free_agents['cost_per_war'].mean():.1f}M")
# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(free_agents['projected_war'], free_agents['aav'], s=100, alpha=0.6)
# Add labels
for idx, row in free_agents.iterrows():
    plt.annotate(row['player'], (row['projected_war'], row['aav']),
                 xytext=(5, 5), textcoords='offset points', fontsize=8)
# Fit line
slope, intercept, r, p, se = stats.linregress(free_agents['projected_war'],
free_agents['aav'])
x_line = np.array([1.5, 6])
y_line = slope * x_line + intercept
plt.plot(x_line, y_line, 'r--', alpha=0.7, label=f'Fit: ${slope:.1f}M per WAR')
plt.xlabel('Projected WAR per Season')
plt.ylabel('Average Annual Value ($M)')
plt.title('Free Agent Value: AAV vs Projected WAR')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
The 2023-24 market showed cost per WAR around $9-12 million, though Ohtani's unique contract structure (heavily deferred) complicates the calculation. Historical analysis suggests free agent cost per win has grown approximately 5% annually, outpacing inflation.
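The growth figure compounds quickly over the length of a long contract. The following is a minimal sketch, assuming a starting rate of $9.5M per WAR (roughly the mean of the sample above) and a constant 5% growth rate; `project_cost_per_war` is an illustrative helper name, not a function from any library.

```python
def project_cost_per_war(base_cost, growth_rate, years):
    """Return projected $/WAR (in $M) for year 0 through `years`, compounding annually."""
    return [base_cost * (1 + growth_rate) ** t for t in range(years + 1)]

# Assumptions: $9.5M per WAR today, 5% annual growth held constant
projection = project_cost_per_war(9.5, 0.05, 10)

for year, cost in enumerate(projection):
    print(f"Year {year:2d}: ${cost:.1f}M per WAR")
```

Under these assumptions the market rate reaches roughly $15.5M per WAR after ten years, which is one reason long contracts can look better in their later seasons than a flat rate would suggest.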
Comparing Market Segments
The economic advantage of team-controlled players becomes clear when comparing costs across service classes:
# Cost comparison across service classes
salary_structure <- tibble(
service_class = c("Pre-Arb", "Arb Year 1", "Arb Year 2",
"Arb Year 3", "Free Agent"),
avg_salary = c(0.8, 2.5, 5.0, 8.5, 15.0), # millions, for 3-WAR player
war_value = rep(3.0, 5),
market_value = c(36, 36, 36, 36, 36) # 3 WAR * $12M
)
salary_structure <- salary_structure %>%
mutate(
surplus_value = market_value - avg_salary,
efficiency = market_value / avg_salary
)
print(salary_structure)
# Visualize surplus value
ggplot(salary_structure, aes(x = service_class, y = surplus_value)) +
geom_col(fill = "darkgreen", alpha = 0.7) +
geom_text(aes(label = paste0("$", surplus_value, "M")),
vjust = -0.5, size = 4) +
labs(title = "Surplus Value by Service Class",
subtitle = "For a 3-WAR player at $12M/WAR market rate",
x = "Service Class",
y = "Surplus Value ($M)") +
theme_minimal() +
scale_x_discrete(limits = c("Pre-Arb", "Arb Year 1", "Arb Year 2",
"Arb Year 3", "Free Agent"))
# Cost comparison across service classes
salary_structure = pd.DataFrame({
'service_class': ['Pre-Arb', 'Arb Year 1', 'Arb Year 2',
'Arb Year 3', 'Free Agent'],
'avg_salary': [0.8, 2.5, 5.0, 8.5, 15.0], # millions, for 3-WAR player
'war_value': [3.0] * 5,
'market_value': [36] * 5 # 3 WAR * $12M
})
salary_structure['surplus_value'] = (salary_structure['market_value'] -
salary_structure['avg_salary'])
salary_structure['efficiency'] = (salary_structure['market_value'] /
salary_structure['avg_salary'])
print(salary_structure)
# Visualize surplus value
plt.figure(figsize=(10, 6))
bars = plt.bar(salary_structure['service_class'],
salary_structure['surplus_value'],
color='darkgreen', alpha=0.7)
# Add value labels
for idx, bar in enumerate(bars):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'${salary_structure.iloc[idx]["surplus_value"]:.1f}M',
             ha='center', va='bottom', fontsize=10)
plt.xlabel('Service Class')
plt.ylabel('Surplus Value ($M)')
plt.title('Surplus Value by Service Class\nFor a 3-WAR player at $12M/WAR market rate')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
In this illustration, the pre-arbitration player returns about 45 times his salary in market value, compared with about 2.4 times for the free agent. This explains why teams like Tampa Bay, Cleveland, and Baltimore have remained competitive despite low payrolls: they have built rosters around team-controlled talent.
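The same figures can be summed over a full period of team control. This is a rough sketch, assuming three pre-arbitration seasons at a flat $0.8M followed by the three arbitration salaries from the table, constant 3-WAR production, and the $12M/WAR market rate; all variable names are illustrative.

```python
MARKET_RATE = 12.0   # $M per WAR, the rate used in the table above
ANNUAL_WAR = 3.0     # assumed constant production each season

# Illustrative salaries ($M) across six years of team control:
# three pre-arbitration seasons, then arbitration years 1-3
control_salaries = [0.8, 0.8, 0.8, 2.5, 5.0, 8.5]

market_value = MARKET_RATE * ANNUAL_WAR
surplus_by_year = [market_value - salary for salary in control_salaries]
total_salary = sum(control_salaries)
total_surplus = sum(surplus_by_year)

print(f"Total salary over control years: ${total_salary:.1f}M")
print(f"Total surplus over control years: ${total_surplus:.1f}M")
```

With these inputs the team pays $18.4M for production the table values at $216M, a surplus of about $197.6M. Real players rarely hold production constant, so the total is best read as an upper-end illustration.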
The Win Curve and Marginal Value
Not all wins have equal value. The 85th win (potentially making the playoffs) is worth far more than the 65th win (remaining well out of contention). This non-linear relationship affects optimal spending strategies.
# Estimate playoff probability by wins (based on historical data)
library(tidyverse)
win_curve <- tibble(
wins = 75:95,
playoff_prob = plogis((wins - 87) / 3) # Logistic curve centered at 87 wins
) %>%
mutate(
marginal_playoff_prob = playoff_prob - lag(playoff_prob, default = first(playoff_prob)), # first row has no prior win total, so its marginal gain is 0
win_value_multiplier = 1 + (marginal_playoff_prob * 20) # Playoff worth ~20 wins
)
ggplot(win_curve, aes(x = wins, y = playoff_prob * 100)) +
geom_line(size = 1.5, color = "darkblue") +
geom_vline(xintercept = 87, linetype = "dashed", color = "red") +
labs(title = "Playoff Probability by Regular Season Wins",
subtitle = "Each marginal win has different value",
x = "Regular Season Wins",
y = "Playoff Probability (%)") +
theme_minimal() +
annotate("text", x = 89, y = 25, label = "Steepest slope:\nHighest marginal value",
hjust = 0, size = 3.5)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.special import expit # logistic function
# Estimate playoff probability by wins
wins = np.arange(75, 96)
playoff_prob = expit((wins - 87) / 3) # Logistic curve centered at 87 wins
win_curve = pd.DataFrame({
'wins': wins,
'playoff_prob': playoff_prob
})
win_curve['marginal_playoff_prob'] = win_curve['playoff_prob'].diff().fillna(0)
win_curve['win_value_multiplier'] = 1 + (win_curve['marginal_playoff_prob'] * 20)
# Plot
plt.figure(figsize=(10, 6))
plt.plot(win_curve['wins'], win_curve['playoff_prob'] * 100,
linewidth=2.5, color='darkblue')
plt.axvline(x=87, linestyle='--', color='red', alpha=0.7, label='50% playoff probability (87 wins)')
plt.xlabel('Regular Season Wins')
plt.ylabel('Playoff Probability (%)')
plt.title('Playoff Probability by Regular Season Wins\nEach marginal win has different value')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
Teams near the playoff threshold should value wins more highly than teams far from contention. An 84-win team might rationally spend $15M per marginal win to reach 87 wins, while a 72-win team shouldn't exceed $10M per win.
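The gap between those two spending limits follows from how little a win moves playoff odds far from the threshold. Here is a minimal sketch, assuming the same illustrative logistic curve used above (midpoint 87 wins, scale 3); `playoff_prob` and `playoff_gain` are hypothetical helper names.

```python
import math

def playoff_prob(wins, midpoint=87.0, scale=3.0):
    """Logistic playoff probability, the same functional form as the curve above."""
    return 1.0 / (1.0 + math.exp(-(wins - midpoint) / scale))

def playoff_gain(current_wins, added_wins):
    """Change in playoff probability from adding `added_wins` to a team's total."""
    return playoff_prob(current_wins + added_wins) - playoff_prob(current_wins)

contender_gain = playoff_gain(84, 3)   # 84 -> 87 wins
rebuilder_gain = playoff_gain(72, 3)   # 72 -> 75 wins

print(f"84 -> 87 wins: +{contender_gain * 100:.1f} percentage points")
print(f"72 -> 75 wins: +{rebuilder_gain * 100:.1f} percentage points")
```

On this curve the same three wins add about 23 percentage points of playoff probability for the 84-win team and about 1 point for the 72-win team. The curve's parameters are assumptions, so the exact ratio should not be read as an empirical estimate.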
Luxury Tax and Budget Constraints
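The Competitive Balance Tax (luxury tax) raises the marginal cost of salary for high-payroll teams. In 2024, the threshold is $237M, with escalating rates on payroll above each tier (the higher figures apply to repeat offenders, meaning teams over the threshold for three or more consecutive years):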
- First threshold: 20% tax (50% for repeat offenders)
- Second threshold ($257M): 32% tax (62% for repeat offenders)
- Third threshold ($277M): 62.5% tax (95% for repeat offenders)
Teams far above the threshold also face draft pick penalties. The result is a progressive cost structure in which a repeat offender such as the Yankees pays $1.95 for every $1.00 in salary above $277M.
# Calculate effective cost including luxury tax
# Luxury tax owed on a given payroll, taxing each tier at its own rate
# (three tiers, as listed above)
cbt_tax <- function(payroll, threshold = 237, repeat_offender = TRUE) {
  rates <- if (repeat_offender) c(0.50, 0.62, 0.95) else c(0.20, 0.32, 0.625)
  lower <- threshold + c(0, 20, 40)   # tier floors: $237M, $257M, $277M
  upper <- c(lower[-1], Inf)          # tier ceilings
  sum(rates * pmax(0, pmin(payroll, upper) - lower))
}
calculate_tax_cost <- function(salary, current_payroll, threshold = 237,
                               repeat_offender = TRUE) {
  # Only the tax added by the new salary counts toward its cost;
  # tax already owed on the existing payroll is excluded
  added_tax <- cbt_tax(current_payroll + salary, threshold, repeat_offender) -
    cbt_tax(current_payroll, threshold, repeat_offender)
  salary + added_tax
}
# Example: Yankees (repeat offender, $280M payroll) vs Rays ($90M payroll)
# Adding a $20M player
yankees_cost <- calculate_tax_cost(20, 280, repeat_offender = TRUE)
rays_cost <- calculate_tax_cost(20, 90, repeat_offender = FALSE)
cat("Cost to add $20M player:\n")
cat("Yankees: $", round(yankees_cost, 1), "M (", round(yankees_cost/20, 2), "x multiplier)\n", sep="")
cat("Rays: $", round(rays_cost, 1), "M (", round(rays_cost/20, 2), "x multiplier)\n", sep="")
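def cbt_tax(payroll, threshold=237, repeat_offender=True):
    """Luxury tax owed on a payroll, taxing each tier at its own rate"""
    # Three tiers, as listed above
    rates = (0.50, 0.62, 0.95) if repeat_offender else (0.20, 0.32, 0.625)
    floors = (threshold, threshold + 20, threshold + 40)  # $237M, $257M, $277M
    ceilings = (threshold + 20, threshold + 40, float('inf'))
    return sum(rate * max(0, min(payroll, hi) - lo)
               for rate, lo, hi in zip(rates, floors, ceilings))
def calculate_tax_cost(salary, current_payroll, threshold=237, repeat_offender=True):
    """Calculate effective cost including luxury tax"""
    # Only the tax added by the new salary counts toward its cost;
    # tax already owed on the existing payroll is excluded
    added_tax = (cbt_tax(current_payroll + salary, threshold, repeat_offender)
                 - cbt_tax(current_payroll, threshold, repeat_offender))
    return salary + added_tax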
# Example: Yankees (repeat offender, $280M payroll) vs Rays ($90M payroll)
yankees_cost = calculate_tax_cost(20, 280, repeat_offender=True)
rays_cost = calculate_tax_cost(20, 90, repeat_offender=False)
print("Cost to add $20M player:")
print(f"Yankees: ${yankees_cost:.1f}M ({yankees_cost/20:.2f}x multiplier)")
print(f"Rays: ${rays_cost:.1f}M ({rays_cost/20:.2f}x multiplier)")
High-payroll teams face dramatically higher effective costs, creating different optimization problems. The Yankees must generate more WAR per dollar spent than the Rays to justify acquisitions.
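To put the WAR-per-dollar point in numbers, the sketch below compares the effective cost per WAR of the same signing for a team under the threshold and for a repeat offender above $277M. It assumes a hypothetical $20M player projected for 2.0 WAR whose entire salary falls in a single tax tier; `effective_cost_per_war` is an illustrative helper.

```python
def effective_cost_per_war(salary, projected_war, marginal_tax_rate):
    """Salary plus the luxury tax on that salary, divided by projected WAR ($M per WAR)."""
    return salary * (1 + marginal_tax_rate) / projected_war

# Assumptions: $20M salary, 2.0 projected WAR, whole salary taxed at one marginal rate
untaxed = effective_cost_per_war(20, 2.0, 0.0)     # team below the threshold
top_tier = effective_cost_per_war(20, 2.0, 0.95)   # repeat offender above $277M

# WAR the taxed team would need from the same salary to match the untaxed $/WAR
break_even_war = 20 * (1 + 0.95) / untaxed

print(f"Untaxed team:  ${untaxed:.1f}M per WAR")
print(f"Top-tier team: ${top_tier:.1f}M per WAR")
print(f"Break-even WAR for the taxed team: {break_even_war:.1f}")
```

Under these assumptions the taxed team pays $19.5M per WAR against $10.0M for the untaxed team, and would need 3.9 WAR from the same contract to match that efficiency.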
A shortstop hitting .240 with 15 home runs might be more valuable than a first baseman hitting .280 with 25 home runs. Defensive positions have different offensive standards and scarcity levels, requiring positional adjustments in player valuation.
The Defensive Spectrum
Positions fall along a defensive spectrum from most demanding (catcher, shortstop, center field) to least demanding (first base, designated hitter). More demanding positions have lower offensive standards because fewer players can handle the defensive requirements.
The standard positional adjustments (runs per 162 games, relative to average):
# Positional adjustments (runs per 162 games)
positional_value <- tibble(
position = c("C", "SS", "2B", "CF", "3B", "LF", "RF", "1B", "DH"),
adjustment = c(12.5, 7.5, 3.0, 2.5, 2.0, -7.5, -7.5, -12.5, -17.5),
war_adjustment = adjustment / 10 # Convert runs to WAR (10 runs ≈ 1 WAR)
) %>%
arrange(desc(adjustment))
print(positional_value)
# Visualize
ggplot(positional_value, aes(x = reorder(position, adjustment),
y = adjustment, fill = adjustment > 0)) +
geom_col(alpha = 0.7) +
geom_text(aes(label = sprintf("%+.1f", adjustment)),
hjust = ifelse(positional_value$adjustment > 0, -0.2, 1.2),
size = 4) +
coord_flip() +
scale_fill_manual(values = c("red", "blue"), guide = "none") +
labs(title = "Positional Value Adjustments",
subtitle = "Runs per 162 games, relative to average position",
x = "Position",
y = "Run Adjustment") +
theme_minimal()
import pandas as pd
import matplotlib.pyplot as plt
# Positional adjustments (runs per 162 games)
positional_value = pd.DataFrame({
'position': ['C', 'SS', '2B', 'CF', '3B', 'LF', 'RF', '1B', 'DH'],
'adjustment': [12.5, 7.5, 3.0, 2.5, 2.0, -7.5, -7.5, -12.5, -17.5]
})
positional_value['war_adjustment'] = positional_value['adjustment'] / 10
positional_value = positional_value.sort_values('adjustment', ascending=False)
print(positional_value)
# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
colors = ['blue' if x > 0 else 'red' for x in positional_value['adjustment']]
bars = ax.barh(positional_value['position'], positional_value['adjustment'],
color=colors, alpha=0.7)
# Add value labels
for idx, (pos, adj) in enumerate(zip(positional_value['position'],
                                     positional_value['adjustment'])):
    ax.text(adj + (1 if adj > 0 else -1), idx, f'{adj:+.1f}',
            ha='left' if adj > 0 else 'right', va='center', fontsize=10)
ax.set_xlabel('Run Adjustment')
ax.set_ylabel('Position')
ax.set_title('Positional Value Adjustments\nRuns per 162 games, relative to average position')
ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
A catcher receives a +12.5 run adjustment (about +1.25 WAR) compared to an average position player, while a DH receives a -17.5 run penalty (-1.75 WAR). This 3-WAR difference means a catcher can produce equivalent value while posting significantly worse offensive numbers than a designated hitter.
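A worked example makes the trade-off concrete. The sketch below uses the adjustments from the table and the 10-runs-per-win conversion; prorating the adjustment linearly by games played is an assumption, and the function deliberately omits the replacement-level and league adjustments that a full WAR calculation includes. `position_adjusted_wins` is a hypothetical helper name.

```python
# Positional adjustments from the table above (runs per 162 games)
POSITIONAL_RUNS = {
    "C": 12.5, "SS": 7.5, "2B": 3.0, "CF": 2.5, "3B": 2.0,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}
RUNS_PER_WIN = 10.0  # approximate conversion used in this section

def position_adjusted_wins(batting_runs, fielding_runs, position, games=162):
    """Combine batting, fielding, and a games-prorated positional adjustment into wins.

    Simplified: no replacement-level or league adjustment is applied.
    """
    positional = POSITIONAL_RUNS[position] * games / 162
    return (batting_runs + fielding_runs + positional) / RUNS_PER_WIN

# A catcher 5 runs above average with the bat vs. a DH 35 runs above average
catcher = position_adjusted_wins(batting_runs=5, fielding_runs=0, position="C")
dh = position_adjusted_wins(batting_runs=35, fielding_runs=0, position="DH")

print(f"Catcher: {catcher:.2f} wins, DH: {dh:.2f} wins")
```

Both players come out at 1.75 wins, so the DH needs 30 more batting runs over a full season to match the catcher's value before fielding is considered.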
Market Inefficiency at Premium Positions
Teams often overpay for offense at premium defensive positions and underpay for defense. The 2023 Orioles exemplified efficient roster construction by acquiring strong defenders at premium positions:
- Adley Rutschman (C): Elite defense, above-average offense
- Gunnar Henderson (SS): Plus defense, excellent offense
- Cedric Mullins (CF): Gold Glove-caliber defense, solid offense
Meanwhile, they accepted below-average defense at corner positions (1B, LF, RF) in exchange for offensive production. This construction maximized total value by allocating defensive resources where they matter most.
Calculating Position-Adjusted Value
Let's compare two 2023 players with similar raw offensive numbers:
# Compare Corey Seager (SS) vs Freddie Freeman (1B) - 2023
library(tidyverse)
players <- tibble(
player = c("Corey Seager", "Freddie Freeman"),
position = c("SS", "1B"),
batting_war = c(4.2, 4.5), # Offensive value only
fielding_runs = c(0, 8),
positional_adj = c(7.5, -12.5) # From our table
)
players <- players %>%
mutate(
fielding_war = fielding_runs / 10,
position_war = positional_adj / 10,
total_war = batting_war + fielding_war + position_war,
salary_2023 = c(33, 27), # Millions
war_per_dollar = total_war / salary_2023
)
print(players)
# Visualization
players_long <- players %>%
select(player, batting_war, fielding_war, position_war) %>%
pivot_longer(cols = -player, names_to = "component", values_to = "war")
ggplot(players_long, aes(x = player, y = war, fill = component)) +
geom_col(position = "stack") +
geom_text(data = players, aes(x = player, y = total_war + 0.3,
label = sprintf("%.1f WAR", total_war)),
inherit.aes = FALSE, size = 5) +
scale_fill_manual(values = c("batting_war" = "darkblue",
"fielding_war" = "darkgreen",
"position_war" = "orange"),
labels = c("Batting", "Fielding", "Positional Adj")) +
labs(title = "WAR Components: Premium vs Corner Position",
subtitle = "2023 Season Comparison",
x = NULL, y = "WAR", fill = "Component") +
theme_minimal()
import pandas as pd
import matplotlib.pyplot as plt
# Compare Corey Seager (SS) vs Freddie Freeman (1B) - 2023
players = pd.DataFrame({
'player': ['Corey Seager', 'Freddie Freeman'],
'position': ['SS', '1B'],
'batting_war': [4.2, 4.5],
'fielding_runs': [0, 8],
'positional_adj': [7.5, -12.5]
})
players['fielding_war'] = players['fielding_runs'] / 10
players['position_war'] = players['positional_adj'] / 10
players['total_war'] = (players['batting_war'] + players['fielding_war'] +
players['position_war'])
players['salary_2023'] = [33, 27] # Millions
players['war_per_dollar'] = players['total_war'] / players['salary_2023']
print(players[['player', 'batting_war', 'fielding_war', 'position_war', 'total_war']])
# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
x = range(len(players))
width = 0.6
# Stacked bars
p1 = ax.bar(x, players['batting_war'], width, label='Batting', color='darkblue')
p2 = ax.bar(x, players['fielding_war'], width, bottom=players['batting_war'],
label='Fielding', color='darkgreen')
p3 = ax.bar(x, players['position_war'], width,
bottom=players['batting_war'] + players['fielding_war'],
label='Positional Adj', color='orange')
# Add total WAR labels
for i, (player, war) in enumerate(zip(players['player'], players['total_war'])):
    ax.text(i, war + 0.3, f'{war:.1f} WAR', ha='center', fontsize=12, fontweight='bold')
ax.set_ylabel('WAR')
ax.set_title('WAR Components: Premium vs Corner Position\n2023 Season Comparison')
ax.set_xticks(x)
ax.set_xticklabels(players['player'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Despite similar batting contributions, Seager's shortstop premium makes him more valuable overall. This explains why teams pay premiums for shortstops who can hit—they're scarce.
Position Flexibility as Value
Players who can competently handle multiple positions provide roster flexibility worth 0.5-1.0 WAR beyond their performance. The 2016 Cubs utilized this extensively:
- Ben Zobrist: 2B/OF/3B versatility
- Javier Baez: SS/2B/3B capability
- Kris Bryant: 3B/OF/1B capability
This flexibility allowed manager Joe Maddon to optimize matchups, rest players, and navigate injuries without roster moves. Teams should value positional versatility in player acquisition and development.
# Positional adjustments (runs per 162 games)
positional_value <- tibble(
position = c("C", "SS", "2B", "CF", "3B", "LF", "RF", "1B", "DH"),
adjustment = c(12.5, 7.5, 3.0, 2.5, 2.0, -7.5, -7.5, -12.5, -17.5),
war_adjustment = adjustment / 10 # Convert runs to WAR (10 runs ≈ 1 WAR)
) %>%
arrange(desc(adjustment))
print(positional_value)
# Visualize
ggplot(positional_value, aes(x = reorder(position, adjustment),
y = adjustment, fill = adjustment > 0)) +
geom_col(alpha = 0.7) +
geom_text(aes(label = sprintf("%+.1f", adjustment)),
hjust = ifelse(positional_value$adjustment > 0, -0.2, 1.2),
size = 4) +
coord_flip() +
scale_fill_manual(values = c("red", "blue"), guide = "none") +
labs(title = "Positional Value Adjustments",
subtitle = "Runs per 162 games, relative to average position",
x = "Position",
y = "Run Adjustment") +
theme_minimal()
# Compare Corey Seager (SS) vs Freddie Freeman (1B) - 2023
library(tidyverse)
players <- tibble(
player = c("Corey Seager", "Freddie Freeman"),
position = c("SS", "1B"),
batting_war = c(4.2, 4.5), # Offensive value only
fielding_runs = c(0, 8),
positional_adj = c(7.5, -12.5) # From our table
)
players <- players %>%
mutate(
fielding_war = fielding_runs / 10,
position_war = positional_adj / 10,
total_war = batting_war + fielding_war + position_war,
salary_2023 = c(33, 27), # Millions
war_per_dollar = total_war / salary_2023
)
print(players)
# Visualization
players_long <- players %>%
select(player, batting_war, fielding_war, position_war) %>%
pivot_longer(cols = -player, names_to = "component", values_to = "war")
ggplot(players_long, aes(x = player, y = war, fill = component)) +
geom_col(position = "stack") +
geom_text(data = players, aes(x = player, y = total_war + 0.3,
label = sprintf("%.1f WAR", total_war)),
inherit.aes = FALSE, size = 5) +
scale_fill_manual(values = c("batting_war" = "darkblue",
"fielding_war" = "darkgreen",
"position_war" = "orange"),
labels = c("Batting", "Fielding", "Positional Adj")) +
labs(title = "WAR Components: Premium vs Corner Position",
subtitle = "2023 Season Comparison",
x = NULL, y = "WAR", fill = "Component") +
theme_minimal()
import pandas as pd
import matplotlib.pyplot as plt
# Positional adjustments (runs per 162 games)
positional_value = pd.DataFrame({
'position': ['C', 'SS', '2B', 'CF', '3B', 'LF', 'RF', '1B', 'DH'],
'adjustment': [12.5, 7.5, 3.0, 2.5, 2.0, -7.5, -7.5, -12.5, -17.5]
})
positional_value['war_adjustment'] = positional_value['adjustment'] / 10
positional_value = positional_value.sort_values('adjustment', ascending=False)
print(positional_value)
# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
colors = ['blue' if x > 0 else 'red' for x in positional_value['adjustment']]
bars = ax.barh(positional_value['position'], positional_value['adjustment'],
color=colors, alpha=0.7)
# Add value labels
for idx, (pos, adj) in enumerate(zip(positional_value['position'],
positional_value['adjustment'])):
ax.text(adj + (1 if adj > 0 else -1), idx, f'{adj:+.1f}',
ha='left' if adj > 0 else 'right', va='center', fontsize=10)
ax.set_xlabel('Run Adjustment')
ax.set_ylabel('Position')
ax.set_title('Positional Value Adjustments\nRuns per 162 games, relative to average position')
ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
# Compare Corey Seager (SS) vs Freddie Freeman (1B) - 2023
players = pd.DataFrame({
'player': ['Corey Seager', 'Freddie Freeman'],
'position': ['SS', '1B'],
'batting_war': [4.2, 4.5],
'fielding_runs': [0, 8],
'positional_adj': [7.5, -12.5]
})
players['fielding_war'] = players['fielding_runs'] / 10
players['position_war'] = players['positional_adj'] / 10
players['total_war'] = (players['batting_war'] + players['fielding_war'] +
players['position_war'])
players['salary_2023'] = [33, 27] # Millions
players['war_per_dollar'] = players['total_war'] / players['salary_2023']  # WAR per $1M of salary
print(players[['player', 'batting_war', 'fielding_war', 'position_war', 'total_war']])
# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
x = range(len(players))
width = 0.6
# Stacked bars
p1 = ax.bar(x, players['batting_war'], width, label='Batting', color='darkblue')
p2 = ax.bar(x, players['fielding_war'], width, bottom=players['batting_war'],
label='Fielding', color='darkgreen')
p3 = ax.bar(x, players['position_war'], width,
bottom=players['batting_war'] + players['fielding_war'],
label='Positional Adj', color='orange')
# Add total WAR labels
for i, (player, war) in enumerate(zip(players['player'], players['total_war'])):
ax.text(i, war + 0.3, f'{war:.1f} WAR', ha='center', fontsize=12, fontweight='bold')
ax.set_ylabel('WAR')
ax.set_title('WAR Components: Premium vs Corner Position\n2023 Season Comparison')
ax.set_xticks(x)
ax.set_xticklabels(players['player'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Free agency represents the most transparent player market but also the most inefficient. Teams routinely overpay for past performance, underestimate age-related decline, and fall victim to winner's curse dynamics. Analytical approaches improve free agent decision-making.
The Aging Curve Problem
Player performance follows a predictable aging curve: rapid improvement through age 26-27, a brief peak, then gradual decline. Free agents typically reach the market at age 28-30, meaning teams pay for declining future performance.
# Generic aging curve (based on research)
library(tidyverse)
aging_curve <- tibble(
age = 22:38,
war_multiplier = c(0.75, 0.85, 0.92, 0.97, 1.00, 1.00, 0.98, # 22-28
0.95, 0.91, 0.86, 0.80, 0.73, 0.65, 0.56, # 29-35
0.46, 0.36, 0.26) # 36-38
)
# Example: Player produces 4 WAR at age 28
player_age_28_war <- 4
projections <- aging_curve %>%
filter(age >= 28) %>%
mutate(
projected_war = player_age_28_war * war_multiplier,
year = row_number()
)
ggplot(projections, aes(x = age, y = projected_war)) +
geom_line(linewidth = 1.5, color = "darkred") +
geom_point(size = 3, color = "darkred") +
geom_hline(yintercept = 2, linetype = "dashed", color = "gray50") +
annotate("text", x = 35, y = 2.2, label = "Average starter (2 WAR)", size = 3.5) +
labs(title = "Aging Curve: Projected Performance Decline",
subtitle = "Starting from 4 WAR at age 28",
x = "Age", y = "Projected WAR") +
theme_minimal() +
scale_x_continuous(breaks = seq(28, 38, 2))
# Calculate total value of 6-year contract
total_war_6yr <- sum(projections$projected_war[1:6])
cat("\nTotal projected WAR over 6-year deal (ages 28-33):", round(total_war_6yr, 1), "\n")
cat("Average WAR per season:", round(total_war_6yr / 6, 2), "\n")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generic aging curve
aging_curve = pd.DataFrame({
'age': range(22, 39),
'war_multiplier': [0.75, 0.85, 0.92, 0.97, 1.00, 1.00, 0.98, # 22-28
0.95, 0.91, 0.86, 0.80, 0.73, 0.65, 0.56, # 29-35
0.46, 0.36, 0.26] # 36-38
})
# Example: Player produces 4 WAR at age 28
player_age_28_war = 4
projections = aging_curve[aging_curve['age'] >= 28].copy()
projections['projected_war'] = player_age_28_war * projections['war_multiplier']
projections['year'] = range(1, len(projections) + 1)
# Plot
plt.figure(figsize=(10, 6))
plt.plot(projections['age'], projections['projected_war'],
linewidth=2.5, color='darkred', marker='o', markersize=6)
plt.axhline(y=2, linestyle='--', color='gray', alpha=0.7, label='Average starter (2 WAR)')
plt.xlabel('Age')
plt.ylabel('Projected WAR')
plt.title('Aging Curve: Projected Performance Decline\nStarting from 4 WAR at age 28')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
# Calculate total value
total_war_6yr = projections['projected_war'].iloc[:6].sum()
print(f"\nTotal projected WAR over 6-year deal (ages 28-33): {total_war_6yr:.1f}")
print(f"Average WAR per season: {total_war_6yr/6:.2f}")
A 6-year deal for a 4-WAR player at age 28 yields roughly 20.9 total WAR, averaging about 3.5 WAR per season. But teams pay based on current performance (4 WAR), not average future performance (3.5 WAR), creating systematic overpayment.
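The projection above multiplies the 4-WAR season by the age-28 multiplier (0.98), so the first projected season comes out at 3.92 WAR rather than 4. A minimal sketch of one alternative, assuming the observed age-28 season should be the anchor: divide every multiplier by the age-28 value so the curve passes through exactly 4 WAR at age 28. The helper name `rebased_projection` is hypothetical, and the multipliers are the same illustrative values used above.

```python
# Sketch: rebase the generic aging curve so the observed season is the anchor.
AGING_MULTIPLIERS = dict(zip(range(22, 39), [
    0.75, 0.85, 0.92, 0.97, 1.00, 1.00, 0.98,   # ages 22-28
    0.95, 0.91, 0.86, 0.80, 0.73, 0.65, 0.56,   # ages 29-35
    0.46, 0.36, 0.26,                           # ages 36-38
]))


def rebased_projection(observed_war, observed_age, start_age, end_age):
    """Project WAR for start_age..end_age, scaled so the curve equals
    observed_war at observed_age (hypothetical helper)."""
    base = AGING_MULTIPLIERS[observed_age]
    return {
        age: observed_war * AGING_MULTIPLIERS[age] / base
        for age in range(start_age, end_age + 1)
    }


six_year = rebased_projection(observed_war=4.0, observed_age=28,
                              start_age=28, end_age=33)
total = sum(six_year.values())

for age, war in six_year.items():
    print(f"Age {age}: {war:.2f} WAR")
print(f"Rebased 6-year total: {total:.1f} WAR ({total / 6:.2f} per season)")
```

Rebasing shifts every season up by the same factor, so the shape of the decline is unchanged; only the starting point moves.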
Discount Rate and Present Value
Future performance is worth less than current performance due to:
- Uncertainty: Injury, unexpected decline, or external factors
- Opportunity cost: Money spent today can't be deployed elsewhere
- Competitive window: A win in 2025 might be worth more than a win in 2030
Financial analysis uses discount rates to value future wins:
# Calculate present value of multi-year contract
calculate_pv_war <- function(war_vector, discount_rate = 0.10) {
years <- seq_along(war_vector)
discount_factors <- 1 / ((1 + discount_rate) ^ (years - 1))
pv_war <- sum(war_vector * discount_factors)
return(pv_war)
}
# Example: 6-year contract, declining WAR
contract_war <- c(4.0, 3.8, 3.4, 3.0, 2.5, 2.0)
undiscounted <- sum(contract_war)
pv_5pct <- calculate_pv_war(contract_war, 0.05)
pv_10pct <- calculate_pv_war(contract_war, 0.10)
cat("Total WAR:\n")
cat("Undiscounted:", round(undiscounted, 1), "\n")
cat("PV at 5% discount:", round(pv_5pct, 1), "\n")
cat("PV at 10% discount:", round(pv_10pct, 1), "\n")
# Visualize
discount_rates <- seq(0, 0.15, 0.01)
pv_values <- sapply(discount_rates, function(r) calculate_pv_war(contract_war, r))
tibble(discount_rate = discount_rates, pv_war = pv_values) %>%
ggplot(aes(x = discount_rate * 100, y = pv_war)) +
geom_line(linewidth = 1.5, color = "darkblue") +
geom_hline(yintercept = undiscounted, linetype = "dashed", color = "red") +
labs(title = "Present Value of WAR by Discount Rate",
subtitle = "6-year contract with declining production",
x = "Discount Rate (%)", y = "Present Value (WAR)") +
theme_minimal()
def calculate_pv_war(war_vector, discount_rate=0.10):
"""Calculate present value of WAR stream"""
years = np.arange(1, len(war_vector) + 1)
discount_factors = 1 / ((1 + discount_rate) ** (years - 1))
pv_war = np.sum(np.array(war_vector) * discount_factors)
return pv_war
# Example: 6-year contract, declining WAR
contract_war = [4.0, 3.8, 3.4, 3.0, 2.5, 2.0]
undiscounted = sum(contract_war)
pv_5pct = calculate_pv_war(contract_war, 0.05)
pv_10pct = calculate_pv_war(contract_war, 0.10)
print("Total WAR:")
print(f"Undiscounted: {undiscounted:.1f}")
print(f"PV at 5% discount: {pv_5pct:.1f}")
print(f"PV at 10% discount: {pv_10pct:.1f}")
# Visualize
discount_rates = np.arange(0, 0.16, 0.01)
pv_values = [calculate_pv_war(contract_war, r) for r in discount_rates]
plt.figure(figsize=(10, 6))
plt.plot(discount_rates * 100, pv_values, linewidth=2.5, color='darkblue')
plt.axhline(y=undiscounted, linestyle='--', color='red', alpha=0.7,
label=f'Undiscounted ({undiscounted:.1f} WAR)')
plt.xlabel('Discount Rate (%)')
plt.ylabel('Present Value (WAR)')
plt.title('Present Value of WAR by Discount Rate\n6-year contract with declining production')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
At a 10% discount rate, this 18.7 WAR contract is worth only about 15.5 present-value WAR. Contending teams can justify higher discount rates (a win inside the current window is worth more than a win years from now), while rebuilding teams should use lower rates (future wins matter as much as, or more than, current ones).
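To see how the choice of rate changes a decision, the sketch below compares a front-loaded WAR stream with a back-loaded one. A higher rate favors front-loaded production and a lower rate favors back-loaded production. The `developing` stream and the function name `pv_war` are assumptions made for illustration; the `veteran` stream is the contract from the example above.

```python
def pv_war(war_by_year, discount_rate):
    """Present value of a WAR stream; the first season is not discounted,
    matching calculate_pv_war above."""
    return sum(war / (1 + discount_rate) ** t
               for t, war in enumerate(war_by_year))


# Front-loaded profile: the declining veteran from the example above.
veteran = [4.0, 3.8, 3.4, 3.0, 2.5, 2.0]
# Back-loaded profile: a hypothetical developing player (assumed values).
developing = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]

for rate in (0.00, 0.03, 0.10, 0.15):
    v = pv_war(veteran, rate)
    d = pv_war(developing, rate)
    better = "veteran" if v > d else "developing"
    print(f"rate {rate:>4.0%}: veteran {v:5.1f}  developing {d:5.1f}  -> {better}")
```

With these assumed streams the preferred profile flips somewhere between a 3% and a 10% rate, which is why the rate should reflect where the team sits in its competitive cycle.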
Case Study: Evaluating the 2024 Free Agent Class
Let's analyze whether the Dodgers' signing of Shohei Ohtani made economic sense:
# Ohtani contract analysis
# $700M over 10 years, but $680M deferred (97% of contract)
# Actual payments: $2M/year for 10 years, then $68M/year for 10 years
# Calculate present value (using 5% discount rate)
calculate_contract_pv <- function(annual_payments, discount_rate = 0.05) {
years <- seq_along(annual_payments)
pv <- sum(annual_payments / ((1 + discount_rate) ^ years))
return(pv)
}
# Ohtani's payment structure
ohtani_payments <- c(rep(2, 10), rep(68, 10)) # Millions
ohtani_pv <- calculate_contract_pv(ohtani_payments, 0.05)
cat("Ohtani Contract:\n")
cat("Nominal value: $700M\n")
cat("Present value (5%): $", round(ohtani_pv, 0), "M\n", sep="")
cat("Effective AAV: $", round(ohtani_pv / 10, 1), "M\n\n", sep="")
# Project WAR (hitting only in year 1, two-way thereafter)
ohtani_war_projection <- c(2.5, 8.0, 7.5, 7.0, 6.5, 6.0, 5.0, 4.0, 3.0, 2.0)
total_projected_war <- sum(ohtani_war_projection)
pv_war <- calculate_pv_war(ohtani_war_projection, 0.05)
cost_per_war <- (ohtani_pv / 10) / (total_projected_war / 10)
cat("Projected Performance:\n")
cat("Total WAR (10 years):", total_projected_war, "\n")
cat("Present Value WAR:", round(pv_war, 1), "\n")
cat("Cost per WAR: $", round(cost_per_war, 1), "M\n", sep="")
cat("Market rate: ~$12M/WAR\n")
cat("Value created: ", ifelse(cost_per_war < 12, "POSITIVE", "NEGATIVE"), "\n", sep="")
def calculate_contract_pv(annual_payments, discount_rate=0.05):
"""Calculate present value of contract"""
years = np.arange(1, len(annual_payments) + 1)
pv = np.sum(np.array(annual_payments) / ((1 + discount_rate) ** years))
return pv
# Ohtani's payment structure
ohtani_payments = [2] * 10 + [68] * 10 # Millions
ohtani_pv = calculate_contract_pv(ohtani_payments, 0.05)
print("Ohtani Contract:")
print(f"Nominal value: $700M")
print(f"Present value (5%): ${ohtani_pv:.0f}M")
print(f"Effective AAV: ${ohtani_pv/10:.1f}M\n")
# Project WAR
ohtani_war_projection = [2.5, 8.0, 7.5, 7.0, 6.5, 6.0, 5.0, 4.0, 3.0, 2.0]
total_projected_war = sum(ohtani_war_projection)
pv_war = calculate_pv_war(ohtani_war_projection, 0.05)
cost_per_war = (ohtani_pv / 10) / (total_projected_war / 10)
print("Projected Performance:")
print(f"Total WAR (10 years): {total_projected_war}")
print(f"Present Value WAR: {pv_war:.1f}")
print(f"Cost per WAR: ${cost_per_war:.1f}M")
print(f"Market rate: ~$12M/WAR")
print(f"Value created: {'POSITIVE' if cost_per_war < 12 else 'NEGATIVE'}")
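At a 5% discount rate, the extreme deferral structure reduces the contract's present value to roughly $338M (an effective AAV of about $34M), or under $7M per projected WAR, making the contract economically justifiable despite the eye-popping nominal value. The Dodgers also benefit on the luxury tax: the league's own present-value calculation, which uses a lower discount rate and discounts each deferred payment only back to the season in which it was earned, put the AAV at roughly $46M rather than $70M.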
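Because so much of the money arrives 11 to 20 years out, the present value is very sensitive to the discount rate. The sketch below reruns the same end-of-year discounting over a range of rates; the function name `contract_pv` and the choice of rates are assumptions for illustration, while the payment schedule is the one given above.

```python
def contract_pv(payments, discount_rate):
    """Present value of end-of-year payments (in $M), discounting year t
    by (1 + rate) ** t, as calculate_contract_pv does above."""
    return sum(p / (1 + discount_rate) ** t
               for t, p in enumerate(payments, start=1))


# Payment schedule from the text: $2M/year for 10 years, then $68M/year for 10.
ohtani_payments = [2] * 10 + [68] * 10

for rate in (0.00, 0.03, 0.05, 0.08):
    pv = contract_pv(ohtani_payments, rate)
    print(f"rate {rate:.0%}: PV ${pv:6.1f}M, effective AAV ${pv / 10:5.1f}M")
```

At a 0% rate the function returns the nominal $700M, which is a quick sanity check that the schedule is entered correctly.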
Trades involve exchanging present and future value across different competitive timelines. Analytical frameworks help evaluate whether trades align with organizational objectives.
Framework: Surplus Value
Trade analysis centers on surplus value—the difference between a player's expected production and cost. Teams trade surplus value across different time horizons.
# Calculate surplus value
calculate_surplus <- function(projected_war, salary, market_rate = 12) {
market_value <- projected_war * market_rate
surplus <- market_value - salary
return(surplus)
}
# Example: Compare players in trade discussion
trade_evaluation <- tibble(
player = c("Established Star", "Young Prospect", "Prospect 2", "Prospect 3"),
years_control = c(2, 6, 5, 5),
projected_war_annual = c(5.5, 2.5, 2.0, 1.5),
salary = c(32, 0.8, 0.75, 0.75) # Millions per year
)
trade_evaluation <- trade_evaluation %>%
mutate(
total_war = years_control * projected_war_annual,
total_salary = years_control * salary,
total_surplus = calculate_surplus(total_war, total_salary),
surplus_per_year = total_surplus / years_control
)
print(trade_evaluation)
cat("\nTrade Scenario: Star for 3 prospects\n")
cat("Team A receives: $", trade_evaluation$total_surplus[1], "M surplus (win-now mode)\n", sep="")
cat("Team B receives: $", sum(trade_evaluation$total_surplus[2:4]), "M surplus (rebuilding)\n", sep="")
def calculate_surplus(projected_war, salary, market_rate=12):
"""Calculate surplus value"""
market_value = projected_war * market_rate
surplus = market_value - salary
return surplus
# Example: Compare players in trade discussion
trade_evaluation = pd.DataFrame({
'player': ['Established Star', 'Young Prospect', 'Prospect 2', 'Prospect 3'],
'years_control': [2, 6, 5, 5],
'projected_war_annual': [5.5, 2.5, 2.0, 1.5],
'salary': [32, 0.8, 0.75, 0.75] # Millions per year
})
trade_evaluation['total_war'] = (trade_evaluation['years_control'] *
trade_evaluation['projected_war_annual'])
trade_evaluation['total_salary'] = (trade_evaluation['years_control'] *
trade_evaluation['salary'])
trade_evaluation['total_surplus'] = calculate_surplus(
trade_evaluation['total_war'],
trade_evaluation['total_salary']
)
trade_evaluation['surplus_per_year'] = (trade_evaluation['total_surplus'] /
trade_evaluation['years_control'])
print(trade_evaluation)
print(f"\nTrade Scenario: Star for 3 prospects")
print(f"Team A receives: ${trade_evaluation['total_surplus'].iloc[0]:.0f}M surplus (win-now)")
print(f"Team B receives: ${trade_evaluation['total_surplus'].iloc[1:4].sum():.0f}M surplus (rebuilding)")
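The star provides about $68M of surplus over 2 years (immediate value), while the three prospects project to roughly $378M of combined surplus over 5-6 years (future value). The prospect figure is not risk-adjusted: it assumes every prospect reaches his projection and earns near the minimum for all of his control years, so a realistic valuation would be much lower. Both teams can win this trade if their timelines align appropriately.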
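Prospect surplus is far less certain than an established star's. A minimal sketch of a risk adjustment, assuming each player has some probability of delivering his projection and that a bust produces no WAR and draws no salary. The probabilities and the helper name `risk_adjusted_surplus` are illustrative assumptions, not estimates from data.

```python
MARKET_RATE = 12  # $M per WAR, the rate used throughout this section

# (player, years of control, WAR per year, salary $M per year, probability)
# The probabilities are assumed values chosen only to illustrate the method.
players = [
    ("Established Star", 2, 5.5, 32.00, 1.00),
    ("Young Prospect",   6, 2.5,  0.80, 0.50),
    ("Prospect 2",       5, 2.0,  0.75, 0.35),
    ("Prospect 3",       5, 1.5,  0.75, 0.25),
]


def risk_adjusted_surplus(years, war_per_year, salary, prob):
    """Surplus value ($M) weighted by the chance the player delivers it.
    A bust is assumed to produce zero WAR and draw no salary."""
    raw_surplus = years * war_per_year * MARKET_RATE - years * salary
    return prob * raw_surplus


adjusted = {name: risk_adjusted_surplus(years, war, salary, prob)
            for name, years, war, salary, prob in players}

star_total = adjusted["Established Star"]
prospect_total = sum(value for name, value in adjusted.items()
                     if name != "Established Star")

for name, value in adjusted.items():
    print(f"{name:<18} ${value:6.1f}M")
print(f"Star: ${star_total:.1f}M vs prospects: ${prospect_total:.1f}M")
```

Even with these fairly harsh assumed probabilities the prospect package still carries more total surplus than the star, but the gap is much narrower than the unadjusted totals suggest.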
The Win-Now vs Rebuild Trade-off
Contending teams should trade future value for present value; rebuilding teams should do the opposite. The key is estimating competitive windows.
# Model competitive window
library(tidyverse)
# Scenario: Team is 83 wins, needs 87+ to contend
simulate_competitive_window <- function(current_wins = 83,
core_aging_rate = -1.5,
prospect_improvement = 0.5,
years = 5) {
tibble(
year = 1:years,
core_value = pmax(0, current_wins + (year - 1) * core_aging_rate),
prospect_value = (year - 1) * prospect_improvement,
total_wins = core_value + prospect_value
)
}
# Two scenarios: trade for star vs keep prospects
trade_scenario <- simulate_competitive_window() %>%
mutate(
scenario = "Trade for Star",
star_boost = c(7, 6, 4, 0, 0), # Star's added wins, fading to zero after year 3
final_wins = total_wins + star_boost
)
keep_scenario <- simulate_competitive_window() %>%
mutate(
scenario = "Keep Prospects",
prospect_boost = year * 1.5, # Prospects develop gradually
final_wins = total_wins + prospect_boost
)
scenarios <- bind_rows(trade_scenario, keep_scenario)
ggplot(scenarios, aes(x = year, y = final_wins, color = scenario)) +
geom_line(linewidth = 1.5) +
geom_point(size = 3) +
geom_hline(yintercept = 87, linetype = "dashed", color = "red") +
annotate("text", x = 4.5, y = 88, label = "Playoff threshold", size = 3.5) +
labs(title = "Trade Decision: Competitive Window Analysis",
subtitle = "Win-now trade vs prospect development",
x = "Years from Now", y = "Projected Wins",
color = "Strategy") +
theme_minimal() +
scale_color_manual(values = c("Trade for Star" = "darkblue", "Keep Prospects" = "darkgreen"))
def simulate_competitive_window(current_wins=83, core_aging_rate=-1.5,
prospect_improvement=0.5, years=5):
"""Model competitive window over time"""
year = np.arange(1, years + 1)
core_value = np.maximum(0, current_wins + (year - 1) * core_aging_rate)
prospect_value = (year - 1) * prospect_improvement
total_wins = core_value + prospect_value
return pd.DataFrame({
'year': year,
'core_value': core_value,
'prospect_value': prospect_value,
'total_wins': total_wins
})
# Two scenarios
trade_scenario = simulate_competitive_window()
trade_scenario['scenario'] = 'Trade for Star'
trade_scenario['star_boost'] = [7, 6, 4, 0, 0]
trade_scenario['final_wins'] = trade_scenario['total_wins'] + trade_scenario['star_boost']
keep_scenario = simulate_competitive_window()
keep_scenario['scenario'] = 'Keep Prospects'
keep_scenario['prospect_boost'] = keep_scenario['year'] * 1.5
keep_scenario['final_wins'] = keep_scenario['total_wins'] + keep_scenario['prospect_boost']
# Plot
plt.figure(figsize=(10, 6))
plt.plot(trade_scenario['year'], trade_scenario['final_wins'],
marker='o', linewidth=2.5, label='Trade for Star', color='darkblue')
plt.plot(keep_scenario['year'], keep_scenario['final_wins'],
marker='o', linewidth=2.5, label='Keep Prospects', color='darkgreen')
plt.axhline(y=87, linestyle='--', color='red', alpha=0.7, label='Playoff threshold')
plt.xlabel('Years from Now')
plt.ylabel('Projected Wins')
plt.title('Trade Decision: Competitive Window Analysis\nWin-now trade vs prospect development')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
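The trade creates a 2-year window above 87 wins (90 and 88) but leaves the team at 79 wins by year 5. Keeping the prospects never clears the threshold in this model; it holds the team between 84.5 and 86.5 wins throughout. Keep prospects if you value sustained competitiveness; trade if you prioritize immediate contention.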
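One simple way to summarize the two paths is to count the seasons each spends at or above the playoff threshold. The sketch below repeats the arithmetic of `simulate_competitive_window` in plain Python; the helper names `baseline_wins` and `seasons_in_window` are hypothetical.

```python
PLAYOFF_THRESHOLD = 87


def baseline_wins(years=5, current_wins=83, core_aging_rate=-1.5,
                  prospect_improvement=0.5):
    """Baseline wins per year, using the same arithmetic as
    simulate_competitive_window above."""
    return [
        max(0, current_wins + (y - 1) * core_aging_rate)
        + (y - 1) * prospect_improvement
        for y in range(1, years + 1)
    ]


def seasons_in_window(wins, threshold=PLAYOFF_THRESHOLD):
    """Count seasons at or above the threshold (hypothetical helper)."""
    return sum(w >= threshold for w in wins)


base = baseline_wins()
# Trade scenario: the star's added wins fade to zero.
trade_wins = [w + boost for w, boost in zip(base, [7, 6, 4, 0, 0])]
# Keep scenario: prospects add 1.5 wins per year of development.
keep_wins = [w + 1.5 * y for y, w in enumerate(base, start=1)]

print("Trade for Star:", trade_wins, "->", seasons_in_window(trade_wins), "seasons")
print("Keep Prospects:", keep_wins, "->", seasons_in_window(keep_wins), "seasons")
```

A count like this ignores how far above or below the line a team sits, so it is best read alongside the plotted win totals rather than in place of them.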
Case Study: The 2017 Astros Trade for Verlander
In August 2017, the Houston Astros traded for Justin Verlander, sending 3 prospects to Detroit. Analysis:
What Houston Gave Up:
- Franklin Perez (RHP): Mid-level prospect, 30% chance of 2+ WAR
- Daz Cameron (OF): Mid-level prospect, 25% chance of 2+ WAR
- Jake Rogers (C): Fringe prospect, 15% chance of 2+ WAR
- Expected total value: ~6 WAR over 6 years = 72M surplus
What Houston Got:
- Verlander (2017 playoffs): ~1 WAR, crucial in World Series run
- Verlander (2018-19): ~13 WAR combined
- Total value: ~14 WAR over 2.5 years
Result: Houston won the 2017 World Series and remained elite through 2019. Of the three prospects, only Rogers developed into a regular major leaguer, and none came close to matching Verlander's production. The trade succeeded because Houston correctly identified their championship window and maximized present value.
The MLB draft differs fundamentally from other sports' drafts. There's no salary cap, most draft picks can't be traded (only competitive balance picks can), and amateur players typically take 3-5 years to reach MLB. These features create unique strategic considerations.
Expected Value by Draft Position
Draft picks have declining expected value, but unlike the NBA or NFL, even first overall picks are uncertain:
# Model expected value by draft position (based on historical research)
library(tidyverse)
draft_value <- tibble(
pick = 1:100,
prob_mlb = pmax(0.05, 0.85 - (pick - 1) * 0.008), # Probability of reaching MLB
expected_war = pmax(0.1, 12 - (pick - 1) * 0.11), # Expected career WAR if reaches MLB
overall_expected_war = prob_mlb * expected_war
) %>%
mutate(
surplus_value = overall_expected_war * 12 - 2, # $12M/WAR, -$2M signing bonus
round = case_when(
pick <= 30 ~ "Round 1",
pick <= 60 ~ "Round 2",
pick <= 100 ~ "Round 3+"
)
)
# Visualize
ggplot(draft_value %>% filter(pick <= 60),
aes(x = pick, y = surplus_value, color = round)) +
geom_line(linewidth = 1.5) +
geom_point(size = 2) +
scale_color_manual(values = c("darkblue", "steelblue", "lightblue")) +
labs(title = "Expected Surplus Value by Draft Position",
subtitle = "MLB Draft expected value declines gradually",
x = "Draft Pick", y = "Expected Surplus Value ($M)",
color = "Round") +
theme_minimal()
# Compare value tiers
cat("Expected surplus value:\n")
cat("Pick 1: $", round(draft_value$surplus_value[1], 1), "M\n", sep="")
cat("Pick 10: $", round(draft_value$surplus_value[10], 1), "M\n", sep="")
cat("Pick 30: $", round(draft_value$surplus_value[30], 1), "M\n", sep="")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Model expected value by draft position
pick = np.arange(1, 101)
prob_mlb = np.maximum(0.05, 0.85 - (pick - 1) * 0.008)
expected_war = np.maximum(0.1, 12 - (pick - 1) * 0.11)
overall_expected_war = prob_mlb * expected_war
draft_value = pd.DataFrame({
'pick': pick,
'prob_mlb': prob_mlb,
'expected_war': expected_war,
'overall_expected_war': overall_expected_war
})
draft_value['surplus_value'] = draft_value['overall_expected_war'] * 12 - 2
draft_value['round'] = pd.cut(draft_value['pick'],
bins=[0, 30, 60, 100],
labels=['Round 1', 'Round 2', 'Round 3+'])
# Visualize
plt.figure(figsize=(10, 6))
for round_name, color in [('Round 1', 'darkblue'),
('Round 2', 'steelblue'),
('Round 3+', 'lightblue')]:
data = draft_value[draft_value['round'] == round_name]
if len(data) > 0 and data['pick'].max() <= 60:
plt.plot(data['pick'], data['surplus_value'],
label=round_name, color=color, linewidth=2)
plt.xlabel('Draft Pick')
plt.ylabel('Expected Surplus Value ($M)')
plt.title('Expected Surplus Value by Draft Position\nMLB Draft expected value declines gradually')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xlim(0, 60)
plt.tight_layout()
plt.show()
print("Expected surplus value:")
print(f"Pick 1: ${draft_value['surplus_value'].iloc[0]:.1f}M")
print(f"Pick 10: ${draft_value['surplus_value'].iloc[9]:.1f}M")
print(f"Pick 30: ${draft_value['surplus_value'].iloc[29]:.1f}M")
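Unlike the NBA, where the #1 pick carries enormous expected value, MLB's curve is comparatively flat: in the illustrative model above, the #1 pick is worth roughly $20M more than #10 (about $120M versus $101M in surplus value). This flatter value curve reduces the incentive to tank.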
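To see how flat the curve is at specific picks, the sketch below re-evaluates the same illustrative model one pick at a time. The function name `surplus_value` and its default parameters simply restate the assumptions used in the code above ($12M per WAR, a $2M bonus); they are not empirical estimates.

```python
def surplus_value(pick, dollars_per_war=12.0, bonus=2.0):
    """Expected surplus ($M) for one pick under the chapter's illustrative model."""
    prob_mlb = max(0.05, 0.85 - (pick - 1) * 0.008)   # chance of reaching MLB
    war_if_mlb = max(0.1, 12 - (pick - 1) * 0.11)     # career WAR if he gets there
    return prob_mlb * war_if_mlb * dollars_per_war - bonus

# Gaps between value tiers
gap_1_vs_10 = surplus_value(1) - surplus_value(10)
gap_10_vs_30 = surplus_value(10) - surplus_value(30)
print(f"Pick 1 vs pick 10:  ${gap_1_vs_10:.1f}M")
print(f"Pick 10 vs pick 30: ${gap_10_vs_30:.1f}M")
```

Changing `dollars_per_war` or the slope constants shows how strongly the conclusion depends on the assumed decay rates rather than on observed draft outcomes.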
College vs High School Players
Teams face a fundamental strategic choice: draft college players (safer, closer to MLB-ready) or high school players (riskier, higher upside).
College Players:
- Higher floor: More developed, easier to evaluate
- Lower ceiling: Limited remaining development time
- Faster to MLB: Often 2-3 years vs 4-5 for high schoolers
- Better for contending teams with shorter time horizons
High School Players:
- Higher ceiling: More physical development remaining
- Lower floor: Higher bust rate, harder to evaluate
- Slower to MLB: Longer development time
- Better for rebuilding teams with patient timelines
# Compare college vs high school success rates (based on research)
library(tidyverse)
draft_comparison <- tibble(
pick_range = rep(c("1-10", "11-30", "31-60"), 2),
player_type = rep(c("College", "High School"), each = 3),
mlb_rate = c(0.75, 0.65, 0.52, # College
0.60, 0.48, 0.35), # High School
avg_war = c(15, 10, 6, # College
18, 13, 8), # High School (among those who make it)
expected_war = mlb_rate * avg_war
)
ggplot(draft_comparison, aes(x = pick_range, y = expected_war,
fill = player_type)) +
geom_col(position = "dodge", alpha = 0.8) +
scale_fill_manual(values = c("College" = "darkblue",
"High School" = "darkorange")) +
labs(title = "College vs High School Draft Value",
subtitle = "Expected career WAR by draft position",
x = "Draft Pick Range", y = "Expected WAR",
fill = "Player Type") +
theme_minimal()
# Compare college vs high school success rates
draft_comparison = pd.DataFrame({
'pick_range': ['1-10', '11-30', '31-60'] * 2,
'player_type': ['College'] * 3 + ['High School'] * 3,
'mlb_rate': [0.75, 0.65, 0.52, # College
0.60, 0.48, 0.35], # High School
'avg_war': [15, 10, 6, # College
18, 13, 8], # High School
})
draft_comparison['expected_war'] = (draft_comparison['mlb_rate'] *
draft_comparison['avg_war'])
# Plot
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(3)
width = 0.35
college = draft_comparison[draft_comparison['player_type'] == 'College']
hs = draft_comparison[draft_comparison['player_type'] == 'High School']
ax.bar(x - width/2, college['expected_war'], width,
label='College', color='darkblue', alpha=0.8)
ax.bar(x + width/2, hs['expected_war'], width,
label='High School', color='darkorange', alpha=0.8)
ax.set_xlabel('Draft Pick Range')
ax.set_ylabel('Expected WAR')
ax.set_title('College vs High School Draft Value\nExpected career WAR by draft position')
ax.set_xticks(x)
ax.set_xticklabels(['1-10', '11-30', '31-60'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
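In this illustrative data, high schoolers who reach MLB contribute more WAR, but their lower MLB rates outweigh that advantage: college players hold a small edge in expected WAR in every range (11.25 vs. 10.8 in the top 10, 6.5 vs. 6.24 for picks 11-30, and 3.12 vs. 2.8 for picks 31-60). The edge is about 4% in the first two ranges and grows to about 10% for picks 31-60, where college players' safer profiles matter most.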
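Because the comparison turns on two multiplied assumptions, a break-even calculation is a useful check: how much WAR must a successful high schooler produce to match the college pick's expected value? The sketch below uses the same assumed rates as the code above; `breakeven_avg_war` is a hypothetical helper name.

```python
def breakeven_avg_war(college_rate, college_war, hs_rate):
    """WAR a high-school pick must average, given he reaches MLB,
    to match the college pick's expected WAR."""
    return college_rate * college_war / hs_rate

# (college MLB rate, college avg WAR, HS MLB rate, HS avg WAR) per pick range
tiers = {
    "1-10":  (0.75, 15, 0.60, 18),
    "11-30": (0.65, 10, 0.48, 13),
    "31-60": (0.52, 6, 0.35, 8),
}

for tier, (c_rate, c_war, h_rate, h_war) in tiers.items():
    needed = breakeven_avg_war(c_rate, c_war, h_rate)
    print(f"Picks {tier}: HS break-even {needed:.2f} WAR (assumed {h_war})")
```

When the assumed high-school WAR sits close to the break-even figure, small changes in the inputs can flip which group looks better, so the ranking should be read as suggestive rather than settled.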
Signability and Slot Value
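Unlike NBA and NFL draftees, MLB draft picks can credibly refuse to sign: high schoolers can honor college commitments, and college juniors can return to school. Teams must balance talent evaluation with signability within bonus pool constraints, which creates market inefficiencies to exploit.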
Strategy for Teams Exceeding Slot:
- Target high-upside players who fell due to signability concerns
- Offer over-slot bonuses to secure them
- Accept penalties for exceeding the pool (a 75% tax on the overage when exceeding it by up to 5%, with forfeited future draft picks beyond that)
Strategy for Cost-Conscious Teams:
- Draft signable college seniors (limited leverage) early
- Save slot money for later rounds
- Target high schoolers with college commitments in later rounds, offer them saved money
The 2023 Pirates exemplified this: they drafted college-heavy early (signable), then used saved pool money on high-upside high schoolers in rounds 3-10.
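The pool arithmetic behind these strategies can be sketched as a small function. The tier boundaries and tax rates below are assumptions based on a common reading of the bonus pool rules and should be checked against the current Collective Bargaining Agreement; `pool_overage_penalty` and the dollar figures are hypothetical.

```python
def pool_overage_penalty(pool, spent):
    """Return (tax_owed, picks_forfeited) for a draft class; amounts in $M.

    Assumed tiers (verify against the current CBA):
      up to 5% over    -> 75% tax on the overage
      5% to 10% over   -> 75% tax plus a first-round pick
      10% to 15% over  -> 100% tax plus first- and second-round picks
      more than 15%    -> 100% tax plus two first-round picks
    """
    overage = max(0.0, spent - pool)
    if overage == 0:
        return 0.0, "none"
    pct_over = overage / pool
    if pct_over <= 0.05:
        return 0.75 * overage, "none"
    if pct_over <= 0.10:
        return 0.75 * overage, "first-round pick"
    if pct_over <= 0.15:
        return 1.00 * overage, "first- and second-round picks"
    return 1.00 * overage, "two first-round picks"

# A team with a $10M pool that spends $10.4M stays in the tax-only tier
tax, picks = pool_overage_penalty(pool=10.0, spent=10.4)
print(f"Tax owed: ${tax:.2f}M, picks forfeited: {picks}")
```

Under these assumptions the jump from a cash tax to forfeited picks is the real constraint, which is why over-slot strategies tend to stop just short of the first pick-forfeiture tier.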
Case Study: Astros' Draft Strategy (2011-2014)
Houston's rebuild featured aggressive draft strategy:
2012-2014 Strategy:
- Accumulated top picks via losing (1st overall: 2012, 2013, 2014)
- Drafted premium talent at the top: high schooler Carlos Correa (2012), plus college players Mark Appel (2013, a bust) and Alex Bregman (2nd overall, 2015)
- Exceeded slot bonuses to secure upside
- Accepted competitive balance pick penalties
Results:
- Correa: roughly 40 WAR through 2023, cornerstone player
- Bregman: 35+ WAR, All-Star third baseman
- Strong supporting cast from later picks
This aggressive strategy, combined with analytics-driven player development, turned Houston from a 100-loss team (2011-2013) into a World Series champion (2017).
Perhaps the most consequential front office decision is choosing the competitive timeline: compete now, rebuild, or attempt a middle ground. Analytics can inform this strategic question but cannot fully answer it.
The Competitive Window
Teams have limited windows of contention based on:
- Core player aging curves
- Contract expirations and financial flexibility
- Farm system strength
- Division and league competition
# Model competitive window
library(tidyverse)
# Hypothetical team: strong core aging out, weak farm system
team_projection <- tibble(
year = 2024:2033,
core_war = c(35, 33, 30, 26, 22, 18, 14, 10, 8, 6), # Aging stars
young_players = c(5, 8, 10, 12, 12, 11, 10, 8, 6, 5),
free_agents = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8), # Assumed steady free-agent contribution
total_war = core_war + young_players + free_agents
) %>%
mutate(
expected_wins = 52 + (total_war * 0.96), # Linear WAR-to-wins conversion from an assumed 52-win baseline
playoff_prob = plogis((expected_wins - 87) / 3) * 100
)
# Visualize window
ggplot(team_projection, aes(x = year, y = expected_wins)) +
geom_line(linewidth = 1.5, color = "darkblue") + # linewidth replaces size for lines in ggplot2 3.4+
geom_ribbon(aes(ymin = 52, ymax = expected_wins), alpha = 0.3, fill = "blue") +
geom_hline(yintercept = 87, linetype = "dashed", color = "red") +
annotate("text", x = 2028, y = 88, label = "Playoff threshold (~87 wins)",
size = 4, hjust = 0) +
annotate("rect", xmin = 2024, xmax = 2027, ymin = 50, ymax = 105,
alpha = 0.1, fill = "green") +
annotate("text", x = 2025.5, y = 103, label = "Competitive Window",
size = 4.5, fontface = "bold") +
labs(title = "Projected Competitive Window",
subtitle = "Team with aging core, limited farm system",
x = "Year", y = "Projected Wins") +
theme_minimal() +
scale_x_continuous(breaks = seq(2024, 2033, 1))
print(team_projection)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.special import expit
# Hypothetical team projection
team_projection = pd.DataFrame({
'year': range(2024, 2034),
'core_war': [35, 33, 30, 26, 22, 18, 14, 10, 8, 6],
'young_players': [5, 8, 10, 12, 12, 11, 10, 8, 6, 5],
'free_agents': [8] * 10
})
team_projection['total_war'] = (team_projection['core_war'] +
team_projection['young_players'] +
team_projection['free_agents'])
team_projection['expected_wins'] = 52 + (team_projection['total_war'] * 0.96)
team_projection['playoff_prob'] = (expit((team_projection['expected_wins'] - 87) / 3) * 100)
print(team_projection)
# Visualize
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(team_projection['year'], team_projection['expected_wins'],
linewidth=2.5, color='darkblue', marker='o')
ax.fill_between(team_projection['year'], 52, team_projection['expected_wins'],
alpha=0.3, color='blue')
ax.axhline(y=87, linestyle='--', color='red', linewidth=1.5, label='Playoff threshold')
# Highlight competitive window
ax.axvspan(2024, 2027, alpha=0.1, color='green')
ax.text(2025.5, 103, 'Competitive Window', fontsize=12,
ha='center', fontweight='bold')
ax.set_xlabel('Year', fontsize=11)
ax.set_ylabel('Projected Wins', fontsize=11)
ax.set_title('Projected Competitive Window\nTeam with aging core, limited farm system',
fontsize=13)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
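In this projection the team clears 95 wins through 2027, falls to roughly 87 wins (a coin-flip playoff chance) by 2029, and drops below 80 by 2031. That leaves four or five years to contend before core decline makes competition unlikely, and that horizon shapes every roster decision.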
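The length of the window depends on how confident a team wants to be about reaching the playoffs. The sketch below reuses the projection's assumptions (52-win baseline, 0.96 wins per WAR, logistic playoff curve centered on 87 wins) and reports the consecutive seasons that stay above a chosen playoff probability. The function name `competitive_window` and the thresholds are hypothetical.

```python
import math

def competitive_window(years, total_war, threshold=0.5,
                       baseline=52, wins_per_war=0.96,
                       playoff_line=87, scale=3):
    """Return the run of seasons, from the first year, whose
    playoff probability stays at or above `threshold`."""
    window = []
    for year, war in zip(years, total_war):
        wins = baseline + wins_per_war * war
        prob = 1 / (1 + math.exp(-(wins - playoff_line) / scale))
        if prob < threshold:
            break  # the window closes at the first season below threshold
        window.append(year)
    return window

years = list(range(2024, 2034))
# core + young players + free agents, summed from the projection above
total_war = [48, 49, 48, 46, 42, 37, 32, 26, 22, 19]

likely = competitive_window(years, total_war, threshold=0.5)
strong = competitive_window(years, total_war, threshold=0.9)
print("Better-than-even playoff odds:", likely)
print("At least 90% playoff odds:   ", strong)
```

A front office that only counts seasons with very high playoff odds sees a shorter window than one willing to count coin-flip seasons, which is one reason estimates of "years left" can differ for the same roster.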
The Cost of Mediocrity
The "middle ground" strategy (trying to compete while rebuilding) often fails. Mediocre teams (75-82 wins) get the worst of both worlds:
- Miss playoffs (no postseason revenue or success)
- Pick 15-20th in draft (lower talent acquisition)
- Limited prospect surplus to trade
- Reduced organizational energy and fan enthusiasm
# Compare strategies over 10 years
library(tidyverse)
simulate_strategy <- function(strategy = "compete", years = 10) {
if (strategy == "compete") {
wins <- c(92, 88, 91, 85, 82, 78, 75, 72, 70, 68)
playoff_prob <- plogis((wins - 87) / 3)
championships <- c(0, 0, 0.15, 0, 0, 0, 0, 0, 0, 0) # 15% chance in year 3
} else if (strategy == "rebuild") {
wins <- c(68, 65, 72, 78, 84, 88, 91, 93, 89, 87)
playoff_prob <- plogis((wins - 87) / 3)
championships <- c(0, 0, 0, 0, 0.05, 0.10, 0.15, 0.18, 0.12, 0.10)
} else { # mediocre
wins <- c(79, 77, 80, 78, 81, 79, 80, 82, 78, 77)
playoff_prob <- plogis((wins - 87) / 3)
championships <- rep(0.02, 10)
}
tibble(
year = 1:years,
strategy = strategy,
wins = wins,
playoff_prob = playoff_prob,
championship_prob = championships,
expected_championships = sum(championships)
)
}
strategies <- bind_rows(
simulate_strategy("compete"),
simulate_strategy("rebuild"),
simulate_strategy("mediocre")
)
# Compare outcomes
ggplot(strategies, aes(x = year, y = wins, color = strategy)) +
geom_line(linewidth = 1.5) + # linewidth replaces size for lines in ggplot2 3.4+
geom_point(size = 3) +
geom_hline(yintercept = 87, linetype = "dashed", color = "black") +
scale_color_manual(values = c("compete" = "darkgreen",
"rebuild" = "darkblue",
"mediocre" = "gray50")) +
labs(title = "Strategic Approach Comparison",
subtitle = "10-year outcomes by strategy",
x = "Year", y = "Wins", color = "Strategy") +
theme_minimal()
# Summary statistics
strategies %>%
group_by(strategy) %>%
summarise(
total_wins = sum(wins),
avg_wins = mean(wins),
playoff_years = sum(playoff_prob > 0.5),
expected_titles = sum(championship_prob)
) %>%
print()
def simulate_strategy(strategy='compete', years=10):
    """Simulate 10-year outcomes by strategy"""
    if strategy == 'compete':
        wins = [92, 88, 91, 85, 82, 78, 75, 72, 70, 68]
        championships = [0, 0, 0.15, 0, 0, 0, 0, 0, 0, 0]
    elif strategy == 'rebuild':
        wins = [68, 65, 72, 78, 84, 88, 91, 93, 89, 87]
        championships = [0, 0, 0, 0, 0.05, 0.10, 0.15, 0.18, 0.12, 0.10]
    else:  # mediocre
        wins = [79, 77, 80, 78, 81, 79, 80, 82, 78, 77]
        championships = [0.02] * 10
    playoff_prob = expit((np.array(wins) - 87) / 3)
    return pd.DataFrame({
        'year': range(1, years + 1),
        'strategy': strategy,
        'wins': wins,
        'playoff_prob': playoff_prob,
        'championship_prob': championships
    })
# Combine strategies
strategies = pd.concat([
simulate_strategy('compete'),
simulate_strategy('rebuild'),
simulate_strategy('mediocre')
])
# Visualize
plt.figure(figsize=(12, 6))
for strat, color in [('compete', 'darkgreen'),
                     ('rebuild', 'darkblue'),
                     ('mediocre', 'gray')]:
    data = strategies[strategies['strategy'] == strat]
    plt.plot(data['year'], data['wins'], marker='o',
             linewidth=2.5, label=strat.capitalize(), color=color)
plt.axhline(y=87, linestyle='--', color='black', alpha=0.7, label='Playoff threshold')
plt.xlabel('Year')
plt.ylabel('Wins')
plt.title('Strategic Approach Comparison\n10-year outcomes by strategy')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Summary
summary = strategies.groupby('strategy').agg({
'wins': ['sum', 'mean'],
'playoff_prob': lambda x: (x > 0.5).sum(),
'championship_prob': 'sum'
}).round(2)
print("\nStrategy Comparison:")
print(summary)
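In this stylized simulation, the rebuild path loses early but finishes with the most total wins (815, versus 801 for compete and 791 for mediocre) and by far the most expected titles (0.70). The compete path wins early but fades, banking only 0.15 expected titles. The mediocre path wins the fewest games and never reaches a 50% playoff probability; its 0.20 expected titles come entirely from the assumed flat 2% annual chance.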
Signals for Rebuilding
Analytics can identify when rebuilding becomes optimal:
- Declining core with expensive contracts: Core players 30+ with 2-4 years remaining
- Weak farm system: Low prospect surplus value
- Division competition: Strong division rivals with younger cores
- Market inefficiency: High trade value for aging stars (can recoup surplus)
- Expensive win curve: Would need to pay $15M+ per marginal win to contend
Signals for Competing
Compete when:
- Strong core in prime (ages 26-30): Multiple stars under team control
- Deep farm system: Can trade prospects without depleting
- Weak competition: Division and league competitive balance favors you
- Favorable contracts: Core players on team-friendly deals create budget flexibility
- Efficient win curve: Can buy marginal wins at or below market rate
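The two checklists above can be turned into a rough screening rule. The sketch below counts how many rebuild signals a team trips; the `TeamSnapshot` fields, the `farm_floor` cutoff, and the example teams are hypothetical, while the age-30 and $15M-per-win thresholds come from the lists above. It is a conversation starter, not a decision rule.

```python
from dataclasses import dataclass

@dataclass
class TeamSnapshot:
    core_avg_age: float           # average age of the core players
    farm_surplus_musd: float      # estimated prospect surplus value, $M
    cost_per_marginal_win: float  # $M needed per extra win to contend

def rebuild_signal_count(team, farm_floor=100.0, expensive_win=15.0):
    """Count how many rebuild signals a team currently shows (0 to 3)."""
    signals = [
        team.core_avg_age >= 30,                      # declining core
        team.farm_surplus_musd < farm_floor,          # weak farm system
        team.cost_per_marginal_win >= expensive_win,  # expensive win curve
    ]
    return sum(signals)

aging_club = TeamSnapshot(core_avg_age=31.5, farm_surplus_musd=60,
                          cost_per_marginal_win=18)
prime_club = TeamSnapshot(core_avg_age=27.5, farm_surplus_musd=250,
                          cost_per_marginal_win=9)

for name, club in [("Aging club", aging_club), ("Prime club", prime_club)]:
    print(f"{name}: {rebuild_signal_count(club)} of 3 rebuild signals")
```

Division strength and trade-market conditions are left out because they are harder to reduce to a single number; a fuller version would likely weight the signals rather than count them equally.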
The Astros Rebuild Model
Houston's 2011-2014 rebuild became a template:
Phase 1 (2011-2013): Tear Down
- Trade all valuable veterans for prospects (Hunter Pence, Wandy Rodriguez, etc.)
- Accept 100+ loss seasons
- Accumulate top draft picks
- Install analytics-driven front office and development system
Phase 2 (2014-2015): Foundation
- Build around emerging young talent (Jose Altuve, Dallas Keuchel, and 2014 call-up George Springer)
- Continue high draft picks
- Begin selective free agent signings (low-cost veterans)
Phase 3 (2016-2017): Compete
- Trade prospects for proven talent (Ken Giles, Justin Verlander)
- Sign impact free agents (Carlos Beltran)
- Promote remaining top prospects (Alex Bregman in 2016, joining 2015 call-up Carlos Correa)
Result:
- 4 World Series appearances (2017, 2019, 2021, 2022)
- 2 championships (2017, 2022)
- Sustained excellence (a winning record in every full season from 2015 through 2023)
The key: Clear strategy, organizational commitment, and analytics-driven execution at every phase.
Building competitive rosters requires synthesizing complex financial, performance, and strategic data into actionable decisions. Interactive visualizations transform static spreadsheets into dynamic decision-support tools that help front offices evaluate trade-offs, identify market inefficiencies, and optimize resource allocation. This section introduces three essential interactive tools for modern roster construction: payroll efficiency scatter plots, WAR distribution sunburst charts, and free agent valuation calculators.
Interactive Payroll vs Wins Analysis
Understanding the relationship between spending and winning is fundamental to roster construction. An interactive scatter plot allows teams to benchmark their efficiency against competitors, identify outliers, and explore how different spending levels correlate with success.
Let's build an interactive payroll vs wins visualization with team-specific details:
library(tidyverse)
library(plotly)
library(Lahman)
# Create comprehensive payroll-wins dataset
# Note: This uses simulated 2023 data; replace with actual payroll data
create_payroll_analysis <- function(year = 2023) {
# Get team wins for the year
team_records <- Teams %>%
filter(yearID == year) %>%
select(teamID, name, W, L, G) %>%
mutate(
win_pct = W / (W + L),
team_abbr = teamID
)
# Simulated payroll data (in millions) - replace with actual data
payroll_data <- tibble(
teamID = c("NYA", "LAN", "NYN", "PHI", "SDN", "BOS", "HOU", "ATL", # Lahman uses SDN for San Diego
"TOR", "SFN", "CHN", "TEX", "SEA", "LAA", "STL", "MIN", # and LAA for the Angels
"DET", "COL", "ARI", "MIA", "CHA", "CIN", "MIL", "CLE",
"KCA", "PIT", "WAS", "TBA", "OAK", "BAL"),
payroll = c(280, 265, 240, 235, 230, 215, 205, 195,
190, 185, 180, 175, 170, 165, 160, 155,
150, 145, 140, 135, 130, 125, 120, 115,
110, 105, 100, 90, 85, 80)
)
# Combine datasets
analysis_data <- team_records %>%
left_join(payroll_data, by = "teamID") %>%
filter(!is.na(payroll)) %>%
mutate(
cost_per_win = payroll / W,
playoff_team = W >= 87, # Approximate playoff threshold
efficiency_category = case_when(
cost_per_win < 1.5 ~ "High Efficiency",
cost_per_win < 2.5 ~ "Average Efficiency",
TRUE ~ "Low Efficiency"
)
)
return(analysis_data)
}
# Generate data and create interactive plot
team_data <- create_payroll_analysis(2023)
# Create interactive scatter plot
p <- plot_ly(team_data,
x = ~payroll,
y = ~W,
type = 'scatter',
mode = 'markers',
color = ~playoff_team,
colors = c("FALSE" = "lightblue", "TRUE" = "darkgreen"),
marker = list(size = 12, opacity = 0.7),
text = ~paste("<b>", name, "</b>",
"<br>Payroll: $", round(payroll, 0), "M",
"<br>Wins:", W,
"<br>Cost/Win: $", sprintf("%.2f", cost_per_win), "M",
"<br>Win %:", sprintf("%.3f", win_pct)),
hoverinfo = 'text') %>%
add_trace(
inherit = FALSE, # don't inherit the per-team color/text mappings from plot_ly()
type = 'scatter',
mode = 'lines',
x = c(80, 280),
y = c(70, 95), # Approximate trend line
line = list(color = 'red', dash = 'dash'),
name = 'Expected Wins',
showlegend = TRUE,
hoverinfo = 'skip'
) %>%
layout(
title = "MLB Payroll vs Performance (2023)",
xaxis = list(title = "Payroll ($M)"),
yaxis = list(title = "Wins"),
hovermode = 'closest',
legend = list(title = list(text = "Made Playoffs"))
)
# Add annotations for outliers
best_efficiency <- team_data %>%
filter(playoff_team) %>%
slice_min(cost_per_win, n = 1)
worst_efficiency <- team_data %>%
slice_max(cost_per_win, n = 1)
p <- p %>%
add_annotations(
x = best_efficiency$payroll,
y = best_efficiency$W,
text = paste0(best_efficiency$name, "<br>Best Value"),
showarrow = TRUE,
arrowhead = 2,
ax = 30,
ay = -40
) %>%
add_annotations(
x = worst_efficiency$payroll,
y = worst_efficiency$W,
text = paste0(worst_efficiency$name, "<br>Worst Value"),
showarrow = TRUE,
arrowhead = 2,
ax = -30,
ay = 40
)
p
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from pybaseball import lahman
# Load team data
teams = lahman.teams()
# Filter for recent season
team_records = teams[teams['yearID'] == 2023][['teamID', 'name', 'W', 'L', 'G']].copy()
team_records['win_pct'] = team_records['W'] / (team_records['W'] + team_records['L'])
# Simulated payroll data (replace with actual data)
payroll_data = pd.DataFrame({
'teamID': ['NYA', 'LAN', 'NYN', 'PHI', 'SDN', 'BOS', 'HOU', 'ATL',  # Lahman uses SDN for San Diego
'TOR', 'SFN', 'CHN', 'TEX', 'SEA', 'LAA', 'STL', 'MIN',  # and LAA for the Angels
'DET', 'COL', 'ARI', 'MIA', 'CHA', 'CIN', 'MIL', 'CLE',
'KCA', 'PIT', 'WAS', 'TBA', 'OAK', 'BAL'],
'payroll': [280, 265, 240, 235, 230, 215, 205, 195,
190, 185, 180, 175, 170, 165, 160, 155,
150, 145, 140, 135, 130, 125, 120, 115,
110, 105, 100, 90, 85, 80]
})
# Merge datasets
team_data = team_records.merge(payroll_data, on='teamID', how='left')
team_data = team_data.dropna(subset=['payroll'])
# Calculate metrics
team_data['cost_per_win'] = team_data['payroll'] / team_data['W']
team_data['playoff_team'] = team_data['W'] >= 87
team_data['efficiency_category'] = pd.cut(
team_data['cost_per_win'],
bins=[0, 1.5, 2.5, np.inf],
labels=['High Efficiency', 'Average Efficiency', 'Low Efficiency']
)
# Create hover text
team_data['hover_text'] = (
'<b>' + team_data['name'] + '</b><br>' +
'Payroll: $' + team_data['payroll'].round(0).astype(str) + 'M<br>' +
'Wins: ' + team_data['W'].astype(str) + '<br>' +
'Cost/Win: $' + team_data['cost_per_win'].round(2).astype(str) + 'M<br>' +
'Win %: ' + team_data['win_pct'].round(3).astype(str)
)
# Create interactive scatter plot
fig = px.scatter(
    team_data,
    x='payroll',
    y='W',
    color='playoff_team',
    color_discrete_map={True: 'darkgreen', False: 'lightblue'},
    # Pass hover text per point so it stays aligned within each color trace
    custom_data=['hover_text'],
    labels={'payroll': 'Payroll ($M)', 'W': 'Wins', 'playoff_team': 'Made Playoffs'},
    title='MLB Payroll vs Performance (2023)'
)
# Update traces with custom hover text
fig.update_traces(
    marker=dict(size=12, opacity=0.7),
    hovertemplate='%{customdata[0]}<extra></extra>'
)
# Add trend line
z = np.polyfit(team_data['payroll'], team_data['W'], 1)
p = np.poly1d(z)
x_trend = np.linspace(team_data['payroll'].min(), team_data['payroll'].max(), 100)
y_trend = p(x_trend)
fig.add_trace(
go.Scatter(
x=x_trend,
y=y_trend,
mode='lines',
name='Expected Wins',
line=dict(color='red', dash='dash'),
showlegend=True,
hoverinfo='skip'
)
)
# Add annotations for best and worst value
best_value = team_data[team_data['playoff_team']].nsmallest(1, 'cost_per_win').iloc[0]
worst_value = team_data.nlargest(1, 'cost_per_win').iloc[0]
fig.add_annotation(
x=best_value['payroll'],
y=best_value['W'],
text=f"{best_value['name']}<br>Best Value",
showarrow=True,
arrowhead=2,
ax=30,
ay=-40
)
fig.add_annotation(
x=worst_value['payroll'],
y=worst_value['W'],
text=f"{worst_value['name']}<br>Worst Value",
showarrow=True,
arrowhead=2,
ax=-30,
ay=40
)
fig.update_layout(
hovermode='closest',
height=600,
xaxis_title='Payroll ($M)',
yaxis_title='Wins',
legend_title='Made Playoffs'
)
fig.show()
This interactive visualization immediately reveals market inefficiencies. Teams above the trend line (more wins than expected for their payroll) demonstrate superior roster construction or player development. Teams below the line overpaid relative to results. The ability to hover over individual teams provides context that static charts miss.
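The "above or below the trend line" reading can be made explicit by computing each team's residual: actual wins minus the wins a linear fit predicts for its payroll. The sketch below uses made-up team labels and numbers purely for illustration; only `np.polyfit` is taken from the plotting code above.

```python
import numpy as np

# Hypothetical teams: payroll in $M and season wins (illustrative values only)
teams = ["Team A", "Team B", "Team C", "Team D", "Team E", "Team F", "Team G"]
payroll = np.array([280, 235, 190, 150, 120, 90, 80], dtype=float)
wins = np.array([96, 88, 84, 75, 90, 68, 80], dtype=float)

# Fit wins = slope * payroll + intercept by least squares
slope, intercept = np.polyfit(payroll, wins, 1)
expected = slope * payroll + intercept
residuals = wins - expected  # positive = more wins than payroll predicts

# Rank teams from most to least efficient relative to the trend
ranked = sorted(zip(teams, residuals), key=lambda pair: pair[1], reverse=True)
for name, resid in ranked:
    print(f"{name}: {resid:+.1f} wins vs. trend")
```

Residuals from a single season mix roster-building skill with luck and injuries, so a multi-year average would be a more cautious basis for calling a team efficient.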
WAR Distribution Sunburst Chart
Understanding how value is distributed across a roster is critical for identifying construction weaknesses and allocation opportunities. A sunburst chart provides hierarchical visualization of team WAR distributed by position and individual players.
library(tidyverse)
library(plotly)
library(Lahman)
# Create WAR distribution by team and position
create_war_sunburst <- function(team_abbr = "HOU", year = 2023) {
  # Get player batting value for the team
  # Note: Using a simplified WAR calculation - replace with actual WAR data.
  # The placeholder below is not on a true WAR scale.
  batting_war <- Batting %>%
    filter(teamID == team_abbr, yearID == year, AB >= 50) %>%
    left_join(People %>% select(playerID, nameFirst, nameLast),
              by = "playerID") %>%
    mutate(
      player_name = paste(nameFirst, nameLast),
      # Simplified WAR calculation (replace with actual)
      war_approx = ((H + BB) / (AB + BB) - 0.320) * AB / 60 +
        (HR * 0.5) + (SB * 0.2) - (CS * 0.4),
      war = pmax(0, war_approx)  # Keep positive only
    ) %>%
    select(playerID, player_name, war)
  # Get positional data: one primary position per player (most games played)
  fielding_pos <- Fielding %>%
    filter(teamID == team_abbr, yearID == year) %>%
    group_by(playerID) %>%
    slice_max(G, n = 1, with_ties = FALSE) %>%
    ungroup() %>%
    select(playerID, POS) %>%
    mutate(
      position_group = case_when(
        POS %in% c("C", "1B", "2B", "3B", "SS") ~ "Infield",
        # Lahman's Fielding table typically codes outfielders as "OF"
        POS %in% c("OF", "LF", "CF", "RF") ~ "Outfield",
        POS == "DH" ~ "DH",
        TRUE ~ "Pitcher"
      )
    )
  # Combine data
  war_data <- batting_war %>%
    left_join(fielding_pos, by = "playerID") %>%
    filter(!is.na(POS)) %>%
    group_by(position_group, POS) %>%
    arrange(desc(war)) %>%
    mutate(rank = row_number()) %>%
    filter(rank <= 3) %>%  # Top 3 per position
    ungroup()
  # Create hierarchical data for sunburst
  # Level 1: Position groups
  level1 <- war_data %>%
    group_by(position_group) %>%
    summarise(war = sum(war)) %>%
    mutate(
      labels = position_group,
      parents = "",
      values = war
    )
  # Level 2: Specific positions
  level2 <- war_data %>%
    group_by(position_group, POS) %>%
    summarise(war = sum(war), .groups = 'drop') %>%
    mutate(
      labels = POS,
      parents = position_group,
      values = war
    )
  # Level 3: Individual players
  level3 <- war_data %>%
    mutate(
      labels = player_name,
      parents = POS,
      values = war
    ) %>%
    select(labels, parents, values)
  # Combine all levels
  sunburst_data <- bind_rows(level1, level2, level3) %>%
    select(labels, parents, values)
  # Create sunburst plot
  fig <- plot_ly(
    labels = sunburst_data$labels,
    parents = sunburst_data$parents,
    values = sunburst_data$values,
    type = 'sunburst',
    branchvalues = "total",
    hovertemplate = '<b>%{label}</b><br>WAR: %{value:.1f}<extra></extra>'
  ) %>%
    layout(
      title = paste0(team_abbr, " WAR Distribution by Position (", year, ")"),
      margin = list(l = 0, r = 0, t = 50, b = 0)
    )
  return(fig)
}
# Create visualization
fig <- create_war_sunburst("HOU", 2023)
fig
import pandas as pd
import plotly.graph_objects as go
from pybaseball import lahman
def create_war_sunburst(team_abbr='HOU', year=2023):
    """
    Create sunburst chart showing WAR distribution by position
    Parameters:
    -----------
    team_abbr : str
        Lahman team ID (e.g., 'HOU', 'LAN')
    year : int
        Season year
    """
    # Load data
    batting = lahman.batting()
    fielding = lahman.fielding()
    people = lahman.people()
    # Filter batting data for team
    team_batting = batting[
        (batting['teamID'] == team_abbr) &
        (batting['yearID'] == year) &
        (batting['AB'] >= 50)
    ].copy()
    # Merge with player names
    team_batting = team_batting.merge(
        people[['playerID', 'nameFirst', 'nameLast']],
        on='playerID'
    )
    team_batting['player_name'] = (team_batting['nameFirst'] + ' ' +
                                   team_batting['nameLast'])
    # Calculate simplified WAR (replace with actual WAR data).
    # This placeholder is not on a true WAR scale.
    team_batting['war_approx'] = (
        ((team_batting['H'] + team_batting['BB']) /
         (team_batting['AB'] + team_batting['BB']) - 0.320) *
        team_batting['AB'] / 60 +
        (team_batting['HR'] * 0.5) +
        (team_batting['SB'] * 0.2) -
        (team_batting['CS'] * 0.4)
    )
    team_batting['war'] = team_batting['war_approx'].clip(lower=0)
    # Get primary position (most games played) for each player
    team_fielding = fielding[
        (fielding['teamID'] == team_abbr) &
        (fielding['yearID'] == year)
    ].copy()
    primary_pos = team_fielding.loc[
        team_fielding.groupby('playerID')['G'].idxmax()
    ][['playerID', 'POS']].copy()
    # Add position grouping
    def classify_position(pos):
        if pos in ['C', '1B', '2B', '3B', 'SS']:
            return 'Infield'
        # Lahman's Fielding table typically codes outfielders as 'OF'
        elif pos in ['OF', 'LF', 'CF', 'RF']:
            return 'Outfield'
        elif pos == 'DH':
            return 'DH'
        else:
            return 'Pitcher'
    primary_pos['position_group'] = primary_pos['POS'].apply(classify_position)
    # Merge with WAR data
    war_data = team_batting.merge(primary_pos, on='playerID', how='left')
    war_data = war_data.dropna(subset=['POS'])
    # Get top 3 players per position (method='first' breaks ties, so at most 3 remain)
    war_data['rank'] = war_data.groupby('POS')['war'].rank(
        method='first', ascending=False
    )
    war_data = war_data[war_data['rank'] <= 3]
    # Build hierarchical data
    # Level 1: Position groups
    level1 = war_data.groupby('position_group')['war'].sum().reset_index()
    level1['labels'] = level1['position_group']
    level1['parents'] = ''
    level1['values'] = level1['war']
    # Level 2: Specific positions
    level2 = war_data.groupby(['position_group', 'POS'])['war'].sum().reset_index()
    level2['labels'] = level2['POS']
    level2['parents'] = level2['position_group']
    level2['values'] = level2['war']
    # Level 3: Individual players
    level3 = war_data[['player_name', 'POS', 'war']].copy()
    level3['labels'] = level3['player_name']
    level3['parents'] = level3['POS']
    level3['values'] = level3['war']
    # Combine all levels
    sunburst_data = pd.concat([
        level1[['labels', 'parents', 'values']],
        level2[['labels', 'parents', 'values']],
        level3[['labels', 'parents', 'values']]
    ], ignore_index=True)
    # Create sunburst chart
    fig = go.Figure(go.Sunburst(
        labels=sunburst_data['labels'],
        parents=sunburst_data['parents'],
        values=sunburst_data['values'],
        branchvalues="total",
        hovertemplate='<b>%{label}</b><br>WAR: %{value:.1f}<extra></extra>'
    ))
    fig.update_layout(
        title=f'{team_abbr} WAR Distribution by Position ({year})',
        margin=dict(l=0, r=0, t=50, b=0),
        height=600
    )
    return fig
# Create visualization
fig = create_war_sunburst('HOU', 2023)
fig.show()
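The sunburst chart provides immediate visual feedback on roster balance. A well-constructed team shows a relatively even distribution of value across position groups rather than relying on one or two players. The hierarchical structure lets analysts drill down from position groups (Infield, Outfield) to specific positions (SS, CF) to individual contributors. Note that this example derives value from batting statistics only and keeps the top three players at each position, so it does not capture pitching value; a full analysis would substitute actual WAR data for hitters and pitchers.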
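Roster balance can also be summarized numerically. As a rough sketch, assuming a small hypothetical roster (the players, groups, and WAR values are invented, and `group_shares` and `top_n_share` are illustrative helpers), the code below computes each position group's share of team WAR and the share held by the two most valuable players.

```python
# Hypothetical (player, position group, WAR) rows, invented for illustration only
roster = [
    ("Player A", "Infield", 5.0),
    ("Player B", "Infield", 3.0),
    ("Player C", "Outfield", 4.0),
    ("Player D", "Outfield", 2.0),
    ("Player E", "Pitcher", 6.0),
]


def group_shares(players):
    """Fraction of total team WAR contributed by each position group."""
    totals = {}
    for _, group, war in players:
        totals[group] = totals.get(group, 0.0) + war
    team_total = sum(totals.values())
    return {group: war / team_total for group, war in totals.items()}


def top_n_share(players, n=2):
    """Fraction of total team WAR held by the n most valuable players."""
    wars = sorted((war for _, _, war in players), reverse=True)
    return sum(wars[:n]) / sum(wars)


shares = group_shares(roster)
top2 = top_n_share(roster, n=2)

for group, share in shares.items():
    print(f"{group}: {share:.0%} of team WAR")
print(f"Top 2 players: {top2:.0%} of team WAR")
```

A high top-two share suggests the kind of over-reliance that the sunburst makes visible as one or two oversized wedges. Both helpers assume total WAR is positive.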
Free Agent Valuation Calculator
Front offices need tools to quickly evaluate whether free agent contracts represent good value. An interactive calculator allows users to input player projections and contract terms, instantly seeing the financial implications and comparing against market benchmarks.
library(tidyverse)
library(plotly)
# Free agent valuation calculator with visualization
create_fa_calculator <- function() {
  # Define market parameters
  market_rate_per_war <- 12  # Million dollars per WAR
  discount_rate <- 0.05      # 5% annual discount rate
  # Example contract scenarios
  contracts <- tibble(
    player = c("Star Player", "Mid-Tier Player", "Value Signing",
               "Overpay Example", "Young FA"),
    years = c(6, 4, 2, 5, 7),
    aav = c(35, 18, 8, 22, 28),  # Average annual value (millions)
    year_1_war = c(5.5, 3.5, 2.0, 2.5, 4.5),
    annual_decline = c(0.3, 0.25, 0.15, 0.35, 0.2)
  )
  # Calculate projected WAR and present values for one contract
  calculate_contract_value <- function(years, year_1_war, annual_decline,
                                       aav, discount_rate, market_rate) {
    # Linear decline from year-1 WAR, floored at zero
    war_by_year <- pmax(0, year_1_war - (seq_len(years) - 1) * annual_decline)
    # Calculate present value (the first season is undiscounted)
    discount_factors <- 1 / ((1 + discount_rate) ^ (seq_len(years) - 1))
    pv_war <- sum(war_by_year * discount_factors)
    pv_salary <- sum(rep(aav, years) * discount_factors)
    # Market value
    market_value <- pv_war * market_rate
    # Surplus/deficit
    surplus <- market_value - pv_salary
    return(list(
      total_war = sum(war_by_year),
      pv_war = pv_war,
      pv_salary = pv_salary,
      market_value = market_value,
      surplus = surplus,
      cost_per_war = pv_salary / pv_war
    ))
  }
  # Calculate for all contracts
  contract_analysis <- contracts %>%
    rowwise() %>%
    mutate(
      results = list(calculate_contract_value(
        years, year_1_war, annual_decline, aav, discount_rate, market_rate_per_war
      ))
    ) %>%
    ungroup() %>%
    unnest_wider(results) %>%
    mutate(
      total_salary = aav * years,
      value_category = case_when(
        surplus > 20 ~ "Excellent Value",
        surplus > 0 ~ "Good Value",
        surplus > -20 ~ "Market Rate",
        TRUE ~ "Overpay"
      )
    )
  # Create interactive bar chart
  # hovertext (rather than text) keeps the details in the tooltip, off the bars
  fig <- plot_ly(contract_analysis,
                 x = ~player,
                 y = ~surplus,
                 type = 'bar',
                 color = ~value_category,
                 colors = c("Excellent Value" = "darkgreen",
                            "Good Value" = "lightgreen",
                            "Market Rate" = "yellow",
                            "Overpay" = "red"),
                 hovertext = ~paste("<b>", player, "</b>",
                                    "<br>Contract:", years, "years @ $", aav, "M/yr",
                                    "<br>Total Salary: $", round(total_salary, 0), "M",
                                    "<br>PV Salary: $", round(pv_salary, 1), "M",
                                    "<br>Projected WAR:", round(total_war, 1),
                                    "<br>PV WAR:", round(pv_war, 1),
                                    "<br>Market Value: $", round(market_value, 1), "M",
                                    "<br>Surplus: $", round(surplus, 1), "M",
                                    "<br>Cost/WAR: $", round(cost_per_war, 1), "M"),
                 hoverinfo = 'text') %>%
    layout(
      title = "Free Agent Contract Valuation Analysis",
      xaxis = list(title = ""),
      yaxis = list(title = "Surplus Value ($M)"),
      hovermode = 'closest',
      showlegend = TRUE,
      # Dashed break-even line at zero surplus, drawn as a layout shape
      # (the R plotly package has no add_hline() function)
      shapes = list(list(
        type = "line",
        xref = "paper", x0 = 0, x1 = 1,
        yref = "y", y0 = 0, y1 = 0,
        line = list(color = "black", dash = "dash")
      ))
    )
  return(list(data = contract_analysis, plot = fig))
}
# Run calculator
results <- create_fa_calculator()
results$plot
print(results$data)
import pandas as pd
import plotly.express as px
def calculate_contract_value(years, year_1_war, annual_decline, aav,
                             discount_rate=0.05, market_rate=12):
    """
    Calculate present value and surplus for a free agent contract
    Parameters:
    -----------
    years : int
        Contract length
    year_1_war : float
        Projected WAR in first year
    annual_decline : float
        Expected annual WAR decline
    aav : float
        Average annual value (millions)
    discount_rate : float
        Annual discount rate
    market_rate : float
        Market rate per WAR (millions)
    """
    # Project WAR for each year (linear decline, floored at zero)
    war_by_year = [max(0, year_1_war - i * annual_decline) for i in range(years)]
    # Calculate present values (the first season is undiscounted)
    discount_factors = [1 / ((1 + discount_rate) ** i) for i in range(years)]
    pv_war = sum(w * d for w, d in zip(war_by_year, discount_factors))
    pv_salary = sum(aav * d for d in discount_factors)
    # Calculate surplus
    market_value = pv_war * market_rate
    surplus = market_value - pv_salary
    return {
        'total_war': sum(war_by_year),
        'pv_war': pv_war,
        'pv_salary': pv_salary,
        'market_value': market_value,
        'surplus': surplus,
        # Infinite cost per WAR when no value is projected (matches the R version)
        'cost_per_war': pv_salary / pv_war if pv_war > 0 else float('inf')
    }
def create_fa_calculator():
    """Create interactive free agent valuation calculator"""
    # Example contract scenarios
    contracts = pd.DataFrame({
        'player': ['Star Player', 'Mid-Tier Player', 'Value Signing',
                   'Overpay Example', 'Young FA'],
        'years': [6, 4, 2, 5, 7],
        'aav': [35, 18, 8, 22, 28],
        'year_1_war': [5.5, 3.5, 2.0, 2.5, 4.5],
        'annual_decline': [0.3, 0.25, 0.15, 0.35, 0.2]
    })
    # Calculate values for each contract
    results = []
    for _, row in contracts.iterrows():
        calc = calculate_contract_value(
            int(row['years']),  # range() requires an integer
            row['year_1_war'], row['annual_decline'], row['aav']
        )
        results.append(calc)
    # Add results to dataframe
    results_df = pd.DataFrame(results)
    contract_analysis = pd.concat([contracts, results_df], axis=1)
    # Calculate total salary and categorize
    contract_analysis['total_salary'] = (contract_analysis['aav'] *
                                         contract_analysis['years'])
    def categorize_value(surplus):
        if surplus > 20:
            return 'Excellent Value'
        elif surplus > 0:
            return 'Good Value'
        elif surplus > -20:
            return 'Market Rate'
        else:
            return 'Overpay'
    contract_analysis['value_category'] = (contract_analysis['surplus']
                                           .apply(categorize_value))
    # Create hover text
    contract_analysis['hover_text'] = (
        '<b>' + contract_analysis['player'] + '</b><br>' +
        'Contract: ' + contract_analysis['years'].astype(str) + ' years @ $' +
        contract_analysis['aav'].astype(str) + 'M/yr<br>' +
        'Total Salary: $' + contract_analysis['total_salary'].round(0).astype(str) + 'M<br>' +
        'PV Salary: $' + contract_analysis['pv_salary'].round(1).astype(str) + 'M<br>' +
        'Projected WAR: ' + contract_analysis['total_war'].round(1).astype(str) + '<br>' +
        'PV WAR: ' + contract_analysis['pv_war'].round(1).astype(str) + '<br>' +
        'Market Value: $' + contract_analysis['market_value'].round(1).astype(str) + 'M<br>' +
        'Surplus: $' + contract_analysis['surplus'].round(1).astype(str) + 'M<br>' +
        'Cost/WAR: $' + contract_analysis['cost_per_war'].round(1).astype(str) + 'M'
    )
    # Create bar chart
    color_map = {
        'Excellent Value': 'darkgreen',
        'Good Value': 'lightgreen',
        'Market Rate': 'yellow',
        'Overpay': 'red'
    }
    # custom_data attaches each row's hover text to its own bar; px.bar creates
    # one trace per category, so assigning the full column to every trace
    # afterwards would mismatch tooltips and bars
    fig = px.bar(
        contract_analysis,
        x='player',
        y='surplus',
        color='value_category',
        color_discrete_map=color_map,
        custom_data=['hover_text'],
        title='Free Agent Contract Valuation Analysis',
        labels={'player': '', 'surplus': 'Surplus Value ($M)'}
    )
    fig.update_traces(
        hovertemplate='%{customdata[0]}<extra></extra>'
    )
    fig.add_hline(
        y=0,
        line_dash="dash",
        line_color="black",
        annotation_text="Break-even"
    )
    fig.update_layout(
        hovermode='closest',
        height=600,
        showlegend=True,
        legend_title='Value Assessment'
    )
    return contract_analysis, fig
# Run calculator
contract_analysis, fig = create_fa_calculator()
fig.show()
print("\nContract Analysis Summary:")
print(contract_analysis[['player', 'years', 'aav', 'total_war', 'surplus',
                         'cost_per_war', 'value_category']])
This free agent calculator turns abstract contract negotiations into concrete value assessments. By visualizing surplus value (the market value of projected production minus the contract's cost), front offices can quickly see which signings create value and which destroy it. The tool approximates aging with a constant annual decline in WAR and discounts both future production and future salary at 5% per year, so later seasons count for less than the first.
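One way to read the calculator is to ask what salary would make a contract exactly break even. Under the same assumptions as the code above (linear WAR decline, a flat AAV, and WAR and salary discounted at the same rate), setting surplus to zero gives a break-even AAV equal to the market rate times discounted WAR divided by the sum of the discount factors. The function `break_even_aav` below is a hypothetical helper introduced for illustration, not part of the calculator itself.

```python
def break_even_aav(years, year_1_war, annual_decline,
                   discount_rate=0.05, market_rate=12.0):
    """AAV (in $M) at which discounted market value equals discounted salary.

    Assumes linear WAR decline floored at zero and a flat salary each year,
    mirroring the calculator's simplifications.
    """
    war_by_year = [max(0.0, year_1_war - i * annual_decline) for i in range(years)]
    discount_factors = [1 / (1 + discount_rate) ** i for i in range(years)]
    pv_war = sum(w * d for w, d in zip(war_by_year, discount_factors))
    # surplus = market_rate * pv_war - aav * sum(discount_factors) = 0
    return market_rate * pv_war / sum(discount_factors)


# The "Value Signing" scenario: 2 years, 2.0 WAR in year one, 0.15 WAR decline
value_signing_break_even = break_even_aav(years=2, year_1_war=2.0, annual_decline=0.15)
print(f"Break-even AAV: ${value_signing_break_even:.1f}M (actual AAV in the example: $8M)")
```

An AAV below the break-even figure produces positive surplus, and one above it produces a deficit. With no decline and no discounting, the break-even AAV reduces to the market rate times annual WAR.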
These interactive roster tools empower data-driven decision-making by making complex analyses accessible and intuitive. Teams can explore different scenarios, benchmark against competitors, and identify inefficiencies that create competitive advantages. As roster construction becomes increasingly sophisticated, interactive visualization becomes essential for translating analytics into action.
Exercise 14.1: Free Agent Cost Analysis
Using 2024 free agent data, calculate cost per WAR for at least 10 free agent signings. Then:
a) Compare cost per WAR across different position groups (pitchers vs hitters, premium positions vs corner positions)
b) Analyze whether older players (33+) cost more or less per WAR than younger free agents (28-30)
c) Identify which signing appears most efficient (best value) and least efficient (worst value)
Data to collect:
- Player name, position, age
- Contract terms (years, AAV)
- Projected WAR for first year (use Steamer or ZiPS)
Hint: Check FanGraphs or Baseball Prospectus for a free agent tracker and projections.
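One possible starting point for this exercise, sketched in pandas. The rows are made-up placeholders rather than real signings, and the column names (`pos_group`, `proj_war`, `age_group`) are assumptions chosen for illustration; swap in the data you collect.

```python
import pandas as pd

# Placeholder rows (NOT real contracts); replace with the signings you collect.
signings = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "pos_group": ["pitcher", "pitcher", "hitter", "hitter"],
    "age": [29, 34, 28, 33],
    "aav": [20.0, 12.0, 18.0, 10.5],    # $M per year
    "proj_war": [2.5, 1.0, 3.0, 1.5],   # first-year projection
})

# Cost per WAR in $M, using first-year projected WAR as the denominator.
signings["cost_per_war"] = signings["aav"] / signings["proj_war"]

# (a) Pool dollars and WAR within each group before dividing, so a single
# low-WAR signing does not dominate the group figure.
by_group = signings.groupby("pos_group")[["aav", "proj_war"]].sum()
by_group["cost_per_war"] = by_group["aav"] / by_group["proj_war"]

# (b) Age buckets; the cut points are an assumption based on the exercise text.
signings["age_group"] = pd.cut(
    signings["age"], bins=[0, 30, 32, 60],
    labels=["30 and under", "31-32", "33+"],
)
by_age = signings.groupby("age_group", observed=True)[["aav", "proj_war"]].sum()
by_age["cost_per_war"] = by_age["aav"] / by_age["proj_war"]

# (c) Most and least efficient signings by cost per WAR.
best = signings.loc[signings["cost_per_war"].idxmin(), "player"]
worst = signings.loc[signings["cost_per_war"].idxmax(), "player"]

print(by_group)
print(by_age)
print("Best value:", best, "| Worst value:", worst)
```

Dividing pooled salary by pooled WAR is one design choice among several; averaging each player's individual ratio gives a different answer and is more sensitive to players projected near zero WAR.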
Exercise 14.2: Trade Surplus Value
Evaluate a recent blockbuster trade using surplus value analysis. Choose a trade from the past 2 years involving multiple players.
Your analysis should:
a) Calculate total surplus value for each side of the trade (for each season of team control, projected WAR × market rate - expected salary, summed over the years of control)
b) Apply discount rates to future value (use 5-10%)
c) Determine which team "won" the trade based on surplus value
d) Discuss how competitive windows might make the trade beneficial for both sides despite unequal surplus value
Suggested trades:
- Juan Soto to Padres (2022)
- Tyler Glasnow to Dodgers (2023)
- Dylan Cease to Padres (2024)
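The surplus calculation in parts (a) and (b) can be sketched as a small helper. The function name `discounted_surplus`, the $9M per WAR market rate, and the sample inputs are all assumptions for illustration; use the market rate you estimated earlier in the chapter and real projections for the trade you choose.

```python
def discounted_surplus(proj_war, salaries, market_rate, discount_rate=0.08):
    """Present value of (WAR x market rate - salary) across the control years.

    proj_war and salaries are per-season lists (salaries in $M).
    The first season is treated as the present and is not discounted.
    """
    if len(proj_war) != len(salaries):
        raise ValueError("proj_war and salaries must cover the same seasons")
    total = 0.0
    for t, (war, salary) in enumerate(zip(proj_war, salaries)):
        season_surplus = war * market_rate - salary
        total += season_surplus / (1 + discount_rate) ** t
    return total


# Hypothetical player: two control years, placeholder WAR and salary figures.
surplus = discounted_surplus(
    proj_war=[4.0, 3.5],
    salaries=[10.0, 15.0],
    market_rate=9.0,       # assumed $M per WAR
    discount_rate=0.08,    # within the 5-10% range suggested above
)
print(f"Discounted surplus: ${surplus:.1f}M")
```

Summing this value over every player on each side gives the two totals to compare in part (c). Prospects need an extra step, since their projected WAR should be weighted by the chance they reach the majors at all.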
Exercise 14.3: Draft Pick Value Curve
Using Baseball Reference or FanGraphs, collect data on draft picks from a single year (2014-2016 is suggested, to allow development time).
For picks 1-30:
a) Calculate what percentage reached MLB (100+ PA or 50+ IP)
b) For those who reached MLB, calculate total career WAR through 2023
c) Build a value curve showing expected WAR by draft position
d) Identify which picks outperformed or underperformed expectations
Extension: Compare college vs high school players. Do high school picks have higher variance? Higher ceiling?
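A minimal sketch of the value-curve calculation, assuming the collected data has one row per pick. The records below are placeholders, not real draft results, and the ten-pick tiers and column names are illustrative choices; with a full 30-pick dataset you could instead smooth or fit a curve across individual pick numbers.

```python
import pandas as pd

# Placeholder records (NOT real draft outcomes); replace with collected data.
draft = pd.DataFrame({
    "pick": [1, 2, 3, 11, 12, 13, 21, 22, 23],
    "reached_mlb": [True, True, False, True, False, True, False, True, False],
    "career_war": [20.0, 8.0, 0.0, 5.0, 0.0, 1.0, 0.0, 3.0, 0.0],
})

# Group picks into tiers so a handful of picks does not define the curve.
draft["tier"] = pd.cut(
    draft["pick"], bins=[0, 10, 20, 30], labels=["1-10", "11-20", "21-30"]
)

curve = draft.groupby("tier", observed=True).agg(
    picks=("pick", "size"),
    mlb_rate=("reached_mlb", "mean"),      # (a) share reaching MLB
    expected_war=("career_war", "mean"),   # (c) non-MLB picks count as 0 WAR
)

# (b) Average career WAR among only the picks who reached MLB.
curve["war_if_mlb"] = (
    draft[draft["reached_mlb"]]
    .groupby("tier", observed=True)["career_war"]
    .mean()
)

# (d) Each pick's career WAR relative to the average for its tier.
draft["vs_expected"] = draft["career_war"] - draft.groupby(
    "tier", observed=True
)["career_war"].transform("mean")

print(curve)
print(draft.sort_values("vs_expected", ascending=False))
```

Counting picks who never reached MLB as zero WAR keeps the curve an estimate of what a draft slot returns on average, rather than what successful picks return.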
Exercise 14.4: Competitive Window Modeling
Choose a current MLB team and project their competitive window:
a) Identify core players and project their WAR trajectory over the next 5 years using aging curves
b) Estimate prospect contribution (consult top prospect lists)
c) Calculate total projected WAR and expected wins for each season
d) Determine the team's optimal strategy: compete now, rebuild, or middle ground
e) Recommend specific roster moves (trades, free agent signings, or sell-offs) that align with your recommended strategy
Teams with interesting situations:
- Baltimore Orioles (young core, rising)
- St. Louis Cardinals (aging core, crossroads)
- Los Angeles Angels (Trout aging, weak farm)
- Tampa Bay Rays (perennial contender, low payroll)
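A rough sketch of how parts (a) and (c) might be wired together. Everything numeric here is an assumption for illustration: the step-shaped aging adjustment in `aging_delta` is hypothetical, not a fitted aging curve; `REPLACEMENT_WINS = 48` is an approximate baseline for a replacement-level roster; and the core players and `other_war` value are placeholders.

```python
REPLACEMENT_WINS = 48  # assumed wins for a replacement-level roster over 162 games


def aging_delta(age):
    """Hypothetical change in WAR entering the season played at `age`."""
    if age < 27:
        return 0.25   # still improving
    if age < 30:
        return 0.0    # peak plateau
    return -0.5       # decline phase


def project_war(current_war, age, years=5):
    """Project WAR for `years` seasons, starting with the current season."""
    path = []
    war = current_war
    for offset in range(years):
        if offset > 0:
            war = max(0.0, war + aging_delta(age + offset))
        path.append(war)
    return path


def projected_wins(core, other_war, years=5):
    """Expected wins per season: baseline + rest of roster + core players.

    `other_war` (WAR from everyone outside the core) is held constant,
    which is a simplification; prospect estimates from part (b) belong here.
    """
    paths = [project_war(war, age, years) for war, age in core]
    return [
        REPLACEMENT_WINS + other_war + sum(path[y] for path in paths)
        for y in range(years)
    ]


# Placeholder core: (current WAR, current age) pairs, not real players.
core = [(4.0, 24), (5.0, 31)]
wins = projected_wins(core, other_war=25.0)
print(wins)
```

A flat or falling win projection with an aging core points toward selling; a rising one supports adding talent. Replacing `aging_delta` with the aging curves estimated earlier in the chapter is the first refinement to make.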
Chapter Summary
Team building combines economics, player valuation, strategic timing, and organizational philosophy. Key takeaways:
- Economic Efficiency: Pre-arbitration players provide 40x ROI vs free agents; exploit this arbitrage
- Positional Value: Premium defensive positions (C, SS, CF) allow lower offensive standards
- Free Agent Markets: Account for aging curves, apply discount rates, avoid winner's curse
- Trade Strategy: Exchange surplus value across different timelines; align with competitive windows
- Draft Philosophy: Balance upside (high school) vs safety (college) based on organizational timeline
- Strategic Clarity: Commit fully to competing or rebuilding; avoid mediocre middle ground
Successful team building requires analytical rigor, clear strategic vision, and disciplined execution. The best front offices combine quantitative analysis with qualitative evaluation, organizational development, and adaptive strategy. As analytics evolve, teams that integrate new methods while maintaining coherent long-term plans will sustain competitive advantage.