The minor league system serves as baseball's developmental pipeline, with five levels: Rookie ball (the complex leagues and the Dominican Summer League), Single-A (called Low-A in 2021 and in the code below), High-A, Double-A, and Triple-A. Each level presents distinct analytical challenges and opportunities.
System Structure and Data Availability
Modern minor league analytics benefit from increasingly robust data collection. Statcast-equivalent tracking systems have been installed in upper-minors ballparks since minor league play resumed in 2021 (the 2020 season was cancelled). They provide granular tracking data that was previously available only at the major league level, which enables more sophisticated player evaluation and development strategies.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Minor league level progression framework
class MinorLeagueSystem:
    def __init__(self):
        # Typical age range and an ordinal difficulty rank for each level
        self.levels = {
            'Rookie': {'age_range': (17, 21), 'difficulty': 1},
            'Low-A': {'age_range': (18, 22), 'difficulty': 2},
            'High-A': {'age_range': (19, 23), 'difficulty': 3},
            'AA': {'age_range': (20, 25), 'difficulty': 4},
            'AAA': {'age_range': (21, 28), 'difficulty': 5}
        }

    def calculate_age_adjusted_performance(self, stats_df):
        """
        Adjust performance metrics based on age relative to league average
        """
        stats_df = stats_df.copy()  # avoid mutating the caller's DataFrame
        stats_df['age_diff'] = stats_df['age'] - stats_df['league_avg_age']
        # Younger-than-average players (negative age_diff) get a boost:
        # higher wRC+ and BB%, lower K%
        stats_df['wRC+_age_adj'] = stats_df['wRC+'] + (stats_df['age_diff'] * -3)
        stats_df['K_pct_age_adj'] = stats_df['K_pct'] + (stats_df['age_diff'] * 0.5)
        stats_df['BB_pct_age_adj'] = stats_df['BB_pct'] + (stats_df['age_diff'] * -0.3)
        return stats_df

    def calculate_level_difficulty_factor(self, from_level, to_level):
        """
        Calculate difficulty increase between levels
        """
        difficulty_jump = (self.levels[to_level]['difficulty'] -
                           self.levels[from_level]['difficulty'])
        return 1 + (difficulty_jump * 0.15)  # ~15% difficulty per level

# Example: Analyzing a prospect's progression
prospect_data = pd.DataFrame({
    'level': ['Low-A', 'High-A', 'AA', 'AAA'],
    'age': [20, 21, 22, 23],
    'league_avg_age': [21.5, 22.8, 24.1, 26.3],
    'wRC+': [115, 108, 98, 105],
    'K_pct': [22.5, 24.1, 26.8, 24.2],
    'BB_pct': [8.5, 9.2, 10.1, 11.3],
    'ISO': [.185, .172, .158, .168]
})
system = MinorLeagueSystem()
adjusted_stats = system.calculate_age_adjusted_performance(prospect_data)
print("Age-Adjusted Performance Metrics:")
print(adjusted_stats[['level', 'age', 'wRC+', 'wRC+_age_adj']])
library(tidyverse)
library(ggplot2)
# Minor league progression analysis
analyze_minor_league_progression <- function(player_data) {
  # Calculate age-adjusted performance
  player_data <- player_data %>%
    mutate(
      age_diff = age - league_avg_age,
      wRC_plus_adj = wRC_plus + (age_diff * -3),
      # Ordinal level number used for plotting
      level_num = case_when(
        level == "Rookie" ~ 1,
        level == "Low-A" ~ 2,
        level == "High-A" ~ 3,
        level == "AA" ~ 4,
        level == "AAA" ~ 5
      )
    )
  return(player_data)
}
# Example: Bobby Witt Jr.'s minor league path. He played Rookie ball in 2019,
# had no 2020 minor league season, and split 2021 between Double-A and Triple-A.
# Stat values are illustrative, not his official stat lines.
witt_progression <- tibble(
  year = c(2019, 2021, 2021),
  level = c("Rookie", "AA", "AAA"),
  age = c(19, 21, 21),
  league_avg_age = c(20.1, 24.1, 26.3),
  wRC_plus = c(128, 142, 135),
  K_pct = c(25.8, 20.1, 22.4),
  BB_pct = c(9.1, 11.2, 10.8),
  ISO = c(.198, .246, .221)
)
witt_adjusted <- analyze_minor_league_progression(witt_progression)
# Visualize progression (linewidth replaces the deprecated line `size` in ggplot2 >= 3.4.0)
ggplot(witt_adjusted, aes(x = level_num, y = wRC_plus_adj)) +
  geom_line(linewidth = 1.2, color = "#004687") +
  geom_point(size = 4, color = "#BD9B60") +
  scale_x_continuous(breaks = 1:5,
                     labels = c("Rookie", "Low-A", "High-A", "AA", "AAA")) +
  labs(
    title = "Bobby Witt Jr. - Age-Adjusted Performance Progression",
    x = "Minor League Level",
    y = "Age-Adjusted wRC+",
    subtitle = "Accounting for age relative to league average"
  ) +
  theme_minimal()
Key Performance Indicators by Level
Different metrics carry different predictive weight at different levels. In the lower minors, raw tools and plate discipline metrics often matter more than results. As players advance, the ability to make consistent hard contact and handle advanced pitching becomes paramount.
Critical Metrics by Level:
- Rookie/Low-A: Walk rate, strikeout rate, exit velocity (where available)
- High-A: Contact quality, swing decisions, pitch recognition
- Double-A: Advanced metrics (xwOBA, hard-hit rate), platoon splits
- Triple-A: MLB-readiness indicators, specific skill refinements
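The level-specific emphasis in the list above can be made concrete by comparing each emphasized metric to its league average. The sketch below is a minimal illustration built on several assumptions. The metric keys (`hard_hit_pct`, `chase_pct`, `z_contact_pct`, and so on) are hypothetical column names, and they stand in as proxies for the list items. Only the quantifiable items are encoded: platoon splits and Triple-A readiness are left out. The league averages in the example are made-up numbers. Every metric becomes an index where 100 is league average and higher is always better.

```python
# Metrics to emphasize at each level, loosely following the list above.
# Keys are hypothetical column names chosen for this sketch.
LEVEL_FOCUS = {
    "Rookie": ["BB_pct", "K_pct", "avg_exit_velo"],
    "Low-A": ["BB_pct", "K_pct", "avg_exit_velo"],
    "High-A": ["hard_hit_pct", "chase_pct", "z_contact_pct"],
    "AA": ["xwOBA", "hard_hit_pct"],
}

# Metrics where a lower value is better; their index is inverted.
LOWER_IS_BETTER = {"K_pct", "chase_pct"}


def league_relative_index(player: dict, league_avg: dict, level: str) -> dict:
    """Express each focus metric for a level as an index (100 = league average)."""
    indexes = {}
    for metric in LEVEL_FOCUS[level]:
        if metric in LOWER_IS_BETTER:
            ratio = league_avg[metric] / player[metric]
        else:
            ratio = player[metric] / league_avg[metric]
        indexes[metric] = round(100 * ratio)
    return indexes


# Hypothetical High-A hitter compared with hypothetical league averages
player = {"hard_hit_pct": 42.0, "chase_pct": 24.0, "z_contact_pct": 86.0}
league = {"hard_hit_pct": 35.0, "chase_pct": 30.0, "z_contact_pct": 82.0}
print(league_relative_index(player, league, "High-A"))
```

Ratios keep the comparison scale-free across levels. A z-score against level peers is a reasonable alternative when spread data is available.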
Traditional scouting grades (20-80 scale) combine with modern analytics to create comprehensive prospect evaluation frameworks. The key is understanding which metrics translate across levels and predict major league success.
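One common way to bridge statistics and scouting grades uses the usual 20-80 convention: 50 is major league average and each 10 points is roughly one standard deviation. The following is a minimal sketch under that assumption. It converts a z-score or a percentile into a grade, rounds to the nearest 5 as scouts usually report, and clips to the 20-80 range. The function names are hypothetical.

```python
from statistics import NormalDist


def z_to_grade(z: float) -> int:
    """Map a z-score to the 20-80 scale: 50 = average, 10 points per SD,
    rounded to the nearest 5 and clipped to [20, 80]."""
    raw = 50 + 10 * z
    return int(min(80, max(20, 5 * round(raw / 5))))


def percentile_to_grade(pct: float) -> int:
    """Convert a percentile (strictly between 0 and 1) to a grade,
    assuming the underlying skill is normally distributed."""
    return z_to_grade(NormalDist().inv_cdf(pct))


for pct in (0.10, 0.50, 0.84, 0.98):
    print(pct, percentile_to_grade(pct))
```

Python's `round` uses banker's rounding, so a raw grade that lands exactly halfway between two multiples of 5 rounds toward the even multiple. Swap in explicit half-up rounding if that matters for your reports.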
Composite Evaluation Framework
import numpy as np
import pandas as pd

class ProspectEvaluator:
    def __init__(self):
        # Weights actually used in the composite score below
        self.weights = {
            'k_rate': 0.30,
            'bb_rate': 0.25,
            'contact': 0.25,
            'power': 0.20
        }

    def calculate_hit_probability(self, prospect_metrics):
        """
        Map minor league metrics to a 0-1 "MLB success" score.
        The logistic curve is hand-tuned rather than fitted to outcomes,
        so treat the result as a ranking score, not a calibrated probability.
        """
        # Key predictive metrics, each normalized to 0-1
        scores = {
            'k_rate': self.normalize_k_rate(prospect_metrics['K_pct']),
            'bb_rate': self.normalize_bb_rate(prospect_metrics['BB_pct']),
            'contact': self.normalize_contact(prospect_metrics['contact_pct']),
            'power': self.normalize_power(prospect_metrics['ISO'])
        }
        # Weighted composite score (0-1)
        composite = sum(scores[name] * weight for name, weight in self.weights.items())
        # Convert to a 0-1 score using a logistic function centered at 0.5
        probability = 1 / (1 + np.exp(-5 * (composite - 0.5)))
        return probability

    def normalize_k_rate(self, k_pct):
        """Lower is better - normalize to 0-1 scale"""
        # Elite: 15%, Average: 23%, Poor: 30%
        return max(0, min(1, (30 - k_pct) / 15))

    def normalize_bb_rate(self, bb_pct):
        """Higher is better"""
        # Elite: 12%, Average: 8%, Poor: 4%
        return max(0, min(1, (bb_pct - 4) / 8))

    def normalize_contact(self, contact_pct):
        """Higher is better"""
        # Elite: 85%, Average: 75%, Poor: 65%
        return max(0, min(1, (contact_pct - 65) / 20))

    def normalize_power(self, iso):
        """Higher is better"""
        # Elite: .250, Average: .150, Poor: .080
        return max(0, min(1, (iso - 0.080) / 0.170))

# Example: Evaluating top prospects (illustrative metric values)
prospects = pd.DataFrame({
    'name': ['Adley Rutschman', 'Julio Rodriguez', 'Bobby Witt Jr.', 'Riley Greene'],
    'level': ['AAA', 'AA', 'AA', 'AAA'],
    'K_pct': [18.2, 22.4, 22.4, 20.1],
    'BB_pct': [13.5, 10.8, 10.8, 11.2],
    'contact_pct': [81.2, 76.5, 77.8, 79.3],
    'ISO': [.242, .221, .221, .198],
    'age': [23, 20, 21, 21]
})
evaluator = ProspectEvaluator()
prospects['mlb_success_prob'] = prospects.apply(
    lambda row: evaluator.calculate_hit_probability(row), axis=1
)
print("\nProspect MLB Success Probability:")
print(prospects[['name', 'level', 'age', 'mlb_success_prob']].sort_values(
    'mlb_success_prob', ascending=False
))
library(tidyverse)
# Prospect evaluation using plate discipline metrics
evaluate_prospect_discipline <- function(prospect_data) {
  # Calculate key ratios and scores
  prospect_data <- prospect_data %>%
    mutate(
      bb_k_ratio = BB_pct / K_pct,
      discipline_score = (BB_pct * 2) - (K_pct * 0.5),
      # Percentile rankings within this prospect pool (not the whole league)
      bb_percentile = percent_rank(BB_pct),
      k_percentile = 1 - percent_rank(K_pct),  # Inverse for K%
      iso_percentile = percent_rank(ISO),
      # Composite grade mapped linearly onto the 20-80 scale
      composite_grade = 20 + (
        (bb_percentile * 0.3 +
           k_percentile * 0.3 +
           iso_percentile * 0.4) * 60
      )
    )
  return(prospect_data)
}
# Example: 2021 top prospect class evaluation (illustrative metric values)
prospects_2021 <- tibble(
  name = c("Wander Franco", "Adley Rutschman", "Julio Rodriguez",
           "Bobby Witt Jr.", "Riley Greene"),
  level = c("AAA", "AAA", "AA", "AA", "AAA"),
  age = c(20, 23, 20, 21, 21),
  K_pct = c(19.8, 18.2, 22.4, 22.4, 20.1),
  BB_pct = c(11.2, 13.5, 10.8, 10.8, 11.2),
  ISO = c(.198, .242, .221, .221, .198),
  contact_pct = c(80.2, 81.8, 76.5, 77.8, 79.3)
)
evaluated_prospects <- evaluate_prospect_discipline(prospects_2021)
# Display results
print("Prospect Evaluation Scores:")
evaluated_prospects %>%
  select(name, age, discipline_score, composite_grade) %>%
  arrange(desc(composite_grade))
Contact Quality Metrics
With Statcast data available in the upper minors, evaluators can assess contact quality using the same metrics as in major league analysis: exit velocity, launch angle, barrel rate, and expected statistics (xBA, xSLG, xwOBA).
import numpy as np
import pandas as pd

def analyze_contact_quality(statcast_data):
    """
    Analyze minor league Statcast data for contact quality indicators
    """
    n = len(statcast_data)
    # Calculate key contact metrics (rates are percentages of batted balls)
    results = {
        'avg_exit_velo': statcast_data['exit_velocity'].mean(),
        'max_exit_velo': statcast_data['exit_velocity'].max(),
        'barrel_rate': (statcast_data['barrel'] == 1).sum() / n * 100,
        'hard_hit_rate': (statcast_data['exit_velocity'] >= 95).sum() / n * 100,
        'sweet_spot_pct': statcast_data['launch_angle'].between(8, 32).sum() / n * 100
    }
    # Calculate expected statistics
    results['xBA'] = calculate_xBA(statcast_data)
    results['xSLG'] = calculate_xSLG(statcast_data)
    return results

def calculate_xBA(df):
    """Simplified xBA calculation based on exit velo and launch angle"""
    # This is a simplified version - actual xBA uses more complex models
    conditions = [
        (df['exit_velocity'] >= 98) & (df['launch_angle'].between(8, 32)),
        (df['exit_velocity'] >= 90) & (df['launch_angle'].between(10, 30)),
        (df['exit_velocity'] >= 80) & (df['launch_angle'].between(-10, 40))
    ]
    values = [0.750, 0.450, 0.250]
    # np.select takes the first matching condition; everything else gets .100
    xba_per_ball = np.select(conditions, values, default=0.100)
    return xba_per_ball.mean()

def calculate_xSLG(df):
    """Simplified xSLG: average expected bases per batted ball (strikeouts excluded)"""
    # Estimate bases based on exit velo and launch angle
    conditions = [
        (df['exit_velocity'] >= 100) & (df['launch_angle'].between(20, 35)),  # HR territory
        (df['exit_velocity'] >= 95) & (df['launch_angle'].between(15, 40)),   # XBH likely
        (df['exit_velocity'] >= 90) & (df['launch_angle'].between(10, 30)),   # Solid contact
    ]
    bases = [2.5, 1.8, 1.2]
    expected_bases = np.select(conditions, bases, default=0.5)
    return expected_bases.mean()

# Example: Analyzing a top prospect's contact quality
np.random.seed(42)
n_batted_balls = 250
# Synthetic strong-contact profile labelled with Julio Rodriguez's name for
# illustration only; this is not his actual tracking data. Barrels are drawn
# independently of exit velocity and launch angle, unlike real batted balls.
julio_data = pd.DataFrame({
    'exit_velocity': np.random.normal(92.5, 7.5, n_batted_balls),
    'launch_angle': np.random.normal(12, 18, n_batted_balls),
    'barrel': np.random.choice([0, 1], n_batted_balls, p=[0.88, 0.12])
})
julio_contact = analyze_contact_quality(julio_data)
print("\nJulio Rodriguez Contact Quality Profile:")
for metric, value in julio_contact.items():
    print(f"{metric}: {value:.2f}")
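The simulated profile above uses 250 batted balls, a typical minor league sample. Rates computed from samples that size carry real uncertainty. The sketch below shows one standard way to quantify it: the Wilson score interval for a binomial rate, applied to hard-hit rate. The assumption here is that each batted ball is an independent trial. The function name and the 105-of-250 example are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial rate such as hard-hit balls / batted balls.

    z = 1.96 gives an approximate 95% interval.
    """
    if n == 0:
        raise ValueError("need at least one batted ball")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical: 105 of 250 batted balls hit at 95+ mph (42% observed)
low, high = wilson_interval(105, 250)
print(f"Hard-hit rate 95% CI: {low:.1%} to {high:.1%}")
```

The interval spans roughly 12 percentage points. That is wide enough to blur the line between an above-average and an elite contact profile, which is why upper-minors contact metrics are best read alongside larger samples or regressed estimates.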
Projection systems for minor leaguers face unique challenges: limited sample sizes, developmental trajectories, and the difficulty of translating performance across competitive levels. Modern systems combine statistical translation with aging curves and scouting inputs.
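The limited-sample problem is commonly handled by regressing observed rates toward a prior such as the league average. The sketch below shows the simplest version of that idea. It blends the observed rate with `k` plate appearances of league-average performance, so the regression weakens as the sample grows. The value `k = 60`, the 23% league K%, and the function name are hypothetical placeholders, not published stabilization points.

```python
def regress_to_mean(observed: float, n: float, prior_mean: float, k: float) -> float:
    """Shrink an observed rate toward a prior mean.

    Equivalent to adding k opportunities of prior-mean performance to the
    player's n opportunities; larger k means heavier regression.
    """
    return (observed * n + prior_mean * k) / (n + k)


league_k_pct = 0.23  # hypothetical league-average strikeout rate
k = 60               # hypothetical regression constant (in PA)

# The same 18% observed K% is trusted more as the sample grows
for pa in (50, 150, 500):
    print(pa, round(regress_to_mean(0.18, pa, league_k_pct, k), 3))
```

The weight on the player's own data is n / (n + k). With k = 60, the player's sample counts for half the estimate at 60 PA and dominates well before a full season.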
MiLB-to-MLB Translation
The fundamental challenge is converting minor league statistics to major league equivalents. Bill James pioneered this approach with his Minor League Equivalencies (MLEs), which remain the foundation for modern systems.
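Translation can also account for league context, not just level. The sketch below shows the general ratio idea. First, express a player's rate relative to his own league's average. Next, re-express that ratio on the MLB scale. Finally, apply a discount for quality of competition. This is a simplification for illustration, not Bill James's exact procedure. The function name, the competition factor of 0.90, and the league averages in the example are all hypothetical.

```python
def translate_rate(minor_rate: float, minor_league_avg: float,
                   mlb_league_avg: float, competition_factor: float) -> float:
    """Translate a minor league rate stat to an MLB-equivalent rate.

    1. Player relative to his own league: minor_rate / minor_league_avg
    2. Re-scaled to the MLB run environment: * mlb_league_avg
    3. Discounted for competition: * competition_factor
       (below 1 for stats like AVG, above 1 for K%)
    """
    return minor_rate / minor_league_avg * mlb_league_avg * competition_factor


# Hypothetical: .290 AVG in a Triple-A league averaging .265,
# MLB average .245, competition factor 0.90
mlb_equiv_avg = translate_rate(0.290, 0.265, 0.245, 0.90)
print(f"MLB-equivalent AVG: {mlb_equiv_avg:.3f}")
```

The league-ratio step matters because hitter-friendly minor leagues can inflate raw numbers. A single fixed multiplier per level cannot separate that effect from true skill.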
library(tidyverse)
# Minor League Equivalency (MLE) translation factors
calculate_mle <- function(minor_stats, level) {
  # Translation factors by level (approximate, illustrative values)
  translation_factors <- list(
    "AAA" = list(avg = 0.77, iso = 0.85, bb_pct = 0.92, k_pct = 1.15),
    "AA" = list(avg = 0.73, iso = 0.80, bb_pct = 0.88, k_pct = 1.22),
    "High-A" = list(avg = 0.68, iso = 0.75, bb_pct = 0.82, k_pct = 1.30),
    "Low-A" = list(avg = 0.63, iso = 0.70, bb_pct = 0.78, k_pct = 1.40)
  )
  factors <- translation_factors[[level]]
  if (is.null(factors)) stop("Unknown level: ", level)
  # Apply translations
  mle <- minor_stats %>%
    mutate(
      mlb_avg = avg * factors$avg,
      mlb_iso = iso * factors$iso,
      mlb_bb_pct = bb_pct * factors$bb_pct,
      mlb_k_pct = k_pct * factors$k_pct,
      # Rough OBP: walks plus hits in the non-walk PA (ignores HBP and SF)
      mlb_obp = mlb_bb_pct + mlb_avg * (1 - mlb_bb_pct),
      mlb_slg = mlb_avg + mlb_iso,
      mlb_wOBA = calculate_wOBA(mlb_avg, mlb_bb_pct, mlb_iso)
    )
  return(mle)
}
calculate_wOBA <- function(avg, bb_pct, iso) {
  # Simplified linear-weights wOBA (assumes AB = PA * (1 - BB%), ignores HBP/SF).
  # Every hit earns roughly the single weight (0.88), and each extra base adds
  # about 0.37, the typical step between the 1B, 2B, 3B, and HR weights.
  ab_share <- 1 - bb_pct
  wOBA <- (bb_pct * 0.69) + (avg * ab_share * 0.88) + (iso * ab_share * 0.37)
  return(wOBA)
}
# Example: Project Spencer Torkelson's AAA performance to MLB
# (illustrative inputs, not his official Triple-A stat line)
torkelson_aaa <- tibble(
  level = "AAA",
  pa = 358,
  avg = .267,
  iso = .234,
  bb_pct = 0.112,
  k_pct = 0.279,
  hr = 30,
  age = 21
)
torkelson_mlb_proj <- calculate_mle(torkelson_aaa, "AAA")
print("Spencer Torkelson MLB Projection:")
print(torkelson_mlb_proj %>%
        select(mlb_avg, mlb_iso, mlb_bb_pct, mlb_k_pct, mlb_wOBA))
Age-Based Developmental Curves
Players develop at different rates, but aggregate data reveals consistent aging patterns: young players typically improve into their mid-to-late 20s (age 27 is the conventional peak, used below) and then gradually decline. Prospect projections therefore need an age adjustment.
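Aggregate aging patterns like this are often estimated with the delta method. Take every player who has seasons at consecutive ages, average the year-over-year change at each age, and chain those averages into a cumulative curve. The sketch below is a minimal, unweighted version. Real implementations usually weight each pair by playing time, for example the harmonic mean of the two seasons' PA. The function name and sample data are hypothetical. The method also carries survivor bias, because players who declined and lost their jobs contribute no later seasons.

```python
from collections import defaultdict


def delta_method_curve(seasons):
    """Build an aging curve with the (unweighted) delta method.

    seasons: iterable of (player_id, age, value) rows, one per player-season.
    Returns {age: cumulative change}, anchored at 0.0 for the youngest age.
    """
    by_player = defaultdict(dict)
    for player, age, value in seasons:
        by_player[player][age] = value

    # Collect value(age) - value(age - 1) for every consecutive-age pair
    deltas = defaultdict(list)
    for ages in by_player.values():
        for age, value in ages.items():
            if age + 1 in ages:
                deltas[age + 1].append(ages[age + 1] - value)

    if not deltas:
        return {}
    start = min(deltas) - 1
    curve = {start: 0.0}
    level = 0.0
    for age in range(start + 1, max(deltas) + 1):
        changes = deltas.get(age, [0.0])  # no pairs at this age: assume flat
        level += sum(changes) / len(changes)
        curve[age] = level
    return curve


# Hypothetical wRC+ seasons for three players
seasons = [("a", 22, 100), ("a", 23, 106), ("a", 24, 110),
           ("b", 22, 90), ("b", 23, 94),
           ("c", 23, 120), ("c", 24, 122)]
print(delta_method_curve(seasons))
```

Working with within-player changes, rather than raw averages by age, keeps the curve from being distorted by which players happen to be in the league at each age.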
import pandas as pd
from scipy.stats import norm

class ProspectProjectionSystem:
    def __init__(self):
        # Age adjustment factors (peak = 27); ages outside this table default to 0
        self.age_factors = {
            19: -0.050, 20: -0.040, 21: -0.030, 22: -0.020, 23: -0.010,
            24: -0.005, 25: 0.000, 26: 0.005, 27: 0.010, 28: 0.005,
            29: 0.000, 30: -0.010, 31: -0.020, 32: -0.035
        }

    def project_future_performance(self, current_stats, current_age, target_age):
        """
        Project performance from current age to target age
        """
        current_factor = self.age_factors.get(current_age, 0)
        target_factor = self.age_factors.get(target_age, 0)
        improvement = target_factor - current_factor
        # Apply improvement to stats
        projected = current_stats.copy()
        projected['wRC+'] = current_stats['wRC+'] * (1 + improvement)
        projected['ISO'] = current_stats['ISO'] * (1 + improvement * 1.2)  # Power develops more
        projected['K_pct'] = current_stats['K_pct'] * (1 - improvement * 0.8)  # K% improves
        projected['BB_pct'] = current_stats['BB_pct'] * (1 + improvement * 0.6)  # BB% improves
        return projected

    def create_projection_range(self, stats, age, confidence=0.80):
        """
        Create projection range with a two-sided confidence interval
        """
        # Project to age 27 (peak)
        peak_projection = self.project_future_performance(stats, age, 27)
        # Calculate uncertainty based on age distance from peak
        age_distance = abs(27 - age)
        uncertainty = 0.10 + (age_distance * 0.015)  # More uncertainty for younger players
        # Two-sided z-score: 1.282 for 80%, 1.645 for 90%
        z_score = norm.ppf(0.5 + confidence / 2)
        projection = {}
        for stat in peak_projection.index:
            mean_val = peak_projection[stat]
            std_error = mean_val * uncertainty
            projection[stat] = {
                'median': mean_val,
                'lower': mean_val - (z_score * std_error),
                'upper': mean_val + (z_score * std_error)
            }
        return projection

# Example: Project Gunnar Henderson's development (illustrative inputs)
henderson_current = pd.Series({
    'wRC+': 125,
    'ISO': .215,
    'K_pct': 24.5,
    'BB_pct': 10.2
})
projector = ProspectProjectionSystem()
henderson_proj = projector.create_projection_range(henderson_current, age=22)
print("\nGunnar Henderson Age-27 Projection (80% confidence):")
for stat, values in henderson_proj.items():
    print(f"{stat}: {values['median']:.3f} ({values['lower']:.3f} - {values['upper']:.3f})")
Multi-Year Projection Framework
def create_multi_year_projection(player_data, years=5):
    """
    Create multi-year projections incorporating development and regression.
    Uses the ProspectProjectionSystem class defined above.
    """
    projections = []
    current_age = player_data['age']
    current_stats = player_data['stats']
    projector = ProspectProjectionSystem()
    for year in range(years):
        target_age = current_age + year
        # Get base projection
        year_proj = projector.project_future_performance(
            current_stats, current_age, target_age
        )
        # Playing time ramps up as the player establishes himself (capped at 600 PA)
        pa_projection = min(600, 400 + (year * 50))
        # Regress toward average, more heavily in the first projected year
        regression_factor = 0.85 if year == 0 else 0.95
        proj_dict = {
            'year': year + 1,
            'age': target_age,
            'PA': pa_projection,
            'wRC+': year_proj['wRC+'] * regression_factor,
            'ISO': year_proj['ISO'] * regression_factor,
            'K_pct': year_proj['K_pct'] / regression_factor,
            'BB_pct': year_proj['BB_pct'] * regression_factor
        }
        projections.append(proj_dict)
    return pd.DataFrame(projections)

# Example: 5-year projection for top prospect
prospect_profile = {
    'name': 'Jackson Holliday',
    'age': 20,
    'stats': pd.Series({
        'wRC+': 135,
        'ISO': .225,
        'K_pct': 21.5,
        'BB_pct': 12.8
    })
}
holliday_projection = create_multi_year_projection(prospect_profile)
print("\nJackson Holliday 5-Year Projection:")
print(holliday_projection.round(1))
Identifying breakout candidates before they emerge represents a significant competitive advantage. Analytics can reveal players whose underlying metrics suggest imminent improvement.
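Before treating a year-over-year improvement as a leading indicator, it helps to check whether the change exceeds sampling noise. The sketch below applies a standard two-proportion z-test to a change in strikeout rate. It assumes plate appearances are independent trials, which is only an approximation. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def k_rate_change_test(k_prior: int, pa_prior: int, k_current: int, pa_current: int):
    """Two-proportion z-test on strikeout rate between two seasons.

    Returns (change in K% in percentage points, two-sided p-value).
    A negative change means the strikeout rate dropped.
    """
    p_prior = k_prior / pa_prior
    p_current = k_current / pa_current
    pooled = (k_prior + k_current) / (pa_prior + pa_current)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pa_prior + 1 / pa_current))
    z = (p_prior - p_current) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 100 * (p_current - p_prior), p_value


# Hypothetical: 120 K in 420 PA (28.6%) one year, 105 K in 420 PA (25.0%) the next
change, p = k_rate_change_test(120, 420, 105, 420)
print(f"K% change: {change:.1f} pts, p = {p:.2f}")
```

In this example a drop of about 3.6 points over two half-seasons of PA is not clearly distinguishable from noise. That is one reason breakout screens combine several indicators rather than relying on a single rate.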
Leading Indicators of Breakout Performance
library(tidyverse)
# Identify breakout candidates using key indicators
identify_breakout_candidates <- function(prospect_pool) {
  # Calculate breakout indicators
  candidates <- prospect_pool %>%
    mutate(
      # Year-over-year changes (percentage points or mph); positive = better
      discipline_improvement = BB_pct_current - BB_pct_prior,
      k_rate_improvement = K_pct_prior - K_pct_current,
      contact_quality_jump = avg_exit_velo - prior_avg_exit_velo,
      # Composite breakout score (hand-set weights)
      breakout_score = (
        (discipline_improvement * 10) +
          (k_rate_improvement * 8) +
          (contact_quality_jump * 2) +
          (barrel_rate * 1.5) +
          (hard_hit_rate * 0.5)
      ),
      # Age factor: younger breakouts weighted more (1.0 at age 20, 0.375 at 25)
      age_adjusted_score = breakout_score * (28 - age) / 8
    ) %>%
    filter(
      # Minimum playing time
      PA >= 250,
      # Positive trends
      discipline_improvement > 0 | k_rate_improvement > 0,
      # Age range
      age <= 25
    ) %>%
    arrange(desc(age_adjusted_score))
  return(candidates)
}
# Example dataset: 2023 AA/AAA prospects (illustrative values)
prospect_pool_2023 <- tibble(
  name = c("Elly De La Cruz", "Jasson Dominguez", "Curtis Mead",
           "Colton Cowser", "Jordan Walker"),
  age = c(21, 20, 22, 23, 21),
  level = c("AA", "AA", "AAA", "AAA", "AA"),
  PA = c(412, 315, 485, 521, 456),
  BB_pct_prior = c(8.2, 10.5, 7.8, 12.1, 6.5),
  BB_pct_current = c(10.4, 12.8, 9.2, 14.5, 8.8),
  K_pct_prior = c(28.5, 25.2, 22.8, 24.5, 26.2),
  K_pct_current = c(25.2, 23.1, 21.5, 22.8, 23.8),
  prior_avg_exit_velo = c(88.2, 89.5, 87.8, 90.2, 88.8),
  avg_exit_velo = c(91.5, 91.2, 89.5, 91.8, 91.2),
  barrel_rate = c(12.5, 11.8, 8.5, 10.2, 13.5),
  hard_hit_rate = c(45.2, 42.8, 38.5, 43.2, 46.8)
)
breakout_candidates <- identify_breakout_candidates(prospect_pool_2023)
print("Top Breakout Candidates:")
print(breakout_candidates %>%
        select(name, age, level, breakout_score, age_adjusted_score) %>%
        head(5))
Swing Decision Metrics
Advanced tracking data reveals swing decisions (chase rate, in-zone contact, and pitch recognition), metrics that often improve before traditional statistics reflect the change.
class SwingDecisionAnalyzer:
    def __init__(self):
        self.zone_thresholds = {
            'in_zone': 0.75,  # Expected swing rate in zone
            'out_zone': 0.30  # Expected swing rate out of zone
        }

    def calculate_swing_metrics(self, pitch_data):
        """
        Calculate advanced swing decision metrics.
        Expects boolean columns: in_zone, swing, contact.
        """
        # Separate in-zone and out-of-zone pitches
        in_zone = pitch_data[pitch_data['in_zone']]
        out_zone = pitch_data[~pitch_data['in_zone']]
        z_swings = in_zone[in_zone['swing']]
        o_swings = out_zone[out_zone['swing']]
        all_swings = pitch_data[pitch_data['swing']]
        metrics = {
            'z_swing_pct': in_zone['swing'].mean() * 100,
            'o_swing_pct': out_zone['swing'].mean() * 100,
            'z_contact_pct': z_swings['contact'].mean() * 100,
            'o_contact_pct': o_swings['contact'].mean() * 100,
            # Chase rate is the same quantity as O-Swing%
            'chase_rate': out_zone['swing'].mean() * 100,
            'whiff_rate': (~all_swings['contact']).mean() * 100
        }
        # Calculate decision value score
        metrics['swing_decision_score'] = (
            metrics['z_swing_pct'] * 0.3 +
            (100 - metrics['o_swing_pct']) * 0.4 +
            metrics['z_contact_pct'] * 0.3
        )
        return metrics

    def identify_improvement(self, current_metrics, prior_metrics):
        """
        Identify meaningful improvements in swing decisions
        """
        improvements = {}
        # Key improvements to track (positive = better)
        improvements['chase_improvement'] = prior_metrics['chase_rate'] - current_metrics['chase_rate']
        improvements['zone_contact_improvement'] = current_metrics['z_contact_pct'] - prior_metrics['z_contact_pct']
        improvements['whiff_improvement'] = prior_metrics['whiff_rate'] - current_metrics['whiff_rate']
        # Overall improvement score
        improvements['total_improvement'] = (
            improvements['chase_improvement'] * 0.40 +
            improvements['zone_contact_improvement'] * 0.35 +
            improvements['whiff_improvement'] * 0.25
        )
        return improvements
# Example: Analyzing Corbin Carroll's swing decision improvement
np.random.seed(42)
# Simulate pitch-by-pitch data
def generate_pitch_data(n_pitches, chase_rate, z_contact, whiff_rate):
data = pd.DataFrame({
'in_zone': np.random.choice([True, False], n_pitches, p=[0.45, 0.55]),
'swing': False,
'contact': False
})
# In-zone swings
data.loc[data['in_zone'] == True, 'swing'] = np.random.choice(
[True, False], (data['in_zone'] == True).sum(), p=[0.70, 0.30]
)
# Out-of-zone swings (chase)
data.loc[data['in_zone'] == False, 'swing'] = np.random.choice(
[True, False], (data['in_zone'] == False).sum(), p=[chase_rate, 1-chase_rate]
)
# Contact on swings
data.loc[(data['swing'] == True) & (data['in_zone'] == True), 'contact'] = np.random.choice(
[True, False], ((data['swing'] == True) & (data['in_zone'] == True)).sum(),
p=[z_contact, 1-z_contact]
)
data.loc[(data['swing'] == True) & (data['in_zone'] == False), 'contact'] = np.random.choice(
[True, False], ((data['swing'] == True) & (data['in_zone'] == False)).sum(),
p=[0.55, 0.45]
)
return data
# Prior year (2021)
carroll_2021 = generate_pitch_data(1500, chase_rate=0.32, z_contact=0.82, whiff_rate=0.25)
# Current year (2022) - improved
carroll_2022 = generate_pitch_data(1800, chase_rate=0.25, z_contact=0.88, whiff_rate=0.19)
analyzer = SwingDecisionAnalyzer()
metrics_2021 = analyzer.calculate_swing_metrics(carroll_2021)
metrics_2022 = analyzer.calculate_swing_metrics(carroll_2022)
improvements = analyzer.identify_improvement(metrics_2022, metrics_2021)
print("\nCorbin Carroll Swing Decision Improvement (2021 vs 2022):")
print(f"Chase Rate: {metrics_2021['chase_rate']:.1f}% → {metrics_2022['chase_rate']:.1f}% "
f"(Δ {improvements['chase_improvement']:.1f}%)")
print(f"Zone Contact: {metrics_2021['z_contact_pct']:.1f}% → {metrics_2022['z_contact_pct']:.1f}% "
f"(Δ {improvements['zone_contact_improvement']:.1f}%)")
print(f"Whiff Rate: {metrics_2021['whiff_rate']:.1f}% → {metrics_2022['whiff_rate']:.1f}% "
f"(Δ {improvements['whiff_improvement']:.1f}%)")
print(f"\nOverall Improvement Score: {improvements['total_improvement']:.2f}")
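`identify_improvement` reports raw differences, but a drop in chase rate measured over a few hundred out-of-zone pitches can be sampling noise. One common check, sketched here as an addition and not as part of the model above, is a two-proportion z-test on the chase counts. The function name `chase_rate_change_z` is hypothetical, and the counts in the usage lines are invented, chosen to roughly match 1,500- and 1,800-pitch seasons with 55% of pitches out of the zone.

```python
import math


def chase_rate_change_z(chases_prior, oz_prior, chases_curr, oz_curr):
    """Two-proportion z-test for a change in chase rate (O-Swing%).

    chases_*: swings at out-of-zone pitches; oz_*: out-of-zone pitches seen.
    Returns (drop in chase rate, z statistic, two-sided p-value).
    """
    p_prior = chases_prior / oz_prior
    p_curr = chases_curr / oz_curr
    # Pooled rate under the null hypothesis of no change
    pooled = (chases_prior + chases_curr) / (oz_prior + oz_curr)
    se = math.sqrt(pooled * (1 - pooled) * (1 / oz_prior + 1 / oz_curr))
    z = (p_prior - p_curr) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_prior - p_curr, z, p_value


# Hypothetical counts: 264 of 825 out-of-zone pitches chased, then 248 of 990
drop, z, p = chase_rate_change_z(264, 825, 248, 990)
print(f"chase drop {drop:.1%}, z = {z:.2f}, p = {p:.4f}")

# Hypothetical five-point drop over only 200 out-of-zone pitches per season
drop_s, z_s, p_s = chase_rate_change_z(60, 200, 50, 200)
print(f"chase drop {drop_s:.1%}, z = {z_s:.2f}, p = {p_s:.3f}")
```

With full-season samples the roughly seven-point drop clears conventional significance, while the five-point drop over 200 pitches per season does not. This is a reminder to weigh swing-decision changes by sample size before acting on them.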
Power Development Indicators
Power often develops later than the hit tool, so it is important to identify prospects who show early power indicators, such as high exit velocities and barrel rates, even while their home run totals remain modest.
def analyze_power_development(batted_ball_data):
    """
    Identify power development using batted ball quality
    """
    n = len(batted_ball_data)
    ev = batted_ball_data['exit_velocity']
    la = batted_ball_data['launch_angle']
    # Calculate power indicators (rates are percentages of batted balls)
    indicators = {
        'avg_distance': batted_ball_data['hit_distance'].mean(),
        'max_distance': batted_ball_data['hit_distance'].max(),
        'avg_exit_velo': ev.mean(),
        'max_exit_velo': ev.max(),
        '95mph+_rate': (ev >= 95).sum() / n * 100,
        '100mph+_rate': (ev >= 100).sum() / n * 100,
        'barrel_rate': (batted_ball_data['barrel'] == 1).sum() / n * 100,
        # Launch angle between 15 and 35 degrees
        'optimal_la_rate': ((la >= 15) & (la <= 35)).sum() / n * 100
    }
    # Power potential score, clipped to a 0-100 scale
    raw_score = (
        (indicators['avg_exit_velo'] - 82) * 2.5 +
        indicators['barrel_rate'] * 2 +
        indicators['95mph+_rate'] * 0.5 +
        (indicators['optimal_la_rate'] - 20) * 0.8
    )
    indicators['power_potential'] = float(np.clip(raw_score, 0, 100))
    return indicators
# Example: compare power development of two simulated prospects
np.random.seed(123)
# Prospect A: raw power, needs refinement
prospect_a_bb = pd.DataFrame({
    'exit_velocity': np.random.normal(91.5, 8, 300),
    'launch_angle': np.random.normal(8, 22, 300),   # Low average launch angle
    'hit_distance': np.random.normal(260, 75, 300),
    'barrel': np.random.choice([0, 1], 300, p=[0.90, 0.10])
})
# Prospect B: growing power, better approach
prospect_b_bb = pd.DataFrame({
    'exit_velocity': np.random.normal(89.5, 7, 300),
    'launch_angle': np.random.normal(14, 18, 300),  # Better launch angle
    'hit_distance': np.random.normal(270, 70, 300),
    'barrel': np.random.choice([0, 1], 300, p=[0.92, 0.08])
})
power_a = analyze_power_development(prospect_a_bb)
power_b = analyze_power_development(prospect_b_bb)
print("\nPower Development Comparison:")
print(f"\nProspect A (Raw Power):")
print(f" Avg Exit Velo: {power_a['avg_exit_velo']:.1f} mph")
print(f" Barrel Rate: {power_a['barrel_rate']:.1f}%")
print(f" Optimal LA Rate: {power_a['optimal_la_rate']:.1f}%")
print(f" Power Potential Score: {power_a['power_potential']:.1f}")
print(f"\nProspect B (Refined Approach):")
print(f" Avg Exit Velo: {power_b['avg_exit_velo']:.1f} mph")
print(f" Barrel Rate: {power_b['barrel_rate']:.1f}%")
print(f" Optimal LA Rate: {power_b['optimal_la_rate']:.1f}%")
print(f" Power Potential Score: {power_b['power_potential']:.1f}")
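Because `power_potential` is a weighted sum of four inputs, it helps to see how many points each input contributes. The helper below (`power_potential_components`, a hypothetical name) splits the score into its terms using the same coefficients. The inputs are round hypothetical rates chosen to resemble the two simulated profiles; they are not outputs of the simulation.

```python
def power_potential_components(avg_exit_velo, barrel_rate, rate_95plus, optimal_la_rate):
    """Split the power_potential formula into its four weighted terms.

    Rates are percentages (10.0 means 10%), as in analyze_power_development().
    """
    return {
        "exit velocity": (avg_exit_velo - 82) * 2.5,
        "barrel rate": barrel_rate * 2,
        "95+ mph rate": rate_95plus * 0.5,
        "optimal launch angle": (optimal_la_rate - 20) * 0.8,
    }


# Hypothetical round inputs resembling the two simulated profiles
profiles = {
    "Prospect A": power_potential_components(91.5, 10, 33, 26),
    "Prospect B": power_potential_components(89.5, 8, 22, 36),
}
for label, parts in profiles.items():
    detail = ", ".join(f"{k} {v:.1f}" for k, v in parts.items())
    print(f"{label}: total {sum(parts.values()):.1f} ({detail})")
```

With these inputs, Prospect B's better launch-angle profile is worth 8 points over A. However, A's edge in exit velocity, barrels, and 95+ mph contact is worth 14.5, so the raw-power profile still scores higher (65.1 vs 58.6). Each mph of average exit velocity is worth 2.5 points, which makes that input the most leveraged in the formula.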
International markets, particularly Latin America and Asia, provide critical talent pipelines. Analytics in international scouting faces unique challenges: limited statistical data, cultural differences, and widely varying levels of competition.
International Prospect Evaluation Framework
library(tidyverse)
# International prospect evaluation
evaluate_international_prospect <- function(prospect_data, market) {
  # Market-specific adjustment factors
  market_factors <- list(
    "Dominican" = list(development = 1.15, tools = 1.10, risk = 1.20),
    "Venezuela" = list(development = 1.12, tools = 1.08, risk = 1.18),
    "Cuba" = list(development = 0.95, tools = 1.05, risk = 0.90),
    "Japan" = list(development = 0.85, tools = 0.95, risk = 0.75),
    "Korea" = list(development = 0.88, tools = 0.92, risk = 0.80)
  )
  factors <- market_factors[[market]]
  # Evaluate tools (20-80 scale)
  prospect_data <- prospect_data %>%
    mutate(
      # Adjust raw tools for market context
      hit_adjusted = hit_grade * factors$tools,
      power_adjusted = power_grade * factors$tools,
      speed_adjusted = speed_grade,
      # Development timeline adjustment
      eta_years = base_eta * factors$development,
      # Risk adjustment
      risk_score = base_risk * factors$risk,
      # Composite future value, discounted by risk
      fv_grade = (hit_adjusted * 0.30 +
                  power_adjusted * 0.25 +
                  speed_adjusted * 0.15 +
                  field_grade * 0.20 +
                  arm_grade * 0.10) * (1 - (risk_score * 0.01))
    )
  return(prospect_data)
}
# Example: evaluating a group of international prospects (illustrative grades)
intl_class_2023 <- tibble(
  name = c("Ethan Salas", "Jaison Chourio", "Cristian Hernandez",
           "Armando Cruz", "Colin Houck"),
  age = c(16, 17, 16, 17, 16),
  market = c("Venezuela", "Venezuela", "Dominican", "Dominican", "Cuba"),
  hit_grade = c(55, 60, 50, 55, 60),
  power_grade = c(60, 55, 55, 50, 55),
  speed_grade = c(50, 60, 55, 50, 45),
  field_grade = c(60, 55, 50, 55, 50),
  arm_grade = c(70, 55, 60, 50, 55),
  base_eta = c(5, 4, 5, 5, 3),
  base_risk = c(45, 40, 50, 48, 35)
)
# Evaluate each market group with its own adjustment factors.
# group_modify() passes each group's rows as .x and its key (market) as .y;
# this avoids the duplicate-column error that unnest() raises on the nested result.
evaluated_intl <- intl_class_2023 %>%
  group_by(market) %>%
  group_modify(~ evaluate_international_prospect(.x, .y$market)) %>%
  ungroup()
print("International Prospect Evaluations:")
print(evaluated_intl %>%
  select(name, market, fv_grade, eta_years, risk_score) %>%
  arrange(desc(fv_grade)))
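Tracing one row by hand shows how strongly the risk term shapes the result. For the first prospect in the example (Venezuela market), the weighted tools come to about 60.5. A risk score of 53.1, however, multiplies that by 0.469, leaving an `fv_grade` near 28. The composite is therefore best read as a relative ranking within the group rather than as a grade on the familiar 20-80 scale. The Python sketch below reproduces that arithmetic; `MARKET_FACTORS` and `fv_breakdown` are hypothetical names, with constants copied from the R function.

```python
# Constants copied from evaluate_international_prospect() (R version above)
MARKET_FACTORS = {
    "Dominican": {"development": 1.15, "tools": 1.10, "risk": 1.20},
    "Venezuela": {"development": 1.12, "tools": 1.08, "risk": 1.18},
    "Cuba": {"development": 0.95, "tools": 1.05, "risk": 0.90},
    "Japan": {"development": 0.85, "tools": 0.95, "risk": 0.75},
    "Korea": {"development": 0.88, "tools": 0.92, "risk": 0.80},
}


def fv_breakdown(prospect, market):
    """Return (weighted tools, risk score, fv_grade) for one prospect."""
    f = MARKET_FACTORS[market]
    weighted_tools = (prospect["hit"] * f["tools"] * 0.30
                      + prospect["power"] * f["tools"] * 0.25
                      + prospect["speed"] * 0.15
                      + prospect["field"] * 0.20
                      + prospect["arm"] * 0.10)
    risk_score = prospect["base_risk"] * f["risk"]
    # Risk is applied as a percentage discount on the weighted tools
    fv_grade = weighted_tools * (1 - risk_score * 0.01)
    return weighted_tools, risk_score, fv_grade


# First row of the example group (Venezuela market)
salas = {"hit": 55, "power": 60, "speed": 50, "field": 60, "arm": 70, "base_risk": 45}
tools, risk, fv = fv_breakdown(salas, "Venezuela")
print(f"weighted tools {tools:.2f}, risk score {risk:.1f}, "
      f"discount x{1 - risk * 0.01:.3f}, fv_grade {fv:.2f}")
```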
International League Statistical Analysis
For markets with established professional leagues (Japan's NPB, Korea's KBO, Taiwan's CPBL), statistical analysis becomes possible, but the numbers require careful adjustment for league context.
class InternationalStatTranslator:
    def __init__(self):
        # League difficulty multipliers relative to MLB (1.00)
        self.league_factors = {
            'NPB': 0.78,      # Japan (Nippon Professional Baseball)
            'KBO': 0.72,      # Korea (KBO League)
            'CPBL': 0.65,     # Taiwan (Chinese Professional Baseball League)
            'Cuban': 0.68,    # Cuban National Series
            'Mexican': 0.62   # Mexican League
        }
    def translate_to_mlb_equivalent(self, stats, league, age):
        """
        Translate international league stats to MLB equivalents
        """
        # Leagues not listed default to a 0.60 factor
        league_factor = self.league_factors.get(league, 0.60)
        # Age adjustment: +1.5% per year under 25, none at 25 or older
        age_adjustment = 1.0 + ((25 - age) * 0.015) if age < 25 else 1.0
        # Apply translations with age adjustment
        mlb_equivalent = {
            'AVG': stats['AVG'] * league_factor * age_adjustment * 0.95,
            'OBP': stats['OBP'] * league_factor * age_adjustment * 0.98,
            'SLG': stats['SLG'] * league_factor * age_adjustment * 0.90,
            'HR': stats['HR'] * league_factor * age_adjustment * 0.85,
            'BB_pct': stats['BB_pct'] * league_factor * age_adjustment,
            'K_pct': stats['K_pct'] / (league_factor * age_adjustment)
        }
        # Estimate wRC+ from the translated line
        mlb_equivalent['wRC+'] = self.calculate_wrc_plus(mlb_equivalent)
        return mlb_equivalent
    def calculate_wrc_plus(self, stats):
        """Estimate wRC+ from AVG, SLG, and BB% (rough approximation)"""
        # Rough wOBA proxy: (AVG - ISO) stands in for singles and ISO for
        # extra-base hits, weighted like walks, singles, and doubles in wOBA
        iso = stats['SLG'] - stats['AVG']
        wOBA = (0.69 * stats['BB_pct'] +
                0.88 * (stats['AVG'] - iso) +
                1.27 * iso)
        # Scale to a league-average wOBA of ~0.320 (no park or league adjustment)
        wRC_plus = (wOBA / 0.320) * 100
        return wRC_plus
# Example: translate Masataka Yoshida's 2022 NPB line to an MLB projection
yoshida_npb_2022 = {
    'AVG': .335,
    'OBP': .447,
    'SLG': .562,
    'HR': 21,
    'BB_pct': 0.159,
    'K_pct': 0.107
}
translator = InternationalStatTranslator()
yoshida_mlb_proj = translator.translate_to_mlb_equivalent(
    yoshida_npb_2022, 'NPB', age=29   # Age in his first MLB season (2023)
)
print("\nMasataka Yoshida NPB to MLB Translation:")
print("NPB Stats (2022): .335/.447/.562, 21 HR")
print("\nMLB Projection:")
print(f" AVG: {yoshida_mlb_proj['AVG']:.3f}")
print(f" OBP: {yoshida_mlb_proj['OBP']:.3f}")
print(f" SLG: {yoshida_mlb_proj['SLG']:.3f}")
print(f" BB%: {yoshida_mlb_proj['BB_pct']:.1%}")
print(f" K%: {yoshida_mlb_proj['K_pct']:.1%}")
print(f" Projected wRC+: {yoshida_mlb_proj['wRC+']:.0f}")
# Example: translate Jung Hoo Lee's 2022 KBO line
lee_kbo_2022 = {
    'AVG': .349,
    'OBP': .421,
    'SLG': .575,
    'HR': 23,
    'BB_pct': 0.098,
    'K_pct': 0.089
}
lee_mlb_proj = translator.translate_to_mlb_equivalent(
    lee_kbo_2022, 'KBO', age=25   # Age in his first MLB season (2024)
)
print("\n\nJung Hoo Lee KBO to MLB Translation:")
print("KBO Stats (2022): .349/.421/.575, 23 HR")
print("\nMLB Projection:")
print(f" AVG: {lee_mlb_proj['AVG']:.3f}")
print(f" OBP: {lee_mlb_proj['OBP']:.3f}")
print(f" SLG: {lee_mlb_proj['SLG']:.3f}")
print(f" BB%: {lee_mlb_proj['BB_pct']:.1%}")
print(f" K%: {lee_mlb_proj['K_pct']:.1%}")
print(f" Projected wRC+: {lee_mlb_proj['wRC+']:.0f}")
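Every translated component is multiplied by the same league factor, so the simplified wRC+ estimate is directly proportional to that factor. As a result, the choice of factor matters as much as the stats themselves. The sketch below re-implements the class's arithmetic as a standalone function (`projected_wrc_plus`, a hypothetical name) and sweeps the NPB factor across a few values. Only 0.78 comes from the translator; 0.74 and 0.82 are hypothetical alternatives, not published estimates.

```python
def projected_wrc_plus(stats, league_factor, age_adjustment=1.0):
    """Re-implement the translation and wRC+ proxy from InternationalStatTranslator."""
    scale = league_factor * age_adjustment
    avg = stats["AVG"] * scale * 0.95
    slg = stats["SLG"] * scale * 0.90
    bb = stats["BB_pct"] * scale
    iso = slg - avg
    woba = 0.69 * bb + 0.88 * (avg - iso) + 1.27 * iso
    return woba / 0.320 * 100


yoshida = {"AVG": .335, "SLG": .562, "BB_pct": 0.159}
# 0.78 is the NPB factor used above; 0.74 and 0.82 are hypothetical alternatives
for factor in (0.74, 0.78, 0.82):
    print(f"NPB factor {factor:.2f} -> projected wRC+ {projected_wrc_plus(yoshida, factor):.1f}")
```

For Yoshida's line, each 0.01 change in the league factor moves the projection by about 1.45 wRC+ points (roughly 107, 113, and 119 across the three factors). Small disagreements about league strength therefore translate directly into meaningful projection gaps.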
Physical Projection Models
For young international amateurs, physical projection models estimate how much size and strength a player is likely to add, a critical input to power projection.
# Physical growth projection model
project_physical_development <- function(current_metrics, current_age) {
  # Average growth patterns by age: each factor is the growth still expected
  # before physical maturity, so it shrinks to 1.00 at age 22
  growth_factors <- tibble(
    age = 16:22,
    height_factor = c(1.03, 1.02, 1.01, 1.00, 1.00, 1.00, 1.00),
    weight_factor = c(1.15, 1.12, 1.08, 1.05, 1.02, 1.01, 1.00),
    strength_factor = c(1.20, 1.18, 1.15, 1.10, 1.05, 1.02, 1.00)
  )
  # Project to age 22 (physical maturity)
  if (current_age >= 22) {
    return(current_metrics)
  }
  target_factors <- growth_factors %>%
    filter(age == 22) %>%
    select(-age)
  current_factors <- growth_factors %>%
    filter(age == current_age) %>%
    select(-age)
  # Projected = current * (remaining-growth factor now / factor at maturity)
  projected <- current_metrics %>%
    mutate(
      projected_height = height * (current_factors$height_factor / target_factors$height_factor),
      projected_weight = weight * (current_factors$weight_factor / target_factors$weight_factor),
      projected_strength = strength_score * (current_factors$strength_factor / target_factors$strength_factor),
      # Estimate power grade based on physical projection
      projected_power = case_when(
        projected_weight >= 210 & projected_strength >= 65 ~ 60,
        projected_weight >= 195 & projected_strength >= 60 ~ 55,
        projected_weight >= 180 & projected_strength >= 55 ~ 50,
        TRUE ~ 45
      )
    )
  return(projected)
}
# Example: project a 16-year-old Dominican prospect (hypothetical measurements)
young_prospect <- tibble(
  name = "Prospect A",
  age = 16,
  height = 70,   # inches
  weight = 165,  # pounds
  strength_score = 45,
  current_power = 40
)
physical_projection <- project_physical_development(young_prospect, 16)
print("Physical Development Projection:")
print(glue::glue(
"{young_prospect$name} (Age {young_prospect$age})
Current: {young_prospect$height}in, {young_prospect$weight}lbs, Power: {young_prospect$current_power}
Projected: {round(physical_projection$projected_height, 1)}in, {round(physical_projection$projected_weight, 0)}lbs, Power: {physical_projection$projected_power}"
))
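The growth table reads most naturally as the multiplier of growth still to come, which shrinks to 1.00 at maturity. Under that reading, a player's projection is the current measurement times the factor for his age, divided by the age-22 factor. The Python sketch below applies this to the same 16-year-old example. `GROWTH`, `project_to_maturity`, and `power_grade` are hypothetical names, with constants copied from the R table and `case_when()` rules.

```python
# Growth factors copied from the R table (age: height, weight, strength)
GROWTH = {
    16: (1.03, 1.15, 1.20),
    17: (1.02, 1.12, 1.18),
    18: (1.01, 1.08, 1.15),
    19: (1.00, 1.05, 1.10),
    20: (1.00, 1.02, 1.05),
    21: (1.00, 1.01, 1.02),
    22: (1.00, 1.00, 1.00),
}


def project_to_maturity(height, weight, strength, age, target_age=22):
    """Scale current measurements by the growth remaining before target_age.

    Each factor is treated as a remaining-growth multiplier, so
    projected = current * factor(age) / factor(target_age).
    """
    if age >= target_age:
        return height, weight, strength
    now, target = GROWTH[age], GROWTH[target_age]
    current = (height, weight, strength)
    return tuple(c * n / t for c, n, t in zip(current, now, target))


def power_grade(weight, strength):
    """Threshold rules copied from the R case_when()."""
    if weight >= 210 and strength >= 65:
        return 60
    if weight >= 195 and strength >= 60:
        return 55
    if weight >= 180 and strength >= 55:
        return 50
    return 45


height, weight, strength = project_to_maturity(70, 165, 45, age=16)
print(f"Projected: {height:.1f}in, {weight:.0f}lbs, strength {strength:.1f}, "
      f"power {power_grade(weight, strength)}")
```

The example prospect projects to about 72.1 in, 190 lb, and a strength score of 54. That falls one point short of the 55 cutoff for a 50 power grade, so the projected grade stays at 45. Hard thresholds like these make projected grades jumpy near the boundaries, which is worth keeping in mind when comparing similar players.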
Deciding when to promote a prospect is one of the most consequential decisions in player development. Analytics can inform the timing by balancing player readiness, service-time considerations, and organizational need.
Readiness Assessment Framework
class CallUpDecisionModel:
    def __init__(self):
        self.readiness_weights = {
            'performance': 0.35,
            'skills': 0.30,
            'experience': 0.15,
            'need': 0.20
        }
    def assess_mlb_readiness(self, prospect_profile, team_context):
        """
        Assess whether prospect is ready for MLB promotion
        """
        # Performance component
        performance_score = self.calculate_performance_score(prospect_profile)
        # Skills component
        skills_score = self.calculate_skills_score(prospect_profile)
        # Experience component
        experience_score = self.calculate_experience_score(prospect_profile)
        # Organizational need component
        need_score = self.calculate_need_score(team_context)
        # Weighted composite
        readiness_score = (
            performance_score * self.readiness_weights['performance'] +
            skills_score * self.readiness_weights['skills'] +
            experience_score * self.readiness_weights['experience'] +
            need_score * self.readiness_weights['need']
        )
        # Service time consideration
        service_time_impact = self.calculate_service_time_value(
            prospect_profile, team_context
        )
        return {
            'readiness_score': readiness_score,
            'performance_score': performance_score,
            'skills_score': skills_score,
            'experience_score': experience_score,
            'need_score': need_score,
            'service_time_value': service_time_impact,
            'recommendation': self.make_recommendation(readiness_score, service_time_impact)
        }
    def calculate_performance_score(self, profile):
        """Score based on recent performance"""
        # Last 30 days wRC+ (AAA)
        recent_wrc = profile.get('recent_wRC+', 100)
        # Season wRC+
        season_wrc = profile.get('season_wRC+', 100)
        # Combine with recency bias
        performance = (recent_wrc * 0.6 + season_wrc * 0.4)
        # Normalize to 0-100
        score = min(100, max(0, (performance - 70) * 1.5))
        return score
    def calculate_skills_score(self, profile):
        """Score based on skill profile"""
        # Key skills
        plate_discipline = (profile.get('BB_pct', 8) - 4) * 8
        contact_ability = (85 - profile.get('K_pct', 23)) * 2
        power = profile.get('ISO', 0.150) * 200
        defense = profile.get('defensive_grade', 50) - 30
        score = (plate_discipline * 0.3 + contact_ability * 0.3 +
                 power * 0.25 + defense * 0.15)
        return max(0, min(100, score))
    def calculate_experience_score(self, profile):
        """Score based on development experience"""
        pa_aaa = profile.get('AAA_PA', 0)
        pa_aa = profile.get('AA_PA', 0)
        # Prefer meaningful AAA experience
        experience = (pa_aaa * 0.7 + pa_aa * 0.3)
        # Normalize (300 weighted PA = 50 score, 600 weighted PA = 100 score)
        score = min(100, (experience / 6))
        return score
    def calculate_need_score(self, context):
        """Score based on organizational need"""
        position_depth = context.get('position_depth', 5)  # Number of MLB options
        current_production = context.get('position_wRC+', 100)  # Current position wRC+
        # High need = low depth or poor production
        depth_need = (6 - position_depth) * 15
        production_need = max(0, (100 - current_production) * 0.5)
        score = min(100, depth_need + production_need)
        return score
    def calculate_service_time_value(self, profile, context):
        """
        Calculate value of delaying call-up for service time
        """
        days_until_super2 = context.get('days_until_super2', 0)
        days_until_full_year = context.get('days_until_full_year', 0)
        prospect_value = profile.get('future_WAR', 2.5)
        # Value of extra year of control (roughly $8M per WAR)
        extra_year_value = prospect_value * 8_000_000
        # Discount based on time delay
        if days_until_full_year > 0 and days_until_full_year <= 20:
            return extra_year_value * 0.9  # High value to wait
        elif days_until_super2 > 0 and days_until_super2 <= 30:
            return extra_year_value * 0.3  # Moderate value to wait
        else:
            return 0
    def make_recommendation(self, readiness_score, service_time_value):
        """Make call-up recommendation"""
        if readiness_score >= 75:
            if service_time_value > 10_000_000:
                return "READY - Consider service time timing"
            else:
                return "READY - Call up now"
        elif readiness_score >= 60:
            return "CLOSE - Monitor closely, could be ready soon"
        elif readiness_score >= 40:
            return "DEVELOPING - Needs more time"
        else:
            return "NOT READY - Significant development needed"
# Example: Evaluate call-up decision for top prospects
model = CallUpDecisionModel()
# Prospect 1: Gunnar Henderson (June 2022)
henderson_profile = {
    'recent_wRC+': 138,
    'season_wRC+': 125,
    'BB_pct': 11.2,
    'K_pct': 22.5,
    'ISO': 0.215,
    'defensive_grade': 55,
    'AAA_PA': 245,
    'AA_PA': 412,
    'future_WAR': 3.5
}
orioles_context = {
    'position_depth': 3,  # SS/3B
    'position_wRC+': 88,  # Below average production
    'days_until_super2': 45,
    'days_until_full_year': 0
}
henderson_eval = model.assess_mlb_readiness(henderson_profile, orioles_context)
print("Gunnar Henderson Call-Up Evaluation:")
print(f"Readiness Score: {henderson_eval['readiness_score']:.1f}/100")
print(f" Performance: {henderson_eval['performance_score']:.1f}")
print(f" Skills: {henderson_eval['skills_score']:.1f}")
print(f" Experience: {henderson_eval['experience_score']:.1f}")
print(f" Need: {henderson_eval['need_score']:.1f}")
print(f"Service Time Value: ${henderson_eval['service_time_value']:,.0f}")
print(f"Recommendation: {henderson_eval['recommendation']}")
# Prospect 2: Jordan Walker (March 2023 - Opening Day consideration)
walker_profile = {
    'recent_wRC+': 115,  # Spring training
    'season_wRC+': 135,  # Previous AA season
    'BB_pct': 8.8,
    'K_pct': 23.8,
    'ISO': 0.235,
    'defensive_grade': 50,
    'AAA_PA': 0,  # No AAA experience
'AA_PA': 456,
'future_WAR': 4.0
}
cardinals_context = {
'position_depth': 4, # OF depth
'position_wRC+': 105, # Average production
'days_until_super2': 0,
'days_until_full_year': 15 # Opening Day decision
}
walker_eval = model.assess_mlb_readiness(walker_profile, cardinals_context)
print("\n\nJordan Walker Call-Up Evaluation:")
print(f"Readiness Score: {walker_eval['readiness_score']:.1f}/100")
print(f" Performance: {walker_eval['performance_score']:.1f}")
print(f" Skills: {walker_eval['skills_score']:.1f}")
print(f" Experience: {walker_eval['experience_score']:.1f}")
print(f" Need: {walker_eval['need_score']:.1f}")
print(f"Service Time Value: ${walker_eval['service_time_value']:,.0f}")
print(f"Recommendation: {walker_eval['recommendation']}")
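Working the composite by hand is a useful check on the weights. The sketch below re-derives Henderson's score from the same component formulas and the 0.35/0.30/0.15/0.20 `readiness_weights`, with the profile values typed in directly. It is a verification aid, not part of the model. The result falls between 60 and 75, which `make_recommendation` labels "CLOSE", just short of the 75-point "READY" threshold.

```python
# Component formulas and weights copied from CallUpDecisionModel
weights = {'performance': 0.35, 'skills': 0.30, 'experience': 0.15, 'need': 0.20}

# Performance: 60/40 blend of last-30-day and season wRC+, rescaled so 70 -> 0
blend = 138 * 0.6 + 125 * 0.4                        # 132.8
performance = min(100, max(0, (blend - 70) * 1.5))   # about 94.2

# Skills: plate discipline, contact, power, defense
skills = ((11.2 - 4) * 8 * 0.3 + (85 - 22.5) * 2 * 0.3
          + 0.215 * 200 * 0.25 + (55 - 30) * 0.15)
skills = max(0, min(100, skills))                    # about 69.3

# Experience: 70/30 weighting of AAA and AA plate appearances, divided by 6
experience = min(100, (245 * 0.7 + 412 * 0.3) / 6)   # about 49.2

# Need: three MLB options at the position, 88 wRC+ from the incumbents
need = max(0, min(100, (6 - 3) * 15 + max(0, (100 - 88) * 0.5)))  # 51.0

readiness = (performance * weights['performance'] + skills * weights['skills']
             + experience * weights['experience'] + need * weights['need'])
print(f"Readiness: {readiness:.1f}")  # Readiness: 71.3
```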
Service Time Optimization
```r
library(tidyverse)

# Service time calculator
# Simplifying assumption: the player stays on the MLB roster from call-up
# through the end of every subsequent season
calculate_service_time_scenarios <- function(call_up_date, season_year) {
  # Regular-season dates (2023 season: March 30 - October 1)
  season_start <- as.Date(paste0(season_year, "-03-30"))
  season_end <- as.Date(paste0(season_year, "-10-01"))
  # Days in season
  total_days <- as.numeric(season_end - season_start)
  # Days of service accrued in the call-up season
  if (call_up_date < season_start) {
    service_days <- total_days
  } else if (call_up_date > season_end) {
    service_days <- 0
  } else {
    service_days <- as.numeric(season_end - call_up_date)
  }
  # 172 days = 1 year of service, and no more than 172 days count per season
  service_days <- min(service_days, 172)
  service_years <- service_days / 172
  # Season in which the player reaches 6 years of service
  # (final season of club control; free agency follows)
  fa_year <- season_year + ceiling(6 - service_years)
  # Super Two: players with 2-3 years of service ranking in the top 22%.
  # The cutoff is usually near 2.116 (2 years, 116 days), so a first season
  # of roughly 116-171 days typically produces Super Two status.
  # Delaying past the full-year mark (mid-April) gains a year of control;
  # avoiding Super Two requires waiting until roughly June.
  is_super_two <- service_days >= 116 && service_days < 172
  # Arbitration starts the season after the player reaches 3 years,
  # or one season earlier for a Super Two
  arb_start_year <- season_year + ceiling(3 - service_years) + 1 - as.integer(is_super_two)
  arb_years <- fa_year - arb_start_year + 1
  # call_up_date is not returned, so unnest() does not create a duplicate column
  return(tibble(
    service_days = service_days,
    service_years = service_years,
    is_super_two = is_super_two,
    arb_years = arb_years,
    arb_start_year = arb_start_year,
    fa_year = fa_year
  ))
}

# Analyze different call-up scenarios
scenarios <- tibble(
  scenario = c("Opening Day", "Mid-April", "Late April", "Mid-June"),
  call_up_date = as.Date(c("2023-03-30", "2023-04-15", "2023-04-25", "2023-06-15"))
)

service_time_analysis <- scenarios %>%
  rowwise() %>%
  mutate(analysis = list(calculate_service_time_scenarios(call_up_date, 2023))) %>%
  ungroup() %>%
  unnest(analysis)

cat("Service Time Impact by Call-Up Date:\n")
print(service_time_analysis %>%
  select(scenario, call_up_date, service_days, is_super_two, arb_years, fa_year))

# Calculate financial impact (all values in $M)
calculate_financial_impact <- function(scenarios_df, projected_war) {
  # Arbitration cost estimates per WAR
  arb1_rate <- 3.0  # $M per WAR
  arb2_rate <- 5.5
  arb3_rate <- 8.0
  arb4_rate <- 10.0  # Fourth arbitration year (Super Two only)
  scenarios_df <- scenarios_df %>%
    mutate(
      # Estimate total arbitration costs
      arb_cost = case_when(
        arb_years == 4 ~ projected_war * (arb1_rate + arb2_rate + arb3_rate + arb4_rate),
        arb_years == 3 ~ projected_war * (arb1_rate + arb2_rate + arb3_rate),
        TRUE ~ 0
      ),
      # Value of one extra season of control, priced at the FA market rate
      fa_year_value = projected_war * 8.0,  # $8M per WAR
      # Each season the FA year is pushed back is value retained
      years_delayed = fa_year - min(fa_year),
      control_value = years_delayed * fa_year_value
    )
  return(scenarios_df)
}

financial_analysis <- calculate_financial_impact(service_time_analysis, projected_war = 3.5)
cat("\nFinancial Impact Analysis (projected 3.5 WAR player, $M):\n")
print(financial_analysis %>%
  select(scenario, arb_cost, control_value, fa_year) %>%
  mutate(total_value = control_value - (arb_cost - min(arb_cost))))
```
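Service time is quoted in years.days notation, with 172 days to a year. A Super Two cutoff of 2.116 therefore means 2 years and 116 days, not 2.116 years.

The Python sketch below walks through that arithmetic for the March 30, April 15, and June 15, 2023 call-up dates above. Those dates give 185, 169, and 108 raw days to the October 1 season end. The 116-day cutoff (`SUPER_TWO_DAYS`) is an assumption, because the real cutoff is set each year by the top-22% rule. The helper names are illustrative.

```python
FULL_YEAR_DAYS = 172   # days that make one year of MLB service
SUPER_TWO_DAYS = 116   # assumed cutoff beyond 2 years; the real value is set each season


def years_days(total_days: int) -> str:
    """Format a day count in years.days notation, e.g. 460 -> '2.116'."""
    years, days = divmod(total_days, FULL_YEAR_DAYS)
    return f"{years}.{days:03d}"


def status_after_three_seasons(first_season_days: int) -> tuple[str, bool]:
    """Service after season 3, assuming full service in seasons 2 and 3."""
    first = min(first_season_days, FULL_YEAR_DAYS)  # at most 172 days count per season
    total = first + 2 * FULL_YEAR_DAYS
    years, days = divmod(total, FULL_YEAR_DAYS)
    # Simplified Super Two test: between 2 and 3 years and past the assumed cutoff
    super_two = years == 2 and days >= SUPER_TWO_DAYS
    return years_days(total), super_two


# Raw first-season day counts for March 30, April 15, and June 15, 2023 call-ups
for label, days in [("March 30", 185), ("April 15", 169), ("June 15", 108)]:
    service, super_two = status_after_three_seasons(days)
    print(f"{label}: {service} after season 3, Super Two: {super_two}")
```

Compare the three call-up dates:

- **April 15:** the player finishes season 3 at 2.169, past the assumed cutoff. This is the trade-off a club accepts when it delays only long enough to gain the extra year of control.
- **June 15:** the player finishes at 2.108 and stays below the cutoff.
- **March 30:** a full first season reaches 3.000, which leads to standard arbitration.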
Performance-Based Triggers
```python
class PerformanceTriggerSystem:
    def __init__(self):
        self.trigger_thresholds = {
            'AAA': {'wRC+': 120, 'K_pct': 25, 'BB_pct': 8, 'min_PA': 200},
            'AA': {'wRC+': 130, 'K_pct': 23, 'BB_pct': 9, 'min_PA': 300}
        }

    def check_promotion_triggers(self, player_stats, current_level):
        """
        Check if player has met performance triggers for promotion
        """
        thresholds = self.trigger_thresholds[current_level]
        triggers_met = []
        # Check each threshold
        if player_stats['PA'] >= thresholds['min_PA']:
            triggers_met.append('Playing Time')
        if player_stats['wRC+'] >= thresholds['wRC+']:
            triggers_met.append('Overall Performance')
        if player_stats['K_pct'] <= thresholds['K_pct']:
            triggers_met.append('Strikeout Rate')
        if player_stats['BB_pct'] >= thresholds['BB_pct']:
            triggers_met.append('Walk Rate')
        # Age consideration - young players get promoted more aggressively
        if player_stats['age'] <= 22 and player_stats['wRC+'] >= 110:
            triggers_met.append('Age-Adjusted Performance')
        # Determine readiness
        triggers_total = 5 if player_stats['age'] <= 22 else 4
        triggers_pct = len(triggers_met) / triggers_total
        if triggers_pct >= 0.75:
            recommendation = "PROMOTE"
        elif triggers_pct >= 0.50:
            recommendation = "MONITOR"
        else:
            recommendation = "CONTINUE DEVELOPMENT"
        return {
            'triggers_met': triggers_met,
            'triggers_percentage': triggers_pct,
            'recommendation': recommendation
        }


# Example: Monitor multiple prospects for promotion triggers
trigger_system = PerformanceTriggerSystem()
prospects_to_monitor = pd.DataFrame({
    'name': ['Prospect A', 'Prospect B', 'Prospect C'],
    'level': ['AAA', 'AA', 'AAA'],
    'age': [22, 20, 24],
    'PA': [285, 345, 412],
    'wRC+': [128, 135, 118],
    'K_pct': [23.5, 22.1, 26.8],
    'BB_pct': [9.2, 10.5, 7.8]
})
print("Promotion Trigger Analysis:\n")
for idx, prospect in prospects_to_monitor.iterrows():
    result = trigger_system.check_promotion_triggers(prospect, prospect['level'])
    print(f"{prospect['name']} ({prospect['level']}, Age {prospect['age']}):")
    print(f" Triggers Met: {', '.join(result['triggers_met']) if result['triggers_met'] else 'None'}")
    print(f" Readiness: {result['triggers_percentage']:.0%}")
    print(f" Recommendation: {result['recommendation']}\n")
```
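`check_promotion_triggers` evaluates one row at a time through `iterrows`, so monitoring a full system means a Python loop per player. The same rules can also be expressed column-wise in pandas.

This sketch assumes the thresholds are moved into a level-indexed DataFrame. The `thresholds` table and the `flag_triggers` function are illustrative names, not part of the class above. The trigger rules and the 0.75/0.50 cut points are unchanged.

```python
import numpy as np
import pandas as pd

# Same thresholds as PerformanceTriggerSystem, one row per level
thresholds = pd.DataFrame(
    {'wRC+': [120, 130], 'K_pct': [25, 23], 'BB_pct': [8, 9], 'min_PA': [200, 300]},
    index=['AAA', 'AA'],
)


def flag_triggers(players: pd.DataFrame) -> pd.DataFrame:
    """Evaluate every promotion trigger for all players in one pass."""
    # Look up each player's level thresholds and align them to the player rows
    t = thresholds.loc[players['level'].to_numpy()].set_axis(players.index, axis=0)
    young = players['age'] <= 22
    flags = pd.DataFrame({
        'Playing Time': players['PA'] >= t['min_PA'],
        'Overall Performance': players['wRC+'] >= t['wRC+'],
        'Strikeout Rate': players['K_pct'] <= t['K_pct'],
        'Walk Rate': players['BB_pct'] >= t['BB_pct'],
        'Age-Adjusted Performance': young & (players['wRC+'] >= 110),
    })
    # Five possible triggers for players 22 and under, four otherwise
    pct = flags.sum(axis=1) / np.where(young, 5, 4)
    recommendation = np.select(
        [pct >= 0.75, pct >= 0.50], ['PROMOTE', 'MONITOR'], default='CONTINUE DEVELOPMENT'
    )
    return players[['name']].assign(triggers_pct=pct, recommendation=recommendation)


prospects = pd.DataFrame({
    'name': ['Prospect A', 'Prospect B', 'Prospect C'],
    'level': ['AAA', 'AA', 'AAA'],
    'age': [22, 20, 24],
    'PA': [285, 345, 412],
    'wRC+': [128, 135, 118],
    'K_pct': [23.5, 22.1, 26.8],
    'BB_pct': [9.2, 10.5, 7.8],
})
result = flag_triggers(prospects)
print(result)
```

Keeping the thresholds in a table makes adding a High-A row a data change rather than a code change. A level missing from the table raises a `KeyError` in `.loc`, just as the dictionary lookup does in the class.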
Exercise 15.1: Age-Adjusted Performance Analysis
Task: Analyze a prospect's performance, adjusting for age relative to the league average. Using the provided data, calculate age-adjusted metrics and determine whether the prospect is performing above or below expectations.
Data:
Prospect: SS, Age 20
Level: High-A (League Avg Age: 22.8)
Stats: .275 AVG, .345 OBP, .485 SLG, 15 HR, 285 PA
12.2% BB%, 24.5% K%, .210 ISO
Questions:
- Calculate the prospect's age-adjusted wRC+ (assume league average is 100)
- How does the strikeout rate compare when adjusted for age?
- Based on age-adjusted metrics, is this prospect ahead or behind the development curve?
- What level should this prospect be promoted to next, and why?
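This sketch is a starting point for the first two questions. It reuses the per-year coefficients from `calculate_age_adjusted_performance` at the start of the chapter: -3 for wRC+, +0.5 for K%, and -0.3 for BB%, each applied to the age difference.

The exercise gives a slash line rather than a wRC+. The sketch therefore computes only the age bonus to add once you have estimated the base wRC+. The variable names are illustrative.

```python
# Exercise 15.1 inputs
age, league_avg_age = 20, 22.8
k_pct, bb_pct = 24.5, 12.2

# Per-year coefficients from calculate_age_adjusted_performance
WRC_PER_YEAR, K_PER_YEAR, BB_PER_YEAR = -3.0, 0.5, -0.3

age_diff = age - league_avg_age           # negative: younger than the league
k_adj = k_pct + age_diff * K_PER_YEAR     # younger hitter -> lower adjusted K%
bb_adj = bb_pct + age_diff * BB_PER_YEAR  # younger hitter -> higher adjusted BB%
wrc_bonus = age_diff * WRC_PER_YEAR       # points to add to your computed wRC+

print(f"Age difference: {age_diff:+.1f} years")
print(f"Age-adjusted K%: {k_adj:.1f}, BB%: {bb_adj:.2f}, wRC+ bonus: {wrc_bonus:+.1f}")
```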
Exercise 15.2: Breakout Candidate Identification
Task: Using the swing decision and contact quality metrics below, identify which prospect is most likely to break out in the next season.
Prospect Comparison:
| Metric | Prospect A | Prospect B | Prospect C |
|---|---|---|---|
| Current wRC+ | 105 | 118 | 98 |
| Chase Rate Change | -4.5% | -1.2% | +2.1% |
| Zone Contact Change | +3.2% | +1.8% | -0.5% |
| Avg EV Change | +2.1 mph | +0.8 mph | +3.5 mph |
| Barrel Rate | 8.5% | 11.2% | 6.8% |
| Age | 22 | 24 | 21 |
Questions:
- Calculate a composite breakout score for each prospect
- Which prospect shows the most promising leading indicators?
- What specific improvements drive your choice?
- What realistic wRC+ would you project for each prospect next season?
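One way to start question 1 is to put the four leading indicators on a common scale with z-scores. Flip the sign of chase-rate change first, because a decline is an improvement.

Some caveats on this sketch:

- The equal weights are a hypothetical placeholder, not a recommended model.
- With only three prospects, the z-scores describe relative standing within this group, not standing against a league baseline.
- Age and current wRC+ are left out, so they can be layered in as part of the exercise.

```python
import pandas as pd

# Leading indicators from the Exercise 15.2 table (percentage points and mph)
indicators = pd.DataFrame(
    {
        'chase_change': [-4.5, -1.2, 2.1],
        'zone_contact_change': [3.2, 1.8, -0.5],
        'ev_change': [2.1, 0.8, 3.5],
        'barrel_rate': [8.5, 11.2, 6.8],
    },
    index=['A', 'B', 'C'],
)

# Flip chase so that higher is always better, then z-score each column
oriented = indicators.assign(chase_change=-indicators['chase_change'])
z = (oriented - oriented.mean()) / oriented.std()

# Hypothetical equal weights; substitute your own weighting scheme
weights = pd.Series(0.25, index=z.columns)
breakout_score = (z * weights).sum(axis=1)
print(breakout_score.round(2))
```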
Exercise 15.3: International Prospect Translation
Task: Translate the following KBO statistics to MLB equivalents and project first-year MLB performance.
Player Data:
Player: OF, Age 26
League: KBO (Korean Baseball Organization)
Stats: .318 AVG, .385 OBP, .538 SLG, 28 HR, 550 PA
9.5% BB%, 15.2% K%, .220 ISO
Previous MLB exposure: None
Questions:
- Translate the KBO statistics to MLB equivalents using appropriate league factors
- What MLB slash line would you project for Year 1?
- What is the biggest risk factor in this projection?
- How would your projection change if the player were age 23 instead of 26?
Exercise 15.4: Call-Up Decision Analysis
Task: You are the GM. Determine whether to call up your top prospect on May 1st or wait until mid-June for service time reasons.
Prospect Profile:
Position: 3B, Age 22
AAA Stats (145 PA): .298/.375/.512, 6 HR, 12.4% BB%, 21.2% K%
AA Stats (425 PA): .285/.360/.485, 18 HR, 10.1% BB%, 24.5% K%
Defensive Grade: 55 (above average)
Future WAR Projection: 4.0 WAR annually (ages 25-29)
Team Context:
Current 3B Production: 85 wRC+ (below average)
Team Record: 15-18 (below .500)
Payroll Situation: Middle of pack
Days until full year service time: 12 days (mid-April)
Estimated Super Two cutoff: Already passed
Questions:
- Calculate the financial value of delaying the call-up until mid-June
- What is the estimated WAR cost of keeping an 85 wRC+ player at 3B for 6 more weeks?
- Make your recommendation: Call up now or wait? Justify with analysis.
- What performance threshold would make you change your decision?
Exercise Solutions: Solutions to these exercises involve combining multiple analytical techniques from the chapter. Students should use the code frameworks provided to build their own analysis pipelines, applying appropriate age adjustments, translation factors, and decision models. The exercises emphasize practical decision-making under uncertainty, mirroring real-world front office challenges.
Summary
Player development analytics represents the convergence of traditional scouting and modern data science. Success requires understanding both the quantitative metrics that predict future performance and the qualitative factors that influence player development trajectories.
Key takeaways:
- Age matters: Always adjust performance metrics for age relative to league average. A 20-year-old posting league-average numbers in Double-A is far more impressive than a 25-year-old doing the same.
- Leading indicators beat results: Swing decision metrics, contact quality, and plate discipline improvements often predict breakouts before traditional statistics reflect the change.
- Context is critical: Whether evaluating international players, translating minor league stats, or making call-up decisions, understanding context—competitive environment, organizational need, service time implications—determines decision quality.
- Projection uncertainty increases with distance: The younger the player and the lower the level, the wider the confidence intervals. Build ranges, not point estimates.
- Development is non-linear: Players progress at different rates. Some break out immediately, others need time to adjust to each level. Patience with high-upside prospects often yields superior long-term value.
The organizations that excel at player development combine these analytical frameworks with deep scouting expertise, creating systems that identify talent earlier, develop it more effectively, and deploy it optimally. In an era of controlled spending and competitive balance mechanisms, sustainable success increasingly flows through the minor league system.