Chapter 15: Player Development & Minor League Analytics

Player development represents one of the most critical yet underappreciated aspects of building a successful Major League Baseball organization. While free agency and trades capture headlines, sustainable success often stems from cultivating talent through the minor league system. Modern analytics has revolutionized how organizations evaluate prospects, project future performance, and make strategic decisions about player advancement.

What You'll Learn
  • Minor League System Overview
  • Prospect Evaluation Metrics
  • Projection Systems
  • Breakout Candidates

15.1 Minor League System Overview

The minor league system serves as baseball's developmental pipeline. This chapter works with five levels: Rookie ball (including the complex leagues), Low-A (Single-A), High-A, Double-A, and Triple-A. Each level presents distinct analytical challenges and opportunities.

System Structure and Data Availability

Modern minor league analytics benefit from increasingly robust data collection. Statcast-style tracking systems have been rolled out at the upper levels of the minors in recent years, providing granular ball- and player-tracking data previously available only at the major league level. This enables more sophisticated player evaluation and development strategies.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Minor league level progression framework
class MinorLeagueSystem:
    def __init__(self):
        self.levels = {
            'Rookie': {'age_range': (17, 21), 'difficulty': 1},
            'Low-A': {'age_range': (18, 22), 'difficulty': 2},
            'High-A': {'age_range': (19, 23), 'difficulty': 3},
            'AA': {'age_range': (20, 25), 'difficulty': 4},
            'AAA': {'age_range': (21, 28), 'difficulty': 5}
        }

    def calculate_age_adjusted_performance(self, stats_df):
        """
        Adjust performance metrics based on age relative to league average
        """
        stats_df = stats_df.copy()  # avoid mutating the caller's DataFrame
        stats_df['age_diff'] = stats_df['age'] - stats_df['league_avg_age']

        # Younger players face more difficulty - adjust stats upward
        stats_df['wRC+_age_adj'] = stats_df['wRC+'] + (stats_df['age_diff'] * -3)
        stats_df['K_pct_age_adj'] = stats_df['K_pct'] + (stats_df['age_diff'] * 0.5)
        stats_df['BB_pct_age_adj'] = stats_df['BB_pct'] + (stats_df['age_diff'] * -0.3)

        return stats_df

    def calculate_level_difficulty_factor(self, from_level, to_level):
        """
        Calculate difficulty increase between levels
        """
        difficulty_jump = (self.levels[to_level]['difficulty'] -
                          self.levels[from_level]['difficulty'])
        return 1 + (difficulty_jump * 0.15)  # ~15% difficulty per level

# Example: Analyzing a prospect's progression
prospect_data = pd.DataFrame({
    'level': ['Low-A', 'High-A', 'AA', 'AAA'],
    'age': [20, 21, 22, 23],
    'league_avg_age': [21.5, 22.8, 24.1, 26.3],
    'wRC+': [115, 108, 98, 105],
    'K_pct': [22.5, 24.1, 26.8, 24.2],
    'BB_pct': [8.5, 9.2, 10.1, 11.3],
    'ISO': [.185, .172, .158, .168]
})

system = MinorLeagueSystem()
adjusted_stats = system.calculate_age_adjusted_performance(prospect_data)

print("Age-Adjusted Performance Metrics:")
print(adjusted_stats[['level', 'age', 'wRC+', 'wRC+_age_adj']])

library(tidyverse)  # also attaches ggplot2 and dplyr

# Minor league progression analysis
analyze_minor_league_progression <- function(player_data) {
  # Calculate age-adjusted performance
  player_data <- player_data %>%
    mutate(
      age_diff = age - league_avg_age,
      wRC_plus_adj = wRC_plus + (age_diff * -3),
      # Level difficulty adjustment
      level_num = case_when(
        level == "Rookie" ~ 1,
        level == "Low-A" ~ 2,
        level == "High-A" ~ 3,
        level == "AA" ~ 4,
        level == "AAA" ~ 5
      )
    )

  return(player_data)
}

# Example: Bobby Witt Jr.-style progression (illustrative values, not official statistics)
witt_progression <- tibble(
  year = c(2019, 2020, 2021, 2021),
  level = c("Rookie", "Low-A", "High-A", "AA"),
  age = c(19, 20, 21, 21),
  league_avg_age = c(20.1, 21.5, 22.8, 24.1),
  wRC_plus = c(128, NA, 142, 135),  # NA: the 2020 minor league season was cancelled
  K_pct = c(25.8, NA, 20.1, 22.4),
  BB_pct = c(9.1, NA, 11.2, 10.8),
  ISO = c(.198, NA, .246, .221)
)

witt_adjusted <- analyze_minor_league_progression(witt_progression)

# Visualize progression
ggplot(witt_adjusted %>% filter(!is.na(wRC_plus)),
       aes(x = level_num, y = wRC_plus_adj)) +
  geom_line(linewidth = 1.2, color = "#004687") +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 4, color = "#BD9B60") +
  labs(
    title = "Bobby Witt Jr. - Age-Adjusted Performance Progression",
    x = "Minor League Level",
    y = "Age-Adjusted wRC+",
    subtitle = "Accounting for age relative to league average"
  ) +
  theme_minimal()
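
As a worked check on the age adjustment used above, the sketch below applies the same linear rule (3 wRC+ points per year of age difference, the illustrative constant from this section's code) to single values. The function name `age_adjusted_wrc_plus` and its `points_per_year` parameter are assumptions made for this example, not part of any library.

```python
def age_adjusted_wrc_plus(wrc_plus, age, league_avg_age, points_per_year=3.0):
    """Credit (or debit) a hitter for being younger (or older) than the league.

    points_per_year mirrors the chapter's illustrative 3-point rule; it is
    not an empirically fitted constant.
    """
    age_diff = age - league_avg_age  # negative when younger than the league
    return wrc_plus - points_per_year * age_diff


# A 20-year-old in a league averaging 21.5 years old gets 4.5 points of credit.
young = age_adjusted_wrc_plus(115, age=20, league_avg_age=21.5)
# A 25-year-old posting the same line in the same league loses 10.5 points.
old = age_adjusted_wrc_plus(115, age=25, league_avg_age=21.5)

print(f"young for level: {young:.1f}")  # 119.5
print(f"old for level:   {old:.1f}")    # 104.5
```

The same raw wRC+ separates by 15 points once age relative to level is considered, which is why age context is usually reported alongside any minor league batting line.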

Key Performance Indicators by Level

Different metrics carry varying predictive weight at different levels. In lower minors, raw tools and plate discipline metrics often matter more than results. As players advance, the ability to make consistent hard contact and handle advanced pitching becomes paramount.

Critical Metrics by Level:

  • Rookie/Low-A: Walk rate, strikeout rate, exit velocity (where available)
  • High-A: Contact quality, swing decisions, pitch recognition
  • Double-A: Advanced metrics (xwOBA, hard-hit rate), platoon splits
  • Triple-A: MLB-readiness indicators, specific skill refinements
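
One way to make the list above operational is a simple lookup that tells an evaluation script which columns to prioritize at each level. The dictionary and the metric labels below are hypothetical names chosen for this sketch; they mirror the bullets above rather than any standard schema.

```python
# Hypothetical mapping from level to the metrics emphasized in the list above.
PRIORITY_METRICS = {
    "Rookie": ["BB_pct", "K_pct", "exit_velocity"],
    "Low-A": ["BB_pct", "K_pct", "exit_velocity"],
    "High-A": ["contact_quality", "swing_decisions", "pitch_recognition"],
    "AA": ["xwOBA", "hard_hit_rate", "platoon_splits"],
    "AAA": ["mlb_readiness", "skill_refinement"],
}


def metrics_to_review(level, available_columns):
    """Return the priority metrics for a level that actually exist in the data.

    Tracking-based metrics are not available everywhere, so anything missing
    from available_columns is skipped rather than raising an error.
    """
    if level not in PRIORITY_METRICS:
        raise ValueError(f"Unknown level: {level!r}")
    available = set(available_columns)
    return [m for m in PRIORITY_METRICS[level] if m in available]


# Low-A data set without tracking data: only the plate discipline rates remain.
print(metrics_to_review("Low-A", ["BB_pct", "K_pct", "ISO"]))
# ['BB_pct', 'K_pct']
```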

15.2 Prospect Evaluation Metrics

Traditional scouting grades (20-80 scale) combine with modern analytics to create comprehensive prospect evaluation frameworks. The key is understanding which metrics translate across levels and predict major league success.
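
To connect a statistic to the 20-80 scale, a common convention treats 50 as league average and each 10 points as roughly one standard deviation, with grades quoted in steps of 5. The sketch below follows that convention; the function name `to_scouting_grade` and the league means and standard deviations passed to it are assumptions for illustration, not real league figures.

```python
def to_scouting_grade(value, league_mean, league_sd, higher_is_better=True):
    """Map a stat onto the 20-80 scale: 50 is average, 10 points per SD."""
    if league_sd <= 0:
        raise ValueError("league_sd must be positive")
    z = (value - league_mean) / league_sd
    if not higher_is_better:  # e.g. strikeout rate, where lower is better
        z = -z
    grade = 50 + 10 * z
    grade = max(20.0, min(80.0, grade))  # clamp to the scale's endpoints
    return int(5 * round(grade / 5))     # grades are usually quoted in 5s


# Assumed league context for illustration only.
iso_grade = to_scouting_grade(0.221, league_mean=0.150, league_sd=0.045)
k_grade = to_scouting_grade(22.4, league_mean=23.0, league_sd=4.0,
                            higher_is_better=False)

print(f"ISO grade: {iso_grade}")  # 65
print(f"K% grade:  {k_grade}")    # 50
```

A stat-derived grade like this is best read next to the scout's grade for the same tool: agreement raises confidence, and a large gap flags a player worth a closer look.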

Composite Evaluation Framework

class ProspectEvaluator:
    def __init__(self):
        self.weights = {
            'hit_tool': 0.25,
            'power_tool': 0.20,
            'speed_tool': 0.10,
            'plate_discipline': 0.25,
            'contact_quality': 0.20
        }

    def calculate_hit_probability(self, prospect_metrics):
        """
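        Heuristic 0-1 score for MLB success based on minor league metrics.
        The logistic transform is not calibrated against real outcomes, so
        treat the result as a ranking score rather than a true probability.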
        """
        # Key predictive metrics
        k_rate_score = self.normalize_k_rate(prospect_metrics['K_pct'])
        bb_rate_score = self.normalize_bb_rate(prospect_metrics['BB_pct'])
        contact_score = self.normalize_contact(prospect_metrics['contact_pct'])
        power_score = self.normalize_power(prospect_metrics['ISO'])

        # Composite score
        composite = (
            k_rate_score * 0.30 +
            bb_rate_score * 0.25 +
            contact_score * 0.25 +
            power_score * 0.20
        )

        # Convert to probability using logistic function
        probability = 1 / (1 + np.exp(-5 * (composite - 0.5)))
        return probability

    def normalize_k_rate(self, k_pct):
        """Lower is better - normalize to 0-1 scale"""
        # Elite: 15%, Average: 23%, Poor: 30%
        return max(0, min(1, (30 - k_pct) / 15))

    def normalize_bb_rate(self, bb_pct):
        """Higher is better"""
        # Elite: 12%, Average: 8%, Poor: 4%
        return max(0, min(1, (bb_pct - 4) / 8))

    def normalize_contact(self, contact_pct):
        """Higher is better"""
        # Elite: 85%, Average: 75%, Poor: 65%
        return max(0, min(1, (contact_pct - 65) / 20))

    def normalize_power(self, iso):
        """Higher is better"""
        # Elite: .250, Average: .150, Poor: .080
        return max(0, min(1, (iso - 0.080) / 0.170))

# Example: Evaluating top prospects
prospects = pd.DataFrame({
    'name': ['Adley Rutschman', 'Julio Rodriguez', 'Bobby Witt Jr.', 'Riley Greene'],
    'level': ['AAA', 'AA', 'AA', 'AAA'],
    'K_pct': [18.2, 22.4, 22.4, 20.1],
    'BB_pct': [13.5, 10.8, 10.8, 11.2],
    'contact_pct': [81.2, 76.5, 77.8, 79.3],
    'ISO': [.242, .221, .221, .198],
    'age': [23, 20, 21, 21]
})

evaluator = ProspectEvaluator()

prospects['mlb_success_prob'] = prospects.apply(
    lambda row: evaluator.calculate_hit_probability(row), axis=1
)

print("\nProspect MLB Success Probability:")
print(prospects[['name', 'level', 'age', 'mlb_success_prob']].sort_values(
    'mlb_success_prob', ascending=False
))

# Prospect evaluation using plate discipline metrics (R)
evaluate_prospect_discipline <- function(prospect_data) {
  # Calculate key ratios and scores
  prospect_data <- prospect_data %>%
    mutate(
      bb_k_ratio = BB_pct / K_pct,
      discipline_score = (BB_pct * 2) - (K_pct * 0.5),

      # Percentile rankings
      bb_percentile = percent_rank(BB_pct),
      k_percentile = 1 - percent_rank(K_pct),  # Inverse for K%
      iso_percentile = percent_rank(ISO),

      # Composite tool grade (20-80 scale)
      composite_grade = 20 + (
        (bb_percentile * 0.3 +
         k_percentile * 0.3 +
         iso_percentile * 0.4) * 60
      )
    )

  return(prospect_data)
}

# Example: 2021 top prospect class evaluation
prospects_2021 <- tibble(
  name = c("Wander Franco", "Adley Rutschman", "Julio Rodriguez",
           "Bobby Witt Jr.", "Riley Greene"),
  level = c("AAA", "AAA", "AA", "AA", "AAA"),
  age = c(20, 23, 20, 21, 21),
  K_pct = c(19.8, 18.2, 22.4, 22.4, 20.1),
  BB_pct = c(11.2, 13.5, 10.8, 10.8, 11.2),
  ISO = c(.198, .242, .221, .221, .198),
  contact_pct = c(80.2, 81.8, 76.5, 77.8, 79.3)
)

evaluated_prospects <- evaluate_prospect_discipline(prospects_2021)

# Display results
print("Prospect Evaluation Scores:")
evaluated_prospects %>%
  select(name, age, discipline_score, composite_grade) %>%
  arrange(desc(composite_grade))

Contact Quality Metrics

With Statcast data available in the upper minors, evaluators can assess contact quality using the same metrics as major league analysis: exit velocity, launch angle, barrel rate, and expected statistics (xBA, xSLG, xwOBA).

def analyze_contact_quality(statcast_data):
    """
    Analyze minor league Statcast data for contact quality indicators
    """
    # Calculate key contact metrics
    results = {
        'avg_exit_velo': statcast_data['exit_velocity'].mean(),
        'max_exit_velo': statcast_data['exit_velocity'].max(),
        'barrel_rate': (statcast_data['barrel'] == 1).sum() / len(statcast_data) * 100,
        'hard_hit_rate': (statcast_data['exit_velocity'] >= 95).sum() / len(statcast_data) * 100,
        'sweet_spot_pct': ((statcast_data['launch_angle'] >= 8) &
                          (statcast_data['launch_angle'] <= 32)).sum() / len(statcast_data) * 100
    }

    # Calculate expected statistics
    results['xBA'] = calculate_xBA(statcast_data)
    results['xSLG'] = calculate_xSLG(statcast_data)

    return results

def calculate_xBA(df):
    """Simplified xBA calculation based on exit velo and launch angle"""
    # This is a simplified version - actual xBA uses more complex models
    conditions = [
        (df['exit_velocity'] >= 98) & (df['launch_angle'].between(8, 32)),
        (df['exit_velocity'] >= 90) & (df['launch_angle'].between(10, 30)),
        (df['exit_velocity'] >= 80) & (df['launch_angle'].between(-10, 40))
    ]
    values = [0.750, 0.450, 0.250]

    # Per-batted-ball hit probability; kept local so the input frame is not modified
    xba_per_ball = np.select(conditions, values, default=0.100)
    return float(xba_per_ball.mean())

def calculate_xSLG(df):
    """Simplified xSLG calculation"""
    # Estimate bases based on exit velo and launch angle
    conditions = [
        (df['exit_velocity'] >= 100) & (df['launch_angle'].between(20, 35)),  # HR territory
        (df['exit_velocity'] >= 95) & (df['launch_angle'].between(15, 40)),   # XBH likely
        (df['exit_velocity'] >= 90) & (df['launch_angle'].between(10, 30)),   # Solid contact
    ]
    bases = [2.5, 1.8, 1.2]

    # Expected bases per batted ball; kept local so the input frame is not modified
    expected_bases = np.select(conditions, bases, default=0.5)
    return float(expected_bases.mean())

# Example: Analyzing a top prospect's contact quality
np.random.seed(42)
n_batted_balls = 250

# Simulate batted balls for a Julio Rodriguez-like strong contact profile (synthetic data, not real tracking data)
julio_data = pd.DataFrame({
    'exit_velocity': np.random.normal(92.5, 7.5, n_batted_balls),
    'launch_angle': np.random.normal(12, 18, n_batted_balls),
    'barrel': np.random.choice([0, 1], n_batted_balls, p=[0.88, 0.12])
})

julio_contact = analyze_contact_quality(julio_data)
print("\nJulio Rodriguez Contact Quality Profile:")
for metric, value in julio_contact.items():
    print(f"{metric}: {value:.2f}")

15.3 Projection Systems

Projection systems for minor leaguers face unique challenges: limited sample sizes, developmental trajectories, and the difficulty of translating performance across competitive levels. Modern systems combine statistical translation with aging curves and scouting inputs.
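
The sample-size problem is commonly handled by shrinking an observed rate toward the league average, with less shrinkage as plate appearances accumulate. The sketch below shows one simple form of that idea; the function name `regress_to_mean`, the 23% league strikeout rate, and the 150 PA stabilization point are assumptions chosen for illustration rather than fitted values.

```python
def regress_to_mean(observed_rate, sample_size, league_rate, stabilization_n):
    """Shrink an observed rate toward the league rate.

    The weight on the observation is n / (n + k), where k (stabilization_n)
    is the sample size at which the observation and the league prior count
    equally.
    """
    if sample_size < 0 or stabilization_n <= 0:
        raise ValueError("sample_size must be >= 0 and stabilization_n > 0")
    weight = sample_size / (sample_size + stabilization_n)
    return weight * observed_rate + (1 - weight) * league_rate


# Illustrative inputs: a 30% strikeout rate against a 23% league rate,
# with an assumed stabilization point of 150 plate appearances.
for pa in (50, 150, 450):
    est = regress_to_mean(0.30, pa, league_rate=0.23, stabilization_n=150)
    print(f"{pa:>3} PA -> estimated true K rate {est:.3f}")
```

With 50 PA the estimate stays close to the league rate; by 450 PA most of the weight sits on the player's own line. A short stint at a new level therefore moves a projection far less than a full season does.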

MiLB-to-MLB Translation

The fundamental challenge is converting minor league statistics to major league equivalents. Bill James pioneered this approach with his Major League Equivalencies (MLEs), which remain the foundation for modern systems.

library(tidyverse)

# Major League Equivalency (MLE) translation factors
calculate_mle <- function(minor_stats, level) {
  # Translation factors by level (approximate)
  translation_factors <- list(
    "AAA" = list(avg = 0.77, iso = 0.85, bb_pct = 0.92, k_pct = 1.15),
    "AA" = list(avg = 0.73, iso = 0.80, bb_pct = 0.88, k_pct = 1.22),
    "High-A" = list(avg = 0.68, iso = 0.75, bb_pct = 0.82, k_pct = 1.30),
    "Low-A" = list(avg = 0.63, iso = 0.70, bb_pct = 0.78, k_pct = 1.40)
  )

  factors <- translation_factors[[level]]
  if (is.null(factors)) stop("No translation factors defined for level: ", level)

  # Apply translations
  mle <- minor_stats %>%
    mutate(
      mlb_avg = avg * factors$avg,
      mlb_iso = iso * factors$iso,
      mlb_bb_pct = bb_pct * factors$bb_pct,
      mlb_k_pct = k_pct * factors$k_pct,
      # Rough OBP: avg is per at-bat, so weight it by the non-walk share of PA
      mlb_obp = mlb_avg * (1 - mlb_bb_pct) + mlb_bb_pct,
      mlb_slg = mlb_avg + mlb_iso,
      mlb_wOBA = calculate_wOBA(mlb_avg, mlb_bb_pct, mlb_iso)
    )

  return(mle)
}

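calculate_wOBA <- function(avg, bb_pct, iso) {
  # Simplified wOBA approximation from rate stats. Actual wOBA weights each
  # event type separately, and the weights change slightly by season.
  # avg and iso are per at-bat, so they are scaled by (1 - bb_pct) to put them
  # on an approximate per-plate-appearance basis. Every hit is worth about
  # 0.88 (the single weight) and each extra base adds roughly 0.37.
  hit_value <- (avg * 0.88) + (iso * 0.37)

  wOBA <- (bb_pct * 0.69) + ((1 - bb_pct) * hit_value)
  return(wOBA)
}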

# Example: Project Spencer Torkelson's AAA performance to MLB
torkelson_aaa <- tibble(
  level = "AAA",
  pa = 358,
  avg = .267,
  iso = .234,
  bb_pct = 0.112,
  k_pct = 0.279,
  hr = 30,
  age = 21
)

torkelson_mlb_proj <- calculate_mle(torkelson_aaa, "AAA")

print("Spencer Torkelson MLB Projection:")
print(torkelson_mlb_proj %>%
  select(mlb_avg, mlb_iso, mlb_bb_pct, mlb_k_pct, mlb_wOBA))
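
For readers following the Python examples, the translation step can be sketched as a plain dictionary lookup. The factors are copied from the R example above and are approximate; the names `TRANSLATION_FACTORS` and `translate_to_mlb` are hypothetical, and the input line reuses the illustrative Triple-A values from that example.

```python
# Approximate translation factors, copied from the R example above.
TRANSLATION_FACTORS = {
    "AAA": {"avg": 0.77, "iso": 0.85, "bb_pct": 0.92, "k_pct": 1.15},
    "AA": {"avg": 0.73, "iso": 0.80, "bb_pct": 0.88, "k_pct": 1.22},
    "High-A": {"avg": 0.68, "iso": 0.75, "bb_pct": 0.82, "k_pct": 1.30},
    "Low-A": {"avg": 0.63, "iso": 0.70, "bb_pct": 0.78, "k_pct": 1.40},
}


def translate_to_mlb(stats, level):
    """Scale each minor league rate by its level-specific factor."""
    if level not in TRANSLATION_FACTORS:
        raise ValueError(f"No translation factors for level {level!r}")
    factors = TRANSLATION_FACTORS[level]
    return {f"mlb_{name}": stats[name] * factor
            for name, factor in factors.items()}


# Same illustrative Triple-A line as the R example (rates as proportions).
aaa_line = {"avg": 0.267, "iso": 0.234, "bb_pct": 0.112, "k_pct": 0.279}
mle = translate_to_mlb(aaa_line, "AAA")

for name, value in mle.items():
    print(f"{name}: {value:.3f}")
```

Factors below 1 shrink batting average, power, and walks, while the strikeout factor above 1 raises the strikeout rate, reflecting tougher pitching at the major league level.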

Age-Based Developmental Curves

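Players develop at different rates, but aggregate data reveals consistent aging patterns. Hitters typically improve through their mid-to-late twenties (the examples here assume a peak at age 27) and then gradually decline, so a prospect's current line should be projected forward along an aging curve rather than taken at face value.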

class ProspectProjectionSystem:
    def __init__(self):
        # Age adjustment factors (peak = 27)
        self.age_factors = {
            19: -0.050, 20: -0.040, 21: -0.030, 22: -0.020, 23: -0.010,
            24: -0.005, 25: 0.000, 26: 0.005, 27: 0.010, 28: 0.005,
            29: 0.000, 30: -0.010, 31: -0.020, 32: -0.035
        }

    def project_future_performance(self, current_stats, current_age, target_age):
        """
        Project performance from current age to target age
        """
        current_factor = self.age_factors.get(current_age, 0)
        target_factor = self.age_factors.get(target_age, 0)

        improvement = target_factor - current_factor

        # Apply improvement to stats
        projected = current_stats.copy()
        projected['wRC+'] = current_stats['wRC+'] * (1 + improvement)
        projected['ISO'] = current_stats['ISO'] * (1 + improvement * 1.2)  # Power develops more
        projected['K_pct'] = current_stats['K_pct'] * (1 - improvement * 0.8)  # K% improves
        projected['BB_pct'] = current_stats['BB_pct'] * (1 + improvement * 0.6)  # BB% improves

        return projected

    def create_projection_range(self, stats, age, confidence=0.80):
        """
        Create projection range with confidence intervals
        """
        # Project to age 27 (peak)
        peak_projection = self.project_future_performance(stats, age, 27)

        # Calculate uncertainty based on age distance from peak
        age_distance = abs(27 - age)
        uncertainty = 0.10 + (age_distance * 0.015)  # More uncertainty for younger players

        # Calculate confidence intervals
        z_score = stats.norm.ppf(0.5 + confidence / 2)  # two-sided z for any confidence level (uses scipy.stats imported earlier)

        projection = {}
        for stat in peak_projection.index:
            mean_val = peak_projection[stat]
            std_error = mean_val * uncertainty

            projection[stat] = {
                'median': mean_val,
                'lower': mean_val - (z_score * std_error),
                'upper': mean_val + (z_score * std_error)
            }

        return projection

# Example: Project Gunnar Henderson's development
henderson_current = pd.Series({
    'wRC+': 125,
    'ISO': .215,
    'K_pct': 24.5,
    'BB_pct': 10.2
})

projector = ProspectProjectionSystem()
henderson_proj = projector.create_projection_range(henderson_current, age=22)

print("\nGunnar Henderson Age-27 Projection (80% confidence):")
for stat, values in henderson_proj.items():
    print(f"{stat}: {values['median']:.3f} ({values['lower']:.3f} - {values['upper']:.3f})")

Multi-Year Projection Framework

def create_multi_year_projection(player_data, years=5):
    """
    Create multi-year projections incorporating development and regression
    """
    projections = []
    current_age = player_data['age']
    current_stats = player_data['stats']

    projector = ProspectProjectionSystem()

    for year in range(years):
        target_age = current_age + year

        # Get base projection
        year_proj = projector.project_future_performance(
            current_stats, current_age, target_age
        )

        # Add playing time projection (increases as player establishes)
        pa_projection = min(600, 400 + (year * 50))

        # Discount for the jump to MLB: heavier in year 1, lighter afterward
        regression_factor = 0.85 if year == 0 else 0.95

        proj_dict = {
            'year': year + 1,
            'age': target_age,
            'PA': pa_projection,
            'wRC+': year_proj['wRC+'] * regression_factor,
            'ISO': year_proj['ISO'] * regression_factor,
            'K_pct': year_proj['K_pct'] / regression_factor,
            'BB_pct': year_proj['BB_pct'] * regression_factor
        }

        projections.append(proj_dict)

    return pd.DataFrame(projections)

# Example: 5-year projection for top prospect
prospect_profile = {
    'name': 'Jackson Holliday',
    'age': 20,
    'stats': pd.Series({
        'wRC+': 135,
        'ISO': .225,
        'K_pct': 21.5,
        'BB_pct': 12.8
    })
}

holliday_projection = create_multi_year_projection(prospect_profile)
print("\nJackson Holliday 5-Year Projection:")
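# Round ISO to three decimals; rounding every column to one would flatten it to 0.2
print(holliday_projection.round({'wRC+': 1, 'ISO': 3, 'K_pct': 1, 'BB_pct': 1}))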
R
library(tidyverse)

# Minor League Equivalency (MLE) translation factors
calculate_mle <- function(minor_stats, level) {
  # Translation factors by level (approximate)
  translation_factors <- list(
    "AAA" = list(avg = 0.77, iso = 0.85, bb_pct = 0.92, k_pct = 1.15),
    "AA" = list(avg = 0.73, iso = 0.80, bb_pct = 0.88, k_pct = 1.22),
    "High-A" = list(avg = 0.68, iso = 0.75, bb_pct = 0.82, k_pct = 1.30),
    "Low-A" = list(avg = 0.63, iso = 0.70, bb_pct = 0.78, k_pct = 1.40)
  )

  factors <- translation_factors[[level]]

  # Apply translations
  mle <- minor_stats %>%
    mutate(
      mlb_avg = avg * factors$avg,
      mlb_iso = iso * factors$iso,
      mlb_bb_pct = bb_pct * factors$bb_pct,
      mlb_k_pct = k_pct * factors$k_pct,
      mlb_obp = mlb_avg + mlb_bb_pct,
      mlb_slg = mlb_avg + mlb_iso,
      mlb_wOBA = calculate_wOBA(mlb_avg, mlb_bb_pct, mlb_iso)
    )

  return(mle)
}

calculate_wOBA <- function(avg, bb_pct, iso) {
  # Simplified wOBA calculation
  # Actual formula uses specific event weights
  singles <- avg - iso
  extra_bases <- iso

  wOBA <- (bb_pct * 0.69) + (singles * 0.88) + (extra_bases * 1.60)
  return(wOBA)
}

# Example: Project Spencer Torkelson's AAA performance to MLB
torkelson_aaa <- tibble(
  level = "AAA",
  pa = 358,
  avg = .267,
  iso = .234,
  bb_pct = 0.112,
  k_pct = 0.279,
  hr = 30,
  age = 21
)

torkelson_mlb_proj <- calculate_mle(torkelson_aaa, "AAA")

print("Spencer Torkelson MLB Projection:")
print(torkelson_mlb_proj %>%
  select(mlb_avg, mlb_iso, mlb_bb_pct, mlb_k_pct, mlb_wOBA))
Python
class ProspectProjectionSystem:
    def __init__(self):
        # Age adjustment factors (peak = 27)
        self.age_factors = {
            19: -0.050, 20: -0.040, 21: -0.030, 22: -0.020, 23: -0.010,
            24: -0.005, 25: 0.000, 26: 0.005, 27: 0.010, 28: 0.005,
            29: 0.000, 30: -0.010, 31: -0.020, 32: -0.035
        }

    def project_future_performance(self, current_stats, current_age, target_age):
        """
        Project performance from current age to target age
        """
        current_factor = self.age_factors.get(current_age, 0)
        target_factor = self.age_factors.get(target_age, 0)

        improvement = target_factor - current_factor

        # Apply improvement to stats
        projected = current_stats.copy()
        projected['wRC+'] = current_stats['wRC+'] * (1 + improvement)
        projected['ISO'] = current_stats['ISO'] * (1 + improvement * 1.2)  # Power develops more
        projected['K_pct'] = current_stats['K_pct'] * (1 - improvement * 0.8)  # K% improves
        projected['BB_pct'] = current_stats['BB_pct'] * (1 + improvement * 0.6)  # BB% improves

        return projected

    def create_projection_range(self, stats, age, confidence=0.80):
        """
        Create projection range with confidence intervals
        """
        # Project to age 27 (peak)
        peak_projection = self.project_future_performance(stats, age, 27)

        # Calculate uncertainty based on age distance from peak
        age_distance = abs(27 - age)
        uncertainty = 0.10 + (age_distance * 0.015)  # More uncertainty for younger players

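        # Two-sided z-score for the requested confidence level
        # (about 1.282 for 80%, 1.645 for 90%); relies on `from scipy import stats` from earlier in the chapter
        z_score = stats.norm.ppf(0.5 + confidence / 2)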

        projection = {}
        for stat in peak_projection.index:
            mean_val = peak_projection[stat]
            std_error = mean_val * uncertainty

            projection[stat] = {
                'median': mean_val,
                'lower': mean_val - (z_score * std_error),
                'upper': mean_val + (z_score * std_error)
            }

        return projection

# Example: Project Gunnar Henderson's development
henderson_current = pd.Series({
    'wRC+': 125,
    'ISO': .215,
    'K_pct': 24.5,
    'BB_pct': 10.2
})

projector = ProspectProjectionSystem()
henderson_proj = projector.create_projection_range(henderson_current, age=22)

print("\nGunnar Henderson Age-27 Projection (80% confidence):")
for stat, values in henderson_proj.items():
    print(f"{stat}: {values['median']:.3f} ({values['lower']:.3f} - {values['upper']:.3f})")
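
The interval arithmetic in `create_projection_range` can be checked by hand. The sketch below reproduces it for a single stat, using the same constants as the class above (0.10 base uncertainty, 0.015 per year away from age 27, and z = 1.282 for an 80% interval). The helper name `interval_for` is hypothetical and not part of the chapter's code.

```python
def interval_for(value, age, peak_age=27, z=1.282):
    """Return (lower, upper) around value using the uncertainty rule above."""
    # Relative uncertainty grows with distance from the assumed peak age
    uncertainty = 0.10 + abs(peak_age - age) * 0.015
    half_width = z * value * uncertainty
    return value - half_width, value + half_width

# A 22-year-old is 5 years from peak: uncertainty = 0.10 + 5 * 0.015 = 0.175
lower, upper = interval_for(100.0, age=22)
print(f"{lower:.2f} to {upper:.2f}")
```

Under these constants the 80% band for a 22-year-old is roughly plus or minus 22% of the projected value, which is worth keeping in mind when reading the point estimates.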
Python
def create_multi_year_projection(player_data, years=5):
    """
    Create multi-year projections incorporating development and regression
    """
    projections = []
    current_age = player_data['age']
    current_stats = player_data['stats']

    projector = ProspectProjectionSystem()

    for year in range(years):
        target_age = current_age + year

        # Get base projection
        year_proj = projector.project_future_performance(
            current_stats, current_age, target_age
        )

        # Add playing time projection (increases as player establishes)
        pa_projection = min(600, 400 + (year * 50))

        # Regression for young players
        regression_factor = 0.85 if year == 0 else 0.95

        proj_dict = {
            'year': year + 1,
            'age': target_age,
            'PA': pa_projection,
            'wRC+': year_proj['wRC+'] * regression_factor,
            'ISO': year_proj['ISO'] * regression_factor,
            'K_pct': year_proj['K_pct'] / regression_factor,
            'BB_pct': year_proj['BB_pct'] * regression_factor
        }

        projections.append(proj_dict)

    return pd.DataFrame(projections)

# Example: 5-year projection for top prospect
prospect_profile = {
    'name': 'Jackson Holliday',
    'age': 20,
    'stats': pd.Series({
        'wRC+': 135,
        'ISO': .225,
        'K_pct': 21.5,
        'BB_pct': 12.8
    })
}

holliday_projection = create_multi_year_projection(prospect_profile)
print("\nJackson Holliday 5-Year Projection:")
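# Round ISO to three decimals so it is not flattened to 0.2
print(holliday_projection.round({'wRC+': 1, 'ISO': 3, 'K_pct': 1, 'BB_pct': 1}))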

15.4 Breakout Candidates

Identifying breakout candidates before they emerge represents a significant competitive advantage. Analytics can reveal players whose underlying metrics suggest imminent improvement.

Leading Indicators of Breakout Performance

library(tidyverse)

# Identify breakout candidates using key indicators
identify_breakout_candidates <- function(prospect_pool) {
  # Calculate breakout indicators
  candidates <- prospect_pool %>%
    mutate(
      # Key indicators
      discipline_improvement = BB_pct_current - BB_pct_prior,
      k_rate_improvement = K_pct_prior - K_pct_current,
      contact_quality_jump = avg_exit_velo - prior_avg_exit_velo,

      # Composite breakout score
      breakout_score = (
        (discipline_improvement * 10) +
        (k_rate_improvement * 8) +
        (contact_quality_jump * 2) +
        (barrel_rate * 1.5) +
        (hard_hit_rate * 0.5)
      ),

      # Age factor (young breakouts more valuable)
      age_adjusted_score = breakout_score * (28 - age) / 8
    ) %>%
    filter(
      # Minimum playing time
      PA >= 250,
      # Positive trends
      discipline_improvement > 0 | k_rate_improvement > 0,
      # Age range
      age <= 25
    ) %>%
    arrange(desc(age_adjusted_score))

  return(candidates)
}

# Example dataset: 2023 AA/AAA prospects
prospect_pool_2023 <- tibble(
  name = c("Elly De La Cruz", "Jasson Dominguez", "Curtis Mead",
           "Colton Cowser", "Jordan Walker"),
  age = c(21, 20, 22, 23, 21),
  level = c("AA", "AA", "AAA", "AAA", "AA"),
  PA = c(412, 315, 485, 521, 456),
  BB_pct_prior = c(8.2, 10.5, 7.8, 12.1, 6.5),
  BB_pct_current = c(10.4, 12.8, 9.2, 14.5, 8.8),
  K_pct_prior = c(28.5, 25.2, 22.8, 24.5, 26.2),
  K_pct_current = c(25.2, 23.1, 21.5, 22.8, 23.8),
  prior_avg_exit_velo = c(88.2, 89.5, 87.8, 90.2, 88.8),
  avg_exit_velo = c(91.5, 91.2, 89.5, 91.8, 91.2),
  barrel_rate = c(12.5, 11.8, 8.5, 10.2, 13.5),
  hard_hit_rate = c(45.2, 42.8, 38.5, 43.2, 46.8)
)

breakout_candidates <- identify_breakout_candidates(prospect_pool_2023)

print("Top Breakout Candidates:")
print(breakout_candidates %>%
  select(name, age, level, breakout_score, age_adjusted_score) %>%
  head(5))
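
To see how the composite behaves, it helps to work one row by hand. The following is a minimal Python sketch that applies the same weights as the R function above to the first row of the example pool; the function names `breakout_score` and `age_adjusted` are hypothetical helpers for illustration.

```python
def breakout_score(bb_prior, bb_cur, k_prior, k_cur, ev_prior, ev_cur,
                   barrel_rate, hard_hit_rate):
    """Composite score using the same weights as the R function above."""
    return (
        (bb_cur - bb_prior) * 10      # walk-rate gain
        + (k_prior - k_cur) * 8       # strikeout-rate reduction
        + (ev_cur - ev_prior) * 2     # exit-velocity gain
        + barrel_rate * 1.5           # current barrel rate (a level, not a change)
        + hard_hit_rate * 0.5         # current hard-hit rate (a level, not a change)
    )

def age_adjusted(score, age):
    """Scale the score so younger players rank higher, as in the R code."""
    return score * (28 - age) / 8

# First row of the example pool above
raw = breakout_score(8.2, 10.4, 28.5, 25.2, 88.2, 91.5, 12.5, 45.2)
adj = age_adjusted(raw, 21)
print(round(raw, 2), round(adj, 2))
```

For this row, the two level terms (barrel rate and hard-hit rate) contribute about 41 of the roughly 96 raw points, so the score rewards existing contact quality almost as much as year-over-year improvement.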

Swing Decision Metrics

Advanced tracking data captures swing decisions through chase rate, in-zone contact, and pitch recognition metrics. These often improve before traditional statistics reflect the change.

class SwingDecisionAnalyzer:
    def __init__(self):
        self.zone_thresholds = {
            'in_zone': 0.75,  # Expected swing rate in zone
            'out_zone': 0.30  # Expected swing rate out of zone
        }

    def calculate_swing_metrics(self, pitch_data):
        """
        Calculate advanced swing decision metrics
        """
        # Separate in-zone and out-of-zone pitches
        in_zone = pitch_data[pitch_data['in_zone'] == True]
        out_zone = pitch_data[pitch_data['in_zone'] == False]

        metrics = {
            'z_swing_pct': (in_zone['swing'] == True).sum() / len(in_zone) * 100,
            'o_swing_pct': (out_zone['swing'] == True).sum() / len(out_zone) * 100,
            'z_contact_pct': ((in_zone['swing'] == True) & (in_zone['contact'] == True)).sum() /
                            (in_zone['swing'] == True).sum() * 100,
            'o_contact_pct': ((out_zone['swing'] == True) & (out_zone['contact'] == True)).sum() /
                            (out_zone['swing'] == True).sum() * 100,
            'chase_rate': (out_zone['swing'] == True).sum() / len(out_zone) * 100,
            'whiff_rate': (pitch_data[pitch_data['swing'] == True]['contact'] == False).sum() /
                         (pitch_data['swing'] == True).sum() * 100
        }

        # Calculate decision value score
        metrics['swing_decision_score'] = (
            metrics['z_swing_pct'] * 0.3 +
            (100 - metrics['o_swing_pct']) * 0.4 +
            metrics['z_contact_pct'] * 0.3
        )

        return metrics

    def identify_improvement(self, current_metrics, prior_metrics):
        """
        Identify meaningful improvements in swing decisions
        """
        improvements = {}

        # Key improvements to track
        improvements['chase_improvement'] = prior_metrics['chase_rate'] - current_metrics['chase_rate']
        improvements['zone_contact_improvement'] = current_metrics['z_contact_pct'] - prior_metrics['z_contact_pct']
        improvements['whiff_improvement'] = prior_metrics['whiff_rate'] - current_metrics['whiff_rate']

        # Overall improvement score
        improvements['total_improvement'] = (
            improvements['chase_improvement'] * 0.40 +
            improvements['zone_contact_improvement'] * 0.35 +
            improvements['whiff_improvement'] * 0.25
        )

        return improvements

# Example: Analyzing Corbin Carroll's swing decision improvement
np.random.seed(42)

# Simulate pitch-by-pitch data
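def generate_pitch_data(n_pitches, chase_rate, z_contact, whiff_rate):
    # Note: whiff_rate is accepted for readability at the call site but is not used below;
    # simulated whiffs follow from z_contact and the fixed 0.55 out-of-zone contact rate.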
    data = pd.DataFrame({
        'in_zone': np.random.choice([True, False], n_pitches, p=[0.45, 0.55]),
        'swing': False,
        'contact': False
    })

    # In-zone swings
    data.loc[data['in_zone'] == True, 'swing'] = np.random.choice(
        [True, False], (data['in_zone'] == True).sum(), p=[0.70, 0.30]
    )

    # Out-of-zone swings (chase)
    data.loc[data['in_zone'] == False, 'swing'] = np.random.choice(
        [True, False], (data['in_zone'] == False).sum(), p=[chase_rate, 1-chase_rate]
    )

    # Contact on swings
    data.loc[(data['swing'] == True) & (data['in_zone'] == True), 'contact'] = np.random.choice(
        [True, False], ((data['swing'] == True) & (data['in_zone'] == True)).sum(),
        p=[z_contact, 1-z_contact]
    )

    data.loc[(data['swing'] == True) & (data['in_zone'] == False), 'contact'] = np.random.choice(
        [True, False], ((data['swing'] == True) & (data['in_zone'] == False)).sum(),
        p=[0.55, 0.45]
    )

    return data

# Prior year (2021)
carroll_2021 = generate_pitch_data(1500, chase_rate=0.32, z_contact=0.82, whiff_rate=0.25)

# Current year (2022) - improved
carroll_2022 = generate_pitch_data(1800, chase_rate=0.25, z_contact=0.88, whiff_rate=0.19)

analyzer = SwingDecisionAnalyzer()
metrics_2021 = analyzer.calculate_swing_metrics(carroll_2021)
metrics_2022 = analyzer.calculate_swing_metrics(carroll_2022)
improvements = analyzer.identify_improvement(metrics_2022, metrics_2021)

print("\nCorbin Carroll Swing Decision Improvement (2021 vs 2022):")
print(f"Chase Rate: {metrics_2021['chase_rate']:.1f}% → {metrics_2022['chase_rate']:.1f}% "
      f"(Δ {improvements['chase_improvement']:.1f}%)")
print(f"Zone Contact: {metrics_2021['z_contact_pct']:.1f}% → {metrics_2022['z_contact_pct']:.1f}% "
      f"(Δ {improvements['zone_contact_improvement']:.1f}%)")
print(f"Whiff Rate: {metrics_2021['whiff_rate']:.1f}% → {metrics_2022['whiff_rate']:.1f}% "
      f"(Δ {improvements['whiff_improvement']:.1f}%)")
print(f"\nOverall Improvement Score: {improvements['total_improvement']:.2f}")
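
Whether a change in chase rate is meaningful depends on how many pitches sit behind each rate. One common way to check this, offered here as a sketch rather than as part of the analyzer above, is a pooled two-proportion z-test. The function name `two_proportion_z` and the pitch counts are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 == p2 (pooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 264 chases on 825 out-of-zone pitches, then 248 on 990
z, p = two_proportion_z(264, 825, 248, 990)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With samples of this size, a drop from 32% to about 25% is unlikely to be noise; the same drop over a few hundred pitches would be far less convincing.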

Power Development Indicators

Power often develops later than the hit tool, making it critical to identify prospects showing early power indicators even if home run totals remain modest.

def analyze_power_development(batted_ball_data):
    """
    Identify power development using batted ball quality
    """
    # Calculate power indicators
    indicators = {
        'avg_distance': batted_ball_data['hit_distance'].mean(),
        'max_distance': batted_ball_data['hit_distance'].max(),
        'avg_exit_velo': batted_ball_data['exit_velocity'].mean(),
        'max_exit_velo': batted_ball_data['exit_velocity'].max(),
        '95mph+_rate': (batted_ball_data['exit_velocity'] >= 95).sum() / len(batted_ball_data) * 100,
        '100mph+_rate': (batted_ball_data['exit_velocity'] >= 100).sum() / len(batted_ball_data) * 100,
        'barrel_rate': (batted_ball_data['barrel'] == 1).sum() / len(batted_ball_data) * 100,
        'optimal_la_rate': ((batted_ball_data['launch_angle'] >= 15) &
                           (batted_ball_data['launch_angle'] <= 35)).sum() / len(batted_ball_data) * 100
    }

    # Power potential score (nominally 0-100; the weighted sum is not clipped to that range)
    indicators['power_potential'] = (
        (indicators['avg_exit_velo'] - 82) * 2.5 +
        indicators['barrel_rate'] * 2 +
        indicators['95mph+_rate'] * 0.5 +
        (indicators['optimal_la_rate'] - 20) * 0.8
    )

    return indicators

# Example: Compare power development of similar prospects
np.random.seed(123)

# Prospect A: Raw power, needs refinement
prospect_a_bb = pd.DataFrame({
    'exit_velocity': np.random.normal(91.5, 8, 300),
    'launch_angle': np.random.normal(8, 22, 300),  # Low avg launch angle
    'hit_distance': np.random.normal(260, 75, 300),
    'barrel': np.random.choice([0, 1], 300, p=[0.90, 0.10])
})

# Prospect B: Growing power, better approach
prospect_b_bb = pd.DataFrame({
    'exit_velocity': np.random.normal(89.5, 7, 300),
    'launch_angle': np.random.normal(14, 18, 300),  # Better launch angle
    'hit_distance': np.random.normal(270, 70, 300),
    'barrel': np.random.choice([0, 1], 300, p=[0.92, 0.08])
})

power_a = analyze_power_development(prospect_a_bb)
power_b = analyze_power_development(prospect_b_bb)

print("\nPower Development Comparison:")
print(f"\nProspect A (Raw Power):")
print(f"  Avg Exit Velo: {power_a['avg_exit_velo']:.1f} mph")
print(f"  Barrel Rate: {power_a['barrel_rate']:.1f}%")
print(f"  Optimal LA Rate: {power_a['optimal_la_rate']:.1f}%")
print(f"  Power Potential Score: {power_a['power_potential']:.1f}")

print(f"\nProspect B (Refined Approach):")
print(f"  Avg Exit Velo: {power_b['avg_exit_velo']:.1f} mph")
print(f"  Barrel Rate: {power_b['barrel_rate']:.1f}%")
print(f"  Optimal LA Rate: {power_b['optimal_la_rate']:.1f}%")
print(f"  Power Potential Score: {power_b['power_potential']:.1f}")
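
The power potential formula above is a plain weighted sum, so extreme inputs can push it outside a 0-100 range. The sketch below restates the same weights as a standalone function and clips the result. The clipping step and the name `power_potential_score` are assumptions added for illustration, not part of the original function.

```python
def power_potential_score(avg_exit_velo, barrel_rate, rate_95_plus, optimal_la_rate):
    """Weighted power score with the same weights as above, clipped to [0, 100]."""
    raw = (
        (avg_exit_velo - 82) * 2.5
        + barrel_rate * 2
        + rate_95_plus * 0.5
        + (optimal_la_rate - 20) * 0.8
    )
    # Clipping is an added assumption so the score stays on a 0-100 scale
    return max(0.0, min(100.0, raw))

# Hypothetical inputs: a typical profile and an extreme one
typical = power_potential_score(91.5, 10.0, 33.0, 27.0)
extreme = power_potential_score(98.0, 25.0, 60.0, 45.0)
print(round(typical, 2), extreme)
```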

15.5 International Scouting

International markets—particularly Latin America and Asia—provide critical talent pipelines. Analytics in international scouting face unique challenges: limited statistical data, cultural differences, and varying competitive contexts.

International Prospect Evaluation Framework

library(tidyverse)

# International prospect evaluation
evaluate_international_prospect <- function(prospect_data, market) {
  # Market-specific adjustment factors
  market_factors <- list(
    "Dominican" = list(development = 1.15, tools = 1.10, risk = 1.20),
    "Venezuela" = list(development = 1.12, tools = 1.08, risk = 1.18),
    "Cuba" = list(development = 0.95, tools = 1.05, risk = 0.90),
    "Japan" = list(development = 0.85, tools = 0.95, risk = 0.75),
    "Korea" = list(development = 0.88, tools = 0.92, risk = 0.80)
  )

  factors <- market_factors[[market]]

  # Evaluate tools (20-80 scale)
  prospect_data <- prospect_data %>%
    mutate(
      # Adjust raw tools for market context
      hit_adjusted = hit_grade * factors$tools,
      power_adjusted = power_grade * factors$tools,
      speed_adjusted = speed_grade,

      # Development timeline adjustment
      eta_years = base_eta * factors$development,

      # Risk adjustment
      risk_score = base_risk * factors$risk,

      # Composite future value
      fv_grade = (hit_adjusted * 0.30 +
                  power_adjusted * 0.25 +
                  speed_adjusted * 0.15 +
                  field_grade * 0.20 +
                  arm_grade * 0.10) * (1 - (risk_score * 0.01))
    )

  return(prospect_data)
}

# Example: evaluating a sample international class (names, markets, and grades are illustrative placeholders)
intl_class_2023 <- tibble(
  name = c("Ethan Salas", "Jaison Chourio", "Cristian Hernandez",
           "Armando Cruz", "Colin Houck"),
  age = c(16, 17, 16, 17, 16),
  market = c("Venezuela", "Venezuela", "Dominican", "Dominican", "Cuba"),
  hit_grade = c(55, 60, 50, 55, 60),
  power_grade = c(60, 55, 55, 50, 55),
  speed_grade = c(50, 60, 55, 50, 45),
  field_grade = c(60, 55, 50, 55, 50),
  arm_grade = c(70, 55, 60, 50, 55),
  base_eta = c(5, 4, 5, 5, 3),
  base_risk = c(45, 40, 50, 48, 35)
)

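# Evaluate each market's prospects with that market's adjustment factors.
# (The rowwise() + cur_data() + unnest() version fails on duplicate column names,
# and cur_data() is deprecated in recent dplyr releases.)
evaluated_intl <- intl_class_2023 %>%
  group_split(market) %>%
  map(~ evaluate_international_prospect(.x, .x$market[1])) %>%
  bind_rows()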

print("International Prospect Evaluations:")
print(evaluated_intl %>%
  select(name, market, fv_grade, eta_years, risk_score) %>%
  arrange(desc(fv_grade)))
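
The composite in the R function can be traced for a single prospect. This Python sketch uses the Venezuela factors (tools 1.08, risk 1.18) and the first row of the example tibble; `TOOL_WEIGHTS` and `future_value` are hypothetical names introduced only for this walkthrough.

```python
TOOL_WEIGHTS = {"hit": 0.30, "power": 0.25, "speed": 0.15, "field": 0.20, "arm": 0.10}

def future_value(grades, tools_factor, base_risk, risk_factor):
    """Return (weighted composite, risk-discounted value) as in the R function."""
    adjusted = dict(grades)
    # Only hit and power are scaled by the market's tools factor
    adjusted["hit"] *= tools_factor
    adjusted["power"] *= tools_factor
    composite = sum(adjusted[tool] * weight for tool, weight in TOOL_WEIGHTS.items())
    risk_score = base_risk * risk_factor
    return composite, composite * (1 - risk_score * 0.01)

grades = {"hit": 55, "power": 60, "speed": 50, "field": 60, "arm": 70}
composite, fv = future_value(grades, tools_factor=1.08, base_risk=45, risk_factor=1.18)
print(round(composite, 2), round(fv, 2))
```

Here the risk multiplier is about 0.47, so the discounted value lands near 28 even though the weighted tools composite is about 60.5. The resulting `fv_grade` is therefore best read as a relative ranking score rather than a grade on the 20-80 scale.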

Amateur Statistical Analysis

For markets with organized leagues (Japan, Korea, Taiwan), statistical analysis becomes possible but requires careful context adjustment.

class InternationalStatTranslator:
    def __init__(self):
        # League difficulty multipliers relative to MLB (1.00)
        self.league_factors = {
            'NPB': 0.78,      # Japanese NPB
            'KBO': 0.72,      # Korean KBO
            'CPBL': 0.65,     # Taiwan CPBL
            'Cuban': 0.68,    # Cuban National Series
            'Mexican': 0.62   # Mexican League
        }

    def translate_to_mlb_equivalent(self, stats, league, age):
        """
        Translate international league stats to MLB equivalents
        """
        league_factor = self.league_factors.get(league, 0.60)

        # Age adjustment (younger players get bonus)
        age_adjustment = 1.0 + ((25 - age) * 0.015) if age < 25 else 1.0

        # Apply translations with age adjustment
        mlb_equivalent = {
            'AVG': stats['AVG'] * league_factor * age_adjustment * 0.95,
            'OBP': stats['OBP'] * league_factor * age_adjustment * 0.98,
            'SLG': stats['SLG'] * league_factor * age_adjustment * 0.90,
            'HR': stats['HR'] * league_factor * age_adjustment * 0.85,
            'BB_pct': stats['BB_pct'] * league_factor * age_adjustment,
            'K_pct': stats['K_pct'] / (league_factor * age_adjustment)
        }

        # Calculate expected wRC+
        mlb_equivalent['wRC+'] = self.calculate_wrc_plus(mlb_equivalent)

        return mlb_equivalent

    def calculate_wrc_plus(self, stats):
        """Calculate estimated wRC+ from basic stats"""
        # Simplified wRC+ estimate
        wOBA = (0.69 * stats['BB_pct'] +
                0.88 * (stats['AVG'] - (stats['SLG'] - stats['AVG'])) +
                1.27 * (stats['SLG'] - stats['AVG']))

        # League average wOBA ~ 0.320
        wRC_plus = (wOBA / 0.320) * 100
        return wRC_plus

# Example: Translate Masataka Yoshida's NPB stats to MLB projection
yoshida_npb_2022 = {
    'AVG': .335,
    'OBP': .447,
    'SLG': .562,
    'HR': 21,
    'BB_pct': 0.159,
    'K_pct': 0.107
}

translator = InternationalStatTranslator()
yoshida_mlb_proj = translator.translate_to_mlb_equivalent(
    yoshida_npb_2022, 'NPB', age=29
)

print("\nMasataka Yoshida NPB to MLB Translation:")
print(f"NPB Stats (2022): .335/.447/.562, 21 HR")
print(f"\nMLB Projection:")
print(f"  AVG: {yoshida_mlb_proj['AVG']:.3f}")
print(f"  OBP: {yoshida_mlb_proj['OBP']:.3f}")
print(f"  SLG: {yoshida_mlb_proj['SLG']:.3f}")
print(f"  BB%: {yoshida_mlb_proj['BB_pct']:.1%}")
print(f"  K%: {yoshida_mlb_proj['K_pct']:.1%}")
print(f"  Projected wRC+: {yoshida_mlb_proj['wRC+']:.0f}")

# Example: Translate Jung Hoo Lee's KBO stats (the .349/.421/.575, 23 HR line is from his 2022 season)
lee_kbo_2022 = {
    'AVG': .349,
    'OBP': .421,
    'SLG': .575,
    'HR': 23,
    'BB_pct': 0.098,
    'K_pct': 0.089
}

lee_mlb_proj = translator.translate_to_mlb_equivalent(
    lee_kbo_2022, 'KBO', age=25
)

print("\n\nJung Hoo Lee KBO to MLB Translation:")
print("KBO Stats (2022): .349/.421/.575, 23 HR")
print(f"\nMLB Projection:")
print(f"  AVG: {lee_mlb_proj['AVG']:.3f}")
print(f"  OBP: {lee_mlb_proj['OBP']:.3f}")
print(f"  SLG: {lee_mlb_proj['SLG']:.3f}")
print(f"  BB%: {lee_mlb_proj['BB_pct']:.1%}")
print(f"  K%: {lee_mlb_proj['K_pct']:.1%}")
print(f"  Projected wRC+: {lee_mlb_proj['wRC+']:.0f}")
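
The age adjustment in `translate_to_mlb_equivalent` only applies below age 25, which is easy to miss when reading the one-line conditional. The sketch below isolates that rule and applies it to batting average alone; `age_adjustment` and `translate_avg` are hypothetical helper names, and the .300 hitter is an invented input.

```python
def age_adjustment(age, reference_age=25, bonus_per_year=0.015):
    """Bonus multiplier for players younger than reference_age, else 1.0."""
    if age < reference_age:
        return 1.0 + (reference_age - age) * bonus_per_year
    return 1.0

def translate_avg(avg, league_factor, age, stat_factor=0.95):
    """AVG translation: league factor x age adjustment x stat-specific factor."""
    return avg * league_factor * age_adjustment(age) * stat_factor

# Hypothetical .300 hitter in a league with factor 0.78, at three ages
for age in (21, 25, 29):
    print(age, round(translate_avg(0.300, 0.78, age), 3))
```

A 21-year-old receives a 6% bonus, while players aged 25 and 29 translate identically, since the rule applies no penalty for older players.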

Physical Projection Models

For young international amateurs, physical projection models estimate future size and strength development, critical for power projection.

# Physical growth projection model
project_physical_development <- function(current_metrics, current_age) {
  # Remaining-growth multipliers by age: expected value at physical maturity
  # (age 22) relative to the value at each age
  growth_factors <- tibble(
    age = 16:22,
    height_factor = c(1.03, 1.02, 1.01, 1.00, 1.00, 1.00, 1.00),
    weight_factor = c(1.15, 1.12, 1.08, 1.05, 1.02, 1.01, 1.00),
    strength_factor = c(1.20, 1.18, 1.15, 1.10, 1.05, 1.02, 1.00)
  )

  # Project to age 22 (physical maturity)
  if (current_age >= 22) {
    return(current_metrics)
  }

  target_factors <- growth_factors %>%
    filter(age == 22) %>%
    select(-age)

  current_factors <- growth_factors %>%
    filter(age == current_age) %>%
    select(-age)

  projected <- current_metrics %>%
    mutate(
      # Scale up by the growth still expected between current_age and age 22
      # (current / target; the reverse ratio would shrink the player)
      projected_height = height * (current_factors$height_factor / target_factors$height_factor),
      projected_weight = weight * (current_factors$weight_factor / target_factors$weight_factor),
      projected_strength = strength_score * (current_factors$strength_factor / target_factors$strength_factor),

      # Estimate power grade based on physical projection
      projected_power = case_when(
        projected_weight >= 210 & projected_strength >= 65 ~ 60,
        projected_weight >= 195 & projected_strength >= 60 ~ 55,
        projected_weight >= 180 & projected_strength >= 55 ~ 50,
        TRUE ~ 45
      )
    )

  return(projected)
}

# Example: Project 16-year-old Dominican prospect
young_prospect <- tibble(
  name = "Prospect A",
  age = 16,
  height = 70,  # inches
  weight = 165,
  strength_score = 45,
  current_power = 40
)

physical_projection <- project_physical_development(young_prospect, 16)

print("Physical Development Projection:")
print(glue::glue(
  "{young_prospect$name} (Age {young_prospect$age})
  Current: {young_prospect$height}in, {young_prospect$weight}lbs, Power: {young_prospect$current_power}
  Projected: {round(physical_projection$projected_height, 1)}in, {round(physical_projection$projected_weight, 0)}lbs, Power: {physical_projection$projected_power}"
))
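
The projection logic can be cross-checked in a few lines of Python. This sketch assumes the factors in the R table are remaining-growth multipliers, that is, the expected value at age 22 divided by the value at the given age, so a 16-year-old's measurements are scaled up. The names `REMAINING_GROWTH`, `project_to_maturity`, and `power_grade` are hypothetical.

```python
# (height, weight, strength) multipliers copied from the R table above.
# Reading them as remaining growth to age 22 is an assumption.
REMAINING_GROWTH = {
    16: (1.03, 1.15, 1.20),
    17: (1.02, 1.12, 1.18),
    18: (1.01, 1.08, 1.15),
    19: (1.00, 1.05, 1.10),
    20: (1.00, 1.02, 1.05),
    21: (1.00, 1.01, 1.02),
    22: (1.00, 1.00, 1.00),
}

def project_to_maturity(height, weight, strength, age):
    """Scale current measurements by the growth still expected before age 22."""
    if age >= 22:
        return height, weight, strength
    h_mult, w_mult, s_mult = REMAINING_GROWTH[age]  # KeyError for ages below 16
    return height * h_mult, weight * w_mult, strength * s_mult

def power_grade(weight, strength):
    """Same thresholds as the case_when() in the R function."""
    if weight >= 210 and strength >= 65:
        return 60
    if weight >= 195 and strength >= 60:
        return 55
    if weight >= 180 and strength >= 55:
        return 50
    return 45

h, w, s = project_to_maturity(70, 165, 45, age=16)
print(round(h, 1), round(w, 2), round(s, 1), power_grade(w, s))
```

For the example prospect this gives roughly 72.1 inches, 189.75 lbs, and a strength score of 54, which falls just short of the 55 threshold and so maps to a 45 power grade.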
R
# Physical growth projection model
project_physical_development <- function(current_metrics, current_age) {
  # Average growth patterns by age
  growth_factors <- tibble(
    age = 16:22,
    height_factor = c(1.03, 1.02, 1.01, 1.00, 1.00, 1.00, 1.00),
    weight_factor = c(1.15, 1.12, 1.08, 1.05, 1.02, 1.01, 1.00),
    strength_factor = c(1.20, 1.18, 1.15, 1.10, 1.05, 1.02, 1.00)
  )

  # Project to age 22 (physical maturity)
  if (current_age >= 22) {
    return(current_metrics)
  }

  target_factors <- growth_factors %>%
    filter(age == 22) %>%
    select(-age)

  current_factors <- growth_factors %>%
    filter(age == current_age) %>%
    select(-age)

  # Each factor is the expected remaining growth multiplier at that age
  # (1.00 at maturity), so scale by current / target to project forward.
  projected <- current_metrics %>%
    mutate(
      projected_height = height * (current_factors$height_factor / target_factors$height_factor),
      projected_weight = weight * (current_factors$weight_factor / target_factors$weight_factor),
      projected_strength = strength_score * (current_factors$strength_factor / target_factors$strength_factor),

      # Estimate power grade based on physical projection
      projected_power = case_when(
        projected_weight >= 210 & projected_strength >= 65 ~ 60,
        projected_weight >= 195 & projected_strength >= 60 ~ 55,
        projected_weight >= 180 & projected_strength >= 55 ~ 50,
        TRUE ~ 45
      )
    )

  return(projected)
}

# Example: Project 16-year-old Dominican prospect
young_prospect <- tibble(
  name = "Prospect A",
  age = 16,
  height = 70,  # inches
  weight = 165,
  strength_score = 45,
  current_power = 40
)

physical_projection <- project_physical_development(young_prospect, 16)

print("Physical Development Projection:")
print(glue::glue(
  "{young_prospect$name} (Age {young_prospect$age})
  Current: {young_prospect$height}in, {young_prospect$weight}lbs, Power: {young_prospect$current_power}
  Projected: {round(physical_projection$projected_height, 1)}in, {round(physical_projection$projected_weight, 0)}lbs, Power: {physical_projection$projected_power}"
))
Python
class InternationalStatTranslator:
    def __init__(self):
        # League difficulty multipliers relative to MLB (1.00)
        self.league_factors = {
            'NPB': 0.78,      # Japanese NPB
            'KBO': 0.72,      # Korean KBO
            'CPBL': 0.65,     # Taiwan CPBL
            'Cuban': 0.68,    # Cuban National Series
            'Mexican': 0.62   # Mexican League
        }

    def translate_to_mlb_equivalent(self, stats, league, age):
        """
        Translate international league stats to MLB equivalents
        """
        league_factor = self.league_factors.get(league, 0.60)

        # Age adjustment (younger players get bonus)
        age_adjustment = 1.0 + ((25 - age) * 0.015) if age < 25 else 1.0

        # Apply translations with age adjustment
        mlb_equivalent = {
            'AVG': stats['AVG'] * league_factor * age_adjustment * 0.95,
            'OBP': stats['OBP'] * league_factor * age_adjustment * 0.98,
            'SLG': stats['SLG'] * league_factor * age_adjustment * 0.90,
            'HR': stats['HR'] * league_factor * age_adjustment * 0.85,
            'BB_pct': stats['BB_pct'] * league_factor * age_adjustment,
            'K_pct': stats['K_pct'] / (league_factor * age_adjustment)
        }

        # Calculate expected wRC+
        mlb_equivalent['wRC+'] = self.calculate_wrc_plus(mlb_equivalent)

        return mlb_equivalent

    def calculate_wrc_plus(self, stats):
        """Calculate estimated wRC+ from basic stats"""
        # Simplified wRC+ estimate
        wOBA = (0.69 * stats['BB_pct'] +
                0.88 * (stats['AVG'] - (stats['SLG'] - stats['AVG'])) +
                1.27 * (stats['SLG'] - stats['AVG']))

        # League average wOBA ~ 0.320
        wRC_plus = (wOBA / 0.320) * 100
        return wRC_plus

# Example: Translate Masataka Yoshida's NPB stats to MLB projection
yoshida_npb_2022 = {
    'AVG': .335,
    'OBP': .447,
    'SLG': .562,
    'HR': 21,
    'BB_pct': 0.159,
    'K_pct': 0.107
}

translator = InternationalStatTranslator()
yoshida_mlb_proj = translator.translate_to_mlb_equivalent(
    yoshida_npb_2022, 'NPB', age=29
)

print("\nMasataka Yoshida NPB to MLB Translation:")
print(f"NPB Stats (2022): .335/.447/.562, 21 HR")
print(f"\nMLB Projection:")
print(f"  AVG: {yoshida_mlb_proj['AVG']:.3f}")
print(f"  OBP: {yoshida_mlb_proj['OBP']:.3f}")
print(f"  SLG: {yoshida_mlb_proj['SLG']:.3f}")
print(f"  BB%: {yoshida_mlb_proj['BB_pct']:.1%}")
print(f"  K%: {yoshida_mlb_proj['K_pct']:.1%}")
print(f"  Projected wRC+: {yoshida_mlb_proj['wRC+']:.0f}")

# Example: Translate Jung Hoo Lee's KBO stats
lee_kbo_2023 = {
    'AVG': .349,
    'OBP': .421,
    'SLG': .575,
    'HR': 23,
    'BB_pct': 0.098,
    'K_pct': 0.089
}

lee_mlb_proj = translator.translate_to_mlb_equivalent(
    lee_kbo_2023, 'KBO', age=25
)

print("\n\nJung Hoo Lee KBO to MLB Translation:")
print(f"KBO Stats (2023): .349/.421/.575, 23 HR")
print(f"\nMLB Projection:")
print(f"  AVG: {lee_mlb_proj['AVG']:.3f}")
print(f"  OBP: {lee_mlb_proj['OBP']:.3f}")
print(f"  SLG: {lee_mlb_proj['SLG']:.3f}")
print(f"  BB%: {lee_mlb_proj['BB_pct']:.1%}")
print(f"  K%: {lee_mlb_proj['K_pct']:.1%}")
print(f"  Projected wRC+: {lee_mlb_proj['wRC+']:.0f}")
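
Each translated rate stat above is a product of three multipliers: the league factor, the age adjustment, and a stat-specific factor. As a worked check, the following sketch recomputes the batting-average step for both examples. The helper names `age_adjustment` and `translate_rate` are hypothetical; the factors are copied from the class above.

```python
def age_adjustment(age, pivot_age=25, bonus_per_year=0.015):
    """Bonus for players younger than pivot_age; 1.0 otherwise."""
    if age < pivot_age:
        return 1.0 + (pivot_age - age) * bonus_per_year
    return 1.0


def translate_rate(value, league_factor, age, stat_factor=1.0):
    """Multiply a rate stat by league, age, and stat-specific factors."""
    return value * league_factor * age_adjustment(age) * stat_factor


# Factors taken from the translator above: NPB 0.78, KBO 0.72, AVG factor 0.95
yoshida_avg = translate_rate(0.335, league_factor=0.78, age=29, stat_factor=0.95)
lee_avg = translate_rate(0.349, league_factor=0.72, age=25, stat_factor=0.95)

print(f"Yoshida translated AVG: {yoshida_avg:.3f}")
print(f"Lee translated AVG: {lee_avg:.3f}")
print(f"Age adjustment at 21: {age_adjustment(21):.2f}")
```

Because the multipliers compound, the effective discount on batting average is larger than the league factor alone suggests (0.78 x 0.95 is roughly 0.74 for NPB), which is worth keeping in mind when reading the projected slash lines.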

15.6 Call-Up Decisions

Deciding when to promote a prospect is one of the most consequential choices in player development. Analytics can inform the timing by weighing player readiness, service time considerations, and organizational need.

Readiness Assessment Framework

class CallUpDecisionModel:
    def __init__(self):
        self.readiness_weights = {
            'performance': 0.35,
            'skills': 0.30,
            'experience': 0.15,
            'need': 0.20
        }

    def assess_mlb_readiness(self, prospect_profile, team_context):
        """
        Assess whether prospect is ready for MLB promotion
        """
        # Performance component
        performance_score = self.calculate_performance_score(prospect_profile)

        # Skills component
        skills_score = self.calculate_skills_score(prospect_profile)

        # Experience component
        experience_score = self.calculate_experience_score(prospect_profile)

        # Organizational need component
        need_score = self.calculate_need_score(team_context)

        # Weighted composite
        readiness_score = (
            performance_score * self.readiness_weights['performance'] +
            skills_score * self.readiness_weights['skills'] +
            experience_score * self.readiness_weights['experience'] +
            need_score * self.readiness_weights['need']
        )

        # Service time consideration
        service_time_impact = self.calculate_service_time_value(
            prospect_profile, team_context
        )

        return {
            'readiness_score': readiness_score,
            'performance_score': performance_score,
            'skills_score': skills_score,
            'experience_score': experience_score,
            'need_score': need_score,
            'service_time_value': service_time_impact,
            'recommendation': self.make_recommendation(readiness_score, service_time_impact)
        }

    def calculate_performance_score(self, profile):
        """Score based on recent performance"""
        # Last 30 days wRC+ (AAA)
        recent_wrc = profile.get('recent_wRC+', 100)

        # Season wRC+
        season_wrc = profile.get('season_wRC+', 100)

        # Combine with recency bias
        performance = (recent_wrc * 0.6 + season_wrc * 0.4)

        # Normalize to 0-100
        score = min(100, max(0, (performance - 70) * 1.5))
        return score

    def calculate_skills_score(self, profile):
        """Score based on skill profile"""
        # Key skills
        plate_discipline = (profile.get('BB_pct', 8) - 4) * 8
        contact_ability = (85 - profile.get('K_pct', 23)) * 2
        power = profile.get('ISO', 0.150) * 200
        defense = profile.get('defensive_grade', 50) - 30

        score = (plate_discipline * 0.3 + contact_ability * 0.3 +
                power * 0.25 + defense * 0.15)

        return max(0, min(100, score))

    def calculate_experience_score(self, profile):
        """Score based on development experience"""
        pa_aaa = profile.get('AAA_PA', 0)
        pa_aa = profile.get('AA_PA', 0)

        # Prefer meaningful AAA experience
        experience = (pa_aaa * 0.7 + pa_aa * 0.3)

        # Normalize (300 PA = 50 score, 600 PA = 100 score)
        score = min(100, (experience / 6))
        return score

    def calculate_need_score(self, context):
        """Score based on organizational need"""
        position_depth = context.get('position_depth', 5)  # Number of MLB options
        current_production = context.get('position_wRC+', 100)  # Current position wRC+

        # High need = low depth or poor production
        depth_need = (6 - position_depth) * 15
        production_need = max(0, (100 - current_production) * 0.5)

        # Clamp to 0-100 so unusually deep positions cannot produce a negative score
        score = max(0, min(100, depth_need + production_need))
        return score

    def calculate_service_time_value(self, profile, context):
        """
        Calculate value of delaying call-up for service time
        """
        days_until_super2 = context.get('days_until_super2', 0)
        days_until_full_year = context.get('days_until_full_year', 0)

        prospect_value = profile.get('future_WAR', 2.5)

        # Value of extra year of control (roughly $8M per WAR)
        extra_year_value = prospect_value * 8_000_000

        # Discount based on time delay
        if days_until_full_year > 0 and days_until_full_year <= 20:
            return extra_year_value * 0.9  # High value to wait
        elif days_until_super2 > 0 and days_until_super2 <= 30:
            return extra_year_value * 0.3  # Moderate value to wait
        else:
            return 0

    def make_recommendation(self, readiness_score, service_time_value):
        """Make call-up recommendation"""
        if readiness_score >= 75:
            if service_time_value > 10_000_000:
                return "READY - Consider service time timing"
            else:
                return "READY - Call up now"
        elif readiness_score >= 60:
            return "CLOSE - Monitor closely, could be ready soon"
        elif readiness_score >= 40:
            return "DEVELOPING - Needs more time"
        else:
            return "NOT READY - Significant development needed"

# Example: Evaluate call-up decision for top prospects
# (profile and context values are illustrative inputs)
model = CallUpDecisionModel()

# Prospect 1: Gunnar Henderson (June 2022)
henderson_profile = {
    'recent_wRC+': 138,
    'season_wRC+': 125,
    'BB_pct': 11.2,
    'K_pct': 22.5,
    'ISO': 0.215,
    'defensive_grade': 55,
    'AAA_PA': 245,
    'AA_PA': 412,
    'future_WAR': 3.5
}

orioles_context = {
    'position_depth': 3,  # SS/3B
    'position_wRC+': 88,  # Below average production
    'days_until_super2': 45,
    'days_until_full_year': 0
}

henderson_eval = model.assess_mlb_readiness(henderson_profile, orioles_context)

print("Gunnar Henderson Call-Up Evaluation:")
print(f"Readiness Score: {henderson_eval['readiness_score']:.1f}/100")
print(f"  Performance: {henderson_eval['performance_score']:.1f}")
print(f"  Skills: {henderson_eval['skills_score']:.1f}")
print(f"  Experience: {henderson_eval['experience_score']:.1f}")
print(f"  Need: {henderson_eval['need_score']:.1f}")
print(f"Service Time Value: ${henderson_eval['service_time_value']:,.0f}")
print(f"Recommendation: {henderson_eval['recommendation']}")

# Prospect 2: Jordan Walker (March 2023 - Opening Day consideration)
walker_profile = {
    'recent_wRC+': 115,  # Spring training
    'season_wRC+': 135,  # Previous AA season
    'BB_pct': 8.8,
    'K_pct': 23.8,
    'ISO': 0.235,
    'defensive_grade': 50,
    'AAA_PA': 0,  # No AAA experience
    'AA_PA': 456,
    'future_WAR': 4.0
}

cardinals_context = {
    'position_depth': 4,  # OF depth
    'position_wRC+': 105,  # Average production
    'days_until_super2': 0,
    'days_until_full_year': 15  # Opening Day decision
}

walker_eval = model.assess_mlb_readiness(walker_profile, cardinals_context)

print("\n\nJordan Walker Call-Up Evaluation:")
print(f"Readiness Score: {walker_eval['readiness_score']:.1f}/100")
print(f"  Performance: {walker_eval['performance_score']:.1f}")
print(f"  Skills: {walker_eval['skills_score']:.1f}")
print(f"  Experience: {walker_eval['experience_score']:.1f}")
print(f"  Need: {walker_eval['need_score']:.1f}")
print(f"Service Time Value: ${walker_eval['service_time_value']:,.0f}")
print(f"Recommendation: {walker_eval['recommendation']}")
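
The recommendation depends on the chosen weights as much as on the component scores. The sketch below, offered as an illustration only, recomputes the weighted composite for the Henderson inputs under the weights used above and under a hypothetical performance-heavy alternative. The function names `composite_readiness` and `readiness_band` and the alternative weight set are assumptions introduced here; the component scores are the values the scoring formulas above produce for those inputs, rounded to one decimal.

```python
def composite_readiness(scores, weights):
    """Weighted sum of component scores; weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weight for name, weight in weights.items())


def readiness_band(score):
    """Map a composite score to the bands used in make_recommendation."""
    if score >= 75:
        return "READY"
    if score >= 60:
        return "CLOSE"
    if score >= 40:
        return "DEVELOPING"
    return "NOT READY"


# Component scores implied by the Henderson inputs above, rounded to one decimal
scores = {'performance': 94.2, 'skills': 69.3, 'experience': 49.2, 'need': 51.0}

base_weights = {'performance': 0.35, 'skills': 0.30, 'experience': 0.15, 'need': 0.20}
# Hypothetical alternative that leans harder on recent performance
alt_weights = {'performance': 0.50, 'skills': 0.30, 'experience': 0.10, 'need': 0.10}

for label, weights in [('base', base_weights), ('performance-heavy', alt_weights)]:
    score = composite_readiness(scores, weights)
    print(f"{label}: {score:.1f} -> {readiness_band(score)}")
```

Under the base weights the composite lands in the CLOSE band, while the performance-heavy weights move the same player into READY. A comparison like this is a quick way to see how much of a recommendation comes from the weighting scheme.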

Service Time Optimization

library(tidyverse)

# Service time calculator
calculate_service_time_scenarios <- function(call_up_date, season_year) {
  # MLB service time rules
  season_start <- as.Date(paste0(season_year, "-03-30"))
  season_end <- as.Date(paste0(season_year, "-10-01"))

  # Days in season
  total_days <- as.numeric(season_end - season_start)

  # Calculate days of service
  if (call_up_date < season_start) {
    service_days <- total_days
  } else if (call_up_date > season_end) {
    service_days <- 0
  } else {
    service_days <- as.numeric(season_end - call_up_date)
  }

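  # 172 days = 1 year of service; credited days are capped at 172 per season
  service_days <- min(service_days, 172)
  service_years <- service_days / 172

  # Free agency year (6 years of service, assuming full seasons afterward)
  fa_year <- season_year + ceiling(6 - service_years)

  # Super Two: players with 2-3 years of service who rank in the top 22% by
  # service time (cutoff roughly 2.116, i.e. 2 years and 116 days; it varies by year).
  # A full first year leads to normal arbitration at 3 years, so Super Two applies
  # only to partial-year call-ups early enough to clear the cutoff. The 116-day
  # threshold is an approximation; in this calendar it falls in early June.
  super_two_min_days <- 116
  is_super_two <- service_days < 172 && service_days >= super_two_min_days

  # Arbitration years
  if (is_super_two) {
    arb_years <- 4  # Super Two = 4 arb years, starting after 2+ years of service
    arb_start_year <- season_year + ceiling(2 - service_years)
  } else {
    arb_years <- 3  # Normal = 3 arb years, starting after 3 years of service
    arb_start_year <- season_year + ceiling(3 - service_years)
  }

  # call_up_date is omitted here because the scenarios table already carries it;
  # returning it again would create duplicate column names when unnesting
  return(tibble(
    service_days = service_days,
    service_years = service_years,
    is_super_two = is_super_two,
    arb_years = arb_years,
    arb_start_year = arb_start_year,
    fa_year = fa_year
  ))
}

# Analyze different call-up scenarios
scenarios <- tibble(
  scenario = c("Opening Day", "Mid-April", "Late April", "Mid-Season"),
  call_up_date = as.Date(c("2023-03-30", "2023-04-15", "2023-04-25", "2023-06-15"))
)

# ungroup() drops the row-wise grouping so later min() calls compare across scenarios
service_time_analysis <- scenarios %>%
  rowwise() %>%
  mutate(analysis = list(calculate_service_time_scenarios(call_up_date, 2023))) %>%
  unnest(analysis) %>%
  ungroup()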

print("Service Time Impact by Call-Up Date:")
print(service_time_analysis %>%
  select(scenario, call_up_date, service_days, is_super_two, arb_years, fa_year))

# Calculate financial impact
calculate_financial_impact <- function(scenarios_df, projected_war) {
  # Arbitration cost estimates per WAR
  arb1_rate <- 3.0  # $M per WAR
  arb2_rate <- 5.5
  arb3_rate <- 8.0
  arb4_rate <- 10.0  # Super Two only

  scenarios_df <- scenarios_df %>%
    mutate(
      # Estimate total arbitration costs
      arb_cost = case_when(
        arb_years == 4 ~ projected_war * (arb1_rate + arb2_rate + arb3_rate + arb4_rate),
        arb_years == 3 ~ projected_war * (arb1_rate + arb2_rate + arb3_rate),
        TRUE ~ 0
      ),

      # Extra year of control value (pre-FA year at arb3 rates vs FA market)
      fa_year_value = projected_war * 8.0,  # Market rate

      # If FA year is delayed, that's value retained
      years_delayed = fa_year - min(fa_year),
      control_value = years_delayed * fa_year_value
    )

  return(scenarios_df)
}

financial_analysis <- calculate_financial_impact(service_time_analysis, projected_war = 3.5)

cat("\nFinancial Impact Analysis (Projected 3.5 WAR player):\n")
print(financial_analysis %>%
  select(scenario, arb_cost, control_value, fa_year) %>%
  mutate(total_value = control_value - (arb_cost - min(arb_cost))))
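
For readers following along in Python, the day-counting step of the calculator can be sketched as below. This is a minimal illustration and not a full port: it counts days from call-up to season end as the R code does, and it adds one assumption, namely that credited days are capped at 172 because a player earns at most one year of service per season. The function names `credited_service_days` and `free_agency_year` are hypothetical.

```python
from datetime import date

SERVICE_DAYS_PER_YEAR = 172  # days credited as one full year of service


def credited_service_days(call_up, season_end):
    """Days from call-up to season end, capped at one full service year."""
    if call_up > season_end:
        return 0
    return min((season_end - call_up).days, SERVICE_DAYS_PER_YEAR)


def free_agency_year(call_up, season_end, years_required=6):
    """Season after which the player reaches years_required of service,
    assuming a full 172 days in every later season."""
    remaining = (years_required * SERVICE_DAYS_PER_YEAR
                 - credited_service_days(call_up, season_end))
    later_seasons = -(-remaining // SERVICE_DAYS_PER_YEAR)  # ceiling division
    return season_end.year + later_seasons


season_end = date(2023, 10, 1)
for label, call_up in [("Opening Day", date(2023, 3, 30)),
                       ("Mid-April", date(2023, 4, 15))]:
    days = credited_service_days(call_up, season_end)
    fa = free_agency_year(call_up, season_end)
    print(f"{label}: {days} days, free agency after {fa}")
```

With these dates, the Opening Day call-up is credited with a full year and reaches free agency after 2028, while the mid-April call-up falls three days short of a full year and reaches it after 2029. That one-season gap is the extra year of control the financial analysis above tries to value.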

Performance-Based Triggers

class PerformanceTriggerSystem:
    def __init__(self):
        self.trigger_thresholds = {
            'AAA': {'wRC+': 120, 'K_pct': 25, 'BB_pct': 8, 'min_PA': 200},
            'AA': {'wRC+': 130, 'K_pct': 23, 'BB_pct': 9, 'min_PA': 300}
        }

    def check_promotion_triggers(self, player_stats, current_level):
        """
        Check if player has met performance triggers for promotion
        """
        thresholds = self.trigger_thresholds[current_level]
        triggers_met = []

        # Check each threshold
        if player_stats['PA'] >= thresholds['min_PA']:
            triggers_met.append('Playing Time')

        if player_stats['wRC+'] >= thresholds['wRC+']:
            triggers_met.append('Overall Performance')

        if player_stats['K_pct'] <= thresholds['K_pct']:
            triggers_met.append('Strikeout Rate')

        if player_stats['BB_pct'] >= thresholds['BB_pct']:
            triggers_met.append('Walk Rate')

        # Age consideration - young players get promoted more aggressively.
        # Counted as a bonus trigger so it can only raise a young player's score.
        if player_stats['age'] <= 22 and player_stats['wRC+'] >= 110:
            triggers_met.append('Age-Adjusted Performance')

        # Determine readiness against the four base triggers (capped at 100%)
        triggers_total = 4
        triggers_pct = min(1.0, len(triggers_met) / triggers_total)

        if triggers_pct >= 0.75:
            recommendation = "PROMOTE"
        elif triggers_pct >= 0.50:
            recommendation = "MONITOR"
        else:
            recommendation = "CONTINUE DEVELOPMENT"

        return {
            'triggers_met': triggers_met,
            'triggers_percentage': triggers_pct,
            'recommendation': recommendation
        }

# Example: Monitor multiple prospects for promotion triggers
trigger_system = PerformanceTriggerSystem()

prospects_to_monitor = pd.DataFrame({
    'name': ['Prospect A', 'Prospect B', 'Prospect C'],
    'level': ['AAA', 'AA', 'AAA'],
    'age': [22, 20, 24],
    'PA': [285, 345, 412],
    'wRC+': [128, 135, 118],
    'K_pct': [23.5, 22.1, 26.8],
    'BB_pct': [9.2, 10.5, 7.8]
})

print("Promotion Trigger Analysis:\n")
for idx, prospect in prospects_to_monitor.iterrows():
    result = trigger_system.check_promotion_triggers(prospect, prospect['level'])

    print(f"{prospect['name']} ({prospect['level']}, Age {prospect['age']}):")
    print(f"  Triggers Met: {', '.join(result['triggers_met']) if result['triggers_met'] else 'None'}")
    print(f"  Readiness: {result['triggers_percentage']:.0%}")
    print(f"  Recommendation: {result['recommendation']}\n")

15.7 Exercises

Exercise 15.1: Age-Adjusted Performance Analysis

Task: Analyze a prospect's performance adjusting for age relative to league average. Using the provided data, calculate age-adjusted metrics and determine if the prospect is performing above or below expectations.

Data:

Prospect: SS, Age 20
Level: High-A (League Avg Age: 22.8)
Stats: .275 AVG, .345 OBP, .485 SLG, 15 HR, 285 PA
      12.2% BB%, 24.5% K%, .210 ISO

Questions:


  1. Calculate the prospect's age-adjusted wRC+ (assume league average is 100)

  2. How does the strikeout rate compare when adjusted for age?

  3. Based on age-adjusted metrics, is this prospect ahead or behind the development curve?

  4. What level should this prospect be promoted to next, and why?
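
As a starting point for Question 1, the sketch below applies a simple linear age bonus. The function name, the `points_per_year` value, and the raw wRC+ input are all assumptions for illustration; the exercise does not supply a raw wRC+, so estimate one from the slash line first and substitute whatever age factor you prefer.

```python
def age_adjusted_wrc_plus(raw_wrc_plus, age, league_avg_age, points_per_year=5.0):
    """Add a linear bonus for each year a hitter is younger than the league.

    points_per_year is a hypothetical factor, not an empirical estimate.
    """
    years_younger = league_avg_age - age
    return raw_wrc_plus + points_per_year * years_younger


# Placeholder raw wRC+; replace with your own estimate from the slash line.
placeholder_raw_wrc_plus = 125.0
adjusted = age_adjusted_wrc_plus(placeholder_raw_wrc_plus, age=20, league_avg_age=22.8)
print(f"Age-adjusted wRC+: {adjusted:.1f}")
```

A linear bonus is the simplest possible choice. A hitter older than the league average receives a penalty under the same formula, which is usually the desired behavior.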

Exercise 15.2: Breakout Candidate Identification

Task: Using the swing decision and contact quality metrics below, identify which prospect is most likely to break out in the next season.

Prospect Comparison:

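| Metric | Prospect A | Prospect B | Prospect C |
|--------|-----------|-----------|-----------|
| Current wRC+ | 105 | 118 | 98 |
| Chase Rate Change | -4.5% | -1.2% | +2.1% |
| Zone Contact Change | +3.2% | +1.8% | -0.5% |
| Avg EV Change | +2.1 mph | +0.8 mph | +3.5 mph |
| Barrel Rate | 8.5% | 11.2% | 6.8% |
| Age | 22 | 24 | 21 |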

Questions:


  1. Calculate a composite breakout score for each prospect

  2. Which prospect shows the most promising leading indicators?

  3. What specific improvements drive your choice?

  4. What realistic wRC+ would you project for each prospect next season?
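
One possible way to approach Question 1 is to standardize each leading indicator across the three prospects and combine the z-scores with weights. The weights below are hypothetical and chosen only to illustrate the mechanics; negative weights mark metrics where a lower value is better (chase rate change, age).

```python
from statistics import mean, pstdev

# Leading indicators from the comparison table
prospects = {
    "Prospect A": {"chase_change": -4.5, "zone_contact_change": 3.2,
                   "ev_change": 2.1, "barrel_rate": 8.5, "age": 22},
    "Prospect B": {"chase_change": -1.2, "zone_contact_change": 1.8,
                   "ev_change": 0.8, "barrel_rate": 11.2, "age": 24},
    "Prospect C": {"chase_change": 2.1, "zone_contact_change": -0.5,
                   "ev_change": 3.5, "barrel_rate": 6.8, "age": 21},
}

# Hypothetical weights (assumption, not from the chapter)
WEIGHTS = {
    "chase_change": -0.30,
    "zone_contact_change": 0.25,
    "ev_change": 0.20,
    "barrel_rate": 0.15,
    "age": -0.10,
}


def z_scores(values):
    """Standardize values using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def composite_breakout_scores(players, weights):
    """Return a weighted sum of per-metric z-scores for each player."""
    names = list(players)
    scores = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        standardized = z_scores([players[name][metric] for name in names])
        for name, z in zip(names, standardized):
            scores[name] += weight * z
    return scores


scores = composite_breakout_scores(prospects, WEIGHTS)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

With only three prospects the z-scores are coarse, so treat the ranking as sensitive to the weights and test a few alternative weightings before drawing conclusions.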

Exercise 15.3: International Prospect Translation

Task: Translate the following KBO statistics to MLB equivalents and project first-year MLB performance.

Player Data:

Player: OF, Age 26
League: KBO (Korea Baseball Organization)
Stats: .318 AVG, .385 OBP, .538 SLG, 28 HR, 550 PA
       9.5% BB%, 15.2% K%, .220 ISO
Previous MLB exposure: None

Questions:


  1. Translate the KBO statistics to MLB equivalents using appropriate league factors

  2. What MLB slash line would you project for Year 1?

  3. What is the biggest risk factor in this projection?

  4. How would your projection change if the player were age 23 instead of 26?
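
The mechanics of Question 1 can be sketched as a per-stat multiplication. The factors below are placeholders, not empirical KBO-to-MLB translation rates; substitute the league factors you consider appropriate before interpreting the result.

```python
def translate_slash_line(stats, factors):
    """Scale each rate stat by its league translation factor."""
    return {stat: round(value * factors[stat], 3) for stat, value in stats.items()}


# KBO line from the exercise data
kbo_stats = {"AVG": 0.318, "OBP": 0.385, "SLG": 0.538}

# Placeholder factors (assumption); replace with your own estimates
placeholder_factors = {"AVG": 0.90, "OBP": 0.90, "SLG": 0.85}

mlb_equiv = translate_slash_line(kbo_stats, placeholder_factors)
# ISO is derived from the translated line rather than translated directly
mlb_equiv["ISO"] = round(mlb_equiv["SLG"] - mlb_equiv["AVG"], 3)
print(mlb_equiv)
```

Deriving ISO from the translated AVG and SLG keeps the projected line internally consistent. Strikeout and walk rates would need their own factors, since they rarely translate at the same rate as batting average.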

Exercise 15.4: Call-Up Decision Analysis

Task: You are the GM. Determine whether to call up your top prospect on May 1st or wait until mid-June for service time reasons.

Prospect Profile:

Position: 3B, Age 22
AAA Stats (145 PA): .298/.375/.512, 6 HR, 12.4% BB%, 21.2% K%
AA Stats (425 PA): .285/.360/.485, 18 HR, 10.1% BB%, 24.5% K%
Defensive Grade: 55 (above average)
Future WAR Projection: 4.0 WAR annually (ages 25-29)

Team Context:

Current 3B Production: 85 wRC+ (below average)
Team Record: 15-18 (below .500)
Payroll Situation: Middle of pack
Days until full year service time: 12 days (mid-April)
Estimated Super Two cutoff: Already passed

Questions:


  1. Calculate the financial value of delaying the call-up until mid-June

  2. What is the estimated WAR cost of keeping an 85 wRC+ player at 3B for 6 more weeks?

  3. Make your recommendation: Call up now or wait? Justify with analysis.

  4. What performance threshold would make you change your decision?
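
For Question 2, a rough offensive-only estimate converts the wRC+ gap into runs and then wins. The sketch below assumes about 0.12 league runs per plate appearance, 10 runs per win, roughly 150 PA over six weeks, and a hypothetical 105 wRC+ for the prospect's first MLB exposure. None of these values come from the exercise data, and the calculation ignores defense and baserunning.

```python
def war_cost_of_waiting(incumbent_wrc_plus, prospect_wrc_plus, plate_appearances,
                        league_runs_per_pa=0.12, runs_per_win=10.0):
    """Estimate wins lost by playing the incumbent instead of the prospect.

    Each point of wRC+ is 1% of league-average run creation per PA, so the
    gap in points, divided by 100 and multiplied by league runs per PA and
    by PA, gives the run difference. Dividing by runs per win gives wins.
    """
    run_gap = ((prospect_wrc_plus - incumbent_wrc_plus) / 100
               * league_runs_per_pa * plate_appearances)
    return run_gap / runs_per_win


# Assumed inputs: 150 PA over six weeks, prospect projected at 105 wRC+
cost = war_cost_of_waiting(incumbent_wrc_plus=85, prospect_wrc_plus=105,
                           plate_appearances=150)
print(f"Estimated offensive cost of waiting: {cost:.2f} wins")
```

Rerunning the function across a range of projected wRC+ values gives a quick sensitivity check for Question 4.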


Exercise Solutions: Solutions to these exercises involve combining multiple analytical techniques from the chapter. Students should use the code frameworks provided to build their own analysis pipelines, applying appropriate age adjustments, translation factors, and decision models. The exercises emphasize practical decision-making under uncertainty, mirroring real-world front office challenges.

Summary

Player development analytics represents the convergence of traditional scouting and modern data science. Success requires understanding both the quantitative metrics that predict future performance and the qualitative factors that influence player development trajectories.

Key takeaways:

  1. Age matters: Always adjust performance metrics for age relative to league average. A 20-year-old posting league-average numbers in Double-A is far more impressive than a 25-year-old doing the same.
  2. Leading indicators beat results: Swing decision metrics, contact quality, and plate discipline improvements often predict breakouts before traditional statistics reflect the change.
  3. Context is critical: Whether evaluating international players, translating minor league stats, or making call-up decisions, understanding context (competitive environment, organizational need, service time implications) determines decision quality.
  4. Projection uncertainty increases with distance: The younger the player and the lower the level, the wider the confidence intervals. Build ranges, not point estimates.
  5. Development is non-linear: Players progress at different rates. Some break out immediately, others need time to adjust to each level. Patience with high-upside prospects often yields superior long-term value.

The organizations that excel at player development combine these analytical frameworks with deep scouting expertise, creating systems that identify talent earlier, develop it more effectively, and deploy it optimally. In an era of controlled spending and competitive balance mechanisms, sustainable success increasingly flows through the minor league system.
