Chapter 30: Advanced Sports Betting Models

Sports betting analytics represents one of the most challenging applications of statistical modeling in baseball. This chapter explores the mathematical foundations, predictive modeling techniques, and practical implementation strategies for building robust MLB betting models. We emphasize the analytical framework rather than promoting gambling—our goal is to understand how probability, statistics, and machine learning intersect in real-world prediction markets.

30.1 Understanding Betting Markets & Lines

Market Mechanics and Line Setting

Sportsbooks set betting lines through a combination of statistical modeling, historical data, and market dynamics. Understanding these mechanics is crucial for developing profitable betting strategies.

Key Betting Market Types:

  1. Moneyline (ML): A straight bet on which team will win
  2. Run Line (RL): Similar to a point spread, typically set at ±1.5 runs
  3. Total (Over/Under): Combined runs scored by both teams
  4. Player Props: Individual player performance bets
  5. First Five Innings (F5): Betting on the outcome after 5 innings

Converting Odds to Implied Probability

American odds can be converted to implied probabilities:

# R: Convert American odds to implied probability
american_to_prob <- function(odds) {
  if (odds > 0) {
    prob <- 100 / (odds + 100)
  } else {
    prob <- abs(odds) / (abs(odds) + 100)
  }
  return(prob)
}

# Example: Yankees -150, Red Sox +130
yankees_odds <- -150
redsox_odds <- 130

yankees_implied <- american_to_prob(yankees_odds)
redsox_implied <- american_to_prob(redsox_odds)

cat(sprintf("Yankees implied probability: %.2f%%\n", yankees_implied * 100))
cat(sprintf("Red Sox implied probability: %.2f%%\n", redsox_implied * 100))
cat(sprintf("Total implied probability: %.2f%%\n", (yankees_implied + redsox_implied) * 100))

# Python: Convert American odds to implied probability
import numpy as np
import pandas as pd

def american_to_prob(odds):
    """Convert American odds to implied probability"""
    if odds > 0:
        prob = 100 / (odds + 100)
    else:
        prob = abs(odds) / (abs(odds) + 100)
    return prob

def prob_to_american(prob):
    """Convert probability to American odds"""
    if prob >= 0.5:
        odds = -(prob * 100) / (1 - prob)
    else:
        odds = ((1 - prob) * 100) / prob
    return odds

# Example
yankees_odds = -150
redsox_odds = 130

yankees_implied = american_to_prob(yankees_odds)
redsox_implied = american_to_prob(redsox_odds)

print(f"Yankees implied probability: {yankees_implied:.2%}")
print(f"Red Sox implied probability: {redsox_implied:.2%}")
print(f"Total implied probability: {(yankees_implied + redsox_implied):.2%}")
print(f"Vig: {(yankees_implied + redsox_implied - 1):.2%}")

The Efficient Market Hypothesis in Sports Betting

Sports betting markets exhibit characteristics of efficient markets, where:

  • Odds quickly adjust to new information (injuries, weather, lineup changes)
  • Sharp bettors exploit inefficiencies, moving lines toward true probabilities
  • The "wisdom of the crowd" often produces accurate probability estimates
  • Market inefficiencies exist but are difficult to exploit consistently
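
One way to probe the claim that market prices are accurate probability estimates is a calibration table: group bets by their no-vig market probability and compare each group's average probability with its observed win rate. The sketch below is a minimal illustration; `calibration_table` is a hypothetical helper, and the inputs are made-up numbers rather than real market data.

```python
from collections import defaultdict


def calibration_table(probs, outcomes, n_bins=5):
    """Bin predictions into equal-width bins and compare the mean
    predicted probability with the observed win rate in each bin."""
    bins = defaultdict(list)
    for p, won in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, won))

    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_pred = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(w for _, w in rows) / len(rows)
        table.append((idx, len(rows), mean_pred, win_rate))
    return table


# Made-up illustrative inputs (NOT real market data)
market_probs = [0.58, 0.42, 0.61, 0.39, 0.55, 0.45, 0.65, 0.35]
bet_won = [1, 0, 1, 1, 0, 0, 1, 0]

for idx, n, mean_pred, win_rate in calibration_table(market_probs, bet_won):
    print(f"bin {idx}: n={n}, mean predicted={mean_pred:.3f}, observed={win_rate:.3f}")
```

With only a handful of bets per bin the observed rates are mostly noise; a meaningful check needs thousands of games. In a well-calibrated market the two columns would track each other closely.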

Quantifying the Vig (Vigorish)

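The vig is the sportsbook's built-in margin. It shows up as implied probabilities that sum to more than 100%: in the example above, 60.00% + 43.48% = 103.48%, an overround of 3.48%. Dividing each implied probability by that total removes the vig and gives fair ("no-vig") probabilities, here about 57.98% and 42.02%: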

# R: Calculate no-vig (fair) odds
calculate_no_vig_prob <- function(prob1, prob2) {
  total <- prob1 + prob2
  fair_prob1 <- prob1 / total
  fair_prob2 <- prob2 / total
  return(list(prob1 = fair_prob1, prob2 = fair_prob2))
}

# Remove vig from our example
fair_probs <- calculate_no_vig_prob(yankees_implied, redsox_implied)
cat(sprintf("Yankees no-vig probability: %.2f%%\n", fair_probs$prob1 * 100))
cat(sprintf("Red Sox no-vig probability: %.2f%%\n", fair_probs$prob2 * 100))
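
For readers working in Python, the same proportional normalization can be sketched as follows. The name `remove_vig` is illustrative and does not come from the chapter's code, and proportional scaling is only one of several ways to strip the vig.

```python
def american_to_prob(odds):
    """Convert American odds to implied probability."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)


def remove_vig(prob1, prob2):
    """Scale two implied probabilities so they sum to 1 (proportional method)."""
    total = prob1 + prob2
    return prob1 / total, prob2 / total


yankees_implied = american_to_prob(-150)  # 0.60
redsox_implied = american_to_prob(130)    # about 0.4348

fair_yankees, fair_redsox = remove_vig(yankees_implied, redsox_implied)
print(f"Yankees no-vig probability: {fair_yankees:.2%}")
print(f"Red Sox no-vig probability: {fair_redsox:.2%}")
```

The two fair probabilities sum to exactly 1, so either one can be compared directly with a model's win probability.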

30.2 Building Predictive Models for MLB Games

Feature Engineering for Game Prediction

Successful betting models require carefully engineered features that capture team strength, matchups, and contextual factors.

# Python: Comprehensive feature engineering for game prediction
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

class MLBGameFeatures:
    """Feature engineering for MLB game predictions"""

    def __init__(self, games_df, team_stats_df, pitcher_stats_df):
        self.games = games_df
        self.team_stats = team_stats_df
        self.pitcher_stats = pitcher_stats_df

    def create_rolling_stats(self, window=10):
        """Calculate rolling team performance metrics"""
        features = []

        for idx, game in self.games.iterrows():
            team = game['team']
            date = game['date']

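            # Get the team's most recent games before this one (sorted by date
            # so tail() really returns the latest `window` games)
            recent = self.games[
                (self.games['team'] == team) &
                (self.games['date'] < date)
            ].sort_values('date').tail(window)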

            if len(recent) < 5:  # Require minimum games
                continue

            features.append({
                'game_id': game['game_id'],
                'team': team,
                'rolling_win_pct': recent['win'].mean(),
                'rolling_runs_scored': recent['runs_scored'].mean(),
                'rolling_runs_allowed': recent['runs_allowed'].mean(),
                'rolling_wOBA': recent['wOBA'].mean(),
                'rolling_FIP': recent['FIP'].mean(),
                'days_rest': (date - recent['date'].max()).days
            })

        return pd.DataFrame(features)

    def create_pitcher_features(self, pitcher_id, date, hand):
        """Extract pitcher-specific features"""
        # Sort by date so tail(5) returns the five most recent starts
        pitcher_data = self.pitcher_stats[
            (self.pitcher_stats['pitcher_id'] == pitcher_id) &
            (self.pitcher_stats['date'] < date)
        ].sort_values('date').tail(5)  # Last 5 starts

        if len(pitcher_data) == 0:
            return {}

        return {
            'pitcher_ERA': pitcher_data['ERA'].mean(),
            'pitcher_FIP': pitcher_data['FIP'].mean(),
            'pitcher_K_per_9': pitcher_data['K_per_9'].mean(),
            'pitcher_BB_per_9': pitcher_data['BB_per_9'].mean(),
            'pitcher_WHIP': pitcher_data['WHIP'].mean(),
            'pitcher_hand': hand
        }

    def create_matchup_features(self, team, opp_team, date):
        """Create team vs team matchup features"""
        # Historical head-to-head (last 2 seasons)
        h2h = self.games[
            (self.games['team'] == team) &
            (self.games['opponent'] == opp_team) &
            (self.games['date'] < date) &
            (self.games['date'] > date - timedelta(days=730))
        ]

        return {
            'h2h_win_pct': h2h['win'].mean() if len(h2h) > 0 else 0.5,
            'h2h_games': len(h2h)
        }

    def create_contextual_features(self, game):
        """Contextual features: home/away, weather, rest, etc."""
        return {
            'is_home': game['is_home'],
            'temperature': game.get('temperature', 70),
            'wind_speed': game.get('wind_speed', 0),
            'is_dome': game.get('is_dome', 0),
            'day_of_week': game['date'].dayofweek,
            'month': game['date'].month
        }

# Example usage
def build_game_features(game_data):
    """Complete feature set for a single game"""
    feature_engine = MLBGameFeatures(game_data['historical_games'],
                                     game_data['team_stats'],
                                     game_data['pitcher_stats'])

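    features = {}
    # create_rolling_stats() returns a DataFrame with one row per historical
    # game, so it cannot be passed to dict.update() directly. Keep only the
    # latest row for this team (its form entering its most recent game).
    rolling = feature_engine.create_rolling_stats(window=10)
    if not rolling.empty:
        team_rolling = rolling[rolling['team'] == game_data['team']]
        if not team_rolling.empty:
            latest = team_rolling.iloc[-1].drop(['game_id', 'team'])
            features.update(latest.to_dict())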
    features.update(feature_engine.create_pitcher_features(
        game_data['pitcher_id'],
        game_data['date'],
        game_data['pitcher_hand']
    ))
    features.update(feature_engine.create_matchup_features(
        game_data['team'],
        game_data['opponent'],
        game_data['date']
    ))
    features.update(feature_engine.create_contextual_features(game_data))

    return features

Elo Rating System for MLB

Elo ratings provide a simple yet effective method for estimating team strength:

# R: Elo rating system for MLB
library(dplyr)

elo_rating <- function(games_df, K = 20, home_advantage = 30) {
  # Initialize ratings
  teams <- unique(c(games_df$home_team, games_df$away_team))
  ratings <- setNames(rep(1500, length(teams)), teams)

  # Store predictions and outcomes
  results <- data.frame()

  for (i in 1:nrow(games_df)) {
    game <- games_df[i, ]

    # Get current ratings
    home_rating <- ratings[game$home_team]
    away_rating <- ratings[game$away_team]

    # Expected win probability (with home advantage)
    home_expected <- 1 / (1 + 10^(-(home_rating - away_rating + home_advantage) / 400))

    # Actual outcome (1 if home team won, 0 otherwise)
    home_won <- ifelse(game$home_score > game$away_score, 1, 0)

    # Update ratings
    home_new <- home_rating + K * (home_won - home_expected)
    away_new <- away_rating + K * ((1 - home_won) - (1 - home_expected))

    ratings[game$home_team] <- home_new
    ratings[game$away_team] <- away_new

    # Store result
    results <- rbind(results, data.frame(
      game_id = game$game_id,
      date = game$date,
      home_team = game$home_team,
      away_team = game$away_team,
      home_rating_pre = home_rating,
      away_rating_pre = away_rating,
      home_prob = home_expected,
      home_won = home_won,
      home_rating_post = home_new,
      away_rating_post = away_new
    ))
  }

  return(list(ratings = ratings, results = results))
}

# Calculate Elo predictions for upcoming games
predict_with_elo <- function(home_team, away_team, ratings, home_advantage = 30) {
  home_rating <- ratings[home_team]
  away_rating <- ratings[away_team]

  home_prob <- 1 / (1 + 10^(-(home_rating - away_rating + home_advantage) / 400))

  return(list(
    home_prob = home_prob,
    away_prob = 1 - home_prob,
    home_rating = home_rating,
    away_rating = away_rating
  ))
}
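
To make the update rule concrete, here is a minimal Python sketch of a single Elo step using the same defaults as the R code (K = 20, home advantage = 30). The function names `elo_expected` and `elo_update` are illustrative, not part of the chapter's code.

```python
def elo_expected(home_rating, away_rating, home_advantage=30):
    """Home win probability under the logistic Elo curve."""
    diff = home_rating - away_rating + home_advantage
    return 1 / (1 + 10 ** (-diff / 400))


def elo_update(home_rating, away_rating, home_won, k=20, home_advantage=30):
    """Return post-game (home, away) ratings; the update is zero-sum."""
    expected = elo_expected(home_rating, away_rating, home_advantage)
    shift = k * (home_won - expected)
    return home_rating + shift, away_rating - shift


# Two evenly rated teams: the home side is favored only by home advantage
p_home = elo_expected(1500, 1500)
new_home, new_away = elo_update(1500, 1500, home_won=1)

print(f"Home win probability: {p_home:.3f}")
print(f"Ratings after a home win: {new_home:.1f} / {new_away:.1f}")
```

For two 1500-rated teams the home win probability is about 54.3%, and a home win moves roughly 9 rating points from the away team to the home team. Because the points gained by one side equal the points lost by the other, the league-wide average rating stays at 1500.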

Machine Learning Models for Game Prediction

# Python: Advanced ML models for game outcome prediction
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.calibration import CalibratedClassifierCV
import xgboost as xgb
import lightgbm as lgb

class MLBBettingModel:
    """Ensemble model for MLB game predictions"""

    def __init__(self, model_type='xgboost'):
        self.model_type = model_type
        self.model = None
        self.scaler = StandardScaler()
        self.feature_names = None

    def prepare_features(self, df):
        """Prepare feature matrix from game data"""
        feature_cols = [
            'rolling_win_pct', 'rolling_runs_scored', 'rolling_runs_allowed',
            'rolling_wOBA', 'rolling_FIP', 'days_rest',
            'pitcher_ERA', 'pitcher_FIP', 'pitcher_K_per_9',
            'pitcher_BB_per_9', 'pitcher_WHIP',
            'opp_rolling_win_pct', 'opp_rolling_runs_scored',
            'opp_rolling_runs_allowed', 'opp_rolling_wOBA', 'opp_rolling_FIP',
            'opp_pitcher_ERA', 'opp_pitcher_FIP',
            'is_home', 'temperature', 'wind_speed', 'h2h_win_pct'
        ]

        X = df[feature_cols].fillna(df[feature_cols].median())
        y = df['win']

        self.feature_names = feature_cols
        return X, y

    def build_model(self):
        """Initialize the prediction model"""
        if self.model_type == 'xgboost':
            self.model = xgb.XGBClassifier(
                n_estimators=200,
                max_depth=6,
                learning_rate=0.05,
                subsample=0.8,
                colsample_bytree=0.8,
                random_state=42,
                eval_metric='logloss'
            )
        elif self.model_type == 'lightgbm':
            self.model = lgb.LGBMClassifier(
                n_estimators=200,
                max_depth=6,
                learning_rate=0.05,
                subsample=0.8,
                colsample_bytree=0.8,
                random_state=42
            )
        elif self.model_type == 'random_forest':
            self.model = RandomForestClassifier(
                n_estimators=200,
                max_depth=10,
                min_samples_split=10,
                random_state=42
            )
        else:  # logistic regression
            self.model = LogisticRegression(
                C=1.0,
                max_iter=1000,
                random_state=42
            )

    def train(self, X, y):
        """Train the model with proper calibration"""
        # Scale features
        X_scaled = self.scaler.fit_transform(X)

        # Build and train model
        self.build_model()

        # Use calibration to ensure probabilities are well-calibrated
        self.model = CalibratedClassifierCV(
            self.model,
            method='isotonic',
            cv=5
        )

        self.model.fit(X_scaled, y)

    def predict_proba(self, X):
        """Predict win probabilities"""
        X_scaled = self.scaler.transform(X)
        return self.model.predict_proba(X_scaled)[:, 1]

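    def evaluate_time_series(self, X, y, n_splits=5):
        """Evaluate model using time-series cross-validation

        Rows of X and y must be in chronological order.
        """
        from sklearn.pipeline import make_pipeline

        tscv = TimeSeriesSplit(n_splits=n_splits)

        # Build a fresh, unfitted model without overwriting a trained self.model
        trained_model = self.model
        self.build_model()
        candidate, self.model = self.model, trained_model

        # Scale inside the pipeline so the scaler is refit on each training
        # fold and never sees the later games it is scored on
        pipeline = make_pipeline(StandardScaler(), candidate)

        # Cross-validation scores
        scores = cross_val_score(
            pipeline,
            X,
            y,
            cv=tscv,
            scoring='neg_log_loss'
        )

        return {
            'mean_log_loss': -scores.mean(),
            'std_log_loss': scores.std(),
            'scores': -scores
        }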

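    def feature_importance(self):
        """Extract feature importance, averaged over the calibration folds"""
        # CalibratedClassifierCV fits clones of the base model (one per fold);
        # the estimator passed to it is never fitted itself. Each fitted clone
        # is exposed as `.estimator` (scikit-learn >= 1.2) or
        # `.base_estimator` (older releases).
        fitted = [
            cc.estimator if hasattr(cc, 'estimator') else cc.base_estimator
            for cc in self.model.calibrated_classifiers_
        ]

        if hasattr(fitted[0], 'feature_importances_'):
            importances = np.mean([m.feature_importances_ for m in fitted], axis=0)
        else:
            # For linear models, use coefficient magnitude
            importances = np.mean([np.abs(m.coef_[0]) for m in fitted], axis=0)

        feature_imp = pd.DataFrame({
            'feature': self.feature_names,
            'importance': importances
        }).sort_values('importance', ascending=False)

        return feature_imp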
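
Because betting decisions depend on the probabilities themselves, predictions are usually scored with proper scoring rules rather than accuracy. The sketch below implements log loss and the Brier score in plain Python and compares a set of predictions with a 50/50 baseline. The probabilities and outcomes are made-up illustrations, not results from the model above.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the outcomes; lower is better."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def brier_score(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Made-up illustrative inputs (NOT real predictions or results)
model_probs = [0.62, 0.47, 0.68, 0.41]
outcomes = [1, 0, 1, 0]
coin_flip = [0.5] * len(outcomes)

print(f"Model     log loss {log_loss(model_probs, outcomes):.4f}, "
      f"Brier {brier_score(model_probs, outcomes):.4f}")
print(f"Coin flip log loss {log_loss(coin_flip, outcomes):.4f}, "
      f"Brier {brier_score(coin_flip, outcomes):.4f}")
```

A 50/50 forecast always scores a log loss of ln 2 (about 0.693) and a Brier score of 0.25, which makes it a convenient floor. A harder and more relevant benchmark is the no-vig market probability for the same games.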

30.3 Expected Value & Kelly Criterion

Calculating Expected Value (EV)

Expected Value is the fundamental metric for assessing betting opportunities:

EV = (Probability of Winning × Amount Won) - (Probability of Losing × Amount Lost)

# Python: Expected value calculations
def calculate_ev(true_prob, odds, bet_amount=100):
    """
    Calculate expected value of a bet

    Parameters:
    - true_prob: Your estimated probability of winning
    - odds: American odds
    - bet_amount: Size of bet
    """
    # Convert odds to payout
    if odds > 0:
        payout = bet_amount * (odds / 100)
    else:
        payout = bet_amount * (100 / abs(odds))

    # Calculate EV
    win_amount = payout
    lose_amount = bet_amount

    ev = (true_prob * win_amount) - ((1 - true_prob) * lose_amount)
    ev_percent = (ev / bet_amount) * 100

    return {
        'ev_dollars': ev,
        'ev_percent': ev_percent,
        'implied_prob': american_to_prob(odds),
        'edge': true_prob - american_to_prob(odds)
    }

# Example: Finding positive EV bets
games = pd.DataFrame({
    'team': ['Yankees', 'Red Sox', 'Dodgers', 'Giants'],
    'odds': [-150, 120, -180, 160],
    'model_prob': [0.62, 0.47, 0.68, 0.41]
})

games['ev_analysis'] = games.apply(
    lambda row: calculate_ev(row['model_prob'], row['odds']),
    axis=1
)

# Extract EV metrics
games['ev_percent'] = games['ev_analysis'].apply(lambda x: x['ev_percent'])
games['edge'] = games['ev_analysis'].apply(lambda x: x['edge'])

# Find positive EV bets
positive_ev = games[games['ev_percent'] > 0].sort_values('ev_percent', ascending=False)
print("Positive EV Opportunities:")
print(positive_ev[['team', 'odds', 'model_prob', 'ev_percent', 'edge']])
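
Setting EV to zero gives the break-even win probability for a given price. With b as the profit per dollar staked, p × b - (1 - p) = 0 implies p = 1 / (1 + b), which is the same number as the implied probability of the odds. The sketch below checks this; `breakeven_prob` and `ev_per_dollar` are illustrative helper names.

```python
def profit_per_dollar(odds):
    """Profit on a winning $1 bet at the given American odds."""
    return odds / 100 if odds > 0 else 100 / abs(odds)


def breakeven_prob(odds):
    """Win probability at which a bet at these odds has zero EV."""
    return 1 / (1 + profit_per_dollar(odds))


def ev_per_dollar(true_prob, odds):
    """Expected profit per $1 staked."""
    return true_prob * profit_per_dollar(odds) - (1 - true_prob)


for odds in (-150, 120, -110):
    print(f"{odds:+d}: break-even win probability {breakeven_prob(odds):.2%}")
```

A bet is positive EV only when the model probability exceeds this break-even value, so at the standard -110 price a bettor needs to win more than about 52.4% of the time.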

Kelly Criterion for Optimal Bet Sizing

The Kelly Criterion gives the fraction of bankroll that maximizes long-run (logarithmic) bankroll growth for a given edge and odds:

Kelly % = (bp - q) / b

Where:

  • b = decimal odds - 1
  • p = probability of winning
  • q = probability of losing (1 - p)

# R: Kelly Criterion implementation
kelly_criterion <- function(prob, odds, kelly_fraction = 1) {
  # Convert American odds to decimal
  if (odds > 0) {
    decimal_odds <- 1 + (odds / 100)
  } else {
    decimal_odds <- 1 + (100 / abs(odds))
  }

  b <- decimal_odds - 1
  p <- prob
  q <- 1 - prob

  # Kelly percentage
  kelly_pct <- (b * p - q) / b

  # Apply fractional Kelly (more conservative)
  adjusted_kelly <- kelly_pct * kelly_fraction

  # Never bet if Kelly is negative
  bet_size <- max(0, adjusted_kelly)

  return(list(
    kelly_pct = kelly_pct,
    adjusted_kelly = adjusted_kelly,
    bet_size = bet_size,
    recommendation = ifelse(bet_size > 0, "BET", "NO BET")
  ))
}

# Example: Calculate Kelly bet sizes
games <- data.frame(
  team = c("Yankees", "Red Sox", "Dodgers", "Giants"),
  odds = c(-150, 120, -180, 160),
  model_prob = c(0.62, 0.47, 0.68, 0.41)
)

# Compute full Kelly and quarter Kelly (more conservative) for each game.
# SIMPLIFY = FALSE keeps one result list per game instead of collapsing to a matrix.
kelly_full <- mapply(kelly_criterion, games$model_prob, games$odds, 1, SIMPLIFY = FALSE)
kelly_quarter <- mapply(kelly_criterion, games$model_prob, games$odds, 0.25, SIMPLIFY = FALSE)

# Extract bet sizes
games$full_kelly_pct <- sapply(kelly_full, function(x) x$bet_size * 100)
games$quarter_kelly_pct <- sapply(kelly_quarter, function(x) x$bet_size * 100)

print(games[, c("team", "odds", "model_prob", "full_kelly_pct", "quarter_kelly_pct")])
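
The Kelly formula ties directly back to expected value: since bp - q is the EV per dollar staked, full Kelly is that EV divided by b. The Python sketch below applies this to the same four example bets. The helper names `net_odds` and `kelly_fraction` are illustrative.

```python
def net_odds(odds):
    """Profit per $1 staked (b in the Kelly formula) from American odds."""
    return odds / 100 if odds > 0 else 100 / abs(odds)


def kelly_fraction(prob, odds, fraction=1.0):
    """Fraction of bankroll to stake; 0 when the bet has no edge."""
    b = net_odds(odds)
    full_kelly = (b * prob - (1 - prob)) / b
    return max(0.0, full_kelly * fraction)


games = [("Yankees", -150, 0.62), ("Red Sox", 120, 0.47),
         ("Dodgers", -180, 0.68), ("Giants", 160, 0.41)]

for team, odds, prob in games:
    ev = prob * net_odds(odds) - (1 - prob)  # EV per $1; full Kelly = ev / b
    full = kelly_fraction(prob, odds)
    quarter = kelly_fraction(prob, odds, fraction=0.25)
    print(f"{team:8s} EV/$ {ev:+.3f}  full Kelly {full:.2%}  quarter Kelly {quarter:.2%}")
```

For the Yankees bet, for example, the EV is about 3.3 cents per dollar and b is 0.667, so full Kelly stakes 5% of bankroll and quarter Kelly 1.25%.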

Simulating Kelly Performance

# Python: Monte Carlo simulation of Kelly vs. flat betting
import numpy as np
import matplotlib.pyplot as plt

def simulate_betting_strategy(n_bets=1000,
                              win_prob=0.55,
                              odds=-110,
                              strategy='kelly',
                              initial_bankroll=10000,
                              kelly_fraction=0.25,
                              flat_bet_pct=0.02):
    """
    Simulate betting strategy performance

    Parameters:
    - n_bets: Number of bets to simulate
    - win_prob: Probability of winning each bet
    - odds: American odds
    - strategy: 'kelly' or 'flat'; any other value stakes a fixed 1% of bankroll
    - initial_bankroll: Starting bankroll
    - kelly_fraction: Fraction of Kelly to use
    - flat_bet_pct: Fraction of the current bankroll staked on each 'flat' bet
    """
    bankroll = initial_bankroll
    bankroll_history = [bankroll]

    # Convert odds to payout multiplier
    if odds > 0:
        payout_multiplier = odds / 100
    else:
        payout_multiplier = 100 / abs(odds)

    for _ in range(n_bets):
        # Determine bet size based on strategy
        if strategy == 'kelly':
            decimal_odds = 1 + payout_multiplier
            b = decimal_odds - 1
            kelly_pct = (b * win_prob - (1 - win_prob)) / b
            bet_size = bankroll * max(0, kelly_pct * kelly_fraction)
        elif strategy == 'flat':
            bet_size = bankroll * flat_bet_pct
        else:  # fallback: fixed 1% of bankroll (no martingale doubling is implemented)
            bet_size = bankroll * 0.01

        # Ensure we don't bet more than bankroll
        bet_size = min(bet_size, bankroll)

        if bet_size == 0:
            break

        # Simulate bet outcome
        won = np.random.random() < win_prob

        if won:
            bankroll += bet_size * payout_multiplier
        else:
            bankroll -= bet_size

        bankroll_history.append(bankroll)

        # Stop if bankrupt
        if bankroll <= 0:
            break

    return np.array(bankroll_history)

# Run simulation comparison
np.random.seed(42)
n_simulations = 100

kelly_results = []
flat_results = []

for _ in range(n_simulations):
    kelly_results.append(simulate_betting_strategy(
        n_bets=1000,
        win_prob=0.55,
        strategy='kelly',
        kelly_fraction=0.25
    ))
    flat_results.append(simulate_betting_strategy(
        n_bets=1000,
        win_prob=0.55,
        strategy='flat',
        flat_bet_pct=0.02
    ))

# Calculate average outcomes
kelly_avg = np.mean([sim[-1] for sim in kelly_results])
flat_avg = np.mean([sim[-1] for sim in flat_results])

print(f"Kelly (25% fraction) average final bankroll: ${kelly_avg:.2f}")
print(f"Flat betting average final bankroll: ${flat_avg:.2f}")
print(f"Kelly advantage: {(kelly_avg / flat_avg - 1) * 100:.1f}%")
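
The simulation's default inputs (a 55% win probability at -110) can also be checked analytically. The sketch below is a minimal illustration, assuming the same stake fraction is used on every bet; the function names are hypothetical and not part of the chapter's code. It computes the full Kelly stake and the expected log growth per bet at quarter, half, and full Kelly.

```python
import math

def payout_per_unit(american_odds):
    """Net profit per 1 unit staked at the given American odds."""
    if american_odds > 0:
        return american_odds / 100
    return 100 / abs(american_odds)

def kelly_fraction(win_prob, american_odds):
    """Full Kelly stake as a fraction of bankroll: f* = (b*p - q) / b."""
    b = payout_per_unit(american_odds)
    return (b * win_prob - (1 - win_prob)) / b

def expected_log_growth(stake_fraction, win_prob, american_odds):
    """Expected log growth of the bankroll per bet at a fixed stake fraction."""
    b = payout_per_unit(american_odds)
    return (win_prob * math.log(1 + stake_fraction * b)
            + (1 - win_prob) * math.log(1 - stake_fraction))

full = kelly_fraction(0.55, -110)
for label, multiple in [("quarter", 0.25), ("half", 0.5), ("full", 1.0)]:
    stake = full * multiple
    growth = expected_log_growth(stake, 0.55, -110)
    print(f"{label:>7} Kelly: stake {stake:.4%} of bankroll, "
          f"log growth per bet {growth:.6f}")
```

Fractional Kelly gives up some expected growth in exchange for smaller swings, which is the trade-off the simulation explores empirically.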

30.4 Player Props & Statcast-Based Models

Modeling Player Performance

Player props call for a different approach than game outcomes. Instead of a single win probability, we need the full distribution of a counting stat (hits, strikeouts, home runs) so that any posted line can be priced.

# Python: Player prop modeling with Statcast data
import pandas as pd
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor

class PlayerPropModel:
    """Model for predicting player performance props"""

    def __init__(self, stat_type='hits'):
        """
        stat_type: 'hits', 'strikeouts', 'home_runs', 'total_bases', etc.
        """
        self.stat_type = stat_type
        self.model = None

    def create_features(self, player_df, pitcher_df, matchup_data):
        """Create features for player prop prediction"""
        features = {
            # Recent performance
            'last_7_avg': player_df['last_7_games'][self.stat_type].mean(),
            'last_30_avg': player_df['last_30_games'][self.stat_type].mean(),
            'season_avg': player_df['season'][self.stat_type].mean(),

            # Advanced metrics
            'exit_velocity_avg': player_df['exit_velocity'].mean(),
            'hard_hit_pct': player_df['hard_hit_rate'].mean(),
            'barrel_pct': player_df['barrel_rate'].mean(),
            'xwOBA': player_df['xwOBA'].mean(),

            # Matchup specific
            'vs_pitcher_hand': matchup_data['vs_hand_splits'][self.stat_type],
            'vs_pitcher_historical': matchup_data['vs_pitcher'][self.stat_type],

            # Pitcher strength
            'pitcher_k_rate': pitcher_df['k_rate'],
            'pitcher_whip': pitcher_df['whip'],
            'pitcher_xFIP': pitcher_df['xfip'],

            # Context
            'home_away': matchup_data['is_home'],
            'park_factor': matchup_data['park_factor'],
            'batting_order_pos': matchup_data['lineup_position']
        }

        return pd.Series(features)

    def predict_distribution(self, features):
        """Predict distribution of player stat"""
        # Use Poisson or Negative Binomial for count stats
        if self.stat_type in ['hits', 'strikeouts', 'home_runs']:
            # Predict lambda (mean) of Poisson distribution
            predicted_mean = self.model.predict([features])[0]

            # Generate probability distribution
            max_value = int(predicted_mean * 3 + 10)
            values = np.arange(0, max_value)
            probabilities = stats.poisson.pmf(values, predicted_mean)

            return {
                'values': values,
                'probabilities': probabilities,
                'mean': predicted_mean,
                'median': stats.poisson.median(predicted_mean),
                'std': np.sqrt(predicted_mean)
            }
        else:
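            # For continuous stats, use normal distribution.
            # NOTE: estimate_std is not defined in this class. It is assumed to
            # return a spread estimate (e.g. the residual standard deviation
            # from training) and must be supplied before this branch is used.
            predicted_mean = self.model.predict([features])[0]
            predicted_std = self.estimate_std(features)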

            return {
                'mean': predicted_mean,
                'std': predicted_std,
                'distribution': 'normal'
            }

    def calculate_prop_ev(self, line, over_odds, under_odds, predicted_dist):
        """Calculate EV for over/under prop"""
        # Probability of going over
        if predicted_dist.get('distribution') == 'normal':
            prob_over = 1 - stats.norm.cdf(
                line,
                predicted_dist['mean'],
                predicted_dist['std']
            )
        else:
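            # Discrete (Poisson) case: P(X > line). The CDF of a discrete
            # distribution already floors a half-point line such as 1.5, so
            # no adjustment is needed; subtracting 0.5 would count a push
            # as a win on whole-number lines.
            prob_over = 1 - stats.poisson.cdf(
                line,
                predicted_dist['mean']
            )

        # Assumes a half-point line; on a whole-number line a push is
        # possible and this overstates the under probability.
        prob_under = 1 - prob_over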

        # Calculate EV for both sides
        over_ev = calculate_ev(prob_over, over_odds)
        under_ev = calculate_ev(prob_under, under_odds)

        return {
            'prob_over': prob_over,
            'prob_under': prob_under,
            'over_ev': over_ev,
            'under_ev': under_ev,
            'best_bet': 'OVER' if over_ev['ev_percent'] > under_ev['ev_percent']
                       else 'UNDER'
        }

# Example: Aaron Judge home run prop
def analyze_hr_prop(player_data, pitcher_data, matchup_data, line=0.5,
                    over_odds=180, under_odds=-240):
    """Analyze home run prop bet"""

    prop_model = PlayerPropModel(stat_type='home_runs')

    # Create features
    features = prop_model.create_features(player_data, pitcher_data, matchup_data)

    # Predict HR distribution
    hr_dist = prop_model.predict_distribution(features)

    print(f"Predicted HR mean: {hr_dist['mean']:.3f}")
    print(f"Probability of 1+ HR: {1 - hr_dist['probabilities'][0]:.1%}")
    print(f"Probability of 2+ HR: {1 - sum(hr_dist['probabilities'][:2]):.1%}")

    # Calculate prop EV
    prop_analysis = prop_model.calculate_prop_ev(
        line, over_odds, under_odds, hr_dist
    )

    print(f"\nProp Analysis (Line: {line})")
    print(f"Probability OVER: {prop_analysis['prob_over']:.1%}")
    print(f"OVER EV: {prop_analysis['over_ev']['ev_percent']:.2f}%")
    print(f"UNDER EV: {prop_analysis['under_ev']['ev_percent']:.2f}%")
    print(f"Recommendation: {prop_analysis['best_bet']}")

    return prop_analysis
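
The Poisson step in the class above can be tried on its own without a trained model. The following is a minimal standard-library sketch; the projected means are hypothetical values chosen for illustration, and the helper names are not part of the chapter's code. It shows how a projected mean turns into an over probability for both half-point and whole-number lines.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_over(line, mean):
    """P(X > line); valid for half-point and whole-number lines."""
    highest_non_winning_count = math.floor(line)
    return 1.0 - sum(poisson_pmf(k, mean)
                     for k in range(highest_non_winning_count + 1))

def prob_push(line, mean):
    """P(X == line); zero unless the line is a whole number."""
    if float(line).is_integer():
        return poisson_pmf(int(line), mean)
    return 0.0

hr_mean = 0.25   # hypothetical projected home runs in one game
hit_mean = 1.1   # hypothetical projected hits in one game

print(f"P(over 0.5 HR)   = {prob_over(0.5, hr_mean):.4f}")
print(f"P(over 1.5 HR)   = {prob_over(1.5, hr_mean):.4f}")
print(f"P(over 1 hit)    = {prob_over(1, hit_mean):.4f}")
print(f"P(push at 1 hit) = {prob_push(1, hit_mean):.4f}")
```

On a whole-number line the over, under, and push probabilities sum to one, so the under probability is no longer simply one minus the over probability.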

Strikeout Props Using Pitcher-Batter Matchups

# R: Strikeout prop model
# dnbinom() and pnbinom() are in base R's stats package; no extra library is needed

strikeout_prop_model <- function(pitcher_data, batter_data, lineup) {
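  # Pitcher baseline K rate per batter faced.
  # K/9 divided by 9 is strikeouts per inning, so divide again by batters
  # faced per inning (about 4.3 is assumed here as a league-typical value).
  bf_per_inning <- 4.3
  pitcher_k_rate <- pitcher_data$k_per_9 / (9 * bf_per_inning)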

  # Adjust for each batter in lineup
  expected_k <- 0

  for (i in seq_along(lineup)) {
    batter <- lineup[i]
    batter_k_rate <- batter_data[batter_data$player_id == batter, ]$k_rate

    # Assume pitcher and batter rates are independent (simplified)
    # More sophisticated: use matchup data
    matchup_k_prob <- pitcher_k_rate * batter_k_rate / 0.23  # League avg

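    # Weight by expected plate appearances for this lineup slot.
    # Top of the order bats more often. These are full-game totals (36 in all);
    # scale them down for a starter who is not expected to pitch the whole game.
    at_bats_expected <- ifelse(i <= 3, 4.5, ifelse(i <= 6, 4, 3.5))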

    expected_k <- expected_k + (matchup_k_prob * at_bats_expected)
  }

  # Model as negative binomial (overdispersed Poisson).
  # Variance = mean + mean^2 / size, so size = mean / 2 fixes the variance at
  # 3 * mean. This is a heuristic; estimate size from game logs when possible.
  size_param <- expected_k / 2

  # Calculate probabilities for each K total
  k_values <- 0:15
  k_probs <- dnbinom(k_values, size = size_param, mu = expected_k)

  results <- data.frame(
    strikeouts = k_values,
    probability = k_probs,
    cumulative_prob = pnbinom(k_values, size = size_param, mu = expected_k)
  )

  return(list(
    expected_k = expected_k,
    distribution = results,
    size = size_param
  ))
}

# Calculate prop value
evaluate_k_prop <- function(model_results, line, over_odds, under_odds) {
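  # Probability of going over: P(K > line).
  # pnbinom() already floors a half-point line such as 6.5; subtracting 0.5
  # would count a push as a win on whole-number lines.
  prob_over <- 1 - pnbinom(
    line,
    size = model_results$size,
    mu = model_results$expected_k
  )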

  prob_under <- 1 - prob_over

  # Calculate EV
  over_ev <- calculate_ev_r(prob_over, over_odds)
  under_ev <- calculate_ev_r(prob_under, under_odds)

  cat(sprintf("Expected Strikeouts: %.2f\n", model_results$expected_k))
  cat(sprintf("P(Over %.1f): %.2f%%\n", line, prob_over * 100))
  cat(sprintf("OVER EV: %.2f%%\n", over_ev$ev_percent))
  cat(sprintf("UNDER EV: %.2f%%\n", under_ev$ev_percent))

  return(list(
    prob_over = prob_over,
    prob_under = prob_under,
    over_ev = over_ev,
    under_ev = under_ev
  ))
}

calculate_ev_r <- function(prob, odds) {
  if (odds > 0) {
    payout <- odds / 100
  } else {
    payout <- 100 / abs(odds)
  }

  ev <- prob * payout - (1 - prob)
  ev_percent <- ev * 100

  return(list(ev = ev, ev_percent = ev_percent))
}
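
For readers working in Python, the negative binomial step can be sketched with the standard library alone. This is an illustrative sketch, not a port of the full model: it uses the same size and mean parameterization as R's dnbinom(), and the expected strikeout total is a hypothetical input rather than a model output.

```python
import math

def nbinom_pmf(k, size, mu):
    """Negative binomial P(X = k), parameterized by size and mean mu."""
    p = size / (size + mu)  # success probability implied by size and mean
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

def nbinom_prob_over(line, size, mu):
    """P(X > line) for a half-point or whole-number line."""
    return 1.0 - sum(nbinom_pmf(k, size, mu)
                     for k in range(math.floor(line) + 1))

expected_k = 6.0        # hypothetical expected strikeouts
size = expected_k / 2   # same heuristic as the R code: variance = 3 * mean

for line in (4.5, 5.5, 6.5, 7.5):
    print(f"P(over {line} K) = {nbinom_prob_over(line, size, expected_k):.3f}")
```

A smaller size value widens the distribution, moving probability toward both very low and very high strikeout totals relative to a Poisson with the same mean.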

30.5 Live Betting & In-Game Win Probability

Win Probability Models

Live betting requires real-time win probability updates based on game state.

# Python: In-game win probability model
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

class LiveWinProbabilityModel:
    """Real-time win probability for live betting"""

    def __init__(self):
        self.model = None
        self.historical_states = None

    def create_game_state_features(self, game_state):
        """
        Extract features from current game state

        game_state should include:
        - inning, outs, runners, score, pitcher, batter, etc.
        """
        features = {
            # Score situation
            'score_diff': game_state['home_score'] - game_state['away_score'],
            'home_score': game_state['home_score'],
            'away_score': game_state['away_score'],

            # Game progress
            'inning': game_state['inning'],
            'is_top': game_state['is_top_inning'],
            'outs': game_state['outs'],
            'innings_remaining': 9 - game_state['inning'] + (1 if game_state['is_top_inning'] else 0),

            # Base runners
            'runner_1b': game_state['runner_on_1st'],
            'runner_2b': game_state['runner_on_2nd'],
            'runner_3b': game_state['runner_on_3rd'],
            'runners_on': game_state['runner_on_1st'] + game_state['runner_on_2nd'] + game_state['runner_on_3rd'],

            # Base-out state (24 states)
            'base_out_state': self.calculate_base_out_state(game_state),

            # Pitcher/batter matchup
            'current_pitcher_era': game_state['pitcher_era'],
            'pitcher_pitch_count': game_state['pitch_count'],
            'current_batter_woba': game_state['batter_woba'],

            # Team strength (pre-game)
            'home_team_strength': game_state['home_elo_rating'],
            'away_team_strength': game_state['away_elo_rating'],

            # Bullpen availability
            'home_bullpen_available': game_state['home_bullpen_innings'],
            'away_bullpen_available': game_state['away_bullpen_innings']
        }

        # Interaction features
        features['score_diff_per_inning_left'] = (
            features['score_diff'] / max(features['innings_remaining'], 1)
        )

        return pd.Series(features)

    def calculate_base_out_state(self, game_state):
        """Calculate base-out state (0-23)"""
        bases = (game_state['runner_on_1st'] * 1 +
                game_state['runner_on_2nd'] * 2 +
                game_state['runner_on_3rd'] * 4)
        return bases * 3 + game_state['outs']

    def train_from_historical_games(self, historical_play_by_play):
        """Train model on historical play-by-play data"""
        features_list = []
        labels = []

        for game_id, game_data in historical_play_by_play.groupby('game_id'):
            home_won = game_data.iloc[-1]['home_won']

            for idx, play in game_data.iterrows():
                # Skip if game is over
                if play['game_over']:
                    continue

                state_features = self.create_game_state_features(play)
                features_list.append(state_features)
                labels.append(home_won)

        X = pd.DataFrame(features_list)
        y = np.array(labels)

        # Train gradient boosting model
        self.model = GradientBoostingClassifier(
            n_estimators=200,
            max_depth=5,
            learning_rate=0.1,
            random_state=42
        )

        self.model.fit(X, y)

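        print(f"Model trained on {len(X)} game states")
        # In-sample accuracy is optimistic: every state from a game shares one
        # label. Evaluate on held-out games before trusting these probabilities.
        print(f"Training accuracy (in-sample): {self.model.score(X, y):.3f}")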

    def predict_win_probability(self, game_state):
        """Predict home team win probability for current game state"""
        features = self.create_game_state_features(game_state)
        features_df = pd.DataFrame([features])

        win_prob = self.model.predict_proba(features_df)[0, 1]

        return {
            'home_win_prob': win_prob,
            'away_win_prob': 1 - win_prob
        }

    def find_live_betting_opportunities(self, game_state, live_odds):
        """Compare model probability to live betting odds"""
        model_probs = self.predict_win_probability(game_state)

        # Convert live odds to probabilities
        home_implied = american_to_prob(live_odds['home_ml'])
        away_implied = american_to_prob(live_odds['away_ml'])

        # Calculate edges
        home_edge = model_probs['home_win_prob'] - home_implied
        away_edge = model_probs['away_win_prob'] - away_implied

        # Calculate EV
        home_ev = calculate_ev(model_probs['home_win_prob'], live_odds['home_ml'])
        away_ev = calculate_ev(model_probs['away_win_prob'], live_odds['away_ml'])

        recommendation = None
        if home_ev['ev_percent'] > 2:  # 2% EV threshold
            recommendation = 'BET HOME'
        elif away_ev['ev_percent'] > 2:
            recommendation = 'BET AWAY'
        else:
            recommendation = 'NO BET'

        return {
            'model_home_prob': model_probs['home_win_prob'],
            'implied_home_prob': home_implied,
            'home_edge': home_edge,
            'home_ev': home_ev['ev_percent'],
            'away_ev': away_ev['ev_percent'],
            'recommendation': recommendation
        }

# Example: Analyze live betting situation
def live_bet_example():
    """Example of live betting analysis"""
    # Current game state (top of 7th, tie game)
    game_state = {
        'home_score': 3,
        'away_score': 3,
        'inning': 7,
        'is_top_inning': True,
        'outs': 1,
        'runner_on_1st': True,
        'runner_on_2nd': False,
        'runner_on_3rd': False,
        'pitcher_era': 3.85,
        'pitch_count': 89,
        'batter_woba': 0.340,
        'home_elo_rating': 1520,
        'away_elo_rating': 1480,
        'home_bullpen_innings': 8.0,
        'away_bullpen_innings': 6.5,
        'home_won': None  # To be predicted
    }

    # Live betting odds
    live_odds = {
        'home_ml': -125,
        'away_ml': +105
    }

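    model = LiveWinProbabilityModel()
    # The model must be trained first, e.g. with
    # model.train_from_historical_games(historical_play_by_play).
    # Called untrained, the next line fails because self.model is still None.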

    analysis = model.find_live_betting_opportunities(game_state, live_odds)

    print("Live Betting Analysis:")
    print(f"Model Home Win Prob: {analysis['model_home_prob']:.1%}")
    print(f"Implied Home Win Prob: {analysis['implied_home_prob']:.1%}")
    print(f"Edge: {analysis['home_edge']:.1%}")
    print(f"Home EV: {analysis['home_ev']:.2f}%")
    print(f"Away EV: {analysis['away_ev']:.2f}%")
    print(f"Recommendation: {analysis['recommendation']}")
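
The base-out encoding used by calculate_base_out_state can be checked in isolation. The sketch below reproduces the same arithmetic as standalone functions and adds a decoder; the function names are hypothetical and exist only for this illustration. It confirms that the 8 runner configurations and 3 out counts map onto 24 distinct states.

```python
from itertools import product

def encode_base_out_state(on_1st, on_2nd, on_3rd, outs):
    """Map runners and outs to a single integer in 0..23."""
    bases = int(on_1st) * 1 + int(on_2nd) * 2 + int(on_3rd) * 4  # 0..7
    return bases * 3 + outs

def decode_base_out_state(state):
    """Invert encode_base_out_state: return (on_1st, on_2nd, on_3rd, outs)."""
    bases, outs = divmod(state, 3)
    return bool(bases & 1), bool(bases & 2), bool(bases & 4), outs

# Enumerate every runner configuration with 0, 1, or 2 outs
all_states = sorted(
    encode_base_out_state(r1, r2, r3, outs)
    for r1, r2, r3 in product([False, True], repeat=3)
    for outs in range(3)
)
print(f"{len(set(all_states))} distinct states, "
      f"from {all_states[0]} to {all_states[-1]}")

# The live example above: runner on first, one out
state = encode_base_out_state(True, False, False, 1)
print(state, decode_base_out_state(state))
```

Because a tree-based model treats this integer as ordered, it may be worth one-hot encoding the state or relying on the separate runner and out features instead.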

Run Expectancy Matrix for Live Betting

# R: Run expectancy and win probability
library(dplyr)

# Create run expectancy matrix from historical data
create_re_matrix <- function(play_by_play_data) {
  # Calculate runs scored in remainder of inning for each state
  re_matrix <- play_by_play_data %>%
    group_by(outs, runner_1b, runner_2b, runner_3b) %>%
    summarise(
      avg_runs_scored = mean(runs_end_of_inning - runs_start_of_play),
      .groups = 'drop'
    ) %>%
    arrange(outs, runner_1b, runner_2b, runner_3b)

  return(re_matrix)
}

# Win probability added (WPA) for each play
calculate_wpa <- function(win_prob_before, win_prob_after) {
  return(win_prob_after - win_prob_before)
}

# Leverage index - how important is this game situation?
calculate_leverage <- function(game_state, wp_model) {
  # Calculate WP change for all possible outcomes.
  # predict_wp() and simulate_outcome() are helpers assumed to be defined elsewhere.
  outcomes <- c('out', 'single', 'double', 'triple', 'hr', 'walk')
  wp_changes <- numeric(length(outcomes))

  base_wp <- predict_wp(game_state, wp_model)

  for (i in seq_along(outcomes)) {
    new_state <- simulate_outcome(game_state, outcomes[i])
    new_wp <- predict_wp(new_state, wp_model)
    wp_changes[i] <- abs(new_wp - base_wp)
  }

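  # Expected absolute WP change, weighted by outcome probability.
  # Dividing this by the average swing across all game states would give
  # the conventional leverage index (1.0 = an average situation).
  outcome_probs <- c(0.65, 0.15, 0.05, 0.01, 0.03, 0.11)  # Rough estimates
  leverage <- sum(wp_changes * outcome_probs)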

  return(leverage)
}

# Live betting edge calculation
live_betting_edge <- function(model_wp, live_odds_home, live_odds_away) {
  implied_home <- american_to_prob(live_odds_home)
  implied_away <- american_to_prob(live_odds_away)

  # Remove vig
  total <- implied_home + implied_away
  fair_home <- implied_home / total
  fair_away <- implied_away / total

  # Compare to model
  home_edge <- model_wp - fair_home
  away_edge <- (1 - model_wp) - fair_away

  return(list(
    home_edge = home_edge,
    away_edge = away_edge,
    model_wp = model_wp,
    fair_home = fair_home,
    fair_away = fair_away
  ))
}
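
The vig-removal step in live_betting_edge can be sketched in Python using the live odds from the earlier example (-125 home, +105 away). This is a minimal illustration that assumes proportional normalization, the same method as the R function; other de-vigging methods exist and can give slightly different fair prices. The model probability below is a hypothetical input.

```python
def american_to_prob(odds):
    """Implied probability from American odds (vig included)."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)

def remove_vig(home_odds, away_odds):
    """Proportionally normalize two implied probabilities so they sum to 1."""
    implied_home = american_to_prob(home_odds)
    implied_away = american_to_prob(away_odds)
    total = implied_home + implied_away
    overround = total - 1  # the book's margin baked into the two prices
    return implied_home / total, implied_away / total, overround

fair_home, fair_away, overround = remove_vig(-125, 105)
model_home_prob = 0.56  # hypothetical model output

print(f"Overround: {overround:.2%}")
print(f"Fair home: {fair_home:.2%}, fair away: {fair_away:.2%}")
print(f"Home edge vs fair price: {model_home_prob - fair_home:+.2%}")
```

An edge measured against the fair price is larger than one measured against the raw implied probability, so an EV threshold should be tied to whichever baseline is in use.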
Python
# Python: In-game win probability model
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

class LiveWinProbabilityModel:
    """Real-time win probability for live betting"""

    def __init__(self):
        self.model = None
        self.historical_states = None

    def create_game_state_features(self, game_state):
        """
        Extract features from current game state

        game_state should include:
        - inning, outs, runners, score, pitcher, batter, etc.
        """
        features = {
            # Score situation
            'score_diff': game_state['home_score'] - game_state['away_score'],
            'home_score': game_state['home_score'],
            'away_score': game_state['away_score'],

            # Game progress
            'inning': game_state['inning'],
            'is_top': game_state['is_top_inning'],
            'outs': game_state['outs'],
            # Clamp at 0 so extra innings (inning > 9) do not produce a negative count
            'innings_remaining': max(9 - game_state['inning'] + (1 if game_state['is_top_inning'] else 0), 0),

            # Base runners
            'runner_1b': game_state['runner_on_1st'],
            'runner_2b': game_state['runner_on_2nd'],
            'runner_3b': game_state['runner_on_3rd'],
            'runners_on': game_state['runner_on_1st'] + game_state['runner_on_2nd'] + game_state['runner_on_3rd'],

            # Base-out state (24 states)
            'base_out_state': self.calculate_base_out_state(game_state),

            # Pitcher/batter matchup
            'current_pitcher_era': game_state['pitcher_era'],
            'pitcher_pitch_count': game_state['pitch_count'],
            'current_batter_woba': game_state['batter_woba'],

            # Team strength (pre-game)
            'home_team_strength': game_state['home_elo_rating'],
            'away_team_strength': game_state['away_elo_rating'],

            # Bullpen availability
            'home_bullpen_available': game_state['home_bullpen_innings'],
            'away_bullpen_available': game_state['away_bullpen_innings']
        }

        # Interaction features
        features['score_diff_per_inning_left'] = (
            features['score_diff'] / max(features['innings_remaining'], 1)
        )

        return pd.Series(features)

    def calculate_base_out_state(self, game_state):
        """Calculate base-out state (0-23)"""
        bases = (game_state['runner_on_1st'] * 1 +
                game_state['runner_on_2nd'] * 2 +
                game_state['runner_on_3rd'] * 4)
        return bases * 3 + game_state['outs']

    def train_from_historical_games(self, historical_play_by_play):
        """Train model on historical play-by-play data"""
        features_list = []
        labels = []

        for game_id, game_data in historical_play_by_play.groupby('game_id'):
            home_won = game_data.iloc[-1]['home_won']

            for idx, play in game_data.iterrows():
                # Skip if game is over
                if play['game_over']:
                    continue

                state_features = self.create_game_state_features(play)
                features_list.append(state_features)
                labels.append(home_won)

        X = pd.DataFrame(features_list)
        y = np.array(labels)

        # Train gradient boosting model
        self.model = GradientBoostingClassifier(
            n_estimators=200,
            max_depth=5,
            learning_rate=0.1,
            random_state=42
        )

        self.model.fit(X, y)

        print(f"Model trained on {len(X)} game states")
        # In-sample accuracy is optimistic; validate on held-out games before relying on it
        print(f"Training accuracy (in-sample): {self.model.score(X, y):.3f}")

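    def predict_win_probability(self, game_state):
        """Predict home team win probability for current game state"""
        if self.model is None:
            raise RuntimeError("Model is not trained; call train_from_historical_games() first")

        features = self.create_game_state_features(game_state)
        features_df = pd.DataFrame([features])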

        win_prob = self.model.predict_proba(features_df)[0, 1]

        return {
            'home_win_prob': win_prob,
            'away_win_prob': 1 - win_prob
        }

    def find_live_betting_opportunities(self, game_state, live_odds):
        """Compare model probability to live betting odds"""
        model_probs = self.predict_win_probability(game_state)

        # Convert live odds to probabilities
        home_implied = american_to_prob(live_odds['home_ml'])
        away_implied = american_to_prob(live_odds['away_ml'])

        # Calculate edges
        home_edge = model_probs['home_win_prob'] - home_implied
        away_edge = model_probs['away_win_prob'] - away_implied

        # Calculate EV
        home_ev = calculate_ev(model_probs['home_win_prob'], live_odds['home_ml'])
        away_ev = calculate_ev(model_probs['away_win_prob'], live_odds['away_ml'])

        recommendation = None
        if home_ev['ev_percent'] > 2:  # 2% EV threshold
            recommendation = 'BET HOME'
        elif away_ev['ev_percent'] > 2:
            recommendation = 'BET AWAY'
        else:
            recommendation = 'NO BET'

        return {
            'model_home_prob': model_probs['home_win_prob'],
            'implied_home_prob': home_implied,
            'home_edge': home_edge,
            'home_ev': home_ev['ev_percent'],
            'away_ev': away_ev['ev_percent'],
            'recommendation': recommendation
        }

# Example: Analyze live betting situation
def live_bet_example():
    """Example of live betting analysis"""
    # Current game state (top of 7th, tie game)
    game_state = {
        'home_score': 3,
        'away_score': 3,
        'inning': 7,
        'is_top_inning': True,
        'outs': 1,
        'runner_on_1st': True,
        'runner_on_2nd': False,
        'runner_on_3rd': False,
        'pitcher_era': 3.85,
        'pitch_count': 89,
        'batter_woba': 0.340,
        'home_elo_rating': 1520,
        'away_elo_rating': 1480,
        'home_bullpen_innings': 8.0,
        'away_bullpen_innings': 6.5,
        'home_won': None  # To be predicted
    }

    # Live betting odds
    live_odds = {
        'home_ml': -125,
        'away_ml': +105
    }

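    model = LiveWinProbabilityModel()
    # The model must be trained before it can predict, for example:
    # model.train_from_historical_games(historical_play_by_play)
    # where historical_play_by_play is your own play-by-play DataFrame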

    analysis = model.find_live_betting_opportunities(game_state, live_odds)

    print("Live Betting Analysis:")
    print(f"Model Home Win Prob: {analysis['model_home_prob']:.1%}")
    print(f"Implied Home Win Prob: {analysis['implied_home_prob']:.1%}")
    print(f"Edge: {analysis['home_edge']:.1%}")
    print(f"Home EV: {analysis['home_ev']:.2f}%")
    print(f"Away EV: {analysis['away_ev']:.2f}%")
    print(f"Recommendation: {analysis['recommendation']}")
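
The live betting example above measures edge against the raw implied probability, which still contains the sportsbook's margin, while the R function live_betting_edge compares against vig-free probabilities. The sketch below works through both views for the same -125 / +105 line. The helper names (implied_prob, net_payout, no_vig_probs, ev_per_unit) and the 58% model probability are assumptions made for illustration; they are not necessarily how the chapter's american_to_prob and calculate_ev are written.

```python
# Python: Raw vs. vig-free edge for a two-way moneyline (illustrative sketch)

def implied_prob(odds):
    """Break-even probability for American odds (vig included)."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)

def net_payout(odds):
    """Profit per 1 unit staked when the bet wins."""
    return odds / 100 if odds > 0 else 100 / abs(odds)

def no_vig_probs(home_odds, away_odds):
    """Normalize the two implied probabilities so they sum to 1."""
    home = implied_prob(home_odds)
    away = implied_prob(away_odds)
    total = home + away
    return home / total, away / total

def ev_per_unit(win_prob, odds):
    """Expected profit per 1 unit staked at the offered odds."""
    return win_prob * net_payout(odds) - (1 - win_prob)

home_odds, away_odds = -125, 105
model_home = 0.58  # hypothetical model output

raw_home = implied_prob(home_odds)
fair_home, fair_away = no_vig_probs(home_odds, away_odds)

print(f"Raw implied home prob: {raw_home:.4f}")
print(f"Vig-free home prob:    {fair_home:.4f}")
print(f"Edge vs raw implied:   {model_home - raw_home:.4f}")
print(f"Edge vs vig-free:      {model_home - fair_home:.4f}")
print(f"EV per unit on home:   {ev_per_unit(model_home, home_odds):.4f}")
```

The raw implied probability (about 55.6%) is the break-even point of the bet, so a positive edge against it is equivalent to positive EV. The vig-free probability (about 53.2%) is the better estimate of what the market believes, so the edge against it is larger. Either definition works as long as it is used consistently with the bet-sizing code.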

30.6 Bankroll Management & Risk Assessment

Bankroll Management Strategies

Proper bankroll management is critical for long-term success in sports betting: even a bettor with a genuine edge can go broke by staking too much on each bet.

# Python: Comprehensive bankroll management system
import numpy as np
import pandas as pd

class BankrollManager:
    """Advanced bankroll management system"""

    def __init__(self, initial_bankroll, max_bet_pct=0.05, kelly_fraction=0.25):
        self.initial_bankroll = initial_bankroll
        self.current_bankroll = initial_bankroll
        self.max_bet_pct = max_bet_pct
        self.kelly_fraction = kelly_fraction
        self.bet_history = []

    def calculate_bet_size(self, edge, odds, method='fractional_kelly'):
        """
        Calculate appropriate bet size

        Methods:
        - 'fractional_kelly': Conservative Kelly (recommended)
        - 'full_kelly': Full Kelly (high variance)
        - 'fixed_unit': Fixed percentage of bankroll
        - 'fixed_dollar': Fixed dollar amount
        """
        if method == 'fractional_kelly':
            # Convert edge to win probability
            implied_prob = american_to_prob(odds)
            true_prob = implied_prob + edge

            # Kelly calculation
            if odds > 0:
                b = odds / 100
            else:
                b = 100 / abs(odds)

            kelly_pct = (b * true_prob - (1 - true_prob)) / b
            kelly_pct = max(0, kelly_pct)  # Never negative

            # Apply fraction
            bet_pct = kelly_pct * self.kelly_fraction

            # Apply maximum bet constraint
            bet_pct = min(bet_pct, self.max_bet_pct)

            bet_size = self.current_bankroll * bet_pct

        elif method == 'fixed_unit':
            bet_size = self.current_bankroll * 0.01  # 1 unit = 1%

        elif method == 'fixed_dollar':
            bet_size = 100  # Fixed $100

        elif method == 'full_kelly':
            implied_prob = american_to_prob(odds)
            true_prob = implied_prob + edge

            if odds > 0:
                b = odds / 100
            else:
                b = 100 / abs(odds)

            kelly_pct = (b * true_prob - (1 - true_prob)) / b
            # Full Kelly is still capped at max_bet_pct
            kelly_pct = max(0, min(kelly_pct, self.max_bet_pct))

            bet_size = self.current_bankroll * kelly_pct

        else:
            raise ValueError(f"Unknown bet sizing method: {method}")

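        return {
            'bet_size': bet_size,
            # Guard against division by zero once the bankroll is exhausted
            'bet_pct': bet_size / self.current_bankroll if self.current_bankroll > 0 else 0,
            'method': method
        }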

    def place_bet(self, bet_size, odds, won):
        """Record bet and update bankroll"""
        # Calculate payout
        if odds > 0:
            payout = bet_size * (odds / 100) if won else -bet_size
        else:
            payout = bet_size * (100 / abs(odds)) if won else -bet_size

        # Update bankroll
        self.current_bankroll += payout

        # Record bet
        self.bet_history.append({
            'bet_size': bet_size,
            'odds': odds,
            'won': won,
            'payout': payout,
            'bankroll_after': self.current_bankroll,
            'roi': (payout / bet_size) if bet_size > 0 else 0
        })

        return {
            'payout': payout,
            'new_bankroll': self.current_bankroll,
            'total_roi': (self.current_bankroll / self.initial_bankroll - 1)
        }

    def get_statistics(self):
        """Calculate bankroll statistics"""
        if not self.bet_history:
            return {}

        history_df = pd.DataFrame(self.bet_history)

        total_bets = len(history_df)
        wins = history_df['won'].sum()
        losses = total_bets - wins
        win_rate = wins / total_bets if total_bets > 0 else 0

        total_wagered = history_df['bet_size'].sum()
        total_profit = history_df['payout'].sum()
        roi = (total_profit / total_wagered) if total_wagered > 0 else 0

        # Calculate maximum drawdown, treating the initial bankroll as the first peak
        bankroll_series = pd.concat(
            [pd.Series([self.initial_bankroll]), history_df['bankroll_after']],
            ignore_index=True
        )
        running_max = bankroll_series.cummax()
        drawdown = (bankroll_series - running_max) / running_max
        max_drawdown = drawdown.min()

        # Sharpe ratio (simplified): per-bet return on stake, not annualized
        returns = history_df['roi']
        sharpe = (returns.mean() / returns.std()) if returns.std() > 0 else 0

        return {
            'total_bets': total_bets,
            'wins': wins,
            'losses': losses,
            'win_rate': win_rate,
            'total_wagered': total_wagered,
            'total_profit': total_profit,
            'roi': roi,
            'current_bankroll': self.current_bankroll,
            'total_return': (self.current_bankroll / self.initial_bankroll - 1),
            'max_drawdown': max_drawdown,
            'sharpe_ratio': sharpe,
            'avg_bet_size': history_df['bet_size'].mean(),
            'largest_bet': history_df['bet_size'].max()
        }

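    def risk_of_ruin(self, win_prob, avg_odds, n_bets=100):
        """
        Approximate the probability of eventually losing the entire bankroll.

        Uses a simplified closed-form heuristic for flat bets of 1 unit
        (1% of the bankroll). n_bets is currently unused: the formula describes
        long-run ruin, not ruin within a fixed number of bets.
        """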
        # Convert odds to win/loss ratio
        if avg_odds > 0:
            win_loss_ratio = avg_odds / 100
        else:
            win_loss_ratio = 100 / abs(avg_odds)

        # Risk of ruin formula
        # RoR = ((1-p)/p * (1/w))^B where p=win prob, w=win/loss ratio, B=bankroll units

        lose_prob = 1 - win_prob

        if win_prob * (1 + win_loss_ratio) > 1:  # Positive expectation
            # Units in bankroll (assuming 1% per bet)
            units = 100
            ror = ((lose_prob / win_prob) * (1 / win_loss_ratio)) ** units
        else:  # Negative expectation
            ror = 1.0  # Certain ruin in long run

        return min(ror, 1.0)

# Example: Simulate betting season
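def simulate_betting_season(n_games=162,
                            edge=0.03,
                            avg_odds=-110,
                            win_rate=0.55,
                            initial_bankroll=10000):
    """
    Simulate a full betting season.

    edge is the assumed edge over the implied probability and drives bet
    sizing; win_rate is the true win probability used to draw outcomes.
    """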
    manager = BankrollManager(
        initial_bankroll=initial_bankroll,
        kelly_fraction=0.25
    )

    np.random.seed(42)

    for i in range(n_games):
        # Calculate bet size
        bet_info = manager.calculate_bet_size(edge, avg_odds, method='fractional_kelly')

        # Simulate outcome
        won = np.random.random() < win_rate

        # Place bet
        manager.place_bet(bet_info['bet_size'], avg_odds, won)

    # Get final statistics
    stats = manager.get_statistics()

    print("Season Betting Results:")
    print(f"Total Bets: {stats['total_bets']}")
    print(f"Win Rate: {stats['win_rate']:.1%}")
    print(f"Total Wagered: ${stats['total_wagered']:,.2f}")
    print(f"Total Profit: ${stats['total_profit']:,.2f}")
    print(f"ROI: {stats['roi']:.2%}")
    print(f"Final Bankroll: ${stats['current_bankroll']:,.2f}")
    print(f"Total Return: {stats['total_return']:.2%}")
    print(f"Max Drawdown: {stats['max_drawdown']:.2%}")
    print(f"Sharpe Ratio: {stats['sharpe_ratio']:.2f}")

    return manager
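
The closed-form risk_of_ruin above assumes flat one-unit stakes, while the simulated season stakes a fraction of the current bankroll. One way to check drawdown risk under proportional staking is a small Monte Carlo simulation. The sketch below is a minimal version under stated assumptions: the function name simulate_ruin_probability, the 50% "ruin" threshold, and the two stake fractions are illustrative choices, and every bet is assumed to have the same odds and the same true win probability.

```python
# Python: Monte Carlo estimate of drawdown risk under proportional staking (sketch)
import numpy as np

def simulate_ruin_probability(win_prob, odds, bet_fraction,
                              n_bets=162, n_sims=2000,
                              ruin_level=0.5, seed=0):
    """
    Estimate the chance that the bankroll drops below ruin_level
    (as a fraction of the starting bankroll) at any point in n_bets,
    staking bet_fraction of the current bankroll on every bet.
    """
    rng = np.random.default_rng(seed)
    b = odds / 100 if odds > 0 else 100 / abs(odds)  # net payout per unit

    # Each bet multiplies the bankroll by (1 + f*b) on a win and (1 - f) on a loss
    wins = rng.random((n_sims, n_bets)) < win_prob
    growth = np.where(wins, 1 + bet_fraction * b, 1 - bet_fraction)
    paths = np.cumprod(growth, axis=1)  # bankroll relative to the start

    ruined = paths.min(axis=1) < ruin_level
    return {
        'ruin_prob': float(ruined.mean()),
        'median_final': float(np.median(paths[:, -1])),
    }

# Roughly quarter Kelly vs. heavy overbetting at -110 with a 55% true win rate
quarter = simulate_ruin_probability(0.55, -110, bet_fraction=0.015)
aggressive = simulate_ruin_probability(0.55, -110, bet_fraction=0.25)

print(f"Stake 1.5%: P(bankroll halves) = {quarter['ruin_prob']:.3f}")
print(f"Stake 25%:  P(bankroll halves) = {aggressive['ruin_prob']:.3f}")
```

With the same edge, the only thing that changes between the two runs is the stake fraction, which is why the maximum bet cap and the Kelly fraction in BankrollManager matter as much as the model itself.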

Risk Assessment and Portfolio Theory

# R: Portfolio theory applied to sports betting
library(ggplot2)

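# Variance of Kelly betting
# edge = win_prob - implied_prob, where implied_prob is the break-even
# probability at the offered odds
kelly_variance <- function(edge, win_prob, n_bets) {
  # Recover the offered price from the edge
  implied_prob <- win_prob - edge
  b <- (1 - implied_prob) / implied_prob  # net payout per unit staked

  # Kelly fraction: (b*p - q) / b, which simplifies to edge / (1 - implied_prob)
  kelly_pct <- max(edge / (1 - implied_prob), 0)

  # Log returns for a win and a loss
  log_win <- log(1 + b * kelly_pct)
  log_lose <- log(1 - kelly_pct)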

  expected_log_return <- win_prob * log_win + (1 - win_prob) * log_lose
  variance_log_return <- win_prob * (log_win - expected_log_return)^2 +
                        (1 - win_prob) * (log_lose - expected_log_return)^2

  # Portfolio variance over n bets
  total_variance <- n_bets * variance_log_return

  return(list(
    expected_return = expected_log_return * n_bets,
    variance = total_variance,
    std_dev = sqrt(total_variance)
  ))
}

# Diversification across multiple bet types
portfolio_correlation <- function(bet_types) {
  # Correlation matrix for different bet types
  # Game totals vs moneylines typically have low correlation
  # Player props vs game outcomes have moderate correlation

  n <- length(bet_types)
  cor_matrix <- matrix(0.3, n, n)  # Assume 0.3 correlation
  diag(cor_matrix) <- 1

  return(cor_matrix)
}

# Optimal portfolio allocation
optimal_bet_allocation <- function(edges, variances, correlations, total_bankroll) {
  # Mean-variance optimization
  # Maximize expected return for given risk level

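  # This is simplified: a full treatment would use quadratic programming
  # with the covariance matrix. The heuristic below weights by Sharpe ratio,
  # does not use `correlations`, and assumes every edge is positive.
  # The weights sum to 1, so total_bankroll should be the amount set aside
  # for this group of bets, not the whole bankroll.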

  n_bets <- length(edges)
  sharpe_ratios <- edges / sqrt(variances)

  # Allocate based on Sharpe ratios
  weights <- sharpe_ratios / sum(sharpe_ratios)

  allocations <- weights * total_bankroll

  return(data.frame(
    bet_type = paste0("Bet_", 1:n_bets),
    edge = edges,
    variance = variances,
    sharpe = sharpe_ratios,
    weight = weights,
    allocation = allocations
  ))
}
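
The allocation heuristic above accepts a correlation matrix but does not use it. To see what correlation does to risk, the sketch below computes portfolio variance as w' Σ w, where Σ is built from per-bet variances and a correlation matrix like the one returned by portfolio_correlation. The function name portfolio_risk and all numeric inputs are hypothetical, chosen only to illustrate the calculation.

```python
# Python: Portfolio variance with correlated bets (illustrative sketch)
import numpy as np

def portfolio_risk(weights, variances, correlations):
    """Return (variance, std_dev) of a weighted portfolio of bets."""
    weights = np.asarray(weights, dtype=float)
    std_devs = np.sqrt(np.asarray(variances, dtype=float))
    # Covariance matrix: Sigma_ij = rho_ij * sd_i * sd_j
    covariance = np.asarray(correlations, dtype=float) * np.outer(std_devs, std_devs)
    variance = float(weights @ covariance @ weights)
    return variance, float(np.sqrt(variance))

# Hypothetical edges and per-bet variances for three bet types
edges = np.array([0.03, 0.02, 0.04])
variances = np.array([1.0, 0.9, 1.2])

# Same Sharpe-ratio weighting as the heuristic above
sharpe = edges / np.sqrt(variances)
weights = sharpe / sharpe.sum()

# Compare a flat 0.3 correlation with fully independent bets
cor_03 = np.full((3, 3), 0.3)
np.fill_diagonal(cor_03, 1.0)
cor_00 = np.eye(3)

var_corr, sd_corr = portfolio_risk(weights, variances, cor_03)
var_ind, sd_ind = portfolio_risk(weights, variances, cor_00)

print(f"Std dev with 0.3 correlation: {sd_corr:.3f}")
print(f"Std dev if independent:       {sd_ind:.3f}")
```

With positive weights, positive correlation always raises portfolio variance relative to independent bets, so the same weights carry more risk than the per-bet variances alone suggest. A full mean-variance optimizer would feed this covariance matrix into the weight calculation itself.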

30.7 Building a Betting Model Pipeline

Complete End-to-End Pipeline

# Python: Complete betting model pipeline
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import joblib

class MLBBettingPipeline:
    """End-to-end pipeline for MLB betting predictions"""

    def __init__(self):
        self.game_model = None
        self.prop_models = {}
        self.bankroll_manager = None
        self.feature_cache = {}

    def fetch_data(self, date):
        """Fetch all required data for given date"""
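        # This would connect to your data sources
        # For example: MLB API, Statcast, odds API
        # Only fetch_todays_games is shown below; fetch_team_stats, fetch_pitcher_stats,
        # fetch_odds, fetch_weather and fetch_lineups must be implemented for your sources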

        data = {
            'games': self.fetch_todays_games(date),
            'team_stats': self.fetch_team_stats(date),
            'pitcher_stats': self.fetch_pitcher_stats(date),
            'odds': self.fetch_odds(date),
            'weather': self.fetch_weather(date),
            'lineups': self.fetch_lineups(date)
        }

        return data

    def fetch_todays_games(self, date):
        """Fetch today's game schedule"""
        # Placeholder - would call MLB API
        return pd.DataFrame({
            'game_id': [1, 2, 3],
            'date': [date] * 3,
            'home_team': ['NYY', 'BOS', 'LAD'],
            'away_team': ['TOR', 'TB', 'SF'],
            'home_pitcher': ['Cole', 'Sale', 'Kershaw'],
            'away_pitcher': ['Gausman', 'Glasnow', 'Webb']
        })

    def engineer_features(self, data):
        """Create all features for predictions"""
        games = data['games']
        features_list = []

        for idx, game in games.iterrows():
            game_features = {
                'game_id': game['game_id'],
                'date': game['date'],
                'home_team': game['home_team'],
                'away_team': game['away_team']
            }

            # Add team stats
            home_stats = self.get_team_features(
                game['home_team'],
                game['date'],
                data['team_stats']
            )
            away_stats = self.get_team_features(
                game['away_team'],
                game['date'],
                data['team_stats']
            )

            # Prefix features
            for key, val in home_stats.items():
                game_features[f'home_{key}'] = val
            for key, val in away_stats.items():
                game_features[f'away_{key}'] = val

            # Add pitcher stats
            home_pitcher_stats = self.get_pitcher_features(
                game['home_pitcher'],
                game['date'],
                data['pitcher_stats']
            )
            away_pitcher_stats = self.get_pitcher_features(
                game['away_pitcher'],
                game['date'],
                data['pitcher_stats']
            )

            for key, val in home_pitcher_stats.items():
                game_features[f'home_pitcher_{key}'] = val
            for key, val in away_pitcher_stats.items():
                game_features[f'away_pitcher_{key}'] = val

            # Add contextual features
            contextual = self.get_contextual_features(game, data)
            game_features.update(contextual)

            features_list.append(game_features)

        return pd.DataFrame(features_list)

    def get_team_features(self, team, date, team_stats):
        """Extract team-level features"""
        team_data = team_stats[
            (team_stats['team'] == team) &
            (team_stats['date'] < date)
        ].tail(20)  # Last 20 games

        if len(team_data) == 0:
            return self.get_default_team_features()

        return {
            'win_pct': team_data['win'].mean(),
            'runs_per_game': team_data['runs_scored'].mean(),
            'runs_allowed_per_game': team_data['runs_allowed'].mean(),
            'wOBA': team_data['wOBA'].mean(),
            'FIP': team_data['FIP'].mean(),
            'bullpen_ERA': team_data['bullpen_ERA'].mean()
        }

    def get_pitcher_features(self, pitcher, date, pitcher_stats):
        """Extract pitcher-level features"""
        pitcher_data = pitcher_stats[
            (pitcher_stats['pitcher'] == pitcher) &
            (pitcher_stats['date'] < date)
        ].tail(5)  # Last 5 starts

        if len(pitcher_data) == 0:
            return self.get_default_pitcher_features()

        return {
            'ERA': pitcher_data['ERA'].mean(),
            'FIP': pitcher_data['FIP'].mean(),
            'WHIP': pitcher_data['WHIP'].mean(),
            'K_per_9': pitcher_data['K_per_9'].mean(),
            'BB_per_9': pitcher_data['BB_per_9'].mean()
        }

    def get_contextual_features(self, game, data):
        """Extract contextual features"""
        weather = data['weather'].get(game['game_id'], {})

        return {
            'temperature': weather.get('temp', 70),
            'wind_speed': weather.get('wind', 0),
            'precipitation': weather.get('precip', 0),
            'day_of_week': game['date'].dayofweek,
            'month': game['date'].month
        }

    def get_default_team_features(self):
        """Return league average features"""
        return {
            'win_pct': 0.500,
            'runs_per_game': 4.5,
            'runs_allowed_per_game': 4.5,
            'wOBA': 0.320,
            'FIP': 4.00,
            'bullpen_ERA': 4.00
        }

    def get_default_pitcher_features(self):
        """Return league average pitcher features"""
        return {
            'ERA': 4.00,
            'FIP': 4.00,
            'WHIP': 1.30,
            'K_per_9': 8.5,
            'BB_per_9': 3.0
        }

    def make_predictions(self, features):
        """Generate predictions for all games"""
        predictions = []

        for idx, game_features in features.iterrows():
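            # Predict with game model (assumes the model exposes a feature_names attribute)
            X = game_features[self.game_model.feature_names].values.astype(float).reshape(1, -1)
            # predict_proba returns [P(away win), P(home win)]; take the home-win column
            home_win_prob = self.game_model.predict_proba(X)[0, 1]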

            predictions.append({
                'game_id': game_features['game_id'],
                'home_team': game_features['home_team'],
                'away_team': game_features['away_team'],
                'home_win_prob': home_win_prob,
                'away_win_prob': 1 - home_win_prob
            })

        return pd.DataFrame(predictions)

    def find_betting_opportunities(self, predictions, odds_data, min_ev=2.0):
        """Identify positive EV betting opportunities"""
        opportunities = []

        for idx, pred in predictions.iterrows():
            matching_odds = odds_data[odds_data['game_id'] == pred['game_id']]
            if matching_odds.empty:
                continue  # No odds posted for this game
            game_odds = matching_odds.iloc[0]

            # Check home moneyline
            home_ev = calculate_ev(
                pred['home_win_prob'],
                game_odds['home_ml']
            )

            # Check away moneyline
            away_ev = calculate_ev(
                pred['away_win_prob'],
                game_odds['away_ml']
            )

            # Add positive EV bets
            if home_ev['ev_percent'] >= min_ev:
                opportunities.append({
                    'game_id': pred['game_id'],
                    'bet_type': 'moneyline',
                    'team': pred['home_team'],
                    'odds': game_odds['home_ml'],
                    'model_prob': pred['home_win_prob'],
                    'ev_percent': home_ev['ev_percent'],
                    'edge': home_ev['edge']
                })

            if away_ev['ev_percent'] >= min_ev:
                opportunities.append({
                    'game_id': pred['game_id'],
                    'bet_type': 'moneyline',
                    'team': pred['away_team'],
                    'odds': game_odds['away_ml'],
                    'model_prob': pred['away_win_prob'],
                    'ev_percent': away_ev['ev_percent'],
                    'edge': away_ev['edge']
                })

        return pd.DataFrame(opportunities)

    def calculate_bet_sizes(self, opportunities):
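        """Calculate appropriate bet sizes using Kelly"""
        # Work on a copy so the caller's DataFrame is not modified in place
        opportunities = opportunities.copy()
        for idx, opp in opportunities.iterrows():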
            bet_size_info = self.bankroll_manager.calculate_bet_size(
                edge=opp['edge'],
                odds=opp['odds'],
                method='fractional_kelly'
            )

            opportunities.loc[idx, 'bet_size'] = bet_size_info['bet_size']
            opportunities.loc[idx, 'bet_pct'] = bet_size_info['bet_pct']

        return opportunities

    def generate_daily_report(self, date):
        """Generate complete daily betting report"""
        print(f"=== MLB Betting Report for {date} ===\n")

        # Fetch data
        print("Fetching data...")
        data = self.fetch_data(date)

        # Engineer features
        print("Engineering features...")
        features = self.engineer_features(data)

        # Make predictions
        print("Generating predictions...")
        predictions = self.make_predictions(features)

        # Find opportunities
        print("Identifying betting opportunities...")
        opportunities = self.find_betting_opportunities(
            predictions,
            data['odds'],
            min_ev=2.0
        )

        if len(opportunities) == 0:
            print("\nNo positive EV opportunities found today.")
            return None

        # Calculate bet sizes
        opportunities = self.calculate_bet_sizes(opportunities)

        # Sort by EV
        opportunities = opportunities.sort_values('ev_percent', ascending=False)

        # Display report
        print(f"\n{len(opportunities)} Betting Opportunities Found:\n")
        for idx, opp in opportunities.iterrows():
            print(f"Team: {opp['team']} (game {opp['game_id']})")
            print(f"  Bet Type: {opp['bet_type']}")
            print(f"  Odds: {opp['odds']:+.0f}")
            print(f"  Model Probability: {opp['model_prob']:.1%}")
            print(f"  Edge: {opp['edge']:.2%}")
            print(f"  EV: {opp['ev_percent']:.2f}%")
            print(f"  Recommended Bet: ${opp['bet_size']:.2f} ({opp['bet_pct']:.2%} of bankroll)")
            print()

        return opportunities

# Example usage
def main():
    """Run daily betting pipeline"""
    # Initialize pipeline
    pipeline = MLBBettingPipeline()

    # Load trained models (assumed to be pre-trained). The path is an example;
    # make_predictions() will fail while game_model is still None.
    # pipeline.game_model = joblib.load('models/game_model.pkl')

    # Initialize bankroll manager
    pipeline.bankroll_manager = BankrollManager(
        initial_bankroll=10000,
        kelly_fraction=0.25,
        max_bet_pct=0.05
    )

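    # Generate report for today; normalize to midnight so the report header
    # and the `date < today` filters in the feature code work on whole days
    today = pd.Timestamp.now().normalize()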
    opportunities = pipeline.generate_daily_report(today)

    return opportunities

R Implementation of Pipeline

# R: Betting pipeline implementation
library(dplyr)
library(caret)
library(lubridate)

mlb_betting_pipeline <- function(date, models, bankroll = 10000) {
  # Fetch data
  cat("Fetching game data for", as.character(date), "\n")
  games <- fetch_games(date)
  team_stats <- fetch_team_stats(date)
  pitcher_stats <- fetch_pitcher_stats(date)
  odds_data <- fetch_odds(date)

  # Engineer features
  cat("Engineering features...\n")
  features <- engineer_all_features(games, team_stats, pitcher_stats)

  # Make predictions
  cat("Generating predictions...\n")
  predictions <- predict(models$game_model, features, type = "prob")
  features$home_win_prob <- predictions[, "home_win"]

  # Merge with odds
  betting_data <- features %>%
    left_join(odds_data, by = "game_id")

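  # Calculate EV for each bet. american_to_prob() branches with a scalar if()
  # (and we assume calculate_ev_r() does too), so apply both element-wise
  # rather than passing whole columns.
  betting_data <- betting_data %>%
    mutate(
      home_ml_ev = mapply(function(p, o) calculate_ev_r(p, o)$ev_percent,
                          home_win_prob, home_ml),
      away_ml_ev = mapply(function(p, o) calculate_ev_r(p, o)$ev_percent,
                          1 - home_win_prob, away_ml),
      home_edge = home_win_prob - sapply(home_ml, american_to_prob),
      away_edge = (1 - home_win_prob) - sapply(away_ml, american_to_prob)
    )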

  # Find positive EV bets
  home_bets <- betting_data %>%
    filter(home_ml_ev > 2) %>%
    mutate(
      team = home_team,
      odds = home_ml,
      model_prob = home_win_prob,
      ev = home_ml_ev,
      edge = home_edge
    )

  away_bets <- betting_data %>%
    filter(away_ml_ev > 2) %>%
    mutate(
      team = away_team,
      odds = away_ml,
      model_prob = 1 - home_win_prob,
      ev = away_ml_ev,
      edge = away_edge
    )

  opportunities <- bind_rows(home_bets, away_bets) %>%
    select(game_id, team, odds, model_prob, ev, edge) %>%
    arrange(desc(ev))

  # Calculate bet sizes (quarter-Kelly on the model probability)
  if (nrow(opportunities) > 0) {
    opportunities$bet_size <- mapply(
      function(model_prob, odds) {
        kelly <- kelly_criterion(model_prob, odds, kelly_fraction = 0.25)
        bankroll * kelly$bet_size
      },
      opportunities$model_prob,
      opportunities$odds
    )
  }

  # Print report
  cat("\n=== Betting Opportunities ===\n")
  print(opportunities)

  return(opportunities)
}

# Helper functions
fetch_games <- function(date) {
  # Placeholder - would fetch from API
  data.frame(
    game_id = 1:3,
    date = date,
    home_team = c("NYY", "BOS", "LAD"),
    away_team = c("TOR", "TB", "SF")
  )
}

fetch_odds <- function(date) {
  # Placeholder - would fetch from odds API
  data.frame(
    game_id = 1:3,
    home_ml = c(-150, -120, -180),
    away_ml = c(130, 100, 160)
  )
}

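engineer_all_features <- function(games, team_stats, pitcher_stats) {
  # Feature engineering
  # Placeholder: returns the schedule unchanged; a real version would join
  # team and pitcher features onto each game
  games
}

fetch_team_stats <- function(date) {
  # Placeholder - would fetch team-level stats up to `date`
  data.frame()
}

fetch_pitcher_stats <- function(date) {
  # Placeholder - would fetch starting pitcher stats up to `date`
  data.frame()
}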

30.8 Exercises

Exercise 1: Odds Conversion (Easy)

Convert the following betting lines to implied probabilities and calculate the vig:

a) Yankees -180 vs Red Sox +160
b) Dodgers -250 vs Giants +210
c) Astros +105 vs Rangers -125

Tasks:


  1. Calculate implied probability for each team

  2. Calculate the vig (overround)

  3. Calculate no-vig (fair) probabilities
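
As a possible starting point, the three tasks can be sketched in Python using the Yankees -150 / Red Sox +130 line from Section 30.1 rather than the lines above. The helper name `remove_vig` is hypothetical, and proportional normalization is only one of several ways to strip the vig; treat this as a sketch under those assumptions.

```python
def american_to_prob(odds: float) -> float:
    """Implied probability from American odds (vig still included)."""
    if odds > 0:
        return 100.0 / (odds + 100.0)
    return abs(odds) / (abs(odds) + 100.0)


def remove_vig(odds_a: float, odds_b: float) -> dict:
    """Overround and proportionally normalized (no-vig) probabilities.

    Hypothetical helper: assumes a two-way market and spreads the vig
    across both sides in proportion to their implied probabilities.
    """
    p_a = american_to_prob(odds_a)
    p_b = american_to_prob(odds_b)
    total = p_a + p_b  # exceeds 1 when the book charges vig
    return {
        "implied": (p_a, p_b),
        "overround": total - 1.0,
        "fair": (p_a / total, p_b / total),
    }


# Line from Section 30.1: Yankees -150, Red Sox +130
result = remove_vig(-150, 130)
print(f"Implied:   {result['implied'][0]:.4f} / {result['implied'][1]:.4f}")
print(f"Overround: {result['overround']:.4f}")
print(f"Fair:      {result['fair'][0]:.4f} / {result['fair'][1]:.4f}")
```

The fair probabilities sum to one by construction, which is a quick sanity check on any vig-removal method.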

Exercise 2: Expected Value Calculation (Easy)

Your model predicts the following win probabilities. Calculate the EV for each bet:

Team      Model Prob   Odds   Bet Amount
Team A    58%          -140   $100
Team B    45%          +150   $100
Team C    52%          -105   $100

Which bets have positive EV?
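
The calculation behind this exercise is EV = P(win) x profit - P(loss) x stake. A minimal sketch follows, using a made-up bet (55% at -110) instead of the table's rows. The function names here are hypothetical and are not the chapter's `calculate_ev`.

```python
def profit_if_win(odds: float, stake: float) -> float:
    """Profit (excluding the returned stake) on a winning American-odds bet."""
    if odds > 0:
        return stake * odds / 100.0
    return stake * 100.0 / abs(odds)


def expected_value(model_prob: float, odds: float, stake: float = 100.0) -> float:
    """Expected profit in dollars for one bet at the given model probability."""
    win_amount = profit_if_win(odds, stake)
    return model_prob * win_amount - (1.0 - model_prob) * stake


# Hypothetical bet: the model says 55%, the book offers -110, stake is $100
ev = expected_value(0.55, -110, 100.0)
print(f"EV: ${ev:.2f} per $100 staked")

# A bet is worth considering only when EV is positive
print("Positive EV" if ev > 0 else "Negative EV")
```

Dividing the dollar EV by the stake gives the EV percentage that the pipeline's `min_ev` threshold is compared against.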

Exercise 3: Kelly Criterion Application (Medium)

You have a $5,000 bankroll and identify the following edges:

Bet      Odds   Edge
Bet 1    -110   3%
Bet 2    +140   5%
Bet 3    -180   2%

Tasks:


  1. Calculate full Kelly bet size for each

  2. Calculate quarter-Kelly bet size for each

  3. If you can only make one bet, which should you choose and why?
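
One way to set this up is sketched below. It assumes "edge" means model probability minus implied probability, as in the pipeline code, and uses the standard Kelly formula f* = (b*p - q) / b, where b is the net payout per unit staked. The example bet (+100 with a 4% edge) is invented, and `kelly_stake` is a hypothetical name rather than the chapter's `kelly_criterion`.

```python
def american_to_prob(odds: float) -> float:
    """Implied probability from American odds."""
    if odds > 0:
        return 100.0 / (odds + 100.0)
    return abs(odds) / (abs(odds) + 100.0)


def net_odds(odds: float) -> float:
    """Profit per unit staked (the 'b' in the Kelly formula)."""
    return odds / 100.0 if odds > 0 else 100.0 / abs(odds)


def kelly_stake(edge: float, odds: float, bankroll: float,
                fraction: float = 1.0) -> float:
    """Dollar stake under (fractional) Kelly.

    Assumes edge = model probability - implied probability.
    """
    p = american_to_prob(odds) + edge   # model win probability
    q = 1.0 - p
    b = net_odds(odds)
    f_star = max(0.0, (b * p - q) / b)  # never bet when Kelly is negative
    return bankroll * fraction * f_star


# Hypothetical bet: +100 odds, 4% edge, $5,000 bankroll
full = kelly_stake(0.04, 100, 5000.0)
quarter = kelly_stake(0.04, 100, 5000.0, fraction=0.25)
print(f"Full Kelly: ${full:.2f}, quarter Kelly: ${quarter:.2f}")
```

Note that the same probability edge produces different Kelly fractions at different odds, which is the point of the final task.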

Exercise 4: Run Line vs Moneyline (Medium)

Given:


  • Yankees moneyline: -150 (implied prob 60%)

  • Yankees run line (-1.5): +120

  • Your model gives Yankees 62% to win

Tasks:


  1. Estimate probability Yankees win by 2+ runs (assume they win by 2+ in 45% of their wins)

  2. Calculate EV for moneyline bet

  3. Calculate EV for run line bet

  4. Which is the better bet?
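
The key step is that covering -1.5 requires both winning and winning by two or more runs, so P(cover) = P(win) x P(win by 2+ | win). The sketch below illustrates that step with invented inputs (58% to win, half of wins by 2+, moneyline -130, run line +150), not the exercise's numbers. All function names are hypothetical.

```python
def american_profit(odds: float, stake: float = 100.0) -> float:
    """Profit on a winning bet at American odds."""
    return stake * odds / 100.0 if odds > 0 else stake * 100.0 / abs(odds)


def run_line_cover_prob(p_win: float, p_margin2_given_win: float) -> float:
    """P(favorite covers -1.5) = P(win) * P(win by 2+ | win)."""
    return p_win * p_margin2_given_win


def ev_per_stake(prob: float, odds: float, stake: float = 100.0) -> float:
    """Expected profit in dollars for a single bet."""
    return prob * american_profit(odds, stake) - (1.0 - prob) * stake


# Hypothetical inputs, chosen only to illustrate the calculation
p_win = 0.58
p_cover = run_line_cover_prob(p_win, 0.50)
ml_ev = ev_per_stake(p_win, -130)    # moneyline
rl_ev = ev_per_stake(p_cover, 150)   # run line -1.5

print(f"P(cover -1.5): {p_cover:.3f}")
print(f"Moneyline EV: ${ml_ev:.2f}   Run line EV: ${rl_ev:.2f}")
```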

Exercise 5: Player Prop Modeling (Hard)

Model a strikeout prop for a pitcher with the following data:

Pitcher Stats (last 5 starts):


  • Strikeouts: 7, 9, 6, 8, 10

  • Innings: 6.0, 7.0, 5.1, 6.2, 7.0 (baseball notation: .1 and .2 mean one-third and two-thirds of an inning)

Opposing Team:


  • Team K rate: 24.5% (league average: 22.8%)

Prop Line: Over/Under 6.5 strikeouts

Odds: Over -115, Under -105

Tasks:


  1. Estimate pitcher's expected strikeouts (account for team K rate)

  2. Model as Poisson distribution

  3. Calculate probability of over 6.5

  4. Determine if there's value on either side

  5. Calculate recommended bet size using quarter-Kelly
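
A possible skeleton for tasks 1 to 3 is sketched below with invented pitcher data. It rests on several assumptions that are not from the chapter: the opponent adjustment is a simple ratio of team to league strikeout rate, the projected innings are supplied by the analyst, and strikeouts are Poisson-distributed (which ignores overdispersion and early exits).

```python
import math


def baseball_ip_to_innings(ip: float) -> float:
    """Convert baseball notation (5.1 = 5 1/3, 6.2 = 6 2/3) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # .1 -> 1 out, .2 -> 2 outs
    return whole + outs / 3.0


def expected_strikeouts(ks, ips, opp_k_rate, league_k_rate, projected_innings):
    """Recent K-per-inning rate, scaled by opponent K rate and projected workload."""
    innings = sum(baseball_ip_to_innings(ip) for ip in ips)
    k_per_inning = sum(ks) / innings
    return k_per_inning * projected_innings * (opp_k_rate / league_k_rate)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def prob_over(line: float, lam: float) -> float:
    """P(X > line) for a half-point line such as 5.5."""
    return 1.0 - poisson_cdf(math.floor(line), lam)


# Hypothetical pitcher: 18 strikeouts over 6.0 + 5.2 + 6.1 = 18 innings,
# facing a league-average lineup, projected for 6 innings
lam = expected_strikeouts(
    ks=[5, 7, 6], ips=[6.0, 5.2, 6.1],
    opp_k_rate=0.23, league_k_rate=0.23, projected_innings=6.0,
)
p_over = prob_over(5.5, lam)
print(f"Expected strikeouts: {lam:.2f}, P(over 5.5): {p_over:.3f}")
```

The resulting probability can then be compared with the no-vig probability implied by the over and under prices to decide whether either side offers value.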

Exercise 6: Live Betting Simulation (Hard)

Simulate a live betting scenario:

Game State:


  • Top 8th inning, 1 out, runner on 2nd

  • Home team leading 4-3

  • Home team has closer available (1.50 ERA)

  • Away team has its No. 9, No. 1, and No. 2 hitters due up

Live Odds:


  • Home -200

  • Away +170

Tasks:


  1. Estimate home team win probability using run expectancy

  2. Calculate implied probability from odds

  3. Determine if there's betting value

  4. Estimate how win probability changes if:

      • Current batter gets a hit

      • Current batter makes an out

Exercise 7: Bankroll Simulation (Hard)

Simulate a betting season with the following parameters:

  • Starting bankroll: $10,000
  • Number of bets: 200
  • Average edge: 3%
  • Win rate: 55%
  • Average odds: -110
  • Betting strategy: Quarter-Kelly

Tasks:


  1. Implement the simulation in R or Python

  2. Run 1,000 simulations

  3. Calculate:

      • Mean final bankroll

      • Median final bankroll

      • Probability of doubling bankroll

      • Probability of losing 50%+ of bankroll (risk of ruin)

      • Maximum drawdown distribution

  4. Compare results to flat betting (2% of bankroll per bet)
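
A minimal Monte Carlo skeleton for this exercise might look like the following. It makes simplifying assumptions that are mine rather than the chapter's: every bet has the same odds and the same true win probability, the model's probability is exactly right, bets are independent and settled one at a time, and "losing 50%+" is measured on the final bankroll. The function names are hypothetical.

```python
import random
import statistics


def net_odds(odds: float) -> float:
    """Profit per unit staked at American odds."""
    return odds / 100.0 if odds > 0 else 100.0 / abs(odds)


def simulate_season(p_win, odds, n_bets, start=10_000.0, kelly_mult=0.25, rng=None):
    """One season of fractional-Kelly betting.

    Returns (final bankroll, maximum drawdown as a fraction of the running peak).
    """
    rng = rng or random.Random()
    b = net_odds(odds)
    full_kelly = max(0.0, (b * p_win - (1.0 - p_win)) / b)
    stake_frac = kelly_mult * full_kelly
    bankroll = peak = start
    max_drawdown = 0.0
    for _ in range(n_bets):
        stake = bankroll * stake_frac
        bankroll += stake * b if rng.random() < p_win else -stake
        peak = max(peak, bankroll)
        max_drawdown = max(max_drawdown, 1.0 - bankroll / peak)
    return bankroll, max_drawdown


def run_simulations(n_sims, seed=0, **season_kwargs):
    """Repeat simulate_season and summarize the outcome distribution."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    results = [simulate_season(rng=rng, **season_kwargs) for _ in range(n_sims)]
    finals = [final for final, _ in results]
    start = season_kwargs.get("start", 10_000.0)
    return {
        "mean_final": statistics.mean(finals),
        "median_final": statistics.median(finals),
        "p_double": sum(f >= 2 * start for f in finals) / n_sims,
        "p_lose_half": sum(f <= 0.5 * start for f in finals) / n_sims,
        "median_max_drawdown": statistics.median(dd for _, dd in results),
    }


summary = run_simulations(n_sims=1000, seed=42, p_win=0.55, odds=-110, n_bets=200)
for name, value in summary.items():
    print(f"{name}: {value:,.4f}")
```

For the flat-betting comparison, the same loop can be reused with `stake_frac` fixed at 0.02 instead of being derived from the Kelly formula.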

Exercise 8: Model Calibration (Hard)

You've collected predictions and outcomes from your model:

Predicted Prob   Actual Win Rate   Sample Size
0.45-0.50        0.47              50
0.50-0.55        0.52              80
0.55-0.60        0.61              75
0.60-0.65        0.59              60
0.65-0.70        0.68              40

Tasks:


  1. Calculate calibration error for each bin

  2. Create a calibration plot

  3. Calculate Brier score

  4. Calculate log loss

  5. Suggest adjustments to improve calibration

  6. Implement isotonic regression calibration in Python
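
For task 6, scikit-learn's `IsotonicRegression` is the usual tool in practice. To show what it does under the hood, here is a pure-Python sketch of the pool-adjacent-violators algorithm together with a sample-weighted calibration error. The bin values are invented, and both function names are hypothetical.

```python
def calibration_error(predicted, observed, counts):
    """Sample-weighted mean absolute gap between predicted and observed rates."""
    total = sum(counts)
    return sum(n * abs(p - o) for p, o, n in zip(predicted, observed, counts)) / total


def pava(values, weights):
    """Weighted pool-adjacent-violators: best nondecreasing fit to `values`.

    `values` must be ordered by predicted probability (lowest bin first).
    """
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks break monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fitted = []
    for mean, _, n in blocks:
        fitted.extend([mean] * n)
    return fitted


# Hypothetical bins: midpoints, observed win rates, and sample sizes
mids = [0.45, 0.55, 0.65]
observed = [0.50, 0.52, 0.66]
counts = [100, 100, 200]
print(f"Weighted calibration error: {calibration_error(mids, observed, counts):.3f}")

# Observed rates that dip (0.60 then 0.50) get pooled into a shared value
fitted = pava([0.40, 0.60, 0.50, 0.70], [10, 10, 10, 10])
print("Isotonic fit:", [round(x, 3) for x in fitted])
```

The fitted values act as a lookup from raw model probability to calibrated probability; with bins this small, pooling also smooths out sampling noise.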

Exercise 9: Multi-Factor Model (Expert)

Build a comprehensive game prediction model using:

Data Sources:


  • Team statistics (last 30 days): batting stats, pitching stats

  • Starting pitcher metrics: ERA, FIP, K/9, BB/9, WHIP

  • Bullpen strength: ERA, recent workload

  • Home field advantage

  • Rest days

  • Weather conditions

  • Umpire factors

Tasks:


  1. Engineer at least 20 meaningful features

  2. Train multiple models (logistic regression, random forest, XGBoost)

  3. Implement proper time-series cross-validation

  4. Calibrate probability predictions

  5. Evaluate model performance using:

      • Log loss

      • Brier score

      • ROC-AUC

      • Calibration plots

  6. Backtest on 2023 season data with proper betting simulation

Exercise 10: Betting Strategy Optimization (Expert)

Design and test an optimal betting strategy:

Requirements:


  • Use your predictions from Exercise 9

  • Incorporate bankroll management

  • Consider multiple bet types (ML, RL, totals)

  • Implement portfolio approach (diversification)

  • Account for correlation between bets

Tasks:


  1. Define criteria for placing bets (minimum EV, minimum edge, etc.)

  2. Implement position sizing strategy

  3. Create risk management rules (max bets per day, max exposure, stop-loss)

  4. Backtest on historical data (full season)

  5. Calculate performance metrics:

      • Total ROI

      • Sharpe ratio

      • Maximum drawdown

      • Win rate

      • Average bet size

  6. Compare to benchmark strategies (flat betting, aggressive Kelly)

  7. Perform sensitivity analysis on key parameters
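
The performance metrics in task 5 can be computed from a simple bet log. The sketch below assumes the log is two parallel lists (stake and realized profit per bet, in chronological order) and uses a per-bet Sharpe ratio with no annualization or risk-free rate. Those conventions, the sample data, and the function name are all assumptions made for illustration.

```python
import statistics


def performance_metrics(stakes, profits, start_bankroll):
    """Summary statistics for a chronological bet log.

    stakes[i] is the amount risked on bet i; profits[i] is the realized
    profit (negative for a loss).
    """
    returns = [p / s for p, s in zip(profits, stakes)]  # per-bet return on stake

    # Track the bankroll path to find the worst peak-to-trough decline
    bankroll = peak = start_bankroll
    max_drawdown = 0.0
    for p in profits:
        bankroll += p
        peak = max(peak, bankroll)
        max_drawdown = max(max_drawdown, (peak - bankroll) / peak)

    return {
        "roi": sum(profits) / sum(stakes),
        "sharpe_per_bet": statistics.mean(returns) / statistics.stdev(returns),
        "max_drawdown": max_drawdown,
        "win_rate": sum(1 for p in profits if p > 0) / len(profits),
        "avg_bet_size": statistics.mean(stakes),
    }


# Hypothetical log: four $100 bets at +100 with three wins and one loss
metrics = performance_metrics(
    stakes=[100, 100, 100, 100],
    profits=[100, -100, 100, 100],
    start_bankroll=1000.0,
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Running the same function on the benchmark strategies' bet logs gives a like-for-like comparison for task 6.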


Summary

This chapter covered the analytical foundations of sports betting as applied to MLB:

  1. Market Mechanics: Understanding how odds work, implied probability, and the efficient market hypothesis
  2. Predictive Modeling: Building robust models using machine learning, Elo ratings, and advanced features
  3. Value Assessment: Calculating expected value and identifying profitable betting opportunities
  4. Kelly Criterion: Optimal bet sizing based on edge and probability
  5. Player Props: Modeling individual player performance using Statcast and distributions
  6. Live Betting: Real-time win probability updates and in-game opportunities
  7. Bankroll Management: Risk assessment, portfolio theory, and long-term sustainability
  8. Implementation: End-to-end pipeline for daily betting analysis

Key Takeaways:

  • Sports betting markets are highly efficient but not perfectly so
  • Consistent profit requires a significant edge (typically 3-5% or more) after accounting for the vig
  • Proper bankroll management is as important as finding edges
  • Model calibration is critical—poorly calibrated probabilities lead to incorrect bet sizing
  • Diversification and portfolio thinking reduce risk
  • Long-term success requires discipline, data-driven decisions, and continuous model improvement

Further Reading:

  • Haghighat, E. et al. (2021). "Machine Learning Applications in Sports Betting"
  • Kovalchik, S. (2016). "Searching for the GOAT of tennis win prediction"
  • Lopez, M. & Matthews, G. (2015). "Building an NCAA basketball prediction model"
  • Boulier, B. & Stekler, H. (2003). "Predicting the outcomes of National Football League games"

Warning: This chapter presents analytical methods for educational purposes. Sports betting involves significant financial risk. Most bettors lose money over time. This material should not be construed as encouragement to gamble. Always bet responsibly and within your means.
