Market Mechanics and Line Setting
Sportsbooks set betting lines through a combination of statistical modeling, historical data, and market dynamics. Understanding these mechanics is crucial for developing profitable betting strategies.
Key Betting Market Types:
- Moneyline (ML): A straight bet on which team will win
- Run Line (RL): Baseball's version of a point spread, typically set at ±1.5 runs
- Total (Over/Under): Combined runs scored by both teams
- Player Props: Individual player performance bets
- First Five Innings (F5): Betting on the outcome after 5 innings
Converting Odds to Implied Probability
American odds can be converted to implied probabilities:
# R: Convert American odds to implied probability
american_to_prob <- function(odds) {
  if (odds > 0) {
    prob <- 100 / (odds + 100)
  } else {
    prob <- abs(odds) / (abs(odds) + 100)
  }
  return(prob)
}

# Example: Yankees -150, Red Sox +130
yankees_odds <- -150
redsox_odds <- 130
yankees_implied <- american_to_prob(yankees_odds)
redsox_implied <- american_to_prob(redsox_odds)
cat(sprintf("Yankees implied probability: %.2f%%\n", yankees_implied * 100))
cat(sprintf("Red Sox implied probability: %.2f%%\n", redsox_implied * 100))
# The two implied probabilities sum to more than 100%; the excess is the vig
cat(sprintf("Total implied probability (includes vig): %.2f%%\n", (yankees_implied + redsox_implied) * 100))
# Python: Convert American odds to implied probability
import numpy as np
import pandas as pd


def american_to_prob(odds):
    """Convert American odds to implied probability"""
    if odds > 0:
        prob = 100 / (odds + 100)
    else:
        prob = abs(odds) / (abs(odds) + 100)
    return prob


def prob_to_american(prob):
    """Convert probability to American odds (prob must be strictly between 0 and 1)"""
    if prob >= 0.5:
        odds = -(prob * 100) / (1 - prob)
    else:
        odds = ((1 - prob) * 100) / prob
    return odds


# Example
yankees_odds = -150
redsox_odds = 130
yankees_implied = american_to_prob(yankees_odds)
redsox_implied = american_to_prob(redsox_odds)
print(f"Yankees implied probability: {yankees_implied:.2%}")
print(f"Red Sox implied probability: {redsox_implied:.2%}")
# The two implied probabilities sum to more than 100%; the excess is the vig
print(f"Total implied probability (includes vig): {(yankees_implied + redsox_implied):.2%}")
print(f"Vig: {(yankees_implied + redsox_implied - 1):.2%}")
The Efficient Market Hypothesis in Sports Betting
Sports betting markets exhibit characteristics of efficient markets, where:
- Odds quickly adjust to new information (injuries, weather, lineup changes)
- Sharp bettors exploit inefficiencies, moving lines toward true probabilities
- The "wisdom of the crowd" often produces accurate probability estimates
- Market inefficiencies exist but are difficult to exploit consistently
Quantifying the Vig (Vigorish)
The vig represents the sportsbook's built-in profit margin:
# R: Calculate no-vig (fair) probabilities
calculate_no_vig_prob <- function(prob1, prob2) {
  # Rescale both implied probabilities so they sum to 1
  total <- prob1 + prob2
  fair_prob1 <- prob1 / total
  fair_prob2 <- prob2 / total
  return(list(prob1 = fair_prob1, prob2 = fair_prob2))
}

# Remove vig from our example
fair_probs <- calculate_no_vig_prob(yankees_implied, redsox_implied)
cat(sprintf("Yankees no-vig probability: %.2f%%\n", fair_probs$prob1 * 100))
cat(sprintf("Red Sox no-vig probability: %.2f%%\n", fair_probs$prob2 * 100))
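The same normalization can be sketched in Python. This is a minimal illustration, assuming a two-outcome market and proportional vig removal (the approach the R function above takes); the helper name `remove_vig` is hypothetical and not part of any library.

```python
def american_to_prob(odds):
    """Implied probability from American odds (vig still included)."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)


def remove_vig(odds_a, odds_b):
    """Rescale two implied probabilities so they sum to 1.

    Hypothetical helper: assumes a two-outcome market and that the
    bookmaker's margin is spread proportionally across both sides.
    """
    p_a = american_to_prob(odds_a)
    p_b = american_to_prob(odds_b)
    total = p_a + p_b  # exceeds 1 by the overround
    return p_a / total, p_b / total, total - 1


fair_yankees, fair_redsox, overround = remove_vig(-150, 130)
print(f"Yankees fair probability: {fair_yankees:.2%}")
print(f"Red Sox fair probability: {fair_redsox:.2%}")
print(f"Overround: {overround:.2%}")
```

Proportional rescaling is only one convention; other methods distribute the margin unevenly between favorites and underdogs, so the fair probabilities here should be read as an approximation.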
Feature Engineering for Game Prediction
Successful betting models require carefully engineered features that capture team strength, matchups, and contextual factors.
# Python: Comprehensive feature engineering for game prediction
import pandas as pd
import numpy as np
from datetime import timedelta


class MLBGameFeatures:
    """Feature engineering for MLB game predictions"""

    def __init__(self, games_df, team_stats_df, pitcher_stats_df):
        # Sort by date so that .tail() below returns the most recent rows
        self.games = games_df.sort_values('date')
        self.team_stats = team_stats_df
        self.pitcher_stats = pitcher_stats_df.sort_values('date')

    def create_rolling_stats(self, window=10):
        """Calculate rolling team performance metrics"""
        features = []
        for idx, game in self.games.iterrows():
            team = game['team']
            date = game['date']
            # Get recent games (strictly before this game, to avoid leakage)
            recent = self.games[
                (self.games['team'] == team) &
                (self.games['date'] < date)
            ].tail(window)
            if len(recent) < 5:  # Require minimum games
                continue
            features.append({
                'game_id': game['game_id'],
                'team': team,
                'rolling_win_pct': recent['win'].mean(),
                'rolling_runs_scored': recent['runs_scored'].mean(),
                'rolling_runs_allowed': recent['runs_allowed'].mean(),
                'rolling_wOBA': recent['wOBA'].mean(),
                'rolling_FIP': recent['FIP'].mean(),
                'days_rest': (date - recent['date'].max()).days
            })
        return pd.DataFrame(features)

    def create_pitcher_features(self, pitcher_id, date, hand):
        """Extract pitcher-specific features"""
        pitcher_data = self.pitcher_stats[
            (self.pitcher_stats['pitcher_id'] == pitcher_id) &
            (self.pitcher_stats['date'] < date)
        ].tail(5)  # Last 5 starts
        if len(pitcher_data) == 0:
            return {}
        return {
            'pitcher_ERA': pitcher_data['ERA'].mean(),
            'pitcher_FIP': pitcher_data['FIP'].mean(),
            'pitcher_K_per_9': pitcher_data['K_per_9'].mean(),
            'pitcher_BB_per_9': pitcher_data['BB_per_9'].mean(),
            'pitcher_WHIP': pitcher_data['WHIP'].mean(),
            'pitcher_hand': hand
        }

    def create_matchup_features(self, team, opp_team, date):
        """Create team vs team matchup features"""
        # Historical head-to-head (last 2 seasons)
        h2h = self.games[
            (self.games['team'] == team) &
            (self.games['opponent'] == opp_team) &
            (self.games['date'] < date) &
            (self.games['date'] > date - timedelta(days=730))
        ]
        return {
            'h2h_win_pct': h2h['win'].mean() if len(h2h) > 0 else 0.5,
            'h2h_games': len(h2h)
        }

    def create_contextual_features(self, game):
        """Contextual features: home/away, weather, rest, etc."""
        return {
            'is_home': game['is_home'],
            'temperature': game.get('temperature', 70),
            'wind_speed': game.get('wind_speed', 0),
            'is_dome': game.get('is_dome', 0),
            'day_of_week': game['date'].dayofweek,
            'month': game['date'].month
        }


# Example usage
def build_game_features(game_data):
    """Complete feature set for a single game"""
    feature_engine = MLBGameFeatures(game_data['historical_games'],
                                     game_data['team_stats'],
                                     game_data['pitcher_stats'])
    features = {}
    # create_rolling_stats returns a DataFrame with one row per historical
    # game, so take the team's latest row (an approximation of current form)
    # instead of passing the whole DataFrame to dict.update()
    rolling = feature_engine.create_rolling_stats(window=10)
    if not rolling.empty:
        team_rows = rolling[rolling['team'] == game_data['team']]
        if not team_rows.empty:
            features.update(team_rows.iloc[-1].to_dict())
    features.update(feature_engine.create_pitcher_features(
        game_data['pitcher_id'],
        game_data['date'],
        game_data['pitcher_hand']
    ))
    features.update(feature_engine.create_matchup_features(
        game_data['team'],
        game_data['opponent'],
        game_data['date']
    ))
    features.update(feature_engine.create_contextual_features(game_data))
    return features
Elo Rating System for MLB
Elo ratings provide a simple yet effective method for estimating team strength:
# R: Elo rating system for MLB
library(dplyr)

elo_rating <- function(games_df, K = 20, home_advantage = 30) {
  # games_df is assumed to be sorted chronologically
  # Initialize ratings
  teams <- unique(c(as.character(games_df$home_team), as.character(games_df$away_team)))
  ratings <- setNames(rep(1500, length(teams)), teams)
  # Store predictions and outcomes
  results <- data.frame()
  for (i in seq_len(nrow(games_df))) {
    game <- games_df[i, ]
    # Index by name, not by factor code
    home <- as.character(game$home_team)
    away <- as.character(game$away_team)
    # Get current ratings ([[ ]] returns the bare value without the name)
    home_rating <- ratings[[home]]
    away_rating <- ratings[[away]]
    # Expected win probability (with home advantage)
    home_expected <- 1 / (1 + 10^(-(home_rating - away_rating + home_advantage) / 400))
    # Actual outcome (1 if home team won, 0 otherwise)
    home_won <- ifelse(game$home_score > game$away_score, 1, 0)
    # Update ratings (the away change is the negative of the home change)
    home_new <- home_rating + K * (home_won - home_expected)
    away_new <- away_rating + K * ((1 - home_won) - (1 - home_expected))
    ratings[home] <- home_new
    ratings[away] <- away_new
    # Store result
    results <- rbind(results, data.frame(
      game_id = game$game_id,
      date = game$date,
      home_team = home,
      away_team = away,
      home_rating_pre = home_rating,
      away_rating_pre = away_rating,
      home_prob = home_expected,
      home_won = home_won,
      home_rating_post = home_new,
      away_rating_post = away_new
    ))
  }
  return(list(ratings = ratings, results = results))
}

# Calculate Elo predictions for upcoming games
predict_with_elo <- function(home_team, away_team, ratings, home_advantage = 30) {
  home_rating <- ratings[[home_team]]
  away_rating <- ratings[[away_team]]
  home_prob <- 1 / (1 + 10^(-(home_rating - away_rating + home_advantage) / 400))
  return(list(
    home_prob = home_prob,
    away_prob = 1 - home_prob,
    home_rating = home_rating,
    away_rating = away_rating
  ))
}
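To make the update rule concrete, here is a minimal Python sketch of a single Elo step. It mirrors the R listing's defaults (K = 20, home advantage = 30 rating points); the function names `elo_expected` and `elo_update` are illustrative assumptions, not part of any package.

```python
def elo_expected(home_rating, away_rating, home_advantage=30):
    """Home win probability implied by the rating gap plus home advantage."""
    diff = home_rating - away_rating + home_advantage
    return 1 / (1 + 10 ** (-diff / 400))


def elo_update(home_rating, away_rating, home_won, k=20, home_advantage=30):
    """Return post-game (home, away) ratings.

    home_won is 1 if the home team won, 0 otherwise. Whatever the home
    team gains the away team loses, so the rating total is unchanged.
    """
    expected = elo_expected(home_rating, away_rating, home_advantage)
    delta = k * (home_won - expected)
    return home_rating + delta, away_rating - delta


# Two evenly rated teams; the home team wins
new_home, new_away = elo_update(1500, 1500, home_won=1)
print(f"Pre-game home win probability: {elo_expected(1500, 1500):.3f}")
print(f"Post-game ratings: home {new_home:.1f}, away {new_away:.1f}")
```

Because the home team is already favored before the game, a home win moves the ratings by less than K/2, while an upset by the away team would move them by more.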
Machine Learning Models for Game Prediction
# Python: Advanced ML models for game outcome prediction
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.calibration import CalibratedClassifierCV
import xgboost as xgb
import lightgbm as lgb


class MLBBettingModel:
    """Configurable model (XGBoost, LightGBM, random forest, or logistic regression) for MLB game predictions"""

    def __init__(self, model_type='xgboost'):
        self.model_type = model_type
        self.model = None
        self.scaler = StandardScaler()
        self.feature_names = None

    def prepare_features(self, df):
        """Prepare feature matrix from game data"""
        feature_cols = [
            'rolling_win_pct', 'rolling_runs_scored', 'rolling_runs_allowed',
            'rolling_wOBA', 'rolling_FIP', 'days_rest',
            'pitcher_ERA', 'pitcher_FIP', 'pitcher_K_per_9',
            'pitcher_BB_per_9', 'pitcher_WHIP',
            'opp_rolling_win_pct', 'opp_rolling_runs_scored',
            'opp_rolling_runs_allowed', 'opp_rolling_wOBA', 'opp_rolling_FIP',
            'opp_pitcher_ERA', 'opp_pitcher_FIP',
            'is_home', 'temperature', 'wind_speed', 'h2h_win_pct'
        ]
        X = df[feature_cols].fillna(df[feature_cols].median())
        y = df['win']
        self.feature_names = feature_cols
        return X, y

    def build_model(self):
        """Initialize the prediction model"""
        if self.model_type == 'xgboost':
            self.model = xgb.XGBClassifier(
                n_estimators=200,
                max_depth=6,
                learning_rate=0.05,
                subsample=0.8,
                colsample_bytree=0.8,
                random_state=42,
                eval_metric='logloss'
            )
        elif self.model_type == 'lightgbm':
            self.model = lgb.LGBMClassifier(
                n_estimators=200,
                max_depth=6,
                learning_rate=0.05,
                subsample=0.8,
                colsample_bytree=0.8,
                random_state=42
            )
        elif self.model_type == 'random_forest':
            self.model = RandomForestClassifier(
                n_estimators=200,
                max_depth=10,
                min_samples_split=10,
                random_state=42
            )
        else:  # logistic regression
            self.model = LogisticRegression(
                C=1.0,
                max_iter=1000,
                random_state=42
            )

    def train(self, X, y):
        """Train the model with probability calibration"""
        # Scale features
        X_scaled = self.scaler.fit_transform(X)
        # Build the base model
        self.build_model()
        # Wrap it in a calibrator so predicted probabilities are better calibrated.
        # Note: cv=5 uses ordinary (non-chronological) folds.
        self.model = CalibratedClassifierCV(
            self.model,
            method='isotonic',
            cv=5
        )
        self.model.fit(X_scaled, y)

    def predict_proba(self, X):
        """Predict win probabilities"""
        X_scaled = self.scaler.transform(X)
        return self.model.predict_proba(X_scaled)[:, 1]

    def evaluate_time_series(self, X, y, n_splits=5):
        """Evaluate model using time-series cross-validation (rows must be in date order)"""
        tscv = TimeSeriesSplit(n_splits=n_splits)
        # Build a fresh, uncalibrated model (this replaces self.model)
        self.build_model()
        # Scale inside the pipeline so each fold fits the scaler on its own
        # training window only, rather than on the full dataset
        pipeline = make_pipeline(StandardScaler(), self.model)
        # Cross-validation scores
        scores = cross_val_score(
            pipeline,
            X,
            y,
            cv=tscv,
            scoring='neg_log_loss'
        )
        return {
            'mean_log_loss': -scores.mean(),
            'std_log_loss': scores.std(),
            'scores': -scores
        }

    def feature_importance(self):
        """Extract feature importance, averaged over the calibration folds (call train() first)"""
        per_fold = []
        for calibrated in self.model.calibrated_classifiers_:
            # The fitted model is `estimator` in scikit-learn >= 1.2
            # and `base_estimator` in older releases
            fitted = getattr(calibrated, 'estimator', None)
            if fitted is None:
                fitted = calibrated.base_estimator
            if hasattr(fitted, 'feature_importances_'):
                per_fold.append(fitted.feature_importances_)
            else:
                # For linear models, use coefficient magnitude
                per_fold.append(np.abs(fitted.coef_[0]))
        importances = np.mean(per_fold, axis=0)
        feature_imp = pd.DataFrame({
            'feature': self.feature_names,
            'importance': importances
        }).sort_values('importance', ascending=False)
        return feature_imp
Calculating Expected Value (EV)
Expected Value is the fundamental metric for assessing betting opportunities:
EV = (Probability of Winning × Amount Won) - (Probability of Losing × Amount Lost)
# Python: Expected value calculations
import pandas as pd


def calculate_ev(true_prob, odds, bet_amount=100):
    """
    Calculate expected value of a bet

    Parameters:
    - true_prob: Your estimated probability of winning
    - odds: American odds
    - bet_amount: Size of bet
    """
    # Convert odds to profit on a winning bet
    if odds > 0:
        payout = bet_amount * (odds / 100)
    else:
        payout = bet_amount * (100 / abs(odds))
    # Calculate EV
    win_amount = payout
    lose_amount = bet_amount
    ev = (true_prob * win_amount) - ((1 - true_prob) * lose_amount)
    ev_percent = (ev / bet_amount) * 100
    return {
        'ev_dollars': ev,
        'ev_percent': ev_percent,
        # american_to_prob is defined in the odds-conversion listing above;
        # the implied probability still includes the vig
        'implied_prob': american_to_prob(odds),
        'edge': true_prob - american_to_prob(odds)
    }


# Example: Finding positive EV bets
games = pd.DataFrame({
    'team': ['Yankees', 'Red Sox', 'Dodgers', 'Giants'],
    'odds': [-150, 120, -180, 160],
    'model_prob': [0.62, 0.47, 0.68, 0.41]
})
games['ev_analysis'] = games.apply(
    lambda row: calculate_ev(row['model_prob'], row['odds']),
    axis=1
)
# Extract EV metrics
games['ev_percent'] = games['ev_analysis'].apply(lambda x: x['ev_percent'])
games['edge'] = games['ev_analysis'].apply(lambda x: x['edge'])
# Find positive EV bets
positive_ev = games[games['ev_percent'] > 0].sort_values('ev_percent', ascending=False)
print("Positive EV Opportunities:")
print(positive_ev[['team', 'odds', 'model_prob', 'ev_percent', 'edge']])
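A quick way to sanity-check EV figures is through the break-even probability. With decimal odds d, the EV formula above reduces to EV per unit staked = p × d − 1, so a bet breaks even when p = 1/d. The sketch below assumes this per-unit form; the helper names are hypothetical.

```python
def american_to_decimal(odds):
    """Decimal odds (total return per unit staked) from American odds."""
    if odds > 0:
        return 1 + odds / 100
    return 1 + 100 / abs(odds)


def ev_per_unit(true_prob, odds):
    """Expected profit per unit staked: p * d - 1."""
    return true_prob * american_to_decimal(odds) - 1


def break_even_prob(odds):
    """Win probability at which the bet has zero expected value."""
    return 1 / american_to_decimal(odds)


# Model probabilities are the illustrative values from the example above
for team, odds, p in [("Yankees", -150, 0.62), ("Red Sox", 120, 0.47)]:
    print(f"{team}: break-even {break_even_prob(odds):.2%}, "
          f"model {p:.2%}, EV per unit {ev_per_unit(p, odds):+.4f}")
```

The break-even probability equals the vig-inclusive implied probability, so a positive edge in `calculate_ev` and a positive EV always agree in sign.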
Kelly Criterion for Optimal Bet Sizing
The Kelly Criterion determines optimal bet size based on edge and odds:
Kelly % = (bp - q) / b
Where:
- b = decimal odds - 1
- p = probability of winning
- q = probability of losing (1 - p)
# R: Kelly Criterion implementation
kelly_criterion <- function(prob, odds, kelly_fraction = 1) {
  # Convert American odds to decimal
  if (odds > 0) {
    decimal_odds <- 1 + (odds / 100)
  } else {
    decimal_odds <- 1 + (100 / abs(odds))
  }
  b <- decimal_odds - 1
  p <- prob
  q <- 1 - prob
  # Kelly percentage (fraction of bankroll)
  kelly_pct <- (b * p - q) / b
  # Apply fractional Kelly (more conservative)
  adjusted_kelly <- kelly_pct * kelly_fraction
  # Never bet if Kelly is negative
  bet_size <- max(0, adjusted_kelly)
  return(list(
    kelly_pct = kelly_pct,
    adjusted_kelly = adjusted_kelly,
    bet_size = bet_size,
    recommendation = ifelse(bet_size > 0, "BET", "NO BET")
  ))
}

# Example: Calculate Kelly bet sizes
games <- data.frame(
  team = c("Yankees", "Red Sox", "Dodgers", "Giants"),
  odds = c(-150, 120, -180, 160),
  model_prob = c(0.62, 0.47, 0.68, 0.41)
)
# Apply full Kelly and quarter Kelly (more conservative).
# SIMPLIFY = FALSE keeps one result list per game instead of a matrix.
kelly_full <- mapply(kelly_criterion, games$model_prob, games$odds,
                     MoreArgs = list(kelly_fraction = 1), SIMPLIFY = FALSE)
kelly_quarter <- mapply(kelly_criterion, games$model_prob, games$odds,
                        MoreArgs = list(kelly_fraction = 0.25), SIMPLIFY = FALSE)
# Extract bet sizes as percentages of bankroll
games$full_kelly_pct <- sapply(kelly_full, function(x) x$bet_size * 100)
games$quarter_kelly_pct <- sapply(kelly_quarter, function(x) x$bet_size * 100)
print(games[, c("team", "odds", "model_prob", "full_kelly_pct", "quarter_kelly_pct")])
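For readers following along in Python, the Kelly formula can be sketched as below. This is a minimal illustration of Kelly % = (bp − q) / b under the definitions given above; the function name `kelly_stake_fraction` is an assumption for this example.

```python
def kelly_stake_fraction(prob, odds, fraction=1.0):
    """Fraction of bankroll to stake under (fractional) Kelly.

    prob: estimated win probability; odds: American odds;
    fraction: Kelly multiplier (e.g. 0.25 for quarter Kelly).
    Returns 0.0 when the bet has no positive edge.
    """
    # b is the net profit per unit staked (decimal odds - 1)
    b = odds / 100 if odds > 0 else 100 / abs(odds)
    q = 1 - prob
    full_kelly = (b * prob - q) / b
    return max(0.0, full_kelly * fraction)


# A 55% bettor at standard -110 odds
full = kelly_stake_fraction(0.55, -110)
quarter = kelly_stake_fraction(0.55, -110, fraction=0.25)
print(f"Full Kelly: {full:.2%} of bankroll")
print(f"Quarter Kelly: {quarter:.2%} of bankroll")
```

At -110, b = 100/110, so bp − q = 0.50 − 0.45 = 0.05 and the full Kelly stake is 0.05 / (100/110) = 5.5% of bankroll.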
Simulating Kelly Performance
# Python: Monte Carlo simulation of Kelly vs. flat betting
import numpy as np


def simulate_betting_strategy(n_bets=1000,
                              win_prob=0.55,
                              odds=-110,
                              strategy='kelly',
                              initial_bankroll=10000,
                              kelly_fraction=0.25,
                              flat_bet_pct=0.02):
    """
    Simulate betting strategy performance

    Parameters:
    - n_bets: Number of bets to simulate
    - win_prob: Probability of winning each bet
    - odds: American odds
    - strategy: 'kelly' or 'flat' (any other value stakes 1% of bankroll)
    - initial_bankroll: Starting bankroll
    - kelly_fraction: Fraction of Kelly to use
    - flat_bet_pct: Percentage of current bankroll for flat betting
    """
    bankroll = initial_bankroll
    bankroll_history = [bankroll]
    # Convert odds to payout multiplier (net profit per unit staked)
    if odds > 0:
        payout_multiplier = odds / 100
    else:
        payout_multiplier = 100 / abs(odds)
    for _ in range(n_bets):
        # Determine bet size based on strategy
        if strategy == 'kelly':
            decimal_odds = 1 + payout_multiplier
            b = decimal_odds - 1
            kelly_pct = (b * win_prob - (1 - win_prob)) / b
            bet_size = bankroll * max(0, kelly_pct * kelly_fraction)
        elif strategy == 'flat':
            # A fixed percentage of the current bankroll, not a fixed dollar amount
            bet_size = bankroll * flat_bet_pct
        else:
            # Placeholder only: a true martingale (doubling after losses)
            # is not implemented and is not recommended
            bet_size = bankroll * 0.01
        # Ensure we don't bet more than bankroll
        bet_size = min(bet_size, bankroll)
        if bet_size == 0:
            break
        # Simulate bet outcome
        won = np.random.random() < win_prob
        if won:
            bankroll += bet_size * payout_multiplier
        else:
            bankroll -= bet_size
        bankroll_history.append(bankroll)
        # Stop if bankrupt
        if bankroll <= 0:
            break
    return np.array(bankroll_history)


# Run simulation comparison
np.random.seed(42)
n_simulations = 100
kelly_results = []
flat_results = []
for _ in range(n_simulations):
    kelly_results.append(simulate_betting_strategy(
        n_bets=1000,
        win_prob=0.55,
        strategy='kelly',
        kelly_fraction=0.25
    ))
    flat_results.append(simulate_betting_strategy(
        n_bets=1000,
        win_prob=0.55,
        strategy='flat',
        flat_bet_pct=0.02
    ))
# Calculate average outcomes
kelly_avg = np.mean([sim[-1] for sim in kelly_results])
flat_avg = np.mean([sim[-1] for sim in flat_results])
print(f"Kelly (25% fraction) average final bankroll: ${kelly_avg:.2f}")
print(f"Flat betting average final bankroll: ${flat_avg:.2f}")
print(f"Kelly advantage: {(kelly_avg / flat_avg - 1) * 100:.1f}%")
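Average final bankroll can be dominated by a handful of unusually good runs, because compounding makes the outcome distribution right-skewed. As a complement, the following sketch summarizes bankroll paths by median final value and worst drawdown. It uses plain Python lists and small made-up paths for illustration; `max_drawdown` and `summarize_paths` are hypothetical helper names.

```python
from statistics import median


def max_drawdown(path):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = path[0]
    worst = 0.0
    for value in path:
        peak = max(peak, value)
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


def summarize_paths(paths):
    """Median final bankroll and worst drawdown across simulated paths."""
    finals = [p[-1] for p in paths]
    return {
        "median_final": median(finals),
        "worst_drawdown": max(max_drawdown(p) for p in paths),
    }


# Made-up bankroll paths, for illustration only
example_paths = [[100, 120, 90, 110], [100, 95, 105, 130]]
summary = summarize_paths(example_paths)
print(f"Median final bankroll: {summary['median_final']:.2f}")
print(f"Worst drawdown: {summary['worst_drawdown']:.1%}")
```

The same functions accept the arrays returned by a bankroll simulation, since they only rely on indexing and iteration.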
Modeling Player Performance
Player props require different modeling approaches than game outcomes. We need to predict distributions rather than binary outcomes.
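To see why a distribution matters, consider an over/under line on a count stat. Given only a predicted mean, a Poisson assumption turns that mean into a probability of clearing the line. The sketch below uses only the standard library; the mean of 1.2 and the line of 1.5 are made-up values for illustration, and the Poisson model itself is an assumption that may understate variance for some stats.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_over(line, lam):
    """P(X > line) for X ~ Poisson(lam).

    For a half-point line such as 1.5 there is no push; for a whole-number
    line, landing exactly on the line counts as not over.
    """
    threshold = math.floor(line)
    cdf = sum(poisson_pmf(k, lam) for k in range(threshold + 1))
    return 1 - cdf


# Hypothetical prop: over 1.5 with a predicted mean of 1.2
p_over = prob_over(1.5, 1.2)
print(f"P(over 1.5) = {p_over:.4f}")
print(f"P(under 1.5) = {1 - p_over:.4f}")
```

These probabilities can then be compared against the implied probabilities of the posted over and under odds in the same way as for game outcomes.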
# Python: Player prop modeling with Statcast data
import pandas as pd
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
class PlayerPropModel:
"""Model for predicting player performance props"""
def __init__(self, stat_type='hits'):
"""
stat_type: 'hits', 'strikeouts', 'home_runs', 'total_bases', etc.
"""
self.stat_type = stat_type
self.model = None
def create_features(self, player_df, pitcher_df, matchup_data):
"""Create features for player prop prediction"""
features = {
# Recent performance
'last_7_avg': player_df['last_7_games'][self.stat_type].mean(),
'last_30_avg': player_df['last_30_games'][self.stat_type].mean(),
'season_avg': player_df['season'][self.stat_type].mean(),
# Advanced metrics
'exit_velocity_avg': player_df['exit_velocity'].mean(),
'hard_hit_pct': player_df['hard_hit_rate'].mean(),
'barrel_pct': player_df['barrel_rate'].mean(),
'xwOBA': player_df['xwOBA'].mean(),
# Matchup specific
'vs_pitcher_hand': matchup_data['vs_hand_splits'][self.stat_type],
'vs_pitcher_historical': matchup_data['vs_pitcher'][self.stat_type],
# Pitcher strength
'pitcher_k_rate': pitcher_df['k_rate'],
'pitcher_whip': pitcher_df['whip'],
'pitcher_xFIP': pitcher_df['xfip'],
# Context
'home_away': matchup_data['is_home'],
'park_factor': matchup_data['park_factor'],
'batting_order_pos': matchup_data['lineup_position']
}
return pd.Series(features)
def predict_distribution(self, features):
"""Predict distribution of player stat"""
# Use Poisson or Negative Binomial for count stats
if self.stat_type in ['hits', 'strikeouts', 'home_runs']:
# Predict lambda (mean) of Poisson distribution
predicted_mean = self.model.predict([features])[0]
# Generate probability distribution
max_value = int(predicted_mean * 3 + 10)
values = np.arange(0, max_value)
probabilities = stats.poisson.pmf(values, predicted_mean)
return {
'values': values,
'probabilities': probabilities,
'mean': predicted_mean,
'median': stats.poisson.median(predicted_mean),
'std': np.sqrt(predicted_mean)
}
else:
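# For continuous stats, use normal distribution
predicted_mean = self.model.predict([features])[0]
# NOTE: estimate_std is not defined in this class; it must be provided
# (for example by a subclass) before this branch can run
predicted_std = self.estimate_std(features)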
return {
'mean': predicted_mean,
'std': predicted_std,
'distribution': 'normal'
}
def calculate_prop_ev(self, line, over_odds, under_odds, predicted_dist):
"""Calculate EV for over/under prop"""
# Probability of going over
if predicted_dist.get('distribution') == 'normal':
prob_over = 1 - stats.norm.cdf(
line,
predicted_dist['mean'],
predicted_dist['std']
)
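else:
# For discrete distributions (Poisson): OVER wins only if X > line.
# sf(k) = P(X > k); flooring the line handles half-point lines (0.5, 1.5)
# and whole-number lines, where X == line is a push rather than a win
prob_over = stats.poisson.sf(
np.floor(line),
predicted_dist['mean']
)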
prob_under = 1 - prob_over
# Calculate EV for both sides
over_ev = calculate_ev(prob_over, over_odds)
under_ev = calculate_ev(prob_under, under_odds)
return {
'prob_over': prob_over,
'prob_under': prob_under,
'over_ev': over_ev,
'under_ev': under_ev,
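# Only recommend a side when its EV is positive
'best_bet': ('NO BET' if max(over_ev['ev_percent'], under_ev['ev_percent']) <= 0
else 'OVER' if over_ev['ev_percent'] > under_ev['ev_percent']
else 'UNDER')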
}
# Example: Aaron Judge home run prop
def analyze_hr_prop(player_data, pitcher_data, matchup_data, line=0.5,
over_odds=180, under_odds=-240):
"""Analyze home run prop bet"""
prop_model = PlayerPropModel(stat_type='home_runs')
# Assumes prop_model.model is then set to a fitted regressor; it is None by default
# Create features
features = prop_model.create_features(player_data, pitcher_data, matchup_data)
# Predict HR distribution
hr_dist = prop_model.predict_distribution(features)
print(f"Predicted HR mean: {hr_dist['mean']:.3f}")
print(f"Probability of 1+ HR: {1 - hr_dist['probabilities'][0]:.1%}")
print(f"Probability of 2+ HR: {1 - sum(hr_dist['probabilities'][:2]):.1%}")
# Calculate prop EV
prop_analysis = prop_model.calculate_prop_ev(
line, over_odds, under_odds, hr_dist
)
print(f"\nProp Analysis (Line: {line})")
print(f"Probability OVER: {prop_analysis['prob_over']:.1%}")
print(f"OVER EV: {prop_analysis['over_ev']['ev_percent']:.2f}%")
print(f"UNDER EV: {prop_analysis['under_ev']['ev_percent']:.2f}%")
print(f"Recommendation: {prop_analysis['best_bet']}")
return prop_analysis
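The class above needs a fitted model before it can run, so a standalone version of the core calculation may be easier to follow. The sketch below uses only the standard library to turn a Poisson mean into an over probability and an expected value per unit staked. The helper names and the 0.30 mean are hypothetical; the odds match the defaults in analyze_hr_prop.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_over(line: float, mean: float) -> float:
    """P(X > line); on a whole-number line, X == line is a push, not a win."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(math.floor(line) + 1))


def ev_per_unit(prob: float, american_odds: int) -> float:
    """Expected profit per unit staked at the given American odds."""
    payout = american_odds / 100 if american_odds > 0 else 100 / abs(american_odds)
    return prob * payout - (1 - prob)


mean_hr = 0.30  # hypothetical model output, not a real projection
p_over = prob_over(0.5, mean_hr)
print(f"P(1+ HR): {p_over:.1%}")
print(f"OVER 0.5 at +180 EV: {ev_per_unit(p_over, 180):+.3f} per unit")
print(f"UNDER 0.5 at -240 EV: {ev_per_unit(1 - p_over, -240):+.3f} per unit")
```

Because the line is a half point, no push is possible and the under probability is simply one minus the over probability. On a whole-number line the push probability would have to be handled separately.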
Strikeout Props Using Pitcher-Batter Matchups
# R: Strikeout prop model
# dnbinom() and pnbinom() come from base R's stats package, so no extra library is needed
strikeout_prop_model <- function(pitcher_data, batter_data, lineup) {
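# Pitcher baseline K rate per batter faced.
# K/9 divided by 9 gives strikeouts per inning, so divide again by batters
# faced per inning (about 4.3 league-wide; an approximation, tune to your data)
batters_per_inning <- 4.3
pitcher_k_rate <- pitcher_data$k_per_9 / 9 / batters_per_inning
# Adjust for each batter in lineup
expected_k <- 0
for (i in seq_along(lineup)) {
batter <- lineup[i]
batter_k_rate <- batter_data[batter_data$player_id == batter, ]$k_rate
# Assume pitcher and batter rates are independent (simplified)
# More sophisticated: use matchup data
# Scale by the league-average K rate (about 0.23) and cap the result at 1
matchup_k_prob <- min(pitcher_k_rate * batter_k_rate / 0.23, 1)
# Weight by likelihood of facing this batter
# Top of order faces more batters
# These are full-game plate appearances; scale them down if the pitcher
# is not expected to go the distance
at_bats_expected <- ifelse(i <= 3, 4.5, ifelse(i <= 6, 4, 3.5))
expected_k <- expected_k + (matchup_k_prob * at_bats_expected)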
}
# Model as negative binomial (overdispersed Poisson)
# Estimate dispersion parameter
size_param <- expected_k / 2 # Variance = mean + mean^2/size
# Calculate probabilities for each K total
k_values <- 0:15
k_probs <- dnbinom(k_values, size = size_param, mu = expected_k)
results <- data.frame(
strikeouts = k_values,
probability = k_probs,
cumulative_prob = pnbinom(k_values, size = size_param, mu = expected_k)
)
return(list(
expected_k = expected_k,
distribution = results,
size = size_param
))
}
# Calculate prop value
evaluate_k_prop <- function(model_results, line, over_odds, under_odds) {
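# Probability of going over: P(K > line).
# floor() handles half-point lines (6.5) and whole-number lines alike;
# on a whole-number line, landing exactly on it is a push, not a win
prob_over <- 1 - pnbinom(
floor(line),
size = model_results$size,
mu = model_results$expected_k
)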
prob_under <- 1 - prob_over
# Calculate EV
over_ev <- calculate_ev_r(prob_over, over_odds)
under_ev <- calculate_ev_r(prob_under, under_odds)
cat(sprintf("Expected Strikeouts: %.2f\n", model_results$expected_k))
cat(sprintf("P(Over %.1f): %.2f%%\n", line, prob_over * 100))
cat(sprintf("OVER EV: %.2f%%\n", over_ev$ev_percent))
cat(sprintf("UNDER EV: %.2f%%\n", under_ev$ev_percent))
return(list(
prob_over = prob_over,
prob_under = prob_under,
over_ev = over_ev,
under_ev = under_ev
))
}
calculate_ev_r <- function(prob, odds) {
if (odds > 0) {
payout <- odds / 100
} else {
payout <- 100 / abs(odds)
}
ev <- prob * payout - (1 - prob)
ev_percent <- ev * 100
return(list(ev = ev, ev_percent = ev_percent))
}
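Two steps in the strikeout model are easy to get wrong: converting K/9 to a per-batter rate, and evaluating the negative binomial in the mean/size form that R's dnbinom uses. The sketch below reproduces both in plain Python. The 4.3 batters per inning figure is an assumed league-wide approximation, and the expected strikeout total of 6.0 is hypothetical.

```python
import math


def k_per_9_to_k_rate(k_per_9: float, batters_per_inning: float = 4.3) -> float:
    """Convert K/9 to strikeouts per batter faced.

    batters_per_inning=4.3 is an assumed league-wide approximation.
    """
    return k_per_9 / 9 / batters_per_inning


def nbinom_pmf(k: int, mu: float, size: float) -> float:
    """Negative binomial P(X = k) in the mean/size form used by R's dnbinom."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def nbinom_prob_over(line: float, mu: float, size: float) -> float:
    """P(X > line), treating X == line as a push on whole-number lines."""
    return 1.0 - sum(nbinom_pmf(k, mu, size) for k in range(math.floor(line) + 1))


mu = 6.0       # hypothetical expected strikeouts
size = mu / 2  # same dispersion heuristic as the R code above
print(f"K rate for 9.0 K/9: {k_per_9_to_k_rate(9.0):.3f}")
print(f"P(Over 6.5 K): {nbinom_prob_over(6.5, mu, size):.3f}")
print(f"Implied variance: {mu + mu ** 2 / size:.1f}")
```

With size set to half the mean, the implied variance is three times the mean, which is a strong overdispersion assumption. Fitting the size parameter to historical strikeout totals would be a natural refinement.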
Win Probability Models
Live betting requires real-time win probability updates based on game state.
# Python: In-game win probability model
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
class LiveWinProbabilityModel:
"""Real-time win probability for live betting"""
def __init__(self):
self.model = None
self.historical_states = None
def create_game_state_features(self, game_state):
"""
Extract features from current game state
game_state should include:
- inning, outs, runners, score, pitcher, batter, etc.
"""
features = {
# Score situation
'score_diff': game_state['home_score'] - game_state['away_score'],
'home_score': game_state['home_score'],
'away_score': game_state['away_score'],
# Game progress
'inning': game_state['inning'],
'is_top': game_state['is_top_inning'],
'outs': game_state['outs'],
# Clamped at 0 so extra innings do not produce negative values
'innings_remaining': max(0, 9 - game_state['inning'] + (1 if game_state['is_top_inning'] else 0)),
# Base runners
'runner_1b': game_state['runner_on_1st'],
'runner_2b': game_state['runner_on_2nd'],
'runner_3b': game_state['runner_on_3rd'],
'runners_on': game_state['runner_on_1st'] + game_state['runner_on_2nd'] + game_state['runner_on_3rd'],
# Base-out state (24 states)
'base_out_state': self.calculate_base_out_state(game_state),
# Pitcher/batter matchup
'current_pitcher_era': game_state['pitcher_era'],
'pitcher_pitch_count': game_state['pitch_count'],
'current_batter_woba': game_state['batter_woba'],
# Team strength (pre-game)
'home_team_strength': game_state['home_elo_rating'],
'away_team_strength': game_state['away_elo_rating'],
# Bullpen availability
'home_bullpen_available': game_state['home_bullpen_innings'],
'away_bullpen_available': game_state['away_bullpen_innings']
}
# Interaction features
features['score_diff_per_inning_left'] = (
features['score_diff'] / max(features['innings_remaining'], 1)
)
return pd.Series(features)
def calculate_base_out_state(self, game_state):
"""Calculate base-out state (0-23)"""
bases = (game_state['runner_on_1st'] * 1 +
game_state['runner_on_2nd'] * 2 +
game_state['runner_on_3rd'] * 4)
return bases * 3 + game_state['outs']
def train_from_historical_games(self, historical_play_by_play):
"""Train model on historical play-by-play data"""
features_list = []
labels = []
for game_id, game_data in historical_play_by_play.groupby('game_id'):
home_won = game_data.iloc[-1]['home_won']
for idx, play in game_data.iterrows():
# Skip if game is over
if play['game_over']:
continue
state_features = self.create_game_state_features(play)
features_list.append(state_features)
labels.append(home_won)
X = pd.DataFrame(features_list)
y = np.array(labels)
# Train gradient boosting model
self.model = GradientBoostingClassifier(
n_estimators=200,
max_depth=5,
learning_rate=0.1,
random_state=42
)
self.model.fit(X, y)
print(f"Model trained on {len(X)} game states")
# In-sample accuracy is optimistic; evaluate on held-out games before trusting it
print(f"Training accuracy (in-sample): {self.model.score(X, y):.3f}")
def predict_win_probability(self, game_state):
"""Predict home team win probability for current game state"""
features = self.create_game_state_features(game_state)
features_df = pd.DataFrame([features])
win_prob = self.model.predict_proba(features_df)[0, 1]
return {
'home_win_prob': win_prob,
'away_win_prob': 1 - win_prob
}
def find_live_betting_opportunities(self, game_state, live_odds):
"""Compare model probability to live betting odds"""
model_probs = self.predict_win_probability(game_state)
# Convert live odds to probabilities
home_implied = american_to_prob(live_odds['home_ml'])
away_implied = american_to_prob(live_odds['away_ml'])
# Calculate edges
home_edge = model_probs['home_win_prob'] - home_implied
away_edge = model_probs['away_win_prob'] - away_implied
# Calculate EV
home_ev = calculate_ev(model_probs['home_win_prob'], live_odds['home_ml'])
away_ev = calculate_ev(model_probs['away_win_prob'], live_odds['away_ml'])
recommendation = None
if home_ev['ev_percent'] > 2: # 2% EV threshold
recommendation = 'BET HOME'
elif away_ev['ev_percent'] > 2:
recommendation = 'BET AWAY'
else:
recommendation = 'NO BET'
return {
'model_home_prob': model_probs['home_win_prob'],
'implied_home_prob': home_implied,
'home_edge': home_edge,
'home_ev': home_ev['ev_percent'],
'away_ev': away_ev['ev_percent'],
'recommendation': recommendation
}
# Example: Analyze live betting situation
def live_bet_example():
"""Example of live betting analysis"""
# Current game state (top of 7th, tie game)
game_state = {
'home_score': 3,
'away_score': 3,
'inning': 7,
'is_top_inning': True,
'outs': 1,
'runner_on_1st': True,
'runner_on_2nd': False,
'runner_on_3rd': False,
'pitcher_era': 3.85,
'pitch_count': 89,
'batter_woba': 0.340,
'home_elo_rating': 1520,
'away_elo_rating': 1480,
'home_bullpen_innings': 8.0,
'away_bullpen_innings': 6.5,
'home_won': None # To be predicted
}
# Live betting odds
live_odds = {
'home_ml': -125,
'away_ml': +105
}
model = LiveWinProbabilityModel()
# The model must be trained before this call will work, e.g.
# model.train_from_historical_games(historical_play_by_play)
analysis = model.find_live_betting_opportunities(game_state, live_odds)
print("Live Betting Analysis:")
print(f"Model Home Win Prob: {analysis['model_home_prob']:.1%}")
print(f"Implied Home Win Prob: {analysis['implied_home_prob']:.1%}")
print(f"Edge: {analysis['home_edge']:.1%}")
print(f"Home EV: {analysis['home_ev']:.2f}%")
print(f"Away EV: {analysis['away_ev']:.2f}%")
print(f"Recommendation: {analysis['recommendation']}")
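train_from_historical_games reports accuracy on the same game states used for fitting, and every state from one game shares the same label, so that figure says little about live performance. A minimal evaluation sketch, assuming each state carries a game identifier, is to hold out whole games and score predicted probabilities with the Brier score. The function names and the toy game IDs below are hypothetical.

```python
import random


def split_by_game(game_ids, test_fraction=0.2, seed=42):
    """Assign whole games to train or test so no game appears in both."""
    unique_games = sorted(set(game_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_games)
    n_test = max(1, int(len(unique_games) * test_fraction))
    test_games = set(unique_games[:n_test])
    train_idx = [i for i, g in enumerate(game_ids) if g not in test_games]
    test_idx = [i for i, g in enumerate(game_ids) if g in test_games]
    return train_idx, test_idx


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy data: five games with three recorded states each (hypothetical IDs)
game_ids = [g for g in ["g1", "g2", "g3", "g4", "g5"] for _ in range(3)]
train_idx, test_idx = split_by_game(game_ids)
print(f"Train states: {len(train_idx)}, test states: {len(test_idx)}")

# A constant 50% forecast always scores 0.25, a useful baseline to beat
print(f"Baseline Brier score: {brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]):.3f}")
```

For betting purposes a probability-based score matters more than accuracy, because the edge calculation depends on how well calibrated the win probability is, not just on which side the model favors.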
Run Expectancy Matrix for Live Betting
# R: Run expectancy and win probability
library(dplyr)
# Create run expectancy matrix from historical data
create_re_matrix <- function(play_by_play_data) {
# Calculate runs scored in remainder of inning for each state
re_matrix <- play_by_play_data %>%
group_by(outs, runner_1b, runner_2b, runner_3b) %>%
summarise(
avg_runs_scored = mean(runs_end_of_inning - runs_start_of_play),
.groups = 'drop'
) %>%
arrange(outs, runner_1b, runner_2b, runner_3b)
return(re_matrix)
}
# Win probability added (WPA) for each play
calculate_wpa <- function(win_prob_before, win_prob_after) {
return(win_prob_after - win_prob_before)
}
# Leverage index - how important is this game situation?
# predict_wp() and simulate_outcome() are assumed helper functions defined elsewhere
calculate_leverage <- function(game_state, wp_model) {
# Calculate WP change for all possible outcomes
outcomes <- c('out', 'single', 'double', 'triple', 'hr', 'walk')
wp_changes <- numeric(length(outcomes))
base_wp <- predict_wp(game_state, wp_model)
for (i in seq_along(outcomes)) {
new_state <- simulate_outcome(game_state, outcomes[i])
new_wp <- predict_wp(new_state, wp_model)
wp_changes[i] <- abs(new_wp - base_wp)
}
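# Expected absolute WP change, weighted by outcome probability.
# A published leverage index divides this by the league-average swing so
# that an average situation scores 1.0; that normalization is omitted here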
outcome_probs <- c(0.65, 0.15, 0.05, 0.01, 0.03, 0.11) # Rough estimates
leverage <- sum(wp_changes * outcome_probs)
return(leverage)
}
# Live betting edge calculation
live_betting_edge <- function(model_wp, live_odds_home, live_odds_away) {
implied_home <- american_to_prob(live_odds_home)
implied_away <- american_to_prob(live_odds_away)
# Remove vig
total <- implied_home + implied_away
fair_home <- implied_home / total
fair_away <- implied_away / total
# Compare to model
home_edge <- model_wp - fair_home
away_edge <- (1 - model_wp) - fair_away
return(list(
home_edge = home_edge,
away_edge = away_edge,
model_wp = model_wp,
fair_home = fair_home,
fair_away = fair_away
))
}
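The vig-removal step in live_betting_edge is simple arithmetic, and a worked case makes the size of the adjustment concrete. The sketch below applies the same proportional normalization in Python to the -125 / +105 live line used earlier. The function name remove_vig and the 0.56 model probability are hypothetical.

```python
def american_to_prob(odds: float) -> float:
    """Implied probability from American odds (includes the vig)."""
    return 100 / (odds + 100) if odds > 0 else abs(odds) / (abs(odds) + 100)


def remove_vig(home_odds: float, away_odds: float) -> tuple[float, float]:
    """Proportionally rescale both implied probabilities so they sum to 1."""
    home, away = american_to_prob(home_odds), american_to_prob(away_odds)
    total = home + away
    return home / total, away / total


fair_home, fair_away = remove_vig(-125, 105)
model_wp = 0.56  # hypothetical model output for the home team
print(f"Fair home: {fair_home:.3f}, fair away: {fair_away:.3f}")
print(f"Home edge vs. no-vig line: {model_wp - fair_home:+.3f}")
```

Proportional scaling is only one way to strip the margin. It assumes the book spreads its vig evenly across both sides, which may not hold for heavy favorites.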
Bankroll Management Strategies
Proper bankroll management is critical for long-term success in sports betting: even a bettor with a real edge can go broke by staking too much on each bet.
# Python: Comprehensive bankroll management system
import numpy as np
import pandas as pd
class BankrollManager:
"""Advanced bankroll management system"""
def __init__(self, initial_bankroll, max_bet_pct=0.05, kelly_fraction=0.25):
self.initial_bankroll = initial_bankroll
self.current_bankroll = initial_bankroll
self.max_bet_pct = max_bet_pct
self.kelly_fraction = kelly_fraction
self.bet_history = []
def calculate_bet_size(self, edge, odds, method='fractional_kelly'):
"""
Calculate appropriate bet size
Methods:
- 'fractional_kelly': Conservative Kelly (recommended)
- 'full_kelly': Full Kelly (high variance)
- 'fixed_unit': Fixed percentage of bankroll
- 'fixed_dollar': Fixed dollar amount
"""
if method in ('fractional_kelly', 'full_kelly'):
# Convert edge to win probability
implied_prob = american_to_prob(odds)
true_prob = implied_prob + edge
# Net payout per unit staked
b = odds / 100 if odds > 0 else 100 / abs(odds)
# Kelly fraction f* = (b*p - q) / b, never negative
kelly_pct = max(0, (b * true_prob - (1 - true_prob)) / b)
# Scale down for fractional Kelly only
fraction = self.kelly_fraction if method == 'fractional_kelly' else 1.0
# Apply maximum bet constraint
bet_pct = min(kelly_pct * fraction, self.max_bet_pct)
bet_size = self.current_bankroll * bet_pct
elif method == 'fixed_unit':
bet_size = self.current_bankroll * 0.01 # 1 unit = 1%
elif method == 'fixed_dollar':
bet_size = 100 # Fixed $100
else:
# Fail loudly instead of silently falling back to full Kelly
raise ValueError(f"Unknown method: {method}")
return {
'bet_size': bet_size,
'bet_pct': bet_size / self.current_bankroll,
'method': method
}
def place_bet(self, bet_size, odds, won):
"""Record bet and update bankroll"""
# Calculate payout
if odds > 0:
payout = bet_size * (odds / 100) if won else -bet_size
else:
payout = bet_size * (100 / abs(odds)) if won else -bet_size
# Update bankroll
self.current_bankroll += payout
# Record bet
self.bet_history.append({
'bet_size': bet_size,
'odds': odds,
'won': won,
'payout': payout,
'bankroll_after': self.current_bankroll,
'roi': (payout / bet_size) if bet_size > 0 else 0
})
return {
'payout': payout,
'new_bankroll': self.current_bankroll,
'total_roi': (self.current_bankroll / self.initial_bankroll - 1)
}
def get_statistics(self):
"""Calculate bankroll statistics"""
if not self.bet_history:
return {}
history_df = pd.DataFrame(self.bet_history)
total_bets = len(history_df)
wins = history_df['won'].sum()
losses = total_bets - wins
win_rate = wins / total_bets if total_bets > 0 else 0
total_wagered = history_df['bet_size'].sum()
total_profit = history_df['payout'].sum()
roi = (total_profit / total_wagered) if total_wagered > 0 else 0
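# Calculate maximum drawdown, measuring peaks from the initial bankroll
# so that losses at the very start of the history are counted
bankroll_series = history_df['bankroll_after']
running_max = bankroll_series.cummax().clip(lower=self.initial_bankroll)
drawdown = (bankroll_series - running_max) / running_max
max_drawdown = drawdown.min()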
# Sharpe ratio (simplified)
returns = history_df['roi']
sharpe = (returns.mean() / returns.std()) if returns.std() > 0 else 0
return {
'total_bets': total_bets,
'wins': wins,
'losses': losses,
'win_rate': win_rate,
'total_wagered': total_wagered,
'total_profit': total_profit,
'roi': roi,
'current_bankroll': self.current_bankroll,
'total_return': (self.current_bankroll / self.initial_bankroll - 1),
'max_drawdown': max_drawdown,
'sharpe_ratio': sharpe,
'avg_bet_size': history_df['bet_size'].mean(),
'largest_bet': history_df['bet_size'].max()
}
def risk_of_ruin(self, win_prob, avg_odds, n_bets=100):
"""
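Approximate the probability of losing the entire bankroll.
Uses a simplified risk-of-ruin formula that assumes flat bets of a fixed
unit size over an unlimited horizon; n_bets is currently unused.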
"""
# Convert odds to win/loss ratio
if avg_odds > 0:
win_loss_ratio = avg_odds / 100
else:
win_loss_ratio = 100 / abs(avg_odds)
# Risk of ruin formula
# RoR = ((1-p)/p * (1/w))^B where p=win prob, w=win/loss ratio, B=bankroll units
lose_prob = 1 - win_prob
if win_prob * (1 + win_loss_ratio) > 1: # Positive expectation
# Units in bankroll (assuming 1% per bet)
units = 100
ror = ((lose_prob / win_prob) * (1 / win_loss_ratio)) ** units
else: # Negative expectation
ror = 1.0 # Certain ruin in long run
return min(ror, 1.0)
# Example: Simulate betting season
def simulate_betting_season(n_games=162,
edge=0.03,
avg_odds=-110,
win_rate=0.55,
initial_bankroll=10000):
"""Simulate a full betting season"""
manager = BankrollManager(
initial_bankroll=initial_bankroll,
kelly_fraction=0.25
)
np.random.seed(42)
for i in range(n_games):
# Calculate bet size
bet_info = manager.calculate_bet_size(edge, avg_odds, method='fractional_kelly')
# Simulate outcome
won = np.random.random() < win_rate
# Place bet
manager.place_bet(bet_info['bet_size'], avg_odds, won)
# Get final statistics
stats = manager.get_statistics()
print("Season Betting Results:")
print(f"Total Bets: {stats['total_bets']}")
print(f"Win Rate: {stats['win_rate']:.1%}")
print(f"Total Wagered: ${stats['total_wagered']:,.2f}")
print(f"Total Profit: ${stats['total_profit']:,.2f}")
print(f"ROI: {stats['roi']:.2%}")
print(f"Final Bankroll: ${stats['current_bankroll']:,.2f}")
print(f"Total Return: {stats['total_return']:.2%}")
print(f"Max Drawdown: {stats['max_drawdown']:.2%}")
print(f"Sharpe Ratio: {stats['sharpe_ratio']:.2f}")
return manager
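Maximum drawdown is easiest to understand on a short bankroll path. The sketch below is a plain-Python version of the calculation. It starts the running peak at the initial bankroll so that losses at the beginning of a season are counted, and the path values are made up for illustration.

```python
def max_drawdown(bankroll_history, initial_bankroll):
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    peak = initial_bankroll
    worst = 0.0
    for value in bankroll_history:
        # Track the highest bankroll seen so far, including the starting value
        peak = max(peak, value)
        worst = min(worst, (value - peak) / peak)
    return worst


history = [9800, 9500, 10200, 10800, 9700, 10100]  # hypothetical bankroll path
print(f"Max drawdown: {max_drawdown(history, 10000):.2%}")
```

Here the deepest decline is the fall from the 10,800 peak to 9,700, which is larger than the early dip from the 10,000 starting bankroll to 9,500.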
Risk Assessment and Portfolio Theory
# R: Portfolio theory applied to sports betting
library(ggplot2)
# Variance of Kelly betting
kelly_variance <- function(edge, win_prob, n_bets) {
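# Edge is defined as win_prob minus the market-implied probability
implied_prob <- win_prob - edge
# Net payout per unit staked at the implied odds
b <- (1 - implied_prob) / implied_prob
# Kelly fraction: f* = (b*p - q) / b = edge / (1 - implied_prob)
kelly_pct <- edge / (1 - implied_prob)
# Variance of log returns
log_win <- log(1 + b * kelly_pct)
log_lose <- log(1 - kelly_pct)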
expected_log_return <- win_prob * log_win + (1 - win_prob) * log_lose
variance_log_return <- win_prob * (log_win - expected_log_return)^2 +
(1 - win_prob) * (log_lose - expected_log_return)^2
# Portfolio variance over n bets
total_variance <- n_bets * variance_log_return
return(list(
expected_return = expected_log_return * n_bets,
variance = total_variance,
std_dev = sqrt(total_variance)
))
}
# Diversification across multiple bet types
portfolio_correlation <- function(bet_types) {
# Correlation matrix for different bet types
# Game totals vs moneylines typically have low correlation
# Player props vs game outcomes have moderate correlation
n <- length(bet_types)
cor_matrix <- matrix(0.3, n, n) # Assume 0.3 correlation
diag(cor_matrix) <- 1
return(cor_matrix)
}
# Optimal portfolio allocation
optimal_bet_allocation <- function(edges, variances, correlations, total_bankroll) {
# Mean-variance optimization
# Maximize expected return for given risk level
# This is simplified - in practice would use quadratic programming
# Here we use a heuristic based on Sharpe ratio; the correlations argument
# is accepted but not used by this heuristic
n_bets <- length(edges)
sharpe_ratios <- edges / sqrt(variances)
# Allocate based on Sharpe ratios
weights <- sharpe_ratios / sum(sharpe_ratios)
allocations <- weights * total_bankroll
return(data.frame(
bet_type = paste0("Bet_", 1:n_bets),
edge = edges,
variance = variances,
sharpe = sharpe_ratios,
weight = weights,
allocation = allocations
))
}
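The R function `kelly_variance` credits a win at even money. As a complementary sketch in Python, the version below takes the American odds explicitly, so the win is credited at the actual net payout. The function name `kelly_log_growth` and its parameters are assumptions made for illustration, and bets are assumed independent with a constant stake fraction.

```python
import math


def kelly_log_growth(win_prob, american_odds, kelly_fraction=1.0, n_bets=1):
    """Mean and variance of log-bankroll growth over n_bets independent bets."""
    # Net payout per unit staked
    b = american_odds / 100 if american_odds > 0 else 100 / abs(american_odds)
    # Full Kelly stake, floored at zero when there is no edge
    full_kelly = (b * win_prob - (1 - win_prob)) / b
    f = max(0.0, full_kelly) * kelly_fraction
    log_win = math.log(1 + b * f)
    log_lose = math.log(1 - f)
    mean = win_prob * log_win + (1 - win_prob) * log_lose
    var = (win_prob * (log_win - mean) ** 2
           + (1 - win_prob) * (log_lose - mean) ** 2)
    return {
        "stake_fraction": f,
        "expected_log_return": n_bets * mean,
        "variance": n_bets * var,
        "std_dev": math.sqrt(n_bets * var),
    }


# Compare full Kelly with quarter Kelly over a 162-bet season
for frac in (1.0, 0.25):
    print(frac, kelly_log_growth(0.55, -110, kelly_fraction=frac, n_bets=162))
```

Comparing the two rows shows the usual trade-off: the fractional stake gives up some expected log growth in exchange for a much smaller standard deviation.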
Complete End-to-End Pipeline
# Python: Complete betting model pipeline
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import joblib
class MLBBettingPipeline:
"""End-to-end pipeline for MLB betting predictions"""
def __init__(self):
self.game_model = None
self.prop_models = {}
self.bankroll_manager = None
self.feature_cache = {}
def fetch_data(self, date):
"""Fetch all required data for given date"""
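# This would connect to your data sources
# For example: MLB API, Statcast, odds API
# Note: fetch_team_stats, fetch_pitcher_stats, fetch_odds, fetch_weather and
# fetch_lineups are not defined in this listing; implement them for your sources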
data = {
'games': self.fetch_todays_games(date),
'team_stats': self.fetch_team_stats(date),
'pitcher_stats': self.fetch_pitcher_stats(date),
'odds': self.fetch_odds(date),
'weather': self.fetch_weather(date),
'lineups': self.fetch_lineups(date)
}
return data
def fetch_todays_games(self, date):
"""Fetch today's game schedule"""
# Placeholder - would call MLB API
return pd.DataFrame({
'game_id': [1, 2, 3],
'date': [date] * 3,
'home_team': ['NYY', 'BOS', 'LAD'],
'away_team': ['TOR', 'TB', 'SF'],
'home_pitcher': ['Cole', 'Sale', 'Kershaw'],
'away_pitcher': ['Gausman', 'Glasnow', 'Webb']
})
def engineer_features(self, data):
"""Create all features for predictions"""
games = data['games']
features_list = []
for idx, game in games.iterrows():
game_features = {
'game_id': game['game_id'],
'date': game['date'],
'home_team': game['home_team'],
'away_team': game['away_team']
}
# Add team stats
home_stats = self.get_team_features(
game['home_team'],
game['date'],
data['team_stats']
)
away_stats = self.get_team_features(
game['away_team'],
game['date'],
data['team_stats']
)
# Prefix features
for key, val in home_stats.items():
game_features[f'home_{key}'] = val
for key, val in away_stats.items():
game_features[f'away_{key}'] = val
# Add pitcher stats
home_pitcher_stats = self.get_pitcher_features(
game['home_pitcher'],
game['date'],
data['pitcher_stats']
)
away_pitcher_stats = self.get_pitcher_features(
game['away_pitcher'],
game['date'],
data['pitcher_stats']
)
for key, val in home_pitcher_stats.items():
game_features[f'home_pitcher_{key}'] = val
for key, val in away_pitcher_stats.items():
game_features[f'away_pitcher_{key}'] = val
# Add contextual features
contextual = self.get_contextual_features(game, data)
game_features.update(contextual)
features_list.append(game_features)
return pd.DataFrame(features_list)
def get_team_features(self, team, date, team_stats):
"""Extract team-level features"""
team_data = team_stats[
(team_stats['team'] == team) &
(team_stats['date'] < date)
].tail(20) # Last 20 games
if len(team_data) == 0:
return self.get_default_team_features()
return {
'win_pct': team_data['win'].mean(),
'runs_per_game': team_data['runs_scored'].mean(),
'runs_allowed_per_game': team_data['runs_allowed'].mean(),
'wOBA': team_data['wOBA'].mean(),
'FIP': team_data['FIP'].mean(),
'bullpen_ERA': team_data['bullpen_ERA'].mean()
}
def get_pitcher_features(self, pitcher, date, pitcher_stats):
"""Extract pitcher-level features"""
pitcher_data = pitcher_stats[
(pitcher_stats['pitcher'] == pitcher) &
(pitcher_stats['date'] < date)
].tail(5) # Last 5 starts
if len(pitcher_data) == 0:
return self.get_default_pitcher_features()
return {
'ERA': pitcher_data['ERA'].mean(),
'FIP': pitcher_data['FIP'].mean(),
'WHIP': pitcher_data['WHIP'].mean(),
'K_per_9': pitcher_data['K_per_9'].mean(),
'BB_per_9': pitcher_data['BB_per_9'].mean()
}
def get_contextual_features(self, game, data):
"""Extract contextual features"""
weather = data['weather'].get(game['game_id'], {})
return {
'temperature': weather.get('temp', 70),
'wind_speed': weather.get('wind', 0),
'precipitation': weather.get('precip', 0),
'day_of_week': game['date'].dayofweek,
'month': game['date'].month
}
def get_default_team_features(self):
"""Return league average features"""
return {
'win_pct': 0.500,
'runs_per_game': 4.5,
'runs_allowed_per_game': 4.5,
'wOBA': 0.320,
'FIP': 4.00,
'bullpen_ERA': 4.00
}
def get_default_pitcher_features(self):
"""Return league average pitcher features"""
return {
'ERA': 4.00,
'FIP': 4.00,
'WHIP': 1.30,
'K_per_9': 8.5,
'BB_per_9': 3.0
}
def make_predictions(self, features):
"""Generate predictions for all games"""
predictions = []
for idx, game_features in features.iterrows():
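# Predict with game model (feature_names is assumed to be set on the model)
X = game_features[self.game_model.feature_names].values.astype(float).reshape(1, -1)
# predict_proba returns one row of class probabilities;
# column 1 is assumed to be the home-win class
home_win_prob = self.game_model.predict_proba(X)[0][1]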
predictions.append({
'game_id': game_features['game_id'],
'home_team': game_features['home_team'],
'away_team': game_features['away_team'],
'home_win_prob': home_win_prob,
'away_win_prob': 1 - home_win_prob
})
return pd.DataFrame(predictions)
def find_betting_opportunities(self, predictions, odds_data, min_ev=2.0):
"""Identify positive EV betting opportunities"""
opportunities = []
for idx, pred in predictions.iterrows():
game_odds = odds_data[odds_data['game_id'] == pred['game_id']].iloc[0]
# Check home moneyline
home_ev = calculate_ev(
pred['home_win_prob'],
game_odds['home_ml']
)
# Check away moneyline
away_ev = calculate_ev(
pred['away_win_prob'],
game_odds['away_ml']
)
# Add positive EV bets
if home_ev['ev_percent'] >= min_ev:
opportunities.append({
'game_id': pred['game_id'],
'bet_type': 'moneyline',
'team': pred['home_team'],
'odds': game_odds['home_ml'],
'model_prob': pred['home_win_prob'],
'ev_percent': home_ev['ev_percent'],
'edge': home_ev['edge']
})
if away_ev['ev_percent'] >= min_ev:
opportunities.append({
'game_id': pred['game_id'],
'bet_type': 'moneyline',
'team': pred['away_team'],
'odds': game_odds['away_ml'],
'model_prob': pred['away_win_prob'],
'ev_percent': away_ev['ev_percent'],
'edge': away_ev['edge']
})
return pd.DataFrame(opportunities)
def calculate_bet_sizes(self, opportunities):
"""Calculate appropriate bet sizes using Kelly"""
for idx, opp in opportunities.iterrows():
bet_size_info = self.bankroll_manager.calculate_bet_size(
edge=opp['edge'],
odds=opp['odds'],
method='fractional_kelly'
)
opportunities.loc[idx, 'bet_size'] = bet_size_info['bet_size']
opportunities.loc[idx, 'bet_pct'] = bet_size_info['bet_pct']
return opportunities
def generate_daily_report(self, date):
"""Generate complete daily betting report"""
print(f"=== MLB Betting Report for {date} ===\n")
# Fetch data
print("Fetching data...")
data = self.fetch_data(date)
# Engineer features
print("Engineering features...")
features = self.engineer_features(data)
# Make predictions
print("Generating predictions...")
predictions = self.make_predictions(features)
# Find opportunities
print("Identifying betting opportunities...")
opportunities = self.find_betting_opportunities(
predictions,
data['odds'],
min_ev=2.0
)
if len(opportunities) == 0:
print("\nNo positive EV opportunities found today.")
return None
# Calculate bet sizes
opportunities = self.calculate_bet_sizes(opportunities)
# Sort by EV
opportunities = opportunities.sort_values('ev_percent', ascending=False)
# Display report
print(f"\n{len(opportunities)} Betting Opportunities Found:\n")
for idx, opp in opportunities.iterrows():
print(f"Team: {opp['team']}")
print(f" Bet Type: {opp['bet_type']}")
print(f" Odds: {opp['odds']:+.0f}")
print(f" Model Probability: {opp['model_prob']:.1%}")
print(f" Edge: {opp['edge']:.2%}")
print(f" EV: {opp['ev_percent']:.2f}%")
print(f" Recommended Bet: ${opp['bet_size']:.2f} ({opp['bet_pct']:.2%} of bankroll)")
print()
return opportunities
# Example usage
def main():
"""Run daily betting pipeline"""
# Initialize pipeline
pipeline = MLBBettingPipeline()
# Load trained models (assumed to be pre-trained)
# Uncomment before running: make_predictions requires game_model to be set
# pipeline.game_model = joblib.load('models/game_model.pkl')
# Initialize bankroll manager
pipeline.bankroll_manager = BankrollManager(
initial_bankroll=10000,
kelly_fraction=0.25,
max_bet_pct=0.05
)
# Generate report for today
today = datetime.now()
opportunities = pipeline.generate_daily_report(today)
return opportunities
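The pipeline relies on a `calculate_ev` helper that returns a dictionary with `ev_percent` and `edge` keys. As a reminder of what such a helper computes, here is a minimal stand-in. The name `calculate_ev_sketch`, the `stake` parameter, and the exact return keys are assumptions for illustration and may differ from the chapter's own definition.

```python
def american_to_prob(odds):
    """Convert American odds to implied probability."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)


def calculate_ev_sketch(model_prob, american_odds, stake=100.0):
    """Expected value of a bet, given a model probability and American odds."""
    # Profit on a winning bet for the given stake
    if american_odds > 0:
        profit_if_win = stake * american_odds / 100
    else:
        profit_if_win = stake * 100 / abs(american_odds)
    ev = model_prob * profit_if_win - (1 - model_prob) * stake
    return {
        "ev": ev,
        "ev_percent": 100 * ev / stake,
        "edge": model_prob - american_to_prob(american_odds),
    }


# Hypothetical underdog: the model says 45% at +130
result = calculate_ev_sketch(0.45, 130)
print(result)
```

With these hypothetical inputs the EV is 3.5% of the stake, which would clear the pipeline's `min_ev=2.0` threshold, while the edge over the implied probability is about 1.5 percentage points. The two measures are related but not interchangeable.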
R Implementation of Pipeline
# R: Betting pipeline implementation
library(dplyr)
library(caret)
library(lubridate)
mlb_betting_pipeline <- function(date, models, bankroll = 10000) {
# Fetch data
cat("Fetching game data for", as.character(date), "\n")
games <- fetch_games(date)
team_stats <- fetch_team_stats(date)
pitcher_stats <- fetch_pitcher_stats(date)
odds_data <- fetch_odds(date)
# Engineer features
cat("Engineering features...\n")
features <- engineer_all_features(games, team_stats, pitcher_stats)
# Make predictions
cat("Generating predictions...\n")
predictions <- predict(models$game_model, features, type = "prob")
features$home_win_prob <- predictions[, "home_win"]
# Merge with odds
betting_data <- features %>%
left_join(odds_data, by = "game_id")
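# Calculate EV for each bet
# (mapply/sapply apply the scalar helpers row by row, because
# american_to_prob uses if/else and is not vectorized)
betting_data <- betting_data %>%
mutate(
home_ml_ev = mapply(function(p, o) calculate_ev_r(p, o)$ev_percent, home_win_prob, home_ml),
away_ml_ev = mapply(function(p, o) calculate_ev_r(p, o)$ev_percent, 1 - home_win_prob, away_ml),
home_edge = home_win_prob - sapply(home_ml, american_to_prob),
away_edge = (1 - home_win_prob) - sapply(away_ml, american_to_prob)
)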
# Find positive EV bets
home_bets <- betting_data %>%
filter(home_ml_ev > 2) %>%
mutate(
team = home_team,
odds = home_ml,
model_prob = home_win_prob,
ev = home_ml_ev,
edge = home_edge
)
away_bets <- betting_data %>%
filter(away_ml_ev > 2) %>%
mutate(
team = away_team,
odds = away_ml,
model_prob = 1 - home_win_prob,
ev = away_ml_ev,
edge = away_edge
)
opportunities <- bind_rows(home_bets, away_bets) %>%
select(game_id, team, odds, model_prob, ev, edge) %>%
arrange(desc(ev))
# Calculate bet sizes
if (nrow(opportunities) > 0) {
opportunities$bet_size <- mapply(
function(edge, odds) {
kelly <- kelly_criterion(
american_to_prob(odds) + edge,
odds,
kelly_fraction = 0.25
)
bankroll * kelly$bet_size
},
opportunities$edge,
opportunities$odds
)
}
# Print report
cat("\n=== Betting Opportunities ===\n")
print(opportunities)
return(opportunities)
}
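# Helper functions (fetch_team_stats and fetch_pitcher_stats are not shown here
# and must be supplied for your data sources)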
fetch_games <- function(date) {
# Placeholder - would fetch from API
data.frame(
game_id = 1:3,
date = date,
home_team = c("NYY", "BOS", "LAD"),
away_team = c("TOR", "TB", "SF")
)
}
fetch_odds <- function(date) {
# Placeholder - would fetch from odds API
data.frame(
game_id = 1:3,
home_ml = c(-150, -120, -180),
away_ml = c(130, 100, 160)
)
}
engineer_all_features <- function(games, team_stats, pitcher_stats) {
# Feature engineering
# Placeholder
games
}
Exercise 1: Odds Conversion (Easy)
Convert the following betting lines to implied probabilities and calculate the vig:
a) Yankees -180 vs Red Sox +160
b) Dodgers -250 vs Giants +210
c) Astros +105 vs Rangers -125
Tasks:
- Calculate implied probability for each team
- Calculate the vig (overround)
- Calculate no-vig (fair) probabilities
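As a worked illustration of these three steps on a different line, the sketch below uses the Yankees -150 / Red Sox +130 example from earlier in the chapter. The helper name `remove_vig` is hypothetical, and proportional normalization is only one common way to strip the vig; other methods allocate the margin differently.

```python
def american_to_prob(odds):
    """Convert American odds to implied probability."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)


def remove_vig(odds_a, odds_b):
    """Implied probabilities, overround, and proportionally normalized fair probabilities."""
    p_a = american_to_prob(odds_a)
    p_b = american_to_prob(odds_b)
    total = p_a + p_b
    return {
        "implied": (p_a, p_b),
        "overround": total - 1,              # the bookmaker's margin
        "fair": (p_a / total, p_b / total),  # rescaled to sum to 1
    }


line = remove_vig(-150, 130)
print(f"Implied:   {line['implied'][0]:.2%} / {line['implied'][1]:.2%}")
print(f"Overround: {line['overround']:.2%}")
print(f"Fair:      {line['fair'][0]:.2%} / {line['fair'][1]:.2%}")
```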
Exercise 2: Expected Value Calculation (Easy)
Your model predicts the following win probabilities. Calculate the EV for each bet:
| Team | Model Prob | Odds | Bet Amount |
|---|---|---|---|
| Team A | 58% | -140 | $100 |
| Team B | 45% | +150 | $100 |
| Team C | 52% | -105 | $100 |
Which bets have positive EV?
Exercise 3: Kelly Criterion Application (Medium)
You have a $5,000 bankroll and identify the following edges:
| Bet | Odds | Edge |
|---|---|---|
| Bet 1 | -110 | 3% |
| Bet 2 | +140 | 5% |
| Bet 3 | -180 | 2% |
Tasks:
- Calculate full Kelly bet size for each
- Calculate quarter-Kelly bet size for each
- If you can only make one bet, which should you choose and why?
Exercise 4: Run Line vs Moneyline (Medium)
Given:
- Yankees moneyline: -150 (implied prob 60%)
- Yankees run line (-1.5): +120
- Your model gives Yankees 62% to win
Tasks:
- Estimate the probability that the Yankees win by 2+ runs (assume they win by 2+ in 45% of their wins)
- Calculate EV for moneyline bet
- Calculate EV for run line bet
- Which is the better bet?
Exercise 5: Player Prop Modeling (Hard)
Model a strikeout prop for a pitcher with the following data:
Pitcher Stats (last 5 starts):
- Strikeouts: 7, 9, 6, 8, 10
- Innings: 6.0, 7.0, 5.1, 6.2, 7.0 (baseball notation: .1 and .2 mean one and two outs, i.e. thirds of an inning)
Opposing Team:
- Team K rate: 24.5% (league average: 22.8%)
Prop Line: Over/Under 6.5 strikeouts
Odds: Over -115, Under -105
Tasks:
- Estimate pitcher's expected strikeouts (account for team K rate)
- Model as Poisson distribution
- Calculate probability of over 6.5
- Determine if there's value on either side
- Calculate recommended bet size using quarter-Kelly
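The Poisson step can be sketched with the standard library alone. The function name `poisson_prob_over` and the rate of 7.2 strikeouts below are hypothetical placeholders, not the answer to this exercise; the sketch assumes a half-point line, so a push is impossible.

```python
import math


def poisson_prob_over(line, lam):
    """P(X > line) for X ~ Poisson(lam), intended for half-point lines such as 6.5."""
    k_max = math.floor(line)  # largest count that still loses the over
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k)
              for k in range(k_max + 1))
    return 1 - cdf


# Hypothetical expected strikeouts; substitute your own estimate
lam = 7.2
p_over = poisson_prob_over(6.5, lam)
print(f"P(over 6.5)  = {p_over:.3f}")
print(f"P(under 6.5) = {1 - p_over:.3f}")
```

The resulting probabilities can then be compared with the no-vig implied probabilities of the over and under prices to see whether either side offers value.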
Exercise 6: Live Betting Simulation (Hard)
Simulate a live betting scenario:
Game State:
- Top 8th inning, 1 out, runner on 2nd
- Home team leading 4-3
- Home team has its closer available (1.50 ERA)
- Away team has the 9th hitter, then the 1-2 hitters, due up
Live Odds:
- Home -200
- Away +170
Tasks:
- Estimate home team win probability using run expectancy
- Calculate implied probability from odds
- Determine if there's betting value
- Estimate how win probability changes if:
- Current batter gets a hit
- Current batter makes an out
Exercise 7: Bankroll Simulation (Hard)
Simulate a betting season with the following parameters:
- Starting bankroll: $10,000
- Number of bets: 200
- Average edge: 3%
- Win rate: 55%
- Average odds: -110
- Betting strategy: Quarter-Kelly
Tasks:
- Implement the simulation in R or Python
- Run 1,000 simulations
- Calculate:
- Mean final bankroll
- Median final bankroll
- Probability of doubling bankroll
- Probability of losing 50%+ of bankroll (risk of ruin)
- Maximum drawdown distribution
- Compare results to flat betting (2% of bankroll per bet)
Exercise 8: Model Calibration (Hard)
You've collected predictions and outcomes from your model:
| Predicted Prob | Actual Win Rate | Sample Size |
|---|---|---|
| 0.45-0.50 | 0.47 | 50 |
| 0.50-0.55 | 0.52 | 80 |
| 0.55-0.60 | 0.61 | 75 |
| 0.60-0.65 | 0.59 | 60 |
| 0.65-0.70 | 0.68 | 40 |
Tasks:
- Calculate calibration error for each bin
- Create a calibration plot
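- Calculate Brier score (approximate it from the binned data, using each bin's midpoint as the predicted probability)
- Calculate log loss (same midpoint approximation)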
- Suggest adjustments to improve calibration
- Implement isotonic regression calibration in Python
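The metric calculations can be sketched from the binned table alone. The key assumption, flagged in the code, is that every prediction in a bin sits at the bin midpoint; with per-game predictions available, the exact Brier score and log loss should be computed instead. The isotonic regression task is not covered by this sketch.

```python
import math

# (bin low, bin high, observed win rate, sample size) from the table above
bins = [
    (0.45, 0.50, 0.47, 50),
    (0.50, 0.55, 0.52, 80),
    (0.55, 0.60, 0.61, 75),
    (0.60, 0.65, 0.59, 60),
    (0.65, 0.70, 0.68, 40),
]
total_n = sum(b[3] for b in bins)

ece = brier = log_loss = 0.0
for low, high, rate, n in bins:
    p = (low + high) / 2          # assumption: predictions sit at the bin midpoint
    weight = n / total_n
    gap = rate - p                # positive: model under-predicts in this bin
    ece += weight * abs(gap)
    # A share `rate` of outcomes are wins (error 1 - p), the rest losses (error p).
    brier += weight * (rate * (1 - p) ** 2 + (1 - rate) * p ** 2)
    log_loss -= weight * (rate * math.log(p) + (1 - rate) * math.log(1 - p))
    print(f"{low:.2f}-{high:.2f}: predicted {p:.3f}, actual {rate:.2f}, gap {gap:+.3f}")

print(f"Expected calibration error: {ece:.4f}")
print(f"Approximate Brier score:    {brier:.4f}")
print(f"Approximate log loss:       {log_loss:.4f}")
```

The two middle bins stand out: observed win rates fall from 0.61 to 0.59 as predictions rise, which is the kind of non-monotonic pattern isotonic regression is designed to smooth.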
Exercise 9: Multi-Factor Model (Expert)
Build a comprehensive game prediction model using:
Data Sources:
- Team statistics (last 30 days): batting stats, pitching stats
- Starting pitcher metrics: ERA, FIP, K/9, BB/9, WHIP
- Bullpen strength: ERA, recent workload
- Home field advantage
- Rest days
- Weather conditions
- Umpire factors
Tasks:
- Engineer at least 20 meaningful features
- Train multiple models (logistic regression, random forest, XGBoost)
- Implement proper time-series cross-validation
- Calibrate probability predictions
- Evaluate model performance using:
  - Log loss
  - Brier score
  - ROC-AUC
  - Calibration plots
- Backtest on 2023 season data with proper betting simulation
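For the time-series cross-validation task, the essential property is that every training fold ends before its test fold begins. A minimal expanding-window splitter is sketched below; `walk_forward_splits` is a hypothetical helper that assumes the rows are already sorted by game date. scikit-learn's `TimeSeriesSplit` offers a library alternative.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) with training strictly before testing.

    Assumes rows are sorted chronologically. The training window expands
    with each fold; the last fold absorbs any leftover rows.
    """
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = n_samples if k == n_folds - 1 else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=1000, n_folds=4, min_train=600))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Rolling features such as 30-day team statistics need the same discipline: each feature value should be computed only from games played before the game being predicted.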
Exercise 10: Betting Strategy Optimization (Expert)
Design and test an optimal betting strategy:
Requirements:
- Use your predictions from Exercise 9
- Incorporate bankroll management
- Consider multiple bet types (ML, RL, totals)
- Implement portfolio approach (diversification)
- Account for correlation between bets
Tasks:
- Define criteria for placing bets (minimum EV, minimum edge, etc.)
- Implement position sizing strategy
- Create risk management rules (max bets per day, max exposure, stop-loss)
- Backtest on historical data (full season)
- Calculate performance metrics:
  - Total ROI
  - Sharpe ratio
  - Maximum drawdown
  - Win rate
  - Average bet size
- Compare to benchmark strategies (flat betting, aggressive Kelly)
- Perform sensitivity analysis on key parameters
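The performance metrics can be computed from a simple bet ledger, sketched below. `performance_metrics` is a hypothetical helper, the ledger is toy data for illustration only, and the Sharpe ratio here is a per-bet, unannualized figure (mean return on stake divided by its standard deviation), which is one of several conventions in use.

```python
import statistics


def performance_metrics(bets, start_bankroll):
    """Summarize a chronological list of (stake, profit) tuples."""
    total_staked = sum(stake for stake, _ in bets)
    total_profit = sum(profit for _, profit in bets)
    returns = [profit / stake for stake, profit in bets]   # return on stake per bet

    bankroll = peak = start_bankroll
    max_dd = 0.0
    for _, profit in bets:
        bankroll += profit
        peak = max(peak, bankroll)
        max_dd = max(max_dd, (peak - bankroll) / peak)

    return {
        "roi": total_profit / total_staked,
        "sharpe_per_bet": statistics.mean(returns) / statistics.stdev(returns),
        "max_drawdown": max_dd,
        "win_rate": sum(profit > 0 for _, profit in bets) / len(bets),
        "avg_stake": total_staked / len(bets),
    }


# Toy ledger, purely illustrative: (stake, profit) per settled bet
toy_ledger = [(100, 90.91), (100, -100), (150, 136.36), (100, -100), (200, 181.82)]
metrics = performance_metrics(toy_ledger, start_bankroll=10_000)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Running the same function over each benchmark strategy's ledger keeps the comparison consistent, since every strategy is then scored with identical definitions.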
Summary
This chapter covered the analytical foundations of sports betting as applied to MLB:
- Market Mechanics: Understanding how odds work, implied probability, and the efficient market hypothesis
- Predictive Modeling: Building robust models using machine learning, Elo ratings, and advanced features
- Value Assessment: Calculating expected value and identifying profitable betting opportunities
- Kelly Criterion: Optimal bet sizing based on edge and probability
- Player Props: Modeling individual player performance using Statcast and distributions
- Live Betting: Real-time win probability updates and in-game opportunities
- Bankroll Management: Risk assessment, portfolio theory, and long-term sustainability
- Implementation: End-to-end pipeline for daily betting analysis
Key Takeaways:
- Sports betting markets are highly efficient but not perfectly so
- Consistent profit requires a significant edge (typically 3-5% or more) after accounting for vig
- Proper bankroll management is as important as finding edges
- Model calibration is critical: poorly calibrated probabilities lead to incorrect bet sizing
- Diversification and portfolio thinking reduce risk
- Long-term success requires discipline, data-driven decisions, and continuous model improvement
Further Reading:
- Haghighat, E. et al. (2021). "Machine Learning Applications in Sports Betting"
- Kovalchik, S. (2016). "Searching for the GOAT of tennis win prediction"
- Lopez, M. & Matthews, G. (2015). "Building an NCAA basketball prediction model"
- Boulier, B. & Stekler, H. (2003). "Predicting the outcomes of National Football League games"
Warning: This chapter presents analytical methods for educational purposes. Sports betting involves significant financial risk. Most bettors lose money over time. This material should not be construed as encouragement to gamble. Always bet responsibly and within your means.