Resources for MLB Analytics
Everything you need to get started with baseball analytics: data sources, packages, installation guides, and reference materials.
Quick Start Guides
Get up and running in minutes
R Environment Setup
~15 minutes
Install R
Download from CRAN (choose your OS)
Install RStudio
Download from Posit (free Desktop version)
Install Packages
Run this code in RStudio:
# Install core packages
install.packages(c(
"baseballr", # MLB data access
"Lahman", # Historical database
"tidyverse", # Data manipulation
"ggplot2", # Static visualization
"plotly", # Interactive viz
"gt", # Beautiful tables
"shiny" # Interactive apps
))
# Load packages
library(baseballr)
library(tidyverse)
# Test: get the 2024 American League standings (league_id 103 = AL, 104 = NL)
mlb_standings(season = 2024, league_id = 103)
Python Environment Setup
~15 minutes
Install Python
Download from python.org or use Anaconda
Install Packages
Run this in your terminal:
# Install core packages
pip install pybaseball pandas numpy
pip install matplotlib seaborn plotly
pip install scikit-learn streamlit
# Or with conda:
conda install pandas numpy matplotlib seaborn
pip install pybaseball
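# Test your setup (run these lines in a Python session, not the shell)
from pybaseball import standings
# Get the 2024 standings (returns a list of DataFrames, one per division)
divisions = standings(2024)
print(divisions[0])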
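Before running the pybaseball test, it can help to confirm that every package actually installed. The following is a minimal sketch using only the standard library; `check_packages` and `CORE_PACKAGES` are hypothetical names chosen for this example, and the list simply mirrors the install commands above.

```python
from importlib.util import find_spec

# Import names for the packages installed above.
# Note: scikit-learn is imported as "sklearn", not "scikit-learn".
CORE_PACKAGES = [
    "pybaseball", "pandas", "numpy", "matplotlib",
    "seaborn", "plotly", "sklearn", "streamlit",
]


def check_packages(names):
    """Return {import_name: bool} saying whether each package can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(CORE_PACKAGES).items():
        print(f"{name:<12} {'OK' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with the matching pip or conda command.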
Data Sources
Where MLB data comes from
Baseball Savant
baseballsavant.mlb.com
MLB's official Statcast data portal. The source for pitch-level tracking data including exit velocity, launch angle, spin rate, pitch movement, sprint speed, and catch probability.
statcast_search() in R
statcast_batter() in Python
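As an illustration of what pitch-level tracking data makes possible, here is a small sketch that computes a hard-hit rate (share of tracked batted balls at 95 mph or more) from a few rows. The rows are made up for the example and are not real Statcast output; the keys follow the Statcast column names `launch_speed` and `launch_angle`, and `hard_hit_rate` is a hypothetical helper.

```python
# Toy batted-ball rows; the values are invented for illustration only.
# launch_speed is exit velocity in mph, launch_angle is in degrees.
batted_balls = [
    {"launch_speed": 101.3, "launch_angle": 24.0},
    {"launch_speed": 88.7, "launch_angle": -5.0},
    {"launch_speed": 95.0, "launch_angle": 12.0},
    {"launch_speed": None, "launch_angle": None},  # ball that was not tracked
    {"launch_speed": 72.4, "launch_angle": 48.0},
]

HARD_HIT_MPH = 95.0  # exit-velocity cutoff for a "hard-hit" ball


def hard_hit_rate(rows, threshold=HARD_HIT_MPH):
    """Share of tracked batted balls with exit velocity >= threshold."""
    tracked = [r["launch_speed"] for r in rows if r["launch_speed"] is not None]
    if not tracked:
        return None  # nothing to measure
    return sum(ev >= threshold for ev in tracked) / len(tracked)


print(hard_hit_rate(batted_balls))  # 0.5 (2 of the 4 tracked balls)
```

With real data, the same logic would apply to the rows returned by the functions listed above, after dropping rows with missing exit velocity.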
FanGraphs
fangraphs.com
The leading sabermetrics site. Comprehensive player stats, leaderboards, projections, and advanced metrics. Excellent for WAR, wRC+, and plate discipline data.
fg_batter_leaders() in R
batting_stats() in Python
Baseball Reference
baseball-reference.com
The most comprehensive historical baseball database. Complete stats back to 1871, detailed game logs, and their own WAR calculation (bWAR).
bref_daily_batter() in R
bwar_bat() in Python
Lahman Database
seanlahman.com/baseball-archive
The gold standard for historical baseball research. Complete season-by-season stats from 1871 to present, available as an R package or downloadable files.
library(Lahman) in R
lahman.download_lahman() in Python
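Season-level tables like Lahman's batting table are enough to derive the classic rate stats. The sketch below computes AVG, OBP, and SLG from one row; the numbers are invented, `slash_line` is a hypothetical helper, and the keys assume the CSV column names (`2B`, `3B`), which the R package exposes as `X2B` and `X3B`.

```python
# One made-up season row using Lahman-style batting column names.
season = {"AB": 500, "H": 150, "2B": 30, "3B": 5, "HR": 20,
          "BB": 50, "HBP": 5, "SF": 5}


def slash_line(row):
    """Return (AVG, OBP, SLG) for one batting row, or None if AB is zero."""
    ab, h = row["AB"], row["H"]
    if not ab:
        return None
    # HBP and SF are missing for some early seasons; treat missing as zero.
    bb = row.get("BB") or 0
    hbp = row.get("HBP") or 0
    sf = row.get("SF") or 0
    singles = h - row["2B"] - row["3B"] - row["HR"]
    total_bases = singles + 2 * row["2B"] + 3 * row["3B"] + 4 * row["HR"]
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return avg, obp, slg


avg, obp, slg = slash_line(season)
print(f"{avg:.3f} / {obp:.3f} / {slg:.3f}")  # 0.300 / 0.366 / 0.500
```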
Retrosheet
retrosheet.org
Play-by-play data for historical games. Essential for win probability analysis, run expectancy matrices, and detailed situational analysis.
retrosheet_data() in R
Direct downloads for Python
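To make the run expectancy idea concrete, here is a minimal sketch that builds a tiny run expectancy table from play-by-play records. The three half-innings are made up, the tuple layout is an assumption for this example rather than Retrosheet's actual event format, and `run_expectancy` is a hypothetical helper. For each base-out state it averages the runs scored from that point to the end of the half-inning.

```python
from collections import defaultdict

# Each half-inning is a list of plays: (bases_before, outs_before, runs_on_play).
# bases_before marks 1B/2B/3B, e.g. "1__" = runner on first, "___" = empty.
half_innings = [
    [("___", 0, 0), ("1__", 0, 0), ("1__", 1, 2), ("___", 1, 0), ("___", 2, 0)],
    [("___", 0, 0), ("___", 1, 0), ("___", 2, 0)],
    [("___", 0, 1), ("___", 0, 0), ("1__", 0, 0), ("___", 2, 0)],
]


def run_expectancy(innings):
    """Average runs scored from each (bases, outs) state to the end of the inning."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sum of runs, occurrences]
    for plays in innings:
        runs_remaining = sum(runs for _, _, runs in plays)
        for bases, outs, runs in plays:
            entry = totals[(bases, outs)]
            entry[0] += runs_remaining  # runs still to come, including this play
            entry[1] += 1
            runs_remaining -= runs
    return {state: total / count for state, (total, count) in totals.items()}


matrix = run_expectancy(half_innings)
print(matrix[("___", 0)])  # 0.75
print(matrix[("1__", 0)])  # 1.0
```

A real analysis would use full seasons of play-by-play data and typically keeps only complete half-innings, so these toy values say nothing about actual run expectancy.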
MLB Stats API
statsapi.mlb.com
Official MLB API for real-time data. Schedules, rosters, live game data, and standings. Powers most baseball apps and sites.
mlb_game_pks() in R
schedule_and_record() in Python (pybaseball; sourced from Baseball Reference rather than the Stats API)
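For readers who want to query the API directly, the sketch below builds a schedule request and pulls game IDs out of a response. Treat the details as assumptions: the `/api/v1/schedule` path, the `sportId` and `date` parameters, and the `dates` / `games` / `gamePk` response shape reflect common usage rather than anything stated on this page, and the sample payload and function names are made up. `fetch_schedule` is defined but not called, so the example runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://statsapi.mlb.com/api/v1"  # assumed base path


def schedule_url(date, sport_id=1):
    """Build a schedule URL for one date (YYYY-MM-DD); sport_id 1 is assumed to be MLB."""
    query = urlencode({"sportId": sport_id, "date": date})
    return f"{BASE_URL}/schedule?{query}"


def fetch_schedule(date):
    """Download and decode the schedule JSON (requires network access)."""
    with urlopen(schedule_url(date), timeout=10) as response:
        return json.load(response)


def extract_game_pks(payload):
    """Collect every gamePk from a schedule payload of the assumed shape."""
    return [
        game["gamePk"]
        for day in payload.get("dates", [])
        for game in day.get("games", [])
    ]


# Made-up payload in the assumed response shape; the IDs are not real games.
sample = {"dates": [{"date": "2024-07-04",
                     "games": [{"gamePk": 111111}, {"gamePk": 222222}]}]}

print(schedule_url("2024-07-04"))
print(extract_game_pks(sample))  # [111111, 222222]
```

Keeping the URL builder and the parser separate from the network call makes both easy to check against a saved response.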
Package Reference
Essential libraries for baseball analytics
R Packages
baseballr
The essential R package for MLB data. Access Statcast, FanGraphs, Baseball Reference, and more.
install.packages("baseballr")
Lahman
Complete historical baseball database from 1871 to present as data frames.
install.packages("Lahman")
tidyverse
Collection of packages for data science: dplyr, ggplot2, tidyr, readr, and more.
install.packages("tidyverse")
ggplot2
The grammar of graphics for R. Create publication-quality visualizations.
install.packages("ggplot2")
plotly
Create interactive charts. Use ggplotly() to convert ggplot2 plots.
install.packages("plotly")
shiny
Build interactive web applications entirely in R. Perfect for dashboards.
install.packages("shiny")
Python Libraries
pybaseball
Python's answer to baseballr. Access Statcast, FanGraphs, and Baseball Reference.
pip install pybaseball
pandas
The foundational data analysis library. DataFrames, data cleaning, and analysis.
pip install pandas
numpy
Numerical computing in Python. Arrays, linear algebra, and statistics.
pip install numpy
matplotlib / seaborn
Static visualizations. Seaborn adds statistical plotting capabilities.
pip install matplotlib seaborn
plotly
Interactive charts with Python. Works great with pandas DataFrames.
pip install plotly
scikit-learn
Machine learning in Python. Classification, regression, clustering, and more.
pip install scikit-learn
Additional Learning Resources
Books, podcasts, and communities to deepen your knowledge
Essential Books
- The Book: Playing the Percentages in Baseball - The definitive guide to baseball strategy
- Analyzing Baseball Data with R (Marchi & Albert) - Hands-on R tutorials with baseball data
- Moneyball (Michael Lewis) - The story that popularized baseball analytics
- The Hidden Game of Baseball (Palmer & Thorn) - Classic introduction to sabermetrics
- Smart Baseball (Keith Law) - Modern guide to understanding baseball stats
Podcasts
- Effectively Wild (FanGraphs) - Daily baseball analysis and discussion
- Statcast Podcast (MLB) - Deep dives into Statcast data with Mike Petriello
- Baseball Prospectus - BP staff discussing advanced baseball analysis
- FanGraphs Audio - Various shows covering all aspects of baseball
- Talking Sabermetrics - Academic and industry perspectives on baseball analytics
Communities
- SABR (Society for American Baseball Research) - The OG analytics community
- r/Sabermetrics - Reddit community for baseball analytics discussion
- Baseball Twitter/X - Follow analysts, writers, and enthusiasts for daily insights
- FanGraphs Community - User blogs and discussion forums
- Baseball Prospectus - Premium content and community forums