Essential Tools & Data

Resources for MLB Analytics

Everything you need to get started with baseball analytics - data sources, packages, installation guides, and reference materials.

Quick Start Guides

Get up and running in minutes

R Environment Setup (~15 minutes)

1. Install R

Download from CRAN (choose your OS).

2. Install RStudio

Download the free RStudio Desktop from Posit.

3. Install Packages

Run this code in RStudio:

R
# Install core packages
install.packages(c(
  "baseballr",     # MLB data access
  "Lahman",        # Historical database
  "tidyverse",     # Data manipulation
  "ggplot2",       # Static visualization
  "plotly",        # Interactive viz
  "gt",            # Beautiful tables
  "shiny"          # Interactive apps
))

# Load packages
library(baseballr)
library(tidyverse)

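# Test: get the 2024 American League standings (requires an internet connection)
# league_id: 103 = American League, 104 = National League
mlb_standings(season = 2024, league_id = 103)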

Python Environment Setup (~15 minutes)

1. Install Python

Download from python.org or use Anaconda.

2. Choose an IDE

VS Code, Jupyter, or PyCharm

3. Install Packages

Run this in your terminal:

Terminal
# Install core packages
pip install pybaseball pandas numpy
pip install matplotlib seaborn plotly
pip install scikit-learn streamlit

# Or with conda:
conda install pandas numpy matplotlib seaborn
pip install pybaseball

Python
# Test your setup
from pybaseball import standings
import pandas as pd

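# Get the 2024 standings (scraped from Baseball Reference; needs internet).
# Returns a list of DataFrames, one per division.
divisions = standings(2024)
print(divisions[0])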
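Because the `standings()` test above needs a network connection, a quicker first check is whether each package can be found at all. A minimal sketch using only the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it. The helper name `check_packages` is hypothetical, and some import names differ from pip names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages installed above; note that some differ
# from their pip names (scikit-learn is imported as "sklearn").
CORE_PACKAGES = ["pybaseball", "pandas", "numpy", "matplotlib",
                 "seaborn", "plotly", "sklearn", "streamlit"]


def check_packages(names=CORE_PACKAGES):
    """Return {import_name: found?} without importing the packages."""
    # find_spec returns None when a top-level module cannot be located
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Anything reported as MISSING can be reinstalled with the pip or conda commands above.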

Data Sources

Where MLB data comes from

Primary

Baseball Savant

baseballsavant.mlb.com

MLB's official Statcast data portal. The source for pitch-level tracking data including exit velocity, launch angle, spin rate, pitch movement, sprint speed, and catch probability.

Covers: Statcast pitch data, exit velocity, sprint speed, expected stats
Access: statcast_search() in R, statcast_batter() in Python
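Exit-velocity readings turn into simple rate stats with very little code. A sketch, assuming a sequence of `launch_speed` values (the exit-velocity column in Savant's pitch-level exports) already filtered to balls in play, i.e. rows whose `type` column is `"X"`. Hard-hit rate here uses MLB's 95 mph threshold; the function names are hypothetical, and missing readings are skipped.

```python
import math
from typing import Iterable, List, Optional

HARD_HIT_MPH = 95.0  # MLB's hard-hit threshold: exit velocity of 95 mph or more


def _valid_speeds(values: Iterable[Optional[float]]) -> List[float]:
    """Drop missing readings (None or NaN) before computing rates."""
    return [float(v) for v in values if v is not None and not math.isnan(v)]


def hard_hit_rate(exit_velocities: Iterable[Optional[float]]) -> Optional[float]:
    """Share of batted balls hit at or above HARD_HIT_MPH; None if there are none."""
    speeds = _valid_speeds(exit_velocities)
    if not speeds:
        return None
    return sum(speed >= HARD_HIT_MPH for speed in speeds) / len(speeds)


def average_exit_velocity(exit_velocities: Iterable[Optional[float]]) -> Optional[float]:
    """Mean exit velocity in mph over the valid readings; None if there are none."""
    speeds = _valid_speeds(exit_velocities)
    return sum(speeds) / len(speeds) if speeds else None
```

With a `statcast_batter()` DataFrame `df`, the input could be `df.loc[df["type"] == "X", "launch_speed"]`, since iterating a pandas Series yields its values.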
Primary

FanGraphs

fangraphs.com

One of the leading sabermetrics sites, with comprehensive player stats, leaderboards, projections, and advanced metrics. Excellent for WAR (its version is known as fWAR), wRC+, and plate discipline data.

Covers: WAR, wRC+, projections, splits
Access: fg_batter_leaders() in R, batting_stats() in Python
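wRC+ is built on top of wOBA, which credits each way of reaching base by its run value. A minimal sketch of the wOBA formula as FanGraphs documents it. The linear weights change every season (FanGraphs publishes them on its Guts! page), so they are passed in rather than hard-coded; the `Weights` class and `woba` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Weights:
    """Season-specific wOBA linear weights (look these up; they vary by year)."""
    wBB: float
    wHBP: float
    w1B: float
    w2B: float
    w3B: float
    wHR: float


def woba(ab: int, bb: int, ibb: int, hbp: int, sf: int,
         singles: int, doubles: int, triples: int, hr: int,
         w: Weights) -> Optional[float]:
    """wOBA = weighted times on base / (AB + BB - IBB + SF + HBP)."""
    ubb = bb - ibb  # unintentional walks only; intentional walks are excluded
    numerator = (w.wBB * ubb + w.wHBP * hbp + w.w1B * singles
                 + w.w2B * doubles + w.w3B * triples + w.wHR * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator if denominator else None
```

The counting stats could come from `batting_stats()` output; check the column names in the DataFrame you get back before wiring them in.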
Primary

Baseball Reference

baseball-reference.com

The most comprehensive historical baseball database. Complete stats back to 1871, detailed game logs, and their own WAR calculation (bWAR).

Covers: historical stats, game logs, bWAR, transactions
Access: bref_daily_batter() in R, bwar_bat() in Python
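Game logs make it easy to follow a hitter over a stretch of games. For a rolling batting average, sum hits and at-bats across the window and divide once, rather than averaging per-game averages (which over-weights short games and breaks on 0-for-0 games). A sketch over per-game `(hits, at_bats)` pairs; the function name and input shape are hypothetical, since game-log column names depend on the source.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple


def rolling_batting_average(games: Iterable[Tuple[int, int]],
                            window: int) -> List[Optional[float]]:
    """For each game, H/AB over the last `window` games.

    Returns None until the window is full, or when the window has no at-bats.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: Deque[Tuple[int, int]] = deque()  # games currently in the window
    hits = at_bats = 0
    out: List[Optional[float]] = []
    for h, ab in games:
        recent.append((h, ab))
        hits += h
        at_bats += ab
        if len(recent) > window:  # drop the oldest game from the running sums
            old_h, old_ab = recent.popleft()
            hits -= old_h
            at_bats -= old_ab
        if len(recent) < window or at_bats == 0:
            out.append(None)
        else:
            out.append(hits / at_bats)
    return out
```

Keeping running sums means each game is processed once, regardless of window size.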
Historical

Lahman Database

seanlahman.com/baseball-archive

The gold standard for historical baseball research. Complete season-by-season stats from 1871 to present, available as an R package or downloadable files.

Covers: season stats, team records, salaries, awards
Access: library(Lahman) in R, lahman.download_lahman() in Python
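Team records in Lahman's `Teams` table include runs scored (`R`) and allowed (`RA`) next to wins and losses, which is enough for Bill James's Pythagorean expectation. A sketch; the exponent of 2 is James's original choice, and 1.83 is a commonly used refinement. The function names are hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Pythagorean expectation: R^k / (R^k + RA^k)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_residual(wins: int, losses: int, runs_scored: float,
                         runs_allowed: float, exponent: float = 2.0) -> float:
    """Actual wins minus Pythagorean expected wins over the same games."""
    games = wins + losses
    expected_wins = pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games
    return wins - expected_wins
```

Applied row by row to `Teams`, a large positive residual flags a team that won more games than its run differential suggests.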
Play-by-Play

Retrosheet

retrosheet.org

Play-by-play data for historical games. Essential for win probability analysis, run expectancy matrices, and detailed situational analysis.

Covers: play-by-play, game logs, rosters, schedules
Access: retrosheet_data() in R, direct downloads for Python
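A run expectancy matrix averages, for each of the 24 base-out states, the runs scored from that state to the end of the half-inning. A minimal sketch, assuming play-by-play has already been parsed (for example from Retrosheet event files) into complete half-innings, each a list of `(outs, bases, runs_on_play)` tuples describing the state before each play. That tuple layout and the `"1_3"`-style bases string are hypothetical, not Retrosheet's own format, and real analyses usually drop truncated innings such as walk-offs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (outs before the play, bases occupied such as "1_3", runs scored on the play)
Play = Tuple[int, str, int]


def run_expectancy(half_innings: List[List[Play]]) -> Dict[Tuple[int, str], float]:
    """Average runs scored from each base-out state to the end of its half-inning."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)  # summed runs per state
    counts: Dict[Tuple[int, str], int] = defaultdict(int)      # occurrences per state
    for plays in half_innings:
        runs_remaining = sum(runs for _, _, runs in plays)
        for outs, bases, runs in plays:
            # Runs from this state onward include the runs scored on this play
            totals[(outs, bases)] += runs_remaining
            counts[(outs, bases)] += 1
            runs_remaining -= runs
    return {state: totals[state] / counts[state] for state in counts}
```

With a full season of innings, the returned dictionary fills in all 24 states and can be pivoted into the familiar outs-by-bases grid.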
Real-Time

MLB Stats API

statsapi.mlb.com

Official MLB API for real-time data: schedules, rosters, live game data, and standings. It powers MLB's own apps and many third-party baseball tools.

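Covers: live games, schedules, rosters, standings
Access: mlb_game_pks() in R; in Python, schedule_and_record() (note: this pybaseball function scrapes Baseball Reference rather than calling the Stats API)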
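The Stats API returns plain JSON over HTTP, so a schedule lookup needs only the standard library. A sketch, assuming the `/api/v1/schedule` endpoint with `sportId=1` (MLB) and a `date` parameter in YYYY-MM-DD form, and a response whose `dates` entries each hold a `games` list with `gamePk` IDs. These details reflect the API's observed behavior rather than a published contract, so inspect a live response before relying on them; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://statsapi.mlb.com/api/v1/schedule"


def schedule_url(date: str, sport_id: int = 1) -> str:
    """Build a schedule request URL (sportId=1 is MLB; date as YYYY-MM-DD)."""
    return f"{BASE_URL}?{urlencode({'sportId': sport_id, 'date': date})}"


def extract_game_pks(payload: dict) -> list:
    """Collect gamePk IDs from a schedule response (assumed dates -> games shape)."""
    return [game["gamePk"]
            for day in payload.get("dates", [])
            for game in day.get("games", [])]


def fetch_game_pks(date: str) -> list:
    """Network call: download one day's schedule and return its gamePk IDs."""
    with urlopen(schedule_url(date), timeout=30) as resp:
        return extract_game_pks(json.load(resp))
```

Separating URL building and parsing from the network call keeps the parsing logic testable offline.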

Package Reference

Essential libraries for baseball analytics

R Packages

baseballr (Data Access)

The essential R package for MLB data. Access Statcast, FanGraphs, Baseball Reference, and more.

install.packages("baseballr")

Lahman (Historical)

Complete historical baseball database from 1871 to present as data frames.

install.packages("Lahman")

tidyverse (Data Manipulation)

Collection of packages for data science: dplyr, ggplot2, tidyr, readr, and more.

install.packages("tidyverse")

ggplot2 (Visualization)

An implementation of the grammar of graphics for R, also installed as part of tidyverse. Create publication-quality visualizations.

install.packages("ggplot2")

plotly (Interactive Viz)

Create interactive charts. Use ggplotly() to convert ggplot2 plots.

install.packages("plotly")

shiny (Web Apps)

Build interactive web applications entirely in R. Perfect for dashboards.

install.packages("shiny")

Python Libraries

pybaseball (Data Access)

Python's answer to baseballr. Access Statcast, FanGraphs, and Baseball Reference.

pip install pybaseball

pandas (Data Manipulation)

The foundational data analysis library. DataFrames, data cleaning, and analysis.

pip install pandas

numpy (Numerical)

Numerical computing in Python. Arrays, linear algebra, and statistics.

pip install numpy

matplotlib / seaborn (Visualization)

Static visualizations. Seaborn adds statistical plotting capabilities.

pip install matplotlib seaborn

plotly (Interactive Viz)

Interactive charts with Python. Works great with pandas DataFrames.

pip install plotly

scikit-learn (Machine Learning)

Machine learning in Python. Classification, regression, clustering, and more.

pip install scikit-learn

Additional Learning Resources

Books, podcasts, and communities to deepen your knowledge

Essential Books

  • The Book: Playing the Percentages in Baseball (Tango, Lichtman & Dolphin) - The definitive guide to baseball strategy
  • Analyzing Baseball Data with R (Marchi & Albert) - Hands-on R tutorials with baseball data
  • Moneyball (Michael Lewis) - The story that popularized baseball analytics
  • The Hidden Game of Baseball (Thorn & Palmer) - Classic introduction to sabermetrics
  • Smart Baseball (Keith Law) - Modern guide to understanding baseball stats

Podcasts

  • Effectively Wild (FanGraphs) - Regular baseball analysis and discussion
  • Statcast Podcast (MLB) - Deep dives into Statcast data with Mike Petriello
  • Baseball Prospectus - BP staff discussing advanced baseball analysis
  • FanGraphs Audio - Various shows covering all aspects of baseball
  • Talking Sabermetrics - Academic and industry perspectives on baseball analytics

Communities

  • SABR (Society for American Baseball Research) - The OG analytics community
  • r/Sabermetrics - Reddit community for baseball analytics discussion
  • Baseball Twitter/X - Follow analysts, writers, and enthusiasts for daily insights
  • FanGraphs Community - User blogs and discussion forums
  • Baseball Prospectus - Premium content and community forums