There is a particular kind of frustration that comes from wanting a baseball number you can see but having no idea how to get it out of the internet and into your own hands. You can read a leaderboard on a website; you cannot bend it, filter it, or chart it. pybaseball closes that gap. It is a free, open-source Python library that pulls Major League data — season stats, Statcast pitch-by-pitch, leaderboards — straight into a pandas DataFrame, where it becomes yours to do anything with.
This is a from-scratch tutorial. By the end you will have installed the library, pulled the 2025 batting leaderboard, filtered it down to the hitters who actually played, and drawn a clean bar chart of the top ten by OPS — the exact chart below, produced by about twenty lines of code. No API key, no scraping, no spreadsheet exports. Ten minutes, start to finish.
Install it
pybaseball lives on PyPI, so installation is a single command. Open a terminal and run:
pip install pybaseball
That pulls in pandas, requests, and matplotlib as dependencies, which means you already have everything you need to fetch data and plot it. If you keep your projects in virtual environments — and you should — activate yours first. The version this tutorial was written against is 2.2.7; you can confirm what you got by running python -c "import pybaseball; print(pybaseball.__version__)". Anything in the 2.x line behaves the same way for what follows.
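If you would rather not import the library just to read its version, the standard library can report it as well. This is a minimal sketch that assumes the package is registered under the distribution name pybaseball, the same name passed to pip above. The helper names installed_version and is_2x are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pybaseball"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_2x(version_string):
    """True when the major version is 2, the line this tutorial targets."""
    return version_string.split(".")[0] == "2"


found = installed_version()
if found is None:
    print("pybaseball is not installed in this environment")
elif is_2x(found):
    print(f"pybaseball {found}: matches the 2.x line used here")
else:
    print(f"pybaseball {found}: outside 2.x, so results may differ")
```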
The imports
Three lines set the stage. We import matplotlib for plotting and pybaseball under the short alias pyb. The first line silences a thicket of harmless deprecation and SSL warnings that some of the backends emit — cosmetic, but it keeps your console readable.
import warnings; warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import pybaseball as pyb
That is the entire setup. Everything else is two function calls and a chart.
Pull the leaderboard
Here is the single most important decision in this whole exercise, and it is one most tutorials get wrong. pybaseball can pull season batting lines from two different sources: FanGraphs (via batting_stats()) and Baseball-Reference (via batting_stats_bref()). They return similar-looking tables. They do not behave the same way for automated scripts.
FanGraphs aggressively rate-limits and frequently blocks programmatic access. Call batting_stats() a couple of times in quick succession and you will start collecting timeouts, 403s, and empty frames — not because your code is wrong, but because the server decided a robot was knocking. The Baseball-Reference and Statcast backends do not do this. They are the reliable choice, and they are what this tutorial uses throughout. So we reach for batting_stats_bref:
df = pyb.batting_stats_bref(2025)
df = df[df["PA"] >= 300].copy()
top = df.sort_values("OPS", ascending=False).head(10)
Three lines, three ideas. The first hands you the entire 2025 player-batting table as a DataFrame: every hitter, dozens of columns, indexed and ready. The second filters it: df["PA"] >= 300 keeps only players with at least 300 plate appearances, which throws out September call-ups and bench bats whose tiny samples would otherwise clutter the top of a rate-stat list. The .copy() is a small good habit: it makes the filtered slice an independent object, so modifying it later will not trigger a SettingWithCopyWarning. The third line sorts by OPS, highest first, and keeps the top ten.
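For readers new to pandas, the same filter, sort, and slice can be spelled out in plain Python over a list of dictionaries. This only illustrates the logic; it is not how pandas or pybaseball work internally. The first three rows are copied from the 2025 results shown later in this tutorial. The fourth, "Hypothetical Call-Up", is invented to show what the 300-PA floor removes.

```python
MIN_PA = 300  # same plate-appearance floor as the pandas filter

players = [
    {"Name": "Aaron Judge", "PA": 710, "OPS": 1.150},
    {"Name": "Shohei Ohtani", "PA": 811, "OPS": 1.022},
    {"Name": "Giancarlo Stanton", "PA": 311, "OPS": 0.906},
    # Invented row: a tiny sample with an inflated rate stat.
    {"Name": "Hypothetical Call-Up", "PA": 12, "OPS": 1.400},
]

# Step 1: keep only hitters with enough plate appearances.
qualified = [p for p in players if p["PA"] >= MIN_PA]

# Step 2: sort by OPS, highest first.
ranked = sorted(qualified, key=lambda p: p["OPS"], reverse=True)

# Step 3: keep at most ten rows.
top = ranked[:10]

for p in top:
    print(f'{p["Name"]:<20} {p["OPS"]:.3f}')
```

Without the first step, the invented 12-PA row would sort to the top of the list, which is exactly the clutter the filter is there to prevent.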
Note that the column names — PA, OPS, OBP, SLG, HR — come straight from Baseball-Reference’s own headers. If you ever forget what a table offers, print(df.columns.tolist()) spells it out.
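A small guard can catch a missing column before a filter or sort fails with a KeyError. This is a sketch, and the helper name missing_columns is hypothetical. The header list here is abbreviated for the example; in a real script you would pass df.columns.tolist() instead.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in the order requested."""
    present = set(available)
    return [name for name in required if name not in present]


# Abbreviated stand-in for df.columns.tolist().
available = ["Name", "PA", "HR", "OBP", "SLG", "OPS"]

print(missing_columns(available, ["Name", "PA", "OPS"]))  # every column is present
print(missing_columns(available, ["Name", "wRC+"]))       # "wRC+" is not in the list above
```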
Chart the top ten
A DataFrame is useful; a picture is persuasive. We will draw a horizontal bar chart, which is the right shape for a ranked list of names because the labels read left-to-right like a leaderboard should. The setup grabs the names and OPS values out of the top frame and reverses both lists with [::-1] — matplotlib draws the first bar at the bottom, so reversing puts the best hitter on top where the eye expects him.
fig, ax = plt.subplots(figsize=(8.0, 5.4))
names = top["Name"].tolist()[::-1]
vals = [float(v) for v in top["OPS"].tolist()][::-1]
ax.barh(range(len(names)), vals)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names, fontsize=10)
ax.set_xlabel("OPS (on-base plus slugging)")
ax.set_title("2025 OPS leaders, pulled with pybaseball", loc="left")
for i, v in enumerate(vals):
    ax.text(v + 0.004, i, "%.3f" % v, va="center", fontsize=9)
plt.show()
The barh call draws the bars; the two set_yticks/set_yticklabels lines attach player names to them. The little for loop is the touch that makes it feel finished: it writes each OPS value as text just past the end of its bar, formatted to three decimals with "%.3f", so a reader gets both the visual length and the exact figure. Swap plt.show() for fig.savefig("ops_leaders.png") if you would rather write the chart to a file. Run the script and the figure below is what you get.
Reading what you pulled
The numbers are worth a glance, because they are a good sanity check that the pull worked. Aaron Judge sits on top at a 1.150 OPS — 54 home runs against a .462 on-base percentage and a .688 slugging mark, a line that barely looks real. Shohei Ohtani follows at 1.022, and rookie Nick Kurtz crashes the list at 1.002 on the strength of a .619 slug in under 500 plate appearances. Here is the full top ten exactly as the script saved it:
| Player | PA | HR | OBP | SLG | OPS |
|---|---|---|---|---|---|
| Aaron Judge | 710 | 54 | 0.462 | 0.688 | 1.150 |
| Shohei Ohtani | 811 | 63 | 0.393 | 0.629 | 1.022 |
| Nick Kurtz | 489 | 36 | 0.383 | 0.619 | 1.002 |
| Cal Raleigh | 759 | 65 | 0.362 | 0.595 | 0.957 |
| George Springer | 661 | 36 | 0.393 | 0.559 | 0.952 |
| Ronald Acuña Jr. | 412 | 21 | 0.417 | 0.518 | 0.935 |
| Kyle Schwarber | 742 | 58 | 0.363 | 0.565 | 0.927 |
| Juan Soto | 715 | 43 | 0.396 | 0.525 | 0.921 |
| Kyle Stowers | 457 | 25 | 0.368 | 0.544 | 0.912 |
| Giancarlo Stanton | 311 | 24 | 0.342 | 0.564 | 0.906 |
That table was not typed by hand. It is the same top DataFrame, written out column by column — which is the whole point of getting data into Python in the first place. Once it is a frame, charting it, tabulating it, joining it to another season, or filtering it by team are all one line away.
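One way to write rows out as a markdown table is sketched below. It is not necessarily the script used for this article, and the helper name markdown_table is hypothetical. It takes a list of dictionaries, which is the shape top.to_dict("records") would give you from the top frame; the three rows here are copied from the table above so the example runs without a download.

```python
def markdown_table(records, columns, rate_columns=("OBP", "SLG", "OPS")):
    """Render a list of dicts as a markdown table, one line per record."""
    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "---|" * len(columns),
    ]
    for row in records:
        cells = []
        for col in columns:
            if col in rate_columns:
                # Fixed three decimals keeps trailing zeros (1.15 prints as 1.150).
                cells.append(f"{float(row[col]):.3f}")
            else:
                cells.append(str(row[col]))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


rows = [
    {"Name": "Aaron Judge", "PA": 710, "HR": 54, "OBP": 0.462, "SLG": 0.688, "OPS": 1.15},
    {"Name": "Shohei Ohtani", "PA": 811, "HR": 63, "OBP": 0.393, "SLG": 0.629, "OPS": 1.022},
    {"Name": "Nick Kurtz", "PA": 489, "HR": 36, "OBP": 0.383, "SLG": 0.619, "OPS": 1.002},
]

print(markdown_table(rows, ["Name", "PA", "HR", "OBP", "SLG", "OPS"]))
```

Formatting the rate stats explicitly matters because a bare float drops its trailing zero when converted to text.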
Where to go next
Two functions unlock most of the rest of the library. The first is playerid_lookup, which translates a human name into the various ID numbers the data sources use:
pyb.playerid_lookup("judge", "aaron")
That returns a small DataFrame with, among other columns, key_mlbam — the MLB Advanced Media ID. You need that ID for the second function, statcast_batter, which pulls a single hitter’s every tracked pitch and batted ball, complete with exit velocity, launch angle, and hit coordinates. That is the raw material for a spray chart, and it is exactly the next tutorial in this series: building a spray chart from Statcast data picks up precisely where this one leaves off, turning statcast_batter output into a picture of where a hitter sprays the ball.
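The hand-off between the two functions can be sketched as a single helper. Treat it as a starting point under stated assumptions: the name statcast_for_hitter is hypothetical, the date strings in the example call are placeholders, and taking the first lookup row assumes the name is unambiguous. The import sits inside the function so that defining the helper does not require the library to be present.

```python
def statcast_for_hitter(last, first, start_dt, end_dt):
    """Look up a hitter by name, then pull his Statcast events for a date range."""
    # Imported here so that defining the function does not require the library.
    import pybaseball as pyb

    ids = pyb.playerid_lookup(last, first)
    if ids.empty:
        raise ValueError(f"no player found for {first} {last}")

    # Assumption: the first row is the player you want. Common names can
    # return several rows, so inspect `ids` when in doubt.
    mlbam_id = int(ids["key_mlbam"].iloc[0])
    return pyb.statcast_batter(start_dt, end_dt, mlbam_id)


# Example call (needs network access; the dates are placeholders):
# events = statcast_for_hitter("judge", "aaron", "2025-04-01", "2025-04-30")
# print(events[["launch_speed", "launch_angle"]].describe())
```

Raising on an empty lookup surfaces a misspelled name immediately, instead of letting it turn into a confusing failure further down.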
A short troubleshooting note
Three things trip up newcomers. First, rate limits: if a call hangs or returns nothing, you are almost certainly hitting a FanGraphs endpoint — switch to the Baseball-Reference or Statcast equivalents, and do not hammer any source in a tight loop. Second, caching: pybaseball can remember responses so you are not re-downloading the same data on every run, which is both faster and friendlier to the servers. Enable it once with from pybaseball import cache; cache.enable() and repeated pulls come back instantly. Third, an empty DataFrame usually means a season hasn’t started or a name was misspelled, not that anything is broken — check your arguments before you check your code.
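For the first point, a polite retry wrapper is one way to avoid hammering a source. This is a minimal sketch that assumes a failed pull surfaces as an exception; an empty frame would need its own check. The helper name fetch_with_retries and its defaults are hypothetical, and the flaky function below only simulates a timeout so the example runs offline.

```python
import time


def fetch_with_retries(fetch, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the pause after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as err:
            last_error = err
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... before trying again.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Offline demonstration: fails twice, then succeeds on the third call.
attempts_seen = []

def flaky_pull():
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise TimeoutError("simulated timeout")
    return "data"

print(fetch_with_retries(flaky_pull, base_delay=0.01))

# With the real library (needs network access):
# import pybaseball as pyb
# from pybaseball import cache
# cache.enable()
# df = fetch_with_retries(lambda: pyb.batting_stats_bref(2025))
```

Doubling the delay gives a struggling server progressively more room, and capping the attempts keeps a script from looping forever on a permanent block.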
The bottom line
The distance between “I wish I could chart that” and an actual chart is, it turns out, about twenty lines of Python and one good decision about which backend to trust. Install pybaseball, lean on the Baseball-Reference and Statcast sources, get your data into a DataFrame, and the entire toolbox of pandas and matplotlib opens up behind it. You now have the leaderboard in your hands — bend it however you like.
Sources & Further Reading
- Leaderboard data: Baseball-Reference, pulled via pybaseball’s batting_stats_bref backend (version 2.2.7). Numbers retrieved June 2026; re-runnable via scripts/pybaseball_quickstart.py.
- Baseball Savant: the Statcast source behind statcast_batter and the spray-chart follow-up.
- FanGraphs: an alternate stats backend, noted here for the rate-limiting caveat discussed above.