Compute wOBA From Scratch in Python (With Real Linear Weights)

Why bother computing wOBA by hand when FanGraphs hands it to you for free? Because once you have built the thing, you stop treating it as a black box. wOBA is genuinely simple under the hood: a weighted sum of the ways a hitter reached base, divided by a plate-appearance denominator. The whole model fits in a couple dozen lines of Python. Type those lines once and the stat stops being a number on a page and becomes something you actually understand.

Here is the plan. We write the formula out, drop the linear weights into a dictionary, wrap a woba() function around a line of counting stats, run it on a small illustrative batting line, and finish by pulling real counts so you can aim it at any hitter who ever played. The core needs no libraries — just Python.

The formula, in one line

wOBA assigns every way of reaching base a coefficient derived from its measured run value (scaled so that league wOBA lines up with league on-base percentage), sums those up, and divides by a denominator that counts the meaningful plate appearances. Written out, a representative recent-season version looks like this:

wOBA = ( 0.69×uBB + 0.72×HBP + 0.89×1B + 1.27×2B + 1.62×3B + 2.10×HR ) / ( AB + BB − IBB + SF + HBP )

One detail to internalize before we code: the weights are not constants of nature. FanGraphs recalculates them every season from league-wide run expectancy and publishes them in its annual “Guts!” table, and they drift a few points year to year as the run environment shifts. The set above is representative of a recent season and perfect for learning the mechanics, but for a real analysis you should pull the exact weights for the exact year from FanGraphs. Note also that the numerator uses unintentional walks (uBB) — an intentional walk is a managerial decision, not a hitting skill, so the standard formula strips it out, which is why IBB is also subtracted in the denominator.
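
To make the intentional-walk bookkeeping concrete, here is a minimal sketch with made-up counts (not any real player's line). It shows that subtracting IBB in the denominator is the same as counting only unintentional walks there. The helper name unintentional_walks is illustrative, not part of any library.

```python
# Made-up counts, purely for illustration.
AB, BB, IBB, SF, HBP = 500, 60, 5, 6, 8


def unintentional_walks(bb, ibb):
    """uBB is total walks with the intentional ones removed."""
    return bb - ibb


uBB = unintentional_walks(BB, IBB)

# Denominator exactly as written in the formula...
denominator = AB + BB - IBB + SF + HBP
# ...and the equivalent form that uses uBB directly.
denominator_alt = AB + uBB + SF + HBP

print(uBB, denominator, denominator_alt)
```

Both forms give the same count, so it does not matter which one you code as long as the numerator and denominator agree on what a walk is.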

The weights as a dictionary

Python’s dictionary is the natural home for a set of named coefficients. We map each event to its weight, which keeps the later arithmetic readable and makes swapping in a different season’s values a one-line edit:

WEIGHTS = {
    "uBB": 0.69,   # unintentional walk
    "HBP": 0.72,   # hit by pitch
    "1B":  0.89,   # single
    "2B":  1.27,   # double
    "3B":  1.62,   # triple
    "HR":  2.10,   # home run
}

That is the entire model. Every coefficient sits right next to a comment explaining what it is, and because it is just a dict, the day you want to recompute a 2011 wOBA you replace six numbers and nothing else changes.
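
Because a swapped-in season must supply all six coefficients, a small guard can catch a missing or mistyped key before it surfaces as a KeyError inside the arithmetic. The sketch below is one way to do that. check_weights and REQUIRED_EVENTS are hypothetical names, and the OTHER_SEASON values are placeholders invented to show the mechanics, not any real season's weights.

```python
REQUIRED_EVENTS = ("uBB", "HBP", "1B", "2B", "3B", "HR")


def check_weights(weights):
    """Return weights unchanged if all six events are present and positive."""
    missing = [event for event in REQUIRED_EVENTS if event not in weights]
    if missing:
        raise ValueError(f"missing weights for: {', '.join(missing)}")
    bad = [event for event in REQUIRED_EVENTS if weights[event] <= 0]
    if bad:
        raise ValueError(f"non-positive weights for: {', '.join(bad)}")
    return weights


# The representative set from the article.
WEIGHTS = check_weights({
    "uBB": 0.69, "HBP": 0.72, "1B": 0.89,
    "2B": 1.27, "3B": 1.62, "HR": 2.10,
})

# PLACEHOLDER values, invented for illustration only; real seasons
# come from the FanGraphs "Guts!" table.
OTHER_SEASON = check_weights({
    "uBB": 0.70, "HBP": 0.73, "1B": 0.88,
    "2B": 1.25, "3B": 1.58, "HR": 2.03,
})
```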

The function

Now the function itself. It takes a dictionary of a hitter’s counting stats, computes the weighted numerator and the plate-appearance denominator separately, and returns their ratio. We guard against a zero denominator so an empty line returns 0.0 instead of crashing:

def woba(stats, weights=WEIGHTS):
    numerator = (
        weights["uBB"] * stats["uBB"]
        + weights["HBP"] * stats["HBP"]
        + weights["1B"]  * stats["1B"]
        + weights["2B"]  * stats["2B"]
        + weights["3B"]  * stats["3B"]
        + weights["HR"]  * stats["HR"]
    )
    denominator = (
        stats["AB"]
        + stats["BB"]
        - stats["IBB"]
        + stats["SF"]
        + stats["HBP"]
    )
    if denominator == 0:
        return 0.0
    return numerator / denominator

Read the two halves against the formula and they line up term for term. The numerator is the dot product of events and weights; the denominator is at-bats plus walks, minus intentional walks, plus sacrifice flies and hit-by-pitches. Note one bookkeeping point that trips people up: singles are not usually a column in a stat table — you derive them as total hits minus doubles, triples, and home runs (1B = H - 2B - 3B - HR), and we will do exactly that when we feed it real data.
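
The dot-product framing suggests a more compact way to write the numerator: loop over the weights instead of spelling out six terms. The sketch below is an alternative, not a replacement for the function above. woba_compact and singles are hypothetical names, and the sample counts are invented for illustration.

```python
WEIGHTS = {"uBB": 0.69, "HBP": 0.72, "1B": 0.89,
           "2B": 1.27, "3B": 1.62, "HR": 2.10}


def singles(hits, doubles, triples, home_runs):
    """Singles are whatever is left of total hits after extra-base hits."""
    return hits - doubles - triples - home_runs


def woba_compact(stats, weights=WEIGHTS):
    """Same arithmetic as the term-by-term version, written as a dot product."""
    numerator = sum(weight * stats[event] for event, weight in weights.items())
    denominator = (stats["AB"] + stats["BB"] - stats["IBB"]
                   + stats["SF"] + stats["HBP"])
    return numerator / denominator if denominator else 0.0


# Invented counts: 163 hits, of which 30 + 3 + 35 went for extra bases.
sample = {"AB": 500, "BB": 60, "IBB": 5, "HBP": 8, "SF": 6,
          "2B": 30, "3B": 3, "HR": 35}
sample["1B"] = singles(163, sample["2B"], sample["3B"], sample["HR"])
sample["uBB"] = sample["BB"] - sample["IBB"]

print(sample["1B"], round(woba_compact(sample), 3))
```

One trade-off of the loop: it follows whatever keys the weights dict carries, so a weight whose event is missing from the stats fails loudly with a KeyError rather than silently dropping a term.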

Run it on a line

Let us put it to work on a small, clearly illustrative batting line — round numbers chosen to exercise the formula, not the real stat line of any actual player. Suppose a hitter, over a stretch, posts the following:

line = {
    "AB":  500,
    "BB":   60,
    "IBB":   5,
    "uBB":  55,   # BB - IBB
    "HBP":   8,
    "SF":    6,
    "1B":   95,
    "2B":   30,
    "3B":    3,
    "HR":   35,
}

print(round(woba(line), 3))

Run that and you get 0.43, that is, a .430 wOBA: 244.72 weighted events over 569 qualifying plate appearances. That is an elite mark, which makes sense for a line carrying 35 home runs and 60 walks. The exact figure depends on the weights you plug in, which is the whole lesson: change the season’s coefficients and the same counting line produces a slightly different wOBA. Because wOBA is scaled to read like on-base percentage, you can sanity-check the output against familiar landmarks: roughly .320 is league average, .370 is excellent, and .400 is most-valuable-player territory. If your function spits out a 1.4 or a 0.05, you have a bug, not a superstar.
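
To see where the number comes from, it can help to print each event's contribution to the numerator. The sketch below repeats the weights and the illustrative line so it runs on its own. breakdown and looks_plausible are hypothetical helpers, and the 0.150 to 0.600 window is an arbitrary assumption for catching bugs, not an official bound.

```python
WEIGHTS = {"uBB": 0.69, "HBP": 0.72, "1B": 0.89,
           "2B": 1.27, "3B": 1.62, "HR": 2.10}

# The illustrative line from the article (not a real player's stats).
line = {"AB": 500, "BB": 60, "IBB": 5, "uBB": 55, "HBP": 8, "SF": 6,
        "1B": 95, "2B": 30, "3B": 3, "HR": 35}


def breakdown(stats, weights=WEIGHTS):
    """Return each event's weighted contribution to the numerator."""
    return {event: weight * stats[event] for event, weight in weights.items()}


def looks_plausible(value, low=0.150, high=0.600):
    """Crude bug check: the window is an assumption, not an official range."""
    return low <= value <= high


parts = breakdown(line)
numerator = sum(parts.values())
denominator = line["AB"] + line["BB"] - line["IBB"] + line["SF"] + line["HBP"]
value = numerator / denominator

for event, contribution in parts.items():
    print(f"{event:>3}: {contribution:6.2f}")
print(f"numerator={numerator:.2f} denominator={denominator} wOBA={value:.3f}")
print("plausible:", looks_plausible(value))
```

For this line, singles (84.55) and home runs (73.50) together supply roughly two thirds of the numerator, which is a quick way to see what is driving a hitter's figure.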

Feeding it real counts

The dictionary above was typed by hand; the point of writing the function is to stop typing lines by hand. pybaseball will hand you real season totals as a pandas DataFrame, and from there it is a short hop to assembling the dictionary our function expects:

import pybaseball as pyb

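df = pyb.batting_stats_bref(2025)   # season totals from Baseball Reference
# Replace "Some Player" with a name exactly as it appears in df["Name"];
# .iloc[0] raises IndexError if nothing matches.
row = df[df["Name"] == "Some Player"].iloc[0]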

line = {
    "AB":  int(row["AB"]),
    "BB":  int(row["BB"]),
    "IBB": int(row.get("IBB", 0)),
    "HBP": int(row["HBP"]),
    "SF":  int(row.get("SF", 0)),
    "2B":  int(row["2B"]),
    "3B":  int(row["3B"]),
    "HR":  int(row["HR"]),
}
line["1B"]  = int(row["H"]) - line["2B"] - line["3B"] - line["HR"]
line["uBB"] = line["BB"] - line["IBB"]

print(round(woba(line), 3))

Two derived fields do the real work: 1B backs singles out of total hits, and uBB backs intentional walks out of total walks. Everything else is a straight read from the row. A practical caution worth repeating: not every backend exposes IBB or SF, which is why we use row.get(..., 0) to default them to zero rather than crash. If a source genuinely lacks them, your wOBA will be a hair off from the official figure rather than wrong in spirit. For a published-grade number, pair real counts with the exact yearly weights from FanGraphs; the result should then agree with the leaderboard to within rounding.
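
The row-to-dictionary step can be wrapped in a helper so it is written once. The sketch below is one possible shape, assuming the row behaves like a mapping with a .get method (a pandas Series and a plain dict both do) and uses the column names from the example above. line_from_row is a hypothetical name, and the fake row holds invented numbers so the block runs without pybaseball or a network connection.

```python
def line_from_row(row):
    """Build the stats dict that woba() expects from a mapping-like row."""
    def count(column, default=None):
        value = row.get(column, default)
        if value is None:
            raise KeyError(f"required column missing: {column}")
        return int(value)

    line = {
        "AB": count("AB"),
        "BB": count("BB"),
        "IBB": count("IBB", 0),   # optional: defaults to zero if absent
        "HBP": count("HBP"),
        "SF": count("SF", 0),     # optional: defaults to zero if absent
        "2B": count("2B"),
        "3B": count("3B"),
        "HR": count("HR"),
    }
    # Derived fields: singles from total hits, uBB from total walks.
    line["1B"] = count("H") - line["2B"] - line["3B"] - line["HR"]
    line["uBB"] = line["BB"] - line["IBB"]
    return line


# Invented stand-in for a DataFrame row; note it has no IBB or SF column.
fake_row = {"AB": 500, "H": 163, "2B": 30, "3B": 3, "HR": 35, "BB": 60, "HBP": 8}
line = line_from_row(fake_row)
print(line["1B"], line["uBB"], line["IBB"], line["SF"])
```

Required columns fail loudly while the two optional ones fall back to zero, which keeps the "hair off" case separate from a genuinely broken row.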

The bottom line

wOBA looks like an advanced stat and behaves like one, but under the hood it is a weighted average you can build in an afternoon: a dictionary of run values, a function that sums and divides, and a denominator that counts the plate appearances that matter. Type the weights as a dict, write woba() once, and you can recompute the number for any hitter, any season, the moment you can get the counts. Just remember the weights are a yearly download, not a constant, and that the version here is representative rather than exact. The same machinery also underlies expected wOBA: Statcast replaces the observed result of each batted ball with an estimate based on exit velocity and launch angle, then applies the same kind of weights.

Sources & Further Reading

Free textbook: Chapter 6: Numerical Summaries: Center, Spread, and Shape — the theory behind this, at DataField.dev.
FanGraphs Library — the wOBA entry and the annual “Guts!” table of exact yearly linear weights.
FanGraphs — season wOBA leaderboards to check your computed numbers against.
Baseball Savant — Statcast expected-wOBA, the same formula applied to batted-ball inputs.