BaseRuns and Cluster Luck: Why Run Totals Don't Match the Parts

Add up everything a lineup did over a game — the singles, the walks, the extra-base hits — and you have a tidy inventory of offensive events. What you do not have, not reliably, is the number of runs that scored. Runs are not a sum of events; they are a product of events arriving in the right order. Three singles bunched into one inning plate a run or two. The same three singles spread one apiece across the third, sixth, and eighth innings usually plate nobody at all.

That ordering, the sequencing of when the hits fall relative to the outs and the men on base, is where actual runs peel away from deserved runs. And sequencing, over a single season or any shorter stretch, is mostly luck. BaseRuns is the framework that estimates how many runs a team’s underlying events should have produced, sequence-blind, and the gap between that estimate and the real scoreboard has a name that has stuck: cluster luck.

Why runs aren’t a sum

Imagine two teams that, over a week, collect the identical line: the same hits, the same walks, the same total bases, the same number of outs. By every component measure they hit exactly alike. Yet one team can score noticeably more runs than the other, and neither did anything wrong. The difference is purely when the events happened.

A run requires a runner to reach and then be driven in before the third out ends the inning. Bunch your baserunners together and they drive each other home; scatter them across innings, each erased by outs before the next one arrives, and they strand harmlessly. The total inventory of offense is the same either way. This is the central, slightly uncomfortable fact: a meaningful slice of any short-term scoreboard is sequencing, and sequencing is not a skill a lineup can summon on command. It is, game to game, closer to a coin flip than a craft.
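A toy simulation makes the point tangible. The sketch below is only an illustration: the advancement rule (every single moves each runner up two bases) is an assumption chosen to keep the code short, and `score_game` is a hypothetical helper, not part of any published model. It scores the same inventory of events in different orders.

```python
import random
from collections import Counter

def score_game(events, outs_per_inning=3):
    """Score a flat list of 'single' / 'out' events under a toy rule.

    Assumption (illustrative only): a single scores any runner on second
    or third and moves a runner on first to third.
    """
    runs = 0
    outs = 0
    bases = [False, False, False]  # first, second, third
    for event in events:
        if event == "out":
            outs += 1
            if outs == outs_per_inning:
                outs = 0
                bases = [False, False, False]  # inning over, runners stranded
        else:
            runs += bases[1] + bases[2]      # runners on second and third score
            bases = [True, False, bases[0]]  # batter to first, first to third
    return runs

# Same inventory both ways: 3 singles and 27 outs.
bunched = ["single"] * 3 + ["out"] * 27
spread = []
for inning in range(1, 10):
    if inning in (3, 6, 8):
        spread += ["single", "out", "out", "out"]
    else:
        spread += ["out"] * 3

print(score_game(bunched))  # 1
print(score_game(spread))   # 0

# Shuffle a larger fixed line (9 singles, 27 outs) to see how much the
# run total moves when only the order changes. Toy detail: any singles
# that land after the 27th out are still played.
rng = random.Random(0)
line = ["single"] * 9 + ["out"] * 27
totals = Counter()
for _ in range(2000):
    order = line[:]
    rng.shuffle(order)
    totals[score_game(order)] += 1
print(sorted(totals.items()))
```

Every shuffled game has the identical component line, so any variation in the printed totals is sequencing and nothing else.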

What BaseRuns is trying to do

BaseRuns — a model originated by sabermetrician David Smyth — answers a deliberately narrow question: given a team’s rate of getting runners on base and its rate of advancing them, how many runs would that profile yield on average, with no credit or blame for the order events happened to fall in? It is an estimate of context-neutral run production, the runs the parts deserve.

The appeal is that it sidesteps the noisiest input entirely. By never looking at sequencing, BaseRuns gives you a number that reflects only the quality and quantity of the events — the on-base ability and the slugging — and not the luck of their arrangement. Compare it against the runs a team actually scored and the difference isolates how much sequencing helped or hurt. That residual is the quantity worth studying, because unlike the events themselves, it does not tend to persist.

The structure of the formula

BaseRuns has a distinctive shape that is worth seeing even if you never compute it by hand. It is built from four quantities, usually written A, B, C, and D:

Runs = ( A × B ) / ( B + C ) + D

Read it piece by piece. A is baserunners — roughly the events that put a man on (hits and walks, minus home runs, which are handled separately). B is an advancement term, a weighted measure of how effectively those runners get moved around the bases, driven mostly by extra-base power. C is outs. And D is home runs, added in flat at the end because a home run scores its hitter no matter what — it needs no help from sequencing, so it sits outside the fraction.

The engine is the middle fraction, B / ( B + C ). Think of it as the share of baserunners who come around to score: the rate at which advancement beats out the clock of outs. Multiply your runners (A) by that scoring rate and you get the runs the parts imply; then add the home runs (D) that score themselves. The exact coefficients, which sit mostly inside B, are calibrated to a season’s run environment (FanGraphs publishes team BaseRuns and is the place to read the current weights), but the skeleton above is the whole idea. Crucially, nowhere does the formula know or care what inning anything happened in.
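For readers who want to see the skeleton run, here is a minimal sketch. The weights inside B are one commonly cited simple version of the formula, used here as an assumption; they are not the current FanGraphs calibration, and fuller versions add terms such as hit-by-pitch and stolen bases. The function name `base_runs` and the sample team line are hypothetical.

```python
def base_runs(hits, walks, home_runs, total_bases, at_bats):
    """Estimate runs from component totals with the A/B/C/D skeleton.

    The coefficients in B are an assumed simple version, not the
    current-season weights published by FanGraphs.
    """
    a = hits + walks - home_runs  # baserunners, home runs excluded
    b = (1.4 * total_bases - 0.6 * hits
         - 3 * home_runs + 0.1 * walks) * 1.02  # advancement
    c = at_bats - hits            # outs (simple proxy)
    d = home_runs                 # home runs score themselves
    return a * b / (b + c) + d

# Invented team-season line, for illustration only.
estimate = base_runs(hits=1400, walks=550, home_runs=200,
                     total_bases=2300, at_bats=5500)
print(round(estimate, 1))
```

Note that the function takes only season totals. It has no argument through which inning or order could enter, which is the sequence-blindness the formula is built around.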

An illustrative example

A clearly hypothetical case makes the gap concrete. Picture a team whose season-long components — its hits, walks, power, and outs — plug into BaseRuns and yield an estimate of, say, 760 runs. That is what the parts say this offense was worth. Now suppose the actual scoreboard credited them with 805 runs.

The 45-run surplus is cluster luck: this lineup got its hits in unusually productive sequences — rallies that bunched, singles that happened to follow walks, two-out knocks that found runners already aboard. Nothing about it is fake; those runs really scored. But the surplus reflects timing, not a repeatable talent for timing, and the honest expectation is that it fades. A team running 45 runs ahead of its components is a strong candidate to score fewer next year even if it hits identically — a textbook case of regression to the mean. (Numbers invented to illustrate the arithmetic; no real club is implied.)
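The arithmetic is a single subtraction, sketched here with the same invented figures. The helper name `cluster_luck` is hypothetical, and the sign convention (positive means the team scored more than its parts imply) is a choice made for this example.

```python
def cluster_luck(actual_runs, base_runs_estimate):
    """Actual runs minus the sequence-blind estimate.

    Positive: favorable sequencing. Negative: sequenced against.
    """
    return actual_runs - base_runs_estimate

gap = cluster_luck(actual_runs=805, base_runs_estimate=760)
print(gap)  # 45
```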

The same logic for runs allowed

Everything above flips cleanly onto the other side of the ledger. A pitching staff and defense surrender a season’s worth of events too, and those events can be sequenced kindly or cruelly. Run BaseRuns on the components allowed and you get the runs a team should have given up; compare it to the runs actually allowed and you have the defensive half of cluster luck.

A team that allowed far fewer runs than its components imply got the friendly sequencing — the baserunners it permitted tended to strand — and is a candidate to give more back. One that allowed far more than its parts suggest was sequenced against and may improve. Put both halves together and BaseRuns yields an expected run differential built entirely from context-neutral components, which is often a better read on a team’s true quality than its actual differential. It pairs naturally with a hand-rolled run-differential power ranking: the actual differential tells you what happened, the BaseRuns differential tells you how much of it the team earned.
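Putting both halves together can be sketched as below. The `TeamSeason` class and its field names are hypothetical, the runs-allowed figures are invented alongside the 805 and 760 used earlier, and the BaseRuns estimates are assumed to have been computed elsewhere. Luck is signed so that positive always means the team benefited.

```python
from dataclasses import dataclass

@dataclass
class TeamSeason:
    runs_scored: float
    runs_allowed: float
    base_runs_scored: float   # BaseRuns estimate from the team's own events
    base_runs_allowed: float  # BaseRuns estimate from events it allowed

    @property
    def actual_diff(self):
        return self.runs_scored - self.runs_allowed

    @property
    def expected_diff(self):
        return self.base_runs_scored - self.base_runs_allowed

    @property
    def offense_luck(self):
        # Scored more than the parts imply: favorable sequencing.
        return self.runs_scored - self.base_runs_scored

    @property
    def defense_luck(self):
        # Allowed fewer than the parts imply: favorable sequencing.
        return self.base_runs_allowed - self.runs_allowed

    @property
    def total_luck(self):
        return self.actual_diff - self.expected_diff

# Invented numbers; no real club is implied.
team = TeamSeason(runs_scored=805, runs_allowed=700,
                  base_runs_scored=760, base_runs_allowed=730)
print(team.actual_diff)    # 105
print(team.expected_diff)  # 30
print(team.total_luck)     # 75
```

In this made-up case the team outscored opponents by 105 runs, but its components support only a 30-run edge. The remaining 75 runs split into 45 on offense and 30 on defense.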

How to use the gap

The practical move is to treat the BaseRuns gap the way you treat any luck signal — as a forecast nudge, not a verdict on the past. A first-place team coasting on a large positive cluster-luck gap is overperforming its parts and is a regression risk; a sub-.500 team with strong components and a big negative gap may be quietly better than its record. The events are the signal, the sequencing is the noise, and BaseRuns is the tool that separates them.

Two honest cautions keep this from becoming a crystal ball. First, cluster luck regresses strongly but never perfectly to zero, and a single season is a finite sample, so the residual is a tilt in the odds, not a guarantee. Second, BaseRuns is a model with its own approximations — it assumes average advancement behavior and cannot see a lineup genuinely built to manufacture runs, just as it cannot see a baserunning disaster. Used as a probabilistic adjustment rather than a prophecy, though, it is one of the cleaner ways to ask a deceptively hard question: did that team’s runs match its parts, and if not, which way should you bet?

The bottom line

Runs are events in sequence, and sequence is mostly luck. BaseRuns strips the sequencing out and tells you the runs a team’s components deserved; cluster luck is the leftover, the distance between what the scoreboard said and what the parts implied. It applies symmetrically to runs scored and runs allowed, and in both directions the gap tends to shrink. So when a club’s run totals sit far above or far below what its underlying hitting and pitching imply, do not take them at face value. Find the BaseRuns gap, remember that it regresses, and bet accordingly.

Sources & Further Reading

FanGraphs — team BaseRuns standings and the expected-vs-actual run differentials that surface cluster luck.
FanGraphs Library — the BaseRuns entry, including the A/B/C/D construction and the current-season coefficients.
SABR — background on run-estimation methods and the sequencing problem BaseRuns is built to solve.