Here is a piece of sabermetric folklore worth testing: a team’s run differential tells you more about its future than its actual won-lost record does. The claim sounds almost paradoxical — how can a made-up “expected” record beat the real one? — but it falls straight out of the data. Across four recent seasons, a team’s Pythagorean win percentage (built from runs scored and allowed) predicts next year’s record better than its actual win percentage does: r = 0.57 versus 0.54. It’s a small edge, but it’s real and it’s the right direction, and the reason why is the whole point.

0.57Pythagorean win% → next year's win%
0.54Actual win% → next year's win%
90consecutive-season pairs tested (2021-2024)

The test

For every team I lined up two numbers from one season against its record the following season: its actual winning percentage, and its Pythagorean winning percentage — the record it “should” have had given its runs scored and allowed (RS^1.83 / (RS^1.83 + RA^1.83)). Then I asked which one correlates more strongly with next year’s actual record across all 90 consecutive-season pairs from 2021 through 2024. Whichever wins is the better crystal ball.

The exhibit

A bar chart comparing two correlations with next season's win percentage: actual win percentage at 0.54 and Pythagorean win percentage at 0.57, across 90 consecutive-season pairs from 2021 to 2024. The Pythagorean bar is slightly taller.
How well each year-N measure predicts year-(N+1) win%, across 90 team-season pairs (2021–2024). The Pythagorean record (from run differential) edges the actual record, 0.57 to 0.54. Data: MLB Stats API standings + run differential.

The Pythagorean bar is the taller one. It’s not a blowout — 0.57 versus 0.54 — but the run-differential-based estimate beats the literal record at forecasting the future, every time you’d expect a fluke to favor one or the other. The team’s real won-lost record is the worse predictor of its own future record.

Why the “fake” record wins

The resolution is that a season’s record is part skill and part luck, and the luck doesn’t carry over. The biggest source of that luck is timing — how a team’s runs happened to be distributed across games. Bunch your runs into blowout wins and waste them in close losses and you’ll finish below your run differential; cluster them efficiently and you’ll finish above it. That over- or under-performance is mostly luck, and luck is, by definition, not repeatable.

Run differential strips that timing luck out. By looking at how many runs a team scored and allowed rather than when, the Pythagorean record captures the part of the season that reflects true quality and discards the part that was a coin flip. So when you carry it forward a year, you’re forecasting from the signal instead of the signal-plus-noise. The actual record, by contrast, drags last year’s good or bad timing luck into a prediction where it has no business being — which is exactly why it’s the worse forecaster.

A worked intuition

Picture two 85-win teams. One had a +90 run differential (a true ~91-win team that underperformed by six on bad timing); the other had a +5 differential (a true ~82-win team that overperformed by three on good timing). Their records are identical, so actual win% predicts the same future for both. But they are not the same team: the first is genuinely better and is the better bet to win more next year, while the second is likely to slide back. Pythagorean record sees that difference; the standings don’t. Multiply that across ninety team-seasons and you get the 0.57-versus-0.54 edge — the predictable consequence of forecasting from differential instead of wins. It’s the same logic behind why last year only weakly predicts this year at all: records regress, and run differential regresses less.

Where this read has limits

  • The edge is small, and the sample is modest. 0.57 vs 0.54 over 90 pairs from four seasons is a real but narrow gap; a different window would wobble the exact numbers. The direction (Pythagorean > actual) is the robust, repeatable finding documented across far longer histories — not the third decimal.
  • Neither number is a great predictor. Both top out near 0.55 because so much changes between seasons — roster turnover, injuries, aging. Run differential is the better baseline, not a strong one; real projection systems beat both by adding player-level forecasts.
  • Run differential has its own luck. It removes timing luck but not everything — a team’s RS/RA still carry some sequencing and health noise. It’s a better signal, not a pure one.
  • Park and league context. Raw runs aren’t park-adjusted; the Pythagorean estimate inherits whatever distortion the run environment adds, which a fuller model would strip out.

The takeaway

If you want to know how good a team really was — and how good it’s likely to be next year — look at its run differential, not its record. The Pythagorean “expected” win percentage beats the actual one at predicting the future (0.57 to 0.54) for a simple reason: it keeps the signal (how many runs you scored and allowed) and throws away the noise (the lucky timing that turned those runs into a few extra or fewer wins). The standings are what happened; run differential is closer to what was real — and what’s real is what carries over.

Reproduce it

The records and run totals are bundled in data_layer/standings_multiseason.json (2021–2024, MLB Stats API). For each consecutive-season pair, compute year-N actual win% and Pythagorean win% (from RS/RA), then correlate each with year-(N+1) win%. The comparison is regenerated by charts/chart_pythag_predicts_next.py. No network at build time, nothing hand-entered.

Sources & Further Reading