Nearly every familiar pitching stat (ERA, FIP, even the expected metrics) waits for something to happen. It waits for the batter to swing, for the ball to be put in play, for the run to score or not. Only then does it render a verdict. Stuff+ does something stranger and more useful: it grades a pitch the instant it leaves the hand, before the hitter has decided whether to swing, based purely on the pitch’s physical traits.
The premise is that a pitch’s physical characteristics (how hard it was thrown, how much it broke, how it spun, where it was released, and how it played off the pitcher’s other offerings) tell you most of what you need to know about how good it was. Outcomes are noisy; the pitch’s shape is far less so. A family of models built on that idea, Stuff+, Location+, and Pitching+, has quietly become one of the sharpest early-warning systems in the sport.
Grading the pitch, not the result
Imagine two identical 97-mph fastballs with the same late ride and the same release point. One gets fouled straight back; the other gets hit 420 feet because the hitter happened to be sitting dead-red. To ERA, those are wildly different pitches — one harmless, one a disaster. To a stuff model, they are the same pitch, graded identically, because the model never looks at what the hitter did. It looks at what the pitcher did.
That is the whole conceptual move. A Stuff+ model is trained on tracking data — velocity, movement in both planes, spin rate and spin axis, extension and release point — and learns which combinations of those traits have historically led to good outcomes across hundreds of thousands of pitches. It then scores any individual pitch on that learned relationship. The result is a measure of raw nastiness that is, by design, blind to luck, defense, ballpark, and the batter’s guess.
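To make the idea concrete, here is a deliberately tiny sketch of "learn from shape, then grade by shape." It is not how any published Stuff+ model is built: the pitch history, the run values, and the two-feature binning are all invented for illustration, and real models use far more features and a proper learning algorithm. The point is only that the grading function never receives the outcome of the pitch it is scoring.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (velocity_mph, induced_vertical_break_in, run_value).
# Run value is from the pitcher's side here, so lower is better.
HISTORY = [
    (97.1, 18.2, -0.06), (96.8, 18.6, 0.10), (97.3, 18.4, -0.07),
    (96.9, 18.1, -0.05), (91.2, 13.0, 0.05), (90.8, 12.6, 0.12),
    (91.1, 13.3, -0.02), (90.9, 12.9, 0.08),
]

def shape_key(velo, ivb):
    """Bucket a pitch by physical traits only (2 mph by 2 inch bins)."""
    return (int(velo // 2), int(ivb // 2))

def fit(history):
    """Average the historical run value inside each shape bucket."""
    buckets = defaultdict(list)
    for velo, ivb, run_value in history:
        buckets[shape_key(velo, ivb)].append(run_value)
    return {key: mean(vals) for key, vals in buckets.items()}

def grade(model, velo, ivb):
    """Expected run value for this shape. The pitch's own result is never an input.

    Raises KeyError for a shape the toy history has never seen.
    """
    return model[shape_key(velo, ivb)]

model = fit(HISTORY)
fouled_back = grade(model, 97.0, 18.3)    # the fastball that was fouled off
hit_420_feet = grade(model, 97.0, 18.3)   # the identical fastball that was crushed
soft_fastball = grade(model, 91.0, 13.0)  # a slower pitch with less ride
```

The two 97-mph fastballs receive the same grade because they share a shape, and both grade better (lower expected run value) than the softer pitch.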
Stuff+, Location+, and Pitching+
The family splits the job into two halves and then puts them back together. Stuff+ grades the pitch’s physical quality alone — its movement and velocity and how it tunnels off the arsenal — deliberately ignoring where it ended up in or out of the zone. It is the answer to “how filthy was that pitch?”
Location+ grades the other half: command. It asks where the pitch was thrown and how valuable that location was, given the count and the pitch type, regardless of how nasty the movement was. A perfectly placed 89-mph fastball on the black can score well in Location+ even if its Stuff+ is pedestrian. Pitching+ covers both halves at once: a single number that approximates total per-pitch effectiveness, stuff and command together. In the FanGraphs version it is a separate model that sees both the pitch’s physical traits and its location, not a simple average of the other two. Think of Stuff+ as the raw weapon, Location+ as the aim, and Pitching+ as the shot.
Reading the scale
The scale is the friendliest thing about these metrics, because it is one you already know. Each is centered so that 100 is league average, and it reads much like OPS+ and wRC+ on the hitting side: every point above or below is roughly a percentage point better or worse. A pitch or a pitcher at 110 grades out as roughly ten percent better than average; one at 85 is well below the line. Higher is always better, for all three flavors.
That shared scale makes the numbers immediately legible. You do not need to memorize what counts as a good spin rate or a good amount of horizontal break — the model has already translated all of those raw physical inputs into a single index against the league. A starter whose four-seamer grades at 120 in Stuff+ has, by the model’s reckoning, a genuinely plus pitch; one whose whole arsenal sits below 95 in Pitching+ is working with thin material no matter what his ERA says on a given night.
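One way to picture that translation step is a simple rescaling of raw model output against the league. The sketch below is an assumption, not the published method: it takes hypothetical expected run values (lower is better), flips the sign so that higher is better, centers the league at 100, and assigns a made-up 10 index points per standard deviation. Actual models may scale their indexes differently.

```python
from statistics import mean, pstdev

def plus_index(values, points_per_sd=10.0):
    """Map raw run-value estimates (lower is better) onto a 100-centered index.

    points_per_sd is a hypothetical choice for illustration only.
    """
    league_mean = mean(values)
    league_sd = pstdev(values)
    if league_sd == 0:
        raise ValueError("need some spread in the league to build an index")
    # league_mean - v flips the sign so that better pitches land above 100.
    return [100.0 + points_per_sd * (league_mean - v) / league_sd for v in values]

# Hypothetical expected run values per pitch for five pitchers.
raw = [-0.03, -0.01, 0.00, 0.01, 0.03]
scores = plus_index(raw)
```

Whatever the exact spread, the reader-facing property is the same: the league average sits at 100 and better pitches sit above it.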
Why it stabilizes so fast
Here is the property that makes Stuff+ genuinely valuable rather than merely clever: it stabilizes far faster than outcome stats. Barring injury or a deliberate change, a pitcher’s velocity and movement are nearly the same from his first start of the season to his last, because they are physical traits, not results. A model built on them therefore settles into a reliable reading after only a handful of starts. ERA, by contrast, can stay misleading for months on a small sample, and even FIP and xFIP, which strip out defense and sequencing, still need a real pile of strikeouts, walks, and home runs to mean much.
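A small simulation shows why a physical trait settles sooner than an outcome. Every number here is an assumption chosen for illustration: a true fastball velocity of 95 mph that wobbles 0.6 mph from start to start, and a true 4.00 ERA whose single-start value swings by 3.5 runs per nine. The sketch draws many imaginary pitchers, averages each one's first five starts, and measures how widely those five-start averages scatter relative to the true value.

```python
import random
from statistics import mean, pstdev

def relative_spread(true_value, per_start_sd, n_starts, n_sims, rng):
    """Scatter of the n-start average across simulated pitchers,
    expressed as a fraction of the true value."""
    averages = [
        mean(rng.gauss(true_value, per_start_sd) for _ in range(n_starts))
        for _ in range(n_sims)
    ]
    return pstdev(averages) / true_value

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical inputs: a stable physical trait versus a noisy outcome stat.
velo_spread = relative_spread(95.0, 0.6, n_starts=5, n_sims=2000, rng=rng)
era_spread = relative_spread(4.00, 3.5, n_starts=5, n_sims=2000, rng=rng)

print(f"5-start velocity scatter: {velo_spread:.1%} of true value")
print(f"5-start ERA scatter:      {era_spread:.1%} of true value")
```

Under these assumptions the five-start velocity reading lands within a fraction of a percent of the truth, while the five-start ERA is routinely off by a large share of its true value. The normal-distribution model of single-start ERA is crude, but the gap in scale is the point.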
Because the inputs stabilize early, the output is predictive early. If a pitcher’s stuff grades out elite in April while his ERA is still bloated, the stuff model is often the truer signal, and the runs tend to follow. This is the same done-versus-deserved logic that animates expected stats and barrel rate on the contact side — isolate the repeatable skill, ignore the noisy result, and you get a better read on what comes next. A pitcher with great stuff and bad luck is a buy; the converse is a warning.
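The buy-versus-warning logic can be written down as a simple rule. The thresholds below (10 index points from average, 0.75 runs of ERA from a 4.00 league mark) are hypothetical cutoffs picked for the sketch, not values taken from any model or leaderboard.

```python
def early_read(stuff_plus, era, league_era=4.00, margin=10, era_gap=0.75):
    """Compare the fast-stabilizing grade with the slow-stabilizing result.

    All thresholds are illustrative assumptions.
    """
    great_stuff = stuff_plus >= 100 + margin
    poor_stuff = stuff_plus <= 100 - margin
    bloated_era = era >= league_era + era_gap
    shiny_era = era <= league_era - era_gap

    if great_stuff and bloated_era:
        return "buy"       # the stuff says the runs should come down
    if poor_stuff and shiny_era:
        return "warning"   # the results are outrunning the stuff
    return "no signal"

april = [
    ("Pitcher A", 118, 5.40),  # hypothetical: elite stuff, ugly ERA
    ("Pitcher B", 88, 2.10),   # hypothetical: thin stuff, sparkling ERA
    ("Pitcher C", 103, 3.90),  # hypothetical: average on both counts
]
reads = {name: early_read(stuff, era) for name, stuff, era in april}
```

A rule this blunt is only a first filter; the grade and the ERA can disagree for reasons other than luck, including the command and sequencing effects a stuff grade does not see.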
Where it comes from — and where it falls short
Stuff+ is not one canonical number but a small ecosystem of public and proprietary models. FanGraphs publishes a widely cited public version, and there are well-known siblings — PitchingBot, the models built around the Driveline community, and various in-house team systems that never see daylight. They share the philosophy but differ in their training data, their features, and exactly how they weight movement against velocity against location, which means they do not always agree on a given pitcher.
That disagreement is the first honest caveat: when two reputable stuff models rank the same arm differently, neither is simply wrong — they are different models. The second caveat is that command is genuinely harder to model than stuff, so Location+ is on shakier ground than Stuff+; predicting the value of a location depends on intent the data can only infer. And the third is that these models can miss sequencing and tunneling effects — the way a pitch plays off the one before it, or off the hitter’s expectation — beyond what the per-pitch features capture. A nasty pitch thrown predictably is worth less than the same pitch thrown to set up the next one, and a per-pitch grade only partly sees that.
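When two models disagree, it helps to measure how much. A minimal sketch, using invented scores for five pitchers from two hypothetical models, ranks the pitchers under each model and computes Spearman's rank correlation with the no-ties formula. It also reports which pitchers move the most between the two rankings.

```python
def ranks(scores):
    """Rank pitchers from best (1) to worst; assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def spearman(scores_a, scores_b):
    """Spearman rank correlation for two score dicts with the same keys and no ties."""
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    n = len(rank_a)
    d_squared = sum((rank_a[name] - rank_b[name]) ** 2 for name in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical grades for the same five arms from two different stuff models.
model_a = {"A": 118, "B": 109, "C": 104, "D": 97, "E": 91}
model_b = {"A": 111, "B": 114, "C": 99, "D": 102, "E": 90}

rho = spearman(model_a, model_b)

# Rank shifts per pitcher (positive means model B likes him more than model A does).
rank_a, rank_b = ranks(model_a), ranks(model_b)
shifts = {name: rank_a[name] - rank_b[name] for name in model_a}
```

A correlation near 1 means the models broadly agree on the ordering even when individual pitchers swap places; the per-pitcher shifts show where a second opinion is worth seeking.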
The bottom line
Stuff+ answers the question every scout has always asked with their eyes — how good is that pitch? — and answers it with tracking data instead of adjectives. Stuff+ is the raw weapon, Location+ is the aim, Pitching+ is the two together, and all three live on the familiar 100-is-average scale. Their superpower is speed: because they grade physical traits rather than outcomes, they tell you something real after a few starts, long before ERA has made up its mind. Just remember that the models disagree, that command is the soft spot, and that a filthy pitch still has to be sequenced well — the grade is the start of the scouting report, not the end of it.
Sources & Further Reading
- FanGraphs — the public Stuff+, Location+, and Pitching+ leaderboards and model.
- FanGraphs Library — methodology notes on how the pitch-grading models are built and scaled.
- Baseball Savant — the underlying Statcast velocity, movement, spin, and release-point data the models are trained on.