With a mere weekend to go in the regular season, the AL Rookie of the Year race will be a runaway for Nick Kurtz. The National League is up in the air. Who should be ROY? For reasons that are best left to psychiatrists, objective specifications of well-thought-out criteria always seem to suggest that the best rookie on your team should be ROY, and that any other conclusion reeks of media bias.
I am not going to assess Drake Baldwin‘s qualifications. But I am writing this piece to make three points:
- WAR is probably the best criterion we have for making judgments on this question; but
- WAR, while a pretty good explainer of actual ROY votes, admits of so many empirical exceptions that one oughtn’t be surprised when it doesn’t work, even approximately.
- You can make a statistical model which does a pretty good job of explaining the voting since 1949. If you believe it, Drake ought to be about a 1 to 6 favorite. I don’t quite believe that, because the Las Vegas odds, which have Horton at about a 1 to 3 favorite, are probably taking into account lots of things, like publicity and team success, that I’m not considering at all. Baldwin is currently around a 2-1 underdog.
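To translate those odds into probabilities, here is a minimal sketch. It assumes the usual convention that a “1 to 6 favorite” risks 6 to win 1 and a “2-1 underdog” risks 1 to win 2; the function name is mine, not anything from a betting library.

```python
def implied_probability(risk: float, win: float) -> float:
    """Break-even win probability implied by odds of `win` to `risk`.

    Assumed convention: a "1 to 6 favorite" risks 6 to win 1 (risk=6, win=1);
    a "2-1 underdog" risks 1 to win 2 (risk=1, win=2).
    """
    return risk / (risk + win)


print(round(implied_probability(6, 1), 3))  # Baldwin per the model: 0.857
print(round(implied_probability(3, 1), 3))  # Horton in Vegas: 0.75
print(round(implied_probability(1, 2), 3))  # Baldwin in Vegas: 0.333
```

The two Vegas figures add up to about 1.08 rather than 1, which is the bookmakers’ built-in margin, so treat them as rough rather than exact probabilities.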
WAR: What Is It Good For?
If you’re reading this, you’ve already seen enough about WAR that you already know about its general usefulness. But there are a few features of WAR that make it uniquely valuable in judging ROY. First, it is really the only tool we have which evaluates both hitters and pitchers in a common measure, and ROY is required to evaluate both rookie pitchers and rookie batters. I should stress that we shouldn’t expect it to be perfect at this, but there is no serious argument that it isn’t better than any other measure we have. If you think you have a better one, let me know. And for position players, the defensive component of WAR, however inadequately, automatically adjusts for defensive prowess and the importance of some positions. It sums the components into one metric. I listened to CJ say on Monday night that WAR doesn’t cut it for this purpose. But if you believe WAR accurately creates Wins (which we know it does, in the same way that Pythagorean Projections project wins and xBA predicts batting average) for both pitchers and position players, it is a fair tool to use, whatever its inaccuracies in any particular case.
Second, rookies are, by definition, players brought in to fill holes, either left by underperforming players or where there was no player at all at some position/role. They are essentially being judged against all the other replacement players without an MLB track record. So their value against those players is essentially what we want to know.
Third, WAR is independent of team quality. By using WAR, we can avoid the largely unproductive argument over whether Cade Horton is more worthy than Drake Baldwin because Cade Horton was contributing to a team that made the playoffs and Drake Baldwin was playing for a team that will be fishing next week.
Fourth, WAR is an objective measure, calculable for 1949 and 2025. Obviously no one voting for Don Newcombe in 1949 had any idea what WAR was. But we, who never saw Newcombe pitch or Del Crandall catch, can say that Newcombe‘s 5.7 WAR was a strong indicator that the 88% of voters who voted for him over Crandall (0.3 WAR) knew what they were doing. It also allows us to spot anomalies that we may not be able to explain. When Joe Charboneau won the 1980 AL ROY with 2.4 WAR while Britt Burns finished 5th with a 7 WAR season, we gain valuable information about the voting process.
Fifth, WAR is a counting stat. Rookies vary widely in when they debut, and someone will often make an argument for a player with a meteoric but brief stint. WAR does not credit these players very highly. And, as it transpires, neither do voters. If we calculate WAR per game played, which gives these players maximal credit, we find that such players have never won a ROY award. The archetypal example here is Bob “Hurricane” Hazle, who made his debut in late July 1957, hit .403, OPSed 1.126 and was instrumental in getting the Braves to the World Series. He finished fourth with 1.8 WAR, though he did get a first-place vote. But I guarantee you the Braves Journal of its day was indignant at his egregious snub. (Note that pitchers can actually accumulate WAR pretty quickly: the Mets’ Nolan McLean slightly leads Cade Horton in WAR after making only 7 starts. That’s partly why I think voters implicitly downgrade pitcher WAR.)
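To see why the counting-stat versus rate-stat distinction matters, here is a small sketch with an entirely made-up candidate pool (hypothetical players and numbers, not real stat lines). Ranking by total WAR and by WAR per game can crown different players.

```python
# Made-up candidate pool for illustration; these are NOT real players or stats.
candidates = [
    {"name": "Full-season regular", "war": 4.0, "games": 150},
    {"name": "Late-season call-up", "war": 1.6, "games": 40},
    {"name": "Platoon player", "war": 2.5, "games": 100},
]


def war_per_game(c):
    """Rate version of WAR: gives maximal credit to a short, brilliant stint."""
    return c["war"] / c["games"]


leader_by_war = max(candidates, key=lambda c: c["war"])
leader_by_rate = max(candidates, key=war_per_game)

print(leader_by_war["name"])   # Full-season regular
print(leader_by_rate["name"])  # Late-season call-up
```

The Hazle-style candidate only tops the list under the rate measure, and that is exactly the kind of player the voting history says does not win.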
So I’m going to assume that WAR is the basic assessment tool to determine ROY, but I’m not going to be dogmatic about it. I will create one category of ROY winners who led their peers in WAR, but I am respectful enough of the voters to credit any winner within 1.5 WAR of the maximum as reasonable. Further, I note that I didn’t go back and add potential candidates with good WAR but no votes. I only used the players actually voted for.
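The bucketing rule just described can be sketched as a small function. The function name and labels are mine, and the example pools are simplified two-man pools built from WAR figures quoted in this post, not the full lists of vote-getters.

```python
def classify_winner(winner_war, pool_wars, tolerance=1.5):
    """Bucket an ROY winner by his WAR gap to the best WAR among vote-getters.

    pool_wars should hold the WAR of every candidate who received votes,
    winner included (players with no votes are deliberately left out).
    """
    best = max(pool_wars)
    if winner_war >= best:
        return "led in WAR"
    if best - winner_war <= tolerance:
        return "reasonable (within tolerance)"
    return "anomaly"


# Simplified two-man pools using WAR figures quoted in this post.
print(classify_winner(5.7, [5.7, 0.3]))  # 1949 NL: Newcombe vs. Crandall
print(classify_winner(3.9, [3.9, 6.4]))  # 2010 NL: Posey vs. Heyward
print(classify_winner(2.4, [2.4, 7.0]))  # 1980 AL: Charboneau vs. Burns
```

The first call prints “led in WAR” and the other two print “anomaly”; a winner at 3.0 against a 4.2 leader would land in the “reasonable” bucket.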
Spill the Wine: By The Numbers
There have been 152 ROYs, drawn from 941 candidates receiving votes, an average of a little over 6 candidates per league-year. Of these 941, 71 led their cohort in WAR and won ROY. Another 48 won with WAR values within 1.5 of the leader. As I said above, I am not so dogmatic about WAR that I discount all manner of other assessment. I think these 48 winners are reasonable. That doesn’t mean that voters might not have been misled by gaudy statistics that we now know to be less relevant, like batting average. I also think voters have systematically preferred position players to pitchers. A pitcher historically seems to need to overperform WAR-wise to be competitive. (Obviously this can also be taken as an implicit critique of WAR: that pitchers’ WAR is somehow biased upward relative to their actual contribution. This proposition has been debated extensively, and my take on that debate is that the critics are wrong, but those who believe the critics will feel that the WAR-downweighting of pitchers in ROY voting is appropriate.)
Of the remaining 33, none fell short on WAR but excelled at WAR per game, like Hurricane Hazle. So these 33 cases are genuine anomalies. But before looking at them, let’s celebrate some success: of 152 ROYs, the vast majority of whom were selected before WAR or any of its sabermetric precursors were even conceived of, we can explain, or at least roughly justify, 119. (I apologize to those pulling hard for Drake Baldwin, but that also means nothing I say here will settle this debate: Baldwin (3.0 WAR as of 9/25) and Horton (2.0 WAR) are both reasonable choices. For that matter, so are Caleb Durbin (3.0), Matt Shaw (3.0) and even Nolan McLean (2.0).)
So let’s look at the anomalies:
The first thing to note is that in 25 of these cases, the guy with the higher WAR who was denied ROY was a pitcher. Even where the voters are comparing one pitcher to another, they make “mistakes” (in scare quotes, because calling them mistakes assumes that pitchers’ WAR accurately reflects value). But note that this method successfully selected the high or near-high WAR pitcher lots of times. What misled voters were wins and saves, those terribly team-dependent stats. In 2003, Brandon Webb was better than Dontrelle Willis in just about every metric, but Willis was 14-6 and Webb was 10-9. Webb finished third in the voting. The most egregious was 2000, where Kazuhiro Sasaki’s 37 saves in 62 innings of work earned him ROY despite a 1.3 WAR, while Mark Redman and Barry Zito both put up 3.4 WAR and finished 8th and 9th in the voting with pedestrian W-L records.
And sometimes I think that pitchers just weren’t regarded as all that valuable. The 1992 NL was a year without a great candidate, and WAR leader Ben Rivera was actually traded by Atlanta to the Phillies before he became an effective starter in June. Still, Eric Karros’ WAR of 0.4 was historically low, and voters had a number of 2+ WAR players to choose from. That vote was one of the only ones in which a negative-WAR player got votes: Donovan Osborne finished 5th without even performing at replacement level.
One thing that the rise of sabermetrics has done is make this sort of thing much less likely. No big WAR loser has won since Buster Posey (3.9) beat Jason Heyward (6.4) 15 years ago. The last winner with a WAR deficit of more than 1.0 was Craig Kimbrel (over Vance Worley, who finished third; Freddie Freeman was runner-up). Since then, the 7 players who won with a lower WAR than someone else in their pool were all within 1.0 WAR. Even if WAR is inaccurate, voters seem to be using an internal mental map which is much closer to WAR than it used to be.
The 2007 NL contest was interesting as well. Ryan Braun was a better hitter than Troy Tulowitzki, though Tulowitzki was no slouch. But Braun’s 5.2 oWAR was offset by an atrocious -2.9 dWAR. (Could he really have been that bad defensively? He was never that bad the rest of his career, once those sweet sweet PEDs began to flow.) Tulowitzki, playing a premier defensive position, combined a 3.7 oWAR with a league-leading 3.9 dWAR. While you might think this is a sign that voters don’t really value defense, this was one of the closest votes in ROY history, and Braun’s gaudier offensive numbers barely carried the day. But the defensive gap between the two players, almost 7 WAR, should have been enough to give Tulowitzki the nod, even if you don’t think the difference was really that big.
No. Really. Use Math and Tell Me Who’s Going to Win
So of course I had to make a statistical model, right? I made a logit model, which is a construct you use to estimate the probability of things, with the following specification:
ln(WinProb / (1 - WinProb)) = C + a·NumberOfCandidates + b·Pitcher + c·RelativeWAR + d·RelativeOPS + e·RelativeERA
I don’t really know how many candidates will get votes, but I’ll assume it’s going to be 4: Baldwin, Horton, Shaw, and Durbin. RelativeWAR is just the difference between your WAR and the mean WAR of all candidates; RelativeOPS and RelativeERA are the same thing, but only for position players and pitchers, respectively. Pitcher = 1 if you’re a pitcher and 0 otherwise.
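Here is a minimal sketch of building those regressors for one league-year. The candidate stat lines are made up for illustration, and setting the “other” rate stat to zero (relOPS for pitchers, relERA for position players) is my assumption about how the two were kept separate.

```python
from statistics import mean


def relative_regressors(field):
    """Build the model's regressors for every candidate in one league-year.

    Each candidate is a dict with "name", "war", "pitcher" (bool), and
    "ops" (position players) or "era" (pitchers).
    """
    mean_war = mean(c["war"] for c in field)
    hitter_ops = [c["ops"] for c in field if not c["pitcher"]]
    pitcher_era = [c["era"] for c in field if c["pitcher"]]
    mean_ops = mean(hitter_ops) if hitter_ops else 0.0
    mean_era = mean(pitcher_era) if pitcher_era else 0.0

    rows = []
    for c in field:
        rows.append({
            "name": c["name"],
            "numcan": len(field),
            "pitcher": 1 if c["pitcher"] else 0,
            "relWAR": c["war"] - mean_war,
            # Assumption: the "other" rate stat is zero for each player type.
            "relOPS": 0.0 if c["pitcher"] else c["ops"] - mean_ops,
            "relERA": c["era"] - mean_era if c["pitcher"] else 0.0,
        })
    return rows


# Made-up field for illustration (not real stat lines).
field = [
    {"name": "Hitter A", "war": 3.0, "pitcher": False, "ops": 0.800},
    {"name": "Hitter B", "war": 3.0, "pitcher": False, "ops": 0.700},
    {"name": "Pitcher C", "war": 2.0, "pitcher": True, "era": 3.00},
]
for row in relative_regressors(field):
    print(row)
```

Note that a lone pitcher in the field always gets relERA = 0, because he is being compared only with himself.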
I have to say I thought going in that there was no chance this modeling would be any good, but I was surprised to see I was wrong. This isn’t a great model, but it isn’t terrible.
Here are the results:
| Dep. Variable: | winner | No. Observations: | 941 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 935 |
| Method: | MLE | Df Model: | 5 |
| Date: | Wed, 24 Sep 2025 | Pseudo R-squ.: | 0.2574 |
| Time: | 20:38:27 | Log-Likelihood: | -308.99 |
| converged: | True | LL-Null: | -416.11 |
| Covariance Type: | nonrobust | LLR p-value: | 2.539e-44 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.2398 | 0.286 | 0.837 | 0.403 | -0.322 | 0.801 |
| pitcher | -1.0195 | 0.253 | -4.023 | 0.000 | -1.516 | -0.523 |
| relWAR | 0.8831 | 0.094 | 9.388 | 0.000 | 0.699 | 1.067 |
| relOPS | 6.1633 | 1.925 | 3.201 | 0.001 | 2.390 | 9.937 |
| relERA | -0.5861 | 0.330 | -1.774 | 0.076 | -1.234 | 0.062 |
| numcan | -0.2846 | 0.041 | -7.009 | 0.000 | -0.364 | -0.205 |
There are several things to observe about this. First, all else equal, pitchers do worse than position players. (How much worse is an empirical question.) You can tell this because the coefficient on pitcher is negative. But a pitcher with 1 WAR more than another candidate is nearly even with him. He loses about 1.02 in log-odds for being a pitcher (the coefficient is -1.0195), but gets 0.8831 back for each WAR of superiority. Indeed, an advantage of about 1.15 WAR makes him dead even.
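The break-even point is just the ratio of the two coefficients from the table. This sketch ignores relOPS and relERA, which is the same simplification the paragraph makes.

```python
# Coefficients copied from the logit output above.
PITCHER_COEF = -1.0195
REL_WAR_COEF = 0.8831

# WAR edge at which the pitcher penalty is exactly cancelled:
# PITCHER_COEF + REL_WAR_COEF * edge = 0
break_even = -PITCHER_COEF / REL_WAR_COEF
print(round(break_even, 2))  # 1.15
```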
So we can now apply this to the NL ROY data. It really doesn’t like Horton, giving him only a 6 percent chance against 40 percent for Baldwin, with Shaw and Durbin around 24 and 30 percent respectively. This is unsurprising given his lower WAR, his being a pitcher, and his inability to improve his status by having a lower ERA than other pitching candidates, since there aren’t any. (It might be fairer to compare him not to other candidates but to the league, but that would require me to collect and estimate a whole new data set.)
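Applying the fitted model to one candidate means plugging his regressors into the linear predictor and taking the inverse logit. This is a minimal sketch using the rounded coefficients from the table; the example row is a pitcher in a four-man field 0.75 WAR below the field mean (2.0 against a 2.75 average, using the rounded WAR figures quoted above), with no other pitchers, so relERA = 0.

```python
import math

# Coefficients from the logit output above (rounded as printed).
COEF = {
    "const": 0.2398,
    "pitcher": -1.0195,
    "relWAR": 0.8831,
    "relOPS": 6.1633,
    "relERA": -0.5861,
    "numcan": -0.2846,
}


def win_probability(row):
    """Inverse logit of the fitted linear predictor for one candidate."""
    log_odds = COEF["const"] + sum(
        COEF[name] * row[name] for name in COEF if name != "const"
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Pitcher, four-man field, 0.75 WAR below the mean, no other pitchers.
pitcher_row = {"pitcher": 1, "relWAR": -0.75, "relOPS": 0.0,
               "relERA": 0.0, "numcan": 4}
print(round(win_probability(pitcher_row), 3))
```

With rounded inputs this lands near 7 percent, so don’t expect it to match the figures above to the decimal.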
If Horton and Baldwin get all the votes, the result is even more lopsided. In that case, Drake has an 87% chance against 13% for Cade. Do I believe that? Well, since we only get one shot at it, there’s no way to refute it. If Horton wins, it’ll just be one of those 13% of the cases that are weird.
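A plain logit scores each candidate separately, so the raw probabilities in a field need not add up to 100 percent. One simple way to turn them into shares of a single race is to rescale them; this is an assumption for illustration, not necessarily how the figures above were produced, and the raw numbers below are hypothetical.

```python
def share_of_field(raw_probs):
    """Rescale independent per-candidate probabilities so they sum to 1.

    Assumption: one simple way to turn per-candidate logit outputs into a
    single race; not necessarily the method behind the percentages above.
    """
    total = sum(raw_probs.values())
    return {name: p / total for name, p in raw_probs.items()}


# Hypothetical raw probabilities for a two-man race.
shares = share_of_field({"Position player": 0.50, "Pitcher": 0.15})
print({name: round(s, 3) for name, s in shares.items()})
# {'Position player': 0.769, 'Pitcher': 0.231}
```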
So there you have it: math says Baldwin is going to win.
Some Final Thoughts
The 2025 NL ROY contest is unusual in that no candidate has more than 3.0 WAR. This has happened a fair number of times, though, and the fact that nobody has had a 4 WAR season moots the argument that some player was robbed if he doesn’t win. That said, 2.0 WAR pitchers winning ROY are pretty thin on the ground: only 24 winners have had 2.0 WAR or less, and 2 of those are the 2020 winners, who obviously shouldn’t count. The last pitcher to win in a non-Covid year with a WAR under 2.0 was Kazuhiro Sasaki in 2000, but he had 37 saves in Seattle’s dream season, second in the AL and one more than Mariano Rivera. It was a bad vote, as I said above, but partly understandable in that no position player had a WAR higher than 2.4. (But I still would have picked Barry Zito, who barely got any votes at all.)
Closing “Joke”
But when it comes down to it, as an economist, the choice is obvious: Cade has released 3 EPs. Drake is an international superstar. Case closed.



Rob Copenhaver, JC’d:
I think it’s kinda dumb for Charlie to get a start. He’s not Greg Maddux. He doesn’t need a victory lap.
I’d be okay with using him as an opener, I think. Let him pitch the first or second inning, then let him come out to the mound, and then give him the hook and let him get a standing ovation. He’ll get it, and he’ll deserve it.
Also for Rob in discussion of whether Bummer is good value at $9 million:
https://www.espn.com/mlb/story/_/id/43837137/sources-jakob-junis-guardians-agree-1-year-45m-deal
Phew. We will be paying JJ, PJ, and AB $25 million, which is the cost of an entire bullpen of Junises. I’ve always felt that AA overpays relief pitching (Will Smith contract, for example)
There can be no better testament to the general perception of Bummer’s worth than our inability to trade him, or anyone else, at the deadline, when the whole league knew we were shopping our pen.
I don’t know why people say the Braves have trouble developing major league talent. I just watched Kolby Allard pitch to Justyn-Henry Malloy in the thick of a pennant race!
Steve Howe is an interesting one with his 0.4 WAR ROY winning season. It wasn’t a great year for rookies in the NL, but Howe wasn’t close to the best rookie reliever. There were at least 3 better relievers including Jeff Reardon, but the best was Al Holland at 3 WAR with a sub-2 ERA. And if you didn’t like giving a relief pitcher the ROY, you could’ve picked Bill Gullickson with a strong season as a starting pitcher or Lonnie Smith with his .339 batting average. Of course they didn’t have WAR back then but just the gaudy rate stats favored other players. It probably helps to play for the Dodgers.
There are a lot of interesting stories in this data about what voters have done in the past. Sasaki, Howe and Kimbrel are all examples of “pitchers are disfavored unless you’re a reliever for a good team,” which is a really odd way to think about things.
I can only assume that, based on his rookie season, AA has already locked Hurricane Hazle down to a long-term extension.
But re: the Good Team corollary (w/r/t Sasaki & Howe etc.) mentioned above, I know you said you didn’t take team success into account in your model, but as you went through the data, did you notice any kind of correlation between winning ROY and team success? Or might that be insane to try to map over time, in part b/c the definition of success has changed so much as the playoff formats have changed? The Cubs having a good season (and the fact that they are The Cubs/a brand name) just seems to be an inseparable part of the Horton buzz. But this was a fun read, thanks for this, JonathanF.
Thanks.
Because it wasn’t in the BRef data pages I scraped, I didn’t do anything with record, but it was in the back of my mind. I can try putting in winpct and/or playoff team as another variable without too much trouble, but you’re right… “playoff team” used to mean “WS team” at the beginning of the data, and the meaning of “.54 winpct” has moved from “OK team” to “WS contender.” Maybe I’ll try this while I’m watching us choke the Ryder Cup.
Greetings from Athens, Ga…
Thanks for the deep well of context, JonF.
And Britt Burns… his rookie season was a great example of why sabermetrics folks would lose their minds over certain award injustices, esp. in the ’80s. Hey, he only went 15-13… never mind the rest of his stats. (Plus, to be fair to the voters, the meteoric Joe Charboneau, who drank beer thru his nose and had a song written for him, was a more colorful story.)
Go Dawgs.
Use your new LED watch well. I think those will be spectacular in a night game.
OK, cph. I was able to implement winning pct. It works in the modelling, but is not a very strong factor. It only moves Horton from 6 to 7 percent, though it does drop Baldwin from 40 percent to 36 percent. If I were trying to write a publishable academic article, there would be a lot more work to refine this aspect of the model… but I’m not doing that.
ububba: Thx. Of course ROY isn’t really intended as a predictive vote, but Charboneau’s career WAR was under 2 and Burns’ was 17.9.
I think it’ll be interesting to see whether Nacho Alvarez develops as a hitter. He, like Justyn-Henry Malloy and Vaughn Grissom, is a guy who came through our high minors with a reputation as a good hitter for contact, with maybe only fair power and no premium defensive position. Thus far, Malloy and Grissom have proven to be tweeners, basically Quad-A guys. That’s significant, given that they’re three of the most successful hitting prospects the team has produced in recent years.
Other than Michael Harris II, three of the four best hitting prospects that the team has produced have been catchers: William Contreras, Shea Langeliers, and Drake Baldwin. The next guys on the list are probably Malloy, Grissom, and Alvarez in some order, and possibly Braden Shewmake, too, though the fact of the matter is that part of what’s going on here is that Shewmake could only be a top-10 prospect in a very thin system.
I do wonder whether the Braves will push for a reunion with guys like Grissom and Malloy, who likely would not cost overmuch to acquire at this point, and who would provide our organization with some welcome depth if nothing else. (Obviously, I’d love it if we could get a guy like Jordan Walker, who clearly needs a change of scenery in the worst way and who grew up in Stone Mountain, but the Cardinals don’t appear willing to give him away for free just yet.)
How much of this is minor league coaching and how much is drafting? And to the extent it is drafting, how much of it is drafting high (or low depending on how you think of it) because your team is good? And how much is talent assessment in the draft versus simply deciding you need to load up on pitchers and you trade off position player prowess to get there.
Oh, I think what you’re mentioning there is a lot of it. Grissom’s an 11th rounder, Malloy’s a 6th rounder, Alvarez is a 5th rounder. By any measure, they’re draft success stories: the average player picked in those rounds never makes the majors.
On the other hand, the survivorship bias illustrates a deeper problem: if these guys are our biggest successes, that’s because our first, second, and third-rounders have struggled to pan out, as I’ve complained about elsewhere!
Thanks, Jonathan, this is great. Does the change around 1980 from only first-place votes to 5-3-1 votes affect your analysis at all? My first thought when you said that voters had never rewarded Hurricane Hazles was about Willie McCovey in 1959, who was the unanimous NL ROY for 52 games of steroid-era-Bonds-like production (.354-.429-.656). Since he was unanimous, I don’t know whether you compared him to anyone else that year who might have had 1.6 more WAR in three times as many games, and I couldn’t find a quick way to search for other rookies that year. If there had been a bunch of rookies getting second- and third-place votes in the years before 1980, might that have affected your model’s results more generally?
On your point about ROY not being intended as a predictive vote, when I was a kid I always assumed that it was in part a predictive vote. Since you can expect to be rooting for a rookie for several years (even more back then), you should be happiest if he’s the one not just with the best results in his rookie year (which could be partially luck), but the one who can be expected to be the best over the next several years. If you want to check whether voters have seen it that way, you could add age (a proxy for expected future improvement) as a variable. It would be interesting to see whether 20-year-old rookies have done better in the voting and 26-year-old rookies worse than the model in its current form predicts.
If there is a predictive component to voting, that might also explain the pitcher (or maybe just starter?) penalty you found. I assume that pitchers vary more from year to year than hitters, both because of injuries and because of other performance variation, especially if you’re mainly looking at W-L like a lot of early ROY voters were doing. That would mean that voters can be less confident that a pitcher with a certain rookie performance will sustain or improve on that performance than you can be with a hitter, so if there’s a predictive component to voting, you might penalize a pitcher.
ububba – I thought the reason Charboneau was colorful was that he opened the beer bottles with his eye socket.
Good points, jamesd.
The change in voting certainly meant there were more people getting votes. (261 guys in the database got no first place votes.) And the fact that McCovey was unanimous meant there was no one to compare him to — I didn’t go back to try and get WAR numbers for guys who didn’t get votes. (I’m not even completely sure I have a source for “rookieness.”) But Stathead tells me https://stathead.com/tiny/RQNB1 that McCovey led both leagues in rookie WAR in 1959 despite his short season.
Thanks. I forgot that the definition of rookie status has changed several times over the years, which I suppose might affect things slightly if someone who had a noteworthy 150 PA or whatever in one year would still be eligible for ROY the next year and his “pre-rookie year” performance might affect perceptions of him and his voting success.
Welcome to our new Dane Dunning!
https://www.mlbtraderumors.com/2025/09/braves-claim-alek-manoah.html
I wouldn’t undersell Manoah. It takes longer than most folks want for guys to recover from TJS, and I think Manoah might come out of it next year. I’m not a big Dunning fan, but I do think Manoah and Ian Anderson have something left in the tank. Either could be as good as or better than Bryce Elder.
I think the Braves’ strategy right now is all about the fans. If it gets 1000 more in the ballpark, they will do it. If Tom Glavine could still go and he wanted to, they would let him pitch if they thought they could get more butts in the seats. I like what they’re doing. The last half of the season has been surprisingly watchable even with a decimated pitching staff.
Recapped