With a mere weekend to go in the regular season, the AL Rookie of the Year race will be a runaway for Nick Kurtz. The National League is up in the air. Who should be ROY? For reasons best left to psychiatrists, objective specifications of well-thought-out criteria always seem to conclude that the best rookie on your own team should be ROY, and that any other conclusion reeks of media bias.

I am not going to assess Drake Baldwin’s qualifications. But I am writing this piece to make three points:

  • WAR is probably the best criterion we have for making judgments on this question; but
  • WAR, while a pretty good explainer of actual ROY votes, admits of so many empirical exceptions that one oughtn’t be surprised when it doesn’t work, even approximately.
  • You can make a statistical model that does a pretty good job of explaining the voting since 1949. If you believe it, Drake ought to be about a 1 to 6 favorite. I don’t quite believe that, because the Las Vegas odds, which have Horton at about a 1 to 3 favorite and Baldwin around a 2-1 underdog, are probably taking into account lots of things, like publicity and team success, that I’m not considering at all.
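The bullet above quotes odds in a few different styles. As a quick sanity check, here is a minimal Python sketch that converts fractional odds ("profit to stake") into the break-even probabilities they imply. The function names are mine, and the sketch ignores the bookmaker’s margin.

```python
def implied_probability(profit: float, stake: float) -> float:
    """Break-even win probability for fractional odds of `profit` to `stake`.

    2-1 pays 2 for every 1 staked; "1 to 6" pays 1 for every 6 staked.
    Ignores the bookmaker's margin (the vig).
    """
    return stake / (profit + stake)


def odds_on(p: float) -> float:
    """Express probability p as a '1 to X' favorite and return X."""
    return p / (1 - p)


model_baldwin = implied_probability(1, 6)   # "1 to 6 favorite" -> 6/7, about 0.857
market_horton = implied_probability(1, 3)   # "1 to 3 favorite" -> 0.75
market_baldwin = implied_probability(2, 1)  # "2-1 underdog"    -> 1/3

print(f"model, Baldwin:  {model_baldwin:.3f}")
print(f"market, Horton:  {market_horton:.3f}")
print(f"market, Baldwin: {market_baldwin:.3f}")
print(f"87% as odds-on:  1 to {odds_on(0.87):.1f}")
```

The two market figures add up to about 1.08 rather than 1. The excess is the book’s margin, so neither one should be read as a clean probability.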

WAR: What Is It Good For?

If you’re reading this, you’ve seen enough about WAR to know its general usefulness. But a few features of WAR make it uniquely valuable in judging ROY. First, it is really the only tool we have that evaluates hitters and pitchers on a common scale, and the ROY vote has to compare rookie pitchers with rookie batters. I should stress that we shouldn’t expect it to be perfect at this, but there is no serious argument that any other measure we have does it better. If you think you have a better one, let me know. And for position players, the defensive component of WAR, however inadequately, adjusts automatically for defensive prowess and for the importance of some positions. It sums all the components into one metric. I listened to CJ say on Monday night that WAR doesn’t cut it for this purpose. But if you believe WAR reasonably estimates the wins contributed by both pitchers and position players (which we know it does, in the same way that Pythagorean projections project wins and xBA predicts batting average), it is a fair tool to use, whatever its inaccuracies in any particular case.

Second, rookies are, by definition, players brought in to fill holes, left either by underperforming players or by the lack of any player at all at some position or role. They are essentially being judged against all the other replacement players without an MLB track record, so their value relative to those players is what we want to know.

Third, WAR is independent of team quality. By using WAR, we can avoid the largely unproductive argument over whether Cade Horton is more worthy than Drake Baldwin because Horton was contributing to a team that made the playoffs while Baldwin was playing for a team that will be fishing next week.

Fourth, WAR is an objective measure, calculable for 1949 and 2025 alike. Obviously no one voting for Don Newcombe in 1949 had any idea what WAR was. But we, who never saw Newcombe pitch or Del Crandall catch, can say that Newcombe’s 5.7 WAR was a strong indicator that the 88% of voters who chose him over Crandall (0.3 WAR) knew what they were doing. It also lets us spot anomalies that we may not be able to explain. When Joe Charboneau won the 1980 AL ROY with 2.4 WAR and Britt Burns finished 5th with a 7.0 WAR season, we gain valuable information about the voting process.

Fifth, WAR is a counting stat. Rookies vary widely in when they debut, and often someone will make an argument for a player with a meteoric but brief stint. WAR does not credit these players very highly. And, as it transpires, neither do voters. If we instead calculate WAR per game played, which gives these players maximal credit, we find that no such player has ever won a ROY award. The archetypal example here is Bob “Hurricane” Hazle, who made his debut in late July 1957, hit .403, OPSed 1.126 and was instrumental in getting the Braves to the World Series. He finished fourth in the voting with 1.8 WAR, though he did get a first-place vote. But I guarantee you the Braves Journal of its day was indignant at his egregious snub. (Note that pitchers can actually accumulate WAR pretty quickly: the Mets’ Nolan McLean slightly leads Cade Horton in WAR after making only 7 starts. That’s partly why I think voters implicitly downgrade pitcher WAR.)
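A few lines of code show how ranking by total WAR differs from ranking by WAR per game. This is a hypothetical illustration. The two candidates and their numbers are invented and do not come from any real season.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    war: float
    games: int

    @property
    def war_per_game(self) -> float:
        return self.war / self.games


def leaders(pool: list[Candidate]) -> tuple[Candidate, Candidate]:
    """Return the leader by total WAR and the leader by WAR per game."""
    by_total = max(pool, key=lambda c: c.war)
    by_rate = max(pool, key=lambda c: c.war_per_game)
    return by_total, by_rate


# Hypothetical pool: a full-season regular vs. a late call-up on a hot streak.
pool = [
    Candidate("Full-season regular", war=3.0, games=140),
    Candidate("Late call-up", war=1.5, games=45),
]
total_leader, rate_leader = leaders(pool)
print("Leader by WAR:      ", total_leader.name)
print("Leader by WAR/game: ", rate_leader.name)
```

The counting stat favors the regular, and the rate stat favors the call-up. The call-up is the Hurricane Hazle profile that voters have never rewarded with the award.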

So I’m going to assume that WAR is the basic assessment tool for determining ROY, but I’m not going to be dogmatic about it. One category of ROY winners consists of those who led their peers in WAR, but I am respectful enough of the voters to credit any winner within 1.5 WAR of the maximum as reasonable. Note, too, that I didn’t go back and add potential candidates with good WAR but no votes; I only used the players actually voted for.
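The classification rule fits in a small function. `pool_wars` is the WAR of every player who received votes that year. The sketch rests on one assumption of mine: after rounding to one decimal, a gap of exactly 1.5 WAR falls outside the band. That treatment matches the 2003 NL race (Willis 4.4, Webb 5.9), which appears in the anomaly table below.

```python
from enum import Enum


class Verdict(Enum):
    WAR_LEADER = "led the pool in WAR"
    REASONABLE = "close to the WAR leader"
    ANOMALY = "well behind the WAR leader"


def classify_winner(winner_war: float, pool_wars: list[float],
                    tolerance: float = 1.5) -> Verdict:
    """Classify a ROY winner by how far his WAR trails the best in his pool.

    pool_wars should include the winner's own WAR.
    """
    best = max(pool_wars)
    gap = round(best - winner_war, 1)  # strip float noise like 1.5000000000000009
    if gap <= 0:
        return Verdict.WAR_LEADER
    if gap < tolerance:  # assumption: exactly 1.5 behind counts as an anomaly
        return Verdict.REASONABLE
    return Verdict.ANOMALY


print(classify_winner(4.4, [4.4, 5.9]).name)  # 2003 NL: Willis vs. Webb
print(classify_winner(3.9, [3.9, 6.4]).name)  # 2010 NL: Posey vs. Heyward
```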

Spill the Wine: By The Numbers

There have been 152 ROYs, drawn from 941 candidates receiving votes, an average of a little over 6 candidates per league-year. Of those 152 winners, 71 led their cohort in WAR. Another 48 won with WAR values within 1.5 of the leader. As I said above, I am not so dogmatic about WAR that I discount all other manner of assessment, and I think these 48 winners are reasonable. That doesn’t mean voters were never misled by gaudy statistics that we now know to be less relevant, like batting average. I also think voters have systematically preferred position players to pitchers: historically, a pitcher seems to need to outperform on WAR to be competitive. (Obviously this can also be taken as an implicit critique of WAR, namely that pitcher WAR is somehow biased upward relative to pitchers’ actual contribution. This proposition has been debated extensively, and my take on that debate is that the critics are wrong, but those who believe it will feel that the WAR-downweighting of pitchers in ROY voting is appropriate.)

Of the remaining 33, none fell short on WAR but excelled at WAR per game, like Hurricane Hazle. So these remaining 33 cases are anomalies. But before looking at them, let’s celebrate some success: of 152 ROYs, the vast majority of whom were selected before WAR or any of its sabermetric precursors had even been conceived, we can explain, or at least roughly justify, 119. (I apologize to those pulling hard for Drake Baldwin, but that also means nothing I say here will be of much use in this debate: Baldwin (3.0 WAR as of 9/25) and Horton (2.0 WAR) are both reasonable choices. For that matter, so are Caleb Durbin (3.0), Matt Shaw (3.0) and even Nolan McLean (2.0).)
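The bookkeeping in the last two paragraphs takes only a few lines of Python to check. The sketch uses only the counts stated above.

```python
winners = 152
candidates = 941
war_leaders = 71   # won while leading their pool in WAR
within_band = 48   # won within 1.5 WAR of the leader

explained = war_leaders + within_band
anomalies = winners - explained

print(f"candidates per league-year: {candidates / winners:.2f}")
print(f"explained: {explained} of {winners} ({explained / winners:.0%})")
print(f"anomalies: {anomalies}")
```

This prints 6.19 candidates per league-year, 119 explained winners (78%), and 33 anomalies.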

So let’s look at the anomalies:

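| Year | League | Winner | Pitcher? | WAR | Higher-WAR candidate | Pitcher? | Max WAR |
|---|---|---|---|---|---|---|---|
| 1949 | AL | Roy Sievers | False | 2.2 | Mike Garcia | True | 4.9 |
| 1953 | NL | Jim Gilliam | False | 3.9 | Harvey Haddix | True | 7.3 |
| 1958 | AL | Albie Pearson | False | 0.8 | Gary Bell | True | 2.9 |
| 1959 | AL | Bob Allison | False | 1.4 | Jim Perry | True | 3.0 |
| 1961 | NL | Billy Williams | False | 1.2 | Joe Torre | False | 3.4 |
| 1962 | NL | Ken Hubbs | False | 0.0 | Donn Clendenon | False | 1.6 |
| 1969 | AL | Lou Piniella | False | 2.1 | Ken Tatum | True | 4.8 |
| 1971 | AL | Chris Chambliss | False | 0.5 | Bill Parsons | True | 3.1 |
| 1973 | NL | Gary Matthews | False | 3.4 | Steve Rogers | True | 5.0 |
| 1976 | NL | Butch Metzger | True | 1.4 | Pat Zachry | True | 3.5 |
| 1977 | AL | Eddie Murray | False | 3.2 | Mitchell Page | False | 6.1 |
| 1978 | NL | Bob Horner | False | 2.1 | Don Robinson | True | 3.7 |
| 1979 | AL | John Castino | False | 2.0 | Ross Baumgarten | True | 3.9 |
| 1980 | AL | Joe Charboneau | False | 2.4 | Britt Burns | True | 7.0 |
| 1980 | NL | Steve Howe | True | 0.4 | Al Holland | True | 3.0 |
| 1983 | AL | Ron Kittle | False | 1.9 | Mike Boddicker | True | 4.1 |
| 1983 | NL | Darryl Strawberry | False | 2.6 | Bill Doran | False | 4.8 |
| 1985 | NL | Vince Coleman | False | 2.5 | Tom Browning | True | 4.1 |
| 1986 | AL | José Canseco | False | 3.0 | Mark Eichhorn | True | 7.3 |
| 1989 | NL | Jerome Walton | False | 1.9 | Greg Harris | True | 3.8 |
| 1990 | AL | Sandy Alomar | False | 2.4 | Kevin Appier | True | 5.3 |
| 1990 | NL | David Justice | False | 2.9 | Mike Harkey | True | 5.2 |
| 1992 | AL | Pat Listach | False | 4.5 | Kenny Lofton | False | 6.6 |
| 1992 | NL | Eric Karros | False | 0.4 | Ben Rivera | True | 2.6 |
| 1994 | NL | Raúl Mondesí | False | 1.8 | Steve Trachsel | True | 3.4 |
| 1996 | NL | Todd Hollandsworth | False | 1.1 | F.P. Santangelo | False | 3.3 |
| 1998 | AL | Ben Grieve | False | 2.2 | Rolando Arrojo | True | 4.1 |
| 2000 | AL | Kazuhiro Sasaki | True | 1.3 | Mark Redman/Barry Zito | True/True | 3.4 |
| 2003 | NL | Dontrelle Willis | True | 4.4 | Brandon Webb | True | 5.9 |
| 2007 | NL | Ryan Braun | False | 2.0 | Troy Tulowitzki | False | 6.8 |
| 2009 | NL | Chris Coghlan | False | 1.1 | Randy Wells | True | 4.3 |
| 2010 | AL | Neftalí Feliz | True | 2.5 | Austin Jackson | False | 5.1 |
| 2010 | NL | Buster Posey | False | 3.9 | Jason Heyward | False | 6.4 |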

The first thing to note is that in 24 of these 33 cases, the higher-WAR candidate denied ROY was a pitcher (in 2000, two of them). Even when voters were comparing one pitcher to another, they made “mistakes,” which I put in scare quotes because the term assumes that pitcher WAR accurately reflects value. But note that voters picked the highest- or near-highest-WAR pitcher lots of times. What misled them were wins and saves, those terribly team-dependent stats. In 2003, Brandon Webb was better than Dontrelle Willis in just about every metric, but Willis went 14-6 while Webb went 10-9, and Webb finished third in the voting. The most egregious case was 2000, when Kazuhiro Sasaki’s 37 saves in 62 innings earned him ROY despite a 1.3 WAR, while Mark Redman and Barry Zito both put up 3.4 WAR and, with pedestrian W-L records, finished 8th and 9th in the voting.
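These counts come straight from the anomaly table above. The sketch transcribes the table’s two “Pitcher?” columns and tallies them. The 2000 AL row is a single case, but it lists two higher-WAR pitchers.

```python
# (year, league, winner was a pitcher, higher-WAR candidate was a pitcher),
# transcribed from the two "Pitcher?" columns of the anomaly table.
ANOMALIES = [
    (1949, "AL", False, True), (1953, "NL", False, True),
    (1958, "AL", False, True), (1959, "AL", False, True),
    (1961, "NL", False, False), (1962, "NL", False, False),
    (1969, "AL", False, True), (1971, "AL", False, True),
    (1973, "NL", False, True), (1976, "NL", True, True),
    (1977, "AL", False, False), (1978, "NL", False, True),
    (1979, "AL", False, True), (1980, "AL", False, True),
    (1980, "NL", True, True), (1983, "AL", False, True),
    (1983, "NL", False, False), (1985, "NL", False, True),
    (1986, "AL", False, True), (1989, "NL", False, True),
    (1990, "AL", False, True), (1990, "NL", False, True),
    (1992, "AL", False, False), (1992, "NL", False, True),
    (1994, "NL", False, True), (1996, "NL", False, False),
    (1998, "AL", False, True), (2000, "AL", True, True),  # Redman and Zito
    (2003, "NL", True, True), (2007, "NL", False, False),
    (2009, "NL", False, True), (2010, "AL", True, False),
    (2010, "NL", False, False),
]

cases = len(ANOMALIES)
pitcher_cases = sum(1 for *_, better_p in ANOMALIES if better_p)
pitcher_vs_pitcher = sum(1 for _, _, win_p, better_p in ANOMALIES if win_p and better_p)

print(f"anomalies: {cases}")
print(f"higher-WAR candidate was a pitcher: {pitcher_cases}")
print(f"  counting Redman and Zito separately: {pitcher_cases + 1}")
print(f"pitcher beat a higher-WAR pitcher: {pitcher_vs_pitcher}")
```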

And sometimes I think pitchers just weren’t regarded as all that valuable. The 1992 NL was a year without a great candidate, and WAR leader Ben Rivera was actually traded by Atlanta to the Phillies before he became an effective starter in June. Still, Eric Karros’ WAR of 0.4 was historically low, and voters had a number of 2+ WAR players to choose from. That vote was one of the only ones in which a negative-WAR player got votes: Donovan Osborne finished 5th without even performing at replacement level.

One thing the rise of sabermetrics has done is make this sort of thing much less likely. No big WAR loser has won since Buster Posey (3.9) beat Jason Heyward (6.4) 15 years ago. The last winner with a WAR deficit of 1.0 or more was Craig Kimbrel (over Vance Worley, who finished third; Freddie Freeman was runner-up). Since then, the 7 players who won with a lower WAR than someone else in their pool were all within 1.0 WAR. Even if WAR is inaccurate, voters seem to be using an internal mental map that is much closer to WAR than it used to be.

The 2007 NL contest was interesting as well. Ryan Braun was a better hitter than Troy Tulowitzki, though Tulowitzki was no slouch. But Braun’s 5.2 oWAR was offset by an atrocious -2.9 dWAR. (Could he really have been that bad defensively? He was never that bad the rest of his career, once those sweet, sweet PEDs began to flow.) Tulowitzki, playing a premier defensive position, combined a 3.7 oWAR with a league-leading 3.9 dWAR. You might think this is a sign that voters don’t really value defense, but this was one of the closest votes in ROY history, and Braun’s gaudier offensive numbers barely carried the day. Still, the defensive gap between the two players, almost 7 WAR, should have been enough to give Tulowitzki the nod, even if you don’t think the difference was really that big.
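The “almost 7 WAR” defensive gap is just the difference in dWAR. This short sketch uses only the figures quoted in the paragraph above. Those figures show that oWAR and dWAR do not sum to the total WAR in the anomaly table (2.0 for Braun, 6.8 for Tulowitzki). So the component gaps shouldn’t be netted against each other too literally.

```python
# oWAR and dWAR quoted in the text for the 2007 NL race
braun = {"oWAR": 5.2, "dWAR": -2.9}
tulowitzki = {"oWAR": 3.7, "dWAR": 3.9}

braun_offense_edge = round(braun["oWAR"] - tulowitzki["oWAR"], 1)
tulo_defense_edge = round(tulowitzki["dWAR"] - braun["dWAR"], 1)

print(f"Braun's offensive edge:      {braun_offense_edge}")
print(f"Tulowitzki's defensive edge: {tulo_defense_edge}")

# These components do not simply add up to the WAR column in the anomaly table.
for name, p in (("Braun", braun), ("Tulowitzki", tulowitzki)):
    print(f"{name}: oWAR + dWAR = {round(p['oWAR'] + p['dWAR'], 1)}")
```

This prints an offensive edge of 1.5 for Braun, a defensive edge of 6.8 for Tulowitzki, and component sums of 2.3 and 7.6.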

No. Really. Use Math and Tell Me Who’s Going to Win

So of course I had to make a statistical model, right? I made a logit (logistic regression) model, the standard construct for estimating the probability of a yes-or-no outcome, with the following specification:

$$\ln\left(\frac{\mathrm{WinProb}}{1-\mathrm{WinProb}}\right) = C + a\,\mathrm{NumberOfCandidates} + b\,\mathrm{Pitcher} + c\,\mathrm{RelativeWAR} + d\,\mathrm{RelativeOPS} + e\,\mathrm{RelativeERA}$$

I don’t really know the number of candidates, but I’ll assume it’s going to be 4: that Baldwin, Horton, Shaw, and Durbin get votes. Relative WAR is just the difference between your WAR and the mean WAR of all candidates. Relative OPS and Relative ERA are the same thing, computed only among position players and pitchers respectively. Pitcher = 1 if you’re a pitcher and 0 otherwise.
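Here is a minimal sketch of building those regressors for one league-year. It assumes two things of mine. First, the column that doesn’t apply (relative OPS for a pitcher, relative ERA for a position player) is set to 0. Second, the OPS and ERA values in the example are placeholders, not the players’ actual 2025 numbers. Only the WAR figures come from the text. Because Horton is the only pitcher, his relative ERA is 0 whatever his actual ERA.

```python
from statistics import mean


def relative_features(pool: list[dict]) -> list[dict]:
    """Build the model's regressors for every candidate in one league-year.

    Each candidate dict needs: name, war, pitcher (bool), and ops (position
    players) or era (pitchers).
    """
    hitters = [c for c in pool if not c["pitcher"]]
    pitchers = [c for c in pool if c["pitcher"]]
    war_mean = mean(c["war"] for c in pool)
    ops_mean = mean(c["ops"] for c in hitters) if hitters else 0.0
    era_mean = mean(c["era"] for c in pitchers) if pitchers else 0.0

    rows = []
    for c in pool:
        rows.append({
            "name": c["name"],
            "numcan": len(pool),
            "pitcher": int(c["pitcher"]),
            "relWAR": c["war"] - war_mean,
            # Assumption: the column that doesn't apply is set to 0.
            "relOPS": 0.0 if c["pitcher"] else c["ops"] - ops_mean,
            "relERA": c["era"] - era_mean if c["pitcher"] else 0.0,
        })
    return rows


# WAR figures are from the text; the OPS and ERA values are placeholders.
pool_2025 = [
    {"name": "Baldwin", "war": 3.0, "pitcher": False, "ops": 0.800},
    {"name": "Shaw", "war": 3.0, "pitcher": False, "ops": 0.750},
    {"name": "Durbin", "war": 3.0, "pitcher": False, "ops": 0.700},
    {"name": "Horton", "war": 2.0, "pitcher": True, "era": 3.00},
]
for row in relative_features(pool_2025):
    print(row)
```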

I have to say I thought going in that there was no chance this modeling would be any good, but I was surprised to see I was wrong. This isn’t a great model, but it isn’t terrible.

Here are the results:

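| Dep. Variable: | winner | No. Observations: | 941 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 935 |
| Method: | MLE | Df Model: | 5 |
| Date: | Wed, 24 Sep 2025 | Pseudo R-squ.: | 0.2574 |
| Time: | 20:38:27 | Log-Likelihood: | -308.99 |
| converged: | True | LL-Null: | -416.11 |
| Covariance Type: | nonrobust | LLR p-value: | 2.539e-44 |

| | coef | std err | z | p-value | 2.5% | 97.5% |
|---|---|---|---|---|---|---|
| const | 0.2398 | 0.286 | 0.837 | 0.403 | -0.322 | 0.801 |
| pitcher | -1.0195 | 0.253 | -4.023 | 0.000 | -1.516 | -0.523 |
| relWAR | 0.8831 | 0.094 | 9.388 | 0.000 | 0.699 | 1.067 |
| relOPS | 6.1633 | 1.925 | 3.201 | 0.001 | 2.390 | 9.937 |
| relERA | -0.5861 | 0.330 | -1.774 | 0.076 | -1.234 | 0.062 |
| numcan | -0.2846 | 0.041 | -7.009 | 0.000 | -0.364 | -0.205 |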
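The headline fit statistics follow from the two log-likelihoods. For a Logit, statsmodels reports McFadden’s pseudo R-squared, 1 - LL/LL-Null. The LLR p-value is a chi-square test with Df Model = 5 degrees of freedom. This sketch reproduces both from the rounded figures in the table. To stay in the standard library, it uses the closed-form chi-square tail for 5 degrees of freedom.

```python
import math

LL_MODEL = -308.99  # Log-Likelihood from the summary
LL_NULL = -416.11   # LL-Null (intercept-only model)

# McFadden's pseudo R-squared.
pseudo_r2 = 1 - LL_MODEL / LL_NULL

# Likelihood-ratio statistic, chi-square with 5 degrees of freedom (Df Model).
lr_stat = 2 * (LL_MODEL - LL_NULL)


def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability of a chi-square(5) variable (closed form for 5 df)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


print(f"pseudo R^2 = {pseudo_r2:.4f}")
print(f"LR stat    = {lr_stat:.2f}")
print(f"p-value    = {chi2_sf_df5(lr_stat):.3e}")
```

The p-value comes out at about 2.5e-44. That matches the summary’s 2.539e-44 up to the rounding of the log-likelihoods.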

There are several things to observe about this. First, all things being equal, pitchers do worse than position players; you can tell because the coefficient on pitcher is negative. (How much worse is an empirical question.) But a pitcher with 1 WAR more than another player is at only a slight disadvantage: he loses 1.02 in log-odds for being a pitcher, but gets 0.8831 back for his 1-WAR edge. Indeed, an advantage of about 1.15 WAR makes him dead even.
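Here is the arithmetic in that paragraph, using the coefficients from the table. The function name is mine. “Other terms held equal” is a simplification: the two players share the constant and numcan, and both have zero relative OPS and ERA.

```python
import math

B_PITCHER = -1.0195  # coefficient on pitcher
C_RELWAR = 0.8831    # coefficient on relWAR


def pitcher_log_odds_gap(war_edge: float) -> float:
    """Pitcher's log-odds minus a position player's, other terms held equal,
    when the pitcher has `war_edge` more WAR."""
    return B_PITCHER + C_RELWAR * war_edge


break_even = -B_PITCHER / C_RELWAR

print(f"gap with a 1.0 WAR edge: {pitcher_log_odds_gap(1.0):+.4f}")  # -0.1364
print(f"break-even WAR edge:     {break_even:.2f}")                   # 1.15
print(f"pitcher odds multiplier: {math.exp(B_PITCHER):.2f}")          # 0.36
```

The last line reads the pitcher coefficient as an odds ratio. All else equal, being a pitcher cuts a candidate’s odds of winning to roughly a third.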

So we can now apply this to the NL ROY data. The model really doesn’t like Horton, giving him only a 6 percent chance against 40 percent for Baldwin, with Shaw and Durbin at around 24 and 30 percent respectively. This is unsurprising given his lower WAR, his being a pitcher, and his inability to improve his standing by having a lower ERA than other pitching candidates, since there aren’t any. (It might be fairer to compare him not to the other candidates but to the league, but that would require me to collect and estimate a whole new data set.)

If Horton and Baldwin get all the votes, the result is even more lopsided. In that case, Drake has an 87% chance against 13% for Cade. Do I believe that? Well, since we only get one shot at it, there’s no way to refute it. If Horton wins, it’ll just be one of those 13% of the cases that are weird.
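A fitted logit gives each candidate a probability on its own, and nothing forces those to sum to 1 within a league-year. One way to get shares that do is to normalize each candidate’s fitted odds, exp(log-odds), across the pool (a softmax). That method is my assumption about how the shares above were produced. The coefficients come from the table. In the two-candidate case the sketch also assumes that the lone hitter’s relative OPS and the lone pitcher’s relative ERA are both 0.

```python
import math

COEF = {
    "const": 0.2398, "pitcher": -1.0195, "relWAR": 0.8831,
    "relOPS": 6.1633, "relERA": -0.5861, "numcan": -0.2846,
}
REGRESSORS = ("pitcher", "relWAR", "relOPS", "relERA", "numcan")


def linear_predictor(row: dict) -> float:
    """Fitted log-odds for one candidate."""
    return COEF["const"] + sum(COEF[k] * row[k] for k in REGRESSORS)


def pool_shares(rows: list[dict]) -> list[float]:
    """Normalize each candidate's fitted odds so the pool's shares sum to 1.

    Assumption: shares are exp(log-odds) normalized across candidates (a softmax).
    """
    odds = [math.exp(linear_predictor(r)) for r in rows]
    total = sum(odds)
    return [o / total for o in odds]


# Two-candidate case: Baldwin 3.0 WAR, Horton 2.0 WAR, so relWAR is +0.5 / -0.5.
baldwin = {"pitcher": 0, "relWAR": 0.5, "relOPS": 0.0, "relERA": 0.0, "numcan": 2}
horton = {"pitcher": 1, "relWAR": -0.5, "relOPS": 0.0, "relERA": 0.0, "numcan": 2}
for name, share in zip(("Baldwin", "Horton"), pool_shares([baldwin, horton])):
    print(f"{name}: {share:.0%}")
```

Under these assumptions the sketch prints 87% for Baldwin and 13% for Horton, matching the two-candidate figures above. The constant and the numcan term are identical for everyone in a pool, so they cancel in this normalization. Only the relative WAR, OPS, and ERA terms and the pitcher penalty move the shares.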

So there you have it: math says Baldwin is going to win.

Some Final Thoughts

The 2025 NL ROY contest is unusual in that no candidate has more than 3.0 WAR. This has happened a fair number of times, though, and the fact that nobody has had a 4 WAR season moots any argument that a standout was robbed if he doesn’t win. That said, 2.0 WAR pitchers winning ROY are pretty thin on the ground: only 24 winners have had 2.0 WAR or less, and 2 of those are the 2020 winners, who obviously shouldn’t count. The last pitcher to win in a non-Covid year with a WAR under 2.0 was Kazuhiro Sasaki in 2000, but he had 37 saves in Seattle’s dream season, second in the AL and one more than Mariano Rivera. It was a bad vote, as I said above, but partly understandable in that no position player had a WAR higher than 2.4. (But I still would have picked Barry Zito, who barely got any votes at all.)

Closing “Joke”

But when it comes down to it, as an economist, the choice is obvious: Cade has released 3 EPs. Drake is an international superstar. Case closed.