Note from Alex: The previous piece in this series, Part 3, was pre-empted by rumors about the Upton trade within hours of publication. I’m linking to it here, so that you can go back and read it.

I started this series by proposing that baseball should have a larger home field advantage than other sports because it has an inherent bats-last advantage, and the home team always bats last. JoeyT suggested I try to measure that effect, which hadn't occurred to me when I started this quest. I'm going to propose a method that I think gets at it to some extent, and I welcome everyone's thoughts on the matter.

Think about, say, the field-dimension advantage. If short fences or some godforsaken hill in center field favors the home team, the effect of that advantage ought to decrease as the game goes on, because there are fewer and fewer remaining opportunities for the home team to exploit it. By contrast, the bats-last advantage is probably very small at the start of the game but ought to grow and grow as the game goes along. The archetypal bats-last advantage is the ability to play for one run, which the home team can do in a tied extra-inning game and the visitor can't. Defensive substitutions also work better for the home team: it can squeeze one extra at-bat out of a good hitter, since it gets an extra half-inning after the visitors have batted if it needs it. This advantage is pretty worthless in the early innings but is worth something late.

So, suppose a game is tied in the early innings. The home field advantage will then be mostly field-related and only slightly last-bat related, while later in the game it should be far more last-bat related. Let's now look at the data and see what they say. I took every game that was won by one team or the other (this new criterion cost me 61 games from the database; my previous results were infected by keeping those games in and improperly counting them against the home team, but 61 games out of 100,000 is rounding error) and looked at the games that were tied at the end of inning x. This table gives the results:

Tied after         Home win %   Number of games
start of game         .539           100062
1st                   .534            51881
2nd                   .532            34220
3rd                   .526            24407
4th                   .523            18646
5th                   .518            15240
6th                   .520            13119
7th                   .522            11771
8th                   .521            10303
9th                   .524             9155
10th                  .520             5097
11th                  .521             2814
12th                  .521             1549
13th                  .521              864
14th                  .503              457
15th                  .510              261
16th                  .490              153
17th                  .489               88
18th                  .457               46
19th                  .429               28
20th                  .400               15
21st                  .556                9
22nd                  .667                3
23rd                  .667                3
24th                  .500                2

Innings 0-5 (pooled)  .533           244456
Innings 6-9 (pooled)  .521            44348
Extra innings         .518            11389
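A table like the one above is easy to build by conditioning each game on the innings after which it was tied. Here's a minimal sketch in Python; the field names (`home_won`, `tied_after`) are hypothetical, since the post doesn't describe the database schema:

```python
from collections import defaultdict

def home_win_rates(games):
    """For each inning x, compute the home team's win rate among games
    that were tied at the end of inning x.

    Each game is a dict with hypothetical fields:
      'home_won'   - True if the home team won (tie games already excluded)
      'tied_after' - set of innings after which the score was tied
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for g in games:
        for inning in g['tied_after']:
            total[inning] += 1
            if g['home_won']:
                wins[inning] += 1
    return {inning: wins[inning] / total[inning] for inning in total}

# Toy example: three games tied after the 1st; the home team wins two.
games = [
    {'home_won': True,  'tied_after': {1}},
    {'home_won': True,  'tied_after': {1, 2}},
    {'home_won': False, 'tied_after': {1}},
]
rates = home_win_rates(games)
```

Note that a single game contributes to every inning after which it was tied, which is why the rows in the table are not independent samples.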

Look what happens. While home teams win 53.9 percent of all games, that rate drops among games tied after the first inning, and it continues to drop every inning through the fifth, when it starts to rise again. It then bounces around a little, but stays around 52 percent from the fifth inning through the 13th. Once you get out past the 13th inning, you're down to 457 games or fewer per row, and I think it's fair to say that those winning percentages are meaningless.

(I hear some of you in the back saying that all of these percentages are meaningless. If you think so, just stop reading now.)

It’s even more striking graphically (once you suppress the very long games):

The red line is just the data from the table. The green and blue lines give upper and lower 95 percent confidence intervals.
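The post doesn't say how the confidence band was computed, but for a binomial proportion like a win rate, a Wilson interval is a common choice. A sketch (the function name and the use of Wilson rather than some other interval are my assumptions):

```python
import math

def wilson_ci(wins, n, z=1.96):
    """Wilson 95% confidence interval for a binomial proportion.

    This is an assumption about the method; the article doesn't specify
    which interval produced the green and blue lines.
    """
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# E.g. the 14th-inning row: about .503 of 457 games.
lo, hi = wilson_ci(round(0.503 * 457), 457)
```

With only 457 games, the interval spans roughly nine percentage points, which is why the band fans out so dramatically in the late innings.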

Given my "analysis" above, I can estimate that about two percentage points of the home field advantage (roughly half of it) come from batting last: home teams win tied games at about 52 percent in the late innings, versus 53.9 percent of all games. There are plenty of reasons to object to this, but the estimate is sufficiently fragile that I have no desire to entertain objections now. Note, by the way, that restricting this analysis to tied games is only meant to sharpen the estimate and to avoid the problem of estimating the effect of a lead, which might have come about in the early innings through all the other effects. I started doing some math to make this estimate more precise, but too many other variables that you'd have to estimate come into play.

One more issue before putting travel to bed. mravery asked me for a heat map based on homestand versus roadtrip status. That graph follows:

The problem with heatmaps like this one is that they don't tell you just how few games fall into some of the cells. For example, only 78 games have ever been played as both the second game of a homestand and the first game of a roadtrip (those are obviously makeup-game situations of some sort). So the fact that home teams don't do well in them doesn't tell you anything.
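The thin-cell problem is easy to expose by tallying the counts alongside the win rates. A sketch of the bookkeeping (the field names `homestand_game`, `roadtrip_game`, and `home_won` are hypothetical stand-ins for whatever the database actually records):

```python
from collections import defaultdict

def cell_stats(games):
    """Tally home wins and game counts per
    (homestand game number, roadtrip game number) cell,
    so that thin cells can be flagged before trusting a heatmap.
    """
    # (home_idx, road_idx) -> [home wins, games played]
    cells = defaultdict(lambda: [0, 0])
    for g in games:
        key = (g['homestand_game'], g['roadtrip_game'])
        cells[key][1] += 1
        if g['home_won']:
            cells[key][0] += 1
    return cells

# Toy example: two cells, one with a single game in it.
games = [
    {'homestand_game': 2, 'roadtrip_game': 1, 'home_won': False},
    {'homestand_game': 1, 'roadtrip_game': 1, 'home_won': True},
    {'homestand_game': 1, 'roadtrip_game': 1, 'home_won': True},
]
cells = cell_stats(games)
```

Plotting the raw win rate of a 78-game cell next to a 5,000-game cell on the same color scale is what makes the heatmap misleading.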

I think I can do one better, however, with a linear regression. This post is already too long, so I'm not going to explain linear regression, but the following equations explain over 20 percent of the variance in home-field win rates and capture the effects mravery was asking about better than the heatmap, in my opinion:

Winning percentage on a homestand = season winning percentage + 0.0052 × length of homestand
Winning percentage on a roadtrip = season winning percentage − 0.0047 × length of trip

Call both of these coefficients half a percent per game, and you see that winning percentages do vary with travel (and with staying home). A six-game road trip costs you nearly 3 percentage points of winning percentage on that trip (6 × 0.47), while a six-game homestand raises your expected win rate on that homestand by about 3 percentage points (6 × 0.52). These effects are both statistically significant, though they only explain about 3 percent of the variance in wins.
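In code, the fitted model amounts to nothing more than two linear predictions. A sketch, with function names of my own invention, using the coefficients quoted above:

```python
def homestand_win_pct(season_pct, length):
    """Predicted win rate on a homestand of the given length,
    using the +0.0052-per-game coefficient from the fitted regression."""
    return season_pct + 0.0052 * length

def roadtrip_win_pct(season_pct, length):
    """Predicted win rate on a roadtrip of the given length,
    using the -0.0047-per-game coefficient from the fitted regression."""
    return season_pct - 0.0047 * length

# A .500 team on a six-game homestand vs. a six-game road trip:
home6 = homestand_win_pct(0.500, 6)  # about .531
road6 = roadtrip_win_pct(0.500, 6)   # about .472
```

The nearly six-point gap between the two six-game predictions is the cumulative travel effect, even though each single game moves by only about half a point.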

So where does that leave us? With Alex’s indulgence, I’ll take one final post in a few days to sum up what I’ve learned from this.