Let the crapshoot begin
So those of you who have nothing better to do than remember everything I’ve written here will of course recall my periodic forays into Bradley-Terry Ratings for baseball. It’s time to use math to predict the playoffs. Why not? It can’t be worse than anyone else’s predictions, can it? It may not be better, but no one will ever be able to prove that it’s worse.
A brief recap: to make Bradley-Terry Ratings, you need a dataset in which everyone has played everyone else a number of times. You then give each team a single number representing its strength, chosen so that the team’s expected record over the season comes as close as possible to its actual record when every game between two teams is of the form:
Pr(Road Team Win) = RoadRating/(HomeRating x HomeFieldAdvantageFactor + RoadRating)
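For anyone who wants to try this at home, here is a minimal sketch of the formula above and of one common way to fit the ratings. The fitting loop (rescale each rating by actual wins over expected wins until the two agree) is an assumption on my part for illustration, not necessarily the exact procedure used for the numbers in this post. The names `road_win_prob` and `fit_ratings` and the `(road, home, road_won)` game format are hypothetical.

```python
from collections import defaultdict


def road_win_prob(road_rating, home_rating, hfa=1.1):
    """Pr(road team wins) = Road / (Home * HFA + Road), as in the formula above."""
    return road_rating / (home_rating * hfa + road_rating)


def fit_ratings(games, hfa=1.1, iterations=500):
    """Fit one rating per team from a list of (road, home, road_won) tuples.

    Assumption: every team has at least one win and one loss; otherwise a
    rating collapses to zero and the update below divides by zero.
    """
    teams = sorted({team for road, home, _ in games for team in (road, home)})
    ratings = {team: 1.0 for team in teams}

    # Actual win totals never change, so count them once.
    actual = defaultdict(float)
    for road, home, road_won in games:
        actual[road if road_won else home] += 1.0

    for _ in range(iterations):
        # Expected win totals under the current ratings.
        expected = defaultdict(float)
        for road, home, _ in games:
            p = road_win_prob(ratings[road], ratings[home], hfa)
            expected[road] += p
            expected[home] += 1.0 - p

        # Nudge each rating toward the value that matches the actual record.
        ratings = {t: ratings[t] * actual[t] / expected[t] for t in teams}

        # Only ratios matter, so rescale to an average rating of 1.
        mean = sum(ratings.values()) / len(teams)
        ratings = {t: r / mean for t, r in ratings.items()}

    return ratings
```

Because only the ratio of two ratings enters the formula, the rescaling step is purely cosmetic: multiplying every rating by the same constant leaves every game probability unchanged.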
Let’s start with the rankings for the playoff teams.
There are a number of things you have to decide to make Bradley-Terry ratings. The first is the period over which they are calculated. The standard method is to use that season’s games. Under that criterion, here are the ratings:
But of course using the whole season’s games can be misleading. Teams can change dramatically over the course of the year, as a certain team which retooled at the deadline last year might remind you. What if we just ranked teams based on July-October? That would yield the following rankings:
You have to drop all the way down to 16th place to get all 12 playoff teams. And the Braves rise to 2nd. (This proves that the whole season matters. Without those first two months, the Yankees would be fishing now.)
One more entirely biased method is to go from June 1 to the end of the year. That yields this:
Ahh… that’s more like it. Now the right team is the best team.
There’s an important lesson here even before we begin playoff simulations. Anyone who claims to be able to rank teams must have in mind some idea of what the team is and some timeframe over which they are measuring it. None of that is a guarantee of performance in the playoffs, but you won’t get any better predictions out than the strength of the team you assumed going in.
So we can start with the method that is most pessimistic about the Braves: the full-year estimate, which is standard. I’ll go through the playoffs in some detail, and then circle back to show the change in probabilities from only using post-May data.
The next step is to assign a home field advantage. I am really skeptical of trying to measure a playoff home field advantage, so I’m just going to use the sort of number I’ve seen in the literature: a 1.1 multiplier. This really only has a big effect in the Wild Card round, where the better team gets up to three home games. The following table gives the Wild Card round estimates:
|Wild Card Round|
For each team, this gives its probability of winning a game in the series and its chance of winning two out of three. So the Padres, for example, have a 42.9% chance of winning a game at Citi Field, which gives them a 36.1% chance of winning two out of three. The last two columns are just the complement of columns (2) and (3). Note, by the way, how little a two-out-of-three format changes the probabilities. That’s the reasoning behind the familiar insight that if you want to upset someone, your best chance is a short series.
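The step from a per-game probability to a series probability can be sketched as follows. This is an illustration under stated assumptions, not the spreadsheet behind the table: the function names are hypothetical, games are treated as independent, and the home/road pattern is passed in explicitly (all three games on the road for the lower Wild Card seed).

```python
def team_game_probs(team_rating, opp_rating, team_is_home, hfa=1.1):
    """Per-game win probabilities for one team.

    team_is_home is a list of booleans, one per scheduled game, saying
    whether this team is the home team in that game.
    """
    probs = []
    for at_home in team_is_home:
        if at_home:
            # Home win = 1 - Pr(road team wins)
            probs.append(team_rating * hfa / (team_rating * hfa + opp_rating))
        else:
            probs.append(team_rating / (opp_rating * hfa + team_rating))
    return probs


def series_win_prob(game_probs):
    """Chance of winning a best-of-len(game_probs) series.

    game_probs[i] is the team's chance of winning game i if it is played.
    """
    need = len(game_probs) // 2 + 1
    states = {(0, 0): 1.0}  # (wins, losses) -> probability, series still live
    clinched = 0.0
    for p in game_probs:
        next_states = {}
        for (wins, losses), prob in states.items():
            if wins + 1 == need:
                clinched += prob * p  # this win ends the series
            else:
                key = (wins + 1, losses)
                next_states[key] = next_states.get(key, 0.0) + prob * p
            if losses + 1 < need:  # a loss that does not eliminate the team
                key = (wins, losses + 1)
                next_states[key] = next_states.get(key, 0.0) + prob * (1 - p)
        states = next_states
    return clinched
```

For a lower seed in the Wild Card round, the call would look like `series_win_prob(team_game_probs(rating, opp_rating, [False] * 3))`. Feeding the same per-game probability into a three-game and a seven-game list is a quick way to see how slowly the favorite's edge grows with series length.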
Now we get to the LDS round. The lack of reseeding under the new schedule makes this pretty easy. When we look at all the possibilities, we get this:
|Game 1 Visitor||Game 1 Home||Matchup Probability||PR Vis||PR Home|
I have sorted these by the probability that the matchup occurs. Since the Mets have the highest probability of winning their Wild Card Series, their matchup gets the first row. I list the Visitor in Game 1 and the Home Team in Game 1, the probability that matchup occurs, and then the probability that the Visiting Team (in Game 1) survives, followed by the complement, the probability that the other team survives. All the home teams are substantial favorites, which you’d expect.
Next comes the LCS. Following the same structure as the previous table (though now we are playing 7-game series), we get the following:
The probabilities here require both teams to survive to this round, which is just the multiplication of each team’s chances of surviving the second round.
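The multiplication described above can be sketched like this. The function name and the dictionary format are hypothetical, and the sketch assumes the two teams come out of separate parts of the bracket, so that their survival chances are independent.

```python
def matchup_table(slot_a, slot_b):
    """List every possible matchup with its probability, most likely first.

    slot_a and slot_b map team -> probability of reaching this round,
    one dictionary for each side of the bracket.
    """
    rows = [
        (team_a, team_b, prob_a * prob_b)
        for team_a, prob_a in slot_a.items()
        for team_b, prob_b in slot_b.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

If each dictionary's probabilities sum to one, so do the matchup probabilities, which is a handy check that no path through the bracket has been dropped.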
Finally, we come to the 36 possible World Series matchups, sorted by their probability of occurrence, and in the same format as the previous two tables:
One thing you can do with this table is calculate fair odds. If I wanted to bet that Seattle beats the Padres in the World Series, I ought to get around 1,000:1 odds.
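Turning a probability into fair odds is a one-liner. In this sketch (the function name is hypothetical) "fair" means the bet has zero expected profit, so the odds against an event with probability p are (1 - p)/p to 1.

```python
def fair_odds_against(prob):
    """Fair odds against an event, expressed as X in 'X:1'."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    return (1.0 - prob) / prob
```

Odds of around 1,000:1 therefore correspond to a probability of roughly one in a thousand that both the matchup happens and the chosen team wins it.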
We can then cumulate by team across this table and get the full crapshoot probabilities. The chances of winning the World Series are:
|Team||BT Rating||Championship Probability|
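The cumulation step can be sketched as follows, assuming each World Series row is stored in the same layout as the matchup tables: Game 1 visitor, Game 1 home team, matchup probability, and each side's chance of winning that matchup. The function name and row format are hypothetical.

```python
from collections import defaultdict


def championship_probs(rows):
    """Sum each team's title chances over every matchup it could appear in.

    rows: iterable of (visitor, home, matchup_prob, p_visitor, p_home).
    """
    totals = defaultdict(float)
    for visitor, home, matchup_prob, p_visitor, p_home in rows:
        totals[visitor] += matchup_prob * p_visitor
        totals[home] += matchup_prob * p_home
    return dict(totals)
```

The resulting probabilities should add up to one across all twelve teams; if they don't, a matchup is missing from the table.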
How robust are these probabilities? Well, suppose we use team ratings based only on games since June 1st. The revised probabilities are:
|Team||BT Rating||Championship Probability|
That’s more like it, although the Braves are barely better than 4:1 odds. But it’s hard to refute these probabilities. They don’t say the Padres won’t win, only that it’s about 150-1 against. Current Vegas odds are around 30-1 on the Padres. That’s a terrible bet.
On the other hand, the current Dodgers odds of about 3.5:1, while not exactly fair, are not strikingly unfair either. But the Braves’ odds of 6:1 are really good if you think the last four months are who the Braves are.
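To put a number on "terrible bet," here is a small sketch of the expected profit per dollar staked when the offered odds differ from the fair ones. The function names are hypothetical, and the odds plugged in at the end are just the rounded figures quoted above.

```python
def prob_from_fair_odds(odds_against):
    """Probability implied by fair odds of 'odds_against : 1'."""
    return 1.0 / (1.0 + odds_against)


def expected_profit(prob, offered_odds):
    """Expected profit per unit staked at offered odds of 'offered_odds : 1'.

    A win pays offered_odds units; a loss costs the one unit staked.
    """
    return prob * offered_odds - (1.0 - prob)


# Padres: about 150-1 against by these ratings, about 30-1 offered.
padres_edge = expected_profit(prob_from_fair_odds(150), 30)
```

With those inputs the bettor expects to lose roughly 80 cents of every dollar staked. A bet at exactly fair odds comes out to zero, and anything offered above fair odds comes out positive, which is the case for the Braves if you believe the post-May ratings.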