By Matthias Kullowatz (@mattyanselmo)
Starting yesterday, you will find playoff seeding probabilities in our web app. We show the probability that each team finishes in each playoff seeding position in its conference, as well as the Supporters’ Shield probabilities for all teams.
What is this based on? Well, it’s a two-part process. First, we built a model capable of predicting the probabilities of future game outcomes based on team performance to date. Then we set up a simulation to randomly determine outcomes for all the remaining games this season, with probabilities derived from that predictive model. For each of 1,000 simulated seasons, we tallied each team’s final points, wins, and goals scored and allowed, and seeded the teams in each conference. Then we figured out what proportion of those 1,000 seasons each team finished in each place.
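The simulation step can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the app: the function names, the team labels in the usage example, and the input format are assumptions, and seeding here uses points only with random tie-breaks, whereas the real simulation also tallies wins and goals scored and allowed.

```python
import math
import random
from collections import Counter, defaultdict

def sample_poisson(rate, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals

def simulate_seasons(current_points, remaining_games, n_sims=1000, seed=0):
    """Return {team: {finishing_place: proportion of simulated seasons}}.

    remaining_games: list of (home, away, home_expected_goals, away_expected_goals).
    """
    rng = random.Random(seed)
    place_counts = defaultdict(Counter)
    for _ in range(n_sims):
        points = dict(current_points)
        for home, away, home_rate, away_rate in remaining_games:
            home_goals = sample_poisson(home_rate, rng)
            away_goals = sample_poisson(away_rate, rng)
            if home_goals > away_goals:
                points[home] += 3
            elif home_goals < away_goals:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        # Simplification (assumption): rank on points only, ties broken at random.
        ranked = sorted(points, key=lambda t: (points[t], rng.random()), reverse=True)
        for place, team in enumerate(ranked, start=1):
            place_counts[team][place] += 1
    return {team: {place: count / n_sims for place, count in counts.items()}
            for team, counts in place_counts.items()}

# Usage with made-up teams and expected goals:
table = simulate_seasons({"A": 30, "B": 29, "C": 27},
                         [("A", "B", 1.6, 1.1), ("C", "A", 1.4, 1.2)])
```

Each inner dictionary in `table` holds that team’s share of simulated seasons finishing in each place, which is the same kind of proportion the app reports.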
That’s the executive summary. Next, the details.
The predictive model is actually a combination of two independent Poisson models (GLMs): one to forecast goal scoring expectations for the home team and one to do the same for the away team. The goal scoring expectations are derived from those teams’ prior performances during the season. Take a randomly selected example game: on June 30th, Portland traveled to Seattle for the Cascadian rivalry. It was each team’s 15th game of the season, so we had 14 games of information on both teams from which to form a prediction.
To predict the expected number of goals that the Sounders would score, we would have calculated four averages over the previous 14 games: the Sounders’ expected goals for (xGF) and actual goals for (GF), and the Timbers’ expected goals against (xGA) and actual goals against (GA). It seems intuitive that these four metrics would help to predict the expected number of goals the Sounders would score that day. The Timbers’ expected goals in that match would be based, similarly, on their own previous xGF and GF averages and on the Sounders’ previous xGA and GA averages.
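As a rough sketch of how those four predictors could be assembled, the snippet below averages each team’s metrics over games played strictly before the match date. The record layout and the function names are assumptions made for illustration, not the app’s actual data model.

```python
from statistics import mean

METRICS = ("xGF", "GF", "xGA", "GA")

def lookback_averages(games, team, before_date):
    """Average a team's per-game metrics over games strictly before a date.

    games: list of dicts with keys "date" (ISO string), "team", and the
    four metrics above (a hypothetical layout for this sketch).
    """
    prior = [g for g in games if g["team"] == team and g["date"] < before_date]
    if not prior:
        return None
    return {metric: mean(g[metric] for g in prior) for metric in METRICS}

def home_goal_predictors(games, home, away, game_date):
    """Predictors for the home team's goals: its attack and the visitor's defense."""
    home_avg = lookback_averages(games, home, game_date)
    away_avg = lookback_averages(games, away, game_date)
    return {
        "xGF": home_avg["xGF"],
        "GF": home_avg["GF"],
        "opp_xGA": away_avg["xGA"],
        "opp_GA": away_avg["GA"],
    }
```

The predictors for the away team’s goals would be built the same way with the roles swapped.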
Building separate models for the home and away teams allowed us to work home field advantage into the predictions through the distinct models’ intercepts. Also, based on league average goal scoring, the Poisson distribution seems to fit well.
To predict the expected outcome for that Timbers-Sounders clash in week 18, we would tune the aforementioned models to all games between weeks 10 and 20 in all seasons between 2011 and 2018 (except, of course, weeks 18, 19, and 20 of 2018, which wouldn’t have existed yet). Note that it seems reasonable that the trends established from week 20 games in a previous season could help better predict an outcome from week 18 this season, so we include those weeks in the training data. What makes this a predictive model is that, for every observation in the training dataset, the predictor variables (previous performance of both teams) were calculated over a time period that occurred strictly before the outcome (final score of the game).
Then we fit both Poisson models to the actual goals scored by the home and away teams, respectively, in all those games, regressed on the metrics described above, i.e. the “lookback” metrics for each team’s previous performance up to, but not including, the game in question. Because games in weeks 18 – 20 of previous seasons would be more relevant predictors of this week-18 game than, say, a week-10 matchup, we weighted each observation in the regressions by the week number of that game, capped at the week in question. So a week-18 game would have 1.8 times the weight of a week-10 game in the Poisson model’s tuning algorithm, and a week-20 game would have the same weight as a week-18 game.
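The training window and the weighting rule are simple enough to write down directly. In this sketch the function names are hypothetical, and the default arguments are just the values from the week-18 example; how the window is chosen for other weeks is not covered here.

```python
def in_training_window(season, week, target_season=2018, target_week=18,
                       first_season=2011, first_week=10, last_week=20):
    """True if a game belongs in the training data for the target game."""
    in_seasons = first_season <= season <= target_season
    in_weeks = first_week <= week <= last_week
    if not (in_seasons and in_weeks):
        return False
    # Games from the target week onward in the current season have not happened yet.
    return not (season == target_season and week >= target_week)

def observation_weight(game_week, target_week=18):
    """Weight a training game by its week number, capped at the target week."""
    return min(game_week, target_week)

# A week-18 game gets 18 / 10 = 1.8 times the weight of a week-10 game,
# and a week-20 game is capped at the same weight as a week-18 game.
ratio = observation_weight(18) / observation_weight(10)
```

These weights would then be passed as observation weights to whatever routine fits the two Poisson regressions.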
Let’s take a break to apply this methodology to the Timbers-Sounders game. The Sounders had averaged 1.31 xGF, 1.63 xGA, 0.79 GF, and 1.29 GA. The Timbers had produced 1.44 xGF, 1.54 xGA, 1.50 GF, and 1.21 GA. Though the Timbers had performed much better according to actual goals, the model tuning picks up the fact that xG is much more predictive of future performance, and thus neither team was considered all that great. The home model expected 1.8 goals from the Sounders, and the away model expected 1.3 goals from the visiting Timbers. This suggested the potential for a higher-than-normal scoring game, because the typical home and away averages are 1.65 and 1.09, respectively.
*You can derive these same numbers in the app’s Team xGoals tab by using the date filter through 6/29.
Now, for turning these into game outcome probabilities. We assume a Poisson distribution for each of home and away goals scored, and generate the Poisson probabilities of 0 through 10 goals scored. To find the probability of the Timbers’ 3 – 2 win on the road, we multiply the Timbers’ chances of scoring 3 goals by the Sounders’ chances of scoring 2, which works out to 2.7%. The highest probability was a 1 – 1 draw at 11%, for reference. Overall, the model suggested a 49% chance of a Sounders win to just a 27% chance of a Timbers win, which is pretty typical for an MLS game between evenly matched teams. Below I’ve included the scoring matrix of probabilities.
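For readers who want to reproduce the arithmetic, here is a minimal sketch of the scoring matrix under the independence assumption. The function names are my own, and the inputs are the rounded expectations quoted above (1.8 and 1.3), so the results will only approximately match the figures from the unrounded model.

```python
import math

def poisson_pmf(goals, rate):
    """Probability of exactly `goals` goals under a Poisson with mean `rate`."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)

def score_matrix(home_rate, away_rate, max_goals=10):
    """matrix[h][a] = P(home scores h and away scores a), assuming independence."""
    home = [poisson_pmf(g, home_rate) for g in range(max_goals + 1)]
    away = [poisson_pmf(g, away_rate) for g in range(max_goals + 1)]
    return [[ph * pa for pa in away] for ph in home]

def outcome_probabilities(matrix):
    """Collapse the score matrix into (home win, draw, away win) probabilities."""
    home_win = draw = away_win = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

matrix = score_matrix(1.8, 1.3)        # Sounders at home, Timbers away
timbers_3_2 = matrix[2][3]             # Sounders 2, Timbers 3
home_win, draw, away_win = outcome_probabilities(matrix)
```

With these rounded inputs, `timbers_3_2` comes out at roughly 0.027 and `home_win` at roughly 0.49, in line with the numbers in the text.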
It’s worth noting that we consider a Dixon-Coles adjustment to augment the probability of 0 – 0 and 1 – 1 draws, but the independent home/away assumption actually calibrates pretty well to MLS outcomes. Thus our adjustment is hardly noticeable. Also, an interesting feature of the predictive models is that, for the home team, previous xGF carries about twice the weight of previous GF (at least, for the weeks on which these models were built). However, for the away team, previous xGF carries virtually all of the weight.
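For the curious, the standard Dixon-Coles correction rescales only the four lowest-scoring cells of the matrix. The sketch below uses the textbook form of that correction; the dependence parameter `rho` is set to an arbitrary small negative value purely for illustration and is not the value used in our model.

```python
import math

def poisson_pmf(goals, rate):
    """Probability of exactly `goals` goals under a Poisson with mean `rate`."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)

def dixon_coles_tau(home_goals, away_goals, home_rate, away_rate, rho):
    """Dixon-Coles multiplier; differs from 1 only for 0-0, 0-1, 1-0 and 1-1."""
    if home_goals == 0 and away_goals == 0:
        return 1 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0

def adjusted_score_probability(home_goals, away_goals, home_rate, away_rate, rho):
    """Independent Poisson score probability times the Dixon-Coles multiplier."""
    independent = poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)
    return dixon_coles_tau(home_goals, away_goals, home_rate, away_rate, rho) * independent

RHO = -0.03  # hypothetical value; a negative rho boosts 0-0 and 1-1 draws
draw_1_1 = adjusted_score_probability(1, 1, 1.8, 1.3, RHO)
```

A negative `rho` shifts a little probability from 1 – 0 and 0 – 1 results into 0 – 0 and 1 – 1 draws while leaving the total unchanged, and a `rho` near zero leaves the matrix almost untouched.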
As always, please enjoy responsibly.