
Toronto FC: Road Warriors by Matthias Kullowatz

By Matthias Kullowatz (@mattyanselmo)

Team xPoints
TOR 73.8
SEA 52.2
MTL 52.0
SJ 50.6
NYRB 49.1
NE 47.9
COL 47.5
VAN 46.1
CLB 46.1
SKC 44.5
FCD 43.1
LA 42.5
ORL 42.2
POR 41.6
RSL 40.1
DCU 39.2
PHI 38.6
NYC 38.6
CHI 38.0
HOU 37.1

I thought my computer had spit out an error when it told me Toronto FC was the best team in MLS. Above you can see the power rankings that I was too scared to publish in their usual location without an accompanying article. The figures are the number of points each team would be expected to earn if the 34-game season started today and every team played a balanced schedule. Toronto may or may not be one of the best teams in MLS, but here's why the computer thinks so.

After last weekend's 1-0 win in Philadelphia, Toronto finally completed the seven-game road trip that opened its 2015 campaign, a difficult start necessitated by construction to expand BMO Field. That type of road trip typically only happens in MLB, or in the NBA when the rodeo is in town. The model gives teams a bonus when they have played fewer than half their games at home, on the assumption that, had they played more home games, their expected goals stats would have been better. 

While it's a bit crazy to think that Toronto will break the MLS points record with more than 70, it's not crazy to think that maybe they're even better than you, our readers, thought when you ranked them second in the East. Toronto is, after all, fifth in the league in expected goal differential (xGD) despite the fact that, as mentioned before, it hasn't played a single home game. 

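Let's play around with some more intuitive math. Over the past five seasons, home teams have outscored away teams by an average of 0.41 expected goals per game, and this season Toronto has outscored its opponents by an average of 0.18 expected goals per game. Moving a game from the road to home should therefore swing a team's xGD by about 0.82, since it gains the 0.41 edge instead of conceding it. If we apply that 0.82 swing to the 3.5 games Toronto would have hosted under a balanced seven-game schedule, its xGD jumps to 0.59 per game (0.18 + 0.82 × 3.5 / 7). That would rank it first this season, and either first or second in each of the previous four seasons. 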
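The back-of-the-envelope adjustment above can be written out as a small function. This is a minimal sketch of the arithmetic in the paragraph only, not the regression model behind the power rankings; the function name `home_adjusted_xgd` and its parameters are hypothetical.

```python
def home_adjusted_xgd(xgd_per_game: float, games: int, home_games: int,
                      home_edge: float = 0.41) -> float:
    """Per-game xGD if the team had played half its games at home.

    Hypothetical helper: home_edge is the average home-minus-away xG
    margin (0.41 over the past five seasons, per the text).
    """
    swing = 2 * home_edge                    # away -> home swing per game (0.82)
    missing_home = games / 2 - home_games    # home games short of a balanced schedule
    return xgd_per_game + swing * missing_home / games

# Toronto: 0.18 xGD per game over 7 games, none of them at home.
print(round(home_adjusted_xgd(0.18, games=7, home_games=0), 2))  # 0.59
```

Note that this sketch is symmetric: a team that has played more than half its games at home would get a negative adjustment, whereas the text only describes bonuses for teams short on home games.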

Toronto is an outlier both in having played no home games and in having played fewer games than most teams. This tends to break regression models. You might notice that the Montreal Impact is also near the top of the rankings, and not surprisingly, it has played just one home game (25%) and only four games in total. Small sample sizes, relative to the rest of the league, are more likely to produce outlying results, and that's why the computer is insanely high on those two Canadian clubs. That said, Toronto has put together a very impressive season thus far, even if it doesn't show in the standings, and I think it justifies our readers' belief that Toronto would be good in 2015. 


Calculating Expected Goal Differential 1.0 by Drew Olsen

The basic premise of expected goal differential is to assess how dangerous a team's shots are, and how dangerous its opponents' shots are. A team that gets a lot of dangerous shots inside the box, but doesn't give up such shots on defense, is likely doing something right, whether tactically or through individual skill, and is likely to be able to reproduce those results.

The challenge in creating expected goal differential (xGD), then, is to obtain data that measures the difficulty of each shot all season long. Our xGD 1.0 uses six zones on the field to separate the dangerous shots from the less dangerous ones. Soon, we will create xGD 2.0, in which shots are sorted not only by location, but also by body part (head vs. foot) and by situation (run of play vs. free kick or penalty). Obviously, kicked shots are more dangerous than headed shots, and penalty kicks are more dangerous than other shots from zone two, the area just outside the six-yard box.

So now, for the calculations.

Across the entire league, for all 8,291 shots taken in 2013, we calculate the proportion of shots from each zone that were finished (scored):

Location Goals Shots Finish%
One 129 415 31.1%
Two 451 2547 17.7%
Three 100 1401 7.1%
Four 85 1596 5.3%
Five 51 2190 2.3%
Six 5 142 3.5%
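The finishing rates in the table are simply goals divided by shots for each zone. A minimal sketch in Python, with the 2013 totals copied from the table (the names `LEAGUE_2013` and `finish_rates` are illustrative, not from any published code):

```python
# League-wide 2013 totals by zone, copied from the table above.
LEAGUE_2013 = {  # zone: (goals, shots)
    1: (129, 415),
    2: (451, 2547),
    3: (100, 1401),
    4: (85, 1596),
    5: (51, 2190),
    6: (5, 142),
}

def finish_rates(totals: dict[int, tuple[int, int]]) -> dict[int, float]:
    """Proportion of shots from each zone that were scored."""
    return {zone: goals / shots for zone, (goals, shots) in totals.items()}

rates = finish_rates(LEAGUE_2013)
for zone, rate in rates.items():
    print(f"Zone {zone}: {rate:.1%}")        # e.g. "Zone 1: 31.1%"

# Sanity check: the zones account for every shot taken in 2013.
print(sum(shots for _, shots in LEAGUE_2013.values()))  # 8291
```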

We see that shots from zones one and two are the most dangerous, while shots from farther out or from wider angles are less dangerous. To calculate a team's offensive "dangerousness," we count the number of shots each team attempted from each zone, and then multiply each total by the league's finishing rate from that zone. In the tables below, each zone's Finish% is the league rate, while the Total row shows the team's own finishing rate. As an example, here we have Sporting Kansas City's offensive totals:

Location Goals Attempts Finish% ExpGoals
One 5 18 31.1% 5.6
Two 29 160 17.7% 28.3
Three 5 78 7.1% 5.6
Four 3 97 5.3% 5.2
Five 2 120 2.3% 2.8
Six 1 17 3.5% 0.6
Total 45 490 9.2% 48.1

Offensively, if SKC had finished at the league average rate from each respective zone, then it would have scored about 48 goals. Now let's focus on SKC's defensive shot totals:
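The offensive calculation is a weighted sum: SKC's attempts from each zone times the league finishing rate for that zone. A minimal sketch, reusing the league and SKC totals from the tables (names such as `expected_goals` are illustrative). It uses unrounded league rates, so the total can differ slightly from summing the rounded per-zone figures in the table, though here both round to 48.1.

```python
# League 2013 totals by zone (goals, shots) and the resulting finishing rates.
LEAGUE_2013 = {1: (129, 415), 2: (451, 2547), 3: (100, 1401),
               4: (85, 1596), 5: (51, 2190), 6: (5, 142)}
RATES = {zone: goals / shots for zone, (goals, shots) in LEAGUE_2013.items()}

# SKC's 2013 shot attempts by zone, from the offensive table above.
SKC_ATTEMPTS_FOR = {1: 18, 2: 160, 3: 78, 4: 97, 5: 120, 6: 17}

def expected_goals(attempts: dict[int, int], rates: dict[int, float]) -> float:
    """Sum over zones of (attempts from zone) x (league finish rate for zone)."""
    return sum(n * rates[zone] for zone, n in attempts.items())

xg_for = expected_goals(SKC_ATTEMPTS_FOR, RATES)
print(f"{xg_for:.1f}")  # 48.1
```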

Location Goals Attempts Finish% ExpGoals
One 4 13 31.1% 4.0
Two 17 95 17.7% 16.8
Three 4 54 7.1% 3.9
Four 4 56 5.3% 3.0
Five 1 84 2.3% 2.0
Six 0 4 3.5% 0.1
Total 30 306 9.8% 29.8

Defensively, had SKC allowed the league average finishing rate from each zone, it would have allowed about 30 goals (incidentally, that's exactly what it did allow, ignoring own goals).

Subtracting expected goals against from expected goals for, we get a team's expected goal differential. Expected goal differential works so well as a predictor because teams repeat their ability to create good (or bad) shots for themselves, and to allow good (or bad) shots to their opponents, more reliably than they repeat their finishing rates. An extreme game in which a team finishes a high percentage of its shots won't sway that team's xGD, nor that of its opponent, making xGD a better indicator of "true talent" at the team level than raw goal differential.
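Putting both sides together for SKC gives its xGD 1.0. A minimal, self-contained sketch using the attempt counts from the two SKC tables; the function names are illustrative only.

```python
# League 2013 totals by zone (goals, shots) and the resulting finishing rates.
LEAGUE_2013 = {1: (129, 415), 2: (451, 2547), 3: (100, 1401),
               4: (85, 1596), 5: (51, 2190), 6: (5, 142)}
RATES = {zone: goals / shots for zone, (goals, shots) in LEAGUE_2013.items()}

# SKC's 2013 attempts by zone: shots it took, and shots its opponents took.
SKC_FOR = {1: 18, 2: 160, 3: 78, 4: 97, 5: 120, 6: 17}
SKC_AGAINST = {1: 13, 2: 95, 3: 54, 4: 56, 5: 84, 6: 4}

def expected_goals(attempts: dict[int, int], rates: dict[int, float]) -> float:
    """Sum over zones of attempts x league finishing rate."""
    return sum(n * rates[zone] for zone, n in attempts.items())

def expected_goal_differential(att_for: dict[int, int],
                               att_against: dict[int, int],
                               rates: dict[int, float]) -> float:
    """Expected goals for minus expected goals against."""
    return expected_goals(att_for, rates) - expected_goals(att_against, rates)

print(f"{expected_goals(SKC_AGAINST, RATES):.1f}")                      # 29.8
xgd = expected_goal_differential(SKC_FOR, SKC_AGAINST, RATES)
print(f"{xgd:+.1f}")                                                    # +18.3
```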

As for xGD 2.0, coming soon to a laptop near you, the main difference is that there will be additional shot types to consider. Instead of just six zones, there will now be six zones broken down by headed and kicked shots (12 zone/body-part combinations), in addition to free kicks (and possibly even penalty kicks), adding at most four more shot types. As with xGD 1.0, a team's attempts of each shot type will be multiplied by the league's finishing rate for that type, and those products will be summed to find expected goals for and expected goals against.
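One way to generalize the calculation is to key finishing rates by shot type instead of by zone alone. The sketch below is a hypothetical illustration, not the xGD 2.0 implementation: the `Shot` record, its labels ("head"/"foot", "open"/"free_kick"/"penalty"), and the choice to give each set-piece type a single bucket regardless of zone are all assumptions, since the post only says set pieces will add "at most four more" types.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Shot:
    zone: int        # 1-6, the same zones as xGD 1.0
    body_part: str   # hypothetical labels: "head" or "foot"
    play: str        # hypothetical labels: "open", "free_kick", or "penalty"
    scored: bool

# The 12 run-of-play types: six zones x two body parts.
RUN_OF_PLAY_TYPES = [(z, b) for z in range(1, 7) for b in ("head", "foot")]

def shot_type(shot: Shot) -> tuple:
    """Assumed scheme: set pieces get one bucket each; open play is zone x body part."""
    if shot.play != "open":
        return (shot.play,)
    return (shot.zone, shot.body_part)

def league_rates(shots: Iterable[Shot]) -> dict[tuple, float]:
    """League finishing rate for every shot type that appears in the data."""
    attempts, goals = Counter(), Counter()
    for s in shots:
        t = shot_type(s)
        attempts[t] += 1
        goals[t] += s.scored          # True counts as 1
    return {t: goals[t] / attempts[t] for t in attempts}

def expected_goals(shots: Iterable[Shot], rates: dict[tuple, float]) -> float:
    """Summing each shot's type rate equals sum of (attempts x rate) per type."""
    return sum(rates[shot_type(s)] for s in shots)

def expected_goal_differential(shots_for: Iterable[Shot],
                               shots_against: Iterable[Shot],
                               rates: dict[tuple, float]) -> float:
    return expected_goals(shots_for, rates) - expected_goals(shots_against, rates)
```

Summing a rate per shot is equivalent to the table-style method of multiplying each type's attempt count by its rate; it just avoids building the per-type totals for each team. Because the league rates come from every shot in the league, any type a team attempted will have a rate.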