Model Update: Coefficient Blending

By Matthias Kullowatz (@mattyanselmo)

With our most recent app update, you might notice that some numbers in the xGoals tables have changed for past years, where it wouldn’t normally make sense to see changes. As an example, Josef Martinez had 29.2 xG in 2018, but the updated app shows 28.7 (-1.7%). No, this is not an Atlanta effect, though I can understand why you might support such an effect. Gyasi Zardes lost 0.5 xG as well (-2.4%), and no one dislikes Columbus.

We have updated our xGoal models with the 2018 season’s data, and that update is the cause of all the discrepancies since the last version of the app. I have already cited the two largest discrepancies by magnitude, so this isn’t some major overhaul of the model. In fact, only 2018’s xG values have been materially adjusted.* The new model estimated 35.6 fewer xGoals in 2018 than it did before, equivalent to a 2.8% drop.

Before we get to the why, I’ve included a few tables showing the largest effects. Because we use a logistic regression model, and because coefficients in such a model act more multiplicatively than additively, it’s not surprising that the largest raw discrepancies belong to the players with the most xG in 2018.

| Player | Season | xG (before) | xG (now) | Difference |
| --- | --- | --- | --- | --- |
| Josef Martinez | 2018 | 29.2 | 28.7 | -0.54 |
| Gyasi Zardes | 2018 | 20.6 | 20.1 | -0.44 |
| Alberth Elis | 2018 | 16.0 | 15.6 | -0.43 |
| Bradley Wright-Phillips | 2018 | 15.9 | 15.5 | -0.41 |
| Maximiliano Urruti | 2018 | 12.1 | 11.7 | -0.41 |
| Zlatan Ibrahimovic | 2018 | 15.8 | 15.4 | -0.39 |
| Ola Kamara | 2018 | 15.3 | 14.9 | -0.39 |
| Sebastian Giovinco | 2018 | 14.8 | 14.5 | -0.38 |
| Mauro Manotas | 2018 | 15.7 | 15.3 | -0.37 |
| Kei Kamara | 2018 | 13.9 | 13.5 | -0.37 |

The greatest ratio discrepancies are, conversely, among those with the fewest xG (table filtered to those with more than 1.0 xG so that you might actually recognize a few names).

| Player | Season | xG (before) | xG (now) | Ratio |
| --- | --- | --- | --- | --- |
| Maxime Chanot | 2018 | 1.1 | 1.0 | 0.94 |
| Kyle Beckerman | 2018 | 1.2 | 1.1 | 0.94 |
| Julio Cascante | 2018 | 1.4 | 1.4 | 0.94 |
| Chad Marshall | 2018 | 2.4 | 2.3 | 0.95 |
| Gustav Svensson | 2018 | 1.4 | 1.3 | 0.95 |
| Justen Glad | 2018 | 1.2 | 1.1 | 0.95 |
| Laurent Ciman | 2018 | 1.2 | 1.2 | 0.95 |
| Roman Torres | 2018 | 1.0 | 1.0 | 0.95 |
| Aaron Long | 2018 | 2.7 | 2.6 | 0.95 |
| Oriol Rosell | 2018 | 1.1 | 1.0 | 0.95 |
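
Both patterns follow from that multiplicative behavior. As a toy illustration (hypothetical numbers, not the actual model, which uses many shot-level predictors), a small downward shift in a season coefficient multiplies the odds of every shot by the same factor, so high-value shots lose the most raw xG while low-value shots shrink by nearly the full ratio:

```python
# Toy illustration with hypothetical numbers (not the actual model): shifting a
# logistic regression coefficient by `delta` multiplies every shot's odds by exp(delta).
import numpy as np

def shift_probability(p, delta):
    """Return a shot's scoring probability after a coefficient shift of `delta`."""
    odds = p / (1 - p)
    new_odds = odds * np.exp(delta)
    return new_odds / (1 + new_odds)

for p in [0.02, 0.10, 0.40]:                # low-, mid-, and high-value shots
    p_new = shift_probability(p, -0.03)     # hypothetical small downward shift
    print(f"p={p:.2f} -> {p_new:.4f} (ratio {p_new / p:.3f}, diff {p_new - p:+.4f})")
```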

Because there is some averaging that goes on at the team level, most teams were impacted very similarly, losing between 1.0 and 2.0 expected goals (both for and against). Even the largest changes in expected goal differential (xGD) are less than 1.0 xGD.

| Team | Season | xGF (before) | xGF (now) | Difference |
| --- | --- | --- | --- | --- |
| SKC | 2018 | 59.9 | 58.0 | -1.94 |
| ATL | 2018 | 65.8 | 63.9 | -1.85 |
| NYRB | 2018 | 60.9 | 59.1 | -1.81 |
| NYC | 2018 | 56.4 | 54.7 | -1.69 |
| LAFC | 2018 | 58.7 | 57.1 | -1.66 |

| Team | Season | xGA (before) | xGA (now) | Difference |
| --- | --- | --- | --- | --- |
| MIN | 2018 | 60.2 | 58.2 | -2.02 |
| SJE | 2018 | 62.4 | 60.6 | -1.79 |
| ORL | 2018 | 61.6 | 59.9 | -1.76 |
| COL | 2018 | 57.4 | 55.6 | -1.73 |
| VAN | 2018 | 54.9 | 53.2 | -1.73 |

| Team | Season | xGD (before) | xGD (now) | Difference |
| --- | --- | --- | --- | --- |
| MIN | 2018 | -14.4 | -13.7 | 0.75 |
| SKC | 2018 | 17.6 | 16.9 | -0.726 |
| ATL | 2018 | 26.8 | 26.1 | -0.629 |
| NYRB | 2018 | 19.1 | 18.6 | -0.535 |
| COL | 2018 | -19.9 | -19.4 | 0.507 |

As for the cause of this one-time discontinuity in xG, our model fits individual coefficients, or effects, to each season. This guarantees that the sum of goals is equal to the sum of xGoals in any given year, which helps to normalize things like xGD from season to season. Because expected goals are a key part of our predictive models, controlling for the goal scoring environment of a given season is important. Baseball fans out there may notice this is similar to “+” stats, like OPS+, which adjust for era (among other things). We are simply defining an era as a season.
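
For the curious, here is a minimal sketch of that idea, not our production model: a logistic regression with a categorical season term, fit on a hypothetical shots table with columns such as goal, distance, angle, and season. Because the model is fit by maximum likelihood, the per-season effects force predicted xG to sum to actual goals within each season.

```python
# Minimal sketch of season-specific effects (assumed column names; not the production model).
import pandas as pd
import statsmodels.formula.api as smf

def fit_xg_model(shots: pd.DataFrame):
    # C(season) adds one coefficient per season; maximum-likelihood logistic
    # regression then makes predicted xG sum to actual goals within each season.
    model = smf.logit("goal ~ distance + angle + C(season)", data=shots).fit()
    shots = shots.assign(xg=model.predict(shots))
    # Sanity check: each season's total xG should match its total goals.
    print(shots.groupby("season")[["goal", "xg"]].sum())
    return model
```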

The issue arises when a new season starts and we don’t have enough data to properly calibrate an effect for that new season. Last year, we simply applied the 2017 coefficient to all shots in 2018, but now we are seeing the sudden effect of giving 2018 its own coefficient. So what are we going to do about it?

Well, it’s not a huge problem. The rankings on all of our app tabs aren’t going to change noticeably, and our predictive models won’t produce materially different probabilities for game outcomes. But we will make sure the discrepancy is less sudden going forward. Every match week in which we integrate new data into the app, we will refit the model with a new-season coefficient. We will then weight the prior week’s model predictions against the new week’s model predictions to produce xG estimates for the new season, with the weights determined by how far through the season we are at that time. Thus, each week there will be tiny changes to xG calculations, but you won’t notice them; the model won’t take half an expected goal away from Josef or anything like that.
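
As a rough sketch of what that blending could look like (the linear weighting and 34-match season length here are assumptions, not the exact production scheme), each weekly update would combine the previous model’s xG with the newly refit model’s xG:

```python
# Hypothetical sketch of weekly blending; the linear weight and 34-match
# season length are assumptions, not the exact production scheme.
def blended_xg(xg_prev_model: float, xg_new_model: float,
               matches_played: int, matches_in_season: int = 34) -> float:
    """Lean on the new-season coefficient more as the season progresses."""
    w = min(matches_played / matches_in_season, 1.0)  # fraction of season elapsed
    return (1 - w) * xg_prev_model + w * xg_new_model
```

Under a scheme like this, the prior coefficient dominates early in the season and the new-season coefficient takes over by the final weeks, so year-end values converge toward what a full refit would produce.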

*In aggregate, no season other than 2018 saw its total xGoals change by more than 0.1. Only a handful of players in seasons before 2018 will see their xG figures change by even a single tenth of a goal; the rest won’t change at all.