By Matthias Kullowatz (@mattyanselmo)

We updated our xGoals model a few weeks ago, as well as our process for continuously updating it throughout the season. Naturally, we’ve done the same for the xPassing model, which estimates the probability of any given pass being completed based on a number of details about the pass. You can read more about the original model here, but here’s the summary of the new model:

Built with a different algorithm (gradient-boosted decision trees)
Uses the following pass details:
- x, y field position
- direction (angle) of pass
- long ball indicator
- through ball indicator
- cross indicator
- headed pass indicator
- kickoff indicator
- free kick, throw-in, and corner indicators
- goalkeeper indicator
- home team indicator
- player differential (due to red cards)
Does not include seasonal intercepts, unlike our xGoals model
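To make the feature list concrete, here is a minimal Python sketch of one pass's inputs flattened into a numeric row. The class name `PassFeatures`, the field names, the 0-100 coordinate scale, and the radian angle convention are all assumptions for illustration, not the actual American Soccer Analysis schema. Note that there is no season field, matching the absence of seasonal intercepts.

```python
from dataclasses import dataclass, astuple


@dataclass
class PassFeatures:
    """One pass's model inputs, mirroring the list of pass details.

    Hypothetical names, units, and encodings. No season field, because the
    xPassing model has no seasonal intercepts.
    """
    x: float           # position along the length of the field (assumed 0-100)
    y: float           # position across the width of the field (assumed 0-100)
    angle: float       # pass direction in radians (assumed convention)
    long_ball: bool
    through_ball: bool
    cross: bool
    headed: bool
    kickoff: bool
    free_kick: bool
    throw_in: bool
    corner: bool
    goalkeeper: bool   # assumed: pass made by a goalkeeper
    home: bool         # passer's team is the home team
    player_diff: int   # assumed: passer's team players on field minus opponent's

    def to_vector(self):
        """Flatten to a numeric row in field order; booleans become 0.0/1.0."""
        return [float(value) for value in astuple(self)]


example = PassFeatures(
    x=60.0, y=20.0, angle=0.3,
    long_ball=False, through_ball=True, cross=False, headed=False,
    kickoff=False, free_kick=False, throw_in=False, corner=False,
    goalkeeper=False, home=True, player_diff=1,
)
row = example.to_vector()  # 14 numbers, ready for a tree-based learner
```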

Model details

This new xPassing model was built using XGBoost, a hot new implementation of gradient-boosted decision trees. Additionally, I included data from the 2018 season in the model fit. The new model fits the data better, and just as importantly, I can fit it using parallel computing, which lets me update the model more regularly. I will be updating the model monthly in the 2019 season so that changes between successive model fits are smaller and more gradual. For anyone curious, I tuned the model using a sequence of two tuning grids, scoring each fit by its cross-validated log-loss.
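A minimal sketch of that tuning loop: score each candidate hyperparameter by cross-validated log-loss, search a coarse grid, then search a finer grid around the winner. To stay dependency-free, it swaps in a toy model (each zone's completion rate shrunk toward the overall rate by `k` pseudo-passes) in place of XGBoost. The toy model, the interleaved fold scheme, the grid values, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def log_loss(outcomes, probs, eps=1e-15):
    """Mean binary log-loss; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(outcomes)


def fit_toy_model(passes, k):
    """Toy stand-in for XGBoost: per-zone completion rate shrunk toward the
    overall rate by k pseudo-passes. k is the hyperparameter being tuned."""
    overall = sum(done for _, done in passes) / len(passes)
    attempts, completions = defaultdict(int), defaultdict(int)
    for zone, done in passes:
        attempts[zone] += 1
        completions[zone] += done
    return lambda zone: (completions[zone] + k * overall) / (attempts[zone] + k)


def cv_log_loss(passes, k, n_folds=5):
    """Average held-out log-loss over n_folds interleaved folds."""
    losses = []
    for fold in range(n_folds):
        train = [p for i, p in enumerate(passes) if i % n_folds != fold]
        test = [p for i, p in enumerate(passes) if i % n_folds == fold]
        predict = fit_toy_model(train, k)
        losses.append(log_loss([done for _, done in test],
                               [predict(zone) for zone, _ in test]))
    return sum(losses) / n_folds


def two_stage_search(passes, coarse_grid):
    """Pick the best k on a coarse grid, then refine on a finer grid around it."""
    best = min(coarse_grid, key=lambda k: cv_log_loss(passes, k))
    fine_grid = [best * f for f in (0.5, 0.75, 1.0, 1.5, 2.0)]
    return min(fine_grid, key=lambda k: cv_log_loss(passes, k))


# Synthetic (zone, completed) pairs, purely illustrative.
passes = ([("midfield", 1)] * 80 + [("midfield", 0)] * 20
          + [("box", 1)] * 5 + [("box", 0)] * 5)
best_k = two_stage_search(passes, coarse_grid=[0.1, 1.0, 10.0, 100.0])
```

Because the fine grid always contains the coarse winner, the second stage can only match or improve on the first stage's cross-validated score.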

Those familiar with ensemble decision tree models like XGBoost know that the model comes out slightly different with every fit, at least if you’re randomly sampling rows and/or columns for each tree, as we are. To test the magnitude of this random effect, I refit the model 10 times with 10 different random seeds. Across the 10 fits, the standard deviation of a season’s total expected completions (xPass) ranged from 7 to 10 completions, depending on the season. Considering there are more than 200,000 completed passes each season, I think it’s safe to say you won’t notice the difference at the season level. At the player level, the typical standard deviation of pass score (completions − xPass) was 0.2, with a maximum of 0.6. League leaders hang out around +100, so these minor deviations due to randomness in the model aren’t going to screw up the rankings or anything.
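The seed experiment reduces to simple arithmetic: sum each refit's completion probabilities to get xPass, subtract that from actual completions to get pass score, and take the standard deviation across refits. This sketch assumes each refit's predictions are available as a list of probabilities for the same passes; the function name and the toy numbers are hypothetical, and it uses the sample standard deviation since the post doesn't say which was used.

```python
import statistics


def seed_spread(fits, completed):
    """Spread of xPass and pass score across refits with different seeds.

    fits: one list of completion probabilities per random seed, each covering
          the same passes (e.g., one player-season or one full season).
    completed: 0/1 outcome for each of those passes.
    Returns (std dev of xPass, std dev of pass score) across refits.
    """
    xpass = [sum(probs) for probs in fits]   # expected completions per refit
    actual = sum(completed)                   # identical for every refit
    scores = [actual - xp for xp in xpass]    # pass score = completions - xPass
    return statistics.stdev(xpass), statistics.stdev(scores)


# Three hypothetical refits scoring the same three passes.
fits = [[0.8, 0.6, 0.9], [0.7, 0.6, 0.95], [0.85, 0.55, 0.9]]
sd_xpass, sd_score = seed_spread(fits, completed=[1, 0, 1])
```

Because actual completions don't change between refits, the spread in pass score equals the spread in xPass for the same set of passes.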

Comparing new model to old

Here’s a summary of how the two models differ by zone. Positive differences indicate zones where the newer model predicts higher completion rates, while negative differences indicate zones where the older model predicts higher rates. The models generally agree across most of the pitch, but there is some disagreement around the box and on the attacking wings. Spoiler: the new model does better in these zones.
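The zone comparison amounts to averaging each model's predicted completion probability within each zone and subtracting old from new. The 6×4 grid, the 0-100 coordinate scale, and the function names below are assumptions; the post doesn't specify how its zones are drawn.

```python
from collections import defaultdict


def zone_of(x, y, n_cols=6, n_rows=4):
    """Map a position to a grid cell (hypothetical grid; x, y assumed 0-100)."""
    col = min(int(x * n_cols / 100), n_cols - 1)
    row = min(int(y * n_rows / 100), n_rows - 1)
    return col, row


def zone_differences(passes):
    """passes: iterable of (x, y, p_old, p_new) predicted completion probabilities.

    Returns {zone: mean(p_new) - mean(p_old)}; positive means the newer model
    predicts higher completion rates in that zone.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [sum old, sum new, count]
    for x, y, p_old, p_new in passes:
        t = totals[zone_of(x, y)]
        t[0] += p_old
        t[1] += p_new
        t[2] += 1
    return {zone: (new - old) / n for zone, (old, new, n) in totals.items()}


diffs = zone_differences([(10, 10, 0.80, 0.90),
                          (12, 14, 0.60, 0.60),
                          (90, 50, 0.70, 0.65)])
```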

And here are the player-seasons that will see the largest changes in this first model upgrade.

Player	Season	Completed	xPass (2017 model)	Score (2017 model)	xPass (2018 model)	Score (2018 model)	Diff (2018 score − 2017 score)
Matt Besler	2015	1186	1164.7	21.3	1144.5	41.5	20.1
Graham Zusi	2018	2022	1928.6	93.4	1945.4	76.6	-16.7
Saphir Taider	2018	1682	1636.3	45.7	1652.4	29.6	-16.1
Bastian Schweinsteiger	2017	1520	1468.2	51.8	1482.8	37.2	-14.6
Leandro Gonzalez Pirez	2018	1656	1590.8	65.2	1605.1	50.9	-14.2
Leandro Gonzalez Pirez	2017	1642	1596.4	45.6	1610.1	31.9	-13.6
Joao Plata	2017	903	897.1	5.9	883.7	19.3	13.3
Ilie Sanchez	2018	2065	2016.8	48.2	2029.9	35.1	-13.1
Osvaldo Alonso	2016	2025	1912.5	112.5	1925.5	99.5	-13.0
Miguel Almiron	2018	1253	1238.5	14.5	1251.4	1.6	-12.9
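Each Score column is Completed minus that model's xPass, and Diff is the 2018-model score minus the 2017-model score (equivalently, 2017-model xPass minus 2018-model xPass). The sketch below recomputes those columns from three rows of the table and ranks them by absolute change; the function names are hypothetical.

```python
def pass_score(completed, xpass):
    """Pass score: actual completions minus expected completions (xPass)."""
    return completed - xpass


def rank_changes(rows, top_n=10):
    """rows: (player, season, completed, xpass_2017_model, xpass_2018_model).

    Returns (player, season, score_2017, score_2018, diff) sorted by |diff|.
    """
    out = []
    for player, season, completed, xp17, xp18 in rows:
        s17 = pass_score(completed, xp17)
        s18 = pass_score(completed, xp18)
        out.append((player, season, round(s17, 1), round(s18, 1),
                    round(s18 - s17, 1)))
    return sorted(out, key=lambda r: abs(r[4]), reverse=True)[:top_n]


rows = [
    ("Matt Besler", 2015, 1186, 1164.7, 1144.5),
    ("Graham Zusi", 2018, 2022, 1928.6, 1945.4),
    ("Joao Plata", 2017, 903, 897.1, 883.7),
]
ranked = rank_changes(rows)
```

Because the table's xPass values are rounded, recomputed diffs can differ from the published Diff column by 0.1 (20.2 here versus 20.1 for Matt Besler), presumably because the published figures were computed before rounding.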

However, the model changes don’t affect most players to that extent. Among higher-volume passers (more than 500 pass attempts in a season), the median change in predicted completions, and thus in pass score, is less than 2 passes. On a per-100-pass basis (one of our metrics in the app), the typical change is just a few tenths. The “Flipped” column counts the player-seasons whose pass score flipped from positive to negative or vice versa.

Volume	Player-seasons	Passes	Completion rate	Per-100 score median change	Score median change	Flipped
> 500 passes	1,130	1,141,040	77.8%	-0.2	-1.8	34
<= 500 passes	1,082	213,680	74.5%	-0.1	-0.1	33
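The summary table can be reproduced from per-player-season scores under both models. This sketch assumes "per 100" means pass score per 100 pass attempts and counts a flip whenever the score's sign changes (treating exactly zero as non-positive); the function name, input layout, and sample numbers are hypothetical.

```python
import statistics


def summarize_changes(player_seasons, threshold=500):
    """player_seasons: iterable of (attempts, score_old, score_new).

    Splits player-seasons at `threshold` attempts and returns, per group,
    (median score change, median per-100 change, number that flipped sign).
    """
    high, low = f"> {threshold}", f"<= {threshold}"
    groups = {high: [], low: []}
    for attempts, old, new in player_seasons:
        groups[high if attempts > threshold else low].append((attempts, old, new))

    summary = {}
    for label, rows in groups.items():
        if not rows:
            continue  # the median of an empty group is undefined
        score_change = [new - old for _, old, new in rows]
        per100_change = [100 * (new - old) / attempts for attempts, old, new in rows]
        flipped = sum(1 for _, old, new in rows if (old > 0) != (new > 0))
        summary[label] = (statistics.median(score_change),
                          statistics.median(per100_change),
                          flipped)
    return summary


summary = summarize_changes([(1000, 10.0, 8.0), (800, -1.0, 0.5),
                             (600, 5.0, 4.0), (200, 2.0, 2.1)])
```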

So how much better is this model, and is it worth the changes to players’ passing statistics? You be the judge.

This plot shows the difference between actual completion rates and model-estimated rates for both models. The 2017 model is the original one, fit through the 2017 season, and the 2018 model is the updated one, fit through this most recent 2018 season. Differences close to zero are good, and the updated model (blue) is closer to zero. There was much rejoicing.

By zone, the 2017 model struggled when the pass occurred near and inside the box. There are small sample sizes here, but the margin of error on a 95% confidence interval is only about 2%. Some of these errors are outside that range, so it would be nice if the new model did better.
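The 2% figure is presumably the usual normal-approximation margin of error for a proportion, 1.96 × sqrt(p(1 − p) / n) for a 95% interval. The sketch below computes it and flags zones whose calibration error (actual completion rate minus mean predicted probability) falls outside it. The input layout and function names are hypothetical, and using the observed rate for p is one common choice, not necessarily the one behind these plots.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation 95% confidence interval
    for a completion rate p observed over n passes."""
    return z * math.sqrt(p * (1 - p) / n)


def calibration_by_zone(zone_stats):
    """zone_stats: {zone: (attempts, completions, sum of predicted probabilities)}.

    Returns {zone: (actual rate - predicted rate, margin of error, outside margin?)}.
    """
    report = {}
    for zone, (attempts, completions, xpass) in zone_stats.items():
        actual = completions / attempts
        predicted = xpass / attempts
        error = actual - predicted
        moe = margin_of_error(actual, attempts)
        report[zone] = (error, moe, abs(error) > moe)
    return report


# Hypothetical zone: 400 attempts, 240 completed, model expected 260.
report = calibration_by_zone({"box": (400, 240, 260.0)})
```

For example, a 75% completion rate observed on 1,800 passes gives a margin of about 0.020, or 2 percentage points.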

And the good news is that the new model shows lower errors in and around the box. The new model also performed slightly better almost everywhere else, with most errors rounding to 0.0% at a precision of a tenth of a percent. I’m not inclined right now to force the model to fit perfectly, because I want it to stabilize over the long term rather than chase every last bit of noise.

So there you have it. Have we built a perfect xPassing model? Of course not. But I’ll continue to be transparent about our models, and I’ll continue to maintain open dialogue with the community (you folks!) so that we’re considering all the best ideas. Please don’t hesitate to find me on Twitter if you want to continue the conversation.

American Soccer Analysis

Passing Model Update
