By Dave Laidig (@davelaidig)

For years I’ve been interested in how players contribute to team results. I’ve sought a measure of player contributions to a win that covered all aspects of a game. While many valuable and informative soccer metrics have been created, common stats are not entirely on point with this issue.

For example, xG stats apply only to scoring attempts, and perhaps goalkeepers. Adding xAssists and key passes broadens the scope of included players. But the contributions of defensively oriented players would not be expected to show up in these metrics. And offensive-oriented players would still rely on teammates to threaten the net before their effort can be measured.

The xGChain metric is useful for identifying players who participate in the most productive attacks, and it includes players who play further away from the goal. But this metric does not include non-offensive actions. And each player’s contribution is given equal weight, whether it’s the initial square pass to a CB in the defensive half or a cross delivered into the penalty area. Experienced analysts consider a dashboard of key performance indicators and piece together insights from the elements. But I’m looking to consolidate all game elements under a common perspective.

My goal is not new, nor necessarily unique. There have been many attempts to create a comprehensive performance index, from Sarah Rudd using Markov Chains, Dan Altman’s Shapley Values, and Goalimpact ratings to corporate-sponsored efforts like the Castrol Index and the MLS-partnered Audi Index. Recently, Nils MacKay has advanced his own model that also evaluates the xG added by game actions via a different approach. Even ASA contributor Mark Goodman has tried his own ranking system. In addition, I understand many teams have their own version of a performance index. Unfortunately, these metrics were created at private expense and are proprietary, which makes it difficult to evaluate the data and their utility (to those without subscriptions at least).

As a result, I set about to create a metric that assigns each player their contribution to the team’s result. The fundamental calculation is the difference between the chances of scoring when a player gets the ball, and the chances of scoring when a player is done. To facilitate this comparison, I used the 2017 season as a basis for determining the chances of scoring (at the end of the possession) from any area on the field.

Average xG per Possession

Using the American Soccer Analysis data set covering all 374 matches of the 2017 MLS season, I placed all shots, dribbles, passes, and defensive actions into chronological order, and applied my possession definition.

Possession:

Starts with a shot, dribble, completed pass, or an incomplete free kick, corner, or throw-in
Ends with a shot, an opponent offensive action (pass/dribble/shot), or the end of a half/game
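As a rough illustration of the possession definition above, here is a minimal Python sketch. The `Event` fields and the labels "shot", "dribble", "pass" and "defense" are hypothetical; they are not the ASA data schema, and treating free kicks, corners and throw-ins as passes with a `set_piece` flag is my simplifying assumption.

```python
from dataclasses import dataclass

OFFENSIVE = {"shot", "dribble", "pass"}

@dataclass
class Event:
    team: str
    kind: str                 # hypothetical labels: "shot", "dribble", "pass", "defense"
    completed: bool = True
    set_piece: bool = False   # free kick, corner, or throw-in (assumed to be logged as a pass)
    period: int = 1

def starts_possession(e):
    """Shot, dribble, completed pass, or a set piece even if incomplete."""
    if e.kind in ("shot", "dribble"):
        return True
    return e.kind == "pass" and (e.completed or e.set_piece)

def split_possessions(events):
    """Split chronologically ordered events into possession chains."""
    possessions, current = [], []
    for e in events:
        if current:
            owner = current[0]
            new_period = e.period != owner.period
            opponent_attack = e.team != owner.team and e.kind in OFFENSIVE
            if new_period or opponent_attack:
                # End of half, or the opponent took an offensive action.
                possessions.append(current)
                current = []
        if current:
            # Only the possessing team's events belong to the chain.
            if e.team == current[0].team:
                current.append(e)
        elif starts_possession(e):
            current = [e]
        if current and e.kind == "shot" and e.team == current[0].team:
            # A shot always closes the possession.
            possessions.append(current)
            current = []
    if current:
        possessions.append(current)
    return possessions
```

Calling `split_possessions` on a chronological list of `Event` objects returns a list of chains, one list of events per possession.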

The result was over 65,000 possessions in the 2017 MLS season. I broke the pitch into over 100 zones, and tracked which zones show up in each possession, and the possession result. This data provides the average expected possession result (in xG) for each zone. For more detail, an earlier version of the chances of scoring from different zones can be found here.

For this analysis, I improved the earlier results by separating free kicks, corner kicks and penalties from regular run-of-play touches. I removed the possession start condition that relied on a defensive action because it typically was an immediate turnover and, in my opinion, did not reflect a real “possession.” I also added every zone where a completed pass was received, and then removed duplicates so that each zone I could capture was represented only once in a possession chain. I felt this was a truer indication of the average xG per possession (when possessing the ball in any particular zone).

Features of Average xG per Possession field grid:

The average xG per possession data are intuitive and based simply on observations,
Possession results use xG instead of goals scored which increases the number of non-zero observations and reduces some of the randomness,
Values are based on the entire possession chain and are not limited to the last couple of touches or an arbitrary period of time before a shot (note: some possessions can exceed 30 passes and over a minute of game time),
Possession during the run of play is separated from free kicks, corner kicks, and penalties,
Areas where passes are received are also included in possession chain calibration,
Each zone only counted once per possession, and
Zones approximate field markings and are smaller in the final third as small changes in location start to be more meaningful.
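The zone calibration described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the zone labels and the `(zones_visited, possession_xg)` input format are hypothetical, and the separate treatment of set pieces is left out.

```python
from collections import defaultdict

def zone_values(possessions):
    """Average possession xG for every zone that appears in a possession.

    possessions: iterable of (zones_visited, possession_xg) pairs, where
    zones_visited lists every zone touched or received in (hypothetical labels).
    """
    total_xg = defaultdict(float)
    count = defaultdict(int)
    for zones, xg in possessions:
        # set() removes duplicates so each zone counts once per possession.
        for zone in set(zones):
            total_xg[zone] += xg
            count[zone] += 1
    return {zone: total_xg[zone] / count[zone] for zone in total_xg}

# Made-up possessions, for illustration only.
sample = [
    (["D1", "M2", "M2", "A3"], 0.10),
    (["M2", "A3"], 0.00),
    (["D1", "M2"], 0.00),
    (["A3"], 0.30),
]
values = zone_values(sample)
```

In this toy sample, zone "M2" appears twice in the first possession but contributes only one observation to its average.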

Overall Player Contribution Rating

Again, we look to the ASA data to examine all possession chains in the 2017 and 2018 MLS seasons. Knowing the value of the various pitch locations (in terms of the average xG result from the possession, also shortened to “zone value”) means we can evaluate each touch based on the difference between the start value and the end value.

The start value is the value for the player’s first recorded zone for his touch. And the end value can either be (1) the zone value of a completed pass, (2) zero for a turnover, (3) the shot xG, or (4) the probability of scoring a shot on target. A 100% probability of scoring a goal is the same as 1.0 xG, and lesser probabilities of scoring equal a proportionate equivalent of xG. Thus, player value is measured in xG equivalents (also called non-shot xG).
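To make the start-value and end-value arithmetic concrete, here is a small Python sketch. The outcome labels, the function names, and the zone values in the example are all hypothetical; the real zone values come from the possession data described earlier.

```python
def end_value(outcome, zone_value, end_zone=None, scoring_prob=None):
    """End value of a touch, in xG equivalents (outcome labels are hypothetical)."""
    if outcome == "completed_pass":
        return zone_value[end_zone]       # (1) value of the zone where the pass arrives
    if outcome == "turnover":
        return 0.0                        # (2) possession lost
    if outcome in ("shot", "shot_on_target"):
        return scoring_prob               # (3) shot xG or (4) on-target scoring probability
    raise ValueError(f"unknown outcome: {outcome}")

def touch_value(start_zone, outcome, zone_value, end_zone=None, scoring_prob=None):
    """Touch value = end value minus the value of the zone where the touch began."""
    return end_value(outcome, zone_value, end_zone, scoring_prob) - zone_value[start_zone]

# Made-up zone values, for illustration only.
zone_value = {"midfield": 0.02, "box": 0.12}
pass_into_box = touch_value("midfield", "completed_pass", zone_value, end_zone="box")
lost_in_box = touch_value("box", "turnover", zone_value)
shot_from_box = touch_value("box", "shot", zone_value, scoring_prob=0.30)
```

With these made-up numbers, the pass into the box adds about 0.10 xG equivalents, the turnover in the box costs 0.12, and the shot adds about 0.18.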

While the subcategory discussions will cover in greater detail how this method applies to various scenarios, there are a few noteworthy aspects of the overall player value rating to highlight at the start.

Overall Player Value

Represents value added by player in terms of added/decreased xG expected
Includes GK actions for opponent shots on target (see GK Value below)
Includes red card penalty (see F-Up Value below)
Includes assessment for PK won (+0.20) and PK conceded (-0.55) (see F-Up Value below)
Includes losses of possession not otherwise captured in game data (see TO/LOP Value below)
Does not value off the ball plays
Does not value incomplete passes where team keeps possession (see Pass Value below)
Does not value incomplete passes where team never had possession (e.g., clearance)
Does not value defense actions that do not immediately lead to own team’s possession (see Defense – Turnover Value below)

In 2017, the average player contribution for 90 minutes is 0.107 (xG equivalents). There were 169 players with above average values and at least 1000 minutes in 2017; there were 166 players below average. The highest 30% were at 0.14 xG per game and higher. And there was a middle 40% in between 0.14 and 0.07. The bottom 30% were at 0.07 xG per game and lower.

Validity as a Performance Measurement

If getting a higher score is “good,” then we should see higher scores reflect “good” results. Otherwise, you’re not measuring what you think you’re measuring (in technical terms, the measure is not “valid”). Fortunately, we do see the player ratings reflect actual success.

We can start with the purpose of this measure: breaking down each player’s contribution to the team winning a game. The team with the higher overall rating was more likely to actually win the game. The difference between Team A’s value and Team B’s value after a game is highly correlated with the actual goal differential. For the 2017 MLS season, the correlation was 0.90, and so far in 2018 (308 games), the correlation is 0.85. For comparison, the correlation between xGD and actual goal difference is 0.44 in 2017 and 0.50 in 2018.
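For readers who want to run the same kind of check, a match-level correlation can be computed as in the sketch below. It assumes a plain Pearson correlation, and the five matches are made-up numbers, not MLS results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example: Team A value minus Team B value, and the actual goal difference.
value_diff = [1.2, -0.4, 0.3, -1.5, 0.8]
goal_diff = [2, 0, 0, -2, 1]
r = pearson(value_diff, goal_diff)
```

Each entry pairs one match's team value difference with its goal difference; the same function works for xGD against goal difference.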

And I’m not saying the player value ratings are a better stat than xG stats per se, especially since xG metrics have demonstrated utility for all sorts of applications. I’m only reporting that adding information via player ratings gets closer to mirroring actual results, which makes intuitive sense. And if we want to look at the Audi Index, a statistic with goals similar to those of the player value rating, the match-level correlation between Team A minus Team B index results and the actual goal difference was 0.71 in 2016. In sum, the player value rating appears to meet its goal of reflecting team results at the game level, and it shows a stronger relationship than the Audi Index.

Although breaking out the players’ contributions to individual game results is the primary goal, we can also examine how the player ratings reflect other important indicators of success. Turning to the season table, the correlation between a team’s season total and their points per game is 0.76 for 2017 and 0.70 for 2018. In contrast, the Audi Index was correlated to season points at 0.44 in 2016. Further, other known performance effects show up. Home field advantage is reflected as well; the home team rating averages 1.51 xG equivalents, and the away team 0.69 xG equivalents. Across perspectives, better teams seem to have higher value ratings.

And for better or worse, one of the most persuasive measures of validity is whether a metric produces generally expected results. In essence, how do the good players rate?

As of September 3rd, these are the top 25 season contributions by total value and per 90 minutes:

Player   Position   Minutes   2018 Season Value  
Josef Martinez F 2396 11.91
Stefan Frei GK 2403 11.67
Maximiliano Moralez CAM 2551 11.18
Graham Zusi RB 2524 9.79
Valeri Qazaishvili LM 2433 9.10
Miguel Almiron CAM 2543 8.86
Sebastian Giovinco F 2064 8.27
Alberth Elis RM 2180 8.03
Romell Quioto LM 1975 7.98
Evan Bush GK 2684 7.96
Bradley Wright-Phillips F 2223 7.94
Albert Rusnak CAM 2377 7.71
Carlos Vela CAM 1902 7.19
Zoltan Stieber RM 1734 6.90
Joao Plata LM 1358 6.86
Matt Hedges CB 2531 6.80
Diego Valeri CAM 2336 6.57
Diego Fagundez CAM 2275 6.52
Ola Kamara F 2287 6.39
Andrew Farrell RB 2474 6.21
Romain Alessandrini RM 1464 6.09
Johnny Russell RF 1853 6.06
Matt Turner GK 2497 5.99
Cristian Techera RM 1182 5.80
Michael Barrios RM 2052 5.60

Player  Position   Minutes   2018 Value (p90min)  
Joao Plata LM 1358 0.45
Josef Martinez F 2396 0.45
Cristian Techera RM 1182 0.44
Stefan Frei GK 2403 0.44
Maximiliano Moralez CAM 2551 0.39
Romain Alessandrini RM 1464 0.37
Sebastian Saucedo LF 1001 0.37
David Villa F 1268 0.37
Romell Quioto LM 1975 0.36
Ismael Tajouri-Shradi F 1107 0.36
Adama Diomande F 860 0.36
Sebastian Giovinco F 2064 0.36
Zoltan Stieber RM 1734 0.36
Graham Zusi RB 2524 0.35
Carlos Vela CAM 1902 0.34
Valeri Qazaishvili LM 2433 0.34
Giovani dos Santos CAM 848 0.34
Alberth Elis RM 2180 0.33
Santiago Mosquera CM 1193 0.33
Bill Poni Tuiloma CD 617 0.32
Bradley Wright-Phillips F 2223 0.32
Ryan Hollingshead LB 691 0.32
Miguel Almiron CAM 2543 0.31
Tosaint Ricketts F 620 0.30
Johnny Russell RF 1853 0.29
*Minimum 500 minutes
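The per-90 figures appear to be the season value scaled to 90 minutes of playing time. A quick Python check, assuming that simple formula, reproduces the table values for a few players from the rounded season totals above.

```python
def per_90(season_value, minutes):
    """Scale a season total (xG equivalents) to a per-90-minute rate."""
    return season_value / minutes * 90

# (season value, minutes) taken from the 2018 season table above.
players = {
    "Josef Martinez": (11.91, 2396),
    "Stefan Frei": (11.67, 2403),
    "Joao Plata": (6.86, 1358),
}
rates = {name: round(per_90(value, mins), 2) for name, (value, mins) in players.items()}
```

For Josef Martinez, 11.91 / 2396 × 90 ≈ 0.447, which rounds to the 0.45 shown in the per-90 table.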

In 2017, the top 25 season contributions in MLS by total value and per 90 minutes:

Player   Minutes   2017 Season Value  
Joao Plata 2417 11.35
Tim Melia 2935 10.04
Nemanja Nikolic 3069 9.87
Diego Fagundez 2555 9.52
Miguel Almiron 2523 8.90
Haris Medunjanin 3231 8.48
Romain Alessandrini 2574 8.40
Albert Rusnak 2723 8.14
David Accam 2250 8.09
David Villa 2684 7.93
Sacha Kljestan 2742 7.77
Cristian Roldan 3106 7.57
Lee Nguyen 2656 7.28
Mauro Manotas 2200 6.95
Hector Villalba 2827 6.89
Graham Zusi 2244 6.72
Victor Vazquez 2473 6.71
Sebastian Giovinco 2162 6.61
Justin Meram 2685 6.53
Jack Harrison 2872 6.52
Ola Kamara 3004 6.16
Diego Valeri 3002 5.89
Alberth Elis 1737 5.75
Benny Feilhaber 2407 5.64
Christian Ramirez 2529 5.60

Player   Minutes   2017 Value (p90 min)  
Joao Plata 2417 0.42
Shkelzen Gashi 1090 0.40
Romell Quioto 1357 0.34
Diego Fagundez 2555 0.34
David Accam 2250 0.32
Miguel Almiron 2523 0.32
Tim Melia 2935 0.31
Alberth Elis 1737 0.30
Josef Martinez 1588 0.29
Romain Alessandrini 2574 0.29
Nemanja Nikolic 3069 0.29
Mauro Manotas 2200 0.28
Sebastian Giovinco 2162 0.28
Graham Zusi 2244 0.27
Albert Rusnak 2723 0.27
Brad Guzan 1331 0.27
David Villa 2684 0.27
Sacha Kljestan 2742 0.26
Yordy Reyna 1140 0.25
Lee Nguyen 2656 0.25
Victor Vazquez 2473 0.24
Haris Medunjanin 3231 0.24
Jefferson Savarino 1730 0.23
Darren Mattocks 1172 0.23
Arturo Alvarez 1166 0.23
*Minimum 1000 minutes

Predicting Future Performance

The player value ratings are not intended as a predictive measure. But measures of performance naturally lead to questions about consistency over time.

As a word of caution, we’re comparing a full season of data (2017) against a partial season (through August 20, 2018). And by imposing a minimum level of playing time (400 minutes in most cases) in both seasons, the player populations may be skewed. In fact, there is some evidence that the poorer performers from 2017 have dropped out of the 2018 population at twice the rate of the better performers. In addition, there is a host of potential intervening or moderating variables that could enhance or minimize the “real” effect size. For one quick example, Alphonso Davies was a poor performer in 2017 (-0.02 per 90), and a strong performer in 2018 (0.22 per 90). This is an abnormal jump, but he was also 16 in the previous year. It’s possible that including age curves, positions, teams, tactical roles and the like could improve predictive accuracy. It will just take time to test those ideas.

With the caveats aside, the correlation between the 2017 and 2018 ratings (for the 283 players with 400+ minutes in both years) is 0.465. And the scatterplot for the field players is below.

With modest relationships, addressing the variability is a must. And reporting confidence intervals makes sense. However, I prefer to borrow from I/O Psychology and use hit-miss ratios because they feel more intuitive. In other words, if we put the 2017 ratings into buckets, what percentage of each bucket had a successful 2018 rating?

For a measure of “success” in 2018, I used a simple standard of whether the player had an above average rating. If all ratings were random, one would have a 50% chance of selecting a player with an above average 2018. Thus, we can judge our labels on whether they improve on that baseline hit rate or help avoid “misses.”

2017 Category   2018 “Success” %   2018 Avg (xG equiv per 90)   Population dropout rate from 2017
Top 10%   78%   0.22   10%
Top 30%   69%   0.19   27%
Middle 40%   44%   0.12   41%
Bottom 30%   40%   0.09   45%
Bottom 10%   39%   0.09   56%
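A hit-miss calculation along these lines can be sketched in Python as follows. This is an illustration only: the ten player pairs are made up, "above average" is assumed to mean above the mean 2018 rating of the players in the sample, and players who dropped out are simply absent from the input.

```python
def hit_rate(pairs, lower_pct, upper_pct):
    """Share of a 2017 bucket with an above-average 2018 rating.

    pairs: list of (rating_2017, rating_2018) for players present in both years.
    lower_pct, upper_pct: bucket bounds as whole percentages of the 2017
    ranking, best first (0 to 30 is the top 30%).
    """
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    n = len(ranked)
    avg_2018 = sum(r18 for _, r18 in pairs) / n
    bucket = ranked[n * lower_pct // 100 : n * upper_pct // 100]
    hits = sum(1 for _, r18 in bucket if r18 > avg_2018)
    return hits / len(bucket)

# Made-up (2017, 2018) per-90 ratings, for illustration only.
pairs = [
    (0.30, 0.25), (0.25, 0.20), (0.20, 0.05), (0.18, 0.15), (0.15, 0.12),
    (0.12, 0.08), (0.10, 0.14), (0.08, 0.06), (0.05, 0.04), (0.02, 0.01),
]
top_30 = hit_rate(pairs, 0, 30)
middle_40 = hit_rate(pairs, 30, 70)
bottom_30 = hit_rate(pairs, 70, 100)
```

In this toy sample, two of the three top-30% players, three of the four middle-40% players, and none of the bottom-30% players finish above the 2018 average.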

The overall trend is that a good rating means a better chance that the player will have a good following year. A player from a pool with a nearly 70% hit rate is a better risk than a player from a pool with only a 40% hit rate. And it is possible that the success rate for poor performers is inflated because other poor players were either cut or left the league.

In practice, knowing which pool a player falls into informs the conversation. Why would a player in the bottom pool go against the trend and be successful the following year? Were they injured, out of favor, playing a different role, or perhaps just a teenager in their first year as a pro? A player value rating starts a conversation; it doesn’t end the conversation.

And independent of any predictive uses, player ratings that describe game performance have utility as well. By converting the contribution of game actions into a common currency, the xG equivalents, we can compare the relative contributions of different positions. Top level forwards are prized players, and quantitative ratings can justify, or debunk, this common wisdom. And a higher-level stat, especially one that isolates the influence of a player, can help flag underperforming players for a deeper inquiry. Further, understanding the typical value created from different positions can be used to allocate limited roster resources. In other words, where will a team get the greatest benefit from an upgrade?

The overall player contributions are not as consistent as one would want for predictive purposes. But when looking for an edge in evaluating and predicting performance, why not use all the tools that are available? And the less desirable prediction results could be due to the effort to capture all events that influence game results. Some rare events, like red cards or conceded penalties, can have huge consequences and thus add to the “noise” captured in the overall rating.

In Part 2, I’ll examine the subcategories that make up the overall player value. We’ll take a deeper look at how the player contribution is calculated for different game events (shots, passes, turnovers, defensive actions, etc.) and at the 2018 leaders in key categories like shot value. Finally, we will determine which subcategories are much more consistent from year to year than the overall player values.