Breaking it down: g+ Subcategories
By Paul Harvey, Mike Imburgio, & Ben Bellman
When Goals Added first entered the public sphere, it marked a major shift in how soccer could be interpreted using data. Similar models had been developed before, but the data behind them was hard to obtain and harder to turn into insights. As a unit of account, few expected value models are as accessible and understandable as g+.
Despite that initial success, g+ is still an opaque measure in many ways. Much like its spiritual father figure, WAR, it loses some of the how of a player’s game. Although the numbers are clear at a broad level, even a simple comparison between two players can be hard to make with just the six Goals Added categories. Sure, we can say that one player has added more value via passing than another. But what kinds of passes led to that value? What is the actual on-field difference between the two players?
Introducing Subcategories
For that reason, it is worthwhile to break the six g+ categories down into more manageable subcategories that map more closely to specific actions on the field. This allows a deeper understanding of how a player approaches the game and the specific ways in which they add value on the ball.
There are unquestionably challenges to dividing the game into manageable chunks. Although the six categories each cover broad soccer territory, on the surface they are clearly delineated: everyone knows the difference between a pass and a reception. Separating one kind of pass from another, however, requires more careful consideration as well as a tolerance for gray areas. Dividing the categories further also risks creating event types that are too specific to be useful for most players. To avoid information overload and maximize relevance, each category should have only a few subcategories, with meanings that are easy to understand.
We were also careful to make sure that the subcategories provided a range of information about players. For example, if one subcategory always accounted for all of the value for a given action type, the breakdown would not be useful. Ideally, subcategories tell us about a player’s style in terms of the value they provide, showing variation across subcategories among players who play similar positions.
After deliberations, we ended up with 37 total subcategories.
For the purposes of this analysis, Progressive means any pass that travels at least 25% of the distance to goal. Unlike some definitions of progressive in the analytics sphere, this does not exclude passes in the defensive half or third. Regressive means moving the ball away from goal by at least 25% of the original distance. Danger Zone (or DZ) means the area within a 25-yard arc around the goal. This is more useful than simply using the penalty box, because it also captures a significant amount of the value in “Zone 14”. Fast Break (or just Break) always refers to a play moving down the field faster than 5 yards per second.
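The definitions above translate into simple geometric checks. What follows is a minimal Python sketch, not the authors’ implementation: the pitch dimensions, the coordinate system (yards, attacking left to right), measuring distance to the center of the goal mouth, and measuring break speed along the length of the field are all assumptions.

```python
import math

# Assumptions (not from the article): coordinates in yards, the attacking team
# moves left to right, and the goal mouth is centered at (PITCH_LENGTH, PITCH_WIDTH / 2).
PITCH_LENGTH = 115.0
PITCH_WIDTH = 74.0
GOAL_X, GOAL_Y = PITCH_LENGTH, PITCH_WIDTH / 2

THRESHOLD_FRACTION = 0.25   # 25% of the starting distance to goal
DANGER_ZONE_YARDS = 25.0    # arc of 25 yards around the goal
BREAK_SPEED_YPS = 5.0       # yards per second down the field


def distance_to_goal(x: float, y: float) -> float:
    """Straight-line distance from (x, y) to the center of the goal mouth."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def is_progressive(x0: float, y0: float, x1: float, y1: float) -> bool:
    """Ball ends at least 25% of its starting distance closer to goal.
    Passes starting in the defensive half or third are not excluded."""
    d0, d1 = distance_to_goal(x0, y0), distance_to_goal(x1, y1)
    return d0 - d1 >= THRESHOLD_FRACTION * d0


def is_regressive(x0: float, y0: float, x1: float, y1: float) -> bool:
    """Ball ends at least 25% of its starting distance further from goal."""
    d0, d1 = distance_to_goal(x0, y0), distance_to_goal(x1, y1)
    return d1 - d0 >= THRESHOLD_FRACTION * d0


def in_danger_zone(x: float, y: float) -> bool:
    """Within a 25-yard arc of the goal."""
    return distance_to_goal(x, y) <= DANGER_ZONE_YARDS


def is_fast_break(x0: float, x1: float, seconds: float) -> bool:
    """Ball moves down the field (along x) faster than 5 yards per second."""
    return seconds > 0 and (x1 - x0) / seconds > BREAK_SPEED_YPS


# A pass from 65 yards out to 35 yards out closes 30 yards (about 46%).
print(is_progressive(50, 37, 80, 37))  # True
```

Because the threshold is relative, a 10-yard pass starting 30 yards from goal qualifies as progressive, while the same 10-yard pass starting 100 yards from goal does not.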
In some cases, an event can fall into more than one subcategory; for example, a progressive pass can also occur during a break. To avoid counting the value of that action twice, we let the narrower subcategory take priority over the broader one. A corner kick, for instance, could fall into as many as five different subcategories, but because “set piece pass” is the most specific, that is where it is categorized.
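One way to implement the “narrowest subcategory wins” rule is an ordered list of rules checked from most to least specific, stopping at the first match. The sketch below assumes events arrive as dictionaries of boolean flags; the flag names, the “Fast Break Pass” label, and the ordering are hypothetical stand-ins rather than the actual g+ rule set.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, bool]

# Hypothetical rules, ordered from most specific to least specific.
PASS_RULES: List[Tuple[str, Callable[[Event], bool]]] = [
    ("Set Piece Pass", lambda e: e.get("set_piece", False)),
    ("Fast Break Pass", lambda e: e.get("fast_break", False)),
    ("Progressive Pass", lambda e: e.get("progressive", False)),
    ("Other Pass", lambda e: True),  # catch-all so every pass lands somewhere
]


def assign_subcategory(event: Event) -> str:
    """Return the first (most specific) matching subcategory, so each
    action's value is counted in exactly one place."""
    for name, matches in PASS_RULES:
        if matches(event):
            return name
    raise ValueError("no rule matched")  # unreachable with the catch-all


corner = {"set_piece": True, "progressive": True, "fast_break": False}
print(assign_subcategory(corner))  # Set Piece Pass
```

Keeping the ordering in one list makes the priority decision explicit and easy to revisit if a different subcategory should win a tie.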
How can this be used?
One of the challenges of Goals Added has been taking a single number that represents the sum total of a player’s contributions over thousands of minutes and events, and using it to draw conclusions about the player. This is especially true when the output of the g+ model differs significantly from “the eye test”.
With subcategories, it’s easy to see at a glance what a player is doing more of (or less of) that is contributing to their overall g+ performance. These kinds of breakdowns allow not just a better understanding of individual players, but a better understanding of which actions provide the most value and how to maximize those actions in a playing context.
Receiving is #1
The most valuable subcategory on a per-action basis is winning a penalty (+0.25 g+ per action); its counterpart, committing an infraction that results in a penalty, costs the most expected value. These are rare actions, though, and for consistent value creation nothing comes close to receiving in the “danger zone”. Receiving actions within 25 yards of goal account for 13% of all positive g+. That is second to Carries Towards Goal as a share of the total (16% of all positive value), but on a per-action basis an average reception in the “danger zone” is more than six times as valuable as an average carry towards goal.
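Share of total value and value per action are easy to conflate. A minimal Python sketch, assuming a list of (subcategory, g+) pairs per action and reading “share of all positive g+” as each subcategory’s positive action values divided by all positive action values (our interpretation), shows how a high-volume subcategory can lead on share while a low-volume one leads per action. The toy numbers are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(actions: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Map each subcategory to (share of all positive g+, mean net g+ per action)."""
    positive = defaultdict(float)  # sum of positive action values
    net = defaultdict(float)       # sum of all action values
    count = defaultdict(int)
    for sub, value in actions:
        positive[sub] += max(value, 0.0)
        net[sub] += value
        count[sub] += 1
    positive_total = sum(positive.values()) or 1.0  # avoid dividing by zero
    return {
        sub: (positive[sub] / positive_total, net[sub] / count[sub])
        for sub in count
    }


# Invented toy data: many small carries, a few valuable receptions.
toy = (
    [("Carries Towards Goal", 0.01)] * 10
    + [("Receiving in the Danger Zone", 0.03)] * 2
    + [("Regressive Passes", -0.02)] * 3
)
for sub, (share, per_action) in summarize(toy).items():
    print(f"{sub}: {share:.0%} of positive g+, {per_action:+.3f} per action")
```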
In part, this is due to how the model calculates receiving value. As a refresher, the total change in possession value from a completed pass is split between the passer and the receiver based on the pass’s xPass (the modeled probability that the pass is completed). The lower the xPass, the larger the receiver’s share, so more difficult passes assign more value to the receiver. For passes into the most heavily defended part of the pitch, the vast majority of the value goes to the receiver.
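The split described above fits in a few lines. This Python sketch assumes the passer’s share is exactly xPass and the receiver’s is 1 - xPass, which matches the description here (harder passes send more value to the receiver) but is our reading rather than the published g+ code; the function name is hypothetical.

```python
from typing import Tuple


def split_pass_credit(value_change: float, xpass: float) -> Tuple[float, float]:
    """Split a completed pass's possession-value change into
    (passer credit, receiver credit), weighted by xPass."""
    if not 0.0 <= xpass <= 1.0:
        raise ValueError("xpass must be a probability between 0 and 1")
    passer = value_change * xpass
    receiver = value_change * (1.0 - xpass)
    return passer, receiver


# An easy pass (xPass 0.9) and a hard one into a crowded box (xPass 0.2),
# each worth the same 0.10 change in possession value.
easy = split_pass_credit(0.10, 0.9)
hard = split_pass_credit(0.10, 0.2)
print(f"easy: passer {easy[0]:.2f}, receiver {easy[1]:.2f}")
print(f"hard: passer {hard[0]:.2f}, receiver {hard[1]:.2f}")
```

With the same 0.10 of value at stake, the receiver’s credit rises from 0.01 on the easy pass to 0.08 on the hard one.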
There have always been questions about how value is allocated in this situation, and many other possession value models allocate passing value differently. There are checks and balances in place, however: while receiving in dangerous areas creates significant value, the most negative subcategory is miscontrolled passes, in which the receiver cannot keep possession of the ball. This keeps the value credited to players receiving these passes in context.
Solo Pa’lante
In the run-up to the 2022 World Cup, the US Men’s National Team’s motto was “Solo Pa’lante”, or “Only Forward”. Ironically, the USMNT struggled with meaningful ball progression throughout qualifying. The idea was sound even if the execution was shoddy, and it provides a useful framework for understanding value at the level of player actions.
Essentially, soccer boils down to advancing the ball towards the opponent’s goal. The closer you get to goal, the more likely you are to score. The inverse is also true: the closer you get to your opponent’s goal, the less likely they are to score on an ensuing possession. (This is why players who are effective progressive passers tend to have higher G- as well.) This is borne out when g+ value is broken down by subcategory. As mentioned previously, the top two value-creating subcategories on a net basis are Carries Towards Goal and Receiving in the Danger Zone. The third is Progressive Passes.
In short, players that move the ball towards goal add value for their teams. There are obviously cases where the risk is too high, or the options aren’t available, but on net the players that help their teams are the ones that move it forward.
On the other side, moving the ball away from goal is harshly penalized in g+. Regressive Passes, which move the ball at least 25% of the original distance away from goal, are the second-lowest-value plays on net, and the third-lowest is Carries Away From Goal. Players make these plays primarily to manage risk: they want to avoid giving the ball away. At scale, though, a tendency to take the lower-risk option reduces the overall possession value a player creates. g+ can cut through the noise of the individual moment and show whether the choices being made are, on the whole, benefiting the team. Small changes in passing risk tolerance can massively affect the value an attacking midfielder brings to a team.
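The cumulative effect of risk tolerance is simple expected-value arithmetic. In this Python sketch every probability and value is invented for illustration, and treating each decision as “complete and gain” versus “fail and lose” is a deliberate simplification of how g+ scores possessions.

```python
def expected_value(p_complete: float, value_if_complete: float, value_if_failed: float) -> float:
    """Expected possession-value change of a single pass decision."""
    return p_complete * value_if_complete + (1.0 - p_complete) * value_if_failed


# Invented illustrative numbers, not g+ estimates.
safe_back = expected_value(0.95, -0.002, -0.03)  # recycle the ball backwards
forward = expected_value(0.70, 0.02, -0.02)      # riskier progressive option


def season_total(decisions: int, forward_share: float) -> float:
    """Total expected value when a fraction of decisions take the forward option."""
    return decisions * (forward_share * forward + (1.0 - forward_share) * safe_back)


cautious = season_total(400, 0.20)
bolder = season_total(400, 0.30)
print(f"per decision: safe {safe_back:+.4f}, forward {forward:+.4f}")
print(f"season difference from a 10-point shift: {bolder - cautious:+.3f}")
```

The specific numbers matter less than the structure: a small per-decision gap, repeated over hundreds of decisions, adds up.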
Value by Position
Of course, not every position gets the same contribution from each subcategory. For CBs, almost half of their net value comes from two Interrupting subcategories, Stopping Breaks and Own DZ Actions. For Strikers, almost half of their net value comes from receiving the ball in the danger zone.
At the same time, many of the same actions create consistent value across positions. Because so many high-value actions take place close to goal, players in every position have their highest-leverage opportunities in those same spaces. For example, although CBs get the majority of their value from interrupting actions, they are on par with strikers on set piece receptions and shots.
There are other interesting details here. For example, DMs and CMs have very similar per-96-minute outputs in raw g+, but the composition of those numbers differs. DMs have passing scores similar to CBs, weighted toward non-progressive long balls and the “Other Pass” category, while CMs get significantly more value from entering the box.
This breakdown can be useful in player scouting: it highlights the most important actions for each role and an expected value for those actions. Focusing on specific high-value areas can streamline the process and reduce noise in the analysis.
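Both ideas in this section, per-96 rates and the composition of a position’s net value, are straightforward aggregations. A minimal Python sketch, assuming rows of (position, subcategory, net g+) and treating per 96 as per 96 minutes; the toy values are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per96(value: float, minutes: float) -> float:
    """Scale a season total to a per-96-minute rate."""
    return value / minutes * 96.0


def composition(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Share of each position's net g+ coming from each subcategory.
    Shares can be negative (or exceed 1) when some subcategories are net negative."""
    by_pos: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for position, sub, value in rows:
        by_pos[position][sub] += value
    result: Dict[str, Dict[str, float]] = {}
    for position, subs in by_pos.items():
        total = sum(subs.values())
        if total == 0:  # no net value to divide up
            result[position] = {sub: 0.0 for sub in subs}
            continue
        result[position] = {sub: v / total for sub, v in subs.items()}
    return result


# Invented toy rows.
rows = [
    ("CB", "Stopping Breaks", 0.30),
    ("CB", "Own DZ Actions", 0.18),
    ("CB", "Other Pass", 0.52),
    ("ST", "Receiving in the Danger Zone", 0.45),
    ("ST", "Regressive Passes", -0.10),
    ("ST", "Carries Towards Goal", 0.65),
]
shares = composition(rows)
print(f"CB share from interrupting: {shares['CB']['Stopping Breaks'] + shares['CB']['Own DZ Actions']:.2f}")
print(f"1.5 g+ over 1440 minutes is {per96(1.5, 1440):.2f} per 96")
```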
Miscellaneous Notes
Looking under the hood also uncovered a number of curious outcomes of the model. Not all of them need a long explainer, but they are worth noting when looking at how players gain and lose value.
Carries Towards Goal and Progressive Passes are two of the most valuable on-ball actions, yet carrying is a little more valuable on a per-action basis despite generally covering less ground. The threshold for a carry to be counted in the data is 5.5 yards, and a carry towards goal is one that moves any distance towards goal, not the 25% required for a progressive pass. The chief advantage of a carry over a pass is that it can’t be incomplete, but a forward carry also forces the defense to shift its structure to engage the ball, which causes further disruption down the line.
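The gap between the two thresholds is easiest to see side by side. This Python sketch assumes coordinates in yards with the goal mouth centered at (115, 37); the 5.5-yard minimum, the “any distance towards goal” rule, and the 25% pass threshold come from the text, while the “Other Carry” catch-all label is hypothetical.

```python
import math
from typing import Optional

GOAL_X, GOAL_Y = 115.0, 37.0   # assumed goal-mouth center, in yards
MIN_CARRY_YARDS = 5.5          # shorter movements are not recorded as carries


def distance_to_goal(x: float, y: float) -> float:
    """Straight-line distance from (x, y) to the center of the goal mouth."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def classify_carry(x0: float, y0: float, x1: float, y1: float) -> Optional[str]:
    """Label a ball carry, or return None if it is too short to count."""
    if math.hypot(x1 - x0, y1 - y0) < MIN_CARRY_YARDS:
        return None
    if distance_to_goal(x1, y1) < distance_to_goal(x0, y0):
        return "Carry Towards Goal"   # any reduction in distance qualifies
    return "Other Carry"              # hypothetical catch-all label


def is_progressive_pass(x0: float, y0: float, x1: float, y1: float) -> bool:
    """A pass must close at least 25% of the starting distance to goal."""
    d0 = distance_to_goal(x0, y0)
    return d0 - distance_to_goal(x1, y1) >= 0.25 * d0


# The same 6-yard forward movement from 65 yards out:
print(classify_carry(50, 37, 56, 37))       # Carry Towards Goal
print(is_progressive_pass(50, 37, 56, 37))  # False
```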
When a player is flagged for offside, the model allocates value in a counterintuitive way. Because the action is an “Offside Pass”, it goes in the Passing bucket, yet the entire possession value loss is charged to the player who was caught offside. This has the biggest impact on strikers, who rarely rack up passing value to begin with. In the 2023 season, Christian Benteke had a raw passing g+ of -1.74; excluding offside calls, that number rises to just -0.10. This is less an intentional modeling decision than a quirk of accounting.
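Isolating the offside effect is a matter of filtering. In the Benteke example, offside calls account for -1.74 - (-0.10) = -1.64 of his raw passing g+. The sketch below assumes a hypothetical event schema in which each action carries the credited player, its g+ category and subcategory, and its value, with an Offside Pass credited to the player caught offside as described above; the records are invented.

```python
from typing import Dict, List, Union

Event = Dict[str, Union[str, float]]


def passing_g_plus(events: List[Event], player: str, exclude_offside: bool = False) -> float:
    """Sum a player's Passing-category g+, optionally ignoring Offside Pass events."""
    total = 0.0
    for e in events:
        if e["player"] != player or e["category"] != "Passing":
            continue
        if exclude_offside and e["subcategory"] == "Offside Pass":
            continue
        total += float(e["g_plus"])
    return total


# Invented records for a hypothetical striker.
events: List[Event] = [
    {"player": "Striker A", "category": "Passing", "subcategory": "Other Pass", "g_plus": 0.01},
    {"player": "Striker A", "category": "Passing", "subcategory": "Offside Pass", "g_plus": -0.05},
    {"player": "Striker A", "category": "Receiving",
     "subcategory": "Receiving in the Danger Zone", "g_plus": 0.04},
]
raw = passing_g_plus(events, "Striker A")
adjusted = passing_g_plus(events, "Striker A", exclude_offside=True)
print(f"raw {raw:+.2f}, without offsides {adjusted:+.2f}")
```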
On defense, the model likes stopping breaks, intervening in the danger zone, and reactive actions such as clearances, recoveries, and interceptions. Proactive actions outside the danger zone are almost as likely to be negative as positive. Part of this is the possible failure state: challenges can be missed, fouls can be committed, and the team can be harmed. Goals Added judges that the risks of going into challenges in a set defensive situation often outweigh the benefits. General fan perception is the opposite: players who are “duel winners” tend to get most of the plaudits. Further exploration, however, shows that recoveries, which fall under reactive actions, have a stronger correlation with winning.
The model does not like defensive headers outside the danger zone, which are almost always negative regardless of who wins the duel. That said, most of the interrupting value in these sequences is credited when the ball is cleared as a separate action. This may unfairly penalize players who are relied upon to win these duels. Offensive aerial duels, meanwhile, are not assigned a value unless they result in a completed pass.
The diagonal switch of play is a popular attacking move and appealing from an aesthetic perspective: who doesn’t like a beautiful driven ball ripped across the field? Its main purpose is to move the ball from an area of high defensive density to one of lower density, and g+ does pick up on this: non-progressive long balls have an average value of +0.0018 per action. They’re not quite as valuable as progressive passes in general, but because they’re more likely to be completed, the passer tends to get more of the value.
Set piece passes are tricky to value correctly. For example, if a team regularly takes short corners, the corner taker is going to be penalized for something they have no control over. Goals Added hates short corners (a right and proper stance). The high volatility of set piece passing, where outcomes can be either very good or not good at all, makes it difficult to do one-to-one comparisons of players. Separating set piece passing from the rest of a player’s passing numbers could be useful.
On a per-action basis, shots range from roughly 0.02 to 0.03 g+. Shots are never negative, and the most valuable on average are shots that follow an individual attacking action such as a progressive carry. There has been a great deal of debate about how to calculate the value of a shot in the model, given that players already receive a great deal of value for receiving. As it stands, shot value is the expected additional value of the possession after the shot’s outcome, such as the likelihood that the ball is rebounded back into play or goes out for a set piece.
In part 2 of this series, the subcategories will be explored further at the player level. In practice, subcategories can be used to identify players’ relative strengths and weaknesses, which can provide insights for stakeholders ranging from fans to the players themselves.