Goals Added: The Art of the Wheel

This is part four of our series on Goals Added (g+). Here is part one, where John first introduced it. Here is part two, where Matthias broke our brains. And here is part three, where Kieran showed us how it fits into the world of soccer analytics.

By Eliot McKinley

Look at that. Pretty, right? The colors. The font. The icons. The logo. That tiny beeswarm. All work together to make a pretty damn good visualization. But these things don’t just emerge from a computer fully formed. They take a lot of effort, and there are a ton of ways that things can go wrong. Inspired by Peter McKeever’s discussion of his beautiful diamond plots, here are the decisions we made and the iterations of this viz along the way to what you see above.

As you’ve seen this week, American Soccer Analysis has developed a new metric to assess player value. We sought to develop an intuitive and visually striking way to depict this value, in this case in comparison to the average player at a specific position.

A bar chart or radar plot may be sufficient for something like this, but we wanted to do something a little different because our metric is fundamentally different. Typically a bar or radar, like we have done at ASA before, will have some assortment of stats, with potentially widely divergent scales, that encompass the salient metrics of a soccer player. Or at least the metrics that whoever made the plots thought were important. The choice of which metrics to use is important and can tell different stories depending on what you choose to display and how you do it.

In contrast, our player value wheels are comprehensive and represent the totality of a player’s on-ball contributions in the same unit, goals added (g+). That constant unit, instead of the percentiles or other scales you’re used to seeing on player vizzes, is why you'll notice a wider range of values on player wheels for passing and receiving than for, say, fouls: passes do most of the work to make a possession valuable. We split a player’s actions up into six categories for display, but we could lump or split them into any number of buckets and they would always sum to the same g+ value.

For example, our "dribbling" category includes value added from successful take-ons, which is roughly what your typical bar chart or radar axis labeled "dribbling" is trying to capture. But "dribbling" on this viz also accounts for value lost from failed take-ons, dispossessions, and miscontrols, as well as value gained or lost from simply carrying the ball in any direction. Remember, this wheel shows every touch! You could split "dribbling" into five different slices of the wheel if you wanted to, but those slices would still add up to the same quantity of goals added.
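To make the "lump or split" point concrete, here is a minimal sketch in Python (the viz itself was built in R). The action names, the numbers, and both groupings are made up for illustration and are not ASA's real data or its actual category mapping; the only thing the sketch is meant to show is that regrouping the same actions never changes the total.

```python
import math
from collections import defaultdict

# Hypothetical per-action g+ values for one player (illustrative only).
actions = [
    ("successful_take_on", 0.020),
    ("failed_take_on", -0.010),
    ("carry", 0.015),
    ("pass", 0.030),
    ("reception", 0.025),
    ("shot", 0.010),
    ("foul_conceded", -0.005),
    ("foul_earned", 0.004),
]

# Assumed coarse grouping: several action types share one slice of the wheel.
coarse_mapping = {
    "successful_take_on": "dribbling",
    "failed_take_on": "dribbling",
    "carry": "dribbling",
    "pass": "passing",
    "reception": "receiving",
    "shot": "shooting",
    "foul_conceded": "fouling",
    "foul_earned": "fouling",
}

# Assumed fine grouping: every action type gets its own slice.
fine_mapping = {name: name for name, _ in actions}


def bucket_totals(action_values, mapping):
    """Sum g+ per bucket according to the given action-to-bucket mapping."""
    totals = defaultdict(float)
    for name, value in action_values:
        totals[mapping[name]] += value
    return dict(totals)


coarse = bucket_totals(actions, coarse_mapping)
fine = bucket_totals(actions, fine_mapping)

# However the slices are cut, the wheel adds up to the same overall value
# (compared with a tolerance because these are floating-point sums).
same_total = math.isclose(sum(coarse.values()), sum(fine.values()))
print(len(coarse), "slices vs", len(fine), "slices; same total:", same_total)
```

In this toy example the "dribbling" slice already nets failed take-ons against successful ones and carries, which is the behavior described above.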

We already had a visual framework for comparing a player’s individual attributes to an overall baseline in Directional Passing Above Expected (DPOE). DPOE breaks down a player’s deviation from their expected passing based upon what direction a pass is made, with values above zero representing where they exceed expectations and those below zero where they fall short. By swapping out pass direction for goals added above average (g+avg) in some preexisting code, we could take the first step towards where we wanted to go with our visualization.

So this works. It uses the default colors for each action, the scaling is not great, and the font sizes are too small, but it is functional. You’ll see worse on Twitter every day. But of course it could be better, much better.


This iteration fixes some things, starting with the colors. Instead of the default, rather garish, ggplot2 color set, we’ve swapped in some that, at least, won’t make your eyes bleed. We’ve increased the font sizes so you don’t have to use a magnifying glass to read them. We added an overall player value on the inside, which is the sum of all the action type values. In the legend we changed the spreadsheet column names to something human-readable and re-ordered the values so they flow in a more logical order. Finally, we consolidated the “fouls conceded” and “fouls earned” values because they are similar and to decrease the number of segments in the plot. Again, doing this doesn’t affect the overall player value.

One key component was still missing: the player position. This is necessary since a player’s g+avg is calculated based on their position, so that a fullback is not graded on the same curve as a forward when it comes to shooting. We tried a few ways of indicating this, along with some different shading options, all of which were abandoned before we decided to just figure it out later. Another change was to move the value labels from the center of the plot to the outside in order to make the center less crowded. A final subtle change in this iteration was to remove the black line at the zero point and replace it with slightly thicker segments in the same colors as the bars. First of all, this made the plots a little more streamlined, since the black line broke things up a bit too much. Secondly, it better highlighted action types that were close to zero.

While the bones of this visualization were basically built, there were still many details to iron out. The colors looked too much like something you’d find in a kid’s first book of colors. The incomparable John Muller suggested a palette based upon the neon t-shirts you may have worn back in the ’90s. However, these colors don't really work well on a white background, so we turned to a dark theme. Everyone is using dark themes these days, because, as Covid-19 viz superstar John Burn-Murdoch explained, they look really cool. Beyond that, dark themes provide better contrast and are easier to read on small devices like your phone.

At this point we are getting pretty close to the final product. Big changes here are the font, from the default Helvetica to a more aggressive looking Bebas Neue, and the addition of icons to denote each action type. Further, we removed the background fills which may have worked better on a white background but were too overwhelming on a dark background. On the data side, we changed the units on our values, changing from whole season values to per 96 minute values. This was done for two reasons:

  1. We didn’t want to penalize players that didn’t play a full season or were injured

  2. It’s easier to think about a player’s contribution on a per game basis than over an entire season
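The unit change above is simple arithmetic, sketched here in Python rather than the R used for the viz. The function name and the example numbers are hypothetical; the only thing taken from the text is the move from season totals to per 96 minute values.

```python
def per_96(season_value, minutes_played):
    """Convert a season-total g+ value to a per 96 minute rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return season_value / minutes_played * 96


# Two hypothetical players with the same rate but very different playing time.
full_season = per_96(season_value=1.5, minutes_played=2400)
injured = per_96(season_value=0.5, minutes_played=800)

print(round(full_season, 4), round(injured, 4))
```

Both made-up players land on the same per 96 minute value even though one has three times the season total, which is the first reason given above for making the switch.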

At this point we are almost there. In this iteration, we added a white circle in the middle to provide more contrast for the total g+ value, as well as added a key at the top indicating that this was a per 96 minute measure. Additionally, we added a subtitle to indicate which season and positional baseline a player’s actions were compared to, 2019 and attacking midfielder in this case. By adding labels directly next to the icons, the legend was no longer needed and was removed. In order to make negative values even more clear, we chose to use a stripe pattern for any negative values on the chart. The use of patterned fills can lead to some data viz atrocities, but in this case it provides a nice contrast with the positive values. Finally, the icons for shooting and dribbling were changed to be a bit stronger and more recognizable.

And here is the final product.

We’ve added total minutes played to the subtitle and included a logo in the upper right to indicate the player’s team. A line was added from the center circle to the “goals added” key to more clearly indicate what it means. There is repetition of the overall g+ value, but we believe this reinforcement is warranted.

To get a sense of how a player’s g+avg compares to others’, we added a beeswarm plot just under the goals added label. The dots represent the distribution of g+avg for each position and season, with the plotted player’s dot filled with white. As you can see in the gif above, the dot moves according to each player’s respective g+avg compared to wingers in 2019. Carlos Vela is at the high end and Miguel Ibarra at the low, with all other wingers somewhere in between. We deliberately wanted to keep the beeswarm minimal and didn’t include labels on it. While this could be confusing if you have never seen one before, beeswarms are becoming widely used and are immediately recognizable once you have. Plus we figured everyone would be reading this explainer anyway.

One final change was made to the receiving icon. Previous versions had the flag up on the mailbox, which, as you surely know, means that there is mail to be picked up. As such, we moved the flag to a downward position, as this icon was not meant to indicate anything outgoing. It’s a small thing, but we think it matters.

One other important design decision was the scaling of each action sector. We chose to set limits of ±0.07 goals added. This represents about the 8th and 92nd percentiles of per 96 minute values across all players and action types. We decided that this provided enough range to capture both very high and very low values while still allowing those around zero to be differentiated. Importantly, these values are not scaled differently for each category, as is typically done in traditional bar and radar plots. As all have the same units of g+, we wanted to keep the comparison across types intact. For example, fouling is almost always the least consequential of our metrics, and we wanted to be sure that it was compared directly to those that do drive the values, such as shooting, receiving, passing, or dribbling. If we were to scale each individually by percentile, for example, then it could appear that fouling was just as important as other metrics, which is not the case.
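Here is a rough Python sketch of what a single shared scale means in practice. The ±0.07 limit comes from the text; everything else is an assumption for illustration, including the player values and the choice to clamp anything beyond the limit to a full-length bar, since how the real plot handles out-of-range values isn't covered here.

```python
LIMIT = 0.07  # shared axis limit in g+ per 96 minutes (from the text)


def bar_length(value, limit=LIMIT):
    """Return the bar length as a fraction of the full sector, in [-1, 1].

    Values beyond the limit are clamped to it (an assumption of this sketch).
    """
    clamped = max(-limit, min(limit, value))
    return clamped / limit


# Hypothetical per 96 minute values for one player (not real data).
player = {"passing": 0.035, "shooting": 0.09, "fouling": -0.007}

# Every category uses the same limit, so bar lengths stay directly comparable.
lengths = {category: bar_length(value) for category, value in player.items()}

for category, length in lengths.items():
    print(f"{category}: {length:+.2f} of a full bar")
```

Because every category shares the same limit, the made-up fouling value shows up as a short bar next to passing instead of being stretched to fill its own sector, which is the comparison described above.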

So that’s it. This process took a couple of weeks of intermittent work to get to its final form. There were dozens of iterations from start to finish, with only some of them shown here. Constant feedback was important throughout, including the aesthetic expertise of John Muller and the help of many on the ASA Slack, including Matthias Kullowatz, Kieran Doyle, Tiotal Football, Jamon Moore, and others.

This viz was made using the R packages ggplot2, png, ggimage, ggpattern, ggbeeswarm, and cowplot. There is no post-processing done in any image editing software such as Photoshop. We think we struck a good balance here between data visualization as art and data visualization as a tool. Feel free to send any complaints to @DrewJOlsen.