Shots in the Dark: how data providers tell us different versions of what happened / by Eliot McKinley

By Eliot McKinley (@etmckinley)

Recently, this tweet created a small firestorm in the soccer analytics community. While the source of the error is unclear, it was obvious that there weren’t 1,300 passes and 50 shots in an English League 2 match. This led to responses from prominent analysts such as StatsBomb’s Ted Knutson (including on his podcast [starts at 10:45]), Opta’s (and ASA alum) Tom Worville and Ryan Bahia, and Chris Anderson, author of The Numbers Game. All of them were saying pretty much the same thing: question the data you are using. If the data you are using to analyze a problem is not valid, then your solutions won’t be either.

So what do we know about the data that is used for soccer analysis? Previous studies have shown that people are pretty good at agreeing about what type of event occurred in a soccer game (e.g. shots, tackles). But as far as I can tell, the accuracy and precision of the locations of game events among the various data providers have not been studied. As Joe Mulberry pointed out when looking at the troubling inconsistencies between spatial tracking data and event data, small differences in locations can have big effects on downstream analysis, including expected goals (xG) models. In other words, small inconsistencies in how data is tracked can have big consequences for the models built off that data. So what are the differences between how soccer data providers collect and report their data?

To partially answer this, I took to Twitter. I created a Google survey that asked a user to watch a video of a goal and then code the location of a shot using Peter McKeever’s fabulous online tool. While the specifics of how companies code the data are still a bit shrouded, this method is probably a crude version of what data companies do. But instead of (presumably) well paid and well trained professionals doing the work, it is random, totally trustworthy, people on the internet doing it for free.

I asked people to look at three different shots. The first was a headed goal from Poland in the 2018 World Cup. This one was a bit tricky because the broadcast angle and the player’s jump make determining the exact location difficult.

The second was a goal scored off a direct free kick by England, which seemingly would be easier.

Finally, a low-quality video of a goal scored by my high school alma mater, St. Charles. This is the hardest of the three to determine and is probably more representative of the types of recordings companies may see from lower tier leagues around the world.

After some data cleaning, I ended up with 119 measurements for the Poland goal, 35 for the England goal, and 26 for the St. Charles goal. Raw results are available here. While we don’t have a true value for the shot position, I assumed that the crowd was wise and treated the average x and y positions from the Google survey as the actual ones. As you can see from the figure above, there was quite a bit of variability in where the users coded each shot. For the Poland goal there was an average difference of 1.52 yards between the consensus shot location and an individual coded point, for England it was 1.81 yards, and for St. Charles it was 3.66 yards.
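The consensus calculation described above can be sketched in a few lines. This is only an illustration: the function name and the sample coordinates are made up, and it assumes each response is an (x, y) pair in yards and that the "average difference" is the mean straight-line distance from the consensus point.

```python
import math
from statistics import mean

def consensus_and_spread(points):
    """Return the mean (x, y) location and the average distance of each point from it."""
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    # Straight-line (Euclidean) distance of every coded point from the consensus
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    return (cx, cy), mean(dists)

# Hypothetical coded locations in yards, NOT the actual survey responses
coded = [(10.0, 40.0), (12.0, 40.0), (11.0, 38.0), (11.0, 42.0)]
centre, avg_dist = consensus_and_spread(coded)
print(centre, round(avg_dist, 2))
```

With the real survey responses in place of `coded`, `avg_dist` would be the kind of figure quoted above for each goal, provided the distance was measured the same way.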

I then applied the American Soccer Analysis expected goal model to the data set to see how much variation there was in downstream analysis. The results, again, showed quite a bit of variation. For the Poland goal, the xG values ranged from 0.050 to 0.281, the England goal from 0.058 to 0.094, and the St. Charles goal from 0.030 to 0.588.

Since one of the biggest drivers of the ASA xG model is the logarithmic distance to the goal, small changes in the coded location can have large effects on xG. This is especially true when a shot is taken close to the goal. Both Poland’s goal and England’s goal had roughly similar errors from the mean location, but Poland’s showed much more variation in xG because it was much closer to the goal. In short, a 1.5-yard error on a shot taken 10 yards from goal will have a much larger effect than the same error on a shot 30 yards from goal.
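To see why the same error matters more close to goal, here is a toy logistic model with a single log-distance term. To be clear, this is not the ASA model: the intercept and coefficient are invented purely for illustration, and a real model includes many other variables.

```python
import math

def toy_xg(distance_yards, intercept=1.5, log_dist_coef=-1.4):
    """Toy logistic xG using only log(distance); the coefficients are made up."""
    z = intercept + log_dist_coef * math.log(distance_yards)
    return 1.0 / (1.0 + math.exp(-z))

def xg_swing(distance_yards, error_yards=1.5):
    """Difference in xG between coding the shot error_yards too close and too far."""
    return toy_xg(distance_yards - error_yards) - toy_xg(distance_yards + error_yards)

close_swing = xg_swing(10.0)  # shot coded around 10 yards out
far_swing = xg_swing(30.0)    # shot coded around 30 yards out
print(round(close_swing, 3), round(far_swing, 3))
```

Under these assumed coefficients the swing for the 10-yard shot is several times larger than for the 30-yard shot, which is the pattern seen in the Poland and England goals.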

The variation on the St. Charles shot xG was immense. The low-quality video and bad angle surely complicated the location coding, and there is even an argument that it was an own goal (which 32% of users thought it was). Again, these issues are probably minor for the bigger leagues around the world with HD video and multiple angles, but if you are data scouting somewhere more obscure, you may think twice about the numbers you are looking at.

This variation in xG could have profound knock-on effects. Errors would likely even out over a season or more, but given the rarity of shots at a team or player level, this error could compound. For example, if a single player had taken all three of these shots in a game, their total xG could range anywhere between 0.138 and 0.963. Many disparate conclusions could be drawn from such variable results.
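The arithmetic behind that range is just the sum of the lowest and highest xG values reported for each goal. A quick check, using the ranges quoted earlier (the variable names are mine):

```python
# (min xG, max xG) for each goal, as reported above
xg_ranges = {
    "Poland": (0.050, 0.281),
    "England": (0.058, 0.094),
    "St. Charles": (0.030, 0.588),
}

low = sum(lo for lo, _ in xg_ranges.values())
high = sum(hi for _, hi in xg_ranges.values())
print(round(low, 3), round(high, 3))  # 0.138 0.963
```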

I then looked at variation in shot location coding at the 2018 World Cup using four well-known soccer data providers (by the way, if anyone has a data source to add to this, my DMs are open). Looking at the two shots included in the Google survey shows how variable these can be (see above). For the Poland goal, three of the providers were in good agreement with the Google survey results, while one was well outside. For the England goal, there was more disagreement, with a couple of providers coding the shot as being on opposite sides of the field, which is not a small error.

A random subset of shots across the two World Cup games shows that the four data providers generally agree about where a shot was taken. However, there were often outliers among the data providers. Furthermore, there were a couple of cases where the providers even disagreed on which player took the shot. Just looking at the x-coordinate of the shot (goal line-to-goal line), 30% of the 45 shots I looked at had a range among the data providers greater than 5 yards. That seems like a lot for what should be a pretty objective measurement. If this is the variability that is seen in data from the World Cup, the biggest sporting event on the planet, one can only wonder what it looks like in less prestigious confines like the EPL, or more prestigious confines like MLS.
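The x-coordinate comparison boils down to taking, for each shot, the gap between the largest and smallest value reported by the providers, then counting how many gaps exceed 5 yards. A rough sketch with invented coordinates (the function name and data are hypothetical):

```python
def share_with_wide_range(shots, threshold=5.0):
    """Fraction of shots whose x-coordinates span more than `threshold` yards across providers."""
    ranges = [max(xs) - min(xs) for xs in shots]
    return sum(r > threshold for r in ranges) / len(ranges)

# Hypothetical x-coordinates (yards) from four providers, one row per shot
shots_x = [
    [100.0, 101.0, 100.5, 102.0],  # range 2.0
    [95.0, 96.0, 103.0, 95.5],     # range 8.0
    [88.0, 88.5, 89.0, 90.0],      # range 2.0
    [110.0, 104.0, 109.0, 109.5],  # range 6.0
]
share = share_with_wide_range(shots_x)
print(share)  # 0.5
```

This assumes all providers' coordinates have already been converted to the same units and pitch orientation, which is its own source of error when comparing vendors.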


Lastly, I was able to match 1,313 shots from Data Providers A and B from the 2018 World Cup. Looking across all the matched shots, there was a statistically significant difference in shot distance between the two providers, with B being longer than A. Furthermore, both providers showed a roughly similar shape of their density curves, with bimodal peaks, just shifted a bit. Looking at a Bland-Altman plot, we see that the mean difference in distance between the two providers was 1.52 yards, with relatively wide 95% limits of agreement (+/- 1.96 SD). Additionally, it does not appear that the difference in distance between the two providers was related to distance from goal, i.e. as shots got farther out, the gap between the two did not change. All else being equal, you would expect data provider B to have lower xG values than A because it codes shots as farther from goal overall.
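For readers unfamiliar with Bland-Altman analysis, the numbers behind the plot are simple: take the paired differences, compute their mean (the bias), and draw a band at the mean plus and minus 1.96 standard deviations. A minimal sketch with made-up distances (not the matched World Cup shots; the names are my own):

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return (bias, lower, upper) for paired measurements a and b.

    bias is the mean of b - a; lower and upper are bias -/+ 1.96 sample SDs.
    """
    diffs = [y - x for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical shot distances in yards for the same five shots
dist_a = [10.0, 18.0, 25.0, 12.0, 30.0]
dist_b = [11.0, 20.0, 26.0, 14.0, 31.5]
bias, lower, upper = bland_altman(dist_a, dist_b)
print(round(bias, 2), round(lower, 2), round(upper, 2))  # 1.5 0.52 2.48
```

Plotting each difference against the pair's average distance then shows whether the disagreement grows with distance from goal, which is the check described above.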

So which of the providers is best? It’s not really something I can answer here. I can say that there is disagreement among the data providers, and sometimes that disagreement is large. In order to answer that question, I would need: 1) much more data and 2) the actual shot locations. The first is hard because data is expensive and tightly held for one provider, let alone many. The second is hard because we don’t have the objective shot locations, although tracking data may get us most of the way there (I know the 2018 World Cup was tracked, so if anyone has that data, hit me up).

Until then we have to rely on the data that we can get, and be cognizant of its potential limitations in our analyses. As Toronto FC’s Devin Pleuler said about xG, “It’s a shit metric, but a really great framework.” What matters is a way to assess the quantity and quality of chances a team produces. xG can do this, even if the numbers can be a bit fuzzy.