Model Update: Coefficient Blending by Matthias Kullowatz

With our most recent app update, you might notice that some numbers in the xGoals tables have changed for past years where it wouldn’t normally make sense to see changes. As an example, Josef Martinez had 29.2 xG in 2018, but the updated app shows 28.7 (-1.7%). No, this is not an Atlanta effect, though I can understand why you might support such an effect. Gyasi Zardes lost 0.5 xG as well (-2.4%), and no one dislikes Columbus.

We have updated our xGoal models with the 2018 season’s data, and that update is the culprit behind all the discrepancies since the last version of the app. I have already cited the two largest discrepancies by magnitude, so this isn’t some major overhaul of the model. In fact, only 2018’s xG values have been materially adjusted.* The new model estimated 35.6 fewer xGoals in 2018 than it did before, equivalent to a 2.8% drop.
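
The size of the adjustment can be sanity-checked with quick arithmetic. A minimal sketch using only the figures quoted above (the implied league-wide 2018 total is a back-of-the-envelope inference, not a published number):

```python
# Percentage changes implied by the model update (figures from the post).
def pct_change(old, new):
    """Relative change from old to new, as a percent."""
    return (new - old) / old * 100

# Josef Martinez: 29.2 xG -> 28.7 xG
martinez = pct_change(29.2, 28.7)   # about -1.7%

# League-wide: a 35.6 xG drop that equals a 2.8% decline implies
# the previous 2018 total was roughly 35.6 / 0.028 xG.
old_total = 35.6 / 0.028            # roughly 1270 xG
```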

Read More

A Tale of Two Central Defensive Midfielders by Eliot McKinley

Michael Bradley and Wil Trapp share several obvious qualities. They are both captains for club and country. They are both smooth-passing defensive midfielders, and they both possess excellent heads of hair. Another similarity is that they rarely shoot or score goals, each collecting only one goal over the last three seasons. Coincidentally, both of those goals are what we could enthusiastically describe as "wonder-goals." Bradley scored a long-distance chip for the US national team in a World Cup qualifier against Mexico at the Azteca (a goal not remembered as fondly as it deserves, given how the rest of qualifying went), while Trapp's was a stoppage-time winner for the Crew against Orlando City this past summer. However, one difference between these two players was how each responded to the confidence boost that came after scoring a once-in-a-career goal.

Read More

You Down with t-SNE? by ASA Staff

We all know that some teams play a certain style: the Red Bulls play with high pressure and direct attacks, Vancouver crosses the ball, Columbus possesses the ball from the back. Although we know these things intuitively, we can also use analytical methods to group teams. Doing so seems unnecessary when we have all these descriptors like press-resistance, overload, trequartista-shadow striker hybrid, gegenthrowins, mobile regista, releasing, Colorado Countercounter gambits, etc. (we actually don’t know what some of these terms mean and may have made some up, but the real ones are popular, so just google them yourself). Those terms are nice, but no qualitative descriptor can tell us how the styles of New York City and Columbus differ from each other. We need to measure, compare, and model two teams’ playing styles and efficiencies. If we can do these things, we may be in a position to answer what style really is.
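
The t-SNE of the title is one such analytical method: it embeds high-dimensional style profiles in two dimensions so that similar teams land near each other. A minimal sketch using scikit-learn's implementation, with an entirely invented feature matrix standing in for real team style metrics:

```python
import numpy as np
from sklearn.manifold import TSNE  # t-SNE from scikit-learn

rng = np.random.default_rng(0)

# Hypothetical per-team style features: pressing intensity, cross rate,
# build-up share, directness, etc. (all columns invented for illustration).
n_teams, n_features = 23, 8
style = rng.normal(size=(n_teams, n_features))

# Project the style profiles to 2-D; nearby points suggest similar styles.
# Perplexity must be smaller than the number of teams.
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(style)
print(embedding.shape)  # (23, 2)
```

With real data, each row would be a team's season-long event aggregates, and clusters in the 2-D plot would correspond to the stylistic groupings we recognize intuitively.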

Read More

Expected Possession Goals GameFlow: Taking a Ride Into the Danger Zone by Jamon Moore

A few weeks ago, we introduced Expected Possession Goals (xPG) GameFlow, a visualization of the momentum of a soccer match from kickoff to final whistle. xPG GameFlow uses the accumulation of Chance xPG to measure the strength of an opportunity for a team to get a shot. The higher the Chance xPG differential between the teams in a given minute, the longer that minute's bar for the team with the higher amount. Quite often, goals are scored when the momentum bars on the xPG GameFlow chart are at their longest.

Many people have asked us, “what is the difference between xPG and xG?” or “how does xPG translate to xG or to goals?” To aid xPG GameFlow in answering questions such as “which team had the better chances?” and “when should a team have scored?”, we introduced a couple of improvements after the first week of tweeting MLS game charts on @GameFlowxPG. I wanted to provide more context for these improvements and dive deeper into them.

Read More

Adjusting team xGoals by Matthias Kullowatz

By Matthias Kullowatz (@mattyanselmo)

When we produced the game-by-game expected goals results last week, we were surprised to see that Seattle had outpaced Portland 4.0 to 1.7. That didn't feel right, but it didn't take long before we noticed that Seattle recorded five shots inside the six-yard box leading up to its first goal. Those shots added up to more than 2.0 expected goals, despite the fact that soccer's rules limit scoring to one goal at a time. 
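
The over-counting can be corrected by treating each shot's xG as a scoring probability and computing the chance of at least one goal in the sequence. A minimal sketch, assuming the shots are independent (they aren't quite, but it illustrates the cap):

```python
from math import prod

def possession_xg(shot_xgs):
    """Probability of scoring at least once in a possession --
    bounded by 1.0, unlike the raw sum of shot xG values."""
    return 1 - prod(1 - p for p in shot_xgs)

# Five close-range shots in one sequence (invented values summing past 2.0):
shots = [0.45, 0.45, 0.40, 0.45, 0.45]
naive = sum(shots)              # about 2.2 -- more than one goal's worth
capped = possession_xg(shots)   # about 0.95 -- at most one goal can result
```

This is why a flurry of rebounds in the six-yard box should not be worth two-plus expected goals: the possession can only ever produce one.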

Read More

Introducing interactive data at ASA by Matthias Kullowatz

You know how you go to some sports websites and you can sort and filter their data, and there are lots of options and it looks cool and stuff? Well, starting today, we’re rolling out interactive versions of our stats that also look cool. You can find the link up at the top under "xG Interactive Tables." This first iteration focuses on shot stats and expected goals, and it gives you guys more ways to filter and explore the data.

Read More

Do expected goals models lack style? by Jared Young

By Jared Young (@JaredEYoung)

Expected goals models are hip in the land of soccer statistics. If you have developed one, you are no doubt sporting some serious soccer knowledge. But it seems to be consistent across time and geography that the smart kids always lack a bit of style.

If you are reading this post you are probably at least reasonably aware of what an expected goals model is. It tells you how many goals a team should have scored given the shots they took. Analysts can then compare the goals actually scored with the goals a team was expected to score and use that insight to better understand players and teams and their abilities.

The best expected goals models incorporate almost everything imaginable about the shot. What body part did the shooter connect with? What were the exact X,Y coordinates of the shooter? What was the position of the goalie? Did the player receive a pass beforehand? Was it a set piece? All of these factors are part of the model. Like I said, they are really cool.
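
Under the hood, models like these are shot-level classifiers that turn those factors into a probability. A toy numpy sketch of the idea, where every coefficient is invented for illustration and bears no relation to ASA's actual model:

```python
import numpy as np

def shot_xg(dist_yards, angle_deg, header, assisted, set_piece):
    """Toy logistic xG model: all coefficients are made up for illustration."""
    z = (-1.0                     # intercept (invented)
         - 0.11 * dist_yards      # farther shots are harder
         + 0.02 * angle_deg       # a wider view of goal helps
         - 0.90 * header          # headers convert less often than feet
         + 0.30 * assisted        # a pass beforehand sets the shot up
         + 0.20 * set_piece)      # set-piece context
    return 1 / (1 + np.exp(-z))   # squash the score to a probability

# A 12-yard shot with the foot, decent angle, following a pass:
p = shot_xg(dist_yards=12, angle_deg=30, header=0, assisted=1, set_piece=0)
# p comes out just under 0.2 with these invented coefficients
```

Real models are fit on thousands of shots, but each prediction is still a per-shot probability like `p` here.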

But as with all models of the real world, there is room for improvement. For example, expected goals models aren’t great at factoring in the number of defenders between the shooter and the goal. A packed defense can lead to more blocked shots, or simply force the shooter into a more difficult attempt than they would like. On the opposite end of that spectrum, if a shooter was wide open on a counterattack, the models would likely not recognize that situation and would undervalue the likelihood of a goal being scored. But I may have found something that will help in these instances.

I recently created a score that attempted to numerically define extreme styles of play. On the one end of the score are extreme counterattacking teams (score of 1) and on the other end are extreme possession-oriented teams (score of 7). The question is, if I overlay this score on top of expected goals models, will I find any opportunities like those mentioned above? It appears there are indeed places where looking at style will help.

I have only scored one full MLS season with the Proactive Score (PScore), so I’ll start with MLS in 2014, where I found two expected goals models with sufficient data: the model managed here by the American Soccer Analysis team (us!) and the publicly available data compiled by Michael Caley (@MC_of_A). Here is a chart of the full season’s average PScore against the difference between goals scored and expected goals scored, for the ASA model and Michael Caley’s model.

Both models are pretty similar. If you were to draw a straight-line regression through this data, you would find nothing in particular. But allowing a polynomial curve to find a best fit reveals an interesting pattern in both charts. When the PScores are below 3, indicating strong counterattacking play, the two models consistently underpredict the number of goals scored. This makes sense given what I mentioned above; teams committed to the counterattack should find more space when shooting and should have a better chance of making their shots. Michael Caley’s model handles this better, but there is still room for improvement.
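
The curve-fitting step is an ordinary quadratic fit. A sketch using numpy, with invented (PScore, G − xG) pairs standing in for the 2014 team values:

```python
import numpy as np

# Invented (PScore, G - xG) pairs shaped like the pattern described:
# both style extremes outperform xG while the middle lags.
pscore = np.array([1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0])
g_minus_xg = np.array([4.1, 2.8, 1.0, -1.5, -2.0, -1.2, 0.5, 2.1, 3.8])

# A straight line finds little; a degree-2 polynomial exposes the U-shape.
line = np.polyfit(pscore, g_minus_xg, deg=1)
curve = np.polyfit(pscore, g_minus_xg, deg=2)

# A positive leading coefficient means a U-shaped fit: counterattacking
# (low PScore) and possession (high PScore) extremes sit above the trough.
assert curve[0] > 0
```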

It’s worth pointing out that teams that rely on the counterattack tend to be teams that consider themselves less talented (I repeat, tend to be). But you would expect less-talented teams to have below-average shooters as well. The fact that counterattacking teams outperform the model indicates they might also be overcoming a talent gap to do so.

On the other hand, when the PScore is greater than 4, the models also underpredict actual performance. This, however, might be for a different reason. Possession-oriented teams usually face more defenders when shooting. The bias here may instead reflect the fact that teams able to outpossess their opponents to that level may also have the shooting talent to outperform the model.

Notice also where most teams reside: between 3 and 4. This appears to be no man’s land, a place where the uncommitted or incapable teams underperform.

Looking at teams in aggregate, however, comes with its share of bias, most notably the hypothesis I suggested for possession-oriented teams. To remove that bias, I looked at each game played in MLS in 2014, home and away, and plotted those same metrics. I did not have Michael Caley’s data by game, so I only looked at the ASA model.

For both home and away games, there does appear to be a consistent bias against counterattacking teams. In games where teams produce strong counterattacking PScores of 1 or 2, we see them typically outperforming expected goals (G - xG). Given that xG models are somewhat blind to defensive density, it would make perfect sense that counterattacking teams shoot better than expected; by design, they should have more open shots than teams that play possession soccer. It definitely appears to me that xG models should somehow account for teams playing counterattacking soccer, or they will underestimate goals for those teams.

What’s interesting is that the same bias does not reveal itself as clearly at the other end of the spectrum, as we saw in the first graph. When looking at the high-possession teams -- the sixes and sevens -- the teams' efficiencies become murkier. If anything, it appears that being proactive to an extreme is detrimental to efficiency (G - xG), especially for away teams. The best-fit line doesn’t quite do the situation justice. When away teams are very possession-oriented, with a PScore of 6 or 7, they actually underperform the ASA xG model by an average of 0.3 goals per game. That seems meaningful, and might suggest that game states are playing a role in confusing us. With larger sample sizes this phenomenon could be explored further, but for now it's safe to say that when a team plays a counterattacking game, it tends to outperform its expected goals.
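
Figures like "away teams at a PScore of 6 or 7 underperform by 0.3 goals per game" reduce to a bucketed average of game-level results. A sketch with invented rows in place of the 2014 game data:

```python
from collections import defaultdict

# Invented away-game records: (PScore, goals - xGoals) per match.
games = [(1, 0.6), (2, 0.4), (2, 0.3), (4, 0.0), (6, -0.2), (7, -0.4), (7, -0.3)]

# Group the efficiency differential by PScore...
buckets = defaultdict(list)
for pscore, diff in games:
    buckets[pscore].append(diff)

# ...then average over the possession-heavy end (PScore >= 6):
high_poss = [d for p, d in games if p >= 6]
avg_high = sum(high_poss) / len(high_poss)   # -0.3 with these made-up rows
```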

Focusing on home teams with high possession over the course of the season, we saw an uptick in goals minus expected goals. But it doesn’t appear that possession-oriented teams shoot better due to possession itself, based on the trends we saw from game to game. It seems that possession-oriented teams play that way because they have the talent to, and it’s that talent that drives them to outperform their expected goals.

So should xG models make adjustments for styles of play? It really depends on the goal of the model. If the goal is to be supremely accurate, then I would say that xG models should look at the style of play and make adjustments. However, style is not specific to one shot; it describes an entire game. Will modelers want to overlay macro conditions on their models rather than focus solely on the unique conditions of each shot?

Perhaps the model should allow this bias to continue. After all, it could reveal that counterattacking teams have an advantage in scoring as one would expect.

If the xG models look to isolate shots based on certain characteristics, perhaps they should strive to add data to each particular moment. Perhaps an aggregate overlay on counterattacks would be counterproductive as it would take the foot off the pedal of collecting better data for each shot taken. Perhaps this serves as inspiration to keep digging, keep mining for the data that helps fix this apparent bias. Perhaps it’s the impetus to shed the sweater vest and find an old worn-in pair of boots. Something a little more hip to match the intellect.