Expected Goals 2-0

Finishing in MLS Part 2: Is Finishing Real? Heading Towards a Conclusion by Sean Steffen

The topic of “finishing” is always a fun one in the analytics world, and, last April, it’s one I studied using data going all the way back to the beginning of the league to see if I could find evidence for a statistically significant gradient of repeatable finishing skill in MLS. Click the link to read the piece in full, but the short of it was, while there were many instances where a forward outperformed their xG by a wide margin or converted an unusual number of their shots on goal, these seasons were rarely repeated within a player’s career as you would expect if such numbers were tied to a skill.

After such a long and arduous study, you can imagine my consternation any time I read a piece praising or criticizing a player’s finishing skill within the league. In fact, when Jordan Morris told the New York Times, “my finishing is still raw,” I nearly had an aneurysm. Doesn't anyone read long-winded statistical articles anymore? (Answer: no.) But read more after the jump.

Do expected goals models lack style? by Jared Young

By Jared Young (@JaredEYoung)

Expected goals models are hip in the land of soccer statistics. If you have developed one, you are no doubt sporting some serious soccer knowledge. But it seems to be consistent across time and geography that the smart kids always lack a bit of style.

If you are reading this post you are probably at least reasonably aware of what an expected goals model is. It tells you how many goals a team should have scored given the shots they took. Analysts can then compare the goals actually scored with the goals a team was expected to score and use that insight to better understand players and teams and their abilities.
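
As a minimal sketch of that comparison, assume each shot already carries a scoring probability from some xG model. The tuple layout, team names, and numbers below are invented for illustration and do not come from either model discussed in this post.

```python
# Each made-up shot is (team, goal_probability, scored).
# goal_probability is a hypothetical per-shot value from some xG model.
shots = [
    ("A", 0.30, True),
    ("A", 0.10, False),
    ("A", 0.05, False),
    ("B", 0.50, False),
]

def goals_minus_xg(shots, team):
    """Return (goals, expected goals, goals - expected goals) for one team."""
    team_shots = [s for s in shots if s[0] == team]
    goals = sum(1 for _, _, scored in team_shots if scored)
    xg = sum(prob for _, prob, _ in team_shots)  # xG is the sum of shot probabilities
    return goals, xg, goals - xg

goals, xg, diff = goals_minus_xg(shots, "A")
print(f"goals={goals}, xG={xg:.2f}, G - xG={diff:+.2f}")
```

A positive G - xG means the team scored more than the shots alone suggested; a negative value means it scored less.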

The best expected goals models incorporate almost everything imaginable about the shot. What body part did the shooter connect with? What were the exact X,Y coordinates of the shooter? What was the position of the goalie? Did the player receive a pass beforehand? Was it a set piece? All of these factors are part of the model. Like I said, they are really cool.

But as with all models of the real world, there is room for improvement. For example, expected goals models aren’t great at factoring in the number of defenders between the shooter and the goal. More defenders could lead to a higher number of blocked shots or just force the shooter to take a more difficult shot than they would like. On the opposite end of that spectrum, perhaps a shooter was wide open on a counterattack; the models would likely not recognize that situation and would undervalue the likelihood of a goal being scored. But I may have found something that will help in these instances.

I recently created a score that attempted to numerically define extreme styles of play. On the one end of the score are extreme counterattacking teams (score of 1) and on the other end are extreme possession-oriented teams (score of 7). The question is, if I overlay this score on top of expected goals models, will I find any opportunities like those mentioned above? It appears there are indeed places where looking at style will help.

I have only scored one full MLS season with the Proactive Score (PScore) so I’ll start with MLS in 2014, where I found two expected goals models with sufficient data. There is the model managed here by the American Soccer Analysis team (us!) and there is the publicly available data compiled by Michael Caley (@MC_of_A). Here is a chart of the full season’s average PScore and the difference between goals scored and expected goals scored for the ASA model and Michael Caley’s model.

Both models are pretty similar. If you were to draw a straight-line regression through this data you would find nothing in particular. But allowing a polynomial curve to find a best fit reveals an interesting pattern in both charts. When the PScores are below 3, indicating strong counterattacking play, the two models consistently underpredict the number of goals scored. This makes sense given what I mentioned above; teams committed to the counterattack should find more space when shooting and should have a better chance of making their shots. Michael Caley’s model does a better job handling it, but there is still room for improvement.
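
To illustrate why a straight line can show nothing while a curve shows a pattern, here is a dependency-free sketch that fits both to the same points. The numbers are made up to be roughly U-shaped and are not the 2014 MLS values; the helper names `polyfit` and `r_squared` are my own, and the degree-2 fit is an assumption, since the charts only describe "a polynomial curve."

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients ordered from the constant term upward.
    """
    n = degree + 1
    # Augmented matrix [X^T X | X^T y] for the normal equations.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def r_squared(xs, ys, coeffs):
    """Share of the variance in ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    fitted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up season averages: PScore and goals minus expected goals.
pscore = [2.2, 2.6, 3.0, 3.3, 3.5, 3.8, 4.2, 4.6, 5.0]
g_minus_xg = [6.0, 3.5, 0.5, -2.0, -3.0, -1.5, 1.0, 4.0, 7.0]

linear = polyfit(pscore, g_minus_xg, 1)
quadratic = polyfit(pscore, g_minus_xg, 2)
print("linear R^2:   ", round(r_squared(pscore, g_minus_xg, linear), 3))
print("quadratic R^2:", round(r_squared(pscore, g_minus_xg, quadratic), 3))
```

With points shaped like this, the line explains almost none of the variation while the curve explains most of it, which is the same qualitative contrast described above.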

It’s worth pointing out that teams that rely on the counterattack tend to be teams that consider themselves to be less talented (I repeat, tend to be). But you would think that less-talented teams would also be teams that would have shooters that are worse than average. The fact that counterattacking teams outperform the model indicates they might also be overcoming a talent gap to do so.

On the other hand, when the PScore is greater than 4, the models also underpredict the actual performance. This, however, might be for a different reason. Usually possession-oriented teams are facing more defenders when shooting. The bias here may be a result of the fact that teams that can outpossess their opponent to that level may also have the shooting talent to outperform the model.

Notice also where most teams reside, between 3 and 4. This appears to be no man’s land; a place where the uncommitted or incapable teams underperform.

Looking at teams in aggregate, however, comes with its share of bias, most notably the hypothesis I suggested for possession-oriented teams. To remove that bias, I looked at each game played in MLS in 2014, home and away, and plotted those same metrics. I did not have Michael Caley’s data by game, so I only looked at the ASA model.

For both home and away games there does appear to be a consistent bias against counterattacking teams. In games where teams produce strong counterattacking PScores of 1 or 2, we see them also typically outperforming expected goals (G - xG). Given that xG models are somewhat blind to defensive density, it would make perfect sense that counterattacking teams shoot better than expected. By design they should have more open shots than teams that play possession soccer. It definitely appears to me that xG models should somehow factor in teams that are playing counterattacking soccer, or they will underestimate goals for those teams.

What’s interesting is that the same bias does not reveal itself as clearly at the other end of the spectrum, as it did in the first graph. When looking at the high-possession teams -- the sixes and sevens -- the teams' efficiencies become murkier. If anything, it appears that being more proactive to an extreme is detrimental to efficiency (G - xG), especially for away teams. The best fit line doesn’t quite do the situation justice. When away teams are very possession-oriented with a PScore of 6 or 7, they actually underperform the ASA xG model by an average of 0.3 goals per game. That seems meaningful, and might suggest that gamestates are playing a role in confusing us. With larger sample sizes this phenomenon could be explored further, but for now it's safe to say that when a team plays a counterattacking game, it tends to outperform its expected goals.
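
The per-game averages quoted above come from grouping team-games by venue and PScore and averaging G - xG within each group. A minimal sketch of that grouping follows; the rows are invented, so the printed values will not match the 0.3 figure, and the tuple layout is an assumption.

```python
# Made-up team-games: (venue, pscore, goals, xg).
games = [
    ("away", 1, 2, 1.1),
    ("away", 2, 1, 0.8),
    ("away", 6, 0, 0.9),
    ("away", 7, 1, 1.4),
    ("home", 1, 2, 1.5),
    ("home", 6, 2, 1.6),
]

def mean_g_minus_xg(games, venue, pscores):
    """Average goals minus xG per game for one venue and a set of PScores.

    Returns None when no game matches, rather than dividing by zero.
    """
    diffs = [goals - xg for v, p, goals, xg in games
             if v == venue and p in pscores]
    if not diffs:
        return None
    return sum(diffs) / len(diffs)

print("away, PScore 1-2:", mean_g_minus_xg(games, "away", {1, 2}))
print("away, PScore 6-7:", mean_g_minus_xg(games, "away", {6, 7}))
```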

Focusing on home teams with high possession over the course of the season, we saw an uptick in goals minus expected goals. But based on the trends we saw from game to game, it doesn’t appear that possession-oriented teams shoot better due to possession itself. It seems that possession-oriented teams play that way because they have the talent to, and it’s the talent on the team that is driving them to outperform their expected goals.

So should xG models make adjustments for styles of play? It really depends on the goal of the model. If the goal is to be supremely accurate, then I would say that the xG models should look at the style of play and make adjustments. However, style is not specific to one shot; it describes an entire game. Will modelers want to overlay macro conditions on their models rather than focus solely on the unique conditions of each shot?

Perhaps the model should allow this bias to continue. After all, it could reveal that counterattacking teams have an advantage in scoring as one would expect.

If the xG models look to isolate shots based on certain characteristics, perhaps they should strive to add data to each particular moment. Perhaps an aggregate overlay on counterattacks would be counterproductive as it would take the foot off the pedal of collecting better data for each shot taken. Perhaps this serves as inspiration to keep digging, keep mining for the data that helps fix this apparent bias. Perhaps it’s the impetus to shed the sweater vest and find an old worn-in pair of boots. Something a little more hip to match the intellect.

Introducing Expected Goals 2.0 and its Byproducts by Drew Olsen

Many of the features listed below from our shot-by-shot data for 2013 and 2014 can be found above by hovering over the "Expected Goals 2.0" link. Last month, I wrote an article explaining our method for calculating Expected Goals 1.0, based only on the six shot locations. Now, we have updated our methods with the cool, new, sleek Expected Goals 2.0.

Recall that in calculating expected goals, the point is to use shot data to effectively suggest how many goals a team or player "should have scored." This gives us an idea of how typical teams and players finish, given certain types of opportunities, and then allows us to predict how they might do in the future. Using shot locations, if teams are getting a lot of shots from, say, zone 2 (the area around the penalty spot), then they should be scoring a lot of goals.

Expected Goals 2.0 for Teams

Now, in the 2.0 version, it's not only about shot location. It's also about whether or not shots are being taken with the head or the foot, and whether or not they come from corner kicks. Data from the 2013 season suggest that not only are header and corner kick shot totals predictive of themselves (stable metrics), but they also lead to lower finishing rates. Thus, teams that fare exceptionally well or poorly in these categories will now see changes in their Expected Goals metrics.
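
One simple way to picture how shot type can feed into the number is an empirical lookup: estimate a finishing rate for each combination of zone, body part, and corner-kick origin from past shots, then sum those rates over a team's shots. This is an illustrative assumption and not necessarily how Expected Goals 2.0 is actually computed; the function names, tuple layout, and counts below are invented.

```python
from collections import defaultdict

def finishing_rates(history):
    """Empirical goal rate for each (zone, headed, from_corner) shot type."""
    counts = defaultdict(lambda: [0, 0])  # shot type -> [goals, shots]
    for zone, headed, from_corner, scored in history:
        tally = counts[(zone, headed, from_corner)]
        tally[0] += int(scored)
        tally[1] += 1
    return {kind: goals / total for kind, (goals, total) in counts.items()}

def expected_goals(shots, rates):
    """Sum the historical finishing rate of every shot taken."""
    return sum(rates[shot] for shot in shots)

# Made-up history: (zone, headed, from_corner, scored).
history = (
    [(2, False, False, True)] * 1 + [(2, False, False, False)] * 3   # footed, open play
    + [(2, True, True, True)] * 1 + [(2, True, True, False)] * 9     # headed, from a corner
)
rates = finishing_rates(history)

# One footed open-play shot and two headers from corners, all from zone 2.
team_shots = [(2, False, False), (2, True, True), (2, True, True)]
print(round(expected_goals(team_shots, rates), 2))
```

In this toy history, headers from corners finish at a lower rate than footed shots from the same zone, so a team that leans on them earns fewer expected goals per shot.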

Example: In 2013, Portland took a low percentage of its total shots as headers (15.4%), as well as a low percentage of its total shots from corner kicks (12.3%). Conversely, it allowed higher percentages of those types of shots to its opponents (19.2% and 15.0%, respectively). Presumably, the Timbers' style of play encourages this behavior, and this is why the 2.0 version of Expected Goal Differential (xGD) liked the Timbers more than the 1.0 version did.

We also calculate Expected Goals 2.0 contextually--specifically during time periods of an even score (even gamestate)--for your loin-tickling pleasure.

Expected Goals 2.0 for Players

Another addition from the new data we have is that we can assess players' finishing ability while controlling for the various types of shots. Players' goal totals can be compared to their Expected Goals totals in an attempt to quantify their finishing ability. Finishing is still a controversial topic, but it's this type of data that will help us to separate out good and bad finishers, if those distinctions even exist. Even if finishing is not a repeatable skill, players with consistently high Expected Goals totals may be seen as players that get themselves into dangerous positions on the pitch--perhaps a skill in its own right.

The other primary player influencing any shot is the main guy trying to stop it, the goalkeeper. This data will someday soon be used to assess goalkeepers' saving abilities, based on the types of shot taken (location, run of play, body part), how well the shot was placed in the goal mouth, and whether the keeper gave up a dangerous rebound. Thus for keepers we will have goals allowed versus expected goals allowed.

Win Expectancy

Win Expectancy is something that exists for both Major League Baseball and the National Football League, and we are now introducing it here for Major League Soccer. When the away team takes the lead in the first 15 minutes, what does that mean for their chances of winning? These are the questions that can be answered by looking at past games in which a similar scenario unfolded. We will keep Win Expectancy charts updated based on 2013 and 2014 data.
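
As a rough sketch of how such a question can be answered from past games, the function below finds every game in which the away team held a given lead at a given minute and reports the share it went on to win. The game records are invented and the representation (a list of goal minutes per game) is an assumption; real charts would be built from the 2013 and 2014 data.

```python
def score_at(goals, minute):
    """Return (away goals, home goals) scored up to and including `minute`."""
    away = sum(1 for m, side in goals if side == "away" and m <= minute)
    home = sum(1 for m, side in goals if side == "home" and m <= minute)
    return away, home

def away_win_expectancy(past_games, minute, lead):
    """Share of matching past games that the away team went on to win.

    A game matches when the away team's lead at `minute` equals `lead`.
    Returns None when no past game matches the scenario.
    """
    matching = wins = 0
    for goals in past_games:
        away, home = score_at(goals, minute)
        if away - home != lead:
            continue
        matching += 1
        final_away, final_home = score_at(goals, float("inf"))
        wins += final_away > final_home
    return wins / matching if matching else None

# Made-up games, each a list of (minute, scoring side).
past_games = [
    [(10, "away"), (70, "home")],                 # away led early, ended level
    [(5, "away"), (60, "away")],                  # away led early, away won
    [(12, "away"), (30, "home"), (80, "home")],   # away led early, home won
    [(14, "away")],                               # away led early, away won
    [(40, "home")],                               # level at 15 minutes
]

print(away_win_expectancy(past_games, minute=15, lead=1))
```

With only a handful of games the estimate is noisy; the more past games that match a scenario, the more the share can be trusted.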