DC United: Model Breakers or Just Lucky?

By Kevin Minkus (@kevinminkus)

D.C. United's defense, according to advanced metrics, is not very good. As of this weekend, their Expected Goals Against (xGA) sits at 15.7, their Total Shot Ratio (TSR) is .415, and they've allowed 156 shots in 11 games. According to these stats, they should find themselves near the bottom of the Eastern Conference standings. But, miraculously, they're not. Instead, they're at the top of the standings, with 21 points through 11 games. They've allowed only nine goals, tied for fewest in the league.

Some consider this to mostly be a product of luck; after all, they've outperformed their xGA by nearly seven goals, and their PDO is at 1058. Given the quality of shots they've faced this season, the probability of them allowing nine goals or fewer is just three percent. These numbers suggest they are due for a regression sometime soon. But there's a problem. The numbers said the same thing last season, and that regression never happened. In 2014, D.C. United finished first in the East with 59 points, third most in the league. They had just a two percent chance of outperforming expected goals allowed by the margin that they did (about 12 goals). A question on many MLS analysts' minds is “How?”
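A probability like "nine goals or fewer" can be computed from shot-level xG. The article doesn't spell out its exact method. One common approach, assumed here, is to treat each shot as an independent chance to score with probability equal to its xG. Under that assumption, goals allowed follow a Poisson binomial distribution, which can be built up exactly one shot at a time. The function name `prob_at_most` and the example shot list are hypothetical.

```python
def prob_at_most(shot_xg, k):
    """Return P(goals <= k), treating each shot as an independent chance to score
    with probability equal to its xG (a Poisson binomial model; an assumption)."""
    dist = [1.0]  # dist[g] = P(exactly g goals) over the shots processed so far
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)  # this shot does not go in
            new[goals + 1] += prob * p    # this shot goes in
        dist = new
    return sum(dist[: k + 1])


# Hypothetical input: 156 shots, each given the same xG so the total is 15.7.
shots_faced = [15.7 / 156] * 156
print(f"P(9 or fewer goals allowed) = {prob_at_most(shots_faced, 9):.3f}")
```

Giving every shot the same xG is a simplification, because real shots vary widely in quality. This printed value therefore need not match the three percent figure above, which would come from United's actual shot-by-shot values.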

At least part of this over-performance can probably be attributed to luck (and maybe to unconscious biases in the data); you can never really rule it out entirely. But a much larger part of it has a simple, straightforward explanation: they actually do defend well, even though it doesn't show up in the stats the way we might expect it to.

The table below shows each team's expected goals against minus goals against (xGA-GA) for 2014, along with the probability, given the shots it faced, of that team allowing as few goals as it did or fewer. Teams with a highly positive xGA-GA are “lucky”, and teams with a highly negative xGA-GA are “unlucky”. The right-most column is the percentage of shots against that were off target or blocked.

Expected goals against minus goals against, 2014
(Chance: probability of allowing that many goals or fewer, given the shots faced. Off Target or Blocked: percentage of shots against that missed the target or were blocked.)

Team    xGA-GA   Chance   Off Target or Blocked
RSL      15.12   0.005    64.16%
DCU      12.62   0.022    63.97%
SJ        9.08   0.085    62.64%
FCD       4.67   0.28     66.06%
POR       4.44   0.259    61.54%
CLB       3.90   0.248    62.68%
NYRB      1.36   0.425    62.42%
HOU       1.23   0.464    62.50%
SKC       1.04   0.459    60.17%
NE        0.43   0.521    62.62%
VAN       0.13   0.525    66.29%
PHI      -0.21   0.535    61.48%
CHI      -0.84   0.592    61.80%
LA       -3.08   0.764    61.25%
SEA      -4.47   0.821    63.86%
TOR      -5.82   0.873    63.89%
MTL      -7.70   0.917    58.08%
CHV     -14.47   0.993    60.64%
COL     -14.92   0.993    55.82%

The two teams that over-performed xGA the most, D.C. United and Real Salt Lake, also were in the top four for highest percentage of off target or blocked shots against. The correlation between these two variables is pretty high for 2014. For the years 2011 to 2014, it isn't quite as high, but is still significant. That is, the teams that over-perform expectations the most generally force higher than expected numbers of off target and blocked shots. This, of course, makes sense; shots that miss or get blocked can't become goals. United is once again forcing a lot of misses this season, as their percentage of shots that are off target or blocked stands at about 62 percent.
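The 2014 relationship can be checked directly from the 2014 table. The sketch below transcribes its xGA-GA and off-target-or-blocked columns and computes a Pearson correlation coefficient. It only gauges the direction and rough size of the relationship for that one season. It does not reproduce the significance test or the 2011 to 2014 pooled data, neither of which appears in the article. The names `TABLE_2014` and `pearson` are hypothetical.

```python
import math

# 2014 values transcribed from the table: (team, xGA - GA, % of shots against off target or blocked)
TABLE_2014 = [
    ("RSL", 15.12, 64.16), ("DCU", 12.62, 63.97), ("SJ", 9.08, 62.64),
    ("FCD", 4.67, 66.06), ("POR", 4.44, 61.54), ("CLB", 3.90, 62.68),
    ("NYRB", 1.36, 62.42), ("HOU", 1.23, 62.50), ("SKC", 1.04, 60.17),
    ("NE", 0.43, 62.62), ("VAN", 0.13, 66.29), ("PHI", -0.21, 61.48),
    ("CHI", -0.84, 61.80), ("LA", -3.08, 61.25), ("SEA", -4.47, 63.86),
    ("TOR", -5.82, 63.89), ("MTL", -7.70, 58.08), ("CHV", -14.47, 60.64),
    ("COL", -14.92, 55.82),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


over_performance = [row[1] for row in TABLE_2014]
miss_or_block_pct = [row[2] for row in TABLE_2014]
r = pearson(over_performance, miss_or_block_pct)
print(f"2014 correlation: r = {r:.2f}")
```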

This, too, could merely be the product of luck. Neither off-target percentage, block percentage, nor the two combined is, in general, particularly repeatable from season to season. However, off-target-plus-block percentage is generally a good indicator of save percentage, at least for the teams with the highest percentages (for whom it is less likely to be luck).

From these numbers, and from watching them play, the logical conclusion is that D.C. United pressures shooters on the ball, and gets defenders behind the ball, to the extent that they force opposing teams to shoot poorly, resulting in missed shots, blocked shots, and shots that are easy saves for the goalkeeper. Put another way, their defensive pressure causes teams to shoot at percentages below what expected goals models would predict. That D.C. United's defense leads to a high save percentage also means that their PDO will be a poor measure of how “lucky” they are.
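The point about PDO follows from its definition. This sketch assumes the common convention: shooting percentage and save percentage are both computed on shots on target, and their sum is multiplied by 1,000. The article doesn't state which convention it uses. Under this convention, save percentage feeds straight into PDO. A defense that reliably produces easy saves will therefore carry a high PDO even with no luck involved. The function name and the season totals are hypothetical.

```python
def pdo(goals_for, sot_for, goals_against, sot_against):
    """PDO = 1000 * (shooting % + save %), both on shots on target (assumed convention)."""
    shooting_pct = goals_for / sot_for          # share of own on-target shots that score
    save_pct = 1 - goals_against / sot_against  # share of opponents' on-target shots saved
    return 1000 * (shooting_pct + save_pct)


# Hypothetical totals: ordinary finishing, but a defense whose pressure yields easy saves.
print(round(pdo(goals_for=15, sot_for=50, goals_against=9, sot_against=45)))  # 1100
```

League-wide, nearly every on-target shot is either a goal or a save, so under this convention league-average PDO sits at roughly 1000. A team above that mark because of a genuinely elevated save percentage is not necessarily "lucky".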

As further evidence that this is not just luck, teams that consistently force a lot of blocked and missed shots also tend to be teams that allow a high proportion of shots from crosses. D.C. United and Real Salt Lake, 2014's two most over-performing teams, were also the two teams that faced the highest percentage of shots off crosses. This suggests those teams are making the conscious decision to pack the box and pressure shots, at the cost of allowing space out wide.

Most defenses are considered “good” because of how few shots they give up. A low number of shots typically means a low number of expected goals, which in turn means a low number of goals. Other defenses are considered “good” because they only give up low-percentage shots. They may give up a lot of shots, but, because those shots aren't likely to go in, they have low expected goals totals, and they don't allow many actual goals.

D.C. United's defense, as I've tried to show, takes a third way. They give up shots from good positions, but because of their abnormally high defensive pressure, those shots are more likely to miss the target or be blocked or saved, so expected goals models overstate their chances of going in. As a result, most metrics will systematically misjudge the team's defense. This was the case last season, and it is once again the case at the start of this season. Presumably, expected goals models that incorporate defensive positioning would describe the team's defensive performance more accurately, but until that data is made publicly available, we won't know for sure.

When to park the bus in MLS

By Kevin Minkus (@kevinminkus)

Should teams park the bus? When?

Goals change games. Garry Gelade recently wrote two excellent pieces on this phenomenon. One of his key findings is that teams that are down a goal increase their shooting rates to try to make up the deficit, while teams that are ahead take fewer shots. The thinking goes that teams that are ahead can afford to ease off the attack in order to better maintain defensive shape, and thus give up fewer high-quality chances to their opponents. In other words, they park the bus. Whether this is a sound strategy remains an open question, and, if it is, how early is too early to start?

As an example, here is what the 2014 Crew looked like in terms of shots when behind, tied, and ahead (hat tip to Garry, once again, for the excellent way to visualize this):

Let me know on Twitter if you'd like to see a different team's graph for any season from 2011 to 2015 - @kevinminkus.

As you can see, Columbus shot less frequently when in the lead. This is a pretty typical trend.

Using logistic regression, we can evaluate the effect of shots and shot quality on a leading team's chances of conceding the next goal. The model I've built, like Garry's, breaks down a game into a sequence of game states. The game begins at 0-0, and each time a goal is scored, a new game state segment begins. My model takes as inputs the number of shots the leading team takes, and the average quality of those shots (using the site's expected goals model) during a segment. It then outputs the probability of that team conceding the next goal.
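The segmentation described above could be implemented in several ways. This sketch makes assumptions the article doesn't spell out:

- Each game is a list of shots tagged with team, minute, xG value, and whether the shot was a goal.
- A goal ends the current segment and counts as a shot within it.
- Tied segments are skipped, since they have no leading team.
- A segment that runs to full time counts as the leader neither conceding nor scoring next.

The `Shot` and `Segment` classes and all their fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    minute: float
    team: str        # "home" or "away"
    xg: float        # expected-goals value of the shot
    is_goal: bool


@dataclass
class Segment:
    start_minute: float
    leader: str            # team ahead during this game state
    lead: int              # goal margin
    leader_shots: int
    leader_avg_xg: float
    leader_is_home: bool
    leader_conceded_next: bool
    leader_scored_next: bool


def split_into_segments(shots: List[Shot]) -> List[Segment]:
    """Split one game into game-state segments; a new segment starts after every goal."""
    segments: List[Segment] = []
    score = {"home": 0, "away": 0}
    start = 0.0
    current: List[Shot] = []

    def close(next_scorer: Optional[str]) -> None:
        diff = score["home"] - score["away"]
        if diff == 0:
            return  # tied game state: no leading team to model
        leader = "home" if diff > 0 else "away"
        own = [s for s in current if s.team == leader]
        segments.append(Segment(
            start_minute=start,
            leader=leader,
            lead=abs(diff),
            leader_shots=len(own),
            leader_avg_xg=sum(s.xg for s in own) / len(own) if own else 0.0,
            leader_is_home=leader == "home",
            leader_conceded_next=next_scorer is not None and next_scorer != leader,
            leader_scored_next=next_scorer == leader,
        ))

    for shot in sorted(shots, key=lambda s: s.minute):
        current.append(shot)
        if shot.is_goal:
            close(shot.team)          # the goal ends the current game state
            score[shot.team] += 1
            start, current = shot.minute, []
    close(None)  # last segment runs to full time without another goal
    return segments
```

Recording each segment from the leading team's point of view gives the venue variable (`leader_is_home`) for free. Keeping both `leader_conceded_next` and `leader_scored_next` lets the same segments serve either outcome.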

In general, teams that shoot more are less likely to concede the next goal in a game. Teams that take better shots are also less likely to concede the next goal. If we include only situations where a team is up by one goal, the same results hold. However, if we look only at time frames toward the end of games in which a team is up by one goal (situations where parking the bus would be appropriate), things change.

To examine the problem this way, I've built separate models using data filtered by when each segment begins, since I'm hoping to answer the question of when a team should start to go into a defensive shell. The start time of a segment is, I think, a good though imperfect proxy for this. For example, to see whether parking the bus is a good tactic when up a goal after 70 minutes, the model is built using data from game segments that begin on or after the 70-minute mark. As a point of interest, I've also included whether the leading team is home or away as a variable in the model.
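A sketch of the filtering and model fitting follows. It assumes segments are stored as dicts with hypothetical keys (`start_minute`, `lead`, `shots`, `avg_xg`, `is_home`, `conceded_next`). To stay dependency-free, it fits the logistic regression with plain Newton-Raphson and reports two-sided Wald p-values. In practice a statistics package would normally do this. The article doesn't say which tool or significance threshold it used.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gradient_and_information(X, y, beta):
    """Score vector and Fisher information matrix of the logistic log-likelihood."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
        w = p * (1.0 - p)
        for a in range(k):
            grad[a] += (yi - p) * xi[a]
            for b in range(k):
                info[a][b] += w * xi[a] * xi[b]
    return grad, info


def fit_logistic(X, y, iterations=25):
    """Newton-Raphson logistic regression; returns coefficients and two-sided Wald p-values."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad, info = gradient_and_information(X, y, beta)
        cov = invert(info)
        beta = [bj + sum(c * g for c, g in zip(cov[j], grad)) for j, bj in enumerate(beta)]
    _, info = gradient_and_information(X, y, beta)  # information at the final estimate
    cov = invert(info)
    p_values = [math.erfc(abs(b) / math.sqrt(2 * cov[j][j])) for j, b in enumerate(beta)]
    return beta, p_values


def build_dataset(segments, min_start_minute, lead=1):
    """Keep segments with the given lead that start at or after min_start_minute."""
    rows = [s for s in segments if s["lead"] == lead and s["start_minute"] >= min_start_minute]
    # Columns: intercept, leader's shots, leader's average xG, leader at home (1/0)
    X = [[1.0, s["shots"], s["avg_xg"], 1.0 if s["is_home"] else 0.0] for s in rows]
    y = [1 if s["conceded_next"] else 0 for s in rows]
    return X, y
```

Running `build_dataset` and `fit_logistic` once per cutoff minute gives a minute-by-minute view of which variables matter. A variable counts as significant when its p-value falls below a chosen threshold such as 0.05. Swapping `conceded_next` for a scored-next label gives the insurance-goal version of the model. With small late-game subsets, Newton-Raphson can fail to converge if the data are perfectly separated, so a real analysis would need to check for that.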

The chart below shows the minute mark I've filtered by, and whether each of the three variables for the leading team (shots, shot quality, and venue) has a statistically significant effect on whether that team concedes a goal.

Essentially, this shows that after the 63rd minute, taking more shots no longer decreases a leading team's chances of giving up a goal. If a team is looking to see out the scoreline, this would be the time to make a tactical change and withdraw into a defensive shell. It still makes sense, however, to take high-quality chances as they come, at least until about the 69th minute.

It's also interesting to note that in close games in the second half, being home or away doesn't really help prevent conceding. This appears to be evidence against the idea that teams protecting a one-goal lead late play differently at home than they do away.

If, instead of holding on to the scoreline, a team's goal is to put the game away by scoring an insurance goal, that can be modeled, too. For the chart below I've built logistic regression models for each minute mark, using the same variables. The output now, though, is the probability the leading team scores the next goal.

The models suggest taking more shots increases a leading team's odds of being next to score until the 71st minute, while taking high quality shots increases a leading team's odds of being next to score until the 77th minute. So, if a team wants one more goal, taking more shots will help until about 19 minutes remaining, and taking high quality shots will help until about 13 minutes remaining.

There's definitely more work to be done in this area. One next step would be to directly evaluate the trade-off between seeing out the score and trying to put the game away by scoring once more.

This analysis also certainly isn't definitive. I've approached the problem in one particular way, and I'm sure there are flaws in this approach; I'd love to hear about them, and to see other ways to tackle it.