Looking for the model-busting formula / by Andrew Olsen

Well that title is a little contradictory, no? If there's a formula to beat the model then it should be part of the model and thus no longer a model buster. But I digress. That article about RSL last week sparked some good conversation about figuring out what makes one team's shots potentially worth more than those of another team. RSL scored 56 goals (by their own bodies) last season, but were only expected to score 44, a 12-goal discrepancy. Before getting into where that came from, here's how our Expected Goals data values each shot:

  1. Shot Location: Where the shot was taken
  2. Body part: Headed or kicked
  3. Gamestate: Expected goal differential (xGD) is calculated in total, and also specifically during even gamestates when teams are most likely playing more, shall we say, competitively.
  4. Pattern of Play: What the situation on the field was like. For instance, shots taken off corner kicks have a lower chance of going in, likely due to a packed 18-yard box. These things are considered, based on the Opta definitions for pattern of play.
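For readers who want the bookkeeping spelled out, here is a minimal sketch of how a finishing discrepancy like RSL's falls out of a shot-level model. It assumes the model assigns each shot a scoring probability and that a team's expected goals are simply the sum of those probabilities. The function names and the toy probabilities are made up for illustration; they are not our actual model or data.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot scoring probabilities into a team's expected goals.

    Assumption: the model outputs one probability per shot.
    """
    return sum(shot_probabilities)


def finishing_gap(goals_scored, xg):
    """Goals scored minus expected goals; positive means beating the model."""
    return goals_scored - xg


# Toy probabilities for four shots (hypothetical, not real shot data).
toy_shots = [0.30, 0.10, 0.05, 0.05]
print(expected_goals(toy_shots))

# Season totals quoted in the article: 56 scored versus 44 expected.
print(finishing_gap(56, 44))
```

In this picture, the four inputs above are all that determine each shot's probability.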

But these exclude some potentially important information, as Steve Fenn and Jared Young pointed out. I would say, based on their comments, that the two primary hindrances to our model are:

  1. How to differentiate between the "sub-zones" of each zone. As Steve put it, was the shot from the far corner of Zone 2, more than 18 yards from goal? Or was it from right up next to Zone 1, about 6.5 yards from goal?
  2. How clean a look the shooter got. A proportion of blocked shots could help to explain some of that, but we're still missing the time component and the goalkeeper's positioning. How much time did the shooter have to place his shot and how open was the net?

Unfortunately, I can't go get a better data set right now so hindrance number 1 will have to wait. But I can use the data set that I already have to explore some other trends that may help to identify potential sources of RSL's ability to finish. My focus here will be on their offense, using some of the ideas from the second point about getting a clean look at goal.

Since we have information about shot placement, let's look at that first. I broke down each shot on target by which sixth of the goal it targeted to assess RSL's accuracy and placement. Since the start of the 2013 season, RSL has been second in the league in getting its shots on goal (37.25%), and among those shots, RSL has placed the ball better than any other team. Below is a graphic of the league's placement rates versus those of RSL over that same time period. (The corner shots were consolidated for this analysis because it didn't matter to which corner the shot was placed.)

[Figure: Placement Distribution - RSL vs. League]

RSL obviously placed shots where the keeper was least likely to be: the corners. That's a good strategy, I hear. If I include shot placement in the model, RSL's 12-goal difference in 2013 completely evaporates. This new model expected them to score 55.87 goals in 2013, almost exactly the 56 they scored.
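The placement breakdown itself is just a tally. Here is a rough sketch of how it could be computed, under two loud assumptions: that the six sections of the goal form a grid of three columns by two rows, and that "consolidating the corners" means merging the left and right corner at each height. The section labels and the sample shots are hypothetical, not taken from the data set.

```python
from collections import Counter

# Hypothetical labels for the six goal sections, with mirrored
# left/right corners merged (an assumed reading of "consolidated").
CONSOLIDATED = {
    "high_left": "high_corner",
    "high_right": "high_corner",
    "low_left": "low_corner",
    "low_right": "low_corner",
    "high_center": "high_center",
    "low_center": "low_center",
}


def placement_rates(on_target_sections):
    """Return the share of on-target shots in each consolidated section."""
    counts = Counter(CONSOLIDATED[s] for s in on_target_sections)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {section: n / total for section, n in counts.items()}


# Four made-up shots on target.
sample = ["high_left", "low_right", "low_center", "low_left"]
rates = placement_rates(sample)
for section, share in sorted(rates.items()):
    print(section, share)
```

Running the same tally for one team and for the league as a whole gives the two distributions being compared.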

Admittedly, it isn't earth-shattering news that teams score by shooting at the corners, but I still think it's important. In baseball, we sometimes assess hitters and pitchers by their batting average on balls in play (BABIP), a success rate that counts only the instances when the ball is put in play. It's obvious that batters with higher BABIPs will also have higher overall batting averages, just like teams that shoot toward the corners will score more goals.

But just because it is obvious doesn't mean that this information is worthless. On the contrary, baseball's sabermetricians have figured out that BABIP takes a long time to stabilize, and that a player who is outperforming or underperforming his BABIP is likely to regress. Now that we know that RSL is beating the model due to its shot placement, this raises the question: do accuracy and placement stabilize at the team level?

To some degree, yes! First, there is a relationship between a team's shots on target totals from the first half of the season and the second half of the season. Between 2011 and 2013, the correlation coefficient for 56 team-seasons was 0.29. Not huge, but it does exist. Looking further, I calculated the differences between teams' expected goals in our current model and teams' expected goals in this new shot placement model. The correlation from first half to second half on that one was 0.54.
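The split-half check is a plain Pearson correlation between each team's first-half and second-half numbers. Below is a minimal sketch of that calculation, along with the usual t-statistic for judging whether a correlation differs from zero. The t-statistic assumes the 56 team-seasons are independent observations, which is my simplification. The helper names are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t-statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Correlations quoted in the article, each over 56 team-seasons.
t_on_target = t_statistic(0.29, 56)
t_placement = t_statistic(0.54, 56)
print(round(t_on_target, 2), round(t_placement, 2))
```

With 54 degrees of freedom, the two-sided 5 percent cutoff is roughly 2.0, so under that independence assumption both correlations clear it, the placement one by a wide margin.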

To summarize, getting shots on goal can be repeated to a small degree, and where those shots are placed in the goal can be repeated to a greater degree at the team level. There is some stabilization going on. This gives RSL fans hope that at least some of this model-busting is due to a skill that will stick around.

Of course, that still doesn't tell us why RSL is placing shots well as a team. Are their players more skilled? Or is it the system that creates a greater proportion of wide-open looks?

Seeking details that may indicate a better shot opportunity, I will start with assisted shots. A large proportion of assisted shots may indicate that a team will find open players in front of net more often, thus creating more time and space for shots. However, an assisted shot is no more likely to go in than an unassisted one, and RSL's 74.9-percent assist rate is only marginally better than the league's 73.1 percent, anyway. RSL actually scored about six fewer goals than expected on assisted shots, and six more goals than expected on unassisted shots. It becomes apparent that we're barking up the wrong tree here.*

Are some teams more capable of not getting their shots blocked? If so, then those teams would likely finish better than the league average. One little problem with this theory is that RSL gets its shots blocked more often than the league average. Plus, in 2013, blocked shot percentages from the first half of the season had a (statistically insignificant) negative correlation to blocked shot percentages in the second half of the season, suggesting that blocked shots are influenced more by randomness and the defense than by the offense taking the shots.

Maybe some teams get easier looks by forcing rebounds and following them up efficiently. Indeed, in 2013 RSL led the league in "rebound goals scored" with nine, where a rebounded shot is one that occurs within five seconds of the previous shot. That beat their expected goals on those particular shots by 5.6 goals. However, earning rebounds does not appear to be much of a skill, and neither does finishing them. The correlation between first-half and second-half rebound chances was a meager--and statistically insignificant--0.13, while the added value of a "rebound variable" to the expected goals model was virtually unnoticeable. RSL could be the best team at tucking away rebounds, but that's not a repeatable league-wide skill. And much of that 5.6-goal advantage is explained by the fact that RSL places the ball well, regardless of whether or not the shot came off a rebound.
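The rebound definition used here is simple enough to sketch. The snippet below flags a shot as a rebound when it comes within five seconds of the previous shot. It assumes the input is one team's shot times in a single match, in seconds and already sorted, and it treats exactly five seconds as inside the window. Both of those choices, and the function name, are mine.

```python
def flag_rebounds(shot_times, window=5.0):
    """Mark each shot True if it occurs within `window` seconds of the prior shot.

    Assumes `shot_times` is sorted and covers one team in one match.
    """
    flags = []
    previous = None
    for t in shot_times:
        is_rebound = previous is not None and (t - previous) <= window
        flags.append(is_rebound)
        previous = t
    return flags


# Made-up shot times in seconds.
times = [10, 13, 30, 35, 41]
print(flag_rebounds(times))
```

Summing the flags by team gives rebound chances, which can then be split by season half to check repeatability.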

Jared did some research for us showing that teams that get an extremely high number of shots within a game are less likely to score on each shot. It probably has something to do with going for quantity rather than quality, and possibly playing from behind and having to fire away against a packed box. While that applies within a game, it does not seem to apply over the course of a season. Between 2011 and 2013, the correlation between a team's attempts per game and finishing rate per attempt was virtually zero.

If RSL spends a lot of time in the lead and very little time playing from behind--true for many winning teams--then its chances may come more often against stretched defenses. RSL spent the fourth most minutes in 2013 with the lead, and the fifth fewest minutes playing from behind. In 2013, there was a 0.47 correlation between teams' abilities to outperform Expected Goals and the ratio of time they spent in positive versus negative gamestates.

If RSL's boost in scoring comes mostly from those times when they are in the lead, that would be bad news, since their Expected Goals data in even gamestates was not impressive then and is not impressive now. But if the difference comes more from shot placement, then the team could retain some of its goal-scoring prowess. Of the 12-goal discrepancy I'm trying to explain in 2013, 8.3 goals came during even gamestates, when perhaps their ability to place shots helped them to beat the expectations. But the other 4-ish additional goals likely came from spending increased time in positive gamestates. It is my guess that RSL won't be able to outperform their even gamestate expectation by nearly as much this season, but at this point, I wouldn't put it past them either.

We come to the unsatisfying conclusion that we still don't know exactly why RSL is beating the model. Maybe the players are more skilled, maybe the attack leaves defenses out of position, maybe the team spent more time in positive gamestates than it "should have." And maybe RSL just gets a bunch of shots from the closest edge of each zone. Better data sets will hopefully sort this out someday.

*This doesn't necessarily suggest that assisted shots have no advantage. It could be that assisted shots are more commonly taken by less-skilled finishers, and that unassisted shots are taken by the most-skilled finishers. However, even if that is true, it wouldn't explain why RSL is finishing better than expected, which is the point of this article.