Finishing rate as a predictor of future performance / by Matthias Kullowatz

By Matthias Kullowatz (@mattyanselmo)

Finishing ability can loosely be described as the efficiency with which a player or team puts the ball in the back of the net. A simple finishing metric might be goals divided by shots, in other words a shooting percentage. The finishing metric I'll use here is goals divided by expected goals, or G/xG. Soccer nerds like me have bantered about whether or not finishing is a skill, and that is still a controversial topic, but what's not controversial is that finishing rates can improve prediction models. I'll show you.
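For anyone who wants to see the two metrics side by side, here is a minimal sketch. The function names and the team totals are made up for illustration; they are not ASA data.

```python
def shooting_percentage(goals, shots):
    """Goals divided by shots; None when there are no shots."""
    return goals / shots if shots > 0 else None


def finishing_rate(goals, expected_goals):
    """Goals divided by expected goals (G/xG); None when xG is zero."""
    return goals / expected_goals if expected_goals > 0 else None


# Hypothetical season totals: (goals, shots, expected goals).
teams = {
    "Team A": (52, 420, 45.0),
    "Team B": (38, 450, 44.0),
}

rates = {name: finishing_rate(g, xg) for name, (g, shots, xg) in teams.items()}
shot_pcts = {name: shooting_percentage(g, shots) for name, (g, shots, xg) in teams.items()}
```

A G/xG above 1 means a team scored more than its shots were expected to produce, and below 1 means it scored less.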

Jared pumped out an article a few weeks ago about how certain types of teams are capable of overperforming their expected goals. Jared suggested that some of it may be due to style of play. Counter-attacking teams, for example, are likely to get fewer shots, but those shots are probably of higher quality. Specifically, those shots are probably taken with more time, space, and physical momentum toward the goal than the average shot, three things not currently included in our expected goals model. If this is true, then finishing rate (G/xG) could be a stable metric, one that predicts future success. That wouldn't necessarily be because finishing is a skill, but rather because team style influences finishing.

In fact, finishing rate is a stable predictor. Below is a smoothed curve showing the correlation between expected goal differential (xGD) in the first X games and actual goal differential in the final 34 - X games. Correlation here is the square root of the R-squared value. The red curve includes only expected goals as a predictor, and the green curve additionally includes finishing rate.
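Here is a rough sketch of how one point on the red curve could be computed, assuming match-by-match data for each team. The data layout, the function names, and the toy four-game "seasons" are my own inventions for illustration, and this is not the code behind the graph. With a single predictor, the square root of R-squared is just the absolute value of the ordinary Pearson correlation.

```python
import math


def split_features(matches, x):
    """matches: ordered list of (goals_for, goals_against, xg_for, xg_against).

    Returns xGD over the first x games and actual GD over the remaining games.
    """
    early, late = matches[:x], matches[x:]
    xgd = sum(m[2] - m[3] for m in early)
    future_gd = sum(m[0] - m[1] for m in late)
    return xgd, future_gd


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 4-game "seasons" for three teams (a real season would have 34).
seasons = {
    "A": [(2, 0, 1.6, 0.7), (1, 1, 1.2, 1.0), (3, 1, 1.9, 0.8), (2, 1, 1.4, 1.1)],
    "B": [(0, 2, 0.8, 1.5), (1, 1, 1.0, 1.2), (0, 1, 0.9, 1.4), (1, 3, 1.1, 1.8)],
    "C": [(1, 1, 1.1, 1.1), (2, 2, 1.3, 1.2), (1, 0, 1.0, 0.9), (1, 1, 1.2, 1.2)],
}

x = 2  # number of early games used as predictors
pairs = [split_features(matches, x) for matches in seasons.values()]
xgds = [p[0] for p in pairs]
future_gds = [p[1] for p in pairs]
r = pearson_r(xgds, future_gds)
```

Repeating this for every X from 1 to 33 and smoothing the results would give a curve of the same kind as the red one.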

Even early on in the season, we see that adding finishing rate to the model helps expected goals predict future goals. Technically, this graph is showing the correlations between goal differentials, so let me break it down by goals scored and goals allowed.

The two graphs above suggest that the entire increase in predictive correlation comes from offensive finishing rate, not defensive finishing rate allowed. I think this can probably be explained by the fact that a counter-attacking team will always be a counter-attacking team, but will not always play against a counter-attacking team. I think it's similar to the reason that pitchers find it hard to control their BABIP (Batting Average on Balls In Play). They're facing a variety of opponents, and they can't control what comes back at them. (Sorry, non-baseball fans.)

Skeptics with a good feel for linear regression may point out that any time you include more explanatory variables, the R-squared cannot decrease. Yes, that's true, but on a case-by-case basis I also noticed that the p-value on the finishing rate coefficient became consistently significant by a team's 12th game. Some out-of-sample model diagnostics also showed that the average absolute error on predicted goal differential is lower when including finishing rates, which is probably even stronger support for the use of finishing rates.
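To make the skeptic's point and the out-of-sample check concrete, here is a minimal sketch that fits the model with and without finishing rate on one set of teams and scores both on held-out teams. Everything here is an assumption for illustration: the numbers are invented, the helper names are mine, and the little least-squares solver stands in for whatever regression software was actually used.

```python
def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(coefs, rows, y):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def mean_abs_error(coefs, rows, y):
    return sum(abs(yi - predict(coefs, r)) for r, yi in zip(rows, y)) / len(y)


# Hypothetical rows: (early xGD, early G/xG, future goal differential).
train = [(3.1, 1.10, 6), (-2.0, 0.95, -3), (0.5, 1.20, 4), (-1.2, 0.80, -5),
         (2.2, 0.90, 1), (-0.4, 1.05, 0), (1.5, 1.00, 2), (-3.0, 1.15, -2)]
test = [(1.0, 1.10, 3), (-1.5, 0.90, -4), (0.2, 1.00, 0)]

y_train = [t[2] for t in train]
y_test = [t[2] for t in test]

# Base model: xGD only. Full model: xGD plus finishing rate.
base = fit_ols([(t[0],) for t in train], y_train)
full = fit_ols([(t[0], t[1]) for t in train], y_train)

r2_base = r_squared(base, [(t[0],) for t in train], y_train)
r2_full = r_squared(full, [(t[0], t[1]) for t in train], y_train)

# Out-of-sample error is the fairer comparison, since it is not
# guaranteed to improve just because a variable was added.
mae_base = mean_abs_error(base, [(t[0],) for t in test], y_test)
mae_full = mean_abs_error(full, [(t[0], t[1]) for t in test], y_test)
```

The in-sample R-squared of the full model can never be lower than that of the base model, which is exactly why the held-out error matters more: it only drops if finishing rate carries real predictive information.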

As Jared and others have already suggested, some form of finishing rate should be included in predictive models, as well as in the general evaluation of a team. I'm just a little late to the party. I like to avoid awkwardly arriving first, anyway.