By Kevin Minkus (@kevinminkus)
Note: If you're not interested in the math, skip down to “With that in mind”. Alternatively, if you're especially interested in math, check out my GitHub repo with the data and a Jupyter notebook.
What if we wanted to rank MLS penalty kick takers? What would be the best way to go about it?
We could look at historical PKs, and take the players who have converted the highest percentage of their chances. Here are a handful of players who have scored 100% of their regular season penalties, going back to 2011:
Octavio Rivero, Victor Bernardez, Danny Cruz, Servando Carrasco, Shaun Maloney, Gonzalo Pineda, Brek Shea, Hendry Thomas
Raise your hand if you want Victor Bernardez taking your team's PKs.
We’d do better to incorporate the number of attempts each player has taken, and use that as a tiebreaker for players with the same percentages. Here are the ten best using that method:
The top looks a bit better, but what about the players at the bottom?
Do we really believe Michael Bradley and Thierry Henry are worse than Ryan Finley, a man who somehow managed to be entrusted with two penalties (of which he made one) while playing just 800 total MLS minutes? And what about Steven Gerrard, whose only attempt came against Nick Rimando and his fantastic 47% success rate?
I’m going to use Bayesian inference, and, more specifically here, Item Response Theory, to get at each player's true propensity to score penalties in a bit more nuanced and comprehensive way than looking only at historic totals. To explain that informally, with Bayesian inference we begin an analysis with some hypothesis about each player's true penalty scoring ability. Then we update that hypothesis as we see additional data. There is some good precedent for using this sort of approach to measure scoring ability. See here and here for two excellent articles. I'm going to apply this method specifically to shots from the penalty spot, but, with a few tweaks and considerations, it can be applied more widely. Stay tuned.
Using an Item Response Theory framework, I'll model the probability, p_k, of penalty k going in as a function of shooter i's latent penalty taking ability, θ_i, and goalkeeper j's latent penalty stopping ability, b_j. (See here for a fantastic piece that uses IRT to model NBA late game fouls, from which I borrow heavily.)
Here the higher the penalty taker's ability, θ_i, the more likely the shot will go in. The higher the keeper's ability, b_j, the less likely the shot will go in.
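For readers who want to see that relationship concretely, here is a minimal sketch. It assumes the standard Rasch form from Item Response Theory, where the probability is the logistic function of shooter ability minus keeper ability. That form matches the description above, but the exact equation (including any intercept for the league-wide conversion rate) is an assumption, and the function name `score_probability` is made up for illustration.

```python
import math


def score_probability(theta: float, b: float) -> float:
    """Probability a penalty goes in under an assumed Rasch-style model.

    theta: shooter's latent penalty taking ability
    b:     keeper's latent penalty stopping ability
    """
    # Logistic function of (shooter ability - keeper ability).
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Equal abilities give a coin flip in this intercept-free sketch.
even = score_probability(0.0, 0.0)
# A better shooter raises the probability; a better keeper lowers it.
strong_shooter = score_probability(1.0, 0.0)
strong_keeper = score_probability(0.0, 1.0)
```

Because only the difference between the two abilities matters, a one-unit edge for the shooter helps exactly as much as a one-unit edge for the keeper hurts.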
Including an effect for the goalkeeper, while making the model more true to reality, has the added benefit of simultaneously giving us an estimate of goalie ability. I should note though that I'm crediting off-target shots to the keeper as a save. This might not be wholly accurate, but I think if you squint hard enough the case could be made that a goalie's positioning, or size, or “intimidation” can factor into off-target penalties.
We'll define our prior beliefs on θ and b as below:
Using MCMC sampling, we can incorporate the data we have on each shooter and keeper to update those prior beliefs into what is more or less our best estimate of the latent ability of each shooter and each keeper.
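To give a feel for what the sampler is doing, here is a stripped-down sketch for a single shooter. It is not the model from the notebook: it assumes a Normal(0, 1) prior on θ, fixes the keeper's ability at zero, and uses a plain random-walk Metropolis sampler written from scratch. The goal and attempt counts are hypothetical, and every function name is invented for this example.

```python
import math
import random


def log_posterior(theta, goals, attempts, prior_sd=1.0):
    # Assumed Normal(0, prior_sd) prior on theta, up to a constant.
    log_prior = -0.5 * (theta / prior_sd) ** 2
    # Bernoulli likelihood with p = logistic(theta); keeper ability fixed at 0.
    log_p = -math.log1p(math.exp(-theta))  # log(p)
    log_q = -math.log1p(math.exp(theta))   # log(1 - p)
    return log_prior + goals * log_p + (attempts - goals) * log_q


def metropolis(goals, attempts, n_samples=20000, step=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, goals, attempts)
    samples = []
    for _ in range(n_samples):
        # Propose a small random step away from the current value.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, goals, attempts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    # Drop the first quarter of the chain as burn-in.
    return samples[n_samples // 4:]


def credible_interval(samples, mass=0.90):
    # Equal-tailed interval from the sorted posterior draws.
    ordered = sorted(samples)
    lo = ordered[int((1 - mass) / 2 * len(ordered))]
    hi = ordered[int((1 + mass) / 2 * len(ordered)) - 1]
    return lo, hi


# Hypothetical shooters: one with 1 goal from 2 attempts, one with 40 from 50.
few = credible_interval(metropolis(goals=1, attempts=2))
many = credible_interval(metropolis(goals=40, attempts=50))
```

With only two attempts the interval stays about as wide as the prior, while fifty attempts narrow it considerably. That is the small-sample problem in miniature.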
From that method, here are the 90% credible intervals for the 10 penalty takers with the highest average θ's, and the 10 with the lowest:
So Benny Feilhaber is pretty good. Giles Barnes, maybe not so much. The main thing to note, though, is that even among what are, as far as we can tell, the ten best and ten worst penalty takers, there's a ton of overlap among our estimates of their abilities. Based on the data we have available, there's just not much discernible difference between the best and the worst.
It's tangential to the point I want to make, but here are the ten best and ten worst goalies at saving penalty kicks:
There's a bit more spread among keepers, where we have more data per keeper. Rimando is clearly very good. Donovan Ricketts is more likely bad. Jake Gleeson isn't great either. The Timbers sure know how to pick 'em. Given his recent playoff heroics it's affirming to see Zach Steffen pop up highly without that playoff data included.
To conclude, and to summarize my main point again: because of small sample sizes, it's very difficult to distinguish between good and poor penalty shooters using only the data that’s available from games. This is quite a lot of math to make a fairly intuitive point, but I already had half an article written with the math, and I’m going to come back to the method at a later date.
With that in mind:
After Atlanta United lost to the Columbus Crew on penalties last Thursday, there was a good amount of discussion around Tata Martino's admission that he did not have his team practice penalties in the game's lead-up. Martino's critics mostly argued that practice makes perfect, and that improving a team's ability there would pay huge dividends in that situation. Those on the other side maintained that penalties are mostly a crapshoot, especially given the pressure and exhaustion of the situation.
Both sides have some merit. It is certainly difficult to simulate that amount of pressure in practice, but committing penalties to muscle memory can go some way towards making that unnecessary.
Given the discussion I began this piece with, I think there's an even greater benefit to practicing penalties, though, and that's to figure out who is good at taking them. As I showed, small sample sizes make it very hard to pick out the most proficient shooters. To detect the difference between a true 80% penalty taker and a true 70% penalty taker with 80% probability, we would need to observe 231 shots from each shooter. That's a pretty substantial, maybe unrealistic, difference in ability, but a team is still never going to be able to measure it if they're only using data from actual games. If a team instead spends practice getting those reps - they'd need about 3 a day over three months in the example just given - the team can improve its knowledge of its players.
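For anyone who wants to check the 231 figure, here is one way to arrive at it. The test behind the number isn't spelled out above, so this sketch assumes a one-sided two-proportion z-test at the 5% significance level with 80% power, which happens to reproduce it. The function name `shots_needed` and the 90-day reading of "three months" are my own choices for illustration.

```python
import math
from statistics import NormalDist


def shots_needed(p1, p2, power=0.80, alpha=0.05):
    """Shots per shooter needed to tell conversion rates p1 and p2 apart."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided test (assumed)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Standard deviation terms under the null (pooled) and the alternative.
    pooled = math.sqrt(2 * p_bar * (1 - p_bar))
    unpooled = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = ((z_alpha * pooled + z_power * unpooled) / (p1 - p2)) ** 2
    return math.ceil(n)


n = shots_needed(0.80, 0.70)  # 231 shots from each shooter
per_day = n / 90              # roughly 2.6 a day over three months
```

A smaller gap between the two shooters pushes the requirement up quickly, since the number of shots grows with the inverse square of the difference.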
Former Philadelphia 76ers VP of Basketball Strategy Ben Falk recently talked about doing this exact thing in basketball, with three-pointers. In Philadelphia under Sam Hinkie, the coaching staff would track each player's three-point shooting percentage during practice. If a player wanted to be allowed to shoot threes in a game, he would have to hit them at a certain rate in practice.
Using practice data to assess a player's ability can, of course, be applied to any measurable skill where in-game sample sizes may be too small - penalty kicks, free kicks, finishing, headers...
It's occasionally argued that players can't really improve their ability to shoot penalties through practice. It's something they either have, or they don't. They can either withstand the pressure, or they can't. I'm wholly unqualified to evaluate that point. But if a team can instead focus on picking the right penalty takers, it can improve its odds of winning a shootout by, say, 5%, or its odds of scoring a single penalty by 2%, and that's functionally the same thing as improving that skill individually.
There are some aspects of an in-game penalty scenario that practice can't perfectly replicate - pressure and exhaustion being the two biggest. But coaches generally have to be able to simulate these anyway - basketball coaches at every level run end-of-game drills, after all - so I do think there are ways around that.
An interesting further wrinkle is that penalties taken in practice are a repeated game. The same shooter will face the same keeper many times. This isn't the case in a live game. However, given the proliferation of data at the professional level, I would argue that we shouldn't think of an in-game PK as a one-off scenario either. Ideally opposing keepers and shooters have watched enough film or reviewed enough notes to have a good idea of each other's tendencies.
I should note lastly that it's not clear whether training time spent on penalties gives a club a greater return on investment than time spent practicing something else - set pieces, for example. Penalties certainly occur with low frequency, but when they do occur, it can be in a very high leverage situation, like, say, the knockout round of the MLS playoffs.
So there are some difficulties that mean using practice time to get more data on a team's penalty takers isn't necessarily perfect. But I think there are conditions under which it generally makes sense.