Machine Learning the Crew / by Eliot McKinley

By Eliot McKinley (@etmckinley)

Machine learning is so hot right now, and if Skynet is going to destroy all humans, it should at least know a little bit about Major League Soccer’s Columbus Crew. To that end, I created a machine learning model to classify which position in a Gregg Berhalter 4-2-3-1 formation a player most likely played during a single game.

I chose the Crew for a couple of reasons. First, they are my favorite team. Second, they had consistent coaching for a long period of time with a defined style of play. The latter is very important, as the model has to be trained on consistent data for the results to make sense. Since the Crew almost always played a 4-2-3-1 that relied on ball possession to disorganize the defense and create goal opportunities (get used to that phrase, USMNT fans), it was a perfect test of whether this kind of thing could be done.

I won’t go too far into specifics, but the model is a Random Forest classifier trained on the 2015-2017 seasons and used to predict player positions for games played in 2018. Each player was labeled with his position at the start of the game (e.g. Harrison Afful = right full back, Federico Higuain = center attacking midfielder), and the actions he took during that game were associated with him as the model's inputs. These actions included pass types (based on K-means clustering, similar to this one) and the locations of defensive actions, aerial duels, and shots. The final output is the probability that a player occupied each position during the game (e.g. Gyasi Zardes had a 95% probability of striker, 3% left wing, 2% right wing).
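For readers who want to see the shape of such a pipeline, here is a minimal sketch in Python using scikit-learn. It is not the code behind this article: the data is synthetic, only three positions are used, the features are limited to pass types (the real model also used the locations of defensive actions, aerial duels, and shots), and the names `CENTERS`, `N_PASS_TYPES`, `synthetic_passes`, and `pass_type_shares` are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# ASSUMPTION: three roles with made-up "typical" pass origins on a 100x100 pitch.
CENTERS = {"RB": (35, 15), "CAM": (60, 50), "ST": (88, 50)}
N_PASS_TYPES = 6  # hypothetical number of K-means pass clusters


def synthetic_passes(position, n_passes=40):
    """Fake passes for one player-game: rows of (start_x, start_y, end_x, end_y)."""
    start = rng.normal(CENTERS[position], 8, size=(n_passes, 2))
    end = start + rng.normal((10, 0), 10, size=(n_passes, 2))
    return np.hstack([start, end])


def make_player_games(games_per_position):
    """List of (passes, starting_position) pairs, one per player-game."""
    return [(synthetic_passes(pos), pos)
            for pos in CENTERS for _ in range(games_per_position)]


train = make_player_games(60)  # stands in for the 2015-2017 seasons
test = make_player_games(10)   # stands in for 2018

# Step 1: cluster every training pass into a small set of pass types.
kmeans = KMeans(n_clusters=N_PASS_TYPES, n_init=10, random_state=0)
kmeans.fit(np.vstack([passes for passes, _ in train]))


def pass_type_shares(passes):
    """Feature vector: share of a player's passes falling in each pass type."""
    labels = kmeans.predict(passes)
    return np.bincount(labels, minlength=N_PASS_TYPES) / len(labels)


X_train = np.array([pass_type_shares(p) for p, _ in train])
y_train = [pos for _, pos in train]
X_test = np.array([pass_type_shares(p) for p, _ in test])
y_test = [pos for _, pos in test]

# Step 2: train the Random Forest on starting positions.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Step 3: per-game probabilities for each position, not just a single label.
proba = model.predict_proba(X_test)
accuracy = model.score(X_test, y_test)
print(y_test[0], dict(zip(model.classes_, proba[0].round(2))))
```

The key design point is using `predict_proba` rather than `predict`: the spread of probabilities across positions is what lets you see a player drifting from his listed role.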

Let’s look at some examples.

During Gregg Berhalter’s managerial debut for the national team against Panama, he was lauded for having an actual tactical plan for his team, something novel for the USMNT in at least a year. One of the wrinkles he introduced was playing Nick Lima as a false or inside fullback (your choice of nomenclature). Lima tucked in centrally and operated as an extra central midfielder when in possession. While this was something new for the national team, Berhalter had Harrison Afful operate like this periodically throughout the 2018 season for the Crew. In most games, Afful would play as an attacking right full back, with the model returning 80-90% probabilities that he played as a right back (RB). However, in the games when Afful played as a false/inside fullback, the RB probability decreased and the right center back (RCB) probability typically increased, indicating he was playing more centrally.
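One simple way to surface games like these is to compare each game's probability for a player's usual position against his own baseline. The sketch below is only an illustration: the per-game numbers are invented (they are not output from the model), and `flag_role_changes` and its `drop` threshold are hypothetical.

```python
from statistics import median

# Made-up per-game probabilities for one player (NOT actual model output).
games = {
    "game_1": {"RB": 0.88, "RCB": 0.05, "RW": 0.07},
    "game_2": {"RB": 0.85, "RCB": 0.08, "RW": 0.07},
    "game_3": {"RB": 0.55, "RCB": 0.35, "RW": 0.10},
    "game_4": {"RB": 0.82, "RCB": 0.10, "RW": 0.08},
    "game_5": {"RB": 0.60, "RCB": 0.30, "RW": 0.10},
}


def flag_role_changes(games, usual_position, drop=0.2):
    """Flag games where the usual-position probability falls well below baseline.

    Returns the baseline (median probability for the usual position) and a
    dict mapping each flagged game to the most likely alternative position.
    """
    baseline = median(probs[usual_position] for probs in games.values())
    flagged = {}
    for game, probs in games.items():
        if baseline - probs[usual_position] >= drop:
            others = [pos for pos in probs if pos != usual_position]
            flagged[game] = max(others, key=probs.get)
    return baseline, flagged


baseline, flagged = flag_role_changes(games, "RB")
print(baseline, flagged)
```

Using the median as the baseline keeps a handful of unusual games from dragging down what counts as "normal" for the player.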

Federico Higuain is a prototypical 10, pulling the strings for the Crew’s offense. The model typically scores Pipa highly in his central attacking midfielder position. However, due to his freedom of movement, especially in recent years, he can sometimes have increased probabilities in other positions. One game to highlight is the May 5 game in Seattle, in which Pedro Santos received a red card in the 15th minute. The Crew bunkered for the rest of the game and escaped with a hard-fought 0-0 draw. During this game, Higuain operated almost as a third defensive midfielder and had his deepest average touch position since 2015. The model picks this up by classifying Higuain as a central defensive midfielder in this match.

While this model is preliminary, it does appear to be able to classify positions within the narrow range of Gregg Berhalter’s 4-2-3-1 setup with the Columbus Crew. It is able to fairly reliably pick out what position a player was in and can spot changes in the normal tactical script. But formations are far more than the numbers listed in the box score, so a team such as the Red Bulls, who play the same 4-2-3-1 formation but in a very different style, would not be well characterized by a model trained on Columbus.

This is only a first step in building a more robust model of player positioning. The task will be much tougher using multiple teams and formations with varying styles of play. We all know that the positions marked on the team sheet are a lie, and hopefully algorithms such as the one presented here can help clear up whether a player was acting as a holding midfielder or a regista, or quantify his or her contribution to both. That day is probably a ways away, and hopefully we reach it before Skynet becomes sentient and ends civilization as we know it.