I recently created a decent set of MLS possession data while working on another project, and I was curious whether the patterns of the famous Reep analysis would hold for MLS. So I attempted to replicate his result, and perhaps offer a couple of new perspectives on the data.
I was first introduced to the legacy of Charles Reep while reading The Numbers Game (by Chris Anderson & David Sally). Reep was an early advocate for applying statistics to soccer, and was famous for tracking game events by hand over many seasons. According to his data, most goals were scored from possessions with three passes or fewer. This was taken as empirical justification for playing directly: minimizing touches by using longer passes in order to improve results.
Although Reep’s status as a pioneer in the sport is secure, many still debate his results and their interpretation. Some critiques assert that the underlying data was misinterpreted: highlighting a simple majority of goals may not be the best analysis when most possessions had three or fewer passes anyway. Others suggest the structure of the analysis confuses correlation with causation, leading to misapplication of the results. In short, one can’t tell whether the results were caused by the number of passes or whether other factors play causal roles. As I attempt to recreate the analysis, it’s worth stating that the same criticisms apply to this replication effort as well.
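To make the first critique concrete, here is a minimal Python sketch. The counts are made up purely for illustration (they are not Reep's figures or MLS data), and the two pass-count buckets are my own simplification. It shows how short possessions can account for most goals while still converting at a lower rate per possession.

```python
# Hypothetical counts, invented for illustration only (not Reep's or MLS data).
possessions = {"0-3 passes": 9000, "4+ passes": 1000}
goals = {"0-3 passes": 80, "4+ passes": 20}


def share_of_goals(goals):
    """Fraction of all goals that came from each possession type."""
    total = sum(goals.values())
    return {kind: count / total for kind, count in goals.items()}


def goals_per_possession(goals, possessions):
    """Conversion rate: goals divided by possessions of the same type."""
    return {kind: goals[kind] / possessions[kind] for kind in goals}


shares = share_of_goals(goals)
rates = goals_per_possession(goals, possessions)

for kind in goals:
    print(f"{kind}: {shares[kind]:.0%} of goals, {rates[kind]:.2%} of possessions end in a goal")
```

With these invented numbers, short possessions produce most of the goals only because there are so many more of them; per possession, the longer sequences score more often. Whether real data looks like this is exactly what a replication has to check.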
When you talk about a soccer team, you almost always talk about its style: high-pressing, possession-heavy, parking-the-bus, etc. A team’s style not only signifies how it plays on the field but also reflects its coaching. Since there aren't guidelines on how a team's style should be defined, everyone uses their own rules and we can't directly compare each other's descriptions.
An accurate quantitative description of style is needed. It can help one properly analyze not only an opponent but also one’s own team. With an accurate method to describe style, one can scientifically evaluate whether a training exercise serves its purpose efficiently. We previously used a dimension-reduction technique, t-SNE, to find MLS teams with similar styles based on the spatial distribution of activities and pass networks. This time we use a different method, k-means clustering of pass types, to quantitatively measure style, tactical specialization, and the influence of coaching on a team’s system.
We updated our xGoals model a few weeks ago, as well as our process for continuously updating it throughout the season. Naturally, we’ve done the same for the xPassing model, which estimates the probability of any given pass being completed based on a number of details about the pass. You can read more about the original model here, but here’s the summary of the new model:
In Game of Throw-Ins, I characterized and introduced an expected throw-in possession retention model (xRetain) for MLS. Go read the whole thing, but it showed that throw-ins are more likely to be completed and possession retained when they are thrown backwards, quickly, and outside a team’s defensive third. But what are MLS teams and players doing with their throw-ins?
To help differentiate teams’ throw-in styles, I turned to hierarchical clustering (see the graph below). I won’t get into mathematical details, but you can think of it sort of like an evolutionary tree. However, instead of the branches separating species, they are separating different throw-in angle frequencies. Kind of like how humans and chimpanzees are near each other on the branches of an evolutionary tree but far away from birds, teams which always throw the ball backwards and short will be far away from those that always take throw-ins forward and long.
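For readers who want a feel for the tree-building idea, here is a minimal sketch in plain Python. Everything specific in it is an assumption for illustration: the four team names and their throw-in frequencies are hypothetical, the three direction bins are a simplification, and Euclidean distance with average linkage is just one common choice, not necessarily what was used for the graph.

```python
from itertools import combinations
from math import dist

# Hypothetical share of throw-ins per direction bin: (backward, sideways, forward).
profiles = {
    "Team A": (0.60, 0.30, 0.10),
    "Team B": (0.55, 0.30, 0.15),
    "Team C": (0.10, 0.30, 0.60),
    "Team D": (0.20, 0.25, 0.55),
}


def average_linkage(c1, c2):
    """Mean Euclidean distance over all cross-cluster pairs of teams."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def agglomerate(names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2))
    return merges


merges = agglomerate(sorted(profiles))
for left, right in merges:
    print(" / ".join(left), "joins", " / ".join(right))
```

The merge order is the tree: the two backward-throwing teams join first, the two forward-throwing teams join next, and the two branches only meet at the final step.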
Much has been written and studied about set pieces in soccer. Penalty kicks have been Bayesed multiple times, I’ve analyzed free kicks in MLS and at the World Cup, and corner kicks have been rigorously studied. But what about the humble throw-in? Aside from when teams develop a long throw-in program (see Delap, Rory), throw-ins are largely ignored or even ridiculed, as in the case of Liverpool hiring a throw-in coach (see the first comment here).
We all know that some teams play a certain style: the Red Bulls play with high pressure and direct attacks, Vancouver crosses the ball, and Columbus possesses the ball from the back. Although we know these things intuitively, we can use analytical methods to group teams as well. Doing so seems unnecessary when we have all these descriptors like press-resistance, overload, trequartista-shadow striker hybrid, gegenthrowins, mobile regista, releasing, Colorado Countercounter gambits, etc. (we actually don’t know what some of these terms mean and may have made some up, but the real ones are popular so just google them yourself). Those terms are nice, but no qualitative descriptor can tell us how the styles of New York City and Columbus differ from each other. We need to measure, compare, and model two teams’ playing styles and efficiencies. If we are able to do these things, we may be in a position to answer what style really is.
Directional Passes Over Expected: Where do players exceed passing expectations?
During the National League Wildcard playoff game, American Soccer Analysis contributor and Lamar Hunt US Open Cup champion Sean Steffen tweeted about the baseball stat Directional Outs Above Average. This metric tells you about the defensive range of an outfielder, with positive values indicating a direction where the player is better than average at creating an out and negative values a direction where the player is below average. Obviously, this exact type of metric cannot be used in soccer, but it did inspire me to figure out how something like it could be used. Thus, Directional Passes Over Expected (DPOE) was born.
Short passes dominate every soccer game. They are the most abundant on-the-ball action. But the variation in short-pass accuracy is small; the difference in short-pass success rates between the best and the worst team in MLS is 13 percentage points. For a typical game with about 400 short passes, the difference represents 52 more successful attempts, or roughly one extra pass every two minutes. How much impact can these extra passes have?
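The arithmetic behind those figures is quick to check. In this short Python sketch, the 400 passes and the 13-point gap come from the paragraph above; the 90-minute match length is my assumption (regulation time, ignoring stoppage).

```python
short_passes_per_game = 400  # typical game, from the text
accuracy_gap = 0.13          # best vs. worst team, from the text
match_minutes = 90           # assumption: regulation time only

# Extra completed short passes for the best team relative to the worst.
extra_completions = short_passes_per_game * accuracy_gap

# Average time between those extra completions.
minutes_per_extra_pass = match_minutes / extra_completions

print(f"extra completions per game: {extra_completions:.0f}")
print(f"minutes per extra completion: {minutes_per_extra_pass:.1f}")
```

That works out to about 52 extra completions, or one every 1.7 minutes or so, which rounds to the "every two minutes" figure quoted above.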
Atlanta United is especially dependent on short passes that lead to shots. What would a few more short passes mean for their offense? Yankee Stadium is a tough place for any visiting team. Critics say that it is too small, and only New York City FC play well there. How exactly do they take advantage of the home turf?
The best way to approach these questions, other than watching thousands of clips, is to build a model from data and use it to examine, or even predict, where a team excels or struggles. There isn’t one... yet. Can we make one?