By Benjamin Bellman (@beninquiring)
I’ve been wondering for some time about soccer teams’ reliance on star power and top statistical producers. Is it really a good strategy? Are teams with one main goal scorer or playmaker easier to “figure out”? When the game is on the line, is a singular threat easier to neutralize than a team with a plethora of attacking options? And would this kind of reliance actually hamper a team’s success across a season?
My skepticism must seem foolish to European executives, given the huge fees Gonzalo Higuain and Paul Pogba went for this summer. But the conventional wisdom is different in the American sports landscape. In our most popular sports, one person simply can’t do it all. Here, Defense Wins Championships. The San Antonio Spurs, the best NBA team of the past two decades, emphasize team play over everything. Peyton Manning was completely underwhelming in both of his Super Bowl wins, needing his incredible teams to carry him to glory. One star pitcher or one star hitter is simply not capable of winning a World Series on their own. The anecdotal evidence even appears in MLS. Chris Wondolowski’s 27 goals in 2012 didn’t get the Earthquakes past the first round of MLS playoffs; MVP Sebastian Giovinco and 2015’s Toronto FC didn’t have much else to offer.
More and more, the conventional European wisdom is coming under fire. Money-ballers, regardless of continent, will tell you that 100 million quid is better spent on several great players instead of one Pogba. This piece takes a similar stance; if one player can make all the difference for a team, I want rigorous empirical evidence using on-the-field production and results.
This analysis uses two different data sets. To track patterns in Europe, I’ve collected information for all teams in six leagues for the 2011/12 through 2015/16 seasons: English and Scottish Premier Leagues, Ligue 1, Serie A, La Liga, and Bundesliga. This data set has each team’s point total at the end of each season, and the total goals and assists each player contributed in that season. I also use ASA’s shot database for the 2011 through 2015 MLS seasons, tracking scored goals as well as expected goals. Results from these data are not comparable to other leagues, but help us understand patterns in the quality of actual events rather than opaque (and possibly fluky) goal totals.
To track how statistical production is distributed across a team’s players, I’m pulling from my demography background and using Theil’s Entropy Index of diversity (or “E”). This index comes from information theory, and is often used by segregation scholars to understand the diversity of an area’s population across ethnic or economic groups. In my analysis, the population is a team’s goals in a season, and the groups are the team’s goal scorers (or assisters). E is smaller for teams where few players account for a large proportion of a team’s goals, and increases as those goals are spread more evenly across players. For example, take this year’s Colorado Rapids and Portland Timbers (as of August 24). Fanendo Adi (12 goals) and Diego Valeri (10 goals) account for about 60% of Timbers goals this season, while the top two Rapids scorers, Shkelzen Gashi (4 goals) and Kevin Doyle (4 goals), make up 30% of their team’s total. According to my formula, the Timbers’ goal E is 1.93, while the Rapids’ goal E is 2.36. Note that these teams both have 12 goal scorers; E also increases with the number of players that have scored at least one goal, reflecting that increased diversity, making it a very useful metric for describing lots of different goal distributions. Message me (@BenInquiring) if you’d like some more information (and lots of nerdiness) about these calculations. I use loess curves to visualize the relationships between the various E scores and teams’ season points totals, and validated my interpretations with linear regression models, which I’ll discuss, but won’t present.
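For readers who want to see the mechanics, here is a minimal sketch of an entropy score of this kind. It assumes E is the standard entropy of each player’s share of the team total, using the natural log (an assumption on my part, though it fits the scores quoted above, since 12 equal scorers would give ln 12, about 2.48). The function name and both goal lists are hypothetical; the tallies are invented so that the top two scorers hold roughly 60% and 30% of the goals, and they are not the real Timbers or Rapids distributions.

```python
import math

def theil_entropy(tallies):
    """Entropy (natural log) of how a team's total is split across players.

    `tallies` holds one non-negative number per player (goals, assists, xG).
    Players with a zero tally are skipped, following the usual convention
    that 0 * ln(0) counts as 0.
    """
    total = sum(tallies)
    if total <= 0:
        return 0.0
    shares = [t / total for t in tallies if t > 0]
    return -sum(p * math.log(p) for p in shares)

# Hypothetical tallies, NOT real team data: both teams have 12 scorers.
concentrated = [12, 10, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1]  # top two hold ~60%
balanced = [4, 4, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1]        # top two hold ~30%

print(round(theil_entropy(concentrated), 2))
print(round(theil_entropy(balanced), 2))
print(round(math.log(12), 2))  # ceiling for 12 scorers: every tally equal
```

The concentrated team scores lower than the balanced one, and neither can exceed the ceiling set by the number of scorers, which is why adding scorers raises the highest E a team can reach.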
If having the best striker possible bang in goal after goal were the best way to win matches, we’d see it in the plot above. In reality, there doesn’t seem to be any relationship between diversity of goals across scorers and success in a season for the European leagues. A simple linear model of goal E against total points yields no significant relationship, and while a negative relationship (less goal diversity = more wins) appears in a model with goal and assist E together, that effect disappears after the tally of each team’s median goal scorer is included.
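As a rough illustration of what “a simple linear model of goal E against total points” involves, the sketch below fits a one-predictor least-squares line and reports the slope’s t-statistic. It is not the model code behind this piece: the function name is hypothetical, the eight data points are made up for illustration, and it covers only the single-predictor case, not the versions that add assist E or the median scorer’s tally.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, t_stat), where t_stat is the slope divided
    by its standard error.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    t_stat = slope / slope_se if slope_se > 0 else float("inf")
    return slope, intercept, t_stat

# Made-up (goal E, season points) pairs, for illustration only.
goal_e = [1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6]
points = [52, 38, 61, 45, 70, 41, 55, 47]

slope, intercept, t_stat = simple_ols(goal_e, points)
print(slope, intercept, t_stat)
```

A t-statistic close to zero, as in this toy data, is what “no significant relationship” looks like: the fitted slope is small relative to the scatter around the line.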
In fact, diversity of assists ends up being the real story:
There is an obvious positive relationship between assist E and team success just by using the eyeball metric. In the left half of the distribution, the slope of the line is rather shallow. If you look at the data points, it’s quite rare for a team with an assist E score below 1.75 to end with more than 50 points in a season, but there isn’t much of a return to increased assist E until about 2.0. But then, the slope of the relationship picks up very quickly! Nearly all of the best teams in Europe over the past five seasons had very high diversity in who contributed their assists. There are still some poor teams with a big assist E, indicating that having lots of contributors isn’t a sure path to success, but it certainly seems like an important piece of the puzzle in European leagues. Linear regression models show this relationship to be robust, even when accounting for the median assister on a team. That means diversity of assists matters regardless of how much the middle of the pack produces. My research design doesn’t really allow me to claim this, but I suspect that having many possible chance creators is the “difference” in finding an equalizer or winner, boosting a team’s point total relative to a team with similar goal production, but fewer options in who creates the chances.
But will this pattern hold for MLS, a league so different from the European model? In addition to xG, ASA’s data allow me to remove goals that come from penalties, which likely inflate the goal tallies for a few players. While these results are likely not relevant for other leagues, I have greater confidence in their validity within MLS.
Let’s start with diversity of goals and xG for MLS. Note that the E scores for xG are greater than scored goals because many players contribute to xG without the ball going in the goal:
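To make that note concrete, here is a small sketch of how goal E and xG E could be computed from a shot log once penalties are dropped. The record layout, player labels, and xG values are all hypothetical and do not reflect the actual structure of ASA’s database; the entropy function assumes the natural-log share entropy described earlier.

```python
import math
from collections import defaultdict

def entropy(tallies):
    """Natural-log entropy of each player's share of the team total."""
    tallies = list(tallies)
    total = sum(tallies)
    if total <= 0:
        return 0.0
    shares = [t / total for t in tallies if t > 0]
    return -sum(p * math.log(p) for p in shares)

# Hypothetical shot log for one team:
# (player, xG of the shot, scored?, penalty?)
shots = [
    ("A", 0.45, True, False),
    ("A", 0.76, True, True),    # penalty: excluded below
    ("A", 0.10, False, False),
    ("B", 0.30, True, False),
    ("B", 0.05, False, False),
    ("C", 0.20, False, False),  # C and D add xG without ever scoring
    ("D", 0.15, False, False),
]

goals = defaultdict(int)
xg = defaultdict(float)
for player, shot_xg, scored, is_penalty in shots:
    if is_penalty:
        continue  # leave penalties out of both tallies
    xg[player] += shot_xg
    if scored:
        goals[player] += 1

goal_e = entropy(goals.values())
xg_e = entropy(xg.values())
print(goal_e, xg_e)
```

Only two players score in this toy log, but four players take shots, so the xG total is split across more people and its E comes out higher than the goal E.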
Visually, it seems like there is a slight negative relationship between goal diversity and total points, particularly for xG. However, it’s not so clear cut. After taking the confidence intervals into account (loess curves are actually a kind of regression model), there is no relationship for goal E; it’s possible to draw a flat line within that curve’s confidence intervals. There is still a slight negative relationship for the xG curve, but this is not robust in the linear regressions I ran. Part of the problem might be a small sample size: there were 440 team-seasons to consider in the European league data, and only 95 here. If this pattern continues for a few more seasons, that might be enough statistical power to confidently say that diversity in xG production is a bad thing in MLS. But for now, there’s no clear effect.
Once again, assists and chance creation are the real story in this analysis. Unlike in Europe, diversity of assists has no real effect on a team’s success in a season. But really unlike Europe, diversity in xA, the underlying creation of scoring opportunities, is very bad for a team’s success! This effect is quite pronounced, even when bowing to confidence intervals. Teams that had E for xA less than 2.6 ended the regular season with an average of 47-48 points at the model’s lowest possible estimate. But by the time teams reach an E of 2.7, the largest estimate is already below that! Teams that have a score of about 2.9 average about 25-37 points a season. That is a big effect, and it remained statistically significant in every model I threw at the data. Takeaway: having a reliable chance creator matters big time in MLS. Without one, you’re not very likely to make the playoffs.
These findings about chance creation are actually the reverse of the conventional wisdom in Europe and the US. In Europe, diverse chance creation is a characteristic of a successful team, while the better teams in MLS tend to rely on one or two playmakers. But if we consider the structural differences in roster construction, these patterns do make sense. MLS teams have tight budget restrictions, and the Designated Player rule encourages huge inequity in player salaries. If you’re paying one guy ten times the average player’s wages, and his talent fits the price, it makes sense to tactically structure your team around his ability to find other players in dangerous positions. That is a very repeatable path to scoring goals and winning games, especially if opposing defenses are, well, on a budget. European teams don’t have such restrictions, and big clubs can spread their wealth around to arm their attack with numerous talented players, allowing great flexibility in generating their scoring opportunities. I did not expect to find this difference between MLS and Europe. It marks another way that our league stands out in the global soccer landscape, and it suggests that how clubs are governed really does help transform the on-field product.