By Benjamin Bellman (@beninquiring)
Analyzing Foreign Transfers to MLS with Survival Analysis
Finding success in the global transfer market is a critical part of a winning MLS season. While the U.S. is obviously progressing as a soccer nation, there just aren’t enough talented players from this country to satisfy its growing thirst for exciting play and trophies. Still, signing players that are unfamiliar with MLS has also been a risky prospect. Foreign signings with poor outcomes can waste valuable international roster spots, and destroy a team’s budget in a relatively low-spending league. However, Atlanta United and LAFC have shown that signing players who can contribute immediately can make an expansion team an immediate contender. Understanding the historical structural patterns in the success and failure of past MLS teams could offer insights into navigating these decisions, for both future teams and those who need players for 2019 and beyond. Obviously, the most important considerations for signing a player have to do with style of play and personality, but understanding general hotspots and mistakes could improve the odds of signing foreign players who succeed in MLS.
I’ve never seen this sort of analysis before, so deciding on data and metrics posed some challenges. ASA has shot-based metrics back to 2011, but I wanted to use a metric for success that I could apply fairly across positions and seasons. I eventually decided on the duration of a player’s spell with their initial MLS club. It’s not a sure-fire measure of success or production, but it’s reasonable to assume that foreign players who stuck around a team for multiple years were welcome contributors to the team in some tangible way. Players with short spells can be successful signings, but finding long-term players can also be a goal in its own right, especially for culture building during transition periods.
Data and Methods
As a social scientist, I decided to focus on demographic information about players and where they previously played. These aspects of players’ careers are connected to powerful social processes like the life course and geopolitics, not to mention underlying differences in play style across leagues and countries. While analytics has begun to unveil the mechanics of soccer as a game, there’s not much discussion about how social science can help put together a winning team (beyond sports psychology).
I scraped around and cobbled together an incomplete hobbyist’s record of non-U.S. nationals that transferred into MLS from other professional leagues from its debut in 1996 through 2018, totaling 740 players. This information includes where the player transferred from, their nationality, age at transfer date, and the duration in days between their arrival at and departure from the squad (retirement, sale, trade, expired contract, expired loan, loaned away, etc.). I also wanted to look at salary information for the first year of a signing’s contract. While this number doesn’t include any additional transfer fees, there’s likely to be strong correlation between the two, and salary is a good indicator of how much potential a front office sees in a player, or their expected level of contribution. I raided ASA’s collection of salary information from the MLSPU through 2017, merging by player name and using their earliest year’s base salary (not eventual guaranteed compensation), resulting in 434 players with matched salary info.
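For readers who want to try a similar merge, here is a minimal Python sketch of matching transfer records to each player's earliest listed base salary. It is only an illustration of the idea, not the code used for this piece: the function names, the field names, the name-normalization rule, and the sample records are all hypothetical.

```python
def normalize(name):
    # Assumption: names match once whitespace is trimmed and case is ignored.
    return name.strip().casefold()


def earliest_base_salary(salary_rows):
    """Map each normalized player name to the base salary from their earliest listed year.

    salary_rows: iterable of (player_name, year, base_salary) tuples.
    """
    earliest = {}
    for name, year, base_salary in salary_rows:
        key = normalize(name)
        if key not in earliest or year < earliest[key][0]:
            earliest[key] = (year, base_salary)
    return {key: salary for key, (_, salary) in earliest.items()}


def attach_first_salary(transfers, salary_rows):
    """Return copies of the transfer records with a 'first_base_salary' field.

    The field is None when no salary record matches the player's name.
    """
    lookup = earliest_base_salary(salary_rows)
    return [
        {**t, "first_base_salary": lookup.get(normalize(t["name"]))}
        for t in transfers
    ]


# Made-up records, only to show the shape of the data.
salaries = [
    ("Player A", 2015, 100000.0),
    ("Player A", 2013, 65000.0),
    ("Player B", 2016, 250000.0),
]
transfers = [{"name": "player a "}, {"name": "Player C"}]

merged = attach_first_salary(transfers, salaries)
matched = [t for t in merged if t["first_base_salary"] is not None]
print(len(matched), "of", len(merged), "players matched")
```

Matching on names alone will miss players whose names are spelled differently across sources, which is one reason a merge like this ends up with fewer matched players than transfers.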
After deciding on this data structure, I grabbed my demographer’s toolkit and broke out some time-to-event models. Duration data can’t be reliably analyzed with OLS regression or similar models; durations don’t follow a Gaussian distribution, and are subject to problems that require specific assumptions and techniques. For example, some players in the data had yet to leave their first MLS club in 2018, introducing “right-censoring” into the data, meaning that their event hasn’t been observed. There’s little hope of finding statistical power in such a small sample covering all ages from dozens of leagues and countries, so I mostly use Kaplan-Meier survival curves to describe the empirical patterns across categories in the data. I also explored the data with Cox proportional hazards models, and visualized associations with time and salary using projections.
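To make the Kaplan-Meier idea concrete, here is a small pure-Python sketch of the estimator with right-censoring. It is a textbook version for illustration, not the code behind the plots in this piece, and the durations in the example are invented.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: days each player spent with their first MLS club.
    observed:  True if the departure was seen, False if right-censored
               (the player was still with the club when the data ends).
    Returns a list of (day, survival_probability) at each departure time.
    """
    removed_at = {}   # everyone leaving the risk set at time t (events + censored)
    events_at = {}    # observed departures at time t
    for t, seen in zip(durations, observed):
        removed_at[t] = removed_at.get(t, 0) + 1
        if seen:
            events_at[t] = events_at.get(t, 0) + 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(removed_at):
        d = events_at.get(t, 0)
        if d > 0:
            # Multiply by the share of at-risk players who did NOT leave at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored players drop out of the risk set without lowering survival.
        at_risk -= removed_at[t]
    return curve


# Invented example: five players, two of them still with their club.
days = [100, 200, 200, 400, 500]
left_club = [True, True, False, True, False]
for day, prob in kaplan_meier(days, left_club):
    print(day, round(prob, 3))
```

The key point is that censored players still count in the denominator up to the day they drop out of the data, so their partial information isn't thrown away the way it would be if they were simply deleted.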
I first decided to see if any patterns exist in MLS across the type of player being acquired. Is it a better bet to get a forward over a defender on the global transfer market? To simplify the graph and avoid semantic debates, I put all the players in four groups: goalkeepers, defenders (including wing-backs), midfielders (defensive and attacking), and attackers (forwards and wingers).
There are some interesting patterns here. By 750 days (about two years), all four position groups have about the same probability of remaining on their first MLS team. But that survival probability has varied widely within those first two years. Goalkeepers have been the fastest position to be moved, generally followed by attackers, then midfielders, then defenders. This difference can be quite large, with only half of foreign goalkeepers remaining after a season, compared to 60-70% of foreign defenders remaining.
Interpreting these patterns is a separate challenge. Because of MLS’s combination of salary cap, allocation budget, and international slots, club suits have to think about overall value in addition to total on-field performance. This means balancing the additional cost of international players with the available domestic talent, and finding a cost-benefit balance. For example, it’s likely that the overall talent of foreign keepers has nothing to do with their shorter survival rates on MLS rosters. Goalkeepers have always been the chief American soccer export, so the increased value of American keepers given their skill and domestic status probably contributes to the shorter spells of their international counterparts. I was actually surprised to find that defenders seem to have an edge over all other positions in terms of survival rates. Americans are frequently ranked among the best defenders in this league, but it’s possible that, on average, American MLS defenders are not considered good enough value to justify playing over a somewhat-better foreign player.
League and National Eligibility
My next step was to analyze foreign transfers according to previous league and their personal nationality. This would help teams identify positive structural environments that make a successful transfer more likely net of other considerations. Let’s say you really need a center back, and have identified two players with attributes that fit your style of play. Player A is English, playing in the Championship, and Player B is French, playing in Liga MX. How have similar transfer targets worked out in past MLS seasons?
Again, there haven’t been enough transfers for such specific cases to lend any statistical power for a multivariate model, but we can still sketch out historical trends with Kaplan-Meier curves. To keep the trends meaningful and not clutter the viz, I’m only looking at the ten most common leagues and nationalities from my data.
Most of the league trends are indistinguishable from each other, especially within the first two years after signing a contract. However, I can see some notable patterns in the historical data:
Foreign players signed from USL have historically had very poor short-term outlooks in MLS. A quarter of them are gone within 100 days of signing a contract, and in general, these players leave MLS at the fastest rates within the first two years of a contract.
Players from the top Swedish league seem to have a slight short-term advantage for staying in a squad, though their survival curve catches up to the pack during a player’s second season. However, it’s the only league where all players left within the five-year plot limit, with a maximum spell of around 1,100 days.
After about two seasons, Premier League players leave their first MLS team at the fastest rates.
Players transferred from the top Dutch, Mexican, Colombian, and Argentine leagues had the longest spells among players from these ten most common leagues (the last Eredivisie record before my five-year x-axis limit was at about 1,000 days with a 0.2 survival rate).
There are fewer obvious trends when analyzing player nationalities. Mexican players seem to be moved fastest within the first year, but that curve is absorbed by the pack by year two. The only real patterns are the sharp drop-off of Spanish players after 700 days, and the surprising mid-term longevity of French players. I can’t think of a reason for this pattern; I think about someone like Sébastien Le Toux, who played in MLS for close to a decade, but I can’t think of structural mechanisms that would lead to French players having an advantage in MLS over the course of 1-3 seasons. They can’t all be Thierry Henry!
Substantively, I’d argue that a player’s nationality should be a much smaller consideration than choosing which league to target and scout. Leagues are defined by demographic trends and compositions shaped by the financial imbalances of the global soccer market. MLS teams are more likely to buy older players from Europe, and younger players from Latin America. Part of this is simply the demographic composition of each league, but demographics, especially age and development potential, define how teams and players navigate the market and make decisions. As a demographer and population geographer, I think there are piles of demographic puzzles when identifying, developing, and acquiring talent that teams can solve and maximize to create a competitive edge, especially in a value-driven league like MLS.
Foreign Transfers Over The Years
I tried to get transfers from all eras of MLS’s history, and while a historical analysis isn’t necessarily important for finding an edge in 2019, it can help us understand how the league has interfaced with the global market, and where that relationship is going. Year as a numeric variable was a significant predictor of durations using the Cox model, so I’m using projected survival curves based on the average player if they were signed in each year.
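For readers new to the Cox model, the standard textbook form below shows how a projected survival curve follows from a fitted coefficient. The notation is the usual one and is not taken from the model output behind this piece: h_0 and S_0 are the baseline hazard and baseline survival, x is the vector of covariates (here, year signed, with everything else held at the average player's values), and beta holds the fitted coefficients.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
S(t \mid x) = S_0(t)^{\exp\left(\beta^{\top} x\right)}
```

A negative coefficient on year shrinks the exponent, which pushes the whole survival curve upward for players signed more recently.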
There’s a clear pattern that players signed more recently have, on average, stayed at their initial team longer. Based on stories of how international players helped put MLS on the map, I also tested a quadratic relationship with year signed. It turned out significant, but based on this descriptive plot of projected median durations based on year, interpretation is difficult due to few (or missing) early-era transfers.
To help with this interpretation problem, I limited the data to the 2005 season and after. I also decided to evaluate this association between durations and seasons with another correlated variable: the increasing salaries of MLS in general. As it turns out, including salary in the Cox model wipes out the independent association with year, suggesting that increased salaries are driving the lengthening of foreign players’ spells.
Now, it’s impossible to know exactly why this relationship exists, and it’s likely that multiple processes are happening here. First, better players are more expensive, and if you sign better players, you want to keep them longer. It also may be that higher salaries signal higher expectations of potential, and that clubs are willing to give players longer to meet those expectations, even to the point of falling for the sunk cost fallacy (#PlayYourKids advocates understand this process well). And finally, giving out larger contracts could make it more difficult to move disappointing players, since that increases the cost for another club to sign them away from your club.
Based on the recent increases in foreign transfer salaries and durations, MLS teams expect these players to be major contributors. Despite the growth of the homegrown and academy movements, I think everyone should expect this trend to continue as league revenues and viewership continue to grow. MLS teams are still buying expensive stars like Zlatan and Wayne Rooney, but the signings of younger, somewhat expensive players with the expectation to develop and sell them on have increased (Almiron, Barco, Kaku, Rossi, Horta, etc.). It seems unlikely that teams would move these kinds of players after a single season, even if it’s somewhat disappointing (see: Barco, Horta). However, teams should be cautious that these investments don’t stop them from doing what’s best for the team, and remember that playing an expensive foreign player isn’t always the best way to find wins.
Beyond this trend, it’s hard to prescribe any specific actions based on these findings. Instead, I think these kinds of analyses and discussions should inform the more detailed concerns about the kinds of players clubs want to sign, and how best to achieve those goals. Consider this tweet I made a few days ago:
There may be bias from missing data in this “sample”, but the odds of drawing four Postobon players with high production value and none with low production value are slim, given the spread for other leagues. I think there’s something structural in the market at play here: Colombia is a hotspot for developing attacking talent that can contribute immediately in MLS, but it’s undervalued by European buying clubs compared to Argentina and Brazil, meaning that there’s room for lower bids as players try to increase their exposure. Detailed data about demographic and transfer trends in other countries and leagues are also critical to this kind of thinking. There may be a trend happening in a league that MLS teams are unfamiliar with, which could be identified and exploited by a cavalier club. I hope this piece sparks some public discussion about the social and economic structures that the global soccer industry is embedded in, and how these kinds of analyses can improve decision making when identifying and acquiring talented players at both the club and academy levels.