"Positions" are a lie.

By Benjamin Harrison (@NimajnebKH)

The idea of a player “position” is too inflexible.

We know – as fans – that there are more than 11 different types of soccer players. We simply assign them titles which match a variety of on-field roles, and some of those labels fit better than others. A “defensive” midfielder may also be a holding midfielder, is likely a central midfielder, and could even be a deep-lying playmaker. We may use the more nuanced terminology in a basic narrative description of game play – but there is no standard definition for how those roles might translate into measurable events. Soccer analytics is often left with a set of basic positions to categorize play on the field. These are reflected fairly well in the most basic statistics measured by OPTA. Consider a set of 209 players receiving starts over the 2014 season:

The raw data here is collected from whoscored.com. Pass attempts per 90’ accordingly exclude crosses and set pieces. “Defensive actions” are all tackles (successful or not), interceptions, clearances, and blocks. Where deemed useful, I used the position selection option from whoscored (this is an extremely useful tool for reasons that will hopefully become evident over the course of this post) to restrict each player to the data that fit into an assembled 11-man lineup (only 11 starters, forming a potential lineup, were chosen from each team). Although positional differences are apparent in the basic biplot, the accumulation of passes and defensive actions also incorporates aspects of style – the pace of play – which vary considerably by team. To remove team context, I summed up the pass and defense rates by team and converted the axes to share of team actions for the 2014 dataset.
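As a rough illustration of the share-of-team-actions conversion, here is a minimal Python sketch. The field names (team, passes90, def90) and the toy numbers are invented for the example; the post does not publish its data layout or code, so treat this as one plausible reading rather than the actual workflow.

```python
from collections import defaultdict

def team_shares(rows):
    """Convert per-90 pass and defensive-action rates into shares of team totals.

    rows: list of dicts with hypothetical keys 'player', 'team',
    'passes90' and 'def90'.
    """
    # First pass: sum both rates over every listed player on each team.
    totals = defaultdict(lambda: {"passes90": 0.0, "def90": 0.0})
    for r in rows:
        totals[r["team"]]["passes90"] += r["passes90"]
        totals[r["team"]]["def90"] += r["def90"]

    # Second pass: divide each player's rate by the team total.
    shares = []
    for r in rows:
        t = totals[r["team"]]
        shares.append({
            "player": r["player"],
            "team": r["team"],
            "pass_share": r["passes90"] / t["passes90"] if t["passes90"] else 0.0,
            "def_share": r["def90"] / t["def90"] if t["def90"] else 0.0,
        })
    return shares

# Made-up data: three players stand in for an 11-man lineup.
lineup = [
    {"player": "A1", "team": "A", "passes90": 60.0, "def90": 5.0},
    {"player": "A2", "team": "A", "passes90": 30.0, "def90": 15.0},
    {"player": "A3", "team": "A", "passes90": 10.0, "def90": 5.0},
    {"player": "B1", "team": "B", "passes90": 20.0, "def90": 8.0},
    {"player": "B2", "team": "B", "passes90": 20.0, "def90": 2.0},
]

for s in team_shares(lineup):
    print(s["player"], round(s["pass_share"], 2), round(s["def_share"], 2))
```

Because every share is divided by its own team total, a fast-paced team and a slow-paced team end up on the same 0-to-1 scale, which is the point of removing team context.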

We’ll be using the 2015 dataset (raw data collected from whoscored as of April 23rd) through the remainder of this post. These 232 data points have been assembled using a slightly different approach – collecting all player statistics with a cutoff of 270 minutes game time, and normalizing individual numbers to the team average. Players who change positions between games should be expected to blur some position-specific distinctions, but major changes in player role are infrequent enough to be overwhelmed by the general trends. Despite the modest differences in method, the two plots exhibit predictably comparable values – there are a finite number of actions teams can take in a game, and a limited number of general tactical formations used in MLS (and soccer, in general).
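The 2015 approach (a minutes cutoff, then normalizing to the team average) could be sketched as follows. Two details are assumptions, since the text does not spell them out: the cutoff is treated as "at least 270 minutes", and the team average is taken over the players who pass the cutoff. Field names and numbers are hypothetical.

```python
from statistics import mean

MIN_MINUTES = 270  # cutoff quoted in the text

def normalize_to_team_average(rows, stat):
    """Return {player: stat / team average of stat} for eligible players.

    Assumption: the team average is computed over the eligible players
    only (those with at least MIN_MINUTES played).
    """
    eligible = [r for r in rows if r["minutes"] >= MIN_MINUTES]

    # Collect each team's values, then average them.
    by_team = {}
    for r in eligible:
        by_team.setdefault(r["team"], []).append(r[stat])
    team_avg = {team: mean(values) for team, values in by_team.items()}

    return {
        r["player"]: r[stat] / team_avg[r["team"]]
        for r in eligible
        if team_avg[r["team"]]  # skip teams whose average is zero
    }

# Made-up data: P4 falls below the cutoff and is dropped.
rows = [
    {"player": "P1", "team": "A", "minutes": 900, "passes90": 50.0},
    {"player": "P2", "team": "A", "minutes": 450, "passes90": 30.0},
    {"player": "P3", "team": "A", "minutes": 270, "passes90": 10.0},
    {"player": "P4", "team": "A", "minutes": 100, "passes90": 90.0},
]

relative = normalize_to_team_average(rows, "passes90")
print(relative)
```

A value of 1.0 means the player passes at exactly the team-average rate; values above or below 1.0 show how much more or less of the work he takes on.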

The modified plot clarifies how the team uses the particular player as a share of its overall play. When the plot is constrained to a team-specific lineup, it can be a useful tool for visualizing average tactical setup, changes between seasons/games, and tactical adjustments to game state (check out the three links for some handy case studies specific to Seattle Sounders play). Positional differences remain apparent, but considerable overlap persists between categories, and their range implies poorly-matched roles. So long as a “midfielder” can have the same share of team actions as both a striker and a central defender, it remains a poor label. Overly broad player categories force the statistical comparison of different player roles having vastly different circumstantial difficulty (see, for example, this study of players with similar attacking midfield roles to Lamar Neagle). Often, difficult behavior is associated with exactly those aspects of play that lead to team success:

“Chances” are defined here as the sum of all assists, key passes, and shots. Offensive “touches” are the sum of basic passes, cross attempts, and shots. Evaluating player performance with skill-dependent statistics requires a thorough assessment of player behavior. We need player typing to be as diverse as on-field roles, and as indifferent to nominal “position” as possible. The statistics used to characterize type should be characteristic of role and as far removed as possible from player quality/skill (e.g., shooting rate should discriminate attacking players, but the ability to generate shots is descriptive of quality, so it is not useful as a role-dependent statistic). Finally, we shouldn't use so many statistics in constructing a model of roles that the result becomes overfit to specific players or contains redundancy (e.g., including two different types of basic passing rates – say, short passes and long passes – would exaggerate role differences specific to distribution).
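The two definitions at the start of this paragraph translate directly into code. The sketch below follows them literally; the dictionary keys and the sample numbers are hypothetical, and the chances-per-touch ratio is just one plausible way to combine the two counts, not something the text specifies.

```python
def chances(p):
    """Chances = assists + key passes + shots (definition from the text)."""
    return p["assists"] + p["key_passes"] + p["shots"]

def offensive_touches(p):
    """Offensive touches = basic passes + cross attempts + shots."""
    return p["passes"] + p["crosses"] + p["shots"]

def chances_per_touch(p):
    """Assumed combination: share of offensive touches that become chances."""
    touches = offensive_touches(p)
    return chances(p) / touches if touches else 0.0

# Made-up season totals for a single player.
player = {
    "assists": 2, "key_passes": 10, "shots": 8,
    "passes": 150, "crosses": 42,
}

print(chances(player), offensive_touches(player), chances_per_touch(player))
```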

For now, with the 2015 dataset, I assessed pass and defense share as described above. Goalkeepers have been excluded (it is interesting to include them in team analysis, but their position label is relatively effective). I also calculated and recorded dribbles per touch (measuring attacking style on the ball) and crosses per touch (wide vs. central play). I then relativized each of these four role indices to its 210-player maximum and performed a hierarchical cluster analysis on the resulting data matrix:
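For readers who want to try the procedure, here is a minimal pure-Python sketch of relativizing the four indices to their column maxima and clustering the result. The text does not name a distance measure or linkage rule, so Euclidean distance and average linkage are assumptions here, and the six players are invented. Stopping the merging once k groups remain gives the same groups as cutting the finished tree at the level where k branches exist.

```python
import math

def relativize(rows):
    """Divide each column by its maximum so every index tops out at 1."""
    maxima = [max(col) for col in zip(*rows)]
    return [[v / m if m else 0.0 for v, m in zip(row, maxima)] for row in rows]

def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Made-up players; columns are pass share, defense share,
# dribbles per touch, crosses per touch.
players = ["CB1", "CB2", "W1", "W2", "ST1", "ST2"]
indices = [
    [0.10, 0.16, 0.005, 0.002],
    [0.11, 0.15, 0.004, 0.003],
    [0.08, 0.07, 0.030, 0.090],
    [0.07, 0.08, 0.035, 0.100],
    [0.05, 0.02, 0.040, 0.010],
    [0.04, 0.03, 0.045, 0.012],
]

clusters = agglomerate(relativize(indices), k=3)
for members in clusters:
    print([players[i] for i in members])
```

This naive version recomputes every pairwise distance on each pass, which is fine for a couple of hundred players but would be slow on much larger datasets.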

I chose a position for pruning the tree (dashed line) that identifies 15 discrete player clusters grouped by role similarity across the four indices (this step is arbitrary for now, but will be automated in the future). Alongside each, I’ve roughly characterized the differences picked up in the analysis on a scale of --- (well-below average) to 0 (average) to +++ (well-above). Notice that if we move the cutoff line to the left to define only 3 groups, these would be primary defenders at the top, wide players in the middle, and central attackers at the bottom. Running a principal components analysis on the same dataset, let’s take a look at the differences between nominal position and cluster identity on the first two axes of variation.
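The principal components step can be approximated without any external library. The sketch below finds the first two axes of variation by power iteration on the covariance matrix, removing the first axis before searching for the second. This is a swapped-in teaching method, not necessarily what was used for the plot, and the small data matrix is invented (values already scaled to a maximum of 1). Power iteration can misbehave if the starting vector happens to be orthogonal to the axis being sought or if two axes carry nearly equal variance.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def center(rows):
    """Subtract each column mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - mu for x, mu in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def dominant_eigenvector(m, iters=200):
    """Power iteration from a fixed, uniform starting vector."""
    d = len(m)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def first_two_components(rows):
    """Return (pc1, pc2, scores), where scores are per-row coordinates."""
    centered = center(rows)
    cov = covariance(centered)
    pc1 = dominant_eigenvector(cov)
    lam1 = dot(pc1, mat_vec(cov, pc1))  # variance along pc1
    d = len(cov)
    # Deflation: subtract the pc1 contribution so pc2 becomes dominant.
    deflated = [[cov[a][b] - lam1 * pc1[a] * pc1[b] for b in range(d)]
                for a in range(d)]
    pc2 = dominant_eigenvector(deflated)
    scores = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return pc1, pc2, scores

# Made-up relativized indices for six players (pass share, defense share,
# dribbles per touch, crosses per touch).
data = [
    [0.91, 1.00, 0.11, 0.02],
    [1.00, 0.94, 0.09, 0.03],
    [0.73, 0.44, 0.67, 0.90],
    [0.64, 0.50, 0.78, 1.00],
    [0.45, 0.13, 0.89, 0.10],
    [0.36, 0.19, 1.00, 0.12],
]

pc1, pc2, scores = first_two_components(data)
for row in scores:
    print(round(row[0], 3), round(row[1], 3))
```

Plotting the two score columns against each other, once colored by nominal position and once by cluster, would give the kind of comparison described here. The sign of each axis is arbitrary.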

The overlap problem with position is considerably reduced (though not absent) with cluster identity. To be useful, the cluster identities must also exhibit superior discrimination of role difficulty. Short pass accuracy is a skill-dependent statistic, but highly variable depending on situation:  

Here the short pass accuracy by position is compared to that by cluster (cluster 11 is excluded, since it is simply Fabian Castillo – the point guard man who never encountered a ball he didn't want to dribble past an opponent). Many clusters exhibit a substantially tighter range of values than their position counterparts – remember that these categories have not been defined by any values that explicitly measure skill or quality. Within clusters (or between closely related clusters) players should show similar statistical performance unless otherwise influenced by skill (as shown with the previously linked example concerning Neagle). No matter how well we characterize situational difficulty (e.g. how far from goal a shot is taken, or the direction, location and length of a pass), constraining the performance of peers provides a more complete characterization of expected result.
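One simple way to check the "tighter range" claim on your own data is to compute the spread of a skill statistic within each group, once grouping by position and once by cluster. The sketch below is a minimal version under stated assumptions: the keys (position, cluster, short_pass_pct) and the four players are hypothetical, and range plus population standard deviation are just two convenient spread measures.

```python
from statistics import pstdev

def spread_by_group(rows, group_key, stat):
    """Return {group: (range, population std dev)} of stat within each group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r[stat])
    return {g: (max(v) - min(v), pstdev(v)) for g, v in groups.items()}

# Made-up data: one nominal position that hides two distinct roles.
rows = [
    {"player": "M1", "position": "M", "cluster": "A", "short_pass_pct": 90.0},
    {"player": "M2", "position": "M", "cluster": "A", "short_pass_pct": 88.0},
    {"player": "M3", "position": "M", "cluster": "B", "short_pass_pct": 72.0},
    {"player": "M4", "position": "M", "cluster": "B", "short_pass_pct": 70.0},
]

by_position = spread_by_group(rows, "position", "short_pass_pct")
by_cluster = spread_by_group(rows, "cluster", "short_pass_pct")
print(by_position)
print(by_cluster)
```

In this toy case the single position label spans 20 percentage points, while each cluster spans only 2, which is the pattern the comparison is looking for.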

Providing context for player evaluation is only part of the value of this approach. The performance of individual players is strongly controlled by myriad factors even beyond team and role context. Grouping similar players may allow us to address questions that would be otherwise complicated by sample size. Take, for example, the question of whether any player can be considered to overperform or underperform expected goals.  

If a style-specific skill in finishing exists, the grouping of similar players – with the resulting increase in sample size – might allow its detection more readily than would be the case measuring goal records for an individual player subject to seasonal noise, team context, and age-related development trends. However, the modest differences between xG and G in the data above should probably be considered a vindication of the model, if anything. Attackers with substantially different on-field roles and shot selection still exhibit predicted finishing success. Still, this approach may warrant further testing in the future with more refined role discrimination and a larger dataset.
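Pooling goals and expected goals by cluster, as described here, amounts to a grouped sum. The following is a minimal sketch with invented players, cluster numbers, and xG values; it assumes each row already carries a season total of goals and xG from whatever expected goals model is in use.

```python
def pooled_finishing(rows, group_key="cluster"):
    """Sum goals and xG over every player in each group, then difference them."""
    totals = {}
    for r in rows:
        t = totals.setdefault(r[group_key], {"goals": 0, "xg": 0.0})
        t["goals"] += r["goals"]
        t["xg"] += r["xg"]
    for t in totals.values():
        # Positive means the group scored more than the model expected.
        t["g_minus_xg"] = t["goals"] - t["xg"]
    return totals

# Made-up attackers; cluster labels are placeholders.
attackers = [
    {"player": "F1", "cluster": 14, "goals": 3, "xg": 2.5},
    {"player": "F2", "cluster": 14, "goals": 2, "xg": 2.0},
    {"player": "F3", "cluster": 15, "goals": 1, "xg": 2.25},
    {"player": "F4", "cluster": 15, "goals": 4, "xg": 3.25},
]

pooled = pooled_finishing(attackers)
for cluster, t in sorted(pooled.items()):
    print(cluster, t["goals"], t["xg"], t["g_minus_xg"])
```

A pooled difference near zero for every group would be consistent with the reading above that the xG model holds up across roles; a persistent gap for one group would be the signal worth testing with more data.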

The four-index model above warrants more work. Some player groups are very effective, but others clearly could benefit from different weighting prior to clustering and/or additional indices. Take, for example, cluster 15 which mainly incorporates central attacking players with fairly average pass share. The cluster also picked up Vancouver CB Pa Modou Kah, who has exhibited abnormally low pass and defense shares for his role so far in 2015. The present dataset may also suffer from limited sample size (any set of a few games may lead to some very unusual game states and corresponding performance). Nevertheless, preliminary work suggests player typing may be a useful analytical tool.

2015 ASA Preview: Vancouver Whitecaps

*xG = expected goals, xA = expected assists, xGD = expected goal differential. For more information see our xGoals by Team page.

By Drew Olsen (@drewjolsen)

For a team that entered 2014 with middling expectations, securing 50 points for the first time in MLS club history and making the playoffs was no small success for the Whitecaps. But this is a team that has finished with between 43 and 50 points each of the last three seasons and been eliminated twice as the 5th seed in the playoffs. Vancouver is beginning to take on the same role Costa Rica occupies in CONCACAF qualifying; both are good teams that can be counted on to pose a challenge to any opponent, but are not contenders to finish near the top of the standings.

To try to change that reputation, the team is building a young, talented roster led by second-year coach Carl Robinson. It is a roster that is unlikely to win MLS Cup in the next season or two, but has lots of promise for the future. With eight homegrown players 22 or younger plus the addition of Young DP Octavio Rivero, the future looks bright in Vancouver.

Expectations are tempered for 2015 and it will be difficult for the Whitecaps to make the playoffs again in a competitive Western Conference, but that does not mean this season won't be a success. With an average roster age less than 24, this year is likely to be a stepping stone towards eventual success in Vancouver.

Defense

There is plenty to build on from last season, beginning with the Whitecaps' stingy defense. Allowing only 1.17 goals per game last year kept Vancouver in many games, and our expected goals metrics suggest they actually got a bit unlucky by allowing as many as they did. In other words, the quality of this defense was no fluke.

David Ousted was an exactly average keeper last year, and it's unlikely much will change for him in 2015. Jordan Harvey started every game last season, and he will again join Steven Beitashour at fullback. The question mark comes from the center of defense, where last year's starters for much of the year, Johnny Leveron and Andy O'Brien, have both moved on. If the quality on the backline of 2014 is to continue, it will have to come with a new centerback pairing. Kendall Waston looks likely to take one of the starting spots, with newcomers Pa Modou Kah and Diego Rodriguez fighting for the other starting position. The 34-year-old veteran Kah comes from Portland, where he has been in and out of the starting lineup for two seasons. Rodriguez joins from Uruguay, by way of La Liga side Malaga. It is not an overstatement to say the Whitecaps' season may depend on the ability of its defense to mesh.

Midfield

Anchored by DPs Pedro Morales and the now officially signed Matias Laba, the midfield will again be one to be reckoned with. Morales' 20.75 xG + xA was 4th in the league last season, and he will continue to be relied on to create for the young attacking corps. Laba isn't afraid to get stuck in, and should provide a valuable bit of protection in front of the new centerbacks.

Russell Teibert returns on the left side after a disappointing 2014. A lot was expected from him after two goals, nine assists, and 35 key passes in 2013, but he managed no goals, just two assists, and 24 key passes, despite playing 2000 more minutes last season. Erik Hurtado may end up on the right, and also might compete against the aging Mauro Rosales for playing time. Rosales started the final 10 games of the season after coming over from Chivas USA, but at age 34 he set a career high for minutes since coming to MLS. Whitecaps mainstay Gershon Koffie will also try to regain a foothold in the midfield after missing the end of last season with injuries.

Forwards

Despite the hype surrounding young strikers Kekuta Manneh (20 years old) and former Rookie of the Year Darren Mattocks (24), scoring proved difficult last season. The 42 goals Vancouver netted were 6th worst in the league and six fewer than any other playoff team. To bolster their attack, Young DP Octavio Rivero was signed from Chile, where he scored 10 goals in only 18 appearances last season. Rivero looks ready to contribute from day one, having scored a brace in his preseason debut.

Prognosis

Vancouver has a very young team that looks to be both fun and frustrating to watch this season. While the attack has been improved, a drop-off in defensive quality is likely. It will be difficult to return to the playoffs in a loaded Western Conference, but if the defense can meld and Rivero can score, the sky is the limit.