Looking past the hot takes: How will the loss of Martins and Dempsey affect the Sounders?

Those of you who read ASA regularly know that I am hesitant to declare to others that I am a “Sounders fan” because of the bias and label that it automatically associates with me and my analysis. Last night, to me, was a wonderful game played by titanic rivals. Extra time happened and things began to take shape, and my thought was that regardless of the outcome we would yet again be talking about the epic-ness of when the Timbers and Sounders meet in battle.


"Positions" are a lie.

By Benjamin Harrison (@NimajnebKH)

The idea of a player “position” is too inflexible.

We know – as fans – that there are more than 11 different types of soccer players. We simply assign them titles which match a variety of on-field roles, and some of those labels fit better than others. A “defensive” midfielder may also be a holding midfielder, is likely a central midfielder, and could even be a deep-lying playmaker. We may use the more nuanced terminology in a basic narrative description of game play – but there is no standard definition for how those roles might translate into measurable events. Soccer analytics is often left with a set of basic positions to categorize play on the field. These are reflected fairly well in the most basic statistics measured by OPTA. Consider a set of 209 players receiving starts over the 2014 season:

The raw data here is collected from whoscored.com. Pass attempts per 90’ accordingly exclude crosses and set pieces. “Defensive actions” are all tackles (successful or not), interceptions, clearances, and blocks. Where deemed useful, I used the position selection option from whoscored (this is an extremely useful tool for reasons that will hopefully become evident over the course of this post) to restrict each player to a dataset that fit into an assembled 11-man lineup (only 11 starters, a potential lineup, were chosen from each team). Although positional differences are apparent in the basic biplot, the accumulation of passes and defensive actions also incorporates aspects of style – the pace of play – which vary considerably by team. To remove team context, I summed up the pass and defense rates by team and converted the axes to share of team actions for the 2014 dataset.
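A minimal sketch of that conversion, for readers who want to try it on their own numbers. The field names (`passes_p90`, `def_actions_p90`) and the sample values are made up for illustration; they are not taken from the whoscored data used in this post.

```python
from collections import defaultdict

def team_shares(players):
    """Convert per-90 rates into each player's share of his team's total.

    `players` is a list of dicts with hypothetical keys:
    name, team, passes_p90, def_actions_p90.
    """
    # First pass: sum the pass and defense rates by team.
    totals = defaultdict(lambda: {"passes": 0.0, "defense": 0.0})
    for p in players:
        totals[p["team"]]["passes"] += p["passes_p90"]
        totals[p["team"]]["defense"] += p["def_actions_p90"]

    # Second pass: divide each player's rate by his team's total.
    shares = []
    for p in players:
        t = totals[p["team"]]
        shares.append({
            "name": p["name"],
            "team": p["team"],
            "pass_share": p["passes_p90"] / t["passes"] if t["passes"] else 0.0,
            "defense_share": p["def_actions_p90"] / t["defense"] if t["defense"] else 0.0,
        })
    return shares

# Made-up numbers, for illustration only.
lineup = [
    {"name": "A", "team": "SEA", "passes_p90": 60.0, "def_actions_p90": 10.0},
    {"name": "B", "team": "SEA", "passes_p90": 20.0, "def_actions_p90": 30.0},
    {"name": "C", "team": "LA", "passes_p90": 50.0, "def_actions_p90": 5.0},
]
shares = team_shares(lineup)
```

Because every value is divided by its own team's total, a fast-paced team and a slow-paced team end up on the same scale.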

We’ll be using the 2015 dataset (raw data collected from whoscored as of April 23rd) through the remainder of this post. These 232 data points have been assembled using a slightly different approach – collecting all player statistics with a cutoff of 270 minutes game time, and normalizing individual numbers to the team average. Players who change positions between games should be expected to blur some position-specific distinctions, but major changes in player role are infrequent enough to be overwhelmed by the general trends. Despite the modest differences in method, the two plots exhibit predictably comparable values – there are a finite number of actions teams can take in a game, and a limited number of general tactical formations used in MLS (and soccer, in general).

The modified plot clarifies how the team uses the particular player as a share of its overall play. When the plot is constrained to a team-specific lineup, it can be a useful tool for visualizing average tactical setup, changes between seasons/games, and tactical adjustments to game state (check out the three links for some handy case studies specific to Seattle Sounders play). Positional differences remain apparent, but considerable overlap persists between categories, and their range implies poorly-matched roles. So long as a “midfielder” can have the same share of team actions as both a striker and a central defender, it remains a poor label. Overly broad player categories force the statistical comparison of different player roles having vastly different circumstantial difficulty (see, for example, this study of players with similar attacking midfield roles to Lamar Neagle). Often, difficult behavior is associated with exactly those aspects of play that lead to team success:

“Chances” are defined here as the sum of all assists, key passes, and shots. Offensive “touches” are the sum of basic passes, cross attempts, and shots. Evaluating player performance based on skill-dependent statistics depends on a thorough assessment of player behavior. We need player typing to be as diverse as on-field roles, and as indifferent to nominal “position” as possible. The statistics used to characterize type should be characteristic of role and as far removed as possible from player quality/skill (e.g., shooting rate should discriminate attacking players, but the ability to generate shots is descriptive of quality, so it is not useful as a role-dependent statistic). Finally, we shouldn't use so many statistics in constructing a model of roles that the result becomes overfit to specific players or contains redundancy (e.g., including two different types of basic passing rates – say, short passes and long passes – would exaggerate role difference specific to distribution).

For now, with the 2015 dataset, I assessed pass and defense share as described above. Goalkeepers have been excluded (it is interesting to include them in team analysis, but their position label is relatively effective). I also calculated and recorded dribbles per touch (measuring attacking style on the ball) and crosses per touch (wide vs. central play). I then relativized each of these four role indices to its 210-player maximum and performed a hierarchical cluster analysis on the resulting data matrix:
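The relativizing step can be sketched as follows, assuming each player is a dict holding the four indices. The key names and the two sample rows are hypothetical and only show the shape of the calculation.

```python
# Hypothetical key names for the four role indices.
INDICES = ("pass_share", "defense_share", "dribbles_per_touch", "crosses_per_touch")

def relativize_to_max(rows, indices=INDICES):
    """Scale each index by its maximum across all players, so every column tops out at 1."""
    maxima = {k: max(r[k] for r in rows) for k in indices}
    return [
        [r[k] / maxima[k] if maxima[k] else 0.0 for k in indices]
        for r in rows
    ]

# Made-up values, for illustration only.
rows = [
    {"name": "A", "pass_share": 0.10, "defense_share": 0.04,
     "dribbles_per_touch": 0.01, "crosses_per_touch": 0.00},
    {"name": "B", "pass_share": 0.05, "defense_share": 0.08,
     "dribbles_per_touch": 0.04, "crosses_per_touch": 0.02},
]
matrix = relativize_to_max(rows)  # one row per player, one column per index
```

Scaling to the maximum keeps any one index from dominating the distance between players just because it is measured in larger numbers.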

I chose a position for pruning the tree (dashed line) that identifies 15 discrete player clusters grouped by role similarity across the four indices (this step is arbitrary this time, but will be automated in the future). Alongside each, I’ve roughly characterized the differences picked up in the analysis on a scale of --- (well-below average) to 0 (average) to +++ (well-above). Notice, if we move the cutoff line to the left to define only 3 groups, these would be primary defenders at the top, wide players in the middle, and central attackers at the bottom. Running a principal components analysis on the same dataset, let’s take a look at the differences between nominal position and cluster identity on the first two axes of variation.
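For anyone who wants to experiment with the pruning idea, here is a rough pure-Python sketch of agglomerative clustering that stops when a chosen number of groups remains. The post does not say which distance or linkage was used, so Euclidean distance and average linkage are assumptions, and the six sample points are invented.

```python
import math
from itertools import combinations

def average_linkage(a, b, points):
    """Mean Euclidean distance between every pair of members of clusters a and b."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain.

    Stopping at k clusters plays the role of drawing the cutoff line across the tree.
    Returns a list of clusters, each a list of row indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

# Invented rows of (pass, defense, dribble, cross) indices scaled to 0..1.
points = [
    (0.9, 0.9, 0.1, 0.1),
    (0.8, 1.0, 0.1, 0.2),
    (0.5, 0.5, 0.6, 1.0),
    (0.5, 0.4, 0.7, 0.9),
    (0.3, 0.1, 1.0, 0.2),
    (0.4, 0.1, 0.9, 0.1),
]
groups = agglomerate(points, 3)
```

This brute-force version recomputes every pairwise distance at each merge, which is fine for a couple of hundred players but would be slow on much larger datasets.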

The overlap problem with position is considerably reduced (though not absent) with cluster identity. To be useful, the cluster identities must also exhibit superior discrimination of role difficulty. Short pass accuracy is a skill-dependent statistic, but highly variable depending on situation:  

Here the short pass accuracy by position is compared to that by cluster (cluster 11 is excluded, since it is simply Fabian Castillo – the point guard man who never encountered a ball he didn't want to dribble past an opponent). Many clusters exhibit a substantially tighter range of values than for the position counterparts – remember that these categories have not been defined by any values that explicitly measure skill or quality. Within clusters (or between closely related clusters) players should show similar statistical performance unless otherwise influenced by skill (as shown with the previously linked example concerning Neagle). No matter how well we characterize situational difficulty (e.g. how far from goal a shot is taken, or the direction, location and length of a pass), constraining the performance of peers provides a more complete characterization of expected result.

Providing context for player evaluation is only part of the value of this approach. The performance of individual players is strongly controlled by myriad factors even beyond team and role context. Grouping similar players may allow us to address questions that would be otherwise complicated by sample size. Take, for example, the question of whether any player can be considered to overperform or underperform expected goals.  

If a style-specific skill in finishing exists, the grouping of similar players – with the resulting increase in sample size – might allow its detection more readily than would be the case measuring goal records for an individual player subject to seasonal noise, team context, and age-related development trends. However, the modest differences between xG and G in the data above should probably be considered a vindication of the model, if anything. Attackers with substantially different on-field roles and shot selection still exhibit predicted finishing success. Still, this approach may warrant further testing in the future with more refined role discrimination and a larger dataset.
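The pooling idea can be sketched like this: sum goals and expected goals over everyone in a cluster, then compare the totals. The field names and figures below are hypothetical, and the xG values are assumed to come from whatever shot model you already trust.

```python
from collections import defaultdict

def finishing_by_cluster(players):
    """Pool goals (G) and expected goals (xG) by cluster and report G minus xG."""
    totals = defaultdict(lambda: {"goals": 0, "xg": 0.0, "players": 0})
    for p in players:
        t = totals[p["cluster"]]
        t["goals"] += p["goals"]
        t["xg"] += p["xg"]
        t["players"] += 1
    # A positive g_minus_xg means the group scored more than the model expected.
    return {c: {**t, "g_minus_xg": t["goals"] - t["xg"]} for c, t in totals.items()}

# Made-up numbers, for illustration only.
sample = [
    {"name": "A", "cluster": 15, "goals": 3, "xg": 2.5},
    {"name": "B", "cluster": 15, "goals": 2, "xg": 1.5},
    {"name": "C", "cluster": 11, "goals": 1, "xg": 2.0},
]
summary = finishing_by_cluster(sample)
```

A gap that holds up across a whole cluster would be more persuasive than the same gap for one player over a few games, though a small difference here is just as consistent with noise.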

The four-index model above warrants more work. Some player groups are very effective, but others clearly could benefit from different weighting prior to clustering and/or additional indices. Take, for example, cluster 15 which mainly incorporates central attacking players with fairly average pass share. The cluster also picked up Vancouver CB Pa Modou Kah, who has exhibited abnormally low pass and defense shares for his role so far in 2015. The present dataset may also suffer from limited sample size (any set of a few games may lead to some very unusual game states and corresponding performance). Nevertheless, preliminary work suggests player typing may be a useful analytical tool.

GAME OF THE WEEK: Los Angeles VS. Seattle

Since the weekend was filled with barbecues, families, and time away from the pseudo grind of the world, we decided to skip out on our weekly podcast. But we all love our "Game of the Week" contest so much that we decided to still preview tonight’s game of the week between Seattle and LA. This is what we do for you, America. This is our service.

DREW:

The LA Galaxy are playing a soccer game? ESPN, you know what to do... broadcast it at a time when everyone East of Utah will be asleep! After last week's Galaxy v Red Bulls snoozefest took 89 minutes for anything to happen, ESPN has decided to go double or nothing and show the slumping Galaxy against a Seattle team on a roll. It has largely been because Lamar Neagle (no, seriously) has either found out how to use those neon jerseys to blind defenders, or finally decided he's an MLS quality striker. After Seattle started the season unable to score goals, the Sounders are now getting them in bunches. Or as I like to put it: they're regressing to the mean with a vengeance!

As for the Galaxy, their dependence on Juninho was exposed last week after a hard tackle from his namesake forced him to leave the game early. Los Angeles never got back into sync with him off the field, and New York dominated the rest of the game. As of this writing, his status is still up in the air, but if the Galaxy are going to keep Ozzie Alonso in check they'll need Juninho to keep him occupied. Should Garcia (or anyone else) get the start in Juninho's place, then Alonso will get more forward than he otherwise would, freeing up Neagle, Martins, and family to attack the net. Couple that with the fact that Carlo Cudicini has looked as good in goal this season as Jimmy Nielsen looks in jorts, and the Galaxy could be in for a hurtin'.

All that said, the Galaxy have been very solid at home this season, and the Fishing Village to the North have found their scoring touch at home, but still struggle to get goals on the road. My prediction: if Juninho plays, the Galaxy will pull this out 2-1. If not, it will be a 1-1 draw.

MATTHIAS:

The Sounders have come on strong recently, recording 13 points in their last five matches. I checked for a recent dip in Seattle's strength of schedule, but there was no such dip to be found. Seattle has played three of those last five matches on the road, including a win on the road in Kansas City and a win at home over Dallas.

The Sounders' win at Colorado shouldn't be overlooked either. I will be coming out with a strength of schedule index soon, but my beta version* suggests that the Rapids have played the toughest schedule to this point (along with New England). That's not to mention that, as the away team, Seattle was giving up an estimated third-of-a-goal in an uphill battle. Impressive stuff.

But after saying all those wonderful things about Seattle, my three points this week go to the Galaxy in a one-goal victory. Though the Sounders find themselves second in the tables in goal differential, they are second to the Galaxy. Though the Sounders have an impressive 1.21 Shots-on-goal Ratio, the Galaxy have outdone them again at 1.37. Though the Sounders' strength of schedule has been difficult recently, over the course of the season it's the Galaxy that have faced seemingly tougher opponents. The final nail in the coffin is that the game will be played in Los Angeles, and that third-of-a-goal advantage will lie with the Galaxy. LA drew more than 20,000 fans to its last home match on May 5th, and you can bet they'll show up for the red hot Sounders.

*Strength of schedule is currently based on opponents' goal differentials and shots-on-goal ratios.

HARRISON:

I write a lot about the Sounders over the course of the week so let me make this simple. They were taking shots; at first they weren't going in and then, recently, they started all going in. Somewhere in between these two truths lies the median of this organization. They aren't as good or as lucky as they've been, cumulatively, over the past 2 1/2 weeks. But they certainly weren't as bad as they were to start the season. It's a bit difficult to gauge the true talent level of this squad because of how frequently these parts are moving about.

Unfortunately for the Sounders, Ozzie Alonso is suffering from a groin strain that will probably prevent him from making an appearance, and Steve Zakuani is still not able to go this weekend, which will force the Sounders to work with an inopportune 18 and an even less-conducive starting XI. This isn't something new to them this year, but I imagine that it's still going to be tough for them to deal with due to how Los Angeles works the ball through the middle of the field with Marcelo Sarvas.

However, the Galaxy are also dealing with injuries to their central midfield, specifically with Juninho, who, as Drew mentioned above, was taken out, ironically enough, by a rough tackle from New York's opposing Juninho. Los Angeles uses an assortment of means to move the ball up the pitch. They average more shots than their opponents, more possessions and longer ones by the standard of TFS. Despite that, they've managed an impressive 17 points in 11 games and are still considered one of the more unlucky teams in all the league.

Add to their attack the athletic Robbie Rogers and a Landon Donovan who has something to prove to Jurgen Klinsmann, and all of a sudden you have a club that is very dangerous and probably one of the better ones in the league. Add that to the likelihood of the Sounders' shot-to-goal ratio coming back to earth and the absence of Ozzie Alonso, and you end up with a very likely Galaxy win at home. I don't think it's going to be anywhere along the lines of the Sounders defeat from the playoffs, but a 2-1 victory wouldn't surprise me.
----
Current Standings (as best as I can remember them):
Drew 0 - 3 ; Prediction: LA (if Juninho plays)
Matthias 2 - 3 ; Prediction: LA
Harrison 1 - 4 ; Prediction: LA