MLS Roster Rules: The Thorough Examination of A Discovery Claim

By Harrison Crow (@harrison_crow)

Two nights ago we burned some midnight oil and recorded our latest podcast talking about changes being made with the release of the new MLS roster rules. Leave it to Bruce Arena to act on these things the very next day.

Ives Galarcep and Goal.com broke down the story and provided the most pertinent details: former US youth international and “West Ham” midfielder Sebastian Lletget (who mostly saw time with the reserve squad) is “set to sign” with LA Galaxy. Additionally, if the article is correct, the New England Revolution had a discovery claim on him and will receive $50,000 in allocation compensation as a result.

You may be asking yourself “why does this matter?”, and it's a good question. I think it’s important to highlight a few interesting things that this maneuver helps us understand a bit better.

1. Bruce Arena continues to use the mechanisms at his disposal.

First, understand that there is only one mechanism at play. Lletget, despite being a former member of the US U-23s, wasn't included on the allocation roster, making the attacking midfielder fair game for discovery lists, which is where this business with New England enters. The Revolution received $50,000 in allocation because they filled out their bingo card discovery list correctly (I can’t imagine what they’d do with another attacking midfielder).

This isn't anywhere close to New York giving up Eric Alexander and Ambroise Oyongo for the rights to Sacha Kljestan and the reclamation project Felipe Martins (seriously, what a freakin’ coup). We don’t know what percent of their allocation budget LA gave up for him, but $50,000 in allocation isn't much relative to the total budget or salary cap. When you consider that the minimum youth contract is now $60,000 for a roster spot for a player aged 18-25, it puts things in an even more positive perspective.

Additionally, we now have some sort of idea how these discovery claims will work too. It’s not as if we didn't have an idea of how this worked before, but it brings more transparency to the process and how it might continue to work in the future relative to these types of situations.

The questions continually ringing in my head are as follows: does this mean LA Galaxy could only sign him because he was on their discovery list? Did they have to submit a discovery claim on him before signing him? Lastly, if LA used a discovery claim on him, does that mean they only have six left, or do they now gain back the spot Lletget once occupied on their claim list?

These are mostly questions with little real bearing in the long run. We have no idea how to answer any of them, and we’ll probably never get straight answers, but they are interesting thoughts and, if we're patient, we may see indications in the future.

2. New England randomly gains an additional $50,000 in allocation

As I already mentioned, New England gets paid for pretty much just putting the right guy on the list. Unless they think this off-season they’re going to get reasonable bid offers for Kelyn Rowe or they aren't willing to reward Lee Nguyen with a designated player contract (yep, I believe it’s going to be a thing), then Lletget probably wasn’t going to be of any use to them. They also still have Steve Neumann (picked fourth overall in 2014 MLS Superdraft) who has largely been an afterthought during his short MLS tenure and could also potentially fit into an attacking midfield role.

LA got a good deal, dispensing little in compensation compared to what other teams have surrendered in the past. This wasn't completely one-sided, as New England also benefited from the fact that it had little immediate use for the attacking midfielder and has time to replace him on its discovery list while earning a bit of monopoly money in the process.

3. There are still US players on the market not listed in the allocation roster.

During the podcast we referenced the subjective nature involved in identifying those on the allocation roster and those who were not mentioned. Three that did not make the list are Alfredo Morales, Will Packwood and Zarek Valentin.

Morales is still with FC Ingolstadt 04 in Germany's second division and could be an interesting snag for a few different MLS teams that need a boost to their central midfield or depth out wide.

Packwood is currently training with New England and could sign soon. The $50,000 discovery claim allocation payment for Lletget should come close to covering the majority of his salary which just makes things seem that much better for New England.

Valentin is an interesting one as he left MLS at the end of his Generation Adidas contract for Norway, but since joining FK Bodø/Glimt it would seem that he’s been challenged for time on the pitch. Mind you, I don’t know a lot about what’s going on in Norway, and some of that might be due to injuries, but he could be a sly candidate for a return to MLS.

 

When to park the bus in MLS

By Kevin Minkus (@kevinminkus)

Should teams park the bus? When?

Goals change games. Garry Gelade recently wrote two excellent pieces on this phenomenon (found here and here). One of his key findings is that teams that are down a goal increase their shooting rates to try to make up the deficit, while teams that are ahead take fewer shots. The thinking goes that teams that are ahead can afford to let up on the attack in order to better maintain defensive shape, and thus give up fewer high quality chances to their opponents. In other words, they park the bus. Whether this is a sound strategy remains an open question, and, if it is, how early is too early to do it?

As an example, here is what the 2014 Crew looked like in terms of shots when behind, tied, and ahead (hat tip to Garry, once again, for the excellent way to visualize this):

Let me know on twitter if you'd like to see a different team's graph for any season from 2011 to 2015 - @kevinminkus.

As you can see, Columbus shot less frequently when in the lead. This is a pretty typical trend.

Using logistic regression, we can evaluate the effect of shots and shot quality on a leading team's chances of conceding the next goal. The model I've built, like Garry's, breaks down a game into a sequence of game states. The game begins at 0-0, and each time a goal is scored, a new game state segment begins. My model takes as inputs the number of shots the leading team takes, and the average quality of those shots (using the site's expected goals model) during a segment. It then outputs the probability of that team conceding the next goal.

In general, teams that shoot more are less likely to concede the next goal in a game. Teams that take better shots are also less likely to concede the next goal. If we include only situations where a team is up by one goal, the same results hold. However, if we only look at time frames towards the end of games where teams are up by one goal (situations where parking the bus would be appropriate), things change.

To examine the problem this way, I've built separate models using data filtered by when each segment begins. I've filtered the data this way since I'm hoping to answer the question of when a team should start to go into a defensive shell. Using the start time of the segment, I think, is a good though not perfect proxy for this. For example, then, to see whether parking the bus is a good tactic up a goal after 70 minutes, the model is built using data from game segments which begin on or after the 70 minute mark. Note that as a point of interest I've also included whether the leading team is home or away as a variable in the model.
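As a rough illustration of the filtering-then-fitting idea, here is a minimal sketch in plain Python. The segment rows are invented for illustration (they are not ASA data), the column layout is an assumption, and the hand-rolled gradient descent stands in for whatever statistical package the actual models were built with.

```python
import math

# Hypothetical one-goal-lead segments, invented for illustration only.
# Each tuple: (start_minute, shots_by_leader, avg_xg_per_shot,
#              leader_is_home, conceded_next_goal)
SEGMENTS = [
    (12, 5, 0.08, 1, 0),
    (30, 2, 0.05, 0, 1),
    (55, 4, 0.12, 1, 0),
    (64, 1, 0.04, 0, 1),
    (71, 3, 0.10, 1, 0),
    (78, 0, 0.00, 0, 1),
    (83, 2, 0.15, 1, 0),
    (88, 1, 0.06, 0, 0),
]

def segments_from(minute, segments):
    """Keep only segments that begin on or after the given minute mark."""
    return [s for s in segments if s[0] >= minute]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, steps=5000, lr=0.05):
    """Fit an intercept plus three coefficients by gradient descent on log-loss."""
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0] * 4
        for _, shots, quality, home, conceded in rows:
            x = (1.0, shots, quality, home)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(4):
                grad[j] += (p - conceded) * x[j]
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def predict(w, shots, quality, home):
    """Probability that the leading team concedes the next goal."""
    return sigmoid(w[0] + w[1] * shots + w[2] * quality + w[3] * home)

# One model per minute mark: here, segments starting at or after minute 70.
late = segments_from(70, SEGMENTS)
late_weights = fit_logistic(late)
print(round(predict(late_weights, 3, 0.10, 1), 3))
```

Repeating the last three lines for each minute mark gives one fitted model per filter. Judging statistical significance would additionally need standard errors, which this sketch does not compute.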

The chart below shows the minute mark I've filtered by, and whether each of the three variables for the leading team (shots, shot quality, and venue, home or away) has a statistically significant effect on whether that team concedes a goal.

Essentially what this shows is after the 63rd minute, taking more shots no longer decreases a leading team's chances of giving up a goal. If a team is looking to see the scoreline out, this would be the time to implement a tactical change by withdrawing into a defensive shell. It still makes sense, however, to take high quality chances as they come, at least until about the 69th minute.

It's also interesting to note that in close games in the second half, being home or away doesn't really help prevent conceding. This appears to be evidence against teams playing differently up one late at home versus up one late away.

If, instead of holding on to the scoreline, a team's goal is to put the game away by scoring an insurance goal, that can be modeled, too. For the chart below I've built logistic regression models for each minute mark, using the same variables. The output now, though, is the probability the leading team scores the next goal.

The models suggest taking more shots increases a leading team's odds of being next to score until the 71st minute, while taking high quality shots increases a leading team's odds of being next to score until the 77th minute. So, if a team wants one more goal, taking more shots will help until about 19 minutes remaining, and taking high quality shots will help until about 13 minutes remaining.

There's definitely more work to be done in this area. One next step would be to directly evaluate the trade-off between seeing out the score and trying to put the game away by scoring once more.

This analysis also certainly isn't definitive. I've approached the problem this one way, and I'm sure there are some flaws with this approach. I'd love to hear about them and see other ways to tackle it.

"Positions" are a lie.

By Benjamin Harrison (@NimajnebKH)

The idea of a player “position” is too inflexible.

We know – as fans – that there are more than 11 different types of soccer players. We simply assign them titles which match a variety of on-field roles, and some of those labels fit better than others. A “defensive” midfielder may also be a holding midfielder, is likely a central midfielder, and could even be a deep-lying playmaker. We may use the more nuanced terminology in a basic narrative description of game play – but there is no standard definition for how those roles might translate into measurable events. Soccer analytics is often left with a set of basic positions to categorize play on the field. These are reflected fairly well in the most basic statistics measured by OPTA. Consider a set of 209 players receiving starts over the 2014 season:

The raw data here is collected from whoscored.com. Pass attempts per 90’ accordingly excludes crosses and set pieces. “Defensive actions” are all tackles (successful or not), interceptions, clearances, and blocks. Where deemed useful, I used the position selection option from whoscored (this is an extremely useful tool for reasons that will hopefully become evident over the course of this post) to restrict the player's data to a set which fit into an assembled 11-man lineup (only 11 starters, a potential lineup, were chosen from each team). Although positional differences are apparent in the basic biplot, the accumulation of passes and defensive actions also incorporates aspects of style – the pace of play – which vary considerably by team. To remove team context, I summed up the pass and defense rates by team and converted the axes to share of team actions for the 2014 dataset.
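The conversion to share of team actions can be sketched as follows. The player names, teams and per-90 rates are made up for illustration, and the tuple layout is an assumption rather than the format of the whoscored export.

```python
from collections import defaultdict

# Hypothetical per-90 rates, invented for illustration:
# (player, team, passes_per_90, defensive_actions_per_90)
PLAYERS = [
    ("Player A", "SEA", 60.0, 10.0),
    ("Player B", "SEA", 40.0, 30.0),
    ("Player C", "POR", 30.0, 15.0),
    ("Player D", "POR", 20.0, 5.0),
]

def team_shares(players):
    """Return {player: (pass_share, defense_share)} relative to team totals."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for _, team, passes, defense in players:
        totals[team][0] += passes
        totals[team][1] += defense
    return {
        name: (passes / totals[team][0], defense / totals[team][1])
        for name, team, passes, defense in players
    }

shares = team_shares(PLAYERS)
print(shares["Player A"])  # share of SEA passes and SEA defensive actions
```

Because each rate is divided by its own team's total, a high-tempo team and a slow one end up on the same scale.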

We’ll be using the 2015 dataset (raw data collected from whoscored as of April 23rd) through the remainder of this post. These 232 data points have been assembled using a slightly different approach – collecting all player statistics with a cutoff of 270 minutes game time, and normalizing individual numbers to the team average. Players who change positions between games should be expected to blur some position-specific distinctions, but major changes in player role are infrequent enough to be overwhelmed by the general trends. Despite the modest differences in method, the two plots exhibit predictably comparable values – there are a finite number of actions teams can take in a game, and a limited number of general tactical formations used in MLS (and soccer, in general).

The modified plot clarifies how the team uses the particular player as a share of its overall play. When the plot is constrained to a team-specific lineup, it can be a useful tool for visualizing average tactical setup, changes between seasons/games, and tactical adjustments to game state (check out the three links for some handy case studies specific to Seattle Sounders play). Positional differences remain apparent, but considerable overlap persists between categories, and their range implies poorly-matched roles. So long as a “midfielder” can have the same share of team actions as both a striker and a central defender, it remains a poor label. Overly broad player categories force the statistical comparison of different player roles having vastly different circumstantial difficulty (see, for example, this study of players with similar attacking midfield roles to Lamar Neagle). Often, difficult behavior is associated with exactly those aspects of play that lead to team success:

“Chances” are defined here as the sum of all assists, key passes, and shots. Offensive “touches” are the sum of basic passes, cross attempts, and shots. Evaluating player performance based on skill-dependent statistics is dependent upon a thorough assessment of player behavior. We need player typing to be as diverse as on-field roles, and as indifferent to nominal “position” as possible. The statistics used to characterize type should be characteristic of role and as far removed as possible from player quality/skill (e.g., shooting rate should discriminate attacking players, but the ability to generate shots is descriptive of quality, so it is not useful as a role-dependent statistic). Finally, we shouldn't use so many statistics in constructing a model of roles such that the result becomes overfit to specific players or contains redundancy (e.g. including two different types of basic passing rates – say, short passes and long passes – would exaggerate role difference specific to distribution).

For now, with the 2015 dataset, I assessed pass and defense share as described above. Goalkeepers have been excluded (it is interesting to include them in team analysis, but their position label is relatively effective). I also calculated and recorded dribbles/touch (measuring attacking style on the ball) and crosses per touch (wide vs. central play). I then relativized each of these four role indices to its 210-player maximum and performed a hierarchical cluster analysis on the resulting data matrix:

I chose a position for pruning the tree (dashed line) that identifies 15 discrete player clusters grouped by role similarity by the four indices (this step is arbitrary this time, but will be automated in the future). Alongside each, I’ve roughly characterized the differences picked up in the analysis on a scale of --- (well-below average) to 0 (average) to +++ (well-above). Notice, if we move the cutoff line to the left to define only 3 groups, these would be primary defenders at the top, wide players in the middle, and central attackers at the bottom. Running a principal components analysis on the same dataset, let’s take a look at the differences between nominal position and cluster identity on the two first axes of variation. 
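For readers who want to try the relativize-then-cluster step, here is a small self-contained sketch. The six players and their index values are invented, and the choice of Euclidean distance with average linkage is an assumption: the post does not say which distance or linkage was used.

```python
import math

# Hypothetical role indices, invented for illustration:
# (pass share, defense share, dribbles per touch, crosses per touch)
PLAYERS = {
    "CB 1": (0.10, 0.16, 0.005, 0.001),
    "CB 2": (0.11, 0.15, 0.004, 0.002),
    "Winger 1": (0.06, 0.05, 0.060, 0.050),
    "Winger 2": (0.07, 0.06, 0.055, 0.045),
    "Striker 1": (0.04, 0.02, 0.030, 0.005),
    "Striker 2": (0.05, 0.02, 0.035, 0.004),
}

def relativize(rows):
    """Divide each index by its maximum across players, so each column peaks at 1."""
    maxima = [max(column) for column in zip(*rows.values())]
    return {
        name: tuple(value / top for value, top in zip(values, maxima))
        for name, values in rows.items()
    }

def average_linkage(a, b, points):
    """Mean Euclidean distance between all cross-cluster pairs (assumed linkage)."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def cluster(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[name] for name in points]
    while len(clusters) > k:
        pairs = [
            (average_linkage(clusters[i], clusters[j], points), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

groups = cluster(relativize(PLAYERS), 3)
print(groups)
```

Stopping at `k` clusters plays the role of the pruning line on the dendrogram: a smaller `k` corresponds to moving the cutoff toward the root of the tree.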

The overlap problem with position is considerably reduced (though not absent) with cluster identity. To be useful, the cluster identities must also exhibit superior discrimination of role difficulty. Short pass accuracy is a skill-dependent statistic, but highly variable depending on situation:  

Here the short pass accuracy by position is compared to that by cluster (cluster 11 is excluded, since it is simply Fabian Castillo – the point guard man who never encountered a ball he didn't want to dribble past an opponent). Many clusters exhibit a substantially tighter range of values than for the position counterparts – remember that these categories have not been defined by any values that explicitly measure skill or quality. Within clusters (or between closely related clusters) players should show similar statistical performance unless otherwise influenced by skill (as shown with the previously linked example concerning Neagle). No matter how well we characterize situational difficulty (e.g. how far from goal a shot is taken, or the direction, location and length of a pass), constraining the performance of peers provides a more complete characterization of expected result.

Providing context for player evaluation is only part of the value of this approach. The performance of individual players is strongly controlled by myriad factors even beyond team and role context. Grouping similar players may allow us to address questions that would be otherwise complicated by sample size. Take, for example, the question of whether any player can be considered to overperform or underperform expected goals.  

If a style-specific skill in finishing exists, the grouping of similar players – with the resulting increase in sample size – might allow its detection more readily than would be the case measuring goal records for an individual player subject to seasonal noise, team context, and age-related development trends. However, the modest differences between xG and G in the data above should probably be considered a vindication of the model, if anything. Attackers with substantially different on-field roles and shot selection still exhibit predicted finishing success. Still, this approach may warrant further testing in the future with more refined role discrimination and a larger dataset.

The four-index model above warrants more work. Some player groups are very effective, but others clearly could benefit from different weighting prior to clustering and/or additional indices. Take, for example, cluster 15 which mainly incorporates central attacking players with fairly average pass share. The cluster also picked up Vancouver CB Pa Modou Kah, who has exhibited abnormally low pass and defense shares for his role so far in 2015. The present dataset may also suffer from limited sample size (any set of a few games may lead to some very unusual game states and corresponding performance). Nevertheless, preliminary work suggests player typing may be a useful analytical tool.

The Weekend Kick-off: Texas Two Step

by Harrison Crow (@Harrison_Crow)

If there is one thing that we know about sports it's simply that familiarity breeds hate. Classy line... and one that I had to steal because this introduction was, for some absurd reason, killing me. Face it, Houston moving back to the Western Conference this past year probably excited a lot of fans as it could mean a more prominent and possibly resurgent Texas Derby with both clubs meeting more often than once a year.

I feel as if the most quoted thing in connection with the Dynamo is how soon can they get Erik 'Cubo' Torres. I don't want to exaggerate and call them a terrible team, but they haven't had a real good showing of late. Either their defense is terrible, their attack is anemic, or it's some gross combination of the two. The sick thing about this is that our numbers indicate they might actually end up being the better team.

Okay, Mr. Snooty. You can point to the current standings and wag your finger at FC Dallas but indulge me for a moment. Forget about Houston being tied for seventh place in points per game; they have thus far been the inverse of FC Dallas, with a smidge more than an expected goal per match and just less than one expected goal against. This presents the possibility, despite the disparity in the standings, that these two teams are a lot closer than many would readily admit.

I think it's fair to suspect FC Dallas might go on a downward spiral at some point in their 2015 campaign. Not because they're "Dallas" and thus an easy call, but because of the number of shots they're surrendering, the leverage index of those shots and the fact that they are who they are. Also, Dallas becomes unbearable in the summertime, according to my own personal research and experience of it being "hot as balls" when I visited.

That being said, Dallas has a great quartet of Mauro Diaz, Tesho Akindele, Blas Perez and Fabian Castillo. While this group has been described with excess hyperbole by many early in the season, it's still a very good grouping of talent that can hurt you very quickly and through multiple delivery methods.

Michel and Diaz are both gifted at delivering from dead balls and set pieces. Castillo and Akindele have tons of physical gifts mixed with fun technical abilities that make watching highlights a joy. Blas Perez is a brute who wins balls in the air and is excellent back to goal. Let's not attempt to convince ourselves that this attack is not going to get better at some point.

I think this game boils down to which team can find the right mix of shots and leverage opportunity. Will Dallas finally start taking more regular attempts as those opportunities are presented, or will they squander them looking for the best chance that might not come?

Likewise I think Houston needs to use their creativity to find shots that aren't just shots added to a tally but are meaningful in the way that might increase the probability in their favor.

FANTASY PERSPECTIVE

HOUSTON DYNAMO

Tyler Deric (Selected 17.8% , Cost $5.0)
Surprisingly enough Deric has been a top-three keeper in MLS according to our G -xG rankings. Houston's shots allowed gives credence to the idea that he might just be able to sustain this.

DeMarcus Beasley (Selected 14.9% , Cost $7.1)
Possibly one of the best all-around fullbacks in MLS and right now the best fantasy fullback fake money can buy. The question you have to ask yourself is: do you value fullbacks on defense over centerbacks, who have dominated the season thus far?

FC DALLAS

Chris Seitz (Selected 20.8%, Cost $4.9)
A solid keeper in his own right, Seitz's ownership mostly stems from the three clean sheets in the first four matches of the season. But the Dallas defense is allowing a lot of shots, which kind of limits his long-term value.

Ryan Hollingshead (Selected 20.8%, Cost $5.4)
The injuries sustained by Mauro Diaz are directly related to the early minutes Hollingshead received. His cost is reasonable, but with the return of Diaz it's a legit question how many minutes he's going to see regularly.

THE WEEKEND MATCH-UPS

(expected goal differential in even game-states)

FRIDAY

FC Dallas (0.05) @ Houston Dynamo (-0.10)
Prediction: Draw

San Jose (-0.03) @ Real Salt Lake (-0.41)
Prediction: Draw

SATURDAY

Toronto FC (-0.23) @ Philadelphia Union (-0.06)
Prediction: Draw

Columbus Crew SC (0.30) @ DC United (-0.54)
Prediction: CCSC, FTW!

Colorado Rapids (-0.19) @ LA Galaxy (0.00)
Prediction: Draw

Vancouver (0.06) @ Portland Timbers FC (0.00)
Prediction: Draw

SUNDAY

Chicago Fire (-0.08) @ Sporting Kansas City (0.63)
Prediction: SKC!

Seattle Sounders FC (0.04) @ New York City FC (-0.53)
Prediction: Draw

 

NERD IMAGERY

 

Yeah, go see it before all your friends do and spoil all the good parts.

Proactivity Doesn't Mean Success in 2015

By Jared Young (@jaredeyoung)

For more on Pscore, see last month's post.

As April comes to a close, Orlando City is still, by a fair margin, the most proactive team in MLS. They are joined closely by Montreal, NYCFC and Columbus. Unlike last year, where the top seven most proactive teams made the playoffs, this season proactive play is no guarantee for success, with only Columbus playing well in the top half of the East.

Each month I’ll change up the table to give some different looks. I’ll also try to look at how Proactive Score relates to other statistics to see if Proactive Score is making sense in a larger context.

This month I split the PScore between home and away. Last month I pointed out that Portland, Real Salt Lake and Sporting Kansas City were all playing more reactively than in past seasons. What’s interesting is that they are all playing more reactively at home than they are away. On the flip side, Toronto FC was a very reactive side last year but looks like it could ultimately be one of the more proactive teams. Despite no home games this year, they currently rank 7th in the league.

Team   Rank  Last  Pscore  PPG  Pscore Home  Pscore Away  Home Dif
ORL       1     1     7.5    1          7.3          7.8      -0.5
MON       2     2     6.8  0.5            9            6         3
CLB       3    10     6.4  1.6          7.8          4.7       3.1
NYCFC     4    12     6.4  0.8          5.8            7      -1.3
NYRB      5     3     6.3    2          7.7            5       2.7
CHI       6     4       6  1.5          5.8          6.5      -0.8
TOR       7     7     5.8    1            -          5.8         -
SEA       8     5     5.7  1.9          5.8          5.7       0.1
LA        9    11     5.6  1.5            7          4.3       2.8
NE       10    15     5.4  1.8          5.5          5.3       0.3
POR      11    16     5.4  1.1            4          6.8      -2.8
DC       12     6     5.3    2          4.8            6      -1.3
VAN      13    13       5  1.8          5.6          4.3       1.4
COL      14    14     4.7    1          6.3          2.7       3.6
SJ       15    18     4.7  1.3          5.3          4.3       1.1
HOU      16     9     4.6  1.3          4.8          4.3       0.5
PHI      17     8     4.3  0.7          5.5          3.4       2.1
SKC      18    20     4.1  1.3          3.8          4.5      -0.8
RSL      19    17     3.9  1.3          3.7            4      -0.3
DAL      20    19     3.5  1.8          2.8          4.7      -1.9

This month we’ll start with a fairly easy comparison of PScore against pass completion rates and possession. Not surprisingly, PScore and pass completion are strongly correlated, with an RSquared of .63.

Given PScore is built on long passes and backward passes it stands to reason that teams that prefer short backwards passes will complete a higher percentage of their attempts. Orlando City is the team in the upper right. This chart highlights why a team like FC Dallas can be so good but complete the 3rd lowest percentage of passes in the league. The reason is that they attempt a higher volume of forward and long passes than the rest of the league.

Here is the comparison of pass completion rate and possession:

And here is the comparison of PScore and possession. 

It’s interesting that PScore predicts possession levels just as well as pass completion rate. But something else pops for me here that is worth tracking in the future. Look at the two data points for Orlando City and Montreal. Their PScore level indicates they should be enjoying a much higher level of possession. Both those clubs are underperforming in the bottom half of their conference. On the flip side, FC Dallas is enjoying much more possession than their PScore suggests they should be, and they are performing very well in the West. The other two data points well below the line and to the right of FC Dallas are Real Salt Lake and Sporting Kansas City. Both of those clubs have had rocky starts this year but are perennial contenders.

It’s another interesting angle to watch. If we look at how proactive a team is and compare that to the possession they should expect to have, can we assess how well the team is performing? I ran a quick regression that took the error in the possession-to-PScore relationship (how far each team's possession sits from the fitted line) and compared it to points. The RSquared was just 6% but trending in the right direction. One of the issues is that there are many points with very little error. It may only be useful when looking at large error levels.
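The two-step idea (fit possession on PScore, then regress points on the leftover error) can be sketched with ordinary least squares. The five teams and all of their numbers below are invented for illustration. They are not the league values, and the result will not match the 6% figure above.

```python
# Hypothetical teams, invented for illustration: PScore, possession %, points per game
PSCORE = [7.5, 6.4, 5.6, 4.7, 3.5]
POSSESSION = [52.0, 53.5, 50.0, 47.0, 49.0]
PPG = [1.0, 1.6, 1.5, 1.0, 1.8]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Actual minus fitted value for each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Step 1: how far each team's possession sits from what its PScore predicts.
possession_error = residuals(PSCORE, POSSESSION)
# Step 2: how much of the variation in points that error explains.
print(round(r_squared(possession_error, PPG), 3))
```

A positive residual means a team has more of the ball than its PScore predicts, the situation described for FC Dallas above.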

Next month we’ll revisit the importance of being more proactive or reactive than your opponent.

 

The Weekend Kick-Off: New York City FC To Play With Fire; Hoping Not To Get Burned

by Harrison Crow (@Harrison_Crow)

This season started off with Chicago being a punch line. They looked bad against LA and were arguably worse against Vancouver a week later. They were not aesthetically pleasing and their new star Shaun Maloney wasn't doing much to inspire visions of a team turnaround.

Nearly five weeks later the team has back-to-back wins and Maloney is not looking as bad (sporting an xG+xA of .81). Shockingly enough, Chicago isn't the dumpster fire it once was. There may even be enough pieces with the return of Mike Magee to make a push for a playoff spot.

I'm not trying to get ahead of myself; there are still 29 more matches to play. Chicago could still be a bad team but there is something about having either an above average defense or offense that presents a complicated variable.

Chicago might be a mess defensively (1.40 xG against) but their attack has all sorts of interesting pieces. Harry "don't call me Harrison" Shipp is perhaps one of the most interesting American attacking pieces in Major League Soccer. Kennedy Igboananike is very quietly having a strong first year. Quincy Amarikwa is still doing his thing as perhaps the most under-appreciated striker in MLS, and Joevin Jones has been a nice little pick-up too.

The sum of the team has melded to make a greater whole than the individuals. We'll see this weekend if their success can continue.

Whereas Chicago has been defeated by their poor and mistake-prone defense, it's been New York City's moments without David Villa on the ball that have been their downfall. Villa has been worth just about every penny. The problem has been outside of Villa. They've gotten league average assistance from his trio of strike partners Khiry Shelton, Adam Nemec and Patrick Mullins. But inconsistent creation from the midfield and a defense that is still trying to get on the same page have created problems.

Mikkel Diskerud has shown moments of brilliance between his slick passes and curling shots finding holes for goals. But he's still working through adjustments to the league and he's perhaps not the pure creator that Jason Kreis or NYC needs. Maybe Frank Lampard will be that person, and maybe not. Maybe it will take the summer transfer window to acquire that player.

Right now, New York City boasts a defense that has pieces and talent but somehow hasn't yet translated that to being successful. Currently averaging 1.40 xG against and standing 15th overall, the scary thing is that their PDO is sitting around 987, right near the normal resting heart rate of a club. In other words, they probably are what they are as a team. I'm sure they'll have some ups and downs through the season, but without limiting the shots this club isn't going to really take that step.

They already found out that backup striker Tony Taylor is out for the season. Should NYC lose out on Villa tonight, and that's the current rumor going around, they will have to not only figure out how to make up the difference in his ability to create and score goals but also hold at bay a team that actually has a decent attack of their own.

The real outcome of this match will boil down to whose defense holds. Will Sean Johnson show up for this match and can Josh Saunders continue to be an above average keeper? This season is still young and while a single game hardly defines the destiny of a season, I suspect these two clubs will be dancing around each other through the season in the standings.

Tonight, for mostly obvious reasons, I'm taking the Chicago Fire for all three points. That said, I wouldn't be surprised if their defense collapsed and a draw was a result but either way I shade in the Fire's direction of earning points.

FANTASY PERSPECTIVE

Chicago Fire

Harrison Shipp (Selected 26.1%, Cost $7.9)
There are few entertaining and redeeming qualities about the Fire and Shipp is one and perhaps all of them at the same time. I can't imagine that his cost is going to stay suppressed for much longer if he keeps putting together the goal scoring opportunities for his strikers and finding the back of the net himself.

Lovel Palmer (Selected 12.3%, Cost $5.9)
There are few players in MLS as versatile as Palmer, which translates to more minutes. He'll never be an individual that puts together huge games in terms of points. But it'll be a consistent point allotment from match to match, and in MLS Fantasy that's a huge quality to find.

New York City FC

David Villa (Selected 20.3%, Cost $10.3)
The 33-year-old Spaniard looks to be out this week, so he probably won't impact fantasy this week, but looking down the road, once he heats up, he'll be the best striker in MLS. Write it down.

Mix (Selected 10.9%, Cost $9.1)
This is one of those occasions that I don't get the price relative to the production that an owner is going to get. There are a lot of people that bought into him early (probably due to the pairing of Villa) and kind of got burned. He's a player that we're still learning about because we didn't have a lot of concrete data on him. I think he still has a bright future with the US and in MLS.

THE WEEKEND MATCH-UPS

(expected goal differential in even game-states)

Saturday

FC Dallas (0.04) @ Colorado Rapids (-0.20)
Prediction: Draw

Philadelphia Union (-0.03) @ Columbus Crew SC (0.29)
Prediction: Columbus

Real Salt Lake (-0.37) @ New England Revolution (0.32)
Prediction: New England

Sporting KC (0.78) @ Houston Dynamo (-0.18)
Prediction: Sporting Kansas City

Sunday

DC United (-0.49) @ Vancouver Whitecaps (0.00)
Prediction: Whitecaps

LA Galaxy (0.08) @ New York Red Bulls (-0.01)
Prediction: Draw

Toronto FC (-0.46) @ Orlando City SC (0.13)
Prediction: Draw

Portland Timbers (0.22) @ Seattle Sounders FC (0.86)
Prediction: Draw

 

NERD IMAGERY

Expected Goals 3.0 Methodology

By Matthias Kullowatz (@mattyanselmo)

Michael Bertin of Deadspin recently critiqued the expected goals craze that is rushing through advanced soccer metrics. He specifically noted that so many expected goals models are currently proprietary, hidden inside of black boxes. We here at ASA have sought to be as transparent as possible, and so we have published our logistic* expected goals models in the Explanation section of our xGoals 3.0 tab above.

Many of the variables in the model are intuitive. The distance from the shooter to the goal obviously affects the difficulty of the shot, as well as the angle from which the shot was taken. Shots off corner kicks have a lower chance of going in--once controlled for shot location, angle, body part, and other factors--because the box is packed. Fastbreak shots off through balls have a high chance of going in because the shooter often has time and space. The variables in the basic shooter/team model include: distance, goal mouth available, whether the shot was headed, whether the shot came off a cross or through ball, and whether the shot came from any one of the various patterns of play including corner kicks, direct free kicks, indirect free kicks, fastbreaks, or penalties. The "regular" pattern of play is included in the intercept term.
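For readers who like to see the moving parts, here is a minimal sketch of how a logistic model of this shape turns shot features into a goal probability. The coefficient values and the dictionary keys are hypothetical, picked only to show the structure; they are not the published ASA numbers, and distance enters through its logarithm as described in the next paragraph.

```python
import math

# Hypothetical coefficients for illustration only (NOT the published values).
COEFFICIENTS = {
    "intercept": -0.5,      # the "regular" pattern of play lives here
    "log_distance": -1.0,   # farther shots are harder
    "goal_mouth": 0.1,      # more visible goal mouth helps
    "headed": -0.7,
    "cross": -0.3,
    "through_ball": 0.5,
    "corner": -0.3,
    "fastbreak": 0.3,
}

FLAGS = ("headed", "cross", "through_ball", "corner", "fastbreak")


def expected_goal_probability(shot):
    """Sum the log odds for one shot, then convert them to a probability."""
    log_odds = COEFFICIENTS["intercept"]
    log_odds += COEFFICIENTS["log_distance"] * math.log(shot["distance"])
    log_odds += COEFFICIENTS["goal_mouth"] * shot["goal_mouth"]
    for flag in FLAGS:
        if shot.get(flag, False):
            log_odds += COEFFICIENTS[flag]
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two made-up shots from 12 yards with the full 8 yards of goal to aim at.
open_play = {"distance": 12.0, "goal_mouth": 8.0}
header_off_cross = {"distance": 12.0, "goal_mouth": 8.0,
                    "headed": True, "cross": True}

print(expected_goal_probability(open_play))
print(expected_goal_probability(header_off_cross))
```

Because every variable simply adds to or subtracts from the log odds, a shot that is both headed and off a cross ends up with a lower probability than the same shot taken with the foot in regular play.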

A recent change we have made is substituting a log-Distance variable into the model for what was just a linear Distance variable. This idea was admittedly inspired by Bertin. Using log-Distance will change some of the output on the blog because the results of extremely close and extremely distant shots were not being as accurately predicted as they are now. Justification for this change can be seen in the graph to the right. The trend is that of a (negative) log function rather than a linear function. Note the spike around 13 yards. These are penalties, and as you can see, our model's calibration is off a bit. Penalties average 13 yards in distance in our data set, though this will not affect the utility of the model because distances are relative.
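A quick way to see why the log version behaves differently at the extremes is to compare how much each term moves the log odds over the same two-yard step, once close to goal and once far away. The slopes below are hypothetical placeholders, not fitted values.

```python
import math

LINEAR_SLOPE = -0.1  # hypothetical slope on raw distance
LOG_SLOPE = -1.0     # hypothetical slope on log-distance


def linear_term(distance):
    return LINEAR_SLOPE * distance


def log_term(distance):
    return LOG_SLOPE * math.log(distance)


def change_in_log_odds(term, start, end):
    """How much the term shifts the log odds when moving from start to end yards."""
    return term(end) - term(start)


for label, term in (("linear", linear_term), ("log", log_term)):
    close = change_in_log_odds(term, 2.0, 4.0)
    far = change_in_log_odds(term, 30.0, 32.0)
    print(label, round(close, 3), round(far, 3))
```

The linear term charges the same penalty for two extra yards no matter where the shot starts, while the log term penalizes the move from 2 to 4 yards far more than the move from 30 to 32, which is the shape the graph suggests.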

I have also updated how the model treats the width of the goal mouth available to the shooter. From straight on, a shooter has eight yards from left post to right post. But as his angle gets worse, that width available can shrink considerably. To appropriately model the effect of goal mouth availability, I used a quadratic function, which is justified to the right. The plot shows how the log odds of a goal change due to angle, with diminishing returns for better angles. Here, shot distance is frozen between 9 and 15 yards. 
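Here is one simple way to picture the shrinking goal mouth, offered as a sketch rather than as the exact calculation behind the model. It assumes the goal is 8 yards wide, measures the shooter's position as a lateral offset from the center of the goal and a depth out from the goal line, and treats the available width as the goal's width projected perpendicular to the shooter's line of sight. The quadratic coefficients are hypothetical.

```python
import math

GOAL_WIDTH = 8.0  # yards, post to post


def available_goal_mouth(lateral_offset, depth):
    """Apparent goal width (yards) for a shooter `depth` yards off the goal line.

    Assumption: the goal line is projected onto a line perpendicular to the
    shooter's view of the center of the goal. `depth` must be positive.
    """
    distance_to_center = math.hypot(lateral_offset, depth)
    return GOAL_WIDTH * depth / distance_to_center


def goal_mouth_log_odds(width, linear=0.40, quadratic=-0.02):
    """Quadratic contribution to the log odds (hypothetical coefficients)."""
    return linear * width + quadratic * width ** 2


print(available_goal_mouth(0.0, 12.0))   # straight on
print(available_goal_mouth(12.0, 12.0))  # 45 degrees off center
print(goal_mouth_log_odds(4.0) - goal_mouth_log_odds(2.0))
print(goal_mouth_log_odds(8.0) - goal_mouth_log_odds(6.0))
```

With a negative quadratic coefficient, gaining two yards of goal mouth helps a lot when the angle is tight and much less when the shooter can already see most of the goal, which is the diminishing-returns pattern described above.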

 

Additional Keeper Model Variables

The height of the shot in the goal mouth is also important. Players aim both low and high to try and beat the keeper, and justification for that strategy is borne out beautifully in the graph shown to the right. The log odds of a goal increase the further the shot height is from a comfortable 3.5 feet. The decline in log odds between about 6.5 and 8 feet is a bit perplexing, though. I controlled for distance on this graph, but not other factors. It turns out that 21 percent of all shots in the upper portion of the goal mouth were headed, versus just 14 percent of shots below that zone. This surely plays a role in the strange behavior between heights of 6.5 and 8 feet, and we have controlled for headed shots in the model. Here, shot distance is frozen between 15 and 21 yards.

The last variable I'm going to justify is the linear version of the lateral distance a keeper had to move to make a save. This was the hardest part of the model mathematically, as it required some tricky analytic geometry and some basic assumptions about keeper positioning that aren't always true. Basically, we assume that keepers position themselves along the angle bisector of the two rays that extend from the shot to both posts. If they don't, then they should (usually). The lateral distance to the shot is then measured along a line that goes through the near post, perpendicular to the angle bisector. The geometry, as well as justification for the linear term in the model, are shown below. Again, there is strange behavior in the log odds when the lateral distance is between 3.5 and 4. This is because very few shots are taken from straight on, and thus the sample size is incredibly small and subject to weird fluctuation. Here, shot distance is frozen between 9 and 15 yards.
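The geometry is easier to follow in code than in words, so here is a sketch of one reading of it. The coordinate system is an assumption of this example: the goal line is y = 0, the posts sit at x = -4 and x = +4 yards, the shooter stands at (x, y) with y > 0, and `target_x` is where the shot would cross the goal line. The keeper is placed where the angle bisector meets the line through the near post, and the lateral distance is measured along that line to the shot's path.

```python
import math

LEFT_POST = (-4.0, 0.0)
RIGHT_POST = (4.0, 0.0)


def _unit(vx, vy):
    length = math.hypot(vx, vy)
    return vx / length, vy / length


def keeper_lateral_distance(shooter, target_x):
    """Distance (yards) from the assumed keeper spot to the shot's path."""
    sx, sy = shooter
    # Unit vectors along the two rays from the shot to each post.
    ux1, uy1 = _unit(LEFT_POST[0] - sx, LEFT_POST[1] - sy)
    ux2, uy2 = _unit(RIGHT_POST[0] - sx, RIGHT_POST[1] - sy)
    # The sum of two unit vectors points along their angle bisector.
    bx, by = _unit(ux1 + ux2, uy1 + uy2)
    # Near post: whichever post is closer to the shooter.
    near = min((LEFT_POST, RIGHT_POST),
               key=lambda post: math.hypot(post[0] - sx, post[1] - sy))
    # Distance along the bisector to the line through the near post
    # that runs perpendicular to the bisector.
    t = (near[0] - sx) * bx + (near[1] - sy) * by
    keeper = (sx + t * bx, sy + t * by)
    # Point where the shot's path crosses that same line.
    gx, gy = _unit(target_x - sx, -sy)
    s = t / (gx * bx + gy * by)
    crossing = (sx + s * gx, sy + s * gy)
    return math.hypot(crossing[0] - keeper[0], crossing[1] - keeper[1])


print(keeper_lateral_distance((0.0, 12.0), 4.0))    # straight on, at the post
print(keeper_lateral_distance((10.0, 10.0), 4.0))   # from an angle, near post
print(keeper_lateral_distance((10.0, 10.0), -4.0))  # from an angle, far post
```

Under these assumptions the lateral distance can only reach the full 4 yards when the shot comes from dead center, and shots from an angle top out below that. That lines up with the thin sample between 3.5 and 4 noted above.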

 

For logistic models (and many other generalized linear models and non-linear models), the R-square value is not a particularly intuitive value. I hope the p-values in the models above, in addition to the graphs and basic logic about soccer, help to justify our Expected Goals 3.0 model.

*Logistic models use a log odds response instead of a probability. This is because linear models by themselves could potentially arrive at probabilities above 1.0 or below 0.0. Log odds are the natural logarithm of the ratio of probability of success "p" to probability of failure "1 - p," or ln[p/(1-p)]. 
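To make the footnote concrete, the sketch below converts between probabilities and log odds and shows the problem with a plain straight-line probability model. The numbers in both toy models are made up for illustration.

```python
import math


def log_odds(p):
    """ln[p / (1 - p)] for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def probability(value):
    """Invert the log odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-value))


def naive_linear_probability(distance):
    # Hypothetical straight-line model fit directly to probability.
    return 0.60 - 0.03 * distance


def logistic_probability(distance):
    # Hypothetical straight-line model fit to the log odds instead.
    return probability(1.0 - 0.15 * distance)


for distance in (5.0, 15.0, 30.0):
    print(distance,
          round(naive_linear_probability(distance), 3),
          round(logistic_probability(distance), 3))
```

At 30 yards the straight-line version produces a negative "probability," while the log odds version keeps shrinking toward zero without ever crossing it.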

The Weekend Kick-Off: A Coast-to-coast trip

By Harrison Crow (@Harrison_Crow)

A mid-week US Men's National Team game, a Thursday evening game between Philly and NYC, all leading up to a Friday game? Maybe I should have posted this "kick-off" on Wednesday? Well, forgive me. At least there has been plenty of American soccer to go around this week, which has cut into my time with the new seasons of Property Brothers that hit Netflix last week (why do they always go with the smaller Reno budget???). I'll live.

Let's get to this week's Friday night game of San Jose traveling to Harrison, New Jersey to take on the New York Red Bulls (notice, I didn't use sarcastic tone or put New York in parentheses? Be proud of me, this is growth).

San Jose will be without exciting newcomer Innocent Emeghara, who was suspended by MLS, and defender Shaun Francis, who is out one to two months with a fractured cheekbone. However, Dom Kinnear and company got a bit o' luck with Chris Wondolowski, who was with the US national team but played zero minutes. Wondo ranks 11th in the league in xGoals + xAssists, indicating that he is a crucial piece of the Earthquakes' offense, and he should be available tonight.

Jesse Marsch is in a much better position with his line-up. A healthy attack of Bradley Wright-Phillips, Lloyd Sam and Felipe Martins with Sascha Kljestan launching passes into the attacking third makes for an altogether overwhelming task for San Jose defenders Clarence Goodson and Victor Bernardez. That said, Marsch will still have to deal with his own set of missing personnel, with the potential unavailability of both Ronald Zubar and Damien Perrinelle.

Something that we talk a lot about around these parts is the traveling conditions for teams that are traveling West-to-East and East-to-West. I haven't done research on it, but it's something that Drew has talked a lot about. I don't like speculating on things for which I have no data in front of my face, but I feel like East-West travel through time zones has been shown to have a hangover effect on away teams (anyone that wants to do a study and needs some help give us a shout). Going into this, my mind thinks San Jose has a lot to overcome.

But we're not here to get opinions. We're here for facts. That's why you come to read this blog...mostly. It might also be my winning personality and Property Brothers mentions.

The Red Bulls are tied for first in points per-game within Major League Soccer, cohabiting that position with DC United. Paradoxically, both teams are ranked toward the bottom of our expected goals tables, so perhaps some regression is coming. If not this week, soon.

Currently the Red Bulls are tied (with Real Salt Lake... you can't script this stuff) for the second-highest PDO, which, as we discussed last week, is a barometer for exceeding or falling short of likely expectations, especially early in the season. Their shots-against total is a very under-discussed talking point that could end up costing them some points in the future, especially when their finishing rate against (6.8%) is almost sure to rise in the coming weeks.
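For anyone new to PDO, here is a small sketch of the arithmetic, assuming the common definition: a team's own finishing rate plus its save rate (one minus the finishing rate against), scaled so that 1000 is break-even. Some versions count only shots on target, so treat the function as illustrative. The 6.8% finishing rate against comes from above; the 10% finishing rate for is a hypothetical placeholder.

```python
def pdo(finishing_rate_for, finishing_rate_against):
    """PDO on the usual 1000-centered scale (assumed definition)."""
    save_rate = 1.0 - finishing_rate_against
    return 1000.0 * (finishing_rate_for + save_rate)


HYPOTHETICAL_FINISHING_FOR = 0.10  # placeholder, not a real Red Bulls number

# With opponents finishing only 6.8% of their shots...
print(pdo(HYPOTHETICAL_FINISHING_FOR, 0.068))
# ...versus opponents finishing at the same 10% clip.
print(pdo(HYPOTHETICAL_FINISHING_FOR, 0.10))
```

In this made-up case the PDO sits at about 1032 while opponents are finishing 6.8% of their shots and falls back to roughly 1000 once they finish at the same rate as the Red Bulls, which is the kind of regression the paragraph above is warning about.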

I still think that the Red Bulls are the better team, and considering they are a very good home team and San Jose is making a cross country trip, things kind of lie in their favor. But don't be surprised if San Jose finds some cheap goals and still gets a point.

That said, PREDICTION: I'm going with the Red Bulls.

FANTASY PERSPECTIVE:

San Jose Earthquakes

Fatai Alashe (owned %6.6 - worth $5.1)
Alashe is getting plenty of selections by owners with the growing importance within MLS Fantasy of having someone cheap that is going to see minutes on your bench. His performance for SJ isn't about getting a bunch of points--though he's had some solid moments--it's about making sure you get some points. He's started four of six games played by San Jose this year.

Chris Wondolowski (owned %5.1 - worth $10.7)
Wondolowski is the most consistent goal scorer in MLS not named Robbie Keane. Goals scored isn't everything in MLS Fantasy, but it's of course the big point-getter, and there are few that are going to be worth that much of an investment. As mentioned above, Wondo is 11th in the league in combined xGoals and xAssists. He's a key part of that offense.

 

New York Red Bulls

Bradley Wright-Phillips (owned %12.9 - worth $10.9)
BWP is showing that he's more than just Thierry Henry's last project with two goals and two assists in four games. My initial concern is that he's creating fewer shots, especially considering that last season's 27 goals came not just from quality chances but also volume of shots (109). However, our numbers have him at 3.46 xG+xA, which is fifth in the league and first on a per-game basis.

Lloyd Sam (owned %8.9 - worth $8.8)
Sam is in a similar situation to BWP, having gathered less total xG than what he's actually scored. But, just like Wright-Phillips, it's not as if he's overachieving by much. His expected assists total is over one, putting him on pace for 8 - 10 assists this season. I really like Sam, and I fully expect there is going to be a tough moment where I have to come to the realization that he might not be worth the price relative to the other market options, which makes me sad, but for now he's doing great and I expect him to continue to do so.

The Weekend Match-ups:

Saturday

Houston (-0.49) at DC United (-0.82)
Prediction: Draw

Orlando City (0.26) at Columbus Crew SC (0.11)
Prediction: Draw

Toronto (-0.56) at FC Dallas (-0.12)
Prediction: F-C-D

Seattle Sounders (0.62) at Colorado (-0.06)
Prediction: EBFG, Sounders

Vancouver (0.00) at Real Salt Lake (-0.62)
Prediction: Southsiders

Sporting KC (1.26) at LA Galaxy (0.32)
Prediction: SKC, but this is one of the more mind boggling match-ups--I may come back to this one in a few weeks.

 

Sunday
New England (0.34) at Philadelphia Union (0.61)
Prediction: Union, because it has to eventually happen.

Portland Timbers (0.37) at New York City (-0.91)
Prediction: Cascadia with a third win on the day. #BestCoast

 

NERD IMAGERY OF THE WEEK:
 

 

One day I'm going to finish my Marvel meets MLS post and you're all going to hate it. For the time being I want you to think about how much Dax McCarty looks like Remy Lebeau. You're welcome.

 

Mexico at USMNT: Klinsmann stays the course

By Jared Young (@jaredeyoung)

The USMNT avoided their trademark collapse on Wednesday and easily defeated their arch-rival Mexico by the classic score of dos a cero. The final score was about the only stat that changed for Jurgen Klinsmann’s team, however, as the USA continued the style of play that has characterized their post-World Cup friendlies. Klinsmann continued to experiment with new players and played a conservative style focused on getting good shots while limiting the opponents’ quality chances. He said that he was starting to home in on the Gold Cup, so fans might have expected the US to come out of their shell. Perhaps the surprise of the match was that they stayed the course, in what could be Klinsmann’s preferred strategy for the next cycle.

Klinsmann went with a 4-4-2 diamond setup, while El Tri came out in a conservative 5-3-2. Both teams offered very low defensive pressure to start the game before slowly opening up. The teams combined for just 8 shots in the first half, with only two attempted inside the 18-yard box. There was just no space for either offense to operate.

In the second half as the teams opened up, it was brilliant play from Michael Bradley combined with a little luck and solid finishing that gave the US their only two goals of the game. Jordan Morris, a 20-year-old, scored his first goal for the USMNT. Much will be made of Morris being a college player, but we need to remember that most of the best players in the world are not playing soccer in college. It’s simply not part of a good player’s development in any country but the US. Just over four years ago, the second goal scorer of this match, Juan Agudelo, scored a USMNT goal as a 17-year-old. Did it matter that he was or was not in college? Heck, he wasn’t old enough to be in college. The media loves a good story, but this country won’t show soccer maturity until we can bring that global perspective to the game. Celebrate a young player scoring and give that context, just please not the fact that he’s choosing to play in college.

486 minutes from “newbies”: Klinsmann said his focus was turning to the Gold Cup, but he continued to experiment with new players. More than half of the minutes played were by players who did not play in the World Cup. This was the second highest minute total for the young guys in this series of friendlies, only exceeded by the Switzerland match.

72% pass completion percentage: Blame the poor field conditions but this pass completion percentage was the lowest from the US during this cycle. When a team is sitting deep, low completion percentages are expected, but at home this was perhaps too sloppy a number.

Four shots on target for USMNT to two for Mexico: Yet again, the USMNT gained the advantage in shots on target despite giving up more shots overall. Mexico outshot the US 12-8, but eight of Mexico’s shots were Hail Marys from outside the 18-yard box. The USMNT’s TSR (Total Shots Ratio) since the World Cup is 39%, but they make up for it by putting 44% of their shots on target and getting quality looks. That remained a key strength of the US team against Mexico.

Rough go for Garza: The only space in the attacking half that Mexico found in the first half was in Greg Garza’s area. Garza has been given a long look by Klinsmann in these friendlies. He’s earned the most caps of any non-World Cup player with seven.

The circled passes above were attempted by Mexico in Garza’s area. There was clearly space to operate and Mexico was exploiting it. Yes, it appears that El Tri was building more often down the right side, but the fact that they found so much space in that area is disturbing. Meanwhile DeAndre Yedlin was playing very aggressive defense and his area remained primarily clean. That is, until the 2nd half.

Mexico, perhaps seeing that Yedlin was aggressively playing the ball, shifted their focus to his side. Luckily they didn't have enough success to score a goal. It should be noted that Brek Shea kept his area on Mexico’s right hand side clean in his second half shift.
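A quick aside on the shot numbers quoted earlier, for anyone who wants to check the math. Total Shots Ratio is a team's shots divided by all shots in the match, and the on-target rate is shots on target divided by shots taken. The sketch below plugs in this match's totals (US 8 shots with 4 on target, Mexico 12 shots with 2 on target); the function names are just labels for this example.

```python
def total_shots_ratio(shots_for, shots_against):
    """Share of all shots in the match that belonged to one team."""
    return shots_for / (shots_for + shots_against)


def on_target_rate(shots_on_target, shots):
    """Share of a team's shots that were on target."""
    return shots_on_target / shots


us_tsr = total_shots_ratio(8, 12)
us_on_target = on_target_rate(4, 8)
mexico_on_target = on_target_rate(2, 12)

print(f"US TSR: {us_tsr:.0%}")
print(f"US on target: {us_on_target:.0%}")
print(f"Mexico on target: {mexico_on_target:.0%}")
```

A 40% TSR with half of the shots on target is right in line with the post-World Cup pattern described earlier: fewer shots than the opponent, but better ones.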

A win over your arch-rival will always be good, and this team needed to finish off a match and get a good result. With difficult road friendlies at the Netherlands and Germany on the horizon, we should expect more of the same style from Klinsmann. His speeches about playing proactively with the rest of the world seem to have quieted, but he’s found a nice recipe over the last few friendlies. The US has allowed just four goals in the last four games, and just one in the first half. At the same time they’ve put 13 shots on target and limited their opponents to just 9. The US has converted seven of those 13 shots as well. Hard to complain where the US sits as they approach the Gold Cup in July.

MLS Trade Analysis: Alex for Jason Johnson

By Mike Fotopoulos (@irishoutsider)

Yesterday, the Fire traded Alex to Houston for Jason Johnson. These are the kinds of trades that make my inner MLS capgeek smile. Chicago trades a perfectly average midfielder on a perfectly average contract for a pocket full of cap room and a free player to boot. They needed cap relief and fewer midfielders, and this move gets the job done.

Alex is definitely out of the picture in the Fire midfield with Matt Polster, Michael Stephens, Victor Perez, and likely Chris Ritter and Razvan Cocis ahead of him on the depth chart. Getting Houston to throw in Jason Johnson and his Generation Adidas contract is basically free money. His contract amounts to a free option to see if he pans out, so it would seem that the Fire are coming out ahead on the trade.

The question for Houston is their own need for midfield depth. Given the Dynamo’s current pairing of Nathan Sturgis and Luis Garrido, Alex seems to be bringing exactly that. He has struggled for playing time recently in Chicago, so it is hard to say whether he would be a clear starter over either. More likely, it is a straight purchase of a serviceable midfielder, which is exactly what the Fire put up on offer. They found themselves with depth to sell and were able to find someone to pay them off.

Interestingly enough, bringing in yet another forward player places Chicago back in a position where they can find themselves with more attacking depth. Mike Magee and Patrick Nyarko are still recovering from injury, but it is possible to see the Fire start the summer with an extra player up top. If Johnson can find a role on the current roster, they could see themselves ready to deal again, potentially making another deal along these lines. 

MLS cap space is a precious commodity, and as Chicago continues to repair its roster, optimizing every dollar spent is the key. Trades like this get some dead money off of the bench and also give a free look at a young player, so clubs should take advantage of these situations whenever they arise.