Gold Cup Team Preview: Canada

Despite having the worst FIFA Ranking in the tournament, Canada is a good bet to get out of Group B and advance to the quarterfinals for the first time since 2009. They have quietly been playing very solid soccer for the last year and collected a 5-4-2 (W-D-L) record in their last eleven matches, including impressive draws against Bulgaria, Iceland and Panama. They are 5-1-0 in their last six CONCACAF matches as well. 

How did they get here?
Canada is a co-host for the tournament and therefore an automatic qualifier. This marks their 12th appearance in 13 Gold Cups, so they were a good bet to qualify regardless.

What Group are they in?
They are in Group B with favorite Costa Rica, Jamaica, and El Salvador. The group winner will play the Group A runner-up (probably Panama); the runner-up will play the Group C runner-up (probably Trinidad & Tobago or Guatemala); and the third-place team, if it advances, will play either the Group C winner (Mexico) or the Group A winner (USA).

Mexico at USMNT: Klinsmann stays the course

By Jared Young (@jaredeyoung)

The USMNT avoided their trademark collapse on Wednesday and comfortably defeated arch-rival Mexico by the classic score of dos a cero. The final score, however, was about the only thing that changed for Jurgen Klinsmann’s team, which continued the style of play that has characterized their post-World Cup friendlies. Klinsmann kept experimenting with new players and played a conservative style focused on creating good shots while limiting the opponent’s quality chances. He had said he was starting to home in on the Gold Cup, so fans might have expected the US to come out of their shell. Perhaps the surprise of the match was that they stayed the course, in what could be Klinsmann’s preferred strategy for the next cycle.

Klinsmann went with a 4-4-2 diamond, while El Tri came out in a conservative 5-3-2. Both teams applied very little defensive pressure to start the game before slowly opening up. The teams combined for just eight first-half shots, only two of them from inside the 18-yard box. There was simply no space for either attack to operate.

In the second half, as the teams opened up, brilliant play from Michael Bradley combined with a little luck and solid finishing gave the US their two goals. Jordan Morris, 20, scored his first goal for the USMNT. Much will be made of Morris being a college player, but remember that most of the best players in the world do not play college soccer. College simply isn’t part of a good player’s development in any country but the US. Just over four years ago, Juan Agudelo, who scored the second goal in this match, scored for the USMNT as a 17-year-old. Did it matter whether he was in college? Heck, he wasn’t old enough to be in college. The media loves a good story, but this country won’t show soccer maturity until we bring that global perspective to the game. Celebrate a young player scoring and give it context, just please don’t make the context that he’s choosing to play in college.

486 minutes from “newbies”: Klinsmann said his focus was turning to the Gold Cup, but he continued to experiment with new players. Nearly half of the minutes played went to players who did not play in the World Cup. This was the second-highest total for the newcomers in this series of friendlies, exceeded only by the Switzerland match.
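A quick check on that share, assuming 11 players on the field for 90 regulation minutes (990 player-minutes) and ignoring stoppage time, puts the newcomers’ 486 minutes just under the halfway mark:

```python
# Share of player-minutes played by men who did not play in the World Cup.
# Assumption: 11 players x 90 regulation minutes; stoppage time is ignored,
# and counting it would only lower the share further.
NEWBIE_MINUTES = 486
TOTAL_MINUTES = 11 * 90  # 990 player-minutes

share = NEWBIE_MINUTES / TOTAL_MINUTES
print(f"{share:.1%}")  # 49.1%
```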

72% pass completion: Blame the poor field conditions if you like, but this was the lowest pass completion percentage for the US during this cycle. Low completion percentages are expected when a team is sitting deep, but at home this was perhaps too sloppy a number.

Four shots on target for the USMNT to two for Mexico: Yet again, the USMNT won the shots-on-target battle despite being outshot. Mexico outshot the US 12-8, but eight of Mexico’s shots were Hail Marys from outside the 18-yard box. The USMNT’s TSR (Total Shots Ratio, a team’s share of all shots taken in its matches) since the World Cup is 39%, but they make up for it by putting 44% of their shots on target and getting quality looks. That remained a key strength of the US team against Mexico.
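TSR is a team’s shots divided by all shots taken by both sides. A minimal Python sketch applying that definition, along with the on-target rate, to this match’s figures (the function names are illustrative, not from any library):

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """TSR: a team's share of all shots taken by both sides."""
    return shots_for / (shots_for + shots_against)


def on_target_rate(shots_on_target: int, shots: int) -> float:
    """Share of a team's own shots that were on target."""
    return shots_on_target / shots


# Mexico match: the US was outshot 12-8 but put 4 shots on target to Mexico's 2.
us_tsr = total_shots_ratio(8, 12)          # 0.40
us_on_target = on_target_rate(4, 8)        # 0.50
mexico_on_target = on_target_rate(2, 12)   # about 0.17

print(f"US TSR {us_tsr:.0%}; on target: US {us_on_target:.0%}, Mexico {mexico_on_target:.0%}")
```

A TSR of 40% in this match sits close to the 39% post-World Cup average, while the on-target gap shows where the US made up the difference.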

Rough go for Garza. The only space Mexico found in the attacking half during the first half was in Greg Garza’s area. Garza has been given a long look by Klinsmann in these friendlies: with seven caps, he has the most of any player who did not play in the World Cup.

The circled passes above were attempted by Mexico in Garza’s area. There was clearly space to operate, and Mexico was exploiting it. Yes, El Tri appears to have been building more often down the right side, but the fact that they found so much space in that area is disturbing. Meanwhile, DeAndre Yedlin was playing very aggressive defense and his area remained mostly clean. That is, until the second half.

Mexico, perhaps seeing that Yedlin was aggressively playing the ball, shifted their focus to his side. Luckily, they didn’t find enough success there to score. It should be noted that Brek Shea kept his area, on Mexico’s right-hand side, clean during his second-half shift.

A win over your arch-rival is always good, and this team needed to close out a match and get a good result. With difficult road friendlies in the Netherlands and Germany on the horizon, we should expect more of the same style from Klinsmann. His speeches about playing proactively against the rest of the world seem to have quieted, but he’s found a nice recipe over the last few friendlies. The US has allowed just four goals in the last four games, only one of them in the first half. Over the same stretch, they’ve put 13 shots on target while limiting their opponents to just nine, and they’ve converted seven of those 13. It’s hard to complain about where the US sits as they approach the Gold Cup in July.

Scoring The Proactivity of MLS Teams

By Jared Young (@jaredeyoung)

Last year I became interested in using statistics to measure a team’s style of play. I was inspired by a Jonathan Wilson article that laid out two extreme styles, which he labeled proactive and reactive. Proactive teams are concerned primarily with keeping possession and pressing high on defense to win the ball back as quickly as possible. This is Barcelona and tiki-taka in its purest form. Reactive teams prioritize maintaining their defensive shape, typically apply low defensive pressure, and attack directly.

I’ve revised the score, which I call the P Score, since last time, and the details for the curious are below. One change worth pointing out now: the score is now on a 10-point scale, where 10 indicates a high level of possession and 1 a very reactive team.

Here are the P Score rankings for MLS through March. The columns to the right of the total scores show each team’s proactivity relative to their opponents. To read the table, take Orlando City’s row: the first pair of columns under Less Proactive shows that Orlando City was less proactive than their opponent in 25% of their games and averaged one point per game in those games. A game is considered even if the two teams’ P Scores for that game were within one point of each other.

                                            Less Proactive  Even            More Proactive
Rank  Team                P Score  Pts/Gm   % Gms  Pts/Gm   % Gms  Pts/Gm   % Gms  Pts/Gm
1     Orlando City SC         9.5     1.3     25%       1     25%       3     50%     0.5
2     Montreal Impact         7.3     0.7     33%       0     33%       1     33%       1
3     New York Red Bulls      7.3     2.3      0%       0      0%       0    100%     2.3
4     Chicago                 7.3     0.8      0%       0     50%     1.5     50%       0
5     Seattle                   7     1.3      0%       0     33%       3     67%     0.5
6     D.C. United             6.7       2     33%       0      0%       0     67%       3
7     Toronto FC              6.7       1     33%       0     33%       0     33%       3
8     Philadelphia            6.5     0.5     25%       1     50%       0     25%       1
9     Houston                 6.3     1.3     25%       1     50%     0.5     25%       3
10    Columbus                  6       1     67%       0      0%       0     33%       3
11    L.A. Galaxy               6     1.3     25%       0     50%       2     25%       1
12    NYCFC                     6     1.3     25%       1     25%       1     50%     1.5
13    Vancouver               5.8     2.3    100%     2.3      0%       0      0%       0
14    Colorado                5.3       1     33%       1     67%       1      0%       0
15    New England             5.3       1     25%       0     75%     1.3      0%       0
16    Portland                  5     0.8     25%       1     50%       1     25%       0
17    Salt Lake                 5     1.7      0%       0     33%       3     67%       1
18    San Jose                4.5     1.5     75%       2     25%       0      0%       0
19    FC Dallas                 4     2.5     50%       2     25%       3     25%       3
20    Kansas City             3.5     1.3     67%       2     33%       1      0%       0
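To make the reading concrete, here is a small Python sketch of two rules implied by the text: how a single game is bucketed, and how a team’s overall points per game relates to its three splits (each split’s points per game weighted by its share of games). The function names are hypothetical, and treating “within one point” as inclusive is an assumption.

```python
def relative_style(team_p: float, opp_p: float, margin: float = 1.0) -> str:
    """Classify one game as 'less', 'even' or 'more' proactive than the opponent.

    Assumption: 'within one point' is treated as inclusive (difference <= margin).
    """
    if abs(team_p - opp_p) <= margin:
        return "even"
    return "more" if team_p > opp_p else "less"


def overall_pts_per_game(splits: dict[str, tuple[float, float]]) -> float:
    """Weight each bucket's points per game by its share of games."""
    return sum(share * pts for share, pts in splits.values())


# Orlando City SC's row: (share of games, points per game) for each bucket.
orlando = {"less": (0.25, 1.0), "even": (0.25, 3.0), "more": (0.50, 0.5)}
print(overall_pts_per_game(orlando))  # 1.25, which the table rounds to 1.3
```

Applied to Orlando City’s row, the weighted splits reproduce the table’s overall 1.3 points per game.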

Observations

  • So far, Orlando City SC has the highest P Score at 9.5, significantly higher than second-place Montreal at 7.3.
  • A couple of teams usually known for a possession-oriented style of play are at the bottom of the list. The Portland Timbers’ change of style has been noted, but Sporting Kansas City anchoring the list is a big surprise given their history with a 4-3-3.
  • Two of the best reactive teams last year, New England and Dallas, are again near the bottom of the P Score rankings.
  • Looking at the table in more depth reveals some interesting early trends about where points are concentrated. I summarized the table in a visual below.

What this table says is that if a team is going to be proactive, it’s beneficial to be more proactive than its opponent. The same goes for reactive teams: results are better when a team is more reactive than its opponent. The implication is that commitment to, and execution of, a style of play, regardless of which style, is a key contributor to success. That’s a fascinating finding, and I’ll monitor the numbers over the season as the sample sizes grow.

The New P Score Calculation

The P Score is built on the idea that pass type data can indicate what style of play a team is using. A proactive team will attempt a higher number of shorter passes and should, in theory, have a higher percentage of backward passes. A direct team will attempt longer passes in an effort to counterattack and will have fewer backward passes.

When I developed the P Score for the 2014 season, I was disappointed in the availability of passing data and was forced to use variables I didn’t want to use. The model simply used the percentage of long passes and total passes. Recently, WhoScored added more pass types to its match center, and I’ve evolved the model. I tried most of the available pass types, including short, long, backward, and through passes as well as crosses. I also looked at blocked shots, because reactive teams block a higher percentage of shots than proactive teams. Given their penchant for defensive shape, that makes sense.

I ran a multivariate regression on outcomes from a collection of games from the 2014 season. You can read which games I selected for the dependent variable in the prior post. Only two pass types ended up being statistically significant: the percentage of backward passes and the percentage of long passes. Both coefficients move the score in the direction you would expect: a higher percentage of long passes lowers the score, and a higher percentage of backward passes raises it. I did not use total passes in the model because that variable can be strongly influenced by an opponent, whereas percentages are more likely to indicate a team’s actual intent. The R-squared of the new model was a sturdy 0.79.

The old and new models had similar results. I scored the 2015 season both ways and the correlation between the two is 0.95. Orlando City SC is still the top team and Sporting Kansas City is the bottom team scoring both ways.
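The fitting step can be sketched in Python with NumPy. This is a minimal illustration under assumptions, not the author’s code: the post does not publish its coefficients or the rescaling to the 10-point scale, so neither is hard-coded, and the function names are hypothetical. Given the description above, the expected signs are a negative coefficient on the long-pass share and a positive one on the backward-pass share.

```python
import numpy as np


def pass_mix(long_passes: int, backward_passes: int, total_passes: int) -> tuple[float, float]:
    """Percentage features, so pass *volume* (which opponents influence) is left out."""
    return long_passes / total_passes, backward_passes / total_passes


def fit_ols(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """Least-squares fit with an intercept; returns (coefficients, R-squared).

    X has one row per team-game: [pct_long, pct_backward].
    y holds the dependent-variable value for each game (defined in the prior post).
    """
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r_squared = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coef, float(r_squared)


def score_agreement(old_scores, new_scores) -> float:
    """Pearson correlation between two versions of the score for the same games or teams."""
    return float(np.corrcoef(old_scores, new_scores)[0, 1])
```

Comparing the 2015 scores from the old and new models with a function like `score_agreement` is the kind of check behind the 0.95 correlation quoted above.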

I strongly prefer this version of the model because it looks at the percentage of the type of team passes to indicate style as opposed to anything related to volume, which as I mentioned would be much more likely to be manipulated by an opponent.

If you have any questions about the methodology, please leave a comment or reach out to me on Twitter @jaredeyoung. I’ll be publishing the P Score table monthly throughout the season.

USMNT in Switzerland: Beyond the Score

By Jared Young (@jaredeyoung)

The USMNT took on Switzerland on Tuesday, their ninth friendly since the World Cup, and in the process relinquished their sixth second-half lead. The 1-1 draw wouldn’t have been as much of a disappointment if the result didn’t tell the same story: a team unable to hold a lead against top competition. In the second halves of these friendlies, the USMNT has now conceded eleven goals and scored just one. And that’s all I’m going to say about that. Here are three other stats to take away from the latest international weekend.

9: Is Klinsmann too conservative? Jurgen Klinsmann’s team left Europe without reaching double digits in shot attempts, finishing the two matches with just nine. Is the team too conservative when it comes to shot selection? Three goals in nine attempts is an excellent conversion rate, and a few other shots could easily have been converted, Michael Bradley’s sitter against Switzerland being the most notable. But are too few shots being taken? Consider that eight of the nine attempts were taken inside the box and, even more remarkably, in the area around the penalty spot. The only shot attempted from outside the 18-yard box was Brek Shea’s laser of a free kick that went in. In other words, the team didn’t attempt a single shot from outside the box in the run of play. Pause on that one for a moment.

This weekend the USMNT attempted 18.7 passes in the final third for every shot, while their opponents attempted 10.8 final-third passes per shot. Considering the US was playing a more direct style on offense, that does imply they may be too picky once they get the ball in position. The results this weekend weren’t terrible, especially offensively, but they do raise the question: does the US have the right balance in its shot selection? More on that in part III of this post.

19.8: High energy, low team pressure. Colin Trainor has been publishing work on a metric, passes allowed per defensive action (PPDA), that attempts to measure how much a team employs the high press. The metric counts the opponent’s passes in the 60% of the field farthest from the pressing team’s own goal (the opponent’s defensive half plus roughly the first 20% of its offensive half) and divides them by the pressing team’s defensive actions in that same area. The lower the passes per defensive action, the more intense the high press; a figure in the mid-single digits would indicate a consistent high-pressure strategy. Here is the PPDA chart by team and area of the field.
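The calculation can be sketched in Python. This is an illustration, not Colin Trainor’s code: the event layout, the coordinate convention (x from 0 to 100, measured from the pressing team’s own goal line for every event), and the list of defensive actions (tackles, interceptions, challenges and fouls) are assumptions to adjust to your data source.

```python
from dataclasses import dataclass

# Assumption: which event types count as defensive actions.
DEFENSIVE_ACTIONS = {"tackle", "interception", "challenge", "foul"}


@dataclass
class Event:
    team: str   # team performing the action
    kind: str   # e.g. "pass", "tackle", "interception"
    x: float    # 0-100, measured from the *pressing* team's own goal line


def ppda(events: list[Event], pressing_team: str, zone_start: float = 40.0) -> float:
    """Opponent passes per defensive action in the 60% of the pitch
    farthest from the pressing team's goal (x >= zone_start)."""
    opp_passes = sum(
        1 for e in events
        if e.team != pressing_team and e.kind == "pass" and e.x >= zone_start
    )
    def_actions = sum(
        1 for e in events
        if e.team == pressing_team and e.kind in DEFENSIVE_ACTIONS and e.x >= zone_start
    )
    # No defensive actions in the zone means no press at all: return infinity.
    return opp_passes / def_actions if def_actions else float("inf")
```

Under this definition, a US figure near 20 means the Swiss completed roughly twenty passes in that zone for every US defensive action there.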

You can see from the chart that Switzerland defended much more aggressively up the pitch than the US did. When the action was in the defensive end, both teams employed similar pressure. The result was possession strongly in favor of Switzerland, at over 60%. The US did show high individual energy in their opponent’s offensive half, but that running around was mainly meant to disrupt the Swiss offense as much as possible. The team as a whole was willing to wait before applying significant pressure. We didn’t see a particularly aggressive US team this window, and it makes you wonder whether Klinsmann is going for results in these friendlies instead of pushing his team to be proactive as he did during the last World Cup cycle.

2: Blocked shots against UEFA teams. I know the late-game defense is the big issue, but I’m not done harping on shot selection. In this nine-game stretch the USMNT has gone on the road against four European foes and managed a 1-1-2 (W-D-L) record, though it could easily have been 3-0-1. They did this while attempting just 29 shots in the four games, an average of 7.3. The crazy stat is that only two of those shots were blocked, just 6.9% of the total. A typical blocked-shot percentage is roughly 25%. You can’t argue with the 17% finishing rate in those four games, but it does make you wonder whether the team is too picky on offense.
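The rates in that paragraph follow from simple division. The sketch below reproduces them and also shows how many blocks the typical 25% rate would have produced on the same 29 shots:

```python
TYPICAL_BLOCK_RATE = 0.25  # rough typical figure quoted in the text

shots, blocked, games = 29, 2, 4

shots_per_game = shots / games                  # 7.25, reported as 7.3
block_rate = blocked / shots                    # about 0.069
expected_blocked = TYPICAL_BLOCK_RATE * shots   # 7.25 blocks at a typical rate

print(f"{shots_per_game} shots per game, {block_rate:.1%} blocked, "
      f"about {expected_blocked:.0f} blocked at a typical rate")
```

Two blocks where roughly seven would be typical is what makes the stat stand out.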

Let’s do a little thought experiment to see whether this trend is something that should change. Back to the latest window and the games against Denmark and Switzerland: what if the US took shots as frequently as their opponents but also finished them at their opponents’ lower rate? The numbers would look like this:

The US would have scored only 2.6 goals, rather than the three they actually scored, had they shot like their opponents. So while the sample sizes are clearly small, from here it looks like Klinsmann isn’t too crazy.
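The table for this thought experiment is not reproduced here, but the calculation can be sketched. This is one reading of the method, labeled as an assumption: “as frequently” is taken to mean shots per final-third pass, the measure used earlier, so the US’s hypothetical shots are its final-third passes divided by the opponents’ 10.8 passes per shot, then multiplied by the opponents’ conversion rate. That conversion rate is not stated in the text, so no value is filled in.

```python
def counterfactual_goals(final_third_passes: float,
                         opp_passes_per_shot: float,
                         opp_conversion_rate: float) -> float:
    """Goals the US would score shooting as often as its opponents and finishing
    at the opponents' conversion rate.

    Assumption: shooting frequency is measured per final-third pass.
    """
    hypothetical_shots = final_third_passes / opp_passes_per_shot
    return hypothetical_shots * opp_conversion_rate


# Implied by the text: 18.7 final-third passes per shot over 9 shots.
us_final_third_passes = 18.7 * 9   # about 168 passes
OPP_PASSES_PER_SHOT = 10.8         # from the text

# The opponents' conversion rate is not given in the post; supply the real value:
# counterfactual_goals(us_final_third_passes, OPP_PASSES_PER_SHOT, opp_rate)
```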

Next up for the US is the rowdy rivalry with El Tri in what will hopefully be a Gold Cup Final preview (said by the guy living in Philly, home of the Gold Cup Final).

2015 ASA Preview: D.C. United

*xG = expected goals, xA = expected assists, xGD = expected goal differential. For more information see our xGoals by Team page.

By Jared Young (@jaredeyoung)

Time for soccer analysts to perk up, because D.C. United’s 2015 season is the one to watch if you are into numbers. Why? Because in 2014, D.C. United defied them. And the story of the year, at least for the geeks, is whether D.C. United can do it again.

After one of the worst seasons in franchise history in 2013, D.C. United shocked the league and won the Eastern Conference in 2014. A turnaround like that required improvement on both sides of the ball. United went from easily the worst offense in the league, with 22 goals, to a solid one with 52. And they turned the league’s worst defense, which allowed 59 goals, into one tied for the league’s best at 37.

So what’s the problem? The numbers say it shouldn’t have happened. The ASA expected goals model says D.C. United should have scored just 38 goals, compared with their actual 52. On the defensive side, the ASA model thinks 48.6 goals against would have been a more likely number than the actual 37. In fact, their expected goal ratio was the third worst in MLS. Michael Caley’s expected goals model tells a similar story. Were D.C. United just lucky, or are they doing something the models don’t contemplate? On the offensive side of the equation, the positive story can be traced to two dynamic scorers.

D.C. United’s dynamic duo

If United fans had their way, they’d pair Luis Silva and Fabian Espindola at the top of Ben Olsen’s 4-4-2. And despite the fact that the two play a very similar style as forwards, they’d be right. That pair was the reason for United’s strong shooting. The duo scored 22 goals last season, but the ASA expected goals model suggests they should have scored 11.4. That gap of more than 10 goals accounts for most of the team’s “overproduction” last year.
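The over- and underperformance figures in this preview are simple differences between actual and expected goals. A sketch in Python; the expected goal ratio is assumed here to be xG for divided by total xG for and against, a common definition that may not match ASA’s exactly.

```python
def overperformance(goals: float, expected: float) -> float:
    """G - xG: positive means more goals than the model expected."""
    return goals - expected


def xg_ratio(xg_for: float, xg_against: float) -> float:
    """Assumed definition: the team's share of all expected goals in its games."""
    return xg_for / (xg_for + xg_against)


team_attack_gap = overperformance(52, 38)   # 14 goals above expectation
team_defense_gap = 48.6 - 37                # expected minus actual conceded: about 11.6
duo_gap = overperformance(22, 11.4)         # about 10.6 goals

print(f"Silva and Espindola: {duo_gap / team_attack_gap:.0%} of the attacking gap")
print(f"Expected goal ratio: {xg_ratio(38, 48.6):.3f}")
```

The duo’s roughly 10.6-goal gap is about three quarters of the team’s 14-goal attacking overperformance, which is what “most of the team’s overproduction” amounts to.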

Espindola’s finishing prowess is puzzling because, on average, he took his shots four feet farther from goal than other shooters. Remarkably, he was terrific at avoiding blocked shots. While shooters on average have one in every four attempts blocked, Espindola had that happen roughly once in every eight attempts. He may have been more focused on getting open looks than on how far he was from the goal. Silva’s ability is tougher to figure. Everything but his finishing rate appears average: his shots-on-target rate was slightly higher than average, and 23 percent of his shots were blocked. Unless the team finds new ways to increase its shot totals, United will be depending on the duo’s ability to finish better than expected going into 2015.

What’s changed going into the 2015 season?

The biggest change for the red and black this offseason was actually off the field: confirmation of a new soccer-specific stadium to be built for the 2017 season. That stadium has the opportunity to enhance the soccer experience in the D.C. area and build a bigger base of fans for the team.

From a roster perspective, there were just a few changes. United added Jairo Arrieta from the Columbus Crew as forward depth; he will be a key contributor, especially early on. The most intriguing acquisition was Markus Halsti, a nearly 31-year-old Finnish international midfielder from Malmo FF. Halsti is a versatile, defensive-minded midfielder who also adds depth for the defense. He should fit in well with Perry Kitchen, Davy Arnaud, Nick DeLeon and Chris Pontius. D.C. United also signed their first-round pick, Miguel Aguilar, who figures to get his feet wet on the wings this season.

The lack of change is good news for a defense that tied for the league’s fewest goals against. Emerging second-year star Steve Birnbaum and Bobby Boswell anchor the center of the defense, while Sean Franklin, Taylor Kemp, and Chris Korb will rotate at fullback.

What to expect in 2015

When expected goals models fail, how do you know what to expect? The defense-first United should keep up their stingy ways despite the models. The return of the core defense and the continued development of Bill Hamid, one of the bright young keepers in MLS, should mean Ben Olsen’s squad maintains their perch near the top of MLS.

The offense could prove more difficult to maintain, at least to start the season. Espindola starts the campaign with a six-game suspension, and Silva has been nursing a hamstring injury all preseason. Eddie Johnson’s playing future is uncertain at this point due to an enlarged heart, which leaves Chris Rolfe and Arrieta to maintain the status quo. It could be a rocky opening to the season, but one of United’s strengths is that they have a number of players who can play multiple positions. Rolfe and Pontius, for example, are hybrid offensive players who give Olsen flexibility with both lineups and styles of play.

And therein lies United’s core strength; while they prefer to play defense first and are usually more reactive than their competition, they can win playing all styles. Early on Ben Olsen will need to mix and match players until he lands on a core group and formation. There’s no reason to think that D.C. United will slip so far as to miss the playoffs, unless of course you believe that numbers never lie.  

Do expected goals models lack style?

By Jared Young (@JaredEYoung)

Expected goals models are hip in the land of soccer statistics. If you have developed one, you are no doubt sporting some serious soccer knowledge. But it seems to be consistent across time and geography that the smart kids always lack a bit of style.

If you are reading this post you are probably at least reasonably aware of what an expected goals model is. It tells you how many goals a team should have scored given the shots they took. Analysts can then compare the goals actually scored with the goals a team was expected to score and use that insight to better understand players and teams and their abilities.

The best expected goals models incorporate almost everything imaginable about the shot. What body part did the shooter connect with? What were the exact X,Y coordinates of the shooter? What was the position of the goalie? Did the player receive a pass beforehand? Was it a set piece? All of these factors are part of the model. Like I said, they are really cool.
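The post describes what goes into these models but not any model’s equations. A common way to build one, sketched here with clearly hypothetical coefficients (not the ASA or Caley models), is a logistic regression over shot features, with a team’s xG being the sum of its shots’ scoring probabilities:

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance_m: float   # distance from goal, metres
    angle_rad: float    # angle subtended by the goal mouth
    header: bool
    set_piece: bool
    assisted: bool      # did the shooter receive a pass beforehand?


# Hypothetical coefficients for illustration only; a real model fits these
# to thousands of historical shots.
COEF = {"intercept": -1.0, "distance_m": -0.1, "angle_rad": 1.5,
        "header": -0.8, "set_piece": -0.2, "assisted": 0.3}


def shot_xg(s: Shot) -> float:
    """Probability that a single shot is scored, via a logistic link."""
    z = (COEF["intercept"]
         + COEF["distance_m"] * s.distance_m
         + COEF["angle_rad"] * s.angle_rad
         + COEF["header"] * s.header
         + COEF["set_piece"] * s.set_piece
         + COEF["assisted"] * s.assisted)
    return 1 / (1 + math.exp(-z))


def team_xg(shots: list[Shot]) -> float:
    """Expected goals for a team: the sum of its shots' scoring probabilities."""
    return sum(shot_xg(s) for s in shots)
```

Comparing a team’s actual goals with `team_xg` over its shots gives the G - xG figure the rest of this post works with.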

But as with all models of the real world, there is room for improvement. For example, expected goals models aren’t great at factoring in the number of defenders between the shooter and the goal. Those defenders could cause more blocked shots or simply force the shooter to take a more difficult shot than they would like. At the other end of that spectrum, if a shooter is wide open on a counterattack, the models would likely not recognize the situation and would undervalue the likelihood of a goal. But I may have found something that will help in these instances.

I recently created a score that attempted to numerically define extreme styles of play. On the one end of the score are extreme counterattacking teams (score of 1) and on the other end are extreme possession-oriented teams (score of 7). The question is, if I overlay this score on top of expected goals models, will I find any opportunities like those mentioned above? It appears there are indeed places where looking at style will help.

I have only scored one full MLS season with the Proactive Score (PScore), so I’ll start with MLS in 2014, where I found two expected goals models with sufficient data. There is the model managed here by the American Soccer Analysis team (us!) and there is the publicly available data compiled by Michael Caley (@MC_of_A). Here is a chart of the full season’s average PScore and the difference between goals scored and expected goals scored for the ASA model and Michael Caley’s model.

Both models are pretty similar. If you were to draw a straight-line regression through this data, you would find nothing in particular. But letting a polynomial curve find the best fit reveals an interesting pattern in both charts. When PScores are below 3, indicating strong counterattacking play, both models consistently underpredict the number of goals scored. This makes sense given what I mentioned above: teams committed to the counterattack should find more space when shooting and should have a better chance of making their shots. Michael Caley’s model does a better job handling this, but there is still room for improvement.
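The line-versus-curve comparison can be reproduced with NumPy’s `polyfit`. A sketch: the post does not give the polynomial’s degree, so a quadratic (degree 2) is assumed, and the function name is hypothetical. Comparing R-squared shows whether the curve captures a pattern that a straight line misses.

```python
import numpy as np


def compare_fits(pscore, g_minus_xg, degree: int = 2):
    """Fit a straight line and a polynomial of (G - xG) on PScore.

    Returns (line_coeffs, line_r2, curve_coeffs, curve_r2).
    """
    x = np.asarray(pscore, dtype=float)
    y = np.asarray(g_minus_xg, dtype=float)

    def r_squared(coeffs):
        resid = y - np.polyval(coeffs, x)
        return float(1 - resid @ resid / np.sum((y - y.mean()) ** 2))

    line = np.polyfit(x, y, 1)
    curve = np.polyfit(x, y, degree)
    return line, r_squared(line), curve, r_squared(curve)
```

With a U-shaped relationship like the one described (underprediction at both ends of the PScore range), the straight line can show an R-squared near zero while the quadratic fits much better.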

It’s worth pointing out that teams that rely on the counterattack tend to be teams that consider themselves less talented (I repeat, tend to be). You would expect less-talented teams to also have worse-than-average shooters. The fact that counterattacking teams outperform the model indicates they might be overcoming a talent gap to do so.

On the other hand, when the PScore is greater than 4, the models also underpredict actual performance. This, however, might be for a different reason. Possession-oriented teams usually face more defenders when shooting, so the bias here may instead reflect the fact that teams able to outpossess their opponents to that degree also have the shooting talent to outperform the model.

Notice also where most teams reside: between 3 and 4. This appears to be no man’s land, a place where uncommitted or incapable teams underperform.

Looking at teams in aggregate, however, comes with its share of bias, most notably the hypothesis I suggested for possession-oriented teams. To remove that bias, I looked at each game played in MLS in 2014, home and away, and plotted those same metrics. I did not have Michael Caley’s data by game, so I only looked at the ASA model.

For both home and away games, there does appear to be a consistent bias against counterattacking teams. In games where teams produce strong counterattacking PScores of 1 or 2, they also typically outperform their expected goals (G - xG). Given that xG models are somewhat blind to defensive density, it makes sense that counterattacking teams shoot better than expected: by design, they should get more open shots than teams that play possession soccer. It definitely appears to me that xG models should somehow account for teams playing counterattacking soccer, or they will underestimate goals for those teams.

What’s interesting is that the same bias does not reveal itself as clearly at the other end of the spectrum, as it did in the first graph. When looking at the high-possession teams (the sixes and sevens), the teams’ efficiencies become murkier. If anything, it appears that being extremely proactive is detrimental to efficiency (G - xG), especially for away teams. The best-fit line doesn’t quite do the situation justice. When away teams are very possession-oriented, with a PScore of 6 or 7, they actually underperform the ASA xG model by an average of 0.3 goals per game. That seems meaningful, and might suggest that gamestates are playing a role in confusing us. With larger sample sizes this phenomenon could be explored further, but for now it’s safe to say that when a team plays a counterattacking game, it tends to outperform its expected goals.
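The per-game comparison amounts to grouping games by venue and PScore and averaging G - xG in each group. A minimal sketch; the tuple layout is hypothetical, and game PScores are rounded to the nearest whole number (Python’s `round` sends exact .5 values to the nearest even integer).

```python
from collections import defaultdict
from statistics import mean


def mean_gap_by_style(games):
    """Average (G - xG) per (venue, rounded PScore) bucket.

    `games` is an iterable of (venue, pscore, goals, xg) tuples, for example
    ("away", 6.4, 1, 1.35). The record layout is hypothetical.
    """
    buckets = defaultdict(list)
    for venue, pscore, goals, xg in games:
        buckets[(venue, round(pscore))].append(goals - xg)
    return {key: mean(gaps) for key, gaps in sorted(buckets.items())}
```

Reading off the ("away", 6) and ("away", 7) buckets is the kind of comparison behind the 0.3 goals-per-game underperformance described above.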

Focusing on home teams with high possession over the course of the season, we saw an uptick in goals minus expected goals. But based on the game-to-game trends, it doesn’t appear that possession-oriented teams shoot better because of possession itself. It seems that possession-oriented teams play that way because they have the talent to, and it’s that talent that drives them to outperform their expected goals.

So should xG models adjust for style of play? It really depends on the goal of the model. If the goal is to be supremely accurate, then I would say xG models should look at the style of play and make adjustments. However, style is not specific to one shot; it plays out over an entire game. Will modelers want to overlay macro conditions on their models rather than focus solely on the unique conditions of each shot?

Perhaps the model should allow this bias to continue. After all, it could reveal that counterattacking teams have an advantage in scoring as one would expect.

If the xG models look to isolate shots based on certain characteristics, perhaps they should strive to add data to each particular moment. Perhaps an aggregate overlay on counterattacks would be counterproductive as it would take the foot off the pedal of collecting better data for each shot taken. Perhaps this serves as inspiration to keep digging, keep mining for the data that helps fix this apparent bias. Perhaps it’s the impetus to shed the sweater vest and find an old worn-in pair of boots. Something a little more hip to match the intellect.