Goal differential and making the MLS Playoffs

Note: This chart originally appeared on BrotherlyGame for a Philadelphia Union preview, but I thought this community might like to see the relationship between goals and making the playoffs.

We know the league's proud tradition of parity, but did you ever think that a mere four goals could increase a team's chances of making the playoffs by 75%? That apparently has been the case over the last four MLS seasons. The chart below shows the relationship between goals scored, goals conceded and probability of making the playoffs. 

Pretty graphs after the jump.


MLS Week 34 and Playoffs Seeding Generator

With both conferences so tight, there are some pretty wacky playoff scenarios floating around. Check out our game projections if you want to see what our model is predicting. To make it easy for everyone to figure out what each result means, and to avoid doing a lot of math while the games are in progress on Sunday, I came up with this spreadsheet. Just plug in the scores for the games below, and the final conference standings will appear, down to the first few tiebreakers. The playoff matchups will fill in below.

Hopefully you will find this handy while you're in the middle of watching multiple games at once on Sunday, and throughout the playoffs.


What Piquionne's goal means to Portland

Though our game states data set doesn't yet include all of 2013, it still includes 137 games. In those 137 games, only five home teams ever went down three goals, and all five teams lost. There were 24 games in which the home team went down two goals, with only one winner (4.2%) and five ties (20.8%). The sample of two-goal games perhaps gives a little hope to the Timbers, but these small sample sizes lend themselves to large margins of error. It is also important to note that teams that go down two goals at home tend to be bad teams---like Chivas USA, which litters that particular data set. None of the five teams that ever went down three goals at home made the playoffs this year. Only seven of the 24 teams to go down two goals at home made it to the playoffs. Portland is a good team. Depending on your model of preference, the Timbers are somewhere in the top eight. So even if those probabilities up there hypothetically had small margins of error, they still wouldn't necessarily apply to the Timbers.

Oh, and while we're talking about extra variables, in those games the teams had less time to come back. To work around these confounding variables, I consulted a couple of models, and I controlled for team ability using our expected goal differential. Here's what I found.

A logistic model suggests that, for each goal of deficit early in a match, the odds of winning are reduced by a factor of about two or three. A tie, though, would also allow Portland to play on. A home team's chances of winning or tying fall from about 75 percent in a typical game that begins zero-zero to about 25 percent when down two goals. Down three goals, that probability plummets to less than 10 percent. But using this particular logistic regression was dangerous, as I was forced to extrapolate for situations that never happen during the regular season---starting a game from behind.
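To make the odds arithmetic above concrete, here is a minimal Python sketch. It assumes a single per-goal odds factor of 3 (the upper end of the "two or three" range; the fitted coefficient itself is not given here) and, as a simplification, applies that factor to the win-or-tie odds rather than to the odds of winning. The function names are illustrative only.

```python
def prob_to_odds(p):
    """Convert a probability into odds in favor."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds in favor back into a probability."""
    return odds / (1.0 + odds)


def prob_after_deficit(baseline_prob, goals_down, odds_factor):
    """Divide the baseline odds by odds_factor once per goal of deficit."""
    reduced_odds = prob_to_odds(baseline_prob) / odds_factor ** goals_down
    return odds_to_prob(reduced_odds)


BASELINE = 0.75  # win-or-tie chance at zero-zero, from the text
FACTOR = 3.0     # ASSUMPTION: upper end of "about two or three"

for goals_down in range(4):
    p = prob_after_deficit(BASELINE, goals_down, FACTOR)
    print(f"down {goals_down}: {p:.1%}")
```

Under that assumed factor, the sketch gives 75, 50, 25 and 10 percent for deficits of zero through three goals, which is roughly in line with the figures quoted above. A factor closer to two would leave the trailing team with noticeably better chances.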

So I went to a linear model. The linear model expects Portland to win by about 0.4 goals. In our data, 15.5 percent of home teams performed at least 1.6 goals above expectation, which is what the Timbers would need to at least force a draw in regulation. Only 4.6 percent of teams performed 2.6 goals above expectation. If we just compromise between what the two models are telling us, then the Timbers probably have about a 20-percent chance to pull off a draw in regulation. That probability would have been closer to five percent had Piquionne not finished a beautiful header in stoppage time.
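The linear-model reasoning can be sketched in a few lines of Python. The 0.4-goal expectation and the 25 and 15.5 percent figures come from the text; the residuals list is made up purely to show the calculation, and treating the "compromise" as a simple average of the two models is an assumption, not something stated above.

```python
def required_overperformance(expected_margin, needed_margin):
    """Goals above the model's expectation needed to reach needed_margin."""
    return needed_margin - expected_margin


def share_at_least(residuals, threshold):
    """Fraction of games whose actual-minus-expected margin met the threshold."""
    return sum(r >= threshold for r in residuals) / len(residuals)


EXPECTED_MARGIN = 0.4  # Portland's expected margin, from the text

# Down two goals Portland must win by two; down three, by three.
need_down_two = required_overperformance(EXPECTED_MARGIN, 2)    # about 1.6
need_down_three = required_overperformance(EXPECTED_MARGIN, 3)  # about 2.6

# Made-up residuals, only to show the calculation; NOT the real game data.
toy_residuals = [-1.5, -0.5, 0.5, 1.7, 2.7, -2.5, 0.5, -0.5, 1.9, -1.5]
print(share_at_least(toy_residuals, need_down_two))
print(share_at_least(toy_residuals, need_down_three))

# ASSUMPTION: the "compromise" is a plain average of the two model estimates.
LOGISTIC_ESTIMATE = 0.25  # about 25 percent when down two goals
LINEAR_ESTIMATE = 0.155   # 15.5 percent reached +1.6 goals
compromise = (LOGISTIC_ESTIMATE + LINEAR_ESTIMATE) / 2
print(f"compromise: {compromise:.1%}")
```

With the real residuals in place of the toy list, share_at_least would be the calculation behind the 15.5 and 4.6 percent figures. Averaging 25 and 15.5 percent lands at about 20 percent, matching the estimate above.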

ASA Podcast XXVIII: The One where we talk MLS Conference Semi-finals

Last night we talked about the eight teams still in the playoffs in a round-robin-style discussion, and then followed up the playoff talk with a general discussion about numbers. Specifically we talked about often-quoted and used statistics that don't really hold any value. I also pretty much alienate all lawyers who listen to the podcast. Enjoy! [audio http://americansocceranalysis.files.wordpress.com/2013/11/asa-episode-xxviii.mp3]

Jamison Olave's Value to New York

There was quite a popular tweet from a canine about New York's improved play this season when Jamison Olave was playing. https://twitter.com/GothamistDan/status/397398611438608384

There are obviously confounding factors at play here, not to mention small sample sizes. There were only seven matches this season in which Olave did not start, and eight in which he played 45 minutes or less. Any data obtained from these games is going to be subject to A) small sample sizes, B) lots of variance in the response variable (goals or wins), and C) no control for quality of opponent or location of the match.

To deal with the small sample size/variance problem, I'm going to use our now semi-famous data set on shot location origins. Steven Fenn kindly showed the world their predictive value, and to me that means that expected goals for and against are the most stable stat available for such an analysis. To control for New York's opponents---when Olave was both in and out of the starting XI---I have included each of New York's opponents' expected goals data in the linear regression, while also accounting for whether or not the Red Bulls were at home. Blah, blah, blah, to the results!

Looking at the defensive side, New York allowed shots leading to 0.24 fewer expected goals against in games that Olave started. That seems to indicate New York's need for Olave, but the p-value was a kind-of-high 26 percent. Overall, New York's expected goal differential climbed 0.19 goals in those games that Olave started, though again, the p-value was quite high at 46 percent.*
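For readers who want to see the shape of such a regression, here is a rough, standard-library-only Python sketch. It fits expected goals against on a binary "Olave started" indicator, a home indicator and the opponent's expected goals. The game rows are made-up placeholders, and the outcomes are constructed without noise from assumed coefficients (including the -0.24 start effect quoted above), so the fit simply recovers them. It does not compute p-values and is not the original analysis.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# (olave_started, home, opponent_xg): made-up placeholder games, NOT real data.
games = [
    (1, 1, 1.0), (1, 0, 1.4), (0, 1, 0.9), (0, 0, 1.6),
    (1, 1, 1.3), (1, 0, 1.1), (0, 1, 1.5), (1, 0, 0.8),
]

# ASSUMED coefficients: intercept, Olave start, home, opponent xG.
ASSUMED = (1.2, -0.24, -0.3, 0.5)

X = [[1.0, s, h, o] for s, h, o in games]
xg_against = [sum(c * x for c, x in zip(ASSUMED, row)) for row in X]

intercept, olave_effect, home_effect, opp_effect = ols(X, xg_against)
print(round(olave_effect, 2))
```

The coefficient on the start indicator is the quantity being discussed here: the change in expected goals against when Olave starts, holding venue and opponent quality fixed. Real data would add noise, which is where the high p-values come from.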

Now for your shitty conclusion, courtesy of shitty p-values: Olave's influence on New York's level of play this season was questionable. There is some suggestion that he helped reduce goal-scoring against, but there is a reasonable chance that that difference was due to other, not-measured-here variables. What I am more comfortable claiming is that he does not make a 0.86-goal difference on the defensive side.

The point is this. New York's shot creation and goal scoring ability, for and against, are more a function of whether or not the Red Bulls are home, and against whom they are playing. Not as much whether Olave starts. Obviously putting an inferior player into the starting XI isn't going to help New York out. But, as I always question, do we really know how to value soccer players at all? Maybe Olave just doesn't make that much of a difference. After all, he's only one of eleven players.

*For those curious, the number of minutes Olave played was a worse predictor variable than the simple binary variable of whether or not he started. Controlling for the strength of opponent was necessary since perhaps Mr. Petke was more likely to sit Olave against a worse opponent at home, or something like that.

MLS Playoff Chances

We are now including playoffs and Supporters' Shield probabilities on our MLS Tables, if you were unaware. These chances are calculated based on each team's current points and remaining schedule. The remaining game-by-game probabilities are specifically generated from the following:

  • Which team is playing at home. Home teams have won nearly 50% of all MLS matches during the last three seasons. There is definitely an advantage to having a remaining schedule packed with home matches. 
  • Total attempts generated and conceded. All season I have been studying the best predictive measures, and all season it's been SHOTS.
  • Finishing rates for and against. Though finishing rates weren't very predictive at the season's midpoint, it turns out they're not completely worthless after at least 25 games.
  • Past strength of schedule, as measured by past opponents' attempts data.
  • Other variables, such as past possession percentage, were considered, but they did not help predict the outcome of the game better than those chosen above.

The above indicators were tested on the last 10 weeks' worth of games during the 2011, 2012 and 2013 seasons (well, up through the most recent week). Using a multinomial regression, I was able to calculate the influence of each indicator, and then the resulting probabilities of each of the three possible outcomes (home win, draw, away win). Once these probabilities were established, it was a matter of simulation.
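As a rough illustration of how a multinomial regression turns indicators into three outcome probabilities, here is a minimal Python sketch. The feature names and every coefficient are hypothetical, chosen only so the example runs; the fitted values are not given here. The draw is treated as the reference outcome, which is one common convention and an assumption on my part.

```python
import math


def outcome_probabilities(features, coefs):
    """Multinomial-logit probabilities for home win, draw and away win.

    The draw is the reference outcome, so its score is fixed at zero.
    """
    scores = {"draw": 0.0}
    for outcome in ("home_win", "away_win"):
        scores[outcome] = coefs[outcome]["intercept"] + sum(
            coefs[outcome][name] * value for name, value in features.items()
        )
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# HYPOTHETICAL coefficients; home advantage lives in the intercepts.
COEFS = {
    "home_win": {"intercept": 0.6, "attempt_diff": 0.05, "finishing_diff": 2.0},
    "away_win": {"intercept": 0.1, "attempt_diff": -0.05, "finishing_diff": -2.0},
}

# HYPOTHETICAL inputs: home team minus away team, per game.
features = {"attempt_diff": 3.0, "finishing_diff": 0.02}

probs = outcome_probabilities(features, COEFS)
print(probs)
```

The three probabilities always sum to one, which is what makes them usable as the weights for the simulated games.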

My simulation was based on the game-by-game probabilities (above), and I let the computer simulate the games as though they were weighted coin flips (like how your PS3 simulates FIFA or Madden games) over and over again. So basically, I simulated the last 36 games of the season 10,000 times each, added up the total points for each team, and created 10,000 simulated tables. The POFF% probability is simply the proportion of times that each team earned a "clean" playoff berth---that is, ties for fifth were not included. The Shield% probability represents the proportion of times each team earned at least a share of the top position in MLS.
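The simulation described above can be sketched in Python along the following lines. This is a simplified illustration, not the original code: it uses a single table rather than two conferences, the teams and probabilities are made up, and a "clean" berth is interpreted as finishing strictly ahead of the best team left out.

```python
import random


def simulate_season(current_points, fixtures, rng):
    """Play each remaining fixture once as a weighted three-way coin flip."""
    points = dict(current_points)
    for home, away, weights in fixtures:
        outcome = rng.choices(("home", "draw", "away"), weights=weights)[0]
        if outcome == "home":
            points[home] += 3
        elif outcome == "away":
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points


def clean_playoff_share(current_points, fixtures, spots, n_sims=10_000, seed=0):
    """Share of simulated tables in which each team clears the cut line outright."""
    rng = random.Random(seed)
    clean = {team: 0 for team in current_points}
    for _ in range(n_sims):
        final = simulate_season(current_points, fixtures, rng)
        ranked = sorted(final.values(), reverse=True)
        # Points of the best team left out; a tie with it is not "clean".
        cutoff = ranked[spots] if spots < len(ranked) else float("-inf")
        for team, pts in final.items():
            if pts > cutoff:
                clean[team] += 1
    return {team: count / n_sims for team, count in clean.items()}


# Made-up league: (home, away, (P home win, P draw, P away win)).
standings = {"A": 50, "B": 49, "C": 48}
remaining = [
    ("A", "B", (0.50, 0.25, 0.25)),
    ("C", "A", (0.45, 0.30, 0.25)),
    ("B", "C", (0.40, 0.30, 0.30)),
]
print(clean_playoff_share(standings, remaining, spots=2))
```

A Shield-style share could be computed the same way by counting the simulated tables in which a team's points equal the maximum, so that shared first places count for every team involved.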

As an example, Portland made the playoffs 9,919 times out of the 10,000 simulated seasons, seeing its simulated probability of making the playoffs go up about 4.7% from last week after a win against the Galaxy. What might surprise you a little is that the Timbers have a very real chance (16.3%) at the Supporters' Shield, according to this model. That's up about 10% from last week.

To try to understand why Portland received such a favorable jump, consider that just about every possible thing that could have helped the Timbers' chances happened. Portland earned 3 points against a team that the model thought was pretty good (the Galaxy), Seattle and New York tied---the best possible outcome from Portland's perspective---and Sporting KC lost in a pretty big upset. Though RSL still won, that doesn't matter too much because Portland gets RSL at home, and that match will be the one that most likely determines a points winner between the two teams. Also consider that Portland has a home match against Seattle coming up. Because the model gives big boosts to home teams, the Timbers have a reasonable chance for 6-point swings against RSL and Seattle. Oh, and there's that away match at Chivas to end the season. The model thinks Portland will take three points in that one with about 50% probability,* which is quite high for any away team.

There is still a lot of fight left in the Eastern Conference playoff push, as four teams have less than 60% probability of making the playoffs. In the West, things are a little more straightforward, though San Jose has been sneaky of late, and it could potentially steal Colorado's playoff spot. The model seems to like the Galaxy's talent and schedule more than Colorado, enough to put the Rapids on the Western Conference's hot seat.

*That might seem low, but consider that Chivas has 5 wins, 7 losses and 4 draws at home this season. That means they have only lost 7/16 games at home, or about 44%. Portland's 50.3% makes more sense in that context.