Game States Analysis

When should a team park the bus? (Abridged) by Jared Young

"Tottenham might as well have put the team bus in front of their goal," said Jose Mourinho in 2004 following a draw between his Chelsea club and Spurs. Although he would later say the phrase was one typically used in Portugal, Mourinho was credited with coining 'parking the bus,' which describes a team sitting all of its players behind the ball in an effort to block the goal. It's rare for a team to play a full 90 minutes that way, but teams with a lead will often change tactics late in the game and park the bus in an effort to ensure victory. To do this they move their line of defensive pressure back toward the goal, committing more players to defense. The other team is allowed more possession of the ball, but the bet is that they'll have a lower chance of actually scoring the equalizer.

During this year’s FA Cup Final, Arsenal took a 1-0 lead over Aston Villa into halftime. They had thoroughly dominated the game and had taken eight shots to Aston Villa’s one. In that case, the obvious tactical choice was to change nothing at all. Arsenal logically kept up the pressure just as they did in the first half, added three more goals and finished with a shot advantage of sixteen to two.

Portland Timbers: Comeback Kids? by Matthias Kullowatz

I watched the Timbers go down 2-0 in the first half Wednesday night against FC Dallas before leaving disgusted for my indoor game. At halftime of my game, I noticed that Portland had come back to tie. Two common occurrences for the Timbers this year have been comebacks and ties, so perhaps it shouldn't have been that surprising. The Timbers have played nearly 400 minutes this season from behind--a quarter of their time spent on the field--which has given them plenty of time to win back the home crowd after early goals conceded. In all that time spent losing (nearly four games' worth) Portland has outscored its opponents 13-to-4. That's like four straight 3-1 wins. Even though most teams perform better when playing from behind, that still ranks Portland second in the league behind Vancouver (see chart below).

This raises the question: is Portland actually one of the best teams when facing a deficit, or might this be a product of some random variation? To the stats!

It turns out, Portland also does well by Expected Goals in losing gamestates. In fact, relative to the league, the Timbers are the best at generating quality and quantity of opportunities in these situations with an expected goal differential of +1.4. We know Expected Goals to be more stable, and thus it is probably a truer indication of what to expect in the future. Check out the chart below, scaled on a per 96-minute basis (basically, per game).

xGD When Losing

Team        GF    GA    GD   xGF   xGA   xGD  GD Rank  xGD Rank
POR        3.1   1.0   2.2   2.5   1.1   1.4        2         1
FCD        2.0   0.9   1.1   1.9   0.8   1.2        6         2
SEA        2.3   1.3   1.0   1.6   0.7   1.0        8         3
LA         1.8   0.0   1.8   1.8   0.9   1.0        3         4
NYRB       2.0   1.0   1.0   1.8   1.0   0.8        9         5
TOR        2.3   1.1   1.1   1.9   1.2   0.7        7         6
SJ         1.6   0.7   0.9   1.6   1.0   0.6       10         7
PHI        1.6   1.6   0.0   1.8   1.3   0.5       14         8
CHI        3.0   1.5   1.5   1.5   1.0   0.5        4         9
SKC        1.3   0.9   0.4   1.7   1.3   0.4       12        10
DCU        2.0   0.7   1.3   1.2   0.9   0.3        5        11
CLB        0.9   0.5   0.5   1.5   1.3   0.2       11        12
COL        2.7   2.3   0.4   1.6   1.5   0.1       13        13
MTL        0.8   1.8  -1.0   1.4   1.3   0.1       16        14
RSL        1.6   2.6  -1.0   1.6   1.5   0.0       17        15
NE         0.5   1.4  -0.9   1.4   1.3   0.0       15        16
CHV        0.6   2.9  -2.3   1.3   1.4   0.0       19        17
VAN        3.1   0.4   2.7   1.3   1.5  -0.1        1        18
HOU        0.8   2.5  -1.7   1.1   1.7  -0.6       18        19
Averages   1.8   1.3   0.5   1.6   1.2   0.4
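
For anyone who wants to check the per-96-minute scaling, here is a minimal sketch in Python. It assumes exactly 400 minutes spent trailing (the article only says "nearly 400"), and the function name per_96 is made up for this example rather than taken from any existing tool.

```python
def per_96(count, minutes, game_length=96.0):
    """Scale a raw count (goals, expected goals, ...) to a per-96-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * game_length / minutes


# Portland while trailing: 13 goals for, 4 against.
# ASSUMPTION: 400 minutes is a round-number stand-in for "nearly 400".
minutes_trailing = 400
gf = per_96(13, minutes_trailing)
ga = per_96(4, minutes_trailing)

# Rounded to one decimal, this lines up with Portland's GF, GA and GD columns.
print(round(gf, 1), round(ga, 1), round(gf - ga, 1))  # 3.1 1.0 2.2
```

The same function works for the expected-goals columns: feed it a team's summed xG in a game state and the minutes spent in that state.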

But wait! Hold the bus. There is one major confounding factor that we can control for here: home field advantage. The Timbers have oddly found themselves frequently facing deficits at home, which means that a large portion of their time spent losing is spent in the friendly confines of Providence Park in downtown Portland. In fact, the Timbers lead the league in minutes spent losing at home--a weird stat, to be sure. Here's the same chart, but for teams losing at home.

xGD When Losing at Home

Team        GF    GA    GD   xGF   xGA   xGD  GD Rank  xGD Rank
SJ         3.3   0.8   2.5   3.5   0.5   3.0        5         1
NYRB       3.2   1.6   1.6   2.6   0.6   2.1        7         2
POR        3.6   1.0   2.6   3.0   1.0   2.1        4         3
FCD        2.8   0.0   2.8   2.1   0.4   1.7        3         4
COL        3.6   3.6   0.0   2.1   0.8   1.3       14         5
TOR        3.8   0.0   3.8   2.5   1.3   1.3        2         6
SEA        1.6   0.5   1.1   1.6   0.6   1.0        8         7
CHI        2.5   1.6   0.8   1.5   0.6   0.9       10         8
LA         0.9   0.0   0.9   1.8   1.0   0.8        9         9
NE         0.0   1.2  -1.2   1.4   0.6   0.7       16        10
CLB        0.8   0.4   0.4   1.7   1.0   0.7       13        11
PHI        2.4   1.7   0.7   1.9   1.3   0.6       11        12
VAN        5.1   0.0   5.1   1.5   0.9   0.6        1        13
MTL        0.7   1.5  -0.7   1.8   1.4   0.4       15        14
DCU        1.9   1.3   0.6   1.0   0.9   0.1       12        15
SKC        2.1   0.0   2.1   1.3   1.2   0.1        6        16
HOU        1.5   2.9  -1.5   1.7   1.6   0.1       17        17
RSL        0.0   1.8  -1.8   0.5   0.8  -0.3       18        18
CHV        0.0   3.8  -3.8   1.0   2.1  -1.0       19        19
Averages   2.1   1.3   0.8   1.8   1.0   0.8

Even when I control for home field advantage, we still see the Timbers among the best teams at playing from behind, averaging 2.1 more expected goals than their opponents per 96 minutes (and 2.6 more actual goals). Is it the coaching? The players' mentalities? The raucous home turf on West Burnside? Luck? I don't know, but I know it's happening.


Game States: An Attempt at an Introduction by Drew Olsen

I'm no mathematician. Matty maybe, but I am not. So when approaching something like Game States, I felt it good to attempt an introduction, though the topic is rather ominous and a bit intimidating. Most--if not all--of the information provided is taken from a source who is smarter than I am. That's really what this blog is all about: finding people who know and understand the principles we are trying to learn, centralizing the material, and keeping it in a tidy location where people who are new--not just to the sport, but also to the concept of analytics--can go to find information and grow their knowledge.

The idea of game states is that a match consists of a sequence of states, where each state is defined by a combination or series of events that culminate in creating a new state. Those events give details and help break down the match. They provide context or meaning to the data that we record. Game states, as I understand them, are based upon the idea of Markov chains:

...[A] mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it.

Let's apply this idea to a sport...

In American football, if a team is in a certain situation, what happened previously has no effect on what will happen next. For example, if we have a 1st-and-10 from our own 20, it does not matter if the previous play was a kickoff for a touchback or a 10-yard gain for a first down after a 3rd-and-10 from the 10-yard line. Either way, we now have a new situation that will only directly affect the next play.

To put it all into soccer terminology, if a team has possession of the ball and is progressing past the halfway line and into the attacking third, it doesn't matter if they got it on an interception or a goal kick. Regardless of how it happened, they now have the ball going into the attacking third and will have an opportunity to threaten and score. I struggle with this thought because tactically it may not be true--an individual getting the ball on a break is different from someone taking part in a slow build-up of play toward the opposition's net. But coming from a statistical, memoryless position, we simply want the facts.
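
To make the memoryless idea a little more concrete, here is a rough sketch of a possession modeled as a Markov chain in Python. Everything in it is an assumption for illustration: the state names, the transition probabilities, and the function names are all invented, not measured from match data. The only point is that the next state is drawn using nothing but the current state.

```python
import random

# HYPOTHETICAL transition probabilities between possession states.
# These numbers are invented for illustration, not estimated from real matches.
TRANSITIONS = {
    "defensive_third": {"defensive_third": 0.40, "middle_third": 0.45, "turnover": 0.15},
    "middle_third": {"defensive_third": 0.15, "middle_third": 0.40,
                     "attacking_third": 0.25, "turnover": 0.20},
    "attacking_third": {"middle_third": 0.20, "attacking_third": 0.35,
                        "shot": 0.15, "turnover": 0.30},
}
TERMINAL = {"shot", "turnover"}  # a possession ends in one of these


def next_state(state, rng):
    """Draw the next state from the current state only (the Markov property)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def simulate_possession(start, rng, max_steps=1000):
    """Follow one possession from `start` until it ends in a shot or a turnover."""
    path = [start]
    while path[-1] not in TERMINAL and len(path) <= max_steps:
        path.append(next_state(path[-1], rng))
    return path


def shot_probability(start, trials=10000, seed=0):
    """Estimate by simulation how often a possession from `start` ends in a shot."""
    rng = random.Random(seed)
    shots = sum(simulate_possession(start, rng)[-1] == "shot" for _ in range(trials))
    return shots / trials


print(shot_probability("attacking_third"), shot_probability("defensive_third"))
```

Note that nothing in next_state knows whether the ball arrived via an interception or a goal kick. That is exactly the simplification I said I struggle with tactically.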

Let's go to another sport and put this into baseball terms, because of how far that sport has advanced in analytics. Baseball analytics breaks game states down into four basic concepts: the score, the inning, base runners and outs. This is how you would calculate basic run and win expectancy.

If you have the average number of runs expected to score in an inning after any game state, you can figure out how many runs a stolen base is worth, or a triple, or a strikeout. The game state essentially allows us to relate everything that happens on the diamond back to the major currencies of baseball: winning and runs.
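
As a rough sketch of how that works, the value of an event is just the run expectancy after it minus the run expectancy before it, plus any runs that scored on the play. The numbers below are placeholders I made up for the example; they are NOT Tom Tango's published values, and the names RUN_EXPECTANCY and event_value are hypothetical.

```python
# ILLUSTRATIVE run-expectancy values keyed by (runners, outs).
# Placeholder numbers only, not a real published run-expectancy matrix.
RUN_EXPECTANCY = {
    ("empty", 0): 0.50,
    ("first", 0): 0.90,
    ("second", 0): 1.10,
    ("empty", 1): 0.25,
}


def event_value(before, after, runs_scored=0):
    """Run value of an event: change in run expectancy plus runs scored on the play."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored


# Runner on first, nobody out, attempts to steal second.
steal = event_value(("first", 0), ("second", 0))   # successful steal
caught = event_value(("first", 0), ("empty", 1))   # caught stealing

# Success rate at which attempting the steal has zero expected value.
break_even = -caught / (steal - caught)
print(round(steal, 2), round(caught, 2), round(break_even, 3))
```

With these made-up numbers, getting caught costs more than three times what a successful steal gains, which is why the break-even success rate lands well above 50%.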

We're not talking about baseball. We're talking about soccer, or for you Euro snobs, 'fútbol'. However, taking the concepts of an already practically applied matrix, such as the one that Tom Tango has developed for baseball (see below), can give us ideas on how we might create a corresponding one in soccer. Soccer has wins too--though I see more people think in terms of points--but our true currency in this sport is goals. Everything leads back to the price or value of a goal. Whether it be a cross or a tackle, what we ultimately want is to understand how things work together to produce goals.

Comparing the possible game states: soccer, just like baseball, has a scoreline that can help give us an idea of the transitions between states. A team being up one goal or, on the other side of the coin, down one goal gives us a definition of a game state. It's simplistic, but it works. It can also show us why sometimes (looking at you, Alex Ferguson) teams take fewer shots than at other times. It would make sense simply for the purpose of keeping possession.

We also have a measurement for the length of the game, in that the teams play one another for a total of 90 minutes. In soccer, we have time intervals. The largest problem is that, because play is constantly moving rather than divided into set, static units such as innings, it creates a lot of different probabilities and chains. However, if you wanted to mitigate that to an extent, you could simply revert to the basics of the first half vs. the second half of a match. While these are two big time intervals, when used in conjunction with other specific game states they could help us develop a better understanding of the game.

The next one listed is base runners, but I'm going to pass on that for now and move on to the concept of outs. At the very best, this is a difficult comparison, but if you wanted to attempt one I might try changes in possession. It's a rather poor match, and I have no idea how or if you would want to use it... probably not. Baseball limits a team to three outs per half-inning rather than to three specific possessions or attempts to score. In the 45 minutes of a single half, a team could have anywhere between 12 and 40 possessions, depending on who the opponent is and how the team executes its attack.

Back to base runners: this is one of the easier things to mimic, though I have no way of proving how closely the two are related. Shots on goal could conceivably give us a baseline for the probability of a goal being scored and, more importantly, for the points that are associated with winning or drawing.

One additional game state that hasn't been mentioned, and doesn't really correlate with anything in baseball, is yellow and red cards. While baseball has ejections, they don't change the dimensions of the game, because the team can just sub in a replacement. In soccer, however, a red card puts the team down a man.

These are all elements that you would consider context. How do teams perform in these situations? Do their possessions last longer? Do they take more shots? What is the quality of those shots? This information is really the purpose behind gathering the data points. Are the Seattle Sounders just as likely to score in a "-1" goal situation as they are in a tie game?
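
To answer a question like that last one, the first step is splitting a match into the minutes spent in each scoreline state. Here is a minimal sketch, assuming a simple list of (minute, scored_by_us) goal events and a flat 90-minute match with no stoppage time. The function name and data layout are made up for this example, and the match itself is hypothetical.

```python
from collections import defaultdict


def split_by_game_state(goals, match_length=90):
    """Return (minutes, goals_for), each keyed by goal difference at the time.

    goals: iterable of (minute, scored_by_us) tuples.
    ASSUMPTION: a flat match_length with no stoppage time.
    """
    minutes = defaultdict(int)
    goals_for = defaultdict(int)
    diff, last = 0, 0
    for minute, scored_by_us in sorted(goals):
        minutes[diff] += minute - last   # time spent in the state before this goal
        if scored_by_us:
            goals_for[diff] += 1         # credit the goal to the state it was scored in
            diff += 1
        else:
            diff -= 1
        last = minute
    minutes[diff] += match_length - last  # time from the final goal to full time
    return dict(minutes), dict(goals_for)


# HYPOTHETICAL match: we concede at 10' and 30', then score at 60' and 80'.
example_goals = [(10, False), (30, False), (60, True), (80, True)]
minutes, goals_for = split_by_game_state(example_goals)
print(minutes)    # {0: 20, -1: 40, -2: 30}
print(goals_for)  # {-2: 1, -1: 1}
```

Summing these dictionaries over a season and dividing goals by minutes in each state would give a scoring rate per game state, which could then be compared between the "-1" and tie-game situations.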

This is all information that we are seeking. The context can provide further detail, as well as specific insight into how certain aspects and statistics can be properly correlated to goals, and then to points in the table.

I feel like there is more to write about here... and that's because it's true. There is a lot more to write about. But this is primarily to cover the basics of game states. We'll talk more about it in our podcast tomorrow, and we'll have a follow-up post when we start preparing to publish MLS game state information.