By Matthias Kullowatz (@mattyanselmo)
As I began constructing prediction models for this season, I was faced with the obvious problem of dealing with small sample sizes. Teams have played three or four games to this point, which isn't much to go on when trying to forecast their futures. Portland, for example, has produced the fifth-best expected goal differential in the league (xGD of +0.22), but is missing its two best midfielders. I'm skeptical that the Timbers will be able to maintain that in the coming weeks. So I'm looking to last season to help me out with the beginning of this season.
Below are some heat plots depicting the correlations of six metrics with themselves across seasons. For example, if we sum each team's goals scored in its last 10 games of the past season and correlate that to its goals scored in the first 10 games of this season, we get a correlation coefficient of 0.195. The highest correlations never breached 0.60, so a "red hot" correlation in the plots is about 0.60. Each of these correlations comes from a sample of 56 teams (18 in the 2011-12 offseason, 19 each in 2012-13 and 2013-14).
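For the curious, the calculation behind each cell of those heat plots can be sketched in a few lines. This is a minimal illustration, not the actual pipeline behind the article: the per-game data layout (a dict of team → list of per-game values) and the function names are assumptions for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cross_season_corr(prev_games, curr_games, last_n=10, first_n=10):
    """Correlate each team's metric summed over its last `last_n` games of
    the previous season with the same metric summed over its first `first_n`
    games of the current season.

    prev_games / curr_games: {team: [per-game metric values, in order]}
    (hypothetical data layout, for illustration only).
    """
    teams = sorted(set(prev_games) & set(curr_games))
    last = [sum(prev_games[t][-last_n:]) for t in teams]   # end of last season
    first = [sum(curr_games[t][:first_n]) for t in teams]  # start of this season
    return pearson(last, first)
```

Sweeping `last_n` and `first_n` over 1 through 10 for each metric would produce the grid of coefficients that a heat plot visualizes.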
For the most part, expected goals stabilize to a greater degree than raw goals across the off-season.
Goals Allowed is a strange metric in that the number of goals allowed in a team's last game of one season--a single game!--correlates strongly to its goals allowed during the next season. My theory is that teams that have thrown in the towel by season's end tend to play more openly and are likely to allow more goals late in the season. Those same teams tend not to be good--that's why they're not in the playoffs--and they continue to suck in the following season.
Expected Goal Differential shows a very strong correlation across the off-season, and I'm eager to employ some previous-season xGD data in the prediction models.
Next up, I'll look at the xGD in even gamestates across the off-season, and I'm hoping to publish those prediction models by Even Better Monday (the one after Good Friday). So be on the lookout!