Shot Attempts Analysis

Sebastian Giovinco: Master of the Free Kick by Harrison Crow

When Sebastian Giovinco earned himself a free-kick just outside the penalty box on Monday night it felt as though fate was serving up one of those great moments. Ninety seconds later, as the 72nd minute expired, Giovinco delivered on the set-up by sending a curled ball over the half-hearted leap of the Red Bulls' defensive wall. It went barely above the head of roaming fullback Michael Murillo, goalkeeper Luis Robles couldn't move to his right fast enough, and Toronto was thrust into the lead in the first leg of the Eastern Conference semifinals.

The goal was amazing and the moment was a big one for a team on the road. As mentioned shortly afterwards on the broadcast, and later repeated in seemingly every corner of social media, Seba has now scored more set-piece goals than any other player since his arrival in Major League Soccer in 2015.

Shot Limiting: Bringing the heat (maps) by Sean Steffen

Earlier this year, I decided the world needed a 30-page paper on shot limiting in MLS. Of course, the powers that be found this to be a tad self-indulgent and, more accurately, sad that I had the time to do such a thing. They ended up talking me down to a slightly more readable 20 pages, which can be read here.

But my art will not be compromised, gosh darnit! There is still so much to be learned about this topic, and, more to the point, my obsession hadn’t been quenched. Several questions were raised within the paper that I simply didn’t have the necessary data to explore.

Year-to-year Shot Correlations by Matthias Kullowatz

By Matthias Kullowatz (@mattyanselmo)

Perhaps I played my best card too early, publishing year-to-year expected goals correlations first. I won't waste a lot of words explaining what's going on here, but basically I'm just looking to see what shooting metrics correlate from the end of last year to the beginning of this year. This time, let's go back and look at raw shot totals. Without further ado, to the pretty plots!

Notes

Shot-to-shot correlations hold up pretty well against expected goals when it comes to repeatability. However, that doesn't necessarily mean they should be used in predictive models in place of expected goals. Expected goals not only predict themselves well ("stability"), but also predict outcomes well, like goals scored and games won.

For the most part, we see stronger correlations between shots on target than between total shots. This is not the first time I've found that some form of goal mouth placement at the team level is repeatable. The expected goals model we use for goalkeeper ratings is based partially on placement, and this version of expected goals will almost surely creep into my prediction models this season.

Should away teams be more aggressive? by Drew Olsen

Second Half Shot chart - HOUvPOR - April 2014

The Portland Timbers traveled to Houston on Sunday in desperate need of three points to get out of the cellar in the Western Conference. They played well in the first half, outshooting the Dynamo 8 – 7 en route to a 1 – 1 tie, while dominating possession. Then Portland came out in the second half as many away teams do when the score is tied: conservatively. The second-half shot charts to the right serve as an indication of the change in strategy.

This conjured up a question that constantly bugs me. Should away teams go for wins more often when tied in the second half? Let's get right to the data. Here is a chart summarizing the offensive aggression of away teams during gamestates when the score is tied and the teams are playing with the same number of players. The data present the proportion of each total earned by the away team in the first and second halves.

2013-14 Goals xGoals Shots
1st Half 44.8% (266) 42.3% (282.9) 43.4% (2948)
2nd Half 34.8% (184) 37.4% (168.6) 39.7% (1654)
P-value 0.017 --- 0.007

The away team consistently garners 42% to 45% of these primary offensive stats during the first half, and then drops down to the 35%-to-40% range in the second half. For the proportions of goals and shots, those differences are statistically significant (there is no simple test for xGoals%, but it is probably statistically significant as well).
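The post does not say which test produced those p-values. As a rough check, here is a minimal sketch that assumes the figures in parentheses are the combined totals for both teams and applies a one-sided, pooled two-proportion z-test; the function name and the choice of test are my assumptions, not the author's stated method. Under those assumptions the results come out close to the 0.017 and 0.007 shown in the table.

```python
import math


def one_sided_two_proportion_p(p1, n1, p2, n2):
    """One-sided p-value for H1: p1 > p2, using a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Away-team share in tied, even-strength gamestates (values from the table above).
# Assumption: the numbers in parentheses are totals for both teams combined.
p_goals = one_sided_two_proportion_p(0.448, 266, 0.348, 184)
p_shots = one_sided_two_proportion_p(0.434, 2948, 0.397, 1654)

print(f"goals: p = {p_goals:.3f}")
print(f"shots: p = {p_shots:.3f}")
```

The same function cannot be applied directly to xGoals, since expected goals are not counts of successes out of a fixed number of trials.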

My instinct is that away teams are capable of playing in the second half as they do in the first half, and that these discrepancies are a product of conscious decision making by away coaches and players. Teams likely change strategy in the second half to preserve a tie. Playing more openly would ostensibly increase the chances of both a loss and win, while decreasing the chances of a tie. However, I would think based on the data above that it would increase the chances of a win more so than the chances of a loss. Since a win would earn the away team an extra two points, while a loss would cost it just one, my gut says teams should go for it more often.
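The points arithmetic behind that gut feeling can be sketched in a few lines. The probabilities below are hypothetical, chosen only to illustrate the trade-off; nothing here is estimated from the data above.

```python
def expected_points(p_win, p_draw):
    """Expected league points from one match: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw


def change_from_opening_up(extra_win, extra_loss):
    """Change in expected points when draw probability is converted into
    `extra_win` more wins (+2 points each) and `extra_loss` more losses (-1 each)."""
    return 2 * extra_win - 1 * extra_loss


# Hypothetical probabilities for a tied away team in the second half.
cautious = expected_points(p_win=0.20, p_draw=0.60)   # loss probability 0.20
open_play = expected_points(p_win=0.28, p_draw=0.42)  # loss probability 0.30

print(f"cautious:  {cautious:.2f} points")
print(f"open play: {open_play:.2f} points")
print(f"change:    {change_from_opening_up(0.08, 0.10):+.2f} points")
```

In this framing, opening up pays off whenever the added win probability is more than half of the added loss probability.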

Are away teams playing conservatively because mindless soccer conventionality tells them that it's okay to get one point on the road? Is this the self-detrimental risk aversion that plagues coaches in other sports, or are these numbers missing something that could justify the conservative play?

I can't say that I've proven anything, but these data suggest the former.

ASA Podcast XLIV: The One Where We Talk About What We Write About by Drew Olsen

Harrison and Matty discuss their two most recent articles, respectively about Harrison's Shots Created per 90 statistic and Matty's obsessive need to put RSL down because its players are more gooder at soccer than he is. It's a short one, perfect for your commute!

Listen: http://www.mixcloud.com/hkcrow/asa-podcast-xliv-the-one-where-we-talk-about-what-we-write/

MLS Top 50: Total Shots Created by Drew Olsen

I've briefly mentioned the stat Total Shots Created before. Basically, it's how frequently a player contributes to the moment leading to an attempt on goal. It's one that I like a lot for crediting individual players for their contributions to the team's efforts. Obviously there are other elements of a match that are also important and lead to definitive events that have predictive value (i.e. other things that players can do to help a team win). However, shots are among the more valuable numbers that are out there and available.

There is also the little fact that everyone loves goals. Goals are awesome and invoke celebrations. Shot deflections, all-out blocked shots, and midfield recoveries hardly elicit the same reaction from friends, but arguably hold nearly as much individual performance value to the team.

With all the emphasis on shots and individual production, there is another number worth mentioning: %Tsh (Percentage of Team Shots). It's simply the percentage of the team's total shots that a player was involved in creating, not just the ones he took himself.

This time around, the list comprises the top 50 players in shot creation, based on the shots they've taken, the assists they've been credited with, and the other shots they've created with their passing. Players below have been sorted by their %Tsh.
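For readers who want to reproduce the columns in the tables that follow, here is a minimal sketch. It assumes, judging from the rows themselves, that SH-C is the sum of SHTS, KP and A, that ShC-90 scales it to 90 minutes, and that %Tsh divides it by the team's total shots. The function names are mine.

```python
def shots_created(shots, key_passes, assists):
    """SH-C: shots taken plus passes that led to a teammate's shot or goal."""
    return shots + key_passes + assists


def per_90(total, minutes):
    """Scale a counting stat to a per-90-minutes rate."""
    return total / minutes * 90


def pct_team_shots(created, team_shots):
    """%Tsh: share of the team's total shots the player helped create."""
    return created / team_shots * 100


# Federico Higuain's row: 18 shots, 17 key passes, 2 assists, 448 minutes, 71 team shots.
shc = shots_created(shots=18, key_passes=17, assists=2)
shc_90 = per_90(shc, minutes=448)
pct_tsh = pct_team_shots(shc, team_shots=71)

print(shc, round(shc_90, 2), round(pct_tsh, 2))
```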

Player Club POS GP GS MINS G A SHTS KP SH-C ShC-90 Total Team Shots %Tsh
Federico Higuain CLB F 5 5 448 4 2 18 17 37 7.43 71 52.11%
Pedro Morales VAN M 6 4 407 1 2 14 14 30 6.63 67 44.78%
Fabian Espindola DC F 5 5 441 1 2 9 13 24 4.90 55 43.64%
Mauro Diaz DAL M 6 6 515 2 3 10 14 27 4.72 62 43.55%
Robbie Keane LA F 4 4 360 3 1 17 11 29 7.25 68 42.65%
Mauro Rosales CHV M 6 6 540 0 3 9 15 27 4.50 64 42.19%
Lloyd Sam NY M 6 6 531 0 3 11 16 30 5.08 74 40.54%
Landon Donovan LA M-F 4 4 360 0 2 12 12 26 6.50 68 38.24%
Giles Barnes HOU M 5 5 437 0 1 21 5 27 5.56 71 38.03%
Diego Valeri POR M 6 6 518 1 0 18 14 32 5.56 85 37.65%
Erick Torres CHV F 6 6 524 5 0 18 6 24 4.12 64 37.50%
Thierry Henry NY F 4 4 360 1 0 19 8 27 6.75 74 36.49%
Shea Salinas SJ M 0 360 0 3 2 18 23 5.75 69 33.33%
Alvaro Saborio RSL F 6 6 540 3 0 18 4 22 3.67 66 33.33%
Felipe Martins MTL M 6 6 536 1 2 18 12 32 5.37 97 32.99%
Justin Mapp MTL M 6 6 540 0 3 11 18 32 5.33 97 32.99%
Gilberto TOR F 4 4 333 0 0 12 8 20 5.41 62 32.26%
Michael Bradley TOR M 343 1 0 5 15 20 5.25 62 32.26%
Deshorn Brown COL F 5 4 366 1 0 16 4 20 4.92 62 32.26%
Teal Bunbury NE F 6 6 540 0 1 14 9 24 4.00 75 32.00%
Clint Dempsey SEA M 4 3 303 6 1 20 5 26 7.72 84 30.95%
Obafemi Martins SEA F 6 6 531 1 4 10 12 26 4.41 84 30.95%
Quincy Amarikwa CHI F 6 5 475 3 1 14 11 26 4.93 84 30.95%
Eddie Johnson DC F 5 5 441 0 0 11 6 17 3.47 55 30.91%
Graham Zusi KC F-M 4 4 360 1 2 8 14 24 6.00 78 30.77%
Diego Fagundez NE M-F 6 6 539 0 0 19 4 23 3.84 75 30.67%
Darren Mattocks VAN F 6 6 490 1 2 11 7 20 3.67 67 29.85%
Chris Wondolowski SJ F-M 4 4 360 3 0 17 3 20 5.00 69 28.99%
Joao Plata RSL F 3 3 207 2 2 9 8 19 8.26 66 28.79%
Leo Fernandes PHI F 5 3 326 2 1 12 8 21 5.80 74 28.38%
Dom Dwyer KC F 5 4 340 2 0 19 3 22 5.82 78 28.21%
Brad Davis HOU M 311 0 2 3 15 20 5.79 71 28.17%
Marco Di Vaio MTL F 3 3 270 1 1 22 4 27 9.00 97 27.84%
Fabian Castillo DAL F 6 6 539 2 0 15 2 17 2.84 62 27.42%
Will Johnson POR M 6 6 539 1 0 18 5 23 3.84 85 27.06%
Maurice Edu PHI M 6 6 540 2 1 11 8 20 3.33 74 27.03%
Hector Jimenez CLB M 5 5 433 0 2 6 10 18 3.74 71 25.35%
Lamar Neagle SEA F 6 5 416 1 2 13 6 21 4.54 84 25.00%
Mike Magee CHI F 4 4 360 1 2 13 6 21 5.25 84 25.00%
Will Bruin HOU F 5 5 449 3 1 11 5 17 3.41 71 23.94%
Bernardo Anor CLB M 5 5 416 2 0 13 4 17 3.68 71 23.94%
Kenny Miller VAN F 6 5 447 3 1 9 6 16 3.22 67 23.88%
Cristian Maidana PHI M 5 4 293 0 2 8 7 17 5.22 74 22.97%
Vincent Nogueira PHI M 6 6 540 1 1 10 6 17 2.83 74 22.97%
Michel DAL M-D 6 3 312 3 1 9 4 14 4.04 62 22.58%
Vicente Sanchez COL F 3 2 193 4 0 6 8 14 6.53 62 22.58%
Darlington Nagbe POR F-M 6 6 490 0 1 5 13 19 3.49 85 22.35%
Juninho LA M 4 4 358 0 2 7 6 15 3.77 68 22.06%
Benny Feilhaber KC M 5 5 449 1 1 7 9 17 3.41 78 21.79%
Kyle Beckerman RSL M 6 6 540 2 2 7 5 14 2.33 66 21.21%

 

The list below is sorted by ShC-90, shots created per 90 minutes. The one stipulation I would make is to be careful with some of the numbers. Guys like Justin Meram end up at the top of the list after playing just 58 minutes and scoring a goal in that short time. This leads to incorrect perceptions of certain players, and to horrible, trite narratives like "Justin Meram is the most underrated player ever." That might be true, but probably not. Just look out for small sample sizes.
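One common way to guard against that problem is a minimum-minutes cutoff. The sketch below uses a handful of rows from the table; the 180-minute threshold is an arbitrary value picked for illustration, not something used in the list itself.

```python
# (player, minutes, shots created) for a few rows of the table.
rows = [
    ("Justin Meram", 58, 8),
    ("Yannick Djalo", 56, 7),
    ("Marco Di Vaio", 270, 27),
    ("Joao Plata", 207, 19),
    ("Clint Dempsey", 303, 26),
]


def rank_by_shc90(rows, min_minutes=0):
    """Return (player, ShC-90) pairs sorted from highest to lowest rate,
    keeping only players with at least `min_minutes` played."""
    eligible = [
        (name, created / minutes * 90)
        for name, minutes, created in rows
        if minutes >= min_minutes
    ]
    return sorted(eligible, key=lambda pair: pair[1], reverse=True)


unfiltered = rank_by_shc90(rows)
filtered = rank_by_shc90(rows, min_minutes=180)  # hypothetical cutoff

for name, rate in filtered:
    print(f"{name}: {rate:.2f}")
```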

 

Player Club POS GP GS MINS G A SHTS KP SH-C ShC-90 Total Team Shots %Tsh
Justin Meram CLB M 5 0 58 1 1 5 2 8 12.41 71 11.27%
Yannick Djalo SJ M 2 0 56 0 0 6 1 7 11.25 69 10.14%
Marco Di Vaio MTL F 3 3 270 1 1 22 4 27 9.00 97 27.84%
Joao Plata RSL F 3 3 207 2 2 9 8 19 8.26 66 28.79%
Clint Dempsey SEA M 4 3 303 6 1 20 5 26 7.72 84 30.95%
Federico Higuain CLB F 5 5 448 4 2 18 17 37 7.43 71 52.11%
Robbie Keane LA F 4 4 360 3 1 17 11 29 7.25 68 42.65%
Kekuta Manneh VAN F 6 1 167 1 0 11 2 13 7.01 67 19.40%
Thierry Henry NY F 4 4 360 1 0 19 8 27 6.75 74 36.49%
Pedro Morales VAN M 6 4 407 1 2 14 14 30 6.63 67 44.78%
Vicente Sanchez COL F 3 2 193 4 0 6 8 14 6.53 62 22.58%
Landon Donovan LA M-F 4 4 360 0 2 12 12 26 6.50 68 38.24%
Graham Zusi KC F-M 4 4 360 1 2 8 14 24 6.00 78 30.77%
Dillon Serna COL M 2 1 106 0 1 5 1 7 5.94 62 11.29%
Dom Dwyer KC F 5 4 340 2 0 19 3 22 5.82 78 28.21%
Leo Fernandes PHI F 5 3 326 2 1 12 8 21 5.80 74 28.38%
Brad Davis HOU M 311 0 2 3 15 20 5.79 71 28.17%
Shea Salinas SJ M 0 360 0 3 2 18 23 5.75 69 33.33%
Giles Barnes HOU M 5 5 437 0 1 21 5 27 5.56 71 38.03%
Diego Valeri POR M 6 6 518 1 0 18 14 32 5.56 85 37.65%
Gilberto TOR F 4 4 333 0 0 12 8 20 5.41 62 32.26%
Felipe Martins MTL M 6 6 536 1 2 18 12 32 5.37 97 32.99%
Justin Mapp MTL M 6 6 540 0 3 11 18 32 5.33 97 32.99%
Mike Magee CHI F 4 4 360 1 2 13 6 21 5.25 84 25.00%
Michael Bradley TOR M 343 1 0 5 15 20 5.25 62 32.26%
Cristian Maidana PHI M 5 4 293 0 2 8 7 17 5.22 74 22.97%
Lloyd Sam NY M 6 6 531 0 3 11 16 30 5.08 74 40.54%
Chris Wondolowski SJ F-M 4 4 360 3 0 17 3 20 5.00 69 28.99%
Quincy Amarikwa CHI F 6 5 475 3 1 14 11 26 4.93 84 30.95%
Deshorn Brown COL F 5 4 366 1 0 16 4 20 4.92 62 32.26%
Fabian Espindola DC F 5 5 441 1 2 9 13 24 4.90 55 43.64%
Jermain Defoe TOR F 3 3 242 3 0 11 2 13 4.83 62 20.97%
Mauro Diaz DAL M 6 6 515 2 3 10 14 27 4.72 62 43.55%
Lamar Neagle SEA F 6 5 416 1 2 13 6 21 4.54 84 25.00%
Kelyn Rowe NE M 2 2 179 0 0 6 3 9 4.53 75 12.00%
Mauro Rosales CHV M 6 6 540 0 3 9 15 27 4.50 64 42.19%
Obafemi Martins SEA F 6 6 531 1 4 10 12 26 4.41 84 30.95%
Steven Lenhart SJ F 3 3 258 0 0 9 3 12 4.19 69 17.39%
Erick Torres CHV F 6 6 524 5 0 18 6 24 4.12 64 37.50%
Saer Sene NE M 6 4 286 0 0 8 5 13 4.09 75 17.33%
Michel DAL M-D 6 3 312 3 1 9 4 14 4.04 62 22.58%
Teal Bunbury NE F 6 6 540 0 1 14 9 24 4.00 75 32.00%
Juan Luis Anangono CHI F 6 1 113 1 0 5 0 5 3.98 84 5.95%
Sal Zizzo KC F 5 4 367 0 2 9 5 16 3.92 78 20.51%
Dwayne De Rosario TOR M 5 3 254 0 0 10 1 11 3.90 62 17.74%
Bradley Wright-Phillips NY F 5 2 278 1 0 9 3 12 3.88 74 16.22%
Diego Fagundez NE M-F 6 6 539 0 0 19 4 23 3.84 75 30.67%
Will Johnson POR M 6 6 539 1 0 18 5 23 3.84 85 27.06%
David Texeira DAL F 5 2 211 1 0 6 3 9 3.84 62 14.52%
Marco Pappa SEA M 4 2 165 0 0 6 1 7 3.82 84 8.33%

 

Overall, we're still just getting used to this statistic, but it seems like it could help dig a little deeper into valuing those players that don't always directly put the goal in the back of the net, but still play a key role in their teams' abilities to do so.

Location Adjusted Total Shots Ratio by Drew Olsen

Millionaire Malcolm Forbes was famous for his quote, "He who dies with the most toys wins." And while that might not be the most moral mantra for life, sports fans have a hard time arguing with the logic. After all, a game is about runs, points or goals, and after enough of those it's about shiny trophy cases. But in the world of sports analysis there is no such victory in the absolute. Analysts need to explain how those runs, points or goals came about. In the world of soccer especially, there is never a complete answer. Goals are exceedingly rare, so explaining mathematically how they grace us with their presence is difficult, to say the least. We're happy with higher R-squareds and other such geeky descriptive metrics. Have you ever seen a trophy case filled with strong correlations? Nope, all we get is a little blog post and, if we're lucky, some Twitter praise. Still, we search.

One of the more popular explanations for winning in soccer is Total Shots Ratio, which calculates the percentage of shots taken by a team in the games played by that team. A 60% TSR means that a given team took 60% of the total shots fired in the games it played. The logic isn't all that difficult to wrap your head around. If you can take more shots than your opponent, you are likely to score more goals. For the English Premier League, TSR explains 68% of the variance in the point table, which is impressive for one statistic. TSR happens to be less important in MLS.
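As a quick illustration of the definition, here is a minimal sketch of TSR. The season totals are hypothetical.

```python
def total_shots_ratio(shots_for, shots_against):
    """Share of all shots in a team's games that were taken by that team."""
    return shots_for / (shots_for + shots_against)


# Hypothetical season totals: 480 shots taken, 320 conceded.
tsr = total_shots_ratio(shots_for=480, shots_against=320)
print(f"TSR = {tsr:.0%}")
```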

data sources: AmericanSoccerAnalysis, mlssoccer.com

In MLS, TSR explains just 37% of the variance in points. This is likely due to the lower finishing rates in MLS compared to the EPL, which render shots less effective. But there are probably a number of other reasons why TSR is less predictive of points in MLS. A larger percentage of teams employ counterattacking strategies, which have a significant impact on finishing rates and would in turn alter the effectiveness of TSR. But what if the shots were weighted to account for their location? It would be logical to assume that better teams take better shots and make it more difficult for the opposing shooters. But does that logic actually manifest itself when predicting points? ASA's Expected Goals 1.0 worked pretty well, so a TSR adjusted for shot locations ought to work better than the original TSR.

The first thing required is a fair weighting of shots by location. To get one, I divided the finishing rate at each location by the average finishing rate. Here is the resulting table for adjusting the value of shots.

Location Weighting
1 3.14
2 1.79
3 0.72
4 0.54
5 0.24

For the sake of simplicity I have collapsed zones 5 and 6 into a fifth zone. This table illustrates that a shot from zone 1 (inside the 6-yard box) is actually worth 3.14 average shots, while a shot from zone 5 is worth just 0.24 average shots. Adjusting all of the shots in MLS in 2013 yields the following result when attempting to predict table points.
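Here is a minimal sketch of how the weighting could be applied. The weights are taken from the table above; the per-zone shot counts and the function names are hypothetical, and the post does not spell out its exact calculation.

```python
# Location weights from the table above (zone -> value in "average shots").
ZONE_WEIGHTS = {1: 3.14, 2: 1.79, 3: 0.72, 4: 0.54, 5: 0.24}


def zone_weight(zone_finishing_rate, average_finishing_rate):
    """How a weight is derived: the zone's finishing rate relative to the average."""
    return zone_finishing_rate / average_finishing_rate


def weighted_shots(shots_by_zone):
    """Total shots after valuing each shot by its zone weight."""
    return sum(ZONE_WEIGHTS[zone] * count for zone, count in shots_by_zone.items())


def location_adjusted_tsr(shots_for, shots_against):
    """TSR computed on location-weighted shots instead of raw shot counts."""
    weighted_for = weighted_shots(shots_for)
    weighted_against = weighted_shots(shots_against)
    return weighted_for / (weighted_for + weighted_against)


# Hypothetical shot counts by zone; both teams take 12 shots, so raw TSR is 50%.
team = {1: 1, 2: 5, 3: 2, 4: 3, 5: 1}
opponent = {1: 0, 2: 3, 3: 3, 4: 4, 5: 2}

adjusted = location_adjusted_tsr(team, opponent)
print(f"location-adjusted TSR = {adjusted:.1%}")
```

In this made-up example the team takes the same number of shots as its opponent but from closer in, so its adjusted ratio rises above 50%.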

data sources: AmericanSoccerAnalysis, mlssoccer.com

You can tell just from eyeballing the dispersion of the data points that the location-adjusted TSR aligns better with points, and the R-squared agrees: there is a 17-percent increase in R-squared. Not just the pure volume of shots but also the location of those shots is vital to predicting points in MLS. It would be interesting to see if location is equally important in the EPL, where TSR is already such a strong predictor.
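For anyone wanting to run the same comparison on their own numbers, here is a minimal sketch of R-squared for a straight-line fit with one predictor, which equals the squared Pearson correlation. The TSR and points values are hypothetical and are not the 2013 MLS data.

```python
def r_squared(xs, ys):
    """R-squared of a one-predictor linear fit (the squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical team-level values: a shot ratio and season points.
ratios = [0.44, 0.47, 0.50, 0.53, 0.56]
points = [38, 45, 44, 52, 57]

print(f"R-squared = {r_squared(ratios, points):.2f}")
```

Computing this once with raw TSR and once with the location-adjusted version gives the two numbers being compared.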

For the curious, the New York Red Bulls were the team that was best at getting better shots than their opponent. Their TSR improved from 47% to 52% when adjusting for shot location. Real Salt Lake actually took the biggest hit. Their TSR was 53% and their location-adjusted TSR dropped to 48%.

It's only one season's worth of data, but with such an impressive increase in the ability to explain the variance in point totals, it suggests that location does matter, and that teams are rewarded for taking better shots themselves while pushing their opponents out farther from goal. And perhaps soccer analysts have another statistical toy to add to the toy box: Location-Adjusted Total Shots Ratio.

In Defense of the San Jose Earthquakes and American Soccer by Drew Olsen

Note: This is part II of the post using a finishing rate model and the binomial distribution to analyze game outcomes. Here is part I.

As if American soccer fans weren’t beaten down enough with the removal of 3 MLS clubs from the CONCACAF Champions League, Toluca coach Jose Cardozo questioned the growth of American soccer and criticized the strategy the San Jose Earthquakes employed during Toluca’s penalty-kick win last Wednesday. Mark Watson’s team clearly packed it in defensively and looked to play “1,000 long balls” on the counterattack. It certainly doesn’t make for beautiful, fluid soccer, but was it a smart strategy? Are the Earthquakes really worthy of the criticism?

Perhaps it’s fitting that Toluca is almost 10,000 feet above sea level because at that level the strategy did look like a disaster. Toluca controlled the ball for 71.8% of the match and ripped off 36 shots to the Earthquakes' 10. It does appear that San Jose was indeed lucky to be sitting at 1-1 at the end of the match. The fact that Toluca scored only one goal from those 36 shots must have been down to either bad luck or great defense, right? Or could it possibly have been expected?

The prior post examined using the binomial distribution to predict goals scored, and again one of the takeaways was that the finishing rates and expected goals scored in a match decline as shots increase, as seen below. This is a function of "defensive density," I’ll call it, or basically how many players a team is committing to defense. When more players are committed to defending, the offense has the ball more and ultimately takes more shots. But due to the defensive intensity, the offense is less likely to score on each shot.

 source: AmericanSoccerAnalysis

Mapping that curve to an expected goals chart, you can see that the Earthquakes' expected goals are not that different from Toluca’s despite the extreme shot differential.

source data: AmericanSoccerAnalysis

Given this shot distribution, let’s apply the binomial distribution model to determine the probability of San Jose advancing to the semifinals of the Champions League. I’m going to use the actual shots and the expected finishing rate to model the outcomes. The actual shots taken can be controlled through Mark Watson’s strategy, but it's best to use expected finishing rates to simulate what outcomes the Earthquakes were striving for. Going into the match the Earthquakes needed a 1-1 draw to force a shootout. Any better result would have seen them advancing and anything worse would have seen them eliminated.

Inputs:

Toluca Shots: 36

Toluca Expected Finishing Rate: 3.6%

San Jose Shots: 10

San Jose Expected Finishing Rate: 11.2%

Outcomes:

Toluca Win: 39.6%

Toluca 0-0 Draw: 8.3%

Toluca 1-1 Draw: 13.9% x 50% PK Toluca = 6.9%

Total Probability Toluca advances= 54.9%

 

San Jose Win: 32.3%

2-2 or higher Draw = 5.8%

San Jose 1-1 Draw: 13.9% x 50% PK San Jose = 6.9%

Total Probability San Jose Advances = 45.1%
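The calculation can be sketched as follows, assuming the two teams' goal counts are independent binomial variables with the inputs listed above and a 50% shootout. The variable names are mine. Because the listed finishing rates are rounded, the results land close to, though not exactly on, the percentages above.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities of scoring 0..n goals from n shots, each converted with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


toluca = binomial_pmf(36, 0.036)    # Toluca: 36 shots, 3.6% expected finishing rate
san_jose = binomial_pmf(10, 0.112)  # San Jose: 10 shots, 11.2% expected finishing rate

toluca_win = san_jose_win = draw_0_0 = draw_1_1 = draw_high = 0.0
for t_goals, p_t in enumerate(toluca):
    for s_goals, p_s in enumerate(san_jose):
        joint = p_t * p_s  # assumes the two goal counts are independent
        if t_goals > s_goals:
            toluca_win += joint
        elif s_goals > t_goals:
            san_jose_win += joint
        elif t_goals == 0:
            draw_0_0 += joint   # Toluca advances
        elif t_goals == 1:
            draw_1_1 += joint   # goes to penalty kicks
        else:
            draw_high += joint  # 2-2 or higher: San Jose advances

PK_TOLUCA = 0.5  # assumed coin-flip shootout, as in the list above
toluca_advances = toluca_win + draw_0_0 + PK_TOLUCA * draw_1_1
san_jose_advances = san_jose_win + draw_high + (1 - PK_TOLUCA) * draw_1_1

print(f"Toluca advances:   {toluca_advances:.1%}")
print(f"San Jose advances: {san_jose_advances:.1%}")
```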

 

The odds of San Jose advancing with that strategy are clearly not as bad as the 10,000-foot level might indicate. Counterattacking soccer certainly isn’t pretty, but it wouldn’t still exist if it weren’t considered a solid strategy.

It’s difficult, but we can also try to simulate what a “normal” possession-based strategy might have looked like in Toluca. In MLS the average possession for the home team this year is 52.5%, yielding 15.1 shots per game. In Liga MX play, Toluca is averaging only about 11.4 shots per game, so it is not a prolific shooting team. It is finishing at an excellent 15.2%, though, which could be the reason San Jose attempted to pack it in defensively. The away team in MLS is averaging 10.4 shots per game. If we assume that a more possession-oriented strategy would have resulted in a typical MLS game, then we have the following expected goals outcomes.

source data: AmericanSoccerAnalysis

Notice that the expected goal differential is actually worse for San Jose by 0.05 goals. Though the difference may not be statistically significant, at the very least we can say that San Jose's strategy was not ridiculous.

Re-running the expected outcomes with the above scenario reveals that San Jose advances 43.3% of the time, compared with 45.1% under the strategy it actually used. An approach that raised the probability of advancing by 1.8 percentage points did not deserve any criticism, and definitely not such harsh criticism. It shows that the Earthquakes probably weren’t wrong in their approach to the match. And if we had factored in a higher finishing rate for Toluca, the probabilities would favor the counterattack strategy even more.

Even though the US struck out again in the CONCACAF Champions League, Americans don't need to take abuse for their style of play. After all, soccer is about winning and, in the case of a tie, advancing. We shouldn't be ashamed or be criticized when we do whatever it takes to move on.

Predicting Goals Scored using the Binomial Distribution by Drew Olsen

Much is made of the use of the Poisson distribution to predict game outcomes in soccer. Much less attention is paid to the use of the binomial distribution. The reason is a matter of convenience. To predict goals using a Poisson distribution, “all” that is needed is the expected goals scored (lambda). To use the binomial distribution, you need to know both the number of shots taken (n) and the rate at which those shots are turned into goals (p). But if you have sufficient data, it may be a better way to analyze certain tactical decisions in a match.

First, let’s examine whether the binomial distribution is actually dependable as a model framework. Here is the chart that shows how frequently a certain number of shots were taken in an MLS match.

source data: AmericanSoccerAnalysis

The chart resembles a binomial distribution with right skew with the exception of the big bite taken out of the chart starting with 14 shots. How many shots are taken in a game is a function of many things, not the least of which are tactical decisions made by the club. For example it would be difficult to take 27 shots unless the opposing team were sitting back and defending and not looking to possess the ball. Deliberate counterattacking strategies may very well result in few shots taken but the strategy is supposed to provide chances in a more open field.

Out of curiosity, let’s look at the average shot location by shots taken to see if there are any clues about the influence of tactics. To estimate this I looked at expected goals for each shot total. This does not have any direct influence on the binomial analysis but could come in useful when we look for applications.

source: AmericanSoccerAnalysis

The average MLS finishing rate was just over 10 percent in 2013. You can see that, at more than 10 shots per game, the expected finishing rate stays constant right at that 10-percent rate. This indicates that above 10 shots, the location distribution of those shots is typical of MLS games. However, at fewer than 10 shots you can see that the expected goal scoring rate dips consistently below 10%. This indicates that teams that take fewer shots in a game also take those shots from worse locations on average.

The next element in the binomial distribution is the actual finishing rate by number of shots taken.

 source: AmericanSoccerAnalysis

Here it’s plain that the number of shots taken has a dramatic impact on the accuracy rate of each shot. This speaks to the tactics and pace of play involved in taking different shot amounts. A team able to squeeze off more than 20 shots is likely facing a packed box and a defense less interested in ball possession. What’s fascinating then is that teams that take few shots in a game have a significantly higher rate of success despite the fact that they are taking shots from farther out. This indicates that those teams are taking shots with significantly less pressure. This could indicate shots taken during a counterattack where the field of play is more wide open.

Combining the finishing accuracy model curve with number of shots we can project expected goals per game based on number of shots taken.

Chart: expected goals by number of shots taken

What’s interesting here is that the expected number of goals scored plateaus at about 18 shots and begins to decline after 23 shots. This, of course, must be a function of the intensity of the defense they are facing for those shots because we know their shot location is not significantly different. This model is the basis by which I will simulate tactical decisions throughout a game in Part II of this post.

Now we have the two key pieces to see if the binomial distribution is a good predictor of goals scored using total shots taken and finishing rate by number of shots taken. As a refresher, since most of us haven’t taken a stat class in a while, the probability mass function of the binomial distribution looks like the following:

P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), with C(n, k) = n! / (k! * (n - k)!)

source: wikipedia

Where:

n is the number of shots

p is the probability of success in each shot

k is the number of successful shots

Below I compare the actual distribution to the binomial distribution using 13 shots (since 13 is the mode number of shots from 2013’s data set), assuming a 10.05% finishing rate.

source data: AmericanSoccerAnalysis, Finishing Rate model

The binomial distribution underpredicts scoring 2 goals and overpredicts all other outcomes. Overall the expected goals are close (1.369 actual to 1.362 binomial). The Poisson is similar to the binomial, but the average error of the binomial is 12% lower than that of the Poisson.
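To see how the two distributions differ in shape, here is a minimal sketch that compares a binomial with 13 shots and a flat 10.05% finishing rate against a Poisson with the same mean. Matching the means this way is my assumption; the figures quoted above come from the author's data and finishing rate model, which this sketch does not reproduce.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of exactly k goals from n shots, each scored with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals arrive at an average of lam per game."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 13, 0.1005
lam = n * p  # assumption: give the Poisson the same expected goals as the binomial

for goals in range(5):
    b = binomial_pmf(goals, n, p)
    q = poisson_pmf(goals, lam)
    print(f"{goals} goals: binomial {b:.3f}, Poisson {q:.3f}")
```

With equal means the binomial has the smaller variance, n*p*(1 - p) against n*p, so it puts slightly less weight on the extremes than the Poisson does.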

If we take the average of these distributions between 8 and 13 shots (where the sample size is greater than 40) the bumps smooth out.

source data: AmericanSoccerAnalysis, Finishing Rate model

The binomial distribution seems to do well at projecting the actual number of goals scored in a game, and the average binomial error is 23% lower than with the Poisson. Looking individually at each shot total from 7 to 16, the binomial has 19% lower error if we observe only goal outcomes of 0 and 1.

But so what? Isn’t it near impossible to predict the number of shots a team will take in a game? It is. But there may be tactical decisions, like counterattacking, where we can look at shots taken and determine whether the strategy was correct. And a model whose final stage of estimation is governed by the binomial distribution appears to be a compelling one for that analysis. In part II I will explore some possible applications of the model.

Jared Young writes for Brotherly Game, SB Nation's Philadelphia Union blog. This is his first post for American Soccer Analysis, and we're excited to have him!