Beyond xG: Using PFF Shooting Grades to improve goalscoring predictions in MLS

By Alexander Schram

Editor’s note: This is a submission from a PFF FC employee. PFF FC uses proprietary methods/models to produce their analysis. The methodology described by the author, which combines well-organized subjective opinion with quantifiable metrics to augment analysis, is interesting. ASA was not paid to publish this article and publication does not constitute an endorsement of PFF FC.

Expected goals (xG) has become the most widely used metric in football analytics over the last decade. In short, xG models, such as the one developed by American Soccer Analysis, calculate how many goals a team should have scored based on the characteristics of the shots they have taken. Penalties provide the easiest example: roughly 76% of penalties are converted, so each penalty has an xG value of 0.76. Many articles have shown that expected goals are preferable to actual goals when evaluating performance, because the metric is generally much more stable given how rare actual goals are.

One limitation of xG is that the models do not take into account the player taking the shot. Not all players are created equal: a shot taken by Hany Mukhtar, the 2022 MLS MVP, is more likely to end up in the net than a shot taken by, say, Nouhou Tolo, the legendary Sounders defender who isn’t particularly known for his finishing skills. However, a model that only takes into account a player’s past xG values will predict the same number of future goals for Mukhtar and Tolo in the unlikely, but not impossible, case that both have produced similar shots in a given sample. And while finishing has been shown to not necessarily be a repeatable skill, it’s safe to say that if we want to predict future goals, adding shooting ability seems like a good idea.

In come PFF Shooting Grades. The PFF player grading system evaluates every player on every event during a game, including every shot. Each player is given a grade from -2.0 to +2.0 in 0.5 increments on a given play, with 0 generally being the average or “expected” grade. The plus-minus grades, which involve manual scoring of each event by a human, are transformed onto a 0-100 scale at the game and season level to make them more intuitive. For a detailed breakdown of how PFF player grades are calculated, read this article. The 2021 and 2022 leaderboards for shooting are given below (Mukhtar is the only player in the top 10 in both years; he currently ranks 12th in the ongoing 2023 season).

2021

Rank | Player | Team | PFF Shooting Grade
1 | Damir Kreilach | Real Salt Lake | 91.9
2 | Ricardo Pepi | FC Dallas | 90.9
3 | Felipe Mora | Portland Timbers | 90.6
4 | Ola Kamara | D.C. United | 90.3
5 | Hany Mukhtar | Nashville SC | 90.1
6 | Johnny Russell | Sporting Kansas City | 89.0
7 | Chicharito | LA Galaxy | 88.8
8 | Daniel Salloi | Sporting Kansas City | 87.6
9 | Diego Fagúndez | Austin FC | 86.6
10 | Daryl Dike | Orlando City SC | 86.3

2022

Rank | Player | Team | PFF Shooting Grade
1 | Hany Mukhtar | Nashville SC | 99.2
2 | Lewis Morgan | New York Red Bulls | 91.5
3 | Daniel Gazdag | Philadelphia Union | 90.2
4 | Dejan Joveljic | LA Galaxy | 88.5
5 | Jesús Ferreira | FC Dallas | 88.0
6 | Sebastián Driussi | Austin FC | 87.7
7 | Thiago Almada | Atlanta United | 87.4
8 | Gonzalo Higuaín | Inter Miami CF | 87.1
9 | Carlos Vela | Los Angeles FC | 86.7
10 | Mikael Uhre | Philadelphia Union | 85.5

So let’s try to build a model. Before we start, we are not expecting miracles here. Predicting football games is harder than predicting the stock market, so we are bound to find out that we are still missing a lot of information that influences future goalscoring.

We’ll start by using stats from the previous 5 games to predict the number of goals scored in the next 5 games. Goalkeepers are excluded from the dataset, obviously. We combine data from ASA, which is easily retrieved through their API, with PFF data, and focus our attention on the 2021 and 2022 seasons in MLS. The idea is simple: future goalscoring depends both on being able to get yourself into scoring positions (measured by xG) and on being able to shoot the ball (measured by PFF Shooting Grades).
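As a rough illustration of how such a dataset can be put together, here is a minimal pandas sketch. The file names, frame layouts and column names (player_id, game_date, xg, shooting_grade, and so on) are assumptions made for illustration rather than the actual ASA or PFF schema, and the rolling-window logic is just one plausible reading of “previous 5 games / next 5 games”. The later sketches all work off this hypothetical sample frame.

```python
import pandas as pd

# Hypothetical per-player, per-game inputs (placeholder file names, not the real ASA/PFF exports):
# asa_games: player_id, game_date, minutes, goals, xg
# pff_games: player_id, game_date, shooting_grade
asa_games = pd.read_csv("asa_player_games.csv", parse_dates=["game_date"])
pff_games = pd.read_csv("pff_shooting_grades.csv", parse_dates=["game_date"])

df = (asa_games.merge(pff_games, on=["player_id", "game_date"], how="inner")
               .sort_values(["player_id", "game_date"]))

W = 5  # previous W games -> goals in the next W games
grp = df.groupby("player_id")

# Features: totals/averages over the previous W games (up to and including the current game).
df["goals_prev"]   = grp["goals"].transform(lambda s: s.rolling(W).sum())
df["xg_prev"]      = grp["xg"].transform(lambda s: s.rolling(W).sum())
df["minutes_prev"] = grp["minutes"].transform(lambda s: s.rolling(W).sum())
df["grade_prev"]   = grp["shooting_grade"].transform(lambda s: s.rolling(W).mean())

# Per-90 rates over the previous window.
df["goals_p90"] = 90 * df["goals_prev"] / df["minutes_prev"]
df["xg_p90"]    = 90 * df["xg_prev"] / df["minutes_prev"]

# Target: goals scored in the next W games (forward-looking rolling sum).
df["goals_next"] = grp["goals"].transform(lambda s: s.shift(-W).rolling(W).sum())

sample = df.dropna(subset=["goals_p90", "xg_p90", "grade_prev", "goals_next"])
```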

First we need to make sure that xG and PFF Shooting Grades are not measuring the same thing. In the plot below, we see a weak linear relationship between the total xG and the average PFF Shooting Grade over the previous 5 games. However, a LOWESS fit suggests that outliers are playing a role: for xG totals north of 1.0 there seems to be no relationship between the two, while there is still plenty of variation left in PFF Shooting Grades for xG totals below 1.0, which suggests the two are largely independent measures. But how much does each contribute to goalscoring?
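A minimal sketch of that check, working off the hypothetical sample frame from the previous sketch and using the LOWESS smoother from statsmodels:

```python
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

x = sample["xg_prev"]     # total xG over the previous 5 games
y = sample["grade_prev"]  # average PFF Shooting Grade over the previous 5 games

print(sample[["xg_prev", "grade_prev"]].corr())  # strength of the linear relationship

smooth = lowess(y, x, frac=0.3)  # fitted LOWESS curve, sorted by x

plt.scatter(x, y, s=5, alpha=0.3)
plt.plot(smooth[:, 0], smooth[:, 1], color="red")
plt.xlabel("xG, previous 5 games")
plt.ylabel("Avg. PFF Shooting Grade, previous 5 games")
plt.show()
```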

Now that we’ve gotten that out of the way, let’s look at some simple ordinary least squares (OLS) models. The table below shows the R-squared, F-statistic and Akaike Information Criterion (AIC) for several of them. The results confirm that a model with xG per 90 as a single explanatory variable outperforms one with goals per 90. They also show that xG per 90 is a better predictor of future goals than the PFF Shooting Grades alone. However, combining variables is the way to go: combining xG per 90 with the PFF Shooting Grades improves the performance of the model, while adding goals per 90 as a third explanatory variable brings little extra to the table.

Included variables | R-squared | F-statistic | AIC
Goals per 90 | 0.053 | 1874.9 | 75011
xG per 90 | 0.073 | 2673.8 | 74262
Avg. PFF Shooting Grades | 0.059 | 2130.2 | 74769
Goals per 90 & xG per 90 | 0.08 | 1475.7 | 74008
Goals per 90 & Avg. PFF Shooting Grades | 0.075 | 1371.4 | 74200
xG per 90 & Avg. PFF Shooting Grades | 0.102 | 1926.1 | 73189
All 3 variables | 0.103 | 1287.2 | 73183

If we take the last model and multiply the coefficient of each variable by its standard deviation in our sample, we get an absolute impact value representing that variable's contribution to the prediction. Normalizing the impact values by dividing them by the sum of all impacts gives a measure of each variable's importance relative to the others. In our case, xG per 90 has the highest impact (51.9%), followed by the PFF Shooting Grades (42.9%) and goals per 90 (5.2%).
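As a sketch of both steps, the specifications from the table can be fitted with statsmodels, and the coefficients of the three-variable model can then be turned into normalized impact shares. Column names carry over from the earlier sketches; the exact specifications behind the published numbers may differ.

```python
import statsmodels.formula.api as smf

specs = {
    "Goals per 90": "goals_next ~ goals_p90",
    "xG per 90": "goals_next ~ xg_p90",
    "Avg. PFF Shooting Grades": "goals_next ~ grade_prev",
    "All 3 variables": "goals_next ~ goals_p90 + xg_p90 + grade_prev",
}
fits = {name: smf.ols(formula, data=sample).fit() for name, formula in specs.items()}
for name, fit in fits.items():
    print(f"{name}: R2={fit.rsquared:.3f}, F={fit.fvalue:.1f}, AIC={fit.aic:.0f}")

# Relative importance: |coefficient| x sample standard deviation, normalized to sum to 1.
full = fits["All 3 variables"]
impacts = {v: abs(full.params[v]) * sample[v].std()
           for v in ["goals_p90", "xg_p90", "grade_prev"]}
total = sum(impacts.values())
for v, impact in impacts.items():
    print(f"{v}: {impact / total:.1%}")
```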

As a side note, changing the number of games (using the previous X games to predict the number of goals scored in the next X games) has the expected effect: including more games leads to better predictions at first, but the returns diminish. The increase in AIC when moving from a 5-game to a 10-game window indicates that the additional 5 games do not lead to a significant improvement in the model's fit to the data, and the extra complexity they introduce outweighs any potential benefit. Below are OLS outputs for models that include all three variables (a short sketch of this loop follows the table):

Number of games included | Observations | R-squared | F-statistic | AIC
1 | 29962 | 0.017 | 256.7 | 14661
3 | 33672 | 0.057 | 1012.3 | 54054
5 | 33744 | 0.103 | 1926.1 | 73189
10 | 30446 | 0.137 | 2413.8 | 91487
15 | 25190 | 0.127 | 1223.6 | 84689
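The loop behind a table like this is short; the sketch below assumes a hypothetical build_sample(W) helper that repeats the window construction shown earlier for an arbitrary window size W.

```python
import statsmodels.formula.api as smf

# build_sample(W) is a hypothetical helper: previous-W-game features, next-W-game goal target.
for W in [1, 3, 5, 10, 15]:
    s = build_sample(W)
    fit = smf.ols("goals_next ~ goals_p90 + xg_p90 + grade_prev", data=s).fit()
    print(f"W={W}: n={int(fit.nobs)}, R2={fit.rsquared:.3f}, "
          f"F={fit.fvalue:.1f}, AIC={fit.aic:.0f}")
```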

But our dataset doesn’t necessarily lend itself to OLS. Not only is our dependent variable a discrete count, our sample is also flooded with players taking a very limited number of shots. Knowing this, a Random Forest model is a better choice. We include goals per 90, xG per 90 and the average PFF Shooting Grade over the past 5 games to predict the number of goals a player will score in the next 5. This leads to an R-squared value of 0.115, which easily beats the values obtained when considering each variable individually.
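A minimal scikit-learn version of such a model might look as follows. The hyperparameters and the hold-out split are assumptions; the article does not specify how the model was tuned or validated.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

features = ["goals_p90", "xg_p90", "grade_prev"]
X, y = sample[features], sample["goals_next"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Hyperparameters are illustrative assumptions, not the settings used in the article.
rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=20, random_state=42)
rf.fit(X_train, y_train)

print("R-squared:", round(r2_score(y_test, rf.predict(X_test)), 3))
for name, importance in zip(features, rf.feature_importances_):
    print(f"{name}: {importance:.1%}")
```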

The Random Forest model also gives us an opportunity to compare feature importances. Again, xG per 90 wins the race (50.1%), though the split between the PFF Shooting Grades (30.2%) and goals per 90 (19.7%) is somewhat different from what we saw earlier. The feature importances also vary by position (see the short sketch after the table): the number of predicted future goals depends more on previous xG per 90 for defenders and midfielders, while actual goals per 90 come into play more for forwards. The importance of shooting ability, however, is relatively stable across all positions.

Position | Goals per 90 | xG per 90 | PFF Shooting Grades
Defenders | 8.60% | 57.10% | 34.40%
Midfielders | 18.90% | 47.10% | 34.00%
Forwards | 32.00% | 35.40% | 32.60%
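The per-position breakdown can be reproduced by fitting a separate forest for each position group; position here is a hypothetical column holding each player's primary position group (defender, midfielder or forward).

```python
from sklearn.ensemble import RandomForestRegressor

# "position" is a hypothetical primary-position column, not part of the schema sketched earlier.
features = ["goals_p90", "xg_p90", "grade_prev"]
for pos, grp in sample.groupby("position"):
    rf_pos = RandomForestRegressor(n_estimators=500, min_samples_leaf=20, random_state=42)
    rf_pos.fit(grp[features], grp["goals_next"])
    shares = ", ".join(f"{f}: {imp:.1%}" for f, imp in zip(features, rf_pos.feature_importances_))
    print(f"{pos}: {shares}")
```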

So both OLS and Random Forest confirm what we were expecting: predictions of future goalscoring are improved by adding information about shooting ability that isn’t captured in xG models. This makes a ton of sense and is in line with Ben Torvaney’s findings here, with the key difference that he did not have the PFF Shooting Grades at his disposal. Obviously, the predictions can be improved by including other relevant variables or by using more sophisticated methods, for example by incorporating team strength into the mix, but we found what we were looking for: evidence that adding a measure of shooting ability improves our predictions of future goalscoring.

Lastly, if we want to use our Random Forest model to predict who will score the most goals in the next 5 games, it is important to add some features that we know will improve our predictions. For example, if we don’t include playing time during the previous 5 games as a feature, subs who happened to be successful in limited minutes will get too much credit for their inflated per-90 stats. Furthermore, we know position plays a role, so let’s add each player’s primary position as well; a sketch of this extended model follows the importance table. The R-squared increases to 0.272, which is nice, and the feature importances are as expected (among the position features, being a striker matters most for goalscoring):

Feature | Importance
xG per 90 | 35.60%
Sum of minutes | 25.00%
Avg. PFF Shooting Grades | 20.90%
Goals per 90 | 11.20%
Position: ST | 3.80%
Position: W | 1.10%
Position: AM | 1.00%
Position: CB | 0.40%
Position: CM | 0.40%
Position: DM | 0.40%
Position: FB | 0.30%
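A sketch of this extended model: total minutes over the previous window and a one-hot encoded primary position are added to the feature set, and the fitted forest can then score each player's most recent 5-game window. Column names (position in particular) and hyperparameters remain illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# One-hot encode the hypothetical primary-position column and extend the feature set.
ext = pd.get_dummies(sample, columns=["position"], prefix="pos")
pos_cols = [c for c in ext.columns if c.startswith("pos_")]
features_ext = ["xg_p90", "minutes_prev", "grade_prev", "goals_p90"] + pos_cols

rf_ext = RandomForestRegressor(n_estimators=500, min_samples_leaf=20, random_state=42)
rf_ext.fit(ext[features_ext], ext["goals_next"])

for name, importance in sorted(zip(features_ext, rf_ext.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name}: {importance:.1%}")

# To rank players, score a frame holding each player's most recent 5-game feature window
# (built the same way as above, but where the next-5-game target is still unknown).
```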

So without further ado, we present the players who are predicted to score the most in the next 5 games:

Player | Team | Predicted Goals
Ryan Gauld | Vancouver Whitecaps | 2.47
Teal Bunbury | Nashville SC | 2.32
Carles Gil | New England Revolution | 2.30
Julián Carranza | Philadelphia Union | 2.08
Cucho Hernández | Columbus Crew | 1.97
Alan Pulido | Sporting Kansas City | 1.92
Evander | Portland Timbers | 1.84
Josef Martínez | Inter Miami CF | 1.82
Brandon Vázquez | FC Cincinnati | 1.81
Daniel Gazdag | Philadelphia Union | 1.77
Christian Benteke | D.C. United | 1.75
C.J. Sapong | Toronto FC | 1.67
Thiago Almada | Atlanta United | 1.65
Luciano Acosta | FC Cincinnati | 1.64
Hany Mukhtar | Nashville SC | 1.62