Data and sports have long been paired, but the depth and complexity of that pairing vary from sport to sport. Baseball’s start-stop rhythm of play has been conducive to the proliferation of data collection and the use of advanced statistics (sabermetrics), with the most notable early case being Billy Beane’s Oakland Athletics of the late 1990s and early 2000s (as popularized in the 2003 book, Moneyball, and its 2011 movie adaptation). Soccer, however, has been slower to embrace advanced statistics and data. This is largely due to its complexity, the continuous nature of play, and social/cultural resistance (“Soccer is played on a pitch, not a spreadsheet!”). With the technology now at our disposal, barriers to data collection and computation have been lifted and new avenues have opened up.
In my Louisville City writing, I make heavy reference to data and statistics. While some are fairly straightforward, like the number of shots or possession percentages, others are a bit more obscure. They require a little more background to understand and to apply properly to your reading of the team’s performances. This piece breaks down the most popular advanced soccer statistic, expected goals (xG).
Expected goals (or xG) measures the quality of a chance by calculating the likelihood that it will be scored from a particular position on the pitch during a particular phase of play. xG is measured on a scale between 0.0 and 1.0, where 0.0 represents a chance that is impossible to score and 1.0 represents a chance that a player would be expected to score every single time (which will never happen) (1). Think of it as a numerical value assigned to the “eye test”. 'Shots on goal' does not differentiate between a long-range strike and a missed open goal from two yards out, but xG does (2). Even the most uninformed viewer can watch those two sequences and pick which of the two “should have been a goal.” Regardless of whether the shot goes in, each shot is assigned a probability of resulting in a goal, based on historical data on shots taken from similar positions (3). This tool helps with comprehension of the game by showing how many goals an average team would be expected to score (3) based on the distance to the goal, angle, body part (e.g., header or foot), type of assist (e.g., through ball, cross, pull-back, etc.), and pattern of play (e.g., open play, fast break, direct free kick, corner kick, throw-in, etc.) (1). Different leagues have different data availability. While the “Big 5” European leagues like the Premier League and La Liga will have a greater depth of data to utilize, smaller leagues like the USL will generally have fewer of these components available.
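To make the idea of turning shot characteristics into a probability more concrete, here is a minimal sketch of a toy xG model in Python. It is not any provider's actual model: the logistic form is one common way to map features to a 0-to-1 probability, and every coefficient below is a made-up value chosen only for illustration. Real models are fitted to large sets of historical shots and use many more inputs (assist type, pattern of play, and so on).

```python
import math

# Hypothetical coefficients, invented for illustration only.
# A real model would estimate these from historical shot data.
INTERCEPT = 0.3
COEF_DISTANCE = -0.12  # per yard from the center of the goal
COEF_ANGLE = 1.5       # per radian of goal mouth visible to the shooter
COEF_HEADER = -0.9     # assumed penalty for headers versus shots with the foot

GOAL_WIDTH_YARDS = 8.0  # a regulation goal is 8 yards wide


def goal_angle(x, y):
    """Angle (in radians) between the two posts as seen from the shot location.

    x is the distance in yards from the goal line, y is the sideways
    offset in yards from the center of the goal.
    """
    to_near_post = math.atan2(y + GOAL_WIDTH_YARDS / 2, x)
    to_far_post = math.atan2(y - GOAL_WIDTH_YARDS / 2, x)
    return to_near_post - to_far_post


def toy_xg(x, y, header=False):
    """Return a toy xG value strictly between 0 and 1 for a single shot."""
    distance = math.hypot(x, y)
    score = (
        INTERCEPT
        + COEF_DISTANCE * distance
        + COEF_ANGLE * goal_angle(x, y)
        + (COEF_HEADER if header else 0.0)
    )
    # The logistic function squeezes any score into the 0-to-1 range.
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    print("Tap-in from 2 yards:", round(toy_xg(2, 0), 2))
    print("Strike from 30 yards:", round(toy_xg(30, 0), 2))
    print("Header from 6 yards:", round(toy_xg(6, 0, header=True), 2))
```

With these invented coefficients, the two-yard tap-in rates far higher than the 30-yard strike, which is exactly the distinction that 'shots on goal' cannot make.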
Let’s take a look at some LouCity examples. On 09/07/2021, Louisville City traveled north to Connecticut to take on Hartford Athletic. GameFlowxPG posted the xG breakdown below for the match using data from American Soccer Analysis (more on various xG sources in a bit).
As you can see, LouCity won the contest 4-2. Although 4 goals were scored, Morados’ xG was 2.35. This indicates that if the same set of chances were played out 100 times, they would score about 235 goals, averaging 2.35 goals a match based on the quality of chances. Differences between the actual score line and xG are common and totally normal (especially when looking at a small sample size, like one match). xG is not literally a prediction of how many goals a team will score, but rather an estimate of how many goals those chances would typically produce, based on how similar situations have played out in the past.
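The arithmetic behind a team's match xG can be sketched in a few lines of Python: the team total is simply the sum of the individual shot values, and from those same values you can work out how likely each final goal tally was. The shot list below is hypothetical. Only the 0.76 comes from this match as discussed in this piece, 0.09 is a stand-in for a shot rated "just below 0.1", and the remaining values are invented so the total reaches 2.35. The calculation also assumes every shot is independent, which is a simplification (a rebound only exists because the first shot was saved).

```python
def goal_distribution(shot_xgs):
    """Probability of scoring exactly k goals, for k = 0..len(shot_xgs).

    Treats each shot as an independent event whose chance of going in
    equals its xG value (a simplifying assumption).
    """
    dist = [1.0]  # before any shots: 100% chance of zero goals
    for p in shot_xgs:
        updated = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            updated[goals] += prob * (1 - p)   # this shot is missed
            updated[goals + 1] += prob * p     # this shot is scored
        dist = updated
    return dist


# Hypothetical shot values for illustration; see the note above.
SHOTS = [0.76, 0.09, 0.45, 0.30, 0.22, 0.15, 0.12, 0.08, 0.07, 0.05, 0.04, 0.02]

team_xg = sum(SHOTS)
dist = goal_distribution(SHOTS)
p_four_or_more = sum(dist[4:])

if __name__ == "__main__":
    print(f"Team xG: {team_xg:.2f}")
    for goals, prob in enumerate(dist[:6]):
        print(f"P({goals} goals) = {prob:.1%}")
    print(f"P(4 or more goals) = {p_four_or_more:.1%}")
```

The point of the exercise is that an xG of 2.35 is an average over a spread of possible outcomes, so a four-goal night from these chances is unusual but far from impossible.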
Let’s take this example further and examine two of the four goals.
In the 29th minute, Jorge Gonzalez buried a loose ball after the keeper failed to secure Cameron Lancaster’s initial shot. According to the above graphic, its xG was 0.76. In other words, that would be a goal roughly three out of every four attempts. Given that the keeper was lying on the ground and the shot was taken roughly a foot from the goal, it makes sense this attempt would be much closer to 1.0 than 0.0.
In the 59th minute, Antoine Hoppenot secured LouCity’s fourth and final goal of the match. It was a spectacular shot that will be on his highlight reel for years to come. What made it so special was how challenging a shot it was. Per the above graphic, it had an xG just below 0.1. While he was not outlandishly far from the goal, the defensive presence, shooting on the run, and the placement of the shot all contributed to it being rated as a shot that is scored only about 10% of the time.
Now that we have established what xG is, several caveats must be made clear. As with any statistical measure, context is everything, and understanding the limitations helps prevent misuse. First off, there is not one universal xG model. Various models assign different weights to different actions. Each model has its own set of parameters that produce the final expected goals figure; one model may rate a chance at 0.52 xG, while others have it at 0.47, 0.58, or even greater than 0.60 (4). Take StatsBomb and Opta, for example. Both have reputable xG models, but the xG value they assign to a penalty kick differs. For StatsBomb, it’s 0.76, while it is 0.79 for Opta. It’s not a perfect science, given that shot conversion rates vary from league to league. If using this data, it’s important to use a reputable provider and to use the same one consistently. In my data overviews of LouCity and the USL, I reference American Soccer Analysis, but for match previews I do pull tables from FootyStats, whose xG values differ.
The second point of clarification is sample size. If you follow baseball, you may hear broadcasters joke early in the season about the ridiculous number of home runs a player is on pace for. While the numbers they mention are not made up, they are poking fun at the fact that they are misleading. A player who hits 8 home runs in his first ten games is VERY unlikely to keep that pace up. The same principle applies to xG. Single-game xG comparisons are fine if you want to see how dangerous a team’s chances were in a particular game, but drawing conclusions from single-match numbers, either positive or negative, can be misleading (5). As of 09/10/2021, per American Soccer Analysis, the average number of shots per match in the USL is 12.22. With only roughly 12 shots per match, any variation can greatly skew the data. A match in which a team scores 4 goals on 4 challenging shots will show a very different gap between goals and xG than a match in which many sitters were missed. Over time, as the number of chances grows, a team’s actual goal total tends to track its cumulative xG more closely. While single-game xG values can provide some insight, they should be taken for what they are and read within the context of other stats and factors.
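The sample-size point can be illustrated with a small simulation in Python. This is a sketch under stated assumptions, not real USL data: every match is given exactly 12 shots (close to the league average quoted above), every shot is given the same hypothetical xG of 0.10, and the 30-match stretch is an arbitrary length. The function measures how far, on average, goals per match end up from xG per match.

```python
import random


def mean_gap_per_match(n_matches, shots_per_match=12, shot_xg=0.10,
                       trials=2000, seed=0):
    """Average absolute gap between goals per match and xG per match.

    Simulates `trials` stretches of `n_matches` matches. All parameter
    defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xg_per_match = shots_per_match * shot_xg
    total_gap = 0.0
    for _trial in range(trials):
        goals = 0
        for _shot in range(n_matches * shots_per_match):
            if rng.random() < shot_xg:  # the shot goes in
                goals += 1
        total_gap += abs(goals / n_matches - xg_per_match)
    return total_gap / trials


single_match_gap = mean_gap_per_match(1)
long_run_gap = mean_gap_per_match(30)

if __name__ == "__main__":
    print(f"Typical gap over 1 match:    {single_match_gap:.2f} goals per match")
    print(f"Typical gap over 30 matches: {long_run_gap:.2f} goals per match")
```

Under these assumptions, the per-match gap between goals and xG is several times larger for a single match than for the 30-match stretch, which is why one game's numbers deserve a grain of salt.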
The third and final factor worth mentioning is that a team having a higher xG does not mean the team “should have won.” xG only measures chance quality, not the expected outcome of the game (1). Goals change games, and the score line influences how teams play. The name itself can be misleading. We do not “expect” goals to occur exactly as the likelihood predicts (1), but in the long run, a team that consistently generates more xG than its opponents will win more games than a team that consistently generates less.
So, in short, xG is a measure ranging from 0 to 1 that captures the quality of a chance and the likelihood it would be a goal based on numerous factors about the play compared to historical outcomes. Every shot is compared to thousands of shots with similar characteristics to determine the probability that this shot will result in a goal (6). People say things like “he should have scored from there” or “we were unlucky not to get a result,” which are simply narrative forms of something that xG will support (or refute) through the use of data (4). The YouTube channel Tifo Football has a great, concise video on the subject that I’d encourage you to check out.
I hope that this primer on xG has helped your understanding and appreciation for the numbers behind the beautiful game! Big thanks to USL Tactics and Zach Allen-Kelly for their assistance with this piece!
For those seeking further individual examples, I have pulled a few high and low xG opportunities from Louisville City’s 2021 season!