The Soccer Data Revolution

The first time I watched the famous baseball analytics film “Moneyball,” I asked myself, “Where the hell is soccer’s Billy Beane?” For those of you who have somehow never seen this must-see movie, I was asking why no soccer geek had attempted to do what Beane did when he revolutionized baseball by using data intelligently to sign undervalued players. Yes, I’ll admit, being the soccer geek that I am, I imagined being locked up in my room all day, studying my little book of numbers and eventually coming up with the secret of what it takes to win. Ideally, I was going to gift this formula to my favorite club, Liverpool FC, to help them finally win that ever-elusive Premier League trophy.

As it turns out, soccer already has its data geeks. And believe me, these soccer statisticians and fanatics are shaping the future of the sport. 

For poorer clubs stuck in a world in which the rich clubs only get more dominant each year, employing data analysts may, in fact, be their only hope. If used correctly, data analytics can help these mediocre clubs to drastically improve the quality of their match analysis, scouting, and training, allowing them to more effectively develop strategies to compete with the more affluent clubs. 

Few things are as painful as watching your team waste a ‘golden chance.’ Not many Chelsea fans will forget how Fernando Torres somehow failed to slide the ball into an empty goal against Manchester United in 2011. Regular fans, engaged in their often-heated post-match discussions, might say that the team missed a key opportunity. For the soccer analyst, however, it is much more useful to know just how good the missed chance was. The analyst can then provide the coach with a more objective evaluation of the team’s performance. But how does the analyst quantify the quality of a chance in the first place?

This is where the ingenious invention “expected goals” (xG) comes in. Taking into account variables such as how far from the goal a player is, how many defenders are around them, at what angle they are positioned from the center of the goal, and whether the ball is on the ground or in the air, xG models can determine the probability that a player, in nearly any given situation, will score a goal. 
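For the curious, here is a minimal sketch of how a model like this might turn those variables into a probability. It assumes a simple logistic formula, and every coefficient in it is made up for illustration rather than fitted to real match data; the function name shot_xg and its parameters are my own inventions, not any club's actual model.

```python
import math

# Illustrative coefficients only: these numbers are assumptions chosen to
# show the shape of the calculation, not values fitted to real shots.
INTERCEPT = 1.0
PER_METER = -0.12      # each extra meter from goal lowers the chance
PER_DEGREE = -0.02     # each extra degree off-center lowers the chance
PER_DEFENDER = -0.30   # each nearby defender lowers the chance
BALL_IN_AIR = -0.50    # a ball in the air is harder to finish

def shot_xg(x_m, y_m, defenders, in_air):
    """Hypothetical xG for a shot taken x_m meters out from the goal line
    and y_m meters to the side of the center of the goal."""
    distance = math.hypot(x_m, y_m)
    # 0 degrees means the shooter is directly in front of the goal's center.
    angle_deg = math.degrees(math.atan2(abs(y_m), x_m))
    z = (INTERCEPT
         + PER_METER * distance
         + PER_DEGREE * angle_deg
         + PER_DEFENDER * defenders
         + (BALL_IN_AIR if in_air else 0.0))
    # The logistic function squeezes any score into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

close_central = shot_xg(6.0, 0.0, defenders=0, in_air=False)
far_wide = shot_xg(20.0, 15.0, defenders=3, in_air=True)
print(f"close, central, unmarked: {close_central:.2f}")
print(f"far, wide, crowded, in the air: {far_wide:.2f}")
```

The close, central chance comes out far more likely than the crowded long-range effort, which is the whole point: the model puts a number on what fans already feel in their gut.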

The probability is calculated from a large historical database that records the outcomes (goal or no goal) of real game situations. For example, a penalty kick in the English Premier League goes in about 77 percent of the time, which translates to an xG value of 0.77. For a soccer fan, this means that if your favorite team has a penalty, you can probably be optimistic, since the probability of them scoring is 0.77.
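In its simplest form, that calculation is just goals divided by attempts for each type of situation. The sketch below assumes a tiny, invented list of past shots (the labels and outcomes are hypothetical, not real Premier League data) and a helper I have called estimate_xg.

```python
from collections import defaultdict

# Hypothetical historical records: (situation, did it end in a goal?).
# These rows are invented for illustration and are not real match data.
history = [
    ("penalty", True), ("penalty", True), ("penalty", True), ("penalty", False),
    ("long_range", False), ("long_range", False), ("long_range", False),
    ("long_range", False), ("long_range", True),
]

def estimate_xg(records):
    """Return {situation: goals / attempts} for every situation seen."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for situation, scored in records:
        attempts[situation] += 1
        if scored:
            goals[situation] += 1
    return {s: goals[s] / attempts[s] for s in attempts}

xg_table = estimate_xg(history)
for situation, value in xg_table.items():
    print(f"{situation}: {value:.2f}")
```

With only a handful of rows the estimates are crude; a real database would hold many thousands of shots, which is what makes a figure like 0.77 for penalties trustworthy.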

For the analyst, xG is particularly useful because it provides an objective view of the quality of goal-scoring opportunities that a team has created over a period of time. A team may lose a game two to zero, for example, but looking at the data can soften the blow of the loss. If an analyst studies the post-game data and finds that the team actually created chances resulting in a predicted total goal value of three (the sum of the team’s xG probabilities), the team played well even though they did not get to capitalize on their efforts and score those predicted goals. 
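To make that arithmetic concrete, here is a small sketch using invented xG values for one team's shots in a single match. The list match_shots is hypothetical, and the "chance of scoring nothing" figure rests on my own simplifying assumption that each shot is independent of the others.

```python
import math

# Hypothetical xG values for each shot a team took in one match.
match_shots = [0.77, 0.45, 0.38, 0.60, 0.35, 0.25, 0.20]

# The team's total xG is simply the sum of the individual probabilities.
total_xg = sum(match_shots)

# Assuming shots are independent (a simplification), the chance of missing
# every one of them is the product of the individual miss probabilities.
p_no_goals = math.prod(1.0 - p for p in match_shots)

print(f"total xG: {total_xg:.2f}")
print(f"chance of scoring zero goals: {p_no_goals:.1%}")
```

A team with this shot list that still ends the night goalless has been unlucky rather than bad, which is exactly the kind of perspective an analyst can bring to the coach.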

Expected goals is just one example from a group of new, revolutionary metrics being used in soccer (others include expected assists and expected points). Forward-thinking individuals are now using data to make more objective judgements about The Beautiful Game.

One such forward-thinking individual, perhaps the closest soccer has come to finding its own Billy Beane, is Matthew Benham. Formerly a shrewd professional soccer bettor, Benham now owns Brentford FC in England, his beloved boyhood club, and FC Midtjylland in Denmark, the team with which he has found the most success. By introducing the use of data analytics, Benham has elevated Midtjylland from a mediocre club on the brink of bankruptcy to the champions of Denmark. The club uses analytics in areas such as scouting, in-game analysis, post-game analysis, and training.

Of course, it’s one thing to have data and another to know what to do with it. Midtjylland’s success is remarkable not only for the abundance of data they have been able to collect, but also for how well they have interpreted and used it. Midtjylland gives other small, relatively poor clubs the hope that they too may employ data analytics as a secret weapon in the economically unbalanced world of soccer.

However, it is unlikely that the better-funded clubs will simply sit back and ignore the use of analytics. In fact, Liverpool FC (my dream employer and one of the wealthier soccer clubs) already employs data scientists. William Spearman, a Harvard-educated physicist who has published numerous research papers, including one entitled “Physics-Based Modeling of Pass Probabilities in Soccer,” works as the lead data scientist at the club. Tough competition, not just for my own career dreams, but for the poorer, mediocre clubs hoping to use analytics as their secret weapon.

Some may feel that analytics cannot be used to predict every outcome in soccer or that the numbers take away from the emotion and complexity of the sport. And they would be right, but only to an extent. Soccer, with its low-scoring nature and its susceptibility to random events, will always be a difficult sport to quantify and analyze. Data and probability can’t explain every outcome: for example, in the 2015-16 season, Leicester City won the English Premier League with the odds stacked against them 5,000 to one. But does this mean that any attempts to further our understanding of the game are fruitless? Matthew Benham and FC Midtjylland certainly wouldn’t say so.

As for the claim that numbers are taking the emotion away from soccer, I like to think of it this way: the numbers are not there to reduce the emotion of The Beautiful Game; they are simply meant to explain it. I, for one, felt a greater appreciation for the brilliance of Lionel Messi when I looked into his xG statistics and realized that, season after season, he scored from positions that most other players only score from 1 percent of the time.

Soccer will always have an element of randomness to it. It is in this randomness, perhaps, that one finds beauty. The sport’s complexity also ensures that there will probably never be some simple mathematical formula that “solves the game.” However, there will be curious and intelligent minds, like Matthew Benham’s, that will constantly seek to push the boundaries of how we understand soccer. As for us regular soccer fans, we need only hope that the use of analytics will level the playing field and prevent soccer from becoming a sport dominated by the same exclusive group of rich clubs.

Mediocre Issue | November 2019