Off script


So, how did your team get on in the first round of Premier League fixtures for the 2019-20 season? My team, Sheffield United, were back in the top flight after a 13-year absence. It didn’t go too well though. Here’s the report:

EFL goal machine Billy Sharp’s long wait for a top-flight strike ends on the opening day. Ravel Morrison with the assist. But Bournemouth run out 4-1 winners.

And as if that’s not bad enough, we finished the season in bottom place:

[Figure: the Script’s simulated final league table]

Disappointing, but maybe not unexpected.

Arsenal also had a classic Arsenal season. Here’s the story of their run-in:

It seems only the Europa League can save them. They draw Man United. Arsenal abandon all hope and crash out 3-2. Just as they feared. Fans are more sad than angry. Once again they rally. Aubameyang and Alexandre Lacazette lead a demolition of high flying Liverpool. But they drop too many points and end up trophyless with another fifth-place finish.

Oh, Arsenal!

But what is this stuff? The Premier League doesn’t kick off for another week, yet here we have complete details of the entire season, match-by-match, right up to the final league table.

Welcome to The Script, produced by BT Sport. As they themselves explain:

Big data takes on the beautiful game.

And in slightly more detail…

BT has brought together the biggest brains in sports data, analysis and machine learning to write the world’s first artificial intelligence-driven script for a future premier league season.

Essentially, BT Sport have devised a model for match outcomes based on measures of team abilities in attack and defence. So far, so standard. After which…

We then simulate the random events that could occur during a season – such as injuries and player transfers – to give us even more accurate predictions.

But this is novel. How do you assign probabilities to player injuries or transfers? Are all players equally susceptible to injury? Do the terms of a player’s contract affect their chances of being sold? And who they are sold to? And what is the effect on a team’s performance of losing a player?

So, this level of modelling is difficult. But let’s just suppose for a minute you can do it. You have a model for what players will be available for a team in any of their fixtures. And you then have a model that, given the two sets of players available to the teams for any fixture, spits out the probabilities of the various possible scores. Provided the model’s not too complicated, you can probably first simulate the respective lineups in a match, and then the scores given the team lineups. And that’s why Sheffield United lost 4-1 on the opening day to Bournemouth. And that’s why Arsenal did an Arsenal at the end of the season. And that’s why the league table ended up like it did above.
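BT haven’t published their model, but the two-stage structure described here – first simulate who’s available, then simulate the score given the line-ups – can be sketched as follows. Everything in it (availability probabilities, player ratings, the baseline goal rate) is invented purely for illustration:

```python
import math
import random

def poisson_sample(lam):
    """Draw from a Poisson distribution (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def simulate_match(home_squad, away_squad, base_rate=1.3):
    """Stage 1: draw each side's available players. Stage 2: draw a
    score, scaling a baseline goal rate by the strength of the
    available line-up relative to the full squad."""
    def available(squad):
        chosen = [p for p in squad if random.random() > p["p_out"]]
        return chosen or squad  # fall back if everyone is 'injured'

    def strength(players):
        return sum(p["rating"] for p in players)

    home_xi, away_xi = available(home_squad), available(away_squad)
    home_goals = poisson_sample(base_rate * strength(home_xi) / strength(home_squad))
    away_goals = poisson_sample(base_rate * strength(away_xi) / strength(away_squad))
    return home_goals, away_goals

# invented squads: 15 players each, with individual injury probabilities
random.seed(1)
squad = lambda: [{"rating": random.uniform(5, 9), "p_out": random.uniform(0.05, 0.2)}
                 for _ in range(15)]
print(simulate_match(squad(), squad()))
```

Run once, that spits out a single plausible-looking scoreline – which, as we’re about to see, is precisely the problem.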

But is this a useful resource for predicting the Premier League?

Have a think about this before scrolling down. Imagine you’re a gambler, looking to bet on the outcome of the Premier League season. Perhaps betting on who the champions will be, or the top three, or who will be relegated, or whether Arsenal will finish fifth. Assuming BT’s model is reasonable, would you find the Script that they’ve provided helpful in deciding what bets to make?

|
|
|
|
|
|
|
|
|
|
|
|
|
|

Personally, I think the answer is ‘no’, not very helpful. What BT seem to have done is run A SINGLE SIMULATION of their model, for every game over the entire season, accumulating the simulated points of each team per match to calculate their final league position.

A SINGLE SIMULATION!

Imagine having a dice that you suspected of being biased, and you tried to understand its properties with a single roll. It’s almost pointless. Admittedly, with the Script, each team has 38 simulated matches, so the final league table is likely to be more representative of genuine team ability than the outcome of a single throw of a dice. But still, it’s the simulation of just a single season.

What would be much more useful would be to simulate many seasons and count, for example, in how many of those seasons Sheffield United were relegated. This way the model would be providing an estimate of the probability that Sheffield United gets relegated, and we could compare that against market prices to see if it’s a worthwhile bet.
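To illustrate, here’s a minimal version of that calculation. The model is emphatically not BT’s – the team strengths, goal rates and home-advantage factor are all invented – but the principle is the point: simulate many seasons and count outcomes, rather than report one:

```python
import numpy as np

def simulate_season(strengths, rng):
    """One round-robin season: every pair meets home and away, goals
    drawn from Poissons scaled by relative team strength."""
    points = {t: 0 for t in strengths}
    for home in strengths:
        for away in strengths:
            if home == away:
                continue
            hg = rng.poisson(1.35 * strengths[home] / strengths[away])  # home advantage
            ag = rng.poisson(1.15 * strengths[away] / strengths[home])
            if hg > ag:
                points[home] += 3
            elif ag > hg:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    return points

# invented strengths for a 20-team league; Sheffield United given a
# below-average rating purely for illustration
rng = np.random.default_rng(0)
strengths = {f"Team {i}": s for i, s in enumerate(np.linspace(0.75, 1.4, 19))}
strengths["Sheffield United"] = 0.8

relegated, n_seasons = 0, 1000
for _ in range(n_seasons):
    pts = simulate_season(strengths, rng)
    if "Sheffield United" in sorted(pts, key=pts.get)[:3]:
        relegated += 1

print(f"Estimated P(relegation) ≈ {relegated / n_seasons:.2f}")
```

That estimated probability is something you could actually compare against a bookmaker’s price; a single simulated league table isn’t.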

In summary, we’ve seen in earlier posts (here and here, for example) contenders for the most pointless simulation in a sporting context, but the Script is lowering the bar to unforeseen levels. Despite this, if the blog is still going at the end of the season, I’ll do an assessment of how accurate the Script’s estimates turned out to be.

 

Can’t buy me love

Ok, money can’t buy you love, but can it buy you the Premier League title? We’ll look at that below, but first this recent Guardian article notes the following Premier League statistics:

Between 2003 and 2006 there were just 3 instances of a team having more than 70% of possession in a game. Two seasons ago there were 37, last season 63 and this season 67.

In other words, by even the simplest of statistical measures, Premier League games are becoming increasingly one-sided, at least in terms of possession. And the implication in the Guardian article is that money is the driving factor behind this imbalance. But is that really the case?

This graph shows final league position of the 20 Premier League teams plotted against their wealth in terms of start-of-season squad market value (taken from here).

To make things slightly clearer, the following diagram shows the same thing, but with a smooth curve (in blue) added on top, estimated using standard statistical techniques, which shows the overall trend in the data.

Roughly speaking, teams above the blue line have performed better than their financial resources would have suggested; those below have performed worse.
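The smooth curve “estimated using standard statistical techniques” is a scatterplot smoother (lowess, a spline, or similar). Here’s a deliberately crude nearest-neighbour version of the idea, run on invented wealth and league-position data, just to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)

# invented data: squad values in £m, and final league positions (1 = top)
wealth = np.sort(rng.uniform(50, 1200, 20))
position = np.clip(21 - 0.016 * wealth + rng.normal(0, 3, 20), 1, 20)

def running_mean_smooth(x, y, window=5):
    """Very crude scatterplot smoother: for each point, average the
    y-values of its `window` nearest x-neighbours. A real analysis
    would use lowess or a spline, but the idea is the same."""
    smooth = np.empty_like(y)
    for i, xi in enumerate(x):
        nearest = np.argsort(np.abs(x - xi))[:window]
        smooth[i] = y[nearest].mean()
    return smooth

trend = running_mean_smooth(wealth, position)
# teams whose actual position number is below the trend value have
# outperformed what their wealth alone would suggest
```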

Bear in mind this is just one season’s data. Also, success breeds success, and money breeds money, so the differential between teams in terms of wealth as a season progresses is likely to increase further. For these reasons and others, not too much should be read into the slight wobbles in the blue curve. Nonetheless, a number of general features emerge:

  1. It’s a very noisy picture for teams with less than £250 m. Arguably, at that level, there’s no very obvious pattern between wealth and final position: there’s a bunch of teams with between £100 m and £250 m, and their league position within this group of teams isn’t obviously dependent on their wealth. As such, teams in this category are unlikely to get out of the bottom half of the table, and their success within the bottom half is more likely to depend on how well they’ve spent their money than on how much they actually have. And on luck.
  2. Teams with between £250 m and £500 m are likely to force their way out of the ‘relegation-battle pack’, but not into the top 6 elite.
  3. The cost of success at the top end is high: the blue curve at the top end is quite flat, so you have to spend a lot to improve your position. But money, as long as there’s enough of it, counts a lot for elite clubs, and the evidence is that the teams who are prepared to spend the most are likely to improve their league position.
  4. A couple of clubs stand out as having performed very differently to what might be expected: Manchester United have considerably under-performed, while Wolves have substantially over-performed.

The trials and tribulations of Manchester United are well documented. Chances are they just need a change of manager. <Joke>. But Wolves is a much more interesting case, which takes us back to the Guardian article I referred to. As discussed above, that article is less about the success money brings than about the way it’s shaping how games are played, with matches between the rich and poor teams increasingly becoming challenges of the attack of one side against the defence of the other. But Wolves have adapted to such imbalances, playing long periods without possession, and attacking with speed and precision when they do have the ball. The template for this type of play was Leicester City in their title-winning season, though even just a few seasons ago the financial imbalances were far smaller than they are now.

It seems then, that to a very large extent, a team’s performance in the Premier League is likely to be determined by its wealth. Good management can mitigate this, just as bad management can lead to relatively poor performance. But even where teams are punching above their weight, they are having to do so by adapting their gameplay, so that matches are still dominated in terms of possession by the wealthier sides. As the Guardian article concludes:

Money guides everything. There have always been rich clubs, of course, but they have never been this rich, and the financial imbalances have never had such an impact on how the game is played.

 

“I don’t like your mum”


VAR, eh?

So, does video-assisted refereeing (VAR) improve the quality of decision-making in football matches?

Of course, that’s not the only question about VAR: assuming there is an improvement, one has to ask whether it’s worth either the expense or the impact it has on the flow of games when an action is reviewed. But these are subjective questions, whereas the issue about improvements in decision-making is more objective, at least in principle. With this in mind, IFAB, the body responsible for determining the laws of football, have sponsored statistical research into the extent to which VAR improves the accuracy of refereeing decisions.

But before looking at that, it’s worth summarising how the VAR process works. VAR is limited to an evaluation of decisions made in respect of four types of events:

  • Goals
  • Penalties
  • Straight red cards
  • Mistaken identity in the award of cards

And there are two modes of operation of VAR:

  • Check mode
  • Review mode

The check mode runs in the background throughout the whole game, without initiation by the referee. All incidents of the above types are viewed and considered by the VAR, and those involving a potential error are checked, with the assistance of replays if necessary. Such checks are used to identify situations where the referee is judged to have made a ‘clear and obvious error’ or where there has been a ‘serious missed incident’. Mistakes in other types of incidents – e.g. the possible award of a free kick – or mistakes that are not judged to be obvious errors should be discarded during the check process.

When a check by VAR does reveal a possible mistake of the above type, the referee is notified, who is then at liberty to carry out a review of the incident. The review can consist solely of a description of the event from the VAR to the referee, or it can comprise a video review of the incident by the referee using a screen at the side of the pitch. The referee is not obliged to undertake a review of an incident, even if flagged by the VAR following a check. On the other hand, the referee may choose to carry out a review of an incident, even if it has not been flagged by the VAR.

Hope that’s all clear.

Anyway, the IFAB report analysed more than 800 competitive games in which VAR was used, and includes the following statistics:

  • 56.9% of checks were for penalties and goals; almost all of the others were for red card incidents;
  • On average there were fewer than 5 checks per match;
  • The median check time was 20 seconds;
  • The accuracy of reviewable decisions before VAR was applied was 93%;
  • 68.8% of matches had no review;
  • On average, there was one clear and obvious error every 3 matches;
  • The decision accuracy after VAR was applied was 98.9%;
  • The median duration of a review was 60 seconds;
  • The average playing time lost due to VAR was less than 1% of the total playing time;
  • In 24% of matches, VAR led to a change in a referee’s decision; in 8% of matches this change led to a decisive change in the match outcome;
  • A clear and obvious error was not corrected by VAR in around 5% of matches.

This all seems very impressive. A great use of Statistics to check the implementation of the process and to validate its ongoing use. And maybe that’s the right conclusion. Maybe. It’s just that, as a statistician, I’m still left with a lot of questions. Including:

  1. What was the process for checking events, both before and after VAR? Who decided if a decision, either with or without VAR, was correct or not?
  2. It would be fairest if the analysis of incidents in this experiment were done ‘blind’. That’s to say, when an event is reviewed, the analyst should be unaware of what the eventual decision of the referee was. This would avoid the possibility of the experimenter – perhaps unintentionally – being drawn towards incorrect agreement with the VAR process decision.
  3. It’s obvious when watching football that, even with the benefit of slow-motion replays, many decisions are marginal. They could genuinely go either way, without being regarded as wrong decisions. As such, the impressive-looking 93% and 98.9% correct decision rates are probably more fairly described as rates of not incorrect decisions.
  4. There’s the possibility that incidents are missed by the referee, missed by VAR and missed by whoever is doing this analysis. As such, there’s a category of errors that are completely ignored here.
  5. Similarly, maybe there’s an average of only 5 checks per match because many relevant incidents are being missed by VAR.
  6. The use of the median to give average check and review times could be disguising the fact that some of these checks and reviews take a very long time indeed. It would be a very safe bet that the mean times are much bigger than the medians, and would give a somewhat different picture of the extent to which the process interrupts games when applied.
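That last point is easy to illustrate. With a right-skewed distribution, the median can stay reassuringly low while a few very long reviews drag the mean well above it. With some invented review times (in seconds):

```python
import statistics

# nine routine reviews plus one very long one (made-up times, seconds)
review_times = [45, 50, 55, 58, 60, 62, 65, 70, 80, 600]

print(statistics.median(review_times))  # 61.0 – sounds fine
print(statistics.mean(review_times))    # 114.5 – rather less fine
```

Same data, very different headline: reporting only the median hides the occasional five-minute stoppage entirely.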

So, I remain sceptical. The headline statistics are encouraging, but there are aspects about the design of this experiment and the presentation of results that I find questionable. And that’s before we assess value in terms of cost and impact on the flow of games.

On the other hand, there’s at least some evidence that VAR is having incidental effects that aren’t picked up by the above experiment. It was reported that in Italy’s Serie A, the number of red cards given for dissent during the first season of VAR was one, compared with eleven in the previous season. The implication being that VAR is not just correcting mistakes, but also leading to players moderating their behaviour on the pitch. Not that this improvement is being universally adopted by all players in all leagues, of course. But the fact that VAR might actually be improving the game in terms of the way it’s played, above and beyond any potential improvements to the refereeing process, is an interesting aspect, potentially in VAR’s favour, which falls completely outside the scope of the IFAB study discussed above.

But in terms of VAR’s impact on refereeing decisions, I can’t help feeling that the IFAB study was designed, executed and presented in a way that shines the best possible light on VAR’s performance.


Incidentally, if you’re puzzled by the title of this post, you need to open the link I gave above, and exercise your fluency in Spanish vernacular.

Olé, Olé, Olé


So, everyone agrees that Ole Solskjær has been a breath of fresh air at Man United and is largely responsible for their remarkable turnaround this season. But here’s a great article by the guys at StatsBomb that adds perspective to that view. Sure, there’s been a change in results since Solskjær arrived, but more importantly xG – the expected goals – has also improved considerably, both in terms of attack and defence. This suggests that the results are not just due to luck; United are genuinely creating more chances, and preventing those of the opposition, at a greater rate than under Mourinho.

Nonetheless, United’s performance in terms of actual goals is out-performing that of xG: at the time of the StatsBomb report, total xG for United over all games under Solskjær was 17.72, whereas actual goals were 25; and total xG against United was 10.99, with actual goals at 8. In other words, they’ve scored more, and conceded fewer, goals than their performance merits. This suggests that, notwithstanding the improvement in performance, United have also benefited from an upsurge in luck, both in attack and defence.
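A rough way to put a number on that luck: if each chance is roughly independent, the total goals from chances with a combined xG of 17.72 is approximately Poisson with mean 17.72. (That’s my back-of-envelope approximation, not StatsBomb’s methodology.) We can then ask how likely 25 or more goals would be:

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))

# chance of scoring 25 or more goals when total xG is 17.72
print(poisson_sf(25, 17.72))
# chance of conceding 8 or fewer when total xG against is 10.99
print(1 - poisson_sf(9, 10.99))
```

Both probabilities come out well below a half, so over- and under-shooting xG by these margins simultaneously is consistent with a genuinely lucky spell.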

But more generally, what is the value of a good manager? This recent article references a statistical analysis of data from the German Bundesliga, which aimed to quantify the potential effect a manager could have on a team. It’s not a completely straightforward issue, since the best managers tend to go to the best clubs, who are bound to have a built-in tendency for success that’s not attributable to the manager. Therefore, the research attempted to distinguish between team and manager effects. Their conclusions were:

  • The best 20% of managers were worth around 0.3 points per game more than the weakest 20% of managers. This amounts to 10.2 points over a 34-game season in the Bundesliga.
  • A manager’s estimated performance proved to be a valuable predictor of team performance when the manager changed clubs.
  • The best and worst managers have a strong impact on team performance. For teams with managers having closer to average ability, team performance is more heavily impacted by other factors, such as player quality and recruitment strategy.

In summary, on the basis of this research, there is value in aiming for the best of managers, and avoiding the worst, but not much evidence to suggest it’s worth shopping around in the middle. There are some caveats to this analysis though, and in particular about the way it’s described in the Guardian article:

  1. The analysis uses data from German leagues only up to 2013-14.
  2. This amounts to a total of just 6,426 matches, and includes relatively few managers.
  3. The Guardian article states ‘budget per season’ was accounted for. It wasn’t.
  4. The Guardian article refers to ‘statistical wizardry’. This consists of simple linear regression on points per half season with separate effects for managers and teams. This might be a sensible strategy, but it’s not wizardry.

So, it’s best to treat the precise conclusions of this report with some caution. Nonetheless, the broad picture it paints is entirely plausible.

And going back to Solskjær: there are good reasons to believe he is partly responsible for the overall improvement in performance at United, but a comparison between goals and xG suggests that the team have also been a bit on the lucky side since his arrival, and that their results have flattered to deceive a little.

It’s not your fault (maybe)

 

Most of you who came through the UK school system will have taken GCSEs at the end of your secondary school education. But did that happen in a year with an even or an odd number? If it was an even number, I have good news for you: a ready-made and statistically validated excuse as to why your results weren’t as good as they could have been.

A recent article in the Guardian pointed to academic research which compared patterns of GCSE results in years with either a World Cup or Euro tournament final – i.e. even-numbered years – with those of other years – i.e. odd-numbered years. They found, for example, that the chances of a student achieving 5 good GCSE grades is 12% lower for students in a tournament year compared with a non-tournament year. This is a big difference, and given the size of the study, strongly significant in statistical terms. In other words, it’s almost impossible that a difference of this magnitude could have occurred by chance if there were really no effect.

The implication of the research is that the World Cup and Euros, which take place at roughly the same time as GCSE final examinations, have a distracting effect on students, leading to poorer results. Now, to be clear: the analysis cannot prove this claim. The fact that there is a 2-year cycle in quality of results is beyond doubt. But this could be due to any cause which has a 2-year cycle that coincides with GCSE finals (and major football finals). But, what could that possibly be?

Moreover, here’s another thing: the difference in performance in tournament and non-tournament years varies among types of students, and is greatest for the types of students that you’d guess are most likely to be distracted by football.

  1. The effect is greater for boys than for girls, though it is also present and significant for girls.
  2. The difference in performance (of achieving five or more good GCSE grades) reaches 28% for white working class boys.
  3. The difference for black boys with a Caribbean background is similarly around 28%.

So, although it requires a leap of faith to assume that the tournament effect is causal rather than coincidental so far as GCSE performance goes, the strength of circumstantial evidence is such that it’s a very small leap of faith in this particular case.

 

The numbers game

If you’re reading this post, you’re likely to be aware already of the importance of Statistics and data for various aspects of sport in general and football in particular. Nonetheless, I recently came across this short film, produced by FourFourTwo magazine, which gives a nice history of the evolution of data analytics in football. If you need a refresher on the topic, this isn’t a bad place to look.

And just in case you don’t think that’s sufficient to justify this post in a Statistics blog, FourFourTwo claims to be ‘the world’s biggest football magazine’. Moreover, many of the articles on the magazine’s website are analytics-orientated. For example: ‘Ronaldo averaged a game every 4.3 days‘. Admittedly, many of these articles are barely-disguised advertisements for a wearable GPS device intended for tracking activity of players during matches. But I suppose even £199 is a number, right?

 

Nokia 3310

Whatever happened to the Nokia 3310, and what’s that got to do with sports data?

Many of you will know Rasmus Ankerson from his involvement with both Brentford and Midtjylland. Maybe you’ve also seen this video of a TED talk Rasmus gave a while back, but I’ve only just come across it. I think it’s interesting because there are now plenty of articles, books and – ahem – blogs which emphasise the potential for statistics and data analytics in both sports and gambling. But Rasmus’s talk here goes in the other direction and argues that, since data analytics has been proven as a valuable tool to assist gambling on sports, there are lessons that can be learned for leaders of business and industry. The main themes are:

  1. In any process where there’s an element of chance, it’s important to recognise that good and bad results are not just a function of good and bad performance, but also of good and bad luck;
  2. There are potentially huge gains in trying to identify the aspects of performance that determine either good or bad results, notwithstanding the interference effects of luck.

In other words, businesses, like football teams, have results that are part performance-driven and part luck. Signal and noise, if you like. Rasmus argues that good business, like good football management, is about identifying what it is that determines the signal, while mitigating the noise. And only by adopting this strategy can companies, like Nokia, avoid the type of sudden death that happened to the 3310. Or as Rasmus puts it: “RIP at gadgets graveyard”.

Anyway, Rasmus’s talk is a great watch, partly because of the message it sends about the importance of Statistics to both sport and industry, but also because it includes something about the history of the relationship between Smartodds, Brentford and Midtjylland. Enjoy.

Regression to the mean


When Matthew.Benham@smartbapps.co.uk was asked at a recent offsite for book recommendations, his first suggestion was Thinking, Fast and Slow. This is a great book, full of insights that link together various worlds, including statistics, economics and psychology. Daniel Kahneman, the book’s author, is a world-renowned psychologist in his own right, and his book makes it clear that he also knows a lot about statistics. However, in a Guardian article a while back, Kahneman was asked the following:

Interestingly, it’s a fact that highly intelligent women tend to marry men less intelligent than they are. Why do you think this might be?

He answered as follows:

It’s a fact – but it’s not interesting at all. Assuming intelligence is similarly distributed between men and women, it’s just a mathematical inevitability that highly intelligent women, on average, will be married to men less intelligent than them. This is “regression to the mean”, and all it really tells you is that there’s no perfect correlation between spouses’ intelligence levels. But our minds are predisposed to try to construct a more compelling explanation.

<WOMEN: please insert your own joke here about the most intelligent women choosing not to marry men at all.>

Anyway, I can’t tell if this was Kahneman thinking fast or slow here, but I find it a puzzling explanation of regression to the mean, which is an important phenomenon in sports modelling. So, what is regression to the mean, why does it occur and why is it relevant to Smartodds?

Let’s consider these questions by looking at a specific dataset. The following figure shows the points scored in the first and second half of each season by every team in the Premier League since the inaugural 1992-93 season. Each point in the plot represents a particular team in a particular season of the Premier League. The horizontal axis records the points scored by that team in the first half of the season; the vertical axis shows the number of points scored by the same team in the second half of the same season.

 

Just to check your interpretation of the plot, can you identify:

  1. The point which corresponds to Sunderland’s 2002-03 season where they accumulated just a single point in the second half of the season?
  2. The point which corresponds to Man City’s 100-point season in 2017-18?

Click here to see the answers.

Now, let’s take that same plot but add a couple of lines as follows:

  • The red line divides the data into roughly equal sets. To its left are the points that correspond to the 50% poorest first-half-of-season performances; to its right are the 50% best first-half-of-season performances.
  • The green line corresponds to teams who had an identical performance in the first and second half of a season. Teams below the green line performed better in the first half of a season than in the second; teams above the green line performed better in the second half of a season than in the first.

[Figure: the same scatterplot with the red and green lines added]

 

In this way the picture is divided into 4 regions that I’ve labelled A, B, C and D. The performances within a season of the teams falling in these regions are summarised in the following table:

Region | First-half performance | Best half | Number of points
A      | Below average          | First     | 94
B      | Above average          | First     | 174
C      | Above average          | Second    | 71
D      | Below average          | Second    | 187

I’ve also included in the table the number of data points in each of the regions. (Counting directly from the figure will give slightly different numbers because of overlapping points.)

First compare A and D, the teams that performed below average in the first half of a season. Looking at the number of points, such teams are much more likely to have had a better second half to the season (187 to 94). By contrast, comparing B and C, the teams that do relatively well in the first half of a season are much more likely to do worse in the second half (174 to 71).

This is regression to the mean. In the second half of a season teams “regress” towards the average performance: teams that have done below average in the first half of the season generally do a bit less badly in the second half; teams that have done well in the first half generally do a bit less well in the second half. In both cases there is a tendency to move – regress – towards the average in the second half. I haven’t done anything to force this; it’s just what happens.

We can also view the phenomenon in a slightly different way. Here’s the same picture as above, where points falling on the green line would correspond to a team doing equally well in both halves of the season. But now I’ve also used standard statistical methods to add a “line of best fit” to the data, which is shown in orange. This line is a predictor of how teams will perform in the second half of season given how they performed in the first, based on all of the data shown in the plot.

[Figure: the same scatterplot with the line of best fit added in orange]

In the left side of the plot are teams who have done poorly in the first half of the season. In this region the orange line is above the green line, implying that such teams are predicted to do better in the second half of the season. On the right side of the plot are the teams who have done well in the first half of the season. But here the orange line is below the green line, so these teams are predicted to do worse in the second half of the season. This, again, is the essence of regression to the mean.

One important thing though: teams that did well in the first half of the season still tend to do well in the second half of the season; the fact that the orange line slopes upwards confirms this. It’s just that they usually do less well than they did in the first half; the fact that the orange line is less steep than the green line is confirmation of that. Incidentally, you’ve probably heard the term “regression line” used to describe a “line of best fit”, like the orange line. The origins of this term are precisely because the fit often involves a regression to the mean, as we’ve seen here.

But why should regression to the mean be such an intrinsic phenomenon that it occurs in football, psychology and a million other places? I just picked the above data at random: I’m pretty sure I could have picked data from any competition in any country – and indeed any sport – and I’d have observed the same effect. Why should that be?

Let’s focus on the football example above. The number of points scored by a team over half a season (so they’ve played all other teams) is dependent on two factors:

  1. The underlying strength of the team compared to their opponents; and
  2. Luck.

Call these S (for strength) and L (for luck) and notionally let’s suppose they add together to give the total points (P). So

P = S + L

Although there will be some changes in S over a season, as teams improve or get worse, it’s likely to be fairly constant. But luck is luck. And if a team has been very lucky in the first half of the season, it’s unlikely they’ll be just as lucky in the second. And vice versa. For example, if you roll a dice and get a 6, you’re likely to do less well with a second roll. While if you roll a 1, you’ll probably do better on your next roll. So while S is pretty static, if L was unusually big or small in the first half of the season, it’s likely to be closer to the average in the second half. And the overall effect on P? Regression to the mean, as seen in the table and figures above.
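The P = S + L story is easy to check with a toy simulation (all the numbers below are invented): give each team a fixed strength, add fresh luck in each half-season, and regression to the mean appears by itself:

```python
import numpy as np

rng = np.random.default_rng(42)
n_teams, n_seasons = 20, 27

S = rng.normal(26, 6, size=(n_seasons, n_teams))   # underlying strength (points)
L1 = rng.normal(0, 5, size=(n_seasons, n_teams))   # first-half luck
L2 = rng.normal(0, 5, size=(n_seasons, n_teams))   # second-half luck

first_half = S + L1   # P = S + L, as in the text
second_half = S + L2  # same strength, new luck

# among teams above average in the first half: how many did worse in the second?
above = first_half > first_half.mean()
worse = second_half < first_half
print((above & worse).sum(), (above & ~worse).sum())

# slope of the best-fit line: a value below 1 is regression to the mean
slope = np.polyfit(first_half.ravel(), second_half.ravel(), 1)[0]
print(round(slope, 2))
```

Nothing in the code forces regression; it falls straight out of strength being persistent while luck is not. (In theory the slope here is var(S)/(var(S)+var(L)) = 36/61 ≈ 0.59.)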

Finally: what’s the relevance of regression to the mean to sports modelling? Well, it means that we can’t simply rely on historic performance as a predictor for future performance. We need to balance historic performance with average performance to compensate for inevitable regression to the mean effects; all of our models are designed with exactly this feature in mind.

 

xG, part 1

Adam.Weinrich@smartodds.co.uk wrote and asked for a discussion of xG. I’m so happy about this suggestion that I’m actually going to do two posts on the topic. In this post we’ll look at the xG for a single shot on goal; in a subsequent post we’ll discuss the xG for a passage of play and for an entire game.

xG stands for expected goals, and it’s famous enough now that it’s used almost routinely on Match of the Day. But what is it, why is it used, how is it calculated and is it all it’s cracked up to be?

It’s well understood these days that, when trying to assess how well a team has performed in a game, it’s better to go beyond the final result and look at the match in greater statistical detail, precisely because goals themselves are so rare.

For example, this screenshot shows the main statistics for the recent game between Milan and Genoa, as provided by Flashscore. Milan won 2-1, but it’s clear from the data here that they also dominated the game in terms of possession and goal attempts. So, on the basis of this information alone, the result seems fair.

Actually, Milan’s winner came in injury time. Had they not scored it, then again on the basis of the above statistics, you’d probably argue that they were unlucky not to win. In that case the data given here on shots and possession would have given a fairer impression of how the match played out than the final result alone.

But even these statistics can be misleading: maybe most of Milan’s goal attempts were difficult, and unlikely to lead to goals, whereas Genoa’s fewer attempts were absolute sitters that they would score 9 times out of 10. If that were the case, you might conclude instead that Genoa were unlucky to lose. xG – or expected goals – is an attempt to take into account not just the number of chances a team creates, but also the difficulty of those chances.

The xG for a single attempt at goal is an estimate of the probability that, given the circumstances of a shot – the position of the ball, whether the shot is kicked or a header, whether the shot follows a dribble or not, and other relevant information – it is converted into a goal.

This short video from OPTA gives a pretty simple summary.


So how is xG calculated in practice? Let’s take a simple example. Suppose a player is 5 metres away from goal with an open net. Looking back through a database of many games, we might find (say) 1000 events of an almost identical type, and on 850 of those occasions a goal was scored. In that case the xG would be estimated as 850/1000 = 0.85. But breaking things down further, it might be that 900 of the 1000 events were kicked shots, while 100 were headers; and the numbers of goals scored from these events were 800 and 50 respectively. We’d then estimate the xG for this event as 800/900 ≈ 0.89 for a kicked shot, but 50/100 = 0.5 for a header.
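
Using the made-up counts from this example, the calculation is just a ratio of goals to attempts over historical events of the same type:

```python
def empirical_xg(goals, attempts):
    """Estimated xG for a situation: goals scored divided by attempts,
    over historical events of (almost) identical type."""
    return goals / attempts

# All 1000 similar chances pooled together:
print(round(empirical_xg(850, 1000), 2))  # prints 0.85

# Split by shot type: kicked shots convert far more often than headers here
print(round(empirical_xg(800, 900), 2))   # prints 0.89
print(round(empirical_xg(50, 100), 2))    # prints 0.5
```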

But there are some complications. First, there are unlikely to be many events in the database corresponding to exactly the same situation (5 metres away with an open goal). Second, we might want to take other factors into account: scoring rates from the same position are likely to be different in different competitions, for example, or the scoring rate might depend on whether the shot follows a dribble by the same player. This means that simple calculations of the type described above aren’t feasible. Instead, a variation of standard regression – logistic regression – is used. This sounds fancy, but it’s really just a statistical algorithm for finding the best formula to convert the available variables (ball position, shot type etc.) into the probability of a goal.
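
To sketch the idea, here’s a toy logistic regression fitted to simulated shot data. Everything here – the two variables (distance and shot type), the “true” coefficients used to generate outcomes, and the sample size – is invented for illustration; real xG models use many more features and much bigger datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated shot data: distance to goal in metres, and shot type
distance = rng.uniform(2, 30, n)
header = rng.integers(0, 2, n).astype(float)   # 1 if the attempt is a header

# Simulate goal/no-goal outcomes from an assumed "true" model
true_logit = 1.5 - 0.2 * distance - 1.0 * header
goal = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit a logistic regression by Newton's method: find the best formula
# converting the variables into a goal probability
X = np.column_stack([np.ones(n), distance, header])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
    gradient = X.T @ (goal - p)
    hessian = (X * (p * (1 - p))[:, None]).T @ X
    beta += np.linalg.solve(hessian, gradient)

def xg(dist, is_header):
    """Fitted model's goal probability for a single shot."""
    return 1 / (1 + np.exp(-(beta @ [1.0, dist, float(is_header)])))

# Kicked shots from close range get a much higher xG than distant headers
print(round(xg(5, 0), 2), round(xg(20, 1), 2))
```

In practice you’d hand the fitting step to a library such as scikit-learn or statsmodels, but the point is the same: the output is a formula from shot circumstances to a probability.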

So in the end, xG is calculated via a formula that takes a bunch of information at the time of a shot – ball position, type of shot etc. etc. – and converts it into a probability that the shot results in a goal. You can see what xG looks like using this simple app.

Actually, there are two alternative versions of xG here, which you can switch between in the first dialog box. For both versions, the xG will vary according to whether the shot is a kick or a header. For the second version the xG also depends on whether the shot is assisted by a cross, or preceded by a dribble: you select these options with the remaining dialog boxes. In either case, with the options selected, clicking on the graphic of the pitch will return the value of xG according to the chosen model. Naturally, the xG increases as you get closer to the goal and as the angle becomes more favourable.

One point to note about xG is that it makes no allowance for the actual players or teams involved. In the OPTA version there is a factor that distinguishes between competitions – presumably since players are generally better at converting chances in some competitions than others – but the calculation of xG is identical for all players and teams within a competition. Loosely speaking, xG is the probability that a shot leads to a goal when taken by an average player who finds themselves in that position in that competition. So the true probability – which is never calculated – might be higher if the shooter is a top striker from one of the best teams, but lower if it’s a defender who happened to stray into that position. And in exactly the same way, there is no allowance in the calculation of xG for the quality of the opposition: xG averages over all players, both in attack and defence.

It follows from all this discussion that there’s a subtle difference between xG and the simpler statistics of the kind provided by Flashscore. In the latter case, as with goals scored, the statistics are pure counts of different event types. Apart from definitions of what counts as a ‘shot on goal’, for example, two different observers would provide exactly the same data. xG is different: two different observers are likely to agree on the status of an event – a shot on an open goal from the corner of the goal area, for example – but they may disagree on the probability of such an event generating a goal. Even the two versions in the simple app above gave different values of xG, and OPTA would give a different value again. So xG is a radically different type of statistic: it relies on a statistical model for converting situational data into probabilities of goals being scored, and different providers may use different models.
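
To make the point concrete, here are two invented toy models – both entirely illustrative, and much cruder than anything a real provider would use – that take exactly the same input and return different xG values:

```python
import math

def xg_model_a(distance):
    """Toy model A: xG decays exponentially with distance (illustrative only)."""
    return math.exp(-distance / 8)

def xg_model_b(distance):
    """Toy model B: a logistic curve in distance (illustrative only)."""
    return 1 / (1 + math.exp(0.25 * (distance - 11)))

# Same shot, same raw data, two defensible-looking models, two different xGs
print(round(xg_model_a(5), 2), round(xg_model_b(5), 2))  # prints: 0.54 0.82
```

The raw count of “shots from 5 metres” is the same for everyone; the xG attached to that shot depends on whose model you ask.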

We’ll save discussion of the calculation of xG for a whole match, or for an individual player over a whole match, for a subsequent post. But let me leave you with this article from the BBC. The first part is a summary of what I’ve written here – maybe it’s even a better summary than mine. And the second part touches on issues that I’ll discuss in a subsequent post. But halfway down there’s a quiz in which five separate actions are shown and you’re invited to guess the value of xG for each. See if you can beat my score of 2/5.


Incidentally, why do we use the term ‘expected goals’ if xG is a probability? Well, consider the simpler experiment of tossing a coin. Assuming it’s a fair coin, the probability of getting a head is 0.5. In (say) 1000 tosses of the coin, on average I’d get 500 heads. That’s 0.5 heads per toss, so as well as being the probability of a head, 0.5 is also the number of heads we expect to get (on average) when we toss a single coin – xH, if you like. The same argument works for a biased coin with probability 0.6 of coming up heads: xH = 0.6. And it works for goals in exactly the same way: if xG is the probability of a certain type of shot becoming a goal, it’s also the number of goals we’d expect, on average, per event of that type.
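
The coin-tossing argument is easy to verify by simulation:

```python
import random

random.seed(0)

def average_heads(p, n_tosses=100_000):
    """Average number of heads per toss of a coin with P(heads) = p."""
    heads = sum(random.random() < p for _ in range(n_tosses))
    return heads / n_tosses

# The long-run average per toss settles on the single-toss probability: xH = p
print(round(average_heads(0.5), 3))
print(round(average_heads(0.6), 3))
```

Run this and the two averages come out very close to 0.5 and 0.6 respectively – the probability per toss and the expected heads per toss are the same number.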


And finally… if there are any other statistical topics that you’d like me to discuss in this blog, whether related to sports or not, please do write and let me know.