Football’s back

 

It’s coming home… Well, coming back technically. But either way the Premier League has now agreed to return to some sort of normal service on 17th June. Some leagues have already started – South Korea and Germany – with others set to follow shortly. For the time being at least, it’s starting to feel like business as usual.

So now feels like a good time to pause the blog again. Unless something happens to change my mind, this will be my final post on the Coronavirus epidemic. Thankfully, in many countries, the epidemic has been brought under partial control through social-distancing and other measures, so the debate has moved away from the science of epidemiology, in which Statistics is central, to socio-behavioural issues, in which Statistics is relevant, but less fundamental. Moreover, debate in the UK especially is increasingly focused on political aspects, and I don’t think it’s appropriate to use this forum to contribute to those discussions.

So, I’ll stop here. Thanks to those of you who have been following these posts, and especially to anyone who has commented either directly to the blog or indirectly to me personally.

By way of concluding things, I’ll just mention an article in today’s Financial Times, since it pulls together many of the themes I’ve tried to cover in these posts, while also summarising the current state of the epidemic in the UK.

  1. There’s been a lot of discussion about the fairness of comparing death rates across countries, which measure should be used, and whether numbers should be stated per capita. On balance, showing excess deaths per capita is believed to be fairest. This graph includes such a comparison: the UK fares worst by this measure, and is behind only the US and Peru by alternative measures.
  2. Italy – from where I am writing – also does poorly by this same measure. However, Italy was hit much sooner than the UK. The UK had an advantage of several weeks of both time and evidence to help avert the crisis.
  3. One of the factors behind the UK’s troubled epidemic trajectory is how quickly it ordered a nationwide lockdown. This graph shows a very strong relationship – though not necessarily a causal one – between the speed of lockdown and the number of excess deaths. As an aside, there’s also confirmation today that the failure to stop major sporting events in the period immediately before the lockdown, in particular the Liverpool-Atletico return leg in the Champions League, is likely to have contributed to subsequent Coronavirus hotspots.
  4. Another reason the UK has been so badly affected compared to other countries is the fact that cases were spread very early on in the epidemic across most of the country. In Italy most cases were, and indeed still are, concentrated in Lombardia and surrounding regions. In Spain it was Madrid and, to a lesser extent, Barcelona. In France, Paris and another region to the east close to the Swiss-German border. Having localised outbreaks made containment measures much easier and more effective compared to the situation in the UK, where most regions were affected. The following graph shows the number of excess deaths through time in each of the UK regions. There are some differences, but excluding Northern Ireland, no region ended up having more than twice as many excess deaths as any other.

  5. I’ve referred several times in the blog to the statistician Professor David Spiegelhalter, who for a period was widely quoted by the government for apparently suggesting that cross-country comparisons are inappropriate. But that was a wilful misquote of what he actually said, and a more accurate summary of his feelings on the subject is included in the FT article:

If we can believe the data from other countries, then the UK has done badly in terms of excess deaths. The issues now concern what will happen for the rest of the year, and trying to understand the processes contributing to our large excess.

Finally, though I’m planning not to write any more on this topic, there are plenty of places to keep up-to-date with statistical matters relating to the epidemic, many of which I’ve referenced in earlier posts. In particular, there’s nobody better at explaining tricky statistical ideas in simple terms than David Spiegelhalter, whose blog is available here.

Just lucky

According to world chess champion Magnus Carlsen, the secret to his success is…

I’ve just been lucky.

Lucky? At chess?

Well, no, actually. This is Carlsen talking about his success at Fantasy Football. At the time of writing, Carlsen’s Premier League Fantasy Football team, Kjell Ankedal, is top of the League:

Top of the league sounds great, but this picture, which shows just the top 10 teams, is a little misleading. The Premier League Fantasy Football League actually has more than 6 million teams, and Kjell Ankedal is currently top of all of them. Moreover, Kjell Ankedal has finished in the top 5% of the league for each of the past 4 seasons, and in 2017-18 finished 2397th. With 6 million teams, the 2017-18 result would place Carlsen in the top 0.04%.

Obviously, football – and by extension fantasy football – is a game with many more sources of random intervention than chess, including the referee, the weather, VAR, the managers and just the inevitable chaos that can ensue from the physics of 22 people chasing, kicking and punching a ball. Compare that with the deterministic simplicity of a chess move such as e4.

And yet…

Can it be that Carlsen is ‘just lucky’ at Fantasy Football? Lucky to be top of the league after finishing in the top 5% or so, year after year? Well, we could make some assumptions about Carlsen actually being just an average player, and then work out the probability that he got the set of results he actually got, over this and recent seasons, if he was really just lucky rather than a very good player…

And it would be vanishingly small.
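To see just how small, here’s a back-of-the-envelope calculation. The assumptions are purely illustrative: around 6 million independent entries each season, with an average player’s rank uniform over the field.

```python
# Back-of-the-envelope: how likely is Carlsen's record if he were just an
# average player? Illustrative assumptions: ~6 million independent entries
# each season, with an average player's rank uniform over the field.
n_players = 6_000_000

p_top5 = 0.05            # chance an average player lands in the top 5%
p_first = 1 / n_players  # chance an average player is currently ranked 1st

# four consecutive seasons in the top 5%, plus leading the league right now
p_record = p_top5 ** 4 * p_first
print(f"P(record by luck alone) = {p_record:.1e}")  # → 1.0e-12
```

Around one in a trillion, under even these generous simplifications.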

In his Ted Talk, Rasmus Ankersen says that the famous quote ‘The league table never lies’ should be replaced with ‘The league table always lies’. There’s simply too much randomness in football matches for a league table based on 38 matches or so per team to end up with a ranking of teams that reflects their exact ability. And yet, if you look at the top and bottom of most league tables there are very few surprises. League tables are noisy arrangements of teams ranked by their ability, but they are not just total chaos. Better teams generally do better than poorer teams, and teams are never champions or relegated just due to good or bad luck. So, to be in the top few percent of players, consistently over several seasons, with so many people playing is just implausible unless Carlsen is a much-better-than-average player.

So, while it’s true that Carlsen’s precise Fantasy Football ranking is affected to a greater extent by luck than is his world chess ranking, it’s probably a little disingenuous for him to say he’s just been lucky.

And maybe it’s no coincidence that someone who’s eminently brilliant at chess turns out also to be eminently brilliant at fantasy football. Maybe one of the keys to Carlsen’s success at chess is an ability to optimise his strategy over the uncertainty in the moves his opponent will make.

Or maybe he’s just brilliant at everything he does.


Obviously, what applies to Carlsen with respect to Fantasy Football applies equally well to betting syndicates trading on football markets. Luck will play a large part in determining short term wins and losses, but in the very long term luck is ironed out, and what determines the success of the syndicate is their skill, judgement and strategy.

Off script


So, how did your team get on in the first round of Premier League fixtures for the 2019-20 season? My team, Sheffield United, were back in the top flight after a 13-year absence. It didn’t go too well though. Here’s the report:

EFL goal machine Billy Sharp’s long wait for a top-flight strike ends on the opening day. Ravel Morrison with the assist. But Bournemouth run out 4-1 winners.

And as if that’s not bad enough, we finished the season in bottom place:


Disappointing, but maybe not unexpected.

Arsenal also had a classic Arsenal season. Here’s the story of their run-in:

It seems only the Europa League can save them. They draw Man United. Arsenal abandon all hope and crash out 3-2. Just as they feared. Fans are more sad than angry. Once again they rally. Aubameyang and Alexandre Lacazette lead a demolition of high flying Liverpool. But they drop too many points and end up trophyless with another fifth-place finish.

Oh, Arsenal!

But what is this stuff? The Premier League doesn’t kick off for another week, yet here we have complete details of the entire season, match-by-match, right up to the final league table.

Welcome to The Script, produced by BT Sport. As they themselves explain:

Big data takes on the beautiful game.

And in slightly more detail…

BT has brought together the biggest brains in sports data, analysis and machine learning to write the world’s first artificial intelligence-driven script for a future premier league season.

Essentially, BT Sport have devised a model for match outcomes based on measures of team abilities in attack and defence. So far, so standard. After which…

We then simulate the random events that could occur during a season – such as injuries and player transfers – to give us even more accurate predictions.

But this is novel. How do you assign probabilities to player injuries or transfers? Are all players equally susceptible to injury? Do the terms of a player’s contract affect their chances of being sold? And to whom they are sold? And what is the effect on a team’s performance of losing a player?

So, this level of modelling is difficult. But let’s just suppose for a minute you can do it. You have a model for what players will be available for a team in any of their fixtures. And you then have a model that, given the 2 sets of players that are available to teams for any fixture, spits out the probabilities of the various possible scores. Provided the model’s not too complicated, you can probably first simulate the respective lineups in a match, and then the scores given the team lineups. And that’s why Sheffield United lost 4-1 on the opening day to Bournemouth. And that’s why Arsenal did an Arsenal at the end of the season. And that’s why the league table ended up like it did above.
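The two-stage idea can be sketched in a few lines. Every number here – the injury probability, the star-player effect, the attack rates – is invented for illustration; this is the shape of the computation, not BT Sport’s actual model.

```python
import math
import random

# Sketch of the two-stage simulation described above: first draw the
# line-ups (is a key player injured?), then draw a score given the line-ups.
random.seed(1)

def draw_poisson(lam):
    """Draw from a Poisson(lam) distribution by inversion sampling."""
    u, k = random.random(), 0
    p = cum = math.exp(-lam)
    while u > cum:
        k += 1
        p *= lam / k
        cum += p
    return k

def simulate_match(attack_home, attack_away, p_star_out=0.15, star_value=0.3):
    # Stage 1: availability -- each side may lose its star player
    home = attack_home - (star_value if random.random() < p_star_out else 0)
    away = attack_away - (star_value if random.random() < p_star_out else 0)
    # Stage 2: score given the line-ups, via independent Poisson goal counts
    return draw_poisson(home), draw_poisson(away)

print(simulate_match(1.2, 1.9))  # e.g. a weaker home side against a stronger away one
```

Stage 1 perturbs team strength according to who is available; stage 2 draws a score conditional on that.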

But is this a useful resource for predicting the Premier League?

Have a think about this before scrolling down. Imagine you’re a gambler, looking to bet on the outcome of the Premier League season. Perhaps betting on who the champions will be, or the top three, or who will be relegated, or whether Arsenal will finish fifth. Assuming BT’s model is reasonable, would you find the Script that they’ve provided helpful in deciding what bets to make?

|
|
|
|
|
|
|
|
|
|
|
|
|
|

Personally, I think the answer is ‘no’, not very helpful. What BT seem to have done is run A SINGLE SIMULATION of their model, for every game over the entire season, accumulating the simulated points of each team per match to calculate their final league position.

A SINGLE SIMULATION!

Imagine having a dice that you suspected of being biased, and trying to understand its properties from a single roll. It’s almost pointless. Admittedly, with the Script, each team has 38 simulated matches, so the final league table is likely to be more representative of genuine team ability than the outcome of a single throw of a dice. But still, it’s the simulation of just a single season.

What would be much more useful would be to simulate many seasons and count, for example, in how many of those seasons Sheffield United were relegated. This way the model would be providing an estimate of the probability that Sheffield United gets relegated, and we could compare that against market prices to see if it’s a worthwhile bet.
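As a sketch of what that would look like, here’s a toy Monte Carlo with six teams and made-up strengths (illustrative only, nothing like BT’s actual model): simulate thousands of seasons and count how often each team finishes in the bottom three.

```python
import random

# Toy Monte Carlo of the "many seasons" idea. Each season every team plays
# every other home and away; noisy performances decide matches; we count
# bottom-three finishes across thousands of simulated seasons.
random.seed(42)
teams = {"Man City": 2.0, "Liverpool": 1.9, "Arsenal": 1.4,
         "Everton": 1.0, "Sheffield United": 0.8, "Norwich": 0.7}

def simulate_season(strengths):
    points = {t: 0 for t in strengths}
    for home in strengths:
        for away in strengths:
            if home == away:
                continue
            # performance on the day = underlying strength + random form
            h = strengths[home] + random.gauss(0, 1)
            a = strengths[away] + random.gauss(0, 1)
            if h - a > 0.3:          # clear home win
                points[home] += 3
            elif a - h > 0.3:        # clear away win
                points[away] += 3
            else:                    # draw
                points[home] += 1
                points[away] += 1
    return points

n_seasons = 2000
relegated = {t: 0 for t in teams}
for _ in range(n_seasons):
    table = sorted(simulate_season(teams).items(), key=lambda kv: kv[1])
    for team, _pts in table[:3]:     # bottom three are relegated
        relegated[team] += 1

for team, count in sorted(relegated.items(), key=lambda kv: -kv[1]):
    print(f"{team:>16}: P(relegation) ~ {count / n_seasons:.2f}")
```

Estimated probabilities of this kind are what you’d actually want to compare against market prices.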

In summary, we’ve seen in earlier posts (here and here, for example) contenders for the most pointless simulation in a sporting context, but the Script is lowering the bar to unforeseen levels. Despite this, if the blog is still going at the end of the season, I’ll do an assessment of how accurate the Script’s estimates turned out to be.

 

Can’t buy me love

Ok, money can’t buy you love, but can it buy you the Premier League title? We’ll look at that below, but first this recent Guardian article notes the following Premier League statistics:

Between 2003 and 2006 there were just 3 instances of a team having more than 70% of possession in a game. Two seasons ago there were 37, last season 63 and this season 67.

In other words, by even the simplest of statistical measures, Premier League games are becoming increasingly one-sided, at least in terms of possession. And the implication in the Guardian article is that money is the driving factor behind this imbalance. But is that really the case?

This graph shows final league position of the 20 Premier League teams plotted against their wealth in terms of start-of-season squad market value (taken from here).

To make things slightly clearer, the following diagram shows the same thing, but with a smooth curve (in blue) added on top, estimated using standard statistical techniques, which shows the overall trend in the data.

Roughly speaking, teams above the blue line have performed better than their financial resources would have suggested; those below have performed worse.
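For anyone curious what ‘a smooth curve estimated using standard statistical techniques’ amounts to: the blog’s curve was presumably fitted with something like loess, but the idea can be illustrated with a simple nearest-neighbour running mean. The (squad value, league position) pairs below are invented for illustration.

```python
# Illustrating the "smooth curve" idea with a k-nearest-neighbour running
# mean over invented (squad value, final position) data.
squad_value = [100, 120, 130, 150, 160, 180, 200, 250, 300, 400, 600, 900]  # £m
position    = [ 18,  14,  19,  12,  16,  10,   8,   7,   9,   5,   2,   1]

def running_mean(xs, ys, k=3):
    """Smooth y at each x by averaging the k nearest observations (by x)."""
    smooth = []
    for x0 in xs:
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
        smooth.append(sum(ys[i] for i in nearest) / k)
    return smooth

trend = running_mean(squad_value, position)
for value, t in zip(squad_value, trend):
    print(f"£{value}m -> smoothed position {t:.1f}")
```

The smoothing averages out individual over- and under-performance, leaving the overall wealth-to-position trend.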

Bear in mind this is just one season’s data. Also, success breeds success, and money breeds money, so the differential between teams in terms of wealth as a season progresses is likely to increase further. For these reasons and others, not too much should be read into the slight wobbles in the blue curve. Nonetheless, a number of general features emerge:

  1. It’s a very noisy picture for teams with less than £250 m. Arguably, at that level, there’s no very obvious pattern between wealth and final position: there’s a bunch of teams with between £100 m and £250 m, and their league position within this group of teams isn’t obviously dependent on their wealth. As such, teams in this category are unlikely to get out of the bottom half of the table, and their success within the bottom half is more likely to depend on how well they’ve spent their money than on how much they actually have. And on luck.
  2. Teams with between £250 m and £500 m are likely to force their way out of the ‘relegation-battle pack’, but not into the top 6 elite.
  3. The cost of success at the top end is high: the blue curve at the top end is quite flat, so you have to spend a lot to improve your position. But money, as long as there’s enough of it, counts a lot for elite clubs, and the evidence is that the teams who are prepared to spend the most are likely to improve their league position.
  4. A couple of clubs stand out as having performed very differently to what might be expected: Manchester United have considerably under-performed, while Wolves have substantially over-performed.

The trials and tribulations of Manchester United are well documented. Chances are they just need a change of manager. <Joke>. But Wolves are a much more interesting case, which takes us back to the Guardian article I referred to. As discussed above, that article is more about how money shapes the way games are played than about the success it brings, with matches between rich and poor teams increasingly becoming contests between the attack of one side and the defence of the other. But Wolves have adapted to such imbalances, playing long periods without possession, and attacking with speed and precision when they do have the ball. The template for this type of play was Leicester City in their title-winning season, but even though that was just a few seasons ago, the financial imbalances were far smaller then than now.

It seems, then, that to a very large extent, a team’s performance in the Premier League is likely to be determined by its wealth. Good management can mitigate this, just as bad management can lead to relatively poor performance. But even where teams are punching above their weight, they are having to do so by adapting their gameplay, so that matches are still dominated in terms of possession by the wealthier sides. As the Guardian article concludes:

Money guides everything. There have always been rich clubs, of course, but they have never been this rich, and the financial imbalances have never had such an impact on how the game is played.

 

“I don’t like your mum”


VAR, eh?

So, does video-assisted refereeing (VAR) improve the quality of decision-making in football matches?

Of course, that’s not the only question about VAR: assuming there is an improvement, one has to ask whether it’s worth either the expense or the impact it has on the flow of games when an action is reviewed. But these are subjective questions, whereas the issue about improvements in decision-making is more objective, at least in principle. With this in mind, IFAB, the body responsible for determining the laws of football, have sponsored statistical research into the extent to which VAR improves the accuracy of refereeing decisions.

But before looking at that, it’s worth summarising how the VAR process works. VAR is limited to an evaluation of decisions made in respect of four types of events:

  • Goals
  • Penalties
  • Straight red cards
  • Mistaken identity in the award of cards

And there are two modes of operation of VAR:

  • Check mode
  • Review mode

The check mode runs in the background throughout the whole game, without initiation by the referee. All incidents of the above types are viewed and considered by the VAR, and any involving a potential error are checked, with the assistance of replays if necessary. Such checks are used to identify situations where the referee is judged to have made a ‘clear and obvious error’ or where there has been a ‘serious missed incident’. Incidents of other types – e.g. the possible award of a free kick – or mistakes that are not judged to be clear and obvious errors are discarded during the check process.

When a check by the VAR does reveal a possible mistake of the above type, the referee is notified and is then at liberty to carry out a review of the incident. The review can consist solely of a description of the event from the VAR to the referee, or it can comprise a video review of the incident by the referee using a screen at the side of the pitch. The referee is not obliged to undertake a review of an incident, even if flagged by the VAR following a check. On the other hand, the referee may choose to carry out a review of an incident, even if it has not been flagged by the VAR.

Hope that’s all clear.

Anyway, the IFAB report analysed more than 800 competitive games in which VAR was used, and includes the following statistics:

  • 56.9% of checks were for penalties and goals; almost all of the others were for red-card incidents;
  • on average there were fewer than 5 checks per match;
  • the median check time was 20 seconds;
  • the accuracy of reviewable decisions before VAR was applied was 93%;
  • 68.8% of matches had no review;
  • on average, there was one clear and obvious error every 3 matches;
  • the decision accuracy after VAR was applied was 98.9%;
  • the median duration of a review was 60 seconds;
  • the average playing time lost due to VAR was less than 1% of the total playing time;
  • in 24% of matches, VAR led to a change in a referee’s decision; in 8% of matches this change led to a decisive change in the match outcome;
  • a clear and obvious error was not corrected by VAR in around 5% of matches.

This all seems very impressive. A great use of Statistics to check the implementation of the process and to validate its ongoing use. And maybe that’s the right conclusion. Maybe. It’s just that, as a statistician, I’m still left with a lot of questions. Including:

  1. What was the process for checking events, both before and after VAR? Who decided if a decision, either with or without VAR, was correct or not?
  2. It would be fairest if the analysis of incidents in this experiment were done ‘blind’. That’s to say, when an event is reviewed, the analyst should be unaware of what the eventual decision of the referee was. This would avoid the possibility of the experimenter – perhaps unintentionally – being drawn towards incorrect agreement with the VAR process decision.
  3. It’s obviously the case when watching football, that even with the benefit of slow-motion replays, many decisions are marginal. They could genuinely go either way, without being regarded as wrong decisions. As such, the impressive-looking 93% and 98.9% correct decision rates are probably more fairly described as rates of not incorrect decisions.
  4. There’s the possibility that incidents are missed by the referee, missed by VAR and missed by whoever is doing this analysis. As such, there’s a category of errors that are completely ignored here.
  5. Similarly, maybe there’s an average of only 5 checks per match because many relevant incidents are being missed by VAR.
  6. The use of the median to give average check and review times could be disguising the fact that some of these controls take a very long time indeed. It would be a very safe bet that the mean times are much bigger than the medians, and would give a somewhat different picture of the extent to which the process interrupts games when applied.
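A tiny numerical example of the point about medians, using hypothetical review times: most reviews are quick, but a couple drag on.

```python
import statistics

# Hypothetical VAR review times in seconds -- invented to make the
# median-versus-mean point, not taken from the IFAB report.
review_times = [40, 45, 50, 55, 60, 60, 65, 70, 180, 300]

median_t = statistics.median(review_times)
mean_t = statistics.mean(review_times)
print(f"median = {median_t}s, mean = {mean_t}s")  # median 60s, mean 92.5s
```

The median says a tidy 60 seconds; the mean, dragged up by the two long reviews, says over 90 – a rather different impression of how disruptive the process is.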

So, I remain sceptical. The headline statistics are encouraging, but there are aspects about the design of this experiment and the presentation of results that I find questionable. And that’s before we assess value in terms of cost and impact on the flow of games.

On the other hand, there’s at least some evidence that VAR is having incidental effects that aren’t picked up by the above experiment. It was reported that in Italy’s Serie A, the number of red cards given for dissent during the first season of VAR was one, compared with eleven in the previous season. The implication is that VAR is not just correcting mistakes, but also leading players to moderate their behaviour on the pitch. Not that this improvement is being universally adopted by all players in all leagues, of course. But the fact that VAR might actually be improving the way the game is played, above and beyond any improvements to the refereeing process, is an interesting aspect, potentially in VAR’s favour, which falls completely outside the scope of the IFAB study discussed above.

But in terms of VAR’s impact on refereeing decisions, I can’t help feeling that the IFAB study was designed, executed and presented in a way that shines the best possible light on VAR’s performance.


Incidentally, if you’re puzzled by the title of this post, you need to open the link I gave above, and exercise your fluency in Spanish vernacular.

Olé, Olé, Olé


So, everyone agrees that Ole Solskjær has been a breath of fresh air at Man United and is largely responsible for their remarkable turnaround this season. But here’s a great article by the guys at StatsBomb that adds perspective to that view. Sure, there’s been a change in results since Solskjær arrived, but more importantly xG – the expected goals – has also improved considerably, both in attack and defence. This suggests that the results are not just down to luck; United are genuinely creating more chances, and limiting those of the opposition, at a greater rate than under Mourinho.

Nonetheless, United’s performance in terms of actual goals is out-performing that of xG: at the time of the StatsBomb report, total xG for United over all games under Solskjær was 17.72, whereas actual goals were 25; and total xG against United was 10.99, with actual goals at 8. In other words, they’ve scored more, and conceded fewer, goals than their performance merits. This suggests that, notwithstanding the improvement in performance, United have also benefited from an upsurge in luck, both in attack and defence.
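A rough way to quantify that upsurge in luck is to treat total goals as Poisson with mean equal to the total xG – a standard simplification, not StatsBomb’s own calculation – and ask how surprising the actual tallies are.

```python
import math

# How surprising are United's actual tallies if total goals follow a
# Poisson distribution with mean equal to the xG totals?
def poisson_cdf(lam, k):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

p_attack = 1 - poisson_cdf(17.72, 24)   # P(score >= 25) given xG for of 17.72
p_defence = poisson_cdf(10.99, 8)       # P(concede <= 8) given xG against of 10.99
print(f"P(score 25+ goals)    ~ {p_attack:.3f}")
print(f"P(concede 8 or fewer) ~ {p_defence:.3f}")
```

Neither tail probability is outlandish on its own, but over-performing the model simultaneously in attack and in defence is exactly the pattern you’d expect from a run of good luck.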

But more generally, what is the value of a good manager? This recent article references a statistical analysis of data from the German Bundesliga, which aimed to quantify the potential effect a manager could have on a team. It’s not a completely straightforward issue, since the best managers tend to go to the best clubs, who are bound to have a built-in tendency for success that’s not attributable to the manager. Therefore, the research attempted to distinguish between team and manager effects. Their conclusions were:

  • The best 20% of managers were worth around 0.3 points per game more than the weakest 20% of managers. This amounts to 10.2 points over a 34-game season in the Bundesliga.
  • A manager’s estimated performance proved to be a valuable predictor in team performance when a manager changed clubs.
  • The best and worst managers have a strong impact on team performance. For teams with managers having closer to average ability, team performance is more heavily impacted by other factors, such as player quality and recruitment strategy.

In summary, on the basis of this research, there is value in aiming for the best of managers, and avoiding the worst, but not much evidence to suggest it’s worth shopping around in the middle. There are some caveats to this analysis though, and in particular about the way it’s described in the Guardian article:

  1. The analysis uses data from German leagues only up to 2013-14.
  2. This amounts to a total of just 6,426 matches, and includes relatively few managers.
  3. The Guardian article states ‘budget per season’ was accounted for. It wasn’t.
  4. The Guardian article refers to ‘statistical wizardry’. This consists of simple linear regression on points per half season with separate effects for managers and teams. This might be a sensible strategy, but it’s not wizardry.

So, it’s best to treat the precise conclusions of this report with some caution. Nonetheless, the broad picture it paints is entirely plausible.
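For concreteness, here’s what that non-wizardly regression looks like in miniature: points per half-season regressed on team and manager dummies. All teams, managers and points below are invented; only the structure mirrors the study.

```python
import numpy as np

# Miniature version of the paper's regression: points per half-season
# modelled as team effect + manager effect, fitted by ordinary least squares.
# Rows: (team index, manager index, points that half-season) -- all invented.
data = [(0, 0, 38), (0, 1, 30), (0, 2, 33),
        (1, 0, 25), (1, 1, 18), (1, 2, 21),
        (2, 1, 15), (2, 2, 20)]

n_teams, n_managers = 3, 3
X = np.zeros((len(data), n_teams + n_managers))
y = np.zeros(len(data))
for row, (team, manager, pts) in enumerate(data):
    X[row, team] = 1.0               # team dummy
    X[row, n_teams + manager] = 1.0  # manager dummy
    y[row] = pts

# The design is rank-deficient (dummy-variable trap), so lstsq returns the
# minimum-norm solution; only differences between manager effects are
# identifiable, hence the centring below.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
mgr_eff = coef[n_teams:]
print("manager effects (relative):", np.round(mgr_eff - mgr_eff.mean(), 2))
```

Separating the two sets of dummies is precisely how the analysis distinguishes a good manager from a good squad.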

And going back to Solskjær: there are good reasons to believe he is partly responsible for the overall improvement in performance at United, but a comparison between goals and xG suggests that the team have also been a bit on the lucky side since his arrival, and that their results have flattered to deceive a little.

It’s not your fault (maybe)

 

Most of you who came through the UK school system will have taken GCSEs at the end of your secondary school education. But did that happen in an even-numbered year or an odd-numbered one? If it was an even number, I have good news for you: a ready-made and statistically validated excuse as to why your results weren’t as good as they could have been.

A recent article in the Guardian pointed to academic research which compared patterns of GCSE results in years with either a World Cup or Euro tournament final – i.e. even-numbered years – with those of other years – i.e. odd-numbered years. They found, for example, that the chances of a student achieving 5 good GCSE grades is 12% lower for students in a tournament year compared with a non-tournament year. This is a big difference, and given the size of the study, strongly significant in statistical terms. In other words, it’s almost impossible that a difference of this magnitude could have occurred by chance if there were really no effect.
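To see why a difference of this size is so hard to attribute to chance, here’s a sketch of a two-proportion z-test. The baseline pass rate and cohort size are assumptions of roughly the right order, not the study’s actual figures.

```python
import math

# Why a 12% relative drop is overwhelming in a study this size:
# a two-proportion z-test with hypothetical counts.
n_each = 500_000        # students per year group (assumed)
p_odd = 0.55            # P(5+ good GCSEs) in a non-tournament year (assumed)
p_even = p_odd * 0.88   # 12% relative drop in a tournament year

p_pool = (p_odd + p_even) / 2
se = math.sqrt(2 * p_pool * (1 - p_pool) / n_each)
z = (p_odd - p_even) / se
print(f"z ~ {z:.0f}")   # dozens of standard errors from zero
```

With cohorts in the hundreds of thousands, even a much smaller gap would be many standard errors from zero, which is what ‘strongly significant’ means here.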

The implication of the research is that the World Cup and Euros, which take place at roughly the same time as GCSE final examinations, have a distracting effect on students, leading to poorer results. Now, to be clear: the analysis cannot prove this claim. The fact that there is a 2-year cycle in quality of results is beyond doubt. But this could be due to any cause which has a 2-year cycle that coincides with GCSE finals (and major football finals). But, what could that possibly be?

Moreover, here’s another thing: the difference in performance in tournament and non-tournament years varies among types of students, and is greatest for the types of students that you’d guess are most likely to be distracted by football.

  1. The effect is greater for boys than for girls, though it is also present and significant for girls.
  2. The difference in performance (of achieving five or more good GCSE grades) reaches 28% for white working class boys.
  3. The difference for black boys with a Caribbean background is similarly around 28%.

So, although it requires a leap of faith to assume that the tournament effect is causal rather than coincidental so far as GCSE performance goes, the strength of circumstantial evidence is such that it’s a very small leap of faith in this particular case.

 

The numbers game

If you’re reading this post, you’re likely to be aware already of the importance of Statistics and data for various aspects of sport in general and football in particular. Nonetheless, I recently came across this short film, produced by FourFourTwo magazine, which gives a nice history of the evolution of data analytics in football. If you need a refresher on the topic, this isn’t a bad place to look.

And just in case you don’t think that’s sufficient to justify this post in a Statistics blog, FourFourTwo claims to be ‘the world’s biggest football magazine’. Moreover, many of the articles on the magazine’s website are analytics-orientated. For example: ‘Ronaldo averaged a game every 4.3 days‘. Admittedly, many of these articles are barely-disguised advertisements for a wearable GPS device intended for tracking player activity during matches. But I suppose even £199 is a number, right?

 

Nokia 3310

Whatever happened to the Nokia 3310, and what’s that got to do with sports data?

Many of you will know Rasmus Ankersen from his involvement with both Brentford and Midtjylland. Maybe you’ve also seen this video of a TED talk Rasmus gave a while back, but I’ve only just come across it. I think it’s interesting because there are now plenty of articles, books and – ahem – blogs which emphasise the potential of statistics and data analytics in both sports and gambling. But Rasmus’s talk goes in the other direction and argues that, since data analytics has been proven as a valuable tool to assist gambling on sports, there are lessons that can be learned by leaders of business and industry. The main themes are:

  1. In any process where there’s an element of chance, it’s important to recognise that good and bad results are not just a function of good and bad performance, but also of good and bad luck;
  2. There are potentially huge gains in trying to identify the aspects of performance that determine either good or bad results, notwithstanding the interference effects of luck.

In other words, businesses, like football teams, have results that are part performance-driven and part luck. Signal and noise, if you like. Rasmus argues that good business, like good football management, is about identifying what it is that determines the signal, while mitigating the noise. And only by adopting this strategy can companies, like Nokia, avoid the type of sudden death that happened to the 3310. Or as Rasmus puts it: “RIP at gadgets graveyard”.

Anyway, Rasmus’s talk is a great watch, partly because of the message it sends about the importance of Statistics to both sport and industry, but also because it includes something about the history of the relationship between Smartodds, Brentford and Midtjylland. Enjoy.