Whatever happened to the Nokia 3310, and what’s that got to do with sports data?

Many of you will know Rasmus Ankerson from his involvement with both Brentford and Midtjylland. Maybe you’ve also seen this video of a TED talk Rasmus gave a while back, but I’ve only just come across it. I think it’s interesting because there are now plenty of articles, books and – ahem – blogs which emphasise the potential for statistics and data analytics in both sports and gambling. But Rasmus’s talk goes in the other direction: it argues that since data analytics has proven to be a valuable tool for betting on sports, there are lessons that leaders of business and industry can learn from it. The main themes are:

In any process where there’s an element of chance, it’s important to recognise that good and bad results are not just a function of good and bad performance, but also of good and bad luck;

There are potentially huge gains in trying to identify the aspects of performance that determine either good or bad results, notwithstanding the interference effects of luck.

In other words, businesses, like football teams, have results that are partly driven by performance and partly by luck. Signal and noise, if you like. Rasmus argues that good business, like good football management, is about identifying what determines the signal, while allowing for the noise. And only by adopting this strategy can companies like Nokia avoid the kind of sudden death that happened to the 3310. Or as Rasmus puts it: “RIP at gadgets graveyard”.

Anyway, Rasmus’s talk is a great watch, partly because of the message it sends about the importance of Statistics to both sport and industry, but also because it includes something about the history of the relationship between Smartodds, Brentford and Midtjylland. Enjoy.

When Matthew.Benham@smartbapps.co.uk was asked at a recent offsite for book recommendations, his first suggestion was Thinking, Fast and Slow. This is a great book, full of insights that link together various worlds, including statistics, economics and psychology. Daniel Kahneman, the book’s author, is a world-renowned psychologist in his own right, and his book makes it clear that he also knows a lot about statistics. However, in a Guardian article a while back, Kahneman was asked the following:

Interestingly, it’s a fact that highly intelligent women tend to marry men less intelligent than they are. Why do you think this might be?

He answered as follows:

It’s a fact – but it’s not interesting at all. Assuming intelligence is similarly distributed between men and women, it’s just a mathematical inevitability that highly intelligent women, on average, will be married to men less intelligent than them. This is “regression to the mean”, and all it really tells you is that there’s no perfect correlation between spouses’ intelligence levels. But our minds are predisposed to try to construct a more compelling explanation.

<WOMEN: please insert your own joke here about the most intelligent women choosing not to marry men at all.>

Anyway, I can’t tell if this was Kahneman thinking fast or slow here, but I find it a puzzling explanation of regression to the mean, which is an important phenomenon in sports modelling. So, what is regression to the mean, why does it occur and why is it relevant to Smartodds?

Let’s consider these questions by looking at a specific dataset. The following figure shows the points scored in the first and second half of each season by every team in the Premier League since the inaugural 1992-93 season. Each point in the plot represents a particular team in a particular season of the Premier League. The horizontal axis records the points scored by that team in the first half of the season; the vertical axis shows the number of points scored by the same team in the second half of the same season.

Just to check your interpretation of the plot, can you identify:

The point which corresponds to Sunderland’s 2002-03 season where they accumulated just a single point in the second half of the season?

The point which corresponds to Man City’s 100-point season in 2017-18?

Now, let’s take that same plot but add a couple of lines as follows:

The red line divides the data into roughly equal sets. To its left are the points that correspond to the 50% poorest first-half-of-season performances; to its right are the 50% best first-half-of-season performances.

The green line corresponds to teams who had an identical performance in the first and second half of a season. Teams below the green line performed better in the first half of a season than in the second; teams above the green line performed better in the second half of a season than in the first.

In this way the picture is divided into 4 regions that I’ve labelled A, B, C and D. The performances within a season of the teams falling in these regions are summarised in the following table:

Region   First half      Better half   Number of points
A        Below average   First          94
B        Above average   First         174
C        Above average   Second         71
D        Below average   Second        187

I’ve also included in the table the number of data points in each region. (Counting directly from the figure will give slightly different numbers because some points overlap.)

First compare A and D, the teams that performed below average in the first half of a season. Looking at the number of points, such teams are much more likely to have had a better second half to the season (187 to 94). By contrast, comparing B and C, the teams that did relatively well in the first half of the season are much more likely to have done worse in the second half (174 to 71).

This is regression to the mean. In the second half of a season teams “regress” towards the average performance: teams that have done below average in the first half of the season generally do a bit less badly in the second half; teams that have done well in the first half generally do a bit less well in the second half. In both cases there is a tendency to move – regress – towards the average in the second half. I haven’t done anything to force this; it’s just what happens.

We can also view the phenomenon in a slightly different way. Here’s the same picture as above, where points falling on the green line would correspond to a team doing equally well in both halves of the season. But now I’ve also used standard statistical methods to add a “line of best fit” to the data, which is shown in orange. This line is a predictor of how teams will perform in the second half of the season given how they performed in the first, based on all of the data shown in the plot.

In the left side of the plot are teams who have done poorly in the first half of the season. In this region the orange line is above the green line, implying that such teams are predicted to do better in the second half of the season. On the right side of the plot are the teams who have done well in the first half of the season. But here the orange line is below the green line, so these teams are predicted to do worse in the second half of the season. This, again, is the essence of regression to the mean.

One important thing though: teams that did well in the first half of the season still tend to do well in the second half of the season; the fact that the orange line slopes upwards confirms this. It’s just that they usually do less well than they did in the first half; the fact that the orange line is less steep than the green line is confirmation of that. Incidentally, you’ve probably heard the term “regression line” used to describe a “line of best fit” like the orange line. The term comes precisely from the fact that such fits often show a regression to the mean, as we’ve seen here.

But why should regression to the mean be such an intrinsic phenomenon that it occurs in football, psychology and a million other places? I just picked the above data at random: I’m pretty sure I could have picked data from any competition in any country – and indeed any sport – and I’d have observed the same effect. Why should that be?

Let’s focus on the football example above. The number of points scored by a team over half a season (so they’ve played all other teams) is dependent on two factors:

The underlying strength of the team compared to their opponents; and

Luck.

Call these S (for strength) and L (for luck), and notionally let’s suppose they add together to give the total points (P). So

P = S + L

Although there will be some changes in S over a season, as teams improve or get worse, it’s likely to be fairly constant. But luck is luck. If a team has been very lucky in the first half of the season, it’s unlikely they’ll be just as lucky in the second, and vice versa. For example, if you roll a die and get a 6, you’re likely to do less well with a second roll, while if you roll a 1, you’ll probably do better on your next roll. So while S is pretty static, if L was unusually big or small in the first half of the season, it’s likely to be closer to the average in the second half. And the overall effect on P? Regression to the mean, as seen in the table and figures above.
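To see the strength-plus-luck argument at work, here’s a minimal R simulation. Everything in it is an assumption for illustration only: 520 hypothetical team-seasons, a strength drawn once per team-season from a normal distribution (mean 26, standard deviation 7) that stays the same in both halves, and luck drawn independently in each half (standard deviation 5). Points are treated as continuous, which real league points are not.

```r
# Strength-plus-luck simulation (all parameter values are illustrative assumptions)
set.seed(1)
n  <- 520                                # hypothetical number of team-seasons
S  <- rnorm(n, mean = 26, sd = 7)        # strength: the same in both halves of a season
L1 <- rnorm(n, mean = 0, sd = 5)         # luck in the first half
L2 <- rnorm(n, mean = 0, sd = 5)         # independent luck in the second half
first  <- S + L1                         # first-half points
second <- S + L2                         # second-half points

below <- first < median(first)
# proportion of below-average first halves followed by a better second half
improved <- mean(second[below] > first[below])
# proportion of above-average first halves followed by a worse second half
declined <- mean(second[!below] < first[!below])
c(improved = improved, declined = declined)

# slope of the line of best fit of second-half points on first-half points
slope <- unname(coef(lm(second ~ first))["first"])
slope
```

Both proportions should come out well above 0.5, and the slope well below 1 (under these assumptions its theoretical value is 7² / (7² + 5²), about 0.66), even though nothing in the code pushes teams towards the average: it happens purely because luck doesn’t carry over from one half of the season to the next.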

Finally: what’s the relevance of regression to the mean to sports modelling? Well, it means that we can’t simply rely on historical performance as a predictor of future performance. We need to balance historical performance with average performance to compensate for inevitable regression-to-the-mean effects; all of our models are designed with exactly this feature in mind.
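One simple way of balancing a team’s own record against the average is to shrink it towards the league mean. The sketch below assumes the strength-plus-luck picture, with strength and luck independent and their spreads known; the helper name shrink_predict and all the numbers are hypothetical, and this is not a description of how our models actually work.

```r
# Hypothetical helper: predict second-half points by shrinking first-half points
# towards the league average. Assumes points = strength + luck, with strength and
# luck independent and their standard deviations known.
shrink_predict <- function(first_half, league_mean, sd_strength, sd_luck) {
  # weight given to the team's own record: the share of variation due to strength
  w <- sd_strength^2 / (sd_strength^2 + sd_luck^2)
  league_mean + w * (first_half - league_mean)
}

# a team with 40 first-half points, in a league averaging 26
shrink_predict(40, league_mean = 26, sd_strength = 7, sd_luck = 5)   # about 35.3
# with no luck at all, past performance would be a perfect guide
shrink_predict(40, league_mean = 26, sd_strength = 7, sd_luck = 0)   # 40
```

Under these assumptions the weight w is exactly the slope of the regression line of second-half on first-half points, so the more a result depends on luck, the more strongly it gets pulled back towards the average.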

Adam.Weinrich@smartodds.co.uk wrote and asked for a discussion of xG. I’m so happy about this suggestion that I’m actually going to do two posts on the topic. In this post we’ll look at the xG for a single shot on goal; in a subsequent post we’ll discuss the xG for a passage of play and for an entire game.

xG stands for expected goals, and it’s famous enough now that it’s used almost routinely on Match of the Day. But what is it, why is it used, how is it calculated and is it all it’s cracked up to be?

It’s well understood these days that, when trying to assess how well a team has performed in a game, it’s better to go beyond the final result and look at the match in greater statistical detail, because goals themselves are so rare.

For example, this screenshot shows the main statistics for the recent game between Milan and Genoa, as provided by Flashscore. Milan won 2-1, but it’s clear from the data here that they also dominated the game in terms of possession and goal attempts. So, on the basis of this information alone, the result seems fair.

Actually, Milan’s winner came in injury time, and if they hadn’t got that goal, again on the basis of the above statistics, you’d probably argue that they would have been unlucky not to have won. In that case the data given here in terms of shots and possession would have given a fairer impression of the way the match played out than just the final result.

But even these statistics can be misleading: maybe most of Milan’s goal attempts were difficult, and unlikely to lead to goals, whereas Genoa’s fewer attempts were absolute sitters that they would score 9 times out of 10. If that were the case, you might conclude instead that Genoa were unlucky to lose. xG – or expected goals – is an attempt to take into account not just the number of chances a team creates, but also the difficulty of those chances.

The xG for a single attempt at goal is an estimate of the probability that, given the circumstances of a shot – the position of the ball, whether the shot is kicked or a header, whether the shot follows a dribble or not, and other relevant information – it is converted into a goal.

This short video from OPTA gives a pretty simple summary.

So how is xG calculated in practice? Let’s take a simple example. Suppose a player is 5 metres away from goal with an open net. Looking back through a database of many games, we might find (say) 1000 events of an almost identical type, and on 850 of those occasions a goal was scored. In that case the xG would be estimated as 850/1000 = 0.85. But breaking things down further, it might be that 900 of the 1000 events were kicked shots, while 100 were headers; and the number of goals scored respectively from these events were 800 and 50. We’d then calculate the xG for this event as 800/900 = 0.89 for a kicked shot, but 50/100 = 0.5 for a header.
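The counting calculation in this example is easy to reproduce. A minimal R sketch, using the hypothetical counts from the text:

```r
# Hypothetical counts from the example: 1000 near-identical chances from 5 metres
chances <- data.frame(
  type  = c("kick", "header"),
  shots = c(900, 100),
  goals = c(800, 50),
  stringsAsFactors = FALSE
)

# xG ignoring shot type: total goals / total shots
xg_all <- sum(chances$goals) / sum(chances$shots)
xg_all                                   # 0.85

# xG separately for each shot type
chances$xG <- chances$goals / chances$shots
round(chances$xG, 2)                     # 0.89 0.50
```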

But there are some complications. First, there are unlikely to be many events in the database corresponding to exactly the same situation (5 metres away with an open goal). Second, we might want to take other factors into account: scoring rates from the same position are likely to be different in different competitions, for example, or the scoring rate might depend on whether the shot follows a dribble by the same player. This means that simple calculations of the type described above aren’t feasible. Instead, a variation of standard regression – logistic regression – is used. This sounds fancy, but it’s really just a statistical algorithm for finding the best formula to convert the available variables (ball position, shot type etc.) into the probability of a goal.
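Here’s a minimal sketch of that idea using R’s built-in glm function with a binomial family. Since no real shot data are available here, the shots are simulated from an assumed “true” relationship in which the log-odds of scoring fall with distance and are lower for headers; the coefficient values are made up. Real xG models use more variables (angle, crosses, dribbles, competition and so on) and, of course, real data.

```r
set.seed(42)
n <- 5000
distance <- runif(n, min = 2, max = 30)            # metres from goal (hypothetical)
header   <- rbinom(n, size = 1, prob = 0.2)        # 1 = header, 0 = kicked shot
true_logit <- 2 - 0.15 * distance - 1.0 * header   # assumed "true" model, for illustration only
goal <- rbinom(n, size = 1, prob = plogis(true_logit))
shots <- data.frame(goal, distance, header)

# logistic regression: log-odds of a goal as a linear function of the shot features
xg_model <- glm(goal ~ distance + header, family = binomial, data = shots)
coef(xg_model)

# xG for two new chances from 5 metres: a kicked shot and a header
new_shots <- data.frame(distance = c(5, 5), header = c(0, 1))
xg_new <- predict(xg_model, newdata = new_shots, type = "response")
xg_new
```

The fitted coefficients should be close to the assumed values (2, -0.15 and -1 on the log-odds scale), and the predicted xG for the header should be lower than for the kicked shot from the same spot. The formula the fit produces is then what converts any new shot’s features into an xG value.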

So in the end, xG is calculated via a formula that takes a bunch of information at the time of a shot – ball position, type of shot etc. etc. – and converts it into a probability that the shot results in a goal. You can see what xG looks like using this simple app.

Actually, there are 2 alternative versions of xG here, that you can switch between in the first dialog box. For both versions, the xG will vary according to whether the shot is a kick or a header. For the second version the xG also depends on whether the shot is assisted by a cross, or preceded by a dribble: you select these options with the remaining dialog boxes. In either case, with the options selected, clicking on the graphic of the pitch will return the value of xG according to the chosen model. Naturally, as you get closer to the goal and with a more favourable angle the xG increases.

One point to note about xG is that there is no allowance for actual players or teams. In the OPTA version there is a factor that distinguishes between competitions – presumably since players are generally better at converting chances in some competitions than others – but the calculation of xG is identical for all players and teams in a competition. Loosely speaking, xG is the probability that a shot leads to a goal when taken by an average player who finds themselves in that position in that competition. So the true scoring probability for a particular shot, which is never calculated, might be higher if it’s a top striker from one of the best teams, but lower if it’s a defender who happened to stray into that position. In exactly the same way, there is no allowance in the calculation of xG for the quality of the opposition: xG averages over all players, both in attack and defence.

It follows from all this discussion that there’s a subtle difference between xG and the simpler statistics of the kind provided by Flashscore. In the latter case, as with goals scored, the statistics are pure counts of different event types. Apart from definitions of what is a ‘shot on goal’, for example, two different observers would provide exactly the same data. xG is different: two different observers are likely to agree on the status of an event – a shot on an open goal from the corner of the goal area, for example – but they may disagree on the probability of such an event generating a goal. Even the two versions in the simple app above gave different values of xG, and OPTA would give a different value again. So xG is a radically different type of statistic; it relies on a statistical model for converting situational data into probabilities of goals being scored, and different providers may use different models.

We’ll save discussion of the calculation of xG for a whole match, or for an individual player over a whole match, for a subsequent post. But let me leave you with this article from the BBC. The first part is a summary of what I’ve written here; maybe it’s even a better summary than mine. The second part touches on issues that I’ll discuss in that later post. But halfway down there’s a quiz in which five separate actions are shown and you’re invited to guess the value of xG for each. See if you can beat my score of 2/5.

Incidentally, why do we use the term ‘expected goals’ if xG is a probability? Well, let’s consider the simpler experiment of tossing a coin. Assuming it’s a fair coin, the probability of getting a head is 0.5. In (say) 1000 tosses of the coin, on average I’d get 500 heads. That’s 0.5 heads per toss, so as well as being the probability of a head, 0.5 is also the number of heads we expect to get (on average) when we toss a single coin. xH if you like. And the same argument would work for a biased coin that has probability 0.6 of coming up heads: xH = 0.6. Exactly the same argument works for goals: if xG is the probability of a certain type of shot becoming a goal, it’s also the number of goals we’d expect, per event, from events of that type.
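A quick simulation makes the “probability equals expected count per event” point concrete. In this R sketch the biased coin has probability 0.6 of a head, as in the text, while the xG of 0.3 for a type of shot is a hypothetical value; with many repetitions, the average number of heads (or goals) per event settles close to the probability.

```r
set.seed(7)
n <- 100000
# a biased coin with P(head) = 0.6: 1 = head, 0 = tail
heads <- rbinom(n, size = 1, prob = 0.6)
x_heads <- mean(heads)     # average number of heads per toss: close to 0.6
# shots of a type with (hypothetical) xG = 0.3: 1 = goal, 0 = no goal
goals <- rbinom(n, size = 1, prob = 0.3)
x_goals <- mean(goals)     # average number of goals per shot: close to 0.3
c(xH = x_heads, xG = x_goals)
```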

And finally… if there are any other statistical topics that you’d like me to discuss in this blog, whether related to sports or not, please do write and let me know.

I’ve mentioned in previous posts that an analysis of the detailed statistics from a game can provide a deeper understanding of team performance than just the final result. This point of view is increasingly shared and understood in the football world, but there are some areas of resistance. Here’s Mourinho after Man United’s 3-1 defeat by Man City yesterday:

The way people who don’t understand football analyse it with stats. I don’t go for stats. I go for what I felt in the game and it was there until minute 80-something. I consider the performance of my team one with mistakes. It is different from a bad performance.

And here are the stats that he doesn’t go for:

Of course, there’s a fair point to be made: statistics don’t tell the whole story, and it’s always important, wherever possible, to balance the information that they provide with the kind of information you get from an expert watching a game. Equally though, it has to be a missed opportunity not to take any account of the information that is contained in statistics. Or maybe Mourinho is such a total expert that statistics are completely irrelevant compared to his ‘feel for the game’.

In an earlier post I discussed how the use of detailed in-play statistics was becoming much more important for sports modelling, and we looked at a video made at OPTA where they discuss how the data from a single event in a match is converted into a database entry. In that video there was reference to another video showing OPTA ‘behind-the-scenes’ on a typical match day. You can now see that video below.

Again, this video is a little old now, and chances are that OPTA now use fully genuine copies of Windows (see the video at 2:07), but I thought it might again be of interest to see the process by which some of our data are collected. In future posts we might discuss the nature of some of the data that they collect.

One way of trying to improve sports models is to adapt them to include extra information. In football, for example, rather than just using goals from past fixtures, you might try to include more detailed information about how those fixtures played out.

It’s a little old now – 2013 – but I recently came across the video below. As you probably know, OPTA is the leading provider of in-match sports data, giving detailed quantitative measures of every event and every player in a match, not just in football, but for many other sports as well.

In this video, Sam from OPTA is discussing the data derived from a single event in a football match: Iniesta’s winner in the 2010 World Cup final. I think it’s interesting because we tend to treat the data as our raw ingredients, but there is a process by which the action in a game is converted into data, and this video gives insights into that actual process.

In future posts we might look at how some of the data collected this way is used in models.

Incidentally, this video was produced by Numberphile, a group of nerds (sorry, maths enthusiasts) who make fun (well, you know, “fun”) YouTube videos on all aspects of maths and numbers, including, occasionally, statistics. Chances are I’ll be digging through their archives to see if there’s anything else I can steal (sorry, borrow) for the blog.

Question: if you watch the video carefully, you will see at one point (2:12, precisely) that event type number 31 is “Picked an orange”. What is that about? Is “picked an orange” a colloquialism for something? Forgive my ignorance, but I simply have no idea, and would be really happy if someone could explain.

If anyone knows the answer or has alternative suggestions I’ll include them here, thanks.

Actually, could it be this? When a match is played in snowy conditions, an orange ball is used to make it more visible. Maybe “picking an orange” refers to the referee’s decision to switch to such a ball.

But while simulation is a bit problematic – though immensely entertaining – in football and other sports, it has a totally different meaning in the context of Statistics, and proves to be an essential part of the statistician’s toolbox.

Here’s how it works: at its heart, a statistical model describes a process in terms of probabilities. Since computers can be tricked into mimicking randomness, in many circumstances they can be used to simulate the process of generating new ‘data’ from the statistical model. These data can then, in turn, be used to learn something about the behaviour of the model itself.

Let’s look at a simple example.

The standard statistical model for a sequence of coin tosses is that each toss of the coin is independent of all others, and that in each toss ‘heads’ or ‘tails’ will each occur with the same probability of 0.5. The code in the following R console will simulate the tossing of 100 coins, print and tabulate the results, and show a simple bar chart of the counts. Just press the ‘Run’ button to activate the code, then toggle between the windows to see the printed and graphical output. Since it’s a simulation, you’re likely to get different results if you repeat the exercise. (Just as, if you really tossed 100 coins, you’d probably get different results if you did it again.)

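# runs(): simulate nrep sequences of ntoss coin tosses, each with P(Heads) = p_head,
# and draw a histogram of the longest run of heads or tails in each sequence
runs <- function(ntoss, nrep, p_head = 0.5) {
  tosses <- sample(c('Heads', 'Tails'), size = nrep * ntoss, replace = TRUE,
                   prob = c(p_head, 1 - p_head))
  # one row per repetition, one column per toss
  tosses <- matrix(tosses, nrow = nrep)
  # rle() splits a sequence into runs; keep the longest run in each row
  l <- apply(tosses, 1, function(x) max(rle(x)$lengths))
  hist(l, breaks = (min(l) - 0.5):(max(l) + 0.5), col = "lightblue",
       main = "Maximum Run Length", xlab = "Length")
  # also return the run lengths (invisibly), so they can be tabulated
  invisible(l)
}
# specify number of tosses
ntoss <- 100
# do the simulation
tosses <- sample(c('Heads', 'Tails'), size = ntoss, replace = TRUE)
# show the simulated coins
print(tosses)
# tabulate results
tab <- table(tosses)
# show table of results
print(tab)
# draw barplot of results
barplot(tab, col = "lightblue", ylab = "Count")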

That’s perhaps not very interesting, since it’s kind of obvious that we’d expect a near 50/50 split of heads and tails each time we repeat the experiment. But suppose instead we’re interested in runs of heads or tails, and in particular, the longest run of heads or tails in the sequence of 100 tosses. Some of you may remember we did something like this as an experiment at an offsite some years ago. This is relevant to Smartodds since, if we make a sequence of 50/50 bets, looking at the longest run of heads or tails is equivalent to looking at the longest run of winning or losing bets. Anyway, the mathematics needed to calculate the probability that (say) the longest run of heads or tails is 10 is not completely straightforward. But we can simulate the tossing of 100 coins many times and see how often the longest run is 10. And if we simulate often enough, we can get a good estimate of the probability. So, let’s try tossing a coin 100 times, find the longest run of heads or tails, and repeat that exercise 10,000 times. I’ve already written the code for that. You just have to toggle to the R console window and write
runs(100, 10000)

followed by ‘return’. You should get a picture like this

Yours will be slightly different because your simulated tosses will be different from mine, but since we are both simulating many times (10,000 repetitions) the overall picture should be very similar. Anyway, on this basis, I got a run of 10 heads or tails around 400 times (though I could have tabulated the results to get the number more precisely). Since this was achieved in 10,000 simulations, it follows that the probability of a maximum sequence of 10 heads or tails is around 400/10000 = 0.04 or 4%.
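Reading counts off a histogram is only approximate. The following standalone R sketch repeats the same experiment in its own code (rather than calling runs, so that it can be run anywhere), tabulates the longest runs directly, and attaches a Monte Carlo standard error to the estimated probability. The seed is an arbitrary choice, so your numbers will differ slightly.

```r
set.seed(2018)
ntoss <- 100
nrep  <- 10000
# longest run of heads or tails in each of nrep sequences of ntoss fair tosses
longest <- replicate(nrep, max(rle(sample(c("H", "T"), ntoss, replace = TRUE))$lengths))

# tabulate rather than reading counts off the histogram
table(longest)

# estimated probability that the longest run is exactly 10, with its standard error
p_hat <- mean(longest == 10)
se    <- sqrt(p_hat * (1 - p_hat) / nrep)
c(estimate = p_hat, std_error = se)
```

With 10,000 repetitions and a probability of around 0.04, the standard error is roughly sqrt(0.04 × 0.96 / 10000), about 0.002; since it shrinks like one over the square root of the number of repetitions, quadrupling the repetitions only halves it.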

Some comments:

This illustrates exactly the procedure we adopt for making predictions from some of our models. Not so much deadball models, from which it’s usually easy to get predictions by a simple formula, but our in running models often require us to simulate the goals (or points) in a game, and to repeat the game many times, in order to get the probability of a certain number of goals/points.

You can increase the accuracy of the calculation by increasing the number of repetitions. This can be prohibitive if the simulations are slow, and a compromise usually has to be accepted between speed and accuracy. Try increasing (or decreasing) the number of repetitions in the above example: what effect does it have on the shape of the graph?

The function runs is actually slightly more general than the above example illustrates. If, for example, you write runs(100, 10000, 0.6), this repeats the above experiment but where the probability of getting a head on any toss of the coin is 0.6. This isn’t too realistic for coin tossing, but would be relevant for sequences of bets, each of which has a 0.6 chance of winning. How do you think the graph will change in this case? Try it and see.

The calculation of the probability of the longest runs in sequences of coin tosses can actually be done analytically, so the use of simulation here is strictly unnecessary. This isn’t the case for many models – including our in running model. In such cases simulation is the only viable method of calculating predictions.

Simulation has important statistical applications other than calculating predictions. Future posts might touch on some of these.
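For the coin-tossing example, the exact probability can be computed with a short recursion that keeps track of the type and length of the run the sequence currently ends in, which gives a useful check on the simulation. A sketch in R; the function name p_longest_at_most is hypothetical, and this dynamic-programming approach is just one way of doing the exact calculation.

```r
# Exact P(longest run of heads or tails <= k) in n tosses (n >= 1) with P(head) = p,
# by dynamic programming over the type and length of the run the sequence ends in
p_longest_at_most <- function(n, k, p = 0.5) {
  if (k < 1) return(0)
  # h_run[j] (t_run[j]): probability that no run so far exceeds k and the
  # sequence currently ends in a run of exactly j heads (tails)
  h_run <- numeric(k)
  t_run <- numeric(k)
  h_run[1] <- p
  t_run[1] <- 1 - p
  if (n > 1) {
    for (i in 2:n) {
      new_h <- numeric(k)
      new_t <- numeric(k)
      new_h[1] <- p * sum(t_run)            # a head ends a run of tails
      new_t[1] <- (1 - p) * sum(h_run)      # a tail ends a run of heads
      if (k > 1) {
        new_h[2:k] <- p * h_run[1:(k - 1)]          # a head extends a run of heads
        new_t[2:k] <- (1 - p) * t_run[1:(k - 1)]    # a tail extends a run of tails
      }
      h_run <- new_h
      t_run <- new_t
    }
  }
  sum(h_run) + sum(t_run)
}

# exact probability that the longest run in 100 fair tosses is exactly 10
p_exact_10 <- p_longest_at_most(100, 10) - p_longest_at_most(100, 9)
p_exact_10
```

Runs that would grow beyond k simply drop out of the recursion, so what remains at the end is the probability that no run ever exceeded k. Comparing p_exact_10 with the roughly 4% obtained by simulation above shows how close 10,000 repetitions get; setting p = 0.6 gives the exact answer for the biased-coin version.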

If you had any difficulty using the R console in this post, either because my explanations weren’t very good or because the technology failed, please email me to let me know. As I explained at our recent offsite, I’ve set up a page which explains further the use of the R consoles in this blog, and provides an empty console that you can experiment with. But please do write to me if anything’s unclear or if you’d like extra help or explanations.

Ever wondered what league tables would look like if each team’s overall performance was measured not just by their total points scored, but also by the quality of the opposition they had faced? Well, wonder no more. David Firth and colleagues at Warwick University produce a weekly table that does precisely that. Here’s the table for the English Premier League after 8 rounds of games (8 October).

The left-hand side of the figure shows the true table; the right-hand side shows an adjusted table with a corrected number of points (and therefore position) for each team based on the quality of the opposition they have already faced. The adjusted points per team, listed in the table here as Pts|8, is the actual points from the true table, plus the value of shed, which takes a large positive value if teams have faced tough opposition so far, but is large and negative for teams that have played relatively weak opposition. The green and red arrows show teams who’ve gone up or down respectively when using the alternative method of calculating points.

Some headline conclusions based on the current table are:

Liverpool leapfrog to the top in this scheme of things, since the model judges that their initial set of fixtures has been more arduous than those of either Man City or Chelsea;

West Ham have the biggest shed value of 4.7, implying they have had the hardest set of fixtures so far among all the Premier League teams. Once this value is added to their actual points, their position changes from 15th to 11th.

Burnley and Crystal Palace have positions in the true table that flatter them. They’ve had relatively easy fixtures compared to other teams, and accounting for this factor means they drop by 4 and 3 places respectively.

It’s not exactly rocket science, but the mathematics required to calculate the adjusted tables is fairly sophisticated compared with the simple arithmetic that leads to the standard tables. If you’re interested there’s a summary here. Though different, our own models for football and other sports are built on similar principles, so that team ratings are affected not just by previous results, but also by the quality of the opposition those results were achieved against. Future posts might discuss this in further detail.
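Neither the Alt-3 method nor our own models are spelt out here, so the following is only a hedged illustration of the general principle, using a different and much simpler technique: least-squares (“Massey”) ratings fitted to goal differences. The four teams and six results are entirely hypothetical, and there is no home-advantage term. Each team gets a rating such that the difference between two teams’ ratings predicts their goal difference, so a result against strong opposition counts for more than the same result against weak opposition.

```r
# Hypothetical results: home team, away team, goal difference (home minus away)
results <- data.frame(
  home = c("A", "B", "C", "D", "A", "C"),
  away = c("B", "C", "D", "A", "C", "B"),
  gd   = c( 2,   1,   0,  -1,   3,  -2),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(results$home, results$away)))

# design matrix: +1 for the home team, -1 for the away team, one row per match
X <- matrix(0, nrow = nrow(results), ncol = length(teams), dimnames = list(NULL, teams))
X[cbind(seq_len(nrow(results)), match(results$home, teams))] <- 1
X[cbind(seq_len(nrow(results)), match(results$away, teams))] <- -1

# ratings are only defined up to a constant: fix the last team at 0,
# fit by least squares, then centre the ratings so they average zero
fit <- lm.fit(X[, -length(teams), drop = FALSE], results$gd)
ratings <- c(fit$coefficients, 0)
names(ratings) <- teams
ratings <- ratings - mean(ratings)
sort(ratings, decreasing = TRUE)
```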

As well as providing weekly tables, Alt-3 also provide a blog which discusses the results and highlights the differences between the standard and the revised tables.