Revel in the amazement

In an earlier post I included the following table:

As I explained, one of the columns contains the genuine land areas of each country, while the other is fake. And I asked you which is which.

The answer is that the first column is genuine and the second is fake. But without a good knowledge of geography, how could you possibly come to that conclusion?

Well, here’s a remarkable thing. Suppose we take just the leading digit of each of the values. Column 1 would give 6, 2, 2, 1,… for the first few countries, while column 2 would give 7, 9, 3, 3,… It turns out that for many naturally occurring phenomena, you’d expect the leading digit to be 1 on around 30% of occasions. So if the actual proportion is a long way from that value, then it’s likely that the data have been manufactured or manipulated.

Looking at column 1 in the table, 5 out of the 20 countries have a land area with leading digit 1; that’s 25%. In column 2, none do; that’s 0%. Even 25% is a little on the low side, but close enough to be consistent with 30% once you allow for discrepancies due to random variations in small samples. But 0% is pretty implausible. Consequently, column 1 is consistent with the 30% rule, while column 2 is not, and we’d conclude – correctly – that column 2 is faking it.
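
To put rough numbers on "consistent" and "implausible", here is a minimal sketch in Python. It assumes, purely for illustration, that the 20 leading digits behave like independent draws with a 30.1% chance of being 1, so that the count of leading 1s is binomial. The function names are made up for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

P_ONE = 0.301  # Benford probability of a leading digit of 1
N = 20         # number of countries in the table

# Column 2: no leading 1s at all among the 20 values.
p_none = binom_pmf(0, N, P_ONE)
# Column 1: five or fewer leading 1s among the 20 values.
p_five_or_fewer = binom_cdf(5, N, P_ONE)

print(f"P(no leading 1s in {N} values)         = {p_none:.5f}")
print(f"P(5 or fewer leading 1s in {N} values) = {p_five_or_fewer:.3f}")
```

Under that assumption, seeing 5 or fewer leading 1s is entirely unremarkable, whereas seeing none at all happens less than once in a thousand tables.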

But where does this 30% rule come from? You might have reasoned that each of the digits 1 to 9 was equally likely – assuming we drop leading zeros – and so the percentage would be around 11% for a leading digit of 1, just as it would be for any of the other digits. Yet that reasoning turns out to be misplaced, and the true value is around 30%.

This phenomenon is a special case of something called Benford’s law, named after the physicist Frank Benford who first formalised it. (Though it had also been noted much earlier by the astronomer Simon Newcomb.) Benford’s law states that for many naturally occurring datasets, the probability that the leading digit of a data item is 1 is about 30.1%. Actually, Benford’s law goes further than that, and gives the percentage of times you’d get a 2 or a 3 or any of the digits 1-9 as the leading digit. These percentages are shown in the following table.

Leading Digit 1 2 3 4 5 6 7 8 9
Frequency 30.1% 17.6% 12.5% 9.7% 7.9% 6.7% 5.8% 5.1% 4.6%

For those of you who care about such things, these percentages are log(2/1), log(3/2), log(4/3) and so on up to log(10/9), where log here is logarithm with respect to base 10.
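
For anyone who wants to check the table, the percentages can be reproduced in a couple of lines of Python. This is just the base-10 logarithm formula quoted above; the function name is an arbitrary choice.

```python
from math import log10

def benford_first_digit(d):
    """Benford probability that the leading digit equals d (1 to 9)."""
    return log10((d + 1) / d)

probs = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"{d}: {100 * p:.1f}%")

# The nine probabilities telescope to log10(10) = 1.
print("total:", round(sum(probs.values()), 10))
```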

But does Benford’s law hold up in practice? Well, not always, as I’ll discuss below. But often it does. For example, I took a dataset giving the altitudes of a large set of football stadiums around the world. I discarded a few whose altitude is below sea level, but was still left with over 13,000 records. I then extracted the leading digit of each of the altitudes (in metres) and plotted a histogram of these values. This is just a plot of the percentages of occasions each value occurred. These are the blue bars in the following diagram. I then superimposed the predicted proportions from Benford’s law. These are the black dots.


The agreement between the observed percentages and those predicted by Benford’s law is remarkable. In particular, the observed percentage of leading digits equal to 1 is almost exactly what Benford’s law would imply. I promise I haven’t cheated with the numbers.

As further examples, there are many series of mathematically generated numbers for which Benford’s law holds exactly.

These include:

  • The Fibonacci series: 1, 1, 2, 3, 5, 8, 13, …. where each number is obtained by summing the 2 previous numbers in the series.
  • The integer powers of two: 1, 2, 4, 8, 16, 32, …..
  • The iterative series obtained by starting with any number and successively multiplying by 3. For example, starting with 7, we get: 7, 21, 63, 189,….

In each of these infinite series, the proportion of terms with leading digit 1 tends to exactly log(2), or about 30.1%; the proportion with leading digit 2 tends to exactly log(3/2), or about 17.6%; and so on.
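
These claims are easy to test numerically. The sketch below counts leading digits over the first 3,000 terms of each of the three series listed above. The cut-off of 3,000 is an arbitrary choice for illustration, and with a finite number of terms the agreement with Benford's law is close rather than exact.

```python
from collections import Counter
from math import log10

def leading_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def digit_frequencies(values):
    """Proportion of values having each leading digit 1 to 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

def fibonacci(count):
    """First `count` Fibonacci numbers: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b

def geometric(start, ratio, count):
    """First `count` terms of start, start*ratio, start*ratio**2, ..."""
    value = start
    for _ in range(count):
        yield value
        value *= ratio

N_TERMS = 3000  # arbitrary cut-off for this illustration
series = {
    "Fibonacci": fibonacci(N_TERMS),
    "Powers of 2": geometric(1, 2, N_TERMS),
    "7 times powers of 3": geometric(7, 3, N_TERMS),
}
results = {name: digit_frequencies(values) for name, values in series.items()}

for name, freqs in results.items():
    print(f"{name}: leading 1 in {100 * freqs[1]:.1f}% of terms "
          f"(Benford: {100 * log10(2):.1f}%)")
```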

And there are many other published examples of data fitting Benford’s law (here, here, here… and so on.)

Ok, at this point you should pause to revel in the amazement of this stuff. Sometimes mathematics, Statistics and probability come together in a way to explain naturally occurring phenomena that is so surprising and shockingly elegant it takes your breath away.

So, when does Benford’s law work? And why?

It turns out there are various ways of explaining Benford’s law, but none of them – at least as far as I can tell – is entirely satisfactory. All of them require a leap of faith somewhere to match the theory to real-life. This view is similarly expressed in an academic article, which concludes:

… there is currently no unified approach that simultaneously explains (Benford’s law’s) appearance in dynamical systems, number theory, statistics, and real-world data.

Despite this, the various arguments used to explain Benford’s law do give some insight into why it might arise naturally in different contexts:

  1. If there is a law of this type, Benford’s law is the only one that works for all choices of scale. The decimal representation of numbers is entirely arbitrary, presumably deriving from the fact that humans, generally, have 10 fingers. But if we’d been born with 8 fingers, or chosen to represent numbers anyway in binary, or base 17, or something else, you’d expect a universal law to be equally valid, and not dependent on the arbitrary choice of counting system. If this is so, then it turns out that Benford’s law, adapted in the obvious way to the choice of scale, is the only one that could possibly hold. An informal argument as to why this should be so can be found here.
  2. If the logarithm of the variable under study has a distribution that is smooth and roughly symmetric – like the bell-shaped normal curve, for example – and is also reasonably well spread out, it’s easy to show that Benford’s law should hold approximately. Technically, for those of you who are interested, if X is the thing we’re measuring, and if log X has something like a normal distribution with a variance that’s not too small, then Benford’s law is a good approximation for the behaviour of X. A fairly readable development of the argument is given here. (Incidentally, I stole the land area of countries example directly from this reference.)

But in the first case, there’s no explanation as to why there should be a universal law, and indeed many phenomena – both theoretical and in nature – don’t follow Benford’s law. And in the second case, except for special situations where the normal distribution has some kind of theoretical justification as an approximation, there’s no particular reason why the logarithm of the observations should behave in the required way. And yet, in very many cases – like the land area of countries or the altitude of football stadiums – the law can be shown empirically to be a very good approximation to the truth.

One thing which does emerge from these theoretical explanations is a better understanding of when Benford’s law is likely to apply and when it’s not. In particular, the argument only works when the logarithm of the variable under study is reasonably well spread out. What that means in practice is that the variable itself needs to cover several orders of magnitude: tens, hundreds, thousands etc. This works fine for something like the stadium altitudes, which vary from close to sea-level up to around 4,000 metres, but wouldn’t work for total goals in football matches, which are almost always in the range 0 to 10, for example.
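
The importance of spread can be seen in a small simulation. This is only a sketch under assumed settings: both samples are lognormal, so the logarithm is exactly normal, and the two standard deviations (3.0 and 0.1 on the natural-log scale) are arbitrary values chosen to represent "well spread out" and "tightly concentrated".

```python
import random
from math import log10

def leading_digit(x):
    """First significant digit of a positive number."""
    # Scientific notation puts the first significant digit up front.
    return int(f"{x:.15e}"[0])

def share_leading_one(values):
    """Proportion of values whose leading digit is 1."""
    return sum(leading_digit(v) == 1 for v in values) / len(values)

rng = random.Random(1)  # fixed seed so the run is repeatable
N = 100_000

# Wide: values range over many orders of magnitude.
wide = [rng.lognormvariate(0.0, 3.0) for _ in range(N)]
# Narrow: almost every value lies between about 2 and 4.
narrow = [rng.lognormvariate(1.0, 0.1) for _ in range(N)]

share_wide = share_leading_one(wide)
share_narrow = share_leading_one(narrow)

print(f"Benford prediction:   {100 * log10(2):.1f}%")
print(f"Wide spread sample:   {100 * share_wide:.1f}%")
print(f"Narrow spread sample: {100 * share_narrow:.1f}%")
```

The widely spread sample should land close to 30%, while the narrow one has almost no leading 1s at all, simply because hardly any of its values fall between 1 and 2.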

So, there are different ways of theoretically justifying Benford’s law, and empirically it seems to be very accurate for different datasets which cover orders of magnitude. But does it have any practical uses? Well, yes: applications of Benford’s law have been made in many different fields, including…

Finally, there’s also a version of Benford’s law for the second digit, third digit and so on. There’s an explanation of this extension in the Wikipedia link that I gave above. It’s probably not easy to guess exactly what the law might be in these cases, but you might try and guess how the broad pattern of the law changes as you move from the first to the second and to further digits.
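
Spoiler alert: skip this if you'd rather guess first. The standard form of the second-digit law adds up the Benford probabilities of every two-digit start (10 to 99) that has the required second digit. A minimal sketch, with an arbitrary function name:

```python
from math import log10

def benford_second_digit(d):
    """Benford probability that the second digit equals d (0 to 9)."""
    # Sum over every possible first digit k the probability that a
    # number starts with the two digits k followed by d.
    return sum(log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

second = {d: benford_second_digit(d) for d in range(10)}

for d, p in second.items():
    print(f"{d}: {100 * p:.1f}%")
```

The pattern is still tilted towards small digits, but far more gently than for the first digit.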

Thanks to those of you who wrote to me after I made the original post. I don’t think it was easy to guess what the solution was, and indeed if I was guessing myself, I think I’d have been looking for a uniformity in the distribution of the digits, which turns out to be completely incorrect, at least for the leading digit. Even though I’ve now researched the answer myself, and made some sense of it, I still find it rather shocking that the law works so well for an arbitrary dataset like the stadium altitudes. Like I say: revel in the amazement.

Statty night

Apologies for the terrible pun in the title.

When I used to teach Statistics I tried to emphasise to students that Statistics is as much an art as a science. Statisticians are generally trying to make sense of some aspect of the world, and they usually have just some noisy data with which to try to do it. Sure, there are algorithms and computer packages they can chuck data into and get simple answers out of. But usually those answers are meaningless unless the algorithm/package is properly tailored to the needs of the specific problem. And there are no rules as to how that is best done: it needs a good understanding of the problem itself, an awareness of the data that are available and the creative skill to be able to mesh those things with appropriate statistical tools. And these are skills that are closer to the mindset of an artist than of a scientist.

But anyway… I recently came across the following picture which turns the tables, and uses Statistics to make art. (Or to destroy art, depending on your point of view). You probably recognise the picture at the head of this post as Van Gogh’s Starry Night, which is displayed at MOMA in New York.

By contrast, the picture below is a statistical reinterpretation of the original version of Starry Night, created by photographer Mario Klingemann through a combination of data visualisation and statistical summarisation techniques.

The Starry Night Pie Packed

As you can see, the original painting has been replaced by a collage of coloured circles, which are roughly the same colour as the original painting. But in closer detail, the circles have an interesting structure. Each is actually a pie chart whose slices, in size and colour, correspond to the proportions of colours in that region of the original picture.

Yes, pointless, but kind of fun nonetheless. You can find more examples of Klingemann’s statistically distorted classical artworks here.

In similar vein… the diagram below, produced by artist Arthur Buxton, is actually a quiz. Each of the pie charts represents the proportions of the main colours in one of Van Gogh’s paintings. In other words, these pie charts represent the colour distributions over a whole Van Gogh painting, rather than just a small region of a picture, as in the painting above. The quiz is to identify which Van Gogh painting each of the pie charts refers to.

You can find a short description of Arthur Buxton’s process in developing this picture here.

There’s just a small snag: I haven’t been able to locate the answers. My guess is that the pie chart in column 2 of row 2 corresponds to Starry Night. And the one immediately to the left of that is from the Sunflower series. But that’s pretty much exhausted my knowledge of the works of Van Gogh. Let me know if you can identify any of the others and I’ll add them to a list below.


On top of the world

I’ll be honest, usually I try to find a picture that fits in with the statistical message I’m trying to convey. But occasionally I see a picture and then look for a statistical angle to justify its inclusion in the blog. This is one of those occasions. I don’t know what your mental image of the top of Everest is like, but until now mine wasn’t something that resembled the queue for the showers at Glastonbury.

Anyway, you might have read that this congestion to reach the summit of Everest is becoming increasingly dangerous. In the best of circumstances the conditions are difficult, but climbers are now faced with a wait of several hours at very high altitude with often unpredictable weather. And this has contributed to a spate of recent deaths.

But what’s the statistical angle? Well, suppose you wanted to make the climb yourself. What precautions would you take? Obviously you’d get prepared physically and make sure you had the right equipment. But beyond that, it turns out that a statistical analysis of relevant data, as the following video shows, can both improve your chances of reaching the summit and minimise your chances of dying while doing so.

This video was made by Dr Melanie Windridge, and is one of a series she made under the project title “Summiting the Science of Everest”. Her aim was to explore the various scientific aspects associated with a climb of Everest, which she undertook in Spring 2018. And one of these aspects, as set out in the video, is the role of data analysis in planning. The various things to be learned from the data include:

  1. Climbing from the south Nepal side is less risky than from the north Tibet side. This is explained by the steeper summit on the south side making descent quicker in case of emergency.
  2. Men and women are equally successful at completing summits of Everest. And they also have similar death rates.
  3. Age is a big factor: over forties are less likely to make the summit; over sixties have a much higher death rate.
  4. Most deaths occur in the icefall regions of the mountain.
  5. Many deaths occur during descent.
  6. Avalanches are a common cause of death. Though they are largely unpredictable, they are less frequent in Spring. Moreover, walking through the icefall regions early in the morning also reduces avalanche risk.
  7. The distribution of summit times for climbers who survive is centred around 9 a.m., whereas for those who subsequently die during the descent it’s around 2 p.m. In other words, it’s safest to aim to arrive at the summit relatively early in the morning.

Obviously, climbing Everest will never be risk free – the death rate of people making the summit is, by some counts, around 6.5%. But intelligent use of available data can help minimise the risks. Statistics, in this context, really can be a matter of life or death.

Having said that, although Dr Melanie seemed reassured that the rate of deaths of climbers is decreasing, here’s a graphical representation of the data showing that the actual number of deaths – as opposed to the rate of deaths – is generally increasing with occasional spikes.

Looking on the bright side of things though, Everest is a relatively safe mountain to climb: the death rate for climbers on Annapurna, also in the Himalayas, is around 33%!

In light of all this, if you prefer your climbs to the top of the world to be risk free, you might try scaling the Google face (though I recommend turning the sound off first):

While for less than the price of a couple of beers you can get a full-on VR experience as previewed below:

Finally, if you’re really interested in the statistics of climbing Everest, there’s a complete database of all attempted climbs available here.

Faking it


Take a look at the following table:



It shows the total land area, in square kilometres, for various countries. Actually, it’s the first part of a longer alphabetical list of all countries and includes two columns of figures, each purporting to be the corresponding area of each country. But one of these columns contains the real areas and the other one is fake. Which is which?

Clearly, if your knowledge of geography is good enough that you know the land area of Belgium – or any of the other countries in the table – or whether Bahrain is bigger than Barbados, then you will know the answer. You could also cheat and check with Google. But you can answer the question, and be almost certain of being correct, without cheating and without knowing anything about geography. Indeed, I could have removed the first column giving the country names, and even not told you that the data correspond to land areas, and you should still have been able to tell me which column is real and which is fake.

So, which column is faking it? And how do you know?

I’ll write a follow-up post giving the answer and explanation sometime soon. Meantime, if you’d like to write to me giving your own version, I’d be happy to hear from you.


Freddy’s story: part 2

In a previous post I discussed a problem that Freddy had written to me about. The problem was a simplified version of an issue sent to him by a friend, connected with a genetic algorithm for optimisation. Simply stated: you start with £100. You toss a coin and if it comes up tails you lose 25% of your current money, otherwise you gain 25%. You play this game over and over, always increasing or decreasing your current money by 25% on the basis of a coin toss. The issue is how much money you expect to have, on average, after 1000 rounds of this game.

As I explained in the original post, Freddy’s intuition was that the average should stay the same at each round. So even after 1000 (or more) rounds, you’d have an average of £100. But when Freddy simulated the process, he always got an amount close to £0, and so concluded his intuition must be wrong.

A couple of you wrote to give your own interpretations of this apparent conflict, and I’m really grateful for your participation. As it turns out, Freddy’s intuition was spot on, and his argument was pretty much a perfect mathematical proof. Let me make the argument just a little bit more precise.

Suppose after n rounds the amount of money you have is M. Then after n+1 rounds you will have (3/4)M if you get a Tail and (5/4)M if you get a Head. Since each of these outcomes is equally probable, the average amount of money after n+1 rounds is

\frac{ (3/4)M + (5/4)M}{2}= M

In other words, exactly as Freddy had suggested, the average amount of money doesn’t change from one round to the next. And since I started with £100, this will be the average amount of money after 1 round, 2 rounds and all the way through to 1000 rounds.

But if Freddy’s intuition was correct, the simulations must have been wrong.

Well, no. I checked Freddy’s code – a world first! – and it was perfect. Moreover, my own implementation displayed the same features as Freddy’s, as shown in the previous post: every simulation has the amount of money decreasing to zero long before 1000 rounds have been completed.

So what explains this contradiction between what we can prove theoretically and what we see in practice?

The following picture shows histograms of the money remaining after a certain number of rounds for each of 100,000 simulations. In the previous post I showed the individual graphs of just 16 simulations of the game; here we’re looking at a summary of 100,000 simulated games.

For example, after 2 rounds, there are only 3 possible outcomes: £56.25, £93.75 and £156.25. You might like to check why that should be so. Of these, £93.75 occurred most often in the simulations, while the other two occurred more or less equally often. You might also like to think why that should be so. Anyway, looking at the values, it seems plausible that the average is around £100, and indeed the actual average from the simulations is very close to that value. Not exact, because of random variation, but very close indeed.

After 5 rounds there are more possible outcomes, but you can still easily convince yourself that the average is £100, which it is. But once we get to 10 rounds, it starts to get more difficult. There’s a tendency for most of the simulated runs to give a value that’s less than £100, but then there are relatively few observations that are quite a bit bigger than £100. Indeed, you can just about see that there is at least one value close to £1000 or so. What’s happening is that the simulated values are becoming much more asymmetric as the number of rounds increases. Most of the results will end up below £100 – though still positive, of course – but a few will end up being much bigger than £100. And the average remains at £100, exactly as the theory says it must.

After 100 rounds, things are becoming much more extreme. Most of the simulated results end up close to zero, but one simulation (in this case) gave a value of around £300,000. And again, once the values are averaged, the answer is very close to £100.

But how does this explain what we saw in the previous post? All of the simulations I showed, and all of those that Freddy looked at, and those his friend obtained, showed the amount of money left being essentially zero after 1000 rounds. Well, the histogram of results after 1000 rounds is a much, much more extreme case of the one shown above for 100 rounds. Almost all of the probability is very, very close to zero. But there’s a very small amount of probability spread out up to an extremely large value indeed, such that the overall average remains £100. So almost every time I do a simulation of the game, the amount of money I have is very, very close to zero. But very, very, very occasionally, I would simulate a game whose result was a huge amount of money, such that it would balance out all of those almost-zero results and give me an answer close to £100. But, such an event is so rare, it might take billions of billions of simulations to get it. And we certainly didn’t get it in the 16 simulated games that I showed in the previous post.

So, there is no contradiction at all between the theory and the simulations. It’s simply that when the number of rounds is very large, the very large results which could occur after 1000 rounds, and which ensure that the average balances out to £100, occur with such low probability that we are unlikely to simulate enough games to see them. We therefore see only the much more frequent games with low winnings, and calculate an average which underestimates the true value of £100.
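
The same picture can be obtained without any simulation at all, since the final amount depends only on how many heads turn up, and that number is binomial. The sketch below is one way of doing the sums exactly, using Python's fractions to avoid rounding error. The function names are invented for this illustration, and the "median" shortcut assumes an even number of rounds.

```python
from fractions import Fraction
from math import comb

START = 100
UP, DOWN = Fraction(5, 4), Fraction(3, 4)  # gain or lose 25%

def outcome(n, heads):
    """Money after n rounds of which `heads` were wins."""
    return START * UP**heads * DOWN ** (n - heads)

def exact_mean(n):
    """Average money after n rounds, over all 2**n toss sequences."""
    total = sum(comb(n, h) * outcome(n, h) for h in range(n + 1))
    return total / 2**n

def median_outcome(n):
    """Money in the middle-of-the-road game with n/2 heads (n even)."""
    return outcome(n, n // 2)

def prob_at_least_start(n):
    """Probability of finishing with at least the starting amount."""
    ways = sum(comb(n, h) for h in range(n + 1) if outcome(n, h) >= START)
    return Fraction(ways, 2**n)

for n in (2, 10, 100, 1000):
    print(f"{n:>4} rounds: mean = {float(exact_mean(n)):.2f}, "
          f"median = {float(median_outcome(n)):.3g}, "
          f"P(at least 100) = {float(prob_at_least_start(n)):.3g}")
```

The mean comes out at exactly £100 for every number of rounds, while the typical game, and the chance of even getting your money back, both shrink towards zero as the rounds accumulate.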

There are a number of messages to be drawn from this story:

  1. Statistical problems often arise in the most surprising places.
  2. The strategy of problem simplification, solution through intuition, and verification through experimental results is a very useful one.
  3. Simulation is a great way to test models and hypotheses, but it has to be done with extreme care.
  4. And if there’s disagreement between your intuition and experimental results, it doesn’t necessarily imply either is wrong. It may be that the experimental process has complicated features that make results unreliable, even with a large number of simulations.

Thanks again to Freddy for the original problem and the discussions it led to.

To be really precise, there’s a bit of sleight-of-hand in the mathematical argument above. After the first round my expected – rather than actual – amount of money is £100. What I showed above is that the average money I have after any round is equal to the actual amount of money I have at the start of that round. But that’s not quite the same thing as showing it’s equal to the average amount of money I have at the start of the round.

But there’s a famous result in probability – sometimes called the law of iterated expectations – which lets me replace this actual amount at the start of the second round with the average amount, and the result stays the same. You can skip this if you’re not interested, but let me show you how it works.

At the start of the first round I have £100.

Because of the rules of the game, at the end of this round I’ll have either £75 or £125, each with probability 1/2.

In the first case, after the second round, I’ll end up with either £56.25 or £93.75, each with probability 1/2. And the average of these is £75.

In the second case, after the second round, I’ll end up with either £93.75 or £156.25, each with probability 1/2. And the average of these is £125.

And if I average these averages I get £100. This is the law of iterated expectations at work. I’d get exactly the same answer if I averaged the four possible 2-round outcomes: £56.25, £93.75 (twice) and £156.25.

\frac{56.25 + 93.75 + 93.75 + 156.25}{4} = 100

So, my average after the second round is equal to the average after the first which was equal to the initial £100.

The same argument also applies at any round: the average is equal to the average of the previous round. Which in turn was equal to the average of the previous round. And so on, telescoping all the way back to the initial value of £100.
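
For anyone who prefers to check this sort of thing by brute force, here is a small sketch that lists all four two-round games and compares the "average of averages" with the plain average. It uses the game's rules as stated (heads gains 25%, tails loses 25%); the variable names are just for illustration.

```python
from itertools import product
from statistics import mean

START = 100.0
FACTORS = {"H": 1.25, "T": 0.75}  # heads gains 25%, tails loses 25%

def final_amount(tosses):
    """Money left after applying a sequence of tosses to the start."""
    money = START
    for toss in tosses:
        money *= FACTORS[toss]
    return money

# The four equally likely two-round games: HH, HT, TH, TT.
outcomes = {"".join(seq): final_amount(seq) for seq in product("HT", repeat=2)}

# Average within each first-round branch...
after_head = mean(v for k, v in outcomes.items() if k[0] == "H")
after_tail = mean(v for k, v in outcomes.items() if k[0] == "T")
# ...then average the two branch averages.
average_of_averages = mean([after_head, after_tail])
# Compare with the straightforward average over all four games.
overall_average = mean(outcomes.values())

print(outcomes)
print(after_head, after_tail, average_of_averages, overall_average)
```

Changing `repeat=2` to a larger number of rounds gives the same overall average of £100, although the number of games to list doubles with every extra round.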

So, despite the sleight-of-hand, the result is actually true, and this is precisely what Freddy had hypothesised. As explained above, his only ‘mistake’ was to observe that a small number of simulations suggested a quite different behaviour, and to assume that this meant his mathematical reasoning was wrong.


Midrange is dead

Kirk Goldsberry is the author of a new book on data analytics for the NBA. I haven’t read the book, but some of the graphical illustrations he’s used for its publicity are great examples of the way data visualization techniques can give insights about the evolution of a sport in terms of the way it is played.


Press the start button in the graphic of the above tweet. I’m not sure exactly how the graphic and the data are mapped, but essentially the coloured hexagons show regions of the basketball court which are the most frequent locations for taking shots. The animation shows how this pattern has changed over the seasons.

As you probably know, most goals in basketball – excluding penalty shots – are awarded 2 points. But a shot that’s scored from outside a distance of 7.24m from the basket – the almost semi-circular outer-zone shown in the figure – scores 3 points. So, there are two ways to improve the number of points you are likely to score when shooting: first, you can get closer to the basket, so that the shot is easier; or second, you can shoot from outside the three-point line, so increasing the number of points obtained when you do score. That means there’s a zone in-between, where the shot is still relatively difficult because of the distance from the basket, but for which you only get 2 points when you do score. And what the animation above clearly shows is an increasing tendency over the seasons for players to avoid shooting from this zone. This is perhaps partly because of a greater understanding of the trade-off between difficulty and distance, and perhaps also because improved training techniques have led to a greater competency in 3-point shots.
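
The trade-off is just an expected-value calculation: the average points from a shot is the points on offer multiplied by the chance of scoring. The sketch below uses made-up success rates purely for illustration; they are not taken from the book or from NBA data.

```python
def expected_points(points, success_prob):
    """Average points per attempt for a shot worth `points`."""
    return points * success_prob

def break_even_three_point_prob(two_point_prob):
    """Three-point success rate needed to match a given two-point shot."""
    return 2 * two_point_prob / 3

# Hypothetical success rates, for illustration only.
shots = {
    "close to the basket (2 pts)": (2, 0.60),
    "mid-range (2 pts)": (2, 0.40),
    "outside the line (3 pts)": (3, 0.35),
}

for name, (points, prob) in shots.items():
    print(f"{name}: {expected_points(points, prob):.2f} points per shot")

print(f"Three-point rate needed to match the mid-range shot: "
      f"{100 * break_even_three_point_prob(0.40):.1f}%")
```

With these illustrative rates, the mid-range shot is the worst of the three options, and a three-point shooter only needs to score on a little over a quarter of attempts to do as well.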

Evidence to support this reasoning is the following data heatmap diagram which shows the average number of points scored from shots taken at different locations on the court. The closer to red, the higher the average score per shot.

Again the picture makes things very clear: average points scored are highest when shooting from very close to the basket, or from outside of the 3-point line. Elsewhere the average is low. It’s circumstantial evidence, but since this map of points scored has patterns so similar to the current map of where players are shooting from, there’s a strong suggestion that players have evolved their play style in order to shoot at the basket from positions which they know are likely to generate the most points.

In summary, creative use of both static and animated graphical data representations provides great insights about the way basketball play has evolved, and why that evolution is likely to have occurred, given the 3-point shooting rule.

Thanks to for posting something along these lines on RocketChat.

Animal experiments

Ever thought your cat might be trolling you? Turns out you’re right. As explained in this New Scientist article, recent Japanese research concludes that cats are entirely capable of recognising their names; they just choose not to when it suits them.

The full details of the experiment are included in a research report published in Nature. It’s an interesting, though not entirely easy, read. But I’d like to use it to point out an aspect of statistical methodology that is often ignored: statistical analyses don’t usually start with the analysis of data; they start with the design of the experiment by which the data are to be collected. And it’s essential that an experiment is designed correctly in order to be able to use Statistics to answer the question you’re interested in.

So, in this particular study, the researchers carried out four separate experiments:

  • In experiment 1, the ability of cats to distinguish their own names from other similar nouns was tested;
  • In experiment 2, cats living with numerous other cats were tested to see if they could distinguish their own name from that of other cats in the same household;
  • Experiment 3 was like experiment 1, but using cats from a ‘cat cafe‘ (don’t ask) rather than a normal household;
  • Experiment 4 was also like experiment 1, but using a voice other than the cat’s owner to trigger the responses.

Through this sequence of experiments, the researchers were able to judge whether or not the cats genuinely recognise and respond to their own names in a variety of environments, and to exclude the possibility that the responses were due to factors other than actual name recognition. As such, this is a great example of how the design of an experiment has been carefully tailored to ensure that a statistical analysis of the data it generates is able to answer the question of interest.

I won’t go into details, but there are many other aspects of the experimental design that also required careful specification:

  1. The number of cats to be included in the study;
  2. The choice of words to use as alternative stimuli to the cats’ names, and the order in which they are used;
  3. The definitions of actions that are considered positive responses to stimuli;
  4. The protocol for determining whether a cat has responded positively to a stimulus or not;

amongst others. Full details are available in the Nature article, as indeed are the data, should you wish to analyse them yourself.

In the context of sports modelling, these kinds of issues are less explicit, since analyses are usually retrospective, using data that have already been historically collected and stored. Nonetheless, the selection of which data to include in an analysis can affect the analysis, and it’s important to ensure that results are not sensitive to specific, subjective choices. However, for analyses of data that include a decision process – such as betting strategies – it may well be relevant to formulate an experimental design for a prospective study, comparing results based on one type of strategy with those based on another. We’ll discuss strategies for this type of experiment in a future post.


Can’t buy me love

Ok, money can’t buy you love, but can it buy you the Premier League title? We’ll look at that below, but first this recent Guardian article notes the following Premier League statistics:

Between 2003 and 2006 there were just 3 instances of a team having more than 70% of possession in a game. Two seasons ago there were 37, last season 63 and this season 67.

In other words, by even the simplest of statistical measures, Premier League games are becoming increasingly one-sided, at least in terms of possession. And the implication in the Guardian article is that money is the driving factor behind this imbalance. But is that really the case?

This graph shows final league position of the 20 Premier League teams plotted against their wealth in terms of start-of-season squad market value (taken from here).

To make things slightly clearer, the following diagram shows the same thing, but with a smooth curve (in blue) added on top, estimated using standard statistical techniques, which shows the overall trend in the data.

Roughly speaking, teams above the blue line have performed better than their financial resources would have suggested; those below have performed worse.

Bear in mind this is just one season’s data. Also, success breeds success, and money breeds money, so the differential between teams in terms of wealth as a season progresses is likely to increase further. For these reasons and others, not too much should be read into the slight wobbles in the blue curve. Nonetheless, a number of general features emerge:

  1. It’s a very noisy picture for teams with less than £250 m. Arguably, at that level, there’s no very obvious pattern between wealth and final position: there’s a bunch of teams with between £100 m and £250 m, and their league position within this group of teams isn’t obviously dependent on their wealth. As such, teams in this category are unlikely to get out of the bottom half of the table, and their success within the bottom half is more likely to depend on how well they’ve spent their money than on how much they actually have. And on luck.
  2. Teams with between £250 m and £500 m are likely to force their way out of the ‘relegation-battle pack’, but not into the top 6 elite.
  3. The cost of success at the top end is high: the blue curve at the top end is quite flat, so you have to spend a lot to improve your position. But money, as long as there’s enough of it, counts a lot for elite clubs, and the evidence is that the teams who are prepared to spend the most are likely to improve their league position.
  4. A couple of clubs stand out as having performed very differently to what might be expected: Manchester United have considerably under-performed, while Wolves have substantially over-performed.

The trials and tribulations of Manchester United are well documented. Chances are they just need a change of manager. <Joke>. But Wolves is a much more interesting case, which takes us back to the Guardian article I referred to. As discussed above, this article is more about the way money is shaping how games are played than about the success it brings, with matches between the rich and poor teams increasingly becoming challenges of the attack of one side against the defence of the other. But Wolves have adapted to such imbalances, playing long periods without possession, and attacking with speed and precision when they do have the ball. The template for this type of play was Leicester City in their title-winning season, but even though it was just a few seasons ago, the financial imbalances were far smaller than now.

It seems then, that to a very large extent, a team’s performance in the Premier League is likely to be determined by its wealth. Good management can compensate for this, just as bad management can lead to relatively poor performance. But even where teams are punching above their weight, they are having to do so by adapting their gameplay, so that matches are still dominated in terms of possession by the wealthier sides. As the Guardian article concludes:

Money guides everything. There have always been rich clubs, of course, but they have never been this rich, and the financial imbalances have never had such an impact on how the game is played.


Freddy’s story: part 1

This is a great story with a puzzle and an apparent contradiction at the heart of it, that you might like to think about yourself.

A couple of weeks ago Freddy wrote to me to say that he’d been looking at the recent post which discussed a probability puzzle based on coin tossing, and had come across something similar that he thought might be useful for the blog. Actually, the problem Freddy described was based on an algorithm for optimisation using genetic mutation techniques, that a friend had contacted him about.

To solve the problem, Freddy did four smart things:

  1. He first simplified the problem to make it easier to tackle, while still maintaining its core elements;
  2. He used intuition to predict what the solution would be;
  3. He supported his intuition with mathematical formalism;
  4. He did some simulations to verify that his intuition and mathematical reasoning were correct.

This is exactly how a statistician would approach both this problem and problems of greater complexity.

However… the pattern of results Freddy observed in the simulations contradicted what his intuition and mathematics had suggested would happen, and so he adjusted his beliefs accordingly. And then he wrote to me.

This is the version of the problem that Freddy had simplified from the original…

Suppose you start with a certain amount of money. For argument’s sake, let’s say it’s £100. You then play several rounds of a game. At each round the rules are as follows:

  1. You toss a fair coin (Heads and Tails each have probability 1/2).
  2. If the coin shows Heads, you lose a quarter of your current amount of money and end up with 3/4 of what you had at the start of the round.
  3. If the coin shows Tails, you win a quarter of your current amount of money and end up with 5/4 of what you had at the start of the round.

For example, suppose your first 3 tosses of the coin are Heads, Tails, Heads. The money you hold goes from £100 to £75, then to £93.75, and then to £70.3125.
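The arithmetic in this example can be checked with a couple of lines of R. This is just a sketch: the variable names are my own, and the only inputs are the two multipliers from the rules above.

```r
# Multipliers from the rules: Heads leaves 3/4 of your money, Tails leaves 5/4
multiplier <- c(Heads = 0.75, Tails = 1.25)

tosses <- c("Heads", "Tails", "Heads")

# Money held at the start, and then after each of the three tosses
money <- unname(100 * cumprod(c(1, multiplier[tosses])))
print(money)
```

The four values printed are the starting amount followed by the amounts after each toss, and they agree with the figures in the example.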

Now, suppose you play this game for a large number of rounds. Again, for argument’s sake, let’s say it’s 1000 rounds. How much money do you expect to have, on average, at the end of these 1000 rounds?

Have a think about this game yourself, and see what your own intuition suggests before scrolling down.


Freddy’s reasoning was as follows. In each round of the game I will lose or gain 25% of my current amount of money with equal probability. So, if I currently have £100, then at the end of the next round I will have either £75 or £125 with equal probability. And the average is still £100. This reasoning is true at each round of the game. And so, after any number of rounds, including 1000, I’d expect to have exactly the same amount of money as when I started: £100.
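Written out a little more formally, the argument looks like this. The notation is my own rather than Freddy’s: M_n stands for the money held after round n, with M_0 = 100.

```latex
\mathbb{E}[M_{n+1} \mid M_n]
  = \tfrac{1}{2}\cdot\tfrac{3}{4}\,M_n + \tfrac{1}{2}\cdot\tfrac{5}{4}\,M_n
  = M_n ,
\qquad\text{so}\qquad
\mathbb{E}[M_{1000}] = \mathbb{E}[M_{999}] = \dots = M_0 = 100 .
```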

But when Freddy simulated the process, he found a different sort of behaviour. In each of his simulations, the money held after 1000 rounds was very close to zero, suggesting that the average is much smaller than £100.

I’ve taken the liberty of doing some simulations myself: the pattern of results in 16 repeats of the game, each time up to 1000 rounds, is shown in the following figure.

Each panel of the figure corresponds to a repeat of the game, and in each repeat I’ve plotted a red trace showing how much money I hold after each round of the game. In each case you can see that I start with £100, there’s then a bit of oscillation – more in some of the realisations than in others, due to random variation – but in all cases the amount of money I have hits something very close to zero somewhere before 250 rounds and then stays there right up to 1000 rounds.

So, there is indeed a conflict between Freddy’s intuition and the picture that these simulations provide.

What’s going on?

I’ll leave you to think about it for a while, and write with my own explanation and discussion of the problem in a future post. If you’d like to write to me to explain what you think is happening, I’d be very happy to hear from you.

Obviously, I’m especially grateful to Freddy for having sent me the problem in the first place, and for agreeing to let me write a post about it.

Update: if you’d like to run the simulation exercise yourself, just click the ‘run’ button in the following window. This will simulate the game for 1000 rounds, starting with £100. The graph will show you how much money you hold after each round of the game, while if you toggle to the console window it will tell you how much money you have after the 1000th round (to the nearest £0.01). This may not work in all browsers, but seems to work ok in Chrome. You can repeat the experiment simply by clicking ‘Run’ again. You’re likely to get a different graph each time because of the randomness in the simulations. But what about the final amount? Does that also change? And what does it suggest about Freddy’s reasoning that the average amount should stay equal to £100?

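    library(ggplot2)

    # Simulate the coin-toss game and plot the money held after each round.
    game_sim <- function(n_rounds = 1000, money_start = 100) {
      # Each round multiplies the current money by 3/4 (Heads) or 5/4 (Tails)
      factors <- sample(c(0.75, 1.25), n_rounds, replace = TRUE)
      # First element is the starting amount (round 0)
      money <- money_start * cumprod(c(1, factors))
      m <- data.frame(round = 0:n_rounds, money = money)
      cat("Money in pounds after", n_rounds, "rounds is",
          round(money[n_rounds + 1], 2), "\n")
      ggplot(m, aes(x = round, y = money)) +
        geom_line(color = "red") +
        ggtitle("Money")
    }

    game_sim()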
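If you’d rather look at the final amount over several repeats without clicking ‘Run’ each time, something along the following lines would do it. This is only a sketch in base R: the function name final_amount and its arguments are my own labels rather than part of the window above, and it simply replays the same rules without drawing the graph.

```r
# Play the game once and return only the money held after the last round.
# final_amount() and its argument names are illustrative labels.
final_amount <- function(n_rounds = 1000, money_start = 100) {
  # Each round multiplies the money by 3/4 (Heads) or 5/4 (Tails), equally likely
  factors <- sample(c(0.75, 1.25), n_rounds, replace = TRUE)
  money_start * prod(factors)
}

set.seed(2019)  # arbitrary seed, only so that the run is repeatable
finals <- replicate(16, final_amount())
print(signif(finals, 3))
```

Changing or removing the seed gives a fresh set of 16 repeats each time.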

Nul Points

No doubt you’re already well aware of, and eagerly anticipating, this year’s Eurovision song contest final to be held in Tel Aviv between the 14th and 18th May. But just in case you don’t know, the Eurovision song contest is an annual competition to choose the ‘best’ song entered by the various participating European countries. And Australia!

Quite possibly the world would never have heard of Abba if they hadn’t won Eurovision. Nor Conchita Wurst.

The voting rules have changed over the years, but the structure has remained pretty much the same. Judges from each participating country rank their favourite 10 songs – excluding that of their own country, which they cannot vote for – and points are awarded on the basis of preference. In the current scheme, the first choice gets 12 points, the second choice 10 points, the third choice 8 points, and then one point fewer for each subsequent choice, down to the tenth choice, which gets a single point.

A country’s total score is the sum of the points awarded by each of the other countries, and the country with the highest score wins the competition. In most years the scoring system has made it possible for a song to receive zero points – nul points – as a total, and there’s a kind of anti-roll-of-honour dedicated to countries that have accomplished this feat. Special congratulations to Austria and Norway who, despite their deep contemporary musical roots, have each scored nul points on four occasions.
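The scoring rule is simple enough to sketch in a few lines of R. The function name tally_points, the twelve countries labelled C01 to C12 and their randomly generated ballots are all made up for illustration; only the points scale comes from the scheme just described.

```r
# Points for 1st to 10th choice under the scheme described above
eurovision_points <- c(12, 10, 8, 7, 6, 5, 4, 3, 2, 1)

# ballots: named list; each name is a voting country and each element is
# that country's ten favourite songs (identified by country), best first.
tally_points <- function(ballots, countries) {
  totals <- setNames(numeric(length(countries)), countries)
  for (voter in names(ballots)) {
    ranking <- ballots[[voter]]
    # Ten distinct songs must be ranked, and never the voter's own
    stopifnot(length(ranking) == 10, !(voter %in% ranking),
              !anyDuplicated(ranking), all(ranking %in% countries))
    totals[ranking] <- totals[ranking] + eurovision_points
  }
  sort(totals, decreasing = TRUE)
}

# Hypothetical example: 12 made-up countries, each ranking ten of the others at random
set.seed(1)
countries <- sprintf("C%02d", 1:12)
ballots <- lapply(setNames(countries, countries),
                  function(v) sample(setdiff(countries, v), 10))
totals <- tally_points(ballots, countries)
print(totals)
```

Each ballot hands out 58 points in all, so the totals always sum to 58 times the number of voting countries, and a song that appears on nobody’s top 10 ends up with nul points.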

Anyway, here’s the thing. The UK gave the world The Beatles, The Rolling Stones, Pink Floyd, Led Zeppelin, David Bowie, Joy Division and Radiohead. And Adele. Yet it hasn’t done very well in recent years in the Eurovision Song Contest. It’s true that by 1997 the UK had won the competition a respectable 5 times – admittedly with a bit of gratuitous sexism involving the removal of women’s clothing to distract judges from the paucity of the music. But since then, nothing. Indeed, since 2000 the UK has finished in last place on 3 occasions, and has only twice been in the top 10.

Now, there are two possible explanations for this.

  1. Our songs have been terrible. (Well, even more terrible than the others).
  2. There’s a stitch-up in the voting process, with countries penalising the UK for reasons that have nothing to do with the quality of the songs.

But how can we objectively distinguish between these two possibilities? The poor results for the UK will be the same in either case, so we can’t use the UK’s data alone to unravel things.

Well, one way is to hypothesise a system by which votes are cast that is independent of song quality, and to see if the data support that hypothesis. One such hypothesis is a kind of ‘bloc’ voting system, where countries tend to award higher votes for countries of a similar geographical or political background to their own.

This article carries out an informal statistical analysis of exactly this type. Though the explanations in the article are sketchy, a summary of the results is given in the following figure. Rather than pre-defining the blocs, the authors use the data on voting patterns themselves to identify 3 blocs of countries whose voting patterns are similar. They are colour-coded in the figure, which shows (in some vague, undefined sense) the tendency for countries on the left to favour countries on the right in voting. Broadly speaking there’s a northern Europe group in blue, which includes the UK, an ex-Yugoslavian bloc in green and a rest-of-Europe bloc in red. But whereas the fair-minded north Europeans tend to spread their votes evenly across all countries, the other two blocs tend to give highest votes to other member countries within the same bloc.
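To give a flavour of how blocs can be recovered from voting data alone, here is a minimal sketch using hierarchical clustering. To be clear, this is not the method used in the article, which isn’t described in enough detail to reproduce. The nine countries, the three built-in blocs and the points matrix are all simulated purely for illustration.

```r
set.seed(42)  # arbitrary seed so the simulated data are repeatable

# Nine made-up countries in three made-up blocs (A, B, C)
countries <- paste0(rep(c("A", "B", "C"), each = 3), 1:3)
true_bloc <- rep(c("A", "B", "C"), each = 3)

# Simulated average points given by each row country to each column country:
# about 10 within a bloc, about 2 otherwise, plus a little noise
same_bloc <- outer(true_bloc, true_bloc, "==")
avg_points <- ifelse(same_bloc, 10, 2) + rnorm(81, sd = 0.5)
dimnames(avg_points) <- list(giver = countries, receiver = countries)
diag(avg_points) <- NA  # no country can vote for itself

# Cluster countries by the similarity of their voting profiles (rows);
# dist() leaves out the missing diagonal entries pair by pair
tree <- hclust(dist(avg_points), method = "average")
found_bloc <- cutree(tree, k = 3)
print(table(true_bloc, found_bloc))
```

With real data you would replace avg_points by a matrix of average points exchanged between actual countries, and the number of blocs would be a choice to justify rather than something known in advance.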

But does this mean the votes are based on non-musical criteria? Well, not necessarily. It’s quite likely that cultural differences – including musical ones – are also smaller within geographically homogeneous blocs than across them. In other words, Romania and Moldova might vote for each other at a much higher than average rate, but this could just as easily be because they have similar musical roots and tastes as because they are friends scratching each other’s backs.

Another study of geo-political bloc voting is described in this Telegraph article, which reaches similar findings, but concludes:

Comforting as it might be to blame bloc voting for the UK’s endless poor record, it’s not the only reason we don’t do well.

In other words, in a more detailed analysis which models performance after allowing for bloc-voting effects, the UK is still doing badly.

This whole issue has also been studied in much greater detail in the academic literature using complex statistical models, and the conclusions are similar, though the authors report language and cultural similarities as being more important than geographical factors.

The techniques used in these various studies are actually extremely important in other areas of application. In genetic studies, for example, they are used to identify groups of markers for certain disease types. And even in sports modelling they can be relevant for identifying teams or players that have similar styles of play.

But if Eurovision floats your boat, you can carry out your own analysis of the data based on the complete database of results available here.

Update: Thanks to for pointing me to this. So not only did the UK finish last this year, they also had their points score reduced retrospectively. If ever you needed evidence of an anti-UK conspiracy… 😉