Can’t buy me love

OK, money can’t buy you love, but can it buy you the Premier League title? We’ll look at that below, but first, this recent Guardian article notes the following Premier League statistics:

Between 2003 and 2006 there were just 3 instances of a team having more than 70% of possession in a game. Two seasons ago there were 37, last season 63 and this season 67.

In other words, by even the simplest of statistical measures, Premier League games are becoming increasingly one-sided, at least in terms of possession. And the implication in the Guardian article is that money is the driving factor behind this imbalance. But is that really the case?

This graph shows final league position of the 20 Premier League teams plotted against their wealth in terms of start-of-season squad market value (taken from here).

To make things slightly clearer, the following diagram shows the same thing, but with a smooth curve (in blue) added on top, estimated using standard statistical techniques, which shows the overall trend in the data.
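
For anyone curious what such a smooth curve involves, here is a minimal sketch of one possible smoother. Which technique produced the blue curve isn't specified above, so the kernel smoother here is my own choice for illustration, and the squad values and positions in the code are made up rather than the real data. The function name `kernel_smooth` and the `bandwidth` setting are likewise just assumptions.

```python
import math


def kernel_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a Gaussian-weighted average of ys.

    Points whose x value is close to x0 get large weights, distant points
    get small ones, so the result follows the local trend in the data.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Made-up squad values (in £m) and final positions, for illustration only.
squad_value = [100, 150, 200, 250, 400, 600, 800, 1000]
position = [18, 12, 16, 14, 8, 5, 3, 1]

# Evaluate the smooth trend at each observed squad value.
trend = [kernel_smooth(squad_value, position, x, bandwidth=150) for x in squad_value]

for value, fitted in zip(squad_value, trend):
    print(f"£{value} m -> smoothed position {fitted:.1f}")
```

A larger `bandwidth` gives a flatter, smoother curve; a smaller one follows the individual points more closely, wobbles included.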

Roughly speaking, teams above the blue line have performed better than their financial resources would have suggested; those below have performed worse.

Bear in mind this is just one season’s data. Also, success breeds success, and money breeds money, so the differential between teams in terms of wealth as a season progresses is likely to increase further. For these reasons and others, not too much should be read into the slight wobbles in the blue curve. Nonetheless, a number of general features emerge:

  1. It’s a very noisy picture for teams with less than £250 m. Arguably, at that level, there’s no very obvious pattern between wealth and final position: there’s a bunch of teams with between £100 m and £250 m, and their league position within this group of teams isn’t obviously dependent on their wealth. As such, teams in this category are unlikely to get out of the bottom half of the table, and their success within the bottom half is more likely to depend on how well they’ve spent their money than on how much they actually have. And on luck.
  2. Teams with between £250 m and £500 m are likely to force their way out of the ‘relegation-battle pack’, but not into the top 6 elite.
  3. The cost of success at the top end is high: the blue curve at the top end is quite flat, so you have to spend a lot to improve your position. But money, as long as there’s enough of it, counts a lot for elite clubs, and the evidence is that the teams who are prepared to spend the most are likely to improve their league position.
  4. A couple of clubs stand out as having performed very differently to what might be expected: Manchester United have considerably under-performed, while Wolves have substantially over-performed.

The trials and tribulations of Manchester United are well documented. Chances are they just need a change of manager. <Joke>. But Wolves are a much more interesting case, which takes us back to the Guardian article I referred to. As discussed above, this article is more about how money is shaping the way games are played than about the success it brings, with matches between rich and poor teams increasingly becoming contests between the attack of one side and the defence of the other. But Wolves have adapted to such imbalances, playing long periods without possession, and attacking with speed and precision when they do have the ball. The template for this type of play was Leicester City in their title-winning season, but even though that was just a few seasons ago, the financial imbalances then were far smaller than they are now.

It seems, then, that to a very large extent, a team’s performance in the Premier League is likely to be determined by its wealth. Good management can mitigate this, just as bad management can lead to relatively poor performance. But even where teams are punching above their weight, they are having to do so by adapting their gameplay, so that matches are still dominated in terms of possession by the wealthier sides. As the Guardian article concludes:

Money guides everything. There have always been rich clubs, of course, but they have never been this rich, and the financial imbalances have never had such an impact on how the game is played.

 

Ernie is dead, long live Ernie

Oh no, this weekend they killed Ernie

Well, actually, not that one. This one…

No, no, no. That one died some time ago. This one…

But don’t worry, here’s Ernie (mark 5)…

Let me explain…

Ernie is an acronym for Electronic Random Number Indicator Equipment, the random number generator used by the government’s National Savings and Investments (NSI) department to select Premium Bond winners each month.

Premium bonds are a form of savings certificate. But instead of paying a fixed or variable interest rate at regular intervals, like most savings accounts, premium bonds are a gamble. Each month a number of bonds from all those in circulation are selected at random and awarded prizes, with values ranging from £25 to £1,000,000. Overall, the annual interest rate is currently around 1.4%, but with this method most bond holders will receive 0%, while a few will win many times more than the actual bond value of £1, up to one million pounds.

So, your initial outlay is safe when you buy a premium bond – you can always cash them in at the price you paid for them – but you are gambling with the interest.

Now, the interesting thing from a statistical point of view is the monthly selection of the winning bonds. Each month there are nearly 3 million winning bonds, most of which win the minimum prize of £25, but 2 of which win the maximum of a million pounds. All these winning bonds have to be selected at random. But how?

As you probably know, the National Lottery is based on a single set of numbers that are randomly generated through the physical mechanism of the mixing and selection of numbered balls. But this method of random number generation is completely impractical for the random selection of several million winning bonds each month. So, a method of statistical simulation is required.

In a previous post we already discussed the idea of simulation in a statistical context. In fact, it turns out to be fairly straightforward to generate mathematically a series of numbers that, to all intents and purposes, look random. I’ll discuss this technique in a future post, but the basic idea is that there are certain formulae which, when used recursively, generate a sequence of numbers that are essentially indistinguishable from a series of random numbers.

But here’s the thing: the numbers are not really random at all. If you know the formula and the current value in the sequence, you can calculate exactly the next value in the sequence. And the next one. And so on.

Strictly, a sequence of numbers generated this way is called ‘pseudo-random’, which is a fancy way of saying ‘pretend-random’. They look random, but they’re not. For most statistical purposes, the difference between a sequence that looks random and one that is genuinely random is unimportant, so this method is used as the basis for simulation procedures. But for the random selection of Premium Bond winners, there are obvious logistical and moral problems in using a sequence of numbers that is actually predictable, even if it looks entirely random.
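
To make the ‘pretend-random’ idea concrete, here is a minimal sketch of one well-known recursive formula, the linear congruential generator. It's my choice of example rather than anything Ernie-related, and the constants are a commonly quoted textbook set, so treat the whole thing as an illustration under those assumptions.

```python
from itertools import islice


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless pseudo-random sequence via x -> (a * x + c) mod m.

    The constants are a widely quoted textbook choice; any generator of
    this type is fully determined by its formula and its current value.
    """
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


# Two runs from the same starting value give exactly the same 'random' numbers.
first_run = list(islice(lcg(seed=2019), 5))
second_run = list(islice(lcg(seed=2019), 5))

print(first_run)
print(first_run == second_run)
```

The numbers printed look haphazard, but anyone who knows the formula and a single value in the sequence can reproduce everything that follows, which is exactly the problem for a prize draw.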

For this reason, Ernie was invented. Ernie is a random number generator. But to ensure the numbers are genuinely random, it incorporates a genuine physical process whose behaviour is entirely random. A mathematical representation of the state of this physical process then leads to the random numbers.

The very first Ernie is shown in the second picture above. It was first used in 1957, was the size of a van and used a neon gas diode to induce the randomness. Though effective, this version of Ernie was fairly slow, generating just 2000 numbers per hour. It was subsequently killed off and replaced with ever more efficient designs over the years.

The third picture above shows Ernie (mark 4), which has been in operation from 2004 up until this weekend. In place of gas diodes, it used thermal noise in transistors to generate the required randomness, which in turn generated the numbers. Clearly, in terms of size, this version was a big improvement on Ernie (mark 1), being about the size of a normal PC. It was also much more efficient, being able to generate one million numbers in an hour.

But Ernie (mark 4) is no more. The final picture above shows Ernie (mark 5), which came into operation this weekend, shown against the tip of a pencil. It’s essentially a microchip. And of course, the evolution of computing equipment from the size of a van to the size of a pencil head over the last 60 years or so is a familiar story. Indeed, Ernie (mark 5) is considerably faster than even Ernie (mark 4), by a factor of 42.5 or so, despite the size reduction. But what really makes the new version of Ernie stand out is that the physical process that induces the randomness has fundamentally changed. One way or another, all the previous versions used thermal noise to generate the randomness; Ernie (mark 5) uses quantum random variation in light signals.
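
To put those speeds in perspective, here is a rough back-of-the-envelope calculation of how long each version would need to generate 3 million numbers, roughly one month’s worth of winning bonds at current levels. The rates are the ones quoted in this post; the assumption that exactly one number is needed per winning bond is mine, and is probably optimistic.

```python
# Rates quoted in the post (numbers per hour); the mark 5 figure applies
# the "factor of 42.5 or so" to the mark 4 rate.
RATES_PER_HOUR = {
    "Ernie (mark 1)": 2_000,
    "Ernie (mark 4)": 1_000_000,
    "Ernie (mark 5)": 1_000_000 * 42.5,
}

N_NUMBERS = 3_000_000  # assumption: one number generated per winning bond

# Time needed = how many numbers are required / how many are produced per hour.
hours_needed = {name: N_NUMBERS / rate for name, rate in RATES_PER_HOUR.items()}

for name, hours in hours_needed.items():
    print(f"{name}: {hours:,.2f} hours ({hours * 60:,.1f} minutes)")
```

On these assumptions the original Ernie would need 1,500 hours, or about two months of continuous running, Ernie (mark 4) would need 3 hours, and Ernie (mark 5) a little over 4 minutes.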

More information on the evolution of Ernie can be found here. A slightly more technical account of the way thermal noise was used to generate randomness in each of the Ernies up to mark 4 is given here. The basis of the quantum technology for Ernie mark 5 is that when a photon is emitted towards a semi-transparent surface, it is either reflected or transmitted at random. Converting these outcomes into 0/1 bit values forms the building block of random number generation.
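
As a rough illustration of how single random bits can be built up into usable numbers, here is a minimal sketch. It is not a description of Ernie’s actual design: the function names `photon_bit` and `random_below` are hypothetical, the operating system’s entropy source stands in for a photon detector, and the rejection step is simply one standard way of avoiding bias.

```python
import secrets


def photon_bit():
    """Stand-in for one photon measurement: 1 = reflected, 0 = transmitted.

    A real device would read a detector; this sketch borrows a bit from
    the operating system's entropy source instead.
    """
    return secrets.randbits(1)


def random_below(n, bit_source=photon_bit):
    """Build a uniform integer in the range 0 to n - 1 from single bits."""
    n_bits = (n - 1).bit_length()  # bits needed to represent n - 1
    while True:
        value = 0
        for _ in range(n_bits):
            value = (value << 1) | bit_source()  # append one bit
        if value < n:  # discard out-of-range values to avoid bias
            return value


# Draw a few numbers between 0 and 9, one decimal digit at a time.
print([random_below(10) for _ in range(8)])
```

Throwing away out-of-range values wastes a few bits, but it keeps every remaining outcome equally likely, which matters more than speed when the numbers decide who wins a million pounds.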

Incidentally, although the randomness in the physical processes built into Ernie should guarantee that the numbers generated are random, checks on the output are carried out by the Government Actuary’s Department to ensure that the output can genuinely be regarded as random. In fact they apply four tests to the sequence:

  1. Frequency: do all digits occur (approximately) equally often?
  2. Serial: do all consecutive number pairs occur (approximately) equally often?
  3. Poker: do poker combinations (4 identical digits; 3 identical digits; two pairs; one pair; all different) occur as often as they should in consecutive numbers?
  4. Correlation: do pairs of digits at different spacings in bond numbers have approximately the correct correlation that would be expected under randomness?
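
To give a flavour of the first two tests, here is a minimal sketch based on the textbook chi-squared statistic. The exact procedures used by the Government Actuary’s Department aren’t described here, so the function names, the use of non-overlapping pairs and the 5% threshold are all my assumptions. The poker and correlation tests are not covered.

```python
import random
from collections import Counter


def frequency_statistic(digits):
    """Chi-squared statistic comparing digit counts with a uniform expectation."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def serial_statistic(digits):
    """The same idea for non-overlapping consecutive pairs (100 possible pairs)."""
    pairs = list(zip(digits[0::2], digits[1::2]))
    counts = Counter(pairs)
    expected = len(pairs) / 100
    return sum(
        (counts.get((a, b), 0) - expected) ** 2 / expected
        for a in range(10)
        for b in range(10)
    )


CRITICAL_9_DF = 16.919  # 5% critical value of chi-squared with 9 degrees of freedom

# A seeded pseudo-random generator stands in for Ernie's output here.
rng = random.Random(1957)
sample = [rng.randrange(10) for _ in range(10_000)]

stat = frequency_statistic(sample)
print(f"frequency statistic = {stat:.2f}, passes at 5%: {stat < CRITICAL_9_DF}")
print(f"serial statistic = {serial_statistic(sample):.2f}")
```

A perfectly balanced sequence gives a statistic of zero, while a sequence consisting of a single repeated digit gives an enormous one. Genuinely random output should land somewhere in between, below the critical value most of the time.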

In the 60 or so years that Premium Bonds have been in circulation, the monthly numbers generated by each of the successive Ernies have never failed to pass these tests.

However:


Finally, in case you’re disappointed that I started this post with a gratuitous reference to Sesame Street which I didn’t follow up on, here’s a link to 10 facts and statistics about Sesame Street.

Worst use of Statistics of the year

You might remember in a couple of earlier posts (here and here) I discussed the Royal Statistical Society’s ‘Statistic of the Year’ competition. I don’t have updates on the results of that competition for 2018 yet, but in the meantime I thought I’d do my own version, but with a twist: the worst use of Statistics in 2018.

To be honest, I only just had the idea to do this, so I haven’t been building up a catalogue of options throughout the year. Rather, I just came across an automatic winner in my Twitter feed this week.

So, before announcing the winner, let’s take a look at the following graph:

This graph is produced by the Office for National Statistics, which is the UK government’s own statistical agency, and shows the change in average weekly wages in the UK, after allowance for inflation effects, for the period 2008-2018. 

There are several salient points that one might draw from this graph:

  1. Following the financial crash in 2008, wages declined steadily over a 6-year period to 2014, when they bottomed out at around 10% lower than pre-crash levels.
  2. The election of a Conservative/Lib Dem coalition government in 2010 didn’t have any immediate impact on the decline of wage levels. Arguably the policy of intense austerity may simply have exacerbated the problem.
  3. Things started to pick up during 2014, most likely due to the effects of Quantitative Easing and other efforts to stimulate the economy by the Bank of England in the period after the crash.
  4. Something sudden happened in 2016 which seems to have choked off the recovery in wage levels. (If only there was a simple explanation for what that might be.)
  5. Wages are currently at the same level as they were 7 years ago in 2011, and significantly lower than they were immediately following the financial crash in 2008.

So that’s my take on things. Possibly there are different interpretations that are equally valid and plausible. I struggle, however, to accept the following interpretation, to which I am awarding the 2018 worst use of Statistics award:

 ONS data showing real wages rising at fastest rate in 10 years… is good news for working Britain

Now, believe me, I’ve looked very hard at the graph to try to find a way in which this statement provides a reasonable interpretation of it, but I simply can’t. You might argue that wages grew at the fastest rate in a decade during 2015, but only because wages had performed so miserably in the preceding years. Any reasonable interpretation of the graph suggests that wages have flatlined since 2016, and it’s simply misleading to suggest that wages are currently rising at the fastest rate in 10 years.

So, my 2018 award for the worst use of Statistics goes to…

… Dominic Raab, who until his recent resignation was the Secretary of State responsible for the United Kingdom’s withdrawal from the European Union (i.e. Brexit) and is a leading contender to replace Theresa May as the next leader of the Conservative Party.

Well done Dominic. Whether due to mendacity or ignorance, you are a truly worthy winner.