R: should you care?

There was a joke in my post last week where Binky the amateur epidemiologist was giving a lesson on the meaning of R. The joke is: we’re all semi-experts now in R. We know it’s the average number an infected person will go on to infect. And we know that it’s important that it stays below 1: bigger than 1 and the epidemic will grow exponentially; smaller and it will fade away.

So, it’s a bit disconcerting that in Friday’s press briefing it was revealed that the current estimate of R in the UK is dangerously close to the value of 1. And this is based on data from infections that will have occurred before the lockdown restrictions were loosened. Should we worry?

Not according to right-wing radio talk show host Julia Hartley-Brewer:

The article by Tom Chivers that Hartley-Brewer quotes is actually pretty interesting, and connects to a phenomenon in Statistics that was discussed in a very early post to this blog in pre-Coronavirus days. I’ll use the numerical example that Tom gives to illustrate things. It’s obviously a simplification of the real world, but it makes the point very effectively.

A particular issue with the Coronavirus epidemic around the world has been its devastation in care homes. Partly this is because it tends to hit older people hardest, and partly it’s because the nature of care homes makes contagion much harder to control. As such, the transmission rate is likely to be higher in care homes compared to the rest of the population.

So, suppose we have 1000 infected people in the wider population and 1000 infected people in care homes. Suppose also that the value of R is 2 in the population, but 3 in care homes. Then, on average, these groups of infected people will infect a further 2000 and 3000 people respectively. So, in total, we have 2000 infected individuals who will infect a further 5000 people and the overall value of R is 5000/2000 = 2.5.

Since this value is dangerously high, lockdown restrictions are introduced, both in care homes and outside. Let’s assume these have the effect of reducing the transmission rate in care homes to R=2.8, while the impact in the wider population is much greater, reducing R to 1.

Some time later it’s found there are 900 infected individuals in care homes and 100 outside. Because of the respective values of R, these individuals will then, on average, infect a further 900 x 2.8 = 2520 individuals in care homes, and 100 x 1 = 100 individuals outside. So, overall, we have 1000 infected individuals who will infect an average of 2620 further individuals and the overall value of R is 2.62.

And here’s the remarkable thing: the value of R has decreased both inside care homes and outside, but the overall value of R has increased.
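You can check the arithmetic of the example for yourself. Here’s a short sketch (my own illustration, using the numbers above) that computes the overall R as a weighted average of the group values:

```python
# Overall R is total new infections divided by total current infections,
# i.e. a weighted average of the group R values. Numbers from the example above.

def overall_R(infected, R):
    """Total onward infections divided by total infected, across all groups."""
    new_infections = sum(n * r for n, r in zip(infected, R))
    return new_infections / sum(infected)

# Before lockdown: 1000 infected in care homes (R=3), 1000 outside (R=2)
before = overall_R([1000, 1000], [3.0, 2.0])

# After lockdown: 900 infected in care homes (R=2.8), 100 outside (R=1)
after = overall_R([900, 100], [2.8, 1.0])

print(round(before, 2), round(after, 2))  # 2.5 2.62: both group values fell, the overall value rose
```

The paradox is entirely down to the weights: after the lockdown the calculation is dominated by the care home group, which still has the higher R.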

This is an example of Simpson’s paradox which, as explained above, was discussed in a sporting context much earlier in this blog. The point is this: although R has gone down in both the separate communities, its value remains much higher in one compared to the other. And because of the relative numbers of infected individuals, the overall calculation of R is dominated by the care home numbers after the lockdown. Previously it was balanced between care home and general population numbers. The effect is for the overall value of R to move closer to the care home value after the lockdown, which is lower than the value before the lockdown, but higher than the original overall value.

So what does this mean in practice? Julia Hartley-Brewer’s interpretation is that although the evidence is that R has increased in the UK population, this might well be a consequence of Simpson’s paradox as above. It’s not, according to her, that social restrictions are ineffective; it’s that they are so effective outside of care homes that calculations of R are now dominated by the behaviour of transmission in care homes, which forces the value to be close to 1. And she boldly concludes:

… we don’t need to return to full lockdown.

But this misses the point completely. Although the overall value of R is less than 1, and its rise may well be due to the effect of Simpson’s paradox along the lines of the numerical example above, this very argument means that it’s likely that the value of R in care homes remains considerably greater than 1. This is extremely dangerous for 2 reasons. First, within care homes, transmission rates remain at levels that imply exponential growth. Left unchecked, this would be devastating for care home residents. Second, it’s impossible in practice to completely isolate care homes from the rest of the population. So, even though R is likely to be less than 1 in the wider community, its contact with another community for which R is greater than 1 is likely to stop the epidemic from simply dying out as would inevitably happen in a closed community with R less than 1.

The conclusion, therefore, is completely the opposite of what Hartley-Brewer implies: the fact that a vulnerable subset of the population is likely to have a value of R greater than 1 adds weight to the arguments for being cautious about weakening lockdown restrictions. Not just for people in care homes, but also for right-wing talk show hosts living on the outside.

In the real world, of course, things are much more complicated than just two sub-populations with different transmission rates. Transmission rates are likely to vary geographically and by many other socio-demographic factors. The models on which policies are being developed allow for these multiple types of behaviour, and are therefore not ‘tricked’ by Simpson’s paradox. Discussions about the value of R are therefore unhelpfully simplistic. It might be the single best measure of the state of an epidemic’s trajectory, but in itself it’s not really sufficient to determine whether the epidemic is under control or not.

It’s official: Brits get drunk more often than anywhere else in the WORLD

A while back the Global Drug Survey (GDS) produced its annual report. Here are some of the newspaper headlines following its publication:

It’s official: Brits get drunk more often than anywhere else in the WORLD. (The Mirror)

Britons get drunk more often than 35 other nations, survey finds. (The Guardian)

Brits are world’s biggest boozers and we get hammered once a week, study says. (The Sun)

And reading some of these articles in detail we find:

  • Of the 31 countries included in the study, Britons get drunk most regularly (51.1 times per year, on average).
  • Britain has the highest rate of cocaine usage (74% of participants in the survey say they have used it at some point).
  • 64% of English participants in the survey claim to have used cocaine in the last year.

Really? On average Brits are getting drunk once a week? And 64% of the population have used cocaine in the last year? 64%!

Prof Adam Winstock, founder of the survey, summarises things thus:

In the UK we don’t tend to do moderation, we end up getting drunk as the point of the evening.

At which point it’s important to take a step back and understand how the GDS works. If you want a snapshot of a population as a whole, you have to sample in such a way that every person in the population is equally likely to be sampled. Or at least ensure by some other mechanism that the sample is truly representative of the population. But the Global Drug Survey is different: it’s an online survey targeted at people whose demographics coincide with people who are more likely to be regular drinkers and/or drug users.

Consequently, it’s safe to conclude that the Brits who chose to take this survey are likely to get drunk more often than people from other countries who also completed the survey. And that 64% of British participants in the survey used cocaine in the last year. But since this sample is neither random nor designed to be representative, it really tells us nothing about the population as a whole. And even comparisons of the respondents across countries should be treated cautiously: perhaps the differences are not due to variations in drink/drug usage but to variations in the composition of the survey respondents across countries.
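A quick simulation shows how big the distortion can be. The numbers here are mine, purely for illustration, not the GDS’s:

```python
# Toy self-selection sketch: heavy drinkers are far more likely to opt in
# to the survey, so the sample rate bears no relation to the population rate.
import random

random.seed(7)

# Population: 10% are heavy drinkers (an invented figure for illustration)
population = [random.random() < 0.10 for _ in range(100_000)]

def responds(heavy):
    """Self-selection: heavy drinkers are much more likely to take the survey."""
    return random.random() < (0.30 if heavy else 0.02)

sample = [heavy for heavy in population if responds(heavy)]

rate_pop = sum(population) / len(population)    # ~0.10 in the population
rate_sample = sum(sample) / len(sample)         # ~0.6 in the self-selected sample
print(round(rate_pop, 2), round(rate_sample, 2))
```

Nothing about the population changed; only who chose to answer did.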

Here’s what the GDS say themselves about this…

Don’t look to GDS for national estimates. GDS is designed to answer comparison questions that are not dependent on probability samples. The GDS database is huge, but its non-probability sample means analyses are best suited to highlight differences among user populations. GDS recruits younger, more experienced drug using populations. We spot emerging drugs trends before they enter into the general population.

In other words, by design the survey samples people who are more likely to drink regularly or to have used drugs, and the GDS itself therefore warns against the headline use of the numbers. It’s not that 64% of the UK population has used cocaine in the last year; it’s that 64% of a self-selected group, drawn from a demographic more likely to have used cocaine, chose to say so in an online survey.

To emphasise this point the GDS information page identifies the following summary characteristics of respondents to the survey:

  • a 2:1 ratio of male:female;
  • 60% of participants with at least a university degree;
  • an average age of 25 years;
  • more than 50% of participants reporting to have regular involvement in nightlife and clubbing.

Clearly these characteristics are quite different from those of the population as a whole and, as intended by the study, orientated towards people that are more likely to have a drinking or drug habit. At which point the newspaper headlines become much less surprising.

Now, there’s nothing wrong with carrying out surveys in this way. If you’re interested in attitudes and behaviours among drinkers and drug users, there’s not much point in wasting time on people who indulge in neither. But… what you get out of this is a snapshot of people whose characteristics match those of the survey respondents, not of the population as a whole. And sure, this is all spelt out very clearly in the GDS report itself, but that doesn’t stop the tabloids (and even the Guardian) from headlines that make it seem like Britain is the drink/drug capital of the world.

In summary:

  • You can extrapolate the results of a sample to a wider population only if the sample is genuinely representative of the whole population;
  • The best way of ensuring this is random sampling, where every member of the population has the same chance of being included in the sample;
  • The media aren’t going to let niceties of this type get in the way of a good headline, so you need to be extremely wary when reading media reports based on statistical surveys.

What seems to be a more scientific approach to studies of variation in alcohol consumption across countries is available here. On this basis, at least in 2014, average alcohol consumption in the UK was considerably lower than that in, say, France or Germany. That’s not to say Brits got drunk less: it might still be that a proportion of people drink excessively – to the point of getting drunk – while the overall average remains relatively low.

However, if you look down the page there’s this graph…

…which can be interpreted as giving the proportion of each country’s population – admittedly in 2010 – who had at least one heavy night out in a period of 30 days. France and the UK are pretty much level on this basis, and not particularly extreme. Lithuania seems to be the most excessive European country in these terms, while king of the world is apparently Madagascar, where 64.8% of the population reported a heavy drinking session over the 30 day period. So…

It’s official: Madagascans get drunk more often than anywhere else in the WORLD


I recently read that more than 250 people died between 2011 and 2017 taking selfies (so-called killfies). A Wikipedia entry gives a list of some of these deaths, as well as injuries, and categorises the fatalities as due to the following causes:

  • Transport
  • Electrocution
  • Fall
  • Firearm
  • Drowned
  • Animal
  • Other

If you have a macabre sense of humour it makes for entertaining reading while also providing you with useful life tips: for example, don’t take selfies with a walrus.

More detail on some of these incidents can also be found here.

Meanwhile, this article includes the following statistically-based advice:

Humanity is actually very susceptible to selfie death. Soon, you will be more likely to die taking a selfie than you are getting attacked by a shark. That’s not me talking: that’s statistical likelihood. Stay off Instagram and stay alive

Yes, worry less about sharks, but a bit more about Instagram. Thanks Statistics.

The original academic article which identified the more than 250 selfie deaths is available here. It actually contains some interesting statistics:

  • Men are more susceptible to death-by-selfie than women, even though women take more selfies;
  • Most deaths occur in the 20-29 age group;
  • Men were more likely to die taking high-risk selfies than women;
  • Most selfie deaths due to firearms occurred in the United States;
  • The highest number of selfie deaths is in India.

None of these conclusions seems especially surprising to me, except the last one. Why India? Have a think yourself why that might be before scrolling down:


There are various possible factors. Maybe it’s because the population in India is so high. Maybe people just take more selfies in India. Maybe the environment there is more dangerous. Maybe India has a culture for high risk-taking. Maybe it’s a combination of these things.

Or maybe… if you look at the academic paper I referred to above, the authors are based at Indian academic institutes and describe their methodology as follows:

We performed a comprehensive search for keywords such as “selfie deaths; selfie accidents; selfie mortality; self photography deaths; koolfie deaths; mobile death/accidents” from news reports to gather information regarding selfie deaths.

I have no reason to doubt the integrity of these scientists, but it’s easy to imagine that their knowledge of where to look in the media for reported selfie deaths was more complete for Indian sources than for those of other countries. In which case, they would introduce an unintentional bias in their results by accessing a disproportionate number of reports of deaths in India.

In conclusion: be sceptical about any statistical analysis. If the sampling is biased for any reason, the conclusions almost certainly will be as well.

Love it or hate it

A while ago I wrote a post about the practice of advertistics – the use, and more often misuse, of Statistics by advertising companies to promote their products. And I referenced an article in the Guardian which included a number of examples of advertistics. One of these examples was Marmite.

You probably know the line: Marmite – you either love it or hate it. That’s an advertistic in itself. And almost certainly provably incorrect – I just have to find one person who’s indifferent to Marmite.

But I want to discuss a slightly different issue. This ‘love or hate Marmite’ theme has turned up as an advertistic for a completely different product…

DNAfit is one of a number of do-it-yourself DNA testing kits. Here’s what they say about themselves:

DNAfit helps you become the best possible version of yourself. We promise a smarter, easier and more effective solution to health and fitness, entirely unique to your DNA profile. Whatever your goal, DNAfit will ensure you live a longer, happier and healthier life.

And here’s the eminent statistician, er, Rio Ferdinand, to persuade you with statistical facts as to why you should sign up with DNAfit.

But where’s the Marmite?

Well, as part of a campaign that was purportedly launched to address a decline in Marmite sales, but was coincidentally promoted as an advertistic for the DNAfit testing kit, a scientific project was set up to find genetic markers that identify whether a person will be a lover or hater of Marmite. (Let’s ignore, for the moment, the fact that the easiest way to discover whether a person is a ‘lover’ or ‘hater’ of Marmite is simply to ask them.)

Here’s a summary of what they did:

  • They recruited a sample of 261 individuals;
  • For each individual, they took a DNA sample;
  • They also questioned the individuals to determine whether they love or hate Marmite;
  • They then applied standard statistical techniques to identify a small number of genetic markers that separate the Marmite lovers from the haters. Essentially, they looked for a combination of DNA markers which were present in the ‘haters’, but absent in the ‘lovers’ (or vice versa).

Finally, the study was given a sheen of respectability through the publication of a white paper with various genetic scientists as authors.

But, here’s the typical reaction of another scientist on receiving a press release about the study:

Wow, sorry about the language there. So, what’s wrong?

The Marmite gene study is actually pretty poor science. One reason, as explained in this New Scientist article, is that there’s no control for environmental factors. For example, several members of a family might all love Marmite because the parents do and introduced their kids to it at a very early age. The close family connection will also mean that these individuals have similar DNA. So, you’ll find a set of genetic characteristics that each of these family members have, and they all also love Marmite. Conclusion – these are genetic markers for loving Marmite. Wrong: these are genetic markers for this particular family who, because they share meals together, all love Marmite.

I’d guess there are other factors too. A sample of 261 seems rather small to me. There are many possible genetic markers, and many, many more combinations of genetic markers. With so many options it’s almost certain that purely by chance in 261 individuals you can find one set of markers shared only by the ‘lovers’ and another set shared only by the ‘haters’. We’ve seen this stuff before: look at enough things and something unlikely is bound to occur just by chance. It’s just unlikely to happen again outside of the sample of individuals that took part in the study.
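To see how easily chance alone can do this, here’s a small simulation (my own sketch, not the study’s actual method): assign ‘lover’/‘hater’ labels completely at random to 261 people, generate thousands of equally random binary markers, and look at how well the best marker appears to separate the two groups:

```python
# With enough random binary "markers", some will separate 261 randomly
# labelled people surprisingly well, purely by chance.
import random

random.seed(1)
n = 261
labels = [random.random() < 0.5 for _ in range(n)]  # 'lover'/'hater', assigned at random

def separation(marker, labels):
    """Absolute difference in marker frequency between the two groups."""
    lovers = [m for m, lab in zip(marker, labels) if lab]
    haters = [m for m, lab in zip(marker, labels) if not lab]
    return abs(sum(lovers) / len(lovers) - sum(haters) / len(haters))

# 10,000 markers, each pure coin-flips with no connection to the labels
markers = [[random.random() < 0.5 for _ in range(n)] for _ in range(10_000)]
best = max(separation(m, labels) for m in markers)
print(round(best, 2))  # the best purely-random marker typically differs by around 20 percentage points
```

And that’s just the best single marker; allow combinations of markers, as the study did, and near-perfect in-sample separation becomes almost inevitable. Which is exactly why validation on an independent sample matters.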

Moreover, there seems to have been no attempt at validating the results on an independent set of individuals.

Unfortunately for DNAfit and Marmite, they took the campaign one stage further and encouraged Marmite customers – and non-customers – to carry out their own DNA test to see if they were Marmite ‘lovers’ or ‘haters’ using the classification found in the genetic study. If only they’d thought to do this as part of the study itself. Because although the test claimed to be 99.98% accurate, rather many people who paid to be tested found they’d been wrongly classified.

One ‘lover’ who was classified as a ‘hater’ wrote:

I was genuinely upset when I got my results back. Mostly because, hello, I am a ‘lover’, but also because I feel like Marmite led me on with a cheap publicity tool and I fell for it. I feel dirty and used.

While a wrongly-classified ‘hater’ said:

I am somewhat offended! I haven’t touched Marmite since I was about eight because even just the thought of it makes me want to curl up into a ball and scrub my tongue.

Ouch! ‘Dirty and used’. ‘Scrub my tongue’. Not great publicity for either Marmite or DNAfit, and both companies seem to have dropped the campaign pretty quickly and deleted as many references to it as they were able.

Ah, the price of doing Statistics badly.

p.s. There was a warning in the ads about a misclassification rate higher than 0.02% but they just dismissed it as fake news…



A day in the life

Over the next few weeks I’m planning to include a couple of posts looking at the way Statistics gets used – and often misused – in the media.

First though, I want to emphasise the extent to which Statistics pervades news stories. It’s everywhere. But we’re so accustomed to this fact, we hardly pay attention. So, I chose a day randomly last year – when I first planned this post – and made a note of all the articles that I came across which were based one way or another on Statistics.

In no particular order….

Article 1: An analysis of the ways the economy had been affected to date since the Brexit referendum.

Article 2: A report in your super soaraway Sun about research which shows 40% of the British population don’t hold cutlery correctly. (!)

Article 3: A BBC report about a study into heart defects and regeneration rates in Mexican tetra fish which may offer clues to help reduce heart disease rates in humans.

Article 4: A report showing that children’s school performance may be affected by their exact age on entry.

Article 5: A report into the rates of prescriptions of anti-depressants to children and the possible consequences of this.

Article 6: A survey of the number of teenage gamblers.

Article 7: A report on projections of the numbers of people who could be affected by future insulin shortages.

Article 8: A report on a study that suggests children’s weights are not driven by patterns of parental feeding, but rather the opposite: parents tend to adapt feeding patterns to the natural weight of their children.

Article 9: A comparison of football teams in terms of performance this season relative to last season.

Article 10: Not really about statistics exactly, but a report showing that the UK’s top-paid boss is Denise Coates, the co-founder of Bet365, whose pay has just risen to £265 million. Includes a nice graphic showing how her salary has risen year-on-year.

Article 11: Report on a study showing failure rates of cars in MOT tests due to excessive emission rates.

Article 12: A report into an increase in the rate of anti-depressant prescriptions following the EU referendum.

Article 13: A report on rates of ice-melt in Antarctica that suggest a sub-surface radioactive source.

Article 14: A report suggesting rats are getting bigger and what the implications might be.

Article 15: An explanation of algorithms that can distinguish between human and bot conversations.

Article 16: A report suggesting that global internet growth is slowing.

So that’s 16 articles in the papers I happened to look at on a random day. Pretty sure I could have picked any day and any set of papers and it would have been a similar story.

Now here’s a challenge: choose your own day and scan the papers (even just the online versions) to see how many stories have an underlying statistical content. And if you find something that’s suitable for the blog, please pass it on to me – that would be a great bonus.

When I was a kid I went on a school exchange trip to Germany. For some reason we had a lesson with our German hosts in which we were asked to explain the meaning of the Beatles’ ‘A Day in the Life’….

Embarrassingly, I think I tried to give a literal word-by-word interpretation. But if I’d known then what I know about Statistics now, I think I could probably have made a better effort.

Here are the lyrics from one of the verses…

Ah I read the news today, oh boy
Four thousand holes in Blackburn, Lancashire
And though the holes were rather small
They had to count them all
Now they know how many holes it takes to fill the Albert Hall


Here’s a problem for you. You’re an executive member of a medium-sized company. You have quite a few employees whose livelihoods are dependent on the ongoing success of the company. The company is performing reasonably well, but someone high up in the company – let’s, for argument’s sake, say the company’s owner – is a bit of a loose cannon. Maybe he’s prone to say the wrong thing in the wrong place sometimes. Maybe he’s got a skeleton or two in his cupboard that are best kept well-hidden. And bad publicity could badly damage the reputation and value of the company, potentially costing money and jobs. What are you going to do?

Well, it turns out you need the help of statisticians.

You’ll know all about car insurance. You pay a premium, whose cost is calculated on the basis of a number of factors including your likelihood of having an accident, the value of the car, the rate of claims in the area you live, and so on. And if you have an accident or your car is stolen, then you can claim against the insurance policy. It’s a negative-value bet – on average you will pay out more money in premiums than you will regain in claims – but to protect yourself against the huge losses that might be incurred by writing off your car, or in the damages you might cause to a third party, it’s a bet you would probably take. Actually, it’s a bet you’re legally obliged to make if you want to drive a car.

But how are the risks evaluated and the prices set? Essentially on the basis of statistical models. An insurance company will have a record of previous claims and the individual and demographic characteristics of the customers making those claims. It’s then a fairly standard statistical modelling procedure to relate the chance of a customer making a claim, and the average cost when they do, to the available characteristics.

We met something like this before in the context of expected goals (xG). In that setting we had a number of characteristics of a passage of play and wanted to calculate the probability that a goal would be scored. Swap game state for customer characteristics and goal-scored for claim-made and you can see the problem is structurally the same. Well, almost: a passage of play can lead to at most one goal, whereas an insurance customer might make several claims in a given period. But essentially the principle is the same: use the characteristic information – game state or customer type – to get the best predictor of some outcome – goal scored or claim made.
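Here’s a deliberately tiny sketch of the pricing idea. The numbers and segment names are invented, and I’m using simple segment-level averages rather than the regression models a real insurer would fit: estimate the claim probability and average claim cost per customer segment, then set the premium as the expected payout plus a loading:

```python
# Toy premium calculation: expected payout per segment, plus a profit margin.
past_claims = [
    # (segment, made_claim, claim_cost) -- invented historical records
    ("young_urban", True, 3000), ("young_urban", False, 0),
    ("young_urban", True, 5000), ("young_urban", False, 0),
    ("older_rural", False, 0),   ("older_rural", False, 0),
    ("older_rural", True, 1000), ("older_rural", False, 0),
]

def premium(segment, data, loading=1.2):
    """Expected claim cost for the segment, scaled up by a loading factor."""
    rows = [(claimed, cost) for seg, claimed, cost in data if seg == segment]
    p_claim = sum(claimed for claimed, _ in rows) / len(rows)
    costs = [cost for claimed, cost in rows if claimed]
    avg_cost = sum(costs) / len(costs) if costs else 0.0
    return loading * p_claim * avg_cost

print(premium("young_urban", past_claims))  # claim chance 0.5, average cost 4000, plus 20% loading
print(premium("older_rural", past_claims))  # claim chance 0.25, average cost 1000, plus 20% loading
```

In practice an insurer would model many characteristics simultaneously, but the structure – predict claim chance and claim cost from characteristics – is the same.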

But, I digress. It turns out that just like protecting yourself through insurance against the potential costs of a car accident, you can protect your company against the potential embarrassment of bad behaviour by any of its employees. Or owners.

Welcome to the world of: disgrace insurance.

Yes, it turns out that you can insure your company against the fallout of bad headlines caused by any disgraceful behaviour by the members of your company. This type of insurance has apparently been around for quite a while, but the avalanche of recent celebrity scandals and a shift in funding mechanisms has altered the dynamics. Leading the way now is the start-up company SpottedRisk. They say of themselves:

SpottedRisk™ has completely reinvented the decades-old disgrace insurance product in order to meet the needs of today’s market.

What’s especially interesting here from a statistical point of view is the risk evaluation aspect. SpottedRisk have amassed a database of some 27,000 celebrities and used various metrics of their behaviour as predictors for subsequent scandals. Then, like customer characteristics and insurance claims, or game position and goal scored, they can build a model to use one to predict the other. And once they’ve evaluated the risk of a scandal and its likely cost, they can set the premium accordingly.

The amount paid out after a scandal depends on its severity, or what SpottedRisk call the ‘Tier of Outcry’. And they give some theoretical examples:

  • Roseanne Barr. Sent a number of racist and conspiracy-theory tweets and was dropped from her own show. Tier of outcry level 2. Payout $6 million.
  • Kevin Spacey. Accused by several men, some underage, of sexual harassment. Dropped from various film productions and other work activities. Tier of outcry level 4. Payout $8 million.
  • Harvey Weinstein. Industrial amounts of sexual misconduct. Persona non grata pretty much everywhere. Tier of outcry level 5. Payout $10 million.

But there’s just something I don’t quite get with this business model. A celebrity will be publicly disgraced on the occurrence of two events:

  1. He/she will have done something disgraceful;
  2. That disgraceful thing will come to light and be publicised.

Now, the celebrity and the insurance company can each make an assessment of how likely the second of these is, but the celebrity is likely to have much better knowledge than the insurance company about whether they really have something to hide – that’s to say, whether the first of these points is triggered. So the value of an insurance premium is much better known to the customer than to the company, which can only have a vague idea of 1, even if it can calculate 2 better than the celebrity. Economists call this adverse selection. It’s unlike car insurance, where the company is probably better able to evaluate a customer’s total risk than the customer themselves. A client here is in the unusual position of knowing whether the premium offered is good value or not. This doesn’t really make much sense to me.
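The problem can be made concrete with a toy simulation (my own illustration, nothing to do with SpottedRisk’s actual book of business): the insurer prices for the average celebrity, but only those who privately know they have something to hide bother to buy:

```python
# Adverse selection sketch: pricing off the population average fails when
# only the privately-risky customers choose to buy the cover.
import random

random.seed(0)

p_skeleton = 0.1   # fraction of celebrities with something to hide (invented)
p_exposed = 0.5    # chance a skeleton comes to light -- known to both sides
payout = 100

population = [random.random() < p_skeleton for _ in range(10_000)]

# The insurer prices for the average celebrity...
fair_premium = p_skeleton * p_exposed * payout   # roughly 5 per policy

# ...but only those who know they're risky buy (the premium is a bargain for them)
buyers = [risky for risky in population if risky]
collected = fair_premium * len(buyers)
expected_claims = p_exposed * payout * len(buyers)   # 50 per buyer

print(round(collected), round(expected_claims))  # premiums fall an order of magnitude short
```

The insurer either makes a large loss or has to raise premiums, which drives away the few honest customers and makes the pool even worse.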

Additionally, the theoretical payout on Harvey Weinstein is $10 million. This is probably a fraction of the amount spent on any of the films whose production he was involved in, and it seems fanciful to think that a film studio would have bothered to insure itself against that amount of loss.

So, to my mind, something doesn’t quite add up.

Finally: is everyone insurable against disgrace? Apparently yes, except for R. Kelly and Donald Trump, the latter of whom would “probably trigger a claim every week”, according to SpottedRisk’s behavioural scientist Pete Dearborn.

The opening paragraph of this blog post is a work of fiction. Names, characters, businesses, places, events, locales, and incidents are either the products of the author’s imagination or used in a fictitious manner. Any resemblance to actual persons, living or dead, or actual events is purely coincidental.







In a recent  Guardian article, Arwa Mahdawi defines something that she jokingly calls ‘advertistics’. These are statistics based on surveys that are designed to generate results that are useful for advertising a particular product. This might be achieved in different ways, including:

  • The question might be asked in a pointed way which steers respondents in a particular direction;
  • The sample of individuals surveyed might be creatively chosen so that they are more likely to answer in a particular way;
  • An incentive might be offered to respondents who give particular answers;
  • Surveys might be ignored and repeated until the desired outcome is achieved;
  • The survey and the statistics might just be made up.

But whichever method is used, the results are presented as if they are genuinely representative of the wider population. These are advertistics.

One example referred to in Arwa’s Guardian article is a survey of Americans which concluded that 45% of Americans wear the same underpants for at least 2 consecutive days and that American men are 2.5 times as likely as women to have worn their underwear unchanged for more than a week. But here’s the catch: the survey was carried out by an underwear manufacturer, and the details of their survey design are unavailable. So, it’s impossible to know whether the individuals they sampled were genuinely representative of the wider American population, and therefore whether the 45% advertistic has any basis in reality. Nonetheless, it’s convenient for the underwear company to present it as if it does in order to strengthen their campaign for people to replace their underwear more frequently. By buying more of their products, of course.

Another example: I’m old enough to remember ads produced by the cat-food manufacturer Whiskas that claimed:

8 out of 10 cats prefer Whiskas.

Which sounded convincing, until it emerged that:
  1. Nobody asked the cats; and
  2. Many owners didn’t reply.

So they were forced to change the tag line to:

8 out of 10 owners who expressed a preference said their cat prefers it.

Definitely not as snappy, though scientifically more correct. Yet without further details on exactly how the survey was conducted, doubts remain about the validity of the 8 out of 10 advertistic even with the added caveats.

Finally, remember that things can change over time, and statistics – and advertistics – will change accordingly. Arguably the most famous advertistic of all time is the ‘fact’ that Carlsberg is…

Probably the best beer in the world

Except, shockingly, it no longer is. The latest Carlsberg campaign includes the admission that Carlsberg is

Probably not the best beer in the world.

Which to believe? Well, the new campaign comes with evidence supplied by Carlsberg drinkers including the claims that

Carlsberg tastes like stale breadsticks

and that drinking Carlsberg is like…

… drinking the bathwater your nan died in

So, on the strength of evidence, we’re going to have to accept that Carlsberg’s not the best.



You looking at me?

Statistics: helping you solve life’s more difficult problems…

You might have read recently – since it was in every news outlet here, here, here, here, here, here, and here for example – that recent research has shown that staring at seagulls inhibits them from stealing your food. This article even shows a couple of videos of how the experiment was conducted. The researcher placed a package of food some metres in front of her in the vicinity of a seagull. In one experiment she watched the bird and timed how long it took before it snatched the food. She then repeated the experiment, with the same seagull, but this time facing away from the seagull. Finally, she repeated this exercise with a number of different seagulls in different locations.

At the heart of the study is a statistical analysis, and there are several points about both the analysis itself and the way it was reported that are interesting from a wider statistical perspective:

  1. The experiment is a good example of a designed paired experiment. Some seagulls are more likely to take food than others regardless of whether they are being looked at or not. The experiment aims to control for this effect by using pairs of results from each seagull: one in which the seagull was stared at, the other where it was not. By using knowledge that the data are in pairs this way, the accuracy of the analysis is improved considerably. This makes it much more likely to identify a possible effect within the noisy data.
  2. To avoid the possibility that, for example, a seagull is more likely to take food quickly the second time, the order in which the pairs of experiments are applied is randomised for each seagull.
  3. Other factors are also controlled for in the analysis: the presence of other birds, the distance of the food, the presence of other people and so on.
  4. The original experiment involved 74 birds, but many were uncooperative and refused the food in one or other of the experiments. In the end the analysis is based on just 19 birds that took food both when being stared at and when not. So even though the results proved significant, it’s worth remembering that the sample on which they were based is very small.
  5. It used to be very difficult to verify the accuracy of a published statistical analysis. These days it’s almost standard for data and code to be published alongside the manuscript itself. This enables readers to both check the results and carry out their own alternative analyses. For this paper, which you can find in full here, the data and code are available here.
  6. If you look at the code it’s just a few lines of R. It’s notable that such a sophisticated analysis can be carried out with such simple code.
  7. At the risk of being pedantic, although most newspapers went with headlines like ‘Staring at seagulls is best way to stop them stealing your chips‘, that’s not really an accurate summary of the research at all. Clearly, a much better way to stop seagulls eating your food is not to eat in the vicinity of seagulls. (Doh!) But even aside from this nit-picking point, the research didn’t show that staring at seagulls stopped them ‘stealing your chips’. It showed that, on average, the seagulls that bother to steal your chips, do so more quickly when you are looking away. In other words, the headline should be:

If you insist on eating chips in the vicinity of seagulls, you’ll lose them quicker if you’re not looking at them

Guess that’s why I’m a statistician and not a journalist.
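The paired analysis in point 1 can be sketched in a few lines. The numbers below are invented for illustration – they are not the paper’s data, and the paper’s own R analysis is more sophisticated – but they show the key idea: analyse the per-gull differences rather than treating the two conditions as unrelated groups.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical paired data: for each gull, seconds until it takes the food
# when stared at vs. when the experimenter looks away. (Made-up numbers,
# not the paper's actual measurements.)
n_gulls = 19
looking_away = [random.uniform(10, 60) for _ in range(n_gulls)]
stared_at = [t + random.uniform(5, 30) for t in looking_away]  # staring adds delay

# The pairing is the point: each gull acts as its own control, so
# gull-to-gull variability cancels out of the differences.
diffs = [s - a for s, a in zip(stared_at, looking_away)]
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

print(f"mean extra delay when stared at: {statistics.mean(diffs):.1f} s")
print(f"paired t-statistic: {t_stat:.2f}  (df = {len(diffs) - 1})")
```

Because the gulls vary so much among themselves, an unpaired comparison of the two sets of times would drown the staring effect in noise; the paired differences recover it.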

The issue of designed statistical experiments was something I also discussed in an earlier post. As I mentioned then, it’s an aspect of Statistics that, so far, hasn’t much been exploited in the context of sports modelling, where analyses tend to be based on historically collected data. But in the context of gambling, where different strategies for betting might be compared and contrasted, it’s likely to be a powerful approach. In that case, the issues of controlling for other variables – like the identity of the gambler or the stake size – and randomising to avoid biases will be equally important.


Off script


So, how did your team get on in the first round of Premier League fixtures for the 2019-20 season? My team, Sheffield United, were back in the top flight after a 13-year absence. It didn’t go too well though. Here’s the report:

EFL goal machine Billy Sharp’s long wait for a top-flight strike ends on the opening day. Ravel Morrison with the assist. But Bournemouth run out 4-1 winners.

And as if that’s not bad enough, we finished the season in bottom place:


Disappointing, but maybe not unexpected.

Arsenal also had a classic Arsenal season. Here’s the story of their run-in:

It seems only the Europa League can save them. They draw Man United. Arsenal abandon all hope and crash out 3-2. Just as they feared. Fans are more sad than angry. Once again they rally. Aubameyang and Alexandre Lacazette lead a demolition of high flying Liverpool. But they drop too many points and end up trophyless with another fifth-place finish.

Oh, Arsenal!

But what is this stuff? The Premier League doesn’t kick off for another week, yet here we have complete details of the entire season, match-by-match, right up to the final league table.

Welcome to The Script, produced by BT Sport. As they themselves explain:

Big data takes on the beautiful game.

And in slightly more detail…

BT has brought together the biggest brains in sports data, analysis and machine learning to write the world’s first artificial intelligence-driven script for a future Premier League season.

Essentially, BT Sport have devised a model for match outcomes based on measures of team abilities in attack and defence. So far, so standard. After which…

We then simulate the random events that could occur during a season – such as injuries and player transfers – to give us even more accurate predictions.

But this is novel. How do you assign probabilities to player injuries or transfers? Are all players equally susceptible to injury? Do the terms of a player’s contract affect their chances of being sold? And to whom they are sold? And what is the effect on a team’s performance of losing a player?

So, this level of modelling is difficult. But let’s just suppose for a minute you can do it. You have a model for what players will be available for a team in any of their fixtures. And you then have a model that, given the two sets of players that are available to teams for any fixture, spits out the probabilities of the various possible scores. Provided the model’s not too complicated, you can probably first simulate the respective lineups in a match, and then the scores given the team lineups. And that’s why Sheffield United lost 4-1 on the opening day to Bournemouth. And that’s why Arsenal did an Arsenal at the end of the season. And that’s why the league table ended up like it did above.

But is this a useful resource for predicting the Premier League?

Have a think about this before scrolling down. Imagine you’re a gambler, looking to bet on the outcome of the Premier League season. Perhaps betting on who the champions will be, or the top three, or who will be relegated, or whether Arsenal will finish fifth. Assuming BT’s model is reasonable, would you find the Script that they’ve provided helpful in deciding what bets to make?


Personally, I think the answer is ‘no’, not very helpful. What BT seem to have done is run A SINGLE SIMULATION of their model, for every game over the entire season, accumulating the simulated points of each team per match to calculate their final league position.


Imagine having a dice that you suspected of being biased, and you tried to understand its properties with a single roll. It’s almost pointless. Admittedly, with the Script, each team has 38 simulated matches, so the final league table is likely to be more representative of genuine team ability than the outcome of a single throw of a dice. But still, it’s the simulation of just a single season.

What would be much more useful would be to simulate many seasons and count, for example, in how many of those seasons Sheffield United were relegated. This way the model would be providing an estimate of the probability that Sheffield United gets relegated, and we could compare that against market prices to see if it’s a worthwhile bet.
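To illustrate, here’s a minimal Monte Carlo sketch in Python. The team strengths and the match model are invented assumptions, not BT’s: each fixture is a draw with a fixed probability, and otherwise the win goes to each side in proportion to its strength rating.

```python
import random

random.seed(0)

# Hypothetical strengths for a 20-team league (invented, not BT's model);
# Sheffield United is given the joint-lowest rating.
strengths = {"Sheffield United": 0.8}
strengths.update({f"Team {i}": random.uniform(0.8, 2.0) for i in range(1, 20)})

def simulate_season():
    """Play every fixture home and away once; return points per team."""
    points = {name: 0 for name in strengths}
    names = list(strengths)
    for home in names:
        for away in names:
            if home == away:
                continue
            if random.random() < 0.25:   # flat draw rate -- an assumption
                points[home] += 1
                points[away] += 1
            elif random.random() < strengths[home] / (strengths[home] + strengths[away]):
                points[home] += 3        # home win
            else:
                points[away] += 3        # away win
    return points

# Simulate many seasons and count how often Sheffield United finish in the
# bottom three: an estimate of their relegation probability.
N_SEASONS = 1000
relegated = 0
for _ in range(N_SEASONS):
    points = simulate_season()
    table = sorted(points, key=points.get, reverse=True)
    if "Sheffield United" in table[-3:]:
        relegated += 1

print(f"Estimated P(Sheffield United relegated): {relegated / N_SEASONS:.2f}")
```

One simulated season, like the Script, gives a single league table; a thousand of them give a probability you could actually compare against a bookmaker’s price.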

In summary, we’ve seen in earlier posts (here and here, for example) contenders for the most pointless simulation in a sporting context, but the Script is lowering the bar to unforeseen levels. Despite this, if the blog is still going at the end of the season, I’ll do an assessment of how accurate the Script’s estimates turned out to be.


Word rank

I recently came across a large database of American-English word usage. It aims to provide a representative sample of American English by including words extracted from a large number of texts of different types – books, newspaper articles, magazines and so on. In total it includes around 560 million words collected over the years 1990-2017.

The word ‘football’ occurs in the database 25,271 times and has rank 1543. In principle, this means that ‘football’ was the 1543rd most frequent word in the database, though the method used for ranking the database elements is a little more complicated than that, since it attempts to combine a measure of both the number of times the word appears and the number of texts it appears in. Let’s leave that subtlety aside though and assume that ‘football’, with a frequency of 25,271, is the 1543rd most common word in the database.
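For concreteness, here’s a toy version of ranking by raw frequency alone (the real database, remember, also weights by the number of texts a word appears in):

```python
from collections import Counter

# Tiny invented corpus; rank words by raw frequency only.
text = "the cat sat on the mat and the dog sat on the cat"
counts = Counter(text.split())

# Rank 1 = most frequent word; ties keep first-seen order.
ranking = [word for word, _ in counts.most_common()]
print(ranking.index("the") + 1)   # 'the' is the most common word: rank 1
```

In the real database the same idea applies, just at the scale of 560 million words.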

The word ‘baseball’ occurs in the same database 28,851 times. With just this information, what would you predict the rank of the word ‘baseball’ to be? For example, if you think ‘baseball’ is the most common word, it would have rank 1. (It isn’t: ‘the’ is the most common word). If you think ‘baseball’ would be the 1000th most common word, your answer would be 1000.

Give it a little thought, but don’t waste time on it. I really just want to use the problem as an introduction to an issue that I’ll discuss in a future post. I’d be happy to receive your answer though, together with an explanation if you like, by mail. Or if you’d just like to fire an answer anonymously at me, without explanation, you can do so using this survey form.