# Following the science

There’s been a lot of discussion lately about the efficacy and efficiency of the UK government response to the Coronavirus epidemic. There are many strands to this, but one concerns the speed with which policies of social restriction were introduced. And a lot of the debate has focused on two sporting events that were held shortly after lockdowns were introduced in many European countries, but before they were introduced in the UK: the Cheltenham Festival and the second leg of the Champions League tie between Liverpool and Atletico Madrid.

The Liverpool game was especially controversial because it had been known for some time that Madrid was already a focus for the Coronavirus outbreak in Spain. And while most other Champions League fixtures that were held that week were held behind closed doors, the decision was made to hold the Liverpool game with spectators, including 3000 travellers from Madrid.

The picture from both Cheltenham and Liverpool after each event is concerning, since both locations appear to have a higher rate of infection than would be expected (see here and here). But it will take careful analysis of the data to establish the extent to which these apparent effects can be properly attributed to the associated sporting events, and an even fuller analysis to determine whether the decisions to hold the events were anyway reasonable or not.

One argument, for example, that’s been presented to justify not holding matches behind closed doors is that there may be more transmission if people watch a match in many pubs rather than in a stadium. And in any case, it’s perfectly valid to argue that a higher rate of infection due to holding a sporting event has to be offset against the economic and other social costs of not holding it. So, even if it turns out that the 2 events in question are genuinely likely to have increased infection rates, this doesn’t in itself imply that the decisions to hold both events were wrong.

But here’s the thing… as with all aspects of planning for and responding to events connected with the epidemic, Science – and Statistics – provides a framework for decision making. In particular, it will give predictions about what is most likely to occur if different actions are taken and, in the case of statistical models, most likely also attach probabilities to different possible outcomes, again dependent on the course of actions taken.

Crucially, though, Science will not tell you what to do. It won’t tell you how to balance costs in terms of lives against that in terms of money. Or jobs. Or something else. That’s a political decision. Moreover, ‘Science’ isn’t a fixed static object that unveils itself in uniform and unchallenged forms. There are different sciences, all of which are constantly evolving, and any combination of which might lead to conflicting conclusions. Even different statistical models might not be in complete agreement. Science will help you understand the costs and benefits of actions that are available to you; but you must take responsibility for the choices you make on the basis of that information.

However, I’ve lost count of how many times politicians – especially in the UK – defend their actions by arguing ‘we followed the science’.  Here’s Health Secretary Matt Hancock in defence of the decision to hold the Cheltenham festival:

We followed the scientific advice and were guided by that science.

And here in defence of holding the Champions League cup tie:

This is of course a question for the scientists and what matters now is that people in Liverpool and across the North West get the treatment that they need and get the curve under control.

Neither comment is likely to be completely untrue – it would obviously be outrageous for any government in any situation to completely ignore scientific evidence – but both seem to be distractions from the fact that decision-taking is a political process which balances the various risks and costs involved.

The most Science can do is to provide an assessment of what those risks and costs are.

Here’s Brian Cox’s take on the same argument:

When you hear politicians saying ‘we’re following the science’ then what that means is they don’t really understand what science is. There isn’t such a thing as ‘the’ science. Science is a mindset, it’s about trying to understand nature.

And here’s the full section with video:

# Life comes at you fast

A few posts back I tried to explain the concept of herd immunity, since that seemed to be a cornerstone of the UK policy to handle the Coronavirus epidemic. Now, just a short time later, that approach seems to be off the table, and the UK is catching up with other European countries in applying measures that restrict social contact and therefore limit the rate of transmission of the virus. The previous post also described – loosely – how if an infected person passes the virus to an average of less than one other person, then the epidemic will fade out; otherwise it will grow exponentially.

So, what forced the change in government policy? Actually, not very much – the basic scientific modelling had been around for some time. But evidence from Italy suggested that demand for ICU support in hospitals for infected individuals – both in terms of number of patients, and length of treatment – would be greater than originally assumed. And this recalibration meant that the NHS capacity for ICU would have been woefully inadequate without some kind of intervention.

The change in strategy was based on work carried out at Imperial College and summarised in this report. As academic papers go it’s fairly readable, but I thought it might still be useful to give a brief summary here. So, I’ll give an outline of the methodology used, and then a picture-trail of the main conclusions.

The techniques used can be summarised as follows:

1. A standard model for transmission of flu-type epidemics was adopted. This basically assumes that anyone having the disease has a probability of passing the disease on to anyone they have contact with. So the rate of transmission depends on the probability of transmission and the average number of contacts a person has. (See this post for discussion on these types of models.)
2. The parameters for this model – things like the transmission rate of the disease – were estimated using data from China and Italy, where the current epidemic already has a longer history;
3. The model also requires country-specific demographic information extracted from the population census, so that the numbers of infections within households, between work colleagues and so on, can be reasonably predicted.
4. Simulations from the model were generated under alternative strategies for population restriction, leading to probability estimates of the number of infections and fatalities under each strategy.

Two broad types of strategy were considered:

• Mitigation strategies, in which the average transmission rate is reduced, but stays greater than 1. In this case there is exponential growth of the epidemic until the herd immunity effect kicks in and the epidemic dies out.
• Suppression strategies, in which the average transmission rate is reduced to a level below 1, so that the exponential growth phase of the epidemic is shortened considerably.

And here’s the picture-trail giving the conclusions (for the UK):

Picture 1:

Based on the input demographics and the estimated transmission rates, this graph shows the expected number of daily fatalities – both for the UK and US – if no population restrictions were applied. For the UK the peak number of fatalities per day would occur towards the end of May, with around half a million fatalities in total. This is a large number of fatalities, but the epidemic would be effectively over by July, at which point the acquired immunity in the population as a whole would prevent further epidemic outbreak.

Picture 2:

This graph shows the effect on ICU beds of various forms of mitigation strategy, ranging from school closures only (green) to isolating cases, quarantining affected households and social-distancing of over-70’s (blue). Also shown again, for comparison, is the ‘do nothing’ curve (black). The red line is current capacity for ICU beds, while the shaded light blue area is the time period over which it is assumed the restriction measures are in place. So, just as with a ‘do nothing’ policy, each of these strategies leads to the epidemic being extinguished due to the herd immunity effect, albeit a few weeks later towards the end of July. And each of the strategies does reduce the peak demand on ICU facilities. But, even the most stringent of these strategies leads to a demand on ICU beds that is still around 12 times current capacity. This is considered unsustainable.

Picture 3:

This graph considers suppression strategies. Again, the demand on ICU beds is plotted through time, assuming a suppression strategy is adopted for the time window shaded in blue. The second panel is just a zoomed-in section of the first graph, focusing on the lower part of the graph. Both suppression strategies offer a massive improvement over doing nothing (again shown in black) up until July. The version which includes school closures as well as social distancing is actually predicted to keep ICU demand well below capacity right through to October, while a looser version without school closures leads to a 50% shortfall in resources, which I imagine to be manageable.

So in the short term these suppression approaches are far superior to mitigation in keeping ICU demand below reasonable levels. The problem, as you see from the graph, is that once the restrictions are removed, the epidemic starts all over again in the autumn. Indeed, the most stringent approach, including school closures, leads to demand in the winter of 20/21 that is higher than what the ‘do nothing’ strategy would have led to in the summer of 2020.

Picture 4:

To get round the problem of the epidemic re-starting, the report looks at various strategies of containment based on the idea of relaxing restrictions when pressure on ICU units is low, and then placing them back when numbers grow back to a specified level. In this picture, the blue rectangles correspond to periods where restrictions are applied. In each such period, after a short period of further growth, the epidemic is controlled and brought back down to very low levels. Then the restrictions are relaxed again, and the pattern repeats itself. In this way, some semblance of normal life is maintained by having periods with no restrictions, while the level of the epidemic is always contained by having periods with restrictions. As you can see in this final picture though, it’s estimated that the periods with restrictions would need to be about twice as long as those without.
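To make the on-off idea concrete, here’s a toy sketch in Python. It is not the Imperial College model: it just reuses a simple ‘multiply the cases by a daily growth factor’ rule, and the function name, growth factors and trigger levels are all made-up values chosen for illustration. It also ignores the short delay before restrictions take effect.

```python
def on_off_policy(n0, r_free, r_restricted, on_at, off_at, days):
    """Toy on-off containment: returns a list of (cases, restricted) per day.

    r_free / r_restricted are hypothetical daily growth factors without and
    with restrictions; on_at / off_at are hypothetical trigger levels.
    """
    cases, restricted = float(n0), False
    history = []
    for _ in range(days):
        if cases >= on_at:
            restricted = True      # numbers too high: switch restrictions on
        elif cases <= off_at:
            restricted = False     # numbers low again: relax restrictions
        cases *= r_restricted if restricted else r_free
        history.append((cases, restricted))
    return history


# Illustrative values only: 10% daily growth when free, 10% daily decline when restricted.
history = on_off_policy(n0=100, r_free=1.1, r_restricted=0.9,
                        on_at=1000, off_at=100, days=400)
restricted_days = sum(1 for _, flag in history if flag)
print(f"peak cases: {max(c for c, _ in history):.0f}")
print(f"days under restrictions: {restricted_days} of {len(history)}")
```

Even in this crude version the epidemic never dies out; it just oscillates between the two trigger levels for as long as the policy is maintained.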

So, there are no easy solutions.

• Mitigation would allow the epidemic to run its course and fade in the space of just a few months. But it would lead to very many fatalities, and unsustainable pressures on the NHS;
• Suppression through social distancing, quarantining and school closures will reduce short-term fatalities and ease pressure on health services, but does little to alter the long-term trajectory of the epidemic;
• On-off versions of suppression can be used to contain the epidemic to sustainable levels, but will require long periods of restrictions, well into 2021 at least.

Of course, none of this is especially cheerful, but it’s obviously important to know the science when planning. It seems that the UK government’s original approach was a version of mitigation, until the recalibrated version of the model used in the Imperial College report set out what the short-term consequences of that would imply. So, like most other European countries, the government moved to the current – and still evolving – suppression strategy based on social distancing, quarantining and school closures. Exactly as unfolded in Italy, it became imperative to control the first wave of the epidemic; concerns about potential future waves will have to be addressed, but by then more will be understood about the spread of the epidemic.

There are, moreover, a number of issues which may make the picture less gloomy than it seems.

1. Though the report has used the very best expert opinion available when building models and estimating unknowns, it’s possible that things are better than the model assumes;
2. A big unknown is the number of asymptomatic carriers in the population. If there are many people who have the virus without realising it – and there is some evidence to suggest that’s the case – then the natural build-up to a ‘herd immunity’ effect may be much more advanced than the model assumes, and the epidemic may die out quickly after a first wave, even with suppression-based restrictions;
3. It may be that the virus is more strongly seasonal than the model assumes, and that summer in the northern hemisphere causes a slowdown of the virus;
4. Trials for vaccines are already underway. If a successful vaccine can be developed quickly and distributed, it may also eliminate the need for further rounds of restrictions;
5. Tests that can assess whether someone has previously had the virus are also under development. At the moment, social distancing is required of all individuals. But there may be many people who have had the virus without realising and who are now immune. Identifying such individuals through testing would enable them to return safely to work.
6. There are promising signs that certain existing anti-viral treatments, perhaps used in combination, will prove to be an effective cure to the Coronavirus disease, at least for some groups of critically ill patients.

In summary: the statistically-based Imperial College analysis shows how the government can implement social-interaction strategies to keep fatalities and pressure on health service facilities to tolerable levels. The time bought by these strategies – admittedly at a large economic and social cost – can then be used to enable other sciences to develop tests and vaccines to stem the epidemic entirely. It’s a battle, but understanding the statistics and adhering to the strategies adopted are key to winning it.

The Imperial College report contains considerably more detail than I’ve included here.

Other summaries of the report can be found here and here. Thanks to Michael.Freeman@Smartodds.co.uk for pointing me to the second of those.

# Andrà tutto bene

There’s been a lot of discussion this weekend about the approach proposed by the UK government for handling the Coronavirus epidemic and how it compares to the approach adopted by most other countries so far. The best explanation I’ve seen of the UK approach is contained in a thread of tweets by Professor Ian Donald of the University of Liverpool. The thread starts here:

A strong counterargument to this approach is given here.

I’m in no position to judge whether the UK approach is less or more risky than that adopted by, say, Italy, who have taken a much more rigorous approach to what has quickly become known as ‘social-distancing’, but which roughly translates as closing down everything that’s non-essential and forcing people to stay at home.

However, there is one essential aspect about the UK strategy which seems a little mysterious and which I thought I might be able to shed a little light on with some Statistics.

You’re no doubt familiar by now with the term ‘herd immunity‘, though the phrase itself seems to have become a bit of a political hot potato. But whatever semantics are used, the basic idea is that once enough people in a population have been infected with the virus and recovered, the remainder of the population is also protected from further epidemic outbreaks. Why should that be so?

It’s nothing to do with virology or biology – antibodies are not passed from the previously infected to the uninfected – but is entirely to do with the statistical properties of epidemiological evolution. I’ll illustrate this with a much simplified version of a true epidemic, though the principles carry over to more realistic epidemiological models.

In a previous post I discussed how the basic development of an epidemic in its initial exponential phase can be described by the following quantities:

• E: the expected number of people an infected person is exposed to;
• p: the probability an infected person will infect a person to whom they are exposed;
• N: the number of people currently infected.

The simplest epidemiological model then assumes that the number of new infections the next day will be

$E \times p \times N$

We’ll stick with that, but I want to make a slightly different assumption from that made in the video. In the video, when someone is infected, they remain infected indefinitely, and so are available to make new infections on each subsequent day. Instead, I want to assume here that a person that’s infected remains infected only for one day. After that they either recover and are immune, or, er, something else. But either way, they remain infective only for one day. Obviously, in real life, the truth is somewhere between these two extremes. But for the purposes of this argument it’s convenient to assume the latter.

In this case, if we start with N cases, the expected number of cases the next day is

$E \times p \times N$

The next day it’s

$(E \times p)^2 \times N$

And after x days it’s

$(E \times p)^x \times N$

This means that we still get exponential growth in the number of cases whenever $E \times p$ is greater than 1; in other words, whenever an infected person will pass the virus on to an average of more than one person. But, critically, if $E \times p$ is less than 1, $(E \times p)^x \times N$ approaches zero as x grows and the epidemic dies out.
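As a quick check on that formula, here’s a minimal sketch in Python. The function name and the particular values of E and p are my own choices for illustration; only their product matters.

```python
def expected_cases(E, p, N, x):
    """Expected number of cases after x days, i.e. (E * p)**x * N,
    assuming each infected person is infective for exactly one day."""
    return (E * p) ** x * N


# Illustrative values: E * p = 1.05 (growth) versus E * p = 0.95 (decay).
for label, p in (("E*p = 1.05", 0.105), ("E*p = 0.95", 0.095)):
    values = [round(expected_cases(10, p, 1000, x)) for x in (0, 10, 50, 100)]
    print(label, values)
```

With the product just above 1 the expected counts keep multiplying; with it just below 1 they shrink towards zero, however large the starting number.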

Here are some simulated trajectories. I’ve assumed we’re already at a point where there are N=1000 cases and that the next day’s observations are a random perturbation around the expected value. First, let’s assume $E \times p =1.05$ – so each infected person infects an average of 1.05 other people daily. The following graphs correspond to four different simulated trajectories. If you look at the values of the counts, each of the simulations is quite different due to the random perturbations (which you can’t really see). But in each case, the epidemic grows exponentially.

But now suppose $E \times p =0.95$, so each infected individual infects an average of just 0.95 people per day. Again, the following figure shows four different simulations, each again different because of the randomness in the simulations. But now, instead of exponential growth, the epidemic tails off and essentially dies out.

This is crucially important: when $E \times p$ is below 1, meaning infected people infect fewer than one other person on average, the epidemic will just fade away. Now, as discussed in the previous post, changes to hygiene and social behaviour might help in reducing the value of $E \times p$, but unless it goes below 1, the epidemic will still grow exponentially.
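For anyone wanting to reproduce this kind of simulation, here’s a rough sketch in Python. The exact form of the random perturbation isn’t specified above, so the noise model here (Gaussian, with variance equal to the expected value) is purely an assumption, as are the function name and the 60-day horizon.

```python
import random


def simulate(R, n0=1000, days=60, seed=None):
    """Simulate daily case counts when each case infects R = E * p others on average.

    Assumed noise model (not taken from the post): each day's count is the
    expected value plus Gaussian noise with variance equal to that expected value.
    """
    rng = random.Random(seed)
    cases = [n0]
    for _ in range(days):
        mean = R * cases[-1]
        noisy = rng.gauss(mean, mean ** 0.5)
        cases.append(max(0, round(noisy)))  # counts can't be negative
    return cases


# Four simulated trajectories for each of the two scenarios.
for R in (1.05, 0.95):
    finals = [simulate(R, seed=s)[-1] for s in range(4)]
    print(f"E*p = {R}: cases after 60 days in four runs: {finals}")
```

Each run gives different numbers, but the qualitative behaviour depends only on which side of 1 the value of $E \times p$ falls.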

But, suppose a proportion Q of the population is actually immune to the virus. Then an infected person who meets an average of E people in a day, will now actually meet an average of just $E \times (1-Q)$ people that are not immune. So now the expected number of new infections per infected person in a day will be $E \times (1-Q)\times p$, and as long as $E \times (1-Q)\times p$ is smaller than 1, the epidemic will tail off.
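Rearranging $E \times (1-Q)\times p < 1$ gives $Q > 1 - 1/(E \times p)$, which tells you how large the immune proportion needs to be. Here’s that calculation as a small Python sketch; the function name and the example values of E and p are hypothetical, chosen only to illustrate the arithmetic.

```python
def herd_immunity_threshold(E, p):
    """Smallest immune proportion Q for which E * (1 - Q) * p drops to 1."""
    R = E * p
    if R <= 1:
        return 0.0  # the epidemic already fades without any immunity
    return 1 - 1 / R


# Hypothetical values: 10 exposures per day, 25% chance of transmission each.
print(herd_immunity_threshold(10, 0.25))
```

In this made-up example $E \times p = 2.5$, so around 60% of the population would need to be immune before each infected person infects fewer than one other on average.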

This is the basis of the idea of ‘herd immunity’. Ensure that a large enough proportion Q of the population is immune, so that the average number of people an infected person is likely to infect is less than 1. This is usually achieved through vaccination programs. By contrast, and in the absence of a vaccine, the stated UK government approach is to achieve a large value of Q by letting the disease spread freely within the sub-population of people who are at low risk of developing complications from the disease, while simultaneously isolating more vulnerable people. So, although many people will get the disease initially – since there is no herd immunity initially – these will be people who are unlikely to require long-term hospital resources. And once a large enough proportion of the non-vulnerable population has been infected, it will then be safe to put the whole population back together again as the more vulnerable people will benefit from the herd immunity generated in the non-vulnerable group.

Can this be achieved? Is it really possible to separate the vulnerable and non-vulnerable sections of the population? And will the spread of the disease through the non-vulnerable sub-population occur at the correct rate: too fast and hospitals can’t cope anyway (some ‘non-vulnerable’ people will still have severe forms of the disease); too slow and the ‘herd immunity’ effect will itself be too slow to protect the vulnerable section once the populations are re-combined. As explained in the thread of tweets above, the government has some control on this rate through social controls such as school closures and so on. But will it all work, especially once you factor in the fact that many non-vulnerable people may well take forms of action that minimise their own risk of catching the virus?

I obviously don’t have answers to these questions. But since I’ve found it difficult myself to understand from the articles I’ve read how ‘herd immunity’ works, I thought this post might at least clarify the basics of that concept.

‘Andrà tutto bene’ translates as ‘everything will be ok’, and has been adopted here in Italy as the slogan of solidarity against the virus. The picture at the top of the page is outside the nursery just up the road from where I live. As you walk around there are similar flags and posters outside many buildings and people’s houses. Feel free to print a copy of my picture and stick it on the office door.

# Friday the 13th

Friday 13th. What could possibly go wrong today?

Well, according to people who suffer from Friggatriskaidekaphobia – the fear of Friday 13th – rather a lot. But is there any rationale for a fear of Friday 13th?

The scientific evidence is patchy. One study published in the British Medical Journal – ‘Is Friday the 13th bad for your health‘ – apparently found a 52% increase in hospital admissions from road accidents on Fridays that fell on the 13th of the month, compared with other Fridays.  However, one of the authors, Robert Luben, was subsequently quoted as saying:

It’s quite amusing and written with tongue firmly in cheek. It was written for the Christmas edition of the British Medical Journal, which usually carries fun or spoof articles.

I guess the authors looked at several possible statistics and reported the one that, by chance, fitted the hypothesis of Friday the 13th being unlucky. We’ve discussed this issue before: if you look at enough different phenomena where there is nothing of interest, some of them will look like there is something interesting happening just by chance. Statistics as a subject can be – and often is – badly misused this way.
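To put a number on that, here’s a small Python sketch. It assumes the statistics being examined are independent and that each is tested at the conventional 5% level; both are my assumptions for illustration, not details taken from the study.

```python
def prob_false_alarm(k, alpha=0.05):
    """Chance that at least one of k independent tests looks 'significant'
    at level alpha when nothing of interest is actually going on."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20):
    print(f"{k} statistics examined: {prob_false_alarm(k):.0%} chance of a spurious finding")
```

So with 20 unrelated statistics to choose from, there’s roughly a two-in-three chance that at least one of them will look interesting purely by chance.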

Not everyone seemed to see it as a joke though. A follow-up study in the American Journal of Psychiatry titled ‘Traffic Deaths and Superstition on Friday the 13th‘ found a higher accident rate for women, but not men, on Fridays falling on the 13th of the month. This was subsequently contested by another group of researchers who published an article in the journal BMC Public Health titled ‘Females do not have more injury road accidents on Friday the 13th‘. Who to believe?

So, it’s a mixed bag. Moreover, as reported in Wikipedia – which gives an interesting history of the origins of the superstitions associated with Friday 13th – road accidents, in the Netherlands at least, are less frequent on Friday 13th, arguably because people take more care than usual. But even there I’d be cautious about the results without having a detailed look at the way the statistical analysis was carried out.

And anyway, Tuesday 8th is the new Friday 13th. You’ve been warned.

Footnote: I’m writing this on Thursday 12th, blissfully unaware of whatever horrors this particular Friday 13th will bring.

# Cube-shaped poo

Do you like pizza? If so, I’ve got good and bad news for you.

The good news is that the 2019 Ig Nobel prize winner in the category of medicine is Silvano Gallus, who received the award for…

… collecting evidence that pizza might protect against illness and death…

The bad news, for most of you, is that this applies…

…if the pizza is made and eaten in Italy.

Obviously, it’s a bit surprising that pizza can be considered a health food. But if you accept that, it’s also a bit surprising that it has to be Italian pizza. So, what’s going on?

The Ig Nobel prizes are a satirical version of the Nobel prizes. Here’s the Wikipedia description:

The Ig Nobel Prize (/ˌɪɡnˈbɛl/ IG-noh-BEL) is a satiric prize awarded annually since 1991 to celebrate ten unusual or trivial achievements in scientific research, its stated aim being to “honor achievements that first make people laugh, and then make them think.” The name of the award is a pun on the Nobel Prize, which it parodies, and the word ignoble.

As such, the prize is awarded for genuine scientific research, but for areas of research that are largely incidental to human progress and understanding of the universe. For example, this year’s prize in the field of physics went to a group of scientists for…

It’s in this context that Silvano Gallus won his award. But although the Ig Nobel award says something about the irrelevance of the subject matter, it’s not intended as a criticism of the quality of the underlying research. Gallus’s work with various co-authors (all Italian) was published as an academic paper ‘Does Pizza Protect Against Cancer‘ in the International Journal of Cancer. This wouldn’t happen if the work didn’t have scientific merit.

Despite this, there are reasons to be cautious about the conclusions of the study. The research is based on a type of statistical experimental design known as a case-control study. This works as follows. Suppose, for argument’s sake, you’re interested in testing the effect of pizzas on the prevention of certain types of disease. You first identify a group of patients having the disease and ask them about their pizza-eating habits. You then also find a group of people who don’t have the disease and ask them about their pizza-eating habits. You then check whether the pizza habits are different in the two groups.

Actually, it’s a little more complicated than that. It might be that age or gender or something else is also different in the two groups, so you also need to correct for these effects as well. But the principle is essentially just to see whether the tendency to eat pizza is greater in the control group – if so, you conclude that pizza is beneficial for the prevention of the specified disease. And on this basis, for a number of different cancer-types, Silvano Gallus and his co-authors found the proportion of people eating pizzas occasionally or regularly to be higher in the control group than in the case group.
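A standard way of summarising this kind of comparison is the odds ratio. Here’s a minimal Python sketch; the counts are invented for illustration and are not taken from the paper, and the sketch makes no correction for age, gender or anything else.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure (here, eating pizza) among cases divided by the odds
    of exposure among controls. Values below 1 suggest a protective association."""
    return (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)


# Invented counts: 30 of 100 cases eat pizza regularly, versus 45 of 100 controls.
print(round(odds_ratio(30, 70, 45, 55), 2))
```

An odds ratio below 1, as in this invented example, corresponds to pizza eating being more common in the control group than in the case group.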

Case-control studies are widely used in medical and epidemiological studies because they are quick and easy to implement. The more rigorous ‘randomised control study’ would work as follows:

1. You recruit a number of people for the study, none of whom have the disease of interest;
2. You randomise them into two groups. One of the groups will be required to eat pizza on a regular basis; the other will not be allowed to eat pizza;
3. You follow the 2 groups over a number of years and identify whether the rate of disease turns out to be lower in the pizza-eating group rather than the non-pizza-eating group;
4. Again, you may want to correct for other differences in the 2 groups (though the need for this is largely eliminated by the randomisation process).

Clearly, for both logistical and time reasons, a randomised control study is completely unrealistic for studying the effects of pizza on disease prevention. However, in terms of reliability of results, case control studies are generally inferior to randomised control studies because of the potential for bias.

In case control studies the selection of the control group is extremely important, and it might be very easy to fall into the trap of inadvertently selecting people with an unusually high rate of eating pizzas (if, for example, you surveyed people while standing outside a pizzeria). It’s also easy – by accident or design – for the researcher to get the answer they might want when asking a question. For example: “you eat a lot of pizza, don’t you?” might get a different response from “would you describe yourself as a regular pizza eater?”. Moreover, people simply might not have an accurate recollection of their long-term eating habits. But most importantly, you are asking people with, for example, cancer of the colon whether they are regular pizza eaters. Quite plausibly this type of disease has quite a big effect on diet, and one can well imagine that pizzas are not advised by doctors. So although the pizza-eating question is probably intended to relate to the period prior to getting the disease, it’s possible that people with the disease are no longer tending to eat pizza, and respond accordingly.

Finally, even if biases are eliminated by careful execution of the study, there’s the possibility that the result is anyway misleading. It may be that although pizzas seem to give disease protection, it’s not the pizza itself that’s providing the protection, but something else that is associated with pizza eating. For example, regular pizza eating might just be an indicator of someone who simply has regular meals, which may be the genuine source of disease protection. There’s also the possibility that while the rates of pizza eating are lower among the individuals with the specified diseases, they are much higher among individuals with other diseases (heart problems, for example). This could have been identified in a randomised control study, but flies completely under the radar in a case-control study.

So, case-control studies are a bit of a minefield, with various potential sources of misleading results, and I would remain cautious about the life-saving effects of eating pizza.

And finally… like all statistical analysis, any conclusions made on the basis of sample results are only relevant to the wider population from which that sample was drawn. And since this study was based on Italians eating Italian pizzas, the authors conclude…

Extension of the apparently favorable effect of pizza on cancer risk in Italy to other types of diets and populations is therefore not warranted.

So, fill your boots at Domino’s Pizzas, but don’t rely on the fact that this will do much in the way of disease prevention.

# No smoke without fire

No one seriously now doubts that cigarette smoking increases your risk of lung cancer and many other diseases, but when the evidence for a relationship between smoking and cancer was first presented in the 1950’s, it was strongly challenged by the tobacco industry.

The history of the scientific fight to demonstrate the harmful effects of smoking is summarised in this article. One difficulty from a statistical point of view was that the primary evidence based on retrospective studies was shaky, because smokers tend to give unreliable reports on how much they smoke. Smokers with illnesses tend to overstate how much they smoke; those who are healthy tend to understate their cigarette consumption. And these two effects lead to misleading analyses of historically collected data.

An additional problem was the difficulty of establishing causal relationships from statistical associations. Similar to the examples in a previous post, just because there’s a correlation between smoking and cancer, it doesn’t necessarily mean that smoking is a risk factor for cancer. Indeed, one of the most prominent statisticians of the time – actually of any time – Sir Ronald Fisher, wrote various scientific articles explaining how the correlations observed between smoking and cancer rates could easily be explained by the presence of lurking variables that induce spurious correlations.

At which point it’s worth noting a couple more ‘coincidences’: Fisher was a heavy smoker himself and also an advisor to the Tobacco Manufacturers Standing Committee. In other words, he wasn’t exactly neutral on the matter. But, he was a highly respected scientist, and therefore his scepticism carried considerable weight.

Eventually though, the sheer weight of evidence – including that from long-term prospective studies – was simply too overwhelming to be ignored, and governments fell into line with the scientific community in accepting that smoking is a high risk factor for various types of cancer.

An important milestone in that process was the work of another British statistician, Austin Bradford Hill. As well as being involved in several of the most prominent case studies linking cancer to smoking, he also developed a set of 9 (later extended to 10) criteria for establishing a causal relationship between processes. Though still only guidelines, they provided a framework that is still used today for determining whether associated processes include any causal relationships. And by these criteria, smoking was clearly shown to be a risk factor for cancer.

Now, fast-forward to today and there’s a similar debate about global warming:

1. Is the planet genuinely heating up or is it just random variation in temperatures?
2. If it’s heating up, is it a consequence of human activity, or just part of the natural evolution of the planet?
3. And then what are the consequences for the various bio- and eco-systems living on it?

There are correlations all over the place – for example between CO2 emissions and average global temperatures as described in an earlier post – but could these possibly just be spurious and not indicative of any causal relationships? Certainly there are industries with vested interests who would like to shroud the arguments in doubt. Well, this nice article applies each of Bradford Hill’s criteria to various aspects of climate science data and establishes that the increases in global temperatures are undoubtedly caused by human activity leading to CO2 release in the atmosphere, and that many observable changes to biological and geographical systems are a knock-on effect of this relationship.

In summary: in the case of the planet, the smoke that we see <global warming> is definitely a consequence of the fire we started <the increased amounts of CO2 released into the atmosphere>.

# Killfie

I recently read that more than 250 people died between 2011 and 2017 taking selfies (so-called killfies). A Wikipedia entry gives a list of some of these deaths, as well as injuries, and categorises the fatalities as due to the following causes:

• Transport
• Electrocution
• Fall
• Firearm
• Drowned
• Animal
• Other

If you have a macabre sense of humour it makes for entertaining reading while also providing you with useful life tips: for example, don’t take selfies with a walrus.

More detail on some of these incidents can also be found here.

Humanity is actually very susceptible to selfie death. Soon, you will be more likely to die taking a selfie than you are getting attacked by a shark. That’s not me talking: that’s statistical likelihood. Stay off Instagram and stay alive

Yes, worry less about sharks, but a bit more about Instagram. Thanks Statistics.

The original academic article which identified the more than 250 selfie deaths is available here. It actually contains some interesting statistics:

• Men are more susceptible to death-by-selfie than women, even though women take more selfies;
• Most deaths occur in the 20-29 age group;
• Men were more likely to die taking high-risk selfies than women;
• Most selfie deaths due to firearms occurred in the United States;
• The highest number of selfie deaths is in India.

None of these conclusions seems especially surprising to me, except the last one. Why India? Have a think yourself why that might be before scrolling down:

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

There are various possible factors. Maybe it’s because the population in India is so high. Maybe people just take more selfies in India. Maybe the environment there is more dangerous. Maybe India has a culture for high risk-taking. Maybe it’s a combination of these things.

Or maybe… if you look at the academic paper I referred to above, the authors are based at Indian academic institutes and describe their methodology as follows:

We performed a comprehensive search for keywords such as “selfie deaths; selfie accidents; selfie mortality; self photography deaths; koolfie deaths; mobile death/accidents” from news reports to gather information regarding selfie deaths.

I have no reason to doubt the integrity of these scientists, but it’s easy to imagine that their knowledge of where to look in the media for reported selfie deaths was more complete for Indian sources than for those of other countries. In which case, they would introduce an unintentional bias in their results by accessing a disproportionate number of reports of deaths in India.
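Here’s a rough sketch of how that kind of bias could arise. Everything in it is hypothetical: the two countries, the number of deaths and the detection probabilities are invented for illustration and are not taken from the paper. Both countries have exactly the same true number of deaths, but the researchers’ media search is assumed to pick up reports from one country far more reliably than from the other.

```python
import random


def count_reported(true_deaths, detection_prob, rng):
    """Number of deaths the media search finds, each with prob detection_prob."""
    return sum(1 for _ in range(true_deaths) if rng.random() < detection_prob)


rng = random.Random(0)

# Hypothetical setup: identical true death counts in both countries ...
true_deaths = {"Country A": 100, "Country B": 100}
# ... but the search covers Country A's media much more completely.
detection_prob = {"Country A": 0.9, "Country B": 0.3}

found = {
    country: count_reported(true_deaths[country], detection_prob[country], rng)
    for country in true_deaths
}

for country, n in found.items():
    print(f"{country}: {n} deaths found out of {true_deaths[country]} true deaths")
```

An analysis based only on the `found` counts would conclude that Country A is much the more dangerous place to take a selfie, when by construction there’s no difference at all.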

In conclusion: be sceptical about any statistical analysis. If the sampling is biased for any reason, the conclusions almost certainly will be as well.

# Love it or hate it

A while ago I wrote a post about the practice of advertistics – the use, and more often misuse, of Statistics by advertising companies to promote their products. And I referenced an article in the Guardian which included a number of examples of advertistics. One of these examples was Marmite.

You probably know the line: Marmite – you either love it or hate it. That’s an advertistic in itself. And almost certainly provably incorrect – I just have to find one person who’s indifferent to Marmite.

But I want to discuss a slightly different issue. This ‘love or hate Marmite’ theme has turned up as an advertistic for a completely different product…

DNAfit is one of a number of do-it-yourself DNA testing kits. Here’s what they say about themselves:

DNAfit helps you become the best possible version of yourself. We promise a smarter, easier and more effective solution to health and fitness, entirely unique to your DNA profile. Whatever your goal, DNAfit will ensure you live a longer, happier and healthier life.

And here’s the eminent statistician, er, Rio Ferdinand, to persuade you with statistical facts as to why you should sign up with DNAfit.

But where’s the Marmite?

Well, as part of a campaign that was purportedly set up to address a decline in Marmite sales, but was coincidentally promoted as an advertistic for the DNAfit testing kit, a scientific project was set up to find genetic markers that identify whether a person will be a lover or hater of Marmite. (Let’s ignore, for the moment, the fact that the easiest way to discover if a person is a ‘lover’ or ‘hater’ of Marmite is simply to ask them.)

Here’s a summary of what they did:

• They recruited a sample of 261 individuals;
• For each individual, they took a DNA sample;
• They also questioned the individuals to determine whether they love or hate Marmite;
• They then applied standard statistical techniques to identify a small number of genetic markers that separate the Marmite lovers from the haters. Essentially, they looked for a combination of DNA markers which were present in the ‘haters’, but absent in the ‘lovers’ (or vice versa).

Finally, the study was given a sheen of respectability through the publication of a white paper with various genetic scientists as authors.

But, here’s the typical reaction of another scientist on receiving a press release about the study:

Wow, sorry about the language there. So, what’s wrong?

The Marmite gene study is actually pretty poor science. One reason, as explained in this New Scientist article, is that there’s no control for environmental factors. For example, several members of a family might all love Marmite because the parents do and introduced their kids to it at a very early age. The close family connection will also mean that these individuals have similar DNA. So, you’ll find a set of genetic characteristics that each of these family members have, and they all also love Marmite. Conclusion – these are genetic markers for loving Marmite. Wrong: these are genetic markers for this particular family who, because they share meals together, all love Marmite.

I’d guess there are other factors too. A sample of 261 seems rather small to me. There are many possible genetic markers, and many, many more combinations of genetic markers. With so many options it’s almost certain that purely by chance in 261 individuals you can find one set of markers shared only by the ‘lovers’ and another set shared only by the ‘haters’. We’ve seen this stuff before: look at enough things and something unlikely is bound to occur just by chance. It’s just unlikely to happen again outside of the sample of individuals that took part in the study.

Moreover, there seems to have been no attempt at validating the results on an independent set of individuals.
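To illustrate the last two points, here’s a small simulation sketch. It is not the method used in the Marmite study: the 2000 binary markers, the coin-flip genetics and the single-marker rule are all simplifying assumptions of mine. By construction, nobody’s taste for Marmite has anything to do with their markers. The code picks the marker that best matches ‘lover’ status in a study of 261 people, then checks the same marker on a fresh sample of 261.

```python
import random

N_PEOPLE = 261     # sample size quoted for the study
N_MARKERS = 2000   # hypothetical number of candidate markers


def random_person(rng):
    """A person with random markers and a taste unrelated to any of them."""
    markers = [rng.random() < 0.5 for _ in range(N_MARKERS)]
    loves_marmite = rng.random() < 0.5
    return markers, loves_marmite


def accuracy(people, j):
    """Fraction of people for whom marker j matches their 'lover' status."""
    return sum(markers[j] == loves for markers, loves in people) / len(people)


rng = random.Random(42)
study = [random_person(rng) for _ in range(N_PEOPLE)]
fresh = [random_person(rng) for _ in range(N_PEOPLE)]

# Search all markers for the one that looks best within the study sample.
best = max(range(N_MARKERS), key=lambda j: accuracy(study, j))

in_sample = accuracy(study, best)
out_of_sample = accuracy(fresh, best)
print(f"best marker in the study sample: {in_sample:.0%} correct")
print(f"same marker on a fresh sample:   {out_of_sample:.0%} correct")
```

Even with pure noise, searching across enough markers turns up one that looks noticeably better than a coin toss in the study sample, while on fresh data it typically drops back to roughly 50%. Allowing combinations of markers, as in the real study, makes the search space vastly larger and the problem correspondingly worse.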

Unfortunately for DNAfit and Marmite, they took the campaign one stage further and encouraged Marmite customers – and non-customers – to carry out their own DNA test to see if they were Marmite ‘lovers’ or ‘haters’ using the classification found in the genetic study. If only they’d thought to do this as part of the study itself. Because although the test claimed to be 99.98% accurate, rather many people who paid to be tested found they’d been wrongly classified.

One ‘lover’ who was classified as a ‘hater’ wrote:

I was genuinely upset when I got my results back. Mostly because, hello, I am a ‘lover’, but also because I feel like Marmite led me on with a cheap publicity tool and I fell for it. I feel dirty and used.

While a wrongly-classified ‘hater’ said:

I am somewhat offended! I haven’t touched Marmite since I was about eight because even just the thought of it makes me want to curl up into a ball and scrub my tongue.

Ouch! ‘Dirty and used’. ‘Scrub my tongue’. Not great publicity for either Marmite or DNAfit, and both companies seem to have dropped the campaign pretty quickly and deleted as many references to it as they were able.

Ah, the price of doing Statistics badly.

p.s. There was a warning in the ads about a misclassification rate higher than 0.02% but they just dismissed it as fake news…

# You looking at me?

Statistics: helping you solve life’s more difficult problems…

You might have read recently – since it was in every news outlet here, here, here, here, here, here, and here for example – that recent research has shown that staring at seagulls inhibits them from stealing your food. This article even shows a couple of videos of how the experiment was conducted. The researcher placed a package of food some metres in front of her in the vicinity of a seagull. In one experiment she watched the bird and timed how long it took before it snatched the food. She then repeated the experiment, with the same seagull, but this time facing away from the seagull. Finally, she repeated this exercise with a number of different seagulls in different locations.

At the heart of the study is a statistical analysis, and there are several points about both the analysis itself and the way it was reported that are interesting from a wider statistical perspective:

1. The experiment is a good example of a designed paired experiment. Some seagulls are more likely to take food than others, regardless of whether they are being looked at or not. The experiment aims to control for this effect by using pairs of results from each seagull: one in which the seagull was stared at, the other where it was not. Exploiting the fact that the data come in pairs considerably improves the precision of the analysis, making it much more likely that a genuine effect will be detected within the noisy data.
2. To avoid the possibility that, for example, a seagull is more likely to take food quickly the second time, the order in which the pairs of experiments are applied is randomised for each seagull.
3. Other factors are also controlled for in the analysis: the presence of other birds, the distance of the food, the presence of other people and so on.
4. The original experiment involved 74 birds, but many were uncooperative and refused the food in one or other of the experiments. In the end the analysis is based on just 19 birds who took food both when being stared at and not. So even though results prove to be significant, it’s worth remembering that the sample on which results were based is very small.
5. It used to be very difficult to verify the accuracy of a published statistical analysis. These days it’s almost standard for data and code to be published alongside the manuscript itself. This enables readers to both check the results and carry out their own alternative analyses. For this paper, which you can find in full here, the data and code are available here.
6. If you look at the code it’s just a few lines of R. It’s notable that such a sophisticated analysis can be carried out with such simple code.
7. At the risk of being pedantic, although most newspapers went with headlines like ‘Staring at seagulls is best way to stop them stealing your chips‘, that’s not really an accurate summary of the research at all. Clearly, a much better way to stop seagulls eating your food is not to eat in the vicinity of seagulls. (Doh!) But even aside from this nit-picking point, the research didn’t show that staring at seagulls stopped them ‘stealing your chips’. It showed that, on average, the seagulls that bother to steal your chips, do so more quickly when you are looking away. In other words, the headline should be:

If you insist on eating chips in the vicinity of seagulls, you’ll lose them quicker if you’re not looking at them

Guess that’s why I’m a statistician and not a journalist.

The issue of designed statistical experiments was something I also discussed in an earlier post. As I mentioned then, it’s an aspect of Statistics that, so far, hasn’t much been exploited in the context of sports modelling, where analyses tend to be based on historically collected data. But in the context of gambling, where different strategies for betting might be compared and contrasted, it’s likely to be a powerful approach. In that case, the issues of controlling for other variables – like the identity of the gambler or the stake size – and randomising to avoid biases will be equally important.
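The benefit of pairing mentioned in point 1 above can be sketched with a quick simulation. This is not the authors’ analysis, which is in their published R code. The bird ‘boldness’ values, the 5-second effect and the noise levels below are all invented for illustration. Each simulated bird has its own baseline time to approach the food, and the same data are then summarised by a paired and an unpaired t-statistic.

```python
import math
import random
import statistics


def simulate_birds(n_birds=19, effect=5.0, seed=3):
    """Approach times (seconds) for each bird, stared at and not stared at."""
    rng = random.Random(seed)
    stared, away = [], []
    for _ in range(n_birds):
        boldness = rng.gauss(40, 10)  # bird-specific baseline (hypothetical)
        stared.append(boldness + effect + rng.gauss(0, 3))
        away.append(boldness + rng.gauss(0, 3))
    return stared, away


def paired_t(a, b):
    """t-statistic based on within-bird differences."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def unpaired_t(a, b):
    """Welch-style t-statistic that ignores which bird gave which result."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


stared, away = simulate_birds()
t_paired = paired_t(stared, away)
t_unpaired = unpaired_t(stared, away)
print(f"paired t-statistic:   {t_paired:.2f}")
print(f"unpaired t-statistic: {t_unpaired:.2f}")
```

Both statistics use the same average difference in times. The paired version divides by a much smaller standard error, because the large bird-to-bird variation cancels out when each bird is compared with itself, so the same effect stands out far more clearly.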

# Data controversies

Some time ago I wrote about Mendel’s law of genetic inheritance, and how statistical analysis of Mendel’s data suggested his results were too good to be true. It’s not that his theory is wrong; it’s just that the data he provided as evidence for his theory seem to have been manipulated in such a way as to seem incontrovertible. Unfortunately the data lack the variation that Mendel’s own law would also imply should occur in measurements of that type, leading to the charge that the data had been manufactured or manipulated in some way.

The photograph, taken 100 years ago, was as striking at that time as the recent picture of a black hole, discussed in an earlier post, is today. However, this picture was taken with basic photographic equipment and telescopic lens and shows a total solar eclipse, as the moon passes directly between the Earth and the Sun.

A full story of the controversy is given here.

In summary: Einstein’s theory of general relativity describes gravity not as a force between two attracting masses – as is central to Newtonian physics – but as a curvature caused in space-time due to the presence of massive objects. All objects cause such curvature, but only those that are especially massive, such as stars and planets, will have much of an effect.

Einstein’s relativity model was completely revolutionary compared to the prevailing view of physical laws at the time. But although it explained various astronomical observations that were anomalous according to Newtonian laws, it had never been used to predict anomalous behaviour. The picture above, and similar ones taken at around the same time, changed all that.

In essence, blocking out the sun’s rays enabled dimmer and more distant stars to be accurately photographed. Moreover, if Einstein’s theory were correct, the photographic position of these stars should be slightly distorted because of the spacetime curvature effects of the sun. But the effect is very slight, and even Newtonian physics suggests some disturbance due to gravitational effects.

In an attempt to get photographic evidence at the necessary resolution, the British astronomer Arthur Eddington set up two teams of scientists – one on the African island of Príncipe, the other in Sobral, Brazil – to take photographs of the solar eclipse on 29 May, 1919. Astronomical and photographic equipment was much more primitive in those days, so this was no mean feat.

Anyway, to cut a long story short, a combination of poor weather conditions and other setbacks meant that the results were less reliable than were hoped for. It seems that the data collected at Príncipe, where Eddington himself was stationed, were inconclusive, falling somewhere between the Newton and Einstein model predictions. The data at Sobral were taken with two different types of telescope, with one set favouring the Newton view and the other Einstein’s. Eddington essentially combined the Einstein-favouring data from Sobral together with those from Príncipe and concluded that the evidence supported Einstein’s relativistic model of the universe.

Now, in hindsight, with vast amounts of empirical evidence of many types, we know Einstein’s model to be fundamentally correct. But did Eddington selectively choose his data to support Einstein’s model?

There are different points of view, which hinge on Eddington’s motivation for dropping a subset of the Sobral data from his analysis. One point of view is that he wanted Einstein’s view to be correct, and therefore simply ignored the data that were less favourable. This argument is fuelled by political reasoning: it argues that since Eddington was a Quaker, and therefore a pacifist, he wanted to support a German theory as a kind of post-war reconciliation.

The alternative point of view, for which there is some documentary evidence, is that the Sobral data which Eddington ignored had been independently designated as unreliable. Therefore, on proper scientific grounds, Eddington had behaved entirely correctly by excluding it from his analysis, and his subsequent conclusions favouring the Einstein model were entirely consistent with the scientific data and information he had available.

This issue will probably never be fully resolved, though in a recent review of several books on the matter, theoretical physicist Peter Coles (no relation) claims to have reanalysed the data given in the Eddington paper using modern statistical methods, and found no reason to doubt his integrity. I have no reason to doubt that point of view, but there’s no detail of the statistical analysis that was carried out.

What’s interesting though, from a statistical point of view, is how the interpretation of the results depends on the reason for the exclusion of a subset of the Sobral data. If your view is that Eddington knew their contents and excluded them on that basis, then his conclusions in favour of Einstein must be regarded as biased. If you accept that Eddington excluded these data a priori because of their unreliability, then his conclusions were fair and accurate.
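The difference between the two situations can be sketched in a simulation. To be clear, this uses none of Eddington’s data: the true value, the favoured value, the three equally noisy datasets and the exclusion rules are all hypothetical choices of mine. In each simulated study one dataset is dropped, either because it was designated in advance or because it’s the one that sits furthest from the value the analyst would like to find.

```python
import random
import statistics

TRUE_VALUE = 0.0      # what is really being measured (hypothetical)
FAVOURED_VALUE = 1.0  # what the analyst hopes to find (hypothetical)


def one_study(rng):
    """Return the estimates under a priori and post hoc exclusion."""
    data = [rng.gauss(TRUE_VALUE, 1.0) for _ in range(3)]

    # A priori: the third dataset is excluded before anyone sees its value.
    a_priori = statistics.mean(data[:2])

    # Post hoc: exclude whichever dataset is furthest from the favoured value.
    worst = max(data, key=lambda x: abs(x - FAVOURED_VALUE))
    kept = list(data)
    kept.remove(worst)
    post_hoc = statistics.mean(kept)

    return a_priori, post_hoc


rng = random.Random(1919)
results = [one_study(rng) for _ in range(20000)]
mean_a_priori = statistics.mean([r[0] for r in results])
mean_post_hoc = statistics.mean([r[1] for r in results])
print(f"true value:                   {TRUE_VALUE:.2f}")
print(f"average estimate, a priori:   {mean_a_priori:.2f}")
print(f"average estimate, post hoc:   {mean_post_hoc:.2f}")
```

Averaged over many simulated studies, excluding data for reasons fixed in advance leaves the estimate centred on the true value, whereas excluding data after seeing the values pulls it towards the favoured one. The same amount of data is thrown away in both cases; it’s only the reason for the exclusion that differs.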

Data are often treated as a neutral aspect of an analysis. But as this story illustrates, the choice of which data to include or exclude, and the reasons for doing so, may be factors which fundamentally alter the direction an analysis will take, and the conclusions it will reach.