Life comes at you fast

A few posts back I tried to explain the concept of herd immunity, since that seemed to be a cornerstone of the UK policy to handle the Coronavirus epidemic. Now, just a short time later, that approach seems to be off the table, and the UK is catching up with other European countries in applying measures that restrict social contact and therefore limit the rate of transmission of the virus. The previous post also described – loosely – how if an infected person passes the virus to an average of less than one other person, then the epidemic will fade out; otherwise it will grow exponentially.

So, what forced the change in government policy? Actually, not very much – the basic scientific modelling had been around for some time. But evidence from Italy suggested that demand for ICU support in hospitals for infected individuals – both in terms of number of patients, and length of treatment – would be greater than originally assumed. And the effect of this recalibration meant that the NHS capacity for ICU would have been woefully inadequate without some kind of intervention.

The change in strategy was based on work carried out at Imperial College and summarised in this report. As academic papers go it’s fairly readable, but I thought it might still be useful to give a brief summary here. So, I’ll give an outline of the methodology used, and then a picture-trail of the main conclusions.

The techniques used can be summarised as follows:

  1. A standard model for transmission of flu-type epidemics was adopted. This basically assumes that anyone who has the disease has some probability of passing it on to each person they come into contact with. So the rate of transmission depends on the per-contact probability of transmission and the average number of contacts a person has. (See this post for discussion of these types of models.)
  2. The parameters for this model – things like the transmission rate of the disease – were estimated using data from China and Italy, where the current epidemic already has a longer history;
  3. The model also requires country-specific demographic information extracted from the population census, so that the numbers of infections within households, between work colleagues and so on, can be reasonably predicted.
  4. Simulations from the model were generated under alternative strategies for population restriction, leading to probability estimates of the number of infections and fatalities under each strategy.

Two broad types of strategy were considered:

  • Mitigation strategies, in which the average number of people each infected person passes the virus on to is reduced, but stays greater than 1. In this case there is exponential growth of the epidemic until the herd immunity effect kicks in and the epidemic dies out.
  • Suppression strategies, in which that average is reduced to a level below 1, so that the exponential growth phase of the epidemic is shortened considerably. (A rough simulation contrasting the two regimes is sketched below.)
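
To make the difference concrete, here’s a minimal simulation sketch in Python. To be clear, this is not the Imperial College model: the population size, reproduction numbers and generation structure below are illustrative assumptions only, but they show qualitatively why the two types of strategy behave so differently.

```python
# Illustrative sketch only -- NOT the Imperial College model. Each 'generation'
# of cases infects an average of R others, scaled down by the fraction of the
# population that is still susceptible. All numbers are assumptions.

def simulate(R, population=66_000_000, seed_cases=10_000, generations=40):
    susceptible = population - seed_cases
    cases = seed_cases
    trajectory = [cases]
    for _ in range(generations):
        # The effective reproduction number falls as susceptibles are used up
        new_cases = min(R * cases * susceptible / population, susceptible)
        susceptible -= new_cases
        cases = new_cases
        trajectory.append(cases)
    return trajectory

for label, R in [("no restrictions", 2.4), ("mitigation", 1.3), ("suppression", 0.8)]:
    traj = simulate(R)
    print(f"{label:15s} (R = {R}): peak generation {max(traj):>12,.0f} cases, "
          f"total infected {sum(traj):>12,.0f}")
```

With R above 1 the epidemic grows until enough of the population has been infected for the herd immunity effect to stop it; with R below 1 it simply fizzles out, leaving most of the population uninfected – and therefore still susceptible if restrictions are later lifted.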

And here’s the picture-trail giving the conclusions (for the UK):

Picture 1:

Based on the input demographics and the estimated transmission rates, this graph shows the expected number of daily fatalities – both for the UK and US – if no population restrictions were applied. For the UK the peak number of fatalities per day would occur towards the end of May, with around half a million fatalities in total. This is a large number of fatalities, but the epidemic would be effectively over by July, at which point the acquired immunity in the population as a whole would prevent further epidemic outbreak.

Picture 2:

This graph shows the effect on ICU beds of various forms of mitigation strategy, ranging from school closures only (green) to isolating cases, quarantining affected households and social-distancing of over-70’s (blue). Also shown again, for comparison, is the ‘do nothing’ curve (black). The red line is current capacity for ICU beds, while the shaded light blue area is the time period over which it is assumed the restriction measures are in place. So, just as with a ‘do nothing’ policy, each of these strategies leads to the epidemic being extinguished due to the herd immunity effect, albeit a few weeks later towards the end of July. And each of the strategies does reduce the peak demand on ICU facilities. But, even the most stringent of these strategies leads to a demand on ICU beds that is still around 12 times current capacity. This is considered unsustainable.

Picture 3:

This graph considers suppression strategies. Again, the demand on ICU beds is plotted through time, assuming a suppression strategy is adopted for the time window shaded in blue. The second panel is just a zoomed-in section of the first graph, focusing on the lower part of the graph. Both suppression strategies offer a massive improvement over doing nothing (again shown in black) up until July. The version which includes school closures as well as social distancing is actually predicted to keep ICU demand well below capacity right through to October, while a looser version without school closures leads to a 50% shortfall in resources, which I imagine to be manageable.

So in the short term these suppression approaches are far superior to mitigation in keeping ICU demand below reasonable levels. The problem, as you see from the graph, is that once the restrictions are removed, the epidemic starts all over again in the autumn. Indeed, the most stringent approach, including school closures, leads to demand in the winter of 2020/21 that is higher than what the ‘do nothing’ strategy would have led to in the summer of 2020.

Picture 4:

To get round the problem of the epidemic re-starting, the report looks at various strategies of containment based on the idea of relaxing restrictions when pressure on ICU units is low, and then re-imposing them when numbers grow back to a specified level. In this picture, the blue rectangles correspond to periods where restrictions are applied. In each such period, after a short spell of further growth, the epidemic is controlled and brought back down to very low levels. Then the restrictions are relaxed again, and the pattern repeats itself. In this way, some semblance of normal life is maintained by having periods with no restrictions, while the level of the epidemic is always contained by having periods with restrictions. As you can see in this final picture though, it’s estimated that the periods with restrictions would need to be about twice as long as those without.

So, there are no easy solutions.

  • Mitigation would allow the epidemic to run its course and fade in the space of just a few months. But it would lead to very many fatalities, and unsustainable pressures on the NHS;
  • Suppression through social distancing, quarantining and school closures will reduce short-term fatalities and ease pressure on health services, but does little to alter the long-term trajectory of the epidemic;
  • On-off versions of suppression can be used to contain the epidemic to sustainable levels, but will require long periods of restrictions, well into 2021 at least.

Of course, none of this is especially cheerful, but it’s obviously important to know the science when planning. It seems that the UK government’s original approach was a version of mitigation, until the recalibrated version of the model used in the Imperial College report set out what the short-term consequences of that approach would be. So, like most other European countries, the government moved to the current – and still evolving – suppression strategy based on social distancing, quarantining and school closures. Exactly as unfolded in Italy, it became imperative to control the first wave of the epidemic; concerns about potential future waves will have to be addressed, but by then more will be understood about the spread of the epidemic.

There are, moreover, a number of issues which may make the picture less gloomy than it seems.

  1. Though the report has used the very best expert opinion available when building models and estimating unknowns, it’s possible that the true values of some of its parameters are more favourable than the model assumes;
  2. A big unknown is the number of asymptomatic carriers in the population. If there are many people who have the virus without realising it – and there is some evidence to suggest that’s the case – then the natural build-up to a ‘herd immunity’ effect may be much more advanced than the model assumes, and the epidemic may die out quickly after a first wave, even with suppression-based restrictions;
  3. It may be that the virus is more strongly seasonal than the model assumes, and that summer in the northern hemisphere causes a slowdown of the virus;
  4. Trials for vaccines are already underway. If a successful vaccine can be developed and distributed quickly, it may also eliminate the need for further rounds of restrictions;
  5. Tests that can assess whether someone has previously had the virus are also under development. At the moment, social distancing is required of all individuals. But there may be many people who have had the virus without realising and who are now immune. Identifying such individuals through testing would enable them to return safely to work.
  6. There are promising signs that certain existing anti-viral treatments, perhaps used in combination, will prove to be an effective cure for the Coronavirus disease, at least for some groups of critically ill patients.

In summary: the statistically-based Imperial College analysis shows how the government can implement social-interaction strategies to keep fatalities and pressure on health service facilities to tolerable levels. The time bought by these strategies – admittedly at a large economic and social cost – can then be used to enable other sciences to develop tests and vaccines to stem the epidemic entirely. It’s a battle, but understanding the statistics and adhering to the strategies adopted are key to winning it.


The Imperial College report contains considerably more detail than I’ve included here.

Other summaries of the report can be found here and here. Thanks to Michael.Freeman@Smartodds.co.uk for pointing me to the second of those.

The Rules of Contagion

With uncanny timing, Adam Kucharski, who is a professor at the London School of Hygiene and Tropical Medicine, has just published a book titled ‘The Rules of Contagion. Why Things Spread – and Why They Stop’. It deals, among other things, with the spread of epidemics, but shows how the science of epidemics applies equally to many other phenomena. As the Sunday Times says:

This is a hell of a moment for a book like this to come out … the principles of contagion, which, Kucharski argues, can be applied to everything from folk stories and financial crises to itching and loneliness, are suddenly of pressing interest to all of us.

The book is written in a way that’s both interesting and accessible, and if you’re at all interested in the mathematical/statistical aspects of the current epidemic, this is a very good place to learn something.

You can currently get the Kindle version at Amazon for less than the price of a coffee.


Incidentally, Adam contacted me a few years ago asking for an interview as he was writing a book about the history of gambling, all the way through to modern-day companies, like Smartodds, that are connected to the gambling industry. After taking advice I declined, so as to avoid any potential disclosure of proprietary information. I did write to Adam though, setting out some of my own thoughts about the industry and the background to my involvement at Smartodds. His book on this subject, ‘The Perfect Bet: Taking the Luck out of Gambling’, is also a good read.

Numbers and pictures

Statistics is playing a fundamental role in supporting decision-makers by providing predictions of the Coronavirus epidemic spread and of the likely impact of possible courses of action they could take. Nothing is certain – from the transmission of the disease, to the way individuals will behave – which is why probability theory plays such an important role. We can’t be sure certain things will happen, but we can reasonably assign probabilities to them.

But at a more elementary level, clear presentation of data, in both numerical and graphical form, is also important for understanding many characteristics of the epidemic. There are now various sources of well-presented information, and I thought it might be helpful to provide a list here of the best ones I’ve found so far. If anyone has alternative sources, please send them to me or include them in the comments below and I’ll add them to the list.

1. Worldometers.info

This page gives current counts of various types – including new cases – per country. It also includes simple graphics that track the epidemic evolution. There are links to each individual country, where a country-specific history of numbers is available, and also links to look at effects by age, sex and so on. Graphs and so on are updated daily, but the numbers themselves are updated every time a country releases new daily data.

2. Informationisbeautiful.net

This page is updated daily and gives very clear graphics of a number of aspects of the epidemic. It shows, for example, slight differences in the age distribution of mortalities for Italy and China and also compares the mortality and contagion rate for this epidemic against those of other epidemics and diseases.

3. arcgis.com

This is a dashboard giving numbers and a geographical display of current cases. A more detailed UK-specific version of the dashboard is also hosted here.

4. lab.gedidigital.it

This is a similar country-specific dashboard, but for Italy.

5. ft.com

The Financial Times gives this comparison of the epidemic growth across countries. It’s updated daily, though sometimes I can’t get past a paywall. Similar figures are available anyway in the dashboards above.


Like I say, please let me know of any other useful sources and I’ll add them to the list.


Update:

This page has updated graphs that allow you to compare the trajectory of the epidemic in specified countries over different timescales and on different scales.

Test, test, test…

The testing of individuals for COVID-19 has become an urgent and sensitive issue. Some of the main questions are:

  1. Who should get tested and when?
  2. Why aren’t frontline health workers given tests automatically?
  3. Why aren’t all countries doing everything possible to follow the WHO advice of ‘test, test, test…’?

One probable lesson from this epidemic will be the importance of having a testing strategy in place prior to any future epidemic.

But as well as these questions, which concern the importance of testing on the health of individuals and the ability of society to cope with the epidemic, the issue of testing also imposes limitations on how the spread of the epidemic can be studied from a statistical perspective.

First, all countries have different protocols for testing. In some, anyone can ask to be tested; in others, only hospital patients are tested. It follows, therefore, that when different countries report different numbers of cases, this might be because there are genuinely more cases, or it might be because one country is carrying out more tests than the other. The same difficulty applies in a single country: if the number of cases changes from one day to another, is it because there is genuinely a different number of cases, or because the testing protocol has changed? This all means that when comparing figures across countries or through time, you need to be cautious that any differences might be at least partially due to differences in testing practices.

Another issue concerns the rate of infection. All of the epidemiological models on which government decisions are based require estimates of the rate of transmission of the disease and the proportion of a population that are susceptible to the disease. But if data are only available from the subset of a population that have been tested, and these are the individuals that are more likely to have the infection – they were tested precisely because they were showing symptoms – then we can’t directly estimate these quantities for the population as a whole.
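
As a toy illustration of this problem, here’s a small sketch in which only people showing symptoms get tested. Every number in it – the prevalence, the symptom rates, the population size – is invented purely for the illustration.

```python
import random

random.seed(1)

# Invented numbers for illustration only
population = 100_000
true_prevalence = 0.02          # 2% of people actually infected
p_symptoms_if_infected = 0.5    # half of infections show symptoms
p_symptoms_if_healthy = 0.05    # background rate of similar symptoms

infected = [random.random() < true_prevalence for _ in range(population)]
symptomatic = [
    random.random() < (p_symptoms_if_infected if inf else p_symptoms_if_healthy)
    for inf in infected
]

# Testing protocol: only symptomatic people get tested
tested_total = sum(symptomatic)
tested_positive = sum(inf for inf, symp in zip(infected, symptomatic) if symp)

print(f"True prevalence:                {sum(infected) / population:.1%}")
print(f"Prevalence among people tested: {tested_positive / tested_total:.1%}")
print(f"Infections picked up by tests:  {tested_positive} of {sum(infected)}")
```

The two headline numbers mislead in opposite directions: the count of confirmed cases understates the true number of infections, while the proportion of positive tests overstates the prevalence in the population as a whole.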

Clearly, this isn’t the time to be wasting resources on random testing of individuals: how can we justify wasting resources testing people who are likely to be uninfected when we’re not testing health workers who are much more likely to be infected and for whom knowledge of infection status is crucial? But, fortuitously, there are two important case studies.

The first derives from Vò Euganeo, a small town in Italy close to Padova (where I actually used to live and work). Early on in the Italian epidemic – on 21 February – one person from Vò died as a result of Coronavirus. This led the local government to take two forms of action. The first was to place the town in lockdown, essentially sealed-off from the rest of Italy; the second was to test all 3,300 or so inhabitants of the town for the disease, both immediately and two weeks later. They found:

  1. Of those who tested positive for Coronavirus, somewhere between 50% and 75% were asymptomatic;
  2. The number of daily new infections fell from 88 to 7 over the lockdown period;
  3. The mortality rate among all people infected by the disease, showing symptoms or not, was around 1%.

Each of these findings provides important information. First, there are likely to be many more people infected by Coronavirus than those who end up testing positive: many people are carriers without showing any symptoms. Second, the policy of locking down a region is effective in reducing the number of cases. And third, we get a reliable estimate of the fatality rate among all people infected by the disease. This final point is actually extremely important, as I’ll discuss below.

The second case study comes from the now infamous Diamond Princess cruise ship, which was quarantined off the shores of Japan when a number of passengers were found to be carrying the virus. The situation there is a little different in that there was no real possibility of preventing contagion between passengers. However, all passengers were tested, so we can again get reliable measures of the true spread of the infection – albeit in a closed community where the virus is already present – and of the true mortality rate among infected people. The results in these respects were almost perfectly in line with those from Vò Euganeo: a significant proportion of the passengers tested positive, but were asymptomatic; and after a correction for differences in the age distribution, the mortality rate among infected individuals was around 1%.

The fact that there are a large number of asymptomatic cases in a population has both good and bad consequences. For many people the virus is not just non-fatal or non-serious, it’s not even noticeable. On the negative side, this means there are many potential transmitters of the disease who wouldn’t be isolated in a program which simply encouraged people with symptoms to self-isolate. This is why wider programs of social-distancing are important, even for people who are totally healthy. On the positive side, once these asymptomatic people ‘recover’ from the virus, they will contribute to a potential buffer in the community via the ‘herd immunity’ effect as discussed in an earlier post.

Another important statistical issue derives from the 1% mortality rate. Though of importance in its own right when modelling the epidemic, this value also helps us estimate the spread of the epidemic. As discussed above, there are likely to be wide variations in the reported number of cases from country to country due to differences in testing protocols. However, the number of deaths due to the virus is likely to be better standardised across – and definitely within – countries. Admittedly, some countries may still have different protocols for ascribing cause of death to the virus, but it seems reasonable to assume this effect will be smaller than that caused by differences in testing protocols.

So, rather than using the reported number of cases as an indicator of the true spread of the epidemic, it is likely to be more reliable to take the number of deaths and divide by 1%; or equivalently, multiply the number of deaths by 100. Of course, if someone dies from the virus, there’s a lag between their having tested positive and their death. One published estimate for the average time is around 9 days. It follows that if we take the number of deaths on any particular day, and multiply this number by 100, we get a reasonable estimate of the true number of infected individuals 9 days earlier. For example, in the UK, there were 16 deaths reported due to Coronavirus yesterday (17 March). This would suggest there were around 1600 active cases 9 days earlier on 8 March. But the reported number of active cases on 9 March was just 257. This would imply that due to either testing protocols or the fact that individuals were asymptomatic, only around 1 in 6 of infected individuals were recorded as such. Then, since the size of the epidemic doubles roughly every 6 days – which implies a daily increase of around 12% – we get an estimate of today’s true number of active cases – 10 days after 8 March – by multiplying by 3.1 (which is 1.12 raised to the power 10). This leads to an estimated number of active cases of around 4,900.

Other authors have suggested the assumption of a 9-day lag is not accurate, arguing instead for a lag of 21 days. This would imply 1600 active cases on 25 February, rather than 8 March. Then, rolling the epidemic forwards 22 days to today means multiplying this estimate by 12.1 rather than 3.1, giving an estimated number of active cases equal to roughly 19,400.
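
For anyone who wants to play with these numbers, the whole back-of-envelope calculation fits in a few lines. This simply re-implements the arithmetic above under its stated assumptions – a 1% fatality rate among all infections and 12% daily growth – with the lag as the disputed input.

```python
def estimate_active_cases(deaths_yesterday, fatality_rate=0.01,
                          lag_days=9, daily_growth=1.12):
    """Rough estimate of today's true active cases from yesterday's deaths.

    Assumes yesterday's deaths reflect infections from lag_days before that,
    that a fraction fatality_rate of all infections die, and that the
    epidemic has grown by daily_growth per day since then (the extra +1 day
    rolls forward from yesterday to today)."""
    cases_at_lag = deaths_yesterday / fatality_rate
    return cases_at_lag * daily_growth ** (lag_days + 1)

# UK figure quoted above: 16 deaths reported on 17 March
print(f"9-day lag:  {estimate_active_cases(16, lag_days=9):,.0f}")   # about 4,970
print(f"21-day lag: {estimate_active_cases(16, lag_days=21):,.0f}")  # about 19,400
```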

Ok, this is rough-and-ready, and there are fine details – such as the correct time-lag to use – which require verification and refinement. Nonetheless, the idea forms the basis of a serious approach to the estimation of the extent of the virus that overcomes the problems created by testing protocols. It’s also been verified on South Korean data. In the period there where testing was extensive, the predicted numbers compare very well to the actual numbers; but in the earlier period, where testing was less intensive, the predictions exceed the actual numbers, as is currently the case for the UK numbers.

In summary:

  1. Two case studies on populations that have complete testing suggest that around 1% of all infected individuals will go on to die;
  2. This enables the true number of active cases earlier in time to be estimated: simply take today’s number of deaths and multiply by 100;
  3. Extrapolate this number forward using reasonable assumptions about the epidemic growth rate.

References:

  1. I drew extensively from this article when writing this post.
  2. Fabien.Mauroy@smartodds.co.uk pointed me at the twitter feed of Steve Ilardi who also discusses this issue in the context of estimating epidemic numbers for the United States.

Lockdown

One of the difficulties in determining how to respond to the Coronavirus epidemic is a lack of evidence on which to base decisions. But since some countries – most notably China, South Korea and Italy – are ahead of the UK’s trajectory, there are lessons emerging which could inform decision-makers.

For example: what is the effect of placing areas in a lockdown?

It’s an imperfect analysis, but the following graph shows the trajectory of the total number of cases in two provinces of Lombardia.

The two provinces are broadly similar geographically and demographically, so it’s not totally unreasonable to consider them as equivalent when making comparisons. However, Lodi started with more cases and was placed in a state of lockdown as of 23 February. Bergamo was also placed in lockdown, but considerably later, on 8 March.

Looking at the figure, though the number of cases has grown in both provinces, in Lodi the growth is more or less linear, with signs of levelling off. In Bergamo, the growth appears to have started at an exponential rate, with a change to linear growth soon after the lockdown there, but with a steeper rate than that of Lodi. If it’s fair to make a direct comparison between these two provinces, there’s strong evidence that locking down early has a considerable impact on an epidemic’s growth.

Of course, there are a number of other factors to take account of, some of which may favour Bergamo over Lodi. People’s freedom of movement has been maintained for a longer period in Bergamo, and the effect on the economy is likely to be slighter – at least in the period shown in the graph. And we don’t know what will happen in the future – maybe when things are eventually relaxed the number of cases will grow faster in Lodi than in Bergamo.

Nonetheless, by the strict measure of short-term growth of the epidemic, the evidence here is that an early and comprehensive lockdown is an effective strategy in containing numbers of new cases.


As I’m writing this I’ve just heard Boris Johnson announce a nationwide voluntary restriction on social contact in the UK. Time will tell whether this is stringent enough to get the same braking effect on the epidemic growth as was achieved in Lodi or if – as in Italy nationally from last week – a legally enforceable version of a lockdown will prove necessary.

Andrà tutto bene

There’s been a lot of discussion this weekend about the approach proposed by the UK government for handling the Coronavirus epidemic and how it compares to the approach adopted by most other countries so far. The best explanation I’ve seen of the UK approach is contained in a thread of tweets by Professor Ian Donald of the University of Liverpool. The thread starts here:

A strong counterargument to this approach is given here.

I’m in no position to judge whether the UK approach is less or more risky than that adopted by, say, Italy, who have taken a much more rigorous approach to what has quickly become known as ‘social-distancing’, but which roughly translates as closing down everything that’s non-essential and forcing people to stay at home.

However, there is one essential aspect of the UK strategy which seems a little mysterious, and which I thought I might be able to shed a little light on with some Statistics.

You’re no doubt familiar by now with the term ‘herd immunity’, though the phrase itself seems to have become a bit of a political hot potato. But whatever semantics are used, the basic idea is that once enough people in a population have been infected with the virus and recovered, the remainder of the population is also protected from further epidemic outbreaks. Why should that be so?

It’s nothing to do with virology or biology – antibodies are not passed from the previously infected to the uninfected – but is entirely to do with the statistical properties of epidemiological evolution. I’ll illustrate this with a much simplified version of a true epidemic, though the principles carry over to more realistic epidemiological models.

In a previous post I discussed how the basic development of an epidemic in its initial exponential phase can be described by the following quantities:

  • E: the expected number of people an infected person is exposed to;
  • p: the probability an infected person will infect a person to whom they are exposed;
  • N: the number of people currently infected.

The simplest epidemiological model then assumes that the number of new infections the next day will be

 E \times p \times N

We’ll stick with that, but I want to make a slightly different assumption from that made in the video. In the video, when someone is infected, they remain infected indefinitely, and so are available to make new infections on each subsequent day. Instead, I want to assume here that a person that’s infected remains infected only for one day. After that they either recover and are immune, or, er, something else. But either way, they remain infective only for one day. Obviously, in real life, the truth is somewhere between these two extremes. But for the purposes of this argument it’s convenient to assume the latter.

In this case, if we start with N cases, the expected number of cases the next day is

 E \times p \times N

The next day it’s

 (E \times p)^2 \times N

And after x days it’s

 (E \times p)^x \times N

This means that we still get exponential growth in the number of cases whenever  E \times p is greater than 1; in other words, whenever an infected person will pass the virus on to an average of more than one person. But, critically, if  E \times p is less than 1,  (E \times p)^x \times N approaches zero as x grows and the epidemic dies out.

Here are some simulated trajectories. I’ve assumed we’re already at a point where there are N=1000 cases and that the next day’s observations are a random perturbation around the expected value. First, let’s assume  E \times p =1.05 – so each infected person infects an average of 1.05 other people daily. The following graphs correspond to four different simulated trajectories. If you look at the values of the counts, each of the simulations is quite different due to the random perturbations (which you can’t really see). But in each case, the epidemic grows exponentially.

But now suppose  E \times p =0.95, so each infected individual infects an average of just 0.95 people per day. Again, the following figure shows four different simulations, each again different because of the randomness in the simulations. But now, instead of exponential growth, the epidemic tails off and essentially dies out.
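
Simulations of this type are easy to reproduce. Here’s a rough sketch of the same idea; I’ve used a normal approximation to Poisson noise for the daily perturbation, which is my assumption rather than necessarily what was used for the figures above, but the qualitative behaviour is the same.

```python
import random

random.seed(42)

def simulate_daily_cases(ep, n0=1000, days=60):
    """Each infected person is infectious for one day and infects an average
    of ep others; the realised daily count is a random perturbation
    (approximately Poisson) around the expected value ep * n."""
    n = n0
    trajectory = [n]
    for _ in range(days):
        expected = ep * n
        n = max(0, round(random.gauss(expected, expected ** 0.5)))
        trajectory.append(n)
    return trajectory

for ep in (1.05, 0.95):
    finals = [simulate_daily_cases(ep)[-1] for _ in range(4)]
    print(f"E x p = {ep}: daily cases after 60 days, in 4 runs: {finals}")
```

With  E \times p = 1.05 the counts climb relentlessly; with  E \times p = 0.95 they dwindle towards zero, just as in the figures.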

This is crucially important: when   E \times p is below 1, meaning infected people infect less than one other person on average, the epidemic will just fade away. Now, as discussed in the previous post, changes to hygiene and social behaviour might help in reducing the value of  E \times p, but unless it goes below 1, the epidemic will still grow exponentially.

But, suppose a proportion Q of the population is actually immune to the virus. Then an infected person who meets an average of E people in a day will now actually meet an average of just  E \times (1-Q) people that are not immune. So the average number of people they infect in a day will be  E \times (1-Q)\times p, and as long as  E \times (1-Q)\times p is smaller than 1, the epidemic will tail off.
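
Rearranging the condition  E \times (1-Q)\times p < 1 gives the required immune proportion directly: Q must exceed  1 - 1/(E \times p). Here’s a tiny sketch of that threshold, with purely illustrative values of  E \times p :

```python
def herd_immunity_threshold(e_times_p):
    """Smallest immune proportion Q for which E * (1 - Q) * p < 1,
    i.e. Q > 1 - 1 / (E * p)."""
    return 1 - 1 / e_times_p

# Values of E x p chosen for illustration only
for ep in [1.5, 2.5, 3.0]:
    print(f"E x p = {ep}: epidemic fades once more than "
          f"{herd_immunity_threshold(ep):.0%} of the population is immune")
```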

This is the basis of the idea of ‘herd immunity’. Ensure that a large enough proportion Q of the population is immune, so that the average number of people an infected person is likely to infect is less than 1. This is usually achieved through vaccination programs. By contrast, and in the absence of a vaccine, the stated UK government approach is to achieve a large value of Q by letting the disease spread freely within the sub-population of people who are at low risk of developing complications from the disease, while simultaneously isolating more vulnerable people. So, although many people will get the disease initially – since there is no herd immunity initially – these will be people who are unlikely to require long-term hospital resources. And once a large enough proportion of the non-vulnerable population has been infected, it will then be safe to put the whole population back together again as the more vulnerable people will benefit from the herd immunity generated in the non-vulnerable group.

Can this be achieved? Is it really possible to separate the vulnerable and non-vulnerable sections of the population? And will the spread of the disease through the non-vulnerable sub-population occur at the correct rate: too fast and hospitals can’t cope anyway (some ‘non-vulnerable’ people will still have severe forms of the disease); too slow and the ‘herd immunity’ effect will itself be too slow to protect the vulnerable section once the populations are re-combined. As explained in the thread of tweets above, the government has some control on this rate through social controls such as school closures and so on. But will it all work, especially once you factor in the fact that many non-vulnerable people may well take forms of action that minimise their own risk of catching the virus?

I obviously don’t have answers to these questions. But since I’ve found it difficult myself to understand from the articles I’ve read how ‘herd immunity’ works, I thought this post might at least clarify the basics of that concept.


‘Andrà tutto bene’ translates as ‘everything will be ok’, and has been adopted here in Italy as the slogan of solidarity against the virus. The picture at the top of the page is outside the nursery just up the road from where I live. As you walk around there are similar flags and posters outside many buildings and people’s houses. Feel free to print a copy of my picture and stick it on the office door.

The Dunning-Kruger effect

Thanks to those of you who wrote to say you found the previous post on the Coronavirus useful.

A small caveat though: I really am no expert on this subject. Statistics certainly has a role in understanding and predicting the evolution of the epidemic, but as in most areas of application, it’s in combination with other sciences – in this case epidemiology, virology, medicine and behavioural science – that Statistics will be of service. I know nothing about these subjects, and even the Statistics that I know is not especially geared to this type of problem.

I was thinking about this when I read the following tweet this morning:

Chances are you’ll have heard of Nate Silver either from his work on baseball analytics, or as the founder of FiveThirtyEight, a website providing data analytics for sports and politics.

Anyway, Nate’s tweet seemed quite profound to me. Genuine subject-matter experts are entitled to be precise in their pronouncements about Coronavirus, though it may be that they also express uncertainties about how things will unfold. But you should be wary of comments by people – like myself – who have some tangential skill, but not deep knowledge of the subject. If we pretend to know more than we do, then we’re misleading you. The very best that any commentator can ever do is to frame their comments within the context of their limited knowledge and expertise.

This is really the nature of Statistics. Say as much as you can from what the data tell you; but be open and honest about the limitations of what you can conclude.

So, as I wrote previously, in the interests of sharing knowledge and understanding,  I’ll try to write further posts on statistical aspects of the Coronavirus epidemic. But from the outset, please understand that I am really no expert, and that if my posts ever suggest otherwise you should discount them.

And maybe judge articles by anyone else from an equally critical perspective: is the author a subject-matter expert? If not, are they factoring in their limited knowledge by casting their conclusions with doubts and uncertainties?


It’s a bit of an aside, but reading the replies to Nate’s tweet it seems that the phenomenon of non-experts over-rating their own ability is a well-studied psychological phenomenon known as the Dunning-Kruger effect. I’m no expert (!) but it seems to me that properly done statistical analyses, in which uncertainties are properly accounted for via probabilities, provide a good antidote to this effect.

Be sufficiently worried

Though Smartodds loves Statistics is in hibernation while we work out what direction it should take in the future, the current Coronavirus epidemic raises important questions – many of which are statistical in nature – so I thought I would write some occasional posts specific to this topic.

First off, Richard.Greene@smartodds.co.uk pointed me at this topical video showing how the growth in the number of cases of Coronavirus can be modelled by exponential and logistic curves.

Take a look:

As the video explains – admittedly with numbers that are now a little out of date – while an epidemic is in its exponential growth phase, the daily increase in the number of cases is given by:

 E \times p \times N

where:

  • E is the expected number of people an infected person is exposed to;
  • p is the probability an infected person will infect a person to whom they are exposed;
  • N is the number of people currently infected.

What this means is that the number of new infections is proportional to the number of current infections. So the rate of new infections grows just as the number of current infections grows, and this is what leads to the familiar exponential growth curve.

But crucially, although the growth curve will always be exponential in shape, the precise trajectory is massively affected by the values of E and p, each of which we have some individual control over during the epidemic. In particular:

  • E can be reduced by reducing the number of daily contacts we make;
  • p can be reduced by improving our personal hygiene habits.

So, although this is just a mathematical idealisation of the virus spread, it tells us in material terms what we can do in our day-to-day lives to minimise the growth. And slowing the trajectory of the growth is essential in allowing health systems time to manage the epidemic; in allowing time for possible seasonal effects to kick in – the hope that warmer weather will stunt virus transmission; and in allowing time for the identification and testing of potential vaccines and cures.

The other crucial fact is that in practice the exponential growth phase won’t continue indefinitely. The video discusses the consequences of having a finite population – as the number of infected people increases, an infected person is bound to meet fewer uninfected people – leading to the exponential-type curve flattening into a logistic shape and the eventual termination of the epidemic. But other factors too will halt the exponential growth. These include the development and distribution of a vaccine, and social measures – progressively and aggressively reducing E and p through strict social management – as seen in China, South Korea and, more recently, Italy.
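
Here’s a rough sketch of that flattening effect – pure exponential growth alongside the logistic-type curve you get when new infections are scaled by the fraction of the population not yet infected. The growth rate and population size are illustrative assumptions, not estimates for the current epidemic.

```python
def exponential(growth=1.15, n0=100, days=120):
    """Unchecked exponential growth in total cases."""
    total, out = n0, [n0]
    for _ in range(days):
        total *= growth
        out.append(total)
    return out

def logistic(growth=1.15, n0=100, days=120, population=1_000_000):
    """Same underlying growth rate, but new infections are scaled by the
    fraction of the population not yet infected, so total cases flatten
    into an S-shaped (logistic) curve."""
    total, out = n0, [n0]
    for _ in range(days):
        total += (growth - 1) * total * (1 - total / population)
        out.append(total)
    return out

exp_curve, log_curve = exponential(), logistic()
for day in (30, 60, 90, 120):
    print(f"day {day:3d}: exponential {exp_curve[day]:>16,.0f}   "
          f"logistic {log_curve[day]:>10,.0f}")
```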

So, although exponential growth is alarming – as the video shows, the current data suggest that the number of infections will multiply by 100 every month or so – there are also reasons to be optimistic about the demise of the epidemic. We can slow its progress through the reinforcement of sensible hygiene practices and modifications to our social behaviour, each of which will quicken the progress of the epidemic from the exponential growth phase to the  part of the logistic curve where the rate of growth diminishes and the epidemic peters out.

As the video concludes: be sufficiently worried. In other words, don’t panic, but don’t be complacent either. Good habits will offer you the best personal protection against the virus, while also helping stem the spread worldwide.


I’m no expert on epidemiology, but if you have any questions about the video above in particular, or about statistical aspects of the Coronavirus in general, I will try to answer them.

As the epidemic continues, I’ll also include further posts here from time to time, some linking to interesting articles, others, like this one, trying to explain the role of statistics in understanding the way the epidemic is likely to develop.

Britain’s toughest quiz

A year ago I wrote a post explaining that one of the traditions of the Royal Statistical Society is that every year around Christmas it publishes a quiz that is widely recognised to be one of the toughest out there. The questions are never strictly statistical or mathematical, but they do often require an ability to think laterally and logically, as well as a good general knowledge.

So, in case you’ve nothing better to do over Christmas, this year’s version of the quiz has just been published. Feel free to have a go and submit your answers; otherwise send me your answers and we can submit a team effort. (Teams of up to 5 people are allowed). Don’t worry if you struggle though: my net score prior to last year’s quiz was zero, a value that didn’t change following last year’s quiz.

As a guide to what type of thinking goes into the questions and solutions, here are links to last year’s quiz and solutions.

In any case, happy Christmas and hope you have a great holiday.

 

Santa Claus is coming to town

The substance of this post, including the terrible joke in the finale, is all stolen from here.

Look at this graph. The Santas represent points on the graph, and broadly show that the closer you get to Christmas, the more numerous the sightings of Santa. (Presumably in supermarkets and stores, rather than in grottos and sleighs, but you get the idea).

As discussed in previous posts – here, for example – we can measure the extent to which these two variables are related using the correlation coefficient. If the data lined up perfectly on an increasing straight line, the correlation would be 1. If the variables were completely unrelated, the correlation would be close to zero. (Unlikely to be exactly zero, due to random variation).

For the Santa data, the correlation is probably around 0.95. It’s not quite 1 for two reasons: first there’s a bit of noise around the general trend between the variables; second, the relationship itself looks slightly curved. But anyway, there’s a clear pattern to be observed: as Christmas approaches, the sightings of Santa increase. And this would manifest itself with a correlation coefficient close to 1.
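
For completeness, here’s how a correlation of this kind would be computed on some made-up data in the same spirit. The sighting counts below are invented, so don’t expect exactly 0.95 – but because the relationship is increasing and curved, the coefficient comes out high but below 1, just as described above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented data: days elapsed in December vs Santa sightings that day
days_elapsed = [1, 5, 10, 15, 20, 23, 24]
sightings    = [2, 4,  7, 12, 20, 35, 60]

print(f"correlation: {pearson(days_elapsed, sightings):.2f}")
```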

What’s the effect of this relationship? Well, changing the time period before Christmas – say moving from a month before Christmas to a week before Christmas – will change the number of Santas you’re likely to see. But does it work the other way round? If we dressed a few extra people up as Santa, would it change the number of days left till Christmas? Clearly not. There’s a cause and effect between the two variables in the graphs, but it only works in one direction. The number of days left till Christmas affects the number of Santas you see on the street, but it simply doesn’t work the other way around.

Conclusion:

Correlation doesn’t imply Clausality!

Hohoho.


Footnote: the correct version of this phrase, ‘Correlation doesn’t imply Causality’, was the subject of an earlier post.