Test, test, test…

The testing of individuals for COVID-19 has become an urgent and sensitive issue. Some of the main questions are:

  1. Who should get tested and when?
  2. Why aren’t frontline health workers given tests automatically?
  3. Why aren’t all countries doing everything possible to follow the WHO advice of ‘test, test, test…’?

One probable lesson from this epidemic will be the importance of having a testing strategy in place before any future epidemic.

But as well as these questions, which concern the importance of testing on the health of individuals and the ability of society to cope with the epidemic, the issue of testing also imposes limitations on how the spread of the epidemic can be studied from a statistical perspective.

First, all countries have different protocols for testing. In some, anyone can ask to be tested; in others, only hospital patients are tested. It follows that when different countries report different numbers of cases, this might be because there are genuinely more cases in one of them, or it might be because one country is carrying out more tests than the other. The same difficulty applies within a single country: if the number of cases changes from one day to another, is it because there is genuinely a different number of cases, or because the testing protocol has changed? This all means that when comparing figures across countries or through time, you need to bear in mind that any differences might be at least partially due to differences in testing practices.

Another issue concerns the rate of infection. All of the epidemiological models on which government decisions are based require estimates of the rate of transmission of the disease and the proportion of a population that are susceptible to the disease. But if data are only available from the subset of a population that have been tested, and these are the individuals that are more likely to have the infection – they were tested precisely because they were showing symptoms – then we can’t directly estimate these quantities for the population as a whole.
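To make the selection problem concrete, here is a toy calculation. Every number in it is hypothetical and chosen only for illustration (none comes from real data), and it assumes a perfectly accurate test given only to people with symptoms.

```python
# All values below are hypothetical, chosen only to illustrate the bias.
population = 100_000
true_prevalence = 0.02       # assumed share of the population that is infected
symptomatic_share = 0.4      # assumed share of infected people who show symptoms
background_symptoms = 0.01   # assumed share of uninfected people with similar symptoms

infected = population * true_prevalence
uninfected = population - infected

# Toy protocol: only people showing symptoms are tested (test assumed perfect).
tested_infected = infected * symptomatic_share
tested_uninfected = uninfected * background_symptoms

# Share of tests that come back positive, versus the true share infected.
positive_rate_among_tested = tested_infected / (tested_infected + tested_uninfected)
# Share of all infections that the protocol actually records.
detected_fraction = tested_infected / infected

print(f"True prevalence:            {true_prevalence:.1%}")
print(f"Positive rate among tested: {positive_rate_among_tested:.1%}")
print(f"Infections detected:        {detected_fraction:.1%}")
```

Under these made-up inputs, the positive rate among tested people is far higher than the true prevalence, while the count of recorded cases misses most infections. Neither figure can stand in for the population-wide quantity.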

Clearly, this isn’t the time to be wasting resources on random testing of individuals: how can we justify wasting resources testing people who are likely to be uninfected when we’re not testing health workers who are much more likely to be infected and for whom knowledge of infection status is crucial? But, fortuitously, there are two important case studies.

The first derives from a small town, Vò Euganeo, in Italy, close to Padova (where I actually used to live and work). Early on in the Italian epidemic – on 21 February – one person from Vò died as a result of Coronavirus. This led the local government to take two forms of action. The first was to place the town in lockdown, essentially sealed off from the rest of Italy; the second was to test all 3,300 or so inhabitants of the town for the disease, both immediately and two weeks later. They found:

  1. Somewhere between 50% and 75% of those who tested positive for Coronavirus were asymptomatic;
  2. The number of daily new infections fell from 88 to 7 over the lockdown period;
  3. The mortality rate among all people infected by the disease, showing symptoms or not, was around 1%.

Each of these data provides important information. First, there are likely to be many more people infected by Coronavirus than those who end up testing positive: many people are carriers without showing any symptoms. Second, the policy of locking down a region is effective in reducing the number of cases. And third, we get a reliable estimate of the fatality rate among all people infected by the disease. This final point is actually extremely important, as I’ll discuss below.

The second case study comes from the now-infamous Diamond Princess cruise ship, which was quarantined off the shores of Japan when a number of passengers were found to be carrying the virus. The situation there is a little different in that there was no real possibility of preventing contagion between passengers. However, all passengers were tested, so we can again get reliable measures of the true spread of the infection – albeit in a closed community where the virus is already present – and of the true mortality rate among infected people. The results in these respects were almost perfectly in line with those from Vò Euganeo: a significant proportion of the passengers tested positive, but were asymptomatic; and after a correction for differences in the age distribution, the mortality rate among infected individuals was around 1%.

The large number of asymptomatic cases in a population means that, for many people, the virus is not just non-fatal or non-serious, it’s not even noticeable. This has good and bad consequences. On the negative side, it means there are many potential transmitters of the disease who wouldn’t be isolated in a program which simply encouraged people with symptoms to self-isolate. This is why wider programs of social-distancing are important, even for people who are totally healthy. On the positive side, once these asymptomatic people ‘recover’ from the virus, they will contribute to a potential buffer in the community via the ‘herd immunity’ effect, as discussed in an earlier post.

Another important statistical issue derives from the 1% mortality rate. Though of importance in its own right when modelling the epidemic, this value also helps us estimate the spread of the epidemic. As discussed above, there are likely to be wide variations in the reported number of cases from country to country due to differences in testing protocols. However, the number of deaths due to the virus is likely to be better standardised across – and definitely within – countries. Admittedly, some countries may still have different protocols for ascribing cause of death to the virus, but it seems reasonable to assume this effect will be smaller than that caused by differences in testing protocols.

So, rather than using the reported number of cases as an indicator of the true spread of the epidemic, it is likely to be more reliable to take the number of deaths and divide by 1%; or equivalently, multiply the number of deaths by 100. Of course, if someone dies from the virus, there’s a lag between their having tested positive and their death. One published estimate for the average time is around 9 days. It follows that if we take the number of deaths on any particular day, and multiply this number by 100, we get a reasonable estimate of the true number of infected individuals 9 days earlier. For example, in the UK, there were 16 deaths reported due to Coronavirus yesterday (17 March). This would suggest there were around 1,600 active cases 9 days earlier on 8 March. But the reported number of active cases on 9 March was just 257. This would imply that due to either testing protocols or the fact that individuals were asymptomatic, only around 1 in 6 of infected individuals were recorded as such. Then, since the size of the epidemic doubles roughly every 6 days – which implies a daily increase of around 12% – we get an estimate of today’s true number of active cases – 10 days after 8 March – by multiplying by 3.1 (which is 1.12 raised to the power of 10). This leads to an estimated number of active cases of around 5,000.
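The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the back-of-envelope calculation: the function name and parameter names are mine, and the default values are the assumptions used in this post (1% fatality rate, 12% daily growth, 10 days rolled forward).

```python
def estimate_active_cases(deaths, fatality_rate=0.01, daily_growth=1.12, days_forward=10):
    """Rough estimate of true case numbers from a single day's death count.

    Returns (cases at the lagged date, cases rolled forward to today).
    The defaults are the assumptions used in the post, not fitted values.
    """
    # Deaths divided by the fatality rate estimates the cases one lag earlier.
    cases_at_lag = deaths / fatality_rate
    # Compound the assumed daily growth to roll that estimate forward.
    cases_today = cases_at_lag * daily_growth ** days_forward
    return cases_at_lag, cases_today


# UK figures quoted in the post: 16 deaths reported on 17 March.
cases_8_march, cases_today = estimate_active_cases(deaths=16)

# 257 is the reported number of active cases quoted in the post.
people_per_recorded_case = cases_8_march / 257

print(f"Estimated cases on 8 March: {cases_8_march:.0f}")
print(f"Estimated cases today:      {cases_today:.0f}")
print(f"Roughly 1 in {people_per_recorded_case:.1f} infections recorded")
```

Keeping the fatality rate, growth rate and roll-forward period as arguments makes it easy to see how sensitive the final figure is to each assumption.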

Other authors have suggested the assumption of a 9-day lag is not accurate, arguing instead for a lag of 21 days. This would imply 1,600 active cases on 25 February, rather than 8 March. Then, rolling the epidemic forwards 22 days to today means multiplying this estimate by 12.1 (1.12 raised to the power of 22) rather than 3.1, giving an estimated number of active cases equal to roughly 19,400.
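As a check on the dates and multipliers under the two lag assumptions, here is a small sketch using Python's standard `datetime` module. The helper name `project` and the constants are mine; the year 2020 is not stated in the post and is filled in here as an assumption.

```python
from datetime import date, timedelta

DEATHS = 16            # UK deaths reported on 17 March (from the post)
FATALITY_RATE = 0.01   # assumed 1% fatality rate among all infected
DAILY_GROWTH = 1.12    # assumed 12% daily growth (doubling roughly every 6 days)

death_report_day = date(2020, 3, 17)  # year assumed, not stated in the post
today = date(2020, 3, 18)


def project(lag_days):
    """Return (date of the lagged estimate, days rolled forward, estimate for today)."""
    estimate_day = death_report_day - timedelta(days=lag_days)
    days_forward = (today - estimate_day).days
    cases_then = DEATHS / FATALITY_RATE
    return estimate_day, days_forward, cases_then * DAILY_GROWTH ** days_forward


for lag in (9, 21):
    day, n_days, cases = project(lag)
    print(f"lag {lag:2d} days: 1,600 cases on {day:%d %B}, "
          f"rolled forward {n_days} days gives about {cases:,.0f}")
```

Moving from a 9-day to a 21-day lag multiplies the final estimate by about four, which shows how much the result depends on this one assumption.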

Ok, this is rough-and-ready, and there are fine details – such as the correct time-lag to use – which require verification and refinement. Nonetheless, the idea forms the basis of a serious approach to the estimation of the extent of the virus that overcomes the problems created by testing protocols. It’s also been verified on South Korean data. In the period there when testing was extensive, the predicted numbers compare very well to the actual numbers; but in the earlier period, when testing was less intensive, the predictions exceed the actual numbers, as is currently the case for the UK numbers.

In summary:

  1. Two case studies on populations that have complete testing suggest that around 1% of all infected individuals will go on to die;
  2. This enables the true number of active cases at an earlier point in time to be estimated: simply take today’s number of deaths and multiply by 100;
  3. Extrapolate this number forward using reasonable assumptions about the epidemic growth rate.
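The three steps can be written as a single formula. The notation is my own shorthand rather than anything standard: D is the number of deaths reported on a given day, f the fatality rate among all infected people (around 0.01), g the assumed daily growth factor (around 1.12), ℓ the lag in days between a case being active and death, and k the number of days between the death report and today.

```latex
\hat{C}_{\text{today}} \;=\; \frac{D}{f} \times g^{\,\ell + k}
% UK example from the post: D = 16, f = 0.01, g = 1.12, \ell = 9, k = 1
% \hat{C}_{\text{today}} = 1600 \times 1.12^{10} \approx 1600 \times 3.1 \approx 5000
```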


  1. I drew extensively from this article when writing this post.
  2. Fabien.Mauroy@smartodds.co.uk pointed me at the Twitter feed of Steve Ilardi, who also discusses this issue in the context of estimating epidemic numbers for the United States.
