Test, test, test…

The testing of individuals for COVID-19 has become an urgent and sensitive issue. Some of the main questions are:

  1. Who should get tested and when?
  2. Why aren’t frontline health workers given tests automatically?
  3. Why aren’t all countries doing everything possible to follow the WHO advice of ‘test, test, test…’?

One probable lesson from this epidemic will be the importance of having a testing strategy in place prior to any future epidemic.

But as well as these questions, which concern the importance of testing on the health of individuals and the ability of society to cope with the epidemic, the issue of testing also imposes limitations on how the spread of the epidemic can be studied from a statistical perspective.

First, all countries have different protocols for testing. In some, anyone can ask to be tested; in others, only hospital patients are tested. It follows, therefore, that when different countries report different numbers of cases, this might be because there are genuinely more cases, or it might be because one country is carrying out more tests than the other. The same difficulty applies in a single country: if the number of cases changes from one day to another, is it because there is genuinely a different number of cases, or because the testing protocol has changed? This all means that when comparing figures across countries or through time, you need to be aware that any differences might be at least partially due to differences in testing practices.

Another issue concerns the rate of infection. All of the epidemiological models on which government decisions are based require estimates of the rate of transmission of the disease and the proportion of a population that are susceptible to the disease. But if data are only available from the subset of a population that have been tested, and these are the individuals that are more likely to have the infection – they were tested precisely because they were showing symptoms – then we can’t directly estimate these quantities for the population as a whole.

Clearly, this isn’t the time to be wasting resources on random testing of individuals: how can we justify wasting resources testing people who are likely to be uninfected when we’re not testing health workers who are much more likely to be infected and for whom knowledge of infection status is crucial? But, fortuitously, there are two important case studies.

The first derives from Vò Euganeo, a small town in Italy close to Padova (where I actually used to live and work). Early on in the Italian epidemic – on 21 February – one person from Vò died as a result of Coronavirus. This led the local government to take two forms of action. The first was to place the town in lockdown, essentially sealed off from the rest of Italy; the second was to test all 3,300 or so inhabitants of the town for the disease, both immediately and two weeks later. They found:

  1. Somewhere between 50% and 75% of those who tested positive for Coronavirus were asymptomatic;
  2. The number of daily new infections fell from 88 to 7 over the lockdown period;
  3. The mortality rate among all people infected by the disease, showing symptoms or not, was around 1%.

Each of these findings provides important information. First, there are likely to be many more people infected by Coronavirus than those who end up testing positive: many people are carriers without showing any symptoms. Second, the policy of locking down a region is effective in reducing the number of cases. And third, we get a reliable estimate of the fatality rate among all people infected by the disease. This final point is actually extremely important, as I’ll discuss below.

The second case study comes from the now infamous Diamond Princess cruise ship, which was quarantined off the shores of Japan when a number of passengers were found to be carrying the virus. The situation there is a little different in that there was no real possibility of preventing contagion between passengers. However, all passengers were tested, so we can again get reliable measures of the true spread of the infection – albeit in a closed community where the virus is already present – and of the true mortality rate among infected people. The results in these respects were almost perfectly in line with those from Vò Euganeo: a significant proportion of the passengers tested positive, but were asymptomatic; and after a correction for differences in the age distribution, the mortality rate among infected individuals was around 1%.

The fact that there are a large number of asymptomatic cases in a population has good and bad consequences. First, the virus is not just non-fatal or non-serious for a large number of people, it’s also not even noticeable. But this itself has good and bad consequences. On the negative side, this means there are many potential transmitters of the disease that wouldn’t be isolated in a program which simply encouraged people with symptoms to self-isolate. This is why wider programs of social-distancing are important, even for people who are totally healthy. On the positive side, once these asymptomatic people ‘recover’ from the virus, they will contribute to a potential buffer in the community via the ‘herd immunity’ effect as discussed in an earlier post.

Another important statistical issue derives from the 1% mortality rate. Though of importance in its own right when modelling the epidemic, this value also helps us estimate the spread of the epidemic. As discussed above, there are likely to be wide variations in the reported number of cases from country to country due to differences in testing protocols. However, the number of deaths due to the virus is likely to be better standardised across – and definitely within – countries. Admittedly, some countries may still have different protocols for ascribing cause of death to the virus, but it seems reasonable to assume this effect will be smaller than that caused by differences in testing protocols.

So, rather than using the reported number of cases as an indicator of the true spread of the epidemic, it is likely to be more reliable to take the number of deaths and divide by 1%; or equivalently, multiply the number of deaths by 100. Of course, if someone dies from the virus, there’s a lag between their having tested positive and their death. One published estimate for the average time is around 9 days. It follows that if we take the number of deaths on any particular day, and multiply this number by 100, we get a reasonable estimate of the true number of infected individuals 9 days earlier. For example, in the UK, there were 16 deaths reported due to Coronavirus yesterday (17 March). This would suggest there were around 1,600 active cases 9 days earlier, on 8 March. But the reported number of active cases on 8 March was just 257. This would imply that, due to either testing protocols or the fact that individuals were asymptomatic, only around 1 in 6 infected individuals were recorded as such. Then, since the size of the epidemic doubles roughly every 6 days – which implies a daily increase of around 12% – we get an estimate of today’s true number of active cases – 10 days after 8 March – by multiplying by 3.1 (which is 1.12 multiplied by itself 10 times). This leads to an estimated number of active cases of around 4,900.

Other authors have suggested that the assumption of a 9-day lag is not accurate, arguing instead for a lag of 21 days. This would imply 1,600 active cases on 25 February, rather than 8 March. Then, rolling the epidemic forwards 22 days to today means multiplying this estimate by 12.1 rather than 3.1, giving an estimated number of active cases of roughly 19,400.
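The arithmetic above can be sketched in a few lines of Python. All the numbers are taken from the text – a ~1% infection fatality rate, ~12% daily growth (doubling every 6 days), 16 reported deaths, and a lag of either 9 or 21 days – and the function name is just for illustration.

```python
# A minimal sketch of the back-of-envelope estimate described above.
# Assumptions, all from the text: infection fatality rate ~1%, daily
# growth ~12% (doubling every 6 days), deaths reported one day ago.

def estimate_active_cases(deaths, lag_days, ifr=0.01, daily_growth=1.12):
    """Infer today's active cases from yesterday's reported deaths."""
    cases_at_lag = deaths / ifr        # e.g. 16 / 0.01 = 1,600 cases lag_days ago
    days_to_today = lag_days + 1       # deaths were reported yesterday
    return cases_at_lag * daily_growth ** days_to_today

for lag in (9, 21):
    print(f"lag {lag:2d} days: ~{estimate_active_cases(16, lag):,.0f} active cases")
```

With a 9-day lag this reproduces the ~4,900 figure; with a 21-day lag, roughly 19,400.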

OK, this is rough and ready, and there are fine details – such as the correct time lag to use – which require verification and refinement. Nonetheless, the idea forms the basis of a serious approach to estimating the extent of the virus that overcomes the problems created by testing protocols. It’s also been verified on data from South Korea. In the period where testing was extensive, the predicted numbers compare very well with the actual numbers; but in the earlier period, when testing was less intensive, the predictions exceed the actual numbers, as is currently the case for the UK numbers.

In summary:

  1. Two case studies on populations that have complete testing suggest that around 1% of all infected individuals will go on to die;
  2. This enables the true number of active cases earlier in time to be estimated: simply take today’s number of deaths and multiply by 100;
  3. Extrapolate this number forward using reasonable assumptions about the epidemic growth rate.


  1. I drew extensively from this article when writing this post.
  2. Fabien.Mauroy@smartodds.co.uk pointed me at the twitter feed of Steve Ilardi who also discusses this issue in the context of estimating epidemic numbers for the United States.

Be sufficiently worried

Though Smartodds loves Statistics is in hibernation while we work out what direction it should take in the future, the current Coronavirus epidemic raises important questions – many of which are statistical in nature – so I thought I’d write some occasional posts on this topic.

First off, Richard.Greene@smartodds.co.uk pointed me at this topical video showing how the growth in the number of cases of Coronavirus can be modelled by exponential and logistic curves.

Take a look:

As the video explains – admittedly with numbers that are now a little out of date – while an epidemic is in its exponential growth phase, the daily increase in the number of cases is given by:

  E × p × N

where:
  • E is the expected number of people an infected person is exposed to;
  • p is the probability an infected person will infect a person to which they are exposed;
  • N is the number of people currently infected.

What this means is that the number of new infections is proportional to the number of current infections. So the rate of new infections grows just as the number of infections grows, and this is what leads to the familiar exponential growth curve.
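As a quick illustration, this growth rule can be simulated directly; the values of E and p below are purely illustrative, chosen so that E × p gives roughly 12% daily growth.

```python
# A small simulation of the growth rule above: each day the number of new
# infections is E * p * N, so N grows by a constant factor (1 + E*p).
# E and p are illustrative values, not estimates from real data.

def simulate(E, p, n0, days):
    """Return daily infection counts under pure exponential growth."""
    counts = [n0]
    for _ in range(days):
        new_infections = E * p * counts[-1]
        counts.append(counts[-1] + new_infections)
    return counts

trajectory = simulate(E=5, p=0.024, n0=100, days=30)   # E*p = 0.12
print(f"day 30: {trajectory[-1]:.0f} infections")      # roughly 100 * 1.12**30
```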

But crucially, although the growth curve will always be exponential in shape, the precise trajectory is massively affected by the values of E and p, each of which we have some individual control over during the epidemic. In particular:

  • E can be reduced by reducing the number of daily contacts we make;
  • p can be reduced by improving our personal hygiene habits.

So, although this is just a mathematical idealisation of the virus spread, it tells us in material terms what we can do in our day-to-day lives to minimise the growth. And slowing the trajectory of the growth is essential in allowing health systems time to manage the epidemic; in allowing time for possible seasonal effects to kick in – the hope that warmer weather will stunt virus transmission; and in allowing time for the identification and testing of potential vaccines and cures.

The other crucial fact is that in practice the exponential growth phase won’t continue indefinitely. The video discusses the consequences of having a finite population – as the number of infected people increases, an infected person is bound to meet fewer uninfected people – leading to the exponential-type curve flattening into a logistic shape and the eventual termination of the epidemic. But other factors too will halt the exponential growth. These include the development and distribution of a vaccine, and socio-demographic measures – progressively and aggressively altering E and p through strict social management – as seen in China, South Korea and, more recently, Italy.
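The finite-population correction the video describes can be sketched by scaling the daily increase by the fraction of the population still susceptible; again, all parameter values here are illustrative only.

```python
# A sketch of the logistic modification: in a finite population, the chance
# that a contact is still susceptible shrinks as the number infected grows,
# which flattens the exponential curve. Illustrative parameter values.

def simulate_logistic(E, p, n0, population, days):
    """Daily infection counts when growth is damped by the susceptible fraction."""
    counts = [n0]
    for _ in range(days):
        n = counts[-1]
        susceptible_fraction = 1 - n / population
        counts.append(n + E * p * n * susceptible_fraction)
    return counts

curve = simulate_logistic(E=5, p=0.024, n0=100, population=1_000_000, days=200)
# Early on the growth is ~12% per day, indistinguishable from exponential;
# eventually the curve levels off as the pool of susceptible people empties.
```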

So, although exponential growth is alarming – as the video shows, the current data suggest that the number of infections will multiply by 100 every month or so – there are also reasons to be optimistic about the demise of the epidemic. We can slow its progress through the reinforcement of sensible hygiene practices and modifications to our social behaviour, each of which will quicken the progress of the epidemic from the exponential growth phase to the part of the logistic curve where the rate of growth diminishes and the epidemic peters out.

As the video concludes: be sufficiently worried. In other words, don’t panic, but don’t be complacent either. Good habits will offer you the best personal protection against the virus, while also helping stem the spread worldwide.

I’m no expert on epidemiology, but if you have any questions about the video above in particular, or about statistical aspects of the Coronavirus in general, I will try to answer them.

As the epidemic continues, I’ll also include further posts here from time to time, some linking to interesting articles, others, like this one, trying to explain the role of statistics in understanding the way the epidemic is likely to develop.

Time to kill?


Smartodds loves Statistics would like to remind you that the clocks go back an hour this weekend.

You probably heard that the EU is planning to end the practice of switching between ‘summer’ and ‘winter’ times, in which clocks are artificially moved back and forward by an hour at the end of October and March respectively. The rationale for this procedure of so-called daylight saving is closely linked to historical social, agricultural and industrial demands on energy supplies, but what was relevant a century ago, when the practice was first devised, is rather less relevant today.

Some media stories also suggest that putting an end to daylight saving is rather more urgent. For example: “Daylight Savings Time Literally Kills People“. Or even more dramatically: “Why Daylight Saving Time will Kill us All“.

In part there is some basis to these stories. Messing slightly with people’s regular sleep patterns can induce extra tiredness, and there is some evidence that over an entire population this can lead to an increase in the number of driving-related and other accidental deaths. The effect is very slight though, and really says more about the effect of sleep-deprivation on accidental deaths than it does about daylight saving per se.

Rather more surprising and intriguing, though, is an apparent increase in the rate of heart attacks on the day after clocks go forward an hour in March, with a similar decrease on the day after they go back in October. A recent study published in the British Medical Journal found that there was a 24% increase in patients presenting with acute heart attacks on the day after clocks go forward, and a 21% decrease on the day after clocks go back. This was based on a study of many patients over several years, so the differences are too big to have occurred just by chance. So what’s going on? Does daylight saving give people heart attacks?

Well, the first thing a statistician will do is look for other factors which might explain the results. For example:

  1. Since clocks always change early on a Sunday morning, are Sunday, or maybe Monday, generally different from other days of the week in terms of heart attack rates, regardless of the clock change effect?
  2. Are there more heart attacks generally at some times of the year compared to others?

The answer to both these questions is yes, but in the analysis reported by the BMJ both of these effects, and others, were accounted for, so the unusual increases and decreases following daylight saving time changes are after such allowances have been made. So again, what’s going on? Does moving the clocks induce heart attacks?

Well, not really. When the researchers of the BMJ study counted the number of patients attending hospital with heart attacks within the entire week following a change in daylight saving, rather than just the next day, they found no difference at all following the time change in March or October. Perhaps for physiological or social reasons, heart attacks appear to be slightly delayed – on average – after the change in October, and sped up after the change in March. So if you look only at the days immediately following the change, it does look like the change itself is changing the rate of heart attacks. But over a slightly longer window of a week or so, there’s no evidence of a change at all.
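A toy example makes the displacement effect concrete. These are invented numbers, not the BMJ data: suppose 100 heart attacks occur per day, and the clock change simply shifts some events from the Sunday onto the Monday.

```python
# Toy illustration (invented numbers, not the BMJ data) of how a pure
# displacement effect can mimic a day-after spike with no weekly change.

baseline = [100] * 7           # Mon..Sun counts in an ordinary week
shifted = baseline.copy()
shifted[0] += 24               # Monday after the change: +24%, as in the study...
shifted[6] -= 24               # ...borrowed from the Sunday just before

day_after_increase = (shifted[0] - baseline[0]) / baseline[0]
weekly_change = sum(shifted) - sum(baseline)
print(f"day-after increase: {day_after_increase:.0%}")  # 24%
print(f"change over the whole week: {weekly_change}")   # 0
```

The day-after rate jumps by 24%, yet the weekly total is unchanged: exactly the pattern the study found.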

In summary, moving the clocks forward or backwards won’t induce anyone to have a heart attack who wasn’t going to have one anyway; the change might just cause someone’s heart attack to occur slightly earlier or later in the same week.

There seem to be two useful messages from this:

  1. As with Simpson’s paradox, we see the danger of simply carrying out a statistical analysis without taking into account the context. Testing the daily data for whether there is a change in heart attack rates when clocks are changed suggests there is an effect. But understanding the context of the problem and looking at the data over a slightly longer timespan indicates that there is no real change.
  2. The media are often just interested in a good story, and won’t let concerns about the quality of a statistical analysis get in the way of that.

I stole most of this material from Matt Parker, who describes himself as a standup mathematician. (I know!) Anyway, if you’re interested, here’s his take on the issue:

Simpson’s paradox de-paradoxed


In an earlier post I gave two examples of Simpson’s paradox. The first example concerned the success rate of two different procedures for the removal of kidney stones; the second concerned the batting averages of two baseball players. In both cases there seemed to be a contradiction in the data depending on how the data were analysed. I’d like now to try to explain this phenomenon.

Though the context is slightly different and I’ve actually just invented the data, you can perhaps see what might be happening from the following picture. These are (fictional) ratings of a number of sportspeople plotted against their age. What conclusions would you draw?


It’s a noisy picture, but the general pattern seems to be that rating improves with age. Indeed, if I use standard statistical procedures to estimate the general trend in this picture I get the green line as shown below:


This confirms a general tendency for rating to improve with age (notwithstanding some variation around the general trend). But suppose I now tell you that actually these data correspond to several observations through time on just two players. In the plot below, I’ve coloured the observations separately for the two players. What do you conclude now?


You’d probably conclude that the blue player is more highly rated than the red player, but for both players ratings reduce with age. This is again confirmed with formal estimates of trend lines for each of the players:


So, when looking at the two players separately, age causes ratings to go down for each player. But aggregating the player data, as we did in the first plot, leads to the misleading conclusion that age tends to result in increased ratings. It does for the aggregated data if we ignore the player information, but the more likely explanation for that is that the older of the two players in our data just happens to be a much better player, and that the real effect of age is a reduction in ratings for both players.

This is Simpson’s paradox: by ignoring a hidden variable (in this case the player identifier) we get a misleading picture of the relationship between the original variables (rating and age). Sure, ratings increase with age, but only because the older player had much higher ratings overall. Looking separately at each player, ratings go down with age.
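The invented data above can be reproduced with a short sketch: two players whose individual ratings decline with age, but whose pooled data show a rising trend because the older player is simply better. The rating formulas and noise level are my own illustrative choices.

```python
# Sketch of Simpson's paradox with invented ratings-vs-age data: each
# player's ratings fall by ~0.8 per year, but the older player starts
# much higher, so the pooled trend points upwards.

import random

def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

random.seed(1)
young = [(age, 60 - 0.8 * age + random.gauss(0, 2)) for age in range(20, 30)]
older = [(age, 95 - 0.8 * age + random.gauss(0, 2)) for age in range(30, 40)]

for label, data in [("young player", young), ("older player", older),
                    ("pooled", young + older)]:
    xs, ys = zip(*data)
    print(f"{label}: slope {ols_slope(xs, ys):+.2f}")
```

Each player’s slope comes out negative, while the pooled slope is firmly positive.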

A version of this same phenomenon occurs in each of the examples from the previous post.

  1. If you look back at the kidney stones data, doctors tended to give treatment A to patients with the more severe disease (larger kidney stones). This reduces the success rate for treatment A; not because it’s a less effective treatment, but because it’s being used on patients whose condition is more severe. Indeed, it looked like treatment B was best from the aggregate data. But the true story emerges from the original tables: treatment A is best for all patients.
  2. Simpson’s paradox arises in the baseball example because of the large differences in the number of appearances at the plate per year for the two batters. Derek Jeter has far more appearances in 1996 than 1995; for David Justice it is the reverse. This means that the aggregate batting average for Jeter is close to his ’96 value, while for Justice it is his ’95 value. Moreover both players had better averages in ’96 compared to ’95. Consequently, the overall averages favour Jeter, who had most of his appearances in ’96, the year in which the averages were higher. Yet even in ’96 Justice’s average was higher; it’s just that it was based on relatively few appearances. Clearly, Derek Jeter was the better batter over the entire period, despite the quirk of having a lower average than David Justice in both years.

What’s especially interesting from these two examples is that the ‘correct’ resolution of the paradox is completely different in the two cases. For the medical example, taking the experimental situation into account, the non-aggregate interpretation is best: treatment A was best for both types of kidney stone and should be preferred, even though treatment B had the highest overall success rate. But with the baseball data, Derek Jeter was the superior batter since he had the highest overall average, even though his average was beaten by that of David Justice in both years.

The moral is that Statistics is bound to be a more intricate process than that of simple number crunching. Here we had two different situations which led to the same phenomenon of Simpson’s paradox. But in one case an understanding of the experimental setting supports a non-aggregated solution; in the other the aggregate solution is best. Context is everything: treat data as if they are numbers without context and there’s a very good chance you’ll draw entirely the wrong conclusions.

Harry.Hill@smartodds.co.uk pointed me to a gif that illustrates Simpson’s paradox in much the same way as my non-animated graphs above. I’m not sure this is exactly the gif Harry suggested, but the gist is much the same. So if you prefer your Simpson’s paradox explanations all-dancing and in Technicolor, here you go:



You probably get the idea by now. Looking at just the raw data (the black dots before the animation starts) there is a strong downward trend (shown by the red line once you start the video). But if you let the video roll you’ll see that different groups of the data belong to different individuals, as indicated by the different colours. The trend line for every one of those individuals is positive, even though the overall trend was distinctly negative.


Simpson’s paradox


Here’s a fictional conversation from Match of the Day:

Lineker: United have picked up just 3 points from their opening 3 games. How many years is it since United had such a terrible start to a season?

Shearer: Ooh, that’s one for the statisticians.

It’s fictional because I don’t think it actually occurred. But it’s real in the sense that it’s a typical conversation reflecting a commonly-held view about the importance of statistics and the role of statisticians in a sporting context. I want to debunk this point of view, and one of my aims in this blog is to show that statistics has a much more important role in the study of sports, above and beyond dredging through the history books to identify periods of bad United results.

In this post we’ll look at Simpson’s paradox. It’s a simple and unsettling phenomenon that arises in many different situations, and provides an illustration of why Statistics is more than just summarising data. We’ll look at two real-life examples (both taken from Wikipedia).

The first set of data come from a medical trial into the success rates of procedures for the removal of kidney stones. The study compared two available procedures, labelled A and B respectively, and analysed the results separately for both small and large kidney stones.

The success rates for a sample of patients with small kidney stones are given in the following table.

Small Stones    Treatment A    Treatment B
Success Rate    81/87 = 93%    234/270 = 87%

So, for example, 87 patients were given treatment A, and in 81 of these cases the treatment was deemed successful. This corresponds to a success rate of 93%. Similarly, the success rate for the 270 patients given treatment B was 87%.

For patients with a large kidney stone, the success rates using treatments A and B are summarised in the same way in the following table:

Large Stones    Treatment A    Treatment B
Success Rate    192/263 = 73%  55/80 = 69%

As is clear from the tables, for patients with either small or large kidney stones, treatment A has a higher success rate than treatment B, and if you were a doctor having to decide which treatment to offer to a patient, all other things being equal you’d surely choose treatment A for both types of patient.

But suppose we group all the patients together, simply adding the data from the previous tables, and then calculate the success rates with either treatment. This results in the following table (check for yourselves):

All Stones      Treatment A    Treatment B
Success Rate    273/350 = 78%  289/350 = 83%

Remarkably, from exactly the same data, treatment B now has a higher success rate than treatment A!

This is Simpson’s paradox. Having just the information from the combined table, a doctor would recommend Treatment B. But having the two separate tables for small and large kidney stones, a doctor would recommend Treatment A for both types of patient. It seems to defy all reasonable logic.
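The tables above can be checked directly with a few lines of code, computing the per-group and pooled success rates from the raw counts given in the text:

```python
# Reproducing the kidney-stone tables: per-group success rates favour
# treatment A, while the pooled rates favour treatment B.

data = {  # (successes, patients)
    "small": {"A": (81, 87),   "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

for group, treatments in data.items():
    for t, (s, n) in treatments.items():
        print(f"{group} stones, treatment {t}: {s}/{n} = {s / n:.0%}")

for t in ("A", "B"):
    s = sum(data[g][t][0] for g in data)
    n = sum(data[g][t][1] for g in data)
    print(f"all stones, treatment {t}: {s}/{n} = {s / n:.0%}")
```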

I’ll leave you to think about (or to Google) this example for a few days. I’ll then post again with some discussion.

But first here’s another example, this time in a sporting context. In baseball the standard measure of a batter’s performance is their batting average: roughly speaking, the proportion of times they make a successful hit from an appearance at the plate. The following tables compare the batting averages of two particular batters in 1995 and 1996 respectively:

1995             Derek Jeter      David Justice
Batting Average  12/48 = 25%      104/411 = 25.3%

1996             Derek Jeter      David Justice
Batting Average  183/582 = 31.4%  45/140 = 32.1%

So, for example, in 1995 Derek Jeter made 48 appearances at the plate and made 12 hits, leading to a batting average of 25%. And in the same year David Justice recorded a batting average of 25.3%. Indeed, comparing the averages in both tables, David Justice recorded a higher batting average than Derek Jeter in both 1995 and 1996.

But, if we combine the data from the two tables to get the results for the entire period 1995-96 and re-calculate the averages, we get the following:

1995-96          Derek Jeter      David Justice
Batting Average  195/630 = 31%    149/551 = 27%

We see Simpson’s paradox again. Derek Jeter has a higher batting average over the entire period even though David Justice had the superior average in each of the 2 seasons. So who was the better batter?
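The same check works for the batting data, using the hit and at-bat counts from the tables above:

```python
# Reproducing the batting tables: Justice has the higher average in each
# season, but Jeter has the higher average over the combined period.

jeter   = {"1995": (12, 48),   "1996": (183, 582)}
justice = {"1995": (104, 411), "1996": (45, 140)}

def average(seasons):
    """Combined batting average over all seasons: total hits / total appearances."""
    hits = sum(h for h, _ in seasons.values())
    appearances = sum(n for _, n in seasons.values())
    return hits / appearances

for year in ("1995", "1996"):
    jh, jn = jeter[year]
    dh, dn = justice[year]
    print(f"{year}: Jeter {jh / jn:.3f}, Justice {dh / dn:.3f}")

print(f"1995-96: Jeter {average(jeter):.3f}, Justice {average(justice):.3f}")
```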

Like I say, I’ll leave this here for a while and discuss again later. Feel free to add something in the comments section if you’d like to discuss or ask questions.

One final thing: although I’ll save discussion of this paradox till another post, I will say that it doesn’t arise just out of chance. I mean, it’s not just a quirk of having too few data and that if we had bigger sample sizes it would all just go away. It’s a genuine – and rather disturbing – phenomenon, and can only be resolved by a deeper understanding of statistics than the arithmetic analysis provided above.

Footnote. Here you go Alan: at the time of writing (after 3 games) this is Man United’s worst start to a season since the 92/93 season when they lost their opening 2 games to Sheffield United and Everton, and drew their third against Ipswich. Terrible. But they did go on to win the league that season by 10 points!