Random sampling


A recurrent theme in the COVID-19 posts to this blog is the difficulty in interpreting data and analyses due to the way data are collected. Different countries have completely different protocols for testing for the disease, and these protocols have also changed through time. Even the reported death counts are unreliable as a measure of the disease effects.

In one earlier post I mentioned two case studies where entire – though limited – populations had been tested: Vò, in northern Italy, and the Diamond Princess cruise ship. Since these entire populations were studied, the data are 100% complete, but they are special cases, since they are closed populations where an outbreak of the epidemic is known to have occurred. But what about entire countries? It’s obviously impractical – at least in present circumstances – to test an entire population.

A statistically valid alternative in this case is random sampling – testing individuals randomly selected from the entire population. The proportion testing positive in the sample provides an estimate of the proportion in the entire population, and the bigger the sample, the better the estimate. Obviously there are logistical difficulties in testing genuinely randomly selected individuals, so various practical modifications are often implemented which have to be corrected for in the analysis. But the principle is the same: to use information from a randomly selected sample of individuals to estimate the population level.
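To see the "bigger sample, better estimate" effect in action, here is a minimal simulation sketch. The population size (100,000) and the 1% infection rate are invented purely for illustration and have nothing to do with any real country.

```python
import random

def estimate_prevalence(population, sample_size, rng):
    """Estimate the share of positives from a simple random sample."""
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(sample) / sample_size

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 people, 1% of them positive (1 = positive).
population = [1] * 1_000 + [0] * 99_000

spreads = {}
for n in (100, 1_000, 10_000):
    # Repeat the survey 200 times and see how much the estimates vary.
    estimates = [estimate_prevalence(population, n, rng) for _ in range(200)]
    spreads[n] = max(estimates) - min(estimates)
    print(f"n={n:>6}: estimates span a range of {spreads[n]:.4f}")
```

The spread of the repeated estimates shrinks as the sample size grows, which is the sense in which a bigger sample gives a better estimate.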

A study of this type has now been carried out for Austria. Full details of the analysis can be found here. In summary:

  • 1,544 individuals were included in the study which was carried out in the first week of April;
  • These individuals were identified by a stratification procedure: 249 Austrian districts were randomly selected; households in those districts were randomly selected; individuals within those households were randomly selected;
  • Such individuals were invited to participate in the study – the acceptance rate was 77%;
  • Hospitalised individuals were excluded from the study;
  • Final results were adjusted to correct for various factors including household size, gender and age.
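The selection steps above can be sketched roughly as follows. This is only a toy illustration: the district, household and person names are made up, and picking exactly one person per household is my own simplifying assumption rather than a detail taken from the study.

```python
import random

def multistage_sample(districts, n_districts, n_households, rng):
    """Pick districts, then households within them, then one person per household."""
    chosen = []
    for district in rng.sample(sorted(districts), n_districts):
        households = districts[district]
        n_pick = min(n_households, len(households))
        for household in rng.sample(sorted(households), n_pick):
            # Assumption: a single randomly chosen member represents the household.
            person = rng.choice(households[household])
            chosen.append((district, household, person))
    return chosen

rng = random.Random(1)

# Hypothetical data: 10 districts, 20 households each, 1 to 5 people per household.
districts = {
    f"district_{d}": {
        f"household_{d}_{h}": [
            f"person_{d}_{h}_{p}" for p in range(rng.randint(1, 5))
        ]
        for h in range(20)
    }
    for d in range(10)
}

sample = multistage_sample(districts, n_districts=3, n_households=4, rng=rng)
print(len(sample), "individuals selected")
```

In a scheme like this, someone in a five-person household has a smaller chance of being picked than someone living alone, which is one reason an adjustment for household size can be needed.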

The conclusion, after the correction for age, gender and other effects, is that the COVID-19 infection rate in the sample was 0.33%.

Now, bearing in mind that one solution to the epidemic is that a large proportion of the population acquires the disease, so building a ‘herd immunity’, the figure of 0.33% is disappointingly small. Even allowing for sampling error, the true value in the population is predicted in the study to be at most 0.76%, whereas it’s thought that herd immunity will require around 60-80% of the population to have been infected.
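To give a feel for where an upper limit like 0.76% can come from, here is a rough sketch of a 95% interval for a proportion. It uses the Wilson score interval, which is my choice for illustration and not necessarily the method used in the study, and it treats the 1,544 individuals as a simple unweighted random sample, ignoring the study's corrections.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Figures quoted above: 0.33% positive among 1,544 individuals.
low, high = wilson_interval(0.0033, 1544)
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```

Under these simplifying assumptions the upper end of the interval lands close to the 0.76% quoted above, still far below anything like 60-80%.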

However, this figure of 0.33% is just a snapshot in time of people who currently have the virus; it doesn’t say anything about the proportion of people who have had the virus – perhaps asymptomatically – and recovered. That figure, which is the figure of interest when discussing herd immunity, is bound to be bigger. But it’s impossible from this study to say by how much.

Moreover, extrapolating the 0.33% to the entire population of Austria would imply around 28,500 positive cases. By contrast, the number of active cases in Austria (as of today, 11 April) is recorded as 6,608.


So, even as low an estimate as 0.33% for the countrywide infection rate implies roughly four times as many current cases as the official numbers record.
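The arithmetic behind that comparison is simple enough to check directly. The sketch below uses only the figures quoted in this post; scaling the 28,500 up to the 0.76% upper limit is my own back-of-envelope extension, not a number reported by the study.

```python
estimated_cases = 28_500   # extrapolated from the 0.33% sample rate
recorded_active = 6_608    # officially recorded active cases, 11 April

# How many times larger is the survey-based estimate than the official count?
ratio = estimated_cases / recorded_active
print(f"estimated / recorded = {ratio:.1f}")

# Rough extension (assumption): rescale to the study's upper limit of 0.76%.
upper_cases = estimated_cases * 0.76 / 0.33
print(f"cases implied by the 0.76% upper limit: about {upper_cases:,.0f}")
```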

