Copyright (C) CC-BY - 2020 Boud Roukema This file is available under the Creative Commons Attribution licence Licence URL: https://creativecommons.org/licenses/by/4.0/
Suppose that a government agency announces that on 10 successive days during a pandemic, the numbers of daily new infections were:
169, 169, 169, 169, 169, 169, 169, 169, 169, 169 (Example 1)
This would look highly suspicious. People do not choose to get infected in an orderly way. Medical testing stations cannot decide to publish exactly the same number of positive (confirmed infection) test results every day. The task of health agency administrative staff should be to verify that the data from the testing stations are authentic, and to add them up from around the country. There should be some randomness in the numbers.
Suppose instead that the numbers of daily new infections were:
145, 150, 155, 160, 165, 170, 175, 180, 185, 190 (Example 2)
This would again look odd. Try plotting these against the numbers from 1 to 10 for the days, and you'll see a perfect straight line. Again, there is no random noise.
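For readers who would rather check this numerically than plot it, here is a minimal sketch in plain Python. It fits a least-squares straight line to the Example 2 counts and prints how far each count lies from that line. The helper name fit_line is just an illustrative choice.

```python
# Least-squares straight line through the Example 2 counts, in plain Python.
days = list(range(1, 11))
counts = [145, 150, 155, 160, 165, 170, 175, 180, 185, 190]  # Example 2


def fit_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(days, counts)
# Deviation of each daily count from the fitted line.
residuals = [c - (intercept + slope * d) for d, c in zip(days, counts)]

print(f"slope = {slope:.3f} per day, intercept = {intercept:.3f}")
print("largest deviation from the line:", max(abs(r) for r in residuals))
```

The largest deviation is zero, apart from floating-point rounding: every count lies exactly on the line.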
Now suppose that someone adds in a tiny bit of noise by hand, and the daily infection counts are:
145, 150, 156, 163, 167, 170, 175, 182, 185, 190 (Example 3)
This already has a tiny bit of randomness added. But is Example 3 what is really expected statistically? Is this enough noise to be realistic? Is this the right sort of noise - randomness - to look similar to the counts from other countries around the world? Can people get infected by SARS-CoV-2 and have their positive test results officially counted in a similar way to a military march, with everyone (almost) perfectly in step?
The reality is that natural data have many different statistical properties - properties of randomness. The paper "Anti-clustering in the national SARS-CoV-2 daily infection counts" looks at just one statistical property of the national SARS-CoV-2 counts. Example 3 has too little noise compared to that expected from the "Poisson distribution", for which the typical scatter in a count is about the square root of the count: roughly 13 for counts near 170, while the counts in Example 3 stay within 2 or 3 of a straight line. Most countries have more noise than for a Poisson distribution, which is already more than in Example 3. Generally, the countries with more infections have much noisier data.
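The claim about Example 3 can be checked roughly with the sketch below. It is not the statistical method of the paper, only a back-of-the-envelope comparison, and it assumes that the underlying trend is a straight line. It measures the scatter of the Example 3 counts about a least-squares line and compares it with the square root of the mean count, which is the standard deviation expected for Poisson counts. The names fit_line, observed_scatter and poisson_scatter are illustrative choices.

```python
import math

days = list(range(1, 11))
counts = [145, 150, 156, 163, 167, 170, 175, 182, 185, 190]  # Example 3


def fit_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(days, counts)
residuals = [c - (intercept + slope * d) for d, c in zip(days, counts)]

# Scatter about the trend; the fit uses up two degrees of freedom.
observed_scatter = math.sqrt(sum(r * r for r in residuals) / (len(counts) - 2))

# Poisson counts with mean mu have a standard deviation of sqrt(mu).
poisson_scatter = math.sqrt(sum(counts) / len(counts))

print(f"scatter of Example 3 about its trend: {observed_scatter:.1f}")
print(f"scatter expected for Poisson noise:   {poisson_scatter:.1f}")
```

Under these assumptions, the scatter in Example 3 comes out at roughly a tenth of the Poisson expectation.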
Likely explanations for the extra noise include super-spreader events, which happen on some days and not on others; the somewhat random number of laboratories that report their tests on any particular day to the city, regional or other sub-national coordinator; and the somewhat random number of sub-national coordinators who report their data to the national coordinator. It's not surprising that bigger countries, with more infections, tend to have more noise.
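The effect of patchy reporting can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not a model taken from the paper. It assumes 10 laboratories, each with Poisson-distributed daily positives averaging 20, and each reporting on a given day with probability 0.85. Results from a laboratory that does not report are simply dropped, which is a simplification. All of these numbers are hypothetical, chosen so that the national mean is near 170 per day. The Poisson sampler is Knuth's multiplication method, used here because the Python standard library has no Poisson generator.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def variance_to_mean(samples):
    """Ratio of sample variance to sample mean; about 1 for Poisson counts."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return variance / mean


# Hypothetical toy parameters (assumptions, not taken from the paper).
N_LABS = 10
LAB_MEAN = 20.0
REPORT_PROB = 0.85
N_DAYS = 5000

rng = random.Random(2020)  # fixed seed so that the run is repeatable

# Pure Poisson national counts with the same mean as the toy model.
pure = [poisson_sample(rng, N_LABS * LAB_MEAN * REPORT_PROB) for _ in range(N_DAYS)]

# Toy model: a laboratory's results are counted only if it reports that day.
patchy = [
    sum(poisson_sample(rng, LAB_MEAN) for _ in range(N_LABS) if rng.random() < REPORT_PROB)
    for _ in range(N_DAYS)
]

pure_ratio = variance_to_mean(pure)
patchy_ratio = variance_to_mean(patchy)
print(f"variance/mean, pure Poisson:     {pure_ratio:.2f}")
print(f"variance/mean, patchy reporting: {patchy_ratio:.2f}")
```

In this toy model the pure Poisson counts have a variance-to-mean ratio close to 1, while the patchy-reporting counts have a ratio several times larger, even though both have the same mean.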
A few countries have count sequences whose noise properties look more like those of Example 3 than like those of typical countries. This is difficult to explain.