I had hoped to avoid writing a blog post on corona, but one key piece that is informing public policy and driving the popular debate simply does not match the underlying data that researchers and doctors are finding around the world. That key question comes down to how fast corona is spreading and, arguably more importantly, how widely it has already spread beyond the observed confirmed cases.
Let us start with a simple example that sets the starting point for what we are doing here. Let’s set the start date for corona as November 15, 2019 and say that the first patient recorded in the Chinese medical system with a case of corona was the absolute first case. According to public Chinese records the start date is accurate, but this person would have been contagious before being discovered. It is also likely there were other unreported cases that helped it spread much wider and faster than our restrictive assumption allows, but our goal here is to be very conservative in the numbers we report. So on November 15, 2019 corona started spreading in the population from a single patient. The question then becomes: how fast and how wide did it spread in Wuhan, in China, and in the rest of the world?
Now the beauty of these numbers is that I do not have to prove the current estimates are wildly wrong, which they do appear to be, only that they are not perfectly accurate. Let me give you an example. If we assume that the first patient was in Wuhan on November 15 and began infecting people at what we will assume is a fixed rate, the daily rate of growth through Friday March 27, 2020 based upon confirmed cases would have been 10.51%. In other words, if corona cases grow every day from November 15, 2019 through March 27, 2020 by 10.5%, the numbers would grow from 1 to the 596,000 confirmed cases. The confirmed observed cases would then account for the entirety of corona transmission vectors.
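As a rough check on that figure, here is a minimal sketch of the calculation. It assumes, as above, a single case on November 15, 2019, a constant daily growth rate, and 596,000 confirmed cases on March 27, 2020; the variable names are mine and purely illustrative.

```python
from datetime import date

start = date(2019, 11, 15)   # assumed date of the first case
end = date(2020, 3, 27)      # date of the confirmed-case count
confirmed = 596_000          # global confirmed cases quoted above

days = (end - start).days    # number of days of compounding

# Solve 1 * (1 + r) ** days == confirmed for the daily rate r.
daily_rate = confirmed ** (1 / days) - 1

print(f"{days} days of growth")
print(f"implied daily growth rate: {daily_rate:.2%}")
```

The result lands at roughly 10.5% a day, which is the baseline used in the rest of the post.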
Does this match what the research shows us empirically? No. In this case there are two important assumptions being made. First, that the assumed growth rate matches the real growth rate. Second, that all cases are observed confirmed cases. In reality both are false and we know both are false. What is important for our purposes here is that they do not even need to diverge enormously for the end results to be significantly different. To take, again, a simple example: let’s assume corona is really growing at 11% daily instead of 10.5% daily. How big of a difference does that make? Now the real number of corona infections is not 596,000 but 1.1 million, all because corona grew just a little faster than we expected. 12% becomes 3.5 million and 15% becomes 118 million, with real infections outpacing observed infections by a factor of 200.
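The sensitivity to small changes in the growth rate can be sketched in a few lines. This assumes the same 133 days of constant compounding from a single case; the function name `infections` is just an illustrative label.

```python
DAYS = 133            # November 15, 2019 to March 27, 2020
CONFIRMED = 596_000   # confirmed cases quoted above


def infections(daily_rate, days=DAYS):
    """Cases after `days` of constant compounding from a single case."""
    return (1 + daily_rate) ** days


for rate in (0.11, 0.12, 0.15):
    total = infections(rate)
    ratio = total / CONFIRMED
    print(f"{rate:.0%} daily: {total:,.0f} infections, {ratio:.1f}x confirmed cases")
```

Because the rate is compounded 133 times, half a percentage point per day is enough to nearly double the final count.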
So now we need to focus on two specific questions: how fast is corona actually growing or spreading within the population and how closely do the observed and confirmed cases match what we know about real case spread in the population? Let me reemphasize: I do not even need to show that these two numbers are wildly wrong only that they are even a little wrong. We can easily show with empirical data that both of the popular conceptions are not just a little wrong but very wrong.
Let us start with how rapidly corona spreads. As we can show with simple math, assuming all corona cases are observed confirmed cases, corona grew at a daily rate of 10.5%. However, according to widely cited public sources, the daily growth rate in many countries is well above this.
Arguably the most repeated phrase in corona publicity is that it is growing at 30% daily. At the outset, we can discard this number as entirely unrealistic. How do we know? If that number were true of the virus from when it became known on November 15, it would have already infected 1.4 quadrillion people. However, this number must also be false even if we just focus on the United States. Again, how do we know? Assume the virus entered the US on January 1, 2020, a roughly accurate assumption based upon existing evidence, and grew at 30% daily: it would have already infected 6.3 billion Americans. In other words, it would have infected every American man, woman, and child 19 times. However, neither can it be the globally observed number of 10.5%. If it entered America on January 1 and grew at that rate, this would result in fewer than 6,000 corona cases, well below the number the US has already confirmed. So the question becomes: how fast is corona spreading?
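These bounds are easy to reproduce. The sketch below assumes constant compounding from a single case through March 27, 2020, and uses a rough US population of 330 million, which is my own round figure rather than a number from the sources.

```python
from datetime import date

END = date(2020, 3, 27)
US_POPULATION = 330_000_000   # rough round figure, an assumption


def compounded(daily_rate, start, end=END):
    """Cases on `end` if one case on `start` grows at a constant daily rate."""
    return (1 + daily_rate) ** (end - start).days


world_30 = compounded(0.30, date(2019, 11, 15))   # 30% daily since November 15
us_30 = compounded(0.30, date(2020, 1, 1))        # 30% daily since January 1
us_105 = compounded(0.105, date(2020, 1, 1))      # 10.5% daily since January 1

print(f"world at 30% daily: {world_30:.2e}")
print(f"US at 30% daily:    {us_30:.2e} ({us_30 / US_POPULATION:.0f}x the population)")
print(f"US at 10.5% daily:  {us_105:,.0f}")
```

The true rate has to sit somewhere between these two extremes, which is why the empirical estimates matter.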
Before we look at what research from lots of countries is producing, let us point out that many theoretical studies are using the official World Health Organization number. This number typically appears in modeling papers as an R0 between 2.0 and 2.5. I will save the notes about how poorly the WHO has handled things for another time, but note here that papers like the Ferguson paper rely on these numbers. The Ferguson paper utilizes an R0 ranging from 2.0 to 2.6. A lower R0 says that the disease is spreading more slowly than a high R0 would imply, and is more consistent with observed confirmed cases equaling the total number of real cases. A higher R0 means corona is spreading faster and implies higher numbers of unobserved cases. Given the WHO use of these numbers, many researchers can be forgiven for relying on them. However, is the official WHO R0 accurate?
In China, researchers generally put the R0 distinctly higher than the WHO range of 2.0-2.5. One literature review of R0 estimates for China found “estimates ranged from 1.4 to 6.49, with a mean of 3.28, a median of 2.79.” One study from researchers at the Yale School of Public Health, focusing specifically on Wuhan in the December 1 to January 1 time frame, estimated an R0 of 4.1. Another paper focusing on China, using a mathematical model later used by researchers at Harvard, found an R0 of 3.3 and noted a daily growth rate of 0.3. There are a lot of studies that either directly or indirectly estimate this number, but there are not many studies from China supporting the WHO claim of R0.
What about other countries? In Iran, one team of researchers estimated the pre-lockdown R0 at 4.7, though dropping to under 2.0 afterwards. Another team focusing on Iran, from Georgia Southern University, estimated the R0 at 3.6. Another team from Iran studying Iran estimated the pre-lockdown R0 at 4.9 in the first week of infection, dropping to 4.5 and 4.3 before entering lockdown, where it dropped to 2.1. What is notable in this case is that even with movement restrictions in place the R0 for corona still remained within the WHO R0 range.
In Italy, Chinese researchers estimated an R0 of 4.1 compared with their estimate of 3.15 for Wuhan. Other Chinese researchers comparing South Korea and Italy found R0’s of 2.6-3.2 and 2.6-3.3 respectively. An Italian mathematician using standard disease modeling software found an R0 of 3.5. One researcher doing a cross country study estimated Italy at 3.8.
What is notable, however, is what we see in countries that appear to have been doing well. In South Korea, numerous pieces of research found examples of “super-spreader” clusters with R0 around 4, with the rest of the country on heightened alert having a very low R0 of around 1. One researcher estimated that the total country R0 was 3.8. In the case of Singapore and Tianjin, China, some Canadian researchers found that even with significantly more restrictive measures the R0 was 1.97 and 1.87 respectively, just barely below the WHO standard range but within the margin of error.
There are a few specific notable takeaways from this brief literature review. First, this research was all done using very standard methods. Second, like any research area there will be a range, but there is striking similarity in arriving at R0 numbers that are materially different from the WHO range. Third, even in some countries held up as examples, significant steps appeared to only push the R0 just below the WHO range and not even below 1, the level at which an outbreak starts to shrink. Fourth, events are unfolding so fast that we do not yet have research on other countries like Germany, Spain, and the United States. Fifth, these findings indicate that the disease spread much faster than the WHO range implies. This is important because, as has been noted earlier, to demonstrate the broader point I only need to demonstrate that corona grew a little faster than believed. The weight of evidence clearly demonstrates that corona grew much faster than is believed. This has a major and material impact on public policy.
This leads to the second important question: what percentage of real corona cases is captured by the observed and confirmed case counts people watch on the internet? Importantly for our purposes, there are many factors that would lead us to miss corona cases. In sum, these add up to a large percentage of missed cases. Let us detail some of the problems.
First, the speed is wrong, and the speed tells us how many cases to look for. The WHO estimates have guided numerous theoretical models which are relying on faulty assumptions. Think of it another way. Imagine you are flying a plane and you only have directions to fly in a specific direction at 500mph. Unbeknownst to you, however, you have actually been flying at 1,000mph. So when you look at your watch and the directions say you should start to see the airport, you cannot find it. We can say with a pretty high degree of certainty that, based upon empirical country by country analysis, the spread is much broader than WHO guidelines advise.
Second, testing in most parts of the world, especially China and Asia, is significantly flawed with high numbers of false negative tests. Research from China finds false negative tests on 30-60% of positive patients, with some studies going higher. Let us use a low estimate and say 25% of infected people who got tested for corona tested negative when they were actually positive, and that each in turn infected one other person at some point. Both are low estimates but will work for our simple example. In an additional attempt to be conservative, I will do this only for China. This gives us an additional 27,000 cases just from false negatives, and if we assume they each infected one person, this gives us a total of 54,000 new unknown cases. How big an impact would that be? That increases global cases by 9.1%.
Third, we know that numerous key countries record cases differently, leading to a large undercount. Leaving aside political interference, we know that China for instance states they do not count asymptomatic or only mild cases, even refusing to test them. Mild and asymptomatic carriers are still transmitters of the disease even if they do not require significant care, hospitalization, or ICU access. According to places that have taken broad quasi random or population tests, asymptomatic and mild cases of corona are the large majority of all cases. For our purposes, we will again undercount this population and assume them to be 50%. So if China records 82,000 moderate to severe symptomatic cases, this will imply an asymptomatic or mild population of 82,000. We make the same follow up assumption: each of these people then infected one additional person. This gives us an additional population of 164,000 cases not counted in official data. We now have 218,000 new cases just from testing errors and how data is counted. This would be equal to 37% of total global cases. However, there are other channels through which corona cases have been missed.
Fourth, another class of people missed in the total count is the infected who never get tested or never consider getting tested, for various reasons. This could range from having symptoms so mild that one has no reason to get tested to having no symptoms at all. This is actually quite common. Just as a simple baseline, of the roughly 40 million estimated flu cases in the United States every year only about 40-45% result in medical visits. Another way to think of these people is as people who did not even know they were infected. While we would be justified in setting this number as equal to confirmed cases, we want to be conservative, so I will set it equal to one third of confirmed cases across some major countries. Let’s take a low round number of 200,000 untested cases based upon 600,000 global cases. This now raises our missing count to 418,000 against 600,000 total global confirmed cases, though that number rises daily.
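To make the running tally easier to follow, here is a small sketch of the arithmetic in the last three points. It uses only the numbers quoted above. Reading the 25% false negative rate as "confirmed cases are the other 75% of tested infections" is my interpretation of how the 27,000 figure arises, and the variable names are illustrative.

```python
CHINA_CONFIRMED = 82_000     # China's confirmed cases, as quoted above
GLOBAL_CONFIRMED = 596_000   # global confirmed cases, as quoted above

# False negatives: if 25% of infected people who were tested came back
# negative, the confirmed cases are the remaining 75%. Rounded to the
# nearest thousand to match the 27,000 used in the text.
false_negatives = round(CHINA_CONFIRMED / 0.75 - CHINA_CONFIRMED, -3)
from_false_negatives = 2 * false_negatives   # each infects one more person

# Mild or asymptomatic cases assumed to be 50% of all cases, so equal in
# number to the confirmed cases; each again infects one more person.
from_uncounted_mild = 2 * CHINA_CONFIRMED

# People who never sought a test: the round number chosen in the text.
never_tested = 200_000

steps = [
    ("false negatives", from_false_negatives),
    ("uncounted mild cases", from_uncounted_mild),
    ("never tested", never_tested),
]

running_total = 0
for label, extra in steps:
    running_total += extra
    share = running_total / GLOBAL_CONFIRMED
    print(f"{label}: +{extra:,.0f}, running total {running_total:,.0f} "
          f"({share:.1%} of confirmed)")
```

Each step only adds to the count of missed cases, so loosening any one of the conservative assumptions pushes the total higher.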
It cannot be emphasized enough, the numbers I have added to confirmed observed cases are very conservative based upon highly restrictive assumptions. What I have done here however, is used theory and anecdote to build a very simple and conservative model of how many are missing. The question is do we actually see evidence based upon documentation and research of the missing cases I am arguing are out there? In short, absolutely AND in much higher numbers than I am conservatively estimating.
We know of a few places, and I am absolutely not excluding others only that these are the ones I know about, where we have either quasi random or broad population testing. These tests are not perfectly random but they give us clues about how broadly spread throughout the population corona is beyond the observations being taken at hospitals and medical centers or by self selected individuals.
Iceland has taken a broad population sample that now comprises nearly 5% of the entire population. There is not much detail about the specific testing criteria, but the reports are that it was designed to test for corona in the general population and not merely among the sick at a hospital or medical clinic. According to Icelandic data available from the government, they recently listed 963 confirmed cases for a positive rate of 6.6%. Sounds bad, right? Actually, because of the testing criteria, most people did not even know they had corona because they did not feel any worse. Out of the 963 cases only 19 needed hospitalization and only 6 needed ICU care. In other words, while there have been some tragic outcomes, corona is already much more widespread and it has no or only mild impact on most people who tested positive.
In Holland, a similar thing happened. Hospitals in the Netherlands started seeing reports of corona and had a couple of staff members fall ill from it. To make sure their staff was protected, they conducted broad testing of medical staff to see how prevalent corona was among front line personnel. According to their data, after testing 1,353 people they found 86 positive tests for corona, a rate of (tell me if this sounds familiar) 6.4%. Only half had a fever, and most positive cases continued working because they felt no different or their symptoms were so mild. Notably, under existing criteria 40% of positive cases would not even have been tested because they did not have a known risk factor.
Westchester County in New York implemented an aggressive testing plan. Their tests were largely of self-selected individuals who needed to pass a screening protocol, so it is different from Iceland and Holland in key ways. However, they have tested 29,000 people. What have they found? They registered 7,187 positive results for a 25% positive rate. This large jump in positive rate is expected from self-selected individuals who pass a screening protocol. So what is the severity? 73 hospitalized and 12 deaths in the county. In other words, among positive cases in Westchester County, hospitalization is running at 1% and deaths at roughly 0.17%.
There is another case of broad population testing. The first resident of Vo, Italy died from corona in late February. Local officials, being aware of corona virus spreading in China, opted with the help of a local university to quarantine and test the entire town of roughly 4,000. Now it is worth noting that Vo is a small remote town in north eastern Italy. It is not one of the major cities which would have lots of international traffic. 89 people out of 3,300, or roughly 3% of the population, tested positive after they discovered one death. There is no additional data, but they do not report additional tests and actually advised people against going to the hospital unless they had severe problems. Testing two weeks later revealed positive tests had dropped to roughly 0.5%.
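For readers who want to check the rates in these four cases, here is a minimal sketch that recomputes them from the counts as quoted. Iceland's total number of tests is not given above, so only its severity figures are included. Note that 12 deaths out of 7,187 positives works out to about 0.17%.

```python
def rate(part, whole):
    """Share of `whole` made up by `part`."""
    return part / whole


# (positives, people tested) as quoted in the text
positives_and_tests = {
    "Netherlands hospital staff": (86, 1_353),
    "Westchester County": (7_187, 29_000),
    "Vo, Italy": (89, 3_300),
}

for place, (positive, tested) in positives_and_tests.items():
    print(f"{place}: {rate(positive, tested):.1%} positive")

# Severity among people who tested positive
iceland_hospitalized = rate(19, 963)
iceland_icu = rate(6, 963)
westchester_hospitalized = rate(73, 7_187)
westchester_deaths = rate(12, 7_187)

print(f"Iceland: {iceland_hospitalized:.1%} hospitalized, {iceland_icu:.1%} in ICU")
print(f"Westchester: {westchester_hospitalized:.1%} hospitalized, "
      f"{westchester_deaths:.2%} deaths")
```

The point of the comparison is the denominator: the broader the testing, the smaller the share of positives that end up in a hospital.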
What we have seen in each of these four cases is very broad infection levels, well beyond the expected infection levels, and very low levels of severity relative to the total number of cases. In other words, if we factor in the unobserved cases that are being found, this dramatically changes our view of the broad severity of corona. What has been taking place until now is observations being recorded at hospitals of sick people needing medical care. This gives us a biased view of disease severity. This is like going to a car crash and then making a decision about all vehicular traffic. What is potentially most notable is that the number of real cases of infection relative to observed cases of infection is not off by a percentage but by many multiples. In other words, the unobserved real transmitters are not merely another 50% on top of observed cases but many multiples of them.
However, are these four cases with seemingly reliable data from different places an outlier or are they consistent with research from other places? In fact, they are very consistent with research from other countries about corona spread.
One paper by American researchers conducting a cross country study with anchor countries estimated that Chinese confirmed cases were under estimated by a factor of nearly 10. If remotely accurate that would more than double existing cases globally. Mathematicians at the university involved in testing Vo wrote a paper estimating that globally undetected cases are four times larger than detected cases. What makes these asymptomatic carriers so difficult is that they may not even know they were exposed and are disease carriers. Other Italian researchers reached similar conclusions writing “our model predicts that when the first 3 infected cases had been identified by Italian authorities there were already nearly 30 cases in Italy…only 0.5% cases had been detected and confirmed by Italian authorities. While official statistics had 132 confirmed case we believe a more accurate estimate would be closer to 26000.” This is important because it reaches general conclusions in line with the Vo population sampling. UK researchers using different methods reach similar conclusions finding that “by the time a single death occurs, hundreds to thousands of cases are likely to be present in that population.” Even statisticians from the Italian National Institute of Statistics have completed research finding that corona is likely at least ten times more wide spread than official cases.
A key aspect here is how the disease spread from China in the early stages when they were censoring information. One cross country study from China, the UK, and the US, using a combination of unique datasets, estimates that undetected cases in China account for 86% of all corona cases. Another study found that though Wuhan was the epicenter, for every one case exported internationally from Wuhan other major Chinese cities exported 2.9 cases that “likely remained undetected”. Japanese researchers focusing on Wuhan, re-estimating real infection rates (rather than official Chinese rates) with asymptomatic transmission included, estimate that real severity was much lower, with an actual time delayed fatality rate of 0.12%.
One of the key areas is how much this under detection of asymptomatic travelers led to disease spread. A research paper by researchers at Peking University and Cedars-Sinai Hospital in Los Angeles estimated there were roughly 10,000 asymptomatic carriers who flew from Wuhan and China to the United States as of March 1, 2020. Researchers at Harvard found most countries had low capability to detect asymptomatic travelers, setting their detection rate at 38% of Singapore’s and noting that “estimates of case counts in Wuhan based on assumptions of perfect detection in travelers may be under estimate by several fold, and severity correspondingly over estimated by several fold.” Another similar study found that even under the best of circumstances more than 50% of carriers would not be detected because they would not have symptoms, and many would not even know they had been exposed. It is worth noting that research generally finds that disease detection for corona is very hard given the prevalence of asymptomatic carriers.
In short, not only is the real number of cases under estimated based upon observed confirmation data, it is likely enormously under estimated. As one can see from the research by in country researchers from Iran to Italy and Japan, there is a range of magnitude, but there is significant agreement that the real number is off by at least numerous multiples.
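The studies above state their results in different ways (a multiple of confirmed cases, a ratio of undetected to detected, a percentage undetected), so it can help to put them on one scale. The sketch below converts the figures quoted above into the share of real cases that was actually detected. The helper names are mine, and each line reflects only my reading of the quoted number, not the papers' own calculations.

```python
def share_from_multiple(real_over_confirmed):
    """Detected share when real cases are N times the confirmed cases."""
    return 1 / real_over_confirmed


def share_from_undetected_ratio(undetected_per_detected):
    """Detected share when undetected cases are N times detected cases."""
    return 1 / (1 + undetected_per_detected)


detected_share = {
    "China undercounted by a factor of 10": share_from_multiple(10),
    "Undetected cases four times detected": share_from_undetected_ratio(4),
    "Italy: 132 confirmed vs 26,000 estimated": 132 / 26_000,
    "China: 86% of cases undetected": 1 - 0.86,
}

for study, share in detected_share.items():
    print(f"{study}: {share:.1%} of real cases detected")
```

On this common scale every estimate puts the detected share at 20% or less.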
This post has already gone on too long but I wanted to document the logic and the research that supports what I think is happening. The two primary key ideas so far are that corona has spread significantly faster than the WHO estimates and that there are large amounts of undetected cases of corona. So what is the importance of all of this and what does this mean for policy makers?
First, track and trace is pointless. If your confirmed cases account for only 50% of your threat vectors you would have a hard time constraining the spread. If your confirmed cases account for 10% of your threat vectors, track and trace is little more than a waste of time. Undetected transmission cases are simply too numerous and widespread to make track and trace effective.
Second, case severity across the infected population is actually much, much lower and by one account on par with flu outcomes. Current numbers widely used in the press grossly distort the reality by measuring data at known severe outcome locations. This is like measuring driving safety at a hospital. The overwhelming number of observations will be severe cases. When looking at the population wide data, case severity appears radically different. This is not to say corona is only the flu, as we do not have the depth of data to say that, only that outcomes measured against the full infected population are decidedly less severe than they first appear.
Third, the two primary points here actually dovetail nicely. For the R0 to be higher than the WHO guidelines, we would need a significantly larger infected population than the confirmed counts show. We do find a significantly larger infected population, which is consistent with a significantly higher R0. The results are complementary.
Fourth, the most widely cited Ferguson paper therefore relies on factually incorrect assumptions by assuming a much lower R0 that is only reached according to one paper under lockdown and by assuming there are very few undetected cases. We can say with a high degree of certainty both assumptions are wrong and enormously impact the scientific validity of the paper.
Fifth, governments would be well advised to focus mitigation efforts on less intrusive and disruptive behavior rather than near total economic shut downs. Other countries have focused on factors like increased mask wearing to help slow transmission. Put another way, more targeted precision policies rather than the broad blunt disruptive measures we have seen.
Sixth, a key factor in outbreaks has been super spreader nodes. From South Korea to Italy, there are clear examples of super spreader nodes. In many cases those nodes can be responsible for the overwhelming number of cases. Rather than disrupting an entire country, it is important to focus efforts on mitigating and detecting those nodes.
Seventh, there are clear examples of medical resources being taxed at super spreader locations. Public policy interventions would best be focused on helping allocate medical resources to specific locations rather than using broader, blunter instruments, whether that is through dispersing patients throughout large metro areas or setting up short term larger facilities. Importantly, it would help to focus on targeted resource allocation to impacted areas rather than broad, more coercive interventions.
Let me emphasize a few brief final points. First, nothing here is intended to downplay or pretend corona is not a significant health risk. Do not behave irresponsibly. However, the panic and debate clearly suffer from profound misunderstandings. Second, I do not mean to sound insensitive when writing here. Clearly people are dying and there are difficult problems. My intent is simply to try and grasp what the best research and data are telling us about the risks and best approaches.
The final pieces of research I will leave you with are this: a meta study of corona R0 found a pooled R0 number of 3.32 with a mean of 3.38; probability of being diagnosed correctly with severe symptoms at 0.6, diagnosis probability with mild or asymptomatic at 0.001, and the probability of developing severe symptoms at 0.01; higher transmission from interval between incubation and symptom onset allowing carriers to infect larger number of people.
Hope I’ve given you something to think about.
Addendum: I wrote this on Word and for some reason the links did not transfer. I have uploaded the document with links here in a PDF to the research for anyone that wants to review what I have cited.
Wow that is quite the tome not to have section headings and even bolded key sentences and phrases. Great subject matter; maybe a great piece too… but I’ll never know because I didn’t have the patience.
If this is correct then the good news is that we’ll be much quicker to the point where we have a herd immunity.
Let’s assume that your analysis preceding your policy takeaways is all valid.
Are you sure that policy takeaway #5 (precision policies are better than shutdowns) is actually an implication of your previous analysis?
Correct me if I’m wrong, but I interpret your analysis to say that (1) corona is much more infectious than currently widely estimated, and (2) corona is less deadly than currently widely estimated. (1) obviously pushes final death count higher, while (2) obviously pushes final death count lower. The question is which effect (1 or 2) “wins” and by how much, right? A highly infectious not-that-deadly disease can still kill insane amounts of people right? Italy’s northern Bergamo province had March deaths ~6x higher than previous years, as reported by the Wall Street Journal below. Thankfully daily deaths started to plateau ~10 days ago, about ~2.5 weeks after lockdown. ~2.5 weeks was also the period between lockdown and daily deaths plateauing in Wuhan.
Are you claiming any of the following:
a) the lockdowns are not responsible for the daily death plateaus ~2.5 weeks after lockdown. (China’s plateau lasted ~2 wks, with daily deaths then declining.)
b) if there were no lockdowns, the daily death plateaus would not be much higher and much later.
https://www.wsj.com/articles/italys-coronavirus-death-toll-is-far-higher-than-reported-11585767179
Even 1.4 quadrillion people is not enough.