Wednesday, April 29, 2020
Things are not going well these days in predicting the future of the coronavirus in the U.S. The epidemiological community, including critical government agencies, is falling short in these important areas:
- They do not know the percentage of the U.S. population with active or past COVID-19 infections.
- They do not have the ability to quality control and combine virus testing information into a coherent picture of the current situation. This is a big-data problem.
- The epidemiological simulation models used by U.S. government agencies and American universities have a poor track record in their predictions, and their quantification of uncertainty is unreliable.
But there is a group in the U.S. with deep experience and a highly successful track record in predicting complex environmental threats. A group that is masterful in taking observations, combining them to create a good description of reality, building and testing predictive models, providing uncertainty information, and communicating the information to decision makers for critical life-threatening situations.
You know these people: the meteorologists involved in the large U.S. numerical weather prediction community. And perhaps meteorologists can help epidemiologists and the U.S. government get a handle on the coronavirus situation.
Now don’t take this blog as one uppity weather guy trying to give advice “outside his lane.” A published paper in the Journal of Infectious Diseases (2016) said much the same, with the authors noting the huge similarities in the work meteorologists and epidemiologists do and suggesting that the epidemiological community is roughly 40 years behind the numerical weather prediction enterprise. They observed that both epidemiological and numerical weather prediction models attempt to simulate complex systems with exponential error growth, and thus have great sensitivity to initial conditions.
So perhaps the experience of meteorologists, who spend much of their time thinking about how to improve weather forecasting, may be relevant to the current crisis.
The First Step in Prediction: Describing the Initial State of the System
To predict the future you need to know what is happening now. The better you can describe the initial starting point of forecasts, the better the forecast.
Meteorologists have spent 3/4 of a century on such work, first with surface observations and balloon-launched radiosondes, and later with radars and satellite observations. Billions have been invested in the weather observing system, which gives us a three-dimensional observational description of atmospheric structure. Big data. And we have learned how to quality control and combine the data with complex data assimilation techniques, with the resulting description of the atmosphere immensely improving our predictions. This work is done operationally by large, permanent groups within agencies such as NOAA and NASA, with extensive interaction with the research community.
Contrast this to the unfortunate state of epidemiologists predicting the future of the coronavirus.
They have very little data on what is happening now. They don’t know who in the population is currently infected or has been infected. They don’t even know the percentage of the current population that is infected. Without such information, there is no way epidemiologists can realistically simulate the future of the pandemic. They are trying, of course, but the results have been disappointing.
What they do have is death information and limited testing of those who are sick, but that information is insufficient to determine the current and past state of infection in the community, or essential parameters such as transmission and mortality rates.
Obviously, the U.S. needs massive testing of the population to determine how the virus has invaded our communities and who is now immune. The lack of such testing is a terrible failure of multiple levels of government.
But just as big a failure is the lack of random sampling of the population to determine the percentages of infection and how that varies around the nation.
We do have enough testing capability to do this (remember, national political polls use only thousands of samples, not millions). Why are the epidemiological community and our political leaders not calling for such intelligent sampling of the population? With random sampling we would KNOW what is going on, rather than acting out of ignorance (as we are currently muddling through). Why is the media not baying about this?
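The polling analogy can be made concrete with the standard margin-of-error formula for a proportion estimated from a random sample. A rough sketch (the sample size and prevalence figure below are illustrative assumptions, not real survey numbers):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """95% margin of error for a proportion p_hat estimated from n random samples."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical: 5% of a random sample of 3,000 people test positive.
moe = margin_of_error(0.05, 3000)
print(f"estimated prevalence: 5.0% +/- {moe * 100:.2f}%")  # roughly +/- 0.8%
```

A few thousand truly random tests would pin down statewide prevalence to within a fraction of a percentage point, which is why sampling on a polling scale, rather than testing millions, would already tell us what is going on.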
Quality control is another major problem faced by the epidemiological community, which must deal with multiple types of tests of varying quality that need to be brought together to produce an integrated picture of reality. Death information is unreliable because of non-reports and problems with determining the primary cause of death. Quality control is a difficult task, one the meteorological community faces as well and has dealt with in our data assimilation systems (e.g., observations weighted by their past quality, plus sophisticated consistency checks).
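The weighting-by-quality idea at the heart of data assimilation can be sketched very simply as inverse-variance averaging: each observation contributes in proportion to how reliable it has proven to be. The two "estimates" below are hypothetical numbers, purely for illustration:

```python
def combine_observations(values, variances):
    """Combine independent estimates by inverse-variance weighting:
    observations with smaller error variance get proportionally more weight."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, values)) / total
    combined_variance = 1.0 / total  # combined estimate is better than either input
    return estimate, combined_variance

# Hypothetical prevalence estimates: a high-quality PCR survey (4%, variance 1e-4)
# and a noisier antibody test batch (7%, variance 9e-4)
est, var = combine_observations([0.04, 0.07], [0.0001, 0.0009])
```

Real assimilation systems do this in thousands of dimensions with consistency checks layered on top, but the principle, trusting each data source according to its demonstrated quality, is the same one epidemiological data integration would need.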
Starting with an initial description of the system one is predicting (the 3-D atmospheric structure for meteorologists, the initial disease state of the population for epidemiologists), simulation models are used to predict the future.
Meteorologists use complex, full-physics models comprised of equations that predict the future evolution of the atmosphere. Then we apply statistical corrections to make the forecasts even better.
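The statistical-correction step (known in weather forecasting as Model Output Statistics) can be sketched as a simple regression of past observations against past raw forecasts, then applied to new forecasts. The temperature numbers below are made up to illustrate a model with a constant warm bias:

```python
def fit_linear_correction(forecasts, observed):
    """Least-squares linear fit mapping raw model forecasts to observed values."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observed))
    var = sum((f - mean_f) ** 2 for f in forecasts)
    slope = cov / var
    intercept = mean_o - slope * mean_f
    return lambda f: intercept + slope * f

# Hypothetical past cases: the model consistently runs 2 degrees too warm.
correct = fit_linear_correction([12, 15, 18, 21], [10, 13, 16, 19])
print(correct(20))  # a raw forecast of 20 is corrected to 18
```

Operational MOS uses many predictors and long training records, but the idea is the same: learn the model's systematic errors from its track record and remove them.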
Epidemiologists use three types of forecast models:
- SEIR/SIR models are the most “traditional” approach, in which the population is divided into groups (susceptible, exposed, infected, recovered), with relatively simple equations describing how people move from one group to another. These equations embody assumptions about how the disease is transmitted, the effects of social interactions, and more. The UK Imperial model is an example of this approach.
- Statistical models don’t really simulate what is going on; they are really curve-fitting exercises, in which theoretical curves (often Gaussians) are fit to the evolution of the disease in the past or at other locations and extrapolated into the future. There are many assumptions in this approach, and such models cannot properly consider the unique characteristics of the region in question. The UW IHME model is a well-known user of this approach.
- Agent-based models actually try to simulate the community at the individual level; this is the most complex and computationally intensive approach. Although dependent on several assumptions (such as transmission rates between individuals), it is the closest to the numerical weather prediction approach used by meteorologists. The GLEAM model from Northeastern University (and others) is an example of this.
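To make the first approach concrete, here is a minimal SEIR sketch: four population fractions, simple transfer equations, stepped forward in time. The parameter values are illustrative assumptions (not fitted to any real outbreak):

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """One Euler step of the classic SEIR equations (s, e, i, r are
    fractions of the population, so they always sum to 1)."""
    new_exposed = beta * s * i      # susceptibles infected by contact
    new_infectious = sigma * e      # exposed becoming infectious
    new_recovered = gamma * i       # infectious recovering
    s -= new_exposed * dt
    e += (new_exposed - new_infectious) * dt
    i += (new_infectious - new_recovered) * dt
    r += new_recovered * dt
    return s, e, i, r

# Assumed parameters: beta=0.5 (contact rate per day), 5-day average latency
# (sigma=0.2), 10-day average infectious period (gamma=0.1).
state = (0.99, 0.0, 0.01, 0.0)  # start with 1% of the population infectious
for _ in range(120):
    state = seir_step(*state, beta=0.5, sigma=0.2, gamma=0.1)
```

Even this toy version shows where the difficulty lies: the trajectory depends entirely on the assumed rates and on the initial infected fraction, which, as argued above, is precisely what we do not know for the U.S. population.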
The trouble is that none of these epidemiological models has proven particularly skillful, and they produce vastly different results, something noted in some of the media, on social media, and in several new research papers. The UW IHME model, often quoted by local and national political leaders, has been particularly problematic (this paper describes some of the issues), including the fact that its probability forecasts are poorly calibrated. The UK Imperial model in mid-March predicted 1.1-1.2 million deaths in the U.S., even with mitigation (so far the U.S. death toll has been about 60,000). Many of the coronavirus prediction efforts have produced unstable forecasts, with great shifts as more data become available or the models are enhanced.
The poor performance of these models in predicting the coronavirus is not surprising: the lack of testing, and particularly the lack of rational random sampling of the population, leaves them with no viable description of what is happening now. The favored IHME model is based only on death rates, not on the infection state of the community. Can you imagine if meteorologists tried to predict the weather using only data around active storms? Very quickly, the forecasts, even of storms, would become worthless. The same happens with the coronavirus.
You cannot skillfully predict the future if you don’t have a realistic starting point. Furthermore, some of the models are highly simplistic and not based on the fundamental dynamics of disease spread (like the curve-fitting IHME approach).
The U.S. has a permanent, large, well-funded governmental prediction enterprise for weather, one that has improved dramatically over the past decades. No parallel effort exists in the government for epidemiological modeling. Instead, university groups, such as UW IHME, have revved up ad hoc efforts using research models.
The Bottom Line:
Our government and political leadership have been making extraordinary decisions to close down major sectors of the economy, promulgating stay-at-home orders, moving education online, and spending trillions of dollars.
And they have done so with inadequate information. Decision makers don’t know how many people are or were infected. They don’t know how many people are already immune, or what percentage of the infected are asymptomatic. They are using untested models that have not been shown to be reliable. This is not science-based decision making, no matter how often that term has been used, and responsibility for this sorry state of affairs lies at both the Federal and state levels.
The meteorological community has a long and successful track record in an analogous enterprise, showing the importance of massive data collection to describe the environment you wish to predict, the value of sophisticated and well-tested models to make the prediction, and the necessity to maintain a dedicated governmental group that is responsible for state-of-science prediction.
Perhaps this approach should be considered by the infectious disease community, and the experience of the numerical weather prediction community might prove useful.