Still Flying Blind: Can Meteorologists Help Epidemiologists with #Coronavirus?

Reposted from the Cliff Mass Weather Blog

Wednesday, April 29, 2020

Predicting the future of coronavirus in the U.S. is not going well these days. The epidemiological community, including critical government agencies, is falling short in three important areas:

  • They do not know the percentage of the U.S. population with active or past COVID-19 infections.
  • They do not have the ability to quality control and combine virus testing information into a coherent picture of the current situation.  This is a big-data problem.
  • The epidemiological simulation models used by U.S. government agencies and American universities have a poor track record in their predictions, and their quantification of uncertainty is unreliable.

But there is a group in the U.S. with deep experience and a highly successful track record in predicting complex environmental threats.  A group that is masterful in taking observations, combining them to create a good description of reality, building and testing predictive models, providing uncertainty information, and communicating the information to decision makers for critical life-threatening situations.

You know these people: the meteorologists of the large U.S. numerical weather prediction community.  And perhaps meteorologists can help epidemiologists and the U.S. government get a handle on the coronavirus situation.

Now don’t take this blog as one uppity weather guy trying to give advice “outside his lane.”  A paper published in the Journal of Infectious Diseases (2016) said much the same, with the authors noting the huge similarities in the work meteorologists and epidemiologists do and suggesting that the epidemiological community is roughly 40 years behind the numerical weather prediction enterprise.  They observed that both epidemiological and numerical weather prediction models attempt to simulate complex systems with exponential error growth, and thus have great sensitivity to initial conditions.

So perhaps the experience of meteorologists, who spend much of their time thinking about how to improve weather forecasting, may be relevant to the current crisis.

The First Step in Prediction:  Describing the Initial State of the System

To predict the future you need to know what is happening now. The better you can describe the initial starting point of forecasts, the better the forecast.
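To see why the starting point matters so much, consider a minimal sketch of early epidemic growth, assuming simple exponential growth I(t) = I0 * exp(r * t). The growth rate and the two initial estimates below are hypothetical numbers chosen only for illustration, not values from any real data set. Two forecasts that differ only in how many people are assumed to be infected today stay a factor of two apart, so the absolute gap between them grows exponentially.

```python
import math


def projected_infections(initial_infected, daily_growth_rate, days):
    """Early-epidemic exponential growth: I(t) = I0 * exp(r * t)."""
    return initial_infected * math.exp(daily_growth_rate * days)


# Hypothetical values, for illustration only.
growth_rate = 0.1    # assumed daily growth rate
low_guess = 1_000    # one estimate of the number infected today
high_guess = 2_000   # another, equally plausible estimate

for day in (0, 14, 28):
    low = projected_infections(low_guess, growth_rate, day)
    high = projected_infections(high_guess, growth_rate, day)
    print(f"day {day:2d}: {low:9.0f} vs {high:9.0f}  (gap {high - low:.0f})")
```

Real epidemics do not grow exponentially forever, so this only describes the early phase, but it shows how an uncertain initial state carries straight through to the forecast.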

Meteorologists have spent three-quarters of a century on such work, first with surface observations and balloon-launched radiosondes, and later with radars and satellite observations.  Billions have been invested in the weather observing system, which gives us a three-dimensional observational description of atmospheric structure.  Big data.  And we have learned how to quality control and combine the data with complex data assimilation techniques, with the resulting description of the atmosphere immensely improving our predictions.  This work is carried out operationally by large, permanent groups such as NOAA and NASA, in close interaction with the research community.

Contrast this to the unfortunate state of epidemiologists predicting the future of the coronavirus.

They have very little data on what is happening now.  They don’t know who in the population is currently infected or has been infected.  They don’t even know the percentage of the current population that is infected.   Without such information, there is no way epidemiologists can realistically simulate the future of the pandemic.  They are trying, of course, but the results have been disappointing.

What they do have is death information and limited testing of those that are sick, but that information is insufficient to determine the state of current and past infection in the community, or essential parameters such as transmission rate and mortality rates.

Obviously, the U.S. needs massive testing of the population to determine how the virus has invaded our communities and who is now immune.  The lack of such testing is a terrible failure of multiple levels of government.

But just as big a failure is the lack of random sampling of the population to determine the percentages of infection and how that varies around the nation.

We do have enough testing capability to do this (remember national political polls only use thousands of samples, not millions).  Why are the epidemiological community and our political leaders not calling for such intelligent sampling of the population?  With random sampling we would KNOW what is going on and not act out of ignorance (as we are currently muddling by).  Why is the media not baying about this?
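The polling comparison can be made concrete with the standard margin-of-error formula for a proportion estimated from a simple random sample. This is a rough sketch under textbook assumptions (a truly random sample, a perfectly accurate test, no refusals); the 5% prevalence used below is a hypothetical figure, not an estimate of the real infection rate.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (1_000, 3_000, 10_000):
    # Hypothetical true prevalence of 5%, used only for illustration.
    moe = margin_of_error(n, proportion=0.05)
    print(f"n = {n:6d}: estimate good to about +/- {moe * 100:.2f} percentage points")
```

The error shrinks with the square root of the sample size, which is why thousands of well-chosen samples can go a long way. Imperfect tests and non-representative samples would widen the real uncertainty beyond what this formula suggests.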

Quality control is another major problem faced by the epidemiological community, which deals with multiple types of tests of varying quality that need to be brought together to produce an integrated picture of reality.  Death information is unreliable, because of non-reports or problems with determining the primary cause of death.  Quality control is a difficult task, faced by the meteorological community as well, one that we have dealt with in our data assimilation systems (e.g., observations weighted by their past quality and sophisticated consistency checks).
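The idea of weighting observations by their past quality can be illustrated with the simplest possible case: an inverse-variance weighted average, in which noisier sources count for less. This is only a sketch that assumes independent, unbiased errors; operational data assimilation systems are far more elaborate. The function name and the numbers are hypothetical.

```python
def combine_estimates(estimates, error_variances):
    """Inverse-variance weighted mean: sources with larger error get less weight."""
    weights = [1.0 / variance for variance in error_variances]
    total_weight = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    combined_variance = 1.0 / total_weight
    return combined, combined_variance


# Hypothetical example: three data sources estimating percent infected,
# each with an error variance assumed known from its past performance.
estimates = [4.0, 6.0, 10.0]
error_variances = [1.0, 4.0, 16.0]

combined, variance = combine_estimates(estimates, error_variances)
print(f"combined estimate: {combined:.2f}% (error variance {variance:.2f})")
```

The combined estimate sits closest to the most reliable source, and its error variance is smaller than that of any single source, which is the payoff of merging data rather than picking one stream.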

Simulation Models

Starting with an initial description of the system one is predicting (the 3-D atmospheric structure for meteorologists, the initial disease state of the population for epidemiologists), simulation models are used to predict the future.

Meteorologists use complex, full-physics models comprised of equations that predict the future  evolution of the atmosphere.  Then we apply statistical corrections to make the forecasts even better.

Epidemiologists use three types of forecast models:

  • SEIR/SIR models are the most “traditional” approach, one in which the population is divided into different groups (susceptible, exposed, infected, recovered), using relatively simple equations to describe how folks move from one group to another, all of which rest on assumptions about how the disease is transmitted, the effects of social interactions and more. The UK Imperial Model is an example of this approach.
  • Statistical models that don’t really simulate what is going on, but are really curve-fitting exercises, in which theoretical curves (often Gaussians) are used to predict the future, adjusting the curves based on the evolution of disease in the past or at other locations.  There are many assumptions in this approach and these models cannot properly consider the unique characteristics of the region in question. The UW IHME model is a well-known user of this approach.
  • Agent-based modeling actually tries to simulate the community at an individual level and it is the most complex and computer intensive approach.   Although dependent on several assumptions (such as the transmission rates between individuals) this approach is the closest to the numerical weather prediction used by meteorologists. The GLEAM model from Northeastern University (and others) is an example of this.
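To give a feel for the first category, here is a minimal sketch of the textbook SIR model (the SEIR variant adds an "exposed" group), stepped forward with simple Euler integration. It is not the Imperial model or any operational code, and the population, transmission rate `beta`, and recovery rate `gamma` are hypothetical values chosen only for illustration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with simple Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


# Hypothetical parameters: beta / gamma = 3 new infections per case early on.
history = simulate_sir(population=1_000_000, initial_infected=100,
                       beta=0.3, gamma=0.1, days=180)
peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"infections peak around day {peak_day} at roughly {peak_infected:,.0f} people")
```

Every output of such a model depends on `beta`, `gamma`, and the initial number infected, which are exactly the quantities that are poorly known without good testing data.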

The trouble is that none of these epidemiological models has proven particularly skillful, and they produce vastly different results, something noted in some of the media, social media, and several new research papers.  The UW IHME model, often quoted by local and national political leaders, has been particularly problematic (this paper describes some of the issues), including the fact that its probability forecasts are highly uncalibrated.  The UK Imperial Model in mid-March predicted 1.1-1.2  million deaths in the U.S., even with mitigation (so far the U.S. death toll has been about 60,000).  Many of the coronavirus prediction efforts have produced unstable forecasts, with great shifts as more data becomes available or the models are enhanced.

The poor performance of these models in predicting the coronavirus is not surprising:  the lack of testing and particularly the lack of rational random sampling of the population results in no viable description of what is happening now.  The favored IHME model is only based on death rates, not on the infection state of the community.   Can you imagine if meteorologists tried to predict weather only using data around active storms? Very quickly, the forecasts–even of storms–would become worthless.  The same happens with coronavirus.

You cannot skillfully predict the future if you don’t have a realistic starting point.  Furthermore, some of the models are highly simplistic and not based on the fundamental dynamics of disease spread (like the curve-fitting IHME approach).

The U.S. has a permanent, large, well-funded governmental prediction enterprise for weather prediction, one that has improved dramatically over the past decades.  No such parallel effort exists in the government for epidemiological modeling.  Instead, university groups, such as UW IHME, have revved up ad-hoc efforts using research models.

The Bottom Line:

Our government and political leadership have been making extraordinary decisions: closing down major sectors of the economy, promulgating stay-at-home orders, moving education online, and spending trillions of dollars.

And they have done so with inadequate information.  Decision makers don’t know how many people are infected or were infected. They don’t know how many people are already immune or the percentage of infected that are asymptomatic.  They are using untested models that have not been shown to be reliable.  This is not science-based decision making, no matter how often this term has been used, and responsibility for this sorry state of affairs is found on both the Federal and state levels.

The meteorological community has a long and successful track record in an analogous enterprise, showing the importance of massive data collection to describe the environment you wish to predict, the value of sophisticated and well-tested models to make the prediction, and the necessity to maintain a dedicated governmental group that is responsible for state-of-science prediction.

Perhaps this approach should be considered by the infectious disease community, and the experience of the numerical weather prediction community might be useful.


110 Comments
Me@Home
April 29, 2020 7:18 pm

Regarding testing for this virus, I have not seen anything which gives me confidence that the testing being done anywhere is specific to this virus. The testing started early so I suspect that it is not specific but for a range of such viruses. Can anyone help me here?

LdB
Reply to  Me@Home
April 29, 2020 7:57 pm

Well you put “How does a coronavirus test work” in an internet search engine and then you start reading.

The short answer you won’t understand is they rely on either
1.) immunoassay (detection of proteins associated with the virus ) .. marginal error rate
2.) They detect nucleic acid (virus’s genetic code) .. low error rate

However each test kit when accepted by a country will publish the expected error rate.

Now go read.

John Piccirilli
Reply to  Me@Home
April 30, 2020 5:34 am

Imperial model was made 13 years ago

John F. Hultquist
April 29, 2020 9:53 pm

On Monday, April 20 (I missed it at the time) there was an opinion by Andy Kessler in the Wall Street Journal. I made it mid-way in the first of 4 columns and started to laugh.
The title is “Upgrade Our 8-Track Government.”

Do your best to find and read this.
– – – –

We live in Washington State about 100 miles east of Seattle. The county has a large area and a small scattered population – students are gone from the local university, and so on.
As of today the county has 14 confirmed cases with maybe 650 tested. Number of deaths = Zero.
This is like living in a rain shadow in a state noted for rain and mist. Meteorologists might be able to help with such issues.

Reply to  John F. Hultquist
April 29, 2020 10:04 pm

You’re getting F’d with a one-size fits all approach because Inslee is desperate and deranged to see Trump gone.

J Mac
April 29, 2020 10:09 pm

Proposing we exchange one set of epidemiologists and their weak models for another set of meteorologists pushing bad weather/climate models, seeking to ‘trespass’ well beyond their skill set into epidemiology, is the height of arrogance unsupported by evidence of reproducible achievement.

Steven Mosher
April 29, 2020 10:51 pm

“Without such information, there is no way epidemiologists can realistically simulate the future of the pandemic. They are trying, of course, but the results have been disappointing.”

maybe cliff should look at spaghetti plots for Hurricanes?

With a huge data base of past hurricanes, with real time information,
weather forecasters still can only output spaghetti

Useful spaghetti, but spaghetti

Cliff Mass
Reply to  Steven Mosher
April 30, 2020 8:20 am

Steve…don’t understand your point. Spaghetti plots are a way, one way, of displaying ensemble forecasts. This is useful information to see the variability in the solutions. What is the issue?..cliff

MarkW
Reply to  Steven Mosher
April 30, 2020 11:09 am

Compare those spaghetti plots with plots from 20 to 30 years ago. They have gotten much better.
Unlike climate models.

Steven Mosher
April 29, 2020 11:26 pm

“We do have enough testing capability to do this (remember national political polls only use thousands of samples, not millions). Why are the epidemiological community and our political leaders not calling for such intelligent sampling of the population? With random sampling we would KNOW what is going on and not act out of ignorance (as we are currently muddling by). Why is the media not baying about this?”

Huh?

There are several different purposes for testing. Figuring out the percentage infected is LOW
on the priorities.

First for PCR testing ( testing whether you CURRENTLY HAVE COVID) you do NOT WANT
a RANDOM sample.

Since you cant test everyone you want to prioritize THOSE WITH SYMPTOMS.
Why? so you can isolate and track the positives and get them the medical help they need.
So for DIAGNOSTIC TESTING ( PCR testing DO YOU HAVE IT NOW!!!) you do not want a
random sample, and you would not even KNOW HOW TO CONSTRUCT a random sample.
would you sample men and women equally? all ages equally? smokers non smokers?
various BMI? city dwellers? commuters? police? all these factors can affect the positive
rate you would find if you did it “randomly” because defining random cant be done in a vacuum.

So nobody with a brain is calling for RANDOM diagnostic testing, because diagnostic testing has
a PURPOSE: Identify why this person has a fever or is short of breath or has lost their sense
of smell.
That’s why in Korea they test those with symptoms and their contacts. not random, because it has
a purpose. Find the sick, find the higher probability asymptomatic and treat and isolate.

So maybe you are talking about serological testing?

This is blood test to tell if you HAD covid.

There are three purposes here.
1. Employee screening. You’d like to know how health workers, teachers, transit employees,
police, fire, people with HIGH CONTACT jobs are doing. Can you give them some measure
of comfort. within these groups new york is testing randomly. It has no policy
implications.

2. Plasma donor screening. Since you have a pool of people you know had the disease, you probably
don’t need to do random testing to find more donors. but you could if tests were free and widely
available. No policy implications.

3. Disease prevalence testing. there is an academic question how many people had the disease
This allows you to better estimate the death rate. But this is not that important. We already know
the disease is deadly enough. Count the bodies.

So there is One reason you want random testing: disease prevalence and adjusting death rates.
The problem here again is how you construct a random sample. Lets take New Yorks recent
attempt to randomly sample: they went to grocery stores and got people. Was that random?
Well it all depends. Did they walk to the store? take a car? or mass transit? Do they live in
a house, large apartment building? alone? with others? work in a large office or small
Another approach is to use a marketing firm like they did in LA. Random with respect to age, sex and race
But was it random with respect to Social distancing? did they select people who were heavily skewed
to those staying home 100% of the time or not?
You see random isn’t all that easy.. because the sample has to be REPRESENTATIVE with respect to
factors that may drive infection.

Last, we won’t have widespread serology for a while. It takes time

Cliff Mass
Reply to  Steven Mosher
April 30, 2020 8:24 am

Steve,
When we only had the ability for a few tests, then testing only those with symptoms should have priority. But even then, a small percentage of the tests should have been used for random sampling. Today, we have enough testing capability to do both and many tests are being used for cases that are unlikely to be COVID.. We are closing down vast parts of our economy without knowing the real story….random testing is the only way to do it. I am sure we can figure out reasonable ways to do the sampling…some might even use blood samples that have already been collected for other reasons…cliff

Steven Mosher
April 29, 2020 11:39 pm

“Agent-based modeling actually tries to simulate the community at an individual level and it is the most complex and computer intensive approach. Although dependent on several assumptions (such as the transmission rates between individuals) this approach is the closest to the numerical weather prediction used by meteorologists. The GLEAM model from Northeastern University (and others) is an example of this.”

effective agent based modelling is difficult. It’s like modelling the molecules that make up the wind.
And yes they are dependent on assumptions. WHY? because there is no physics of disease spread.
Modelers have to decide.. Do we model schools? businesses? mass transit? homes?
Do we model how many times young people meet old people? ( some do). Do we have any
data on how and where people meet? How many hermits are there? How many social butterflies?
Do we model gyms? churches? weddings? funerals? concerts? sports venues?

Structurally they are “like” weather models or models we use to predict war.
Biggest difference?

in a weather model YOU KNOW THE DATA YOU WANT TO COLLECT and that data happens every day
With a disease what data do you want to collect? Comorbidities turns out to be important!
Well that’s private data, ask yourself… do we have a table that segments the US population
by

1. Age
2. gender
3. Comorbidities
4. Occupation
5. Social interaction
6. Smoking
7. Household size
8. method of transportation

Thats just a start.

Imagine you were building a weather model and the only data you could get was
Did it rain?

Given that, predict the rest of the stuff of interest.

April 29, 2020 11:58 pm

It is odd that governments of advanced countries do not involve their data collection and analysis agencies. In the UK we have the ONS, the Office for National Statistics. They produce good quality data that the gov can rely on.
They produce lists of random members of the population and then call or visit them to gather data.
Such a list of data subjects could easily include the words “you have been randomly selected to be checked for testing for covid19. Please book a visit to your nearest drive in test centre”
We have the testing infrastructure in place. All we need is a list of randomised subjects.
I wonder why they do not do it?

Steven Mosher
Reply to  Steve Richards
April 30, 2020 1:10 am

That would not give you a random sample.

“We have the testing infrastructure in place. All we need is a list of randomised subjects.
I wonder why they do not do it?”

1. because the testing infrastructure is being used to Diagnose those with symptoms .
you don’t learn anything by randomly sampling in your method, because it’s not
random. it might, for example, bias toward people who have cars and dont ride mass transit.
it might bias toward people who are stay at home mom’s with time on their hands. heck
in Korea I get texts 2-3 times a day saying ‘A person with Covid was within 5 minutes
of you, please go to testing” Do I? of course not. I wash my hands and don’t touch my face.
I wear a mask. I have no symptoms. Not gonna drive to a test. I don’t own a car.

2. because random sampling is HARD.

let me explain why random sampling is hard.

Lets start with a simple example where you know some things. (numbers made up to make it
easy to understand)

1. you know that men are about 50% of the population and women the other 50.
2. you want to understand the average weight of americans
3. you randomly sample .
4. you compute the average. It’s 160 lbs.
5. you check your sample.. it’s 60% men. OOPS
6. You adjust your average based on the population.
OR
7. you sample again taking care to select 1000 men (50%) and 1000 women (50%)
Now you can report the average and the average by gender.

Why? because weight varies with gender. So you have to control your random sampling
with respect to variables that MATTER, so you dont have a bias.

Now you are testing for covid.

1. you dont know what proportion of the population are BAD HAND WASHERS.
You dont know what portion are constant face touchers.
In reality, if you knew, imagine that 25% are bad hand washers and constant face touchers
2. You “randomly” choose 1000 names from some marketing database.
3. you dont know what portion are bad hand washers and constant face touchers.
4. your “random” test shows 50% of your sample is infected.
5. you never ask the question “do you wash your hands properly with soap for at least 20 seconds”

what can you conclude from the “random” test?
Was your sample representative of FACTORS THAT MATTER?
Do you even know what factors matter? you know gender matters to weight, what matters to becoming
covid infected? hand washing? train riding? you dont know with a lot of certainty.
So it could be that your random sample had 50% bad hand washers as opposed to the population
average of 25%. you dont know unless you
1. know the factors that contribute to being infected.
2. select your random sample in line with these factors.

random sampling is good, the issue is the sample also has to be representative of unknown factors that contribute to getting infected

David Yaussy
Reply to  Steven Mosher
April 30, 2020 5:54 am

Steve, I’ve gotten on you in the past for cryptic, drive-by flamethrowing. Recent posts , while still somewhat terse, are helpful in explaining your thoughts and have been a huge improvement. Thank you.

Reply to  Steven Mosher
April 30, 2020 6:18 am

Steve
Good points. But we are still left with a population sample which is very useful
In Scotland, 50k people have been tested, 10k are positive. Probably from people showing symptoms. A good sample size.
First define what the sample is and how it was obtained. Everyone is clear on what you have measured and what questions you can reasonably ask. i.e. define the limits of the study.
Then you can apply randomness to your sample. For the 50k population, Take say, 200 random samples. There are proven methods of how to do this. For instance, the more random samples from your population sample you have, the closer the mean of means approaches the true sample mean.
The normal stats analyses can then be applied.
This of course ONLY applies to your original sample population. Not to the whole population.
People are rightfully suspicious of statistics. But done properly they are integral to understanding problems.

Adam Gallon
Reply to  Steve Richards
April 30, 2020 1:23 am

We don’t have the testing infrastructure in place.
There are testing stations, but few & far between.
The major problem in the UK & I’m guessing other countries, is that our local Environmental Health services, have been largely dismantled, as their need had been removed, with infectious diseases controlled by vaccination & drug treatments.
Thus the people are no longer there, to go & visit those who are selected for testing.
What will you respond to better, somebody from your local area, ringing your door bell, telling you who they are and asking you if they can do a test & explaining why, or a letter from some faceless bureaucrat, telling you to do a round trip of say 25 miles & report to your test centre?

Adam Gallon
April 30, 2020 1:28 am

Testing’s all well & good, practical measures are needed to control the infection.
Like working out where people are most likely to contract the infection?
The answer’s probably in hospital.
https://www.medrxiv.org/content/10.1101/2020.04.14.20065730v1
44% of the Chinese patients in this study, contracted Covid-19 in hospital.
Conclusion?
Dedicated facilities for handling suspected Covid-19 patients.

April 30, 2020 1:54 am

Cliff,
The problem with this epidemiology etc starts with two lacks.
1. There is a lack of private enterprise incentivisation at all stages of the work. Bureaucracy again makes the mistake of assuming it can be better, contrary to history in many fields.
2. There is no clear money path promoting the profit motive, thus for allowing reward for excellence and a kick up the backside for underperformance.

We have seen these 2 factors help to make climate research so poor.

Here is an example. Try, as an interested citizen scientist, to download daily atmospheric, global, digital data for CO2 for the year 2020 to date. It is an unholy mess of poor data, missing data, contradictory data and especially data that the primary agencies will not release (with the exception of Scripps and NOAA at the one site of Mauna Loa, in my efforts to date). Geoff S

Steven F Cords
April 30, 2020 4:47 am

They do not know the percentage of the U.S. population with active or past COVID-19 infections.
Test the sewage. Results are much quicker than waiting for someone to get sick or other testing methods.

Tom Abbott
April 30, 2020 8:52 am

From the article: “The UK Imperial Model in mid-March predicted 1.1-1.2 million deaths in the U.S., even with mitigation (so far the U.S. death toll has been about 60,000).”

Isn’t this old news and no longer relevant?

If I recall correctly, a mathematical error was found in this model and was subsequently corrected.

Tom Abbott
April 30, 2020 9:00 am

From the article: “The lack of such testing is a terrible failure of multiple levels of government.”

It’s a failure of past governments to foresee such needs.

Current governments had this problem dumped in their laps unannounced and are now playing catch-up.

They had to practically start from scratch but they have done an outstanding job of ramping things up. This is not a failure. Nobody is calling a failure on ventilators now because Trump managed to ramp up production from nothing to a surplus in a matter of weeks.

The same thing is happening with the testing which is getting better and more available every day and will soon be in the category of the ventilators: More than we need.

Editor
April 30, 2020 9:46 am

Both weather and pandemic models are subject to sensitivity to initial conditions problems — as known from Chaos Theory — thus short term predictions are as good as our knowledge of the system and the skill of our models — but mid- and long-term predictions break down and produce wildly differing results, even from initial conditions data that are very similar.

No Weatherman would attempt to predict whether it would rain on any particular day in a particular city a year from now. Weather is far too complex and chaotic (in the Chaos Theory sense) — but because Weatherpersons have a hundred years of recorded weather history to look back on, they can tell you when the rainy season is in Cincinnati, and maybe even a probability figure for rain in the first week of June.

Pandemics are far easier than weather — less complex, less chaotic. Influenza pandemics also have a known history and we have past experience with them. They are not all the same, but they are very similar.

Let the Weatherpersons inform the Epidemiologists — teach them from their weather experience.

Note that epidemiologists not involved in the fame-producing public-posturing Panic Game have far different views than those that created the world-wide economic crash.

John P.A. Ioannidis has produced a couple of papers pointing out the misguided lockdowns — Knut Wittkowski has been so outspoken that YouTube pulled one of his video interviews as “dangerous” as he basically says that the school closings and lockdowns have produced negative effects and voices lack of support for social distancing – except for those at real risk: Old Folks and those with serious co-morbidities.

pochas94
May 1, 2020 6:07 am

Why do we need testing? Either you’re sick or you’re not. All of this data gathering is just job security for academics. If an epidemic is starting: >65 with comorbidities, quarantine, the rest of you stay home if you’re sick (cough or fever).

niceguy
May 3, 2020 10:17 pm

For the same reasons models don’t work, can’t be trained, and depend on factors nobody can measure, the “stop Covid” bluetooth contact app can’t be trained, can’t be parameterized correctly, and can’t be useful.

May 6, 2020 7:01 pm

I would be looking at long range weather forecasts to predict spikes in the daily death totals, maybe when the Arctic Oscillation shifts negative.
