Guest Post by Willis Eschenbach
A couple of days ago, I got to looking at the daily record of US deaths from the coronavirus. It’s shown in Figure 1 below:
Figure 1. US daily deaths. Created on May 5, but shows May 4th data.
So … have the US deaths peaked, and if so when? Hard to tell. However, I looked at that graph in Figure 1 and I thought “It looks like the data might be reflecting lower counts on the weekends”.
Now, my go-to method for determining the existence, period, and amplitude of underlying repeating cycles in a dataset is the curious method called “CEEMD”. That stands for Complete Ensemble Empirical Mode Decomposition. I discuss the method here. It is a way to decompose a signal into underlying signals. It’s called “complete” because when you add all the underlying signals back together, it gives you back the original signal.
Once all possible underlying cycles have been removed from the data, what remains is called the “CEEMD Residual”. This residual is an excellent indicator of the overall trend of the data. Here is an overview of the CEEMD decomposition of the daily deaths data shown in Figure 1.
Figure 2. CEEMD complete decomposition of the data shown in Figure 1. The top panel is the raw data. Panels C1 to C4 are the empirical modes. Finally, at the bottom is the CEEMD residual.
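In symbols, the "complete" property of this particular decomposition can be written as below. Here x(t) is the daily deaths series, C_1 to C_4 are the four empirical modes in Figure 2, and R(t) is the CEEMD residual. The symbol names are chosen for illustration only and are not taken from the original analysis.

```latex
x(t) = \sum_{k=1}^{4} C_k(t) + R(t)
```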
As you can see, two of the four empirical modes (C2 and C4) are weak, with very low amplitude. Modes C1 and C3, on the other hand, show a much stronger signal. We can see the period and strength of each of the empirical modes C1-C4 in Figure 3, which shows the periodogram of each mode.
Figure 3. Periodograms of each of the empirical modes shown in Figure 2. The strongest signal is the seven-day signal, showing that my guess about weekends was likely correct. There is also a significant amount of energy in the first overtone of the 7-day signal, with a period of 3.5 days.
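For readers who want to see how a periodogram picks out a 7-day cycle and its 3.5-day overtone, here is a minimal sketch in Python. It assumes a plain discrete Fourier transform applied to a synthetic series built from two sine waves. It is not the code or the data behind Figure 3, and the function name `periodogram` and the amplitudes are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Return (period_in_days, power) pairs from a plain DFT of a daily series."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the mean so bin 0 does not dominate
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        result.append((n / k, abs(coeff) ** 2 / n))
    return result


# Synthetic stand-in data (NOT the real death counts): ten weeks of a
# 7-day cycle plus a weaker 3.5-day overtone. Amplitudes are made up.
n_days = 70
series = [
    10.0 * math.sin(2 * math.pi * t / 7.0) + 4.0 * math.sin(2 * math.pi * t / 3.5)
    for t in range(n_days)
]

# Rank the periods by power and keep the two strongest.
ranked = sorted(periodogram(series), key=lambda pair: pair[1], reverse=True)
top_periods = [round(period, 2) for period, _ in ranked[:2]]
print(top_periods)  # prints [7.0, 3.5]
```

The record length of 70 days was picked so that both periods fall exactly on a DFT bin. With a record length that is not a multiple of seven, the peaks would smear across neighbouring bins.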
So … how does this analysis work out in practice? Here is the same data as in Figure 1, along with the CEEMD residual.
Figure 4. US daily deaths, along with the CEEMD residual. Data from May 4th, analyzed May 6th.
Well, I’d have to say that that looks like good news … it would be excellent if we were indeed 20 days past the peak.
And here is a look with the underlying 7-day signal overlaid on the daily data.
Figure 5. As in Figure 4, but overlaid with the seven-day empirical mode signal (Mode C3). The overlaid empirical mode C3 is shown for illustrative purposes only. You can see that when the empirical mode is added to the residual it will be a good match to the data.
This is a most interesting result. It shows one of the reasons that I use the CEEMD analysis: it breaks the raw data down into meaningful underlying signals. In this case, early in the spread of the virus at the left-hand side of the graph, the 7-day signal (blue line) was quite small. But now that there are a large number of deaths, the 7-day signal is much larger. This kind of result is unobtainable by, say, standard Fourier analysis, which assigns each cycle a single amplitude for the whole record.
Finally, I prefer the CEEMD residual method over, say, a Gaussian smooth because it goes all the way out to both the start and finish of the data. Not only that, but the information out near the ends is meaningful. Here’s a comparison of the CEEMD residual with a Gaussian filter.
Figure 6. Daily US deaths, CEEMD residual, and 7-day Full Width at Half Maximum (FWHM) Gaussian smooth of the data. This is data from May 4th, processed on May 6th. Treatment of the Gaussian smooth near the endpoints is discussed in the Appendix here.
As you can see, the Gaussian smooth is high at the start of the daily deaths data, and low at the end of the data. The Gaussian smooth is dropping at the right-hand end, and the CEEMD Residual is turning upwards.
And two days later, here’s the situation:
Figure 7. More recent data, from May 6th: daily deaths and CEEMD residual.
At the right-hand end of the graph, the CEEMD residual was already foreshadowing the turn from decreasing to increasing, at the same time that the Gaussian smoothing was wrongly indicating a further decrease (see Figure 6). As I said, the CEEMD residual contains important information out at the ends.
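The endpoint behaviour of a Gaussian smooth noted for Figure 6 can be reproduced with a small sketch. It converts a 7-day FWHM to a standard deviation using the standard relation FWHM = 2·sqrt(2·ln 2)·sigma, then smooths a steadily rising series. The endpoint treatment here (cutting the kernel off at the edge and renormalizing the weights) is an assumption made for illustration; it is not necessarily the treatment described in the Appendix, and the function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_smooth(x, fwhm=7.0):
    """Gaussian-weighted mean; near the ends the kernel is truncated and
    the remaining weights renormalized (an assumed endpoint treatment)."""
    sigma = fwhm_to_sigma(fwhm)
    half_width = int(math.ceil(3 * sigma))  # ignore weights beyond 3 sigma
    smoothed = []
    for i in range(len(x)):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - half_width), min(len(x), i + half_width + 1)):
            w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            weighted_sum += w * x[j]
            weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Synthetic stand-in data (NOT the real death counts): a straight rising ramp.
ramp = [float(t) for t in range(30)]
smoothed = gaussian_smooth(ramp)

print(round(fwhm_to_sigma(7.0), 3))  # sigma in days for a 7-day FWHM
print(smoothed[0], ramp[0])          # start: smooth sits above the data
print(smoothed[-1], ramp[-1])        # end: smooth sits below the data
```

On rising data, this truncated kernel can only average values from one side at each end, so the smooth reads high at the start and low at the finish. Other endpoint treatments will behave differently.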
Conclusions? Well, my first one would be that attempting to analyze coronavirus death data without removing the repeating weekly variations is … well, I’ll call it “overly optimistic” and leave it there.
My next conclusion is that the CEEMD residual is an excellent indicator of the ever-changing and oft-deceptive central tendency in time series data.
Next, about a week ago the CDC changed its guidance on the reporting of deaths involving the COVID virus. Rather than make an explicit distinction between deaths WITH coronavirus and deaths FROM coronavirus, they said to enter COVID-19 on the death certificate if the physician SUSPECTS that the coronavirus MIGHT have CONTRIBUTED to the death … “suspects the virus might have contributed” to the death??? Could they possibly be more vague?
The size of the effect of this change on the way the US reports the death count is unknown, but it can only increase the purported count, not decrease it. As a result, we cannot be sure that the increase in deaths is real and not just a change in reporting.
Finally, it appears that the US has peaked in terms of daily deaths. Might be another peak to come, might be two more peaks, might be no more peaks, but in any case it appears we’ve passed the first peak.
Stay well, dear friends. When I was a young man, an old geezer (who was likely about my age now) told me “Son, when you have your health you have everything!”
But back then, I didn’t understand …
PLEASE: Quote the exact words you are discussing in your comment. This avoids endless misunderstandings and problems.