Guest Post by Willis Eschenbach
There are a number of lovely folks in this world who know how to use a shovel, but who have never sharpened a shovel. I’m not one of them. I like to keep my tools sharp and to understand their oddities. So I periodically think up and run new tests of some of the tools that I use.
Now, a while ago I invented a variant of Fourier analysis that I called the “Slow Fourier Transform”. I found out later I wasn’t the first person to invent it—Tamino pointed out that it was first described thirty years ago, and that it is properly called the “Date-Compensated Discrete Fourier Transform”, or DCDFT (Ferraz-Mello, S. 1981, Astron. J., 86, 619). Figure 1 below shows an example of the DCDFT method in use, a periodogram of the cycles in the sunspot record:
Now, in Figure 1 we can see the familiar 11-year sunspot cycle in the data, along with somewhat weaker sunspot cycles of 10 and 12 years. It also APPEARS that we can see the claimed ~90-year “Gleissberg Cycle”.
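For those who want to experiment, the heart of a DCDFT-style periodogram is just a least-squares sine fit at each trial period. Here is a minimal Python sketch of that idea—my own reconstruction for illustration, not the actual code behind Figure 1; the function and variable names are mine:

```python
import numpy as np

def dcdft_periodogram(t, y, periods):
    """Periodogram by least-squares sine fitting, the idea behind the
    Date-Compensated Discrete Fourier Transform: at each trial period,
    fit y ~ a + b*sin(wt) + c*cos(wt) and record the fitted amplitude."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    amps = []
    for p in periods:
        w = 2.0 * np.pi / p
        X = np.column_stack([np.ones_like(t), np.sin(w * t), np.cos(w * t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        amps.append(np.hypot(coef[1], coef[2]))  # amplitude = sqrt(b^2 + c^2)
    return np.array(amps)
```

Because each period is fitted directly against the actual observation times, the method tolerates gaps and irregular sampling—that is what the “date-compensated” part of the name refers to.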
However, a deeper examination of the sunspot data shows that the “Gleissberg Cycle” only exists in the first half of the data, and even there it only exists for a couple of cycles. Figure 2 shows a Complete Ensemble Empirical Mode Decomposition of the same sunspot data. The upper graph in Figure 2 shows the underlying empirical modes, and the lower graph shows their frequency:
Figure 2. CEEMD, annual average sunspot numbers. UPPER GRAPH: Panel 1 shows the raw sunspot data. Panels C1 through C7 show the seven empirical modes, in order of increasing period. The final panel shows the residual. If you add the bottom eight panels together, you get the raw data shown in the first panel. LOWER GRAPH: Periodograms of the empirical modes. These show the dominant periods of the individual modes.
The ~90-year purported “Gleissberg cycle” is shown in empirical mode C6. In the lower graph in Figure 2, we can see that after the 11-year cycles, C6 has the second-strongest cycle in the data … but in the upper graph, we can see that whatever signal exists, it is actually fairly short-lived, dying out after only a couple of cycles.
And that means that my periodogram shown in Figure 1 was misleading me—the peak at around 90 years was not actually significant. It only lasts a couple of cycles.
So I wanted to sharpen my periodogram tool so it would indicate which cycles are statistically significant. In the past I’ve tested my method by looking at periodograms of square waves, and of individual sine waves, combinations of sine waves and the like.
This time I thought, “What I want to test next is something totally featureless, something like my imagination of the Cosmic Background Radiation. That would help me distinguish random noise from significant cycles.”
Well, of course I don’t have the CBR to test my periodograms with, so here was my plan for generating some random noise.
I generated a series of sine waves at all periods from one year to thousands of years. They all had the same amplitude. Next, I randomized their phases, meaning that they all started at random points in their cycle. I figured, nothing could be more generic and bland than the sum of a bunch of sine waves of equal strength of all possible periods. Then I added them all together, and plotted the result.
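That recipe is simple enough to sketch in a few lines of Python. This is my own illustrative reconstruction of the procedure just described, not the original code—one unit-amplitude sine wave at every integer period, each starting at a random point in its cycle, all summed:

```python
import numpy as np

rng = np.random.default_rng(20170101)  # any seed; each seed gives one realization

def random_phase_sum(n, min_period=2, max_period=None):
    """Sum of equal-amplitude sine waves at every integer period from
    min_period up to max_period (default: the series length), each
    starting at a uniformly random point in its cycle."""
    if max_period is None:
        max_period = n
    t = np.arange(n)
    periods = np.arange(min_period, max_period + 1)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(periods))
    # broadcast: one column of sine values per period, then sum across periods
    return np.sin(2.0 * np.pi * t[:, None] / periods + phases).sum(axis=1)
```

Plotting `random_phase_sum(1000)` a few times, with a different seed each time, produces the kind of jagged, trending, lifelike wiggles discussed below.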
Now, I’m not sure what I expected to find. Something like a hum, something kind of soothing. Or perhaps like the ocean, when you have small wind-ripples on top of a chop on top of a swell, with a bigger swell underneath that. A harmony-of-the-spheres kind of thing is what I thought I’d get, complex but smooth, like some mathematical Bee Gees harmony … however, this was not the case at all. Figure 3 below shows a sample of the many different results I’ve generated by adding together thousands of sine waves of identical amplitude covering all the periods:
Figure 3. Ten examples of what you get when you add together thousands of sine waves evenly blanketing an entire range of frequencies.
These results were surprising to me for several reasons. The first is their irregular, jagged, spiky nature. I’d figured that because these are the sum of smooth sine waves, the result would be at least smoothish as well … but not so at all.
The next surprise to me was the steepness of the trends. Look at Series 4 at the lower left of Figure 3. Note the size and speed of the rise in the signal. Or check out Series 3. There is a very steep drop in the middle of the record.
The next thing I hadn’t foreseen is the fractal, self-similar nature of the signal. Because it is composed of similar sine waves at all time scales (or at least a wide range of them), the variations at shorter time scales are very similar to the variations at longer time scales.
I was also not expecting the clear long-term cycles and trends shown in the various random realizations. Regarding the cycles, I had expected that the various sine waves would cancel each other out more than they did, particularly at longer periods.
And regarding the trends, I had thought that because none of the underlying sine waves contained a trend, then as a result the sum of them wouldn’t have much of a trend either. I was wrong on both counts. The signals contain both clear cycles and clear trends.
Another unexpected oddity, although it made sense after I thought about it, is that like a variety of natural climate datasets, these signals all have very high Hurst exponents. The Hurst exponent measures what has been described as the “long-term persistence” of a dataset. Since all of these signals are the sum of unchanging sine waves which assuredly have long-term persistence, it makes perfect sense that these signals also have a high Hurst exponent.
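For readers who want to check the persistence claim for themselves, here is a bare-bones sketch of one standard estimator, the classic rescaled-range (R/S) method. The window sizes and fitting details here are my illustrative choices, not necessarily those used for the figures in this post:

```python
import numpy as np

def hurst_rs(x, sizes=None):
    """Estimate the Hurst exponent by the classic rescaled-range (R/S)
    method: the slope of log(mean R/S) against log(window size)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if sizes is None:
        sizes = [s for s in (2 ** k for k in range(4, 12)) if s <= n // 2]
    log_s, log_rs = [], []
    for s in sizes:
        rs_vals = []
        for start in range(0, n - s + 1, s):  # non-overlapping windows
            w = x[start:start + s]
            dev = np.cumsum(w - w.mean())     # cumulative deviation from the mean
            r = dev.max() - dev.min()         # range R
            sd = w.std()                      # standard deviation S
            if sd > 0:
                rs_vals.append(r / sd)
        if rs_vals:
            log_s.append(np.log(s))
            log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_s, log_rs, 1)
    return slope
```

White noise gives an exponent near 0.5 (no persistence), while a strongly persistent series like a random walk gives a value near 1.0; the sine-wave sums described above land toward the high end.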
Upon contemplation, I also note that these series are totally deterministic, but with a very long repeat time. For example, the repeat time of a signal containing all integer periods from 2 to 100 is the least common multiple of those periods, about 6.972038e+40 time steps.
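That repeat time is easy to verify, assuming integer periods of 2 through 100 time steps (Python 3.9 or later):

```python
from math import lcm

# Least common multiple of every integer period from 2 to 100:
# the time after which all the sine waves line up again.
repeat = lcm(*range(2, 101))
print(f"{repeat:e}")  # → 6.972038e+40, matching the figure quoted above
```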
The strangest part of all of this is that the signals look quite lifelike. By that, I mean that they look like a variety of climate-related records. Any one of them could be the El Niño index, or the temperature of the stratosphere, or any of a number of other datasets.
So after I generated my random datasets composed solely of unvarying sine waves, I used my periodogram function to see what the apparent frequencies of the waves were. Here is a sample of a few of them:
Figure 4. Periodograms covering periods from one to 3,200, in a dataset of length 12,800.
Now, at the left end of each of the graphs in Figure 4 we can see that the periodograms are accurate, showing all cycles as being the same small size. This is true up to about 100 cycles, or about 1/30 of the length of the dataset. But as we get further and further to the right, where we are looking at longer and longer cycles, we can see that we get larger and larger random peaks in the periodogram. These can be as large as forty or fifty percent of the total peak-to-peak range of the raw signal.
In order to gain a better understanding of what’s going on, I plotted all of the periodograms. Then I calculated the mean and the range of the errors, and developed an equation for how much we can expect in the way of random cycles. Figure 5 shows that result.
Figure 5. Periodograms of 100 datasets formed by adding together unvarying sine waves covering all periods up to the length of the dataset, in this case 12,800. Dotted line indicates the level below which we find 95% of the random data.
I also looked at the same situation at various dataset lengths, down to about 200 data points. Here, for example, is the situation regarding a random dataset of length 316, the same length as the annual sunspot record.
Figure 6. Periodograms of 100 datasets formed by adding together unvarying sine waves covering all periods up to the length of the dataset, in this case 316. Dotted line indicates the level below which we find 95% of the random data.
Now, this has allowed me to develop a simple empirical expression for the 95% confidence limit. As you can see, the error increases with increasing length of the period in question.
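The procedure behind that confidence line can be sketched as a straightforward Monte Carlo: generate many random-phase realizations, compute the periodogram of each, and read off the 95th percentile of the amplitudes at each period. The code below is my self-contained reconstruction of that idea; the trial count and period grid are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(316)

def random_phase_series(n):
    """One realization: equal-amplitude sines at every integer period
    from 2 to n, each with a random starting phase."""
    t = np.arange(n)
    periods = np.arange(2, n + 1)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(periods))
    return np.sin(2.0 * np.pi * t[:, None] / periods + phases).sum(axis=1)

def sine_amplitude(t, y, p):
    """Least-squares fitted amplitude of a sine of period p."""
    w = 2.0 * np.pi / p
    X = np.column_stack([np.ones_like(t), np.sin(w * t), np.cos(w * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.hypot(coef[1], coef[2])

def envelope_95(n=316, trials=50, test_periods=None):
    """95th-percentile periodogram amplitude at each test period,
    across many random realizations of length n."""
    if test_periods is None:
        test_periods = np.arange(2, n // 2)
    t = np.arange(n, dtype=float)
    amps = np.empty((trials, len(test_periods)))
    for i in range(trials):
        y = random_phase_series(n)
        amps[i] = [sine_amplitude(t, y, p) for p in test_periods]
    return test_periods, np.percentile(amps, 95, axis=0)
```

Running this for a dataset of length 316 reproduces the qualitative behavior described above: the envelope is small and flat at short periods, and grows steadily as the period lengthens.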
And this is the precise sharpening of the tool that I was looking for. Let me start by revisiting the first figure above, the periodogram of the sunspots, this time adding the line showing the amplitude below which 95% of the random cycles fall:
Figure 7. As in Figure 1, but with the addition of the line showing the extent of 95% of the random errors as described above.
As you can see, this distinguishes the valid signal at 11 years from the two-cycle fluctuation at 88 years. If you compare this to Figure 6, you can see that a cycle at 88 years needs to be quite large in order to be statistically significant.
Now, I mentioned above that the random datasets generated by this method look very similar to natural datasets. As evidence of this, Series 7 in Figure 3 above is not a random dataset like the others. Series 7 is actually the detrended record of the historical variations in ∆14C, which I discussed in my previous post … compare that actual observational record to say Series 2. There’s not a lot of difference.
And this brings me to the reason for this post. I’ll start by quoting from my previous post linked just above, which discussed the results of a gentleman posting as “Javier”, who in turn used the results of Clilverd et al. If you have not read that post, please do so, as it is central to these findings. In that previous post I’d said:
Let me recapitulate the bidding. To get from the inverted 14C record shown in Figure 3 to the record used by Clilverd et al, they have
- thrown away three-quarters of the data,
- removed a purported linear trend of unknown origin from the remainder,
- subtracted a 7,000-year cycle of unknown origin, and
- ASSERTED that the remainder represents solar variations with an underlying 2,300 year period …
The series shown as “Series 7” above is the result of the first two of those steps. In that series there is claimed to be a 7,000-year signal, which they say is “possibly caused by changes in the carbon system itself”. However, there is no reason to believe that this is anything other than a random variation, particularly since it does not appear in the three-quarters of the data that they’ve thrown away … but let’s set that aside for the moment and look at the result of subtracting the purported 7,000-year cycle from the ∆14C data. Here is the periodogram of that result:
Note that this seems to indicate a cycle of about 960 years, and another at about 2200 years … but are they statistically significant?
In the comments to my post, Javier replied and said that I was wrong, that there indeed is a ~2400-year cycle in the ∆14C data. I pointed out to him that a CEEMD (Complete Ensemble Empirical Mode Decomposition) shows that in fact what exists is several cycles of about 2,100 years in length, then a rough cycle about 2,700 years in length, and then another short cycle. This result is seen in the empirical mode C9 below:
Figure 9. CEEMD of the ∆14C data after removal of the linear trend and a 7,000 year cycle. Panel 1 shows the raw ∆14C data. Panels C1 through C9 show the nine empirical modes, in order of increasing period. The final panel shows the residual. If you add the bottom ten panels together, you recover the raw data shown in the first panel.
In empirical mode C9 above you can see the situation I described, with short cycles at the start and end and a long cycle in the middle.
Mode C8 is also interesting, as it has a clear regular ~1000-year cycle at the beginning. Strangely, it tapers off over the period of record to, well, almost nothing. Again, I see this as evidence that this is simply a random fluctuation rather than a true underlying cycle.
In my discussion with Javier, I held that in neither case are we seeing any kind of true underlying cyclicity. And my thanks to Javier for his spirited defense, as it was this question that has led me to sharpen my periodogram tool.
And to complete the circle, Figure 10 below shows what my newly honed periodogram tool says about the ∆14C data:
Figure 10. As in Figure 8, periodogram of the ∆14C data after removal of a linear trend of unknown origin and a 7,000 year cycle of unknown origin, but this time with the addition of the line showing the limit of 95% of the cycles created by the addition of sine waves.
I note that neither the ~1,000-year nor the ~2,400-year cycle exceeds the range of 95% of the random data. This also bears out the CEEMD analysis, in that the ~1,000-year period shows more complete and more regular cycles than the ~2,400-year period. As a result, it is closer to significance than the ~2,400-year cycle.
Conclusions? Well, my conclusion is that while the ~88-year “Gleissberg cycle” in the sunspots, and the ~1,000-year and ~2,400-year cycles in the ∆14C data, may possibly be real, solid, and persistent, I find no support for those claims in the data we have at hand. The CEEMD analysis shows that none of these signals is either regular or sustained … and this conclusion is supported by my analysis of the random data. The fluctuations that we are seeing are not distinguishable from random fluctuations.
Anyhow, that’s what I got when I sharpened my shovel … comments, questions, and refutations welcome.
My best to everyone, and my thanks again to Javier,
As Always: I, like most folks, can defend my own words and claims. However, nobody can defend themselves against a misunderstanding of their own words. So to prevent misunderstanding, please quote the exact words that you disagree with. That way we can all be clear regarding the exact nature of your objection.
In Addition: If you think I’m using the wrong method or the wrong dataset, please link to or explain the right method or the right dataset. Simply claiming that I am doing something the wrong way does not advance the discussion unless you can show us the right way.
More On CEEMD: Noise Assisted Data Analysis