Guest Post by Willis Eschenbach
I’ve developed a curious kind of algorithm that I’ve humorously called “slow Fourier analysis”, but which might be better described as a method for direct spectral analysis. The basic concept is quite simple. I fit a sine wave with a certain cycle length, say 19 months, to the dataset of interest, and I note the peak-to-peak amplitude of this best-fit sine wave. I repeat this process for the various cycle lengths of interest, and put that all together as a periodogram. The periodogram shows how large a sine wave is the best fit for the data at each of the cycle lengths. This method has worked a treat, but yesterday, this odd method of mine handed me a big surprise.
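For readers who want to experiment, here is a minimal Python sketch of the idea just described. It is not the R function from the zipped folder. It assumes the fit is an ordinary least-squares fit of a sine and cosine pair plus a constant offset, and the names `fit_sine_ptp` and `periodogram` are made up for this illustration.

```python
import math

def fit_sine_ptp(values, period):
    """Peak-to-peak amplitude of the least-squares sine wave of the given
    period (measured in sample intervals) fitted to equally spaced values."""
    n = len(values)
    w = 2.0 * math.pi / period
    sin_col = [math.sin(w * t) for t in range(n)]
    cos_col = [math.cos(w * t) for t in range(n)]

    def centred(xs):
        m = sum(xs) / len(xs)
        return [x - m for x in xs]

    # Centring every column is equivalent to fitting a constant offset as well.
    y, s, c = centred(values), centred(sin_col), centred(cos_col)
    ss = sum(a * a for a in s)
    cc = sum(a * a for a in c)
    sc = sum(a * b for a, b in zip(s, c))
    sy = sum(a * b for a, b in zip(s, y))
    cy = sum(a * b for a, b in zip(c, y))

    det = ss * cc - sc * sc
    scale = max(ss, cc)
    if det <= 1e-12 * scale * scale:
        # Degenerate case, e.g. period 2 on integer time steps, where the
        # sine column is all zeros: fit the single usable column instead.
        amplitude = abs(cy / cc) if cc >= ss else abs(sy / ss)
    else:
        b = (sy * cc - cy * sc) / det   # coefficient of the sine term
        k = (cy * ss - sy * sc) / det   # coefficient of the cosine term
        amplitude = math.hypot(b, k)
    return 2.0 * amplitude

def periodogram(values):
    """Map each integer period from 2 up to a third of the data length to the
    peak-to-peak amplitude of its best-fit sine wave."""
    longest = len(values) // 3
    return {p: fit_sine_ptp(values, p) for p in range(2, longest + 1)}

# Synthetic check: ten full cycles of a 12-step sine wave with amplitude 3.
data = [5.0 + 3.0 * math.sin(2.0 * math.pi * t / 12.0 + 0.7) for t in range(120)]
pgram = periodogram(data)
best_period = max(pgram, key=pgram.get)
print(best_period, round(pgram[best_period], 6))
```

On this synthetic series the sketch should report its peak at a period of 12 steps, with a peak-to-peak amplitude close to 6, which is twice the amplitude of the sine wave that was put in.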
Now, I have been doing these periodograms utilizing only the time periods inherent in the data. So if the data was monthly, I’ve looked at all possible integral lengths of monthly cycles, starting with two-month, three-month, four-month, and five-month cycles, and so on up to cycles with periods as long as a third of the number of months in the data.
And on the other hand, if the data was yearly, I’ve looked at all possible integral lengths of yearly cycles, starting with two-year, three-year, and four-year cycles, and so on up to a third of the number of years in the data.
(I mistrust reports of the longer cycles, and any cycle with a period longer than a third of the dataset is not really worth calculating—it will fool you every time.)
So I was going along in blissful ignorance when an alert WUWT reader, Greg Goodman, pointed out that there was no reason to restrict myself to integral periods. He noted that I could subdivide the results and gain greater resolution. I was dismissive of his idea. I agreed that it could be done and that it was an interesting idea, but I said I thought you wouldn’t gain any real resolution by doing it. Hang on, let me get my exact words and the graphic I used in that thread to illustrate my point … here’s what I said:
In any case, I’m still not convinced that the procedure will give any real increase in information. Here’s the difference, for example, between monthly and annual data:
I think that if I increase the resolution on the sunspot periodogram while still using the annual data, it won’t look anything like the real monthly data … I’ll report back on that one.
Well … this is the promised “report back”. As you can see, both the monthly and the annual data show the peak at 11 years. The monthly data shows more detail, as it resolves the individual peaks at 10, 11, and ~11.9 years.
I thought that the test I outlined would be a good one: I’d see what happened when I sampled annual data at monthly intervals. So here’s what I did. I took the SIDC monthly sunspot data. For the annual data, I didn’t use the SIDC annual data. Instead, I averaged the SIDC monthly data to give me annual data that I knew was a true annual average of that particular monthly dataset. Figure 2 shows the monthly data in red, and the annually averaged monthly data in black.
So to be clear, the data shown as the black line is the annual average of the monthly data shown as the red line.
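The averaging step can be sketched in a few lines of Python. This is an illustration only: it assumes the monthly series starts in January and that an incomplete final year is simply dropped, neither of which is stated above, and `annual_means` is a hypothetical helper name.

```python
def annual_means(monthly, months_per_year=12):
    """Average consecutive blocks of twelve monthly values into annual values.

    Assumes the series starts in January; a trailing partial year is dropped.
    """
    n_years = len(monthly) // months_per_year
    return [
        sum(monthly[i * months_per_year:(i + 1) * months_per_year]) / months_per_year
        for i in range(n_years)
    ]

# Two complete years followed by three leftover months.
example = [float(v) for v in range(1, 28)]
print(annual_means(example))
```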
Next, I modified my periodogram-generating computer function to allow for fractional time sampling of the data.
Then I calculated three periodograms. One is the periodogram of the annually averaged data, shown in yellow/black below. The second is the periodogram of the underlying monthly data, shown in blue.
And finally, I calculated the third periodogram (shown in cyan) using the annually averaged data, but sampled on a monthly basis … Figures 3 and 4 show those results. I must confess, when I saw them my jaw hit the floor.
Figures 3 and 4. Upper panel shows the periodogram of annual sunspot data (yellow/black), monthly sunspot data (blue), and monthly-sampled annual data (cyan). Lower panel is a closeup of the upper panel, showing the period from seven to twenty-five years.
You can see why my jaw hit the floor. The blue line (periodogram of actual monthly data) is almost identical to the cyan line (periodogram of annual data sampled at monthly intervals). Who knew?
It turns out that, contrary to my expectation, the information about the strength of the monthly cycles is NOT lost when the individual monthly values are subsumed into annual averages. Somehow, that information still exists and can be retrieved. When I take the annual data and analyze it at monthly frequencies, the data is still there. Not exactly, to be sure; as you can see, the process is not perfect or precise.
But it is astoundingly accurate. I absolutely didn’t expect that.
Now, I suppose some signal analysis guy will pop up and say he knew that all along and abuse me for my colossal ignorance … but I taught myself this stuff, and all I can say is, it sure was a surprise to me.
Truly, I don’t understand this result. For many, many years I’ve thought that when you average data in chunks, like say averaging monthly data into yearly data, all of the monthly information was gone. Lost. Kaput. Irretrievable.
But that doesn’t seem to be the case at all. It seems that very little of the power spectrum information was lost as a result of the annual averaging process.
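One possible way to look at this, offered as a suggestion rather than something established in this post: averaging twelve consecutive months acts as a low-pass filter. For a pure sine wave, the fraction of its amplitude that survives an N-sample average is given by a standard signal-processing result, |sin(πN/P) / (N sin(π/P))|, where P is the cycle length in samples. The Python sketch below just evaluates that formula; the function name `block_average_gain` is made up for illustration.

```python
import math

def block_average_gain(period, window=12):
    """Fraction of a sine wave's amplitude that survives averaging over
    `window` consecutive samples, for a cycle `period` samples long."""
    x = math.pi / period
    return abs(math.sin(window * x) / (window * math.sin(x)))

# Surviving fraction for cycles of various lengths, given in years.
for years in (1, 2, 5, 11, 25):
    print(years, round(block_average_gain(12 * years), 3))
```

Under that assumption an 11-year cycle keeps a bit under 99 percent of its amplitude after annual averaging, while a cycle of exactly one year is removed entirely. This speaks only to the long cycles examined here. Cycles shorter than two years cannot be resolved from annual values at all.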
Naturally, this brings up the next question: is this procedure invertible? That is to say, given a periodogram such as the one above, can I run the process in reverse? Can I start from the annual data, calculate the monthly periodogram from that annual data, and then invert the periodogram to give me back the monthly data? That would be really amazing … but I’m pessimistic.
My guess is no, the periodogram can’t be inverted to give me back the monthly data … but given how poor my last pathetic guess (at the head of this post) was, I’ll certainly give it a try. Any assistance gladly accepted.
Like I said above … once again all I can say is, I’ll report back on that one.
Regards to all,
Acknowledgement: My thanks to WUWT reader Greg Goodman for the suggestion to investigate fractional time periods.
For Clarity: If you disagree with something someone says, please quote the exact words you disagree with. It avoids all kinds of misunderstandings.
Data and Code: I’ve collected the R code, the R functions, and the data into a small (22KB) zipped folder called “Oversampling Folder”. It includes the monthly and annual sunspot data as CSV files. If you change your R workspace to the folder, it should be turnkey.
For the original data:
SIDC Monthly Sunspot Data
Note that following the advice of Leif Svalgaard, I have increased all of the pre-1947 SIDC sunspot values by 20%, to correct for the change in counting methods at that time. The change is immaterial for this analysis.
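For anyone working from the raw SIDC file instead of the CSV files in the folder, the adjustment could be reproduced along these lines. This is a Python sketch, not the R code from the folder. It assumes “pre-1947” means every observation dated strictly before 1947, and `adjust_pre_1947` is a hypothetical helper name.

```python
def adjust_pre_1947(years, values, cutoff_year=1947, factor=1.2):
    """Scale every value dated before `cutoff_year` by `factor`.

    `years` may hold whole or decimal years, one entry per value.
    Assumption: the cutoff year itself is left unchanged.
    """
    return [v * factor if y < cutoff_year else v for y, v in zip(years, values)]

# Decimal-year dates for three monthly observations around the cutoff.
dates = [1946.875, 1946.958, 1947.042]
counts = [100.0, 100.0, 100.0]
print(adjust_pre_1947(dates, counts))
```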
Changes In The R Function: The function to generate the periodogram, named “sineptop” (for sine peak-to-peak), was previously called like this:
sineptop(annual_data, frequency = 1)
or like this:
sineptop(monthly_data, frequency = 12)
The “frequency” argument identifies the data as having that many observations per year.
It can still be called like that. In addition, however, the new sineptop syntax for e.g. monthly sampling an annual dataset looks like this:
sineptop(annual_data, frequency = 1, by = 1/12)
If the “by” variable is not specified it is assumed to be 1, so you don’t need to specify it, and the original syntax still works.
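To illustrate what the “by” setting changes, here is a Python sketch of the list of cycle lengths that would be tested. It assumes the grid starts at a period of two time steps and stops at a third of the data length, as described in the post, and that “by” is measured in units of the data’s own time step. The internals of sineptop may differ, and `candidate_periods` is a made-up name.

```python
def candidate_periods(n_points, by=1.0, shortest=2.0):
    """Cycle lengths to test, in units of the data's own time step.

    Runs from `shortest` up to a third of the number of data points, in
    steps of `by`. With annual data, by = 1/12 gives monthly steps.
    """
    longest = n_points / 3.0
    # The small epsilon guards against floating-point shortfall in the division.
    steps = int((longest - shortest) / by + 1e-9)
    return [shortest + k * by for k in range(steps + 1)]

# Thirty years of annual data: whole-year periods versus monthly steps.
whole = candidate_periods(30)
monthly_steps = candidate_periods(30, by=1.0 / 12.0)
print(len(whole), len(monthly_steps))
```

With 30 annual values, the default grid has 9 periods (2 through 10 years), while `by = 1/12` gives 97 periods over the same range.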