Well, Color Me Gobsmacked!

Guest Post by Willis Eschenbach

I’ve developed a curious kind of algorithm that I’ve humorously called “slow Fourier analysis”, but which might be better described as a method for direct spectral analysis. The basic concept is quite simple. I fit a sine wave with a given cycle length, say 19 months, to the dataset of interest, and I note the peak-to-peak amplitude of that best-fit sine wave. I repeat the process for each cycle length of interest, and put it all together as a periodogram. The periodogram shows how large a sine wave best fits the data at each cycle length. This method has worked a treat, but yesterday, this odd method of mine handed me a big surprise.
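For readers who want to see the idea in code, here is a minimal sketch of the approach, assuming a least-squares sine fit at each trial period. It is illustrative only; my actual function, sineptop, is in the zipped folder linked at the end of this post.

# Sketch only, not the real sineptop: fit a sine at each trial period by
# ordinary least squares and record its peak-to-peak amplitude.
sineptop_sketch <- function(x, frequency = 1, by = 1) {
  n <- length(x)
  t <- seq_len(n)                              # time, in sample units
  periods <- seq(2, n / 3, by = by)            # trial periods, in sample units
  ptop <- sapply(periods, function(p) {
    s <- sin(2 * pi * t / p)
    co <- cos(2 * pi * t / p)
    fit <- lm(x ~ s + co)                      # best-fit sine at this period
    2 * sqrt(sum(coef(fit)[c("s", "co")]^2))   # peak-to-peak amplitude
  })
  data.frame(period_years = periods / frequency, ptop = ptop)
}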

Now, I have been doing these periodograms using only the time periods inherent in the data. So if the data was monthly, I’ve looked at all possible integral lengths of monthly cycles, starting with two-month, three-month, four-month, and five-month cycles, and so on up to cycles with periods as long as a third of the number of months in the data.

And on the other hand, if the data was yearly, I’ve looked at all possible integral lengths of yearly cycles, starting with two-year cycles, then three-year and four-year cycles, and so on up to a third of the number of years in the data.

(I mistrust reports of the longer cycles, and any cycle with a period longer than a third of the dataset is not really worth calculating—it will fool you every time.)

So I was going along in blissful ignorance when an alert WUWT reader, Greg Goodman, pointed out that there was no reason to restrict myself to integral periods. He noted that I could subdivide the period steps and gain greater resolution. I was dismissive of his idea. I allowed that it could be done, and that it was an interesting idea, but I said I didn’t think it would gain any real resolution. Hang on, let me get my exact words and the graphic I used in that thread to illustrate my point … here’s what I said:

In any case, I’m still not convinced that the procedure will give any real increase in information. Here’s the difference, for example, between monthly and annual data:

[Figure: periodograms of monthly and annual sunspot data]

I think that if I increase the resolution on the sunspot periodogram while still using the annual data, it won’t look anything like the real monthly data … I’ll report back on that one.

Well … this is the promised “report back”. As you can see, both the monthly and the annual data show the peak at 11 years. The monthly data shows more detail, resolving individual peaks at 10, 11, and ~11.9 years. The test I outlined seemed a good one: see what happens when annual data is sampled at monthly intervals. So here’s what I did. I took the SIDC monthly sunspot data. For the annual data, I didn’t use the SIDC annual data. Instead, I averaged the SIDC monthly data, so that I knew my annual series was a true annual average of that particular monthly dataset. Figure 2 shows the monthly data in red, and the annually averaged monthly data in black.

Figure 2. Monthly SIDC sunspot data (red), and the same monthly SIDC data annually averaged (black).

So to be clear, the data shown as the black line is the annual average of the monthly data shown as the red line.
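The averaging step itself is simple. A sketch in R (the file and column names here are hypothetical; the real CSV files are in the linked zip):

monthly <- read.csv("sidc_monthly.csv")$sunspots # hypothetical file/column names
stopifnot(length(monthly) %% 12 == 0)            # assumes whole years of data
annual <- colMeans(matrix(monthly, nrow = 12))   # each column is one year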

Next, I modified my periodogram-generating computer function to allow for fractional time sampling of the data.

Then I calculated three periodograms. The first is the periodogram of the annually averaged data, shown in yellow/black below. The second is the periodogram of the underlying monthly data, shown in blue.

And finally, I calculated the third periodogram (shown in cyan) using the annually averaged data, but sampled on a monthly basis … Figures 3 and 4 show those results. I must confess, when I saw them my jaw hit the floor.
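In terms of the sineptop syntax described at the end of this post, the three calls look like this (the variable names are just for illustration):

p_annual <- sineptop(annual_data, frequency = 1)              # yellow/black
p_monthly <- sineptop(monthly_data, frequency = 12)           # blue
p_oversamp <- sineptop(annual_data, frequency = 1, by = 1/12) # cyan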


Figures 3 and 4. Upper panel shows the periodogram of annual sunspot data (yellow/black), monthly sunspot data (blue), and monthly-sampled annual data (cyan). Lower panel is a closeup of the upper panel, showing the period from seven to twenty-five years.

You can see why my jaw hit the floor. The blue line (periodogram of actual monthly data) is almost identical to the cyan line (periodogram of annual data sampled at monthly intervals). Who knew?

It turns out that, contrary to my expectation, the information about the strength of the monthly cycles is NOT lost when the individual monthly values are subsumed into annual averages. Somehow, that information still exists and can be retrieved. When I take the annual data and analyze it at monthly resolution, the information is still there. Not exactly, to be sure; as you can see, the process is not perfect or precise.

But it is astoundingly accurate. I absolutely didn’t expect that.

Now, I suppose some signal analysis guy will pop up and say he knew that all along and abuse me for my colossal ignorance … but I taught myself this stuff, and all I can say is, it sure was a surprise to me.

Truly, I don’t understand this result. For many, many years I’ve thought that when you average data in chunks, like say averaging monthly data into yearly data, that all of the monthly information was gone. Lost. Kaput. Irretrievable.

But that doesn’t seem to be the case at all. It seems that very little of the power-spectrum information is lost in the annual averaging process.

Naturally, this brings up the next question—is this procedure invertible? That is to say, given a periodogram such as the one above, can I run the process in reverse? Can I start from the annual data, calculate the monthly periodogram from that annual data, and then invert the periodogram to give me back the monthly data? That would be really amazing … but I’m pessimistic.

My guess is no, the periodogram can’t be inverted to give me back the monthly data … but given how poor my last pathetic guess (at the head of this post) was, I’ll certainly give it a try. Any assistance gladly accepted.
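One note for anyone who wants to try: inversion would need the phase of each fitted sine as well as its amplitude, since the periodogram as plotted keeps only the amplitudes. A sketch of the summation step, assuming the sine and cosine coefficients have been saved for each period:

# Hypothetical reconstruction by summing the fitted sinusoids. "fits" is
# a list of (a, b, period) records saved from the fitting step. Because
# the fitted components are not orthogonal, this is approximate at best.
reconstruct <- function(fits, t) {
  Reduce(`+`, lapply(fits, function(f) {
    f$a * sin(2 * pi * t / f$period) + f$b * cos(2 * pi * t / f$period)
  }))
}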

Like I said above … once again all I can say is, I’ll report back on that one.

Regards to all,

w.

Acknowledgement: My thanks to WUWT reader Greg Goodman for the suggestion to investigate fractional time periods.

For Clarity: If you disagree with something someone says, please quote the exact words you disagree with. It avoids all kinds of misunderstandings.

Data and Code: I’ve collected the R code, the R functions, and the data into a small (22KB) zipped folder called “Oversampling Folder”. It includes the monthly and annual sunspot data as CSV files. If you set your R working directory to that folder, it should be turnkey.

For the original data:

SIDC Annual Sunspot Data

SIDC Monthly Sunspot Data

Note that, following the advice of Leif Svalgaard, I have increased all of the pre-1947 SIDC sunspot values by 20% to correct for the change in counting methods at that time. The change is immaterial for this analysis.

Changes In The R Function: The function to generate the periodogram, named “sineptop” (for sine peak-to-peak), was previously called like this:

sineptop(annual_data, frequency = 1)

or like this:

sineptop(monthly_data, frequency = 12)

The “frequency” argument gives the number of data points per year.

It can still be called like that. In addition, however, the new sineptop syntax for e.g. monthly sampling an annual dataset looks like this:

sineptop(annual_data, frequency = 1, by = 1/12)

If the “by” argument is not specified it defaults to 1, so the original syntax still works unchanged.

167 Comments
Bernie Hutchins
May 27, 2014 7:18 pm

Greg Locock on May 27, 2014 at 4:44 pm suggests a test signal 1,1,1,1 and notes correctly that zero-padding it does not give the correct answer. Indeed true. But this is not a good example because it IS bandlimited – bandlimited to 0 in fact. It certainly looks like DC and anything else is wrong.
Something closer to his second suggestion is much more revealing. Consider 1,1,1,1,1,-1,-1,-1,-1,-1. This APPEARS to be a square wave. A true square wave is NOT bandlimited. But the FFT only “understands” bandlimited (to 1/2 the sampling frequency) signals. We may feel that the 10 samples given “ought to be” enough to TELL the FFT it is a true square wave. What does the FFT “think” it is? Indeed the same interpolation scheme I mentioned at 4:24PM gives us the answer (top panel):
http://electronotes.netfirms.com/FFTInterpSquare.jpg
The red waveform is exactly the unique function that IS bandlimited to available FFT frequencies (DC and five others in this case) which goes through the 10 points. It wiggles! That nice square wave was just in your mind.
[Incidentally, the same problem comes up in trying to use the FFT to calculate Fourier Series coefficients. Here (blue dots in bottom panel) we have a fundamental k=1, a third harmonic at k=3, roughly 1/3 the fundamental, and a fifth harmonic k=5, roughly 1/5. Crude. If we want more FS coefficients, and more ACCURATE ones, we need a lot more samples, and it eventually always falls short for high enough k (due to aliasing). ]
The FFT is exactly symmetric with regard to time and frequency. Thus we CAN interpolate the spectrum by putting zeros in the MIDDLE of the time series (not at the end) as shown by the red curve in the second panel. Nothing remarkable here as far as I can see.
Not enough room here for all the details.
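To make the interpolation scheme above concrete, here is a minimal R sketch of this kind of band-limited interpolation (an illustration only, not Hutchins’s own MATLAB code): the zeros go between the positive and negative frequencies of the spectrum.
# Interpolate N real samples to M > N points by zero-padding the FFT
# between the positive and negative frequencies.
fft_interp <- function(x, M) {
  N <- length(x)
  X <- fft(x)
  half <- floor(N / 2)
  Y <- complex(length.out = M)                  # all-zero M-point spectrum
  Y[1:(half + 1)] <- X[1:(half + 1)]            # DC and positive frequencies
  Y[(M - N + half + 2):M] <- X[(half + 2):N]    # negative frequencies at the end
  if (N %% 2 == 0) {                            # even N: split the Nyquist bin
    Y[half + 1] <- X[half + 1] / 2
    Y[M - half + 1] <- X[half + 1] / 2
  }
  Re(fft(Y, inverse = TRUE)) / N                # interpolated waveform
}
x <- c(1, 1, 1, 1, 1, -1, -1, -1, -1, -1)       # the ten samples above
y <- fft_interp(x, 100)                         # plot(y, type = "l"): it wiggles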

Nick Stokes
May 27, 2014 7:42 pm

1sky1 says: May 27, 2014 at 5:57 pm
“Poor Fourier must be rolling over in his grave. Few here comprehend that his series of harmonically-spaced, finite amplitude sinusoids provides a valid decomposition only for L- periodic signals, with 1/L as the fundamental frequency. Any other set of LS-fitted sinusoids fails to add up to the signal, especially when the underlying signal isn’t periodic.”

But Willis isn’t looking for a decomposition. He’s looking for a band-limited filter. And getting a displaced sinc function, which isn’t optimal, but works.
“If you are trying to tell me that the FFT of a DC signal is identical to that of a step function then something is very wrong.”
You can pad with anything. Periodic extension if you want. That would answer this objection. But it doesn’t help much where padding is used in practice.

Greg
May 27, 2014 7:48 pm

“Greg Locock on May 27, 2014 at 4:44 pm suggests a test signal 1,1,1,1 and notes correctly that zero-padding it does not give the correct answer.”
Hardly surprising. Doing that changes the mean, the variance, and the stationarity of the data. Not bad for a few zeros.

Bernie Hutchins
May 27, 2014 7:52 pm

Greg asks on May 27, 2014 at 6:01 pm:
“Are you aware of a less arbitrary way of doing that? Or since it’s pretty arbitrary maybe that’s a good as anything.”
If you are less concerned with fighting off noise (not always an option – and your applications seem not to appreciate noise!) and are not averse to that “big echo” that eventually comes back, the inverse filter, where stable, is “perfect” and not arbitrary. But I agree that a certain amount of “bubble gum and baling wire” is usually necessary in practice.
From your questions and comments on this blog I infer that you are pretty good at appreciating and attacking the difficulties that always come up in practice. The flip side is that useful suggestions may be few and far between!
If I can be of help, feel free to email me (just Google my name).
Bernie

Greg
May 27, 2014 8:16 pm

“If I can be of help…”
Thanks, will do.

May 27, 2014 9:08 pm

Is that a “Versalog” rule?

RC Saumarez
May 27, 2014 11:42 pm

There seems to be some dispute over Fourier interpolation. It is simply time-domain interpolation achieved by artificially increasing the sampling rate. If one has a correctly sampled signal, it is exact provided that the fundamental of the signal is an exact period of the transform length. If not, there are potential errors: if there is a component at, say, 1.5 times the fundamental frequency, the last point in the signal record will not predict the first point (by Taylor’s theorem), introducing a discontinuity that results in a Gibbs phenomenon when one tries to interpolate. This is simply a reflection of the fact that a DFT is not a Fourier transform but a series, and that the sinc function extends to +- infinity. Obviously, in practice one uses compromises with windowing, long records, etc.
The issue here is more subtle: can one reconstruct the original data series from the “slow Fourier transform”? This is completely straightforward. Once the coefficients of the series have been computed, simply sum the series, either by brute-force evaluation at each (irregular) point in the time series, or by substitution of the coefficients into a DFT (with as high a sampling frequency as one likes) and inversion of the transform.
I have my doubts about the theory of the “slow Fourier Transform” since the orthogonality property is not satisfied for non integer multiples of the fundamental frequency. It is not a DFT and it is not a true Fourier Transform although it is notionally continuous in frequency because the time period of integration is limited.

Greg
May 28, 2014 2:18 am

“I have my doubts about the theory of the “slow Fourier Transform” ”
Willis called this an SFT, but he’s aware the name is tongue in cheek. For those drawing too close a parallel with the DFT: what Willis is doing is not a “transform” but a spectral analysis. It’s more akin to the chirp-z analysis I usually do.
If he used equal frequency steps, rather than period steps, it should be possible to select the subset of his amplitudes that match DFT frequencies; they should be close to a DFT and hence be a transform that could be used to reconstruct the original.
If all you want to know is where the peaks lie, this spectral analysis is fine and has the advantages Willis states. But it is not to be confused with a frequency-domain _transformation_ that is equivalent to the time-domain representation.

Dinostratus
May 28, 2014 4:21 am

“Now, I suppose some signal analysis guy will pop up and say he knew that all along and abuse me for my colossal ignorance … but I taught myself this stuff, and all I can say is, it sure was a surprise to me.”
Well, take a class or read a book or something. There are plenty of surprising results out there. Don’t you want to get to them as soon as possible? This post comes across like a retired guy tinkering with his 1953 Desoto and complaining that the neighborhood kids don’t care.

Bernie Hutchins
May 28, 2014 9:55 am

Dinostratus said in part May 28, 2014 at 4:21 am:
“… Don’t you want to get to them as soon as possible? This post comes across like a retired guy tinkering with his 1953 Desoto and complaining that the neighborhood kids don’t care. …”
You make a fair point. But……
While it is always presumptuous (and dangerous) to confuse one’s self with Feynman, who thrived on figuring everything out for himself, there is great merit in getting a hint of an idea from one’s own thinking, or from another source, and developing it carefully. Later you can enjoy the fact that someone else figured it out TOO, the relative timing becoming less important.
Too often (especially in climate?) people have embraced a shiny new tool (perhaps only new to them) and applied it to “real” data before seeing how it worked with simple “test signals” like a pulse, a sinewave, white noise, red noise, etc. Elementary “bench-testing” is always advisable and builds confidence.
This “retired guy” supposes that your “as soon as possible” should be “as soon as it is carefully considered and understood”. Some folks do not thrive on books and classes, many of which leave students with incomplete understanding, or even misled. Back on May 21, Dinostratus showed just such a misunderstanding of the FT vs. FFT on this blog.
Listen to the guy with the 53 Desoto!

Greg
May 28, 2014 10:20 am

Agreed, Willis’ “beginner’s mind” approach is good.
I was always slow in my studies because I insisted on understanding what I was doing, not just learning to repeat it. That meant I was always explaining stuff to the guys who got the top marks.

Steve from Rockwood
May 28, 2014 1:28 pm

Padding a time series with zeros would lead to a mess of high frequencies unless the time series was first detrended, the DC component removed AND then made periodic by a tapering filter, such that the end of the time series was continuous with the start. This is why Fourier analysis of temperature trends is such a useless exercise. You have to throw the baby out with the bath water in order to follow best practice.
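A sketch of that preconditioning in R (an illustration of the steps described above, not code from the comment):
# x: the time series to be analyzed
n <- length(x)
x_detr <- residuals(lm(x ~ seq_len(n)))                       # remove trend and DC
hann <- 0.5 * (1 - cos(2 * pi * (seq_len(n) - 1) / (n - 1)))  # tapering window
spec <- Mod(fft(x_detr * hann))^2                             # raw periodogram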

Steve Fitzpatrick
May 28, 2014 2:03 pm

Three apparent impossibilities:
“Nixon goes to China.”
“Hell freezes over.”
“Tamino offers a reasoned and positive contribution.”
What an unexpected development.

Brian
May 28, 2014 3:36 pm

Willis, I have experienced the same feelings of accomplishment that one gets when a self-taught person invents a better way to do something, only to be told by an expert in the field that it isn’t ‘new’. As a hardware engineer, I was given access to a mainframe computer (a mighty one, in the early eighties) just because it was there. It came with an operating system and a file editor. As a means to an end – I had no budget to hire a programmer or buy software – I taught myself to write programs, and “invented” a method of dividing a huge file into halves, enabling me to find an entry in fewer “read” operations. It greatly speeded up my program’s execution. I showed the method to others, and was told I had reinvented the “binary” search, something that is taught to programmers in their first year. I taught myself so much stuff, I later had a second career as a programmer, achieving the title of SENIOR applications developer… Never stop learning.

Bernie Hutchins
May 28, 2014 3:37 pm

Steve from Rockwood said:
May 28, 2014 at 1:28 pm
“Padding a time series with zeros would lead to a mess of high frequencies unless the time series was first detrended, the DC component removed AND then made periodic by a tapering filter, such that the end of the time series was continuous with the start. This is why Fourier analysis of temperature trends is such a useless exercise. You have to throw the baby out with the bath water in order to follow best practice.”
Exactly HOW are you padding the time series with zeros? There are AT LEAST three meanings to “zero-padding”. So if the time series is
[1, 1, 1, -1, -1, -1, -1, -1, 1, 1]
What does it look like padded with 10 zeros, ready for the FFT?
Thanks Steve, a specific example should help a lot.
Also I guess I don’t understand what “made periodic by a tapering filter” means.

1sky1
May 28, 2014 5:12 pm

Nick Stokes:
The very fact that Willis uses an equally-spaced SET of periods and calls this methodology either “slow F. transform” or “periodogram” belies your contention that he’s looking for a “band-limited filter.” The latter would produce a time-VARYING output, as in wavelet analysis; what he obtains for each member of that set by the ages-old curve-fitting procedure called sinusoidal regression is nothing more than the CONSTANT amplitude and phase. Nowhere does he concern himself with the NON-orthogonality of his results, which do NOT “work” with signals encountered in geophysics in any reliable–let alone “advantageous”–way.

1sky1
May 28, 2014 5:19 pm

Bernie hutchins:
Since you seem to have the time for much numerical experimentation, I’d like to send you a brief segment of real-world data for a blind test of Prony’s method, whose basis functions strike me as being quite suitable.

Nick Stokes
May 28, 2014 5:21 pm

1sky1 says: May 28, 2014 at 5:12 pm
“The latter would produce a time-VARYING output, as in wavelet analysis; what he obtains for each member of that set by the ages-old curve-fitting procedure called sinusoidal regression is nothing more than the CONSTANT amplitude and phase.”

It’s the ultimate in band-limited filters; it produces a single frequency response.
It’s as if he used his signal to excite a whole range of perfect tuning forks, and identified periodicity by a strong response.

1sky1
May 28, 2014 6:01 pm

Nick Stokes:
Your fantastic analogy with “perfect tuning forks” indicates scant mathematical comprehension of spectral windows in finite-length records and the demonstrable absence of strictly periodic components in geophysical signals not driven by astronomical factors.

Bernie Hutchins
May 28, 2014 6:19 pm

1sky1 said May 28, 2014 at 5:19 pm
“Bernie hutchins:
Since you seem to have the time for much numerical experimentation, I’d like to send you a brief segment of real-world data for a blind test of Prony’s method, whose basis functions strike me as being quite suitable.”
The situation may not be overly promising. If the data is noisy, we would have to try it provisionally. Also I am not sure how much a “brief segment” is. I only need 2N samples to try up to order N, and that should be easy to try – glad to. Presumably this could be numbers posted here or by email to me. I have very little experience with (or confidence in) data files other than .mat. Most of my stuff is done with test signals generated by the program itself. No recent experience with Prony and real stuff – last time was pulses propagated in dry and (then) wet sand some years ago. But sure – let’s consider what can be tried.

Nick Stokes
May 28, 2014 7:12 pm

1sky1 says: May 28, 2014 at 6:01 pm
“Your fantastic analogy with “perfect tuning forks” indicates scant mathematical comprehension of spectral windows in finite-length records”

My mathematical comprehension of spectral windows etc is fine. A lossless tuning fork has highly frequency dependent acoustic impedance. The amount of energy that it can pick up from a finite burst of signal is a very reasonable measure of periodicities in that signal. And that is what he is measuring.

Bernie Hutchins
May 28, 2014 7:33 pm

1sky1 said, replying to Nick Stokes May 28, 2014 at 6:01 pm:
“Your fantastic analogy with “perfect tuning forks” indicates scant mathematical comprehension of spectral windows in finite-length records and the demonstrable absence of strictly periodic components in geophysical signals not driven by astronomical factors.”
A bit over the top perhaps? As one who has actually built filter banks, I am not averse to the notion of, at least, tuning forks of high Q. Before the FFT (etc.) this was a major way of doing spectral analysis – the other being heterodyning through a single sharp bandpass.
When 1sky1 says “indicates scant mathematical comprehension of spectral windows in finite-length records” we are left to contemplate all those “records” that are not finite length. None!
Additionally, 1sky1 says: “… and the demonstrable absence of strictly periodic components in geophysical signals not driven by astronomical factors”. Again “strictly periodic” (astronomical or otherwise) suggests to me infinite duration. And don’t we always try to show that something is absent by first looking for it by conventional means?
For myself, I have a pretty good idea what 1sky1 is trying to say, and believe it could be brought forward as a valid point. But it needs considerable elaboration. To many it probably sounds like a tricky gotcha “bar bet” engineers supposedly pose to each other: no signals are bandlimited – all signals are bandlimited.

Nick Stokes
May 28, 2014 8:23 pm

Bernie Hutchins says: May 28, 2014 at 7:33 pm
“the other being heterodyning through a single sharp bandpass.”

Actually, it could also be said that Willis is heterodyning. The regression basically multiplies the signal by the sinusoid and sums (integrates). If the signal has a near-periodic component, you get a low frequency response. The summation (integration) is a low pass filter, so the result is again a measure of the strength of the periodicity and how close the test frequency is to it.
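A toy sketch of that view in R (purely for illustration): multiply by a complex test sinusoid and average; the averaged product responds strongly when the test period matches a periodicity in the signal.
# The mean of the product acts as a low-pass filter on the mixed-down signal.
t <- 1:1000                                          # monthly samples
sig <- sin(2 * pi * t / 132) + rnorm(1000, sd = 0.5) # ~11-year cycle plus noise
resp <- function(p) 2 * abs(mean(sig * exp(-2i * pi * t / p)))
sapply(c(100, 132, 170), resp)                       # strongest response at p = 132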
I’m trying to write a post on this. It’s interesting stuff.