Guest Post by Willis Eschenbach
I’ve developed a curious kind of algorithm that I’ve humorously called “slow Fourier analysis”, but which might be better described as a method for direct spectral analysis. The basic concept is quite simple. I fit a sine wave with a certain cycle length, say 19 months, to the dataset of interest, and I note the peak-to-peak amplitude of this best-fit sine wave. I repeat this process for the various cycle lengths of interest, and put that all together as a periodogram. The periodogram shows, for each cycle length, how large a sine wave best fits the data. This method has worked a treat, but yesterday, this odd method of mine handed me a big surprise.
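To make that concrete, here is a bare-bones sketch of the concept in R (a simplified illustration only; the actual “sineptop” function is in the folder linked at the end of the post). For each trial period it fits a sine of unknown phase by least squares and records the peak-to-peak amplitude of the fit:

# Minimal sketch of direct spectral analysis: for each trial period,
# fit a sine wave by least squares and record its peak-to-peak amplitude.
ptop_at_period <- function(x, period) {
  t   <- seq_along(x)
  fit <- lm(x ~ sin(2 * pi * t / period) + cos(2 * pi * t / period))
  bs  <- coef(fit)[2]           # sine coefficient
  bc  <- coef(fit)[3]           # cosine coefficient
  2 * sqrt(bs^2 + bc^2)         # peak-to-peak amplitude of the best-fit sine
}

slow_fourier <- function(x, periods) {
  sapply(periods, function(p) ptop_at_period(x, p))
}

# e.g. periods from 2 time steps up to a third of the record length:
# pgram <- slow_fourier(my_data, 2:floor(length(my_data) / 3))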
Now, I have been doing these periodograms utilizing only the time periods inherent in the data. So if the data was monthly, I’ve looked at all possible integral lengths of monthly cycles, starting with two-month, three-month, four-month, and five-month cycles, and so on up to cycles with periods as long as a third of the number of months in the data.
And on the other hand, if the data was yearly, I’ve looked at all possible integral lengths of yearly cycles, starting with two year cycles, three year cycles, four years, and so on up to a third of the number of years in the data.
(I mistrust reports of the longer cycles, and any cycle with a period longer than a third of the dataset is not really worth calculating—it will fool you every time.)
So I was going along in blissful ignorance when an alert WUWT reader, Greg Goodman, pointed out that there was no reason to restrict myself to integral periods. He noted that I could subdivide the results and gain greater resolution. I was dismissive of his idea. I agreed that it could be done, and that it was an interesting suggestion, but I said I didn’t think it would gain any real resolution. Hang on, let me get my exact words and the graphic I used in that thread to illustrate my point … here’s what I said:
In any case, I’m still not convinced that the procedure will give any real increase in information. Here’s the difference, for example, between monthly and annual data:
I think that if I increase the resolution on the sunspot periodogram while still using the annual data, it won’t look anything like the real monthly data … I’ll report back on that one.
Well … this is the promised “report back”. As you can see, both the monthly and the annual data show the peak at 11 years. The monthly data shows more detail, as it resolves the individual peaks, at 10, 11, and ~ 11.9 years. I thought that the test I outlined would be a good one. I’d see what happened when I sampled annual data at monthly intervals. So here’s what I did. I took the SIDC monthly sunspot data. For the annual data, I didn’t use the SIDC annual data. Instead, I averaged the SIDC monthly data to give me annual data that I knew was a true annual average of that particular monthly dataset. Figure 2 shows the monthly data in red, and the annually averaged monthly data in black.
Figure 2. Monthly SIDC sunspot data (red), and the same monthly SIDC data annually averaged (black).
So to be clear, the data shown as the black line is the annual average of the monthly data shown as the red line.
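For the record, the averaging step itself is trivial; in R it is just this (assuming the monthly values are in a numeric vector whose length is a whole number of years):

# each column of the 12-row matrix is one year; colMeans gives the annual averages
annual_data <- colMeans(matrix(monthly_data, nrow = 12))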
Next, I modified my periodogram-generating computer function to allow for fractional time sampling of the data.
Finally, I calculated three periodograms. One is the periodogram of the annually averaged data, shown in yellow/black below. The second is the periodogram of the underlying monthly data, shown in blue.
And finally, I calculated the third periodogram (shown in cyan) using the annually averaged data, but sampled on a monthly basis … Figures 3 and 4 show those results. I must confess, when I saw them my jaw hit the floor.
Figures 3 and 4. Upper panel shows the periodogram of annual sunspot data (yellow/black), monthly sunspot data (blue), and monthly-sampled annual data (cyan). Lower panel is a closeup of the upper panel, showing the period from seven to twenty-five years.
You can see why my jaw hit the floor. The blue line (periodogram of actual monthly data) is almost identical to the cyan line (periodogram of annual data sampled at monthly intervals). Who knew?
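For reference, in terms of the “sineptop” function described under “Changes In The R Function” below, the three curves correspond to calls along these lines (the result names are just labels I am using here):

pg_annual      <- sineptop(annual_data, frequency = 1)             # annual data (yellow/black)
pg_monthly     <- sineptop(monthly_data, frequency = 12)           # monthly data (blue)
pg_oversampled <- sineptop(annual_data, frequency = 1, by = 1/12)  # annual data, sampled monthly (cyan)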
It turns out that contrary to my expectation, the information about the strength of the monthly cycles is NOT lost when the individual monthly values are subsumed into annual averages. Somehow, that information still exists and can be retrieved. When I take the annual data and analyze it at monthly frequencies, the information is still there. Not exactly, to be sure; as you can see, the process is not perfect or precise.
But it is astoundingly accurate. I absolutely didn’t expect that.
Now, I suppose some signal analysis guy will pop up and say he knew that all along and abuse me for my colossal ignorance … but I taught myself this stuff, and all I can say is, it sure was a surprise to me.
Truly, I don’t understand this result. For many, many years I’ve thought that when you average data in chunks, like say averaging monthly data into yearly data, that all of the monthly information was gone. Lost. Kaput. Irretrievable.
But that doesn’t seem to be the case at all. It seems that very little of the power spectrum information was lost at all as a result of the annual averaging process.
Naturally, of course, this brings up the next question—is this procedure invertible? That is to say, given a periodogram such as the one above, can I run the process in reverse? Can I start from the annual data, calculate the monthly periodogram from that annual data, and then invert the periodogram to give me back the monthly data? That would be really amazing … but I’m pessimistic.
My guess is no, the periodogram can’t be inverted to give me back the monthly data … but given how poor my last pathetic guess (at the head of this post) was, I’ll certainly give it a try. Any assistance gladly accepted.
Like I said above … once again all I can say is, I’ll report back on that one.
Regards to all,
w.
Acknowledgement: My thanks to WUWT reader Greg Goodman for the suggestion to investigate fractional time periods.
For Clarity: If you disagree with something someone says, please quote the exact words you disagree with. It avoids all kinds of misunderstandings.
Data and Code: I’ve collected the R code, the R functions, and the data into a small (22KB) zipped folder called “Oversampling Folder”. It includes the monthly and annual sunspot data as CSV files. If you change your R workspace to the folder, it should be turnkey.
For the original data:
SIDC Monthly Sunspot Data. Note that, following the advice of Leif Svalgaard, I have increased all of the pre-1947 SIDC sunspot values by 20%, to correct for the change in counting methods at that time. The change is immaterial for this analysis.
Changes In The R Function: The function to generate the periodogram, named “sineptop” (for sine peak-to-peak), was previously called like this:
sineptop(annual_data, frequency = 1)
or like this:
sineptop(monthly_data, frequency = 12)
The “frequency” variable identifies the data as having that many periods per year.
It can still be called like that. In addition, however, the new sineptop syntax for e.g. monthly sampling an annual dataset looks like this:
sineptop(annual_data, frequency = 1, by = 1/12)
If the “by” variable is not specified it is assumed to be 1, so you don’t need to specify it, and the original syntax still works.
Nick Stokes said in part May 28, 2014 at 8:23 pm:
“…..Actually, it could also be said that Willis is heterodyning….”
Note also that Greg said earlier, May 28, 2014 at 2:18 am
” … It’s more akin the the chirp-z analysis I usually do….”
Quite so. That brings back memories. The term “Chirp” kind of says it all.
I’ve discussed some of these ideas in a post here.
1sky1 says:
May 28, 2014 at 5:12 pm
As usual, 1sky1 has shown up to tell me all about how I’m doing everything the wrong way. 1sky1 never shows us how to do it the right way, of course.
I asked him if he’d do that once, so we could learn from him. Boy, was that a mistake!
He sent back a very nasty refusal, filled with spleen and vituperation, with a claim that I was trying to get him to perform valuable work for nothing … yeah, right, that’s my secret plan, to get 1sky1 to do “valuable work” for nothing …
As to the non-orthogonality of my results, I have indeed discussed it before, even in this thread, but for the current quest … so what? I’m using it to look for strong signals, not to decompose the signal into its component parts. As I’m sure 1sky1 will understand if he takes the time to think about it (which likely means he won’t understand it at all), in such a situation non-orthogonality is meaningless. Strong signals are strong signals, orthogonal or not …
w.
1sky1 says:
May 28, 2014 at 6:01 pm
Case in point for 1sky1’s habit of only telling people that they’re doing it totally wrong, but never revealing his secret plan for doing it right … instead, we just get his self assessment of his own abilities, which, needless to say, are stupendous and fantastic, he’s a legend in his own mind …
Of course, as is far too common in such cases, despite his huge claims, 1sky1 doesn’t have the balls to sign his own words, so he can walk away from them tomorrow and come back as “out2sea” and no one will be the wiser.
Pathetic …
w.
Nick Stokes says:
May 28, 2014 at 9:23 pm
Thanks for that, Nick, it’s an interesting post.
w.
Proper “Zero-Padding” of a time sequence to interpolate the FFT spectrum. Put the zeros in the middle, not at the end.
Here is my simple experiment. We start with 10 samples, n=0 to 9, Sin(2*pi*n/10) + Sin(2*2*pi*n/10) + Sin(3*2*pi*n/10). That’s three sinewaves of frequency 1/10, 2/10, and 3/10. The FFT magnitude has just these three frequencies (plus upper sides of course). See top panel of figure here.
http://electronotes.netfirms.com/ThreeSin.jpg
Adding 90 zeros to the END of the original time sequence of 10 points, we get the FFT magnitude as shown in the middle panel (blue circles – original spectral points have red centers). Kind of wild oscillations, but it does go through original points. NOW, putting the 90 zeros in the MIDDLE gives a much smoother result (lower panel).
The horrible middle panel (zeros at end) looks remarkably like designing an FIR filter and forgetting to put in the linear phase. I suspect it’s the same thing somehow.
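For anyone who wants to reproduce the experiment, here is a minimal R version of it (my sketch, not the original code, and taking “in the middle” to mean splitting the ten samples between the fifth and sixth points):

n <- 0:9
x <- sin(2*pi*n/10) + sin(2*2*pi*n/10) + sin(3*2*pi*n/10)   # three sines: 1/10, 2/10, 3/10

X_orig   <- Mod(fft(x))                               # original 10-point spectrum
X_end    <- Mod(fft(c(x, rep(0, 90))))                # 90 zeros appended at the END
X_middle <- Mod(fft(c(x[1:5], rep(0, 90), x[6:10])))  # 90 zeros inserted in the MIDDLE

# plot(X_end, type = "b"); lines(X_middle, col = "red")   # compare the two interpolations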
@Bernie Hutchins following your invitation, I sent you an email yesterday to a yahoo address. Is that still valid?
Sin(2*pi*n/10) + Sin(2*2*pi*n/10) + Sin(3*2*pi*n/10)
Your test fn is basically a negative ramp repeating every 10 pts.
If you end-pad it you still have the ramp, but with a pause between each cycle.
If you pad the middle you have a ramp with a long flat stretch half way down. As you have shown, the FFT is very different from that of the original.
You seem to think this result is “better” in some way because it looks nicer?
From your comparison you can see that you have drastically reduced the resolution, and have two peaks instead of three. The middle frequency is still there as a point but is now indistinguishable from any other point in the spectrum.
You have lost one of the three input signals. How is this better?
However, I think your graph is a good illustration of how padding, while it gives better plotting resolution and does improve the accuracy, still does not give the correct peak. The basic uncertainty due to insufficient sampling of longer cycles remains, and care is needed not to read undue accuracy into the results.
It would be interesting to do comparative plots for padding with, say, 95, 135, or 155 zeros (rather arbitrary choices, trying to avoid integer multiples of the signals and the original sample length).
I ran a sine wave with a period of 11 years through the function “sineptop”.
There are all sorts of artifacts, with side lobes that are 20% the size of the main lobe. The effect of what you’re doing is creating some gnarly window and filtering effects: partly because you are not windowing (high-passing) the sample, and partly because your function does something funny to the sine wave itself (I haven’t figured out what yet; probably aliasing, because you didn’t low-pass filter the interpolation).
Please test with basic known waveforms that have known transform results before continuing. You aren’t getting useful information out of these transforms.
picture of result:
test data (11 year sine wave): https://www.dropbox.com/s/ieglk36hnxa430f/test-sin.csv
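(If you would rather generate the test series than download the CSV, something like this in R will do; the 55-year record length is an arbitrary choice of mine:)

months   <- 0:(12 * 55 - 1)                    # 55 years of monthly samples
test_sin <- sin(2 * pi * months / (11 * 12))   # one full cycle every 132 months = 11 years
# test_pg <- sineptop(test_sin, frequency = 12)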
Yep, that’s magnitude of sinc fn.
What the periodogram for an 11 year sine wave should look like
red = unwindowed
blue = kaiser window with Beta = 2
picture:
source: https://www.dropbox.com/sh/5wh9dbja6x37nfa/AADhfZFr6JXWF2vCDAj407Hoa
first min 11+60mo second 11+12mo
sin(pi*x)/(pi*x); 12m= 2*pi ;
What the periodogram for an 11 year sine wave should look like
red = unwindowed
whoever told you the red line is unwindowed lied.
It has the same rectangular window Willis used.
😉
Just in case that is too subtle for anyone: the rectangular window that is producing the sinc-shaped artefacts is the finite length of the data sample itself.
This distortion will be applied to all peaks in the spectrum and then overlaid.
Where the peaks overlap they will add to make something looking more significant.
I pointed out several days ago that triplets in Willis’ periodogram of SSN around 9 and 18y were probably an echo of the main 11y triplet.
I know this is going to go against Willis’ gut instincts not to mess with the data, but I think distorting the input data with a window fn of some sort is an essential compromise to avoid the finite sample period snapshot (rectangular window) throwing artefacts everywhere.
Sadly these window functions also reduce frequency resolution 🙁
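In R, the kind of thing I mean looks roughly like this (a minimal sketch using a Hann window for illustration; “my_series” stands for whatever data is being analysed, and other window shapes are an obvious variation):

hann <- function(n) 0.5 - 0.5 * cos(2 * pi * (0:(n - 1)) / (n - 1))   # Hann (raised-cosine) taper
windowed_data <- my_series * hann(length(my_series))                  # taper the ends toward zero
# 'windowed_data' then goes into the periodogram in place of 'my_series'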
I see lots of hard sums here.
So are we consigned to a fiery end on an arid planet, burned to death by our own waste products, or not?
🙂 <- note smiley
Bernie Hutchins says:
May 28, 2014 at 3:37 pm
Steve from Rockwood said:
May 28, 2014 at 1:28 pm
Exactly HOW are you padding the time series with zeros? There are AT LEAST three meanings to “zero-padding”. So if the time series is
[1, 1, 1, -1, -1, -1, -1, -1, 1, 1]
What does it look like padded with 10 zeros, ready for the FFT?
====================================================
1,1,1,-1,-1,-1,-1,-1,1,1,0,0,0,0,0,0,0,0,0,0
Not a great example, Bernie, but imagine starting with 2,2,2,0,0,0,0,0,2,2, which has a DC offset of 1.0. Padding with zeros to 2,2,2,0,0,0,0,0,2,2,0,0,0,0,0,0 to produce 16 terms would produce high frequencies at the 2-to-0 transition that you introduced. Removing the DC offset would reduce this to a 1-to-0 transition. Not a great example data set, but I think you know what I mean.
In the case where the first element is not equal to the last, this produces high frequencies because the data set is not periodic – a requirement of the FFT.
2,1,1,-1,-2,-1,-1,-1,1,1 for example needs to repeat as 2,1,1,-1,-2,-1,-1,-1,1,1,2,1,1 etc.
The start and rear ends of the time series would have to be tapered to match before padding.
e.g. 0.5,1,1,-1,-2,-1,-1,-1,1,0.5, then remove the DC offset, then pad with zeroes. You end up with a transition of 1.0, 0.5, 0.0 instead of 0.0, 2.0, 2.0 or 2.0, 2.0, 0.0.
The above is not a great example. Imagine a straight line of temperature increases.
0,0.5,1.0,1.5,2.0,2.5,3.0,3.5 etc…
When you put something like that into an FFT it is assumed to be periodic, so the large offset between the first value (0.0) and the last value (3.5) will require a lot of high frequencies to reproduce the discontinuity, when in fact there is no high-frequency content in the time series; it is a straight line.
An FFT is band-limited. Any trend longer than half the length of the time series must be removed. If you have 128 years of annual data you have to remove any trend longer than 64 years. The fact that subsequent warmings from 1850 onward ended with successively higher temperatures would have to be removed from the data set, like the baby with the bath water. What would be left would be the oscillations (peak in the 40s, low in the 70s, peak in the 2000s), but no overall increase.
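In R, the detrending step is just something like this (a sketch, with “temps” standing in for the annual series):

t         <- seq_along(temps)            # time index for a hypothetical series 'temps'
detrended <- residuals(lm(temps ~ t))    # subtract the best-fit straight line
spec      <- Mod(fft(detrended))         # spectrum without the spurious trend content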
Willis, I posted this on another page. The pdf makes an interesting read and, to me, confirms what you posted about reducing the climate models to simple black boxes.
http://wattsupwiththat.com/2014/05/26/quote-of-the-week-howler-from-the-world-meteorological-organization-what-warming/#comment-1648819
Ferdinand Engelbeen says:
See: http://www.economics.rpi.edu/workingpapers/rpi0411.pdf
========
…
This confirms what Willis posted a while back, that the climate models can all be reduced to a simple “black box” equation.
Greg and Steve from Rockwood –
Greg – I have your email and will get to it later today: it’s 11 AM here in Ithaca – first coffee!
Greg – the test signal is not a ramp, it’s the sum of three sines.
Steve – I intentionally chose the example I proposed because it automatically met two of your criteria. You did however clearly indicate that you are adding zeros to the end, which is one sense of “zero-padding”. Thanks. The part about filtering (windowing?) and the need for it still escapes me. All FFT pairs are periodic automatically.
Steve and Greg –
Isn’t it the case that zero-padding the time sequence interpolates the spectrum? It adds more frequency points but does NOT increase the resolution. No information is added, so how could it? It’s just computing many samples of the DTFT with an FFT. You CAN put the zero string anywhere, and the interpolation will go through the original FFT points, as in the jpg I posted above (repeated here):
http://electronotes.netfirms.com/ThreeSin.jpg
BUT it makes a HUGE difference what happens between the original points. Putting the 90 zeros between points 2 and 3 (not in the jpg) gives an intermediate case between the two lower panels. Which is right? I certainly prefer the bottom panel.
Working on these ideas later today. It’s not simple. Thanks for the feedback.
It appears that interpolating an FFT is more complicated than interpolating a time signal because the signal is assumed real while the FFT is in general complex. It seems that it does involve putting in a correct linear phase. Working on it. Probably too involved to try to post here directly.
Bernie: “Which is right? I certainly prefer the bottom panel.”
You again suggest it’s better but don’t say why, and fail to reply to my noting that you’ve lost one of the peaks. Which is more accurate or whatever is secondary; you now have two identifiable peaks instead of three. The excursions are smaller, but does that make it more pleasing to the eye? I’d rather keep the third peak. Above all, the fact is that there are three frequencies present, not two.
Consider this: the wrap-around that the DFT does is artificial but can’t be helped. What you are proposing makes an artificial break in real, continuous data and inserts something. That means TWO more artificial breaks with accompanying steps (unless you’re really lucky).
I don’t see how this can do anything but degrade the signal.
Your example is atypical in that the split is near a zero crossing, and it still loses important info.
“Greg – the test signal is not a ramp, it’s the sum of three sines.”
What I said is: “Your test fn is basically a negative ramp repeating every 10 pts”; say ramp-like if you prefer. I also noted the loss of resolution and the loss of one of the three test frequencies. That would seem more important than whether it is a ramp or ramp-like. You avoid addressing that.
I’d never heard of this idea before so thought it may be some technique I was unaware of. Having looked I don’t think I’ll be using it.
@Bernie.
You cannot add information to an FFT by padding a time series. When you pad with zeroes you can only add to the start, to the end, or a little of both (the FFT does not care, because the time series has no start or end point, only frequency content). You cannot insert zeroes into the middle of a time series; that corrupts the real frequency content. The main reason for padding with zeroes that I am aware of is the convenience of using the n log n FFT that requires a power-of-two number of points. If you have a 484-point time series, add enough zeroes to get to 512, for example. You may also want FFTs from different time series to have the same number of points, so you can compare the spectra at the same frequencies. One series with 3800 points is padded to 4096 and another with 5000 is truncated to 4096. You did not add any frequency content to the padded time series. You did possibly take away a little low-frequency content by truncating, but not much.
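The padding itself is trivial. In R it looks something like this (a sketch; note that base R’s fft() accepts any length, so the power-of-two padding is a convenience rather than a requirement):

n_target <- 2^ceiling(log2(length(x)))          # next power of two, e.g. 484 -> 512
x_padded <- c(x, rep(0, n_target - length(x)))  # append zeros to the end
X        <- Mod(fft(x_padded))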