Guest Post by Willis Eschenbach
I’ve developed a curious kind of algorithm that I’ve humorously called “slow Fourier analysis”, but which might be better described as a method for direct spectral analysis. The basic concept is quite simple. I fit a sine wave with a certain cycle length, say 19 months, to the dataset of interest, and I note the peak-to-peak amplitude of this best-fit sine wave. I repeat this process for the various cycle lengths of interest, and put it all together as a periodogram. The periodogram shows, for each cycle length, how large a sine wave best fits the data. This method has worked a treat, but yesterday, this odd method of mine handed me a big surprise.
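For concreteness, the fit at a single trial period can be sketched in a few lines (Python/NumPy here rather than the R of the post, and `ptp_amplitude` is my own name for the step, not Willis's `sineptop`): fit a sine and a cosine of the trial period by least squares, and report twice the fitted amplitude as the peak-to-peak value.

```python
import numpy as np

def ptp_amplitude(t, y, period):
    """Peak-to-peak amplitude of the best-fit sine wave of the given period."""
    w = 2 * np.pi / period
    # Fitting sine + cosine + offset lets the phase fall out of a linear fit.
    A = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 2 * np.hypot(coef[0], coef[1])

# Example: a pure 19-month cycle of unit amplitude in 20 years of monthly data
t = np.arange(240.0)
y = np.sin(2 * np.pi * t / 19.0)
print(ptp_amplitude(t, y, 19.0))   # ≈ 2.0: the full peak-to-peak swing
print(ptp_amplitude(t, y, 7.0))    # near 0: no 7-month cycle present
```

Repeating this over a grid of trial periods gives the periodogram.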
Now, I have been doing these periodograms using only the time periods inherent in the data. So if the data was monthly, I’ve looked at all possible integral lengths of monthly cycles, starting with two-month, three-month, four-month, and five-month cycles, and so on up to cycles with periods as long as a third of the number of months in the data.
And on the other hand, if the data was yearly, I’ve looked at all possible integral lengths of yearly cycles, starting with two year cycles, three year cycles, four years, and so on up to a third of the number of years in the data.
(I mistrust reports of the longer cycles, and any cycle with a period longer than a third of the dataset is not really worth calculating—it will fool you every time.)
So I was going along in blissful ignorance when an alert WUWT reader, Greg Goodman, pointed out that there was no reason to restrict myself to integral periods. He noted that I could subdivide the results and gain greater resolution. I was dismissive of his idea. I agreed that you could do it, and that it was an interesting idea, but I said I didn’t think you would gain any real resolution by doing it. Hang on, let me get my exact words and the graphic I used in that thread to illustrate my point … here’s what I said:
In any case, I’m still not convinced that the procedure will give any real increase in information. Here’s the difference, for example, between monthly and annual data:
I think that if I increase the resolution on the sunspot periodogram while still using the annual data, it won’t look anything like the real monthly data … I’ll report back on that one.
Well … this is the promised “report back”. As you can see, both the monthly and the annual data show the peak at 11 years. The monthly data shows more detail, as it resolves the individual peaks at 10, 11, and ~11.9 years. I thought that the test I outlined would be a good one: I’d see what happened when I sampled annual data at monthly intervals. So here’s what I did. I took the SIDC monthly sunspot data. For the annual data, I didn’t use the SIDC annual data. Instead, I averaged the SIDC monthly data to give me annual data that I knew was a true annual average of that particular monthly dataset. Figure 2 shows the monthly data in red, and the annually averaged monthly data in black.
Figure 2. Monthly SIDC sunspot data (red), and the same monthly SIDC data annually averaged (black).
So to be clear, the data shown as the black line is the annual average of the monthly data shown as the red line.
Next, I modified my periodogram-generating computer function to allow for fractional time sampling of the data.
Finally, I calculated three periodograms. One is the periodogram of the annually averaged data, shown in yellow/black below. The second is the periodogram of the underlying monthly data, shown in blue.
And finally, I calculated the third periodogram (shown in cyan) using the annually averaged data, but sampled on a monthly basis … Figures 3 and 4 show those results. I must confess, when I saw them my jaw hit the floor.
Figures 3 and 4. Upper panel shows the periodogram of annual sunspot data (yellow/black), monthly sunspot data (blue), and monthly-sampled annual data (cyan). Lower panel is a closeup of the upper panel, showing the period from seven to twenty-five years.
You can see why my jaw hit the floor. The blue line (periodogram of actual monthly data) is almost identical to the cyan line (periodogram of annual data sampled at monthly intervals). Who knew?
It turns out that contrary to my expectation, the information about the strength of the monthly cycles is NOT lost when the individual monthly values are subsumed into annual averages. Somehow, that information still exists and can be retrieved. When I take the annual data and analyze it at monthly frequencies, the data is still there. Not exactly, to be sure, as you can see the process is not perfect or precise.
But it is astoundingly accurate. I absolutely didn’t expect that.
Now, I suppose some signal analysis guy will pop up and say he knew that all along and abuse me for my colossal ignorance … but I taught myself this stuff, and all I can say is, it sure was a surprise to me.
Truly, I don’t understand this result. For many, many years I’ve thought that when you average data in chunks, like say averaging monthly data into yearly data, that all of the monthly information was gone. Lost. Kaput. Irretrievable.
But that doesn’t seem to be the case at all. It seems that very little of the power spectrum information was lost at all as a result of the annual averaging process.
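That claim is easy to probe with synthetic data. Here is a sketch (Python/NumPy, not the R code from the post; `ptp_amplitude` is my own helper): build monthly data from 11-year and 10-year cycles, average it to annual values, then evaluate the best-fit sine amplitude at monthly period steps for both series.

```python
import numpy as np

def ptp_amplitude(t, y, period):
    """Peak-to-peak amplitude of the best-fit sine of the given period (years)."""
    w = 2 * np.pi / period
    A = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 2 * np.hypot(coef[0], coef[1])

years = 200
t_m = np.arange(years * 12) / 12.0                  # monthly times, in years
y_m = np.sin(2 * np.pi * t_m / 11.0) + 0.5 * np.sin(2 * np.pi * t_m / 10.0)
y_a = y_m.reshape(years, 12).mean(axis=1)           # annual averages
t_a = t_m.reshape(years, 12).mean(axis=1)           # mid-year time stamps

periods = np.arange(8 * 12, 14 * 12) / 12.0         # 8 to 14 years, monthly steps
pg_m = np.array([ptp_amplitude(t_m, y_m, P) for P in periods])
pg_a = np.array([ptp_amplitude(t_a, y_a, P) for P in periods])

print(periods[pg_m.argmax()], periods[pg_a.argmax()])  # both peak near 11.0
```

On this synthetic example the annually averaged series, probed at fractional periods, peaks at the same period as the monthly series; annual averaging attenuates an 11-year cycle only by a factor of about sinc(1/11) ≈ 0.99.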
Naturally, this brings up the next question—is this procedure invertible? That is to say, given a periodogram such as the one above, can I run the process in reverse? Can I start from the annual data, calculate the monthly periodogram from that annual data, and then invert the periodogram to give me back the monthly data? That would be really amazing … but I’m pessimistic.
My guess is no, the periodogram can’t be inverted to give me back the monthly data … but given how poor my last pathetic guess (at the head of this post) was, I’ll certainly give it a try. Any assistance gladly accepted.
Like I said above … once again all I can say is, I’ll report back on that one.
Regards to all,
w.
Acknowledgement: My thanks to WUWT reader Greg Goodman for the suggestion to investigate fractional time periods.
For Clarity: If you disagree with something someone says, please quote the exact words you disagree with. It avoids all kinds of misunderstandings.
Data and Code: I’ve collected the R code, the R functions, and the data into a small (22KB) zipped folder called “Oversampling Folder”. It includes the monthly and annual sunspot data as CSV files. If you change your R workspace to the folder, it should be turnkey.
For the original data:
SIDC Monthly Sunspot Data. Note that, following the advice of Leif Svalgaard, I have increased all of the pre-1947 SIDC sunspot values by 20%, to correct for the change in counting methods at that time. The change is immaterial for this analysis.
Changes In The R Function: The function to generate the periodogram, named “sineptop” (for sine peak-to-peak), was previously called like this:
sineptop(annual_data, frequency = 1)
or like this:
sineptop(monthly_data,frequency = 12)
The “frequency” variable identifies the data as having that many periods per year.
It can still be called like that. In addition, however, the new sineptop syntax for e.g. monthly sampling an annual dataset looks like this:
sineptop(annual_data, frequency = 1, by = 1/12)
If the “by” variable is not specified it is assumed to be 1, so you don’t need to specify it, and the original syntax still works.
@Bernie. Zero padding extends the lower frequency spectrum (for a given sample rate) but does not add any frequency content. You cannot add frequency content but you can introduce frequencies through bad practice.
Sure would improve your essays if you learned to use a decent image format for line drawings.
Greg said in part May 29, 2014 at 11:17 am
“….You again suggest it’s better but don’t say why, and fail to reply to my noting that you’ve lost one of the peaks. Which is more accurate or whatever is secondary; you now have two identifiable peaks instead of 3. The excursions are less, does that make it more pleasing to the eye? I’d rather keep the third peak. Above all, the fact that there are 3 and not 2 frequencies present……”
Sorry – I did not equate a loss of one local max to mean that the spectral energy there is lost, and indeed, it is still there EXACTLY as it was originally. (Actually, there are many more frequencies, each of the blue circles.) The three frequencies 1/10 (10/100), 2/10 (20/100), and 3/10 (30/100) are three of them, exactly as in the original. All that has been added is the suggestion that the region of spectral support may be continuous, but about the same width (from 1/10 to 3/10 or so).
This is interpolation of the existing spectral information. No new information is added. The idea of putting the zeros in the middle, and of splitting any number exactly in the center (where present) is firmly established. Here it is for the time domain:
http://electronotes.netfirms.com/AN398.pdf
The useful point of view is to consider the INVERSE of interpolation, which is downsampling (resampling).
Greg also said: “Your example is non typical in that the split is near a zero crossing, and it still loses important info.”
I chose the example to be a sum of sines so that it would be zero at the center. (My program actually splits the center with 0/2 going both ways.) Again, absolutely nothing is lost in my view.
I believe this is the same problem we have with the “Frequency Sampling” method of FIR filter design where you have to get the linear phase right:
http://electronotes.netfirms.com/AN337.pdf
Lots of pieces to put together.
Steve from Rockwood said in part May 29, 2014 at 12:35 pm
“You cannot add information to an FFT by padding a time series.”
Absolutely true. I hope I never suggested you could. Some people claim “more resolution” while it is just INTERPOLATION of the spectrum in the frequency domain. Note that my interpolations go EXACTLY through the FFT points of the length 10 case.
Interpolation in the TIME domain by inserting zeros in the MIDDLE of the FFT is well established:
http://electronotes.netfirms.com/AN398.pdf
and you will agree (I hope) that the FFT is mathematically symmetric with regard to time-frequency. So I can do the same to interpolate a spectrum. I am just finding a smooth curve that fits the frequency samples. The complication seems to be that most time signals are assumed to be real, while most FFT’s are complex. I believe this makes it necessary to pay more attention to a correct phase. I haven’t fixed this problem (or attempted to do so yet). I think it is the same as the problem with Frequency Sampling FIR filter design:
http://electronotes.netfirms.com/AN337.pdf
In as much as I am talking interpolation, the standard should be (at least as a start): does resampling back to the lower rate give me the original answer? It does – no loss.
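As a sketch of the zero-padding-in-the-middle procedure Bernie describes (Python/NumPy rather than his Matlab, assuming a real signal of even length, with the Nyquist bin split “0/2 going both ways” as he says):

```python
import numpy as np

def fft_interpolate(x, factor):
    """Band-limited interpolation by zero-padding the middle of the FFT."""
    n = len(x)                                 # assumed even
    X = np.fft.fft(x)
    m = n * factor
    Y = np.zeros(m, dtype=complex)
    Y[: n // 2] = X[: n // 2]                  # DC and positive frequencies
    Y[m - n // 2 + 1 :] = X[n // 2 + 1 :]      # negative frequencies
    Y[n // 2] = X[n // 2] / 2                  # split the Nyquist bin
    Y[m - n // 2] = X[n // 2] / 2              # "0/2 going both ways"
    return np.real(np.fft.ifft(Y)) * factor    # rescale for the longer length

x = np.random.default_rng(0).standard_normal(20)
y = fft_interpolate(x, 5)                      # length-100 interpolation
print(np.allclose(y[::5], x))                  # True: downsampling recovers x
```

Downsampling the interpolated signal returns the original samples exactly, which is precisely the “resample back, no loss” standard proposed above.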
Nick Stokes:
Your mathematics is not fine. No matter how high the Q of a physical tuning fork, its frequency response function never assumes the form of a cardinal sine function, sinc(x), with its negative side lobes. And your computational exercise with a pure finite-amplitude sinusoid totally misses the fundamental objection to using sinusoidal regression on data time-series that contain NONE, but are generated by RANDOM signals of varying bandwidths, whose power density occupies a CONTINUUM of frequencies.
Bernie Hutchins:
I remember quite well when physical spectrum analyzers consisted of a bank of narrow band-pass filters. I’ve already stated my objection to the tuning-fork analogy in response to Stokes. To address your obvious objection of lack of INFINITELY-long records, one need only consider practical periodicity over available record length.
BTW, since I work in an environment that discourages revelation of results, I’d be willing to send you a plain-text file of <100 data points to test Prony's method on, only on the condition of strict confidentiality.
Willis:
That “your algorithm” fails to come to grips with the intrinsic character of most geophysical signals is plain as day to those with professional experience in signal analysis. My persistent criticism has always been directed at that issue–not your person. It’s utterly dishonest of you to accuse me now of that which you have persistently done. The marriage of ignorance with arrogance is what has driven scientifically qualified discussion largely elsewhere.
1sky1 said in part May 29, 2014 at 5:00 pm:
“…BTW, Since I work in an environment that discourages revelation of results, I’d be willing to send you plain-text file of <100 data points for test Prony's method only on the condition of strict confidentiality. …”
In as much as you are asking for MY help, I am reluctant to agree to any conditions on your part.
More to the point perhaps, my Matlab code is available and you can easily run the data yourself, and you know it is not a complicated method in the first place. You can probably code it better yourself.
Here is the code:
http://electronotes.netfirms.com/prony.txt
and all you would need to do is rename it prony.m and run a data vector x and a proposed order N. (Check your Matlab for existing functions prony.m. I think there is a GIT file floating around, and others, that are quite different from mine.) The example in the header should get you properly oriented. Some discussion is at:
http://electronotes.netfirms.com/AN365.pdf
It appears that the earlier discussions (before AN365) are not currently online, but I will be happy to post them if you need more.
1sky1 says:
May 29, 2014 at 5:25 pm
Well, let’s see what this charming comment contains. We have:
• No quotations to let us know what the author is talking about.
• No links to any studies.
• No support in the way of logic, math, data, or computer code for the author’s claims.
• In fact, no explanations of the claims at all.
• Claims of superior insight on the author’s part.
• A puerile accusation that I am dishonest, again without any quotation of my words to let us know what the author considers “dishonest”.
• Allied with the claim that I’m “dishonest”, a claim that no personal attacks are being made on me, that the author has directed his criticism at the science.
• An unsupported accusation that my actions are costing the site serious scientific discussion, again without specification of the purported damaging actions.
Yep … that has the hallmarks and signs of a 1sky1 comment, all right, it hits all the right notes.
w.
1sky1 says: May 29, 2014 at 4:49 pm
“Your mathematics is not fine. No matter how high the Q of a physical tuning fork, its frequency response function never assumes the form of a cardinal sine function, sinc(x), with its negative side lobes.”
OK. Take a series LC circuit with voltage V(t) at one end, and V=0 at the other. V(t)=I=0 for t<0. To avoid sqrt signs, let L=C=1.
The current I(t) is
I(t) = -sin(t)∫ cos(τ) V(τ) dτ – cos(t)∫ sin(τ) V(τ) dτ
The ∫ has limits 0 to t, or to tm if V(t)=0 after tm.
So the LC circuit goes exactly into oscillation with frequency = 1 after t=tm, since the integrals remain constant, and the amplitude is Willis’ measure of periodicity. The LC is a filter which produces a single-frequency output after tm.
Bernie: The idea of putting the zeros in the middle, and of splitting any number exactly in the center (where present) is firmly established. Here it is for the time domain:
http://electronotes.netfirms.com/AN398.pdf
It is possible to interpolate time-domain samples using the FFT. The
procedure is to first take the FFT of the time sequence, zero-pad the middle
of this FFT, and then take the inverse FFT of the larger FFT to obtain the
longer time-sequence. For example, a length 20 time signal could be
interpolated to a length 100 time sequence by inserting 80 zeros in the middle
====
OH Man. This is in no way the same. The FFT is symmetric and thus adding “in the middle” is adding zero-valued high-frequency data points. Thus increasing the frequency resolution of the spectrum without changing the data. It is, in effect, end padding!
Now try splitting half way between min and max freq and inserting 80 zeros. That is comparable to what you are doing in the time domain.
Your PDF files are very informative and I thank you, but I don’t think you are making much sense on the central-padding issue and are not addressing the comments I raise, so I’ll leave it there. I’ve been down this path before with others.
1sky1: “BTW, Since I work in an environment that discourages revelation of results, I’d be willing to send you plain-text file of <100 data points for test Prony's method only on the condition of strict confidentiality."
OMG , you sound really important. Do you work for some black govt agency or something?
And what would happen if this anonymous "plain-text" file of less than 100 anonymous data points from an anonymous poster got into the wrong hands?
I do hope Bernie can be trusted and he's not going to do a Snowden and disappear to Russia with your "confidential " data points.
LOL What a jerk.
@Bernie. If you are using your own FFT algorithm you have problems beyond real and imaginary components. To convert from time to frequency you start with 2 arrays, XTR and XTI, where all XTI elements (the imaginary part of the series in the time domain) are set to 0.0. Once in the frequency domain the FFT will produce two arrays, XFR and XFI, that represent frequencies up to the Nyquist (half your sample rate). These elements are often arranged in order from (0), the DC component, toward increasing frequency up to the Nyquist, and then back down to the lowest frequency again. To find the middle you have to find the Nyquist, which will likely be at element N2+1 of the array, for a 1-based array where X(1) is the DC, X(2) is the lowest frequency, X(N2+1) is the highest (the Nyquist), and X(N2+2) is the second highest on the way back down. From there you would expand the array with zero frequency content (real and imaginary). For example, to expand a 256-point FFT to 512 points with zeroes (I did not test this code – just trying to be helpful):
XTR(1..512) real, time domain
XTI(1..512) imaginary, time
XFR(1..512) real frequency
XFI(1..512) imaginary frequency
for i:=1 to 256 do
begin
XTR(i) = some function varying with (i)
XTI(i) = 0.0
end
CALL NLOGN(XTR,XTI,256,+1,XFR,XFI)
These arrays now have the FFT coefficients in them for 256 points but you want to interpolate to twice the sample rate or 512 points.
for i:=386 to 512 do
begin
XFR(i)=XFR(i-256)
XFI(i)=XFI(i-256)
end
The above code moves the negative frequencies (elements 130 through 256 of the 256-point FFT) into the upper part of the 512-point array, because FFT arrays hold two sets of frequencies, (-w) and (+w), in addition to RE (cos) and IM (sin) components.
for i:=130 to 385 do
begin
XFR(i)=0.0
XFI(i)=0.0
end
The above code zeroes the middle of the array (the new higher frequencies)
CALL NLOGN(XFR,XFI,512,-1,XTR,XTI)
-1 means from (f) to (t) domain. If things went well XTI should have near zero values throughout and XTR should be sampled at twice the rate you started with.
Greg said in part May 30, 2014 at 2:15 am:
“…OH Man. This is in no way the same. The FFT is symmetric and thus adding “in the middle” is adding zero-valued high-frequency data points. Thus increasing the frequency resolution of the spectrum without changing the data. It is, in effect, end padding!…”
Nope! You have forgotten something! The FFT is NOT in general symmetric. Make the signal complex, such as x = xr + j*xi where xr and xi are both random (just for a general test). Take X = fft(x) and neither its real part, its imaginary part, nor its magnitude is symmetric. Do actually try this. The FFT of x = xr (even if xr is random) IS of course symmetric, the basis for my interpolation.
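Bernie's symmetry claim can be checked in a few lines (a Python/NumPy sketch, not his Matlab):

```python
import numpy as np

rng = np.random.default_rng(2)
xr = rng.standard_normal(16)               # real random signal
xc = xr + 1j * rng.standard_normal(16)     # complex random signal
Xr, Xc = np.fft.fft(xr), np.fft.fft(xc)

def conj_symmetric(X):
    """True if X[k] == conj(X[n-k]) for k = 1..n-1."""
    return bool(np.allclose(X[1:], np.conj(X[1:][::-1])))

print(conj_symmetric(Xr), conj_symmetric(Xc))  # True False
```

The real signal's FFT is conjugate-symmetric; the complex signal's is not.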
We will also almost certainly agree that there is a complete mathematical symmetry between the FFT and IFFT equations. Whatever I do in time I can do in frequency. That means that the procedure either works in both domains, or fails in both domains.
Interpolation in the frequency domain works as in my jpg bottom panel. BUT this is because (by chance at first) the FFT was purely imaginary (I chose three sines). If I switch one to cosine, the interpolation is less smooth. Making this work robustly seems possible, fixing the “phase problem” as I suggested, or perhaps doing real/imaginary parts separately. I may well have done this 25 years ago.
But right now I have a program with 13 figures stacked up. I need a blank file and time – longer than the life of most postings here! And we are boring most folks if any are left here. Please watch for a future app note in my Electronotes site (tentatively AN-413 ???).
Steve from Rockwood replied to me May 30, 2014 at 8:12 am
Steve – I read what you wrote and don’t see any disagreement between us.
I saw your comment after replying to Greg at 8:53AM and thought you were making the same point about the real/imaginary parts, correctly pointing out that the imaginary part of the time sequence is assumed zero.
I use Matlab exclusively, which has an excellent FFT. But in the “old days” it was a DFT in BASICA (anyone remember that?) with nested for loops and separate real/imaginary arrays.
@Bernie. We are in agreement. The phasing problem is likely a result of not treating the R and I components identically or not shifting both halves of the frequency spectrum at the array element next to the Nyquist. Cheers!
“It has the same rectangular window Willis used.”
ROFL. Yes doing nothing is a rectangular window.
We are up against fundamental constraints on information – kind of like the Heisenberg uncertainty principle. When examining the frequency components of signals at low frequencies you have a choice of windows that trades off frequency accuracy, amplitude accuracy, and phase accuracy. You also have a fundamental 1/x accuracy issue with the period. If you only have 2 periods of your signal, your frequency error on the period seen in the periodogram is on the order of 50%! If you want a 5% error then you need 20 periods. No amount of magic padding is going to help – the information isn’t there in the original signal.
Try sweeping the frequency of a sine wave through the FFT, it’s easy to see the behavior.
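That sweep takes only a few lines (a Python/NumPy sketch; `peak_magnitude` is my own helper):

```python
import numpy as np

def peak_magnitude(f, n=256):
    """Largest FFT magnitude for a sine of f cycles per record, normalised so
    that a sine landing exactly on a bin reads 1.0."""
    t = np.arange(n)
    X = np.fft.rfft(np.sin(2 * np.pi * f * t / n))
    return np.abs(X).max() / (n / 2)

for f in (10.0, 10.25, 10.5):       # cycles per record
    print(f, round(peak_magnitude(f), 3))
# On a bin (f = 10.0) the peak reads 1.0; midway between bins (f = 10.5) it
# drops to roughly 0.64, the scalloping loss of the rectangular window.
```

The peak reading depends on where the true frequency falls relative to the bin grid, which is exactly the amplitude-accuracy trade-off described above.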
And there lies my fundamental disagreement with trying to determine trends off of historical climate data. We have 11-year, 60-year, and other long cycles happening in nature, and we simply don’t have enough data to accurately estimate what those periods actually are. Just to see two periods of the PDO or AMO we need 120 years of satellite data. Call me when it’s the year 2099….
Mann’s latest estimate off a SINGLE period of the AMO is a complete joke. It just doesn’t pass the basic “do we have enough information” test. AFAICT climate scientists that are peer reviewing this material would have failed a graduate-level course in digital signal processing…
Nick Stokes:
Shifting your mathematical formulation from the band-pass characteristic of spectral windows to that of low-pass RL or RC filters hardly changes the fact that ONLY with purely sinusoidal input will the result be sinusoidal also. This still fails to address the crux of the matter: what is obtained with RANDOM signals? In the former case, pure gaussian white noise will be shaped into autocorrelated oscillatory “waves” at the characteristic bandwidth, which in short records may strike the eye as “periodic.” In the latter case, there will be diffusion-like autocorrelated fluctuations. In neither case will a sinusoid with FIXED amplitude and phase be obtained. Nevertheless, sinusoidal regression will still produce a misleading numerical result.
All this is at the heart of the well-established systematic BIAS and asymptotic INCONSISTENCY of raw periodograms as estimators of power density spectra, usually covered in Signal Analysis 101 (see, e.g., O&S pp. 733–734). Sadly, those who rely primarily on computation for insight remain dismally ignorant in their unicorn hunt for “periodicity.”
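The inconsistency being referred to can be exhibited numerically (a Python/NumPy sketch, not an excerpt from O&S): the raw periodogram of pure white noise has roughly 100% relative error in every bin, and a longer record adds bins instead of shrinking that error.

```python
import numpy as np

def periodogram_stats(n, seed=1):
    """Mean and std of the raw periodogram of unit-variance white noise."""
    x = np.random.default_rng(seed).standard_normal(n)
    P = np.abs(np.fft.rfft(x)) ** 2 / n   # raw periodogram; each bin has mean ~1
    P = P[1:-1]                           # drop the DC and Nyquist bins
    return P.mean(), P.std()

for n in (256, 4096):
    m, s = periodogram_stats(n)
    print(n, round(m, 2), round(s, 2))
# Lengthening the record adds bins, but the per-bin scatter stays about as
# large as the mean: the raw estimate never settles down.
```

This is why raw periodograms are smoothed or averaged before being treated as power-density estimates.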
Bernie:
Actually, I was not seeking your “help” in my work, but trying to set you up for a potentially lucrative consulting opportunity elsewhere.
I did like your demonstration of how simplistic emulation of bandlimited interpolation by zero-padding the FFT leads to unexpected results. The notion that sinusoidal regression can be used effectively for interpolation in the general case is patently absurd.
As an EE, perhaps you can explain to all of us how the old analog spectrum analyzers set up their bank of band-pass filters in the frequency domain. They were, of course, equally spaced in their center frequencies, but was the Rayleigh criterion for resolution strictly observed?
Greg:
It’s not surprising that a programmer who can’t label the frequency axis correctly, has scant clue that the purpose of power spectrum analysis is to estimate the density of signal variance over frequency, obscures this by plotting as a function of period, and talks nonsensically of “the power spectrum of cross-correlation” would indulge himself in fantasies about my work and his qualifications as a signal analyst.
Plato said it best: “Wise men speak when they have something to say; fools speak to say something.”
1sky1 says: May 30, 2014 at 5:23 pm
“Shifting your mathematical formulation from the band-pass characteristic of spectral windows to that of low-pass RL or RC filters hardly changes the fact that ONLY with purely sinusoidal input will the result be sinusoidal also.”
Not true at all. I’m using a LC filter, which is the equivalent of a tuning fork, and is a band-pass filter (freq 1/sqrt(LC)). And the integral formulation is for arbitrary driving voltage V(t). If it is time-bounded, then afterwards the LC circuit is in pure sinusoidal oscillation, at the resonant frequency.
Actually I made an error writing it down – in the integrals it should be dV/dτ. But that doesn’t change the result, which is that after the input has passed, the circuit goes on ringing (forever, with no R) at the resonant frequency, with an amplitude that depends on V(t) and is high when V was oscillatory and strong in that band.
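Nick's corrected formulation (with dV/dτ in the integrals) can be checked numerically; the following is my own sketch in Python/NumPy, up to an overall sign convention, not his derivation. Drive the L=C=1 circuit with ten cycles of a resonant sinusoid and watch the envelope after the drive ends:

```python
import numpy as np

# time grid and a time-bounded drive: ten cycles at the resonant frequency
dt = 0.001
t = np.arange(0.0, 100.0, dt)
tm = 20 * np.pi                          # the drive ends here
v = np.where(t < tm, np.sin(t), 0.0)     # V(t), zero after tm

dv = np.gradient(v, dt)                  # dV/dtau
c = np.cumsum(np.cos(t) * dv) * dt       # running integral of cos(tau) dV
s = np.cumsum(np.sin(t) * dv) * dt       # running integral of sin(tau) dV
i_t = np.sin(t) * c - np.cos(t) * s      # I(t): zero-state response of the LC

# Once the input has passed, c and s freeze, so the envelope is constant:
env = np.hypot(c, s)
tail = t > tm + 1.0
print(env[tail].max() - env[tail].min())  # ~0: pure ringing at frequency 1
print(round(env[-1], 1))                  # ≈ 31.4 (= tm/2): resonant response
```

After the input passes, the integrals stop changing and the circuit rings at fixed amplitude, as Nick argues; an off-resonance drive would leave a much smaller frozen amplitude.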
Peter Sable says:
May 29, 2014 at 1:52 am
Thanks much, Peter. I’ve been thinking about that same question as I shoveled gravel today. I’ll need to take a closer look at what you’ve done.
Appreciated,
w.
1sky1 commented to me May 30, 2014 at 5:24 pm
I guess that lucrative consulting contract was just not in the cards! (Tomorrow I will wander the streets – tin cup in hand.) As for paragraph 2, I haven’t a clue what you were trying to say. As for the spectrum analyzer, no, the one I made did not have equally spaced frequencies. I could elaborate, but following your lead, no particulars will be offered. I have no idea what a Rayleigh criterion of resolution is – never heard of it.
1sky1> Signal Analysis 101 (see, e.g., O&S pp.733-734).
Therein lies a serious problem. It’s not “101” if it’s page 733 and it’s certainly not 101 if it’s O&S – that’s a graduate level text. The undergrad text, which I also have, is terrible.
This field could benefit from a well written textbook that a normal engineer or climate scientist can use. O&S is a terrible textbook. The book would focus not just on the math but the process for not making horrible mistakes. O&S barely skims over that process, they assume if you can grok the math you can grok the process. Typical of too-smart people…
“obscure this by plotting as a function of period,”
There’s nothing inherently wrong with this, period is merely 1/f. In either case the points should be plotted so that the resolution limitations are obvious.