Riding a Pseudocycle

Guest Post by Willis Eschenbach

Loehle and Scafetta recently posted a piece on decomposing the HadCRUT3 temperature record into a couple of component cycles plus a trend. I disagreed with their analysis on a variety of grounds. In the process, I was reminded of work I had done a few years ago using what is called “Periodicity Analysis” (PDF).

A couple of centuries ago, a gentleman named Fourier showed that any signal could be uniquely decomposed into a number of sine waves with different periods. Fourier analysis has been a mainstay analytical tool since that time. It allows us to detect any underlying regular sinusoidal cycles in a chaotic signal.

Figure 1. Joseph Fourier, looking like the world’s happiest mathematician

While Fourier analysis is very useful, it has a few shortcomings. First, it can only extract sinusoidal signals. Second, although it has good resolution at short timescales, it has poor resolution at the longer timescales. For many kinds of cyclical analysis, I prefer periodicity analysis.

So how does periodicity analysis work? The citation above gives a very technical description of the process, and it’s where I learned how to do periodicity analysis. Let me attempt to give a simpler description, although I recommend the citation for mathematicians.

Periodicity analysis breaks down a signal into cycles, but not sinusoidal cycles. It does so by directly averaging the data itself, so that it shows the actual cycles rather than theoretical cycles.

For example, suppose that we want to find the actual cycle of length two in a given dataset. We can do it by numbering the data points in order, and then dividing them into odd- and even-numbered data points. If we average all of the odd data points, and we average all of the even data points, that gives us the average cycle of length two in the data. Here is what we get when we apply that procedure to the HadCRUT3 dataset:

Figure 2. Periodicity in the HadCRUT3 global surface temperature dataset, with a cycle length of 2. The cycle has been extended to be as long as the original dataset.

As you might imagine for a cycle of length 2, it is a simple zigzag. The amplitude is quite small, only plus or minus a hundredth of a degree. So we can conclude that there is only a tiny cycle of length two in the HadCRUT3 data.
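In R, that length-two averaging takes only a few lines. This is just a sketch of the idea (it assumes the annual HadCRUT3 anomalies are in a vector I’ll call hadcrut3, and it uses the detrend helper from the appendix below); the full “periods” function in the appendix does the general calculation:

x = detrend(hadcrut3)               # assumed vector of annual HadCRUT3 anomalies, detrended
phase = (seq_along(x)-1) %% 2       # 0 for the odd-numbered points, 1 for the even-numbered
cycle2 = tapply(x, phase, mean)     # the two averages: all the odds, all the evens
rep(cycle2, length.out=length(x))   # repeat the two-point cycle out to the length of the data (the calculation behind Figure 2)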

Next, here is the same analysis, but with a cycle length of four. To do the analysis, we number the dataset in order with a cycle of four, i.e. “1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4 …”

Then we average all the “ones” together, and all of the twos and the threes and the fours. When we plot these out, we see the following pattern:

Figure 3. Periodicity in the HadCRUT3 global surface temperature dataset, with a cycle length of 4. The cycle has been extended to be as long as the original dataset.

As I mentioned above, we are not reducing the dataset to sinusoidal (sine wave shaped) cycles. Instead, we are determining the actual cycles in the dataset. This becomes more evident when we look at, say, the twenty-year cycle:

Figure 4. Periodicity in the HadCRUT3 dataset, with a cycle length of 20. The cycle has been extended to be as long as the original dataset.

Note that the actual 20 year cycle is not sinusoidal. Instead, it rises quite sharply, and then decays slowly.
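All three of the figures above come from the same recipe, just with a different cycle length. A small helper along these lines captures the idea (my own sketch for illustration; the appendix code below builds every cycle length at once instead):

cycle_of_length = function(x, n) {    # average cycle of length n, repeated to the length of x
  phase = (seq_along(x)-1) %% n       # label the points 0, 1, ..., n-1, 0, 1, ... in order
  rep(tapply(x, phase, mean), length.out=length(x))
}
# e.g. cycle_of_length(detrend(hadcrut3), 20) is the calculation behind Figure 4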

Now, as you can see from the three examples above, the amplitudes of the various length cycles are quite different. If we set the mean (average) of the original data to zero, we can measure the power in an underlying cyclical signal as the sum of the absolute values of the signal data. It is useful to compare this power value to the total power in the original signal. If we do this at every possible cycle length, we get a graph of the strength of each of the underlying cycles.
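In code, that comparison is one line per cycle length (again only a sketch, using the cycle_of_length helper above; the “periods” function in the appendix does the same thing for every length at once):

x = detrend(hadcrut3)                                     # zero-mean, detrended data
total_power = sum(abs(x))                                 # power of the original signal
power_frac = function(n) sum(abs(cycle_of_length(x, n))) / total_power
sapply(2:(length(x) %/% 2), power_frac)                   # fraction of power at each cycle length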

For example, suppose we are looking at a simple sine wave with a period of 24 years. Figure 5 shows the sine wave, along with periodicity analysis in blue showing the power in each of the various length cycles:

Figure 5. A sine wave, along with the periodicity analysis of all cycles up to half the length of the dataset.

Looking at Figure 5, we can see one clear difference between Fourier analysis and periodicity analysis — the periodicity analysis shows peaks at 24, 48, and 72 years, while a Fourier analysis of the same data would only show the 24-year cycle. Of course, the apparent 48 and 72 year peaks are merely a result of the 24 year cycle. Note also that the shortest length peak (24 years) is sharper than the longest length (72-year) peak. This is because there are fewer data points to measure and average when we are dealing with longer time spans, so the sharp peaks tend to broaden with increasing cycle length.
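You can verify that the 48- and 72-year peaks are just echoes of the 24-year cycle: averaging a pure 24-year sine wave into 48 (or 72) bins reproduces the wave exactly, so those longer “cycles” carry the same power. A quick check, using the dsin helper from the appendix and the cycle_of_length sketch above:

s = dsin((0:157)*15)                            # 24-year sine wave, 158 annual points
all.equal(unname(cycle_of_length(s, 48)), s)    # TRUE: the 48-bin average cycle is the wave itself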

To move to a more interesting example relevant to the Loehle/Scafetta paper, consider the barycentric cycle of the sun. The sun moves around the center of mass of the solar system. As it does so, it speeds up and slows down because of the varying pull of the planets. What are the underlying cycles?

We can use periodicity analysis to find the cycles that have the most effect on the barycentric velocity. Figure 6 shows the process, step by step:

Figure 6. Periodicity analysis of the annual barycentric velocity data. 

The top row shows the barycentric data on the left, along with the amount of power in cycles of various lengths on the right in blue. The periodicity diagram at the top right shows that the overwhelming majority of the power in the barycentric data comes from a ~20 year cycle. It also demonstrates what we saw above, the spreading of the peaks of the signal at longer time periods because of the decreasing amount of data.

The second row left panel shows the signal that is left once we subtract out the 20-year cycle from the barycentric data. The periodicity diagram on the second row right shows that after we remove the 20-year cycle, the maximum amount of power is in the 83 year cycle. So as before, we remove that 83-year cycle.

Once that is done, the third row right panel shows that there is a clear 19-year cycle (visible as peaks at 19, 38, 57, and 76 years). This cycle may be a result of the fact that the “20-year cycle” is actually slightly less than 20 years. When that 19-year cycle is removed, there is a 13-year cycle visible at 13, 26, 39 years and so on. And once that 13-year cycle is removed … well, there’s not much left at all.

The bottom left panel shows the original barycentric data in black, and the reconstruction made by adding just these four cycles of different lengths is shown in blue. As you can see, these four cycles are sufficient to reconstruct the barycentric data quite closely. This shows that we’ve done a valid deconstruction of the original data.
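For anyone following along in R, the peel-off procedure behind Figure 6 can be sketched with the “periods” function from the appendix. This is an illustration of the logic rather than the exact script used for the figure, and it assumes the annual barycentric velocities are in a vector I’ll call bary:

residual = detrend(bary)                 # assumed vector of annual barycentric velocity
reconstruction = 0
for (i in 1:4) {
  p = periods(residual, detrendit=FALSE, doplot=FALSE)
  best = which.max(p$vals[-1]) + 1       # strongest cycle length (skip the trivial length-1 "cycle")
  cyc = p$means[best, ]                  # that cycle, repeated out to the full record length
  reconstruction = reconstruction + cyc  # running sum: the blue curve in the bottom left panel
  residual = residual - cyc              # remove the cycle and look again
}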

Now, what does all of this have to do with the Loehle/Scafetta paper? Well, two things. First, in the discussion on that thread I had said that I thought that the 60 year cycle that Loehle/Scafetta said was in the barycentric data was very weak. As the analysis above shows, the barycentric data does not have any kind of strong 60-year underlying cycle. Loehle/Scafetta claimed that there were ~ 20-year and ~ 60-year cycles in both the solar barycentric data and the surface temperature data. I find no such 60-year cycle in the barycentric data.

However, that’s not what I set out to investigate. I started all of this because I thought that the analysis of random red-noise datasets might show spurious cycles. So I made up some random red-noise datasets the same length as the HadCRUT3 annual temperature records (158 years), and I checked to see if they contained what look like cycles.

A “red-noise” dataset is one which is “auto-correlated”. In a temperature dataset, auto-correlated means that today’s temperature depends in part on yesterday’s temperature. One kind of red-noise data is created by what are called “ARMA” processes. “AR” stands for “auto-regressive”, and “MA” stands for “moving average”. This kind of random noise is very similar to observational datasets such as the HadCRUT3 dataset.
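In R, the ARMA coefficients can be estimated directly from the data and fed to the built-in simulator. Here is a sketch (it assumes the annual HadCRUT3 anomalies are in a vector I’ll call hadcrut3; the appendix below hard-codes the values I calculated, ar = 0.9673 and ma = -0.4591):

fit = arima(detrend(hadcrut3), order=c(1,0,1))              # fit an ARMA(1,1) model to the data
fake = arima.sim(list(ar=coef(fit)["ar1"], ma=coef(fit)["ma1"]), n=158)
fake = (fake - mean(fake)) * sd(hadcrut3) / sd(fake)        # match the observed standard deviation
periods(as.vector(fake))                                    # look for apparent "cycles" in pure red noise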

So, I made up a couple dozen random ARMA “pseudo-temperature” datasets using the AR and MA values calculated from the HadCRUT3 dataset, and I ran a periodicity analysis on each of the pseudo-temperature datasets to see what kinds of cycles they contained. Figure 7 shows eight of the two dozen random pseudo-temperature datasets in black, along with the corresponding periodicity analysis of the power in various cycles in blue to the right of each dataset:

Figure 7. Pseudo-temperature datasets (black lines) and their associated periodicity (blue circles). All pseudo-temperature datasets have been detrended.

Note that all of these pseudo-temperature datasets have some kind of apparent underlying cycles, as shown by the peaks in the periodicity analyses in blue on the right. But because they are purely random data, these are only pseudo-cycles, not real underlying cycles. Despite being clearly visible in the data and in the periodicity analyses, the cycles are an artifact of the auto-correlation of the datasets.

So for example random set 1 shows a strong cycle of about 42 years. Random set 6 shows two strong cycles, of about 38 and 65 years. Random set 17 shows a strong ~ 45-year cycle, and a weaker cycle around 20 years or so. We see this same pattern in all eight of the pseudo-temperature datasets, with random set 20 having cycles at 22 and 44 years, and random set 21 having a 60-year cycle and weak smaller cycles.

That is the main problem with the Loehle/Scafetta paper. While they do in fact find cycles in the HadCRUT3 data, the cycles are neither stronger nor more apparent than the cycles in the random datasets above. In other words, there is no indication at all that the HadCRUT3 dataset has any kind of significant multi-decadal cycles.

How do I know that?

Well, one of the datasets shown in Figure 7 above is actually not a random dataset. It is the HadCRUT3 surface temperature dataset itself … and it is indistinguishable from the truly random datasets in terms of its underlying cycles. All of them have visible cycles, it’s true, in some cases strong cycles … but they don’t mean anything.

w.

APPENDIX:

I did the work in the R computer language. Here’s the code, including the “periods” function, which does the periodicity calculations. I’m not that fluent in R (it’s about the eighth computer language I’ve learned), so it might be kinda klutzy.

#FUNCTIONS

PI=4*atan(1) # value of pi

dsin=function(x) sin(PI*x/180) # sine function for degrees

mod=function(a,b) a %% b # modulo helper used in "periods" below (base R's modulo operator is %%)

regb =function(x) {lm(x~c(1:length(x)))[[1]][[1]]} #gives the intercept of the trend line

regm =function(x) {lm(x~c(1:length(x)))[[1]][[2]]} #gives the slope of the trend line

detrend = function(x){ #detrends a line

x-(regm(x)*c(1:length(x))+regb(x))

}

meanbyrow=function(modline,x){ #returns a full length repetition of the underlying cycle means

rep(tapply(x,modline,mean),length.out=length(x))

}

countbyrow=function(modline,x){ #returns a full length repetition of the underlying cycle number of datapoints N

rep(tapply(x,modline,length),length.out=length(x))

}

sdbyrow=function(modline,x){ #returns a full length repetition of the underlying cycle standard deviations

rep(tapply(x,modline,sd),length.out=length(x))

}

normmatrix=function(x) sum(abs(x)) #returns the norm of the dataset, which is proportional to the power in the signal

# Function “periods” (below) is the main function that calculates the percentage of power in each of the cycles. It takes as input the data being analyzed (inputx). It displays the strength of each cycle. It returns a list of the power of the cycles (vals), along with the means (means), number of datapoints N (count), and standard deviations (sds).

# There’s probably an easier way to do this, I’ve used a brute force method. It’s slow on big datasets

periods=function(inputx,detrendit=TRUE,doplot=TRUE,val_lim=1/2) {

x=inputx

if (detrendit==TRUE) x=detrend(as.vector(inputx))

xlen=length(x)

modmatrix=matrix(NA, xlen,xlen) # placeholder so that row() and col() below have the right dimensions

modmatrix=matrix(mod((col(modmatrix)-1),row(modmatrix)),xlen,xlen) # row i holds the phase labels 0..(i-1) for a cycle of length i

countmatrix=aperm(apply(modmatrix,1,countbyrow,x)) # row i: number of datapoints in each phase of the length-i cycle

meanmatrix=aperm(apply(modmatrix,1,meanbyrow,x)) # row i: the average cycle of length i, repeated to full length

sdmatrix=aperm(apply(modmatrix,1,sdbyrow,x)) # row i: standard deviations for the length-i cycle

xpower=normmatrix(x) # power of the signal being analyzed (sum of absolute values)

powerlist=apply(meanmatrix,1,normmatrix)/xpower # fraction of the total power captured by the average cycle of each length

plotlist=powerlist[1:(length(powerlist)*val_lim)] # only show cycles up to val_lim (default one-half) of the record length

if (doplot) plot(plotlist,ylim=c(0,1),ylab="% of total power",xlab="Cycle Length (yrs)",col="blue")

invisible(list(vals=powerlist,means=meanmatrix,count=countmatrix,sds=sdmatrix))

}

# /////////////////////////// END OF FUNCTIONS

# TEST

# each row in the values returned represents a different period length.

myreturn=periods(c(1,2,1,4,1,2,1,8,1,2,2,4,1,2,1,8,6,5))

myreturn$vals

myreturn$means

myreturn$sds

myreturn$count

#ARIMA pseudotemps

# note that they are standardized to a mean of zero and a standard deviation of 0.2546, which is the standard deviation of the HadCRUT3 dataset.

# each row is a pseudotemperature record

instances=24 # number of records

instlength=158 # length of each record

rand1=matrix(arima.sim(list(order=c(1,0,1), ar=.9673,ma=-.4591),

n=instances*instlength),instlength,instances) #create pseudotemps

pseudotemps =(rand1-mean(rand1))*.2546/sd(rand1)
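
# run the periodicity analysis on each pseudo-temperature record and collect the
# cycle-power curves (one column per record); an added illustration, not the plotting code
cyclepower=apply(pseudotemps,2,function(p) periods(p,doplot=FALSE)$vals)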

# Periodicity analysis of simple sine wave

par(mfrow=c(1,2),mai=c(.8,.8,.2,.2)*.8,mgp=c(2,1,0)) # split window

sintest=dsin((0:157)*15) # 24-year sine wave: 15 degrees per year over 158 annual points

plotx=sintest

plot(detrend(plotx)~c(1850:2007),type="l",ylab="24 year sine wave",xlab="Year")

myperiod=periods(plotx)

Ed Zuiderwijk

Have a look into wavelet analysis.

stephen richards

Good stuff Willis. I did fourier and laplasse functions many years ago. You have explained this particular use as well as anyone I remember.

Katherine

Willis, thanks for making this easy for a layperson to follow. Food for thought, indeed.
Just one thing. You wrote: Note also that the shortest length peak (24 years) is sharper than the longest length (72-year) peak. This is because there are fewer data points to measure and average when we are dealing with longer time spans, so the sharp peaks tend to broaden with increasing cycle length.
But later you wrote: It also demonstrates what we saw above, the spreading of the peaks of the signal at longer time periods because of the decreasing amount of data.
Based on the former, I think the latter should be “It also demonstrates what we saw above, the spreading of the peaks of the signal at longer time periods because of the increasing amount of data.”

Sean Houlihane

Numerology? Why? – your graphs of cycles have ZERO basis to have the original time scale applied under them. Sorry, but it appears that you don’t understand the analysis you are doing here and that doesn’t inspire to try and work out what your point is.

oMan

Very clear and informative. Thanks much. As a layman, I appreciate this mini-tutorial, which shows a powerful analytic tool and a reminder of how easily our eyes and mind find patterns, only some of which are really there.

Willis Eschenbach

Katherine says:
July 30, 2011 at 2:40 am

Willis, thanks for making this easy for a layperson to follow. Food for thought, indeed.
Just one thing. You wrote: Note also that the shortest length peak (24 years) is sharper than the longest length (72-year) peak. This is because there are fewer data points to measure and average when we are dealing with longer time spans, so the sharp peaks tend to broaden with increasing cycle length.
But later you wrote: It also demonstrates what we saw above, the spreading of the peaks of the signal at longer time periods because of the decreasing amount of data.
Based on the former, I think the latter should be “It also demonstrates what we saw above, the spreading of the peaks of the signal at longer time periods because of the increasing amount of data.”

Thanks, Katherine. Actually, the amount of data decreases as the cycle length increases. Consider a cycle of length two. Each one is the average of half the data.
Now consider a cycle length of three. Each one is the average of a third of the data.
As you can see, increasing the cycle length decreases the amount of data available to average for each point in the cycle.
w.

Willis Eschenbach

Sean Houlihane says:
July 30, 2011 at 2:40 am

Numerology? Why? – your graphs of cycles have ZERO basis to have the original time scale applied under them. Sorry, but it appears that you don’t understand the analysis you are doing here and that doesn’t inspire to try and work out what your point is.

Sean, it sounds like you didn’t understand the analysis I am doing here. If you have questions, I’m happy to answer them.
w.

Derek Sorensen

You appear to have used a mathematical algorithm to generate the random data. Is it not possible that there is an element of non-randomness here, which is why you see the periods you do?
I’d be more likely to be convinced if you had used truly random data. Why not grab some datasets here and run the tests again: http://www.random.org/

Willis Eschenbach

Derek Sorensen says:
July 30, 2011 at 3:12 am

You appear to have used a mathematical algorithm to generate the random data. Is it not possible that there is an element of non-randomness here, which is why you see the periods you do?
I’d be more likely to be convinced if you had used truly random data. Why not grab some datasets here and run the tests again: http://www.random.org/

Derek, I have used datasets which are not simple random normal datasets. Those kinds of random datasets are called “white noise”. However, they do not resemble observational datasets of things like temperature.
This is because temperature datasets are “autocorrelated”, meaning that today’s temperature depends in part on yesterday’s temperature. Datasets composed of autocorrelated noise are called “red noise” datasets.
Note that the quality of these datasets is that they don’t take huge jumps, say from way up high to way down low and back again. Instead, they follow something like a “drunkard’s walk”, a random trail that, although not taking large jumps, wanders widely. Because today’s temperature depends in part on yesterday’s temperature, the list of temperatures forms a trail, not just random white noise.
It is entirely possible to generate random “red noise” datasets. They do not, as you suggest, contain the periods. Instead, because they are a trail and they wander up and down, pseudo-cycles are quite common. It looks like cycles … but it’s not.
w.

Willis – do you have anything in your toolbox that would handle cycles of varying length, such as the solar cycle or the PDO?

commieBob

The practical use of Fourier Analysis has a really nasty problem that most people aren’t aware of. It is called spectral leakage and it isn’t dealt with in many/most digital signal processing texts. The fact that the dataset is necessarily truncated produces spurious frequencies and the result is that the analysis can be complete garbage. A time series of fewer than 200 data points (one for each year) is a problem. You will see frequencies that aren’t there and may miss important frequencies because of spectral leakage. A good reference is The Scientist and Engineer’s Guide to Digital Signal Processing By Steven W. Smith, Ph.D. http://www.dspguide.com/pdfbook.htm It is written with the idea that you might actually want to do some digital signal processing and deals with the gotchas that will probably bite you if you don’t know about them.
The fact that Fourier Analysis finds sine waves isn’t a problem. A sine wave is the basic unit of oscillation. It isn’t theoretical, it is really there. Any other repeating waveform can be broken down to its component sine waves. If my spectrometer tells me that a frequency exists, I can tune into that frequency with a radio and find that it does indeed exist.

Stephen Wilde

That seems to be a mathematical way of showing that the cycles we think we see in the historical records and the proxies are not ‘significant’.
But does that matter?
On human timescales the changes that appear to arise from apparent climate cycling are real and in the past have led to the rise and fall of civilisations.
From our perspective there is enough ‘significance’ in those allegedly spurious cycles to give us a degree of predictive ability albeit within a wide range and albeit not guaranteed.
Thus if the sun gets a bit less active the vertical temperature profile of the atmosphere changes with effects on surface pressure distribution and likewise but from the bottom up when ocean surfaces are in heat releasing mode.
So who cares about statistical exercises when the linkages are so apparent?

Richard Saumarez

The peak at 60+ period is almost certainly a function of record length, which appears to be 130 years from your plots. This is a well known problem in any orthogonal, or non-orthogonal, transform and there are well-known ways of eliminating it.
Since the periodic transform uses a non-orthogonal basis set, the physical interpretation of the periodicities is difficult. To decompose a signal into a limited set of non-orthogonal basis functions and then reconstruct it (with some error presumably) tells us nothing about the signal. The PT is closely related to frequency domain template matching, where phase dependence can be specified, which appears to give a better understanding of the structure of the signal.
Your thesis is that a 60 year periodicity in the HADCRUT data is an artefact of the processing methods. I am not convinced that you have done this correctly and I would be interested to see the results of more established methods, applied correctly, to your data set. If there is a 60 year periodicity in the data, it should be possible to establish this, although the statistical significance is likely to be low as it is only twice the fundamental frequency of the data. Since you have appeared to use an ensemble of the HADCRUT data, a stratified approach may yield more information.

Robert of Ottawa

Willis Eschenbach says:
July 30, 2011 at 3:24 am
It looks like cycles … but it’s not.
Which was the whole point of the article, I believe. The human mind looks for patterns where there are not necessarily any – both a gift and curse.

Willis Eschenbach says:
July 30, 2011 at 3:24 am

Fair enough, and thanks for the explanation.

Henry

Willis. You seem to be using “mod”, possibly from the matlab package. You might find “%/%” does integer division.

Kasuha

This is very impressive analysis and explanation, many thanks for it.
I wonder what fits would it make if the periodicity analysis was extended into non-integer domain (i.e. to cover also periods like 19.7 etc)

R. de Haan

Great article.
Looking forward to a response from Loehle and Scafetta here at WUWT.

Nick Stokes

Willis,
I think the problem here is that you are dealing with non-orthogonal functions, and seeking to partition the power between them. In Fourier analysis you can do that, because of orthogonality, but without, you can’t.
The power of the sum of two frequencies, v1 and v2, say, is (v1+v2)^2. And this adds as v1^2 and v2^2 only if the average power in the product v1*v2 is zero – orthogonality.
You might like to check Ed Z’s comment. I think what you are describing is a fast wavelet transform, which works by repeated doubling of the interval length.
What’s in a name? Just that wavelet transforms are very popular, and there is a stack of R packages to help. Eg wavelets.
I note that your IEEE paper anticipates that and points out that their version doubles the period rather than the frequency. But with the fast transform based on doubling, this is just looking at the same sequence in reverse.

Willis, my (limited) understanding of the limitations of average global atmospheric temperature data sets seems that they might not be very representative of real natural processes. I wonder what might happen if you apply the same technique to say the CET data set? I would attempt it myself, however, I suspect it would take me much longer than you… you seem to have the tools at your fingertips and neurons to spare :-).

David

The ‘drunkard’s walk’ is Brownian motion isn’t it? Or a random walk. A stock price can be said to follow such a motion (chartists may disagree – not too many wealthy chartists).

Katherine

As you can see, increasing the cycle length decreases the amount of data available to average for each point in the cycle.
I see. Thanks for the clarification.

When I worked in speech research I used a similar brute force method in analyzing voice waves to locate glottal jitter. Fourier doesn’t work well on these waves because the basic pulse is NOT sinusoidal. Contrary to an earlier comment, the sine wave is not “always there”, it’s always an artifact of Fourier. In a semi-coupled system like the glottis and articulators, Fourier generates a lot of ‘frequencies’ that simply aren’t part of the real process. They only get in the way of a meaningful analysis.
As I recall (25 years later), my algorithm was fairly close to an LPC method, though I didn’t start out trying to implement LPC. Most good speech analysis uses LPC in one way or another.

Jeff L

Willis,
Could you post a picture of the auto correlation of your random data sets & the real data set & use those plots to show the difference between a “real” signal & an artifact? TIA

Steve McIntyre

Willis, I also dislike attempts to find wave patterns in series that are almost certainly red noise. Your reference to Sethares and Staley looks interesting. Since Sethares and Staley refer to wavelet transforms, I presume that (contra Nick Stokes) their methodology has considered wavelet methods, though I am not in a position right now to comment on whether Sethares and Staley’s method is a useful improvement on wavelets or not.

Jeff L

Another thing for readers to understand is that a Fourier transform is a technique to visualize data in a different domain – there is nothing inherently wrong with it, regardless of the nature of the signal that is being transformed. It is a fully reversible process (ie you can transform any time domain dataset into the frequency domain & take any frequency domain dataset back to the time domain without losing any information). The real issue here is how you interpret the results of that transformation / analysis.

Don K

Some comments on various things:
1. As I understand it, any “waveform” including digitized random pencil squiggles has a Fourier transform into the frequency domain that exactly reproduces the waveform. Therefore the fact that cycles appear when the data is subjected to Fourier analysis really doesn’t prove anything. I’m not 100% sure of that, and the underlying math is pretty intimidating to those of us to whom equations do not speak with a loud, clear voice. But I think that one really needs to show physical phenomena with appropriate phase and amplitude to back up Fourier decomposition. The same probably holds true for Periodicity analysis?
2. commieBob on spectral leakage. I was aware that spurious cycles appear due to data set truncation. But I thought they were mostly high frequency cycles and could be removed by low pass filtering or simply ignored in most cases. Am I wrong about that also?
3. CommieBob on whether Fourier’s dependence on sine(/cosine) waves is a problem. Well, yes, you’ll see the components if you do a spectral scan so they are sort of “real”. But it’s not so clear that someone looking for a physical cause for a periodic phenomenon is going to get all that much information from the sine waves if the actual waveform is a square wave or sawtooth or some other arbitrary shape.
4. Derek Sorenson and Willis on randomness. Both right but talking past each other? Willis almost certainly correct that you need to mimic autocorrelation by using red noise. But Derek is probably correct that — at least in concept — the red noise needs to be based on truly random values — white noise — not computer generated pseudo-random numbers. In practice, I think that pseudo-random numbers might be good enough. And all that assumes that red noise is generated by some sort of transformation on an input purportedly random data set. I don’t have the slightest idea how it is actually generated.

Ninderthana

Willis Eschenbach,
A very impressive technique but I must take you up on one point. There is a very strong underlying physical reason for a 60 year cycle in the Barycentre motion.
If you were considering speed and not velocity then I would not dispute your identification of a 20 year cycle but NO 60 year cycle. The speed of the Sun about the centre-of- mass of the Solar System is primarily modulated by the alignments of Jupiter and Saturn every 19.858 years. However, if you are considering the Sun’s velocity (i.e. the Sun’s speed and velocity of the Sun about the centre-of-mass of the Solar System) then this roughly repeats every 59.574 (= 19.858 x 3) years.
This is caused by the fact that the Sun completes one loop around the barycentre roughly once every 20 years. However, the axis of alignment of the orbital loop (about the barycentre) advances by 120 degrees with each completed orbit. This means that the Sun must complete three twenty year orbits for it to return to roughly the same position with respect to the distant stars.
Given that Loehle and Scafetta are claiming that planetary forces are somehow playing an indirect role in influencing the Earth’s climate, this influence would have to be synchronized with the forces involved. This means that they would have to search for both the 20 year periodicity related to the Sun’s barycentric speed, and the 60 year periodicity associated with the Sun’s barycentric velocity.
This shows that it is important to consider the underlying physical principles when you are using periodicity analysis otherwise you might come to the wrong conclusion.
I agree that Loehle and Scafetta should have given the statistical significance of the periodicities that they found in the HADCRUT data compared to AR(1) red-noise. They should have used Singular Spectrum Analysis with Monte Carlo trials to test the statistical significance of their results.

A physicist

Anthony,
As a professional scientist I have to correct this statement you make:
“First, it can only extract sinusoidal signals. Second, although it has good resolution as short timescales, it has poor resolution at the longer timescales. ”
This statement is, unfortunately, incorrect. Mathematically, Fourier expansion/analysis has been shown to be a complete expansion: any function can be expanded in an infinite series of sinusoidal functions. If the periodic function of period T is NOT sinusoidal, then this periodicity can be seen in the Fourier expansion coefficients of the terms with periods T, T/2, T/3, T/4, ….. All those terms have a period of T after 1, 2, 3, 4, … repetitions. Resolution wise – it depends on HOW many terms are included in the expansion. You can include as far as T/100 or T/1000 to get better resolution. Of course, computation wise this is costly. Of course, it is DIFFICULT / IMPOSSIBLE to find a periodic signal in the data with period longer than the total duration of the recorded data/signal (if the data is only recorded for 1000 years, there is no way to find a periodic behavior with period longer than 1000 years, or even longer than 500 years).
[You are mistaken about the author of the statement you quoted. ~dbs, mod.]

Hi Willis,
A very interesting presentation, applying an idea that was originally intended for another domain (music; Sethares, the inventor of this technique, is a music theory expert). This kind of ‘idea recycling’ is often very rewarding, especially older ideas that initially failed, maybe due to lack of computation resources, and blossom when reexamined when more powerful research tools become available.
However, I’ll have to pick a few nits with you on this:
“While Fourier analysis is very useful, it has a few shortcomings. First, it can only extract sinusoidal signals. Second, although it has good resolution as short timescales, it has poor resolution at the longer timescales.”
‘First’) This is not a shortcoming, it’s incredibly useful for analysis. Fourier’s theorem makes an amazing, highly non-intuitive claim: any bounded signal (i.e. ‘real world’) can be decomposed into a set of weighted sinusoids. Guaranteed! How do we prove that? Simple, just add the sinusoids together and you reconstruct the original signal assuming the signal was sampled at least twice per period for each time step (Nyquist).
This also proves that there are no other ‘missing’ waveforms in the analysis, because we can reconstruct the original signal _entirely_ from sinusoids. Any other ‘waveforms’ observed in the signal will also be reduced to sinusoids.
It can also be shown that the resulting sets of sinusoids are unique (independent of the ordering of the calculation steps) and preserve the conservation of energy (Parseval’s theorem). I.e. the sum of the energy in each sinusoid is equal to the total energy in the time envelope.
‘Second’) Perhaps you misstated this. Frequency resolution improves as the length of the time samples increase. It’s a Heisenberg tradeoff: we become more certain about frequency resolutions as we lose time resolution, and vice versa.
One of the problems with Periodicity Analysis (as Sethares and Staley point out in their original paper) is that the resulting patterns are not unique, depending on how you order the steps of your calculation. This is due to the fact that the underlying subspaces of the decomposition are not orthogonal. Whereas in Fourier (and Wavelet) decomposition the subspaces are orthogonal, and so the transforms produce unique pattern features.
So you can’t reconstruct the original signal by simply adding the spectral components, at least not without a lot of complicated bookkeeping to account for interactions between components.
I’m not saying this kind of analysis is useless, but you have to be careful about making claims of ‘reality’ concerning the output products, because they might disappear if you perform the transform in a different order. (Fourier outputs are real in the sense of uniqueness and energy conservation. If it says there is a spectral component at 60, then it’s there, up to Nyquist aliasing and noise issues.)
So, if you’re careful about how the components are produced, then this kind of analysis might be useful for classification and discriminative modeling purposes, and might provide insights into data that other techniques don’t provide.
Can you provide a pointer to the HADCRUT3 datasets you used above. I’d like to do my spectral analysis on this data.
Thank again for a thought-provoking article!

Willis
Interesting analysis for “cycles” – however “nature” may have regular “oscillations” that are not exact “cycles”. e.g. the Pacific Decadal Oscillation (PDO) has been tracked across many centuries, but may not have exactly the same length. There are multiple lines of supporting data for the ~ 20 and 60 year oscillations that provide further support for Loehle & Scafetta. e.g. see Easterbrook
Stockwell at Niche Modeling lists some of PDO analysis papers: Natural Variation – 60 year cycle e.g.

The reconstruction indicates that the PDO is a robust feature of North Pacific climate variability throughout the study period, however, the major modes of oscillation providing the basic PDO regime timescale have not been persistent over the last 530 years. The quasi-centennial (75–115-yr) and pentadecadal (50–70-yr) oscillations dominated the periods before and after 1850, respectively. Our analysis suggest that solar forcing fluctuation on quasi-centennial time scale (Gleissberg cycle) could be the pace-maker of the PDO before 1850, and the PDO behavior after 1850 could be due, in part, to the global warming.

e.g. Shen et al. A Pacific Decadal Oscillation record since 1470 AD reconstructed from proxy data of summer rainfall over eastern China, GEOPHYSICAL RESEARCH LETTERS, VOL. 33, L03702, 4 PP., 2006 doi:10.1029/2005GL024804
Stocker & Mysak “Climatic Fluctuations on the Century Time Scale: A Review of High-Resolution Proxy Data and Possible Mechanisms.” Climatic Change 20:227-250, (1992) 227-250.

The results indicate that this oceanic circulation exhibits natural variability on the century time scale which produces oscillations in the ocean-to-atmosphere heat flux. Although global in extent, these fluctuations are largest in the Atlantic Ocean.

Ed Fix’s Solar activity simulation model seems to capture well the major solar variations which have varying amplitude and length. These could well be the primary cause for the ocean oscillations – even if they are not precise “cycles”. (Shown by David Archibald) See
Ed Fix, “The Relationship of Sunspot Cycles to Gravitational Stresses on the Sun: Results of a Proof-of-Concept Simulation”. Ch 14 p 335 of Dr. Donald Easterbrook, ed. (Elsevier, 2011) e-book (Search the book for “355” or “barycenter” or “sunspot cycles”)

See also the major variations in the Length of Day (LOD), including reviews by Paul L. Vaughan

chrism

good to see,
did you ever see Shaw and Tigg, Applied Mathematica ? It has some absolutely neat Time series evaluation and a maximum entropy data reconstruction,
I would imagine either of the authors would be interested in lending advice from what I remember of their writing,
anyone know them ??

In calculating a global average temperature, the strongest known physical cycles (diurnal and seasonal at each site) are “averaged out”. I would expect that red noise becomes a stronger signal in the process. I suggest that because radiative energy transfer is “line of sight and fast as light”, the relationship between atmospheric CO2 concentrations and energy lost to space is best determined on a site by site and hourly time frame.

Slabadang

Willis!
You are a fantastic pedagogue. And the knowledge and application of statistics in science in general, and in climate science especially, is weak or bad. The over-interpretations are more a rule than an exception.
Often when I hear that they found “signals” as well now as “cycles”, the red flag hoists!

Thank you Willis for 2 things: first, for writing these things in a manner that those of us without a “tidy mind” can understand, and second, for keeping up the search for the truth in “climate science”. You are helping the common man understand that there is an unethical agenda in “climate science”, and as with all things, the truth will out.

Michael Larkin

“The bottom left panel shows the original barycentric data in black, and the reconstruction made by adding just these four cycles of different lengths is shown in blue. As you can see, these four cycles are sufficient to reconstruct the barycentric data quite closely. This shows that we’ve done a valid deconstruction of the original data.”
Willis, at the magnification of that bottom left curve, it’s hard to make out the fact that there might be a black curve and a blue curve. I kept staring at it until I noticed what I assumed were tiny gaps. The black and the blue are so near in tone, and it might have worked better with black and red, for example.
Maths has always been my Achilles’ heel. Whilst I get the gist of what you are saying, I had trouble with the following, from which I didn’t recover:
“For example, suppose that we want to find the actual cycle of length two in a given dataset. We can do it by numbering the data points in order, and then dividing them into odd- and even-numbered data points. If we average all of the odd data points, and we average all of the even data, it will give us the average cycle of length two in the data. Here is what we get when we apply that procedure to the HadCRUT3 dataset:”
You speak of data points, but you do not speak of the values of data points. I can see an amplitude arises of about 0.01, and that seems to be dimensionless; it’s not degrees centigrade, for example. Because you don’t give a worked numeric example, I’m struggling to understand how that amplitude arises and what it means, here and in the elaborated examples for periods 4 and 20.
What I’m failing to grasp may be very simple for you, so simple that you didn’t think to spell it out. If I could grasp it, I think pretty much anyone could.

Michael Larkin

I suppose I should have said an amplitude of 0.02 peak-to-peak.

Robert of Ottawa wrote:
The human mind looks for patterns where there are not necessarily any – both a gift and curse.
Great observation. And people that don’t understand confirmation bias are doomed to continue misinterpreting things due to mental myopia.

charlesH

Assuming my understanding is correct, I think you have it backwards Willis.
The Loehle and Scafetta paper isn’t significant because 60 and 20 yr cycles have been PROVEN to exist in the temperature record, rather they have been shown to replicate the temperature record quite well when a rather small co2 climate sensitivity is added.
This result is consistent with the recent paper of Spencer and Braswell regarding climate sensitivity.
Both papers support the idea that the IPCC overestimates climate sensitivity.
Am I missing something?

Darren Parker

What about the Great Year Cycle of approx 26,000 years? And you could also possibly consider the 70,000 year cycle, the precession of the ecliptic

ChE

AP –
The low-frequency resolution problem that Willis (not Anthony) talks about is no less fundamental than the Heisenberg Uncertainty Principle. In fact, it really is the basis of it.

Leif Svalgaard

Fourier analysis of the distance [in AU] between the sun and the barycenter [inversely related to the speed; doesn’t matter which one is used] http://www.leif.org/research/FFT-Barycenter-Distance.png covering 6000 years [3000 BC to 3000 AD] with a datapoint every 100 days shows the following periods [above 9 years]:
9.92
11.87 Jupiter
12.78 second largest
13.8
19.85 largest peak
29.42 Saturn
35.89
45.31
61.03 very tiny (about 30 times smaller than the 19.85 yr peak)
83.85 Uranus
169.28 Neptune
There is thus no significant 60 yr period as is also evident by simple inspection of any section of the data, e.g. http://www.leif.org/research/Barycenter-Distance-2500BC-2000BC.
The spectral lines are very sharp, because there is no noise and the number of data points is very large ~22,000 and the cycles are actually there and nearly sinusoidal.

ChE

Oh, and Willis, kudos for publishing your code. If only this were standard practice.

Leif Svalgaard says:
July 30, 2011 at 8:13 am
http://www.leif.org/research/Barycenter-Distance-2500BC-2000BC.png is better

Surse

Truly fascinating post and comments. I am not a scientist but in following WUWT for years, I think I’m catching the drift!
Willis: “…what does all of this have to do with the Loehle/Scafetta paper? Well, two things. First, in the discussion on that thread I had said that I thought that the 60 year cycle that Loehle/Scafetta said was in the barycentric data was very weak. As the analysis above shows, the barycentric data does not have any kind of strong 60-year underlying cycle. Loehle/Scafetta claimed that there were ~ 20-year and ~ 60-year cycles in both the solar barycentric data and the surface temperature data. I find no such 60-year cycle in the barycentric data.”
Ninderthana: “A very impressive technique but I must take you up on one point. There is a very strong underlying physical reason for a 60 year cycle in the Barycentre motion.”
“If you were considering speed and not velocity then I would not dispute your identification of a 20 year cycle but NO 60 year cycle. The speed of the Sun about the centre-of- mass of the Solar System is primarily modulated by the alignments of Jupiter and Saturn every 19.858 years. However, if you are considering the Sun’s velocity (i.e. the Sun’s speed and velocity of the Sun about the centre-of-mass of the Solar System) then this roughly repeats every 59.574 (= 19.858 x 3) years.”
“This is caused by the fact that the Sun completes one loop around the barycentre roughly once every every 20 years, However, the axis of alignment of the orbital loop (about the barycentre) advances by 120 degrees with each completed orbit. This means that the Sun must complete three twenty year orbits for it to return to roughly same position with respect to the distant stars.”
Me: So is there a 60 year cycle or not?

Ric Locke

::sigh:: Mathematicians and non-mathematicians talking past one another. Nothing to see here…
First off, there’s nothing in Fourier analysis that prevents it from analyzing odd-shaped waveforms; in fact, that’s the point. A square wave is almost the trivial case — it’s the sum of an infinite number of odd harmonics (“multiples”) of the square wave’s frequency.
Any form of frequency analysis, regardless of method, when applied to a real (limited) data set, will show erroneous frequencies. This is a consequence of the fact that the data set can be thought of as a pulse: (no data)(dataset)(no data). The transition from no data -> data and the one from data -> no data constitute the edges of one cycle of a square wave, so the results will show frequencies at odd harmonics of twice the period of the data. One of the reasons for using wavelet analysis is that it tends to suppress this effect.
Regards,
Ric

Critically challenging published articles here with new analysis, explaining it in a way the layman reader can understand, plus publishing complete code….whaddya think this is, Real Climate?
Oh, wait.

Brian Macker

Derek Sorensen,
“Numerology? Why? – your graphs of cycles have ZERO basis to have the original time scale applied under them.”
If the assumption is that we start at a particular time and that temperature changes behave as red noise from that time with the same magnitude as real temperature changes, then the time frames make sense. Just as placing time frames on other models makes sense. It just so happens that this model is timeless in that you can start with any date you want. It would only need to be fixed if you chose a real temperature from a real date to start the process, or a particular temperature range within a particular time period. Since the claim wasn’t that this is a predictive model, but a non-predictive one, none of that matters.
So apparently it is you who doesn’t understand, and that includes some quite obvious if implicit assumptions.

KR

Excellent analysis, Willis.
I would note that Fourier analysis can uniquely decompose any signal up to the sampling/Nyquist limits. What it does do is simply identify internal patterns in the data that are not sinusoids, as the spectral data of (for example) a sawtooth pattern is spread between multiple sinusoids. However, since the majority of the energy is in those first few low frequency components and harmonics, you can usually identify those major cyclic behaviors.
I’m going to look into the Periodicity Transforms – they might be useful for some of the things I work on. I will echo the concern of others, though, that a non-orthogonal basis function such as that will be operation order dependent, meaning that each implementation of PT will potentially provide different answers even if using the same basis functions.
Robert of Ottawa: “The human mind looks for patterns where there are not necessarily any – both a gift and curse.”
Absolutely right.
It’s worth keeping in mind that cyclic behavior is seen when there is a physical basis behind it, such as a pendulum. But it can also be an artifact of a short data set, which _looks_ cyclic even if it isn’t. For example, the velocity of my car may appear cyclic while trying to get out of a parking space, but that’s a poor prediction of my velocity once I’m on the road. Cyclic analysis not tied to the physical system exhibiting that behavior may well be an artifact of the analysis or the time frame examined – it really lacks explanatory power. And if there’s not a pendulum behind the curtain, predicting future behavior on purely output data analysis will be a serious c***shoot. That’s my major issue with Loehle and Scafetta – they identify nothing in the physics of the climate system that could show such behaviors.