This DSP engineer is often tasked with extracting spurious signals from noisy data. He submits this interesting result of applying these techniques to the HadCRUT temperature anomaly data. Digital Signal Processing analysis suggests cooling ahead in the immediate future, with no significant probability of a positive anomaly exceeding 0.5°C between 2023 and 2113. See figures 13 and 14. Code and data are made available for replication. – Anthony
Guest essay by Jeffery S. Patterson, DSP Design Architect, Agilent Technologies
Harmonic Decomposition of the Modern Temperature Anomaly Record
Abstract: The observed temperature anomaly since 1900 can be well modeled with a simple harmonic decomposition of the temperature record based on a fundamental period of 170.7 years. The goodness-of-fit of the resulting model significantly exceeds the expected fit to a stochastic AR sequence matching the general characteristic of the modern temperature record.
Data
I’ve used the monthly Hadcrut3 temperature anomaly data available from http://woodfortrees.org/data/hadcrut3vgl/every as plotted in Figure 1.
Figure 1 – Hadcrut3 Temperature Record 1850-Present
To remove seasonal variations while avoiding spectral smearing and aliasing effects, the data was box-car averaged over a 12-month period and decimated by 12 to obtain the average annual temperature plotted in Figure 2.
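A minimal sketch of this step (assuming the monthly series has already been loaded as a plain list named monthly; the variable name is illustrative):

(* Average non-overlapping 12-month blocks; equivalent to a 12-point box-car
   average followed by decimation by 12 *)
yearly = Mean /@ Partition[monthly, 12];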
Figure 2 – Monthly data decimated to yearly average
A Power Spectral Density (PSD) plot of the decimated data reveals harmonically related spectral peaks.
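The PSD computation itself is not shown in the post; the following is a hedged sketch using a simple periodogram. The estimator, its normalization, and the construction of psdY as an interpolating function of radian frequency w are assumptions (the post's own code treats psdY as an expression in w).

(* Two-sided periodogram of the mean-removed annual data in dB, folded to 0 <= w <= Pi
   and interpolated over radian frequency (DC bin omitted) *)
spec = Abs[Fourier[yearly - Mean[yearly]]]^2;
nFFT = Length[spec];
half = Take[spec, Ceiling[nFFT/2]];
psdY = Interpolation[Table[{2 Pi (k - 1)/nFFT, 10 Log10[half[[k]]]}, {k, 2, Length[half]}]];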
Figure 3 – PSD of annual temperature anomaly in dB
To eliminate the possibility that these are FFT (Fast Fourier Transform) artifacts while avoiding the spectral leakage associated with data windowing, we use a technique called record periodization. The data is regressed about a line connecting the record endpoints, dropping the last point in the resulting residual. This process eliminates the endpoint discontinuity while preserving the position of the spectral peaks (although it does attenuate the amplitudes at higher frequencies and modifies the phase of the spectral components). The PSD of the residual is plotted in Figure 4.
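A sketch of the periodization step as described above (assuming a simple line joining the first and last samples; the details of the regression used in the post are not shown):

(* Remove the straight line connecting the record endpoints, then drop the last
   point so the residual has no wrap-around discontinuity *)
nY = Length[yearly];
endLine = Table[yearly[[1]] + (yearly[[nY]] - yearly[[1]]) (k - 1)/(nY - 1), {k, nY}];
periodized = Drop[yearly - endLine, -1];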
Figure 4 – PSD of the periodized record
Since the spectral peaking is still present we conclude these are not record-length artifacts. The peaks are harmonically related, with odd harmonics dominating until the eighth. Since spectral resolution increases with frequency, we use the eighth harmonic of the periodized PSD to estimate the fundamental. The following Mathematica (Mma) code finds the 5th peak (8th harmonic) and estimates the fundamental.
(* Locate the 8th-harmonic peak (restricted to w > 0.25) and divide by 8 to estimate the fundamental radian frequency *)
wpkY1 = Abs[ArgMax[{psdY, w > .25}, w]]/8
0.036811
The units are radian frequency across the Nyquist band, mapped to ±π (the plots are zoomed to 0 < w < 1 to show the area of interest). To convert to years, invert wpkY1 and multiply by 2π, which yields a fundamental period of 170.7 years.
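For reference, the conversion from the estimated radian frequency (per yearly sample) to a period in years:

(* Convert radian frequency to a period in years *)
2 Pi/wpkY1  (* ≈ 170.7 *)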
From inspection of the PSD we form the harmonic model (note all of the radian frequencies are harmonically related to the fundamental):
(*Define the 5th order harmonic model used in curve fit*)
model = AY1*Sin[wpkY1 t + phiY1] + AY2*Sin[2*wpkY1*t + phiY2] +
   AY3*Sin[3*wpkY1*t + phiY3] + AY4*Sin[4*wpkY1*t + phiY4] +
   AY5*Sin[5*wpkY1*t + phiY5];
vars = {AY1, phiY1, AY2, phiY2, AY3, phiY3, AY4, phiY4, AY5, phiY5}
and fit the model to the original (unperiodized) data to find the unknown amplitudes, AYx, and phases, phiYx.
fitParms1=FindFit[yearly,model,vars,t]
fit1=Table[model/.fitParms1,{t,0,112}];
residualY1 = yearly - fit1;

{AY1 → -0.328464, phiY1 → 1.44861, AY2 → -0.194251, phiY2 → 3.03246, AY3 → 0.132514, phiY3 → 2.26587, AY4 → 0.0624932, phiY4 → -3.42662, AY5 → -0.0116186, phiY5 → -1.36245, AY8 → 0.0563983, phiY8 → 1.97142, wpkY1 → 0.036811}
The fit is shown in Figure 5 and the residual error in Figure 6.
Figure 5 – Harmonic model fit to annual data
Figure 6 – Residual Error
Figure 7 – PSD of the residual error
The residual is nearly white, as evidenced by Figure 7, justifying use of the Hodrick-Prescott filter on the decimated data. This filter is designed to separate cyclical, non-stationary components from data. Figure 8 shows an excellent fit with a smoothing factor of 15.
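As far as I know Mathematica has no built-in Hodrick-Prescott filter, so the following is a hedged sketch of the standard penalized-least-squares formulation; hpTrend and lambda are illustrative names and this may differ in detail from whatever implementation produced Figure 8.

(* Hodrick-Prescott trend: minimize ||y - tau||^2 + lambda*||D2.tau||^2, where D2 is the
   second-difference operator; the minimizer solves (I + lambda*D2'.D2).tau = y *)
hpTrend[y_List, lambda_] := Module[{n = Length[y], d2},
  d2 = Table[Which[j == i, 1, j == i + 1, -2, j == i + 2, 1, True, 0], {i, n - 2}, {j, n}];
  LinearSolve[IdentityMatrix[n] + lambda Transpose[d2].d2, y]];
hpY = hpTrend[yearly, 15];  (* smoothing factor quoted in the text *)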
Figure 8 – Model vs. HP Filtered data (smoothing factor=3)
Stochastic Analysis
The objection that this is simple curve fitting can rightly be raised. After all, harmonic decomposition is a highly constrained form of Fourier analysis, which is itself a curve fitting exercise that yields the harmonic coefficients (with the fundamental set by the record length) which recreate the sequence exactly in the sample domain. That does not mean, however, that periodicities found by Fourier analysis (or, by implication, harmonic decomposition) are not present in the record. Nor, as will be shown below, is it true that harmonic decomposition of an arbitrary sequence would be expected to yield the goodness-of-fit achieved here.
The 113-sample record examined above is not long enough to attribute statistical significance to the fundamental 170.7-year period, although others have found significance in the 57-year (here 56.9-year) third harmonic. We can, however, estimate the probability that the results are a statistical fluke.
To do so, we use the data record to estimate an AR process.
(* Fit a 5th-order AR process to the annual temperature record;
   yearlyTD is the annual record as a TemporalData object (construction not shown) *)
procY = ARProcess[{a1, a2, a3, a4, a5}, v];
procParamsY = FindProcessParameters[yearlyTD["States"], procY]
estProcY = procY /. procParamsY
WeakStationarity[estProcY]
{a1→0.713,a2→0.0647,a3→0.0629,a4→0.181,a5→0.0845,v→0.0124391}
As can be seen in Figure 9 below, the process estimate yields a reasonable match to the observed power spectral density and covariance function.
Figure 9 – PSD of estimated AR process (red) vs. data
Figure 9b – Correlation function (model in blue)
Figure 10 – 500 trial spaghetti plot
Figure 10b – Three paths chosen at random
As shown in Figure 10b, the AR process produces sequences whose general character matches the temperature record. Next we perform a fifth-order harmonic decomposition on all 500 paths, taking the variance of the residual as a goodness-of-fit metric. Of the 500 trials, harmonic decomposition failed to converge 74 times, meaning that no periodicity could be found which reduced the variance of the residual (this alone disproves the hypothesis that any arbitrary AR sequence can be decomposed). To these failed trials we assigned the variance of the original sequence. The scattergram of results is plotted in Figure 11 along with a dashed line representing the variance of the model residual found above.
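The trial loop itself is not shown; the following is a hedged sketch of one way to run it. The path extraction via the "States" property, fitting with the fundamental held fixed, and the handling of non-converging fits are assumptions rather than the code used in the post.

(* Simulate paths of the estimated AR process, fit the 5th-order harmonic model to
   each, and take the residual variance as the goodness-of-fit metric. A fit that
   fails to reduce the variance is assigned the variance of the raw sequence. *)
trialVariance[] := Module[{path, fp, fit, res},
  path = RandomFunction[estProcY, {1, 113}]["States"];
  fp = Quiet@FindFit[path, model, vars, t];
  fit = Table[model /. fp, {t, Length[path]}];
  res = Variance[path - fit];
  Min[res, Variance[path]]];
trialVariances = Table[trialVariance[], {500}];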
Figure 11 – Variance of residual; fifth order HC (Harmonic Coefficients), residual 5HC on climate record shown in red
We see that the fifth-order fit to the actual climate record produces an unusually good result. Of the 500 trials, 99.4% resulted in residual variance exceeding that achieved on the actual temperature data. Only 1.8% of the trials came within 10% and 5.2% within 20%. We can estimate the probability of achieving this result by chance by examining the cumulative distribution of the results plotted in Figure 12.
Figure 12 – CDF (Cumulative Distribution Function) of trial variances
The CDF estimates the probability of achieving these results by chance at ~8.1%.
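A sketch of how Figure 12 and the quoted probability might be obtained from the trial results (EmpiricalDistribution is an assumption about the estimator; the post does not say exactly how the ~8.1% figure was read off the CDF):

(* Empirical CDF of the trial residual variances, and its value at the variance
   achieved on the real record *)
edist = EmpiricalDistribution[trialVariances];
Plot[CDF[edist, x], {x, 0, Max[trialVariances]}]
CDF[edist, Variance[residualY1]]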
Forecast
Even if we accept the premise of statistical significance, without knowledge of the underlying mechanism producing the periodicity, forecasting becomes a suspect endeavor. If, for example, the harmonics are being generated by a stable non-linear climatic response to some celestial cycle, we would expect the model to have skill in forecasting future climate trends. On the other hand, if the periodicities are internally generated by the climate itself (e.g. feedback involving transport delays), we would expect both the fundamental frequency and, importantly, the phase of the harmonics to evolve with time, making accurate forecasts impossible.
Nevertheless, having come thus far, who could resist a peek into the future?
We assume the periodicity is externally forced and the climate response remains constant. We are interested in modeling the remaining variance, so we fit a stochastic model to the residual. Empirically, we found that, again, a 5th-order AR (autoregressive) process matches the residual well.
(* Fit a 5th-order AR process to the harmonic-model residual *)
tDataY = TemporalData[residualY1 - Mean[residualY1], Automatic];
yearTD = TemporalData[residualY1, {DateRange[{1900}, {2012}, "Year"]}]
procY = ARProcess[{a1, a2, a3, a4, a5}, v];
procParamsY = FindProcessParameters[yearTD["States"], procY]
estProcY = procY /. procParamsY
WeakStationarity[estProcY]
A 100-path, 100-year run combining the paths of the AR model with the harmonic model derived above is shown in Figure 13.
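A hedged sketch of how such a run could be assembled from the pieces above; the time indexing, the "States" property, and the omission of the 1950-1965 re-centering are assumptions.

(* 100 projection paths, each 100 years long: the deterministic harmonic model
   extrapolated past the end of the record plus simulated residual AR paths *)
horizon = 100;
futureHarmonic = Table[model /. fitParms1, {t, 113, 112 + horizon}];
projection = Table[futureHarmonic + RandomFunction[estProcY, {1, horizon}]["States"], {100}];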
Figure 13 – Projected global mean temperature anomaly (centered 1950-1965 mean)
Figure 14 – Survivability at 10 (Purple), 25 (Orange), 50 (Red), 75 (Blue) and 100 (Green) years
The survivability plots predict no significant probability of a positive anomaly exceeding 0.5°C between 2023 and 2113.
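One way such survivability curves could be computed from the projection paths (the threshold handling and the exact definition used in Figure 14 are assumptions):

(* Fraction of projection paths whose anomaly exceeds a threshold at a given
   horizon, in years past the end of the record *)
survival[yr_Integer, thresh_] := N@Mean[Boole[#[[yr]] > thresh] & /@ projection];
survival[#, 0.5] & /@ {10, 25, 50, 75, 100}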
Discussion
With a roughly one-in-twelve chance that the model obtained above is the manifestation of a statistical fluke, these results are not definitive. They do, however, show that a reasonable hypothesis for the observed record can be established independent of any significant contribution from greenhouse gases or other anthropogenic effects.
Bruce Cobb:
In response to my supporting the view (shared by several in this thread) that one should assess whether the method applied to half the time series can predict the other half, you replied on September 11, 2013 at 7:18 am:
Sorry, but no. We are discussing an ability to model the data set and not its ability to model whatever you, I or anyone else thinks the data set should be.
However, if you are saying the data is dubious, then I agree; see
http://www.publications.parliament.uk/pa/cm200910/cmselect/cmsctech/memo/climatedata/uc0102.htm
Whilst I agree with that point, I stress that it is not relevant to considerations in this thread.
Richard
Greg,
Of course moving-average (“boxcar”) filters are very intuitive and simple to implement. But that doesn’t change the fact that they also produce spurious artifacts in their output.
Gaussian filters are “ideal” in the sense that they do not produce these artifacts. But, for the same length filter, they produce thicker spectral lines and thus have less frequency resolving power.
http://en.wikipedia.org/wiki/Gaussian_filter
“Gaussian filters have the properties of having no overshoot to a step function input while minimizing the rise and fall time. This behavior is closely connected to the fact that the Gaussian filter has the minimum possible group delay. It is considered the ideal time domain filter,…”
Jeff Patterson says:
Greg Goodman says: It’s true that differential attenuates as 1/f
Actually it’s 1/f^2. Low frequency attenuation is extreme.
===
In the power spectrum maybe, but in the FT or your fitted model it is 1/f: d/dt(cos(wt)) = -w sin(wt) ….
I am very surprised to see this.
You are imposing highly artificial constraints on the system. Your projected results (fig 13) are simply a periodic rerun of the 20th century. What you have done, in essence, is to assume an impulse response based on auto-correlation, and perturbed this with a periodic input to get a result.
The point is that you cannot gain a statistical characterisation of your model because the persistence of the climate system is too long, compared to the record length, to allow segmentation of the record to improve estimation accuracy. It is inevitable that random inputs into a system with persistence will produce trends. The estimation of that persistence is very difficult and distinction between competing physical models almost impossible with current data. Nevertheless, trends comparable to those seen in the modern record are just possible with a randomly excited Hurst system. See a tutorial post:
http://judithcurry.com/2012/02/19/autocorrelation-and-trends/
The model assumes a defined periodic input and noise with some statistical properties. This is a stationary input, in the commonly accepted sense, because it has constant parameters with time. This is a huge assumption.
I would comment that this is the blind application of signal processing without giving sufficient thought to underlying physical assumptions: I wrote here, in a tutorial post
http://wattsupwiththat.com/2013/04/09/correlation-filtering-systems-and-degrees-of-freedom/
“Modern scripting programs such as “R” allow one to perform many signal processing calculations very easily. The use of these programs lies in not applying them blindly to data but in deciding how to use them appropriately. Speaking from bitter experience, it is very easy to make mistakes in signal processing and it is difficult to recognise them. These mistakes fall into three categories, programming errors, procedural errors in handling the signals and not understanding the theory as well as one should. While modern scripting languages are robust, and may largely eliminate straight programming errors, they most certainly do not protect one from making the others!”
John Day says:
In AR modeling, the predictive modeling results depend on the time series being stationary, i.e. constant mean and variance. But historical temperature records are not stationary, and in fact it is the non-stationary part of the record that you are trying to model. So results of predicting values in the future based on values in the past will generally not be credible.
The residual _is_ stationary. That’s the point. We can certainly drive an AR process with stationary (and in this case also nearly Gaussian) noise plus sine waves and get a valid projection. This of course, as I pointed out, depends on the assumptions that the observed periodicity is a forcing function (i.e. not internal) and that the AR process itself is time invariant. As I mentioned, this might not be the case and so the projection should be taken with the appropriate grain of salt.
The main point (besides the surprisingly good fit that can be achieved with so few harmonically related elements) was not the projection but rather that there exist plausible explanations for the observed data which do not require significant forcing from anthropogenic sources.
I wonder about the futility of taking apart the frequency response of all the combined signals from all the sensors. It reminds me of signal frequency response testing (e.g. hearing aids, brainwaves, musical instruments, etc.). Isolating instruments producing a combined sound by filtering out the known frequency response of each note of the various instruments is possible. But in temperature signals, the instruments producing the combined frequency response are many more and they are not playing from the same sheet of music. Each instrument (each sensor) produces a series of sounds in its little music room which has separate acoustics to the one down the road that is producing its own sounds in its little music room which has separate acoustics to the next one down the road, and so on. Logically, at best you could filter out all the separate acoustics back to a single one and say something about that instrument’s data and its particular music room and its particular sheet of music. Putting them all together and saying something about the resulting frequency response is probably not very useful and would lead to a high degree of spurious conclusions.
So I’d like to see this analysis applied up to 1990 to observe how its predictions from 1990 fared compared to reality.
joshv says:
September 11, 2013 at 5:12 am
“Though I am extremely skeptical of model projections in general, I see no reason to believe the projections of this model any less than the projections of GCMs.”
Or any more. Statistical masturbation, at its best, in both cases. Should have left it in the “boxcar” and shipped it out. As I have said before, one can obtain any result one desires with enough data manipulation and statistical analysis.
kencoffman:
Thank you for your answer to me at September 11, 2013 at 7:58 am in reply to my request for clarification.
I understand your reply to say you have invented from whole cloth your notion that
I fail to see how your invention has any relevance to this thread whether or not you “can see a span that includes a cooling anomaly in the projection”.
Richard
JD: “But, for the same length filter, produce thicker spectral output lines, thus have less frequency resolving power.”
If you add the proviso “for the same length filter” you are totally correct. I work from the spec I need so I compare two filters with similar FWHM and conclude I need a longer window to do the same job without unacceptable defects ( be it gaussian or triple-RM).
I actually favour 3RM for this kind of stuff since it has a zero in frequency. It’s marginally better than gauss for removing annual, for example.
http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/
Pamela Gray says:
September 11, 2013 at 8:09 am
“Putting them all together and saying something about the resulting frequency response is probably not very useful and would lead to a high degree of spurious conclusions.”
As one who plays guitar regularly I would say such methods also destroy the melody which is what one is looking for in this analogy.
– – – – – – – –
Jeff Patterson,
Thanks for your effort.
I particularly appreciate your care in describing the limitations and shortcomings of your methods.
John
Folks, we have been schooled on digital signal processing here. Nicely done Jeff.
Anyone interested in the topic should check out The Scientist and Engineer’s Guide to Digital Signal Processing. The book is a free download and is written for people who might actually want to do practical signal processing. There are pitfalls that await anyone who naively attempts to use DSP techniques in real life (as opposed to a classroom exercise), and many undergrad textbooks don’t mention them.
Is this anything to do with Mr. T. Pratchett or the Unseen University? There is certainly a whiff of Hex in the methodology and written conclusions.
Chuck Nolan says:
September 11, 2013 at 5:46 am
WillR says:
September 11, 2013 at 5:11 am
After figure 3:
“To eliminate the possibility that these are FFT (Fast Fourier Transform) artifacts while avoiding the spectral leakage associated with data windowing, we use a technique [that] is called record periodization.
Note the word [that] — it appears to be missing…
———————————————-
Don’t add words, delete the word “is”.”
cn
Grammatically it is okay to eliminate the first “that”, too. Also, the “the” before ‘spectral’. There, Chuck Nolan, WillR and I have made a statement on our full understanding of the mathematical process!
The problem with this sort of analysis, while interesting, is that it can only show what HAS happened and not what WILL happen. Would an analysis of the opening of the 1812 Overture predict the finale? About the only thing I am prepared to say with any certainty is that the next 1000 years is more likely to be cooler than the last 1000 years than it is likely to be warmer.
Thanks Jeff for a clear and concise post. I have more than a smattering of exposure to digital signal processing and Fourier analysis (although not enough to do it myself!) so I could understand what you were trying to do. You carefully noted and explained the limitations (non-linear externally forced vs. internally generated) and showed that statistically the result is unlikely to be a random error.
Now I’d like to see some publications examining the hypothesis that the climate is an externally forced linear process, or showing that it is an internally forced random process.
JP: “The main point (besides the surprisingly good fit that can be achieved with so few harmonically related elements) was not the projection but rather that there exist plausible explanations for the observed data which do not require significant forcing from anthropogenic sources.”
Sorry, maybe I missed something in the article. What plausible explanation did this present?
Crosspatch, actually that is likely. There is a musicality test, standardized no less, that in its simplest form has the examinee predict what note comes next after a short series of notes. Because the musical scale is mathematical and orderly, one can, if one has an ear for the “language”, predict what comes next.
sorry, “hypothesis”. My right ring fingertip is numb from a cut, hard to feel where the upper row of keys is.
‘Jeff Patterson says: September 11, 2013 at 6:17 am
IanE says:
September 11, 2013 at 5:52 am
‘With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.’ John von Neumann.
As the stochastic fitting exercise showed, JvN could only do this 0.6% of the time if all he had to work with was harmonically related sine waves.’
But of course he could choose any sort of function – he chose something that worked, he did not have to use the approach he ended up with. That is the fundamental problem with data-mining exercises.
crosspatch says:
September 11, 2013 at 8:26 am
Right you are. The long-term temperature trend for the second half (past 5700 years) or third third (3800 years) of the Holocene Interglacial is down. Simply extrapolating that trend gets us into the next glacial sooner rather than later. The trend could reverse, but hasn’t done so in any prior interglacial, once established.
Too bad that man-made GHGs can’t do that trick.
richardscourtney says:
That occurred to me afterwards. The entire exercise is pointless, then. Proper science demands accuracy from the start. If you are starting with data that are flawed, the result is going to be flawed as well.
Pamela Gray says:
September 11, 2013 at 8:33 am
“Crosspatch, actually that is likely. There is a musicality test, standardized no less, that in its simplest form, has the examinee predict what note comes next after a short series of notes. Because the muical scale is mathematical and orderly one can if one has an ear for the “language” predict what comes next.”
This is very true of 1-4-5 chords in simple tunes such as bluegrass and old-time rock and roll, not so much in classical, at least not that I can tell. Too complex, and many times with less repetitive sequences. But then I do not play classical. But torture the “data” enough and you will lose the tune.
Nice science: you put it out there, with all the data, your assumptions, process, limitations, admission of doubts, etc., and let everyone rip it apart. Then you defend. All publicly. I think a few other “climate scientists” could take a lesson here. Great post and discussion.