Crowdsourcing A Full Kernel Cascaded Triple Running Mean Low Pass Filter, No Seriously…

Fig 4-HadCrut4 Monthly Anomalies with CTRM Annual, 15 and 75 years low pass filters

Image Credit: Climate Data Blog

By Richard Linsley Hood  – Edited by Just The Facts

The goal of this crowdsourcing thread is to present a 12 month/365 day Cascaded Triple Running Mean (CTRM) filter, inform readers of its basis and value, and gather your input on how I can improve and develop it. A 12 month/365 day CTRM filter completely removes the annual ‘cycle’, as the CTRM is a near Gaussian low pass filter. In fact it is slightly better than Gaussian in that it completely removes the 12 month ‘cycle’ whereas true Gaussian leaves a small residual of that still in the data. This new tool is an attempt to produce a more accurate treatment of climate data and see what new perspectives, if any, it uncovers. This tool builds on the good work by Greg Goodman, with Vaughan Pratt’s valuable input, on this thread on Climate Etc.

Before we get too far into this, let me explain some of the terminology that will be used in this article:

—————-

Filter:

“In signal processing, a filter is a device or process that removes from a signal some unwanted component or feature. Filtering is a class of signal processing, the defining feature of filters being the complete or partial suppression of some aspect of the signal. Most often, this means removing some frequencies and not others in order to suppress interfering signals and reduce background noise.” Wikipedia.

Gaussian Filter:

A Gaussian Filter is probably the ideal filter in time domain terms. That is, if you consider the graphs you are looking at to be like the ones displayed on an oscilloscope, then a Gaussian filter is the one that adds the least distortion to the signal.

Full Kernel Filter:

Indicates that the output of the filter will not change when new data is added (except to extend the existing plot). It does not extend up to the ends of the data available, because the output is in the centre of the input range. This is its biggest limitation.

Low Pass Filter:

A low pass filter is one which removes the high frequency components in a signal. One of its most common usages is in anti-aliasing filters for conditioning signals prior to analog-to-digital conversion. Daily, Monthly and Annual averages are low pass filters also.

Cascaded:

A cascade is where you feed the output of the first stage into the input of the next stage and so on. In a spreadsheet implementation of a CTRM you can produce a single average column in the normal way and then use that column as an input to create the next output column and so on. The value of the inter-stage multiplier/divider is very important. It should be set to 1.2067. This is the precise value that makes the CTRM into a near Gaussian filter. It gives values of 12, 10 and 8 months for the three stages in an Annual filter for example.
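For readers who prefer code to spreadsheets, here is a minimal Python sketch of the cascade just described. The function names are hypothetical and are not taken from the spreadsheet or R code linked later in the article. Rounding each stage to the nearest whole sample, and placing the extra sample of an even-length window before the centre, are assumptions made for illustration.

```python
def ctrm_stage_lengths(first, ratio=1.2067, stages=3):
    """Window length for each stage: divide by `ratio` each time, then round."""
    lengths = []
    length = float(first)
    for _ in range(stages):
        lengths.append(round(length))
        length /= ratio
    return lengths


def running_mean(values, window):
    """Centred running mean; positions without a complete window are None."""
    n = len(values)
    out = [None] * n
    left = window // 2             # samples taken before the centre
    right = window - left - 1      # samples taken after the centre
    for i in range(left, n - right):
        chunk = values[i - left:i + right + 1]
        if None not in chunk:      # skip windows that reach into a gap
            out[i] = sum(chunk) / window
    return out


def ctrm(values, first, ratio=1.2067):
    """Feed the output of each running mean into the next, three times."""
    result = list(values)
    for window in ctrm_stage_lengths(first, ratio):
        result = running_mean(result, window)
    return result


print(ctrm_stage_lengths(12))    # stage lengths for an Annual filter on monthly data
print(ctrm_stage_lengths(180))   # stage lengths for a 15 year filter on monthly data
```

The `None` entries at each end of the output are the 'full kernel' limitation in action: no value is produced where the stacked windows would run past the available data.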

Triple Running Mean:

The simplest method to remove high frequencies or smooth data is to use moving averages, also referred to as running means. A running mean filter is the standard ‘average’ that is most commonly used in Climate work. On its own it is a very bad form of filter and produces a lot of arithmetic artefacts. Adding three of those ‘back to back’ in a cascade, however, allows for a much higher quality filter that is also very easy to implement. It just needs two more stages than are normally used.

—————

With all of this in mind, a CTRM filter, used either at 365 days (if we have that resolution of data available) or 12 months in length with the most common data sets, will completely remove the Annual cycle while retaining the underlying monthly sampling frequency in the output. In fact it is even better than that, as it does not matter if the data used has been normalised already or not. A CTRM filter will produce the same output on either raw or normalised data, with only a small offset that reflects whichever ‘Normal’ period the data provider chose. There are no added distortions of any sort from the filter.
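The raw versus normalised claim can be checked on synthetic data. The sketch below is only an illustration under stated assumptions: the series is invented (a slow trend plus a fixed seasonal cycle), the 'normalisation' is a crude subtraction of each calendar month's long-term mean, and the helper names are hypothetical.

```python
import math


def running_mean(values, window):
    """Centred running mean over complete windows only (None elsewhere)."""
    left = window // 2
    right = window - left - 1
    out = [None] * len(values)
    for i in range(left, len(values) - right):
        chunk = values[i - left:i + right + 1]
        if None not in chunk:
            out[i] = sum(chunk) / window
    return out


def ctrm_annual(values):
    """12, 10 and 8 month cascade, the stage lengths given in the article."""
    for window in (12, 10, 8):
        values = running_mean(values, window)
    return values


months = 240
# Invented 'raw' series: slow trend plus a fixed seasonal cycle.
raw = [0.01 * m + 5.0 * math.sin(2 * math.pi * m / 12) + 14.0
       for m in range(months)]

# Crude 'normalised' version: subtract each calendar month's long-term mean.
climatology = [sum(raw[k::12]) / len(raw[k::12]) for k in range(12)]
anomaly = [raw[m] - climatology[m % 12] for m in range(months)]

smooth_raw = ctrm_annual(raw)
smooth_anom = ctrm_annual(anomaly)

# Where both outputs exist, they should differ only by a constant offset.
offsets = [a - b for a, b in zip(smooth_raw, smooth_anom) if a is not None]
print(min(offsets), max(offsets))
```

The reason it works is that the difference between the two inputs repeats every 12 months, and a 12 month running mean turns any such repeating pattern into its constant average.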

Let’s take a look at what this generates in practice. The following are UAH Anomalies from 1979 to Present with an Annual CTRM applied:

Fig 1-Feb UAH Monthly Global Anomalies with CTRM Annual low pass filter

Fig 1: UAH data with an Annual CTRM filter

Note that I have just plotted the data points. The CTRM filter has removed the ‘visual noise’ that month to month variability causes. This is very similar to the 12 or 13 month single running mean that is often used, however it is more accurate as the mathematical errors produced by those simple running means are removed. Additionally, the higher frequencies are completely removed while all the lower frequencies are left completely intact.

The following are HadCRUT4 Anomalies from 1850 to Present with an Annual CTRM applied:

Fig 2-Jan HadCrut4 Monthly Anomalies with CTRM Annual low pass filter

Fig 2: HadCRUT4 data with an Annual CTRM filter

Note again that all the higher frequencies have been removed and the lower frequencies are all displayed without distortions or noise.

There is a small issue with these CTRM filters in that CTRMs are ‘full kernel’ filters as mentioned above, meaning their outputs will not change when new data is added (except to extend the existing plot). However, because the output is in the middle of the input data, they do not extend up to the ends of the data available as can be seen above. In order to overcome this issue, some additional work will be required.

The basic principles of filters work over all timescales, thus we do not need to constrain ourselves to an Annual filter. We are, after all, trying to determine how this complex load that is the Earth reacts to the constantly varying surface input and surface reflection/absorption with very long timescale storage and release systems including phase change, mass transport and the like. If this were some giant mechanical structure slowly vibrating away we would run low pass filters with much longer time constants to see what was down in the sub-harmonics. So let’s do just that for Climate.

When I applied a standard time/energy low pass filter sweep against the data I noticed that there is a sweet spot around 12-20 years where the output changes very little. This looks like it may well be a good stop/pass band binary chop point. So I chose 15 years as the roll off point to see what happens. Remember this is a standard low pass/band-pass filter, similar to the one that splits telephone from broadband to connect to the Internet. Using this approach, all components with periods longer than 15 years are fully preserved in the output and all components with periods shorter than that are completely removed.

The following are HadCRUT4 Anomalies from 1850 to Present with a 15 year CTRM and a 75 year single running mean applied:

Fig 3-Jan HadCrut4 Monthly Anomalies with CTRM Annual, 15 and 75 years low pass filters

Fig 3: HadCRUT4 with additional greater than 15 year low pass. Greater than 75 year low pass filter included to remove the red trace discovered by the first pass.

Now, when reviewing the plot above some have claimed that this is a curve fitting or a ‘cycle mania’ exercise. However, the data hasn’t been fit to anything, I just applied a filter. Then out pops some wriggle in that plot which the data draws all on its own at ~60 years. It’s the data what done it – not me! If you see any ‘cycle’ in the graph, then that’s your perception. What you can’t do is say the wriggle is not there. That’s what the DATA says is there.

Note that the extra ‘greater than 75 years’ single running mean is included to remove the discovered ~60 year line, as one would normally do to get whatever residual is left. Only a single stage running mean can be used as the data available is too short for a full triple cascaded set. The UAH and RSS data series are too short to run a full greater than 15 year triple cascade pass on them, but it is possible to do a greater than 7.5 year which I’ll leave for a future exercise.

And that Full Kernel problem? We can add a Savitzky-Golay filter to the set, which is the Engineering equivalent of LOWESS in Statistics, so should not meet too much resistance from statisticians (want to bet?).
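As a rough sketch of the idea behind Savitzky-Golay smoothing at the end of a series: fit a low order polynomial by least squares to the final window and read the smoothed values off that polynomial. The version below is an assumption-laden illustration, not the parameters used for the figures: it fixes the polynomial at degree 2, requires an odd window, uses hypothetical function names and invented data.

```python
def quadratic_fit(ys):
    """Least-squares fit of a + b*x + c*x**2 over x = -h..h (odd-length window)."""
    n = len(ys)
    if n % 2 == 0 or n < 3:
        raise ValueError("window must be odd and at least 3 samples long")
    h = n // 2
    xs = range(-h, h + 1)
    # With symmetric x offsets the odd power sums vanish, so the normal
    # equations split into one equation for b and a 2x2 system for a and c.
    s0 = float(n)
    s2 = float(sum(x * x for x in xs))
    s4 = float(sum(x ** 4 for x in xs))
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    det = s0 * s4 - s2 * s2
    a = (t0 * s4 - s2 * t2) / det
    b = t1 / s2
    c = (s0 * t2 - s2 * t0) / det
    return a, b, c


def savgol_tail(values, window):
    """Smoothed estimates from the centre of the final window to the last sample."""
    a, b, c = quadratic_fit(values[-window:])
    h = window // 2
    return [a + b * x + c * x * x for x in range(0, h + 1)]


# Invented values, for illustration only.
history = [0.10, 0.12, 0.08, 0.15, 0.20, 0.18, 0.25, 0.22, 0.30]
before = savgol_tail(history, 7)
after = savgol_tail(history + [0.05], 7)   # one new, lower reading arrives
print(before)
print(after)
```

Comparing the two printed tails shows the estimates for the same dates shifting once a new reading is added, which is why this part of the line is a guide only and not a fixed result like the full kernel output.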

Fig 4-Jan HadCrut4 Monthly Anomalies with CTRM Annual, 15 and 75 years low pass filters and S-G

Fig 4: HadCRUT4 with additional S-G projections to observe near term future trends

We can verify that the parameters chosen are correct because the line closely follows the full kernel filter if that is used as a training/verification guide. The latest part of the line should not be considered an absolute guide to the future. Like LOWESS, S-G will ‘whip’ around on new data like a caterpillar searching for a new leaf. However, it tends to follow a similar trajectory, at least until it runs into a tree. While this is only a basic predictive tool, which estimates that the future will be like the recent past, the tool estimates that we are over a local peak and headed downwards…

And there we have it. A simple data treatment for the various temperature data sets, a high quality filter that removes the noise and helps us to see the bigger picture. Something to test the various claims made as to how the climate system works. Want to compare it against CO2? Go for it. Want to check SO2? Again, fine. Volcanoes? Be my guest. Here is a spreadsheet containing UAH and an Annual CTRM, and R code for a simple RSS graph. Please just don’t complain if the results from the data don’t meet your expectations. This is just data and summaries of the data. Occam’s Razor for a temperature series. Very simple, but it should be very revealing.

Now the question is how I can improve it. Do you see any flaws in the methodology or tool I’ve developed? Do you know how I can make it more accurate, more effective or more accessible? What other data sets do you think might be good candidates for a CTRM filter? Are there any particular combinations of data sets that you would like to see? You may have noted the 15 year CTRM combining UAH, RSS, HadCRUT and GISS at the head of this article. I have been developing various options at my new Climate Data Blog and based upon your input on this thread, I am planning a follow up article that will delve into some combinations of data sets, some of their similarities and some of their differences.

About the Author: Richard Linsley Hood holds an MSc in System Design and has been working as a ‘Practicing Logician’ (aka Computer Geek) to look at signals, images and the modelling of things in general inside computers for over 40 years now. This is his first venture into Climate Science and temperature analysis.

355 Comments
AlexS
March 16, 2014 4:42 pm

One more instance where people want to produce omelets without eggs.
This answer below shows everything that is wrong with the author’s attitude:
“They are two of the only series that stretch back to 1850 (which unfortunately the satellite data does not) and with Global coverage.”

Monckton of Brenchley
March 16, 2014 4:47 pm

It would be helpful to know the following:
1. What does this method tell us that a linear trend does not tell us?
2. How clear is it from the method that there is a ~60-year cyclicity in the data?
3. What tests have you performed to see whether this ~60-year cycle is synchronous with the ~60-year cycles of the great ocean oscillations? Or with the ~60-year planetary beat?
4. What are the main reasons why this method is better than other methods now in use?
5. Given that orbital characteristics have already removed any seasonal influence from the data, what need is there to remove it again?
6. Are you advocating this method as a replacement for the linear trends now used by the IPCC etc.?
7. Do you propose to get a paper on the applicability of this method to climate measurements into the learned journals, and to persuade the IPCC to adopt it?
Many thanks

RichardLH
March 16, 2014 4:49 pm

Bernie Hutchins says:
March 16, 2014 at 4:39 pm
“Thanks Richard – that’s what I thought.
But when you cascade Moving Averages (running means, rectangles, “boxcars”) you get all the nulls of the original rectangles in the cascade.”
But that is the reason for the 1.2067 inter-stage value. It places the nulls into the centre of the previous errors and thus the cascade flattens the whole thing to Gaussian, or very nearly so.
Vaughan was kind enough to agree with me when I pointed out that digitisation and range errors dominate over any residual errors with just the three stages.
S-G is all very well, but it does have a tendency to be quite ‘whippy’ at the ends. Full kernel filters never change when new data is added, they just extend. Thus certainty comes from their output, not uncertainty.
So I use CTRM for the majority of the data with S-G only for the ends, and verify one against the other for the parameters.
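The point about the 1.2067 inter-stage value and the nulls can be sketched numerically. The gain of an N point running mean for a sinusoid of frequency f (in cycles per sample) is sin(πfN) / (N sin(πf)), ignoring the half-sample phase shift of even-length windows, and the gain of a cascade is the product of the stage gains. The function names below are hypothetical.

```python
import math


def boxcar_gain(window, period):
    """Gain of a running mean of `window` samples for a sinusoid of `period` samples."""
    f = 1.0 / period
    return math.sin(math.pi * f * window) / (window * math.sin(math.pi * f))


def cascade_gain(windows, period):
    """Gain of several running means applied one after another."""
    gain = 1.0
    for w in windows:
        gain *= boxcar_gain(w, period)
    return gain


for period in (120, 60, 24, 12, 10, 9, 8, 6):
    single = boxcar_gain(12, period)
    triple = cascade_gain((12, 10, 8), period)
    print(f"{period:>4} months   single 12: {single:+.3f}   "
          f"cascade 12/10/8: {triple:+.3f}")
```

At a 9 month period the single 12 month mean has a gain of roughly -0.21, which means about a fifth of that signal leaks through upside down. The 10 and 8 month stages put their nulls on either side of that lobe, so the cascade's leakage there is far smaller.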

RichardLH
March 16, 2014 4:50 pm

AlexS says:
March 16, 2014 at 4:42 pm
“One more instance where people want to produce omelets without eggs.
This answer below shows everything that is wrong with the author’s attitude:
“They are two of the only series that stretch back to 1850 (which unfortunately the satellite data does not) and with Global coverage.””
Show me a Global data set that extends further back and I will use it.

RichardLH
March 16, 2014 5:10 pm

Monckton of Brenchley says:
March 16, 2014 at 4:47 pm
“It would be helpful to know the following:
1. What does this method tell us that a linear trend does not tell us?”
Linear trends are to my mind almost the most useless of statistics. They are not really valid outside of the range from which they are drawn. A continuous function, such as a filter, is a much better guide as to what is actually happening in the available data. CTRM filters are the most accurate, simple, continuous function you can use.
“2. How clear is it from the method that there is a ~60-year cyclicity in the data?”
If there IS a ~60 year cycle in the data, then it will be present and demonstrable, by measuring peak to peak or mid point to mid point. With only two such cycles available in the data it is right at the edge of any such decision though. Nearly in ‘toss a coin’ land but signal decoders work like this all the time.
” 3. What tests have you performed to see whether this ~60-year cycle is synchronous with the ~60-year cycles of the great ocean oscillations? Or with the ~60-year planetary beat?”
I have quite a few graphs of PDO, AMO/NAO on http://climatedatablog.wordpress.com/. I am in the early days of drawing them all together and that is hopefully going to be part of my next article.
“4. What are the main reasons why this method is better than other methods now in use?”
Mathematically purer than single running means and not much more difficult to create/use.
“5. Given that orbital characteristics have already removed any seasonal influence from the data, what need is there to remove it again?”
Normal/Anomaly only has 30 additions so leaves a surprisingly large error term in that sort of work. This removes all of that quite cleanly. It can be run on either normalised or raw data and will produce the same high quality output for both.
” 6. Are you advocating this method as a replacement for the linear trends now used by the IPCC etc.?”
Absolutely.
Linear Trend = Tangent to the Curve = Flat Earth.
” 7. Do you propose to get a paper on the applicability of this method to climate measurements into the learned journals, and to persuade the IPCC to adopt it?”
My academic days are well past now. It may well be worth while trying to get a paper published though.

davidmhoffer
March 16, 2014 5:11 pm

Since temperature doesn’t vary linearly with w/m2, this method is as useless as calculating an average temperature is in the first place. Convert all the temperature data to w/m2 and then look at all the trends with all the filters you want. Since averaging temperature data is absurd in the first place, all this accomplishes is a sophisticated analysis of an absurd data set.
The fascination which so many seem to have with trends of averaged temperature data is beyond me. It tells us precisely nothing about the energy balance of the earth, no matter how you filter it.

Bernie Hutchins
March 16, 2014 5:13 pm

RichardLH says:
March 16, 2014 at 4:49 pm
Thanks again Richard
True enough you may be approximating Gaussian. Convolving many rectangles tends to Gaussian of course. So? You are then using a low-pass region that is rapidly falling from the start instead of one (SG) that could be optimally flat. What is your reason for choosing one over the other?
And I don’t understand your “S-G is all very well, but it does have a tendency to be quite ‘whippy’ at the ends.” What is “whippy”? Are you saying there are end effects on the ends – that is no surprise. Nobody really knows how to handle ends – just to caution end interpretation. Then you say: “So I use CTRM for the majority of the data with S-G only for the ends…” So you want the whippy behavior?
(Someone around here at times says smoothed data is not THE data. )

Dr Norman Page
March 16, 2014 5:20 pm

RLH
Here’s the link. You will have to read the paper carefully to see which data was used and how. Check on Christiansen at
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/contributions_by_author/
To look at the longer wavelengths you really need FFT and wavelet analysis of the Holocene data. See Fig 4 at http://climatesense-norpag.blogspot.com
The 1000 year periodicity looks good at 10,000, 9,000, 8,000 and 7,000, then the resonance fades out and comes back in at 2,000, 1,000 and 0.

RichardLH
March 16, 2014 5:31 pm

davidmhoffer says:
March 16, 2014 at 5:11 pm
“Since temperature doesn’t vary linearly with w/m2, this method is as useless as calculating an average temperature is in the first place.”
Since matter integrates the incoming power over its volume/density/composition and alters its temperature to suit I disagree.

RichardLH
March 16, 2014 5:40 pm

Bernie Hutchins says:
March 16, 2014 at 5:13 pm
“True enough you may be approximating Gaussian. Convolving many rectangles tends Gaussian of course. So? You are then using low-pass region that is rapidly falling from the start instead of one (SG) that could be optimally flat. What is your reason for choosing one over the other?”
S-G alters over the whole of its length with new data. A CTRM will never alter, only extend. CTRM is very simple to do, S-G is a lot more complex.
Mathematically CTRM is much, much purer than a single running mean and only requires two extra stages, with reducing windows to get near Gaussian. With just three stages digitisation/rounding errors are larger than the error terms left when compared to a true Gaussian response.
I try to avoid placing too much hard decision making on filters other than full kernel ones and use S-G for outline guidance only.

RichardLH
March 16, 2014 5:40 pm

Dr Norman Page says:
March 16, 2014 at 5:20 pm
“RLH Here’s the link .”
Thanks.

davidmhoffer
March 16, 2014 5:47 pm

RichardLH;
Since matter integrates the incoming power over its volume/density/composition and alters its temperature to suit I disagree.
>>>>>>>>>>>>>.
If the temperature of the earth were uniform, you’d be right. But it isn’t. I suggest you read Robert G Brown’s various articles on the absurdity of calculating an average temperature in the first place. It can easily be demonstrated that the earth can be cooling while exhibiting a warming average temperature, and vice versa.

fhhaynie
March 16, 2014 5:53 pm

Another way to filter or factor out a strong annual signal is to do a running annual difference (jan 1978-jan 1977 etc). This has the advantage of preserving a signal-to-noise ratio and reveals the relative strengths of longer cycles. If you have hourly data, you can do day to day differences. The limitation is the time resolution of the data.
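A minimal sketch of the running annual difference described in this comment, using an invented monthly series (a linear trend plus a fixed 12 month cycle). The function name is hypothetical.

```python
import math


def annual_difference(values, lag=12):
    """x[t] - x[t - lag]; the first `lag` positions have no earlier partner."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Invented series: 0.01 per month trend plus a fixed seasonal cycle.
series = [0.01 * m + 5.0 * math.sin(2 * math.pi * m / 12) for m in range(120)]
diff = annual_difference(series)

print(len(series), len(diff))
print(min(diff), max(diff))   # the seasonal cycle cancels, leaving the yearly change
```

Note that the result is a rate of change per year and not a smoothed level, so it is not directly comparable with the output of a low pass filter.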

March 16, 2014 6:08 pm

I think your blog is one of the top conservatarian blogs out there and I put you in my links. Keep up the good work. http://the-paste.blogspot.com/ , http://thedailysmug.blogspot.com/

Bernie Hutchins
March 16, 2014 6:11 pm

RichardLH says:
March 16, 2014 at 5:40 pm
Thanks Richard –
(1) Somehow I think you and I may not be talking about the same thing. My idea of a FIR filter (which is standard) is that it is fixed length, and convolves with the signal. A moving average and a SG do this, for example. New data is what enters the window one step at a time. The filter is not “redesigned” with each step. Isn’t this what you do? Is your system LTI? Your spreadsheet doesn’t tell me much. Diagrams and equations are often useful in signal processing of course – not just output graphs.
(2) SG is NOT “a lot more complex.” You didn’t say why the SG flat passband was not something to admire. Neither did you tell me what “whippy” meant. I have never heard that term used.
(3) Since you are smoothing (averaging) you are destroying data. If you tell me you see a periodicity of 60 years, I say – yes – I see it in the data itself. If I can’t see it already, but it comes out of the filter, I would want you to show that it emerges in excess of what would be expected just by resonance. So the larger issue is perhaps that if you want to use smoothing and therefore DESTROY information, are we wrong to ask you to define in detail what your smoothing involves, and to demonstrate why it is better than something better understood?

bones
March 16, 2014 6:12 pm

Lance Wallace says:
March 16, 2014 at 1:42 pm
The CO2 curve (seasonally detrended monthly) can be fit remarkably closely by a quadratic or exponential curve, each with <1% error for 650 or so consecutive months:
http://wattsupwiththat.com/2012/06/02/what-can-we-learn-from-the-mauna-loa-co2-curve-2/
The exponential has a time constant (e-folding time) on the order of 60 years (i.e., doubling time for the anthropogenic additions from the preindustrial level of 260 ppm of about 60*0.69 = 42 years).
——————————————————
Last time I looked, the atmospheric CO2 concentration was increasing at about 5% per decade, which gives it a doubling time of about 140 years.
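The two doubling times quoted in this exchange can be checked with a few lines of arithmetic. The function names are hypothetical. Note that, as worded in the comments, the first figure refers to the anthropogenic addition above a preindustrial baseline and the second to the total atmospheric concentration, so they are not measuring the same quantity.

```python
import math


def doubling_time_from_efold(tau_years):
    """Doubling time of exp(t / tau): tau * ln 2."""
    return tau_years * math.log(2)


def doubling_time_from_growth(rate_per_period, period_years):
    """Doubling time for compound growth of `rate_per_period` every `period_years`."""
    return period_years * math.log(2) / math.log(1 + rate_per_period)


print(round(doubling_time_from_efold(60), 1))           # e-folding time of 60 years
print(round(doubling_time_from_growth(0.05, 10), 1))    # 5% per decade
```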

RichardLH
March 16, 2014 6:38 pm

Bernie Hutchins says:
March 16, 2014 at 6:11 pm
“Thanks Richard –
(1) Somehow I think you and I may not be talking about the same thing. My idea of a FIR filter (which is standard) is that it is fixed length, and convolves with the signal. A moving average and a SG do this, for example. New data is what enters the window one step at a time.”
A CTRM is just an extension of the standard moving average. Nothing more. The same rules apply. New data enters, old data leaves. The window is a stacked set which has a nearly Gaussian distribution to its weighting values if you work them all out.
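To see the 'stacked set' of weights, the three running means can be convolved into a single window. This is a sketch with hypothetical helper names, using the 12, 10 and 8 month stage lengths from the article.

```python
def convolve(a, b):
    """Full discrete convolution of two weight lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def boxcar(n):
    """Weights of an n point running mean."""
    return [1.0 / n] * n


kernel = boxcar(12)
for n in (10, 8):
    kernel = convolve(kernel, boxcar(n))

print(len(kernel), sum(kernel))
for w in kernel:
    print(f"{w:.4f} " + "#" * round(w * 600))   # crude text plot of the weights
```

The combined window spans 28 months, its weights sum to one, and the text plot shows the bell shape that a single flat running mean lacks.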
“(2) SG is NOT “a lot more complex.” You didn’t say why the SG flat passband was not something to admire. Neither did you tell me what “whippy” meant. I have never heard that term used.”
If you compare one set with another as time evolves then you will see how the S-G has a tendency for its outer end to move around in an almost animal like manner with new data. That Wiki link has a nice animation which shows the underlying function as it passes along a data set, which displays it quite well. Too many animations of images in the work I have done elsewhere I suppose, it just looks like a caterpillar to me 🙂 Sorry.
http://en.wikipedia.org/wiki/File:Lissage_sg3_anim.gif
“(3) Since you are smoothing (averaging) you are destroying data.”
Wrong. You are assigning stuff to a pass band or a stop band. In fact if you take the output and subtract it from the original data (as in 1-x) you end up with the high pass filter version. Data is never destroyed. High Pass output plus Low Pass output always equals the original data set.
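A small sketch of the low pass / high pass split described here. For brevity it uses a single 5 point running mean as a stand-in for the full cascade, and the data values are invented for illustration.

```python
def running_mean(values, window):
    """Centred running mean; None where a full (odd-length) window is unavailable."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


# Invented monthly values, for illustration only.
data = [0.3, -0.1, 0.4, 0.9, 0.2, -0.5, 0.1, 0.6, 1.0, 0.7, 0.2]

low = running_mean(data, 5)                                    # low pass output
high = [None if l is None else d - l for d, l in zip(data, low)]    # residual
rebuilt = [None if l is None else l + h for l, h in zip(low, high)]

for d, l, h, r in zip(data, low, high, rebuilt):
    print(d, l, h, r)
```

The split only exists where the low pass output exists, so with a full kernel filter the ends of the series have neither a low pass nor a high pass value.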

RichardLH
March 16, 2014 6:40 pm

davidmhoffer says:
March 16, 2014 at 5:47 pm
Pulsed input is also integrated by matter, so I will still differ.

RichardLH
March 16, 2014 6:43 pm

fhhaynie says:
March 16, 2014 at 5:53 pm
“Another way to filter or factor out a strong annual signal is to do a running annual difference (jan 1978-jan 1977 etc). This has the advantage of preserving a signal-to-noise ratio and reveals the relative strengths of longer cycles. If you have hourly data, you can do day to day differences. The limitation is the time resolution of the data.”
A single stage filter, either running average or difference, has horrible mathematical errors in its output.
One look at the frequency response will tell you that.
http://climatedatablog.files.wordpress.com/2014/02/fig-1-gaussian-simple-mean-frequency-plots.png

A Crooks of Adelaide
March 16, 2014 6:45 pm

RichardLH says:
March 16, 2014 at 4:11 pm
A Crooks of Adelaide says:
March 16, 2014 at 4:02 pm
The point of this is that is NOT a curve fit of any form. It is a simple low pass treatment of the data. Any ‘cycle’ you see, you need to explain. This just shows what is there.
I’m more interested in what’s left after you take the low pass out. It’s the short term cycles that determine if the monthly “Arghhh its getting hotter!” or “Arrgh its getting colder!” is significant or not.
And I don’t go along with this “if you can’t explain the cycle, it doesn’t exist” story. They were predicting eclipses in the Bronze Age and very useful it proved too. You have to find the cycles first – then you start to think what causes them

RichardLH
March 16, 2014 6:50 pm

Dr Norman Page says:
March 16, 2014 at 5:20 pm
” To look at the longer wavelengths you really need FFT and wavelet analysis of the Holocene data ”
The problem with FFTs and Wavelets is twofold.
1) Large amounts of noise make longer wavelengths very difficult to pick out. You end up with very broad peaks which all COULD be signal.
2) They are both very bad if the signal is made up of ‘half wave’ mixtures. Such as a mixture of 2, 3 and 4 years half waves in some random combination. Or, say, 55, 65, 75 year half waves mixed up to make a 65 year average. Nature, when not operating as a tuned string which is most of the time, has a habit of falling into just such combinations.
The noise is the problem really. In both value and phase. There are no nice clean signals which we can work with.

March 16, 2014 6:52 pm

Off the subject:
Most of the time I just read here, then go elsewhere on the net to seek out those who need to join in and link them back here. The first word of this post, “Crowdsourcing”, struck me.
What we are missing is a “crowdsourcing” person from the warming side to draw more voters to this site and others like it, and/or to pull people out of their daily lives and into the struggle.
Back in, say, 2006 to 2008, old Al Gore was all about and never ever shut his yap.
It seems we need to find a way to get someone as polarizing as Al Gore, together with others like him, back into the fray, going all bombastic and over the top with wild claims. Think up a way to gig his ego, or some other oversized ego among them, get them in the press, and then use that to pull more people of reason onto the battleground.
Not that these types will ever go away, but we need to get this lie based fraud redistribution of wealth pulled way down to earth.

RichardLH
March 16, 2014 6:53 pm

A Crooks of Adelaide says:
March 16, 2014 at 6:45 pm
“You have to find the cycles first – then you start to think what causes them”
Well I suspect that we have the Daily, Monthly and Yearly cycles pinned down quite well now. It’s the natural cycles longer than 15 years that I am interested in. Nothing much except something big at ~60 years between 15 and 75 years as far as I can tell.

Bart
March 16, 2014 7:00 pm

RichardLH says:
March 16, 2014 at 1:58 pm
“It is useful to compare how the CO2 curve matches to the residual after you have removed the ~60 ‘cycle’ and there it does match quite well but with one big problem, you need to find something else before 1850 to make it all work out right.”
There’s a bigger problem. It doesn’t really match well at all. The match is between the integral of temperature and CO2.
And, that’s GISS. The match with the satellite record is even better.
CO2 is not driving temperatures to any level of significance at all. It is, instead, itself accumulating due to a natural process which is modulated by temperatures. Once you remove the trend and the ~60 year periodicity, which have been around since at least 1880, well before CO2 is believed to have increased significantly, there is very little left of temperature to be driven.

March 16, 2014 7:03 pm

Richard. They all use different stations