Crowdsourcing A Full Kernel Cascaded Triple Running Mean Low Pass Filter, No Seriously…

HadCRUT4 Monthly Anomalies with CTRM Annual, 15 and 75 year low pass filters

Image Credit: Climate Data Blog

By Richard Linsley Hood  – Edited by Just The Facts

The goal of this crowdsourcing thread is to present a 12 month/365 day Cascaded Triple Running Mean (CTRM) filter, inform readers of its basis and value, and gather your input on how I can improve and develop it. A 12 month/365 day CTRM filter completely removes the annual ‘cycle’, as the CTRM is a near Gaussian low pass filter. In fact it is slightly better than Gaussian in that it completely removes the 12 month ‘cycle’, whereas a true Gaussian leaves a small residual of it in the data. This new tool is an attempt to produce a more accurate treatment of climate data and to see what new perspectives, if any, it uncovers. It builds on the good work by Greg Goodman, with Vaughan Pratt’s valuable input, on this thread on Climate Etc.

Before we get too far into this, let me explain some of the terminology that will be used in this article:

—————-

Filter:

“In signal processing, a filter is a device or process that removes from a signal some unwanted component or feature. Filtering is a class of signal processing, the defining feature of filters being the complete or partial suppression of some aspect of the signal. Most often, this means removing some frequencies and not others in order to suppress interfering signals and reduce background noise.” Wikipedia.

Gaussian Filter:

A Gaussian Filter is probably the ideal filter in time domain terms. That is, if you think of the graphs you are looking at as traces on an oscilloscope, a Gaussian filter is the one that adds the least distortion to the signal.

Full Kernel Filter:

Indicates that the output of the filter will not change when new data is added (except to extend the existing plot). It does not extend up to the ends of the data available, because the output is in the centre of the input range. This is its biggest limitation.

Low Pass Filter:

A low pass filter is one which removes the high frequency components in a signal. One of its most common usages is in anti-aliasing filters for conditioning signals prior to analog-to-digital conversion. Daily, Monthly and Annual averages are also low pass filters.

Cascaded:

A cascade is where you feed the output of the first stage into the input of the next stage, and so on. In a spreadsheet implementation of a CTRM you can produce a single average column in the normal way and then use that column as the input to create the next output column, and so on. The value of the inter-stage multiplier/divider is very important. It should be set to 1.2067, the precise value that makes the CTRM into a near Gaussian filter. It gives stage lengths of 12, 10 and 8 months for the three stages of an Annual filter, for example.
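As a quick sanity check on those stage lengths (just the arithmetic, nothing more):

```r
# Stage lengths for an Annual (12 month) CTRM, dividing by 1.2067 each time
stage1 <- 12
stage2 <- stage1 / 1.2067   # ~9.94, rounds to 10 months
stage3 <- stage2 / 1.2067   # ~8.24, rounds to 8 months
```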

Triple Running Mean:

The simplest method to remove high frequencies or smooth data is to use moving averages, also referred to as running means. A running mean filter is the standard ‘average’ that is most commonly used in Climate work. On its own it is a very poor filter and produces a lot of arithmetic artefacts. Cascading three of them ‘back to back’, however, gives a much higher quality filter that is still very easy to implement. It just needs two more stages than are normally used.
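For anyone who wants to try this outside a spreadsheet, here is a minimal sketch of such a cascade in R. It is my own illustration using base R’s stats::filter, not the spreadsheet or R code linked later in this post, and it assumes the input is a plain numeric vector of monthly values:

```r
# Cascaded Triple Running Mean (CTRM): three centred running means in a
# cascade, each stage 1.2067 times shorter than the last (12, 10 and 8
# months for an Annual filter). Base R only; x is a plain numeric vector.
ctrm <- function(x, window) {
  stages <- round(window / 1.2067^(0:2))
  y <- x
  for (n in stages) {
    y <- stats::filter(y, rep(1 / n, n), sides = 2)  # one centred running mean stage
  }
  as.numeric(y)  # NAs at both ends, as expected for a full kernel filter
}
```

Calling ctrm(x, 12) on monthly anomalies gives the kind of Annual filter used in the figures below.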

—————

With all of this in mind, a CTRM filter, used either at 365 days (if we have that resolution of data available) or 12 months in length with the most common data sets, will completely remove the Annual cycle while retaining the underlying monthly sampling frequency in the output. In fact it is even better than that, as it does not matter whether the data used has already been normalised or not. A CTRM filter will produce the same output on either raw or normalised data, apart from a small offset corresponding to whatever ‘Normal’ period the data provider chose. There are no added distortions of any sort from the filter.

Let’s take a look at what this generates in practice. The following are UAH Anomalies from 1979 to Present with an Annual CTRM applied:

Fig 1: UAH Monthly Global Anomalies with an Annual CTRM low pass filter

Note that I have just plotted the data points. The CTRM filter has removed the ‘visual noise’ that month to month variability causes. This is very similar to the 12 or 13 month single running mean that is often used; however, it is more accurate because the mathematical errors produced by those simple running means are removed. Additionally, the higher frequencies are completely removed while all the lower frequencies are left completely intact.

The following are HadCRUT4 Anomalies from 1850 to Present with an Annual CTRM applied:

Fig 2: HadCRUT4 Monthly Anomalies with an Annual CTRM low pass filter

Note again that all the higher frequencies have been removed and the lower frequencies are all displayed without distortions or noise.

There is a small issue with these CTRM filters in that CTRMs are ‘full kernel’ filters as mentioned above, meaning their outputs will not change when new data is added (except to extend the existing plot). However, because the output is in the middle of the input data, they do not extend up to the ends of the data available as can be seen above. In order to overcome this issue, some additional work will be required.

The basic principles of filters work over all timescales, thus we do not need to constrain ourselves to an Annual filter. We are, after all, trying to determine how this complex load that is the Earth reacts to the constantly varying surface input and surface reflection/absorption with very long timescale storage and release systems including phase change, mass transport and the like. If this were some giant mechanical structure slowly vibrating away we would run low pass filters with much longer time constants to see what was down in the sub-harmonics. So let’s do just that for Climate.

When I applied a standard time/energy low pass filter sweep against the data I noticed that there is a sweet spot around 12-20 years where the output changes very little. This looks like it may well be a good stop/pass band binary chop point, so I chose 15 years as the roll off point to see what happens. Remember this is a standard low pass/band split filter, similar to the one that separates telephone from broadband on an Internet connection. Using this approach, all frequencies with periods above 15 years are fully preserved in the output and all frequencies below that point are completely removed.
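In terms of the ctrm() sketch above, that is nothing more than the same cascade with a 180 month first stage (again a sketch only, assuming hadcrut is a plain numeric vector of monthly anomalies):

```r
# Same cascade, longer windows
annual  <- ctrm(hadcrut, 12)    # removes the annual cycle
fifteen <- ctrm(hadcrut, 180)   # keeps only periods longer than ~15 years
```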

The following are HadCRUT4 Anomalies from 1850 to Present with a 15 year CTRM and a 75 year single running mean applied:

Fig 3: HadCRUT4 Monthly Anomalies with Annual, 15 year and 75 year low pass filters. The greater than 75 year low pass filter is included to remove the red trace discovered by the first pass.

Now, when reviewing the plot above some have claimed that this is a curve fitting or ‘cycle mania’ exercise. However, the data hasn’t been fitted to anything; I just applied a filter. Then out pops a wriggle, at around ~60 years, which the data draws all on its own. It’s the data what done it – not me! If you see any ‘cycle’ in the graph, then that’s your perception. What you can’t do is say the wriggle is not there. That’s what the DATA says is there.

Note that the extra ‘greater than 75 years’ single running mean is included to remove the discovered ~60 year line, as one would normally do to get at whatever residual is left. Only a single stage running mean can be used here, as the data available is too short for a full triple cascaded set. The UAH and RSS data series are too short to run a full greater than 15 year triple cascade pass on them, but it is possible to do a greater than 7.5 year pass, which I’ll leave for a future exercise.
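As a sketch of that extra stage (again my own illustration, not the exact code behind the figures), the ‘greater than 75 years’ line is a single centred running mean of 900 months over the monthly anomalies:

```r
# Single stage only: the series is too short for a full triple cascade here
seventyfive <- as.numeric(stats::filter(hadcrut, rep(1 / 900, 900), sides = 2))
```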

And that Full Kernel problem? We can add a Savitzky-Golay filter to the set, which is the Engineering equivalent of LOWESS in Statistics, so it should not meet too much resistance from statisticians (want to bet?).
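One way to add that in R is the sgolayfilt() function from the ‘signal’ package. The polynomial order and window length below are illustrative choices of mine, not necessarily the parameters behind the figure:

```r
# Savitzky-Golay smoother: runs right to the ends of the data, unlike the
# full kernel CTRM. p and n here are illustrative only (n must be odd).
library(signal)
sg <- sgolayfilt(hadcrut, p = 2, n = 181)  # quadratic fit over a ~15 year window
```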

Fig 4: HadCRUT4 Monthly Anomalies with Annual, 15 year and 75 year low pass filters, plus S-G projections to observe near term future trends

We can verify that the parameters chosen are correct because the line closely follows the full kernel filter when that is used as a training/verification guide. The latest part of the line should not be considered an absolute guide to the future. Like LOWESS, S-G will ‘whip’ around on new data like a caterpillar searching for a new leaf. However, it tends to follow a similar trajectory, at least until it runs into a tree. While this is only a basic predictive tool, which assumes that the near future will be like the recent past, it estimates that we are over a local peak and headed downwards…

And there we have it. A simple data treatment for the various temperature data sets: a high quality filter that removes the noise and helps us to see the bigger picture. Something to test the various claims made as to how the climate system works. Want to compare it against CO2? Go for it. Want to check SO2? Again, fine. Volcanoes? Be my guest. Here is a spreadsheet containing UAH and an Annual CTRM, and R code for a simple RSS graph. Please just don’t complain if the results from the data don’t meet your expectations. This is just data and summaries of the data. Occam’s Razor for a temperature series. Very simple, but it should be very revealing.

Now the question is how I can improve it. Do you see any flaws in the methodology or tool I’ve developed? Do you know how I can make it more accurate, more effective or more accessible? What other data sets do you think might be good candidates for a CTRM filter? Are there any particular combinations of data sets that you would like to see? You may have noted the 15 year CTRM combining UAH, RSS, HadCRUT and GISS at the head of this article. I have been developing various options at my new Climate Data Blog and based upon your input on this thread, I am planning a follow up article that will delve into some combinations of data sets, some of their similarities and some of their differences.

About the Author: Richard Linsley Hood holds an MSc in System Design and has been working as a ‘Practicing Logician’ (aka Computer Geek) to look at signals, images and the modelling of things in general inside computers for over 40 years now. This is his first venture into Climate Science and temperature analysis.

Kirk c
March 16, 2014 1:30 pm

That is so cool!

Lance Wallace
March 16, 2014 1:42 pm

The CO2 curve (seasonally detrended monthly) can be fit remarkably closely by a quadratic or exponential curve, each with <1% error for 650 or so consecutive months:
http://wattsupwiththat.com/2012/06/02/what-can-we-learn-from-the-mauna-loa-co2-curve-2/
The exponential has a time constant (e-folding time) on the order of 60 years (i.e., doubling time for the anthropogenic additions from the preindustrial level of 260 ppm of about 60*0.69 = 42 years).
Can the Full Kernel Triple thingamajig provide any further insight into the characteristics of the curve? For example, due to various efforts by governments to reduce CO2 emissions, can we see any effect on the curve to date? I tried fitting the exponential curve only up to the year 2005, then 2010, finally to September 2013) and there was a very small movement toward lengthened e-folding time (60.26, 60.94, 61.59 years). But also, theoretically, there is some relation between atmospheric CO2 and CO2 emissions, but one has to assume something about the lag time between CO2 emissions and CO2 concentrations. Can the Triple whatsis somehow compare the two curves and derive either a lag time or an estimate of how much of the CO2 emissions makes it into the observed atmospheric concentrations? Simply assuming that all the CO2 emissions make it into the atmosphere in the following year gives an R^2 on the order of 40% (IIRC).

geran
March 16, 2014 1:45 pm

Since this is your first foray into “climate science”, Richard, let me help with the basics you will need to know.
Climate modeling is about as close to science as are video games. (Except in the better video games, there are sound effects.) Climate modeling works like this: Enter input, run program, no catastrophic results, enter new input and re-start. Continue until you get catastrophic results. Then, publish results and request more funding.
If your program never achieves a catastrophic result, adjust input data until proper results are achieved. (Ever heard of a hockey stick?)
If someone questions your knowledge of science, produce numerous graphics, in color, of your results.
It’s a veritable gold mine!

Arno Arrak
March 16, 2014 1:45 pm

Why bother with HadCrut and GISS? They are worthless. Use UAH and RSS, but get rid of that annual version and use monthly versions. And draw the trend with a semi-transparent magic marker. See Figure 15 in my book “What Warming?”

RichardLH
March 16, 2014 1:58 pm

Lance Wallace says:
March 16, 2014 at 1:42 pm
“Can the Triple whatsis somehow compare the two curves and derive either a lag time or an estimate of how much of the CO2 emissions makes it into the observed atmospheric concentrations?”
Not really. Low pass filters are only going to show periodicity in the data and the CO2 figure is a continuously(?) rising curve.
It is useful to compare how the CO2 curve matches the residual after you have removed the ~60 year ‘cycle’, and there it does match quite well, but with one big problem: you need to find something else before 1850 to make it all work out right.
Volcanoes are the current favourite, but I think that getting just the right number and size of volcanoes needed is stretching coincidence a little too far. Still possible though.

Eliza
March 16, 2014 2:00 pm

Arno 100% correct, GISS and HADCRUT are trash data. I don’t understand why any posting using that data can be taken seriously (“adjustments”, “UHI” etc…). This is just feeding the warmist trolls.

RichardLH
March 16, 2014 2:03 pm

Arno Arrak says:
March 16, 2014 at 1:45 pm
“Why bother with HadCrut and GISS? They are worthless. Use UAH and RSS, but get rid of that annual version and use monthly versions.”
They are two of the few series that stretch back to 1850 (which unfortunately the satellite data does not) and that have global coverage.
They do match together quite well though with some odd differences.
What good would a monthly view do when trying to assess climate? Assuming you accept that climate is a long term thing, i.e. longer than 15 years. Monthly is down in the Weather range.

RichardLH
March 16, 2014 2:06 pm

Eliza says:
March 16, 2014 at 2:00 pm
“Arno 100% correct GISS, HADCRUT are trash data”
You might like to consider the fact that any long term adjustments will show up in the residual, greater than 75 years curve, and, if they have indeed occurred, would only serve to flatten that part of the output.
The ~60 year wriggle will still be there and needs explaining.

Steven Mosher
March 16, 2014 2:21 pm

Giss and hadcrut are not the only series.
Ncdc. Berkeley. Cowtan and way.
Im using ctrm in some work on gcrs and cloud cover. Thanks for the code richard.
Raise your objections now to the method…folks.

Dr Norman Page
March 16, 2014 2:30 pm

Very nice elucidation of the 60 year cycle in the temperature data. Now do the same for the past 2000 years using a suitable filter. A review of candidate proxy data reconstructions and the historical record of climate during the last 2000 years suggests that at this time the most useful reconstruction for identifying temperature trends in the latest important millennial cycle is that of Christiansen and Ljungqvist 2012 (Fig 5)
http://www.clim-past.net/8/765/2012/cp-8-765-2012.pdf
For a forecast of the coming cooling based on the 60 and 1000 year quasi periodicities in the temperatures and the neutron count as a proxy for solar “activity ”
see http://climatesense-norpag.blogspot.com
this also has the Christiansen plot see Fig3. and the 1000 year cycle from ice cores Fig4
The biggest uncertainty in these forecasts is the uncertainty in the timing of the 1000 year cycle peak.
In the figure in Richards post it looks like it is about 2009. From the SST data it looks like about 2003. See Fig 7 in the link and NOAA data at
ftp://ftp.ncdc.noaa.gov/pub/data/anomalies/annual.ocean.90S.90N.df_1901-2000mean.dat
It is time to abandon forecasting from models and for discussion and forecasting purposes use the pattern recognition method seen at the link
http://climatesense-norpag.blogspot.com .

J Martin
March 16, 2014 2:31 pm

What is the point of all this, what is the end result, projection / prediction ?

Arno Arrak
March 16, 2014 2:31 pm

Lance Wallace says on March 16, 2014 at 1:42 pm:
“The CO2 curve (seasonally detrended monthly) can be fit remarkably closely by a quadratic or exponential curve, each with <1% error for 650 or so consecutive months…"
So what. CO2 is not the cause of any global warming and is not worth that expenditure of useless arithmetic. The only thing important about it is that it is completely smooth (except for its seasonal wiggle) during the last two centuries. It follows from this that it is physically impossible to start any greenhouse warming during these two centuries. We already know that there has been no warming for the last 17 years despite the fact that there is more carbon dioxide in the air now than ever before. That makes the twenty-first century greenhouse free. Since this constant addition of CO2 is not causing any warming it follows that the theory of enhanced greenhouse warming is defective. It does not work and should be discarded. The only theory that correctly describes this behavior is the Miskolczi theory that ignorant global warming activists hate. But the twentieth century did have warming. It came in two spurts. The first one started in 1910, raised global temperature by half a degree Celsius, and stopped in 1940. The second one started in 1999, raised global temperature by a third of a degree in only three years, and then stopped. Here is where the smoothness of the CO2 curve comes in. Radiation laws of physics require that in order to start an enhanced greenhouse warming you must simultaneously add carbon dioxide to the atmosphere. That is because the absorbency of CO2 for infrared radiation is a fixed property of its molecules that cannot be changed. To get more warming, get more molecules. This did not happen in 1910 or in 1999 as shown by the Keeling curve and its extension. Hence, all warming within the twentieth century was natural warming, not enhanced greenhouse warming. Cobsequently we now have the twentieth and the twenty-first centuries both entirely greenhouse free. Hence that anthropogenic global warming that is the life blood of IPCC simply does not exist. To put it in other words: AGW has proven to be nothing more than a pseudo-scientific fantasy, result of a false belief that Hansen discovered greenhouse warming in 1988.

Hlaford
March 16, 2014 2:37 pm

The N-path filters make the best notch filters in sampled domains. They fell under the radar for the other kinds of filtering, but here you’d have 12 paths and a high pass at each of them for the monthly data.
I’ve seen a paper where a guy explains how he removed the noise made by vuvuzela trumpets using N-path filtering.

RichardLH
March 16, 2014 2:39 pm

Steven Mosher says:
March 16, 2014 at 2:21 pm
“Giss and hadcrut are not the only series. Ncdc. Berkeley. Cowtan and way.
Im using ctrm in some work on gcrs and cloud cover. Thanks for the code richard.”
No problem – it’s what Greg and Vaughan thrashed out translated into R (for the CTRM).
I know there are other series but I wanted to start with the most commonly used ones first. I have treatments for some of the others as well. At present I am working on a set of Global, Land, Ocean triples as that has thrown up some interesting observations.
It is a great pity that the temperature series are not available in a form that is easy to read into R, but sometimes require a lot of code just to turn them into data.frames. That makes it a lot more difficult to post the code to the ‘net in an easy to understand form.
The ~60 year ‘cycle’ shows up strongly in all of them so far. 🙂

jai mitchell
March 16, 2014 2:39 pm

Since this constant addition of CO2 is not causing any warming it follows that the theory of enhanced greenhouse warming is defective. It does not work and should be discarded.
. . .hilarious. . .
http://www.skepticalscience.com//pics/oceanheat-NODC-endof2013.jpg

RichardLH
March 16, 2014 2:41 pm

J Martin says:
March 16, 2014 at 2:31 pm
“What is the point of all this, what is the end result, projection / prediction ?”
It allows you to look at how the system responded to the inputs over the last 150+ years with fewer mathematical errors hiding the details. You can use the S-G trend as a loose guide to the immediate future trend if you like.

Gary
March 16, 2014 2:42 pm

How much useful information is being lost by filtering out the high frequency “noise” ? In other words, how do you judge what the most effective bandwidth of the filter is?

RichardLH
March 16, 2014 2:44 pm

Hlaford says:
March 16, 2014 at 2:37 pm
“The N-path filters make the best notch filters in sampled domains.”
There are many good notch filters out there, but that is not what this is. In fact it is the exact opposite. A broadband stop/pass filter that will allow ALL of the frequencies above 15 years in length to be present in the output. No need to choose what to look for, anything that is there, will be there.

RichardLH
March 16, 2014 2:49 pm

Gary says:
March 16, 2014 at 2:42 pm
“How much useful information is being lost by filtering out the high frequency “noise” ? In other words, how do you judge what the most effective bandwidth of the filter is?”
Well what you lose is, Daily, Weather, Monthly, Yearly and Decadal. What you keep is multi-decadal and longer. Which do you consider to be relevant to Climate?
If you are interested in the other stuff you can run a high pass version instead and look at that only if you wish. Just subtract this output signal from the input signal and away you go.
No data is truly lost as such, it is just in the other pass band. The high and low pass added together will, by definition, always be the input signal.

David L. Hagen
March 16, 2014 3:01 pm

Thanks Richard for an insightful way of exposing the 60 year cycle.
1) Has an algebraic factor been found behind the 1.2067 factor or is this still empirical per von Pratt?
2) Does using the more accurate year length of 365.26 days make any difference?
3) Suggest showing the derivative of the CTRM curve.
That would show more clearly that the rate of warming is declining. If that derivative goes through zero, that would give evidence that we are now entering the next “cooling” period vs just flattening in warming.
PS suggest amending “with a 15 CTRM and” to “with a 15 CTRM year and” to read more clearly.
Interesting interaction you had with TallBloke.

Mike McMillan
March 16, 2014 3:10 pm

Well, it certainly gets rid of the 1998 peak problem. GISS would be appreciative.

Bernie Hutchins
March 16, 2014 3:14 pm

Is it possible to humor some of us old-timer digital filter designers by specifically saying what the (Finite) Impulse Response of the CTRM is; in basic terms such as what simpler elements are being cascaded (convolved for the IR response, multiplied for the frequency response, etc.)? This is the key to understanding filtering – old or new. Thanks.

RichardLH
March 16, 2014 3:15 pm

Dr Norman Page says:
March 16, 2014 at 2:30 pm
“Very nice elucidation of the 60 year cycle in the temperature data, Now do the same for the past 2000 years using a suitable filter.”
Bit of a challenge finding a thermometer series going back that far. 😉
Proxy series all come with their own problems. They are rarely in a resolution that will allow the ~60 year signal to be seen.
I do have one which is the Shen PDO reconstruction from rainfall which does it quite well back to the 1400s.
http://climatedatablog.files.wordpress.com/2014/02/pdo-reconstruction-1470-1998-shen-2006-with-gaussian-low-pass-30-and-75-year-filters-and-hadcrut-overlay.png
Shen, C., W.-C. Wang, W. Gong, and Z. Hao. 2006. A Pacific Decadal Oscillation record since 1470 AD reconstructed from proxy data of summer rainfall over eastern China. Geophysical Research Letters, vol. 33, L03702, February 2006.
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/historical/pacific/pdo-shen2006.txt
Looks like the ~60 year ‘cycle’ is present a long way back then. As to any longer cycles, there the problem is resolution and data length. In most cases the ‘noise’ is so great you can conclude almost anything and not be proved wrong by the data unfortunately.

Graeme W
March 16, 2014 3:16 pm

I have a couple of questions regarding the initial filter.
1. A 365 day filter has an ongoing problem due to leap-years. There are, roughly, 25 leap days in a century, which pushes a 365 day filter almost a month out of alignment. How is this catered for? It’s not a problem with a 12 month filter, but then you hit the problem that each month is not equal in length.
2. You’re doing filtering on anomalies to remove the seasonal component. Aren’t the anomalies supposed to do that themselves, since they’re anomalies from the average for each part of the season? That is, the anomalies themselves are trying to remove the seasonal component, and the filter is also trying to remove the seasonal component. How do we ensure that any result we get isn’t an artifact of these two processes interacting?
Please forgive me if these questions show my ignorance, but they’ve been a concern of mine for awhile.

Steve Taylor
March 16, 2014 3:17 pm

Richard, are you related to the late, great John Linsley Hood, by any chance ?
