Image Credit: Climate Data Blog
By Richard Linsley Hood – Edited by Just The Facts
The goal of this crowdsourcing thread is to present a 12 month/365 day Cascaded Triple Running Mean (CTRM) filter, inform readers of its basis and value, and gather your input on how I can improve and develop it. A 12 month/365 day CTRM filter completely removes the annual ‘cycle’, as the CTRM is a near Gaussian low pass filter. In fact it is slightly better than Gaussian in that it completely removes the 12 month ‘cycle’ whereas true Gaussian leaves a small residual of that still in the data. This new tool is an attempt to produce a more accurate treatment of climate data and see what new perspectives, if any, it uncovers. This tool builds on the good work by Greg Goodman, with Vaughan Pratt’s valuable input, on this thread on Climate Etc.
Before we get too far into this, let me explain some of the terminology that will be used in this article:
—————-
Filter:
“In signal processing, a filter is a device or process that removes from a signal some unwanted component or feature. Filtering is a class of signal processing, the defining feature of filters being the complete or partial suppression of some aspect of the signal. Most often, this means removing some frequencies and not others in order to suppress interfering signals and reduce background noise.” Wikipedia.
Gaussian Filter:
A Gaussian Filter is probably the ideal filter in time domain terms. That is, if you think of the graphs you are looking at as traces displayed on an oscilloscope, then a Gaussian filter is the one that adds the least distortion to the signal.
Full Kernel Filter:
Indicates that the output of the filter will not change when new data is added (except to extend the existing plot). It does not extend up to the ends of the data available, because the output is in the centre of the input range. This is its biggest limitation.
Low Pass Filter:
A low pass filter is one which removes the high frequency components in a signal. One of its most common usages is in anti-aliasing filters for conditioning signals prior to analog-to-digital conversion. Daily, Monthly and Annual averages are low pass filters also.
Cascaded:
A cascade is where you feed the output of the first stage into the input of the next stage and so on. In a spreadsheet implementation of a CTRM you can produce a single average column in the normal way and then use that column as an input to create the next output column and so on. The value of the inter-stage multiplier/divider is very important. It should be set to 1.2067. This is the precise value that makes the CTRM into a near Gaussian filter. It gives values of 12, 10 and 8 months for the three stages in an Annual filter for example.
Triple Running Mean:
The simplest method to remove high frequencies or smooth data is to use moving averages, also referred to as running means. A running mean filter is the standard ‘average’ that is most commonly used in Climate work. On its own it is a very bad form of filter and produces a lot of arithmetic artefacts. Adding three of those ‘back to back’ in a cascade, however, allows for a much higher quality filter that is also very easy to implement. It just needs two more stages than are normally used.
—————
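To make the cascade concrete, here is a minimal sketch in Python/numpy (my own illustration, not the spreadsheet or R code described in this article); the 12, 10 and 8 month windows follow from dividing by 1.2067 at each stage:

```python
import numpy as np

def running_mean(x, window):
    """Centred running mean; output is shortened by window - 1 samples."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def ctrm(x, period=12, ratio=1.2067):
    """Cascaded triple running mean: three running means whose windows
    shrink by the 1.2067 ratio (12, 10, 8 months for an annual filter)."""
    for w in (period, period / ratio, period / ratio**2):
        x = running_mean(x, int(round(w)))
    return x
```

Fed a pure 12-month sine, the first 12-point mean already averages over exactly one full cycle, so the annual component vanishes completely, while a slow trend passes through unchanged; that is the 'better than Gaussian' property described above.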
With all of this in mind, a CTRM filter, used either at 365 days (if we have that resolution of data available) or 12 months in length with the most common data sets, will completely remove the Annual cycle while retaining the underlying monthly sampling frequency in the output. In fact it is even better than that, as it does not matter whether the data used has already been normalised or not. A CTRM filter will produce the same output on either raw or normalised data, with only a small offset reflecting whatever 'Normal' period the data provider chose. There are no added distortions of any sort from the filter.
Let’s take a look at what this generates in practice. The following are UAH Anomalies from 1979 to Present with an Annual CTRM applied:
Fig 1: UAH data with an Annual CTRM filter
Note that I have just plotted the data points. The CTRM filter has removed the ‘visual noise’ that month-to-month variability causes. This is very similar to the 12 or 13 month single running mean that is often used, but it is more accurate, as the arithmetic errors produced by those simple running means are removed. Additionally, the higher frequencies are completely removed while all the lower frequencies are left completely intact.
The following are HadCRUT4 Anomalies from 1850 to Present with an Annual CTRM applied:
Fig 2: HadCRUT4 data with an Annual CTRM filter
Note again that all the higher frequencies have been removed and the lower frequencies are all displayed without distortions or noise.
There is a small issue with these CTRM filters in that CTRMs are ‘full kernel’ filters as mentioned above, meaning their outputs will not change when new data is added (except to extend the existing plot). However, because the output is in the middle of the input data, they do not extend up to the ends of the data available as can be seen above. In order to overcome this issue, some additional work will be required.
The basic principles of filters work over all timescales, thus we do not need to constrain ourselves to an Annual filter. We are, after all, trying to determine how this complex load that is the Earth reacts to the constantly varying surface input and surface reflection/absorption with very long timescale storage and release systems including phase change, mass transport and the like. If this were some giant mechanical structure slowly vibrating away we would run low pass filters with much longer time constants to see what was down in the sub-harmonics. So let’s do just that for Climate.
When I applied a standard low pass filter sweep against the data I noticed that there is a sweet spot around 12-20 years where the output changes very little. This looks like it may well be a good stop/pass band binary chop point, so I chose 15 years as the roll off point to see what happens. Remember this is a standard low pass/band-pass filter, similar to the one that splits telephone from broadband to connect to the Internet. Using this approach, all components with periods longer than 15 years are fully preserved in the output and all shorter-period components are completely removed.
The following are HadCRUT4 Anomalies from 1850 to Present with a 15 year CTRM and a 75 year single running mean applied:
Fig 3: HadCRUT4 with additional greater than 15 year low pass. Greater than 75 year low pass filter included to remove the red trace discovered by the first pass.
Now, when reviewing the plot above, some have claimed that this is a curve fitting or ‘cycle mania’ exercise. However, the data hasn’t been fitted to anything; I just applied a filter, and out pops a wriggle at around ~60 years which the data draws all on its own. It’s the data what done it – not me! If you see any ‘cycle’ in the graph, then that’s your perception. What you can’t do is say the wriggle is not there. That’s what the DATA says is there.
Note that the extra ‘greater than 75 years’ single running mean is included to remove the discovered ~60 year line, as one would normally do to get whatever residual is left. Only a single stage running mean can be used as the data available is too short for a full triple cascaded set. The UAH and RSS data series are too short to run a full greater than 15 year triple cascade pass on them, but it is possible to do a greater than 7.5 year which I’ll leave for a future exercise.
And that Full Kernel problem? We can add a Savitzky-Golay filter to the set, which is the Engineering equivalent of LOWESS in Statistics, so should not meet too much resistance from statisticians (want to bet?).
Fig 4: HadCRUT4 with additional S-G projections to observe near term future trends
We can verify that the parameters chosen are reasonable because the line closely follows the full kernel filter when that is used as a training/verification guide. The latest part of the line should not be considered an absolute guide to the future. Like LOWESS, S-G will ‘whip’ around on new data like a caterpillar searching for a new leaf. However, it tends to follow a similar trajectory, at least until it runs into a tree. While this is only a basic predictive tool, which assumes that the near future will be like the recent past, it estimates that we are over a local peak and headed downwards…
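As an illustration of the S-G stage, here is a hedged Python sketch using scipy’s savgol_filter on a synthetic monthly series; the 181-month window and cubic order are my illustrative choices, not necessarily the parameters behind Fig 4:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic monthly anomalies standing in for a real series such as HadCRUT4.
rng = np.random.default_rng(0)
t = np.arange(600)
anoms = (0.0005 * t                              # slow trend
         + 0.2 * np.sin(2 * np.pi * t / 720)     # slow wriggle
         + 0.1 * rng.standard_normal(600))       # monthly noise

# ~15 year (181 month) window, cubic local fit. Unlike the full kernel CTRM,
# S-G returns output right up to both ends of the record.
smooth = savgol_filter(anoms, window_length=181, polyorder=3)
```

The end segments are exactly where the LOWESS-like ‘whip’ shows up, which is why the full kernel filter is used as the verification guide over the overlapping span.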
And there we have it. A simple data treatment for the various temperature data sets, a high quality filter that removes the noise and helps us to see the bigger picture. Something to test the various claims made as to how the climate system works. Want to compare it against CO2? Go for it. Want to check SO2? Again, fine. Volcanoes? Be my guest. Here is a spreadsheet containing UAH and an Annual CTRM, and R code for a simple RSS graph. Please just don’t complain if the results from the data don’t meet your expectations. This is just data and summaries of the data. Occam’s Razor for a temperature series. Very simple, but it should be very revealing.
Now the question is how I can improve it. Do you see any flaws in the methodology or tool I’ve developed? Do you know how I can make it more accurate, more effective or more accessible? What other data sets do you think might be good candidates for a CTRM filter? Are there any particular combinations of data sets that you would like to see? You may have noted the 15 year CTRM combining UAH, RSS, HadCRUT and GISS at the head of this article. I have been developing various options at my new Climate Data Blog and based upon your input on this thread, I am planning a follow up article that will delve into some combinations of data sets, some of their similarities and some of their differences.
About the Author: Richard Linsley Hood holds an MSc in System Design and has been working as a ‘Practicing Logician’ (aka Computer Geek) to look at signals, images and the modelling of things in general inside computers for over 40 years now. This is his first venture into Climate Science and temperature analysis.





David L. Hagen says:
March 17, 2014 at 6:22 am
“For a future endeavor, you might find it interesting to explore the Global Warming Prediction Project, and how their automatic model results compare with your filtered data.”
As soon as I see Linear Trends you have lost me.
Linear Trend = Tangent to the curve = Flat Earth.
It is just that sort of narrow minded thinking that leads one down blind alleyways and into blind corners.
Nature never does things in straight lines, it is always curves and cycles and avalanches. Almost never in sine waves either unless we are talking about orbits and the like. Ordered/Constrained Chaos is at the heart of most things.
“Now the question is how I can improve it. Do you see any flaws in the methodology or tool I’ve developed?”
Because HADCRUT4 data by CRU is greatly incorrect (such as within parts of the 1930-1980 period), as is that from Hansen’s GISS, unfortunately any analysis based on HADCRUT4 is also greatly incorrect. As presented, it has the global cooling scare of the 1960s-1970s occur without substantial cooling beforehand in the global or Northern Hemisphere average, as if it just happened with little basis. I would challenge you or anyone (if inclined to defend it, though you might not be) to find any publication made prior to the CAGW movement of the 1980s onward which shows the 1960s-1970s without showing far more cooling relative to the warmer period of the 1930s-1950s.
For instance, a 1976 National Geographic publication of the temperature record of scientists of the time, in the following link, is one I have literally seen in paper form in a library, unlike the repeatedly rewritten later electronic versions, which by strange coincidence (not!) happen to be, respectively, by a group infamous in Climategate and by a department under the direction of someone so much an activist as to have been arrested repeatedly in protests:
See that and other illustrations of original temperature data around 40% of the way down in http://tinyurl.com/nbnh7hq
“Are there any particular combinations of data sets that you would like to see?”
While it would be a significant amount of work, digging up original non-rewritten temperature data for history up through the 1970s (prior to the political era, back when there wasn’t motivation to fudge it), digitizing it, and then carefully joining it onto later temperature data (from more questionable sources, but there is no alternative) would be a better starting point. Joining together two datasets like that isn’t ideal in theory but is the best option in practice; several years of overlap could be used to help check the method. The Northern Hemisphere average, the Southern Hemisphere average, and the global average would each best be generated separately, as there are reasons to treat them apart; for instance, Antarctic temperatures follow different patterns than Arctic ones (as the prior link implies).
Of course, like me, you might not have time to do so in the near future even if you desired. But that is something needed yet utterly lacking, as browsing enough skeptic websites indirectly illustrates. The ideal, for the convenience of other analysis, would be producing both plots and a list of the values by year (e.g. a data spreadsheet).
RichardLH;
But your logic is faulty as to the methodology of how energy is absorbed/emitted from an object and the temperature of that object over time. IMHO.
>>>>>>>>>>>>>>>>>>
By your logic, there is no need to average temperature across the earth for analysis at all. The matter having done the integration, all that is required according to you is a single weather station which will over time be representative of the earth’s energy balance. Good luck with that.
To add, regarding this:
I would challenge you or anyone (if inclined to defend it, though you might not be) to find any publication made prior to the CAGW movement of the 1980s onward which shows the 1960s-1970s without showing far more cooling relative to the warmer period of the 1930s-1950s.
The Berkeley BEST set, made in the modern day by someone who pretended to be a skeptic without pro-CAGW-movement bias but was an environmentalist (as found out by looking at some of his writing beforehand, IIRC), does not even remotely meet that challenge. Only original paper publications (or a clear scan not looking rewritten in any way) made prior to the existence of the global warming movement would count.
Henry Clark says:
March 17, 2014 at 6:44 am
“Because HADCRUT4 data by CRU is greatly incorrect (such as within parts of the 1930-1980 period), as is that from Hansen’s GISS, unfortunately any analysis based on HADCRUT4 is also greatly incorrect. ”
Well they all draw from the same set of physical thermometers. You can take a look at the BEST database, which draws from a wide range of providers, to get a wider picture if you like.
This is more intended to analyse what is there in those series and compare and contrast them. The head figure is one where I have aligned the 4 major series over the 1979 onwards era and shown how they agree and disagree.
http://climatedatablog.files.wordpress.com/2014/02/hadcrut-giss-rss-and-uah-global-annual-anomalies-aligned-1979-2013-with-gaussian-low-pass-and-savitzky-golay-15-year-filters1.png
A later set from just UAH (CRU is pending) which compares the Land, Ocean and Combined is quite revealing.
http://climatedatablog.files.wordpress.com/2014/02/uah-global.png
davidmhoffer says:
March 17, 2014 at 6:45 am
“By your logic, there is no need to average temperature across the earth for analysis at all. The matter having done the integration, all that is required according to you is a single weather station which will over time be representative of the earth’s energy balance. Good luck with that.”
Straw man alert. You can answer that one yourself.
RichardLH.
Straw man alert. You can answer that one yourself.
>>>>>>>>>>>>>
No, you answer it. Does matter integrate energy inputs and outputs as you have argued? If so, why is more than a single weather station required for analysis?
davidmhoffer says:
March 17, 2014 at 6:59 am
“No, you answer it. Does matter integrate energy inputs and outputs as you have argued? If so, why is more than a single weather station required for analysis?”
Bully or what?
Well, if you are going to get all technical on that, then you would require a statistically representative sample of the various sub-environments that are present on the Earth’s surface.
I’ll start with these area based graphs/sampling sets as a reasonable first pass.
http://climatedatablog.wordpress.com/uah/
Probably needs to be augmented by some point sampling values as well (a CRU combined analysis is coming up).
Then you might be able to get close to the true picture of how fast or slow the integration in the various materials is across the whole input surface, day to day and month to month.
Henry Clark: “See that and other illustrations of original temperature data around 40% of the way down in” http://tinyurl.com/nbnh7hq
Very interesting! In that graph the early 60s are very much the same as the early 20th century, and the late 19th is even cooler, rather than warmer as now shown. That does not tell us which is correct.
Since Hansen did his little air-con con trick in 1988, I have no reason to think that he would not be equally misleading with his constant warming adjustments.
It would also not fit a 60 year cycle.
But what we can see is that the long term changes we are seeking to explain are primarily what the various adjustments have stuck in there rather than what the measurements actually were, whether that is for better or for worse.
MR Marler: “Filtering” is nothing more than fitting data by a method that uses a set of basis functions, and then separating the results into two components (as said by Greg Goodman).
I said nothing of the sort. Don’t use my name to back up your ignorant claims.
RichardLH;
Probably needs to augmented by some point sampling values as well (coming up with a CRU combined analysis).
>>>>>>>>>>>>>
Probably? In other words, you don’t know.
Willis ran an article some time back on the average temperature of the moon not matching the Stefan-Boltzmann black body temperature. Does this mean SB Law is wrong? No. It means that averaging temperature in such a manner as to accurately represent the energy balance of the moon is near impossible. That’s for a body that is airless and waterless. Doing the same for the earth is orders of magnitude more complex.
I suggest you read Willis’ article as well as the various musings of Robert G Brown on these matters.
RichardLH says:
March 17, 2014 at 6:55 am
“Well they all draw from the same set of physical thermometers.”
So did the data published prior to the 1980s, but that can be seen to be drastically different. When depicting average changes of a tiny fraction of 1% in absolute temperature, of tenths of a degree, they depend utterly on the interpolation between widely spaced stations, the choice of specific stations used, and, when applicable, hidden adjustments. The practical, convenient way to be certain of no bias in favor of the CAGW movement, without spending thousands of hours personally examining everything, is just to use data published prior to its existence.
In addition to the examples in my prior link, there are others, such as the Northern Hemisphere average history of the National Academy of Sciences, illustrated at http://stevengoddard.files.wordpress.com/2014/03/screenhunter_637-mar-15-11-33.gif in http://stevengoddard.wordpress.com/2014/03/15/yet-another-smoking-gun-of-data-fraud-at-nasa/ , which is utterly different from the CRU/BEST rewritten versions of NH average as well as global average history.
“The head figure is one where I have aligned the 4 major series over the 1979 onwards era and shown how they agree and disagree.”
Obviously, and similar has been seen before; the undercover CAGW movement supporters on Wikipedia publish a similar plot, for instance. However, assuming for the sake of argument that RSS or UAH are relatively accurate, correspondence with them from 1979 onwards has absolutely jack to do with disproving rewriting of the pre-1979 section, which is easier to get away with.
If you wish to argue this, then try to meet my challenge:
Find (and link or upload) any publication made prior to the CAGW movement of the 1980s onward which shows the 1960s-1970s without showing far more cooling relative to the warmer period of the 1930s-1950s.
Since you’re arguing this is merely a matter of them all using the same thermometers, that should be easy rather than impossible. Again, what is shown must have been published back then, not a BEST publication of decades later for instance.
cd. “In my field of study, the Butterworth is effectively a passband filter where…”
Many of these filters, commonly used in electronics, are not really applicable to relatively short time series. They can be programmed, but usually by recursive formulae. That means they need a long ‘spin-up’ period before they converge and give a reasonably stable result that is close to the intended characteristics. In practice the spin-up is usually nearly as long as the climate data itself!
Also they mostly have really lumpy stop-band leakage.
In electronic applications it may take a few hundredths of a second to settle and then be in continuous use. Not really the same thing as climate data. Hence they are not generally much help.
That is why FIR ( finite impulse response ) filters are usually applied here.
My favourite for this use is Lanczos, unless you need to get really close to the end of the data.
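For readers unfamiliar with it, a Lanczos low pass is an FIR kernel built from an ideal sinc response tapered by a sinc window. A small sketch (my own, with illustrative parameters rather than anything Greg specifies):

```python
import numpy as np

def lanczos_lowpass_weights(cutoff, half_width):
    """FIR low pass weights: ideal sinc response tapered by a sinc
    (Lanczos) window. `cutoff` is in cycles per sample."""
    k = np.arange(-half_width, half_width + 1)
    w = 2 * cutoff * np.sinc(2 * cutoff * k) * np.sinc(k / half_width)
    return w / w.sum()   # normalise so a constant signal passes unchanged

# e.g. a 2 year cutoff on monthly data, kernel of 123 taps:
weights = lanczos_lowpass_weights(cutoff=1 / 24, half_width=61)
```

Being an FIR kernel it has no spin-up: every output sample is a finite weighted sum of the input, which is the property that suits short climate series. The longer the kernel, the closer it gets to the ideal response, but the further the output stops from the ends of the data.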
davidmhoffer says:
March 17, 2014 at 7:17 am
“Probably? In other words, you don’t know.”
Don’t put words incorrectly into my mouth.
I was pointing out that area (actually volume) sampling instruments are all very well but can properly be supplemented by point sampling instruments. They have different time sampling methodologies, so integrating them together is non-trivial.
Again we are wandering off the original point where you claimed that measuring temperature was useless and I pointed out that integration by matter into temperature over time made your claim invalid.
My reply is stuck in moderation at the moment, perhaps from some word in it, probably going to appear later.
RichardLH;
Again we are wandering off the original point where you claimed that measuring temperature was useless and I pointed out that integration by matter into temperature over time made your claim invalid.
>>>>>>>>>>>>>>>
No, we’re not wandering at all. I’m explaining why and you’re coming up with excuses that you can’t defend with anything other than explanations that begin with “probably” and muse about the integration being “non trivial” but never actually answering directly the points I’ve raised.
I suggest again that you read the information that I’ve pointed you to.
Richard: “This is pure Gaussian – or even slightly better than Gaussian in that it completely removes the 12 month ‘cycle’ without any additional distortions.”
It does not have the ugly distortions of the inverted lobes in the stop band that make such a mess of a simple running mean, but to be accurate you should recognise that progressively attenuating all frequencies right from zero is a distortion (other than the intended “distortion” of removing higher frequencies).
Nothing’s perfect; it’s always a compromise. That is why at least knowing what a filter does is so important when choosing one, and why this article is so helpful. Just be realistic about how good Gaussian or triple running means are.
Greg Goodman says:
March 17, 2014 at 12:34 am
“Do you have a similar graph that goes back to say 1850?”
I only really trust the CO2 data to 1958. Proxies are… proxies, i.e., not direct measurements. There are no means truly to verify them.
But, this was a hangup with Ferdinand Engelbeen. He was keen to point out that, if you accept the ice core measurements as accurate, the relationship should predict too low a CO2 level before 1958, as here.
Besides the fact that I do not trust the ice core data to deliver a faithful reproduction of CO2 levels, I pointed out it was moot anyway, because knowing what happened since 1958 is enough to discount the influence of human inputs during the era of most significant modern rise.
But, if one really must insist on matching a dodgy set of data farther back, there is no reason that the relationship
dCO2/dt = k*(T – To)
must be steady. The fact that it has been since 1958 is actually quite remarkable. But, there could easily have been a regime change in, say, about 1945 which altered the parameters k and To. I showed here that the effect of such a regime change in To could make the records consistent. What could that signify? Possibly an alteration in the CO2 content of upwelling waters at about that time. Maybe Godzilla emerging from hibernation stirred up a cache of latent CO2 at the bottom of the ocean. Who knows?
But, in any case, it is moot. Knowing what happened since 1958 is enough to discount the influence of human inputs during the era of most significant modern rise.
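Bart’s relation can be illustrated with a toy numerical integration; k, To, and the temperature ramp below are invented values purely to show the mechanics, not a fit to any real record:

```python
import numpy as np

# Forward-Euler integration of dCO2/dt = k * (T - To) with made-up parameters.
k, To, dt = 2.0, -0.4, 1.0          # ppm/yr per degC, degC, and 1 yr steps
T = np.linspace(-0.3, 0.6, 56)      # invented anomaly ramp standing in for 1958-2013
co2 = 315.0 + np.cumsum(k * (T - To) * dt)   # ~315 ppm is near the 1958 Mauna Loa value
```

A ‘regime change’ of the kind described above amounts to switching k or To partway through the series and integrating on from there.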
RichardLH: To the point of your post, I would say that you need a filter with near unity gain in the passband a little above 1/60 years^-1, which falls off rapidly thereafter. The Parks-McClellan algorithm was all the rage back when I was an undergraduate, and I think the Remez exchange algorithm still forms the kernel of many modern filter design algorithms. Free code for the algorithm may be readily found on the web, though it is generally in FORTRAN.
Thanks Bart. That was not a challenge to what you said; you just mentioned something about it varying earlier, and I presumed you had a graph that went further back that might have been interesting. I agree pre-1958 is a different story in CO2 records.
Those who wish to infer something from ice cores showing stable millennial scale relationships and modern climate are comparing apples to oranges. The magnitude of the long term, in-phase relationship does not tell us anything about the magnitude of the short term orthogonal relationship.
Ferdi did suggest some ice core data for circa 1500 with 50 year resolution, that may be relevant but I won’t digress too far into that discussion here.
Your graph was very interesting. I may have another look at that, thanks.
davidmhoffer says:
March 17, 2014 at 7:40 am
“No, we’re not wandering at all. I’m explaining why and you’re coming up with excuses that you can’t defend with anything other than explanations that begin with “probably” and muse about the integration being “non trivial” but never actually answering directly the points I’ve raised.
I suggest again that you read the information that I’ve pointed you to.”
Hmmm. I point out that matter integrates energy over time for both inward and outward flows and you never address that point.
Then you raise a straw man about how many sampling points are needed above one. I answered that in a reasonable way and you bluster on.
Read it. Not interested.
Greg Goodman says:
March 17, 2014 at 7:41 am
“It does not have the ugly distortions of the inverted lobes in the stop band that make such a mess of a simple running mean, but to be accurate you should recognise that progressively attenuating all frequencies right from zero is a distortion (other than the intended “distortion” of removing higher frequencies).
Nothing’s perfect; it’s always a compromise. That is why at least knowing what a filter does is so important when choosing one, and why this article is so helpful. Just be realistic about how good Gaussian or triple running means are.”
Well, as you can always get the ‘other half’ out by doing 1-x to obtain the high pass filter, it is pretty good for a few lines of code 🙂
True, there will always be some blurring of frequencies around the corner value, and those will show up in some measure in both pass and stop outputs instead of just one, but as you say, nothing is perfect.
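The ‘1-x’ complement Richard mentions can be sketched as subtracting the CTRM output from the original series over the common centred span; the trim-based alignment below is my own crude approximation (off by half a sample), not code from the article:

```python
import numpy as np

def running_mean(x, window):
    return np.convolve(x, np.ones(window) / window, mode="valid")

def ctrm(x, period=12, ratio=1.2067):
    # Cascaded triple running mean low pass, as in the article.
    for w in (period, period / ratio, period / ratio**2):
        x = running_mean(x, int(round(w)))
    return x

def ctrm_high_pass(x, period=12):
    """High pass built as the complement (original minus low pass)."""
    low = ctrm(x, period)
    trimmed = len(x) - len(low)   # samples lost to the three means
    left = trimmed // 2           # crude centring of the shortened output
    return x[left:left + len(low)] - low
```

Fed a trend plus an annual sine, the annual cycle survives the high pass while the slow trend is largely removed, i.e. the two outputs partition the signal between them.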
Bart says:
March 17, 2014 at 7:54 am
“RichardLH: To the point of your post, I would say that you need a filter with near unity gain in the passband a little above 1/60 years^-1, which falls off rapidly thereafter. The Parks-McClellan algorithm was all the rage back when I was an undergraduate,”
Thanks for the suggestion, but the frequency response curve is way too ‘ringy’ for me.
http://en.wikipedia.org/wiki/File:Pmalgorithm.png
That is the main problem with most of the higher order filters: none of them respond well to a square wave input. Gaussian is probably the best as far as that goes, hence a near Gaussian in only a few lines of code seemed best.
If you want to use a true Gaussian then it will do nearly as well. Just stick with S-G rather than switching disciplines to LOWESS in that case.
Does throw up another question though. Why is it that GISS and HadCRUT are so far apart in the middle? They are close together at the start and the finish, why so different in the middle? I am not sure GISS (or HadCRUT) will thank me that much.
GISS has a (broken, often backwards) UHI correction. HADCRUT does not. Probably other reasons as well. The UHI correction used by GISS maximally affects the ends.
You cannot really say that they are close at the start and finish and different in the middle, because they are anomalies with separately computed absolute bases. That is, there is no guarantee that the mean global temperature computed from the corrected GISS data will correspond to that computed from the uncorrected HADCRUT4 data. Consequently, you can shift the curves up or down by as much as 1C. So perhaps they should be adjusted to be close in the middle, and maximally different at the start and finish.
There are multiple data adjustments in the two series, and while there is substantial overlap in data sources, the data sources are not identical. And then, as Steve M. notes, there are other long running series with their OWN adjustments and (still overlapping) data sources. It is amusing to see how different they are when plotted on the same axes via e.g. W4T, and then to imagine how their similarities and differences would change if one moved them up or down as anomalies around their distinctly computed global average temperatures on an absolute scale. It is even more amusing to consider how they would change if one actually used e.g. personal weather station data that is now readily available on a fairly impressive if somewhat random grid at least in the United States to compute an actual topographical UHI correction with something vaguely approximating a valid statistical/numerical basis for all of the series. It is more amusing still to consider what the series would look like with error bars, but that, at least, we will never see because if it were ever honestly plotted, the game would be over.
This is a partial entre towards not exactly an explicit correction to your approach, but rather to a suggestion for future consideration.
If one does look at your figure 2 above (HADCRUT4 vs smoothed), you will note that the removed noise scales from the right (most recent) to the left (oldest). This is, of course, a reflection of the increasing underlying uncertainty in the data as one goes into the more remote past. In 1850 we knew diddly about the temperature of the world’s oceans (which cover 70% of the “surface” in the global average surface temperature) and there were whole continental-sized tracts of the planet that were virtually unexplored, let alone reliably sampled for their temperature.
This leaves us with several chicken-and-egg problems that your filter may not be able to compensate/correct for. The most important one is systematic bias like the aforementioned UHI, or systematic bias because places like the Arctic and Antarctic and central Africa and central Australia and Siberia and most of the Americas or the world’s oceans were pitifully poorly represented in the data, and some of those surface areas make dominant contributions to the perceived “anomaly” today (as maps that plot relative anomaly clearly show). Your smoothed curve may smooth the noisy data to reveal the underlying “simple” structure more clearly, but it obviously cannot fix systematic biases, only statistically neutral noise. I know you make no such claim, but it is important to maintain this disclaimer, as the differences between different global average temperature anomalies are in part direct measures of these biases.
The second is that your smoothed curve still comes with an implicit error. Some fraction of the removed/filtered noise isn’t just statistical noise, it is actual measurement error, method error, statistical error that may or may not be zero sum. It might be worthwhile to do some sort of secondary computation on the removed noise — perhaps the simplest one, create a smoothed mean-square of the deviation of the data from the smoothed curve — and use it to add some sort of quasi-gaussian error bar around the smoothed curve. Indeed, plotting the absolute and signed difference between the smoothed curve and the data would itself be rather revealing, I think.
rgb
Greg:
Your request for a Lanczos filter in R seems to have already been met.
http://stackoverflow.com/questions/17264119/using-lanczos-low-pass-filter-in-r-program