Guest Post by Willis Eschenbach
Since we’ve been discussing smoothing in datasets, I thought I’d repost something that Steve McIntyre had graciously allowed me to post on his amazing blog ClimateAudit back in 2008.
—————————————————————————————–
Data Smoothing and Spurious Correlation
Allan Macrae has posted an interesting study at ICECAP. In the study he argues that the changes in temperature (tropospheric and surface) precede the changes in atmospheric CO2 by nine months. Thus, he says, CO2 cannot be the source of the changes in temperature, because it follows those changes.
Being a curious and generally disbelieving sort of fellow, I thought I’d take a look to see if his claims were true. I got the three datasets (CO2, tropospheric, and surface temperatures), and I have posted them up here. These show the actual data, not the month-to-month changes.
In the Macrae study, he used smoothed datasets (12 month average) of the month-to-month change in temperature (∆T) and CO2 (∆CO2) to establish the lag between the change in CO2 and temperature. Accordingly, I did the same. [My initial graph of the raw and smoothed data is shown above as Figure 1; I repeat it here with the original caption.]

Figure 1. Cross-correlations of raw and 12-month smoothed UAH MSU Lower Tropospheric Temperature change (∆T) and Mauna Loa CO2 change (∆CO2). Smoothing is done with a Gaussian average, with a “Full Width to Half Maximum” (FWHM) width of 12 months (brown line). Red line is correlation of raw unsmoothed data (referred to as a “0 month average”). Black circle shows peak correlation.
At first glance, this seemed to confirm his study. The smoothed datasets do indeed have a strong correlation of about 0.6 with a lag of nine months (indicated by the black circle). However, I didn’t like the looks of the averaged data. The cycle looked artificial. And more to the point, I didn’t see anything resembling a correlation at a lag of nine months in the unsmoothed data.
Normally, if there is indeed a correlation that involves a lag, the unsmoothed data will show that correlation, although it will usually be stronger when it is smoothed. In addition, there will be a correlation on either side of the peak which is somewhat smaller than at the peak. So if there is a peak at say 9 months in the unsmoothed data, there will be positive (but smaller) correlations at 8 and 10 months. However, in this case, with the unsmoothed data there is a negative correlation for 7, 8, and 9 months lag.
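For anyone who wants to reproduce this at home, here is a minimal R sketch of the procedure (not the exact code I used; dT and dCO2 are placeholders for the month-to-month changes, and the FWHM-to-sigma conversion is the usual 2*sqrt(2*ln 2) factor):

# Gaussian smoother specified by its Full Width to Half Maximum, in months
gauss_smooth <- function(x, fwhm) {
  sigma <- fwhm / (2 * sqrt(2 * log(2)))        # convert FWHM to standard deviation
  half  <- ceiling(3 * sigma)                   # truncate the kernel at +/- 3 sigma
  w     <- dnorm(-half:half, sd = sigma)
  w     <- w / sum(w)                           # weights sum to 1
  as.numeric(stats::filter(x, w, sides = 2))    # centred weighted moving average
}

# correlation of x with y at a range of lags; positive lag k pairs x(t) with y(t+k),
# i.e. y lagging k months behind x
lagged_cor <- function(x, y, lags = -24:24) {
  n <- length(x)
  sapply(lags, function(k) {
    if (k >= 0) cor(head(x, n - k), tail(y, n - k), use = "complete.obs")
    else        cor(tail(x, n + k), head(y, n + k), use = "complete.obs")
  })
}

# e.g. peak lag for the 12-month smooth:
# cc <- lagged_cor(gauss_smooth(dT, 12), gauss_smooth(dCO2, 12)); (-24:24)[which.max(cc)]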
Now, Steve McIntyre has posted somewhere about how averaging can actually create spurious correlations (although my google-fu was not strong enough to find it). I suspected that the correlation between these datasets was spurious, so I decided to look at different smoothing lengths. The results look like this:

Figure 2. Cross-correlations of raw and smoothed UAH MSU Lower Tropospheric Temperature change (∆T) and Mauna Loa CO2 change (∆CO2). Smoothing is done with a Gaussian average, with a “Full Width to Half Maximum” (FWHM) width as given in the legend. Black circles show peak correlations for various smoothing widths. As above, a “0 month” average shows the lagged correlations of the raw data itself.
Note what happens as the smoothing filter width is increased. What start out as separate tiny peaks at about 3-5 and 11-14 months end up being combined into a single large peak at around nine months. Note also how the lag of the peak correlation changes as the smoothing window is widened. It starts with a lag of about 4 months (purple and blue 2 month and 6 month smoothing lines). As the smoothing window increases, the lag increases as well, all the way up to 17 months for the 48 month smoothing. Which one is correct, if any?
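Here is the same sketch extended to the different smoothing widths, reusing the gauss_smooth and lagged_cor helpers above (a width of 0 means no smoothing; dT and dCO2 are the same placeholders):

lags <- -24:24
for (fwhm in c(0, 2, 6, 12, 24, 48)) {
  xs <- if (fwhm == 0) dT   else gauss_smooth(dT, fwhm)
  ys <- if (fwhm == 0) dCO2 else gauss_smooth(dCO2, fwhm)
  cc <- lagged_cor(xs, ys, lags)
  cat(sprintf("FWHM %2d months: peak r = %.2f at lag %+d months\n",
              fwhm, max(cc, na.rm = TRUE), lags[which.max(cc)]))
}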
To investigate what happens with random noise, I constructed a pair of series with similar autoregressions, and I looked at the lagged correlations. The original dataset is positively autocorrelated (sometimes called “red” noise). In general, the change (∆T or ∆CO2) in a positively autocorrelated dataset is negatively autocorrelated (sometimes called “blue noise”). Since the data under investigation is blue, I used blue random noise with the same negative autocorrelation for my test of random data. However, the exact choice is immaterial to the smoothing issue.
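Here is a rough sketch of that test (an AR(1) coefficient of -0.4 is just an illustrative stand-in for the negative autocorrelation; it reuses the helpers sketched above):

set.seed(1)
n  <- 360                                        # thirty years of monthly values
b1 <- as.numeric(arima.sim(list(ar = -0.4), n))  # "blue" noise, negative lag-1 autocorrelation
b2 <- as.numeric(arima.sim(list(ar = -0.4), n))  # a second series, independent of the first

cc <- lagged_cor(gauss_smooth(b1, 12), gauss_smooth(b2, 12), lags = -24:60)
# any sizeable peak in cc is spurious by construction: b1 and b2 share no signal at all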
This was my first result using random data:

Figure 3. Cross-correlations of raw and smoothed random (blue noise) datasets. Smoothing is done with a Gaussian average, with a “Full Width to Half Maximum” (FWHM) width as given in the legend. Black circles show peak correlations for various smoothings.
Note that as the smoothing window increases in width, we see the same kind of changes we saw in the temperature/CO2 comparison. There appears to be a correlation between the smoothed random series, with a lag of about 7 months. In addition, as the smoothing window widens, the maximum point is pushed over, until it occurs at a lag which does not show any correlation in the raw data.
After making the first graph of the effect of smoothing width on random blue noise, I noticed that the curves were still rising on the right. So I graphed the correlations out to 60 months. This is the result:

Figure 4. Rescaling of Figure 3, showing the effect of lags out to 60 months.
Note how, once again, the smoothing (even for as short a period as six months, green line) converts a nondescript region (say lag +30 to +60, right part of the graph) into a high-correlation region by lumping together individual peaks. Remember, this was just random blue noise; none of these represent real lagged relationships, despite the high correlation.
My general conclusion from all of this is to avoid looking for lagged correlations in smoothed datasets; they’ll lie to you. I was surprised by the creation of apparent, but totally spurious, lagged correlations when the data is smoothed.
And for the $64,000 question … is the correlation found in the Macrae study valid, or spurious? I truly don’t know, although I strongly suspect that it is spurious. But how can we tell?
My best to everyone,
w.
I learned Algol on the great god Burroughs B5500 back in 1967. Hollerith cards, overnight batch processing. The advanced Computer Science majors were using a new high-level language called BASIC.
@Willis: You’ll probably want to subscribe to -help and -announce.
cheers,
gary
For multi-variable correlation, use the software “formulize” available at http://www.nutonian.com (free for a limited dataset size; in earlier times it was totally free).
You can get amazing results, for example this: http://climate.mr-int.ch/NotesImages/Correlation_1.png which correlates observed monthly temperature anomalies (HADCRUT3) with the Atlantic Multidecadal Oscillation (AMO), El Niño-La Niña, transmitted solar radiation (which reveals volcanic eruptions almost as a Dirac impulse), CO2 atmospheric concentration, and sunspots. Caution: correlation does not necessarily imply causation!
wrt the delay from temperature to CO2:
There is a lot of noise in data for both temperature and CO2. However, the 1998 El Nino shows up quite clearly –
http://members.westnet.com.au/jonas1/CO2FocusOn1998.jpg
Temperature is RSS TLT Tropics Ocean for the given date.
CO2s are as at the given date, averaged over various stations in each of the 5 given regions, minus the same value as at 12 months earlier.
The delay from temperature to CO2 is clearly visible. Interestingly, there isn’t a large difference in travel times.
It’s easier to see if the CO2 data is smoothed –
http://members.westnet.com.au/jonas1/CO2FocusOn1998Smoothed.jpg
Is it OK to use smoothed data for this? It looks OK in this example, but as W shows, it’s best to check carefully, and to do proper calcs on the unsmoothed data if you’re using it for anything other than just seeing what it looks like.
PS. Tropic temperature is scaled in the 2 graphs for easy visual comparison. It isn’t smoothed.
Additional note to my previous post at 1:01 am: no smoothing was applied prior to the correlation. But the Hadley dataset is in any case the result of data massaging to calculate global averages, etc.
johanna says: March 31, 2013 at 12:09 am
Slightly OT, but after reading this post I checked John Daly’s Wikipedia entry. What a shambles.
____________________________
So why not update it? Unfortunately, I don’t know enough about him to do it myself, but surely someone here can tidy it up and explain things a bit more.
Willis, what you have discovered by this study is that “smoothers” don’t smooth; they corrupt.
Maybe you should have used a filter instead.
I say this because those who are using a “smoother” usually don’t even realise they are using a filter. They just want the data to look “smoother”. If they realised they needed to low-pass filter the data, they would realise they needed to design a filter or choose a filter based on some criterion. That would force them to decide what the criterion was and choose a filter that satisfies it.
Sadly, most times they just smooth and end up with crap.
This is one of my all-time biggest gripes about climate science: that they cannot get beyond running-mean “smoothers”.
You have not shown that you should not filter data; what you have shown is that running means are a crap filter. That’s why I call them runny mean filters. You use them and end up with crap everywhere.
The frequency response of the rectangular window used in a running mean is the sinc function. It has a zero (the bit you want to filter out is bang on) at pi and a negative lobe that peaks at pi*1.3317 (tan(x)=x at 1.3771*pi, if you were wondering).
This means that it lets through stuff you imagined you “smoothed” away. Not only that, but it inverts it!!
Now guess what? 12 / 1.3317 = 8.97 BINGO
Your nine month correlation is right in the hole.
Now have a look at the data and the light 2m “smoother”. There is a peak either side and a negative around 8 months!! It is that 8m negative peak that is getting through the 12m smoother and being inverted.
Not only have you let through something you intended to remove, you turned it upside down and made a negative correlation into a positive one.
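If anyone wants to check where that lobe sits, here is a rough R sketch treating the 12-month running mean as a plain 12-point rectangular window (just the textbook amplitude response, nothing specific to the data in the post):

N <- 12
f <- seq(0.001, 0.5, by = 0.001)              # frequency in cycles per month
H <- sin(pi * f * N) / (N * sin(pi * f))      # response of an N-point boxcar; negative = sign flipped

neg  <- which(H < 0)
lobe <- neg[which.min(H[neg])]                # deepest point of the first negative lobe
cat(sprintf("negative lobe peaks near a period of %.1f months (response %.2f)\n",
            1 / f[lobe], H[lobe]))
plot(1 / f, H, type = "l", log = "x",
     xlab = "period (months)", ylab = "response of a 12-month running mean")
abline(h = 0, lty = 2)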
So Allan Macrae may (or may not) have found a true correlation, but if he did, it was probably negated.
There was a similar article that got some applause here a while ago, called something like “Don’t smooth, you hockey puck”, in which the author made similar claims similarly based SOLELY on problems of runny means. He totally failed to realise it was not whether you filter but what filter you choose. But there again, he was talking about “smoothers” so probably had not even realised the difference.
I emailed him explaining all this and got a polite but dismissive one word reply: “thanks”.
I really ought to write this up formally and post it somewhere.
Bottom line: don’t smooth, filter. And if you don’t know how to filter either find out or get a job as a climate scientist 😉
BTW there is +ve correlation in CO2 at about 3m, though 0.1 looks a bit low in terms of 95% confidence.
Of course the other problem is that he’s also starting with monthly averages, which are themselves sub-sampled running means of 30 days. That’s two more data distortions: the mean, and then sub-sampling without a proper anti-alias filter.
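A toy illustration of the aliasing problem, with made-up numbers that have nothing to do with the actual CO2 record:

days <- 1:3600
x <- sin(2 * pi * days / 27)                   # a fast cycle with a 27-day period
monthly <- tapply(x, (days - 1) %/% 30, mean)  # 30-day block means, one value per "month"
plot(as.numeric(monthly), type = "l",
     xlab = "month", ylab = "30-day mean of a 27-day cycle")
# the fast cycle is not removed by the averaging: it reappears, aliased, as a slow
# (and much weaker) oscillation of about nine "months" per cycle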
With a method like that you’d be better flipping a coin. There’s a better chance of getting the right answer.
And I kid you not, this is par for the course in climatology.
Nice discussion of sawtooth CO2 at the end of that old thread.
========================
FWIW, I think the fact that temperature leads CO2 jumps out of the data.
Look here http://www.robles-thome.talktalk.net/carbontemp.pdf
This is just two charts: the twelve month change in atmospheric Carbon, and the twelve month change in temperature (HADCRUT3). These are the very noisy faint lines. The thick lines are the 12 month moving averages of each of these separately. Without doing any correlations, what leads what is very clear. My best fit is that temperature leads carbon by about 7 months.
There are no smoothed series being correlated here, so there can be no spurious correlations. I’ll read the article again more slowly to see if it shows some errors in my analysis.
In addition to the numbers, there is of course a good reason why temperature should lead CO2: the gas is less soluble in warmer water, so higher temp is (eventually) more CO2.
The CO2 vs temperature lags are interesting.
But let’s remember CO2 has a seasonal cycle (which varies from location to location). It is tied to the vegetation growth and decay cycles which vary across the planet. It also moves across the planet with large-scale winds which also vary in time. CO2 also has a long-term exponentially increasing trend which should be taken into account.
Temperature, as well, has a seasonal cycle which varies from location to location. Normally we deal with anomalies that are adjusted for the known seasonal patterns, but both of these series have seasonal cycles which are offset from each other.
It’s hard to say CO2 lags X months behind temperature changes without properly accounting for all these time-series patterns.
If you are smoothing either of them improperly compared to their true seasonal and underlying increasing/decreasing trends, your X will not be the true one.
The Dangers of smoothing. (And if you are a climate scientist, a fabulous Opportunity to mislead, which is why nearly every climate science paper uses smoothed data ONLY. Reminds one of a recent Marcott and a recent Hansen paper).
RStudio is a step forward but Eclipse with the StatET add-on is more advanced. For example, multiple plot windows; ability to view multiple sections of code simultaneously; source code debugging with breakpoints; and views of variable space. Really great if you’re combining R with other languages such as C or Perl or Java. They can all be handled under Eclipse with appropriate add-ons.
Matt Briggs has a number of posts on the dangers inherent in smoothing, particularly when combined with prediction.
http://wmbriggs.com/blog/?s=smoothing&x=0&y=0
or just go to wmbriggs.com and search for “smoothing” if the above doesn’t work.
Silver Ralph says:
March 31, 2013 at 1:40 am
johanna says: March 31, 2013 at 12:09 am
Slightly OT, but after reading this post I checked John Daly’s Wikipedia entry. What a shambles.
____________________________
So why not update it? Unfortunately, I don’t know enough about him to do it myself, but surely someone here can tidy it up and explain things a bit more.
———————-
Ralph, people have been trying to do that for nearly a decade. That is my point.
Any attempt to write an objective account of John Daly’s work would immediately be jumped all over by the resident “rapid response team” on wikipedia.
I absolutely agree that someone who is young and wakeful and interested enough should take up the task. It is a worthy project.
As I am older, and need to husband my energy to what will get results (the 80/20 rule), this one is not for me. But, I will never forgive the bastards who sent, received, and subsequently acquiesced to (by silence) that awful email where they cheered John Daly’s death. That includes those who saw the first round of released emails, when it appeared, and said nothing.
Sorry, don’t have the reference at hand, but it is well known to Anthony and long term readers of WUWT.
MODS: I am willing to consider a hosting option for John Daly’s data mine.
Crispin
I was taught to smooth data only prior to display for human consumption; all previous steps and calculations were performed on unfiltered data.
After all, the unknown signal we are looking for is in the original data; careless filtering/smoothing can lose or change it.
In its infancy, smoothing of brainwave patterns was also fraught with complications and could result in lost peaks that were valuable in calculating stimulus-onset-to-peak and peak-to-peak measures. Worse, an industry standard was not set early on, so it was difficult to compare results across studies completed by different labs. Climate science is still in its infancy and is hardly making gains to become anything other than an infant.
I’d be happy to host it. I lease a dedicated Linux server and have plenty of space and bandwidth. No idea who I’d need to contact, so if anyone knows, my email is alberts dot jeff at gmail dot com.
Greg Goodman says:
March 31, 2013 at 2:21 am
“The frequency response of the rectangular window used in a running mean is the sinc function. It has a zero (the bit you want to filter out is bang on) at pi and a negative lobe that peaks at pi*1.3317 (tan(x)=x at 1.3771*pi, if you were wondering).
This means that it lets through stuff you imagined you “smoothed” away. Not only that, but it inverts it!!
Now guess what? 12 / 1.3317 = 8.97 BINGO
Your nine month correlation is right in the hole.”
————————————————————————————
I think you may be onto something here; however, Willis states he used a Gaussian filter, implying a Gaussian operator / Gaussian weights were applied in the smoothing, which would get rid of the sinc-function / ringing / bleeding issues associated with a square-wave operator. Your assumption is that he basically used a square wave (no weights) in calculating the smoothing. Now, based on Willis’ results & your analysis, I think you might be on to something – that the actual filtering was a square wave & not a Gaussian filter as stated. So, once again, this raises more questions & increases my suspicion there is something fundamentally wrong with the calculations presented here, as there are many inconsistencies. None of it really makes sense as presented. I would add to my list of what I would like to see the filter operator & its associated power spectrum.
Answering the question of “… is the correlation found in the Macrae study valid, or spurious?” should not be a very hard question to answer – it just needs a different analysis – plots of the raw data, the filter operator(s), the filtered data, the spectra of all of the above & then the cross-correlations of both filtered & unfiltered data – if you could look at all of those together, anyone with some signal analysis background ought to be able to look at the plots & answer the question quickly & definitively.
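As a rough sketch of the kind of check I mean, here is a comparison of a plain 12-point rectangular window with a Gaussian kernel of 12-month FWHM (illustrative kernels only, not necessarily the ones actually used):

# real response of a symmetric, centred FIR kernel w at frequencies f (cycles per month)
freq_response <- function(w, f) {
  k <- seq_along(w) - (length(w) + 1) / 2
  sapply(f, function(fi) sum(w * cos(2 * pi * fi * k)))
}

f <- seq(0.001, 0.5, by = 0.001)

w_rect  <- rep(1 / 12, 12)                     # plain 12-month running mean
sigma   <- 12 / (2 * sqrt(2 * log(2)))         # 12-month FWHM -> standard deviation
half    <- ceiling(3 * sigma)
w_gauss <- dnorm(-half:half, sd = sigma)
w_gauss <- w_gauss / sum(w_gauss)              # normalised Gaussian kernel

plot(1 / f, freq_response(w_rect, f), type = "l", log = "x",
     xlab = "period (months)", ylab = "filter response")
lines(1 / f, freq_response(w_gauss, f), lty = 2)   # the Gaussian has essentially no negative lobes
abline(h = 0, lty = 3)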
Greg Goodman says:
March 31, 2013 at 2:21 am
Yours is an intriguing comment. Anyone that can include crap, Bingo, and π in a few lines of text deserves a crack at a full-blown post. Set yourself down and have a go at getting your runny means and filtered points properly sorted out. I’ll suggest having a couple of others (Willis, Geoff S., ?) review it before posting. Why not ask Anthony if this would work for him, insofar as this is his site?
As somebody involved professionally in the analysis of time series for over a decade can I make a few points:
1, smoothing is of NO VALUE unless it is used to create a forecast; I don’t care what the “smooth trend” of past data is — the past data is the best presentation of the past data.
2, never-ever compute an auto-correlation function or cross-correlation function from data to which a process that induces auto-correlation has already been applied (i.e. from a smooth); a rough sketch of this point appears below. The random errors of independent and identically distributed data are computable (or bootstrappable), and so the difference of your ACF or CCF from that expected for IID noise processes is also computable. Once you start throwing ad-hoc filters into the data, who knows how those errors are going to behave. Remember the window size of your filter is a degree of freedom that is being adjusted — are you using the standard error of that in your induced error covariance matrix?
3, there are so many ad-hoc smoothing windows thrown around because they make the data look “nice” to the analyst (see #1 above) that it makes one cringe.
Time series analysis was studied extensively by several excellent English statisticians. Kendall, Box and Jenkins made huge contributions. The Box-Jenkins book is really a gem. If you want to do any time-series analysis please read at least that — or Hamilton, for a more modern treatment. The Akaike Information Criterion (AICc) is an excellent tool to tune up Box-Jenkins style models to find the best approximating model for in-sample data. This is based upon very well-defined information theoretic analysis of the estimation process.
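A rough sketch of point 2, using nothing but white noise and an ordinary 12-month running mean (illustrative only; the band is the naive 95% limit for IID series):

set.seed(2)
n  <- 360
x  <- rnorm(n)                                 # white noise
y  <- rnorm(n)                                 # independent white noise
sm <- function(z, k = 12) as.numeric(stats::filter(z, rep(1 / k, k), sides = 2))

band <- 1.96 / sqrt(n)                         # naive IID 95% limits on a cross-correlation
raw  <- ccf(x, y, lag.max = 24, plot = FALSE)$acf
smth <- ccf(na.omit(sm(x)), na.omit(sm(y)), lag.max = 24, plot = FALSE)$acf
cat(sprintf("IID band +/- %.3f | max |CCF|: raw %.2f, smoothed %.2f\n",
            band, max(abs(raw)), max(abs(smth))))
# the smoothed pair will typically stray well outside the naive band even though
# the two series share nothing at all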
When you don’t know what the underlying structure is, yes. If you do, however, then there’s nothing wrong with the practice.
Exactly.
Mark
gary turner says:
March 31, 2013 at 12:55 am
Thanks, Gary, noted. Also thanks again to Mosh; RStudio is awesome.
w.
RERT says:
March 31, 2013 at 2:57 am
FWIW, I think the fact that temperature leads CO2 jumps out of the data.
Look here http://www.robles-thome.talktalk.net/carbontemp.pdf
This is just two charts: the twelve month change in atmospheric Carbon, and the twelve month change in temperature (HADCRUT3). These are the very noisy faint lines. The thick lines are the 12 month moving averages of each of these separately. Without doing any correlations, what leads what is very clear. My best fit is that temperature leads carbon by about 7 months.
What’s clear from that plot is that, by the arbitrary shift of the CO2 axis by about -0.3%, you’ve given the impression that the linear increase in CO2 independent of T doesn’t exist! What your graph actually shows is that CO2 increases steadily independently of temperature, with a superimposed modulation due to temperature. As far as the lag is concerned, you don’t say whether your data is global or not, but if so there’s a problem due to the differences between the hemispheres, the Arctic showing intra-annual fluctuations of ~10ppm, Mauna Loa ~5ppm, the S Pole ~0ppm.
Geoff L: ”
I think you may be onto something here; however, Willis states he used a Gaussian filter, implying a Gaussian operator / Gaussian weights were applied in the smoothing”
Willis (article): “In the Macrae study, he used smoothed datasets (12 month average) of the month-to-month change in temperature (∆T) and CO2 (∆CO2) to establish the lag between the change in CO2 and temperature. Accordingly, I did the same.”
I read this to mean “running 12 month average” since he is clearly still working with monthly data, not annual data, as would be the case if it was (12 month average) as stated by Willis.
However, he does state later it was done with gaussian filters. So it appears that he was calling his 12m FWHM gaussian which would be an average over 72 months of data a “12 month average”. At least that’s the best I can make of it.
None of that goes against what I said about the problems with running means in general.
What would seem rather odd with what is reported of the Macrae study is why anyone would look for a lag correlation of less than 12 months in data from which they have tried to remove variations of less than twelve months.