Guest Post by Willis Eschenbach
Since we’ve been discussing smoothing in datasets, I thought I’d repost something that Steve McIntyre had graciously allowed me to post on his amazing blog ClimateAudit back in 2008.
—————————————————————————————–
Data Smoothing and Spurious Correlation
Allan MacRae has posted an interesting study at ICECAP. In the study he argues that the changes in temperature (tropospheric and surface) precede the changes in atmospheric CO2 by nine months. Thus, he says, CO2 cannot be the source of the changes in temperature, because it follows those changes.
Being a curious and generally disbelieving sort of fellow, I thought I’d take a look to see if his claims were true. I got the three datasets (CO2, tropospheric, and surface temperatures), and I have posted them up here. These show the actual data, not the month-to-month changes.
In the MacRae study, he used smoothed datasets (12-month average) of the month-to-month change in temperature (∆T) and CO2 (∆CO2) to establish the lag between the change in CO2 and temperature. Accordingly, I did the same. [My initial graph of the raw and smoothed data is shown above as Figure 1; I repeat it here with the original caption.]

Figure 1. Cross-correlations of raw and 12-month smoothed UAH MSU Lower Tropospheric Temperature change (∆T) and Mauna Loa CO2 change (∆CO2). Smoothing is done with a Gaussian average, with a “Full Width at Half Maximum” (FWHM) width of 12 months (brown line). Red line is correlation of raw unsmoothed data (referred to as a “0 month average”). Black circle shows peak correlation.
At first glance, this seemed to confirm his study. The smoothed datasets do indeed have a strong correlation of about 0.6 with a lag of nine months (indicated by the black circle). However, I didn’t like the looks of the averaged data. The cycle looked artificial. And more to the point, I didn’t see anything resembling a correlation at a lag of nine months in the unsmoothed data.
Normally, if there is indeed a correlation that involves a lag, the unsmoothed data will show that correlation, although it will usually be stronger when it is smoothed. In addition, there will be a correlation on either side of the peak which is somewhat smaller than at the peak. So if there is a peak at say 9 months in the unsmoothed data, there will be positive (but smaller) correlations at 8 and 10 months. However, in this case, with the unsmoothed data there is a negative correlation for 7, 8, and 9 months lag.
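For readers who want to try this kind of check themselves, here is a minimal Python sketch of the two operations involved: Gaussian smoothing specified by its FWHM, and correlation over a range of lags. It is not the code used for the figures. The function names, the truncation of the kernel at four standard deviations, and the edge normalisation are all assumptions made for illustration.

```python
import numpy as np

def gaussian_smooth(x, fwhm):
    """Smooth x with a Gaussian kernel whose full width at half maximum is `fwhm` samples.

    Assumes len(x) is larger than the kernel length (about 3.4 * fwhm samples).
    """
    x = np.asarray(x, dtype=float)
    if fwhm <= 0:
        return x.copy()  # a "0 month average" is just the raw data
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM = 2*sqrt(2*ln 2)*sigma
    half = int(np.ceil(4 * sigma))                     # truncate kernel at 4 sigma (assumption)
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    # Dividing by the smoothed "ones" series renormalises the weights near the ends,
    # where part of the kernel falls outside the data.
    num = np.convolve(x, kernel, mode="same")
    den = np.convolve(np.ones_like(x), kernel, mode="same")
    return num / den

def lagged_correlation(x, y, max_lag):
    """Pearson correlation of x[t] with y[t + lag] for lag = 0 .. max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = []
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]   # earlier values of x
        b = y[lag:]             # later values of y
        out.append(np.corrcoef(a, b)[0, 1])
    return np.array(out)
```

With x as the monthly ∆T series and y as the monthly ∆CO2 series, `lagged_correlation(gaussian_smooth(x, 12), gaussian_smooth(y, 12), 24)` would give a smoothed curve of the kind described, and passing a width of 0 gives the raw one for comparison.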
Now Steve McIntyre has posted somewhere about how averaging can actually create spurious correlations (although my google-fu was not strong enough to find it). I suspected that the correlation between these datasets was spurious, so I decided to look at different smoothing lengths. These look like this:

Figure 2. Cross-correlations of raw and smoothed UAH MSU Lower Tropospheric Temperature change (∆T) and Mauna Loa CO2 change (∆CO2). Smoothing is done with a Gaussian average, with a “Full Width at Half Maximum” (FWHM) width as given in the legend. Black circles show peak correlation for various smoothing widths. As above, a “0 month” average shows the lagged correlations of the raw data itself.
Note what happens as the smoothing filter width is increased. What start out as separate tiny peaks at about 3-5 and 11-14 months end up being combined into a single large peak at around nine months. Note also how the lag of the peak correlation changes as the smoothing window is widened. It starts with a lag of about 4 months (purple and blue 2 month and 6 month smoothing lines). As the smoothing window increases, the lag increases as well, all the way up to 17 months for the 48 month smoothing. Which one is correct, if any?
To investigate what happens with random noise, I constructed a pair of series with similar autoregressions, and I looked at the lagged correlations. The original dataset is positively autocorrelated (sometimes called “red” noise). In general, the change (∆T or ∆CO2) in a positively autocorrelated dataset is negatively autocorrelated (sometimes called “blue noise”). Since the data under investigation is blue, I used blue random noise with the same negative autocorrelation for my test of random data. However, the exact choice is immaterial to the smoothing issue.
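The red-to-blue relationship is easy to check numerically. The sketch below assumes a first-order autoregressive, AR(1), process with coefficient 0.5 as the "red" series; the post does not say how the test series were built, so both the model and the coefficient are illustrative choices.

```python
import numpy as np

def ar1_series(n, phi, rng):
    """Generate n samples of an AR(1) process x[t] = phi * x[t-1] + white noise."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def lag1_autocorr(x):
    """Correlation between each sample and the next one."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(0)
red = ar1_series(20000, 0.5, rng)   # positively autocorrelated ("red")
blue = np.diff(red)                 # month-to-month change of the red series

print("lag-1 autocorrelation, red series: ", lag1_autocorr(red))
print("lag-1 autocorrelation, differences:", lag1_autocorr(blue))
```

For a stationary AR(1) process with coefficient phi, the lag-1 autocorrelation of the differenced series is -(1 - phi)/2, so the differences come out negatively autocorrelated for any phi below 1.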
This was my first result using random data:

Figure 3. Cross-correlations of raw and smoothed random (blue noise) datasets. Smoothing is done with a Gaussian average, with a “Full Width at Half Maximum” (FWHM) width as given in the legend. Black circles show peak correlations for various smoothings.
Note that as the smoothing window increases in width, we see the same kind of changes we saw in the temperature/CO2 comparison. There appears to be a correlation between the smoothed random series, with a lag of about 7 months. In addition, as the smoothing window widens, the maximum point is pushed over, until it occurs at a lag which does not show any correlation in the raw data.
After making the first graph of the effect of smoothing width on random blue noise, I noticed that the curves were still rising on the right. So I graphed the correlations out to 60 months. This is the result:

Figure 4. Rescaling of Figure 3, showing the effect of lags out to 60 months.
Note how, once again, the smoothing (even for as short a period as six months, green line) converts a nondescript region (say lag +30 to +60, right part of the graph) into a high-correlation region by lumping together individual peaks. Remember, this was just random blue noise; none of these peaks represent real lagged relationships, despite the high correlation.
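The effect can be reproduced with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the procedure behind Figures 3 and 4: it builds pairs of independent blue-noise series by differencing AR(1) noise (coefficient 0.5, an arbitrary choice), uses 333 months per series (roughly the span of the satellite record discussed here), and compares the average peak lagged correlation with and without a 12-month FWHM Gaussian smooth. All function names are hypothetical.

```python
import numpy as np

def smooth(x, fwhm):
    """Gaussian smooth with the given FWHM in samples; fwhm <= 0 returns the raw data."""
    if fwhm <= 0:
        return x.copy()
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = int(np.ceil(4 * sigma))
    t = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    # Renormalise near the ends, where the kernel overhangs the data.
    return np.convolve(x, k, "same") / np.convolve(np.ones_like(x), k, "same")

def blue_noise(n, phi, rng):
    """n samples of blue noise: the first difference of an AR(1) 'red' series."""
    e = rng.standard_normal(n + 1)
    red = np.empty(n + 1)
    red[0] = e[0]
    for t in range(1, n + 1):
        red[t] = phi * red[t - 1] + e[t]
    return np.diff(red)

def peak_corr(x, y, max_lag):
    """Largest correlation of x[t] with y[t + lag] over lag = 0 .. max_lag."""
    return max(np.corrcoef(x[: len(x) - lag], y[lag:])[0, 1]
               for lag in range(max_lag + 1))

def mean_peak(fwhm, trials=100, n=333, max_lag=24, phi=0.5, seed=1):
    """Average peak lagged correlation between pairs of INDEPENDENT series."""
    rng = np.random.default_rng(seed)  # same seed, so both widths see the same noise
    peaks = []
    for _ in range(trials):
        x = smooth(blue_noise(n, phi, rng), fwhm)
        y = smooth(blue_noise(n, phi, rng), fwhm)
        peaks.append(peak_corr(x, y, max_lag))
    return float(np.mean(peaks))

raw_peak = mean_peak(0)
smoothed_peak = mean_peak(12)
print("mean peak correlation, raw:             ", raw_peak)
print("mean peak correlation, 12-month smooth: ", smoothed_peak)
```

Because the two series in each pair are independent by construction, any peak the search finds is spurious; the point of the comparison is how much larger those spurious peaks become once the series are smoothed.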
My general conclusion from all of this is to avoid looking for lagged correlations in smoothed datasets: they’ll lie to you. I was surprised by the creation of apparent, but totally spurious, lagged correlations when the data is smoothed.
And for the $64,000 question … is the correlation found in the MacRae study valid, or spurious? I truly don’t know, although I strongly suspect that it is spurious. But how can we tell?
My best to everyone,
w.
Clarification:
When I wrote: “I then looked at the monthly averages of dCO2 (ie current CO2 level – previous month’s CO2 level).” what I meant was that I looked at the average (taken over the years in the data set) for each month of the difference between that month’s CO2 level and the previous month’s.
Simon Anthony
Addendum: The absence of any “rush-hour” spikes in urban CO2 concentrations was surprising.
When I first pointed out this relationship (dCO2/dt varies with T and T lags CO2 by 9 months), it was deemed incorrect.
Then it was accepted as valid by some on the warmist side of this debate, but dismissed as a “feedback”.
This “feedback argument” appears to be a “cargo cult” rationalization, derived as follows:
“We KNOW that CO2 drives Temperature, therefore it MUST BE a feedback.”
More below from 2009:
__________________
http://wattsupwiththat.com/2009/01/21/antarctica-warming-an-evolution-of-viewpoint/#comment-77000
Time is limited so I can only provide some more general answers to your questions:
My paper was posted Jan.31/08 with a spreadsheet at
http://icecap.us/index.php/go/joes-blog/carbon_dioxide_in_not_the_primary_cause_of_global_warming_the_future_can_no/
The paper is located at
http://icecap.us/images/uploads/CO2vsTMacRae.pdf
The relevant spreadsheet is
http://icecap.us/images/uploads/CO2vsTMacRaeFig5b.xls
There are many correlations calculated in the spreadsheet.
In my Figure 1 and 2, global dCO2/dt closely coincides with global Lower Tropospheric Temperature LT and Surface Temperature ST. I believe that the temperature and CO2 datasets are collected completely independently, and yet there is this clear correlation.
After publishing this paper, I also demonstrated the same correlation with different datasets – using Mauna Loa CO2 and Hadcrut3 ST going back to 1958. More recently I examined the close correlation of LT measurements taken by satellite and those taken by radiosonde.
Further, I found (actually I was given by Richard Courtney) earlier papers by Kuo (1990) and Keeling (1995) that discussed the delay of CO2 after temperature, although neither appeared to notice the even closer correlation of dCO2/dt with temperature. This correlation is noted in my Figures 3 and 4.
See also Roy Spencer’s (U of Alabama, Huntsville) take on this subject at
http://wattsupwiththat.wordpress.com/2008/01/25/double-whammy-friday-roy-spencer-on-how-oceans-are-driving-co2/
and
http://wattsupwiththat.wordpress.com/2008/01/28/spencer-pt2-more-co2-peculiarities-the-c13c12-isotope-ratio/
This subject has generated much discussion among serious scientists, and this discussion continues. Almost no one doubts the dCO2/dt versus LT (and ST) correlation. Some go so far as to say that humankind is not even the primary cause of the current increase in atmospheric CO2 – that it is natural. Others rely on a “material balance argument” (mass balance argument) to refute this claim – I think these would be in the majority. I am an agnostic on this question, to date.
The warmist side has also noted this ~9 month delay, but tries to explain it as a “feedback effect” – this argument seems more consistent with AGW religious dogma than with science (“ASSUMING AGW is true, then it MUST be feedback”). 🙂
It is interesting to note, however, that the natural seasonal variation in atmospheric CO2 ranges up to ~16ppm in the far North, whereas the annual increase in atmospheric CO2 is only ~2ppm. This reality tends to weaken the “material balance argument”. This seasonal “sawtooth” of CO2 is primarily driven by the Northern Hemisphere landmass, which is much greater in area than that of the Southern Hemisphere. CO2 falls during the NH summer due primarily to land-based photosynthesis, and rises in the late fall, winter and early spring as biomass degrades.
There is also likely to be significant CO2 solution and exsolution from the oceans.
See the excellent animation at http://svs.gsfc.nasa.gov/vis/a000000/a003500/a003562/carbonDioxideSequence2002_2008_at15fps.mp4
It is also interesting to note that the detailed signals we derive from the data show that CO2 lags temperature at all time scales, from the 9 month delay for ~ENSO cycles to the 600 year delay inferred in the ice core data for much longer cycles.
Regards, Allan
The overwhelming importance of the changes in CO2 from Dec to Jan and from Sep to Oct in the data can be seen by taking the CO2 measurement at the start of the data (Jan 1979), adding the changes in CO2 for only Dec-Jan and Sep-Oct for all successive years and comparing with the final measurement (Sep 2006).
Start measurement is 336.67, final measurement is 381.55. The result from adding just those 2 months’ additions to the start figure is 381.47.
It seems as though 10 months of the year make no net difference to CO2 levels. That seems unlikely.
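The bookkeeping behind this check can be sketched in a few lines of Python: total up the month-to-month changes by calendar month and see how much of the overall rise each month accounts for. With the figures quoted above, the Dec-Jan and Sep-Oct changes account for 381.47 − 336.67 = 44.80 of the total 381.55 − 336.67 = 44.88. The series used below is a synthetic stand-in (a straight-line trend plus a sine-wave seasonal cycle with made-up amplitude), not the dataset linked in the post, and the function name is hypothetical.

```python
import numpy as np

def change_by_calendar_month(levels, first_month=1):
    """Sum the month-to-month changes, grouped by the calendar month being entered.

    levels[0] is taken to belong to `first_month` (1 = January), so the change
    from December to January is credited to month 1, and so on.
    """
    levels = np.asarray(levels, dtype=float)
    changes = np.diff(levels)
    idx = np.arange(1, len(levels))            # index of the later sample in each pair
    months = (idx + first_month - 1) % 12 + 1  # calendar month of that later sample
    return {m: float(changes[months == m].sum()) for m in range(1, 13)}

# Synthetic stand-in: 333 monthly values starting in January (assumed shape and amplitude).
t = np.arange(333)
levels = 336.67 + 0.135 * t + 3.0 * np.sin(2 * np.pi * t / 12)

by_month = change_by_calendar_month(levels, first_month=1)
total = levels[-1] - levels[0]
for m in range(1, 13):
    print(f"month {m:2d}: net change {by_month[m]:8.2f}")
print(f"sum of all months {sum(by_month.values()):.2f}, overall change {total:.2f}")
```

The twelve monthly totals always add up to the overall change, so the interesting question is how that total is distributed: for a smooth seasonal cycle the contributions vary gradually around the year rather than being concentrated in two months.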
I looked up Mauna Loa CO2 data here… ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt
The numbers are different from those linked by Willis in the original post. That might not matter (there may have been later minor adjustments) but, more significantly, they don’t show the strange concentration of CO2 changes in 2 months. It seems possible that the data linked to by Willis might have some errors.
Allan MacRae says:
April 4, 2013 at 6:57 am
Addendum: The absence of any “rush-hour” spikes in urban CO2 concentrations was surprising.
Indeed it is, take a look at this paper which indicates otherwise:
http://www.ars.usda.gov/SP2UserFiles/ad_hoc/12755100FullTextPublicationspdf/Publications/sookim/ElevatedAtmosphericCO2ConcentrationandTemperatureAcrossanUrbanRuralTransect.pdf
Willis Eschenbach says:
April 3, 2013 at 9:53 am
Thanks, Mike. Unfortunately, you’re just looking at an artifact created by comparing today’s temperature to a 12-month change in temperature. To see the effect, try graphing the temperature versus the 12 month change in the temperature in the same manner …
Willis, the CO2 concentration has to be modulated by the temperature since the natural fluxes of CO2 are functions of inter alia temperature, due to Henry’s Law, temperature sensitivity of photosynthesis etc.
The balance equation for the atmosphere takes the form:
dCO2/dt = fossil fuel combustion + (natural sources - sinks) = F + So(T) - Si(T) = F + ΔS(T)
Observations show that dCO2/dt ≅ F/2 over the course of a year, therefore ΔS(T) ≅ -F/2, so natural sinks exceed natural sources and about half the fossil fuel emissions are sequestered.
So the overall growth of CO2 is a steady annual increase due to fossil fuel emissions with superimposed fluctuations due to ΔS(T). I’m sure that the short term lags are due to the hemispheric differences: the seasonal change is mostly due to NH seasons, with very little SH contribution, coupled with atmospheric transport. For those non-believers in mass balance equations I suggest a conversation with a Chem engineer (or an accountant for that matter)!
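The arithmetic of that balance, dCO2/dt = F + ΔS(T), can be written out as a tiny Python sketch. The emission figure of 4 ppm per year is a placeholder chosen only so that half of it matches the ~2 ppm annual increase mentioned earlier in the thread; it is not a measured value, and the function name is hypothetical.

```python
def net_natural_flux(observed_growth, fossil_emissions):
    """Rearranged balance: delta_S = dCO2/dt - F (all terms in ppm per year)."""
    return observed_growth - fossil_emissions

F = 4.0                  # hypothetical fossil fuel emissions, ppm/year (placeholder)
observed_growth = F / 2  # the comment's premise: atmospheric growth is about half of F

delta_s = net_natural_flux(observed_growth, F)
print("net natural sources minus sinks:", delta_s, "ppm/year")
```

A negative result means natural sinks exceed natural sources over the year, which is the step the argument turns on; the sign follows from the premise that observed growth is smaller than F, whatever the actual magnitudes are.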
Hi Willis and Allan MacRae
If you replace the CO2 data linked to by Willis with the monthly Mauna Loa data from this site… ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt … and then look at the correlation between the monthly changes in CO2 and the monthly temp changes (from Willis’s data) – not the smoothed versions, just the differences – then you find that there’s a “clean” peak at 4 months and at annual intervals before and after. This is in contrast to the same correlation using Willis’s data for CO2 which has no such peaks. However, although it’s clean, the peak height is quite low (no higher than the “random” peaks generated by Willis’s data).
The smoothed data (with a 12 month filter) has a peak at 8 months and at annual intervals before and after.
If you replace the “real” CO2 data with simulated numbers with an increment equal to the average change over the real data set, perturbed by a random amount up to +/- the standard deviation of “real” dCO2 data, you find that the correlation of the unsmoothed data goes away. However, for the 12 month smoothed data, you usually (ie for different random number sets) find peaks of about the same size as those in Willis’s first chart, although their location varies.
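A sketch of how such a simulated series might be generated, for anyone who wants to repeat the test: each monthly increment is the mean step plus a random perturbation bounded by one standard deviation. Drawing the perturbation from a uniform distribution is an assumption (the description only says "up to +/-"), the standard deviation of 1.0 is a placeholder, and the mean step is simply the start-to-end rise quoted earlier in the thread divided by 332 monthly steps.

```python
import numpy as np

def simulate_co2(start, mean_step, step_sd, n_months, rng):
    """Simulated CO2 levels: constant mean increment plus a bounded random perturbation."""
    # Uniform perturbation in [-step_sd, +step_sd] (assumed distribution).
    steps = mean_step + rng.uniform(-step_sd, step_sd, size=n_months - 1)
    return start + np.concatenate(([0.0], np.cumsum(steps)))

rng = np.random.default_rng(42)
mean_step = (381.55 - 336.67) / 332   # average monthly rise, Jan 1979 to Sep 2006
simulated = simulate_co2(start=336.67, mean_step=mean_step, step_sd=1.0,
                         n_months=333, rng=rng)
simulated_dco2 = np.diff(simulated)   # the series to correlate against the temperature changes
print(simulated[:5])
```

Running the lagged-correlation analysis on `simulated_dco2` for several different seeds, with and without smoothing, is the comparison being described: the simulated series has no physical link to temperature, so any peak that shows up is an artifact.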
I haven’t worked through the maths but it seems likely you’ll get misleading results if you average data over 12 months – which effectively smooths away structure of shorter time intervals – and then look for correlations at less than 12 months displacement (9 months in Willis’s example above).
These and earlier observations suggest the following:
– The CO2 data linked to by Willis seems to have some problems; I don’t know the source of the data but the dominance of just 2 months in determining CO2 levels seems wrong;
– The peaks in the smoothed correlation function are likely to be spurious if the peak’s displacement is less than the smoothing interval;
– The “real” unsmoothed Mauna Loa data does seem to have a peak in the correlation with temp changes at a 4-month displacement. It’s low but “clean” and, as such peaks aren’t present with simulated data, it may be that it’s a genuine effect rather than an artifact of the methods or an accident of the data.
Simon Anthony
Allan MacRae April 4, 2013 at 7:21 am “When I first pointed out this relationship (dCO2/dt varies with T and T lags CO2 by 9 months), it was deemed incorrect.” – do you mean “CO2 lags T by 9 months”?
Phil. says “So the overall growth of CO2 is a steady annual increase due to fossil fuel emissions with superimposed fluctuations due to ΔS(T).”
Thanks for putting it so concisely. I am quite sure that your explanation is correct, but I have struggled to explain it as clearly.
Simon Anthony says “The overwhelming importance of the changes in CO2 from Dec to Jan and from Sep to Oct in the data can be seen by […].
It seems as though 10 months of the year make no net difference to CO2 levels. That seems unlikely.”
If I understand you correctly, the effect you refer to could be obtained from any data with a regular cycle.
http://members.westnet.com.au/jonas1/CO2Profile.jpg
Mike Jonas says:
If I understand you correctly, the effect you refer to could be obtained from any data with a regular cycle.
Although the data are annually cyclic, they aren’t “any data with a regular cycle”. This particular annual cycle has the months of January and October always positive while the others are ~randomly distributed about zero. The data seem to be wrong – other data showing supposedly the same measurements have an annual cycle which is more like a sine wave – but I don’t know where Willis got them from. It’s a long time since he made the original post so he may not now be able to trace the source.
Simon Anthony – OK, I see what you mean.
Hi Willis and Allan MacRae
Another way to see that the 9-month lagged relationship between dT and dCO2 is likely to be spurious is that, if genuine, you’d expect to see further, successively smaller, peaks in the correlation at annual intervals. There are no such peaks.
Mike Jonas says:
April 4, 2013 at 1:16 pm
Allan MacRae April 4, 2013 at 7:21 am “When I first pointed out this relationship (dCO2/dt varies with T and T lags CO2 by 9 months), it was deemed incorrect.” – do you mean “CO2 lags T by 9 months”?
Answer – Yes – my apologies – blame it on absence of coffee – thanks, Allan
Phil. says:
April 4, 2013 at 8:58 am
Allan MacRae says:
April 4, 2013 at 6:57 am
Addendum: The absence of any “rush-hour” spikes in urban CO2 concentrations was surprising.
Indeed it is, take a look at this paper which indicates otherwise:
http://www.ars.usda.gov/SP2UserFiles/ad_hoc/12755100FullTextPublicationspdf/Publications/sookim/ElevatedAtmosphericCO2ConcentrationandTemperatureAcrossanUrbanRuralTransect.pdf
Thank you, Phil.
But please look at Fig. 2 in your http://www.ars…. paper. (God – who does these acronyms in the USA – it’s almost as bad as PNAS)
Yes, CO2 concentrations are higher in urban areas than in rural areas as stated in the paper, no surprise there –
BUT atmospheric CO2 concentrations plummet starting at about 7am daily.
That was my earlier point. At the time of peak morning CO2 emissions from power plants and the morning rush hour, CO2 concentrations drop. Obviously, photosynthesis is the dominant factor and “excess” CO2 appears to be trapped at the source. However, I suppose one could argue that atm. CO2 would drop even more if it were not for emissions from power plants and cars.
Warning – no coffee yet today.
Phil. says: April 4, 2013 at 9:33 am
“For those non-believers in mass balance equations I suggest a conversation with a Chem engineer (or an accountant for that matter)!”
Phil – Kindly Google the discussions between Ferdinand Engelbeen and Richard Courtney here and on ClimateAudit. It’s a bit more complicated than you think, imo.
But it is possible that you are almost correct…
to Phil 2:
http://wattsupwiththat.com/2012/04/19/what-you-mean-we-arent-controlling-the-climate/
to Phil 3:
http://wattsupwiththat.com/2011/08/05/the-emily-litella-moment-for-climate-science-and-co2/
http://wattsupwiththat.com/2011/08/05/the-emily-litella-moment-for-climate-science-and-co2/#comment-713773
AllanMRMacRae says: August 7, 2011 at 5:09 am
Hi Ferdinand,
I hope you are well, and am enjoying once again your longstanding dialogue with Richard Courtney.
I think you raise some very interesting points, particularly in the quantification of certain factors.
I wonder if some of these questions can be explained by at least two, and possibly more, time lags of CO2 AFTER temperature change. We think we know there is an ~~800 year “long cycle” lag of CO2 after temperature from the ice core data, and also a ~9-month “short-cycle” lag as derived from modern data. If I recall correctly, the dear, late Ernst Beck also postulated another such “intermediate-cycle” lag, and it may still become apparent, even if it takes more than ~5 years to manifest itself.
However, with sincere respect, I don’t agree with your “material balance argument”. I think it is incorrect because it inherently assumes the climate-CO2 system is static, but it is highly dynamic, and the relatively small human-made fraction of total CO2 flux is insignificant in this huge system, as it continues to chase equilibrium into eternity.
Best personal regards, Allan
http://wattsupwiththat.com/2012/08/30/important-paper-strongly-suggests-man-made-co2-is-not-the-driver-of-global-warming/#comment-1070493
Here is an interesting article about Japanese satellite results, at
http://chiefio.wordpress.com/2011/10/31/japanese-satellites-say-3rd-world-owes-co2-reparations-to-the-west/
Japanese Satellites say 3rd World Owes CO2 Reparations to The West
Posted on 31 October 2011
[excerpt]
“It seems that the Japanese have a nice tool on orbit and set out to figure out who was a “maker” and who was a “taker” in the CO2 production / consumption game. Seems they found out that CO2 was largely net absorbed in the industrialized ‘west’ and net created in the ‘3rd world’.”
See also Murry Salby’s video at time 10:38 – the major global CO2 sources are NOT in industrial areas – they are in equatorial areas where deforestation is rampant.
As I’ve posted to Ferdinand Engelbeen in the past:
“Variations in biomass (e.g. deforestation and reforestation) may be the huge variable that would make your mass balance equation work better.”
As Richard Courtney ably summarizes above:
“The unresolved issues are
(a) what is the equilibrium state of the carbon cycle?
(b) how does the equilibrium state of the carbon cycle vary?
(c) what causes the equilibrium state of the carbon cycle to vary?
(d) does the anthropogenic CO2 emission induce the equilibrium state of the carbon cycle to vary discernibly?”
To summarize:
This is an important scientific debate about the carbon cycle and the primary sources of increasing atmospheric CO2. It is entirely possible (some say probable) that increasing atmospheric CO2 is NOT primarily caused by the burning of fossil fuels; others say it IS, and the scientific debate goes on.
To be clear, however, the only significant apparent impact of increasing atmospheric CO2 is beneficial, because CO2 is a plant food.
The claim that increasing CO2 is causing catastrophic global warming is being falsified by these facts:
– there has been no net global warming for 10 to 15 years, despite increasing atmospheric CO2;
– predictions of catastrophic global warming are the result of deeply flawed climate computer models that are inconsistent with actual observations;
– the leading proponents of catastrophic global warming hysteria have been shown in the Climategate emails to be dishonest.
A decade ago, we wrote:
“Climate science does not support the theory of catastrophic human-made global warming – the alleged warming crisis does not exist.”
Since then there has been NO net global warming.
Also a decade ago, I (we) predicted global cooling would commence by 2020 to 2030. When this cooling does occur, many of these scientific questions will be answered.
In the meantime, society should reject the claims of the global warming alarmists, because they have a demonstrated track record of being wrong in ALL their major climate alarmist predictions.
In science, such an utter failure on one’s predictive track record is a fair and objective measure of the falsification of one’s hypotheses.
Repeating, from 2002, with ten more years of confirming data:
“Climate science does not support the theory of catastrophic human-made global warming – the alleged warming crisis does not exist.”
Allan MacRae says:
April 6, 2013 at 8:41 pm
Phil. says: April 4, 2013 at 9:33 am
“For those non-believers in mass balance equations I suggest a conversation with a Chem engineer (or an accountant for that matter)!”
Phil – Kindly Google the discussions between Ferdinand Engelbeen and Richard Courtney here and on ClimateAudit. It’s a bit more complicated than you think, imo.
But it is possible that you are almost correct…
It can be more complicated but the rate of change will always equal the difference between total sources and total sinks! You can break down the terms to give more detail but that will always be true.
So yes I am correct, thank you.
However, with sincere respect, I don’t agree with your “material balance argument”. I think it is incorrect because it inherently assumes the climate-CO2 system is static,
No, it does not; if it did, dCO2/dt would be zero. Sources and sinks can both be functions of temperature, as I explicitly stated (and also of time, of course)!
Now Euro-Nutcases are Burning Vast Tracts of Forest as so called Biomass, in the name of saving the environment by burning it. It’s a bit like Bombing for Peace. In some countries, such as Poland and Finland, wood meets more than 80% of renewable-energy demand. So much wood is burned that Construction board companies have gone bust by the cartload. Read more at URL enviromental-lunacy.notlong.com – Economist Mag.
Phil – I still do not like the Mass Balance argument that attributes increases in atmospheric CO2 to the burning of coal, oil and natural gas, but another possibility is that increasing atmospheric CO2 is not primarily due to the aforementioned fossil fuel burning but rather primarily due to deforestation. Here is some evidence:
From above:
“It seems that the Japanese have a nice tool on orbit and set out to figure out who was a “maker” and who was a “taker” in the CO2 production / consumption game. Seems they found out that CO2 was largely net absorbed in the industrialized ‘west’ and net created in the ‘3rd world’.”
See also Murry Salby’s video at time 10:38 – the major global CO2 sources are NOT in industrial areas – they are in equatorial areas where deforestation is rampant.