Guest Post by Willis Eschenbach
Since we’ve been discussing smoothing in datasets, I thought I’d repost something that Steve McIntyre had graciously allowed me to post on his amazing blog ClimateAudit back in 2008.
—————————————————————————————–
Data Smoothing and Spurious Correlation
Allan Macrae has posted an interesting study at ICECAP. In the study he argues that the changes in temperature (tropospheric and surface) precede the changes in atmospheric CO2 by nine months. Thus, he says, CO2 cannot be the source of the changes in temperature, because it follows those changes.
Being a curious and generally disbelieving sort of fellow, I thought I’d take a look to see if his claims were true. I got the three datasets (CO2, tropospheric, and surface temperatures), and I have posted them up here. These show the actual data, not the month-to-month changes.
In the Macrae study, he used smoothed datasets (12 month average) of the month-to-month change in temperature (∆T) and CO2 (∆CO2) to establish the lag between the change in CO2 and temperature. Accordingly, I did the same. [My initial graph of the raw and smoothed data is shown above as Figure 1; I repeat it here with the original caption.]

Figure 1. Cross-correlations of raw and 12-month smoothed UAH MSU Lower Tropospheric Temperature change (∆T) and Mauna Loa CO2 change (∆CO2). Smoothing is done with a Gaussian average, with a “Full Width to Half Maximum” (FWHM) width of 12 months (brown line). Red line is correlation of raw unsmoothed data (referred to as a “0 month average”). Black circle shows peak correlation.
At first glance, this seemed to confirm his study. The smoothed datasets do indeed have a strong correlation of about 0.6 with a lag of nine months (indicated by the black circle). However, I didn’t like the looks of the averaged data. The cycle looked artificial. And more to the point, I didn’t see anything resembling a correlation at a lag of nine months in the unsmoothed data.
Normally, if there is indeed a correlation that involves a lag, the unsmoothed data will show that correlation, although it will usually be stronger when it is smoothed. In addition, there will be a correlation on either side of the peak which is somewhat smaller than at the peak. So if there is a peak at say 9 months in the unsmoothed data, there will be positive (but smaller) correlations at 8 and 10 months. However, in this case, with the unsmoothed data there is a negative correlation for 7, 8, and 9 months lag.
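To make the idea of a lagged correlation concrete, here is a minimal Python sketch. NumPy is assumed, and the function name `lagged_corr` and the synthetic series are purely illustrative; they are not the code or data behind the figures. It correlates x at time t with y at time t + lag, so a positive lag means y follows x.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if lag >= 0:
        a, b = x[: n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: n + lag]
    return np.corrcoef(a, b)[0, 1]

# Synthetic check (illustrative data only): y is x delayed by 3 steps.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 3)  # y[t + 3] == x[t], apart from 3 wrapped samples

lags = list(range(-12, 13))
corrs = [lagged_corr(x, y, k) for k in lags]
best_lag = lags[int(np.argmax(corrs))]
print(best_lag)
```

Plotting `corrs` against `lags` for the raw series first, before any smoothing, is the check described above: a real lag should already be visible there.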
Now Steve McIntyre has posted somewhere about how averaging can actually create spurious correlations (although my google-fu was not strong enough to find it). I suspected that the correlation between these datasets was spurious, so I decided to look at different smoothing lengths. These look like this:

Figure 2. Cross-correlations of raw and smoothed UAH MSU Lower Tropospheric Temperature change (∆T) and Mauna Loa CO2 change (∆CO2). Smoothing is done with a Gaussian average, with a “Full Width to Half Maximum” (FWHM) width as given in the legend. Black circles show peak correlations for various smoothing widths. As above, a “0 month” average shows the lagged correlations of the raw data itself.
Note what happens as the smoothing filter width is increased. What start out as separate tiny peaks at about 3-5 and 11-14 months end up being combined into a single large peak at around nine months. Note also how the lag of the peak correlation changes as the smoothing window is widened. It starts with a lag of about 4 months (purple and blue 2 month and 6 month smoothing lines). As the smoothing window increases, the lag increases as well, all the way up to 17 months for the 48 month smoothing. Which one is correct, if any?
To investigate what happens with random noise, I constructed a pair of series with similar autoregressions, and I looked at the lagged correlations. The original dataset is positively autocorrelated (sometimes called “red” noise). In general, the change (∆T or ∆CO2) in a positively autocorrelated dataset is negatively autocorrelated (sometimes called “blue noise”). Since the data under investigation is blue, I used blue random noise with the same negative autocorrelation for my test of random data. However, the exact choice is immaterial to the smoothing issue.
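The red-to-blue point can be checked with a small sketch. I am assuming an AR(1) model with an arbitrary coefficient of 0.5 here; the post does not say how the test series were actually built, so treat the model, the coefficient and the helper names as hypothetical. For an AR(1) process with coefficient phi, the lag-1 autocorrelation is phi, and the lag-1 autocorrelation of its first difference works out to -(1 - phi)/2, which is negative for any phi below 1.

```python
import numpy as np

def ar1(n, phi, rng):
    """AR(1) series x[t] = phi * x[t-1] + noise; phi > 0 gives 'red' noise."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def lag1_autocorr(x):
    """Sample correlation between a series and itself shifted by one step."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(1)
red = ar1(20000, 0.5, rng)   # phi = 0.5 is an arbitrary illustrative choice
blue = np.diff(red)          # step-to-step change of the red series

r_red = lag1_autocorr(red)    # should sit near phi = 0.5
r_blue = lag1_autocorr(blue)  # should sit near -(1 - phi) / 2 = -0.25
print(round(r_red, 2), round(r_blue, 2))
```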
This was my first result using random data:

Figure 3. Cross-correlations of raw and smoothed random (blue noise) datasets. Smoothing is done with a Gaussian average, with a “Full Width to Half Maximum” (FWHM) width as given in the legend. Black circles show peak correlations for various smoothings.
Note that as the smoothing window increases in width, we see the same kind of changes we saw in the temperature/CO2 comparison. There appears to be a correlation between the smoothed random series, with a lag of about 7 months. In addition, as the smoothing window widens, the maximum point is pushed over, until it occurs at a lag which does not show any correlation in the raw data.
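Here is a rough sketch of this kind of experiment, under stated assumptions: the blue noise is the first difference of an AR(1) series with coefficient 0.5, the Gaussian kernel is truncated at three sigma, records are 360 "months" long, and lags run to plus or minus 24. None of these choices come from the original analysis, and all function names are hypothetical. The two series in each trial are independent, so any correlation found is spurious by construction.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # sigma = FWHM / 2.3548

def gaussian_kernel(fwhm):
    """Normalised Gaussian weights, truncated at +/- 3 sigma."""
    sigma = fwhm * FWHM_TO_SIGMA
    half = int(np.ceil(3 * sigma))
    t = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def smooth(x, fwhm):
    """Gaussian smoothing; fwhm == 0 returns the raw series."""
    if fwhm == 0:
        return np.asarray(x, dtype=float)
    return np.convolve(x, gaussian_kernel(fwhm), mode="valid")  # drop the edges

def blue_noise(n, phi, rng):
    """First difference of an AR(1) series (assumed model)."""
    red = np.zeros(n + 1)
    eps = rng.standard_normal(n + 1)
    for t in range(1, n + 1):
        red[t] = phi * red[t - 1] + eps[t]
    return np.diff(red)

def peak_abs_corr(x, y, max_lag):
    """Largest |correlation| between x[t] and y[t + lag] over all lags."""
    n = len(x)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        best = max(best, abs(np.corrcoef(a, b)[0, 1]))
    return best

rng = np.random.default_rng(42)
widths = [0, 12]                       # raw data and a 12-month FWHM
peaks = {w: [] for w in widths}
for _ in range(20):                    # 20 independent pairs of series
    x = blue_noise(360, 0.5, rng)
    y = blue_noise(360, 0.5, rng)
    for w in widths:
        peaks[w].append(peak_abs_corr(smooth(x, w), smooth(y, w), 24))

mean_peak = {w: float(np.mean(v)) for w, v in peaks.items()}
print(mean_peak)
```

Averaging over several trials is a deliberate choice: any single pair of random series can look better or worse than typical, and the mean peak gives a fairer picture of how much smoothing inflates the apparent correlation.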
After making the first graph of the effect of smoothing width on random blue noise, I noticed that the curves were still rising on the right. So I graphed the correlations out to 60 months. This is the result:

Figure 4. Rescaling of Figure 3, showing the effect of lags out to 60 months.
Note how, once again, the smoothing (even for as short a period as six months, green line) converts a nondescript region (say lag +30 to +60, right part of the graph) into a high correlation region, by the lumping together of individual peaks. Remember, this was just random blue noise; none of these represent real lagged relationships, despite the high correlation.
My general conclusion from all of this is to avoid looking for lagged correlations in smoothed datasets; they’ll lie to you. I was surprised by the creation of apparent, but totally spurious, lagged correlations when the data is smoothed.
And for the $64,000 question … is the correlation found in the Macrae study valid, or spurious? I truly don’t know, although I strongly suspect that it is spurious. But how can we tell?
My best to everyone,
w.
Allan MacRae says:
March 31, 2013 at 9:14 pm
First, you can’t really examine causality mathematically. You can examine Granger causality. Granger causality measures exactly what Matt is discussing, whether CO2 predicts temperature rise or vice-versa.
Next, in Granger causality there are four possible scenarios.
1). CO2 Granger-causes Temperature
2). Temperature Granger-causes CO2
3). Neither one Granger-causes the other.
4). CO2 Granger-causes Temperature —AND— Temperature Granger-causes CO2
Matt Briggs admits above that he “ignores the very real possibility” that each one Granger-causes the other … the problem is, I’ve done the analysis. The answer that I got was Number 4), that each one Granger-causes the other one. And that’s the one he ignores.
Finally, Matt’s conclusion is:
I would hardly call that a ringing endorsement of the idea that temperature changes cause CO2 changes 9 months later.
However, I intend to take another look at this; there are always new ideas and new datasets. Thanks for the push.
My best to you,
w.
Willis,
You asked me to show you what happens when you smooth real data using a Fourier method. Here is your UAH data. You can compare the original to 3 month and 12 month smoothed versions. Used R fft.
Time UAH UAH_3m UAH_12m
1979 -0.08 -0.008539224 -0.004108392
1979.08 0.05 0.001612929 -0.008500194
1979.17 -0.09 -0.003505892 -0.007833117
1979.25 -0.05 -0.037399051 -0.002801893
1979.33 -0.11 -0.04411988 0.005382925
1979.42 -0.06 -0.010358059 0.015363334
1979.5 0.06 0.027467731 0.02596757
1979.58 0.05 0.035534778 0.036426131
1979.67 0.07 0.047906606 0.046427538
1979.75 0.18 0.1002987 0.056018996
1979.83 0.24 0.111512434 0.065399103
1979.92 0.04 0.05729056 0.07467573
1980 0.15 0.059295181 0.083666377
1980.08 0.17 0.097500129 0.091801484
1980.17 0.08 0.068134228 0.098159222
1980.25 0.19 0.073608283 0.101622776
1980.33 0.25 0.154716442 0.101118341
1980.42 0.26 0.138009658 0.095872253
1980.5 0.1 0.042036078 0.085622971
1980.58 0.01 0.033513045 0.070737562
1980.67 0.13 0.063171411 0.052207687
1980.75 0.08 0.046366718 0.031529483
1980.83 0.02 0.027382731 0.010497033
1980.92 -0.01 -0.003469599 -0.009045598
1981 -0.13 -0.05458334 -0.025446277
1981.08 -0.14 -0.059952362 -0.037442264
1981.17 -0.08 -0.033014287 -0.044291027
1981.25 -0.06 -0.019244334 -0.045829816
1981.33 -0.02 -0.00786738 -0.042472645
1981.42 -0.09 -0.022762758 -0.035159026
1981.5 -0.11 -0.064752878 -0.025266516
1981.58 -0.17 -0.061034807 -0.01449179
1981.67 -0.03 -0.01059729 -0.004697239
1981.75 0.06 0.029047263 0.00228282
1981.83 0.03 0.04141028 0.004877085
1981.92 0.04 0.01312394 0.002040458
1982 -0.1 -0.037208206 -0.006486312
1982.08 -0.14 -0.052833261 -0.019970345
1982.17 -0.09 -0.050530566 -0.036593442
1982.25 -0.15 -0.057918495 -0.053527216
1982.33 -0.1 -0.036594655 -0.067210024
1982.42 0 -0.007612419 -0.073821857
1982.5 -0.1 -0.0251548 -0.069907281
1982.58 -0.12 -0.053150569 -0.053052181
1982.67 -0.11 -0.067678627 -0.022489248
1982.75 -0.26 -0.080284081 0.020499939
1982.83 0.04 -0.019058447 0.072496306
1982.92 0.19 0.142730891 0.128228101
1983 0.58 0.275704913 0.18121919
1983.08 0.58 0.30870024 0.224714983
1983.17 0.6 0.300739452 0.25273525
1983.25 0.47 0.258859023 0.261064556
1983.33 0.39 0.178114066 0.247989978
1983.42 0.22 0.150642807 0.214634787
1983.5 0.41 0.180501318 0.164809977
1983.58 0.19 0.134618412 0.10439851
1983.67 0.09 0.025461752 0.040380583
1983.75 -0.1 -0.017423268 -0.020318301
1983.83 -0.08 -0.047901454 -0.071937642
1983.92 -0.34 -0.155206865 -0.110740149
1984 -0.44 -0.209054099 -0.135459843
1984.08 -0.26 -0.124996617 -0.147267714
1984.17 -0.11 -0.049295701 -0.149282752
1984.25 -0.19 -0.076307351 -0.145748927
1984.33 -0.28 -0.139065626 -0.141063684
1984.42 -0.35 -0.175867018 -0.138866796
1984.5 -0.44 -0.175522427 -0.141375546
1984.58 -0.24 -0.169715304 -0.149089408
1984.67 -0.54 -0.196444501 -0.160899687
1984.75 -0.35 -0.219666637 -0.17454778
1984.83 -0.5 -0.210891237 -0.187300779
1984.92 -0.39 -0.193214843 -0.196671852
1985 -0.28 -0.143383562 -0.201013557
1985.08 -0.25 -0.090642384 -0.199853214
1985.17 -0.22 -0.136570089 -0.193909386
1985.25 -0.51 -0.209296927 -0.184809556
1985.33 -0.36 -0.197384789 -0.174601169
1985.42 -0.43 -0.205516742 -0.165194151
1985.5 -0.59 -0.257160512 -0.157882398
1985.58 -0.33 -0.204867631 -0.153063669
1985.67 -0.34 -0.116593715 -0.15021999
1985.75 -0.24 -0.131340683 -0.148149653
1985.83 -0.26 -0.132093733 -0.145375589
1985.92 -0.2 -0.062452903 -0.14061026
1986 -0.11 -0.073389226 -0.133145234
1986.08 -0.35 -0.154521136 -0.123056871
1986.17 -0.33 -0.154132216 -0.111171324
1986.25 -0.18 -0.08827268 -0.098798632
1986.33 -0.17 -0.071074766 -0.087309358
1986.42 -0.22 -0.108884238 -0.077671381
1986.5 -0.29 -0.118131279 -0.0700775
1986.58 -0.1 -0.072504037 -0.063772877
1986.67 -0.18 -0.048220315 -0.05714053
1986.75 -0.11 -0.064121747 -0.048036112
1986.83 -0.08 -0.039024939 -0.034297189
1986.92 -0.02 0.023499683 -0.014304373
1987 0.15 0.064196666 0.012545574
1987.08 0.2 0.100602866 0.045576365
1987.17 0.13 0.114145871 0.082850777
1987.25 0.24 0.071204166 0.121474902
1987.33 0.03 0.072221703 0.158121279
1987.42 0.35 0.161994422 0.189649371
1987.5 0.43 0.218564188 0.213676844
1987.58 0.37 0.208539132 0.228966315
1987.67 0.42 0.214241289 0.235537245
1987.75 0.52 0.251837623 0.234480477
1987.83 0.5 0.291134304 0.227526798
1987.92 0.62 0.287670661 0.216482591
1988 0.4 0.218580162 0.202679488
1988.08 0.28 0.170911136 0.186582551
1988.17 0.44 0.176220126 0.167662897
1988.25 0.09 0.121093935 0.144575239
1988.33 0.15 0.022267104 0.115604281
1988.42 -0.09 0.000905258 0.079274964
1988.5 0.03 0.016195098 0.034977134
1988.58 0.05 0.020118108 -0.016553758
1988.67 0.03 0.039619585 -0.073029233
1988.75 -0.07 -0.028613624 -0.130742556
1988.83 -0.4 -0.206958319 -0.185036977
1988.92 -0.64 -0.288475298 -0.231021999
1989 -0.52 -0.264874215 -0.264394328
1989.08 -0.62 -0.30806891 -0.282194366
1989.17 -0.74 -0.337704931 -0.283341049
1989.25 -0.49 -0.26281105 -0.268834462
1989.33 -0.5 -0.227048763 -0.241585769
1989.42 -0.48 -0.22846577 -0.205911552
1989.5 -0.29 -0.150500646 -0.166797106
1989.58 -0.23 -0.091316052 -0.129076236
1989.67 -0.19 -0.097618412 -0.096685154
1989.75 -0.14 -0.049984781 -0.072124691
1989.83 0.01 -0.005817707 -0.056215273
1989.92 -0.2 -0.061277952 -0.048165396
1990 -0.17 -0.110038428 -0.045911617
1990.08 -0.27 -0.102587896 -0.04663977
1990.17 -0.17 -0.079946411 -0.04737239
1990.25 -0.01 -0.013474499 -0.045508998
1990.33 0 0.037042589 -0.039230608
1990.42 0.03 -0.004440867 -0.027719063
1990.5 -0.12 -0.03026285 -0.011184775
1990.58 0.01 -0.001532182 0.009267084
1990.67 -0.03 0.006704662 0.031879289
1990.75 0.1 0.046872033 0.054535321
1990.83 0.19 0.109078801 0.07505149
1990.92 0.15 0.090721899 0.091434591
1991 0.2 0.088070493 0.102089649
1991.08 0.2 0.134378689 0.105975862
1991.17 0.16 0.070874825 0.102714012
1991.25 -0.01 0.004049484 0.092645485
1991.33 0.14 0.092924141 0.076835079
1991.42 0.29 0.138680077 0.057002678
1991.5 0.13 0.072096464 0.035368049
1991.58 0.05 0.059697961 0.014402099
1991.67 0.1 0.01501111 -0.003503232
1991.75 -0.32 -0.108773023 -0.016409601
1991.83 -0.18 -0.104887008 -0.023191172
1991.92 -0.07 -0.028557026 -0.023803456
1992 -0.13 -0.036729023 -0.019389822
1992.08 0.03 0.002028652 -0.012174714
1992.17 0.13 0.085877716 -0.005131198
1992.25 0.1 0.054799812 -0.001459142
1992.33 -0.02 0.004866796 -0.003958139
1992.42 0.01 -0.006639456 -0.014414004
1992.5 -0.27 -0.089520515 -0.033129028
1992.58 -0.22 -0.146432677 -0.058708706
1992.67 -0.3 -0.10162643 -0.088172561
1992.75 -0.17 -0.101854461 -0.117392186
1992.83 -0.3 -0.129648732 -0.141789264
1992.92 -0.21 -0.102411232 -0.157166107
1993 -0.3 -0.132959943 -0.160505791
1993.08 -0.39 -0.199477742 -0.150577771
1993.17 -0.35 -0.153005361 -0.128219838
1993.25 -0.13 -0.063678281 -0.096231914
1993.33 -0.05 -0.018274402 -0.058897664
1993.42 0.02 0.02699991 -0.021228369
1993.5 0.04 0.020199025 0.011917901
1993.58 -0.15 -0.064503633 0.036663629
1993.67 -0.19 -0.078251665 0.050721728
1993.75 0.08 0.030667082 0.053674226
1993.83 0.2 0.133240695 0.046880584
1993.92 0.32 0.141768093 0.033057851
1994 0.08 0.072994946 0.015643959
1994.08 -0.01 -0.01458152 -0.001903554
1994.17 -0.16 -0.06164302 -0.016717269
1994.25 -0.13 -0.056992449 -0.02691508
1994.33 -0.09 -0.044898105 -0.031718716
1994.42 -0.16 -0.054019932 -0.03130616
1994.5 -0.05 -0.041857399 -0.026474364
1994.58 -0.07 0.004466785 -0.018231189
1994.67 0.04 -0.002332096 -0.007444326
1994.75 -0.15 -0.041987995 0.005351446
1994.83 0.07 0.030366654 0.019941017
1994.92 0.26 0.140545623 0.036228281
1995 0.18 0.104307787 0.053926891
1995.08 0.05 0.023679644 0.072258818
1995.17 0.05 0.048339892 0.089780967
1995.25 0.2 0.084860144 0.104428263
1995.33 0.07 0.078988053 0.113801713
1995.42 0.26 0.096150372 0.115657424
1995.5 0.14 0.11597382 0.108486937
1995.58 0.25 0.114322758 0.092038858
1995.67 0.22 0.115184883 0.067629287
1995.75 0.12 0.086816777 0.038127084
1995.83 0.04 0.010913884 0.007572071
1995.92 -0.17 -0.071150561 -0.019526478
1996 -0.24 -0.103667472 -0.039079173
1996.08 -0.09 -0.047335732 -0.048303493
1996.17 0 0.011410818 -0.046402321
1996.25 -0.09 -0.0267085 -0.034860821
1996.33 -0.08 -0.049557025 -0.017264639
1996.42 -0.04 0.008550891 0.001354073
1996.5 0.05 0.014922697 0.015528932
1996.58 -0.07 -0.00792167 0.020529729
1996.67 0.1 0.040345949 0.013446565
1996.75 0.05 0.046189761 -0.006047569
1996.83 -0.09 -0.034926517 -0.03542911
1996.92 -0.16 -0.089818784 -0.069623018
1997 -0.32 -0.125100082 -0.101821654
1997.08 -0.29 -0.152346654 -0.124666935
1997.17 -0.27 -0.136624874 -0.131563414
1997.25 -0.38 -0.142797947 -0.117864634
1997.33 -0.15 -0.120512092 -0.081705216
1997.42 -0.04 0.040153405 -0.024329202
1997.5 0.42 0.180297304 0.050127504
1997.58 0.31 0.190158231 0.135330792
1997.67 0.43 0.201554277 0.223756878
1997.75 0.36 0.208525415 0.307803117
1997.83 0.42 0.207542664 0.380786687
1997.92 0.73 0.366317232 0.437661898
1998 1.08 0.575609721 0.475365531
1998.08 1.25 0.598366938 0.492790955
1998.17 1.04 0.554248257 0.490472911
1998.25 1.09 0.549054854 0.470116946
1998.33 0.91 0.449945986 0.434119762
1998.42 0.54 0.296215288 0.385199413
1998.5 0.45 0.226724119 0.326197782
1998.58 0.41 0.204014524 0.260049682
1998.67 0.3 0.180487157 0.189853404
1998.75 0.35 0.15735462 0.118943374
1998.83 0.14 0.103784813 0.050866244
1998.92 0.07 0.021387215 -0.010803473
1999 -0.18 -0.063537266 -0.062813738
1999.08 -0.22 -0.112132688 -0.102692123
1999.17 -0.22 -0.099708867 -0.129199203
1999.25 -0.26 -0.11654271 -0.142645807
1999.33 -0.39 -0.191000579 -0.144957913
1999.42 -0.39 -0.187892081 -0.139425235
1999.5 -0.28 -0.124472968 -0.130145837
1999.58 -0.26 -0.126801274 -0.121259946
1999.67 -0.28 -0.133857277 -0.116129921
1999.75 -0.24 -0.10247343 -0.116651635
1999.83 -0.21 -0.102803097 -0.122865577
1999.92 -0.22 -0.105692672 -0.132975102
2000 -0.28 -0.117875658 -0.143787559
2000.08 -0.37 -0.187849037 -0.151492915
2000.17 -0.39 -0.188177147 -0.152609085
2000.25 -0.24 -0.096551078 -0.144875467
2000.33 -0.17 -0.091470219 -0.127879789
2000.42 -0.29 -0.135640578 -0.103259563
2000.5 -0.26 -0.105194511 -0.074416317
2000.58 -0.13 -0.077247938 -0.045796107
2000.67 -0.15 -0.056582623 -0.021895839
2000.75 0.02 0.022737488 -0.006225574
2000.83 0.12 0.0656018 -0.000473807
2000.92 0.07 0.024547777 -0.004080314
2001 -0.18 -0.033309872 -0.014328168
2001.08 -0.05 -0.076990522 -0.02694441
2001.17 -0.2 -0.047699548 -0.03707665
2001.25 0.04 0.021113789 -0.040420033
2001.33 -0.02 -0.023297894 -0.034228931
2001.42 -0.23 -0.083232133 -0.017970812
2001.5 -0.01 0.006897479 0.006538164
2001.58 0.21 0.076856806 0.035554224
2001.67 -0.06 0.032608399 0.064288908
2001.75 0.2 0.060970662 0.088046707
2001.83 0.21 0.142600301 0.103346361
2001.92 0.26 0.123623769 0.108755393
2002 0.12 0.080455154 0.105233374
2002.08 0.19 0.092327213 0.095895582
2002.17 0.17 0.09469414 0.085247749
2002.25 0.1 0.079235608 0.078071129
2002.33 0.24 0.086374909 0.078224126
2002.42 0.12 0.117479715 0.087650528
2002.5 0.34 0.142222547 0.105838633
2002.58 0.19 0.118509976 0.129870272
2002.67 0.14 0.079350038 0.15505861
2002.75 0.19 0.099018491 0.176031854
2002.83 0.3 0.162100956 0.188010627
2002.92 0.45 0.225339891 0.187976332
2003 0.46 0.259352804 0.175449034
2003.08 0.48 0.218289399 0.15268201
2003.17 0.22 0.142112591 0.124215697
2003.25 0.17 0.095537426 0.095884616
2003.33 0.13 0.041274088 0.073501372
2003.42 -0.11 0.001273274 0.061520932
2003.5 0.13 0.036083502 0.061997103
2003.58 0.09 0.062904575 0.074078775
2003.67 0.07 0.061566385 0.094170796
2003.75 0.29 0.125182587 0.116731567
2003.83 0.31 0.183475415 0.13553208
2003.92 0.31 0.161481827 0.14509345
2004 0.32 0.155415837 0.141976958
2004.08 0.27 0.164552188 0.1256322
2004.17 0.27 0.121178996 0.098607825
2004.25 0.09 0.072859339 0.066072608
2004.33 0.03 0.013857437 0.034749397
2004.42 -0.13 -0.064491272 0.011495386
2004.5 -0.18 -0.050778639 0.001839769
2004.58 0.12 0.021097038 0.008797107
2004.67 -0.05 0.032304022 0.032211004
2004.75 0.18 0.064980974 0.0687634
2004.83 0.26 0.15214421 0.112637767
2004.92 0.35 0.180039879 0.156683149
2005 0.37 0.204132685 0.193821856
2005.08 0.62 0.291234476 0.218398814
2005.17 0.49 0.292656441 0.227193289
2005.25 0.39 0.168743346 0.219896666
2005.33 0.12 0.084883558 0.198982487
2005.42 0.17 0.099370989 0.169028269
2005.5 0.28 0.127816237 0.135662655
2005.58 0.16 0.114743146 0.104381719
2005.67 0.21 0.087724971 0.079491302
2005.75 0.13 0.097313313 0.063388207
2005.83 0.25 0.115383197 0.056305015
2005.92 0.17 0.099056248 0.056533469
2006 0.15 0.097887502 0.061035947
2006.08 0.27 0.11039506 0.066276891
2006.17 -0.01 0.044404628 0.069071962
2006.25 -0.08 -0.063949318 0.067266729
2006.33 -0.2 -0.077127023 0.060112133
2006.42 0.02 0.021245698 0.048284875
2006.5 0.25 0.122827269 0.033586499
2006.58 0.21 0.125823781 0.018425021
2006.67 0.1 0.043815494 0.005222204
Thank you W.
I did my PhD in a signal processing laboratory and have used it professionally for much of my career.
In a nutshell, the cross correlation function is the inverse Fourier Transform (or discrete inverse FT if one is using sampled data) of the cross-power spectrum. It is easy to show that when you filter a signal and perform a correlation, you are multiplying the cross-spectrum by the squared amplitude response of the filter. This has the obvious effect of introducing serial correlation into the inverse transformed record because it constrains the frequency of the correlation function. The ACF of an ideal random signal is an impulse at T=0, because it has a uniform power spectrum. When high frequencies in the power spectrum are eliminated, low frequency oscillations are generated in the correlation function after inverse transformation, i.e.: serial correlation is generated. The degree of additional correlation is calculable from the filter frequency response.
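The relationship can be checked numerically. The sketch below is only an illustration under simplifying assumptions: it uses circular (periodic) filtering and correlation so that the discrete Fourier identities hold exactly, and the random signals and the 3-point kernel are arbitrary choices of mine. It compares the cross-correlation of two filtered signals, computed directly in the time domain, with the inverse transform of the raw cross-spectrum multiplied by the squared amplitude response of the filter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Symmetric 3-point smoothing kernel, centred on index 0 (circular layout).
h = np.zeros(n)
h[0], h[1], h[-1] = 1.0, 0.5, 0.5

H = np.fft.fft(h)
X = np.fft.fft(x)
Y = np.fft.fft(y)

# Filter both signals (circular convolution via the DFT).
xf = np.fft.ifft(X * H).real
yf = np.fft.ifft(Y * H).real

# Circular cross-correlation of the filtered signals, time domain:
# r[k] = sum_n xf[n] * yf[(n + k) mod N]
direct = np.array([np.dot(xf, np.roll(yf, -k)) for k in range(n)])

# Same quantity from the raw cross-spectrum times |H|^2.
via_spectrum = np.fft.ifft(np.abs(H) ** 2 * np.conj(X) * Y).real

print(np.allclose(direct, via_spectrum))
```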
To put this another way, the filtered signals are the result of a convolution between the raw signals and the impulse response of the filter. In a purely random signal, by definition, each sample is uncorrelated with every other. If one imagines a 3 point, non recursive, filter with an impulse response of 0.5, 1.0, 0.5 it is clear that any sample in the output is dependent on the original sample and its neighbours, so introducing serial correlation into the signal.
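For the 3 point filter just described, the induced serial correlation can be worked out exactly: the autocorrelation of filtered white noise is the kernel correlated with itself, which gives 2/3 at lag 1 and 1/6 at lag 2. A short sketch (NumPy assumed; the sample size and seed are arbitrary) compares that with a simulated estimate.

```python
import numpy as np

h = np.array([0.5, 1.0, 0.5])   # impulse response from the text

# Theoretical autocorrelation of filtered white noise, lags -2..2.
acf = np.correlate(h, h, mode="full")
acf = acf / acf.max()           # [1/6, 2/3, 1, 2/3, 1/6]

# Empirical check on simulated white noise.
rng = np.random.default_rng(3)
white = rng.standard_normal(200000)
filtered = np.convolve(white, h, mode="valid")

r1_raw = np.corrcoef(white[:-1], white[1:])[0, 1]            # close to 0
r1_filtered = np.corrcoef(filtered[:-1], filtered[1:])[0, 1]  # close to 2/3
print(acf, round(r1_raw, 3), round(r1_filtered, 3))
```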
I agree that “R” allows one to do quite complex calculations with the minimum of effort. What is important, when using such packages, is that one understands the results one has obtained.
I’m sorry that I omitted one important aspect of your question that relates to basic signal processing. A real sampled record, as opposed to an analytical record, has very important features that relate to its Fourier Transform. The Fourier Transform of a real sequence does not exist. The reason for this is that the limits of integration of the FT extend between + and – infinity and therefore one has to assume that a real signal is one of infinite duration multiplied by a finite length window. This is important because it shows that a real record can only be represented by a discrete Fourier Series, not its transform.
While this might seem an abstract point, it is actually very important because all correlations can be calculated via (discrete) Fourier Transforms and this allows analysis of what happens at the end of signals. Given that one has a record, one can establish the discrete Fourier Series of that record, which allows efficient filtering and correlation. The problem is that one is actually creating an infinite series that repeats with a period determined by the length of the initial record. In principle, one can calculate the first point in the repeat of the record from the last point in the record by a Taylor’s series expansion, but in practice, unless the signal in question is an exact harmonic of the sampling record length, one will not be able to do so. This means that the signal is discontinuous at its ends. In practice, this is overcome by using a tapering “window” to remove the discontinuity and also by extending the record by at least its length with zeros. This gives an approximation to the underlying signal but allows consistent manipulations such as filtering or correlation.
Thus the problem of start-up and finishing transients is well recognised and leads to the rule of thumb that one should only use, at most, one third of a record to establish correlation.
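A minimal sketch of the zero-padding step, assuming NumPy: padding each record to twice its length before transforming stops the implied periodic repeat from wrapping around, so the FFT result matches the ordinary (linear) lagged sums. The function name `xcorr_fft` is hypothetical, and the optional Hann taper is just one common choice of window; the comment above does not name a specific one.

```python
import numpy as np

def xcorr_fft(x, y, taper=False):
    """Cross-correlation sum_n x[n] * y[n + k] via zero-padded FFTs.

    Returns (lags, r) for k = -(n - 1) .. (n - 1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if taper:
        w = np.hanning(n)        # taper to soften the end discontinuity
        x, y = x * w, y * w
    nfft = 2 * n                 # extend the record with zeros to twice its length
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    r = np.fft.irfft(np.conj(X) * Y, nfft)
    lags = np.arange(-(n - 1), n)
    return lags, r[lags % nfft]  # negative lags live at the end of r

rng = np.random.default_rng(4)
x = rng.standard_normal(64)
y = rng.standard_normal(64)
lags, r = xcorr_fft(x, y)

# Direct time-domain sums for comparison (no wrap-around).
n = len(x)
direct = np.array([
    np.dot(x[: n - k], y[k:]) if k >= 0 else np.dot(x[-k:], y[: n + k])
    for k in lags
])
print(np.allclose(r, direct))
```

Note that the number of overlapping samples shrinks as the lag grows, which is one way to see why only short lags relative to the record length are trustworthy.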
Phil –
The data is Mauna Loa. Because the base plot is annual increments, the seasonal movement is essentially irrelevant. No one is suggesting that all of the CO2 movement is caused solely by temperature change, simply that temperature change has a manifest influence on the CO2 level, and leads it by several months. There is no attempt to hide anything by changing the scales, which in any case has no bearing on the above point.
Cheers,
re running mean: Allan’s paper does not explicitly say how he did the averaging but does not use the word gaussian anywhere. It does explicitly mention “all Running Means” , “no Running Means” in all graph titles. So I have to assume that is clearly what he was using.
I was confused in my initial comment by Willis referring to a 12 month average when apparently this meant a gaussian FWHM (two sigma width) 12m filter which as a 3 sigma filter would need a 36m window (correcting my earlier 72m which I’d calculated for 12m sigma not 12m FWHM). I’d already acknowledged that confusion so no need to continue on that.
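For reference, the exact conversion for a Gaussian is FWHM = 2·sqrt(2·ln 2)·sigma, or about 2.355 sigma, so "two sigma" is a rough approximation. A small sketch of the arithmetic follows; the helper name is hypothetical and the plus or minus 3 sigma window is the convention used in the comment above.

```python
import math

def fwhm_to_sigma(fwhm):
    """Gaussian standard deviation for a given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma = fwhm_to_sigma(12.0)   # about 5.1 months for a 12-month FWHM
window = 2 * 3 * sigma        # +/- 3 sigma window, about 30.6 months
print(round(sigma, 2), round(window, 1))
```

With the exact factor the window comes out a little shorter than the 36 months obtained from the two sigma approximation.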
http://climateaudit.org/2008/02/12/data-smoothing-and-spurious-correlation/#comment-136804
Allan MacRae:
LT data: http://www.atmos.uah.edu/data/msu/t2lt/tltglhmam_5.2
ST data: http://www.cru.uea.ac.uk/cru/data/temperature/hadcrut3gl.txt
The main problem I see here (aside from the use of RM) is that both these are “anomaly” data sets. That means they have had the annual “seasonal” pattern from some period (usually 1960-90 or similar) removed.
hadCRUT* has hadSST* as the major shareholder. The convoluted processing used to calculate the “climatology” that is subtracted to create the “anomaly” makes notable changes to frequency content as I showed here and discussed with John Kennedy in comments:
http://judithcurry.com/2012/03/15/on-the-adjustments-to-the-hadsst3-data-set-2
That would support Allan’s comments about the reliability of surface temps in the CA thread.
Surely this sort of thing needs to be done with _actual_ temperatures and data with the least amount of processing.
I would suggest that untampered ICOADS SST data may be a better choice if looking for short term correlations.
The final comment by Ray Tomes at CA, that the 9m may be the phase shift of differentiation, seems pertinent. It would seem from Allan’s paper (I also looked at this relationship a few years back too) that it is d/dt (CO2) that is affected by temperature rather than CO2 level. For CO2 conc there is a short term correlation with lag but a strong underlying rise.
Allan has detrended CO2 conc which is a crude high-pass filter. This makes the short term correlation visible in CO2.
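The phase-shift argument is easy to illustrate with a pure sinusoid: its rate of change leads the level by a quarter of a cycle. In the sketch below the 36-month period is purely an assumption, chosen so that a quarter cycle equals nine months; nothing in the thread states the period of the oscillation in the real data.

```python
import numpy as np

period = 36                      # months; assumed so that period / 4 = 9
t = np.arange(360)
level = np.sin(2 * np.pi * t / period)
rate = np.gradient(level)        # centred differences, so no half-step offset

# Trim the one-sided end points of np.gradient before correlating.
level, rate = level[1:-1], rate[1:-1]
n = len(level)

# Correlate rate[t] with level[t + k]: how far does the level trail the rate?
lags = np.arange(0, 19)
corrs = [np.corrcoef(rate[: n - k], level[k:])[0, 1] for k in lags]
best_lag = int(lags[int(np.argmax(corrs))])
print(best_lag, period // 4)
```

So if the rate of change of one quantity tracks another quantity, and both are dominated by a cycle of roughly three years, a lag of about nine months between the levels would follow from the differentiation alone.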
There is a fairly obvious oscillatory component as Ray points out and that explains the lag.
Out-gassing is caused by the temperature deviation from the current “equilibrium” state of the water that supports the current absorbed concentration.
The strong similarity that Allan shows in ST and d/dt CO2 is quite striking and seems to account for a large part of the variation in the two. This would suggest that the dominant factor is oceanic out-gassing. At least on the sub-decadal time-scale studied.
It does not support the idea of CO2 causing temp change. In essence, it seems Allan’s conclusions are basically correct.
Bill Illis –
As in my answer to Phil, I’m taking the percentage 12-month change in Mauna Loa CO2 ppm, not seasonally adjusted. There is no obvious reason to believe those numbers should have a seasonal signal. The temperature data is the 12 month change in the HADCRUT3 global temperature anomaly. The key point is that if you take incremental CO2, and smooth it, and incremental temperature, and smooth it (both by taking 12 month averages), I contend that the fact that Temperature leads CO2 is manifest. Not that it’s the only or even major cause of change, but it is a major driver of the volatility in CO2 changes.
This isn’t an environmentalist scam. In fact, I suspect that the relationship, being the reverse of that expected for ‘global warming’, is giving some a propaganda-hernia. The only thing I would comment is that this clearly points to a positive feedback on CO2 itself, rather than temperature.
BTW, apologies if you followed my link early on, when the graph was devoid of scales and titles. Here it is again, for other completer-finisher types:
http://www.robles-thome.talktalk.net/carbontemp.pdf
I cannot see how Figure 3, as it stands, can be correct in the general case. Clearly there will be a random phase shift in the cross-spectrum which will be exaggerated with low pass filtering. However, there should be a systematic time delay in the cross-correlation between two random signals. The mean time shift should be 0.
Is Figure 3 the mean of many trials? If so, I suspect that the results have been miscalculated. If it is a single trial, it shows one case of a time shift with one particular dataset and does not show a general case.
This has implications for the statistics of the phase shift. If the statistics of the underlying process are known, the limits of the time shift are calculable.
This is a manifestation of a fundamental problem in climate statistics. We only have 1 record of temperature and any other variable. Normally, when one uses the statistical methods discussed here, one uses ensemble statistics – which is not possible. However, if one is looking for short delays only, one can segment the record, obtain the ensemble power spectra and hence perform the correlations. In this case, only a modest increase in accuracy will be obtained, but one will get a better idea of the variability.
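One possible reading of the segmenting idea, sketched under my own assumptions (equal-length non-overlapping segments, mean removal per segment, zero-padding to twice the segment length; the function name `ensemble_xcorr` is hypothetical): compute a cross-spectrum for each segment, average them, and invert to get an ensemble cross-covariance at short lags, along with the segment-to-segment spread.

```python
import numpy as np

def ensemble_xcorr(x, y, n_segments, max_lag):
    """Average cross-spectra over segments, then invert to short-lag covariances.

    Returns (lags, mean_cov, spread), where spread is the standard deviation
    of the per-segment cross-covariances at each lag.
    """
    xs = np.array_split(np.asarray(x, dtype=float), n_segments)
    ys = np.array_split(np.asarray(y, dtype=float), n_segments)
    seg_len = min(len(s) for s in xs)
    nfft = 2 * seg_len                         # zero-pad to avoid wrap-around
    spectra = []
    for sx, sy in zip(xs, ys):
        sx = sx[:seg_len] - sx[:seg_len].mean()
        sy = sy[:seg_len] - sy[:seg_len].mean()
        spectra.append(np.conj(np.fft.rfft(sx, nfft)) * np.fft.rfft(sy, nfft))
    spectra = np.array(spectra)
    per_segment = np.fft.irfft(spectra, nfft, axis=1)
    mean_cov = np.fft.irfft(spectra.mean(axis=0), nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    idx = lags % nfft                          # negative lags sit at the end
    return lags, mean_cov[idx], per_segment[:, idx].std(axis=0)

# Synthetic check (illustrative data only): y follows x by 2 steps.
rng = np.random.default_rng(5)
x = rng.standard_normal(1200)
y = np.roll(x, 2)
lags, mean_cov, spread = ensemble_xcorr(x, y, n_segments=4, max_lag=12)
best_lag = int(lags[int(np.argmax(mean_cov))])
print(best_lag)
```

The `spread` array is the point of the exercise: with only one record, the scatter between segments is about the only available hint of how variable the estimated lag structure is.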
I am sorry, Mr Eschenbach, if my initial comment upset you. However, I do think that this post shows signal processing or statistical skill. If, as you request, you want me to show how to do this better, I would be delighted to write a post on signal correlation from a signal processor’s perspective.
Sorry, that should read in line 3: “there should NOT be a systematic time delay”
RCS: “I would be delighted to write a post on signal correlation from a signal processor’s perspective.”
That would be very valuable.
I’m sure Anthony would be willing to publish something like that.
@Greg Goodman Sure, send it along.
OK, If Anthony would like me to, I can write a post about how correlation, convolution, Fourier transforms, filtering and degrees of freedom etc fit together.
@RCS sure, go for it. Write it in MS Word with embedded graphs, and then leave a note in “submit a story” and I’ll send you an email to send the entire essay to.
Here’s a plot I did in 2010 investigating similarly to Allan MacRae:
http://climategrog.wordpress.com/?attachment_id=207
my gauss3 is a gaussian with sigma=3 months, equivalent of FWHM=6 months in Willis’ notation.
Note the phase lags 0.05 year that I used to best align ERSST with the other two. Y-axis shifts of 0.23 and 0.26 are immaterial.
Volcano dates are not supposed to prove anything; it was an exploratory plot and I wanted to see whether anything was visible.
The fact that CO2 significantly steps away from SST around 1987 is interesting. Having closely matched beforehand, this change merits a closer look.
PS diff3 was differential (rate of change) of CO2. Taken over three points.
Anthony Watts says:
April 1, 2013 at 9:23 am
“@RCS sure, go for it. Write it in MS Word with embedded graphs,”
This is what sets thinking sceptics apart from the “committed”. Sceptics roll up their sleeves and, using sharp tools, take apart the facile creations of the “decided” folks. This stuff would be censored in Real Climate and like quarters, robbing them of the magnificent education available. When one only lets ideas in that agree with one’s own, there may be a lot of comfort but there’s zero education. I suspect that WUWT, CA, and other analytical sceptic blogs get frequent quiet visits by the hockey team. Steve McIntyre notes that CA was not referenced in the turnaround taken by Marcott et al. This harkens back to Gergis et al. After being demolished by CA, Gergis makes an “independent” discovery of his paper’s terminal deficiencies the next day – no “thank you Steve” forthcoming. The surfacestations project set off a round of station closures, new deployments, and a snarky, premature paper attempting to marginalize the project. Oh, there is no thanks from that quarter. Worse, the consensus’s mediocre “work” has cost us a trillion or two and sceptics who have brought real science to the subject are basically unfunded.
“…and sceptics who have brought real science to the subject are basically unfunded.”
You’re kidding, right? Hasn’t Heartland forwarded your regular check from Koch Bros and Exxon Mobil this month? I know I got mine. 😉
Matt Briggs says:
“Two broad hypotheses are advanced: (Hypothesis 1) As more CO2 is added to the air, through radiative effects, the temperature later rises; and (Hypothesis 2) As temperature increases, through ocean-chemical and biological effects, CO2 is later added to the atmosphere.”
“All I am confident of saying is, conditional on this data and its limitations etc., that Hypothesis 2 is more probable than Hypothesis 1, but I won’t say how much more probable.”
Willis Eschenbach says: March 31, 2013 at 11:26 pm
I would hardly call that a ringing endorsement of the idea that temperature changes cause CO2 changes 9 months later.
Hi Willis,
I think I must have been smarter when I did this work – now, it just makes my head hurt. 🙂
As I recall, Matt Briggs’ conclusion is limited by his analytical method, which only examined integer multiples of 1-year lags. He then concluded that “CO2 lags temperature” is more probable than “temperature lags CO2”. I recall he found the best indication of this probability at a one-year lag.
This is all from memory and may all be crap – it’s late and I’ve been working all day.
Best personal regards, Allan
Greg Goodman says: April 1, 2013 at 3:19 am
re running mean: Allan’s paper does not explicitly say how he did the averaging but does not use the word gaussian anywhere. It does explicitly mention “all Running Means” , “no Running Means” in all graph titles. So I have to assume that is clearly what he was using.
Hi Greg,
There should be a spreadsheet of all calcs available for download at
http://icecap.us/index.php/go/joes-blog/carbon_dioxide_in_not_the_primary_cause_of_global_warming_the_future_can_no
Try
http://icecap.us/images/uploads/CO2vsTMacRaeFig5b.xls
Best, Allan
Thanks, Allan. As I mentioned, I’m still not clear if what you found was real or not, nor was that my point. I just wanted to show the difficulties with the smoothing of data before you do further analyses.
Best to you,
w.
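The difficulty with smoothing before correlating can be illustrated with pure noise. A minimal sketch, assuming two independent white-noise series and a plain 12-point running mean (both are illustrative assumptions, not the datasets or the Gaussian filter used in the post): it compares the largest cross-correlation found over lags of ±24 months before and after smoothing, averaged over repeated trials.

```python
import numpy as np


def running_mean(x, width=12):
    """Plain boxcar average; only points where the full window fits."""
    return np.convolve(x, np.ones(width) / width, mode="valid")


def lagged_corr(x, y, lag):
    """Correlation of x[t] with y[t + lag] for equal-length series."""
    n = len(x)
    if lag >= 0:
        a, b = x[: n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: n + lag]
    return np.corrcoef(a, b)[0, 1]


def peak_abs_corr(x, y, max_lag=24):
    """Largest |correlation| found over lags -max_lag..max_lag."""
    return max(abs(lagged_corr(x, y, k)) for k in range(-max_lag, max_lag + 1))


def average_peaks(trials=50, n=300, width=12, seed=0):
    """Mean peak |r| for unrelated noise, raw versus smoothed."""
    rng = np.random.default_rng(seed)
    raw, smooth = [], []
    for _ in range(trials):
        x = rng.standard_normal(n)  # independent of y by construction
        y = rng.standard_normal(n)
        raw.append(peak_abs_corr(x, y))
        smooth.append(
            peak_abs_corr(running_mean(x, width), running_mean(y, width))
        )
    return float(np.mean(raw)), float(np.mean(smooth))


raw_peak, smooth_peak = average_peaks()
print(f"mean peak |r|  raw: {raw_peak:.2f}  smoothed: {smooth_peak:.2f}")
```

Since the two series are unrelated by construction, any peak that appears after smoothing is spurious. The sketch says nothing about whether a particular real-world correlation is genuine.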
Hi Willis, a Mathematica news tag led me here. It’s a very powerful system, and worth the money (for a home license if you’re not being bankrolled!). Did you know that the latest version has R integration? So you can run R code and display results in Mathematica? That way you needn’t throw away all that hard earned knowledge of yours.
Allan MacRae says “As I recall, Matt Briggs’ conclusion is limited by his analytical method, which only examined integer multiples of 1-year lags. He then concluded that “CO2 lags temperature” is more probable than “temperature lags CO2”. I recall he found the best indication of this probability at a one-year lag.
This is all from memory and may all be crap … ”
Just looking at unsmoothed monthly data, it’s pretty clear that CO2 lags temperature:
http://members.westnet.com.au/jonas1/deltaCO2vsTemp.JPG
The graph shows y-o-y CO2 change and temperature (NB: scaled), with the CO2 change shifted back in time by 6 months, i.e., CO2 changes usually lag temperature by 6-12 months.
But that’s not the whole story. The annual changes in temperature are high enough to make the resultant CO2 change show up above the noise, and that’s visible in the graph. But percentage-wise, CO2 from fossil fuels doesn’t vary much y-o-y. For its effect on CO2, and for CO2’s effect on temperature, you have to look elsewhere, or look differently.
Pete says:
April 3, 2013 at 12:55 am
Hey, Pete, thanks. I have Mathematica and I do use it. I can program it in two of the four languages it understands … but like I said, the learning curve is so steep it gives me nosebleed.
Plus (and more important) it’s way expensive for your average guy, I got mine from my work. But if I do the same work in R, anyone can replicate it.
Thanks for the tip about displaying R results in Mathematica … but since an upgrade to do that will likely be more money than I’m willing to spend, I’ll just putter along.
w.
Mike Jonas says:
April 3, 2013 at 2:37 am
Thanks, Mike. Unfortunately, you’re just looking at an artifact created by comparing today’s temperature to a 12-month change in temperature. To see the effect, try graphing the temperature versus the 12 month change in the temperature in the same manner …
w.
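The suggested check can be tried on synthetic data first. A rough sketch, assuming a stand-in "temperature" built as an AR(1) process with an arbitrary coefficient of 0.9 (an assumption for illustration, not fitted to any temperature record): it correlates the series with its own 12-month change, and then with the 12-month change of an unrelated series built the same way.

```python
import numpy as np


def ar1(n, phi=0.9, seed=1):
    """Synthetic autocorrelated series; phi = 0.9 is an arbitrary choice."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x


def twelve_month_change(x, span=12):
    """x[t] - x[t - span], defined from index span onward."""
    return x[span:] - x[:-span]


temp = ar1(5000)
d12 = twelve_month_change(temp)
level = temp[12:]  # today's value, aligned with today's 12-month change

# A series compared with its own 12-month change.
r_self = np.corrcoef(level, d12)[0, 1]

# An unrelated series compared with the same 12-month change.
other = ar1(5000, seed=2)
r_other = np.corrcoef(other[12:], d12)[0, 1]

print(f"series vs its own 12-month change: r = {r_self:.2f}")
print(f"unrelated series vs that change:   r = {r_other:.2f}")
```

A series correlates with its own 12-month change simply because today's value is one of the two terms in the difference. Whether that accounts for the CO2 graph is a separate question that the sketch does not settle.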
Willis says “Unfortunately, you’re just looking at an artifact created by comparing today’s temperature to a 12-month change in temperature. To see the effect, try graphing the temperature versus the 12 month change in the temperature in the same manner …”
Presumably you mean that by looking at 12-month change in CO2 I am effectively looking at 12-month change in temperature. I disagree, and the reason is very important to this whole temperature-CO2 thing…..
The temperature-CO2 relationship is a bit surprising. When I first started looking at temperature and CO2 data, I rather naturally plotted temperature change against CO2 change. Then Frank Lansner pointed to the correlation between CO2 change and temperature (not temperature change). It took me a long time before the penny dropped …
… as I explained in an earlier comment (http://wattsupwiththat.com/2013/03/30/the-pitfalls-of-data-smoothing/#comment-1262333), “Other possible problems with the paper are [] the use of annual change in temperature, when actual temperature would be more relevant. The point is that the rate of atmospheric CO2 absorption or emission by the ocean is driven by temperature, not by how much the temperature differs from last year’s temperature.”
In simple terms, the rather small CO2 change caused by last year’s temperature has no effect on the subsequent rate of CO2 absorption/emission by the ocean. In other words, today’s temperature drives the rate of CO2 absorption/emission by the ocean, unaffected by anything that last year’s temperature did. The y-o-y change in CO2 reflects that rate.
My argument only applies in a world in which a relatively steady stream of man-made CO2 is maintaining a high ocean-atmosphere imbalance. Without the man-made CO2, the relationship between CO2 change and temperature would be weaker, and the relationship between CO2 change and temperature change would be stronger.
It’s counter-intuitive, but by plotting temperature, not temperature change, against CO2 change, I’m matching the relevant two variables.
So it isn’t an artefact. And as I pointed out in the same comment, y-o-y CO2 changes relate to man-made CO2 emissions, not to changes in man-made emissions.
Hi again Willis,
I re-examined the plots in the Excel spreadsheet at
http://icecap.us/images/uploads/CO2vsTMacRaeFig5b.xls
and recall that I ran Figures 5, 6, 7, and 8 all without running means, and added Figure 5b (to address your question at that time).
I think that without running means, the stated relationship between dCO2/dt and Temperature and the 9-month lag of CO2 after Temperature still hold true, although they are less beautiful than with running means.
Anyone who wants to review the math can do so through the spreadsheet.
I do not think the relationship that I alleged to exist is an artifact of the mathematical technique, although I obviously welcome anyone using a better approach. The challenge is to deal with the seasonal “sawtooth” oscillation in the CO2 data in a manner that is valid.
Of course, the temperature data is itself not actually temperature, but is a temperature anomaly created in order to deal with another seasonal oscillation.
Regarding the suggestion from several parties that fossil fuel emissions drive CO2, I did considerable work on this premise at the time and could not find the relationships that others allege to exist. The so-called “mass balance” argument has been ongoing between Richard Courtney and Ferdinand Engelbeen for over a decade. I lean towards Richard’s view, although I respect both gentlemen.
I have also participated in this debate. I concluded at the time, based on limited Salt Lake City data, that it appeared that manmade CO2 emissions were captured close to the source at that locality, at least during the growing season. The absence of any “rush-hour” spikes in CO2 concentrations was surprising.
Best, Allan
Hi Willis and Allan Macrae
I looked at the auto-correlation of dCO2 in the linked data. As well as the obvious peaks at 0, +/- 12, +/- 24, … months, there are lesser but significant peaks at +/- 3, +/- 15, +/- 27, … and at +/- 9, +/- 21, …
I then looked at the monthly averages of dCO2 (ie current CO2 level – previous month’s CO2 level). The figures for January and October are literally hundreds of times greater than those for the other months (essentially the figures for those 2 months are always positive and “big” while the other months are smaller and, more importantly, vary in sign). The 9 month gap between January and October obviously accounts for the other peaks in the auto-correlation of dCO2.
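A quick way to see how two large months nine apart produce that pattern is a synthetic series. A minimal sketch, assuming a made-up monthly-change series with small noise everywhere and a large positive value added in January and October (the sizes are arbitrary and are not the Mauna Loa figures):

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation at a non-negative integer lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    if lag == 0:
        return 1.0
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def monthly_means(x):
    """Mean for each calendar month, assuming the series starts in January."""
    x = np.asarray(x, dtype=float)
    return np.array([x[m::12].mean() for m in range(12)])


rng = np.random.default_rng(0)
years = 50
month = np.arange(12 * years) % 12          # 0 = January, 9 = October
d = 0.05 * rng.standard_normal(12 * years)  # small changes of either sign
d[(month == 0) | (month == 9)] += 1.0       # two big positive months (assumed)

acf = {k: autocorr(d, k) for k in (3, 6, 9, 12, 15)}
means = monthly_means(d)

for k, r in acf.items():
    print(f"lag {k:2d}: {r:+.2f}")
```

In this toy series, lag 12 lines up both big months with themselves, while lags 3 and 9 each line up one of them with the other, giving the smaller secondary peaks. Lag 6 lines up neither, so it comes out negative.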
I’ve used the data you link to and have been able to replicate your results so I don’t think anything is wrong there.
I wonder whether the original data are correct and, if so, what could be the mechanism which makes net CO2 emissions so much greater in January and October.
BTW, I’m not sure whether the date labels in the data correspond to the beginning or end of the relevant month so “January” may be “December” and “October” may be “September”.
BTBTW, I don’t know whether this 9/3 monthly gap between dCO2 peaks could account for the apparent similarly lagged correlation between dCO2 and dT. At first sight I can’t see how it could, but maybe I’m missing something obvious.
Simon Anthony