Guest Post by Willis Eschenbach
There’s a new paper over at IOP called “Forcing of the wintertime atmospheric circulation by the multidecadal fluctuations of the North Atlantic ocean”, by Y Peings and G Magnusdottir, hereinafter Peings2014. I was particularly interested in a couple of things they discuss in their abstract, which says (emphasis mine):
The North Atlantic sea surface temperature exhibits fluctuations on the multidecadal time scale, a phenomenon known as the Atlantic Multidecadal Oscillation (AMO). This letter demonstrates that the multidecadal fluctuations of the wintertime North Atlantic Oscillation (NAO) are tied to the AMO, with an opposite-signed relationship between the polarities of the AMO and the NAO. Our statistical analyses suggest that the AMO signal precedes the NAO by 10–15 years with an interesting predictability window for decadal forecasting. The AMO footprint is also detected in the multidecadal variability of the intraseasonal weather regimes of the North Atlantic sector. This observational evidence is robust over the entire 20th century and it is supported by numerical experiments with an atmospheric global climate model.
Let me start with their claim that the AMO signal precedes the NAO by 10-15 years. Here's the cross-correlation function for the monthly data, using the full 1856-2012 NOAA AMO and the Hurrell NAO data:
Hmmmm … why am I not finding the relationship between AMO and NAO they discuss? I mean, I see that the largest correlation is at zero, and there is a correlation out 15 years, but it’s all so tiny … what’s the problem?
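For those who want to follow along at home, a cross-correlation like the one above is easy to calculate. Here's a minimal Python sketch, using synthetic monthly series as stand-ins for the actual NOAA AMO and Hurrell NAO data (the function itself is generic, so you can feed it the real numbers):

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Pearson correlation of x with y at each lag.
    A positive lag means x leads y by that many steps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        ccf[lag] = np.corrcoef(a, b)[0, 1]
    return ccf

# Synthetic stand-ins for the monthly AMO and NAO series
rng = np.random.default_rng(0)
amo = rng.standard_normal(1872)         # roughly 156 years of monthly values
nao = rng.standard_normal(1872)
ccf = cross_correlation(amo, nao, 180)  # lags out to +/- 15 years
```

At zero lag this is just the ordinary Pearson correlation; a positive lag means the first series leads the second.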
Well, to start with, they are not using the regular AMO index, nor are they using the full year. Instead, here is their description:
A wintertime AMO index is constructed over the 1870– 2012 period using the HadISST dataset (Rayner et al 2003). The monthly SST anomalies are determined with respect to the 1981–2010 climatology, then the winter AMO index is computed by averaging the monthly SST anomalies over the North Atlantic [75W/5W; 0/70N] from December to March (DJFM). The global anomalies of SST are subtracted in order to remove the global warming trend and the tropical oceans influence, as suggested by Trenberth and Shea (2006). A Lanczos low-pass filter is applied to the time series to remove the high-frequency variability (21 total weights and a threshold of 10 years, with the end points reflected to avoid losing data).
Nor are they using the standard NAO, viz:
A decadal NAO index is computed from the 20th century reanalysis (20CR), which is available over 1871–2010 and is based on the assimilation of surface pressure observations only (Compo et al 2011). We use the station-based formulation based on the Stykkisholmur/Reykjavik and Lisbon anomalous sea-level pressure (SLP) difference (Hurrell et al 2003). The high-frequency fluctuations are removed from the NAO index using the same Lanczos filter as for the AMO index.
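For concreteness, here's a Python sketch of the Lanczos smoothing step they describe (21 weights, 10-year cutoff on the annual winter values, end points reflected). The weights follow the standard textbook Lanczos construction, and the input series is a synthetic stand-in rather than the actual HadISST-based index:

```python
import numpy as np

def lanczos_lowpass_weights(nwts, cutoff):
    """Symmetric Lanczos low-pass filter weights.
    nwts   -- total number of weights (odd, e.g. 21)
    cutoff -- cutoff frequency in cycles per step (1/10 for a
              10-year threshold on annual data)"""
    half = nwts // 2
    k = np.arange(-half, half + 1)
    safe_k = np.where(k == 0, 1, k)               # avoid 0/0 at the center weight
    w = np.where(k == 0, 2.0 * cutoff,
                 np.sin(2.0 * np.pi * cutoff * k) / (np.pi * safe_k))
    w = w * np.sinc(k / half)                     # Lanczos sigma factor
    return w / w.sum()                            # normalize to preserve the mean

rng = np.random.default_rng(0)
series = rng.standard_normal(141)             # stand-in for the 1870-2010 winter index
w = lanczos_lowpass_weights(21, 1 / 10)
padded = np.pad(series, 10, mode="reflect")   # "end points reflected"
smoothed = np.convolve(padded, w, mode="valid")
```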
They are not using the standard AMO, nor the standard NAO, and most importantly, they are using a smoothed subset of the data for calculating the correlations. While using smoothed data is fine for display purposes, it is almost always a Very Bad Idea™ to do statistics and correlations using smoothed data, for reasons discussed below.
In addition, they are not using the full year. Instead, they are using a 4-month subset of the year, DJFM. While there is no inherent problem with doing this, it definitely messes with the statistics. If you want to find a significant correlation using a 4-month subset of the annual data, to achieve a significance level of 0.05 you need to find a four-month chunk with a p-value of one minus the twelfth root of 0.95, or 0.004 …
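That adjustment, a Šidák-style correction for having twelve possible four-month windows to search, is a one-liner to check:

```python
# With twelve possible four-month windows in the year, the per-test
# p-value needed for an overall 0.05 significance level is:
alpha_family = 0.05
n_windows = 12
alpha_per_test = 1 - (1 - alpha_family) ** (1 / n_windows)
print(round(alpha_per_test, 4))   # prints 0.0043
```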
They go on to say that they have taken autocorrelation into account, viz (emphasis mine):
Figure 2 of Peings2014. ORIGINAL CAPTION: Lead–lag correlations (black curve) between the DJFM AMO and the DJFM decadal NAO indices over 1901–2010. The statistical significance of the correlation is depicted by the p-value (blue dashed curve), computed using a bootstrap method that takes into account auto-correlations in the time series. The 95% confidence level is indicated by the dashed black line.
I note that they are using p=0.05 as their significance level, despite the fact that they are using partial-year correlations.
Now it’s wonderful that they have used a “bootstrap method” to allow for auto-correlations … but that’s the sum total of the information that they give us about their whizbang bootstrap method. I generally use the method of Quenouille, viz:
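For those not familiar with it, the Quenouille method replaces the sample size N with an effective sample size N (1 - r1 r2) / (1 + r1 r2), where r1 and r2 are the lag-1 autocorrelations of the two series. In Python, something like:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

def effective_n(x, y):
    """Quenouille's effective sample size for the correlation of two
    autocorrelated series: N (1 - r1*r2) / (1 + r1*r2)."""
    n = min(len(x), len(y))
    r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
    return n * (1 - r1 * r2) / (1 + r1 * r2)
```

The significance of the correlation is then assessed exactly as usual, but using the effective N in place of the actual N.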
Figure 3. My emulation of Peings2014 Figure 2. Red shows p-values less than 0.05. “DJFM” is December-January-February-March. Auto-correlation is adjusted for by the method of Quenouille detailed above.
While the general shape is similar to Figure 2, there are a number of differences between what I find and what they find. Overall, my correlation “R” (black line) is slightly smaller. Their correlation has a maximum of about 0.55 and a minimum of -0.75, while mine has a maximum of 0.5 and a minimum of -0.67. And while their results show R = +0.2 at a lag of -30, mine show R ≈ 0. Not a lot of difference, to be sure … but I’m using their data, so the match should be exact.
Next, I find higher results for the p-value. Only the lags -2 to -7, and 23 to 27, are significant at the 0.05 level.
However, remember that they have used only part of the dataset, the values from December to March. Assuming that they searched all of the four-month periods before settling on DJFM, that’s a dozen different samples they have searched. And it may be more than a dozen, because I would assume they first looked by quarters (three months). As a result, if you search that many subsets, the odds of finding a result with a p-value of 0.05 purely by chance are quite large …
The net result is that if you look at twelve samples, you need to find a p-value of
<blockquote>1 – 0.95<sup>1/12</sup> = 0.004</blockquote>
to be statistically significant at the 0.05 level … and that’s not happening anywhere in their graph.
Next, their results do not show the correlation with the AMO lagging the NAO that appears in my results.
Next, there is an oddity, I might even say an impossibility, in their result. Look at the left-hand side of Figure 2. Remember that as the lead gets longer and longer, we are using fewer and fewer datapoints in the calculation. In addition, as the lead gets longer, the correlation (R) is decreasing. Now, with fewer datapoints and a lower correlation, the p-value should steadily increase. You can see that in my graph: the maximum correlation and the minimum p-value are at about a two-year lead, and then as the lead heads out to 30 years, the R decreases and the number of datapoints decreases.
But when both the correlation and the number of datapoints go down, the p-value has to increase … and while that is visible in my results, we don’t see anything like that in their results.
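You can verify that for yourself with the standard significance test for a correlation coefficient. Here’s a quick sketch (using the normal approximation rather than the exact t-distribution, which is close enough for this purpose):

```python
import math

def corr_pvalue(r, n):
    """Two-sided p-value for a Pearson correlation of r with n points
    (normal approximation to the t-statistic)."""
    t = abs(r) * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(t / math.sqrt(2))

# Either a weaker correlation or fewer datapoints pushes p upward:
print(corr_pvalue(0.5, 100))
print(corr_pvalue(0.5, 40))
print(corr_pvalue(0.2, 40))
```

Shrink either |r| or N and the p-value can only go up.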
I am mystified by the difference between my results and theirs. I know that the digitization is accurate to within the width of the lines; here’s the proof of that, a screenshot of the digitization process of their Figure S3 …
Finally, I have grave reservations about this general type of analysis. Basically, the AMO and the NAO represent subsections of the global temperature record. And as the name suggests, the NAO (North Atlantic Oscillation) is in itself a subset of the AMO (Atlantic Multidecadal Oscillation), representing the northern part.
As a result, I would be shocked if we did NOT find something akin to Figures 2 or 3 above. And in fact, a Monte Carlo analysis using proxy data with autocorrelation characteristics like the highly smoothed data that they are using easily generates the kind of curves shown above. That’s what happens when one dataset is a subset of another dataset, and it should not be a surprise to anyone.
In addition, such relationships are often not stable over time. For example, Figure 5 shows the cross-correlation for the AMO and NAO datasets (1901-2010), along with the identical cross-correlation calculations for the first halves (1901-1955) and for the second halves (1956-2010) of the two datasets. As you can see, the relationship is far from consistent, with cross-correlations of the two halves being different from each other, and both being different from the full dataset as well. This increases the chance that we are looking at a spurious correlation.
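Checking the stability of a relationship this way is straightforward. Here’s a sketch of the procedure, with synthetic annual series standing in for the actual smoothed AMO and NAO data:

```python
import numpy as np

def corr_at_lag(x, y, lag):
    """Pearson correlation with x leading y by `lag` steps (lag may be negative)."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(3)
amo = rng.standard_normal(110)   # stand-in for annual values, 1901-2010
nao = rng.standard_normal(110)
half = len(amo) // 2

for lag in (-10, 0, 10):
    print(lag,
          round(corr_at_lag(amo, nao, lag), 2),                # full period
          round(corr_at_lag(amo[:half], nao[:half], lag), 2),  # first half
          round(corr_at_lag(amo[half:], nao[half:], lag), 2))  # second half
```

If the relationship is real and stable, the three numbers at each lag should broadly agree; in Figure 5 they clearly don’t.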
As you recall, they claim in their Abstract (above) that their results are “robust over the entire 20th Century”, but their own data says otherwise.
In no particular order:
• Since the NAO is a subset of the AMO, we would expect cross-correlation between the two at a number of leads and lags … and that’s what we find. The authors seem to find that impressive, but their results show levels of significance and shapes of the cross-correlation that are quite commonplace when one dataset is a subset of another and the two datasets are heavily smoothed.
• They have made no attempt to adjust their significance levels to reflect the fact that they have chosen one of twelve or more possible monthly subsets of the data. This is a huge oversight, and one that puts all of their conclusions into doubt.
• I am unable to replicate the results of their cross-correlations (what they call “lead-lag” correlations above) of the smoothed 1901-2010 DJFM NAO and AMO.
• I am also unable to replicate the results of their “bootstrap” method of calculating the p-value, although that is undoubtedly related to the fact that they did not disclose their secret method …
• They neglected to include a description of one of the most important parts of their analysis, the calculation of the significance using a bootstrap method.
• The use of smoothed data in doing cross-correlation analyses is an abomination. Nature knows nothing of 30-year averages. Either there is significant cross-correlation between the two actual datasets or there is not. Using smoothed datasets can even generate totally spurious correlations. I give some examples here … and lest you think that I made up the idea that smoothing can lead to totally spurious correlations, it’s actually called the “Slutsky-Yule Effect”. Their use of smoothed datasets for cross-correlation alone is enough to entirely disqualify their study.
• As a result, were I a reviewer I could not agree with the publication of this study until those problems are solved.
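To illustrate the Slutsky-Yule problem mentioned above, here’s a small Monte Carlo sketch. It takes pairs of completely independent white-noise series and smooths each with a simple 21-point running mean (a crude stand-in for their Lanczos filter); the smoothed pairs routinely show correlations large enough to look “significant” if you ignore the smoothing:

```python
import numpy as np

def smooth(x, width):
    """Running mean of the given width, end points reflected."""
    kernel = np.ones(width) / width
    padded = np.pad(x, width // 2, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

rng = np.random.default_rng(42)
n, trials, width = 110, 500, 21
abs_raw, abs_smooth = [], []
for _ in range(trials):
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)          # independent of a by construction
    abs_raw.append(abs(np.corrcoef(a, b)[0, 1]))
    abs_smooth.append(abs(np.corrcoef(smooth(a, width), smooth(b, width))[0, 1]))

print("median |r|, raw:     ", round(float(np.median(abs_raw)), 2))
print("median |r|, smoothed:", round(float(np.median(abs_smooth)), 2))
```

Same data, same (non-existent) physical relationship … but the smoothed correlations are several times larger.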
A couple of things in closing. First, Science magazine recently decided to add a statistician to the peer-review panel for all studies … and as this paper clearly demonstrates, all journals might profitably do the same.
And second, the AMO and the PDO and the NAO are all parts of the global temperature record. As a result, using them to emulate the global temperature record as the authors have done can best be described as cheating. When someone does that, they are using part of what they are trying to predict as an explanatory variable …
And while (as the authors show) that is often a way to get impressive results, it’s like saying that you can predict the average temperature for tomorrow, as long as you already know tomorrow’s temperature from noon to 2pm. Which is not all that impressive, is it?
My best regards to all,
De Maximis: If you disagree with me, and many do on any given day, please quote the exact words that you disagree with. That way, we can all understand exactly what your objection might be.
DATA AND CODE: The digitized 30-year smoothed datasets of the AMO and the NAO are here. The NOAA AMO data is online here, and the Hurrell NAO data is here. I haven’t posted the computer code. It is a pig’s breakfast, and as opposed to being “user-friendly”, it is actively user-aggressive … I may clean it up if I get time, but my life is a bit crazy at the moment, the data is there, and a cross-correlation is a very simple analysis that folks can do on their own.