Guest Post by Willis Eschenbach
There’s a new paper over at IOP called “Forcing of the wintertime atmospheric circulation by the multidecadal fluctuations of the North Atlantic ocean”, by Y Peings and G Magnusdottir, hereinafter Peings2014. I was particularly interested in a couple of things they discuss in their abstract, which says (emphasis mine):
Abstract
The North Atlantic sea surface temperature exhibits fluctuations on the multidecadal time scale, a phenomenon known as the Atlantic Multidecadal Oscillation (AMO). This letter demonstrates that the multidecadal fluctuations of the wintertime North Atlantic Oscillation (NAO) are tied to the AMO, with an opposite-signed relationship between the polarities of the AMO and the NAO. Our statistical analyses suggest that the AMO signal precedes the NAO by 10–15 years with an interesting predictability window for decadal forecasting. The AMO footprint is also detected in the multidecadal variability of the intraseasonal weather regimes of the North Atlantic sector. This observational evidence is robust over the entire 20th century and it is supported by numerical experiments with an atmospheric global climate model.
Let me start with their claim that the AMO signal precedes the NAO by 10–15 years. Here’s the cross-correlation function for the monthly data, using the full 1856-2012 NOAA AMO and the Hurrell NAO data:
Figure 1. Cross-correlation of the full Hurrell NAO and the NOAA AMO, 1856-2012.
Hmmmm … why am I not finding the relationship between AMO and NAO they discuss? I mean, I see that the largest correlation is at zero, and there is a correlation out 15 years, but it’s all so tiny … what’s the problem?
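For those who want to roll their own, a cross-correlation at a range of leads and lags takes only a few lines. Here is a minimal Python sketch (the function name and sign convention are my own; a positive lag means the first series leads):

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    # Pearson correlation of x against y at each lead/lag.
    # A positive lag means x leads y by `lag` steps.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

# Example with made-up data: y lags x by three steps,
# so the peak correlation shows up at lag +3.
rng = np.random.default_rng(0)
base = rng.normal(size=300)
x, y = base[3:], base[:-3]
cc = cross_correlation(x, y, max_lag=5)
print(max(cc, key=lambda k: cc[k]))
```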
Well, to start with, they are not using the regular AMO index, nor are they using the full year. Instead, here is their description:
A wintertime AMO index is constructed over the 1870–2012 period using the HadISST dataset (Rayner et al 2003). The monthly SST anomalies are determined with respect to the 1981–2010 climatology, then the winter AMO index is computed by averaging the monthly SST anomalies over the North Atlantic [75W/5W; 0/70N] from December to March (DJFM). The global anomalies of SST are subtracted in order to remove the global warming trend and the tropical oceans influence, as suggested by Trenberth and Shea (2006). A Lanczos low-pass filter is applied to the time series to remove the high-frequency variability (21 total weights and a threshold of 10 years, with the end points reflected to avoid losing data).
Nor are they using the standard NAO, viz:
A decadal NAO index is computed from the 20th century reanalysis (20CR), which is available over 1871–2010 and is based on the assimilation of surface pressure observations only (Compo et al 2011). We use the station-based formulation based on the Stykkisholmur/Reykjavik and Lisbon anomalous sea-level pressure (SLP) difference (Hurrell et al 2003). The high-frequency fluctuations are removed from the NAO index using the same Lanczos filter as for the AMO index.
They are not using the standard AMO, nor the standard NAO, and most importantly, they are using a smoothed subset of the data for calculating the correlations. While using smoothed data is fine for display purposes, it is almost always a Very Bad Idea™ to do statistics and correlations using smoothed data, for reasons discussed below.
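To see why, consider a hypothetical Monte Carlo test: take two completely independent white-noise series, smooth them with a simple moving average, and watch the apparent correlation balloon. A minimal Python sketch (the series and the filter width are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def smooth(x, width):
    # Centered moving average -- a crude stand-in for a low-pass filter.
    return np.convolve(x, np.ones(width) / width, mode="valid")

raw_r, smooth_r = [], []
for _ in range(1000):
    a = rng.normal(size=150)  # two INDEPENDENT white-noise "indices"
    b = rng.normal(size=150)
    raw_r.append(abs(np.corrcoef(a, b)[0, 1]))
    smooth_r.append(abs(np.corrcoef(smooth(a, 21), smooth(b, 21))[0, 1]))

print(f"median |r|, raw data:      {np.median(raw_r):.2f}")
print(f"median |r|, smoothed data: {np.median(smooth_r):.2f}")
```

The smoothed pairs, which share no information at all, routinely show correlations several times larger than the raw pairs do. That is the trap in a nutshell.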
In addition, they are not using the full year. Instead, they are using a 4-month subset of the year, DJFM. While there is no inherent problem with doing this, it definitely messes with the statistics. If you want to find a significant correlation using a 4-month subset of the annual data, to achieve a significance level of 0.05 you need to find a four-month chunk with a p-value of one minus the twelfth root of 0.95, or 0.004 …
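That threshold is just the standard correction for searching a dozen candidate windows; as a quick check:

```python
# Per-test p-value needed so that searching twelve candidate
# four-month windows keeps the overall false-alarm rate at 0.05
# (the Sidak correction).
alpha, n_tests = 0.05, 12
threshold = 1 - (1 - alpha) ** (1 / n_tests)
print(f"required per-test p-value: {threshold:.4f}")
```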
They go on to say that they have taken autocorrelation into account, viz (emphasis mine):
Figure 2 of Peings2014. ORIGINAL CAPTION: Lead–lag correlations (black curve) between the DJFM AMO and the DJFM decadal NAO indices over 1901–2010. The statistical significance of the correlation is depicted by the p-value (blue dashed curve), computed using a bootstrap method that takes into account auto-correlations in the time series. The 95% confidence level is indicated by the dashed black line.
I note that they are using p=0.05 as their significance level, despite the fact that they are using partial-year correlations.
Now it’s wonderful that they have used a “bootstrap method” to allow for auto-correlations … but that’s the sum total of the information that they give us about their whizbang bootstrap method. I generally use the method of Quenouille, viz: the effective number of independent data points is n_eff = n (1 − r1 r2) / (1 + r1 r2), where r1 and r2 are the lag-1 autocorrelations of the two datasets.
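In code, Quenouille’s adjustment can be sketched like this (my own minimal Python version, assuming the standard form based on the lag-1 autocorrelations):

```python
import numpy as np

def lag1_autocorr(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

def effective_n(x, y):
    # Quenouille's adjustment: n_eff = n * (1 - r1*r2) / (1 + r1*r2),
    # where r1 and r2 are the lag-1 autocorrelations of the two series.
    r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
    return len(x) * (1 - r1 * r2) / (1 + r1 * r2)

# Strongly autocorrelated AR(1) series carry far less independent
# information than their nominal length suggests:
rng = np.random.default_rng(1)
n, phi = 200, 0.9
a = np.zeros(n)
b = np.zeros(n)
for t in range(1, n):
    a[t] = phi * a[t - 1] + rng.normal()
    b[t] = phi * b[t - 1] + rng.normal()
print(f"nominal n = {n}, effective n = {effective_n(a, b):.0f}")
```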
I digitized their data to see if I could replicate their Figure 2. Figure 3 shows that result:
Figure 3. My emulation of Peings2014 Figure 2. Red shows p-values less than 0.05. “DJFM” is December-January-February-March. Auto-correlation is adjusted for by the method of Quenouille detailed above.
While the general shape is similar to Figure 2, there are a number of differences between what I find and what they find. Overall, the correlation “R” (black line) is slightly smaller. Their correlation has a maximum of about 0.55 and a minimum of -0.75, while mine has a maximum of 0.5 and a minimum of -0.67. And while their results show R = +0.2 at a lag of -30, my results show R ≈ 0. Not a lot of difference to be sure … but I’m using their data, so it should be exact.
Next, I find higher results for the p-value. Only the lags -2 to -7, and 23 to 27, are significant at the 0.05 level.
However, remember that they have used only part of the dataset, the values from December to March. Assuming that they searched all of the 4-month periods to settle finally on DJFM, that’s a dozen different samples that they have searched. And it may be more than a dozen, because I would assume that they would first look by quarters (three months). As a result, if you search that many combinations, your odds of finding a result with a p-value of 0.05 by pure chance alone are quite large …
The net result is that if you look at twelve samples, you need to find a p-value of
<blockquote>1 – 0.95<sup>1/12</sup> = 0.004</blockquote>
to be statistically significant at the 0.05 level … and that’s not happening anywhere in their graph.
Next, they do not find a correlation with AMO lagging the NAO, as in my results.
Next, there is an oddity, I might even say an impossibility, in their result. Look at the left hand side of Figure 2. Remember that as the lead gets longer and longer, we are using fewer and fewer datapoints in the calculation. In addition, as the lead gets longer, the correlation ( R ) is decreasing. Now, with fewer datapoints and a smaller correlation, the p-value should steadily increase. You can see that in my graph—the maximum correlation and the minimum p-value are at about a two-year lead, and then as the lead heads out to 30 years, the R decreases, and the number of datapoints decreases.
But when both the correlation and the number of datapoints go down, the p-value has to increase … and while that is visible in my results, we don’t see anything like that in their results.
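The arithmetic makes this clear. The t statistic for a correlation is t = r * sqrt((n − 2)/(1 − r^2)), so when both r and n shrink, t must shrink and the p-value must rise. A quick illustration with round numbers loosely based on the figures above (not the authors’ exact values, and using the naive n; smoothed data would need the effective n instead):

```python
from math import sqrt

def t_stat(r, n):
    # t statistic for a Pearson correlation r based on n data points
    # (naive n; autocorrelated data would need the effective n).
    return r * sqrt((n - 2) / (1 - r * r))

# Short lead: strong correlation, the full record.
print(f"r=0.67, n=110: t = {t_stat(0.67, 110):.1f}")
# 30-year lead: weaker correlation, ~30 fewer points.
print(f"r=0.20, n=80:  t = {t_stat(0.20, 80):.1f}")
```

The second t value is far smaller, so its p-value must be far larger.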
I am mystified by the difference between my results and theirs. I know that the digitization is accurate to within the widths of the lines; here’s the proof, a screenshot of the digitization process of their Figure S3 …
Figure 4. Screenshot of the digitization process, showing that the errors are less than half a linewidth …
Finally, I have grave reservations about this general type of analysis. Basically, the AMO and the NAO represent subsections of the global temperature record. And as the names suggest, the NAO (North Atlantic Oscillation) is in itself a subset of the AMO (Atlantic Multidecadal Oscillation), representing the northern part.
As a result, I would be shocked if we did NOT find something akin to Figures 2 or 3 above. And in fact, a Monte Carlo analysis using proxy data with autocorrelation characteristics like the highly smoothed data that they are using easily generates the kind of curves shown above. That’s what happens when one dataset is a subset of another dataset, and it should not be a surprise to anyone.
In addition, such relationships are often not stable over time. For example, Figure 5 shows the cross-correlation for the AMO and NAO datasets (1901-2010), along with the identical cross-correlation calculations for the first halves (1901-1955) and for the second halves (1956-2010) of the two datasets. As you can see, the relationship is far from consistent, with cross-correlations of the two halves being different from each other, and both being different from the full dataset as well. This increases the chance that we are looking at a spurious correlation.
Figure 5. A comparison of the cross-correlation of the 30-year smoothed AMO and NAO datasets with the cross-correlations of the first halves and the second halves of the same two datasets.
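For anyone who wants to run the same split-half stability check on their own data, here is a minimal Python sketch (the test series are made up; a genuine relationship should give similar correlations in both halves):

```python
import numpy as np

def split_half_corrs(x, y):
    # Correlation over the full record and over each half --
    # a crude stability check on a claimed relationship.
    n = len(x) // 2
    return (np.corrcoef(x, y)[0, 1],
            np.corrcoef(x[:n], y[:n])[0, 1],
            np.corrcoef(x[n:], y[n:])[0, 1])

# A genuine relationship survives the split:
rng = np.random.default_rng(7)
x = rng.normal(size=220)
y = x + 0.5 * rng.normal(size=220)
full, first, second = split_half_corrs(x, y)
print(f"full: {full:.2f}, first half: {first:.2f}, second half: {second:.2f}")
```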
As you recall, they claim in their abstract (above) that their results are “robust over the entire 20th century” … but their own data says otherwise.
CONCLUSIONS
In no particular order:
• Since the NAO is a subset of the AMO, we would expect cross-correlation between the two at a number of leads and lags … and that’s what we find. The authors seem to find that impressive, but their results show levels of significance and shapes of the cross-correlation that are quite commonplace when one dataset is a subset of another and the two datasets are heavily smoothed.
• They have made no attempt to adjust their significance levels to reflect the fact that they have chosen one of twelve or more possible monthly subsets of the data. This is a huge oversight, and one that puts all of their conclusions into doubt.
• I am unable to replicate the results of their cross-correlations (what they call “lead-lag” correlations above) of the smoothed 1901-2010 DJFM NAO and AMO.
• I am also unable to replicate the results of their “bootstrap” method of calculating the p-value, although that is undoubtedly related to the fact that they did not disclose their secret method …
• They neglected to include a description of one of the most important parts of their analysis, the calculation of the significance using a bootstrap method.
• The use of smoothed data in doing cross-correlation analyses is an abomination. Nature knows nothing of the 30-year average changes. Either there is significant cross-correlation between the two actual datasets or there is not. Using smoothed datasets can even generate totally spurious correlations. I give some examples here … and lest you think that I made up the idea that smoothing can lead to totally spurious correlations, it’s actually called the “Slutsky-Yule Effect”. Their use of smoothed datasets for cross-correlation alone is enough to entirely disqualify their study.
• As a result, were I a reviewer I could not agree with the publication of this study until those problems are solved.
A couple of things in closing. First, Science magazine recently decided to add a statistician to the peer-review panel for all studies … and as this paper clearly demonstrates, all journals might profitably do the same.
And second, the AMO and the PDO and the NAO are all parts of the global temperature record. As a result, using them to emulate the global temperature record as the authors have done can best be described as cheating. When someone does that, they are using part of what they are trying to predict as an explanatory variable …
And while (as the authors show) that is often a way to get impressive results, it’s like saying that you can predict the average temperature for tomorrow, as long as you already know tomorrow’s temperature from noon to 2pm. Which is not all that impressive, is it?
My best regards to all,
w.
De Maximis: If you disagree with me, and many do on any given day, please quote the exact words that you disagree with. That way, we can all understand exactly what your objection might be.
DATA AND CODE: The digitized 30-year smoothed datasets of the AMO and the NAO are here. The NOAA AMO data is online here, and the Hurrell NAO data is here. I haven’t posted the computer code. It is a pig’s breakfast, and as opposed to being “user-friendly”, it is actively user-aggressive … I may clean it up if I get time, but my life is a bit crazy at the moment, the data is there, and a cross-correlation is a very simple analysis that folks can do on their own.
“And second, the AMO and the PDO and the NAO are all parts of the global temperature record. As a result, using them to emulate the global temperature record as the authors have done can best be described as cheating. When someone does that, they are using part of what they are trying to predict as an explanatory variable …”
Seconded.
I will add another way of fooling yourself.
You will find people arguing that El Niño ‘explains’ global warming.
It’s a related logical failing.
And second, the AMO and the PDO and the NAO are all parts of the global temperature record. As a result, using them to emulate the global temperature record as the authors have done can best be described as cheating
I don’t believe they intended to cheat. I think they did what they did out of ignorance. Nice clean analysis Willis. Thanks !
Mosher’s statement seems a little irritating to me. I don’t like ‘explains’ in scare quotes, or that he calls it a related logical failing. Please explain, Steven!
In addition, they are not using the full year. Instead, they are using a 4-month subset of the year, DJFM. While there is no inherent problem with doing this, it definitely messes with the statistics.
I’ve seen that a lot. The reason given is that the bulk of NAO effect seems to be seasonal. So removing the remaining seasons is considered beneficial in isolating the effects.
I’m not a scientist by trade, but I’m not scientifically ignorant either. Your explanations really go in-depth and I appreciate that.
Where I live, the University of Alaska Fairbanks is one of the premier research facilities for geophysical science. It’s well-known here that global temperatures were increasing slightly, lagging slightly behind the sun’s most recent solar maximum. Global temperatures are now decreasing gradually following the sun’s most recent solar minimum. As my screen name suggests, aurora activity is a hobby and we went almost five years with hardly any auroral activity — consistent with a solar minimum.
In other words, global warming and cooling are part of a natural cycle and scientists in some fields know this, but they aren’t the ones with the microphones these days.
It’s sad when science gets hijacked by politics.
Typo:
Should read: “Nor are they using the standard NAO”?
Willis Eschenbach: “I generally use the method of Quenouille.”
For those of us who don’t know any statistics, can someone supply a pointer to a derivation of that formula?
“Nor are they using the standard AMO, viz:”
NAO? not AMO…
Best Wishes Willis…How about a heart update.
You are absolutely right on main points: (1) if the selection of months was based on prior analysis of data (who would believe an assertion to the contrary?) then the correction to the significance level has to be computed and used; (2) with autocorrelated data, bootstrapping is not straightforward, so the algorithm that they used needs to be described in detail. Even p < 0.05 is not strong evidence against the null hypothesis.
Correlating part to whole is a little like predicting adult height from pre-adolescent height. It isn't a foregone conclusion that the correlation will be high, but the hypothesis of 0 correlation is not the correct hypothesis to test for any meaningful improvement in knowledge.
Thanks again.
Make a cumulative sum of the MEI index and you get exactly the North Atlantic SST record. The North Atlantic SST record imprints its wave shape onto the whole global record. The global record is a composite of the SSTs of the individual ocean basins. There is no global record which affects, let’s say, the North Atlantic record; it is exactly vice versa.
I miss in the article a graph of the NAO and the AMO compared against each other, or winter NAO compared to winter CET (excellent correlation), or AMO compared to summer CET (again excellent correlation).
http://www.climate4you.com/images/CentralEnglandTempSince1659%201100pixel.gif
Some 80 years ago, the NAO peaked in the 1920s (bringing an even warmer decade of British winters than the 2000s), and the AMO peaked two decades later, making summers warmer. Then it happened that the NAO and the AMO peaked close to each other in the 1990s/2000s, bringing warm winters and warm summers almost at the same time. Since 1990, the NAO has been going down and CET winters have followed. Winters in Europe, Siberia, and the USA have been trending colder since 1990.
http://data.giss.nasa.gov/cgi-bin/gistemp/nmaps.cgi?year_last=2014&month_last=2&sat=4&sst=0&type=trends&mean_gen=1203&year1=1990&year2=2014&base1=1951&base2=1980&radius=250&pol=rob
There is no place for any radiative theories. Climate is a function of various local climate cycles, combining out of phase or together. Remember that Climategate e-mail: “what if all that is just a multidecadal fluctuation? They will kill us probably..”
To hell with standards. In war, love, and climatology anything is allowed.
I didn’t find anything easy, but this paper refers to two papers by Quenouille: http://www.ism.ac.jp/editsec/aism/pdf/048_4_0621.pdf
It’s discussed in most statistical time series books.
Matthew R Marler: Thanks a lot for the paper, although as you no doubt assumed, I remain hopeful of finding something that’s more pitched down to my level.
Some time ago I did snag a set of lecture notes on time series, but then I got sidetracked. Having this come up may make me refocus my efforts.
Nice highlights Willis:
At which point I’m thinking, since we don’t have any idea where these ocean oscillations come from, or even if they are an actual repeating pattern, that this is pure data mining, where to find a FIFTEEN YEARS AHEAD PRECURSOR SIGNAL to be statistically significant you’d need–what?–300 years of data? Then Willis highlights part II:
Ha ha ha. If they only have a few multi-decadal maybe-oscillations to look at their findings CAN’T be “robust.”
And then they don’t even acknowledge that they have arbitrarily picked one of many statistical relationships to focus on? That is pretending to do statistics. W:
And who knows how many other ways they tried looking for correlations? If anything needs to be vetted by a statistician it is pure data-mining exercises like this.
evanmjones says:
April 2, 2014 at 12:03 pm
Indeed it is done a lot, Evan, and it’s not a problem. The only issue is that you have to adjust the significance levels when you start hunting for correlations at the monthly level.
w.
Alec Rawls says:
April 2, 2014 at 12:19 pm
[Thanks, Alec, fixed. -w.]
Joe Born says:
April 2, 2014 at 12:21 pm
Joe, it’s good to hear from you. Regarding Quenouille, I’ve always assumed that it is an ad-hoc formula. I do find a discussion here, which likely isn’t over my head but is definitely over my time allocation.
I’ve tested the Quenouille approach using Monte Carlo simulations, and it gives a reasonable result, sometimes a bit high, sometimes a bit low, and very occasionally way off. However, if I’m serious about wanting to know about the actual statistics for a given situation, I use Monte Carlo for the calculations and only use the Quenouille formula to check my results for reasonability. Data over theory is my watchword …
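A minimal Python sketch of that kind of Monte Carlo calculation, using AR(1) surrogate series (a simplified illustration, not the actual code used for the post):

```python
import numpy as np

rng = np.random.default_rng(123)

def ar1(n, phi):
    # AR(1) surrogate series with lag-1 autocorrelation phi.
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def mc_pvalue(r_obs, n, phi, trials=500):
    # Two-sided Monte Carlo p-value: how often do two INDEPENDENT
    # AR(1) series of length n correlate at least as strongly as r_obs?
    hits = sum(
        abs(np.corrcoef(ar1(n, phi), ar1(n, phi))[0, 1]) >= abs(r_obs)
        for _ in range(trials)
    )
    return hits / trials

# The same correlation that is wildly significant for white noise
# can be quite unremarkable for strongly autocorrelated series:
print(f"white noise:    p = {mc_pvalue(0.5, 110, 0.0):.3f}")
print(f"AR(1), phi=0.9: p = {mc_pvalue(0.5, 110, 0.9):.3f}")
```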
w.
While I am not as advanced as Willis in statistical analysis (*cough*a long way behind*cough*), I do recognize Mannian statistical method when I see it (too often still these days).
Thus to choose a sample like that by maximizing the statistical metric one wishes to maximize is called “peeking at the result”. It’s what Mann did with his Hockey Stick and it negates any and all statistical inferences to do so.
Well done Willis for spotting it, because what looks and feels like a spurious correlation very often is.
Matthew R Marler says:
April 2, 2014 at 12:26 pm
Thanks, Matt, glad to see you. Actually, it’s worse than you say. In your analogy, it is like predicting adult height from the length of the adult’s femur … so in fact it is a foregone conclusion that the correlation will be a ways from zero.
For example, despite the small size of the Atlantic Ocean w.r.t. the total surface area, the correlation between the detrended HadCRUT4 global temperature data and the AMO data (monthly, 1856-2012) is 0.47. For the annual data, it’s larger, 0.65.
As a result, if you try to use the AMO to “predict” the global temperature, you can be sure that you’ll get a good result … but it won’t mean anything because you are using part of what you are predicting as an explanatory variable.
w.
Many good points Willis. I also distrust this idea of picking out certain months.
However, the old “never smooth” fallacy comes up again.
“The use of smoothed data in doing cross-correlation analyses is an abomination. Nature knows nothing of the 30-year average changes. Either there is significant cross-correlation between the two actual datasets or there is not. ”
The danger of seeing false significance is real and something that needs to be guarded against. I see no sign that the authors are aware of it, nor that they have adjusted their significance tests to account for the reduced degrees of freedom in the filtered data, so the criticism is justified; but the dogmatic position about ‘abominations’ is not.
If the data is being “smoothed” just to make it look smoother, the point is totally valid; there is no reason to do stats on prettied-up data. However, if the data was _filtered_ to remove, say, a strong annual cycle in order to study lesser-magnitude inter-annual variation, then it is legit and it makes sense. This is why I rail against using the term “smoothing” in place of frequency filtering, if that is what is meant.
Another thing you state incorrectly is that NAO is a subset of AMO. It is not. NAO is a sea level pressure index not SST, so it cannot be a subset.
In general you are right, this looks rather sloppy and has not been competently peer-reviewed.
“And second, the AMO and the PDO and the NAO are all parts of the global temperature record. As a result, using them to emulate the global temperature record as the authors have done can best be described as cheating. When someone does that, they are using part of what they are trying to predict as an explanatory variable …”
NAO is a pressure index, not a temperature index.
The idea of splitting the data can be a good test but the problem is that Hadley processing has seriously mangled the frequency content already and just comparing frequency spectra of different halves of HadSST3 gives very different results, despite ICOADS being quite similar.
Their error is probably in using the early data at all.
With all the anomalies, detrending, DJFM sub-setting, and data adjustments going on, I don’t know how much meaning to give the results, whatever they are.
NAO does have some interesting periodic content but I’ve not compared it to AMO.
Willis Eschenbach:
Thanks for the response; since you and Mr. Marler have both directed me to the same paper, I feel honor-bound at least to make a more-creditable run at it once my taxes are done. (My first pass this afternoon didn’t quite do it.)
Anyway, the n_eff issue comes up enough on this site (and occasionally on Ms. Liljegren’s) that it will be worth my while to walk around in some Monte Carlo runs myself if (as I suspect) I ultimately fail to grok the theory. Thanks for that suggestion, too.
‘The North Atlantic sea surface temperature exhibits fluctuations on the multidecadal time scale’
Does it?
Does it really?
The North Atlantic is a big place, so big in fact that anyone who considers that they have measured ‘its’ ‘temperature’ in any meaningful way on any timescale is deceiving either themselves or us.
That’s not to say we shouldn’t try to make observations, and indeed with satellites it may now be possible to make some vague generalised measurements, but speaking as someone who makes a living by measuring and trying to control temperatures in air, water, and gases – let me assure you that anyone who claims to have identified a 0.5 °C fluctuation over 15 years in the temperature of THE NORTH ATLANTIC is an asshole!
‘Not a lot of difference to be sure … but I’m using their data, so it should be exact.’
As you say that you digitized their data, is there any room in that process for a discrepancy to be introduced?
About two and half years ago I emailed one of my AMO-NAO graphs to Dr. Judith Curry. The reply was short: “hi, looks good! my main suggestion is to go back prior to 1920; 1850 if possible (the data in north atlantic should be good enough)….. Judy”
I found that the truncated AMO actually follows the northern NAO component by ~11 years, which is opposite to what the above article claims, the rest is here:
http://hal.archives-ouvertes.fr/docs/00/64/12/35/PDF/NorthAtlanticOscillations-I.pdf