Guest Post by Willis Eschenbach
Bob Tisdale has a detailed post on the new 2014 paper entitled “The Atlantic Multidecadal Oscillation as a dominant factor of oceanic influence on climate” by Chylek et al. Nic Lewis also did a good analysis of the paper; see the Notes below for the links. I have a different take on it than theirs, one which centers on the opening statement from their abstract:
ABSTRACT: A multiple linear regression analysis of global annual mean near-surface air temperature (1900–2012) using the known radiative forcing and the El Niño–Southern Oscillation index as explanatory variables account for 89% of the observed temperature variance. When the Atlantic Multidecadal Oscillation (AMO) index is added to the set of explanatory variables, the fraction of accounted for temperature variance increases to 94%. …
They seem impressed with a couple of things. The first is that their four aggregated forcings of greenhouse gases (GHGs), aerosols, volcanic forcings, and solar variations, plus an ENSO dataset, can emulate the global average temperatures with an adjusted R^2 of 0.89 or so. The second thing that impresses them is that when you add in the AMO as an explanatory variable, the R^2 jumps up to 0.94 or so … I’m not impressed by either one, for reasons which will become clear.
Figure 1. Forcings used in the Chylek et al. analysis of the Atlantic Multidecadal Oscillation. Note the different scales in each panel.
There are several problems with the analysis done in Chylek 2014. Let me take the issues in no particular order.
PROBLEM THE FIRST
Does anyone but me see the huge issue inherent in including the Atlantic Multidecadal Oscillation (AMO) Index among the explanatory variables when trying to emulate the global surface temperature?
Perhaps it will help if I post up the explanation of just how the AMO Index is calculated …
From their link to the AMO dataset (see below)…
The [AMO] timeseries are calculated from the Kaplan SST dataset which is updated monthly. It is basically an index of the N Atlantic temperatures. …
Method:
Use the Kaplan SST dataset (5×5).
Compute the area weighted average over the N Atlantic, basically 0 to 70N.
Detrend that time series
Optionally smooth it with a 121 month smoother.
In other words … the AMO is just the temperature of the North Atlantic with the trend removed.
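For concreteness, here is that recipe as a minimal Python sketch (synthetic inputs of my own standing in for the Kaplan fields; this is an illustration of the stated method, not NOAA's actual code):

```python
import numpy as np

def amo_index(sst, lats):
    """AMO-style index: area-weighted North Atlantic mean SST, linearly detrended.

    sst  : array (n_months, n_lat, n_lon) of North Atlantic SSTs, NaN over land
    lats : latitude in degrees of each grid row (0 to 70N)
    """
    # Area-weight by cos(latitude), as for any gridded spatial average
    w = np.cos(np.deg2rad(lats))[None, :, None]
    series = np.nansum(sst * w, axis=(1, 2)) / np.nansum(w * ~np.isnan(sst), axis=(1, 2))
    # Remove the linear trend -- this is all the "detrending" amounts to
    t = np.arange(len(series))
    series = series - np.polyval(np.polyfit(t, series, 1), t)
    return series  # optionally smooth with a 121-month running mean
```

Note what is not in there: no circulation data, no heat fluxes, just temperature.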
So let me ask again … if we’re trying to emulate the “global annual mean near-surface air temperature for the period 1900-2011”, will it help us if we know the detrended North Atlantic temperature for the period 1900-2011 … or is that just cheating?
Me, I say it’s cheating. The dependent variable that we are trying to emulate is the global surface temperature. But they have included the North Atlantic temperature, which is a large part of the very thing that they are trying to explain, as an explanatory variable.
But wait, it gets worse. The El Nino index that they use is a fairly obscure one, the “Cold Tongue Index”. It is described as follows (emphasis mine):
The cold tongue index (CTI) is the average SST anomaly over 6N-6S, 180-90W (the dotted region in the map) minus the global mean SST.
There are a number of El Niño indices. One group of them comprises the detrended averages of the sea surface temperatures in various areas—Niño 1 through Niño 4, Niño 3.4, and the like. There is also the MEI, the Multivariate ENSO Index. Then there are pressure-based indices like the SOI, the Southern Oscillation Index, based on the difference in pressure between Tahiti and Darwin, Australia.
There’s an odd wrinkle in the cold tongue index (CTI), however: the CTI is not detrended. Instead, they subtract the global average sea surface temperature (SST) from the average temperature in the CTI area of 6°N/S, 180° to 90°W.
But this means that they’ve included, not just the average temperature of the CTI area, but also the entire global SST as a part of their explanatory variable, because:
CTI Index = CTI Sea Surface Temperature – Global Mean Sea Surface Temperature
I ask again … if you are trying to emulate the “global annual mean near-surface air temperature for the period 1900-2011”, will it help if an explanatory variable contains the global mean sea surface temperature for the period 1900-2011 … or again, is that just cheating?
I have to say the same as I said before … cheating. Using some portion of this year’s global temperature data (e.g. North Atlantic SSTs or CTI SSTs or global SSTs) to predict this year’s global temperature data is not a valid procedure. I’m sure my beloved and most erudite friend Lord Monckton could tell us the Latin name of this particular logical error, but Latin or not … you can’t do ‘dat …
Which is why, although the authors seem to be impressed that including the AMO increased the adjusted R^2 up to 0.94, I’m not impressed in the slightest. You can’t use any part of what you are trying to predict as a predictor. See how the AMO index (bottom right, Fig. 1) goes down until 1910, then up until 1940, down until 1970, and then up again? Those are the North Atlantic version of the very swings in temperature that we are trying to explain, so you absolutely can’t use them as an explanatory variable.
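You can demonstrate the problem in a few lines. In this sketch (Python, entirely synthetic numbers, nothing from Chylek et al.), the “global temperature” is a random walk with no relation whatsoever to the “forcing”, yet adding a detrended regional chunk of that same temperature as a “predictor” sends the adjusted R^2 through the roof:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 113                                   # 1900-2012, annual
temp = np.cumsum(rng.normal(0, 0.1, n))   # "global temperature": a pure random walk
forcing = rng.normal(size=n)              # a "forcing" unrelated to temperature

def adj_r2(y, X):
    """Adjusted R^2 of an ordinary least squares fit of y on X (plus intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r2 = 1 - (y - X1 @ beta).var() / y.var()
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - X1.shape[1])

# "Regional" temperature: the global series plus noise, then linearly detrended
region = temp + rng.normal(0, 0.05, n)
region -= np.polyval(np.polyfit(np.arange(n), region, 1), np.arange(n))

print(adj_r2(temp, forcing))                             # near zero: no relation
print(adj_r2(temp, np.column_stack([forcing, region])))  # large: the "cheat" at work
```

The forcing explains nothing, while the detrended piece of the predictand explains nearly everything … which is exactly the trick the AMO is performing in their regression.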
PROBLEM THE SECOND
Let’s look at just the forcings used in the climate models, setting aside the ENSO and AMO variables. Chylek 2014 uses the GISS forcings, which are composed of the following separate datasets:
Figure 1a. The ten categories of forcing in the GISS forcing dataset. Note the different scales for each panel.
Now, for anybody who thinks that e.g. ozone levels in the atmosphere actually look like that … well, that seems highly doubtful to me. But while that is a problem in and of itself, it’s not the problem in this context. The problem here is that all of these are measured in watts per square metre (W/m2). As a result they should all have the same effect … but Chylek et al. do a strange thing. They add together the well-mixed GHGs plus ozone plus stratospheric H2O into one group they call “GHGs”. Then they put reflective aerosols, aerosol indirect, black carbon, and snow albedo into a second group they call “Aerosols”. Volcanic forcing is treated as a third separate group, solar is the fourth, and land use is ignored entirely. This grouping is shown in Figure 1 above.
Then each of these four groups (GHGs, Aerosols, Volcanoes, and Solar) gets its own individual parameter in their equation … but this means that a watt per square metre (W/m2) from aerosols and a W/m2 from solar and a W/m2 from GHGs all have very, very different effects … they make no effort to explain or justify this curious procedure.
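In equation form, what they fit is roughly (my notation, not theirs):

T(t) ≈ c0 + c1·GHG(t) + c2·AER(t) + c3·VOLC(t) + c4·SOL(t)

with each group of supposedly equivalent W/m2 getting its own free coefficient. Here is a minimal sketch of that regression in Python (my illustration of the described procedure, not the authors' code; the inputs are assumed to be annual series as NumPy arrays):

```python
import numpy as np

def fit_forcing_groups(temp, ghg, aer, volc, sol):
    """Multiple linear regression giving each W/m2 forcing group its own
    free weight -- the 'curious procedure' discussed above.

    Returns the fitted coefficients and the adjusted R^2.
    """
    X = np.column_stack([np.ones_like(temp), ghg, aer, volc, sol])
    beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
    resid = temp - X @ beta
    r2 = 1 - resid.var() / temp.var()
    n, k = X.shape
    return beta, 1 - (1 - r2) * (n - 1) / (n - k)
```

If the four groups really were interchangeable watt for watt, c1 through c4 would come out roughly equal. As shown below, they don't.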
PROBLEM THE THIRD
Here’s an odd fact for you. They are impressed that they can get an R^2 of 0.88 or something like that (if they cheat and include the entire global SST within the “explanatory” variables of their model). I can get close to that, 0.87. However, let’s start by calculating the R^2 of a much simpler model … the linear model. Figure 2 shows the GISS Land-Ocean Temperature Index (LOTI), and a straight-line emulation. The odd fact is the size of the R^2 of such a simplistic model …
Figure 2. The simplest possible straight-line model. Black is GISS LOTI, red is the emulation.
Note that the R^2 of a straight line is quite high, 0.81. So their correlation of 0.88 … well, not all that impressive.
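Computing that baseline takes only a couple of lines. In this sketch, `loti` is assumed to be the annual GISS LOTI anomalies as a NumPy array:

```python
import numpy as np

def straight_line_r2(loti):
    """R^2 of the simplest possible model: a straight line in time."""
    t = np.arange(len(loti))
    fit = np.polyval(np.polyfit(t, loti, 1), t)   # least-squares trend line
    return 1 - np.var(loti - fit) / np.var(loti)
```

Any proposed model should be judged against this trivial benchmark, not against zero.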
In any case, here are a few more emulations, with their corresponding adjusted R^2. First, Figure 3 shows their group called “aerosols” (AER) along with the volcanic forcing (VOL):
Figure 3. Emulation using the Chylek groups “Aerosols” (AER) and “Volcanic Forcing” (VOL). Note that watt for watt, the aerosols have about six times the effect of the volcanoes.
Now, even this bozo-simple (and assuredly incorrect) emulation has an adjusted R^2 of 0.854 … or, if you don’t like the use of aerosols, Figure 4 shows the same thing as Figure 3, but with GHGs in place of aerosols:
Figure 4. Emulation using solely GHGs (GHG) and volcanoes (VOLC). Note that watt for watt, the GHGs have about three times the effect of the volcanoes.
There are a couple of issues revealed by this pair of analyses, using either GHGs or aerosols. One is that you can hardly see the difference between the two red lines in Figures 3 and 4. Obviously, this means that getting a good-looking match and a fairly impressive-sounding adjusted R^2 means absolutely nothing about the underlying reality.
Another issue is the difference between the strengths of the supposedly equivalent W/m2 values from GHGs, aerosols, and volcanoes.
Having seen that, let’s see what happens when we use all of the Chylek forcings except the cheating forcings (ENSO and AMO). Figure 5 shows the emulation using the sun, the aerosols, the volcanoes, and the greenhouse gases:
Figure 5. Emulation using the four Chylek et al. groupings (greenhouse gases GHG, aerosols AER, volcanic VOLC, and solar SOL) of the ten GISS forcings.
Note that again, watt for watt the volcanoes are only about a third of the strength of the GHGs. The fitted solar coefficient is quite large, presumably because the solar variations themselves are quite small, so the regression has to scale them up … which highlights another problem with this type of analysis.
So that’s the third problem. They are giving different strengths to different types of forcings, without any justification for the procedure. Not only that, but the variation in the strengths is three to one or more … I see no physical reason for their whole method.
PROBLEM THE FOURTH
Now we’ve seen what happens when we’re not cheating by using a portion of the dependent variable as an explanatory variable. So let’s start cheating and add in the ENSO data.
Figure 6. Uses all of the GISS forcings plus the ENSO cold tongue index.
As I said, I couldn’t quite replicate their 0.88 value, but that comes close.
Now, before I go any further, let me point out a shortcoming of all of these emulations in Figs 2 to 6. They do not catch the drop in temperatures around 1910, or the high point around 1940, or the drop from around 1940 to 1970. Even including all of the forcings, and (improperly) giving them different weights, Figure 6 above still shows these problems.
However, all of these global average temperature changes are clearly reflected in the corresponding temperature changes in the North Atlantic ocean … take another look at the bottom right panel of Figure 1. And so of course when they (improperly) include the AMO as an explanatory variable, you get a much better adjusted R^2 … duh. But it means nothing.
Figure 7. Emulation using all of the variables, including the Atlantic Multidecadal Oscillation (AMO).
PROBLEM THE FIFTH
All of the above is made somewhat moot by a deeper flaw in their analysis. This is the lack of any lagging of the applied forcings. IF you believe in the forcing fairy, then you have to believe in lags. Me, I don’t think that the changes in global average temperature are a linear function of the changes in global average forcing. Instead, I think that there are strong emergent temperature regulating mechanisms acting at time scales of minutes to hours, largely negating both the changes from the forcing and any associated lags. So I’m not much bothered by lags.
But if you think that global average temperature follows forcing, then you need to do a more sophisticated lagged analysis involving at least one time constant.
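The simplest such treatment is a one-box exponential response with a time constant tau. A minimal sketch of the generic technique (my illustration, not anything from the paper):

```python
import numpy as np

def lagged_response(forcing, tau, lam, dt=1.0):
    """One-box lagged temperature response to a forcing series.

    Discretizes dT/dt = (lam * F(t) - T) / tau with time step dt.
    tau : time constant (years); lam : equilibrium response (degrees C per W/m2)
    """
    temp = np.zeros_like(forcing, dtype=float)
    for i in range(1, len(forcing)):
        temp[i] = temp[i - 1] + dt * (lam * forcing[i] - temp[i - 1]) / tau
    return temp
```

With tau of a few years, a volcanic spike no longer produces an instantaneous temperature spike but a damped, delayed one … which is what the thermal inertia of the ocean would demand if you take the forcing picture seriously.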
CONCLUSIONS
• I find the analysis in Chylek 2014 to be totally invalid because they are including parts of the dependent variable (ENSO and AMO) as explanatory variables. Bad scientists, no cookies.
• As is shown by the examples using either GHGs or aerosols plus volcanoes (Figs. 3 & 4), a good fit and an impressive adjusted R^2 mean nothing. We get equally strong and nearly indistinguishable results using either GHGs or aerosols. This is an indication that this is the wrong tool for the job. Heck, even a straight line does a reasonable job, R^2 = 0.81 …
• Giving different weights to different kinds of forcing (e.g. volcanic, solar) is a novel procedure that requires strong physical justification. They have provided none.
• As you add or vary the explanatory variables, their parameters change. Again, this is another indication that they are not using the right tool for the job.
• The lack of any consideration of lag in the analysis is in contradiction to their assumption that changes in the global surface temperature are a linear function of changes in global average forcing.
Best to everyone,
w.
De Rigueur: If you disagree with something I or anyone else says, please quote their exact words. That way, we can all be clear on exactly what you are objecting to.
LINKS:
DATA:
“I notice you are lumping all skeptics together.”
just the ones that howl inconsistently. Petr sent me this paper a bit ago.
my first observation was the same as Willis’. hmm dont use AMO like that.
Paul Linsay says:
March 16, 2014 at 4:23 pm
…
I just noticed that and scrolled down here to comment. Beat me by “that” much! Doesn’t affect the analysis. Thanks, Willis!
As a trained meteorologist who has studied this for the last 20 years, I would say solar and land use should likely be numbers 1 and 2. You drive from the outskirts of town at night and it’s 35 F, and into the downtown area it’s closer to 40. We’ve increased the population of the world greatly. It’s just that a little experience and intuition go a long way.
Ourororos, not Ourboros?
Oops.
Ouroboros, not Ourboros?
Me, I say it’s cheating. The dependent variable that we are trying to emulate is the global surface temperature. But they have included the North Atlantic temperature, which is a large part of the very thing that they are trying to explain, as an explanatory variable.
But wait, it gets worse. The El Nino index that they use is a fairly obscure one, the “Cold Tongue Index”. It is described as follows (emphasis mine):
The cold tongue index (CTI) is the average SST anomaly over 6N-6S, 180-90W (the dotted region in the map) minus the global mean SST.
We come back to the case that if U and V are random variables, and if W is defined as W = U – V, then W is by definition positively correlated with U and negatively correlated with V. The more general case is discussed by T. W. Anderson, An Introduction to Multivariate Statistical Analysis, second edition, Theorem 2.4.1 p 25. He adds the distributional result for the case that U and V have a joint bivariate normal distribution, but the result for mean and variance is general (at least when U and V have variances). I have been waiting for the right time to bring this to your attention.
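In symbols, the identity behind this is elementary (stated here for clarity; the general treatment is in Anderson):

$$\operatorname{Cov}(W,U)=\operatorname{Cov}(U-V,\,U)=\operatorname{Var}(U)-\operatorname{Cov}(U,V),\qquad \operatorname{Cov}(W,V)=\operatorname{Cov}(U,V)-\operatorname{Var}(V),$$

so W = U − V is positively correlated with U and negatively correlated with V whenever Cov(U,V) is smaller than each of the two variances, and in particular whenever U and V are uncorrelated.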
So, yes, more than just the high R^2 value is necessary in order to show that something has been revealed about the processes generating the nice regression.
On the other hand: (1) now that it is published, the paper is likely to be studied in graduate schools; (2) the importance of backward elimination for partially resolving problems of multicollinearity of predictors is nicely illustrated; (3) now that the paper has been published, other modelers are likely to follow with improved models; (4) the test of the utility of the model will be how well it (and its descendants) fit out of sample data; (5) as I wrote before, it is “of a piece with” Vaughan Pratt’s modeling, in that it attempts to develop reasonable models for the background variation for getting the best estimate of the regression coefficient for the log[CO2] regressor; (6) exactly what the best measure of the ENSO oscillation is will only be elucidated by more work. You presented a lot of information in graphical form in your “Power Stroke” post, and in a response to a comment you outlined a lot of work on developing a (possibly nonlinear) high dimensional spatio-temporal vector autoregressive approach to modeling the transport of heat throughout the Pacific and toward the poles. A better approach in the meantime would probably be the relatively simple Southern Oscillation Index, which is merely a difference between two locations in the Pacific.
Whether log[CO2] is the best measure of the influence of CO2 also requires more work. You can model the last 150 years of temperature data equally well with and without it, but without an adequate knowledge of variation independent of CO2 we can’t test whether the modeled CO2 effect is or isn’t statistically significant. All of the models for natural variation are post-hoc. I made the same point when Vaughan Pratt’s model was posted here.
Willis, I know you don’t like being paraphrased so I went back and reread your comments on the power stroke. Here is one that I came up with, though several referred to possibly more analysis along these lines:
I called this post “The Power Stroke” to emphasize that what most people think is important about El Nino/La Nina analyses (temperatures and frequency of occurrence) is secondary. The point is not whether the temperatures in the El Nino region are up or down, nor is it how many times they go up and down.
The point is the size of the power stroke—how much warm water was actually moved? It doesn’t matter whether that happens in three big pushes or ten small pushes. By that I mean people get caught up in counting the frequency of the events, when that misses the point. The critical measurement, which as far as I know has not been done but I’m always surprised when some commenter finds it, is the total amount of energy moved per occurrence/month/year/decade.
In any case, Bob, I’m thinking it should be possible to use the Argo data to get some good numbers regarding how much warm water is being moved. And I’m hoping I’m not the one that ends up doing the shovel work … what we need to look at in my opinion is the total energy content of the mixed layer in the region shown in Figure 2 above. Who was it that did the great Argo work here on WUWT? A commenter named Andrew? Maybe he’d be interested …
I thought of (possibly nonlinear) vector autoregressive processes because they are simpler than complete sets of nonlinear partial differential equations and compartment models. My point is that I think you are correct to focus on mass and heat transport in the Pacific Oscillations. But it is a massive problem.
I probably shouldn’t even comment at all, as I’m too busy to comment and continue to discuss this week (sigh) but:
a) Nonlinear highly multivariate predictive modelling is one of my professional games. I have a goodly amount of personally written high end software for building e.g. genetic algorithm optimized neural networks with as many as hundreds of inputs built to predict subtle, highly nonlinear, highly multivariate behavior patterns on top of e.g. large demographic and transactional databases.
One of the first things one learns in this business professionally is that if your model is too good to be true, it is, well, too good to be true! It usually means, as Willis notes, that some of your input variables are consequences of the factor you are trying to predict, not causes. This happens all of the time, in commerce, in medical research, in mundane science. When you are trying to predict a training set on the basis of a variable that only gets filled in for customers that actually bought some product, the model building process will quickly learn that if it hotwires this variable straight through the complexity it can get pure training set gain. Indeed, you can build a model that will even work well for trial sets drawn from the same process, one that fills in the variable based on the targeted outcome. The problem comes the day you actually take the model you built for some client and apply it to a million expensive new prospects that don’t have any value at all for that variable, because they haven’t purchased the product — yet. Your expected phenomenally high return withers to where one barely beats random chance, because the meaningless correlation distracted your neural model from investing neurons in the far more subtle and difficult patterns that might have led to actual substantial — but far more modest — lift above random chance.
The same problem exists — only worse — in medical research. Researchers have to be enormously careful in their model building process not to include “shadow variables” — variables that confound the desired result by having dependent values, e.g. from a test administered asymmetrically to the cancer survivors (who are alive to be tested!) compared to the non-survivors (who aren’t).
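A toy version of that trap (my construction, synthetic data): the “shadow variable” is filled in from the outcome itself, the in-sample accuracy looks spectacular, and the skill evaporates on new prospects for whom the variable does not yet exist:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
signal = rng.normal(size=n)
bought = (signal + rng.normal(0, 2, n)) > 0        # outcome: weakly driven by signal

# Shadow variable: only filled in AFTER a purchase, i.e. derived from the outcome
shadow = bought.astype(float) + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), signal, shadow])
beta, *_ = np.linalg.lstsq(X, bought.astype(float), rcond=None)
print(np.mean((X @ beta > 0.5) == bought))          # in-sample: near-perfect

# New prospects: same process, but the shadow variable is simply not there yet
signal2 = rng.normal(size=n)
bought2 = (signal2 + rng.normal(0, 2, n)) > 0
X2 = np.column_stack([np.ones(n), signal2, np.zeros(n)])
print(np.mean((X2 @ beta > 0.5) == bought2))        # collapses toward chance
```

The model “hotwires” the leaked variable straight through, exactly as described above.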
b) Regression models in general presuppose a functional understanding of the causal relationship. Multivariate linear regression models presuppose independent, linear causes. That is, if one doubles (say) the solar forcing, one doubles the linear contribution to the overall average surface temperature (which is, note well, not what is being fit, because it is unknown within error bars the width of the entire temperature anomaly variation plotted, leading us to consider the problem of fitting a linear model with an unknown constant term with a high degree of uncertainty and the puzzling alteration of R^2 from anything you like to almost nothing attendant on its inclusion, but that is another story). If one simultaneously doubles the aerosol screening, it will cause a doubling of a subtracted term. But there won’t be any nonlinear correction to the aerosol term from the solar term — such as might exist if the solar-radiation-aerosol connection turns out to be physically valid and significant, such that increasing solar activity causes decreased average aerosol contribution or, worse, directly mucks around with albedo.
But that isn’t quite right, is it? We aren’t talking about doubling solar activity. We are talking about doubling a solar activity anomaly — if we doubled solar activity, we would be in serious trouble. In one sense, this is a better thing. From the humble Taylor series, we can expect that even if there is a nonlinear direct functional relationship between solar state and temperature, for small changes in solar state the temperature should respond linearly with small changes of its own.
The problem then becomes “what’s a small change?” Or, when do second or higher order non-linear effects become important. The Taylor series itself cannot answer that, but the assumption of linearity in the model begs the question! A multivariate linear independent assumption only makes the problem worse. At some point, one is making the elephant wiggle its trunk, still without proposing anything like a meaningful model of an elephant.
Personally, I think there is little doubt that the climate is a non-linear, strongly-coupled system. Here’s why. Because it is stable. It oscillates around long-term stable attractors. It bounces. It varies substantially over time. The linear part of that is nearly all Mr. Sun, which provides a baseline energy flux that varies very slowly over geological timescales. Within the very broad range of possible climates consistent with this overall forcing, the climate system is non-Markovian, chaotic, nonlinear, multivariate, strongly coupled, and with whole families of attractors governing transient cyclic dynamics that bounce the planet’s state around between them while the attractors themselves move around, appear, and disappear. Linearizing this dynamics as if the linear model has predictive value is utterly senseless. So is (don’t get me started!) fitting nonlinear models, voodoo models, models with confounding variables, models that connect the phases of the moons of Jupiter — no matter how good a correlation, how absolutely super an R^2, one can come up with in a six or eight parameter model, I can come up with the Latin for that one myself: post hoc ergo propter hoc. Correlation is not causality, and in fact one can often either get causality backwards — as in this case — or get correlation by pure accident.
There is nothing magical about Taylor series. There is nothing magical about Weierstrass’ theorem (a general justification for fitting higher order polynomials to bounded segments of data in a way that will almost never extrapolate outside of the fit data).
c) OK, so if we can’t just take a bunch of variables and fit a predictive multiple linear regression model with a great R^2 and expect it to actually, well, predict (especially when the predictors included are physical consequences of the predicted values as much as they may well be part of their future physical causes in a nonlinear, non-Markovian dynamical system!) what can we do?
That is simple. We can try the microscopic route — solve the Navier-Stokes system at sufficiently fine spatiotemporal resolution and see if we can eventually predict the climate. Or we can try to build a serious highly multivariate nonlinear predictive model (using e.g. neural networks) without inserting any prior assumptions about functional relationships (neural networks are general nonlinear function approximators that are empirically optimized to fit the training data in a way that further optimally predicts trial data, and that ultimately have to predict reality before being considered “proven” in any sense at all). Or we can try something in between — build semi-empirical non-Markovian models that operate microscopically but also take into account the causal linkages to large scale climate quasiparticles like ENSO and the NAO. But these models will not be simple empirical fits, and they are difficult to abstract theoretically.
In physics it would be the difference between building a model of interaction between e.g. surface plasmons (coherent collective oscillations of many, many electrons at the surface between two materials) and building a microscopic model that solves the underlying many electron problem for the complete system. The latter is computationally intractable — as is solving the N-S equations for the planet and its oceans driven by a variable star. The former requires substantial insight, and for it to work, of course, the electrons have to self-organize into coherent collective oscillations in the first place that can then be treated as “quasi-particles” in a higher order theory.
Climate science is stuck because it won’t admit that it cannot solve the N-S equations in any predictively meaningful way — GCMs simply do not work well enough at their currently accessible resolution, and may well never work well enough. We will never be able to solve the many electron model precisely for any large quantum system, and we know it, which is what motivated the search for approximate and semi-empirical methods that could come close, often based on quasiparticles (like plasmons) or on semi-empirical functional forms (like density functionals built with some theoretical justification but with adjustable parameters that are set so that the models work, sort of, mostly, in limited regimes).
rgb
Thanks, Willis. An excellent article.
Thanks for reminding me of the “Bestiary” from Jorge Luis Borges, where I first learned about the Ouroboros.
And yes, the Chylek et al. paper is not worth the electrons it is made of. No honest thinking seems to have been wasted in it.
Using different weights for different forcings makes sense if the magnitude of some of them is not actually known, but the time history’s “shape” is. Though neither is really the case with aerosol forcing. It’s really just a case of fooling yourself with an overfit. I’ve done a fun exercise that illustrates this well, by creating a “fit” model to the data that is obvious nonsense but fits very well.
Hi Willis,
I have no problem with using an ENSO index to account for short term variation; the mechanism by which ENSO influences temperatures (especially in the tropics) is pretty well understood, and there is no reason to believe that the trend in average surface temperature is driving the ENSO rather than the other way around. There is a lag (~3 – ~4 months) between the Nino 3.4 index and the average surface temperature, indicating the direction of causation is ENSO —> global temperature; it’s hard to see how the Nino 3.4 could lead the short term variation in global surface temperature by several months if the direction of causation were actually the opposite.
I agree that using the AMO index is more problematic, because the mechanism is not clearly understood, though there is at least some indication the AMO is physically related to the rate of Atlantic overturning. But since that mechanism remains somewhat speculative, the AMO could in fact just be a reflection of trend in average temperature. If actual measurements of the Atlantic overturning rate were used instead of the AMO index, then I think that would be a more defensible regression variable.
I completely agree with your critique of the absence of lag applied to forcings (which ought to be modeled with at least two lag functions… faster and slower), and also agree with your critique of the (absolutely nutty) acceptance of large differences in the effects of individual forcings which are all expressed in units of watts/m^2. The most rational approach is to sum the individuals into a single forcing function, then apply a suitable lag function, before regressing against temperatures.
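In code form, that most rational approach might look like this minimal sketch (an illustration only; `forcings` is an assumed (n_years, n_forcings) array in W/m2, `temp` the observed anomalies):

```python
import numpy as np

def lag_then_regress(forcings, temp, tau, dt=1.0):
    """Sum all forcings into a single series, apply a one-box exponential lag
    with time constant tau, then regress temperature on the lagged forcing."""
    total = forcings.sum(axis=1)              # one combined forcing, W/m2
    lagged = np.zeros_like(total)
    for i in range(1, len(total)):
        lagged[i] = lagged[i - 1] + dt * (total[i] - lagged[i - 1]) / tau
    X = np.column_stack([np.ones_like(lagged), lagged])
    beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
    return beta                               # intercept plus ONE sensitivity
```

One lag, one sensitivity, and no per-group free parameters to tune.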
“Does anyone but me see the huge issue inherent in including the Atlantic Multidecadal Oscillation (AMO) Index among the explanatory variables when trying to emulate the global surface temperature?”
I see a bigger one: the AMO functions as a negative feedback to solar forcing. Low solar plasma speeds give negative NAO/AO phases, and the altered atmospheric circulation increases the poleward warm sea water transport, which is why the AMO rises sharply from 1995. Continued higher forcing results in a negative AMO.
I am delighted to be reading this thread. The submissions since about 12.00 pm on 16th March have been most instructive for me, with insightful (I think I’m right here!) comment and maths and statistics. I particularly enjoyed RGB’s contribution (still enjoying it) and thinking about all that expertise that is not getting tapped by climatologists.
I have a very small contribution/comment to make, regarding CO2’s influence. I believe that the very close linear fit of CO2 concentration in the atmosphere to time effectively makes it redundant as a predictor. OK, it varies by the month very regularly, but climate/temperature with all its complexities must lag CO2 by a substantial time. It certainly won’t respond on a monthly basis. It happens to be the only possible forcing that humanity just might be in a position to “control” – though there’s little sign of that yet! So, use CO2 concentration or Time, but not both.
For any climate model or prediction to have any practical value, it seems to me that the potential predictors /must/ lead the dependent variable, frequently global surface temperature, by a substantial time interval. If one uses current values of the predictors, the stats reduce to not a lot more than correlations, which, interesting though they might be, are of theoretical, not practical, value. For that we need leading indicators. So the question that arises is: what is a useful lead time for decision makers? Is it weeks, months or years? As an industrial bench chemist it dawned on me after several years that useful technical discoveries (that is, profitable ones) seemed to depend on unusual circumstances, such as an unexpected and significant two factor interaction – to use analysis of variance terminology – and these approach the frequency of the fabled hen’s teeth in the real world. In climate, with its horrendously complex possibilities that RGB so fluently described, I fear that the magic predictive combination of potential independents and their interactions is destined to remain mysterious, even assuming the (unlikely) event that they exist and are linear in their actions.
I hope to read more input wisdom from people like RGB and Matthew R Marler
A few comments in response to a rather witty and decidedly stringent critique (speaking only for myself and not my co-authors):
A principal goal of our study was to estimate the relative contributions of anthropogenic and natural climate forcing on the recent global mean temperature history of the atmosphere. To this end we employ a straightforward empirical statistical analysis via a regression model wherein we assume a linear relation between the observed temperature and a set of physically distinct and plausible explanatory variables (predictors). A typical set of explanatory variables includes the known radiative forcings, and additional factors characterizing the oceanic influence on climate. In this latter category we include the AMO (Atlantic Multi-decadal Oscillation) index, which is intended to represent some important aspects of the large scale oceanic thermohaline circulation that transports heat from tropical to polar regions. However, the AMO is presently measured by monitoring the sea surface temperatures (SST) of the North Atlantic, and this makes it vulnerable to the criticism that it really amounts to just a regional temperature time series, which therefore should not be put in the category of a predictor or forcing term, but rather a response thereto. This is the position taken by Willis, who asserts that to regard the AMO as a predictor basically amounts to cheating. Accordingly, a closer look at what the AMO represents is warranted.
Of course at the outset one can defend the reasonableness of the premise that the AMO as measured represents primarily a forcing effect of an ocean circulation, as the point of constructing the index was to somehow capture the capacity of ocean currents to transport heat anomalies to the atmosphere. Although the physics of this transport and the accompanying exchange processes are poorly understood, at least the qualitative notion that it is occurring seems quite sound. Otherwise, if it were just to be regarded as no more than a kind of proxy regional temperature measurement, the AMO would devolve into insignificance and would rarely be invoked as a determinant influence in any study. Of course, this is a rather weak argument based partially on semantics and the definition of the AMO, so more is needed.
We are relying on detrended North Atlantic SST measurements representing the temperature history of the mixed surface layer in the North Atlantic ocean. Energy exchange across the sea-air boundary is often dominated by an upward transfer of sensible, latent, and heat radiation from sea to air, given the dominance of density, heat capacity, and latent heat of water relative to vapor in air. Temperature changes in the atmosphere will also drive heat fluxes into the layer, but the huge heat capacity of the surface layer relative to that of the atmosphere implies that short term atmospheric trends will be damped out, so the rationale for linear detrending to isolate a true large scale signal seems justified.
As for the reality of that large scale signal, there is considerable empirical evidence available to demonstrate it. For example, from the recent article:
“North Atlantic Ocean control on surface heat flux on multidecadal timescales”, by Gulev, Sergey K.; Latif, Mojib; Keenlyside, Noel; et al., Nature, Vol. 499, Issue 7459, pp. 464+, 25 July 2013, we have the following assessment:
“Direct evidence of the oceanic influence of AMV (Atlantic Multidecadal Variability imprinted in SST’s) can only be provided by surface heat fluxes, the language of ocean-atmosphere communication. Here we provide observational evidence that in the mid-latitude North Atlantic and on timescales longer than 10 years, surface turbulent heat fluxes are indeed driven by the ocean and may force the atmosphere, whereas on shorter timescales the converse is true, thereby confirming the Bjerknes conjecture.”
And as for the nature of the signature, it is known that instrumental sea surface temperature records in the North Atlantic Ocean are characterized by large multidecadal variability. The lack of strong radiative oscillatory forcing of the climate system at multidecadal time scales and the results of long unforced climate simulations have led to a consensus view that the AMO is an internal mode of climate variability. An examination of this hypothesis carried out by Jeff Knight using simulations based on the Coupled Model Intercomparison Project Phase 3 (CMIP3) database has shown that:
“The differences found between observed and ensemble mean temperatures could arise through errors in the observational data, errors in the models’ response to forcings or in the forcings themselves, or as a result of genuine internal variability. Each of these possibilities is discussed, and it is concluded that internal variability within the natural climate system is the most likely origin of the differences.”
(“The Atlantic Multidecadal Oscillation Inferred from the Forced Climate Response in Coupled General Circulation Models”, by Knight, Jeff R., Journal of Climate, Vol. 22, Issue 7, pp. 1610-1625, April 2009)
Our study also includes an empirical test that supports the idea that the AMO represents an important driver for the oceanic influence on climate. The conventional point of view of the climate modeling community is that rising greenhouse gases (GHG) are strongly correlated with (and largely responsible for) rising global temperatures. Therefore, if the AMO were just a proxy for global temperature, as Willis asserts, then the AMO should also be strongly correlated with rising GHG. But we tested that hypothesis and found it failed: in our assumed space of physically distinct forcing functions, the basis vectors representing AMO and GHG were found to be nearly orthogonal. Given the significant correlation found between AMO and global temperature, this strongly suggests that the AMO is a valid driver of global temperature, which at least over the ~100 yr. time period considered has operated essentially independently of the effects of GHG.
For shorter timescales of the order of a decade or less, the study by Gulev et al. (see above) suggests there may be some net transport of turbulent heat flux from atmosphere to ocean, i.e., on this scale the premise that the AMO is an (independent) explanatory variable responsible for the behavior of the (dependent) air temperature does indeed become questionable. And in fact, our regression analysis has shown that to some extent the AMO can subsume the roles played by short term radiative forcings due to solar input, volcanoes, and the El Nino/Southern Oscillation (ENSO). This is evidence that some imprinting of those explanatory variables on the SST measurements of the North Atlantic does in fact occur, and therefore on this short timescale the AMO should not be regarded as an independent driver of air temperature.
But once again we should reiterate that on the longer timescales of interest in our study (decadal and more), Gulev et al. find the energy flow is from ocean to atmosphere, in support of the use of the AMO as an independent predictor of air temperature in regression models.
—
A separate complaint by Willis has to do with our selection of categories of radiative explanatory variables. Here we were simply trying to take into account the fact that radiative forcing factors that operate according to different physical mechanisms should be treated separately, at least in principle, even though our simple regression model doesn’t take those differences into account explicitly. Our point of view in this case has been understood and defended earlier in these blog comments by Martin Lewitt, which we repeat here:
Martin Lewitt says:
March 16, 2014 at 9:33 am
Assigning different weights to different W/m^2 forcings is actually a refreshing admission of complexity and nonlinearity. Each of the forcings couples to the climate differently in vertical and geographical distribution, and in some cases chemically (as in solar generating ozone), etc. In a nonlinear dynamic system it is the assumption that they were all equivalent that would have to be justified. Representing each forcing by the variation in a globally and annually averaged W/m^2 figure is a poor proxy for these coupling differences, but then a grid from the ocean mixing layer to the stratosphere is the reason we have AOGCMs. The reason the author didn’t explain the allowing of different weights is that it is common knowledge. Here are other refreshing acknowledgements of the implications of non-linear dynamics: Knutti and Hegerl state in their 2008 review article in Nature Geoscience: “The concept of radiative forcing is of rather limited use for forcings with strongly varying vertical or spatial distributions.” And this: “There is a difference in the sensitivity to radiative forcing for different forcing mechanisms, which has been phrased as their ‘efficacy’.”
James, let me start by saying that you have earned great respect for your willingness to defend your own work against its detractors. It is a refreshing and rare thing to find in the climate discussion. My comments follow.
James D. Klett says:
March 24, 2014 at 3:11 pm
Let me say that you are correct about my objection. Since you include the detrended temperature of a good chunk of the planet among the predictors, of course your prediction will improve.
I fear I can’t make sense of this. It seems you are saying that because ocean circulation affects North Atlantic temperatures, it’s OK to use the NA temperatures as a predictor … how does that work, exactly?
While this is true, I’m not clear what you think that means or implies.
Your argument here seems to be that because you have linearly detrended the temperature signal, it’s OK to use it as a predictor … once again, how does that work? For example, look at my Figure 1:

A recurring problem with the models is that they have a hard time emulating the see-saw action of the temperature. As a result, when they use only the first five variables above (that is, everything except the AMO), they cannot replicate the temperature’s rise to 1945, fall to 1975, rise to 1998, and subsequent flattening. But when you add in the AMO, presto! All of those problems are resolved.
Why?
Because you’ve included a good chunk of what is to be predicted among your explanatory variables … and because it is part of the global temperature, it contains the gross variations in the global temperature.
So you’re telling me that they used models to look at whether the observations or the models were at fault, or whether it was internal variability, which is climatespeak for “we don’t have a clue” … and the surprising result was, the models were very clear that the models weren’t at fault.
I’m sorry, good sir, but I fear that doesn’t establish much …
Fallacy of the excluded middle. There is a third possibility that you fail to consider. This is the possibility that the idea that CO2 is the secret climate control knob is … well, let me call it highly unlikely. My research indicates that changes in forcing, whether from CO2 or other sources, have little effect on the temperature.
In addition, you say ” if the AMO were just a proxy for global temperature, as Willis asserts …”. I fear I have asserted no such thing. I assert that the AMO is a PART of the global temperature, not a proxy for the temperature, and as such it cannot be included as an explanatory variable for emulating global temperature.
Glad to hear it. That observation certainly lends credence to my point of view, which is that CO2 and other GHG levels have little to do with the temperature … so thanks for the evidence supporting my hypothesis.
No, not at all. It strongly suggests that the AMO is a PART of the global temperature. Siberia is as well correlated with the global temperature as is the North Atlantic … does that significant correlation imply that Siberia is a “valid driver of global temperatures”???
I don’t follow the logic in that at all. If you want to use the AMO as a PREDICTOR as you say, then you have to use this year’s AMO to predict next year’s weather. I have no problem with that effort.

But that doesn’t mean you can use this year’s AMO as an independent explanatory variable to predict this year’s weather. That’s including part of the predictand as an explanatory variable. No bueno.
Let me see if I can explain it another way. Here is the correlation between global temperature and gridcell temperature:
Now, this is monthly CERES data, looking at the long-term (decadal) correlation of the gridcell temperature with the global temperature (after removal of monthly climatology).
As you can see, the North Atlantic is far from the only place that has a strong correlation with the global temperature. For example, I’ve highlighted the Nino3.4 region, and there are many others.
But if I’m trying to emulate the global temperature, I cannot use this year’s temperature of any of those regions to help predict this year’s global temperature. As I said in the head post, it’s cheating, whether the area in question is the Nino3.4 region, Siberia, or the North Tropical Atlantic, all of which are correlated with global temperature.
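For reference, a correlation map like the one above can be computed along these lines (a sketch under assumed inputs; `t_grid` is a (n_months, n_lat, n_lon) array of gridcell temperatures):

```python
import numpy as np

def correlation_map(t_grid):
    """Correlation of each gridcell series with the global mean temperature,
    after removing the monthly climatology (seasonal cycle) from every cell."""
    n = t_grid.shape[0]
    anom = t_grid.astype(float).copy()
    months = np.arange(n) % 12
    for m in range(12):                 # subtract each calendar month's mean
        anom[months == m] -= anom[months == m].mean(axis=0)
    glob = anom.mean(axis=(1, 2))       # unweighted global mean for brevity;
                                        # use cos(lat) area weights in practice
    a = anom - anom.mean(axis=0)
    g = glob - glob.mean()
    return (a * g[:, None, None]).mean(axis=0) / (a.std(axis=0) * g.std())
```

Every gridcell with a high value on that map would be equally “useful”, and equally illegitimate, as a same-year explanatory variable for the global mean.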
Since the indirect and the direct aerosol effect operate on completely different principles and involve completely different physical mechanisms, and airborne black carbon involves a third physical mechanism and black carbon on snow involves a fourth distinct mechanism, I fear that your explanation doesn’t …
I am quite familiar with the literature on the “efficacy” of different forcings, which is pretty skinny. Mostly it’s one study, by James Hansen and two dozen co-authors. Basically, they find a difference in efficacy (using climate models) of about 2:1 between the various forcings, based on model results.
You, on the other hand, have arbitrarily divided ten forcings into four groups, and then assigned freely-variable parameters to each one. That is an entirely different procedure from what they have done in the “efficacy” study … and it is a procedure that has no physical justification, not even the poor justification of a climate model. It is merely a parameter-fitting exercise … and I assume you are familiar with the story of Freeman Dyson and the elephant?
Here are the pseudo-efficacies from the fitting process:

(Intercept)     GHG      AER     VOLC     SOL     ENSO     AMO
     -0.291   0.231   -0.054    0.042   0.139    0.004   0.457

Heck, the best fit actually has a negative weight for aerosols … and the “efficacies” vary by a factor of six from volcanoes to GHGs … meaningless.
As a result, I’m sorry but I still see no justification, either physical or theoretical, for using a fitting procedure to assign pseudo-efficacy values to randomly assigned groupings of variables … and I see lots of reasons that it’s a bad idea.
Again, my thanks for your participation, and if you still disagree and/or have questions (as is extremely likely), I’m glad to discuss any and all of this further.
w.
Willis, it appears very little headway has been made in narrowing the differences of perception as to what was or was not accomplished in this particular climate study, but at a minimum some civil discourse (at least by the standards so often seen in this polarized arena of “climate change”) is on the record here for others to ponder as they wish, and that may prove helpful to some.
In a situation like this there are rapidly diminishing returns expected from extending the discussion further. Nevertheless, I will respond briefly to a couple of questions, since the answers illustrate some principles under debate with specific and easy to understand examples.
Question 1:
“Siberia is as well correlated with the global temperature as is the North Atlantic … does that significant correlation imply that Siberia is a “valid driver of global temperatures”???”
If by “Siberia” you mean air temperatures measured at meteorological stations located in Siberia, then the answer would be “no”; i.e., in this case a subset of near-surface global air temperatures are being measured. Since no exceptional physical influences are expected at such Siberian stations compared to other locations on the planet, we expect these temperature measurements to be fairly representative of the global mean values, allowances being made for the latitudinal effect of reduced solar insolation in Siberia.
But note there was a time when the answer would have been an emphatic “yes”: About 250 million years ago, a mantle plume erupted through the crust in Siberia, which likely caused the Permian-Triassic extinction event. According to Wikipedia, this event, “also called the Great Dying, affected all life on Earth, and is estimated to have killed 90% of species living at the time”. So when there was a flux of energy across the Earth’s surface to the air in Siberia, that flux and the attendant surface temperature were at that time valid drivers of global air temperature.
Similarly, today on a much smaller scale there is considerable evidence that there is a time dependent net flux of energy across the surface of the North Atlantic into (and at times out of) the air above, and the AMO is thought to provide a signal for that phenomenon. To the extent that is true, the AMO can also be employed as a valid driver of global air temperature.
The fact that the AMO signal is a temperature time series for the surface mixing layer does not by itself disqualify it in this role. There seems to be an idee fixe in play that because the AMO is a temperature series, like the global mean air temperature, it must somehow automatically be equivalent to the latter, or at least to “a good chunk” of the latter; this is a kind of association fallacy. One should keep in mind, for example, that in addition to the oceanic thermo-haline heat flux being monitored, the connection between temperatures in the mixing layer and the air above involves many nearly, and some outright, intractable, stochastic and nonlinear processes; the two temperature series are simply not physically equivalent surrogates. In this regard, too, it is worth emphasizing again that the SST measurements are not taken right at the ocean surface, nor are they intended to describe the temperature there:
“All of these ship and buoy SSTs are estimates of some type of bulk SST, which does not actually represent the temperature at the surface of the ocean. This bulk SST is of significant historical importance since it has been used in the formulation of all of our so-called “bulk” air/sea heat flux formulae and because it supplies an estimate of the local heat content. Numerical models today require the input of some form of bulk SST for their computation in spite of the fact that it is the skin SST that is in contact with and interacts with the overlying atmosphere. Some people think that the difference between the skin and bulk SSTs is a constant to account for the cooler skin temperatures. This is not the case as the skin SST is closely coupled to the atmosphere-ocean exchanges of heat and momentum making the bulk-skin SST difference a quantity that varies with fairly short time and space scales depending on the prevailing atmospheric conditions (wind speed and air-sea heat flux).”
(Estimating Sea Surface Temperature from Infrared Satellite and In Situ Temperature Data, Emery, W. J.; Castro, Sandra; Wick, G. A.; Schluessel, Peter; Donlon, Craig; Bulletin of the American Meteorological Society . Dec2001, Vol. 82 Issue 12, p2773. http://icoads.noaa.gov/advances/emery.pdf)
Question 2:
“I assume you are familiar with the story of Freeman Dyson and the elephant?”
This is an amusing story, and I’m a great fan of any anecdotes about the legendary figures. But when John von Neumann said “… with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”, he was talking about parameters that could be adjusted arbitrarily. For example, for the case of a line, y = ax + b, you can produce lines located anywhere and pointing any which way in the plane, i.e., of any slope and any intercept, by varying parameters (coefficients) a and b independently and arbitrarily. But if you are fitting that line to data or a general function over an interval by a least squares procedure, a unique pair of parameters a and b is determined, within specified tolerances, and we know the corresponding unique line will appear intuitively appropriate, i.e., devoid of arbitrariness. And if you try fitting a parabola instead, or a cubic, or some higher order polynomial with even more parameters needed for its definition, theory and experience show that the resulting fit to the data or general function likewise produces a unique solution set of parameters, and the fit usually gets better the more parameters that are employed in this fashion, i.e., as more relevant fitting functions are introduced. This is quite analogous to the regression analysis we’ve employed for fitting plausible explanatory variables to the global mean air temperature. Accordingly, trying to get von Neumann (plus Fermi and Dyson) on our case – a scary thought – is in this instance completely off base.
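For the straight-line case, the unique least-squares pair is indeed the familiar closed form

$$\hat a=\frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2},\qquad \hat b=\bar y-\hat a\,\bar x,$$

determined entirely by the data once the model class has been chosen.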
James D. Klett says:
March 30, 2014 at 9:59 pm
Welcome back again, James. I can only agree with the thought above.
I don’t understand this argument at all. You seem to be saying that BECAUSE there is a “time dependent net flux of energy across the surface of the North Atlantic”, that somehow it’s OK to use the detrended North Atlantic temperature to predict the global temperature. But the same is true of every ocean everywhere in the world—there is a time dependent net flux of energy across the surface of the ocean everywhere that I know of … and? How does this somehow make it OK to use data from tomorrow to predict tomorrow as you are doing with the AMO?
Is the AMO signal equivalent to a “good chunk” of the global temperature? Well, here’s the relationship:

So regardless of your unscientific attempt to denigrate the claim by calling it an “idee fixe”, it’s actually the “idee juste”, the correct idea, because in fact the AMO IS equivalent to a “good chunk” (correlation +0.68) of the HadCRUT4 data. It’s obvious in the plot above.
No one said that they are “physically equivalent surrogates”. I said that since the AMO is nothing but detrended temperature, it is cheating to use it or any other detrended temperatures to predict the temperature. Why is that so hard to understand?
While that is true, I fail to see the relevance. If the processes and phenomena were as different between the AMO and the HadCRUT4 measurements as you claim, then why is the correlation AMO-HadCRUT4 almost 0.7???
Finally, let me say it again. The AMO is nothing but a detrended part of HadCRUT4. As such, using the AMO as an explanatory variable for HadCRUT4 doesn’t pass the laugh test.
Here’s an equivalent example. Suppose I’m trying to predict the S&P 500 stock price. Is it legitimate for me to use the detrended values of a hundred of the 500 stocks to predict the stock prices?
Absolutely not. Why? Because you don’t have the data at the time when you’re trying to make the prediction.
… and that’s exactly what you’ve done in your analysis.
Absolutely not. He was indeed talking about parameters that can be adjusted arbitrarily. However, Dyson didn’t pick the numbers arbitrarily. In both your case and Dyson’s case, the arbitrarily adjustable parameters were adjusted and fitted to give the correct answer … no difference at all.
You placed no restrictions that I can see on your parameters, which assuredly means that they are “arbitrarily adjustable”. Of course, like Dyson, you picked values that solved the problem … so what? They are still arbitrarily adjustable.
Again, the question is not whether the fitting is “devoid of arbitrariness”. It is whether the numbers are fitted by whatever manner to give the right answer.
Hayieee … that’s going exactly the wrong direction. Why is this so hard to explain?
OK, let me try it this way. Please distinguish for me the following two sentences. First your statement about your parameters:
Then Fermi’s statement about Dyson’s parameters:
“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”
I ask because I see no difference between them … but obviously you think your statement is a good thing for your results, and Fermi’s statement is a bad thing for your results. How can that be?
Indeed it is, and that is exactly the problem. You are fitting freely variable parameters. In the trade this is called a “curve fitting exercise”, and as Fermi pointed out, it doesn’t mean a damn thing. Just like yours, Dyson’s parameters were arbitrarily adjustable. Just like yours, Dyson’s parameters were fitted to give the best results.
And just like yours, Dyson’s freely-variable fitted parameters made his whole equation totally meaningless …
You’ll have to explain that one to me again. The fact that freely adjustable parameters are fitted by a mathematical procedure rather than set in some other manner does NOT make them acceptable, or turn them into something different than what Dyson warned us against. Freely adjustable parameters are just that, no matter what means you take to adjust them.
Again, my thanks for your continued participation, and if you have further points, comments, or questions I’m here.
w.