Chylek Imitates Ouroboros

Guest Post by Willis Eschenbach

Bob Tisdale has a detailed post on the new 2014 paper entitled “The Atlantic Multidecadal Oscillation as a dominant factor of oceanic influence on climate” by Chylek et al. Nic Lewis also did a good analysis of the paper, see the Notes below for the links. I have a different take on it than theirs, one which centers on the opening statement from their abstract:

ABSTRACT: A multiple linear regression analysis of global annual mean near-surface air temperature (1900–2012) using the known radiative forcing and the El Niño–Southern Oscillation index as explanatory variables account for 89% of the observed temperature variance. When the Atlantic Multidecadal Oscillation (AMO) index is added to the set of explanatory variables, the fraction of accounted for temperature variance increases to 94%. …

They seem impressed with a couple of things. The first is that their four aggregated forcings of greenhouse gases (GHGs), aerosols, volcanic forcings, and solar variations, plus an ENSO dataset, can emulate the global average temperatures with an adjusted R^2 of 0.89 or so. The second thing that impresses them is that when you add in the AMO as an explanatory variable, the R^2 jumps up to 0.94 or so … I’m not impressed by either one, for reasons which will become clear.

Figure 1. Forcings used in the Chylek et al. analysis of the Atlantic Multidecadal Oscillation. Note the different scales in each panel.

There are several problems with the analysis done in Chylek 2014. Let me take the issues in no particular order.

PROBLEM THE FIRST

Does anyone but me see the huge issue inherent in including the Atlantic Multidecadal Oscillation (AMO) Index among the explanatory variables when trying to emulate the global surface temperature?

Perhaps it will help if I post up the explanation of just how the AMO Index is calculated …

From their link to the AMO dataset (see below)…

The [AMO] timeseries are calculated from the Kaplan SST dataset which is updated monthly. It is basically an index of the N Atlantic temperatures. …

Method:

Use the Kaplan SST dataset (5×5).

Compute the area weighted average over the N Atlantic, basically 0 to 70N.

Detrend that time series

Optionally smooth it with a 121 month smoother.

In other words … the AMO is just the temperature of the North Atlantic with the trend removed.
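
Just to make that recipe concrete, here is a minimal sketch of it in Python. It assumes the North Atlantic (0-70N) cells have already been pulled out of the Kaplan 5×5 grid; the array names are mine, not part of any official code:

import numpy as np

def amo_index(sst, lat, smooth=True):
    # sst: (n_months, n_cells) North Atlantic SSTs; lat: (n_cells,) latitudes in degrees
    w = np.cos(np.radians(lat))                # area weights on a lat-lon grid
    natl = (sst * w).sum(axis=1) / w.sum()     # area-weighted N. Atlantic mean SST

    t = np.arange(len(natl))
    slope, intercept = np.polyfit(t, natl, 1)  # fit a straight line to the series ...
    amo = natl - (slope * t + intercept)       # ... and subtract it (detrend)

    if smooth:                                 # optional 121-month running mean
        amo = np.convolve(amo, np.ones(121) / 121, mode="same")
    return amo

Strip away the bookkeeping and the output is exactly what it says on the tin: the detrended temperature of the North Atlantic, nothing more.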

So let me ask again … if we’re trying to emulate the “global annual mean near-surface air temperature for the period 1900-2011”, will it help us if we know the detrended North Atlantic temperature for the period 1900-2011 … or is that just cheating?

Me, I say it’s cheating. The dependent variable that we are trying to emulate is the global surface temperature. But they have included the North Atlantic temperature, which is a large part of the very thing that they are trying to explain, as an explanatory variable.

But wait, it gets worse. The El Nino index that they use is a fairly obscure one, the “Cold Tongue Index”. It is described as follows (emphasis mine):

The cold tongue index (CTI) is the average SST anomaly over 6N-6S, 180-90W (the dotted region in the map) minus the global mean SST.

There are a number of El Nino indices. One group of them is the detrended average of the sea surface temperatures in various areas—El Nino 1 through El Nino 4, El Nino 3.4, and the like. There is also the MEI, the Multivariate ENSO Index. Then there are pressure-based indices like the SOI (Southern Oscillation Index), based on the difference in pressure between Tahiti and Darwin, Australia.

There’s an odd wrinkle in the cold tongue index (CTI), however: the CTI is not detrended. Instead, they subtract the global average sea surface temperature (SST) from the average temperature in the CTI area of 6°N/S, 180° to 90°W.

But this means that they’ve included, not just the average temperature of the CTI area, but also the entire global SST as a part of their explanatory variable, because:

CTI Index = CTI Sea Surface Temperature – Global Mean Sea Surface Temperature

I ask again … if you are trying to emulate the “global annual mean near-surface air temperature for the period 1900-2011”, will it help if an explanatory variable contains the global mean sea surface temperature for the period 1900-2011 … or again, is that just cheating?

I have to say the same as I said before … cheating. Using some portion of this year’s global temperature data (e.g. North Atlantic SSTs or CTI SSTs or global SSTs) to predict this year’s global temperature data is not a valid procedure. I’m sure my beloved and most erudite friend Lord Monckton could tell us the Latin name of this particular logical error, but Latin or not … you can’t do ‘dat …
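
If you want to see how much mileage that kind of circularity buys, here is a toy demonstration of my own (random numbers, not their data or code): let the “global temperature” be pure noise, and build a CTI-style predictor by subtracting that very noise from an unrelated regional series:

import numpy as np

rng = np.random.default_rng(0)
n = 113                               # one value per year, 1900-2012
temp = rng.normal(size=n)             # pure noise standing in for global temperature
region = rng.normal(size=n)           # unrelated "regional SST", also pure noise

cti_like = region - temp              # CTI-style index: regional SST minus the target

# ordinary least squares of temp on cti_like, with an intercept
X = np.column_stack([np.ones(n), cti_like])
beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
resid = temp - X @ beta
print(1 - resid.var() / temp.var())   # R^2 of about 0.5 ... from pure noise

Because the predictor contains the target by construction, about half the variance of pure random noise gets “explained”. That is the kind of R^2 boost that costs nothing and means nothing.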

Which is why, although the authors seem to be impressed that including the AMO increased the adjusted R^2 up to 0.94, I’m not impressed in the slightest. You can’t use any part of what you are trying to predict as a predictor. See how the AMO index (bottom right, Fig. 1) goes down until 1910, then up until 1940, down until 1970, and then up again? Those are the North Atlantic version of the very swings in temperature that we are trying to explain, so you absolutely can’t use them as an explanatory variable.

PROBLEM THE SECOND

Let’s look at just the forcings used in the climate models, setting aside the ENSO and AMO variables. Chylek 2014 uses the GISS forcings, which are composed of the following separate datasets:

Figure 1a. The ten categories of forcing in the GISS forcing dataset. Note the different scales for each panel.

Now, for anybody who thinks that e.g. ozone levels in the atmosphere actually look like that … well, that seems highly doubtful to me. But while that is a problem in and of itself, it’s not the problem in this context. The problem here is that all of these forcings are measured in watts per square metre (W/m2), so watt for watt they should all have the same effect … but Chylek et al. do a strange thing. They add together the well-mixed GHGs plus ozone plus stratospheric H2O into one group they call “GHGs”. Then they put reflective aerosols, aerosol indirect, black carbon, and snow albedo into a second group they call “Aerosols”. Volcanic forcing is treated as a third separate group, solar is the fourth, and land use is ignored entirely. This grouping is shown in Figure 1 above.

Then each of these four groups (GHGs, Aerosols, Volcanoes, and Solar) gets its own individual parameter in their equation … but this means that a watt per square metre (W/m2) from aerosols, a W/m2 from solar, and a W/m2 from GHGs all have very, very different effects … and they make no effort to explain or justify this curious procedure.
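
For clarity, here is a sketch of the regression structure in question, T = c0 + c1*GHG + c2*AER + c3*VOL + c4*SOL, written as a general least-squares helper (placeholder names; I have not seen their code):

import numpy as np

def fit_ols(temp, *regressors):
    # OLS fit of temp on an intercept plus each regressor. Because every
    # regressor gets its own free coefficient, a W/m2 of GHG forcing is
    # allowed to "weigh" differently from a W/m2 of volcanic or solar forcing.
    X = np.column_stack([np.ones_like(temp), *regressors])
    beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
    fit = X @ beta
    n, k = X.shape
    r2 = 1 - ((temp - fit) ** 2).sum() / ((temp - temp.mean()) ** 2).sum()
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)  # penalized for the parameter count
    return beta, fit, adj_r2

Their four-group fit is then fit_ols(temp, ghg, aer, vol, sol), with temp the observed series and the rest the grouped forcings.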

PROBLEM THE THIRD

Here’s an odd fact for you. They are impressed that they can get an R^2 of 0.88 or something like that (if they cheat and include the entire global SST within the “explanatory” variables of their model). I can get close to that, 0.87. However, let’s start by calculating the R^2 of a much simpler model … the linear model. Figure 2 shows the GISS Land-Ocean Temperature Index (LOTI), and a straight-line emulation. The odd fact is the size of the R^2 of such a simplistic model …

Figure 2. The simplest possible straight-line model. Black is GISS LOTI, red is the emulation.

Note that the R^2 of a straight line is quite high, 0.81. So their R^2 of 0.88 … well, that’s not all that impressive.
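
Reproducing that baseline takes only a couple of lines (a sketch; years and loti stand for the dates and the GISS LOTI annual anomalies):

from scipy.stats import linregress

def straight_line_r2(years, loti):
    return linregress(years, loti).rvalue ** 2  # R^2 of the straight-line "model"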

In any case, here are a few more emulations, with their corresponding adjusted R^2. First, Figure 3 shows their group called “aerosols” (AER) along with the volcanic forcing (VOL):

Figure 3. Emulation using the Chylek groups “Aerosols” (AER) and “Volcanic Forcing” (VOL). Note that watt for watt, the aerosols have about six times the effect of the volcanoes.

Now, even this bozo-simple (and assuredly incorrect) emulation has an adjusted R^2 of 0.854 … or, if you don’t like the use of aerosols, Figure 4 shows the same thing as Figure 3, but with GHGs in place of aerosols:

Figure 4. Emulation using solely GHGs (GHG) and volcanoes (VOLC). Note that watt for watt, the GHGs have about three times the effect of the volcanoes.

There are a couple of issues revealed by this pair of analyses, using either GHGs or aerosols. One is that you can hardly see the difference between the two red lines in Figures 3 and 4. Obviously, this means that getting a good-looking match and a fairly impressive-sounding adjusted R^2 means absolutely nothing about the underlying reality.
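
In terms of the fit_ols sketch above, the two rival emulations differ only in which columns you hand to the regression (again assuming the series have been loaded into temp, ghg, aer, and vol):

beta3, fit3, r2_3 = fit_ols(temp, aer, vol)  # Figure 3: aerosols + volcanoes
beta4, fit4, r2_4 = fit_ols(temp, ghg, vol)  # Figure 4: GHGs + volcanoes

Two physically incompatible stories, one pair of nearly indistinguishable fits.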

Another issue is the difference between the strengths of the supposedly equivalent W/m2 values from GHGs, aerosols, and volcanoes.

Having seen that, let’s see what happens when we use all of the Chylek forcings but leave out the cheating variables (ENSO and AMO). Figure 5 shows the emulation using the sun, the aerosols, the volcanoes, and the greenhouse gases:

Figure 5. Emulation using the four Chylek et al. groupings (greenhouse gases GHG, aerosols AER, volcanic VOLC, and solar SOL) of the ten GISS forcings.

Note that again, watt for watt the volcanoes are only about a third of the strength of the GHGs. The solar forcing gets a very large coefficient, presumably because the solar variations themselves are quite small … which highlights another problem with this type of analysis.

So that’s the third problem. They are giving different strengths to different types of forcings, without any justification for the procedure. Not only that, but the variation in the strengths is three to one or more … I see no physical reason for their whole method.

PROBLEM THE FOURTH

Now we’ve seen what happens when we’re not cheating by using a portion of the dependent variable as an explanatory variable. So let’s start cheating and add in the ENSO data.

Figure 6. Emulation using all of the GISS forcings plus the ENSO cold tongue index.

As I said, I couldn’t quite replicate their 0.88 value, but that comes close.

Now, before I go any further, let me point out a shortcoming of all of these emulations in Figs 2 to 6. They do not catch the drop in temperatures around 1910, or the high point around 1940, or the drop from around 1940 to 1970. Even including all of the forcings, and (improperly) giving them different weights, Figure 6 above still shows these problems.

However, all of these global average temperature changes are clearly reflected in the corresponding temperature changes in the North Atlantic ocean … take another look at the bottom right panel of Figure 1. And so of course when they (improperly) include the AMO as an explanatory variable, you get a much better adjusted R^2 … duh. But it means nothing.

Figure 7. Emulation using all of the variables, including the Atlantic Multidecadal Oscillation (AMO).

PROBLEM THE FIFTH

All of the above is made somewhat moot by a deeper flaw in their analysis. This is the lack of any lagging of the applied forcings. IF you believe in the forcing fairy, then you have to believe in lags. Me, I don’t think that the changes in global average temperature are a linear function of the changes in global average forcing. Instead, I think that there are strong emergent temperature regulating mechanisms acting at time scales of minutes to hours, largely negating both the changes from the forcing and any associated lags. So I’m not much bothered by lags.

But if you think that global average temperature follows forcing, then you need to do a more sophisticated lagged analysis involving at least one time constant.
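
The simplest such treatment is a single-time-constant (exponential) lag applied to each forcing before the regression; a minimal sketch, where tau itself would have to be fitted or physically justified:

import numpy as np

def lagged(forcing, tau):
    # one-box exponential response: the lagged series relaxes toward the
    # instantaneous forcing with e-folding time tau (in time-step units)
    alpha = 1.0 - np.exp(-1.0 / tau)
    out = np.empty(len(forcing))
    out[0] = forcing[0]
    for i in range(1, len(forcing)):
        out[i] = out[i - 1] + alpha * (forcing[i] - out[i - 1])
    return out

Regressing temperature against lagged(F, tau) rather than against F itself is the minimum that their own forcing-driven premise demands.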

CONCLUSIONS

• I find the analysis in Chylek 2014 to be totally invalid because they are including parts of the dependent variable (ENSO and AMO) as explanatory variables. Bad scientists, no cookies.

• As is shown by the examples using either GHGs or aerosols plus volcanoes (Figs. 3 & 4), a good fit and an impressive adjusted R^2 mean nothing. We get equally strong and nearly indistinguishable results using either GHGs or aerosols. This is an indication that this is the wrong tool for the job. Heck, even a straight line does a reasonable job, R^2 = 0.81 …

• Giving different weights to different kinds of forcing (e.g. volcanic, solar) is a novel procedure that requires strong physical justification. Yet they have not provided any justification for the procedure.

• As you add or vary the explanatory variables, their parameters change. Again, this is another indication that they are not using the right tool for the job.

• The lack of any consideration of lag in the analysis is in contradiction to their assumption that changes in the global surface temperature are a linear function of changes in global average forcing.

Best to everyone,

w.

De Rigueur: If you disagree with something I or anyone else says, please quote their exact words. That way, we can all be clear on exactly what you are objecting to.

LINKS:

Chylek Paper

Bob Tisdale’s Analysis

Nic Lewis’s analysis

DATA:

GISS Forcing

CTI

GISS LOTI

AMO


COMMENTS:
March 16, 2014 4:52 pm

“I notice you are lumping all skeptics together.”
Just the ones that howl inconsistently. Petr sent me this paper a bit ago.
My first observation was the same as Willis’s: hmm, don’t use AMO like that.

Frank Kotler
March 16, 2014 4:59 pm

Paul Linsay says:
March 16, 2014 at 4:23 pm

I just noticed that and scrolled down here to comment. Beat me by “that” much! Doesn’t affect the analysis. Thanks, Willis!

Rob
March 16, 2014 5:24 pm

As a trained meteorologist who has been studying this for the last 20 years, I would say solar and land use should likely be number 1 and 2. You drive from the outskirts of town at night and it’s 35 F, and into the downtown area it’s closer to 40. We’ve increased the population of the world greatly. It’s just that a little experience and intuition go a long way.

tokyoboy
March 16, 2014 5:53 pm

Ourororos, not Ourboros?

tokyoboy
March 16, 2014 5:54 pm

Oops.
Ouroboros, not Ourboros?

Matthew R Marler
March 16, 2014 7:57 pm

Me, I say it’s cheating. The dependent variable that we are trying to emulate is the global surface temperature. But they have included the North Atlantic temperature, which is a large part of the very thing that they are trying to explain, as an explanatory variable.
But wait, it gets worse. The El Nino index that they use is a fairly obscure one, the “Cold Tongue Index”. It is described as follows (emphasis mine):
The cold tongue index (CTI) is the average SST anomaly over 6N-6S, 180-90W (the dotted region in the map) minus the global mean SST.

We come back to the case that if U and V are random variables, and if W is defined as W = U – V, then W is by definition positively correlated with U and negatively correlated with V. The more general case is discussed by T. W. Anderson, An Introduction to Multivariate Statistical Analysis, second edition, Theorem 2.4.1 p 25. He adds the distributional result for the case that U and V have a joint bivariate normal distribution, but the result for mean and variance is general (at least when U and V have variances). I have been waiting for the right time to bring this to your attention.
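In symbols, for W = U − V with finite variances,
Cov(W, U) = Cov(U − V, U) = Var(U) − Cov(U, V),
so whenever Cov(U, V) < Var(U) — for instance, when U and V are independent — W is positively correlated with U by construction, and negatively correlated with V whenever Cov(U, V) < Var(V).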
So, yes, more than just the high R^2 value is necessary in order to show that something has been revealed about the processes generating the nice regression.
On the other hand: (1) now that it is published, the paper is likely to be studied in graduate schools; (2) the importance of backward elimination for partially resolving problems of multicollinearity of predictors is nicely illustrated; (3) now that the paper has been published, other modelers are likely to follow with improved models; (4) the test of the utility of the model will be how well it (and its descendants) fit out-of-sample data; (5) as I wrote before, it is “of a piece with” Vaughan Pratt’s modeling, in that it attempts to develop reasonable models for the background variation for getting the best estimate of the regression coefficient for the log[CO2] regressor; (6) exactly what the best measure of the ENSO oscillation is will only be elucidated by more work. You presented a lot of information in graphical form in your “Power Stroke” post, and in a response to a comment you outlined a lot of work on developing a (possibly nonlinear) high dimensional spatio-temporal vector autoregressive approach to modeling the transport of heat throughout the Pacific and toward the poles. A better approach in the meantime would probably be the relatively simple Southern Oscillation Index, which is merely a difference between two locations in the Pacific.
Whether log[CO2] is the best measure of the influence of CO2 also requires more work. You can model the last 150 years of temperature data equally well with and without it, but without an adequate knowledge of variation independent of CO2 we can’t test whether the modeled CO2 effect is or isn’t statistically significant. All of the models for natural variation are post-hoc. I made the same point when Vaughan Pratt’s model was posted here.

Matthew R Marler
March 17, 2014 6:44 am

Willis, I know you don’t like being paraphrased so I went back and reread your comments on the power stroke. Here is one that I came up with, though several referred to possibly more analysis along these lines:
I called this post “The Power Stroke” to emphasize that what most people think is important about El Nino/La Nina analyses (temperatures and frequency of occurrence) is secondary. The point is not whether the temperatures in the El Nino region are up or down, nor is it how many times they go up and down.
The point is the size of the power stroke—how much warm water was actually moved? It doesn’t matter whether that happens in three big pushes or ten small pushes. By that I mean people get caught up in counting the frequency of the events, when that misses the point. The critical measurement, which as far as I know has not been done but I’m always surprised when some commenter finds it, is the total amount of energy moved per occurrence/month/year/decade.
In any case, Bob, I’m thinking it should be possible to use the Argo data to get some good numbers regarding how much warm water is being moved. And I’m hoping I’m not the one that ends up doing the shovel work … what we need to look at in my opinion is the total energy content of the mixed layer in the region shown in Figure 2 above. Who was it that did the great Argo work here on WUWT? A commenter named Andrew? Maybe he’d be interested …

I thought of (possibly nonlinear) vector autoregressive processes because they are simpler than complete sets of nonlinear partial differential equations and compartment models. My point is that I think you are correct to focus on mass and heat transport in the Pacific Oscillations. But it is a massive problem.

rgbatduke
March 17, 2014 6:47 am

I probably shouldn’t even comment at all, as I’m too busy to comment and continue to discuss this week (sigh) but:
a) Nonlinear highly multivariate predictive modelling is one of my professional games. I have a goodly amount of personally written high end software for building e.g. genetic algorithm optimized neural networks with as many as hundreds of inputs built to predict subtle, highly nonlinear, highly multivariate behavior patterns on top of e.g. large demographic and transactional databases.
One of the first things one learns in this business professionally is that if your model is too good to be true, it is, well, too good to be true! It usually means, as Willis notes, that some of your input variables are consequences of the factor you are trying to predict, not causes. This happens all of the time, in commerce, in medical research, in mundane science. When you are trying to predict a training set on the basis of a variable that only gets filled in for customers that actually bought some product, the model building process will quickly learn that if it hotwires this variable straight through the complexity it can get pure training set gain. Indeed, you can build a model that will even work well for trial sets — drawn from the same process, that fills in the variable based on the targeted outcome. The problem comes the day you actually try to take your model, that you built for some client, and apply it to a million expensive new prospects that don’t have any value at all for that variable because they haven’t purchased the product — yet — and your expected phenomenally high return withers to where one barely beats random chance because your neural model was distracted by the meaningless correlation from investing neurons in the far more subtle and difficult patterns that might have led to actual substantial — but far more modest — lift above random chance.
The same problem exists — only worse — in medical research. Researchers have to be enormously careful in their model building process not to include “shadow variables” — variables that confound the desired result by having dependent values, e.g. from a test administered asymmetrically to the cancer survivors (who are alive to be tested!) compared to the non-survivors (who aren’t).
b) Regression models in general presuppose a functional understanding of the causal relationship. Multivariate linear regression models presuppose independent, linear causes. That is, if one doubles (say) the solar forcing, one doubles the linear contribution to the overall average surface temperature (which is, note well, not what is being fit because it is unknown within error bars the width of the entire temperature anomaly variation plotted, leading us to consider the problem of fitting a linear model with an unknown constant term with a high degree of uncertainty and the puzzling alteration of R^2 from anything you like to almost nothing attendant on its inclusion, but that is another story). If one simultaneously doubles the aerosol screening, it will cause a doubling of a subtracted term. But there won’t be any nonlinear correction to the aerosol term from the solar term — such as might exist if the solar-radiation-aerosol connection turns out to be physically valid and significant, such that increasing solar activity causes a decreased average aerosol contribution or, worse, directly mucks around with albedo.
But that isn’t quite right, is it? We aren’t talking about doubling solar activity. We are talking about doubling a solar activity anomaly — if we doubled solar activity, we would be in serious trouble. In one sense, this is a better thing. From the humble Taylor series, we can expect that even if there is a nonlinear direct functional relationship between solar state and temperature, for small changes in solar state the temperature should respond linearly with small changes of its own.
The problem then becomes “what’s a small change?” Or, when do second or higher order non-linear effects become important. The Taylor series itself cannot answer that, but the assumption of linearity in the model begs the question! A multivariate linear independent assumption only makes the problem worse. At some point, one is making the elephant wiggle its trunk, still without proposing anything like a meaningful model of an elephant.
Personally, I think there is little doubt that the climate is a non-linear, strongly-coupled system. Here’s why. Because it is stable. It oscillates around long-term stable attractors. It bounces. It varies substantially over time. The linear part of that is nearly all Mr. Sun, which provides a baseline energy flux that varies very slowly over geological timescales. Within the very broad range of possible climates consistent with this overall forcing, the climate system is non-Markovian, chaotic, nonlinear, multivariate, strongly coupled, and with whole families of attractors governing transient cyclic dynamics that bounce the planet’s state around between them while the attractors themselves move around, appear, and disappear. Linearizing this dynamics as if the linear model has predictive value is utterly senseless. So is (don’t get me started!) fitting nonlinear models, voodoo models, models with confounding variables, models that connect the phases of the moons of Jupiter — no matter how good a correlation, with however super an R^2, one can come up with in a six or eight parameter model. I can come up with the Latin for that one myself: post hoc ergo propter hoc. Correlation is not causality, and in fact can often either get causality backwards — as in this case — or get correlation by pure accident.
There is nothing magical about Taylor series. There is nothing magical about Weierstrass’ theorem (a general justification for fitting higher order polynomials to bounded segments of data in such a way that will almost never extrapolate outside of the fit data).
c) OK, so if we can’t just take a bunch of variables and fit a predictive multiple linear regression model with a great R^2 and expect it to actually, well, predict (especially when the predictors included are physical consequences of the predicted values as much as they may well be part of their future physical causes in a nonlinear, non-Markovian dynamical system!) what can we do?
That is simple. We can either try the microscopic route — solve the Navier-Stokes system at sufficiently fine spatiotemporal resolution and see if we can eventually predict the climate — or we can try to build a serious highly multivariate nonlinear predictive model (using e.g. neural networks) without inserting any prior assumptions about functional relationships — neural networks are general nonlinear function approximators that are empirically optimized to fit the training data in a way that further optimally predicts trial data and that ultimately has to predict reality before being considered “proven” in any sense at all — or we can try something in between — build semi-empirical non-Markovian models that operate microscopically but also take into account the causal linkages to large scale climate quasiparticles like ENSO and the NAO. But these models will not be simple empirical fits and they are difficult to abstract theoretically.
In physics it would be the difference between building a model of interaction between e.g. surface plasmons (coherent collective oscillations of many, many electrons at the surface between two materials) and building a microscopic model that solves the underlying many electron problem for the complete system. The latter is computationally intractable — as is solving the N-S equations for the planet and its oceans driven by a variable star. The former requires substantial insight, and for it to work, of course, the electrons have to self-organize into coherent collective oscillations in the first place that can then be treated as “quasi-particles” in a higher order theory.
Climate science is stuck because it won’t admit that it cannot solve the N-S equations in any predictively meaningful way — GCMs simply do not work well enough at their currently accessible resolution, and may well never work well enough. We will never be able to solve the many electron model precisely for any large quantum system, and we know it, which is what motivated the search for approximate and semi-empirical methods that could come close, often based on quasiparticles (like plasmons) or on semi-empirical functional forms (like density functionals built with some theoretical justification but with adjustable parameters that are set so that the models work, sort of, mostly, in limited regimes).
rgb

March 17, 2014 8:07 am

Thanks, Willis. An excellent article.
Thanks for reminding me of the “Bestiary” from Jorge Luis Borges, where I first learned about the Ouroboros.
And yes, the Chylek et al. paper is not worth the electrons it is made of. No honest thinking seems to have been wasted in it.

timetochooseagain
March 17, 2014 10:51 am

Using different weights for different forcings makes sense if the magnitude of some of them is not actually known, but the time history’s “shape” is. Though neither is really the case with aerosol forcing. It’s really just a case of fooling yourself with an overfit. I’ve done a fun exercise that illustrates this well by creating a “fit” model to the data that is obviously nonsense but does very well.

Steve Fitzpatrick
March 17, 2014 12:50 pm

Hi Willis,
I have no problem with using an ENSO index to account for short term variation; the mechanism by which ENSO influences temperatures (especially in the tropics) is pretty well understood, and there is no reason to believe that the trend in average surface temperature is driving the ENSO rather than the other way around. There is a lag (~3–4 months) between the Nino 3.4 index and the average surface temperature, indicating the direction of causation is ENSO —> global temperature; it’s hard to see how the Nino 3.4 can lead the short term variation in global surface temperature by several months if the direction of causation is actually the opposite.
I agree that using the AMO index is more problematic, because the mechanism is not clearly understood, though there is at least some indication the AMO is physically related to the rate of Atlantic overturning. But since that mechanism remains somewhat speculative, the AMO could in fact just be a reflection of trend in average temperature. If actual measurements of the Atlantic overturning rate were used instead of the AMO index, then I think that would be a more defensible regression variable.
I completely agree with your critique of the absence of lag applied to forcings (which ought to be modeled with at least two lag functions… faster and slower), and also agree with your critique of the (absolutely nutty) acceptance of large differences in the effects of individual forcings which are all expressed in units of watts/m^2. The most rational approach is to sum the individuals into a single forcing function, then apply a suitable lag function, before regression against temperatures.

March 17, 2014 7:22 pm

“Does anyone but me see the huge issue inherent in including the Atlantic Multidecadal Oscillation (AMO) Index among the explanatory variables when trying to emulate the global surface temperature?”
I see a bigger one: the AMO functions as a negative feedback to solar forcing. Low solar plasma speeds give negative NAO/AO phases, and the altered atmospheric circulation increases the poleward warm sea water transport, which is why the AMO rises sharply from 1995. Continued higher forcing results in a negative AMO.

Robin Edwards
March 18, 2014 7:28 am

I am delighted to be reading this thread. The submissions since about 12.00 pm on 16th March have been most instructive for me, with insightful (I think I’m right here!) comment and maths and statistics. I particularly enjoyed RGB’s contribution (still enjoying it) and thinking about all that expertise that is not getting tapped by climatologists.
I have a very small contribution/comment to make, regarding CO2’s influence. I believe that the very close linear fit of CO2 concentration in the atmosphere to time effectively makes it redundant as a predictor. OK, it varies by the month very regularly, but climate/temperature with all its complexities must lag CO2 by a substantial time. It certainly won’t respond on a monthly basis. It happens to be the only possible forcing that humanity just might be in a position to “control” – though there’s little sign of that yet! So, use CO2 concentration or Time, but not both.
For any climate model or prediction to have any practical value it seems to me that the potential predictors /must/ lead the dependent variable, frequently global surface temperature, by a substantial time interval. If one uses current values of the predictors, the stats reduce to not a lot more than correlations, which, interesting though they might be, are of theoretical, not practical, value. For that we need leading indicators. So the question that arises is what is a useful lead time for decision makers? Is it weeks, months or years? As an industrial bench chemist it dawned on me after several years that useful technical discoveries (that is, profitable ones) seemed to depend on unusual circumstances, such as an unexpected and significant two factor interaction – to use analysis of variance terminology – and these approach the frequency of the fabled hen’s teeth in the real world. In climate, with its horrendously complex possibilities that RGB so fluently described, I fear that the magic predictive combination of potential independents and their interactions is destined to remain mysterious, even assuming the (unlikely) event that they exist and are linear in their actions.
I hope to read more input wisdom from people like RGB and Matthew R Marler

James D. Klett
March 24, 2014 3:11 pm

A few comments in response to a rather witty and decidedly stringent critique (speaking only for myself and not my co-authors):
A principal goal of our study was to estimate the relative contributions of anthropogenic and natural climate forcing on the recent global mean temperature history of the atmosphere. To this end we employ a straightforward empirical statistical analysis via a regression model wherein we assume a linear relation between the observed temperature and a set of physically distinct and plausible explanatory variables (predictors). A typical set of explanatory variables includes the known radiative forcings, and additional factors characterizing the oceanic influence on climate. In this latter category we include the AMO (Atlantic Multi-decadal Oscillation) index, which is intended to represent some important aspects of the large scale oceanic thermohaline circulation that transports heat from tropical to polar regions. However, the AMO is presently measured by monitoring the sea surface temperatures (SST) of the North Atlantic, and this makes it vulnerable to the criticism that it really amounts to just a regional temperature time series, which therefore should not be put in the category of a predictor or forcing term, but rather a response thereto. This is the position taken by Willis, who asserts that to regard the AMO as a predictor basically amounts to cheating. Accordingly, a closer look at what the AMO represents is warranted.
Of course at the outset one can defend the reasonableness of the premise that the AMO as measured represents primarily a forcing effect of an ocean circulation, as the point of constructing the index was to somehow capture the capacity of ocean currents to transport heat anomalies to the atmosphere. Although the physics of this transport and the accompanying exchange processes are poorly understood, at least the qualitative notion that it is occurring seems quite sound. Otherwise, if it were just to be regarded as no more than a kind of proxy regional temperature measurement, the AMO would devolve into insignificance and would rarely be invoked as a determinant influence in any study. Of course, this is a rather weak argument based partially on semantics and the definition of the AMO, so more is needed.
We are relying on detrended North Atlantic SST measurements representing the temperature history of the mixed surface layer in the North Atlantic ocean. Energy exchange across the sea-air boundary is often dominated by an upward transfer of sensible, latent, and heat radiation from sea to air, given the dominance of density, heat capacity, and latent heat of water relative to vapor in air. Temperature changes in the atmosphere will also drive heat fluxes into the layer, but the huge heat capacity of the surface layer relative to that of the atmosphere implies that short term atmospheric trends will be damped out, so the rationale for linear detrending to isolate a true large scale signal seems justified.
As for the reality of that large scale signal, there is considerable empirical evidence available to demonstrate it. For example, from the recent article “North Atlantic Ocean control on surface heat flux on multidecadal timescales” by Gulev, Sergey K.; Latif, Mojib; Keenlyside, Noel; et al. (Nature 499, 464+, 25 July 2013), we have the following assessment:
“Direct evidence of the oceanic influence of AMV (Atlantic Multidecadal Variability imprinted in SST’s) can only be provided by surface heat fluxes, the language of ocean-atmosphere communication. Here we provide observational evidence that in the mid-latitude North Atlantic and on timescales longer than 10 years, surface turbulent heat fluxes are indeed driven by the ocean and may force the atmosphere, whereas on shorter timescales the converse is true, thereby confirming the Bjerknes conjecture.”
And as for the nature of the signature, it is known that instrumental sea surface temperature records in the North Atlantic Ocean are characterized by large multidecadal variability. The lack of strong radiative oscillatory forcing of the climate system at multidecadal time scales and the results of long unforced climate simulations have led to a consensus view that the AMO is an internal mode of climate variability. An examination of this hypothesis carried out by Jeff Knight using simulations based on the Coupled Model Intercomparison Project Phase 3 (CMIP3) database has shown that:
“The differences found between observed and ensemble mean temperatures could arise through errors in the observational data, errors in the models’ response to forcings or in the forcings themselves, or as a result of genuine internal variability. Each of these possibilities is discussed, and it is concluded that internal variability within the natural climate system is the most likely origin of the differences.”
(“The Atlantic Multidecadal Oscillation Inferred from the Forced Climate Response in Coupled General Circulation Models”, by Knight, Jeff R., Journal of Climate 22(7), 1610–1625, April 2009)
Our study also includes an empirical test that supports the idea that the AMO represents an important driver for the oceanic influence on climate. The conventional point of view of the climate modeling community is that rising greenhouse gases (GHG) are strongly correlated with (and largely responsible for) rising global temperatures. Therefore, if the AMO were just a proxy for global temperature, as Willis asserts, then the AMO should also be strongly correlated with rising GHG. But we tested that hypothesis and found it failed: in our assumed space of physically distinct forcing functions, the basis vectors representing AMO and GHG were found to be nearly orthogonal. Given the significant correlation found between AMO and global temperature, this strongly suggests that the AMO is a valid driver of global temperature, which at least over the ~100 yr. time period considered has operated essentially independently of the effects of GHG.
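In regression terms that orthogonality test is nothing more exotic than a correlation check on the two regressor series; a minimal sketch, with a and b as placeholders for the AMO and GHG regressors:

import numpy as np

def regressor_correlation(a, b):
    # near-zero output means the (centered) regressor series are nearly
    # orthogonal in the least-squares sense
    return np.corrcoef(a, b)[0, 1]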
For shorter timescales of the order of a decade or less, the study by Gulev et al. (see above) suggests there may be some net transport of turbulent heat flux from atmosphere to ocean, i.e., on this scale the premise that the AMO is an (independent) explanatory variable responsible for the behavior of the (dependent) air temperature does indeed become questionable. And in fact, our regression analysis has shown that to some extent the AMO can subsume the roles played by short term radiative forcings due to solar input, volcanoes, and the El Nino/Southern Oscillation (ENSO). This is evidence that some imprinting of those explanatory variables on the SST measurements of the North Atlantic does in fact occur, and therefore on this short timescale the AMO should not be regarded as an independent driver of air temperature.
But once again we should reiterate that on the longer timescales of interest in our study (decadal and more), Gulev et al. find the energy flow is from ocean to atmosphere, in support of the use of the AMO as an independent predictor of air temperature in regression models.

A separate complaint by Willis has to do with our selection of categories of radiative explanatory variables. Here we were simply trying to take into account the fact that radiative forcing factors that operate according to different physical mechanisms should be treated separately, at least in principle, even though our simple regression model doesn’t take those differences into account explicitly. Our point of view in this case has been understood and defended earlier in these blog comments by Martin Lewitt, which we repeat here:
Martin Lewitt says:
March 16, 2014 at 9:33 am
Assigning different weights to different W/m^2 forcings is actually a refreshing admission of complexity and nonlinearity. Each of the forcings coupled to the climate differently in vertical and geographical distribution and in some cases chemically (as in Solar generating ozone), etc. In a nonlinear dynamic system it is the assumption that they were all equivalent that would have to be justified. Representing each forcing by the variation in a globally and annually averaged W/m^2 figure is a poor proxy for these coupling differences, but then a grid from the ocean mixing layer to the stratosphere is the reason we have AOGCMs. The reason the author didn’t explain the allowing of different weights is that it is common knowledge; e.g., here are other refreshing acknowledgements of the implications of non-linear dynamics. Knutti and Hegerl state in their 2008 review article in Nature Geoscience: “The concept of radiative forcing is of rather limited use for forcings with strongly varying vertical or spatial distributions.” And this: “There is a difference in the sensitivity to radiative forcing for different forcing mechanisms, which has been phrased as their ‘efficacy’”.

James D. Klett
March 30, 2014 9:59 pm

Willis, it appears very little headway has been made in narrowing the differences of perception as to what was or was not accomplished in this particular climate study, but at a minimum some civil discourse (at least by the standards so often seen in this polarized arena of “climate change”) is on the record here for others to ponder as they wish, and that may prove helpful to some.
In a situation like this there are rapidly diminishing returns expected from extending the discussion further. Nevertheless, I will respond briefly to a couple of questions, since the answers illustrate some principles under debate with specific and easy to understand examples.
Question 1:
“Siberia is as well correlated with the global temperature as is the North Atlantic … does that significant correlation imply that Siberia is a “valid driver of global temperatures”???”
If by “Siberia” you mean air temperatures measured at meteorological stations located in Siberia, then the answer would be “no”; i.e., in this case a subset of near-surface global air temperatures is being measured. Since no exceptional physical influences are expected at such Siberian stations compared to other locations on the planet, we expect these temperature measurements to be fairly representative of the global mean values, allowances being made for the latitudinal effect of reduced solar insolation in Siberia.
But note there was a time when the answer would have been an emphatic “yes”: About 250 million years ago, a mantle plume erupted through the crust in Siberia, which likely caused the Permian-Triassic extinction event. According to Wikipedia, this event, “also called the Great Dying, affected all life on Earth, and is estimated to have killed 90% of species living at the time”. So when there was a flux of energy across the Earth’s surface to the air in Siberia, that flux and the attendant surface temperature were at that time valid drivers of global air temperature.
Similarly, today on a much smaller scale there is considerable evidence that there is a time dependent net flux of energy across the surface of the North Atlantic into (and at times out of) the air above, and the AMO is thought to provide a signal for that phenomenon. To the extent that is true, the AMO can also be employed as a valid driver of global air temperature.
The fact that the AMO signal is a temperature time series for the surface mixing layer does not by itself disqualify it in this role. There seems to be an idée fixe in play that because the AMO is a temperature series, like the global mean air temperature, it must somehow automatically be equivalent to the latter, or at least to “a good chunk” of the latter; this is a kind of association fallacy. One should keep in mind, for example, that in addition to the oceanic thermohaline heat flux being monitored, the connection between temperatures in the mixing layer and the air above involves many nearly, and some outright, intractable stochastic and nonlinear processes; the two temperature series are simply not physically equivalent surrogates. In this regard, too, it is worth emphasizing again that the SST measurements are not taken right at the ocean surface, nor are they intended to describe the temperature there:
“All of these ship and buoy SSTs are estimates of some type of bulk SST, which does not actually represent the temperature at the surface of the ocean. This bulk SST is of significant historical importance since it has been used in the formulation of all of our so-called “bulk” air/sea heat flux formulae and because it supplies an estimate of the local heat content. Numerical models today require the input of some form of bulk SST for their computation in spite of the fact that it is the skin SST that is in contact with and interacts with the overlying atmosphere. Some people think that the difference between the skin and bulk SSTs is a constant to account for the cooler skin temperatures. This is not the case as the skin SST is closely coupled to the atmosphere-ocean exchanges of heat and momentum making the bulk-skin SST difference a quantity that varies with fairly short time and space scales depending on the prevailing atmospheric conditions (wind speed and air-sea heat flux).”
(“Estimating Sea Surface Temperature from Infrared Satellite and In Situ Temperature Data”, Emery, W. J.; Castro, Sandra; Wick, G. A.; Schluessel, Peter; Donlon, Craig; Bulletin of the American Meteorological Society 82(12), December 2001, p. 2773. http://icoads.noaa.gov/advances/emery.pdf)
Question 2:
“I assume you are familiar with the story of Freeman Dyson and the elephant?”
This is an amusing story, and I’m a great fan of any anecdotes about the legendary figures. But when John von Neumann said “… with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”, he was talking about parameters that could be adjusted arbitrarily. For example, for the case of a line, y = ax + b, you can produce lines located anywhere and pointing any which way in the plane, i.e., of any slope and any intercept, by varying the parameters (coefficients) a and b independently and arbitrarily. But if you are fitting that line to data or a general function over an interval by a least squares procedure, a unique pair of parameters a and b is determined, within specified tolerances, and we know the corresponding unique line will appear intuitively appropriate, i.e., devoid of arbitrariness. And, if you try fitting a parabola instead, or a cubic, or some higher order polynomial with even more parameters needed for its definition, theory and experience show that the resulting fit to the data or general function likewise produces a unique solution set of parameters, and the fit usually gets better the more parameters that are employed in this fashion, i.e., as more relevant fitting functions are introduced. This is quite analogous to the regression analysis we’ve employed for fitting plausible explanatory variables to the global mean air temperature. Accordingly, trying to get von Neumann (plus Fermi and Dyson) on our case – a scary thought – is in this instance completely off base.