An objective Bayesian estimate of climate sensitivity

Guest post by Nic Lewis

Many readers will know that I have analysed the Forest et al., 2006, (F06) study in some depth. I’m pleased to report that my paper reanalysing F06 using an improved, objective Bayesian method was accepted by Journal of Climate last month, just before the IPCC deadline for papers to be cited in AR5 WG1, and has now been posted as an Early Online Release, here. The paper is long (8,400 words) and technical, with quite a lot of statistical mathematics, so in this article I’ll just give a flavour of it and summarize its results.

The journey from initially looking into F06 to getting my paper accepted was fairly long and bumpy. I originally submitted the paper last July, fourteen months after first coming across some data that should have matched what was used in F06. The reason it took me that long was partly that I was feeling my way, learning exactly how F06 worked, how to undertake objective statistical inference correctly in its case and how to deal with other issues that I was unfamiliar with. It was also partly because after some months I obtained, from the lead author of a related study, another set of data that should have matched the data used in F06, but which was mostly different from the first set. And it was partly because I was unsuccessful in my attempts to obtain any data or code from Dr Forest.

Fortunately, he released a full set of (semi-processed) data and code after I submitted the paper. Therefore, in a revised version of the paper submitted in December, following a first round of peer review, I was able properly to resolve the data issues and also to take advantage of the final six years of model simulation data, which had not been used in F06. I still faced difficulties with two reviewers – my response to one second review exceeded 9,500 words – but fortunately the editor involved was very fair and helpful, and decided my re-revised paper did not require a further round of peer review.

Forest 2006

First, some details about F06, for those interested. F06 was a ‘Bayesian’ study that estimated climate sensitivity (ECS or Seq) jointly with effective ocean diffusivity (Kv)¹ and aerosol forcing (Faer). F06 used three ‘diagnostics’ (groups of variables whose observed values are compared to model simulations): surface temperature anomalies, global deep-ocean temperature trend, and upper-air temperature changes. The MIT 2D climate model, which has adjustable parameters calibrated in terms of Seq, Kv and Faer, was run several hundred times at different settings of those parameters, producing sets of model-simulated temperature changes. Comparison of these simulated temperature changes to observations provided estimates of how likely the observations were to have occurred at each set of parameter values (taking account of natural internal variability). Bayes’ theorem could then be applied, uniform prior distributions for the three parameters being multiplied together, and the resulting uniform joint prior being multiplied by the likelihood function for each diagnostic in turn. The result was a joint posterior probability density function (PDF) for the parameters. The PDFs for each of the individual parameters were then readily derived by integration. These techniques are described in Appendix 9.B of AR4 WG1, here.
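To make the F06 procedure concrete, here is a minimal pure-Python sketch of the grid calculation just described: a uniform joint prior, multiplied by one likelihood per diagnostic, then integrated over Kv and Faer to leave a marginal PDF for Seq. Everything model-specific is a hypothetical placeholder: `toy_model` and its coefficients, the grid ranges, and the independent Gaussian error for each diagnostic stand in for the MIT 2D model runs and the actual F06 likelihood calculations.

```python
import math


# Hypothetical toy stand-in for the MIT 2D model (NOT the real model): maps
# (s = sensitivity in K, sqrt_kv = square root of ocean diffusivity,
# faer = aerosol forcing in W/m^2) to one scalar per diagnostic.
def toy_model(s, sqrt_kv, faer):
    surface = 0.3 * s / (1.0 + sqrt_kv) + 0.2 * faer
    deep_ocean = 0.05 * s * sqrt_kv
    upper_air = 0.25 * s / (1.0 + 0.5 * sqrt_kv) + 0.1 * faer
    return (surface, deep_ocean, upper_air)


def frange(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def marginal_posterior_s(obs, sigmas, s_grid, kv_grid, faer_grid):
    """Uniform joint prior times the product of independent Gaussian
    likelihoods (one per diagnostic), summed over sqrt_kv and faer."""
    marg = []
    for s in s_grid:
        total = 0.0
        for k in kv_grid:
            for f in faer_grid:
                pred = toy_model(s, k, f)
                loglik = sum(-0.5 * ((p - o) / sd) ** 2
                             for p, o, sd in zip(pred, obs, sigmas))
                # A uniform prior is a constant factor, so it drops out; the
                # missing dKv*dFaer grid factors also cancel on normalisation.
                total += math.exp(loglik)
        marg.append(total)
    ds = s_grid[1] - s_grid[0]
    norm = sum(marg) * ds  # rectangle-rule integral over S
    return [m / norm for m in marg]
```

With a uniform prior the posterior is simply the normalised product of likelihoods, so the shape of the Seq PDF is set entirely by how the (toy) diagnostics respond across the parameter grid.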

Lewis 2013

As noted above, F06 used uniform priors in the parameters. However, the relationship between the parameters and the observations is highly nonlinear, and the use of a uniform parameter prior therefore strongly influences the final PDF. Therefore in my paper Bayes’ theorem is applied to the data rather than the parameters: a joint posterior PDF for the observations is obtained from a joint uniform prior in the observations and the likelihood functions. Because the observations have first been ‘whitened’,² this uniform prior is noninformative, meaning that the joint posterior PDF is objective and free of bias. Then, using a standard statistical formula, this posterior PDF in the whitened observations can be converted to an objective joint PDF for the climate parameters.
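Whitening (footnote 2) can be sketched as follows, assuming Gaussian observation errors with a known covariance matrix C: factor C = LLᵀ by Cholesky decomposition and map each residual vector r to z = L⁻¹r. The whitened residuals are uncorrelated with unit variance, so the log-likelihood depends only on the length of z, which is the radial symmetry the footnote refers to. Cholesky whitening is one standard choice, used here for illustration; it is not necessarily the exact transformation used in the paper.

```python
import math


def cholesky(c):
    """Lower-triangular L with L L^T = C, for symmetric positive-definite C."""
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][i] = math.sqrt(c[i][i] - s)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def whiten(r, l):
    """Return z = L^{-1} r by forward substitution. If r has covariance
    C = L L^T, then z has identity covariance (uncorrelated, unit variance)."""
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    return z


def whitened_loglik(obs, pred, c):
    """Gaussian log-likelihood (up to a constant) via whitened residuals:
    -0.5 * |z|^2, which depends only on the radius |z| (radially symmetric)."""
    z = whiten([o - p for o, p in zip(obs, pred)], cholesky(c))
    return -0.5 * sum(v * v for v in z)
```

Because |z|² equals rᵀC⁻¹r, this gives the same likelihood as the correlated form; whitening only changes the coordinates in which a uniform prior is placed.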

The F06 ECS PDF had a mode (most likely value) of 2.9 K (°C) and a 5–95% uncertainty range of 2.1 to 8.9 K. Using the same data, I estimate a climate sensitivity PDF with a mode of 2.4 K and a 5–95% uncertainty range of 2.0–3.6 K, the reduction being primarily due to use of an objective Bayesian approach. Upon incorporating six additional years of model-simulation data, previously unused, and improving diagnostic power by changing how the surface temperature data is used, the central estimate of climate sensitivity using the objective Bayesian method falls to 1.6 K (mode and median), with 5–95% bounds of 1.2–2.2 K. When uncertainties in non-aerosol forcings and in surface temperatures, ignored in F06, are allowed for, the 5–95% range widens to 1.0–3.0 K.
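Summary figures like those above (mode, median and 5–95% range) can be read off a PDF tabulated on a grid. Here is a minimal sketch, assuming an increasing grid; the trapezoid-rule CDF and linear interpolation between grid points are illustrative choices, not necessarily the paper's.

```python
def summarize_pdf(x, p):
    """Mode, median and 5%/95% points of a density p tabulated on grid x."""
    # Cumulative distribution by the trapezoid rule, then normalised to 1.
    cdf = [0.0]
    for i in range(1, len(x)):
        cdf.append(cdf[-1] + 0.5 * (p[i] + p[i - 1]) * (x[i] - x[i - 1]))
    total = cdf[-1]
    cdf = [v / total for v in cdf]

    def quantile(q):
        # First grid interval where the CDF reaches q, interpolated linearly.
        for i in range(1, len(x)):
            if cdf[i] >= q:
                frac = (q - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
                return x[i - 1] + frac * (x[i] - x[i - 1])
        return x[-1]

    mode = x[max(range(len(p)), key=lambda i: p[i])]
    return {"mode": mode, "median": quantile(0.5),
            "p05": quantile(0.05), "p95": quantile(0.95)}
```

For a density rising linearly across a 0–10 grid, for example, the mode sits at the top of the grid while the median is √50 ≈ 7.07, which shows why mode and median need not coincide for a skewed PDF.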

The 1.6 K mode for climate sensitivity I obtain is identical to the modes from Aldrin et al. (2012) and (using the same, HadCRUT4, observational dataset) Ring et al. (2012). It is also the same as the best estimate I obtained in my December non-peer reviewed heat balance (energy budget) study using more recent data, here. In principle, the lack of warming over the last ten to fifteen years shouldn’t really affect estimates of climate sensitivity, as a lower global surface temperature should be compensated for by more heat going into the ocean.
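The compensation argument can be checked with the standard energy-budget estimate ECS = F2x·ΔT/(ΔF − ΔQ). Here ΔT is the change in surface temperature, ΔF the change in forcing, ΔQ the change in the rate of heat uptake (mostly by the ocean) and F2x the forcing from doubled CO2. The sketch below assumes F2x = 3.7 W/m². Its input numbers are purely hypothetical, chosen only to show that a smaller ΔT offset by a larger ΔQ can leave the estimate unchanged.

```python
F2X = 3.7  # W/m^2, assumed forcing from doubled CO2


def energy_budget_ecs(delta_t, delta_f, delta_q, f2x=F2X):
    """ECS in K from warming delta_t (K), forcing change delta_f (W/m^2)
    and change in heat uptake rate delta_q (W/m^2)."""
    return f2x * delta_t / (delta_f - delta_q)


# Hypothetical inputs: less surface warming, more heat going into the ocean.
ecs_more_warming = energy_budget_ecs(0.75, 2.0, 0.5)
ecs_less_warming = energy_budget_ecs(0.60, 2.0, 0.8)
```

Both hypothetical cases give 1.85 K. With ΔF fixed, lower warming lowers the estimate unless it is matched by correspondingly greater heat uptake.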

Footnotes

  1. Parameterised as its square root
  2. Making them uncorrelated, with a radially symmetric joint probability density

The plot below shows how the factor for converting the joint PDF for the whitened observations into a joint PDF for the three climate system parameters (on the vertical axis, units arbitrary) varies with climate sensitivity Seq and ocean diffusivity Kv. This conversion factor is, mathematically, equivalent to a noninformative joint prior for the parameters. The plot is for a slightly different case from that illustrated in the paper, but its shape is almost identical. Aerosol forcing has been set to a fixed value; at different aerosol values the surface scales up or down somewhat but retains its overall shape.

[Figure: lewis_2013_fig1]

The key thing to notice is that at high sensitivity values not only does the prior tail off even when ocean diffusivity is low, but at higher Kv values it becomes almost zero. (Ignore the upturn in the front right-hand corner, which is caused by model noise.) The noninformative prior thereby prevents regions where the data respond little to parameter changes from being assigned more probability than the data uncertainty distributions warrant. That is why the PDFs obtained are, correctly, better constrained than those obtained using uniform priors for the parameters.
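The conversion factor discussed above can be sketched numerically. One standard construction, assumed here rather than taken from the paper's code, treats the whitened observations as Gaussian with identity covariance. In that case the Jeffreys prior for the parameters is sqrt(det(JᵀJ)), where J is the Jacobian of the whitened model outputs with respect to the parameters. The second half uses a hypothetical one-parameter toy whose output saturates at high S, mimicking a region where the data respond little to parameter changes. It compares the upper 95% point under a uniform prior in S with that under this prior.

```python
import math


def jacobian(model, theta, h=1e-5):
    """Central-difference Jacobian: J[i][j] = d model(theta)[i] / d theta[j]."""
    cols = []
    for j in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        cols.append([(a - b) / (2.0 * h) for a, b in zip(model(up), model(dn))])
    return [[col[i] for col in cols] for i in range(len(cols[0]))]


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def jeffreys_prior(model, theta, h=1e-5):
    """sqrt(det(J^T J)) for whitened outputs with identity error covariance."""
    j = jacobian(model, theta, h)
    n = len(theta)
    jtj = [[sum(row[a] * row[b] for row in j) for b in range(n)] for a in range(n)]
    return math.sqrt(max(det(jtj), 0.0))


# Hypothetical 1-D toy: a whitened observable that flattens as S grows.
def toy_response(theta):
    s = theta[0]
    return [s / (s + 2.0)]


def upper_95(grid, weights):
    """Grid value at which the cumulative weight first reaches 95%."""
    total, acc = sum(weights), 0.0
    for x, w in zip(grid, weights):
        acc += w
        if acc >= 0.95 * total:
            return x
    return grid[-1]


grid = [0.5 + 0.01 * i for i in range(951)]  # S from 0.5 to 10 K
y_obs, sigma = toy_response([2.0])[0], 0.1   # pretend observation and error
lik = [math.exp(-0.5 * ((toy_response([s])[0] - y_obs) / sigma) ** 2) for s in grid]
post_uniform = lik                           # uniform prior in S
post_jeffreys = [lk * jeffreys_prior(toy_response, [s]) for lk, s in zip(lik, grid)]
p95_uniform = upper_95(grid, post_uniform)
p95_jeffreys = upper_95(grid, post_jeffreys)
```

For a linear model the prior is constant, so it only reshapes the posterior when the model is nonlinear. In the toy, the prior shrinks where the output flattens, pulling in the upper 95% point relative to the uniform-prior case, qualitatively like the tailing-off seen in the plot.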

Matthew R Marler
April 17, 2013 11:55 am

Crispin in Waterloo but still in Yogyakarta: In my view this work shows that F06 was not executed with the necessary rigour.
I don’t dispute that. All Bayesian analyses should be accompanied by analyses of the sensitivity to the prior used, and the best way to do that, as far as I know, is to use many priors that differ dramatically.
I disputed the claims that this particular choice of prior is “objective” and that the resultant posterior distribution is an improvement. Neither claim is justified. The only justifiable claim is that this is an instance where a different prior produced a different posterior distribution; it’s something we know in principle and this is a worked example.

Matthew R Marler
April 17, 2013 12:05 pm

Crispin in Waterloo but still in Yogyakarta: If you, MRM, wish to take up the cause of personal motivations affecting the scientific discussion,
I don’t doubt that there are other instances of personal motivation afflicting climate science. Upon re-reading this post, I admit that my attribution of motivation in this case is weak.

April 17, 2013 5:26 pm

davidmhoffer says:
April 16, 2013 at 1:22 pm
If the PDO for example were to go wildly negative, pushing down global temps for another 10 years, would we conclude that CO2 sensitivity had declined further? That wouldn’t make sense, would it?

It would mean either the claimed CO2 forcing is much too high and is in fact well under 1C per doubling (and consequently the late 20th century warming didn’t result from increasing CO2 levels),
OR the Forcing Model/Theory is wrong, and changes in forcings do not result in the changes in climate it predicts.

April 17, 2013 5:41 pm

I have dissected your first paragraph under “Lewis 2013”. There are several jumps of logic and statistical theorems that are not explained, not obvious, and not supported by references. Let me comment on these jumps for the purpose of further discussion.
Therefore in my paper Bayes’ theorem is applied to the data rather than the parameters: a joint posterior PDF for the observations is obtained from a joint uniform prior in the observations and the likelihood functions.
I must admit that this is thinking outside the box. But to what end? Are you therefore using Bayesian theory as an excuse to adjust the data to fit a model? What is the Bayesian approach used for on the observations if not to obtain flexibility in the observational data?
Because the observations have first been ‘whitened’, (i.e. made non-correlated),
Why should we believe that observations in a time series, or in a weather map, should be uncorrelated to neighboring points in time and space? Why should an act of whitening not bias the data some way?
this uniform prior is noninformative,
This statement sets off my alarm bells. Frankly, I don’t believe in the non-informative scenario.
For one thing, if a distribution is uniform in one coordinate system, it is not necessarily uniform if transformed into another. Therefore a uniform prior is not magically pure. The choice of coordinate system is informative and biased by choice of model. Secondly, people usually talk of uniform distributions between defined end-points. The choice of endpoints is an act of information and an act of bias. Furthermore, when people are studying a system with Bayesian statistics, some knowledge of the system must already be known: there is a model. To assume that the midpoint of a range is just as probable as the studied endpoints is an act of bias toward the extremes and away from acquired knowledge. It is a bias toward the chosen endpoints.
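A quick numerical illustration of the coordinate-system point, using a hypothetical example: if sensitivity S is given a uniform prior on 1–10 K, the implied climate feedback parameter λ = F2x/S (with F2x = 3.7 W/m² assumed) is far from uniform. About 91% (exactly 10/11) of the prior mass falls in the lower half of the λ range.

```python
F2X = 3.7  # W/m^2, assumed forcing for doubled CO2 (hypothetical choice)

# Evenly spaced S values approximate a uniform prior on [1, 10] K.
n = 100001
s_values = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
feedback = [F2X / s for s in s_values]  # implied lambda, W/m^2/K

midpoint = 0.5 * (min(feedback) + max(feedback))
frac_lower = sum(1 for v in feedback if v < midpoint) / n
```

The same prior that looks "flat" in S piles most of its weight onto weak feedbacks when viewed in λ, so the choice of parameterisation carries information.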
One can assume that a six-sided die has a prior distribution that is uniform between 1 and 6 inclusive. However, I once owned a die that contained a 7 (a 6 superimposed with a 1, or 5, of 3) and two 5’s. Had it as a kid, long lost, but a treasured memory. The point being that the distribution of a die is not limited to 1 and 6 inclusive, that a 6 and a 7 are not of equal probabilities, and the probability of a 7 is not zero. To assume a uniform distribution between 1 and 6 and zero at 7 is an informative, biased choice that ignores available information. To assume a uniform distribution between 1 and 7 is also to ignore a lot of available information. I don’t think “non-informative” means “blind”.
meaning that the joint posterior PDF is objective and free of bias.
Sorry, I don’t buy it. The decision to make the distribution uniform (with or without unspecified end points) and the whitening make the claim that the result is free of bias a stretch.
Then, using a standard statistical formula,
Which standard statistical formula? Normal Gaussian statistics? Why not say so?
this posterior PDF in the whitened observations can be converted to an objective joint PDF for the climate parameters.
No doubt it CAN be so converted. But why should this be free of bias?
But we get back to the question “to what end?” Now you have a PDF for the climate parameters. The whole point of Bayesian statistics is that you can take this PDF as a prior distribution and apply new observational data to get a refined posterior distribution. In your case, the second cycle is fundamentally different from your first. Perhaps this is why the addition of six more years created such a different distribution.

April 18, 2013 6:45 am

Stephen Rasey says:
April 17, 2013 at 5:41 pm

My stats is only what science undergrads learn, but you have articulated issues, some of which raised flags with me. Like why ‘whiten’ the data?
Thanks. Very informative.

April 19, 2013 12:20 pm

While the uniform prior is uninformative, priors that are uninformative and non-uniform are of infinite number. Thus, the posterior PDF lacks the uniqueness that is required by logic.

April 19, 2013 5:21 pm

From Wikipedia

The term “uninformative prior” may be somewhat of a misnomer; often, such a prior might be called a not very informative prior, or an objective prior, i.e. one that’s not subjectively elicited. Uninformative priors can express “objective” information such as “the variable is positive” or “the variable is less than some limit”.

Like I said, I don’t believe in the noninformative scenario. “Uninformative priors” fly under a false flag, for every one of them contains presumptions, not the least of which are the model to which they are applied and parameters such as the choice of endpoints.
On the other hand, if Terry Oldberg is right that a uniform distribution is only one of many possible uninformative distributions, then choosing a uniform distribution over the others is a potential act of bias and not one of indifference. The choice of a uniform distribution is subjective and therefore neither objective nor noninformative.

April 19, 2013 9:58 pm

Stephen Rasey:
It is easy to prove the existence of an infinity of non-informative prior probability density functions over the climate sensitivity, one of which is the uniform prior. I’ve already published a proof of this assertion in the blogosphere. If there is call for it, I’ll publish this proof once again in this thread.

April 19, 2013 11:20 pm

Oldberg,
Whether there are two, three, or an infinite number I think is beside the point. So for the sake of argument, we’ll take as settled that there is always more than one noninformative prior distribution that could be used.
Then are each of these candidate non-informative priors interchangeable so that any could be used and the same posterior distribution will be the result? I cannot fathom how the same result is possible given different priors. It must be that different priors will result in different posteriors if processed with the same set of observations, at least if the prior and observations are not trivial cases.
If priors are not interchangeable, then the choice of noninformative prior has a bearing on the result. How then can the choice of one prior over another be a noninformative action?
Does one use a Monte Carlo process to randomly (non-informatively?) choose several different priors and perform an analysis? If the domain of candidate priors is infinite in an unknown number of dimensions, how do we tell whether we have fairly sampled the domain with the Monte Carlo process?
What is actually meant by the word “noninformative” when one prior distribution is chosen out of many and that choice is justified and defended by the analyst? Can there be no information guiding the choice?

Reply to  Stephen Rasey
April 20, 2013 9:03 am

Stephen Rasey
You say:
“If priors are not interchangeable, then the choice of noninformative prior has a bearing on the result. How then can the choice of one prior over another be a noninformative action?”
I answer:
It’s not the choice of prior that is noninformative; rather, each prior in a set containing many priors is noninformative. A “noninformative” prior is one that maximizes the entropy of the associated probability distribution function.
In generating posterior PDFs over the equilibrium climate sensitivity (TECS), climatologists select one of the many equally noninformative priors arbitrarily. According to IPCC Assessment Report 4, the uniform prior is popular with climatologists. However, priors that are equally uninformative but non-uniform can be proved to be of infinite number. Each of the many priors yields a different posterior PDF and public policy prescription. If you think this process is illogical, you are right. The multiplicity of the noninformative priors generates violations of Aristotle’s law of non-contradiction.
As TECS is defined on the change in the equilibrium temperature and the equilibrium temperature is not an observable, TECS is not a scientifically viable concept. It has been made to appear viable through concealment of the violations of non-contradiction by the authors of the IPCC assessment reports.

April 21, 2013 11:36 am

Terry,
climatologists select one of the many equally noninformative priors arbitrarily
Are you sure the word “equally” applies? That implies that the priors are interchangeable. But we know that the generation of the posterior is dependent upon the choice of prior, even a noninformative prior. Therefore, there must be material differences in the priors.

(ii) The statistical analysis is often required to appear objective. Of course true objectivity is virtually never attainable and the prior distribution is usually the least of the problems in terms of objectivity, but use of a subjectively elicited prior significantly reduces the appearance of objectivity. Noninformative priors not only preserve this appearance but can be argued to result in analyses that are more objective than most classical analyses. [emphasis in original]
(v)…the Jeffreys prior seems to almost always yield a proper posterior distribution. This is magical in that the common constant (or uniform) prior will much more frequently fail to yield a proper posterior. Even better the reference prior approach….yield surprising good performance…..(Yang and Berger 1998, pp. 4–5)

For any class of prior distribution, there is at least one prior distribution that maximizes the entropy for that distribution class. This is the noninformative prior distribution from that class. But there are an infinite number of classes. It is a stretch for me to believe that each of these infinitely many noninformative priors has the same entropy level.
Uniform distributions (from endpoints a to b) might be noninformative as a class. But not all noninformative prior distributions are uniform. Some of these, unlike uniforms, have non-zero skew.
I assert that all Bayesian studies of TECS that employ a uniform prior distribution result in a posterior distribution with a positive skew. I’d love to see a counter example. (I think you could create a negative skew by choosing a uniform distribution (a,b) with a deliberately chosen low b near 2. But I digress.) To employ a zero-skew prior distribution when the result is expected (from prior work) to be positively skewed is an element of bias and a bias toward the high end of the a to b range.

I agree with you that Aldrin is the most thorough study, although its use of a uniform prior distribution for climate sensitivity will have pushed up the mean, mainly by making the upper tail of its estimate worse constrained than if an objective Bayesian method with a noninformative prior had been used. – Nic Lewis, Bishop Hill, Jan 12, 2013

Nic Lewis in his 5th paragraph makes the statement (concerning his use of the uniform on the observations)

this uniform prior is noninformative, meaning that the joint posterior PDF is objective and free of bias.

Uniform priors might be noninformative, but it does not follow that the choice of uniform, and particularly the choice of the a,b range of that uniform distribution, is free of bias. If it is not free of bias, it is not necessarily objective. There seems to be an implied claim in many Bayesian papers that “Uniform = noninformative = nonbiased = objective.” No. There is only the appearance of objectivity. When it comes to estimates of physical constants (even if TECS was one), the uniform prior distribution is seldom if ever the best objective prior.
Perhaps the title of this paper really should be: “A noninformative (Bayesian) estimate of climate sensitivity”

Reply to  Stephen Rasey
April 21, 2013 5:04 pm

Stephen Rasey:
Please find proofs of the infinitude of noninformative priors and noninformativeness of the uniform prior below.
Let T designate the equilibrium climate sensitivity. Let X designate a unique interval in T in which the probability density of T is non-nil. Let Xp(X) designate a partition of X into intervals that are of infinitesimal length.
Let P(X) designate a function that maps the elements of X to the associated probabilities. By stipulation, P(X) is constant within each element of the partition Xp(X). Maximization of the entropy of P(X) yields the conclusion that P(X) is a constant. Let this constant be designated by C.
Let i designate a particular element of Xp(X) and let l(i) designate the length of this element. In i, the probability density is C/l(i). Partitions of X are of infinite number. It follows that noninformative priors are of infinite number also.
Among the many noninformative priors is the one in which l(i) is a constant. This prior is the uniform prior. Thus, the uniform prior is noninformative but not uniquely so.
Q.E.D.