While the reactions to Lewis and Crok rage, ranging from intellectually lazy trolling at Dr. Judith Curry’s shop by a trio of commenters using the “nyah, nyah, it’s not peer reviewed!” method (she now has a technical thread to filter such antics), to the more reasoned technical response from NCAR’s Piers Forster, Jonathan Gregory & Ed Hawkins in Comments on the GWPF climate sensitivity report, and the subsequently botched reaction by Sherwood in the Guardian, I thought it might be useful to point out another paper on the evaluation of climate models.
While this paper doesn’t address the issue of climate sensitivity, focusing instead on quantifying and rating divergence, it does have one thing in common with Lewis and Crok: it compares climate model output to observational data.
Though not wholly directly: it uses the ERA40 reanalysis data (ERA40, the reanalysis dataset from ECMWF, is the same dataset used by the Danish Meteorological Institute to derive temperatures near the North Pole). It points out that there are significant divergences from model output and then scores them. I noted how far off GISS-AOM was in their Figure 2, which shows hindcast performance from 1961-1990:
Some models scored well in hindcast, staying close to the ERA40 reanalysis data, so they aren’t all bad. The point is that every model gives only an estimate, and while we can argue that averages/ensembles represent the best guess for reality, the true measure of reality comes back to comparison with real-world data. What Lewis and Crok did is nothing more than evaluate many climate models for their performance against real-world data:
In recent years it has become possible to make good empirical estimates of climate sensitivity from observational data such as temperature and ocean heat records. These estimates, published in leading scientific journals, point to climate sensitivity per doubling of CO2 most likely being under 2°C for long-term warming, with a best estimate of only 1.3-1.4°C for warming over a seventy year period.
I think much of the negative reaction to the report has to do with the “who and how” of its production rather than its content, and that is used as an excuse to ignore it rather than to show why it might be wrong. Saying it isn’t peer reviewed makes it easy (and lazy) to claim the Lewis and Crok report has no value. However, as we’ve seen recently, where…
Publishers withdraw more than 120 gibberish papers, Conference proceedings removed from subscription databases after scientist reveals that they were computer-generated.
…peer review is no guarantee that one paper is necessarily better than another. The problem with peer review is that it relies on volunteerism, and I suspect many scientists asked to review are often too busy to give the level of commitment required to fully analyze, test, and/or replicate the data and methodology of the papers they are asked to look at.
In the case of Lewis and Crok, because of the report’s high visibility, you can bet there will be many people examining it far more carefully, looking for flaws, than if it had had three referees. Either the report is right or it is wrong; I’m sure we’ll find out in time, if not through challenges, then with Mother Nature as the arbiter of truth.
This paper is from The Society for Industrial and Applied Mathematics, SIAM:
Evaluating climate models
The simulation of elements affecting the Earth’s climate is usually carried out by coupled atmosphere-ocean circulation models, output from which is used to provide insights into future climate states. Climate models have traditionally been assessed by comparing summary statistics (such as global annual mean temperatures) or point estimates from simulated models to the corresponding observed quantities.
A paper published last December in the SIAM/ASA Journal on Uncertainty Quantification argues that it is more appropriate to compare the distribution of climate model output data (over time and space) to the corresponding distribution of observed data. Distance measures between probability distributions, also called divergence functions, can be used to make this comparison.
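To make the idea concrete, here is a minimal sketch (my own illustration, not code from the paper) of comparing a distribution of simulated values to a distribution of observed values with the Kullback–Leibler divergence, one of the divergence functions the paper discusses. The variable names, bin choices, and synthetic data are all assumptions for demonstration:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    probability distributions (e.g. histogram bin counts)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    q = np.clip(q, eps, None)   # avoid log(0)
    mask = p > 0                # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical example: bin "model" and "observed" temperatures
# onto a common set of histogram bins, then compare distributions.
rng = np.random.default_rng(0)
model_temps = rng.normal(15.5, 2.0, size=5000)  # synthetic model output
obs_temps = rng.normal(15.0, 1.8, size=5000)    # synthetic observations
bins = np.linspace(5, 25, 41)
p_obs, _ = np.histogram(obs_temps, bins=bins)
q_model, _ = np.histogram(model_temps, bins=bins)
print(kl_divergence(p_obs, q_model))  # smaller = closer distributions
```

A smaller divergence means the simulated distribution sits closer to the observed one, which is exactly the kind of score the authors use to rank models.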
The authors evaluate 15 different climate models by comparing simulations of past climate to corresponding reanalysis data. Reanalysis datasets are created by feeding historical climate observations into a single fixed model throughout the entire reanalysis period, in order to reduce the effect of modeling changes on climate statistics. Historical weather observations are used to reconstruct atmospheric states on a global grid, thereby allowing direct comparison with climate model output.
View the paper:
It has been argued persuasively that, in order to evaluate climate models, the probability distributions of model output need to be compared to the corresponding empirical distributions of observed data. Distance measures between probability distributions, also called divergence functions, can be used for this purpose. We contend that divergence functions ought to be proper, in the sense that acting on modelers’ true beliefs is an optimal strategy. The score divergences introduced in this paper derive from proper scoring rules and, thus, they are proper with the integrated quadratic distance and the Kullback–Leibler divergence being particularly attractive choices. Other commonly used divergences fail to be proper. In an illustration, we evaluate and rank simulations from 15 climate models for temperature extremes in a comparison to reanalysis data.
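The integrated quadratic distance singled out in the abstract compares cumulative distribution functions rather than densities. As a rough empirical sketch of that idea (again my own illustration under assumed details, not the authors’ implementation), one can evaluate the two empirical CDFs on a common grid and integrate the squared difference:

```python
import numpy as np

def integrated_quadratic_distance(x, y, grid_size=1000):
    """Empirical integrated quadratic distance between two samples:
    the integral of (F_x(t) - F_y(t))**2 over a common grid of t,
    where F_x and F_y are the empirical CDFs of x and y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    lo = min(x[0], y[0])
    hi = max(x[-1], y[-1])
    t = np.linspace(lo, hi, grid_size)
    # Empirical CDFs evaluated on the common grid
    F_x = np.searchsorted(x, t, side="right") / x.size
    F_y = np.searchsorted(y, t, side="right") / y.size
    # Approximate the integral with the trapezoidal rule
    return float(np.trapz((F_x - F_y) ** 2, t))
```

The distance is zero when the two samples have identical empirical distributions and grows as they diverge, so a model whose simulated temperature extremes track the reanalysis distribution closely receives a better (smaller) score.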