Show us your tests: Australian drought models

Guest post by Dr. David Stockwell

In Australia, the carbon-tax juggernaut rolls on, justified in part by the fear that droughts will increase in frequency and severity as CO2 increases.

I have always found that checking one's assumptions is good advice, and with that in mind I checked the models used in a major drought study by the CSIRO and the Australian BoM. The study, the Drought Exceptional Circumstances Report (DECR), was widely used to support the contention that further increases in CO2 will produce major increases in drought frequency and severity in Australia.

The results were recently published in the peer-reviewed journal Energy and Environment (PDF).

My paper contributes in the areas of validation of climate models (the subject of a recent post at Climate, Etc.), and regional model disagreement with rainfall observations (see post by Willis Eschenbach).

Specifically, droughts decreased over the last century, in line with increasing rainfall, but the climate models used in the DECR showed the opposite trend (and significantly so).

Overall, it is a case study demonstrating the need for more rigorous and explicit validation of climate models if they are to advise government policy.

It is reasonably well known that general circulation models are virtually worthless at projecting changes in regional rainfall; the IPCC says so, and the Australian Academy of Science agrees. The most basic statistical tests in the paper demonstrate this: the simulated drought trends are statistically inconsistent with the trend of the observations, a simple mean value shows more skill than any of the models, and drought frequency has dropped below the 95% CL of the simulations (see Figure).

The larger issue is how to get people to accept that there will always be worthless models, and that it is the task of genuinely committed modellers to identify and eliminate them. It is not convincing to argue that validation is too hard for climate models, that they are the only ones we have, that they are justified by physical realism, or that they are 'close enough'.

My study shows that obvious testing regimes, had they been applied, would have shown the drought models in the DECR study were unfit for use. I asked CSIRO, but no validation results were supplied.

The concerns of scientists differ from those of decision-makers. While scientists are mainly interested in the relative skill of models in order to gauge improvements, decision-makers are (or should be) concerned primarily with whether the models should be used at all (whether they are fit for use). Because of this, model-testing regimes for decision-makers must have the potential to completely reject some or all of the models if they do not rise above a predetermined standard, or benchmark.

There are a number of ways that benchmarking can be set up, with which engineers and others in critical disciplines would be familiar, usually involving a degree of independent inspection, documentation of expected standards, and so on. My favorite benchmark test is quick and easy: the Nash-Sutcliffe Efficiency, an indicator of whether a model shows more skill than a simple mean value.
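For readers who want to try it, the Nash-Sutcliffe Efficiency is a one-line formula: NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). A score of 1 is a perfect fit, 0 means the model is exactly as skilful as the observed mean, and a negative score means the mean beats the model. A minimal sketch, using invented series rather than any real rainfall data:

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 = perfect fit, 0 = no better than
    the observed mean, negative = worse than the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Invented illustrative series (not real rainfall or drought data)
obs  = np.array([10.0, 12.0,  9.0, 14.0, 11.0])
good = np.array([10.5, 11.5,  9.5, 13.0, 11.0])  # tracks the observations
bad  = np.array([14.0,  8.0, 15.0,  9.0, 16.0])  # moves against them

print(nash_sutcliffe(obs, good))  # close to 1: beats the mean
print(nash_sutcliffe(obs, bad))   # negative: the mean has more skill
```

A model that fails this test, as the anti-correlated series does here, is exactly the case described above: a simple mean value shows more skill than the model.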

I believe that decision-makers should not take results seriously unless rigorous validation of the models is also demonstrated.

It is up to the customers of climate studies to not rely on the authority of the IPCC, the CSIRO and the BoM, and to demand “Show us your tests”, as would be expected with any economic, medical or engineering study where the costs of making the wrong decision are high. Duty of care requires confidence that all reasonable means have been taken to validate all of the models and assumptions that support the key conclusions.

===============================================================

About the author (from his website here)

After receiving a Ph.D. in Ecosystem Dynamics from the Australian National University in 1992, I worked as a consultant (WHO, Parks and Wildlife, Land and Natural Resources services) until moving to the San Diego Supercomputer Center at University of California San Diego in 1997. There I helped to develop computational and data-intensive infrastructure for ecological niche modeling, mainly using museum collections data, with grants from the NSF, USGS and DOT. I developed the GARP (Genetic Algorithm for Rule-set Production) system, making contributions in many fields: modeling of invasive species, epidemiology of human diseases, the discovery of seven new species of chameleon in Madagascar, and the effects of climate change on species. I have published in major journals and was judged by the US Immigration Service to be an Outstanding Researcher, recognized internationally as outstanding in my academic field.

53 Comments
Geoff Sherrington
March 26, 2011 3:43 am

David kindly showed me a preview of this paper after I had read the CSIRO Hennessy et al. report of 2008, the Drought Exceptional Circumstances Report, DECR for short. Some other authors of the DECR have been prominent in IPCC authoring.
In essence, David took the definitions and framework of the DECR, then showed with a few short statistical tests that the paper lacked predictive skill. (His conclusions since 2008 are being confirmed).
Therefore, there is no need for bloggers here to get too nuanced about definitions and special cases. The DECR, from memory, defined 4 classes of drought, but a failure of any one class would fail the whole paper. Prior soil moisture might be important, as noted above, but it’s not the sole determinant. (For some of the earlier period, pan evaporimeter tests were shown inaccurate when it was realised that birds were splashing away in them. They now have a mesh cover.)
Example. There were drought-like conditions in some of the last decade at the mouth of the Murray River. The headwaters of the Murray-Darling are 1,500 km distant as the crow flies. Therefore, a significant drought 1,500 km away could cause drought-like conditions at the mouth. This raises a problem of correlation with temperature. Temperature where? Temperature when? (it can take several months for water to flow from headwater to mouth).
My reaction to the graphs central to explaining the CSIRO predictions was unkind. The projected frequency of droughts increased sharply in the decades after the publication date, compared with the decades before the paper was written. To me, rightly or wrongly, that's a sign of confirmation bias.

March 26, 2011 3:55 am

Thanks Geoff. I am really perplexed by such criticisms. Any criticism of the definition of drought I used is a criticism of the definition used in the CSIRO and BoM DECR. The scholarly approach is to defeat your opponent on their own turf.
The other criticism is that the models are 'close enough'. Here the central issue is that if you intend to use them to forecast the trend, they must show significant skill at modelling the trend in the past. To say they are 'close enough' is a sloppy, intuitive attitude that doesn't get you anywhere in science (as you know, Geoff). Both of these complaints were made in peer review at the AMM.

Dave N
March 27, 2011 4:53 pm

I guess they figure that they spend enough on the studies to warrant spending something on audits and/or other checks.