Guest Post by Willis Eschenbach
Well, in my last post I thought that I had seen nature at its worst … Nature Magazine, that is. But now I’ve had a chance to look at the other paywalled Nature paper in the same issue, entitled Anthropogenic greenhouse gas contribution to flood risk in England and Wales in autumn 2000, by Pardeep Pall, Tolu Aina, Dáithí A. Stone, Peter A. Stott, Toru Nozawa, Arno G. J. Hilberts, Dag Lohmann and Myles R. Allen (hereinafter Pall2011). The supplementary information is available here, and contains many of the concepts of the paper. In the autumn of 2000, there was extreme rainfall in southwest England and Wales that led to widespread flooding. Pall2011 explores the question of the expected frequency of this type of event. They conclude (emphasis mine):
… in nine out of ten cases our model results indicate that twentieth century anthropogenic greenhouse gas emissions increased the risk of floods occurring in England and Wales in autumn 2000 by more than 20%, and in two out of three cases by more than 90%.
Figure 1. England in the image of Venice, Autumn 2000. Or maybe Wales. Picture reproduced for pictorial reasons only, if it is Wales, please, UKPersons, don’t bust me, I took enough flak for the New Orleans photo in Part 1. Photo Source
To start my analysis, I had to consider the “Qualitative Law of Scientific Authorship”, which states that as a general rule:
Q ≈ 1 / N^2
where Q is the quality of the scientific study, and N^2 is the square of the number of listed authors. More to the point, however, let’s begin instead with this. How much historical UK river flow data did they analyze to come to their conclusions about UK flood risk?
Unfortunately, the answer is, they didn’t analyze any historical river flow data at all.
You may think I’m kidding, or that this is some kind of trick question. Neither one. Here’s what they did.
They used a single seasonal resolution atmospheric climate computer model (HadAM3-N144) to generate some 2,268 single years of synthetic autumn 2000 weather data. The observed April 2000 climate variables (temperature, pressure, etc.) were used as the initial values input to the HadAM3-N144 model. The model was kicked off using those values as a starting point, and run over and over a couple thousand times. The authors of Pall2011 call these 2,268 modeled single years of computer-generated weather “data” the “A2000 climate”. I will refer to it as the A2000 synthetic climate, to avoid confusion with the real thing.
The A2000 synthetic climate is a universe of a couple thousand single-year outcomes of one computer model (with a fixed set of internal parameter settings), so presumably the model space given those parameters is well explored … which means nothing about whether the actual variation in the real world is well explored by the model space. But I digress.
The 2,268 one-year climate model simulations of the A2000 autumn weather dataset were then fed into a second much simpler model, called a “precipitation runoff model” (P-R). The P-R model estimates the individual river runoff in SW England and Wales, given the gridcell scale precipitation.
In turn, this P-R model was calibrated using the output of a third climate model, the ERA-40 computer model reanalysis of the historical data. The ERA-40, like other models, outputs variables on a global grid. The authors have used multiple linear regression to calibrate the P-R model so it provides the best match between the river flow gauge data for the 11 UK rainfall catchments studied, and the ERA-40 computer reanalysis gridded data. How good is the match with reality? Dunno, they didn’t say …
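For readers who want to see what this kind of calibration involves, here is a minimal Python sketch of fitting a multiple linear regression and then scoring it against held-back gauge data, which is the check the paper doesn’t report. Everything in it is an assumption for illustration: the two “gridcell rainfall” predictors, the made-up gauge flows, and the function names are mine, not anything taken from Pall2011 or the ERA-40 reanalysis.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no zero pivots).
    """
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(predictors, target):
    """Least-squares fit of target = intercept + sum(coef_k * predictor_k)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, predictors):
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], p)) for p in predictors]


def r_squared(observed, modeled):
    """Fraction of observed variance explained by the modeled values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up illustration data: two hypothetical gridcell rainfall series and a
# hypothetical gauge flow that depends on them plus noise.
rng = random.Random(0)
gridcell_rain = [[rng.uniform(0, 20), rng.uniform(0, 20)] for _ in range(200)]
gauge_flow = [5.0 + 1.5 * a + 0.5 * b + rng.gauss(0, 2.0) for a, b in gridcell_rain]

# Calibrate on the first 150 samples, then score on the 50 held back.
coefs = fit_multiple_regression(gridcell_rain[:150], gauge_flow[:150])
holdout_skill = r_squared(gauge_flow[150:], predict(coefs, gridcell_rain[150:]))
print("fitted coefficients:", coefs)
print("hold-out R^2:", holdout_skill)
```

The point of the hold-out score is that a regression always looks good on the data it was tuned to. The number that matters is how it does on flows it has never seen.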
So down at the bottom there is some data. But they don’t analyze that data in any way at all. Instead, they just use it to set the parameters of the P-R model.
Summary to date:
• Actual April 2000 data and actual patterns of surface temperatures, air pressure, and other variables are used repeatedly as the starting point for 2,268 one-year modeled weather runs. The result is called the A2000 synthetic climate.
• These 2,268 single years of synthetic weather are used as input to a second Precipitation-Runoff model.
• The P-R model is tuned to give the closest match between the river gauge data and the gridcell precipitation output of the ERA-40 climate reanalysis model.
• Using the A2000 weather data, the P-R model generates 2,268 years of synthetic river flow and flood data.
So that’s the first half of the game.
For the second half, they used the output of four global circulation climate models (GCMs). They used those four GCMs to generate what a synthetic world would have looked like if there were no 20th century anthropogenic forcing. Or in the words of Pall2011, each of the four models generated “a hypothetical scenario representing the “surface warming patterns” as they might have been had twentieth-century anthropogenic greenhouse gas emissions not occurred (A2000N).” Here is their description of the changes between A2000 and A2000N:
The A2000N scenario attempts to represent hypothetical autumn 2000 conditions in the [HadAM3-N144] model by altering the A2000 scenario as follows: greenhouse gas concentrations are reduced to year 1900 levels; SSTs are altered by subtracting estimated twentieth-century warming attributable to greenhouse gas emissions, accounting for uncertainty; and sea ice is altered correspondingly using a simple empirical SST–sea ice relationship determined from observed SST and sea ice.
Interesting choice of things to alter, worthy of some thought … fixed year 1900 greenhouse gases, cooler ocean, more sea ice, but no change in land temperatures … seems like that would end up with a warm UK embedded in a cooler ocean. And that seems like it would definitely affect the rainfall. But let us not be distracted by logical inconsistencies …
Then they used the original climate model (HadAM3-N144), initialized with those changes in starting conditions from the four GCM models, combined with the same initial perturbations used in A2000 to generate another couple thousand one-year simulations. In other words, same model, same kickoff date (I just realized the synthetic weather data starts on April Fools Day), different global starting conditions from output of the four GCMs. The result is called the A2000N synthetic climate, although of course they omit the “synthetic”. I guess the N is for “no warming”.
These couple of thousand years of model output weather, the A2000N synthetic climate, then followed the path of the A2000 synthetic climate. They were fed into the second model, the P-R model that had been tuned using the ERA-40 reanalysis model. They emerged as a second set of river flow and flood predictions.
Summary to date:
• Two datasets of computer generated 100% genuine simulated UK river flow and flood data have been created. Neither dataset is related to actual observational data, either by blood, marriage, or demonstrated propinquity, although to be fair one of the models had its dials set using a comparison of observational data with a third model’s results. One of these two datasets is described by the authors as “hypothetical” and the other as “realistic”.
Finally, of course, they compare the two datasets to conclude that humans are the cause:
The precise magnitude of the anthropogenic contribution remains uncertain, but in nine out of ten cases our model results indicate that twentieth century anthropogenic greenhouse gas emissions increased the risk of floods occurring in England and Wales in autumn 2000 by more than 20%, and in two out of three cases by more than 90%.
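To make the arithmetic behind a statement like that concrete, here is a rough Python sketch. It assumes that “risk” means the probability of a peak flow exceeding some flood threshold, that “increased by more than 20%” means a ratio of those probabilities above 1.2 (and “more than 90%” a ratio above 1.9), and that the “nine out of ten cases” come from resampling the ensembles. Those are my reading, not a description of the procedure in Pall2011. The two ensembles are random numbers standing in for model output, and the threshold and function names are hypothetical.

```python
import random


def exceedance_probability(flows, threshold):
    """Fraction of ensemble members whose peak flow exceeds the threshold."""
    return sum(1 for f in flows if f > threshold) / len(flows)


def risk_ratio(flows_a, flows_b, threshold):
    """Ratio of flood probability in ensemble A to that in ensemble B."""
    p_a = exceedance_probability(flows_a, threshold)
    p_b = exceedance_probability(flows_b, threshold)
    return p_a / p_b if p_b > 0 else float("inf")


def bootstrap_fraction_above(flows_a, flows_b, threshold, ratio_floor, n_boot, rng):
    """Resample both ensembles with replacement n_boot times and report the
    fraction of resamples in which the risk ratio exceeds ratio_floor."""
    hits = 0
    for _ in range(n_boot):
        sample_a = rng.choices(flows_a, k=len(flows_a))
        sample_b = rng.choices(flows_b, k=len(flows_b))
        if risk_ratio(sample_a, sample_b, threshold) > ratio_floor:
            hits += 1
    return hits / n_boot


# Stand-in ensembles: random numbers, NOT output from any climate model.
rng = random.Random(42)
flows_a = [rng.lognormvariate(0.1, 0.5) for _ in range(1000)]  # "with GHG"
flows_b = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]  # "without GHG"
threshold = 2.5  # hypothetical flood threshold, arbitrary units

print("risk ratio:", risk_ratio(flows_a, flows_b, threshold))
print("fraction of resamples with ratio > 1.2:",
      bootstrap_fraction_above(flows_a, flows_b, threshold, 1.2, 200, rng))
print("fraction of resamples with ratio > 1.9:",
      bootstrap_fraction_above(flows_a, flows_b, threshold, 1.9, 200, rng))
```

Note what the resampling does and does not cover. It measures the sampling scatter within the two ensembles you hand it. It says nothing about whether the ensembles themselves resemble the real world.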
Summary to date:
• The authors have conclusively shown that in a computer model of SW England and Wales, synthetic climate A is statistically more prone to synthetic floods than is synthetic climate B.
I’m not sure what I can say besides that, because they don’t say much beside that.
Yes, they show that their results are pretty consistent with this over here, and they generally agree with that over there, and by and large they’re not outside the bounds of these conditions, and that the authors estimated uncertainty by Monte Carlo bootstrapping and are satisfied with the results … but considering the uncertainties that they have not included, well, you can draw your own conclusions about whether the authors have established their case in a scientific sense. Let me just throw up a few of the questions raised by this analysis.
QUESTIONS FOR WHICH I HAVE ABSOLUTELY NO ANSWER
1. How were the four GCMs chosen? How much uncertainty does this bring in? What would four other GCMs show?
2. What are the total uncertainties when the averaged output of one computer model is used as the input to a second computer model, then the output of the second computer model is used as the input to a third simpler computer model, which has been calibrated against a separate climate reanalysis computer model?
3. With over 2000 one-year realizations, we know that they are exploring the HadAM3-N144 model space for a given setting of the model parameters. But are the various models fully exploring the actual reality space? And if they are, does the distribution of their results match the distribution of real climate variations? That is an unstated assumption which must be verified for their “nine out of ten” results to be valid. Maybe nine out of ten model runs are unrealistic junk, maybe they’re unalloyed gold … although my money is on the former, the truth is there’s no way to tell at this point.
4. Given the warnings in the source of the data (see below) that “seldom is it safe to allow the [river gauge] data series to speak for themselves”, what quality control was exercised on the river gauge data to ensure accuracy in the setting of the P-R modeled parameters? In general, flows have increased as more land is rendered impermeable (roads, parking lots, buildings) and as land has been cleared of native vegetation. This increases runoff for a given rainfall pattern, and thus introduces a trend of increasing flow in the results. I cannot tell if this is adjusted for in the analysis, despite the fact that the river gauge records are used to calibrate the P-R model.
5. Since the P-R model is calibrated using the ERA-40 reanalysis results, how well does it replicate the actual river flows year by year, and how much uncertainty is there in the calculated result?
6. Given an April 1 starting date for each of the years for which we have records, how well does the procedure outlined in this paper (start the HadAM3-N144 on April Fools Day to predict autumn rainfall) predict the measured 80 years or so of rainfall for which we have actual records?
7. Given an April 1 starting date for each of the years for which we have records, how well does the procedure outlined in this paper (start the HadAM3-N144 on April Fools Day to predict river flows and floods) predict the measured river flows for the years and rivers for which we have actual records?
8. In a casino game, four different computer model results are compared to reality. Since they predict different outcomes, if one is right, then three are wrong. All four may be wrong to a greater or lesser degree. Payoff on the bet is proportional to correlation of model to reality. What is the mathematical expectation of return on a $1 bet on one of the models in that casino … and what is the uncertainty of that return? Given that there are four models, will betting on the average of the models improve my odds? And how is that question different from the difficulties and the unknowns involved in estimating only this one part of the total uncertainty of this study, using only the information we’ve been given in the study?
9. There are a total of six climate models involved, each of which has different gridcell sizes and coordinates. There are a variety of methods used to average from one gridcell scheme to another scheme with different gridcell sizes. What method was used, and what is the uncertainty introduced by that step?
10. The study describes the use of one particular model to create the two sets of 2,000+ single years of synthetic weather … how different would the sets be if a different climate model were used?
11. Given that the GCMs forecast different rainfall patterns than those of the ERA-40 reanalysis model, and given that the P-R model is calibrated to the ERA-40 model results, how much uncertainty is introduced by using those same ERA-40 calibration settings with the GCM results?
12. Did they really start the A2000N simulations by cooling the ocean and not the land as they seem to say?
As you can see, there are lots of important questions left unanswered at this point.
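Question 9 above, about moving data between different gridcell schemes, is easier to appreciate with a toy example. The Python sketch below regrids a one-dimensional strip of rainfall from six cells to four using two common approaches, overlap-weighted (conservative) averaging and nearest-neighbour sampling. The grids, the rainfall values, and the function names are all made up for illustration. I have no idea which method, if either, was used in Pall2011.

```python
def conservative_regrid_1d(src_edges, src_values, dst_edges):
    """Overlap-weighted average of source cells onto destination cells.

    src_edges has len(src_values) + 1 entries; likewise for the destination.
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        weighted_sum = 0.0
        covered = 0.0
        for i, value in enumerate(src_values):
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                weighted_sum += value * overlap
                covered += overlap
        dst_values.append(weighted_sum / covered if covered > 0 else float("nan"))
    return dst_values


def nearest_neighbour_regrid_1d(src_edges, src_values, dst_edges):
    """Each destination cell takes the value of the source cell whose centre
    is closest to the destination cell's centre."""
    src_centres = [(src_edges[i] + src_edges[i + 1]) / 2
                   for i in range(len(src_values))]
    dst_values = []
    for j in range(len(dst_edges) - 1):
        centre = (dst_edges[j] + dst_edges[j + 1]) / 2
        nearest = min(range(len(src_values)),
                      key=lambda i: abs(src_centres[i] - centre))
        dst_values.append(src_values[nearest])
    return dst_values


# Made-up strip of rainfall (mm): one wet cell surrounded by dry ones.
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
src_rain = [0.0, 0.0, 30.0, 0.0, 0.0, 0.0]
dst_edges = [0.0, 1.5, 3.0, 4.5, 6.0]

conservative = conservative_regrid_1d(src_edges, src_rain, dst_edges)
nearest = nearest_neighbour_regrid_1d(src_edges, src_rain, dst_edges)
print("conservative:     ", conservative)
print("nearest neighbour:", nearest)
```

Same input, two defensible methods, and the peak rainfall in the wet cell comes out different. When the whole study is about extremes, that choice is not a footnote.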
Reading over this, there’s one thing that I’d like to clarify. I am not scornful of this study because it is wrong. I am scornful of this study because it is so very far from being science that there is no hope of determining if this study is wrong or not. They haven’t given us anywhere near the amount of information that is required to make even the most rough judgement as to the validity of their analysis.
BACK TO BORING OLD DATA …
As you know, I like facts. Robert Heinlein’s comment is apt:
What are the facts? Again and again and again-what are the facts? Shun wishful thinking, ignore divine revelation, forget what “the stars foretell,” avoid opinion, care not what the neighbors think, never mind the unguessable “verdict of history”–what are the facts, and to how many decimal places? You pilot always into an unknown future; facts are your single clue. Get the facts!
Because he wrote that in 1973, the only thing Heinlein left out was “beware computer model results.” Accordingly, I went to the river flow gauge data site referenced in Pall2011, which is here. I got as far as the part where it says (emphasis mine):
Appraisal of Long Hydrometric Series
… Data precision and consistency can be a major problem with many early hydrometric records. Over the twentieth century instrumentation and data acquisition facilities improved but these improvements can themselves introduce inhomogeneities into the time series – which may be compounded by changes (sometimes undocumented) in the location of the monitoring station or methods of data processing employed. In addition, man’s influence on river flow regimes and aquifer recharge patterns has become increasingly pervasive, over the last 50 years especially. The resulting changes to natural river flow regimes and groundwater level behaviour may be further affected by the less perceptible impacts of land use change; although these have been quantified in a number of important experimental catchments generally they defy easy quantification.
So like most long-term records of natural phenomena, this one also has its traps for the unwary. Indeed, the authors close out the section by saying:
It will be appreciated therefore that the recognition and interpretation of trends relies heavily on the availability of reference and spatial information to help distinguish the effects of climate variability from the impact of a range of other factors; seldom is it safe to allow the data series to speak for themselves.
Clearly, the authors of Pall2011 have taken that advice to heart, as they’ve hardly let the data say a single word … but on a more serious note, since this is the data they used regarding “climate variability” to calibrate the P-R model, did the Pall2011 folks follow the advice of the data curator? I see no evidence of that either way.
In any case, I could see that the river flow gauge data wouldn’t be much help to me. I was intrigued, however, by the implicit claim in the paper that extreme precipitation events were on the rise in the UK. I mean, they are saying that the changing climate will bring more floods, and the only way that can happen is if the UK has more extreme rains.
Fortunately, we do have another dataset of interest here. Unfortunately it is from the Hadley Centre again, this time the Hadley UK Precipitation dataset (HadUKP) of Alexander and Jones, and yes, it is Phil Jones. Fortunately, the reference paper doesn’t show any egregious issues. Unfortunately but somewhat unavoidably, it uses a complex averaging system. Fortunately, the average results are not much different from a straight average on the scale of interest here. Unfortunately, there’s no audit trail so while averages may only be slightly changed, there’s no way to know exactly what was done to a particular extreme in a particular place and time.
In any case, it’s the best we have. It lists total daily rainfall by section of the UK, and one of these sections is South West England and Wales, which avoids the problems in averaging the sections into larger areas. Figure 2 shows the autumn maximum one-day rainfall for SW England and Wales, which was the area and time-frame Pall2011 studied regarding the autumn 2000 floods:
The extreme rainfall shown in this record is typical of records of extremes. In natural records, the extremes rarely have a normal (Gaussian or bell-shaped) distribution. Instead, typically these records contain a few extremely large values, even when we’re just looking at the extremes. The kind of extreme rainfalls leading to the flooding of 2000 are seen in Figure 3. I see this graph as a cautionary tale, in that if the record had started a year later, the one-day rainfall in 2000 would be by far the largest in the record.
In any case, for the 70 years of this record there is no indication of increasing flood risk from climate factors. Pall2011 has clearly shown that in two out of three cases in synthetic climate A, the chance of a synthetic autumn flood in a synthetic SW England and Wales went up by more than 90% over the synthetic flood risk in synthetic climate B.
But according to the observational data, there’s no sign of any increase in autumn rainfall extremes in SW England and Wales, so it seems very unlikely they were talking about our SW England and Wales … gives new meaning to the string theory claim of multiple parallel universes, I guess.
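For anyone who wants to run this kind of check on a daily rainfall record themselves, here is a minimal Python sketch. It pulls the largest one-day total out of each autumn and then applies a Mann-Kendall trend test, a standard rank-based test that doesn’t assume the extremes are Gaussian. The assumptions are mine: autumn is taken as September through November, ties get no special variance correction, and the handful of records at the bottom are invented purely to show the mechanics, not taken from HadUKP.

```python
import math

AUTUMN_MONTHS = (9, 10, 11)  # assumption: autumn = September to November


def autumn_maxima(daily_records):
    """Largest one-day rainfall in each autumn, ordered by year.

    daily_records is an iterable of (year, month, day, rainfall_mm) tuples.
    """
    maxima = {}
    for year, month, _day, rain in daily_records:
        if month in AUTUMN_MONTHS:
            maxima[year] = max(rain, maxima.get(year, rain))
    return [maxima[y] for y in sorted(maxima)]


def mann_kendall(series):
    """Return (S, z) for the Mann-Kendall trend test.

    S counts later-minus-earlier pairs that rise minus pairs that fall.
    z uses the normal approximation without a correction for ties.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    variance = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    return s, z


# Invented records, for illustration only.
records = [
    (1998, 9, 14, 22.0), (1998, 10, 3, 31.5), (1998, 7, 1, 60.0),
    (1999, 10, 20, 18.0), (1999, 11, 2, 27.0),
    (2000, 10, 29, 40.0), (2000, 11, 5, 35.5),
]
maxima = autumn_maxima(records)
s_stat, z_score = mann_kendall(maxima)
print("autumn maxima:", maxima)
print("Mann-Kendall S and z:", s_stat, z_score)
```

A z-score beyond about ±1.96 would indicate a trend at the usual 5% two-sided level. Three invented years prove nothing, of course. The test only means something on the full record.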
IMPLICATIONS OF THE PUBLICATION OF THIS STUDY
It is very disturbing that Nature Magazine would publish this study. There is one and only one way in which this study might have stood the slightest chance of scientific respectability. This would have been if the authors had published the exact datasets and code used to produce all of their results. A written description of the procedures is pathetically inadequate for any analysis of the validity of their results.
At an absolute minimum, to have any hope of validity the study requires the electronic publication of the A2000 and A2000N climates in some accessible form, along with the results of simple tests of the models involved (e.g. computer predictions of autumn river flows, along with the actual river flows). In addition, the study needs an explanation of the ex-ante criteria used to select the four GCMs and the lead model, and the answers to the questions I pose above, to be anywhere near convincing as a scientific study. And even then, when people finally get a chance to look at the currently unavailable A2000 and A2000N synthetic climates, we may find that they bear no resemblance to any reality, hypothetical or otherwise …
As a result, I put the onus on Nature Magazine on this one. Given the ephemeral nature of the study, the reviewers should have asked the hard questions. Nature Editors, on the other hand, should have required that the authors post sufficient data and code so that other scientists can see if what they have done is correct, or if it would be correct if some errors were fixed, or if it is far from correct, or just what is going on.
Because at present, the best we can say of the study is a) we don’t have a clue if it’s true, and b) it is not falsifiable … and while that looks good in the “Journal of Irreproducible Results”, for a magazine like Nature that is ostensibly about peer-reviewed science, that’s not a good thing.
PS – Please don’t construe this as a rant against computer models. I’ve been programming computers since 1963, longer than many readers have been around. I’m fluent in R, C, VBA, and Pascal, and I can read and write (slowly) in a half-dozen other computer languages. I use, have occasionally written, and understand the strengths, weaknesses, and limitations of a variety of computer models of real-world systems. I am well aware that “all models are wrong, and some models are useful”, that’s why I use them and study them and occasionally write them.
My point is that until you test, really test your model by comparing the output to reality in the most exacting tests you can imagine, you have nothing more than a complicated toy of unknown veracity. And even after extensive testing, models can still be wrong about the real world. That’s why Boeing still has test flights of new planes, despite using the best computer models that billion$ can buy, and despite the fact that modeling airflow around a plane is orders of magnitude simpler than modeling the global climate …
I and others have shown elsewhere (see my thread here, the comment here, and the graphic here) that the annual global mean temperature output of NASA’s pride and joy climate model, the GISS-E GCM, can be replicated to 98% accuracy by the simple one-line single-variable equation T(n) = [lambda * Forcings(n-1)/tau + T(n-1) ] exp(-1/tau) with T(n) being temperature at time n, and lambda and tau being constants of climate sensitivity and lag time …
Which, given the complexity of the climate, makes it very likely that the GISS-E model is both wrong and not all that useful. And applying four GCMs of that kind to the problem of UK floods certainly doesn’t improve the accuracy of your results …
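The one-line equation quoted above, T(n) = [lambda * Forcings(n-1)/tau + T(n-1)] exp(-1/tau), is short enough to write out directly. Here it is as a Python sketch. The values of lambda and tau and the forcing series are placeholders I picked so the code runs. They are not the fitted values from the threads mentioned above, and this is not the GISS-E model or its forcings.

```python
import math


def emulate_temperature(forcings, lam, tau, t0=0.0):
    """Iterate T(n) = [lam * F(n-1) / tau + T(n-1)] * exp(-1 / tau).

    forcings: forcing at each time step
    lam:      climate sensitivity constant (lambda in the text)
    tau:      lag time constant, in time steps
    t0:       starting temperature T(0)
    Returns a list of temperatures the same length as forcings.
    """
    decay = math.exp(-1.0 / tau)
    temps = [t0]
    for n in range(1, len(forcings)):
        temps.append((lam * forcings[n - 1] / tau + temps[n - 1]) * decay)
    return temps


# Placeholder inputs, for illustration only.
example_forcings = [0.0, 0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0]
example_temps = emulate_temperature(example_forcings, lam=0.5, tau=3.0)
for step, temp in enumerate(example_temps):
    print(step, round(temp, 4))
```

Under a constant forcing F, the recursion settles at T = lam * F * d / (tau * (1 - d)), with d = exp(-1/tau), which you can confirm by setting T(n) = T(n-1) in the equation.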
The problem is not computer models. The problem is Nature Magazine trying to pass off the end results of a long computer model daisy-chain of specifically selected, untested, unverified, un-investigated computer models as valid, falsifiable, peer-reviewed science. Call me crazy, but when your results represent the output of four computer models, which are fed into a fifth computer model, whose output goes to a sixth computer model, which is calibrated against a seventh computer model, and then your results are compared to a series of different results from the fifth computer model but run with different parameters, in order to demonstrate that flood risks have changed from increasing GHGs … well, when you do that, you need to do more than wave your hands to convince me that your flood risk results are not only a valid representation of reality, but are in fact a sufficiently accurate representation of reality to guide our future actions.