Nature Unleashes a Flood … of Bad Science.

Guest Post by Willis Eschenbach

Recently, Nature Magazine published a paywalled paper called “Human contribution to more-intense precipitation extremes” by Seung-Ki Min, Xuebin Zhang, Francis W. Zwiers & Gabriele C. Hegerl (hereinafter MZZH11). The Supplementary Information is available here. The study makes a very strong claim to have shown that CO2 and other greenhouse gases are responsible for increasing extreme rainfall events, viz:

Here we show that human-induced increases in greenhouse gases have contributed to the observed intensification of heavy precipitation events found over approximately two-thirds of data-covered parts of Northern Hemisphere land areas.

Figure 1. Extreme 1-day rainfall. New Orleans, Katrina. Photo Source

Two rainfall indices are used in their analysis, called the RX1day and RX5day indices. They give the maximum one-day and five-day precipitation, respectively, for a given station for a given month. These individual station datasets (available here, free registration required) have been combined into a gridded dataset called HADEX (Hadley Climate Extremes Dataset). It is this gridded dataset that was used in the MZZH11 study.

So what’s wrong with the study? Just about everything. Let me peel the layers off it for you, one by one.

Other people have commented on a variety of problems with the study, including Roger Pielke Jr., Andy Revkin, and Judith Curry. But to begin with, I didn’t read them; I did what I always do. I went for the facts. I thrive on facts. I went to get the original data. For me, this is not the HADEX data, as that data has already been gridded. I went to the actual underlying data used to create the HADEX dataset, as cited above. Since they don’t provide a single datablock file with all of the areas (grrrr … pet peeve), I started by looking at the USA data.

And as is my habit, the first thing I do is just to look at the individual records. There are 2,661 stations in the USA database, of which some 731 contain at least some RX1day (maximum one-day rainfall) data. However, as is usual with weather records of all kinds, many of these have missing data. In addition, only 9% of the stations show a significant trend at the 95% confidence level. Since with a 95% confidence interval (CI) we would expect 5% of the stations to pass that test in any random dataset, we’re only slightly above what would be expected from randomness alone. In addition, the number of stations available varies over time.

Now, let me repeat part of that, because it is important.

91% of the rainfall stations in the US do not show a significant trend in precipitation extremes, either up or down.

So overwhelmingly in the US there has been

No significant change in the extreme rainfall.

And as if that wasn’t enough …

Of the remaining 9% that have significant trends, 5% of the trends are probably from pure random variation.

So this means that

Only about 5% of the stations in the US show any significant change in rainfall extremes.

So when you see claims about changes in US precipitation extremes, bear in mind that they are talking about a situation where only ~ 5% of the US rainfall stations show a significant trend in extreme rainfall. The rest of the nation is not doing anything.
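
For anyone who wants to check numbers like these, here is a minimal R sketch of the kind of per-station trend test involved (the data layout is assumed: a list named rx1day holding one vector of annual maximum one-day rainfall per station; this is not the code behind the figures in this post):

# Minimal sketch: fraction of stations whose annual RX1day series shows a
# linear trend significant at the 95% level.
trend_p <- function(x) {
  year <- seq_along(x)
  ok   <- !is.na(x)
  if (sum(ok) < 20) return(NA)                       # skip very sparse stations
  summary(lm(x[ok] ~ year[ok]))$coefficients[2, 4]   # p-value of the slope
}
p_vals <- sapply(rx1day, trend_p)
mean(p_vals < 0.05, na.rm = TRUE)   # with no real trend anywhere, about 5% pass by chance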

Now, having seen that, let’s compare that to the results shown in the study:

Figure 2. The main figure of the MZZH11 study, along with the original caption. This claims to show that the odds of extreme events have increased in the US.

Hmmmm …. so how did they get that result, when the trends of the individual station extreme precipitation show that some 95% of the stations aren’t doing anything out of the ordinary? Let me go over the stages step by step as they are laid out in the study. Then I’ll return to discuss the implications of each step.

1. The HADEX folks start with the individual records. Then, using a complex formula based on the distance and the angle from the center of the enclosing gridcell, they take a weighted station average of each month’s extreme 1-day rain values from all stations inside the gridcell. This converts the raw station data into the HADEX gridded station data.

2. Then in this study they convert each HADEX gridcell time series to a “Probability-Based Index” (PI) as follows:

Observed and simulated  annual extremes are converted to PI by fitting a separate generalized extreme value (GEV) distribution to each 49-year time series of annual extremes and replacing values with their corresponding percentiles on the fitted distribution. Model PI values are interpolated onto the HadEX grid to facilitate comparison with observations (see Methods Summary and Supplementary Information for details).

In other words, they separately fit a generalized three-parameter probability function to each gridcell time series, to get a probability distribution. The fitting is done iteratively, by repeatedly adjusting each parameter to find the best fit. Then they replace each extreme rainfall value (in millimetres per day) with the corresponding value of the fitted distribution, which is between zero and one.
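
Here is a bare-bones R sketch of that transformation, with the GEV likelihood written out by hand. This is only an illustration, not the paper’s code; a 49-year vector annmax of annual maxima for a single gridcell is assumed:

# Sketch of the PI transformation: fit a GEV by maximum likelihood, then
# replace each annual maximum with its percentile on the fitted distribution.
gev_nll <- function(par, x) {
  mu <- par[1]; sigma <- par[2]; xi <- par[3]
  if (sigma <= 0) return(1e10)
  z <- 1 + xi * (x - mu) / sigma
  if (any(z <= 0)) return(1e10)                        # outside the GEV support
  sum(log(sigma) + (1 + 1/xi) * log(z) + z^(-1/xi))    # negative log-likelihood
}
gev_cdf <- function(x, mu, sigma, xi) {
  z <- pmax(1 + xi * (x - mu) / sigma, 1e-12)
  exp(-z^(-1/xi))
}
pi_index <- function(annmax) {
  start <- c(mu = mean(annmax), sigma = sd(annmax), xi = 0.1)
  fit   <- optim(start, gev_nll, x = annmax)           # iterative best fit
  gev_cdf(annmax, fit$par[1], fit$par[2], fit$par[3])  # values mapped onto [0, 1]
}
# Example with synthetic data:
# set.seed(1); summary(pi_index(rgamma(49, shape = 2, scale = 20)))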

They explain this curious transformation as follows:

Owing to the high spatial variability of precipitation and the sparseness of the observing network in many regions, estimates of area means of extreme precipitation may be uncertain; for example, for regions where the distribution of individual stations does not adequately sample the spatial variability of extreme values across the region. In order to reduce the effects of this source of uncertainty on area means, and to improve representativeness and inter-comparability, we standardized values at each grid-point before estimating large area averages by mapping extreme precipitation amounts onto a zero-to-one scale. The resulting ‘probability-based index’ (PI) equalizes the weighting given to grid-points in different locations and climatic regions in large area averages and facilitates comparison between observations and model simulations.

Hmmm … moving right along …

3. Next, they average the individual gridcells into “Northern Hemisphere”, “Northern Tropics”, etc.

4. Then the results from the models are obtained. Of course, models don’t have point observations; they already have gridcell averages. However, the model gridcells are not the same as the HADEX gridcells, so the model values have to be area-averaged onto the HADEX gridcells, and then the models averaged together.
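
As an illustration of the kind of regridding step involved, here is a crude nearest-gridcell sketch in R (the paper area-averages rather than taking nearest values, and the data layout here is assumed):

# Crude sketch: put a model field onto the HADEX grid by nearest gridcell.
# 'model' is assumed to be a list with vectors lon, lat and a matrix z[lon, lat];
# hadex_lon and hadex_lat are the target gridcell centres.
regrid_nearest <- function(model, hadex_lon, hadex_lat) {
  outer(hadex_lon, hadex_lat, Vectorize(function(lo, la) {
    i <- which.min(abs(model$lon - lo))
    j <- which.min(abs(model$lat - la))
    model$z[i, j]
  }))
}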

5. Finally, they use a technique optimistically called “optimal fingerprinting”. As near as I can tell this method is unique to climate science. Here’s their description:

In this method, observed patterns are regressed onto multi-model simulated responses to external forcing (fingerprint patterns). The resulting best estimates and uncertainty ranges of the regression coefficients (or scaling factors) are analysed to determine whether the fingerprints are present in the observations. For detection, the estimated scaling factors should be positive and uncertainty ranges should exclude zero. If the uncertainty ranges also include unity, the model patterns are considered to be consistent with observations.
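
Stripped of the noise-covariance weighting that makes it “optimal”, the detection step they describe amounts to regressing the observed series on the model-simulated series. A minimal sketch (the variable names and the 90% range are my illustration, not taken from the paper):

# Bare-bones sketch of the detection step, without the covariance weighting.
# obs and fingerprint are assumed to be aligned numeric vectors (observed and
# multi-model mean PI anomalies).
fit  <- lm(obs ~ fingerprint - 1)      # scaling factor (beta), no intercept
ci   <- confint(fit, level = 0.90)     # a 90% range, for illustration
detected   <- ci[1] > 0                # "detection": range excludes zero
consistent <- ci[1] < 1 && ci[2] > 1   # "consistent": range also includes unity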

In other words, the “optimal fingerprint” method looks at the two distributions H0 and H1 (observational data and model results) and sees how far the distributions overlap. Here’s a graphical view of the process, from Bell, one of the developers of the technique.

Figure 2a. A graphical view of the “optimal fingerprint” technique.

As you can see, if the distributions are anything other than Gaussian (bell-shaped), the method gives incorrect results. Or as Bell says (op. cit.), the optimal fingerprint method involves several crucial assumptions, viz:

•  It assumes the probability distributions of the model dataset and the actual dataset are Gaussian

•  It assumes the probability distributions of the model dataset and the actual dataset have approximately the same width

While it is possible that the extreme rainfall datasets fit these criteria, until we are shown that they do fit them we don’t know if the analysis is valid. However, it seems extremely doubtful that the hemispheric averages of the probability-based indexes will be normal. The MZZH11 folks haven’t thought through all of the consequences of their actions: they have fitted an extreme value distribution to standardize the gridcell time series.

This wouldn’t matter a bit, if they hadn’t then tried to use optimal fingerprinting. The problem is that the average of a PI of a number of extreme value distributions will be an extreme value distribution, not a Gaussian distribution. As you can see in Figure 2a above, for the “optimal fingerprint” method to work, the distributions have to be Gaussian. It’s not as though the method will work with other distributions but just give poorer results. Unless the data is Gaussian, the “optimal fingerprint” method is worse than useless … it is actively misleading.

It also seems doubtful that the two datasets have the same width. While I do not have access to their model dataset, you can see from Figure 2 that the distribution of the observations is wider, in both the increases and the decreases, than the distribution of the model results.

This seems extremely likely to disqualify the use of optimal fingerprinting in this particular case even by their own criteria. In either case, they need to show that the “optimal fingerprint” model is actually appropriate for this study. Or in the words of Bell, the normal distribution “should be verified for the particular choice of variables”. If they have done so there is no indication of that in the study.
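
For what it is worth, such a check is nearly a one-liner in R; something along these lines, where obs_series stands for the area-averaged PI series (an assumed name, not theirs):

# One way the Gaussian assumption could be checked (a sketch, not from the paper).
shapiro.test(obs_series)                  # small p-value means reject normality
qqnorm(obs_series); qqline(obs_series)    # visual check against a normal distribution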

I think that whole concept of using a selected group of GCMs for “optimal fingerprinting” is very shaky. While I have seen theoretical justifications for the procedure, I have not seen any indication that it has been tested against real data (not used on real data, but tested against a selected set of real data where the answer is known). The models are tuned to match the past. Because of that, if you remove any of the forcings, it’s almost a certainty that the model will not perform as well … duh, it’s a tuned model. And without knowing how or why the models are chosen, how can they say their results are solid?

OK, I said above that I would first describe the steps of their analysis. Those are the steps. Now let’s look at the implications of each step individually.

STEP ONE: We start with what underlies the very first step, which is the data. I didn’t have to look far to find that the data used to make the HADEX gridded dataset contains some really ugly errors. One station shows 48 years of August rains with a one-day maximum of 25 to 50 mm (one to two inches), and then has one August (1983) with one day when it is claimed to have rained 1016 mm (40 inches) … color me crazy, but I think that once again, as we have seen time after time, the very basic steps have been skipped. Quality doesn’t seem to be getting controlled. So … we have an unknown amount of uncertainty in the data simply due to bad individual data points. I haven’t done an analysis of how much, but a quick look revealed a dozen stations with that egregious an error in the 731 US datasets … no telling about the rest of the world.
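
A sanity check that would catch that kind of error is trivial to write. For example (the ten-times threshold is arbitrary, and x is assumed to be one station’s series of monthly RX1day values, in mm, for a given calendar month):

# Flag values that are wildly out of line with the station's own history.
flag_outliers <- function(x, factor = 10) {
  typical <- median(x, na.rm = TRUE)
  which(x > factor * typical)          # indices of suspiciously large values
}
# flag_outliers(c(rep(30, 48), 1016))  # returns 49, the bogus 1016 mm reading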

The next data issue is “inhomogeneities” (sudden changes in volume or variability) in the data. In a Finnish study, 70% of the rainfall stations had inhomogeneities. While there are various mathematical methods used by the HADEX folks to “correct” for this, it introduces additional uncertainty into the data. I think it would be preferable to split the data at the point of the inhomogeneous change, and analyze each part as a separate station. Either way, we have an uncertainty of at least the difference in results of the two methods. In addition, the same study found that, on average, the inhomogeneities tended to increase the apparent rainfall over time, introducing a spurious trend into the data.

In addition, extreme rainfall data is much harder to quality control than mean temperature data. For example, it doesn’t ever happen that the January temperature at a given station averages 40 degrees every January but one, when it averages 140 degrees. But extreme daily rainfall could easily change from 40 mm one January to an unusual rain of 140 mm. This makes for very difficult judgements as to whether a large daily reading is erroneous.

In addition, an extreme value is one single value, so if that value is incorrectly large it is not averaged out by valid data. It carries through, and is wrong for the day, the month, the year, and the decade.

Rainfall extreme data also suffers in the recording itself. If I have a weather station and I go away for the weekend, my maximum thermometer will still record the maximum temperature of the two days I missed. But the rain gauge can only give me the total for the two days I missed, so I can either record that as the average of the two days, or record it all as one day with no rain on the other. Either way … uncertainties.

Finally, up to somewhere around the seventies, the old rain gauges were not self-emptying. This means that if the gauge was not manually emptied, it could not record an extreme rain. All of these problems with the collection of the extreme rainfall data mean it is inherently less accurate than either mean or extreme temperature data.

So those are the uncertainties in the data itself. Next we come to the first actual mathematical step, the averaging of the station data to make the HADEX gridcells. HADEX, curiously, uses the averaging method rejected by the MZZH11 folks: HADEX averages the actual rainfall extreme values, and does not create a probability-based index (PI) as in the MZZH11 study. I can make a cogent argument for either one, PI or raw data, for the average. But layering a PI-based average on top of a raw-data average seems like an odd choice, one which introduces unknown uncertainties. But I’m getting ahead of myself. Let me return to the gridding of the HADEX data.

Another problem increasing the uncertainty of the gridding is the extreme spatial and temporal variability of rainfall data. Nearby stations are not well correlated, and as the underlying study for HADEX says (emphasis mine):

[56] The angular distance weighting (ADW) method of calculating grid point values from station data requires knowledge of the spatial correlation structure of the station data, i.e., a function that relates the magnitude of correlation to the distance between the stations. To obtain this we correlate time series for each station pairing within defined latitude bands and then average the correlations falling within each 100 km bin. To optimize computation only pairs of stations within 2000 km of each other are considered. We assume that at zero distance the correlation function is equal to one. This may not necessarily be the best assumption for the precipitation indices because of their noisy nature but it does provide a good compromise to give better gridded coverage.
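
In outline, the weighting works something like the sketch below. The directional (angular) term of the full ADW scheme is left out, and the exponential form and the decay length are placeholders, not the HADEX values:

# Sketch of distance-weighted gridding.  'stn' is assumed to be a data frame
# with columns lon, lat, value; grid_lon and grid_lat are the gridcell centre;
# m is a correlation decay length in km.
great_circle_km <- function(lon1, lat1, lon2, lat2) {
  rad <- pi / 180
  6371 * acos(pmin(1, sin(lat1*rad)*sin(lat2*rad) +
                      cos(lat1*rad)*cos(lat2*rad)*cos((lon2 - lon1)*rad)))
}
adw_cell_mean <- function(stn, grid_lon, grid_lat, m = 200) {
  d <- great_circle_km(stn$lon, stn$lat, grid_lon, grid_lat)
  w <- exp(-d / m)                     # closer stations get more weight
  sum(w * stn$value) / sum(w)
}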

Like most AGW claims, this seems reasonable on the surface. It means that stations closer to the gridbox center get weighted more than distant stations. It is based on the early observation by Hansen and Lebedeff in 1987 that year-to-year temperature changes were well correlated between nearby stations, and that correlation fell off with distance. In other words, if this year is hotter than last year in my town, it’s likely hotter than last year in a town 100 km away. Here is their figure showing that relationship:

Figure 3. Correlation versus Inter-station Distance. Original caption says “Correlation coefficients between annual mean temperature changes for pairs of randomly selected stations having at least 50 common years in their records.”

Note that at close distances there is good correlation between annual temperature changes, and that at the latitude of the US (mostly the bottom graph in Figure 3) the correlation is greater than 50% out to around 1200 kilometres.

Being a generally suspicious type fellow, I wondered about their claim that changes in rainfall extremes could be calculated by assuming they follow the same distribution used for temperature changes. So I calculated the actual relationship between correlation and inter-station distance for the annual change in maximum one-day rainfall. Figure 4 shows that result. It is very different from temperature data, which has good correlation between nearby stations and drops off slowly with increasing distance. Extreme rainfall does not follow that pattern in the slightest.

Figure 4. Correlation of annual change in 1-day maximum rainfall versus the distance between the stations. Scatterplot shows all station pairs between all 340 mainland US stations which have at least 40 years of data per station. Red line is a 501 point Gaussian average of the data.

As you can see, there is only a weak relationship between extreme rainfall correlation and the distance between stations, and it holds only at small distances. There is an increase in correlation with decreasing distance, as we saw with temperature, but it drops to zero very quickly. In addition, there are a significant number of negative correlations at all distances. In the temperature data shown in Figure 3, the decorrelation distance (the distance at which the average correlation drops to 0.50) is on the order of 1200 km. The corresponding decorrelation distance for one-day extreme precipitation is only 40 km …
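
For anyone who wants to reproduce something like Figure 4, the core of the calculation is roughly this (reusing great_circle_km() from the gridding sketch above; the station data layout is assumed):

# Correlation of annual changes in RX1day versus inter-station distance.
# 'stations' is assumed to be a list, one element per station, each holding
# lon, lat and a vector rx1 of annual maxima on a common set of years.
pair_stats <- apply(combn(length(stations), 2), 2, function(ij) {
  a <- stations[[ij[1]]]; b <- stations[[ij[2]]]
  c(dist_km = great_circle_km(a$lon, a$lat, b$lon, b$lat),
    corr    = cor(diff(a$rx1), diff(b$rx1), use = "complete.obs"))
})
# plot(t(pair_stats), pch = ".", xlab = "inter-station distance (km)", ylab = "correlation")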

Thinking that the actual extreme values might correlate better than the annual change in the extreme values, I plotted that as well … it is almost indistinguishable from Figure 4. Either way, there is only a very short-range (less than 40 km) relation between distance and correlation for the RX1day data.

In summary, the method of weighting averages by angular distance used for gridding temperature records is supported by the Hansen/Lebedeff temperature data in Figure 3. On the other hand, the observations of extreme rainfall events in Figure 4 mean that we cannot use the same method for gridding extreme rainfall data. It makes no sense, and reduces accuracy, to average data weighted by distance when the correlation varies only over the shortest distances, and when the standard deviation of the correlation is so large at all distances.

STEP 2: Next, they fit a generalized extreme value (GEV) probability distribution to each individual gridcell. I object very strongly to this procedure. The GEV distribution has three different parameters. Depending on how you set the three GEV dials, it will give you distributions ranging from a Gumbel to a Fréchet to a (reversed) Weibull distribution. Setting the dials differently for each gridcell introduces an astronomical amount of uncertainty into the results. If one gridcell is fitted with a light-tailed distribution, and the next gridcell with a heavy-tailed one, how on earth are we supposed to compare them? I would throw out the paper based on this one problem alone.
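
To see how differently those “dials” can be set, here is a quick illustration using the gev_cdf() function from the PI sketch above: the same location and scale, but three different shape parameters, give three very different curves.

# Three GEV curves differing only in the shape parameter.
x <- seq(0, 200, by = 1)
plot(x,  gev_cdf(x, mu = 40, sigma = 15, xi = -0.3), type = "l",
     xlab = "1-day rainfall (mm)", ylab = "cumulative probability")
lines(x, gev_cdf(x, mu = 40, sigma = 15, xi = 0.01), lty = 2)   # near-Gumbel
lines(x, gev_cdf(x, mu = 40, sigma = 15, xi = 0.4),  lty = 3)   # heavy-tailed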

If I decided to use their method, I would use a Zipf distribution rather than a GEV. The Zipf distribution shows up in a wide range of natural phenomena of this type. One advantage of the Zipf distribution is that it only has one parameter, sigma (well, two, but the second is just the size of the dataset, N). That keeps you from overfitting. In addition, the idea of fitting a probability distribution to the angular-distance-weighted average of raw extreme event data is … well … nuts. If you’re going to use a PI, you need to use it on the individual station records, not on some arbitrary average somewhere down the line.

STEP 3: Hemispheric and zonal averages. Beyond the easily calculable statistical error propagation in such averaging, each individual gridpoint also carries its own individual error. I don’t see any indication that they have dealt with this source of uncertainty.

STEP 4: Each model needs to have its results converted from the model grid to the HADEX grid. This, of course, gives a different amount of uncertainty to each of the HADEX gridboxes for each of the models. In addition, this uncertainty is different from the uncertainty of the corresponding observational gridbox …

There are some other model issues. The most important one is that they have not given any ex-ante criteria for selecting the models used. There are 24 models in the CMIP database that they could have used. Why did they pick those particular models? Why not divide the 24 models into 3 groups of 8 and see what difference it makes? How much uncertainty is introduced here? We don’t know … but it may be substantial.
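
The kind of sensitivity test that question implies is simple enough to sketch (the variable names are assumed, and the covariance weighting of the real method is again left out):

# Split the available model runs into random groups and compare the resulting
# scaling factors.  model_runs is assumed to be a list of model PI series,
# obs the matching observed series.
set.seed(1)
groups <- split(sample(model_runs), rep(1:3, length.out = length(model_runs)))
sapply(groups, function(g) {
  fingerprint <- rowMeans(do.call(cbind, g))   # group-mean fingerprint
  coef(lm(obs ~ fingerprint - 1))              # scaling factor for this group
})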

STEP 5: Here we have the question of the uncertainties in the optimal fingerprinting. These uncertainties are said to have been established by Monte Carlo procedures … which makes me nervous. The generation of proper data for a Monte Carlo analysis is a very subtle and sophisticated art. As a result, the unsupported claim of a Monte Carlo analysis doesn’t mean much to me without a careful analysis of their “random” proxy data.

More importantly, the data does not appear to be suitable for “optimal fingerprinting” by their own criteria.

End result of the five steps?

While they have calculated the uncertainty of their final result and shown it in their graphs, they have not included most of the uncertainties I listed above. As a result, they have greatly underestimated the real uncertainty, and their results are highly questionable on that issue alone.

OVERALL CONCLUSIONS

1. They have neglected the uncertainties from:

•  the bad individual records in the original data

•  the homogenization of the original data

•  the averaging into gridcells

•  the incorrect assumption of increasing correlation with decreasing distance

•  the use of a different fitted 3-parameter probability function for each gridcell

•  the use of a PI average on top of a weighted raw data average

•  the use of non-Gaussian data for an “optimal fingerprint” analysis

•  the conversion of the model results to the HADEX grid

•  the selection of the models

As a result, we do not know if their findings are significant or not … but given the number of sources of uncertainty and the fact that their results were marginal to begin with, I would say no way. In any case, until those questions are addressed, the paper should not have been published, and the results cannot be relied upon.

2.  There are a number of major issues with the paper:

•  Someone needs to do some serious quality control on the data.

•  The use of the HADEX RX1day dataset should be suspended until the data is fixed.

•  The HADEX RX1day dataset also should not be used until gridcell averages can be properly recalculated without distance-weighting.

•  The use of a subset of models which are selected without any ex-ante criteria damages the credibility of the analysis

•  If a probability-based index is going to be used, it should be used on the raw data rather than on averaged data. Using it on grid-cell averages of raw data introduces spurious uncertainties.

•  If a probability-based index is going to be used, it needs to be applied uniformly across all gridcells rather than using different distributions on a gridcell-by-gridcell basis.

•  No analysis is given to justify the use of “optimal fingerprinting” with non-Gaussian data.

3. Out of the 731 US stations with rainfall data, including Alaska, Hawaii and Puerto Rico, 91% showed no significant change in the extreme rainfall events, either up or down.

4. Of the 340 mainland US stations with 40 years or more of records, 92% showed no significant change in extreme rainfall in either direction.

As a result, I maintain that their results are contrary to the station records, that they have used inappropriate methods, and that they have greatly underestimated the total uncertainties of their results. Thus the conclusions of their paper are not supported by their arguments and methods, and are contradicted by the lack of any visible trend in the overwhelming majority of the station datasets. To date, they have not established their case.

My best regards to all, please use your indoor voices in discussions …

w.

[UPDATE] I’ve put the widely-cited paper by Allen and Tett about “optimal fingerprinting” online here.

Harry the Hacker

Seems a lot like Garbage in, Garbage Out.

Mark T

Of course their conclusions are not supported… it was published in Nature.
Mark

tokyoboy

This is one of their (semi)final death rattles?

Charlie A

Have you offered Nature your services as a one-man editorial review board?
It would be interesting to see the peer review comments for this paper.

ferdberple

A study that said there was no statistically significant change in rainfall would not have been published by Nature. Look at any newspaper. An increase in the rate of plane crashes is a story. A decrease is not.

The initial precipitation data has been subjected to such extreme torture that it would appear appropriate for Amnesty International to take up the case.
/humour
For each of the points you listed, reproduced below,
• the bad individual records in the original data
• the homogenization of the original data
• the averaging into gridcells
• the incorrect assumption of increasing correlation with decreasing distance
• the use of a 3 parameter fitted different probability function for each gridcell
• the use of a PI average on top of a weighted raw data average
• the use of non-Gaussian data for an “optimal fingerprint” analysis
• the conversion of the model results to the HADEX grid
• the selection of the models
one question that is not clear to me. Do the authors employ a sensitivity analysis to yield an estimate of the systematic errors associated with performing each of the above? Do they report such errors?
A general observation: climatology seems to be the only field of science that does not think that error estimates and error bars on data are necessary.

J

I know it is unlikely to happen, but you should actually send this analysis as a comment to Nature and see what happens …

Nylo

Great analysis as usual, Willis. It’s a pleasure to read your guest posts, which are IMHO always among WUWT’s greatest contents.

michel

Willis, very nice piece, knocks the paper into a cocked hat.
Do you have time to turn your attention on the other paper in this issue, the one about the flooding in the UK in 2000?
The claim there was that a week or so of autumn flooding was provably connected to rising CO2 levels….. At least, that’s what I think it was saying.

Willis Eschenbach

Colonel Sun says:
February 20, 2011 at 11:43 pm

… one question that is not clear to me. Do the authors employ a sensitivity analysis to yield an estimate of the systematic errors associated with performing each of the above? Do they report such errors?

Basically, no. They do a couple of sensitivity analyses, but they don’t touch the important issues like data quality, model selection, or improper averaging. They also do not do any analysis of whether this data is suitable for the “optimal fingerprint” analysis.
w.

I knew you’d get a kick out of it.

Willis Eschenbach

Charlie A says:
February 20, 2011 at 11:28 pm

Have you offered Nature your services as a one-man editorial review board?
It would be interesting to see the peer review comments for this paper.

Yes, it would be very interesting to see the comments. That’s why I have argued elsewhere that peer reviews should be signed, and published electronically when the paper is published.
w.

jorgekafkazar

Bottom line: Their method doesn’t permit the inclusion of error bars. The result isn’t just garbage, we can’t even tell how bad the garbage might be. Alarmists are so certain that a minuscule increase in CO2 produces warming, rain, dead frogs, etc., that they have to resort to increasingly bizarre statistical methods in an attempt to tease out some sort of signal from chaotic data. Rain is more chaotic than temperature, it would seem, so the validity of this paper is well below that of temperature-based papers. How did this ever get published? Oh, yes. Nature.

Malaga View

A beautifully sharp analysis that cuts right through the mumbo jumbo to reveal witch doctors publishing more voodoo in their house magazine.

First Witch: When shall we three meet again
In thunder, lightning, or in rain?
Second Witch: When the hurlyburly’s done,
When the battle’s lost and won.
William Shakespeare, Macbeth, 1.1

SSam

I’m not a stats sort of person… but that read really, really well.
Thank you for putting it into common language.

Roy

Ronald Coase had it sussed: torture the data long enough…

Martin Brumby

After torturing the data to that extent, it will obviously confess to whatever the “scientists” want to hear.
It would have at least been more honest to have said:-
“It’s all the fault of Man’s CO2. A little Polar Bear told me.”

Manfred

It is annoying that such poor science slipped again through Nature’s peer review process.
I associate co-author Gabriele Hegerl with questionable practices within the IPCC, disproven Hockey Team “science”, Climategate, and her neighbour a few rooms away, Geoffrey Boulton, who ran the failed Muir/Russell “inquiry”.

Scottish Sceptic

I was wondering whether there was any basis to your concerns … and then I saw the scatterplot!
If there is so little correlation between stations – how can anyone create a gridded pattern by averaging local stations?

Willis Eschenbach

steven mosher says:
February 20, 2011 at 11:52 pm

I knew you’d get a kick out of it.

Thanks for the heads-up on the paper, mosh. It was the first time I’ve explored deeply into the terrain surrounding the analysis of data extremes as opposed to the data itself. My previous foray involved peak river flows, shown as Update 12 here. Interesting stuff.
It also brought me back to “optimal fingerprinting”, which I think I understand, but which I don’t understand the implications of. Does anyone know a clear explication and discussion of the subject? I’ve posted up the Allen and Tett paper on OF here.
I generally distrust multiple linear regressions, of which optimal fingerprinting is one flavor, because so little of nature is linear and because of the problem of overfitting … yes, multiple linear regressions have their uses, but they are a blunt and often misleading tool.
w.

Peter Plail

Thanks for your hard work Willis.
They have done a lot of calculations to get their results. I am appalled that they assume Gaussian distribution of actual and model datasets without checking. Surely it is not a difficult task to verify this pretty fundamental assumption, at least using the actual dataset?
Another question I keep asking myself is where is the science in this – this is a statistical analysis (and a pretty amateur one, judging from Willis’ analysis).

John Trigge

Peer review, that much lauded and oft quoted as the arbiter of quality, fails again.

I remember reading about this article before, and the SI had this pearl:
“…There is a sudden drop in observational coverage after 2000. For 2001-2003, fewer than 60% of grid-points have data compared to the 1961-1990 mean…”
Kinda hard to tell if there’s been an increase in rain, when there’s been a decrease in data.

Roy

I think that the general point that a change in climate will produce changes in the (frequency of particular types of) weather is logically sound!

Ed Zuiderwijk

That “fingerprint method” looks to me like the “methods” used to arrive at the infamous hockey stick. If you introduce a clearly “new statistical technique”, if that’s what it is, you would want to have a reputable statistician have a look at it (and any publisher worth his salt would insist on that).
If I want to find out if two distributions are statistically significantly different, I use the good old non-parametric Kolmogorov-Smirnov test. I’m sure I know what comes out of that if applied to these data.

Merovign

It is, frankly, difficult for the layman to understand how so many errors can be made in a row, accidentally, by professionals.
I can only describe it as a statistical horror show. It’s actually kind of depressing.
Yes, seeing the peer review comments would be interesting. I doubt they will be forthcoming, however. Science as an open process seems to be anachronistic.

Christopher Hanley

They seem to be sidestepping an essential link in their chain of reasoning viz. the global mean temperature.

R.S.Brown

Willis,
In your top paragraph you use the phrase,

“extreme rainfall events”

to describe what MZZH11 is looking at, drawn from the HADEX data set from 1950 up to and including 1999. However, the authors of the study use the more inclusive phrase,

“heavy precipitation events”

to describe the target of their investigation.
Going through the commentaries, the press releases, and particularly the Supplementary Information at:
http://www.nature.com/nature/journal/v470/n7334/extref/nature09763-s1.pdf
I don’t see any differentiation between rainfall and snowfall as forms of precipitation. Rainfall is immediate in its “wetting” and can be measured in real time via rain gauges. Snowfall, on the other hand, can be reduced to its equivalent in rainfall at the time it is precipitated, but the “wetting” effects on the environment don’t occur until the actual snow melts.
The rain that falls on Tuesday is credited to Tuesday. The snow that falls on Wednesday may be credited to Wednesday, but may not melt until Saturday, or two Saturdays later.
2.2 inches of rain over a weekend is a good, strong, but not all that heavy a rain event in many U.S. locations. The 22-inch dry snow equivalent occurring in most U.S. locations generally is considered a heavy precipitation event.
My point is, this study may be “off” a bit on the basic assumption of what’s way down inside the HADEX data set.

Puckster

“Being a generally suspicious type fellow, I wondered about their claim that changes in rainfall extremes could be calculating by assuming they follow the same distribution used for temperature changes.”
Before submitting this to Nature……….maybe change “calculating” to “calculated”?
……..I’m just saying…….just a little proof reading.
Anyway, just how “overlooked” can submissions get. Anthony, doubtless your submission will live up to the requisite standards.

John Marshall

Looks like a fiddle to me. You could compare extreme rainfall to shoe size, instead of model output and get the same result.
Models are not science!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

2kevin

…and in this corner, fighting for humanity and sanity as a whole: Willis ‘The Eviscerator’ Eschenbach!

kadaka (KD Knoebel)

Editorial Annoyances:
STEP ONE: […] I haven’t done an analysis of how much, but a quick look revealed a dozen station…
stations
In addition, extreme rainfall data is much harder to quality control than mean temperature data. For example, it doesn’t ever happen that the January temperature at station averages 40 degrees every January but one, when it averages 140 degrees.
Needs clarity and/or correction. Delete the “at station” and change to “the January temperatures average”?
Being a generally suspicious type fellow, I wondered about their claim that changes in rainfall extremes could be calculating by assuming they follow the same distribution used for temperature changes.
Should be calculated
If I decided to use their method, I would use a Zipf distribution rather than a GEV. The Ziph distribution is found in a wide range of this type of natural phenomena. One advantage of the Zipf distribution is that it only has one parameter, sigma.
H or F? (Pronunciation-based error, “pf” in German can be the same as “ph”, ex: Pfizer)
Otherwise, good reading, good analysis.
I shall now wait for the regulars of Tami’s Troupe to complain “That’s just the US which is less than 2% of the land area,” “If you actually knew statistics you’d see they are perfectly justified…” Etc, ad nauseam.

Mac

Poor data, bad methodologies, flawed science, misleading headlines.
It would seem that is all you need do to publish in Nature.

Anoneumouse

In Climate Science, to observe something, you have to create it. Now this sounds scarily close to bullshit. But if it is bullshit, then at least it’s bullshit with equations.
Where A= a quantity and B= a quantity and C=a quantity and BS =Bullshit
here is the formula that defines bullshit:
A+B+C = D
A+B+C+BS=BS
and BS is not equal to D
The significance of this formula is that even when you solve the variables A, B and C, once you add BS to it your answer is also BS. Simply adding the bullshit factor completely destroys your ability to solve the equation that would otherwise be represented by the value D.

fFreddy

Seems a lot like Garbage in, Garbage Out.

No, to quote a commenter elsewhere, in climatology it is “Garbage in, Gospel Out”

Richard Telford

only ~ 5% of the US rainfall stations show a significant trend in extreme rainfall. The rest of the nation is not doing anything.
——————
This is a very weak analysis, of the type beloved by climate “sceptics”.
If time is a weak predictor of extreme rainfall, then only a few individual stations will have a statistically significant trend, perhaps a few more than expected from the Type I error rate. But there may still be a highly significant relationship taking the data en masse.
Climate “sceptics” like this, because they can pick a record and show that there is no *statistically significant* change, ignoring the aggregate data which may show a highly statistically significant change.
Here is some R code that illustrates this:
set.seed(123)
x=rnorm(100)
y=as.data.frame(replicate(1000,rnorm(100,x*.05)))#1000 replicate y variables with weak relationship to x
plot(x,y[[1]])#plot of first y replicate against x
sum(sapply(y,function(Y)cor.test(x,Y)$p.value)<.05)#number of replicates with a statistically significant trend == 68
cor.test(rowMeans(y),x)$p.value # p-value of the aggregate data set highly statistically significant

Don V

Willis, starting on page 424 of the Allen and Tett paper they discuss “Consistency checks to detect model inadequacy”. They even give a “simple test of the null hypothesis” (I suck at statistics and it didn’t look that simple to me). Were you able to determine if MZZH11 ever conducted this “simple test” for using optimal fingerprinting, to prove that modeled rainfall extremes retrospectively correlated with the actual data and therefore could be used to prognosticate? I kept falling asleep at this late hour while wading through the paper . . . z z z but I couldn’t find any such tests.

Steeptown

Are the authors all like Steig, i.e. not statistical experts? Were statisticians involved at all, including in the peer review process?

Brian H

Heh. “Not even wrong” seems like a suitable ‘peer review’ comment.

An excellent analysis, as always, by Mr Eschenbach. Reading the explanation of the authors’ ‘curious transformation’ of the data, I thought I was reading something written by Michael Mann!!
Malaga View says it beautifully with a quotation from Shakespeare. The US navy during the last war said it no less elegantly – ‘Bullshit baffles brains’.

“Thinking that the actual extreme values might correlate better than the annual change in the extreme values, I plotted that as well … it is almost indistinguishable from Figure 4. Either way, there is only a very short-range (less than 40 km) relation between distance and correlation for the RX1day data.”
I had to laugh when I got to your figure 4 Willis. A comprehensive FAIL analysis, thank you.
This reminds me of the tropospheric hotspot paper (who was it?) which splurged red all over the graph, by including zero among the numbers to be painted red.

On Katrina, here’s a clue: don’t build a city below sea level.

Charlie A says:
February 20, 2011 at 11:28 pm
Have you offered to Nature….
Why did you pick Nature in particular? Is it because of the controversy in some of the global warming papers publicized by them? Did you choose Nature because of their global warming bias?

Roger Longstaff

A very impressive analysis. The cumulative application of approximations that you describe surely leads to a meaningless result.
I also would be interested in your thoughts on the other Nature paper, where a SINGLE extreme rainfall event, in a single (“cherry picked”?) region of England, 11 years ago (also “cherry picked”?), was “shown” to have been 50% more likely due to anthropogenic carbon dioxide. Massive parallel processing was required to run the models – presumably a “Monte Carlo” simulation?
This was widely reported by the UK MSM as further proof of harmful AGW, and by implication to justify the terrible cost that we are paying to mitigate against its effects. Somewhere else on this blog (in a different thread) someone published the raw rainfall data for England, which, to the untrained eye, showed no discernible pattern for the last hundred years.
All of this sent my BS detector off the scale, but the average response down at the pub was – if all of these clever people used every computer in the world to prove this, it must be right. And Nature used to be a fine publication!

Cementafriend

Willis, I have not checked the actual statistical distribution of rainfall, but I can say absolutely that it is not a Gaussian normal distribution. Take, for example, a station near where I live (now closed, but I have been measuring daily rainfall for a couple of years and have infilled a few missing monthly records to make a complete record back to 1893). The rain is seasonal, with the three summer months getting about five times as much as the three winter months, while the other months vary between the high and low months. The average in one of the wet months is about 230mm, and the individual months (117 years) range from 5mm to 1380mm (std dev 205mm). For one of the drier months the average is 59mm and the range is from 0mm to 158mm (std dev 52mm). Anyone can see that this is a skewed distribution, and both average and SD do not make much sense. My look at this rainfall data, including a check of some daily rainfalls (max of 665mm on one day in 1898), shows that there has been no increase in daily or monthly maximum rainfall, but possibly a very small increasing trend in the annual total of less than 5mm in about 1800mm total over the 117 years; the record length is not sufficient to really judge.

Guam

@ Michael
Yes, that is EXACTLY what was being claimed. I saw the interview on Sky News regarding this University of Oxford claim, and (whilst there were caveats) it was quite clear they claimed a risk increase by a factor of 2, due to AGW, for the 2000 events in the UK.

I will concentrate on UK data available since ~1750:
Monthly precipitation for England&Wales
http://climexp.knmi.nl/data/pHadEWP_monthly_qc.png
Eyeballing it, nothing special but some statistic tools should be applied.
Annual precipitation for England&Wales
http://climexp.knmi.nl/data/pHadEWP_monthly_qc_mean1.png
Some 60-year-long wave pattern has emerged; now I remind you that those scientists concentrated on the post-1950 trend and “after dozens of climate runs they found out, that only with programmed GH forcing they get the rainfall increase”. Those scientists of course cherry-picked the rising rainfall trend after 1950. Hey, run that junk model of yours since 1750, and let’s see whether the wave pattern shows up or not! [lotsa profanities self-snipped]
Of course, maybe the annual totals are not much changing, but the precip is spread more unevenly during the year? Let’s calculate annual standard deviation of monthly values!
http://climexp.knmi.nl/data/pHadEWP_monthly_qc_sd1a.png
No bloody increase in precipitation spread, or instability, or irregularities, nothing.. [more self-snippage]
Of course, maybe the rise in extremes is hidden in the daily data? Let’s see annual standard deviations of daily data:
England&Wales, only since 1930
http://climexp.knmi.nl/data/pHadEWP_daily_qc_sd1a.png
Year 2000 was rather remarkable and there is some trend;
Scotland: no trend
http://climexp.knmi.nl/data/pHadSP_daily_qc_sd1a.png
N. Ireland: trend to more stability
http://climexp.knmi.nl/daily2longer.cgi
SE England: no obvious trend
http://climexp.knmi.nl/data/pHadSEEP_daily_qc_sd1a.png

danbo

Figure 1. Extreme 1-day rainfall. New Orleans, Katrina. Photo Source
I’m not sure where this came from. But see that large body of water at the top of the source photo? http://www.cces.ethz.ch/projects/hazri/EXTREMES/img/KatrinaNewOrleansFlooded.jpg?hires It’s called Lake Pontchartrain. It’s the second largest inland body of saltwater in the US. That’s where the water came from. Although it rained, this was a tidal event, not a rain event.

Viv Evans

Thank you, Willis, for this entertaining ‘look-under-the-hood’. It is like showing that there is only a lawn mower motor under the hood of the latest, hugely lauded model of a Ferrari – and it is stuttering as well.
One wonders who did the pal review on this paper …

fenbeagle

Looks like a heavy flood to me. Meanwhile, wading through the mire on the Lincolnshire Fens…
http://fenbeagleblog.wordpress.com/