Guest Post by Willis Eschenbach
Recently, Nature Magazine published a paywalled paper called “Human contribution to more-intense precipitation extremes” by Seung-Ki Min, Xuebin Zhang, Francis W. Zwiers & Gabriele C. Hegerl (hereinafter MZZH11). The Supplementary Information is available here. The study makes a very strong claim to have shown that CO2 and other greenhouse gases are responsible for increasing extreme rainfall events, viz:
Here we show that human-induced increases in greenhouse gases have contributed to the observed intensification of heavy precipitation events found over approximately two-thirds of data-covered parts of Northern Hemisphere land areas.
Figure 1. Extreme 1-day rainfall. New Orleans, Katrina. Photo Source
Their analysis uses two rainfall indices, called RX1day and RX5day, which give the maximum one-day and maximum five-day precipitation at a given station in a given month. These individual station datasets (available here, free registration required) have been combined into a gridded dataset called HADEX (Hadley Climate Extremes Dataset). It is this gridded dataset that was used in the MZZH11 study.
So what’s wrong with the study? Just about everything. Let me peel the layers off it for you, one by one.
Other people have commented on a variety of problems with the study, including Roger Pielke Jr., Andy Revkin, and Judith Curry. But to begin with, I didn’t read them; I did what I always do. I went for the facts. I thrive on facts. I went to get the original data. For me, this is not the HADEX data, as that data has already been gridded. I went to the actual underlying data used to create the HADEX dataset, as cited above. Since they don’t provide a single datablock file with all of the areas (grrrr … pet peeve), I started by looking at the USA data.
And as is my habit, the first thing I did was simply to look at the individual records. There are 2,661 stations in the USA database, of which some 731 contain RX1day maximum one-day rainfall data. However, as is usual with weather records of all kinds, many of these have missing data. In addition, only 9% of the stations show a significant trend at the 95% confidence level. Since with a 95% confidence interval (CI) we would expect 5% of the stations to exceed that threshold in any random dataset, we’re only slightly above what would be expected in a random dataset. In addition, the number of stations available varies over time.
Now, let me repeat part of that, because it is important.
91% of the rainfall stations in the US do not show a significant trend in precipitation extremes, either up or down.
So overwhelmingly in the US there has been
No significant change in the extreme rainfall.
And as if that wasn’t enough …
Of the remaining 9% that have significant trends, about five percentage points’ worth are probably pure random variation.
So this means that
Only about 5% of the stations in the US show any significant change in rainfall extremes.
So when you see claims about changes in US precipitation extremes, bear in mind that they are talking about a situation where only ~ 5% of the US rainfall stations show a significant trend in extreme rainfall. The rest of the nation is not doing anything.
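For anyone who wants to check that kind of count for themselves, here is a minimal sketch (in Python) of the per-station trend test described above. The station data here is simulated stand-in data with no trend at all; with the real ETCCDI station files you would read each station’s annual RX1day series instead. The point it illustrates is simply that pure chance hands you roughly 5% “significant” stations.

```python
# Minimal sketch of the per-station trend test (simulated data, not the real stations)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_stations, n_years = 731, 49          # roughly the numbers discussed in the post
years = np.arange(n_years)

significant = 0
for _ in range(n_stations):
    # stand-in for one station's annual maximum 1-day rainfall (mm), with no real trend
    rx1day = rng.gumbel(loc=50, scale=15, size=n_years)
    fit = stats.linregress(years, rx1day)
    if fit.pvalue < 0.05:               # "significant" at the 95% level
        significant += 1

print(f"{significant / n_stations:.1%} of trend-free stations come up 'significant'")
print("expected from chance alone: ~5%")
```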
Now, having seen that, let’s compare that to the results shown in the study:
Figure 2. The main figure of the MZZH11 study, along with the original caption. This claims to show that the odds of extreme events have increased in the US.
Hmmmm … so how did they get that result, when the trends of extreme precipitation at the individual stations show that some 95% of the stations aren’t doing anything out of the ordinary? Let me go over the stages step by step as they are laid out in the study. Then I’ll return to discuss the implications of each step.
1. The HADEX folks start with the individual records. Then, using a complex formula based on the distance and the angle from the center of the enclosing gridcell, they take a weighted station average of each month’s extreme 1-day rain values from all stations inside the gridcell. This converts the raw station data into the HADEX gridded station data.
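As an illustration of this kind of gridding — and only an illustration, since the real HADEX angular distance weighting also includes an angular-isolation term and a correlation-derived decay length — here is a simplified distance-weighted station average. Every number in it is invented.

```python
# Simplified distance-weighted averaging of stations into a gridcell value
# (the actual HADEX ADW scheme is more elaborate; see the quote from the HADEX paper below)
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def distance_weighted_mean(cell_lat, cell_lon, st_lat, st_lon, st_val, decay_km=500.0):
    """Weight each station by exp(-distance / decay_km) from the gridcell centre."""
    d = haversine_km(cell_lat, cell_lon, np.asarray(st_lat), np.asarray(st_lon))
    w = np.exp(-d / decay_km)
    return np.sum(w * np.asarray(st_val)) / np.sum(w)

# three hypothetical stations reporting one month's RX1day (mm)
print(distance_weighted_mean(37.5, -96.25,
                             st_lat=[37.1, 38.0, 36.8],
                             st_lon=[-96.0, -97.1, -95.5],
                             st_val=[41.0, 55.0, 38.0]))
```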
2. Then in this study they convert each HADEX gridcell time series to a “Probability-Based Index” (PI) as follows:
Observed and simulated annual extremes are converted to PI by fitting a separate generalized extreme value (GEV) distribution to each 49-year time series of annual extremes and replacing values with their corresponding percentiles on the fitted distribution. Model PI values are interpolated onto the HadEX grid to facilitate comparison with observations (see Methods Summary and Supplementary Information for details).
In other words, they separately fit a generalized three-parameter probability function to each gridcell time series, to get a probability distribution. The fitting is done iteratively, by repeatedly adjusting each parameter to find the best fit. Then they replace each extreme rainfall value (in millimetres per day) with the corresponding probability distribution value, which is between zero and one.
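For concreteness, here is a minimal sketch of that conversion as I read their description, using the GEV implementation in scipy on a simulated gridcell series. This is not the authors’ code, just an illustration of the transformation.

```python
# Sketch of the "probability-based index" (PI) transformation: fit a GEV to a
# gridcell's 49 annual extremes, then replace each value with its percentile
# on the fitted distribution. The series below is simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
annual_rx1day = rng.gumbel(loc=60, scale=20, size=49)   # stand-in gridcell series (mm)

# scipy's genextreme fits the three GEV parameters (shape, location, scale) iteratively
shape, loc, scale = stats.genextreme.fit(annual_rx1day)

# PI: each year's extreme expressed as its percentile on the fitted GEV (0 to 1)
pi = stats.genextreme.cdf(annual_rx1day, shape, loc=loc, scale=scale)
print(pi.round(2))
```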
They explain this curious transformation as follows:
Owing to the high spatial variability of precipitation and the sparseness of the observing network in many regions, estimates of area means of extreme precipitation may be uncertain; for example, for regions where the distribution of individual stations does not adequately sample the spatial variability of extreme values across the region. In order to reduce the effects of this source of uncertainty on area means, and to improve representativeness and inter-comparability, we standardized values at each grid-point before estimating large area averages by mapping extreme precipitation amounts onto a zero-to-one scale. The resulting ‘probability-based index’ (PI) equalizes the weighting given to grid-points in different locations and climatic regions in large area averages and facilitates comparison between observations and model simulations.
Hmmm … moving right along …
3. Next, they average the individual gridcells into “Northern Hemisphere”, “Northern Tropics”, etc.
4. Then the results from the models are obtained. Of course, models don’t have point observations, they already have gridcell averages. However, the model gridcells are not the same as the HADEX gridcells. So the model values have to be area-averaged onto the HADEX gridcells, and then the models averaged together.
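The paper does not say exactly how the model values are interpolated onto the HADEX grid, so take the following purely as a sketch of what this kind of regridding involves (the bilinear scheme and both grids below are my own invention). The relevant point is that every target gridbox inherits its own interpolation error.

```python
# Sketch of regridding a model field (e.g. model PI) onto a different grid
import numpy as np
from scipy.interpolate import RegularGridInterpolator

rng = np.random.default_rng(2)

# hypothetical coarse model grid with some field on it
model_lat = np.arange(-87.5, 90.0, 5.0)
model_lon = np.arange(-177.5, 180.0, 5.0)
model_field = rng.random((model_lat.size, model_lon.size))

interp = RegularGridInterpolator((model_lat, model_lon), model_field,
                                 bounds_error=False, fill_value=None)

# hypothetical finer "observation-style" target grid
target_lat = np.arange(-86.25, 90.0, 3.75)
target_lon = np.arange(-178.125, 180.0, 3.75)
glat, glon = np.meshgrid(target_lat, target_lon, indexing="ij")
regridded = interp(np.column_stack([glat.ravel(), glon.ravel()])).reshape(glat.shape)

print(model_field.shape, "->", regridded.shape)
```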
5. Finally, they use a technique optimistically called “optimal fingerprinting”. As near as I can tell this method is unique to climate science. Here’s their description:
In this method, observed patterns are regressed onto multi-model simulated responses to external forcing (fingerprint patterns). The resulting best estimates and uncertainty ranges of the regression coefficients (or scaling factors) are analysed to determine whether the fingerprints are present in the observations. For detection, the estimated scaling factors should be positive and uncertainty ranges should exclude zero. If the uncertainty ranges also include unity, the model patterns are considered to be consistent with observations.
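Stripped of the noise-covariance machinery (the published detection method uses generalized and total least squares, with covariances estimated from model control runs), the decision rule they describe comes down to something like the following ordinary-least-squares sketch. The fingerprint and the “observations” here are invented series; the point is only the logic of the scaling factor and its confidence interval.

```python
# OLS caricature of the fingerprint decision rule: regress observations on the
# model fingerprint, then check whether the scaling factor's CI excludes zero
# (detection) and includes one (consistency with the models). Invented data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
fingerprint = np.linspace(-0.02, 0.02, 49)               # invented model-mean pattern
observed = 0.8 * fingerprint + rng.normal(0, 0.01, 49)   # invented "observations"

fit = stats.linregress(fingerprint, observed)
half_width = 1.96 * fit.stderr                            # approximate 95% interval
lo, hi = fit.slope - half_width, fit.slope + half_width

print(f"scaling factor = {fit.slope:.2f}  (95% CI {lo:.2f} to {hi:.2f})")
print("'detection' claimed:     ", lo > 0)
print("'consistent with models':", lo <= 1.0 <= hi)
```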
In other words, the “optimal fingerprint” method looks at the two distributions H0 and H1 (observational data and model results) and sees how far the distributions overlap. Here’s a graphical view of the process, from Bell, one of the developers of the technique.
Figure 2a. A graphical view of the “optimal fingerprint” technique.
As you can see, if the distributions are anything other than Gaussian (bell shaped), the method gives incorrect results. Or as Bell says (op. cit.), the optimal fingerprint method involves several crucial assumptions, viz:
• It assumes the probability distribution of the model dataset and the actual dataset are Gaussian
• It assumes the probability distributions of the model dataset and the actual dataset have approximately the same width
While it is possible that the extreme rainfall datasets fit these criteria, until we are shown that they do fit them we don’t know if the analysis is valid. However, it seems extremely doubtful that the hemispheric averages of the probability based indexes will be normal. The MZZH11 folks haven’t thought through all of the consequences of their actions. They have fitted an extreme value distribution to standardize the gridcell time series.
This wouldn’t matter a bit, if they hadn’t then tried to use optimal fingerprinting. The problem is that the average of a PI of a number of extreme value distributions will be an extreme value distribution, not a Gaussian distribution. As you can see in Figure 2a above, for the “optimal fingerprint” method to work, the distributions have to be Gaussian. It’s not as though the method will work with other distributions but just give poorer results. Unless the data is Gaussian, the “optimal fingerprint” method is worse than useless … it is actively misleading.
It also seems doubtful that the two datasets have the same width. While I do not have access to their model dataset, you can see from Figure 2 that the distribution of the observations is wider, both regarding increases and decreases, than the distribution of the model results.
This seems extremely likely to disqualify the use of optimal fingerprinting in this particular case even by their own criteria. In either case, they need to show that the “optimal fingerprint” model is actually appropriate for this study. Or in the words of Bell, the normal distribution “should be verified for the particular choice of variables”. If they have done so there is no indication of that in the study.
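The checks I am asking for are not hard to run, once you have the actual area-averaged series in hand. Here is a sketch of the two tests, applied to simulated stand-ins for the observed and modelled area-mean PI series (the real test would of course have to use the MZZH11 series themselves):

```python
# Sketch of checking Bell's two assumptions on the series that go into the
# fingerprint regression: (1) near-Gaussian? (2) comparable spread? Simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
obs_series = rng.gumbel(loc=0.5, scale=0.12, size=49)    # stand-in observed area-mean PI
model_series = rng.normal(loc=0.5, scale=0.05, size=49)  # stand-in model area-mean PI

# (1) normality: D'Agostino-Pearson test; a small p-value argues against Gaussianity
for name, series in (("observations", obs_series), ("models", model_series)):
    _, p = stats.normaltest(series)
    print(f"{name}: normality test p-value = {p:.3f}")

# (2) comparable width: ratio of standard deviations
ratio = obs_series.std(ddof=1) / model_series.std(ddof=1)
print(f"width ratio (obs / model) = {ratio:.2f}")
```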
I think that whole concept of using a selected group of GCMs for “optimal fingerprinting” is very shaky. While I have seen theoretical justifications for the procedure, I have not seen any indication that it has been tested against real data (not used on real data, but tested against a selected set of real data where the answer is known). The models are tuned to match the past. Because of that, if you remove any of the forcings, it’s almost a certainty that the model will not perform as well … duh, it’s a tuned model. And without knowing how or why the models are chosen, how can they say their results are solid?
OK, I said above that I would first describe the steps of their analysis. Those are the steps. Now let’s look at the implications of each step individually.
STEP ONE: We start with what underlies the very first step, which is the data. I didn’t have to look far to find that the data used to make the HADEX gridded dataset contains some really ugly errors. One station shows 48 years of August rains with a one-day maximum of 25 to 50 mm (one to two inches), and then has one August (1983) with one day when it is claimed to have rained 1016 mm (40 inches) … color me crazy, but I think that once again, as we have seen time after time, the most basic steps have been skipped. Quality doesn’t seem to be getting controlled. So … we have an unknown amount of uncertainty in the data simply due to bad individual data points. I haven’t done an analysis of how much, but a quick look revealed a dozen stations with errors that egregious among the 731 US datasets … no telling about the rest of the world.
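A sanity check that would catch that sort of error is trivial to write. Here is a minimal sketch; the factor-of-ten threshold is my own arbitrary choice, purely for illustration.

```python
# Minimal quality-control sketch: flag monthly maxima wildly out of line with
# the rest of that station's record for the same month
import numpy as np

def flag_suspect_maxima(monthly_maxima, factor=10.0):
    """Return indices of values more than `factor` times the station's median
    non-zero monthly maximum (e.g. 1016 mm against a typical 25-50 mm)."""
    vals = np.asarray(monthly_maxima, dtype=float)
    typical = np.median(vals[vals > 0])
    return np.where(vals > factor * typical)[0]

august_maxima = [32, 41, 28, 37, 1016, 44, 30]   # invented, mimicking the error above
print(flag_suspect_maxima(august_maxima))        # -> [4]
```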
The next data issue is “inhomogeneities” (sudden changes in volume or variability) in the data. In a Finnish study, 70% of the rainfall stations had inhomogeneities. While there are various mathematical methods used by the HADEX folks to “correct” for this, the correction introduces additional uncertainty into the data. I think it would be preferable to split the data at the point of the inhomogeneous change, and analyze each part as a separate station. Either way, we have an uncertainty of at least the difference between the results of the two methods. In addition, that study found that on average the inhomogeneities tended to increase the apparent rainfall over time, introducing a spurious trend into the data.
In addition, extreme rainfall data is much harder to quality control than mean temperature data. For example, it doesn’t ever happen that the January temperature at a given station averages 40 degrees every January but one, when it averages 140 degrees. But extreme daily rainfall could easily change from 40 mm one January to an unusual rain of 140 mm. This makes for very difficult judgements as to whether a large daily reading is erroneous.
In addition, an extreme value is one single value, so if that value is incorrectly large it is not averaged out by valid data. It carries through, and is wrong for the day, the month, the year, and the decade.
Rainfall extreme data also suffers in the recording itself. If I have a weather station and I go away for the weekend, my maximum thermometer will still record the maximum temperature of the two days I missed. But the rain gauge can only give me the total for the two days I missed … which I might split evenly between the two days, or record as a single day’s rain with no rain on the other day. Either way … uncertainties.
Finally, up to somewhere around the seventies, the old rain gauges were not self-emptying. This means that if the gauge was not manually emptied, it could not record an extreme rain. All of these problems with the collection of the extreme rainfall data mean it is inherently less accurate than either mean or extreme temperature data.
So much for the uncertainties in the data itself. Next we come to the first actual mathematical step, the averaging of the station data to make the HADEX gridcells. HADEX, curiously, uses the averaging method rejected by the MZZH11 folks: it averages the actual rainfall extreme values, and does not create a probability-based index (PI) as in the MZZH11 study. I can make a cogent argument for either one, PI or raw data, as the quantity to average. But taking a PI-based average on top of a raw-data average seems like an odd choice, one which introduces unknown uncertainties. But I’m getting ahead of myself. Let me return to the gridding of the HADEX data.
Another problem increasing the uncertainty of the gridding is the extreme spatial and temporal variability of rainfall data. They are not well correlated, and as the underlying study for HADEX says (emphasis mine):
[56] The angular distance weighting (ADW) method of calculating grid point values from station data requires knowledge of the spatial correlation structure of the station data, i.e., a function that relates the magnitude of correlation to the distance between the stations. To obtain this we correlate time series for each station pairing within defined latitude bands and then average the correlations falling within each 100 km bin. To optimize computation only pairs of stations within 2000 km of each other are considered. We assume that at zero distance the correlation function is equal to one. This may not necessarily be the best assumption for the precipitation indices because of their noisy nature but it does provide a good compromise to give better gridded coverage.
Like most AGW claims, this seems reasonable on the surface. It means that stations closer to the gridbox center get weighted more than distant stations. It is based on the early observation by Hansen and Lebedeff in 1987 that year-to-year temperature changes were well correlated between nearby stations, and that correlation fell off with distance. In other words, if this year is hotter than last year in my town, it’s likely hotter than last year in a town 100 km. away. Here is their figure showing that relationship:
Figure 3. Correlation versus Inter-station Distance. Original caption says “Correlation coefficients between annual mean temperature changes for pairs of randomly selected stations having at least 50 common years in their records.”
Note that at close distances there is good correlation between annual temperature changes, and that at the latitude of the US (mostly the bottom graph in Figure 3) the correlation is greater than 50% out to around 1200 kilometres.
Being a generally suspicious type of fellow, I wondered about their claim that extreme rainfall can be gridded by assuming it follows the same correlation-with-distance relationship as temperature changes. So I calculated the actual relationship between correlation and inter-station distance for the annual change in maximum one-day rainfall. Figure 4 shows that result. It is very different from temperature data, where there is good correlation between nearby stations that drops off slowly with increasing distance. Extreme rainfall does not follow that pattern in the slightest.
Figure 4. Correlation of annual change in 1-day maximum rainfall versus the distance between the stations. Scatterplot shows all station pairs between all 340 mainland US stations which have at least 40 years of data per station. Red line is a 501 point Gaussian average of the data.
As you can see, there is only a slight relationship between extreme rainfall correlation and inter-station distance, and only at small distances. There is an increase in correlation with decreasing distance, as we saw with temperature, but it drops to zero very quickly. In addition, there are a significant number of negative correlations at all distances. In the temperature data shown in Figure 3, the decorrelation distance (the distance at which the average correlation drops to 0.50) is on the order of 1200 km. The corresponding decorrelation distance for one-day extreme precipitation is only 40 km …
Thinking that the actual extreme values might correlate better than the annual change in the extreme values, I plotted that as well … it is almost indistinguishable from Figure 4. Either way, there is only a very short-range (less than 40 km) relation between distance and correlation for the RX1day data.
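For anyone who wants to reproduce the Figure 4 calculation, the procedure is simple: correlate the year-to-year change in RX1day for every pair of stations, and bin the correlations by great-circle distance. Here is a sketch with randomly generated stations standing in for the ~340 long-record mainland US stations.

```python
# Sketch of the correlation-versus-distance calculation behind Figure 4
# (randomly generated stations; with the real data, loop over the station files)
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

rng = np.random.default_rng(5)
n_st, n_yr = 60, 40
lats = rng.uniform(30, 48, n_st)
lons = rng.uniform(-120, -75, n_st)
rx1day = rng.gumbel(50, 15, size=(n_st, n_yr))
delta = np.diff(rx1day, axis=1)                  # annual change in the 1-day extreme

dists, corrs = [], []
for i in range(n_st):
    for j in range(i + 1, n_st):
        dists.append(haversine_km(lats[i], lons[i], lats[j], lons[j]))
        corrs.append(np.corrcoef(delta[i], delta[j])[0, 1])
dists, corrs = np.array(dists), np.array(corrs)

# average correlation in 100 km bins, as in the HADEX description
bins = np.arange(0, 2100, 100)
which = np.digitize(dists, bins)
for k in range(1, 6):                            # print the first few bins
    sel = which == k
    if sel.any():
        print(f"{int(bins[k-1]):4d}-{int(bins[k]):4d} km: mean r = {corrs[sel].mean():+.2f}")
```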
In summary, the angular-distance-weighted averaging method used for gridding temperature records is supported by the Hansen/Lebedeff temperature data in Figure 3. The observations of extreme rainfall events in Figure 4, on the other hand, mean that we cannot use the same method for gridding extreme rainfall data. It makes no sense, and reduces accuracy, to average data weighted by distance when the correlation varies with nothing but the shortest distances, and when the standard deviation of the correlation is so large at all distances.
STEP 2: Next, they fit a generalized extreme value (GEV) probability distribution to each individual gridcell. I object very strongly to this procedure. The GEV distribution has three different parameters. Depending on how you set the three GEV dials, it will give you distributions ranging from a normal to an exponential to a Weibull distribution. Setting the dials differently for each gridcell introduces an astronomical amount of uncertainty into the results. If one gridcell is treated as a normal distribution, and the next gridcell is treated as an exponential distribution, how on earth are we supposed to compare them? I would throw out the paper based on this one problem alone.
If I decided to use their method, I would use a Zipf distribution rather than a GEV. The Zipf distribution is found in a wide range of this type of natural phenomena. One advantage of the Zipf distribution is that it only has one parameter, sigma. Well, two, but one is the size of the dataset N. Keeps you from overfitting. In addition, the idea of fitting a probability distribution to the angular-distance weighted average of raw extreme event data is … well … nuts. If you’re going to use a PI, you need to use it on the individual station records, not on some arbitrary average somewhere down the line.
STEP 3: Hemispheric and zonal averages. Beyond the easily calculable statistical error propagation in such averaging, each individual gridpoint also carries its own individual error. I don’t see any indication that they have dealt with this source of uncertainty.
STEP 4: Each model needs to have its results converted from the model grid to the HADEX grid. This, of course, gives a different amount of uncertainty to each of the HADEX gridboxes for each of the models. In addition, this uncertainty is different from the uncertainty of the corresponding observational gridbox …
There are some other model issues. The most important one is that they have not given any ex-ante criteria for selecting the models used. There are 24 models in the CMIP database that they could have used. Why did they pick those particular models? Why not divide the 24 models into 3 groups of 8 and see what difference it makes? How much uncertainty is introduced here? We don’t know … but it may be substantial.
STEP 5: Here we have the question of the uncertainties in the optimal fingerprinting. These uncertainties are said to have been established by Monte Carlo procedures … which makes me nervous. The generation of proper data for a Monte Carlo analysis is a very subtle and sophisticated art. As a result, the unsupported claim of a Monte Carlo analysis doesn’t mean much to me without a careful analysis of their “random” proxy data.
More importantly, the data does not appear to be suitable for “optimal fingerprinting” by their own criteria.
End result of the five steps?
While they have calculated the uncertainty of their final result and shown it in their graphs, they have not included most of the uncertainties I listed above. As a result, they have greatly underestimated the real uncertainty, and their results are highly questionable on that issue alone.
OVERALL CONCLUSIONS
1. They have neglected the uncertainties from:
• the bad individual records in the original data
• the homogenization of the original data
• the averaging into gridcells
• the incorrect assumption of increasing correlation with decreasing distance
• the use of a different fitted three-parameter probability function for each gridcell
• the use of a PI average on top of a weighted raw data average
• the use of non-Gaussian data for an “optimal fingerprint” analysis
• the conversion of the model results to the HADEX grid
• the selection of the models
As a result, we do not know if their findings are significant or not … but given the number of sources of uncertainty and the fact that their results were marginal to begin with, I would say no way. In any case, until those questions are addressed, the paper should not have been published, and the results cannot be relied upon.
2. There are a number of major issues with the paper:
• Someone needs to do some serious quality control on the data.
• The use of the HADEX RX1day dataset should be suspended until the data is fixed.
• The HADEX RX1day dataset also should not be used until gridcell averages can be properly recalculated without distance-weighting.
• The use of a subset of models which are selected without any ex-ante criteria damages the credibility of the analysis.
• If a probability-based index is going to be used, it should be used on the raw data rather than on averaged data. Using it on grid-cell averages of raw data introduces spurious uncertainties.
• If a probability-based index is going to be used, it needs to be applied uniformly across all gridcells rather than using a different distribution on a gridcell-by-gridcell basis.
• No analysis is given to justify the use of “optimal fingerprinting” with non-Gaussian data.
3. Out of the 731 US stations with rainfall data, including Alaska, Hawaii and Puerto Rico, 91% showed no significant change in the extreme rainfall events, either up or down.
4. Of the 340 mainland US stations with 40 years or more of records, 92% showed no significant change in extreme rainfall in either direction.
As a result, I maintain that their results are contrary to the station records, that they have used inappropriate methods, and that they have greatly underestimated the total uncertainties of their results. Thus the conclusions of their paper are not supported by their arguments and methods, and are contradicted by the lack of any visible trend in the overwhelming majority of the station datasets. To date, they have not established their case.
My best regards to all, please use your indoor voices in discussions …
w.
[UPDATE] I’ve put the widely-cited paper by Allen and Tett about “optimal fingerprinting” online here.
Bernie says:
February 21, 2011 at 5:48 am
I’ve run the results myself and find results similar to Hansen/Lebedeff. In addition, the HADEX paper cited above (Appendix A) recalculates the values for many of the variables, and finds similar answers.
That reference also says:
While that sounds fine, how is it that the ETCCDI data still contains crazy outliers? … this is again the problem with the lack of transparency. There should be an audit trail so that we can compare the pre-QC data with the post-QC data, or at least there should be a record of what changes were made. Without that, we simply cannot trust the HADEX data … which should sound familiar, it’s no different than the situation with other Hadley Center products.
w.
Bill Illis says:
February 21, 2011 at 7:40 am
Excellent work as usual, Bill. Richard Telford is right that looking for a trend in individual stations is a weak test, but you are showing the national averages.
In addition, teasing out some kind of very small relationship from climate data, while possible, suffers from a couple problems. The first is that the rainfall data is short, spotty, and contains a host of missing observations. In recent years, of course, the data for any given station is more complete than in early years … which will not have much of an influence on averages, but could easily affect extremes.
In addition, if the relationship is so weak that we need layers and layers of sophisticated analysis to find it in fifty years of data … then I doubt that it is big enough to make any difference at all.
On the other hand, if the difference is as big as they claim it is (an increase in extreme events of 0.3-0.5% per year over much of the US), then we should have seen a 15-25% increase in the number of extreme events in the record … but that hasn’t shown up, and despite Richard’s claims, an increase of 25% in any trend over 50 years will definitely show up in the simple trend data. But it hasn’t.
w.
Willis Eschenbach says: February 21, 2011 at 11:52 am
“About three quarters positive, one quarter negative … but remember, there are problems in the raw data, big problems, and we only have about 25 datasets with a significant trend. Given those issues, finding that distribution (one quarter/three quarters) is not surprising.”
But isn’t that like saying “I flipped a coin 5 times, and repeated the experiment 731 times. Of those experiments, 75% of the time I got more heads than tails. But given the small number of trials, it is not surprising to get more heads.”? When in fact, getting more heads 75% of the time is incredibly unlikely across 731 experiments. For 731 trials where there is no trend, getting 75% going one way would only happen about 1 time in 10^43. (If you had fewer than 731 actual stations to use, the odds of a 3:1 split would not be as extreme, but even with as few as 20 stations, getting a 15:5 split would be a statistically significant difference from 50:50.)
So what you have shown is that there is indeed a VERY statistically significant difference from the null hypothesis (“there is no trend”).
Could you provide one more statistic? How many times were there statistically significant DECREASES compared to INCREASES? You say that 9% of the time there was a trend. If there is indeed no trend, then about half of these should trend up and half should trend down. If instead close to 0% of stations show a decrease while close to 9% show an increase, then once again that is evidence of a statistically significant change.
NOTE: There is a big difference between “statistically significant” and “meteorologically significant”. The increase could well have no impact on people or nature.
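(A quick way to check the coin-flip arithmetic above is an exact binomial tail probability. The sketch below takes the 731 station count quoted in the post; whether a simple sign test is the right model for trend signs is a separate question.)

```python
# Exact binomial tail probability for "at least 75% of 731 stations going one way"
from math import ceil
from scipy import stats

n = 731
k = ceil(0.75 * n)                    # at least 75% of stations trending the same way
p = stats.binom.sf(k - 1, n, 0.5)     # P(X >= k) under a 50:50 null
print(f"P(at least {k} of {n} one way, given no trend) = {p:.2e}")
```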
Tim Folkerts says:
February 21, 2011 at 10:52 am
Thanks, Tim. You make the same point as Richard, that testing individual trends is a weak test. While you are correct in principle, that claim is often wrong in practice. See my answer to Richard here, along with my comments here.
I put all of those trend numbers in because they are important as an indication of the size of the change that we are looking at. It is a very tiny change if it exists at all. We know that because of the amount of the underlying data which contains no trend at all. This is a different result from e.g. finding a tiny trend that results from the averaging of a large number of significant trends.
In addition, it indicates that the size of their claimed effect is doubtful. They claim that the number of extreme events in much of the US has increased by 30-50% over the last 50 years … are you seriously claiming that if that happened a simple trend test would not be able to detect a 30-50% change, and that such a 30-50% change would be statistically insignificant in over 90% of the stations?
Because that’s what your claim means, and if I’m to believe it, you’ll have to do better than say “the aggregate of a lot of data sets that are not individually significant can quite easily be significant”. Yes, they can … but you need to show that in this case they are significant, not that they “can quite easily be” significant. You are saying that a 30-50% change in a dataset average is only visible in 9% of the underlying data making up the average … citation? Explanation? Supporting data?
It might be explained by very large changes in a small subset of the data … but then you’re left trying to explain why rainfall only changes in a small percentage of the data, and the rest is unchanged … while “global warming” is said to cause droughts and floods, I’ve not heard anyone claim that it will only affect less than 10% of recording stations.
Finally, my analysis of the individual trends is far from the only evidence that their analysis is deeply and seriously flawed … I don’t want anyone to come away thinking that your point would make any difference even if it were true. We still have the other huge problems with the dataset and the analysis, which your points don’t touch at all.
w.
I must admit that a lot (most?) of this analysis went right over my head! The whole affair reminds me of those join-the-dot pictures. It seems that the warmists simply add a whole lot more dots between the measuring stations that actually exist until the picture comes out like they wanted in the first place.
Personally, I view ANY processing of the raw data with a variable amount of scepticism. If there is a trend in temperature or rainfall, surely it would show in the raw data. Simply look at a good long temperature record. Is it going up or down? If up, is it a linear rise or accelerating? Or is it levelling out?
You know the old chestnut, ‘If a tree falls in the forest when no-one is there to hear it, does it still make a noise?’ (Duh! Of course it does!) Perhaps the climate equivalent is ‘If it rains exceptionally hard where no-one is there to see it, does it count? A warmist would say ‘Yes, and we have to allow for that in the models.’ A heretic would say ‘If there was no-one there to see it, how do you know it rained?’
[Question: Why do our UK cousins hyphenate “no one”? ~dbs]
I have no specific knowledge of the “fingerprint” analysis they did, but ….
1 ************************************
“The problem is that the average of a PI of a number of extreme value distributions will be an extreme value distribution, not a Gaussian distribution.”
The Central Limit Theorem states
The average of numbers drawn from ANY distribution approaches a normal (Gaussian) distribution as more data is combined, regardless of the initial distribution.
2 ************************************
“In other words, the “optimal fingerprint” method looks at the two distributions H0 and H1 (observational data and model results) and sees how far the distributions overlap. ”
You CLAIM that they must use the technique you mention, but there are plenty of legitimate ways to compare non-normal (non-Gaussian) distributions. Can you cite any evidence that the analysis in the current paper actually uses the method you suggest, or that the method would fail if they did use it (especially in light of my first point that the averaged data will indeed tend toward a Gaussian)?
Willis,
I agree that there are plenty of opportunities to critique the paper. Any time there is such substantial statistical analysis, there is lots of room for errors. And then there is lots of room for critiques of the critiques. 🙂
You state in your reply “They claim that the number of extreme events in much of the US has increased by 30-50% over the last 50 years”. This is the first I had seen of that claim, so I was not thinking about it in my response. Even so, I could argue that if a given station went from 2 extreme events per decade to 3 per decade, that would be a 50% increase in the number of extreme events. Finding a statistically significant trend in such data would be difficult unless you averaged a lot of stations to see a change from 200 to 300 per decade. But again, I don’t have the data, nor do I have the original paper, so this is simply speculation about how such a “big” 50% change could be difficult to spot.
From Tim Folkerts on February 21, 2011 at 1:09 pm:
http://localgov.nccarf.edu.au/resources/human-contribution-more-intense-precipitation-extremes
Emphasis added:
Cute site. The Aussies have a National Climate Change Adaptation Research Facility (NCCARF) to help them prepare for the Challenging Turbulent Times To Come. Who knew? I didn’t. I’m sure our Aussie brethren are all very happy to be paying for something so useful.
Great Post Willis — this is a perfect example of a great content object exercise for the classroom. Would you consider allowing us to convert some of your posts to free SCOs for educators?
Biased Brainwashing Corporation has been selling the AGW = more rain story today.
Listen or simply read the summary to understand the BBC angle.
http://www.bbc.co.uk/programmes/b00yjs49
Willis,
Shouldn’t “temperatures” be “precipitation” in the sentence below which is just after Fig. 2?
Hmmmm …. so how did they get that result, when the trends of the individual station extreme temperatures show that some 95% of the stations aren’t doing anything out of the ordinary?
OOPS,
That should be just after Fig. 1 not Fig. 2. Sorry ’bout that.
Previous post by me…
Willis,
Shouldn’t “temperatures” be “precipitation” in the sentence below which is just after Fig. 2?
Hmmmm …. so how did they get that result, when the trends of the individual station extreme temperatures show that some 95% of the stations aren’t doing anything out of the ordinary?
Tim Folkerts says:
February 21, 2011 at 12:31 pm
I’m sorry for my lack of clarity. The figures I gave were for statistically significant stations only. The numbers are slightly different for all of the stations, and for the mainland long stations. Of the 340 mainland US stations that have 40 years of data or more, there were 27 stations with statistically significant trends. Of these, 5 were negative and 22 were positive, with a mean of +3 ± 2 mm (2σ) per decade increase.
But before doing analyses on those numbers, remember that there is some very bad data in the mix, as I pointed out in the head post. This includes data that’s out by an order of magnitude or more. Until that data is fixed, you can’t make any definitive statements about anything.
And again, this is very peripheral to my main point, which was the underlying weaknesses of their analysis.
w.
Sam, agreed. The levees broke. Not just in the 9th ward. I’ve seen photos of the water splashing over the levees. One could argue that if it had not been for the tide, the levees might not have broken.
http://en.wikipedia.org/wiki/File:Hurricane_Katrina_winds_1200utc29Aug05_landfall_LA_1hr.gif
Here it’s pushing water in the lake.
http://en.wikipedia.org/wiki/File:Hurricane_Katrina_winds_1500utc29Aug05_landfall_MS.gif
Here the winds shift and push the water at the levees.
Either way. It ain’t rain.
Was the flood picture of New Orleans from the original article? If so, it is irrelevant to their argument.
The flooding in New Orleans was not from the rain. New Orleans flooded because a levee failed.
Sam. When you drive across the parish line from Jefferson. The next I-10 interchange (in the photo) as I recall isn’t part of the 9th ward I believe it’s the 4th. It also flooded around the Superdome. Pretty much in the CBD. There were many breaches and collapses.
Tim Folkerts says:
February 21, 2011 at 1:09 pm
Tim, I don’t understand this. Where have I claimed that they have used what techniques?
Again, confusion. If by “the method [I] suggest” you mean “optimal fingerprinting”, that’s what they say they use. I gave evidence that Bell says the data must be random Gaussian.

Finally, yes, I am well aware of the Central Limit Theorem. However, what you may not realize is how slowly it can apply here. For example, here is the violin plot of the averages of 10,000 pseudo-datasets, each the length of the MZZH11 data (49 years).
As you can see, despite the numbers being large (10,000 sets of 49-year pseudo-data), the distribution is nowhere near normal.
And indeed, the question is not whether averages in general are normal or not. The Law of Large Numbers is meaningless here. The question in this case is, how far from normal is the actual data used by MZZH11, and what effect does that departure have on the accuracy of their results?
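To make that concrete, here is the sort of simulation I am describing, with the pseudo-data drawn from a moderately heavy-tailed GEV (the parent distribution and its parameters are my own choices, for illustration only).

```python
# Averages of 10,000 pseudo-series of 49 heavy-tailed GEV values: how Gaussian are they?
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
c = -0.3   # scipy convention: negative shape => heavy upper (Frechet-type) tail
draws = stats.genextreme.rvs(c, loc=50, scale=15, size=(10_000, 49), random_state=rng)
means = draws.mean(axis=1)

print(f"skewness of the 10,000 means: {stats.skew(means):+.2f}  (Gaussian = 0)")
_, p = stats.normaltest(means)
print(f"normality test p-value: {p:.1e}  (small => not Gaussian)")
```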
Since they give no indication that they have even considered this question (although they may have done so), and since we cannot repeat their analysis because of their use of a different and unspecified probability distribution for each gridcell on the map, at this point their claims are both unsupported and incapable of replication.
This again highlights the need for scientists to publish their data and their code. At present, nobody on the planet can confidently replicate their results. And to make things worse, the same is true about the HADEX gridcell averages. There is no audit trail, there’s not even sufficient description to establish the decorrelation length used with the RX1day and RX5day data.
I’m not saying that their analysis is wrong because of the issue of whether the average is Gaussian, Tim. I’m saying that MZZH11 and HADEX have not given us anywhere near enough data to determine if their claims even make it above the noise level, much less if they are significant.
w.
Sorry Willis, I wasn’t sure if you grabbed it, or if it came from someone who was using it to pretend it was rain. If it was a quick grab by you, no problem.
Forgive me. But Katrina isn’t something theoretical to me. I lived in New Orleans, and was living on the Mississippi coast during the lead-up to Katrina. Had to run (from Mississippi), and return home. And I have listened to BS about Katrina and AGW till I’m sick.
No offense intended guy.
John from CA says:
February 21, 2011 at 3:06 pm
Do as you wish, as long as you point out to the students that I can be wrong just like anyone …
w.
Willis. I know the area in the photo. Granted it floods with a heavy dew. But the flood in the picture, that was tidal, or as others prefer, the levee caving in. And of course every hurricane has heavy rain. Supposedly we had a 30 ft high tide. I know where the high tide marks are. I believe it.
Tim Folkerts says:
February 21, 2011 at 1:26 pm
It was a slight simplification, and on re-reading, not at all clear. Plus it contains a typo: it should have been half of that, 15% to 25% … let’s start again.
Note the legend in the figure at the top. This is the change in the PI, the probability index, which goes from 0 to 1. They say that in much of the US the PI has increased at a rate of 0.3% to 0.5% per year, which is a change of 15% to 25% over fifty years.
In reality, as the PI increases, the occurrences become larger and larger. An extreme rainfall with a PI of 0.80 is much more than twice as large as a rainfall with a PI of 0.40. So their 15% to 25% increase in PI will be reflected in a much larger increase in 1-day maximum rainfall.
Sorry for the confusion,
w.
Look at this flood — 1927 — it makes the Hurricane Katrina flood look like a puddle. Regions underwater from Lake Pontchartrain up to Kentucky and Arkansas.
http://en.wikipedia.org/wiki/Great_Mississippi_Flood_of_1927
Good Gauss! Put that in your distribution and try to make it look normal.
[Question: Why do our UK cousins hyphenate “no one”? ~dbs]
http://www.grammar-monster.com/easily_confused/no-one_no_one.htm
Coz we is cool an up to date, innit?
Unfortunately my grammar is not as healthy as my grandpa.
Hey, excellent analysis there, good fellows. I saw the abstract for that and it looked very bogus, as in plenty of “suggests”, “could be”, and “partly caused” kinds of statements … didn’t sound convincing at all.
Your research states why in very clear terms, and I just think this attitude of jumping to conclusions and then trying to prove them via ‘science’ is really causing plenty of problems in real terms. The older-school empirical method of observing the data and then seeking to understand what it says seems to be a better way of approaching the science of CC.
Keep up the good work..
Questions:
I found a University of Victoria press release about the paper:
http://communications.uvic.ca/releases/release.php?display=release&id=1205
1. It says at the bottom:
Does that mean that Anthony Watts, publisher of the “new media” publication Watts Up With That?, can request a copy?
2. We were repeatedly assaulted with complaints that the Medieval Warming Period was not a global event, it was only in the Northern Hemisphere (when they allowed that much), doesn’t detract from Michael Mann’s hockey stick graph, etc.
Now they are looking at Northern Hemisphere data, and it is cited as showing global warming. Specifically the effects of human-induced global warming. [And don’t bother to argue the exact semantics, it is being cited as proof of (C)AGW.]
It’s not global, it’s only Northern Hemisphere. It’s Northern Hemisphere, it’s global. WUWT?