Guest Post by Willis Eschenbach
The Berkeley Earth Surface Temperature (BEST) team is making a new global climate temperature record. Hopefully this will give us a better handle on what’s going on with the temperature.
BEST has put out a list of the four goals for their mathematical methods (algorithms). I like three of those goals a lot. One I’m not so fond of. Here are their goals:
1) Make it possible to exploit relatively short (e.g. a few years) or discontinuous station records. Rather than simply excluding all short records, we prefer to design a system that allows short records to be used with a low – but non‐zero – weighting whenever it is practical to do so.
2) Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may be dependent on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.
3) Place empirical homogenization on an equal footing with other averaging. We distinguish empirical homogenization from evidence‐based homogenization. Evidence‐based adjustments to records occur when secondary data and/or metadata is used to identify problems with a record and propose adjustments. By contrast, empirical homogenization is the process of comparing a record to its neighbors to detect undocumented discontinuities and other changes. This empirical process performs a kind of averaging as local outliers are replaced with the basic behavior of the local group. Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.
4) Provide uncertainty estimates for the full time series through all steps in the process.
Using short series, avoiding gridding, and uncertainty estimates are all great goals. But the whole question of “empirical homogenization” is fraught with hidden problems and traps for the unwary.
The first of these is that nature is essentially not homogeneous. It is pied and dappled, patched and plotted. It generally doesn’t move smoothly from one state to another, it moves abruptly. It tends to favor Zipf distributions, which are about as non-normal (i.e. non-Gaussian) as a distribution can get.
So I object to the way that the problem is conceptualized. The problem is not that the data requires “homogenization”, that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries. But homogenizing the data is not the answer to that.
This is particularly true since (if I understand what they’re saying) they have already told us how they plan to deal with discontinuities. The plan, which I’ve been pushing for some time now, is to simply break the series apart at the discontinuities and treat it as two separate series. And that’s a good plan. They say:
Data split: Each unique record was broken up into fragments having no gaps longer than 1 year. Each fragment was then treated as a separate record for filtering and merging. Note however that the number of stations is based on the number of unique locations, and not the number of record fragments.
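A minimal sketch of that kind of gap-based split follows. It is not BEST's code: the record layout (a month index paired with a temperature), the function name split_at_gaps, and the reading of "gap" as the difference between consecutive month indices are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (month index since an arbitrary epoch, temperature)
Obs = Tuple[int, float]

def split_at_gaps(record: List[Obs], max_gap_months: int = 12) -> List[List[Obs]]:
    """Split a time-sorted record wherever consecutive observations
    are more than max_gap_months apart. No value is altered."""
    fragments: List[List[Obs]] = []
    current: List[Obs] = []
    for obs in record:
        # Start a new fragment when the jump from the previous observation is too long
        if current and obs[0] - current[-1][0] > max_gap_months:
            fragments.append(current)
            current = []
        current.append(obs)
    if current:
        fragments.append(current)
    return fragments

record = [(0, 10.1), (1, 10.4), (2, 9.8), (30, 11.0), (31, 11.2)]
fragments = split_at_gaps(record)
print(len(fragments))  # two fragments: months 0-2 and months 30-31
```

Each fragment keeps its original values, so the split can be undone or audited at any time.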
So why would they deal with “empirical discontinuities” by adjusting them, and deal with other discontinuities in a totally different manner?
Next, I object to the plan that they will “incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.” This will make it very difficult to back it out of the calculations to see what effect it has had. It will also hugely complicate the question of the estimation of error. For any step-wise process, it is crucial to separate the steps so the effect of each single step can be understood and evaluated.
Finally, let’s consider the nature of the “homogenization” process they propose. They describe it as a process whereby:
… local outliers are replaced with the basic behavior of the local group
There are a number of problems with that.
First, temperatures generally follow a Zipf distribution (a distribution with a large excess of extreme values). As a result, what would definitely be “extreme outliers” in a Gaussian distribution are just another day in the life in a Zipf distribution. A very unusual and uncommon temperature in a Gaussian distribution may be a fairly common and mundane temperature in a Zipf distribution. If you pull those so-called outliers out of the dataset, or replace them with a local average, you no longer have temperature data – you have Gaussian data. So you have to be real, real careful before you declare an outlier. I would certainly look at the distributions before and after “homogenization”, to see if the Zipf nature of the distribution has disappeared … and if so, I’d reconsider my algorithm.
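One way to run that before-and-after check is to compare a tail-weight statistic such as excess kurtosis. The sketch below is only an illustration under stated assumptions: the data are synthetic Laplace draws used as a stand-in for a heavy-tailed series (not real temperatures, and not a Zipf fit), and the 3-sigma replacement rule is a hypothetical stand-in for whatever outlier rule is actually applied.

```python
import random
from statistics import fmean, pstdev

def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for Gaussian data, positive for heavier tails."""
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)
    m4 = fmean((v - mu) ** 4 for v in values)
    return m4 / (m2 * m2) - 3.0

def replace_outliers_with_mean(values, n_sigma=3.0):
    """Hypothetical rule: any value beyond n_sigma standard deviations becomes the mean."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [mu if abs(v - mu) > n_sigma * sigma else v for v in values]

rng = random.Random(42)
# Assumption: Laplace draws (exponential magnitude, random sign) as a heavy-tailed stand-in
data = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(20000)]

before = excess_kurtosis(data)
after = excess_kurtosis(replace_outliers_with_mean(data))
print(f"excess kurtosis before: {before:.2f}, after: {after:.2f}")
```

If the "after" figure drops sharply toward zero, the cleaning step has changed the shape of the distribution, not just removed a few bad readings.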
Second, while there is a generally high correlation between temperature datasets out to 1200 km or so, that’s all that it is. A correlation. It is not a law. For any given station, there will often be nearby datasets that have very little correlation. In addition, for each of the highly correlated pairs, there will be a number of individual years where the variation in the two datasets is quite large. So despite high correlation, we cannot just assume that any record that disagrees with the “local group” is incorrect, as the BEST folks seem to be proposing.
Third, since nature itself is almost “anti-homogeneous”, full of abrupt changes and frequent odd occurrences and outliers, why would we want to “homogenize” a dataset at all? If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.
Fourth, although the temperature data is highly correlated out to a long distance, the same is not true of the trend. See my post on Alaskan trends regarding this question. Since the trends are not correlated, adjustment based on neighbors may well introduce a spurious trend. If the “basic behavior of the local group” is trending upwards, and the data being homogenized is trending horizontally, both may indeed be correct, and homogenization will destroy that …
Those are some of the problems with “homogenization” that I see. I’d start by naming it something else. The name does not describe what we wish to do to the data. Nature is not homogeneous, and neither should our dataset be.
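A small synthetic example shows how two records can be highly correlated and still have different trends. Everything here is assumed for illustration: the shared year-to-year signal, the noise levels, and the 0.02 degrees per year trend given to one station are made-up numbers, not measurements.

```python
import random
from statistics import fmean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    mt, my = fmean(t), fmean(y)
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    return num / sum((a - mt) ** 2 for a in t)

rng = random.Random(0)
years = list(range(50))
# Assumption: both stations share the same year-to-year variability
shared = [rng.gauss(0.0, 1.0) for _ in years]
flat = [s + rng.gauss(0.0, 0.1) for s in shared]                  # no added trend
rising = [s + 0.02 * t + rng.gauss(0.0, 0.1) for s, t in zip(shared, years)]

r = pearson(flat, rising)
trend_gap = ols_slope(years, rising) - ols_slope(years, flat)
print(f"correlation: {r:.2f}, trend difference: {trend_gap:.3f} per year")
```

Pulling the flat record toward the rising one would raise its trend even though, by construction, both records are correct.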
Then I’d use the local group, solely to locate unusual “outliers” or shifts in variance or average temperature.
But there’s no way I’d replace the putative “outliers” or shifts with the behavior of the “local group”. Why should I? If all you are doing is bringing the data in line with the average of the local group, why not just throw it out entirely and use the local average? What’s the advantage?
Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, and break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset, what’s the point of that?
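The flag-then-split approach can be sketched as follows. This is a toy under stated assumptions: the neighbor median as the "local group" value, the fixed 5-degree threshold, and both function names are hypothetical choices for illustration. The point is that the neighbors are used only to locate suspect points; no station value is ever overwritten.

```python
from statistics import median

def flag_against_neighbors(station, neighbors, threshold):
    """Return indices where the station differs from the neighbor median by more than threshold."""
    flagged = []
    for i, value in enumerate(station):
        local = median(series[i] for series in neighbors)
        if abs(value - local) > threshold:
            flagged.append(i)
    return flagged

def split_dropping(station, flagged):
    """Drop the flagged points and break the record at each one."""
    bad = set(flagged)
    fragments, current = [], []
    for i, value in enumerate(station):
        if i in bad:
            # End the current fragment; the bad point itself is discarded
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(value)
    if current:
        fragments.append(current)
    return fragments

station = [10.0, 10.2, 9.9, 25.0, 10.1, 10.3]
neighbors = [
    [10.1, 10.0, 10.0, 10.2, 10.0, 10.4],
    [9.8, 10.3, 9.7, 10.0, 10.2, 10.1],
    [10.2, 10.1, 10.1, 9.9, 9.9, 10.2],
]
flagged = flag_against_neighbors(station, neighbors, threshold=5.0)
fragments = split_dropping(station, flagged)
print(flagged, fragments)
```

The surviving fragments contain only original observations, so nothing from the neighbors leaks into the station's record.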
Lastly, a couple of issues with their quality control procedures. They say:
Local outlier filter: We tested for and flagged values that exceeded a locally determined empirical 99.9% threshold for normal climate variation in each record.
Regional filter: For each record, the 21 nearest neighbors having at least 5 years of record were located. These were used to estimate a normal pattern of seasonal climate variation. After adjusting for changes in latitude and altitude, each record was compared to its local normal pattern and 99.9% outliers were flagged.
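For concreteness, the neighbor-selection step of such a regional filter might look like the sketch below. BEST does not say how distance is measured, so the great-circle (haversine) distance, the Station fields, and the function names are assumptions; the latitude and altitude adjustment is left out because its details are not given.

```python
import math
from typing import List, NamedTuple

class Station(NamedTuple):
    # Hypothetical station description for this sketch
    name: str
    lat: float
    lon: float
    years: float  # length of record in years

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest_neighbors(target: Station, stations: List[Station],
                      k: int = 21, min_years: float = 5.0) -> List[Station]:
    """Return the k closest stations that have at least min_years of record."""
    candidates = [s for s in stations
                  if s.name != target.name and s.years >= min_years]
    candidates.sort(key=lambda s: haversine_km(target.lat, target.lon, s.lat, s.lon))
    return candidates[:k]

target = Station("T", 0.0, 0.0, 30.0)
stations = [
    Station("A", 0.0, 1.0, 10.0),
    Station("B", 0.0, 2.0, 3.0),   # too short, excluded
    Station("C", 0.0, 3.0, 20.0),
    Station("D", 0.5, 0.0, 6.0),
]
names = [s.name for s in nearest_neighbors(target, stations, k=2)]
print(names)
```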
Again, I’d be real, real cautious about these procedures. Since the value in both cases is “locally determined”, there will certainly not be a whole lot of data for analysis. Determination of the 99.9% exceedance level, based solely on a small dataset of Zipf-distributed data, will have huge error margins. Overall, what they propose seems like a procedure guaranteed to convert a Zipf dataset into a Gaussian dataset, and at that point all bets are off …
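The size of those error margins can be illustrated with a quick simulation. The assumptions are loud ones: Laplace draws again stand in for heavy-tailed data, 600 values stands in for roughly 50 years of monthly readings, and the simple order-statistic estimator is a hypothetical choice, since BEST does not specify theirs.

```python
import math
import random
from statistics import pstdev

def empirical_quantile(sample, q):
    """Simple order-statistic estimate of the q-th quantile."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[index]

def quantile_spread(n, trials, q, rng):
    """Standard deviation of the quantile estimate across repeated samples of size n."""
    estimates = []
    for _ in range(trials):
        # Assumption: Laplace draws (exponential magnitude, random sign)
        sample = [rng.expovariate(1.0) * (1.0 if rng.random() < 0.5 else -1.0)
                  for _ in range(n)]
        estimates.append(empirical_quantile(sample, q))
    return pstdev(estimates)

rng = random.Random(1)
spread_small = quantile_spread(n=600, trials=100, q=0.999, rng=rng)
spread_large = quantile_spread(n=10000, trials=100, q=0.999, rng=rng)
print(f"spread of 99.9% estimate, n=600: {spread_small:.2f}; n=10000: {spread_large:.2f}")
```

With only 600 values the 99.9% level is effectively the sample maximum, so the estimated threshold swings widely from one sample to the next.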
In addition, once the “normal pattern of seasonal climate variation” is established, how is one to determine what is a 99.9% outlier? The exact details of how this is done make a big difference. I’m not sure I see a clear and clean way to do it, particularly when the seasonal data has been “adjusted for changes in latitude and altitude”. That implies that they are not using anomalies but absolute values, and that always makes things stickier. But they don’t say how they plan to do it …
In closing, I bring all of this up, not to oppose the BEST crew or make them wrong or pick on errors, but to assist them in making their work bulletproof. I am overjoyed that they are doing what they are doing. I bring this up to make their product better by crowd-sourcing ideas and objections to how they plan to analyze the data.
Accordingly, I will ask the assistance of the moderators in politely removing any posts talking about whether BEST will or won’t come up with anything good, or of their motives, or whether the eventual product will be useful, or the preliminary results, or anything extraneous. Just paste in “Snipped – OT” to mark them, if you’d be so kind.
This thread is about how to do the temperature analysis properly, not whether to do it, or the doer’s motives, or whether it is worth doing. Those are all good questions, but not for this thread. Please take all of that to a general thread regarding BEST. This thread is about the mathematical analysis and transformation of the data, and nothing else.