Guest Post by Willis Eschenbach
The Berkeley Earth Surface Temperature (BEST) team is making a new global climate temperature record. Hopefully this will give us a better handle on what’s going on with the temperature.
BEST has put out a list of the four goals for their mathematical methods (algorithms). I like three of those goals a lot. One I’m not so fond of. Here are their goals:
1) Make it possible to exploit relatively short (e.g. a few years) or discontinuous station records. Rather than simply excluding all short records, we prefer to design a system that allows short records to be used with a low – but non-zero – weighting whenever it is practical to do so.
2) Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may depend on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.
3) Place empirical homogenization on an equal footing with other averaging. We distinguish empirical homogenization from evidence‐based homogenization. Evidence‐based adjustments to records occur when secondary data and/or metadata is used to identify problems with a record and propose adjustments. By contrast, empirical homogenization is the process of comparing a record to its neighbors to detect undocumented discontinuities and other changes. This empirical process performs a kind of averaging as local outliers are replaced with the basic behavior of the local group. Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.
4) Provide uncertainty estimates for the full time series through all steps in the process.
Using short series, avoiding gridding, and providing uncertainty estimates are all great goals. But the whole question of “empirical homogenization” is fraught with hidden problems and traps for the unwary.
The first of these is that nature is essentially not homogeneous. It is pied and dappled, patched and plotted. It generally doesn’t move smoothly from one state to another, it moves abruptly. It tends to favor Zipf distributions, which are about as non-normal (i.e. non-Gaussian) as a distribution can get.
So I object to the way that the problem is conceptualized. The problem is not that the data requires “homogenization”, that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries. But homogenizing the data is not the answer to that.
This is particularly true since (if I understand what they’re saying) they have already told us how they plan to deal with discontinuities. The plan, which I’ve been pushing for some time now, is to simply break the series apart at the discontinuities and treat it as two separate series. And that’s a good plan. They say:
Data split: Each unique record was broken up into fragments having no gaps longer than 1 year. Each fragment was then treated as a separate record for filtering and merging. Note however that the number of stations is based on the number of unique locations, and not the number of record fragments.
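For concreteness, here is a minimal R sketch of what that kind of split might look like. This is not BEST’s code; the data frame layout, the column names, and the 365-day threshold are all assumptions made purely for illustration.

```r
# Minimal sketch (not BEST's code) of splitting a station record into fragments
# wherever the gap between observations exceeds one year.  Assumes a data frame
# with a Date column `date` and a temperature column `temp`; names are illustrative.
split_at_gaps <- function(station, max_gap_days = 365) {
  station <- station[order(station$date), ]
  gaps <- c(0, diff(as.numeric(station$date)))    # days since the previous observation
  fragment_id <- cumsum(gaps > max_gap_days) + 1  # start a new fragment after each long gap
  split(station, fragment_id)                     # list of data frames, one per fragment
}

# Example: a record with a two-year hole in the middle becomes two fragments.
station <- data.frame(
  date = c(seq(as.Date("1950-01-01"), as.Date("1954-12-01"), by = "month"),
           seq(as.Date("1957-01-01"), as.Date("1960-12-01"), by = "month"))
)
station$temp <- rnorm(nrow(station), mean = 10, sd = 5)
length(split_at_gaps(station))   # 2
```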
So why would they deal with “empirical discontinuities” by adjusting them, and deal with other discontinuities in a totally different manner?
Next, I object to the plan that they will “incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.” This will make it very difficult to back it out of the calculations to see what effect it has had. It will also hugely complicate the question of the estimation of error. For any step-wise process, it is crucial to separate the steps so the effect of each single step can be understood and evaluated.
Finally, let’s consider the nature of the “homogenization” process they propose. They describe it as a process whereby:
… local outliers are replaced with the basic behavior of the local group
There are a number of problems with that.
First, temperatures generally follow a Zipf distribution (a distribution with a large excess of extreme values). As a result, what would definitely be “extreme outliers” in a Gaussian distribution are just another day in the life in a Zipf distribution. A very unusual and uncommon temperature in a Gaussian distribution may be a fairly common and mundane temperature in a Zipf distribution. If you pull those so-called outliers out of the dataset, or replace them with a local average, you no longer have temperature data – you have Gaussian data. So you have to be real, real careful before you declare an outlier. I would certainly look at the distributions before and after “homogenization”, to see if the Zipf nature of the distribution has disappeared … and if so, I’d reconsider my algorithm.
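Here is one way that before-and-after check might be sketched in R. The data are synthetic (a heavy-tailed t-distribution standing in for temperature anomalies, since a true Zipf distribution is defined on ranks), and the three-sigma rule is just a stand-in for whatever outlier-replacement scheme is actually used.

```r
# Synthetic check: does an outlier-replacement rule erase the heavy tail?
set.seed(42)
anoms <- rt(5000, df = 3)               # heavy-tailed stand-in for temperature anomalies

# Naive "homogenization": replace anything beyond three standard deviations
# with the sample mean.
cleaned <- anoms
cleaned[abs(cleaned - mean(anoms)) > 3 * sd(anoms)] <- mean(anoms)

# Compare the upper tail and excess kurtosis before and after.
tail_summary <- function(x) {
  c(q999 = unname(quantile(x, 0.999)),
    excess_kurtosis = mean((x - mean(x))^4) / sd(x)^4 - 3)
}
rbind(raw = tail_summary(anoms), cleaned = tail_summary(cleaned))
```

If the “cleaned” row looks far more Gaussian than the raw row, the procedure has changed the character of the data, not just repaired it.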
Second, while there is a generally high correlation between temperature datasets out to 1200 km or so, that’s all that it is. A correlation. It is not a law. For any given station, there will often be nearby datasets that have very little correlation. In addition, for each of the highly correlated pairs, there will be a number of individual years where the variation in the two datasets is quite large. So despite high correlation, we cannot just assume that any record that disagrees with the “local group” is incorrect, as the BEST folks seem to be proposing.
Third, since nature itself is almost “anti-homogeneous”, full of abrupt changes and frequent odd occurrences and outliers, why would we want to “homogenize” a dataset at all? If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.
Fourth, although the temperature data is highly correlated out to long distances, the same is not true of the trend. See my post on Alaskan trends regarding this question. Since the trends are not correlated, adjustment based on neighbors may well introduce a spurious trend. If the “basic behavior of the local group” is trending upwards, and the data being homogenized is trending horizontally, both may indeed be correct, and homogenization will destroy that …
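A quick synthetic illustration of that point: two series that share the same local “weather” correlate very well even though their underlying trends differ. The numbers below are invented for the example, not taken from any station.

```r
# Two synthetic monthly series: same shared "weather", different trends.
set.seed(1)
months  <- 1:600                                      # 50 years of monthly values
weather <- rnorm(600, sd = 2)                         # shared local variability
site_a  <- 0.002 * months + weather + rnorm(600, sd = 0.5)   # warming ~0.24 per decade
site_b  <- 0.000 * months + weather + rnorm(600, sd = 0.5)   # no trend

cor(site_a, site_b)                                   # high, roughly 0.9
coef(lm(site_a ~ months))["months"] * 120             # decadal trend of site A
coef(lm(site_b ~ months))["months"] * 120             # decadal trend of site B, near zero
```

Nudging site B toward the “basic behavior” of a warming local group would hand it a trend it never had.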
Those are some of the problems with “homogenization” that I see. I’d start by naming it something else, because “homogenization” does not describe what we wish to do to the data. Nature is not homogeneous, and neither should our dataset be.
Then I’d use the local group solely to locate unusual “outliers” or shifts in variance or average temperature.
But there’s no way I’d replace the putative “outliers” or shifts with the behavior of the “local group”. Why should I? If all you are doing is bringing the data in line with the average of the local group, why not just throw it out entirely and use the local average? What’s the advantage?
Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset; what’s the point of that?
Lastly, a couple of issues with their quality control procedures. They say:
Local outlier filter: We tested for and flagged values that exceeded a locally determined empirical 99.9% threshold for normal climate variation in each record.
and
Regional filter: For each record, the 21 nearest neighbors having at least 5 years of record were located. These were used to estimate a normal pattern of seasonal climate variation. After adjusting for changes in latitude and altitude, each record was compared to its local normal pattern and 99.9% outliers were flagged.
Again, I’d be real, real cautious about these procedures. Since the value in both cases is “locally determined”, there will certainly not be a whole lot of data for analysis. Determination of the 99.9% exceedance level, based solely on a small dataset of Zipf-distributed data, will have huge error margins. Overall, what they propose seems like a procedure guaranteed to convert a Zipf dataset into a Gaussian dataset, and at that point all bets are off …
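To see how shaky a locally determined 99.9% threshold can be, here is a small Monte Carlo sketch in R. Again the heavy-tailed data are synthetic, and the five-year sample size is just an assumption for illustration.

```r
# How much does an empirical 99.9% threshold wobble when estimated from only
# a few years of daily, heavy-tailed data?  (Synthetic data, illustrative sizes.)
set.seed(7)
n <- 5 * 365                                           # about five years of daily values

thresholds <- replicate(1000, quantile(rt(n, df = 3), 0.999))
quantile(thresholds, c(0.05, 0.5, 0.95))               # the estimated cutoff itself spreads widely
```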
In addition, once the “normal pattern of seasonal climate variation” is established, how is one to determine what is a 99.9% outlier? The exact details of how this is done make a big difference. I’m not sure I see a clear and clean way to do it, particularly when the seasonal data has been “adjusted for changes in latitude and altitude”. That implies that they are not using anomalies but absolute values, and that always makes things stickier. But they don’t say how they plan to do it …
In closing, I bring all of this up not to oppose the BEST crew, or to make them wrong, or to pick on errors, but to assist them in making their work bulletproof. I am overjoyed that they are doing what they are doing. My aim is to make their product better by crowd-sourcing ideas and objections to how they plan to analyze the data.
Accordingly, I will ask the assistance of the moderators in politely removing any posts talking about whether BEST will or won’t come up with anything good, or of their motives, or whether the eventual product will be useful, or the preliminary results, or anything extraneous. Just paste in “Snipped – OT” to mark them, if you’d be so kind.
This thread is about how to do the temperature analysis properly, not whether to do it, or the doer’s motives, or whether it is worth doing. Those are all good questions, but not for this thread. Please take all of that to a general thread regarding BEST. This thread is about the mathematical analysis and transformation of the data, and nothing else.
w.
sky says:
March 24, 2011 at 8:47 pm
eadler says:
March 24, 2011 at 7:02 pm
“In fact you are criticizing a procedure which you don’t understand and haven’t read, because the details haven’t been released yet in a full paper. In any case, the homogenization process is what “vets the data”, so your phrase “without vetting the data” is an unfounded assumption about the procedure you claim to criticize.”
You can presume anything you want about what I ostensibly “don’t understand” about “homogenization,” but any time measured values in a station record are altered or replaced with something inferred statistically from other stations, the observational basis is no longer there. Bona fide vetting doesn’t do that! And I don’t waste my time on plainly attitudinal arguments that lack substantive basis.
It is your posts that are full of emotional “attitudinal arguments”. I would wager that if the GISS temperature record showed that global warming was non-existent, you wouldn’t worry about homogenization.
Your argument is not logical. There are no locations on earth whose station records are free of inhomogeneities during the 160 year period covered by the temperature record. The alternatives are:
1) correct the errors to the best of our ability, using statistical methods
2) allow them to remain in the record and influence the result
3) eliminate the entire record.
There is no perfect solution.
If you don’t want to have a temperature record, you could do 3) and throw out all of the records that have imperfections.
If you didn’t care about errors, you could forget about corrections. Some people have objected to the corrections, claiming that without them, there would be no evidence of global warming in the temperature record, and offering as proof records of individual stations where this has been the result.
In fact this has been looked at. The resulting trends, without the homogenization corrections, are said to be similar to the corrected data:
http://tamino.wordpress.com/2011/01/02/hottest-year/#comment-46809
Wrote a program to read in the GHCN data sets and compute simple-minded “dumb average” global temperature anomalies (smoothed with a moving-average filter).
The results were noisier than the official GISS/CRU/etc. results (due to lack of ocean coverage and lack of proper geospatial weighting), but overall results were quite consistent with GISS/CRU/etc.
Overall summary of the results:
GHCN raw and adjusted data produced nearly identical temperature trends.
To me, making an effort to correct the data with statistically valid techniques, even if there is no guarantee of 100% accuracy, is better than the other two options. This is especially important for charting regional trends.
John Trigge says:
“The recent Russian heatwave was (finally) put down to a blocking high. It was unusual, it was unexpected, it could be determined to be an outlier but it occurred, therefore it is a valid piece of data.
Why should it be removed in favour of some-one’s arbitrary idea of what is/is not an outlier?”
There is no reason for temperature data from the Russian heat wave to be rejected by the BEST project. The heat spell occurred for a relatively long period over a large area. While the data point for, say, July 2010 from a given location would appear to be an outlier compared with other Julys at that location, other locations nearby would show the same pattern of extremely high temperatures during the same period. The BEST homogenization approach would not throw out data from such extreme events.
Homogenization is designed to identify problems such as UHI, measurement errors, undocumented station moves, and changes in the immediate surroundings that alter the data produced at a location. The approach may detect errors such as the record high temperatures registered in Baltimore not long ago (reported on this blog by Anthony) that were produced by faulty equipment but were not corrected even though the sensor problem was known to the US Weather Service.
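[A small synthetic illustration of the distinction being drawn here, not the BEST algorithm: a filter that compares a station to the average of its neighbors leaves a region-wide heat wave alone, because the neighbors show it too, but it does flag an isolated instrument spike. All numbers below are invented.]

```r
# Synthetic example: regional heat wave vs. isolated instrument spike.
set.seed(11)
n_months <- 120
heatwave <- c(rep(0, 59), 8, rep(0, 60))                  # region-wide 8-degree excursion in month 60
neighbours <- replicate(21, 20 + heatwave + rnorm(n_months, sd = 2))
station    <- 20 + heatwave + rnorm(n_months, sd = 2)

flag <- function(x) which(abs(x - mean(x)) > 3 * sd(x))

diffs <- station - rowMeans(neighbours)                   # station relative to its local group
60 %in% flag(diffs)                                       # is the heat-wave month flagged?  No: the neighbours show it too

station_bad <- station
station_bad[30] <- station_bad[30] + 15                   # isolated faulty-sensor spike
30 %in% flag(station_bad - rowMeans(neighbours))          # TRUE: the spike stands out from the group
```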
eadler : “The resulting trends, without the homogenization corrections, are said to be the similar to the corrected data:”
They “are said to be”. If the raw data had been kept, plus a record of all the changes with reasons, the trends wouldn’t be “said to be”, they would “be” (or “not be”).
eadler says:
March 25, 2011 at 7:41 am
“It is your posts that are full of emotional “attitudinal arguments”. I would wager that if the GISS temperature record showed that global warming was non-existent, you wouldn’t worry about homogenization.
Your argument is not logical. There are no locations on earth whose station records are free of inhomogeneities during the 160 year period covered by the temperature record. The alternatives are:
1) correct the errors to the best of our ability, using statistical methods
2) allow them to remain in the record and influence the result
3) eliminate the entire record. ”
==============================================================
My posts directly address the substantive problems of using scraps of unvetted data and performing uncertain “anomalizations” and ad hoc “homogenizations” in an urban-biased GHCN data base. By mockingly talking about sausages for breakfast and wagering that I wouldn’t worry about homogenization if GISS showed no trend, it is your posts that resort to ad hominem arguments while ignoring the scientific issues.
No one is calling for elimination of the entire record. Contrary to your claim, there are hundreds of century-long station records unevenly scattered around the globe that are relatively free of data errors or extraneous, non-climatic biasing factors. I’ll not describe here the advanced signal analysis methods used in vetting them. But I will say that those analyses reveal very widespread corruption of records by inexplicable offsets, as well as by gradual UHI intensification and land-use changes. That corruption cannot be effectively removed by homogenization with other similarly flawed records. All unvetted records indeed should be disregarded in serious climatological studies–even if it leaves great holes in the geographic coverage. What is wholly illogical is the notion that “trends” obtained from indiscriminate inclusion of all records, such as in Tamino’s global comparison with GISS, are climatically meaningful.
BEST’s goal of including all records after performing their own “homogenization” serves no clear scientific purpose.
I see no point in further discussion with someone who seizes upon colorful adjectives while paying scant attention to the clear meaning conveyed by nouns and verbs.
sky says:
March 25, 2011 at 3:44 pm
No one is calling for elimination of the entire record. Contrary to your claim, there are hundreds of century-long station records unevenly scattered around the globe that are relatively free of data errors or extraneous, non-climatic biasing factors. I’ll not describe here the advanced signal analysis methods used in vetting them. But I will say that those analyses reveal very widespread corruption of records by inexplicable offsets, as well as by gradual UHI intensification and land-use changes. That corruption cannot be effectively removed by homogenization with other similarly flawed records. All unvetted records indeed should be disregarded in serious climatological studies–even if it leaves great holes in the geographic coverage. What is wholly illogical is the notion that “trends” obtained from indiscriminate inclusion of all records, such as in Tamino’s global comparison with GISS, are climatically meaningful.
BEST’s goal of including all records after performing their own “homogenization” serves no clear scientific purpose.
I see no point in further discussion with someone who seizes upon colorful adjectives while paying scant attention to the clear meaning conveyed by nouns and verbs.
It is not clear what you mean by “relatively free of data errors and nonclimatic biasing factors”. Do you have evidence that there are hundreds of such stations, and more importantly that they can be used to track climate change around the globe? Is there a reference that I can consult for this?
The survey of homogenization techniques written by 13 researchers from 11 different countries says the following:
http://www.cru.uea.ac.uk/cru/data/temperature/HadCRUT3_accepted.pdf
Unfortunately, most long-term climatological time series have been affected by a number of non-climatic factors that make these data unrepresentative of the actual climate variation occurring over time. These factors include changes in: instruments, observing practices, station locations, formulae used to calculate means, and station environment (Jones et al., 1985; Karl and Williams, 1987; Gullett et al., 1990; Heino, 1994). Some changes cause sharp discontinuities while other changes, particularly change in the environment around the station, can cause gradual biases in the data. All of these inhomogeneities can bias a time series and lead to misinterpretations of the studied climate. It is important, therefore, to remove the inhomogeneities or at least determine the possible error they may cause.
Unless you have a convincing reference that shows homogenization is unnecessary and data can be vetted in some way without it, or that the statistical methods you use are not part of the methods reviewed in this paper, I believe I am justified in regarding your claim with a great deal of skepticism.
John Andrews says:
March 24, 2011 at 10:06 pm
Many thanks, John. For me it depends on the dataset.
I just downloaded the England daily rainfall dataset. It’s much closer to either a Zipf (power-law) or an exponential distribution than it is to a log-normal.
My point was that with any given dataset, what you need to do first is figure out what kind of non-normal distribution you’re dealing with. I program in R, which has functions for fitting data to a variety of probability distributions (“fitdistr” in the “MASS” package). Only after that’s done can we talk about whether some particular data point is out of the ordinary.
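As a sketch of that step, here is roughly how fitdistr() can be used to compare candidate distributions; the “rainfall” values below are synthetic placeholders, not the England dataset.

```r
# Rough distribution comparison with fitdistr() from MASS.
# Synthetic positive, skewed data stand in for the real rainfall values.
library(MASS)

set.seed(3)
rain <- rlnorm(2000, meanlog = 0.5, sdlog = 1.2)

fits <- list(
  exponential = fitdistr(rain, "exponential"),
  lognormal   = fitdistr(rain, "lognormal")
)
sapply(fits, function(f) f$loglik)   # crude comparison; AIC would also penalize extra parameters
```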
Indeed. I create subsequent datasets with each transformation. That lets me examine the effect of each step. In R it’s simple to do.
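For what it’s worth, a minimal way to do that in R is just a named list of intermediate datasets; the step names below (and the reuse of the hypothetical split_at_gaps() from the earlier sketch) are illustrative only.

```r
# Keep every intermediate dataset so the effect of each step can be examined.
steps <- list()
steps$raw     <- station                         # the hypothetical station data from above
steps$split   <- split_at_gaps(steps$raw)        # list of fragments
steps$anomaly <- lapply(steps$split, function(frag) {
  frag$anom <- frag$temp - mean(frag$temp)       # crude per-fragment anomaly
  frag
})
str(steps, max.level = 1)                        # inspect what each step produced
```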
w.
Eadler, you raise an interesting issue when you quote the HadCRUT folks as saying:
eadler says:
March 25, 2011 at 7:08 pm
Are there problems with the data? Absolutely. All of the problems the HadCRUT folks mentioned, instruments and all the rest, are mixed into the dataset. And I agree with you that we need to deal with that as best we can.
However, do we want to “homogenize” the data? I object to the concept. What we want to do is remove any non-climate signals. This is a very different objective than homogenization, and requires different methods and algorithms.
One problem I discussed above is that while temperatures are well correlated over fairly long distances, the same is not true of trends. For example, a slow, decades-long wind shift may not affect one site much but may gradually warm another site. The temperature records are still well correlated, but the trends are very different.
This makes homogenization, as currently practiced, theoretically problematic. I haven’t found an answer to that yet, although you may know of one.
Thanks for your clarification of the issues,
w.
eadler says:
March 25, 2011 at 7:08 pm
“Unless you have a convincing reference that shows homogenization is unnecessary and data can be vetted in some way without it, or that the statistical methods you use are not part of the methods reviewed in this paper, I believe I am justified in regarding your claim with a great deal of skepticism.”
In any proper sense of the term, “homogenization” is applied to eliminate the discrepancies between SPATIALLY separated records; it is typically applied even to data-perfect series to account for gradual UHI effects, rather than to cure TEMPORAL nonstationarities (e.g., datum offsets) or other data deficiencies. To be effective, homogenization–which the CRU white paper conflates with the distinct issues of data QC and record repair–requires a pristine ABSOLUTE baseline for reference, which is usually unavailable. Thus in practice, accurate wholesale homogenization proves impossible.
Vetting is a process that separates the wheat from the chaff RELATIONALLY. Yes, indeed, the signal analysis methods involved are beyond the ken evidenced by the CRU white paper. And those century-long records in the GHCN data base that survive vetting number in the hundreds, albeit concentrated mostly in the USA. The fact that these findings are not summarized in any publicly available reference is an impediment only to academic pursuits and not to serious research.