Not Whether, but How to Do The Math

Guest Post by Willis Eschenbach

The Berkeley Earth Surface Temperature (BEST) team is making a new global climate temperature record. Hopefully this will give us a better handle on what’s going on with the temperature.

BEST has put out a list of the four goals for their mathematical methods (algorithms). I like three of those goals a lot. One I’m not so fond of. Here are their goals:

1)  Make it possible to exploit relatively short (e.g. a few years) or discontinuous station records. Rather than simply excluding all short records, we prefer to design a system that allows short records to be used with a low – but non‐zero – weighting whenever it is practical to do so.

2)  Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may be dependent on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.

3)  Place empirical homogenization on an equal footing with other averaging. We distinguish empirical homogenization from evidence‐based homogenization. Evidence‐based adjustments to records occur when secondary data and/or metadata is used to identify problems with a record and propose adjustments. By contrast, empirical homogenization is the process of comparing a record to its neighbors to detect undocumented discontinuities and other changes. This empirical process performs a kind of averaging as local outliers are replaced with the basic behavior of the local group. Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.

4)  Provide uncertainty estimates for the full time series through all steps in the process.

Using short series, avoiding gridding, and uncertainty estimates are all great goals. But the whole question of “empirical homogenization” is fraught with hidden problems and traps for the unwary.

The first of these is that nature is essentially not homogeneous. It is pied and dappled, patched and plotted. It generally doesn’t move smoothly from one state to another, it moves abruptly. It tends to favor Zipf distributions, which are about as non-normal (i.e. non-Gaussian) as a distribution can get.

So I object to the way that the problem is conceptualized. The problem is not that the data requires “homogenization”; that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries, and homogenizing the data is not the answer to that.

This is particularly true since (if I understand what they’re saying) they have already told us how they plan to deal with discontinuities. The plan, which I’ve been pushing for some time now, is to simply break the series apart at the discontinuities and treat it as two separate series. And that’s a good plan. They say:

Data split: Each unique record was broken up into fragments having no gaps longer than 1 year. Each fragment was then treated as a separate record for filtering and merging. Note however that the number of stations is based on the number of unique locations, and not the number of record fragments.
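To make that split rule concrete, here is a minimal sketch of how it might look in code, assuming a record is just a date-sorted list of (date, value) pairs. The function name, data layout and the 365-day threshold are my illustration of the quoted description, not BEST’s actual code.

```python
# Minimal sketch of the quoted "split at gaps longer than one year" rule.
# Assumes a record is a list of (date, value) pairs sorted by date; the
# function name and structure are illustrative, not BEST's actual code.
from datetime import date, timedelta

def split_record(record, max_gap=timedelta(days=365)):
    """Break one station record into fragments wherever the gap between
    consecutive observations exceeds max_gap."""
    fragments = []
    current = [record[0]]
    for prev, obs in zip(record, record[1:]):
        if obs[0] - prev[0] > max_gap:
            fragments.append(current)   # close the fragment at the gap
            current = []
        current.append(obs)
    fragments.append(current)
    return fragments

# A record with a two-year hole in it becomes two fragments:
rec = [(date(2000, m, 1), 10.0 + m) for m in range(1, 13)] + \
      [(date(2003, m, 1), 11.0 + m) for m in range(1, 13)]
print(len(split_record(rec)))   # -> 2
```

Note that nothing in a rule like this invents any data; it only decides where one fragment ends and the next begins.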

So why would they deal with “empirical discontinuities” by adjusting them, and deal with other discontinuities in a totally different manner?

Next, I object to the plan that they will “incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.” This will make it very difficult to back it out of the calculations to see what effect it has had. It will also hugely complicate the question of the estimation of error. For any step-wise process, it is crucial to separate the steps so the effect of each single step can be understood and evaluated.

Finally, let’s consider the nature of the “homogenization” process they propose. They describe it as a process whereby:

… local outliers are replaced with the basic behavior of the local group

There are a number of problems with that.

First, temperatures generally follow a Zipf distribution (a distribution with a large excess of extreme values). As a result, what would definitely be “extreme outliers” in a Gaussian distribution are just another day in the life in a Zipf distribution. A very unusual and uncommon temperature in a Gaussian distribution may be a fairly common and mundane temperature in a Zipf distribution. If you pull those so-called outliers out of the dataset, or replace them with a local average, you no longer have temperature data – you have Gaussian data. So you have to be real, real careful before you declare an outlier. I would certainly look at the distributions before and after “homogenization” to see if the Zipf nature of the distribution has disappeared … and if so, I’d reconsider my algorithm.
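To put a rough number on that concern (using synthetic data only, since this is just an illustration of the principle): a simple “mean plus three standard deviations” outlier rule fires at the textbook rate on Gaussian data, but several times more often on a heavy-tailed sample, and the points it removes are exactly the tail that makes the distribution non-Gaussian. The Pareto draw below is a stand-in for a Zipf-like tail, not real temperature data.

```python
# Illustrative only: how a Gaussian-style 3-sigma outlier rule behaves on
# heavy-tailed data. The Pareto draw is a stand-in for a Zipf-like tail,
# not real temperature data.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gaussian = rng.normal(0.0, 1.0, n)
heavy = rng.pareto(2.5, n)          # heavy-tailed stand-in

def flag_rate(x, k=3.0):
    """Fraction of points flagged beyond mean +/- k standard deviations."""
    mu, sd = x.mean(), x.std()
    return np.mean(np.abs(x - mu) > k * sd)

print(f"Gaussian sample flagged:     {flag_rate(gaussian):.3%}")  # about 0.27%
print(f"heavy-tailed sample flagged: {flag_rate(heavy):.3%}")     # several times higher
```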

Second, while there is a generally high correlation between temperature datasets out to 1200 km or so, that’s all it is. A correlation. It is not a law. For any given station, there will often be nearby datasets that have very little correlation. In addition, even for highly correlated pairs, there will be a number of individual years where the two datasets diverge sharply. So despite high correlation, we cannot just assume that any record that disagrees with the “local group” is incorrect, as the BEST folks seem to be proposing.

Third, since nature itself is almost “anti-homogeneous”, full of abrupt changes and frequent odd occurrences and outliers, why would we want to “homogenize” a dataset at all? If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.

Fourth, although the temperature data is highly correlated out to a long distance, the same is not true of the trend. See my post on Alaskan trends regarding this question. Since the trends are not correlated, adjustment based on neighbors may well introduce a spurious trend. If the “basic behavior of the local group” is trending upwards, and the data being homogenized is trending horizontally, both may indeed be correct, and homogenization will destroy that …
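A toy example of that point (entirely synthetic data, not a claim about any real station pair): two records can share almost all of their year-to-year wiggles, and thus correlate very well, while carrying quite different trends. Nudging one toward the other manufactures a trend that was never measured.

```python
# Synthetic illustration: two series with high year-to-year correlation
# but different long-term trends.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1950, 2011)
shared = rng.normal(0.0, 1.0, years.size)        # common regional variability

station_a = shared + 0.02 * (years - years[0])   # warming at 0.02 deg/yr
station_b = shared                               # no trend at all

corr = np.corrcoef(station_a, station_b)[0, 1]
slope_a = np.polyfit(years, station_a, 1)[0]
slope_b = np.polyfit(years, station_b, 1)[0]

print(f"correlation between the two records: {corr:.2f}")   # high, roughly 0.9
print(f"trend A: {slope_a:+.3f} deg/yr, trend B: {slope_b:+.3f} deg/yr")
# "Homogenizing" B toward the local group's trend would create a warming
# signal that station B never recorded.
```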

Those are some of the problems with “homogenization” that I see. I’d start by naming it something else; the word does not describe what we wish to do to the data. Nature is not homogeneous, and neither should our dataset be.

Then I’d use the local group solely to locate unusual “outliers” or shifts in variance or average temperature.

But there’s no way I’d replace the putative “outliers” or shifts with the behavior of the “local group”. Why should I? If all you are doing is bringing the data in line with the average of the local group, why not just throw it out entirely and use the local average? What’s the advantage?

Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset; what’s the point of that?
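For what it’s worth, here is a minimal sketch of that alternative: use the neighbours only to flag a suspect value, then drop the value and split the record at that point instead of replacing it. The flagging rule, the threshold and the names are deliberately simple and purely illustrative.

```python
# Sketch of "flag, drop, and split" instead of "replace with the local average".
# The neighbour-based flagging rule here is deliberately simple and illustrative.
def flag_and_split(series, neighbor_median, threshold=5.0):
    """Return a list of record fragments, split wherever a value departs from
    the neighbour median by more than `threshold`; the suspect value itself
    is discarded rather than adjusted."""
    fragments, current = [], []
    for value, ref in zip(series, neighbor_median):
        if abs(value - ref) > threshold:     # suspect point: drop it, split here
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(value)
    if current:
        fragments.append(current)
    return fragments

series = [10.1, 10.3, 31.0, 10.2, 10.4]      # one obviously bad entry
neighbors = [10.0, 10.2, 10.1, 10.3, 10.2]   # stand-in for the local group
print(flag_and_split(series, neighbors))     # -> [[10.1, 10.3], [10.2, 10.4]]
```

The neighbours here only decide where to cut; no value from the local group ever gets written into the station’s record.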

Lastly, a couple of issues with their quality control procedures. They say:

Local outlier filter: We tested for and flagged values that exceeded a locally determined empirical 99.9% threshold for normal climate variation in each record.

and

Regional filter: For each record, the 21 nearest neighbors having at least 5 years of record were located. These were used to estimate a normal pattern of seasonal climate variation. After adjusting for changes in latitude and altitude, each record was compared to its local normal pattern and 99.9% outliers were flagged.

Again, I’d be real, real cautious about these procedures. Since the value in both cases is “locally determined”, there will certainly not be a whole lot of data for analysis. Determination of the 99.9% exceedance level, based solely on a small dataset of Zipf-distributed data, will have huge error margins. Overall, what they propose seems like a procedure guaranteed to convert a Zipf dataset into a Gaussian dataset, and at that point all bets are off …
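A quick, purely illustrative Monte Carlo makes the point: estimate a 99.9% threshold from a few thousand heavy-tailed values (a Pareto stand-in, not real station data) and the estimates scatter widely around the true value.

```python
# Illustrative Monte Carlo: scatter of a locally estimated 99.9th percentile
# when the sample is modest and heavy-tailed (Pareto used as a stand-in).
import numpy as np

rng = np.random.default_rng(2)
a = 2.5                                    # Pareto shape parameter (heavy tail)
true_q = (1 - 0.999) ** (-1.0 / a) - 1.0   # exact 99.9th percentile of numpy's pareto(a)

sample_size = 2000                         # very roughly five years of daily values
estimates = [np.quantile(rng.pareto(a, sample_size), 0.999) for _ in range(1000)]

print(f"true 99.9th percentile: {true_q:.1f}")
print(f"estimated: median {np.median(estimates):.1f}, "
      f"5th-95th percentile range {np.quantile(estimates, 0.05):.1f} "
      f"to {np.quantile(estimates, 0.95):.1f}")
# The width of that range is the "huge error margin" referred to above.
```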

In addition, once the “normal pattern of seasonal climate variation” is established, how is one to determine what is a 99.9% outlier? The exact details of how this is done make a big difference. I’m not sure I see a clear and clean way to do it, particularly when the seasonal data has been “adjusted for changes in latitude and altitude”. That implies that they are not using anomalies but absolute values, and that always makes things stickier. But they don’t say how they plan to do it …

In closing, I bring all of this up, not to oppose the BEST crew or make them wrong or pick on errors, but to assist them in making their work bulletproof. I am overjoyed that they are doing what they are doing. I bring this up to make their product better by crowd-sourcing ideas and objections to how they plan to analyze the data.

Accordingly, I will ask the assistance of the moderators in politely removing any posts talking about whether BEST will or won’t come up with anything good, or of their motives, or whether the eventual product will be useful, or the preliminary results, or anything extraneous. Just paste in “Snipped – OT” to mark them, if you’d be so kind.

This thread is about how to do the temperature analysis properly, not whether to do it, or the doer’s motives, or whether it is worth doing. Those are all good questions, but not for this thread. Please take all of that to a general thread regarding BEST. This thread is about the mathematical analysis and transformation of the data, and nothing else.

w.


158 Comments
March 23, 2011 7:56 am

\\ Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.//
There is no doubt we need to clean up the data (1) and find the M20 readings recorded as +20 instead of -20. Some cleanup is absolutely necessary. Nevertheless, one person’s outlier might be either a local heat wave or an errant blast from a jet engine.
My concerns can all be laid to rest with comforting answers to these questions:
A) will EVERY data value flagged as an outlier or suspicious be logged with the corresponding filter type and statistical criteria? Will that Log be part of the transparency of process?
B) will every data value flagged be logged with all the filters it fails?
C) will the data corrections be a separate log entry with queryable justification?
D) will a homogenization log be available for independent statistical analysis so that each of the filter criteria can be independently analyzed and/or audited for clustering?
E) What will be the process for correcting some of the corrections?
F) What about the data that WASN’T flagged, but was close? Will that appear anywhere? In short, are there several levels of flagging? Why should there be one and only one set of corrections?
(1) BTW: what are the sources of the BEST data set they are starting with? What clean up has already been done with it prior to getting it in their hands?

DesertYote
March 23, 2011 7:57 am

As someone who used to develop code for calculating the Measurement Uncertainty of modern digital test equipment, I fear no evil. But the thought of trying to do this with the homogenized mess being proposed is going to give me nightmares.

Allen63
March 23, 2011 8:00 am

Since, in the end, folks will use the data to estimate “temperature changes” with time, it seems that it might be appropriate to evaluate changes directly (rather than via estimated absolute temperatures).
Then, combine the estimated station-by-station changes in some acceptable way to obtain an “average” global change (over a time period).
This would seem to solve some problems — while no doubt introducing others. Nonetheless, it may use the raw data with fewer arbitrary corrections and does go straight to the heart of the question — how much has the temperature changed and when.
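One way to read that suggestion is the classic “first difference” approach: work with year-to-year changes at each station and average those, rather than averaging absolute temperatures. A toy sketch with synthetic data follows; it is illustrative only, not anyone’s actual method.

```python
# Toy "first difference" sketch: average year-to-year changes across stations
# and accumulate them, instead of averaging absolute temperatures.
# Synthetic data; illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n_stations, n_years = 8, 50
true_change = 0.02 * np.arange(n_years)               # common signal, degrees
offsets = rng.normal(10.0, 5.0, (n_stations, 1))      # differing site baselines
noise = rng.normal(0.0, 0.5, (n_stations, n_years))
temps = offsets + true_change + noise                 # synthetic station records

diffs = np.diff(temps, axis=1)            # year-to-year change at each station
mean_diff = diffs.mean(axis=0)            # average change across stations
cumulative = np.concatenate([[0.0], np.cumsum(mean_diff)])   # change since year 0

print(f"estimated total change: {cumulative[-1]:+.2f} deg "
      f"(true signal: {true_change[-1]:+.2f} deg)")
```

One attraction of working with changes is visible in the sketch: the differing station baselines drop out of the differences entirely.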

Ken Harvey
March 23, 2011 8:08 am

You can’t do good mathematics with bad numbers. Period. Using statistical methods does not change that. Once data has been adjusted, or includes numbers taken from a series shorter than the main body of data, then the result is nothing more nor less than somebody’s opinion. 2 + 2 = 4. 2 + something that looks as though it may be between 1 and 3 is simply an unknown. Your best estimate may, or may not, be better than my best guess, but unless we can take a measure to it the world will never know.

Gary Swift
March 23, 2011 8:16 am

I have been following this project since it first went public and have given a lot of thought to this subject: how best to go about making the results accurate and transparent, leaving as little doubt in the results as possible.
My conclusion and advice to the BEST team is this:
Provide the results as an interactive tool, rather than just a single final analysis. Provide option check-boxes so that anyone who wonders what the results would look like if a particular step had been done some other way can simply check/uncheck a box, and the application will show the results of the other method. That would free the BEST team from needing to make hard choices between mutually exclusive methods that may each have merit and risk. The application would not need to do the math over and over. That could be done one time, and then the results shown in an Adobe Flash web application or something similar. It would just be graphics.

RandomThesis
March 23, 2011 8:17 am

I am struck by the thought that while BEST may give us more precise temperature information, it is somewhat like relying on a car’s odometer for miles driven without taking into account whether the vehicle is on a dynamometer or converted with monster truck tires. We may end up with more precise information, but not any more accurate.
Eliminating (ignoring? waiting for?) humidity and ocean temps is just too much when we base trillion-dollar decisions on this data. We must be clear that while improvements to data are great, that in and of itself does not constitute information, knowledge or wisdom.

Nib
March 23, 2011 8:19 am

How do warm breezes and wind chills figure into records?
Here in Toronto it’s -5. With the wind chill, it feels like -13.

Pooh, Dixie
March 23, 2011 8:47 am

UHI Adjustments?
What are the requirements / specifications for the Urban Heat Island factor, if any?

David A. Evans.
March 23, 2011 8:50 am

[“Snipped – OT” – see article body]

Pooh, Dixie
March 23, 2011 8:56 am

Archives?
What are the requirements / specifications for retaining raw data, adjusted data, requirements / specifications, operating processes / procedures, code, off-the-shelf applications and any parameter values as a historical change log? (Configuration Management)

PeterB in indianapolis
March 23, 2011 9:13 am

[“Snipped – OT” – see article body]

Bob Diaz
March 23, 2011 9:14 am

Very interesting; assuming that the raw data and all software to process the data are open for review and for others to analyze, it will be interesting to see a graph of all the different results showing the range of different numbers. If one analysis showed a value of xxx for a given time period and another showed a value of yyy, |xxx-yyy| would show us the difference between the two. (Assume that xxx is the highest value of all the studies and yyy is the lowest.) I wonder how much it’s going to be?
Even if all the different calculations end in results that are very close, we’re still left with the problem of: How much is due to CO2 and how much is a natural change?
Bob Diaz

Ian W
March 23, 2011 9:27 am

[“Snipped – OT” – see article body]

Pooh, Dixie
March 23, 2011 9:30 am

Temperature Station Attributes?
It might be useful to record the station attributes of the measurements: instruments, location, site and NOAA estimated error (http://www.surfacestations.org/). A relational link to the station attributes might do the job; it should be date stamped to provide for changes.

March 23, 2011 9:32 am

jorgekafkazar says: “Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance.”
Willis Eschenbach replies: I disagree entirely. The BEST effort is just beginning, how could it be “doomed” to anything this early in the game? The procedures are new, and will certainly be changed at some point. In a battle, the first casualty is always the battle plan … they will soldier on.
Despite your military metaphor, you are thinking like the truth-seeking scientist you are. I certainly hope you are correct, but keep in mind, too, these military quotations:
1. “In war, truth is the first casualty.” Aeschylus
2. “We have a fifth column inside the city.” – General Mola

anna v
March 23, 2011 9:36 am

Whereas I can see that there is meaning in having as good a map of temperatures as possible, I see little meaning in having a map of anomalies. It is heat content that is important to tell us if the world is heating or cooling. Once one has a good map of temperatures, one can use a modified black-body radiation formula and integrate to get the total energy radiated by the earth.
Anomalies are just red herrings, in my opinion. One gets 15 degree anomalies at the poles with much less energy radiated than the 2 degree anomalies at the tropics. It is the energy that is important.
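A back-of-envelope black-body calculation illustrates the general point that equal temperature anomalies do not represent equal changes in radiated energy (idealized numbers, purely illustrative):

```python
# Back-of-envelope illustration: a 1 K anomaly corresponds to a smaller change
# in radiated flux at polar temperatures than at tropical ones
# (idealized black body, F = sigma * T^4; illustrative only).
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def flux_change_per_kelvin(t_kelvin):
    """Approximate dF/dT = 4 * sigma * T^3 for a black body at t_kelvin."""
    return 4.0 * SIGMA * t_kelvin ** 3

print(f"polar  (-30 C): {flux_change_per_kelvin(243.0):.2f} W/m^2 per K")  # ~3.3
print(f"tropic (+30 C): {flux_change_per_kelvin(303.0):.2f} W/m^2 per K")  # ~6.3
# So equal-sized anomalies at different baseline temperatures correspond to
# quite different changes in radiated energy.
```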

Lance Wallace
March 23, 2011 9:40 am

I dropped a line to Elizabeth Muller of the BEST project and was pleasantly surprised that she actually answered. I was making the point that all these new data stations would need to be investigated in the same way as Anthony’s surfacestations project; otherwise we would just be adding lots of data of uncertain quality. I thought the BEST folks were missing a great opportunity to harness the energies of many people globally to investigate a station or two each or just let us know about their individual experiences at one location. Although her response was very pleasant and accommodating, there was no indication that such a thing was planned. She said the station locations would be revealed at the same time as their results a few months hence.

Rex
March 23, 2011 9:40 am

[“Snipped – OT” – see article body]

DJ
March 23, 2011 9:52 am

Maybe I missed it in previous discussion…..
Who is doing the peer review of the product of BEST?
My real concern is that they’ll publish a report, and it’ll be taken at face value immediately by the msm, embraced in full, and given gospel status.
It needs to be completely open to review and discussion if it’s going to be considered valid science, and as Willis points out, there are legitimate concerns before it even starts. We’ve been down this road before, where the outcome is predetermined. This could well be just that, where at the end will be the pronouncement that “We were right all along, we did everything right, the earth is warming at an alarming rate, and the word Denier should be spelled with a capital D”.
With severe cutbacks in educational funding at universities and the push for more self-sufficient professor funding and research, the old adage “No problem, no funding” is stronger than ever.

March 23, 2011 9:53 am

Thank you Willis for drawing attention to this issue.
I concur with AllenC that BEST must provide for a different kind of analysis than what we normally see. I believe he refers to the studies done by JR Wakefield, whose work I also admire.
What is so striking about Wakefield’s research is his refraining from any combining or homogenizing of data across geography. In this way, the uniqueness of each microclimate is respected and understood.
IMO we need BEST to produce three things:
1) A validated record of the actual temperature measurements at each site in the database. (This should be the baseline for any kind of analysis, but must be kept separate from any averaging that others may do later in other kinds of studies.)
2) A representation of the climate pattern over time at each site, to the extent the record allows. This would ideally include not only trends of daily averages, but also daily mins and maxs, changes in seasons (earlier or later winters, earlier or later springs), changes in frequencies of extremes (for example >30C, <-20C).
3) An analysis comparing local climate patterns to discern dominant trends at regional, national and continental levels.
This kind of research shows what is actually happening in climates in a way that people can relate to their own experiences. Even more, the results will be extremely useful to local and regional authorities in their efforts to adapt to actual climate change, whatever it is.
Adaptation efforts would also be better informed if precipitation and humidity records and patterns were included, but I take your point about first things first.
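A rough sketch of the kind of per-site summary described in item (2) above, using synthetic daily data; the column names, thresholds and date range are illustrative only:

```python
# Sketch of per-site climate summaries: trend of annual-mean tmax plus counts
# of hot and cold extremes. Synthetic daily data; illustrative only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
dates = pd.date_range("1980-01-01", "2009-12-31", freq="D")
seasonal = 15.0 * np.sin(2 * np.pi * dates.dayofyear / 365.25)
tmean = 5.0 + seasonal + rng.normal(0.0, 4.0, dates.size)
site = pd.DataFrame({"tmin": tmean - 5.0, "tmax": tmean + 5.0}, index=dates)

annual = site.groupby(site.index.year).mean()          # annual means per variable
trend_tmax = np.polyfit(annual.index.to_numpy(), annual["tmax"].to_numpy(), 1)[0]

hot_days = (site["tmax"] > 30.0).groupby(site.index.year).sum()
cold_days = (site["tmin"] < -20.0).groupby(site.index.year).sum()

print(f"annual-mean tmax trend: {trend_tmax:+.3f} deg/yr")
print(f"hot days (>30C) per year, on average:   {hot_days.mean():.1f}")
print(f"cold days (<-20C) per year, on average: {cold_days.mean():.1f}")
```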

John Tofflemire
March 23, 2011 10:10 am

Pooh Dixie says:
“UHI Adjustments? What are the requirements / specifications for the Urban Heat Island factor, if any?”
BEST’s homogenization process is designed, among other things, to identify the presence and magnitude of urban heat islands.

Jeff Carlson
March 23, 2011 10:13 am

homogenized data is no longer raw data but a guess … GIGO … if you don’t have a temperature reading over hundreds of miles of surface, and I can’t stress this enough, YOU DON’T HAVE RAW DATA … blending, averaging, homogenizing doesn’t fix that problem, and that problem means you are not doing climate science …
we don’t have raw global temperature data that is equally distributed over the surface … i.e. we don’t have raw data …
after that, anything they do is just guessing, and while this project is an interesting statistical exercise, it is not climate science …

March 23, 2011 10:19 am

Willis, a brief comment. The Berkeley group may not present the “last word” in how to do the surface temperature analysis. I regard this as the first important step, in terms of a comprehensive data set that is well documented and transparent. They have introduced some new methodologies, which are all steps in the right direction, IMO. Their analysis will lay the foundation for others to try other methods and to improve on the analysis.
Judith Curry
