Not Whether, but How to Do The Math

Guest Post by Willis Eschenbach

The Berkeley Earth Surface Temperature (BEST) team is making a new global climate temperature record. Hopefully this will give us a better handle on what’s going on with the temperature.

BEST has put out a list of the four goals for their mathematical methods (algorithms). I like three of those goals a lot. One I’m not so fond of. Here are their goals:

1) Make it possible to exploit relatively short (e.g. a few years) or discontinuous station records. Rather than simply excluding all short records, we prefer to design a system that allows short records to be used with a low – but non‐zero – weighting whenever it is practical to do so.

2) Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may depend on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.

3) Place empirical homogenization on an equal footing with other averaging. We distinguish empirical homogenization from evidence‐based homogenization. Evidence‐based adjustments to records occur when secondary data and/or metadata is used to identify problems with a record and propose adjustments. By contrast, empirical homogenization is the process of comparing a record to its neighbors to detect undocumented discontinuities and other changes. This empirical process performs a kind of averaging as local outliers are replaced with the basic behavior of the local group. Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.

4) Provide uncertainty estimates for the full time series through all steps in the process.

Using short series, avoiding gridding, and uncertainty estimates are all great goals. But the whole question of “empirical homogenization” is fraught with hidden problems and traps for the unwary.

The first of these is that nature is essentially not homogeneous. It is pied and dappled, patched and plotted. It generally doesn’t move smoothly from one state to another; it moves abruptly. It tends to favor Zipf distributions, which are about as non-normal (i.e. non-Gaussian) as a distribution can get.

So I object to the way that the problem is conceptualized. The problem is not that the data requires “homogenization”; that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries, and homogenizing the data is not the answer to that.

This is particularly true since (if I understand what they’re saying) they have already told us how they plan to deal with discontinuities. The plan, which I’ve been pushing for some time now, is to simply break the series apart at the discontinuities and treat it as two separate series. And that’s a good plan. They say:

Data split: Each unique record was broken up into fragments having no gaps longer than 1 year. Each fragment was then treated as a separate record for filtering and merging. Note however that the number of stations is based on the number of unique locations, and not the number of record fragments.
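The data-split rule quoted above is easy to state precisely. Here is a minimal Python sketch, assuming a record is a time-sorted list of `(decimal_year, value)` pairs; the name `split_on_gaps` and this data layout are illustrative assumptions, not BEST's code.

```python
from typing import List, Tuple

Record = List[Tuple[float, float]]  # (decimal year, temperature) pairs, sorted by time


def split_on_gaps(record: Record, max_gap_years: float = 1.0) -> List[Record]:
    """Break a record into fragments with no internal gap longer than max_gap_years."""
    fragments: List[Record] = []
    current: Record = []
    for point in record:
        # Start a new fragment when the gap since the previous point is too long.
        if current and point[0] - current[-1][0] > max_gap_years:
            fragments.append(current)
            current = []
        current.append(point)
    if current:
        fragments.append(current)
    return fragments


station = [(1950.0, 10.1), (1950.5, 12.3), (1951.0, 9.8),  # continuous stretch
           (1953.2, 10.4), (1953.7, 11.9)]                   # resumes after a 2.2-year gap
print([len(f) for f in split_on_gaps(station)])
```

Each fragment would then be handled as its own record, while the station count stays at one location, as the quoted text says.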

So why would they deal with “empirical discontinuities” by adjusting them, and deal with other discontinuities in a totally different manner?

Next, I object to the plan that they will “incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.” This will make it very difficult to back it out of the calculations to see what effect it has had. It will also hugely complicate the question of the estimation of error. For any step-wise process, it is crucial to separate the steps so the effect of each single step can be understood and evaluated.
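One way to keep the steps separable is to run the processing as an explicit list of named steps and keep every intermediate output, so the effect of any single step (measured here, as an assumption, by its change in the series mean) can be inspected or backed out. A minimal sketch; both step functions are hypothetical stand-ins, not BEST's procedures.

```python
from statistics import fmean
from typing import Callable, List, Tuple

Series = List[float]
Step = Tuple[str, Callable[[Series], Series]]


def run_pipeline(data: Series, steps: List[Step]) -> List[Tuple[str, Series, float]]:
    """Apply steps in order, keeping every intermediate output and each step's effect."""
    history = []
    current = list(data)
    for name, func in steps:
        result = func(current)
        # The effect of this one step, measured here as the change in the series mean.
        history.append((name, result, fmean(result) - fmean(current)))
        current = result
    return history


# Hypothetical stand-in steps, for illustration only.
def drop_sentinels(s: Series) -> Series:
    return [x for x in s if x != -99.9]   # -99.9 used as a "missing" code


def offset_correction(s: Series) -> Series:
    return [x + 0.5 for x in s]


raw = [10.0, -99.9, 12.0, 11.0]
history = run_pipeline(raw, [("drop_sentinels", drop_sentinels),
                             ("offset_correction", offset_correction)])
for name, _, effect in history:
    print(f"{name}: mean changed by {effect:+.3f}")
```

Because the raw list is copied and each step's output is stored, any step can be removed and the rest rerun to see what it contributed.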

Finally, let’s consider the nature of the “homogenization” process they propose. They describe it as a process whereby:

… local outliers are replaced with the basic behavior of the local group

There are a number of problems with that.

First, temperatures generally follow a Zipf distribution (a distribution with a large excess of extreme values). As a result, what would definitely be “extreme outliers” in a Gaussian distribution are just another day in the life in a Zipf distribution. A very unusual and uncommon temperature in a Gaussian distribution may be a fairly common and mundane temperature in a Zipf distribution. If you pull those so-called outliers out of the dataset, or replace them with a local average, you no longer have temperature data – you have Gaussian data. So you have to be real, real careful before you declare an outlier. I would certainly look at the distributions before and after “homogenization”, to see if the Zipf nature of the distribution has disappeared … and if so, I’d reconsider my algorithm.
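A before/after check of that kind takes only a few lines. This sketch uses toy numbers (not real temperatures), a simple 3-sigma rule as a stand-in for whatever outlier rule is actually used, and two tail diagnostics: excess kurtosis (about 0 for Gaussian data, positive for heavy tails) and the fraction of values beyond 3 standard deviations. All function names are illustrative.

```python
from statistics import fmean, pstdev
from typing import List


def excess_kurtosis(xs: List[float]) -> float:
    """Population excess kurtosis: about 0 for Gaussian data, positive for heavy tails."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0


def tail_fraction(xs: List[float], k: float = 3.0) -> float:
    """Fraction of values more than k standard deviations from the mean."""
    m, s = fmean(xs), pstdev(xs)
    return sum(abs(x - m) > k * s for x in xs) / len(xs)


def replace_outliers_with_mean(xs: List[float], k: float = 3.0) -> List[float]:
    """The kind of 'homogenization' being criticized: pull outliers back to the mean."""
    m, s = fmean(xs), pstdev(xs)
    return [m if abs(x - m) > k * s else x for x in xs]


before = [-1.0, 0.0, 1.0] * 10 + [8.0, -9.0]  # mostly mild values plus two extremes
after = replace_outliers_with_mean(before)
print(f"excess kurtosis {excess_kurtosis(before):.2f} -> {excess_kurtosis(after):.2f}")
print(f"tail fraction   {tail_fraction(before):.4f} -> {tail_fraction(after):.4f}")
```

If the tail diagnostics collapse after processing, as they do here, the processing has changed the shape of the distribution and not just removed errors.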

Second, while there is a generally high correlation between temperature datasets out to 1200 km or so, that’s all that it is. A correlation. It is not a law. For any given station, there will often be nearby datasets that have very little correlation. In addition, for each of the highly correlated pairs, there will be a number of individual years where the variation in the two datasets is quite large. So despite high correlation, we cannot just assume that any record that disagrees with the “local group” is incorrect, as the BEST folks seem to be proposing.

Third, since nature itself is almost “anti-homogeneous”, full of abrupt changes and frequent odd occurrences and outliers, why would we want to “homogenize” a dataset at all? If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.

Fourth, although the temperature data is highly correlated out for a long distance, the same is not true of the trend. See my post on Alaskan trends regarding this question. Since the trends are not correlated, adjustment based on neighbors may well introduce a spurious trend. If the “basic behavior of the local group” is trending upwards, and the data being homogenized is trending horizontally, both may indeed be correct, and homogenization will destroy that …

Those are some of the problems with “homogenization” that I see. I’d start by naming it something else. It does not describe what we wish to do to the data. Nature is not homogeneous, and neither should our dataset be homogeneous.

Then I’d use the local group solely to locate unusual “outliers” or shifts in variance or average temperature.

But there’s no way I’d replace the putative “outliers” or shifts with the behavior of the “local group”. Why should I? If all you are doing is bringing the data in line with the average of the local group, why not just throw it out entirely and use the local average? What’s the advantage?

Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, and break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset; what’s the point of that?
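That alternative can be sketched in a few lines: compare the station with the median of its neighbors, flag points whose difference is unusually large by a robust median-absolute-deviation (MAD) rule, and split the record at each flagged point instead of filling it in. The neighbor median, the MAD rule and the threshold `k = 5` are illustrative assumptions; a real version would work with dated records and would also need to detect step changes, not just single bad points.

```python
from statistics import median
from typing import List, Sequence


def flag_and_split(station: Sequence[float], neighbors: Sequence[Sequence[float]],
                   k: float = 5.0) -> List[List[float]]:
    """Use neighbors only to *locate* suspect points, then drop them and split there.

    Station values are never replaced with neighbor values.
    """
    # Per-time-step neighbor consensus, then the station-minus-consensus difference.
    consensus = [median(vals) for vals in zip(*neighbors)]
    diffs = [s - c for s, c in zip(station, consensus)]
    center = median(diffs)
    # Median absolute deviation; the tiny floor avoids a zero threshold.
    mad = median([abs(d - center) for d in diffs]) or 1e-9
    pieces: List[List[float]] = []
    current: List[float] = []
    for value, d in zip(station, diffs):
        if abs(d - center) > k * mad:
            # Suspect point: throw it out and start a new, separate record.
            if current:
                pieces.append(current)
            current = []
        else:
            current.append(value)
    if current:
        pieces.append(current)
    return pieces


station = [10.2, 10.9, 12.1, 31.0, 12.4, 11.6]
neighbors = [[9.5, 10.5, 11.5, 12.0, 12.0, 11.0],
             [10.5, 11.5, 12.5, 12.5, 13.0, 12.0],
             [10.0, 11.0, 12.0, 12.2, 12.5, 11.5]]
print(flag_and_split(station, neighbors))
```

The suspect 31.0 is dropped and the record comes back as two separate pieces; no neighbor value is written into either one.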

Lastly, a couple of issues with their quality control procedures. They say:

Local outlier filter: We tested for and flagged values that exceeded a locally determined empirical 99.9% threshold for normal climate variation in each record.

and

Regional filter: For each record, the 21 nearest neighbors having at least 5 years of record were located. These were used to estimate a normal pattern of seasonal climate variation. After adjusting for changes in latitude and altitude, each record was compared to its local normal pattern and 99.9% outliers were flagged.
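The neighbor-selection part of the regional filter might look like the following. BEST's description does not say how distance is measured, so great-circle (haversine) distance on a 6371 km sphere is an assumption here, the `Station` type is hypothetical, and the latitude/altitude adjustment is not shown. Note crosspatch's point in the comments that plain map distance ignores geography.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Station:
    name: str
    lat: float              # degrees
    lon: float              # degrees
    years_of_record: float


def great_circle_km(a: Station, b: Station, radius_km: float = 6371.0) -> float:
    """Haversine distance between two stations."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def nearest_neighbors(target: Station, stations: List[Station],
                      k: int = 21, min_years: float = 5.0) -> List[Station]:
    """The k closest other stations having at least min_years of record."""
    candidates = [s for s in stations
                  if s is not target and s.years_of_record >= min_years]
    return sorted(candidates, key=lambda s: great_circle_km(target, s))[:k]


stations = [Station("A", 0.0, 0.0, 30), Station("B", 0.0, 1.0, 10),
            Station("C", 0.0, 2.0, 3), Station("D", 0.0, 3.0, 20),
            Station("E", 10.0, 0.0, 50)]
print([s.name for s in nearest_neighbors(stations[0], stations, k=2)])
```

Station C is skipped despite being close, because it has less than 5 years of record.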

Again, I’d be real, real cautious about these procedures. Since the value in both cases is “locally determined”, there will certainly not be a whole lot of data for analysis. Determination of the 99.9% exceedance level, based solely on a small dataset of Zipf-distributed data, will have huge error margins. Overall, what they propose seems like a procedure guaranteed to convert a Zipf dataset into a Gaussian dataset, and at that point all bets are off …
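A small illustration of why a locally estimated 99.9% threshold is fragile. With a simple linearly interpolated empirical quantile (an assumption; BEST does not say how it estimates the threshold), any sample of roughly a thousand values or fewer puts the 99.9% point between its two largest observations, so the threshold is effectively set by one or two extremes. The sample is toy data.

```python
import math
from typing import List


def empirical_quantile(xs: List[float], p: float) -> float:
    """Quantile by linear interpolation at position p * (n - 1) in the sorted data."""
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


sample = [float(v) for v in range(1, 51)]          # 50 toy values: 1, 2, ..., 50
print(empirical_quantile(sample, 0.999))           # set by the top two values
sample_with_one_extreme = sample[:-1] + [80.0]     # change only the largest value
print(empirical_quantile(sample_with_one_extreme, 0.999))
```

Changing a single observation moves the "99.9% threshold" from about 50 to about 78; with heavy-tailed data that single largest value is itself highly variable.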

In addition, once the “normal pattern of seasonal climate variation” is established, how is one to determine what is a 99.9% outlier? The exact details of how this is done make a big difference. I’m not sure I see a clear and clean way to do it, particularly when the seasonal data has been “adjusted for changes in latitude and altitude”. That implies that they are not using anomalies but absolute values, and that always makes things stickier. But they don’t say how they plan to do it …
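For comparison, here is the anomaly approach: subtract each calendar month's base-period mean (its climatology) from the station's own values. A constant offset between two stations, such as one caused by altitude, then cancels without any explicit latitude or altitude adjustment. The base period, the data layout and the 6.5 °C offset are illustrative assumptions.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Tuple

Obs = Tuple[int, int, float]  # (year, month 1-12, monthly mean temperature)


def monthly_anomalies(obs: List[Obs], base_start: int, base_end: int) -> List[Obs]:
    """Subtract each calendar month's base-period mean (its climatology)."""
    by_month: Dict[int, List[float]] = defaultdict(list)
    for year, month, temp in obs:
        if base_start <= year <= base_end:
            by_month[month].append(temp)
    climatology = {m: fmean(v) for m, v in by_month.items()}
    # Months with no base-period data are skipped rather than guessed.
    return [(y, m, t - climatology[m]) for y, m, t in obs if m in climatology]


low = [(2000, 1, 5.0), (2001, 1, 6.0), (2000, 7, 20.0), (2001, 7, 21.0)]
high = [(y, m, t - 6.5) for y, m, t in low]   # same weather, uniformly colder site
print(monthly_anomalies(low, 2000, 2001))
print(monthly_anomalies(high, 2000, 2001))
```

Both stations yield identical anomalies even though their absolute temperatures differ by 6.5 °C.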

In closing, I bring all of this up, not to oppose the BEST crew or make them wrong or pick on errors, but to assist them in making their work bulletproof. I am overjoyed that they are doing what they are doing. I bring this up to make their product better by crowd-sourcing ideas and objections to how they plan to analyze the data.

Accordingly, I will ask the assistance of the moderators in politely removing any posts talking about whether BEST will or won’t come up with anything good, or of their motives, or whether the eventual product will be useful, or the preliminary results, or anything extraneous. Just paste in “Snipped – OT” to mark them, if you’d be so kind.

This thread is about how to do the temperature analysis properly, not whether to do it, or the doer’s motives, or whether it is worth doing. Those are all good questions, but not for this thread. Please take all of that to a general thread regarding BEST. This thread is about the mathematical analysis and transformation of the data, and nothing else.


158 Comments

crosspatch

March 23, 2011 12:17 am

Temperatures are odd things. Climate conditions can vary considerably over only a few miles of geographical distance. The weather in Truckee, California is a lot different than the weather at Reno, Nevada and they are only about 40 miles apart.
The problem is that regions often have several microclimates. I might lump Half Moon Bay with Pacifica, but would lump neither with San Mateo though San Mateo with Redwood City might make perfect sense. The point is that any “homogenization” of temperatures must take regional geography into account and not simply distance on a map.
Reply: crosspatch, your comment is perfectly intelligible to me and others in the Bay Area, but we have quite an International audience. San Mateo and Redwood City just might be a bit obscure to our readers on a few continents. ~ ctm.

Brian H

March 23, 2011 12:42 am

There’s a tautology problem here. Only by examination of the outliers and their relation to the entire data set can you determine whether they have important information to convey. Discarding them eliminates that possibility.

Sean

March 23, 2011 12:43 am

I hope the raw data and the programming are made available online under as open a license/agreement as possible. It is most important that BEST make that a priority. They will not do everything right in detail, simply because honest people do not actually agree on what the right maths is in detail. The key point is that we can see what was done and play with the code/data to see if the results are robust to other assumptions.

jorgekafkazar

March 23, 2011 12:45 am

Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance.

BigWaveDave

March 23, 2011 12:53 am

What good is temperature data without corresponding pressure and humidity?

John Kehr

March 23, 2011 12:59 am

When I read their descriptions I get a different understanding than they do. For instance, let’s compare Sacramento and South Lake Tahoe. They are ~120 miles apart. If the temperature data for Sacramento is colder than South Lake Tahoe for one reading during the day, then it is flagged and thrown out. Or the reverse is thrown out.
This ties back into Finland having readings that are 20C instead of -20C in the middle of winter because a negative got missed.
Lots of testing will be needed for the algorithms to get it correct, but such correlated behavior is a good way to get rid of bad data. I agree with not inserting an imaginary number as weather will skew things, but throwing it away is the right thing to do.
I like the idea to get rid of the gridding. There is so much empty space without any valid data. Why is that filled in? That is truly meaningless.
One thing that isn’t clear to me is if they will continue to use the min/max temperatures only. With modern logging that is a foolish limitation. The problem with that is comparing to old records that only have that resolution, but that is where the amount of error enters. Older records with only min/max would have higher error than modern data that had 24 points per day for the daily average. Correlating the days with 24 data points to the min/max days would be interesting to see.
There is more, but this is already getting too long for a comment.
John Kehr
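To see why the min/max versus 24-reading question matters for error estimates, compare the two "daily averages" on a synthetic diurnal cycle. The cycle (10 °C baseline with a sine-shaped 8 °C afternoon peak between 09:00 and 17:00) is purely illustrative; real cycles differ by site and season.

```python
import math
from statistics import fmean

# Synthetic diurnal cycle: 10 deg C baseline with a sine-shaped afternoon peak
# between 09:00 and 17:00 (hours 0-23). Illustrative only.
hourly = [10.0 + (8.0 * math.sin(math.pi * (h - 9) / 8) if 9 <= h <= 17 else 0.0)
          for h in range(24)]

minmax_mean = (min(hourly) + max(hourly)) / 2   # what a min/max-only record gives
hourly_mean = fmean(hourly)                      # what 24 readings a day give
print(f"(Tmin+Tmax)/2 = {minmax_mean:.2f}, 24-hour mean = {hourly_mean:.2f}")
```

For this asymmetric cycle the two differ by more than 2 °C, so a min/max-only record and an hourly record of the same day are not directly interchangeable.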

Willis Eschenbach

Author

March 23, 2011 1:10 am

jorgekafkazar says:
March 23, 2011 at 12:45 am

Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance.

I disagree entirely. The BEST effort is just beginning, how could it be “doomed” to anything this early in the game? The procedures are new, and will certainly be changed at some point. In a battle, the first casualty is always the battle plan … they will soldier on.
w.

Willis Eschenbach

Author

March 23, 2011 1:13 am

BigWaveDave says:
March 23, 2011 at 12:53 am

What good is temperature data without corresponding pressure and humidity?

What good is steak and eggs without beer? It’s not a full meal by any means, but it’s better than nothing.
Yes, it would be nice to have pressure and humidity as well … but let’s take them one piece at a time, get that piece right as best we know how, and then move on to the next piece.
Thanks,
w.

Tenuc

March 23, 2011 1:20 am

I can’t understand why climate scientists always seem to want to homogenise data?
They are aware that climate is ultimately driven by deterministic chaos and must also be aware that any homogenisation process will destroy information and be a handicap to intuition.
It is also deplorable that the BEST team seek to hide the adjustments they make, rather than letting the raw data stand, then adding single adjustment layers for each massaging of the base data. Their current approach will make it much harder to achieve their fifth objective…
“To provide an open platform for further analysis by publishing our complete data and software code as well as tools to aid both professional and amateur exploration of the data.”

Thomas L

March 23, 2011 1:24 am

At any level of usefulness, real-world statisticians with relevant experience need to be brought in early in the process. I have many years of experience in working with small datasets (insurance actuary), but I could easily have missed this.
It’s critical to be able to start with as much raw data as possible. And from open source work, to make that raw data static and available. None of this moving averages for 1934, where by moving average, I mean that the average for that year (and most years) has many different values over time. Less than 5K of data per year per station for early years. About 50K per station per year if we have hourly data. 10,000 stations, 100 years, 50GB for entire dataset, 500MB (less than one CD) for annual updates. Only raw fixes I’d add would be minus signs where the raw data clearly omitted them. One or two CDs for metadata, again static (any adjustments to prior data need date, person, method, reason, and do not modify any previous data, just give suggested/recommended adjustments to data).
As you point out, their definition of homogenization screams to me “Don’t do that!”. If needed, do it late and separately, with full documentation (before, after, reasons, full computer code, including source, compiler/version, hardware, running time, enough to exactly reproduce results).
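The "static raw data plus documented adjustments" scheme described here maps onto a simple data structure: raw values in a read-only mapping, and each adjustment an immutable record carrying date, person, method and reason, applied only to a copy. The station id, values and adjustment below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple

Key = Tuple[str, int, int]  # (station id, year, month)


@dataclass(frozen=True)
class Adjustment:
    """A documented, suggested change. It never overwrites the raw value."""
    key: Key
    delta: float
    made_on: date
    person: str
    method: str
    reason: str


def adjusted_view(raw: Mapping[Key, float], log: List[Adjustment]) -> Dict[Key, float]:
    """Build a new, adjusted copy; the raw mapping itself is never modified."""
    out = dict(raw)
    for adj in log:
        out[adj.key] += adj.delta
    return out


# Hypothetical station, values and adjustment, for illustration only.
raw = MappingProxyType({("STN001", 1934, 7): 24.1})   # read-only view of the raw data
log = [Adjustment(("STN001", 1934, 7), -0.3, date(2011, 3, 23),
                  "A. Analyst", "documented station move", "site relocated")]
print(raw[("STN001", 1934, 7)], adjusted_view(raw, log)[("STN001", 1934, 7)])
```

Dropping an entry from `log` reproduces the data without that adjustment, which is exactly the "back it out" property the post asks for.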

Bob from the UK

March 23, 2011 1:30 am

I don’t see how a temperature record can be made without using approximations which may be questionable.
I think the key therefore is to identify these approximations/adjustments, make them transparent. Adjustments and approximations which significantly affect the record, upwards or downwards trends need to be examined, eg extrapolation over large areas. I would expect most of them to balance out over time. The most significant problem I’m aware of is the “Urban Heat Island”, however this is not a processing problem, but a measurement problem.

Paul

March 23, 2011 1:53 am

Speaking with my physicist cap on, rather than adjusting data, wouldn’t it be better to increase the error bars instead? I get a bad feeling when I see data being adjusted.
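A minimal sketch of widening the error bar instead of adjusting the value: keep the suspect reading unchanged but give it a larger uncertainty, and combine readings with an inverse-variance weighted mean. The readings and the inflated σ of 5.0 are arbitrary illustrations, and the weighting assumes independent Gaussian errors, which is itself an assumption the post questions.

```python
import math
from typing import List, Tuple


def weighted_mean(values: List[float], sigmas: List[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


readings = [14.2, 14.5, 14.1, 19.0]      # last reading looks suspect
sigmas = [0.5, 0.5, 0.5, 0.5]
inflated = sigmas[:3] + [5.0]            # keep the suspect value, but widen its error bar
print(weighted_mean(readings, sigmas))
print(weighted_mean(readings, inflated))
```

The suspect value stays in the dataset, contributes little to the mean, and the combined uncertainty grows rather than being understated.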

Jack Hughes

March 23, 2011 1:59 am

“local outliers are replaced with the basic behavior of the local group”
So Einstein’s ideas will be replaced with the basic averaged ideas of the Swiss people. Newton’s Laws will be replaced with the average thinking of people who sit under apple trees.

dahuang

March 23, 2011 2:01 am

I think any project aiming to obtain the global mean temperature should first refute Essex et al. (2007), otherwise I see little value in such efforts. BEST is no exception.
Ref. http://www.uoguelph.ca/~rmckitri/research/globaltemp/globaltemp.html

Scottish Sceptic

March 23, 2011 2:02 am

I fear you are very much concentrating on the wrong problem. Even if you produce a statistically “perfect” method of analysis, the megasaurus in the dunny is that the raw local temperature experienced by the Met Stations has changed due to several factors including macro urban heating, micro changes to surfaces and land use around the station, and many stations were moved closer to buildings in the 1970s+ in order to facilitate automation.
What we really need is for people to get off their seats in their ivory towers of academia, and go and visit each and every one of these sites. To characterise the site both now and historically in terms of their relationship to buildings and surfaces like tarmac. Only once you understand the micro-urbanisation changes can you really start considering macro changes like overall global temperature otherwise it is a totally meaningless figure.
And as a lot of these changes will not have been documented, and many of those who are the primary source of information for the 1970s are either retiring or dying, unless someone gets off their backside, puts together the millions upon millions that such a project will actually take, and gets going … this information will be lost for ever and we will never know what actually happened to global temperatures last century.

Saul Jacka

March 23, 2011 2:05 am

I am astounded at the homogenisation suggestion. Any serious statistical analysis should start by looking at the data closely and querying anything suspicious. However, the way to deal with suspicious data points is either (after long consideration) to treat them as missing data or else to build a statistical model with observation error probabilities in it (i.e. not necessarily the usual Gaussian measurement noise, but different error distributions or even positive probabilities of measurements unrelated to the “true” values).
Homogenisation is a major source of overconfidence in statistically bland outcomes.
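One concrete form of the "observation error probabilities" model described above: each reading is either the true value plus Gaussian noise (probability 1 − ε) or garbage unrelated to the true value, drawn from a uniform range (probability ε). The sketch estimates the true value by expectation-maximization (EM) with σ and ε held fixed; the uniform range of −60 to 60 °C, ε = 0.05, σ = 0.3 and the readings are all illustrative assumptions.

```python
import math
from typing import List, Tuple


def _garbage_probs(xs: List[float], mu: float, sigma: float, eps: float,
                   garbage_density: float) -> List[float]:
    """Posterior probability that each reading came from the 'garbage' component."""
    probs = []
    for x in xs:
        good = (1 - eps) * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        bad = eps * garbage_density
        probs.append(bad / (good + bad))
    return probs


def robust_location(xs: List[float], sigma: float, eps: float = 0.05,
                    lo: float = -60.0, hi: float = 60.0,
                    iters: int = 50) -> Tuple[float, List[float]]:
    """EM estimate of a true value under a 'Gaussian noise or garbage' model.

    Each reading is N(mu, sigma^2) with probability 1 - eps, or uniform on [lo, hi]
    with probability eps. sigma and eps are held fixed. Readings are never
    altered; suspect ones are simply down-weighted.
    """
    density = 1.0 / (hi - lo)
    mu = sorted(xs)[len(xs) // 2]                              # start near the median
    for _ in range(iters):
        p_bad = _garbage_probs(xs, mu, sigma, eps, density)    # E-step
        w = [1 - p for p in p_bad]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)      # M-step
    return mu, _garbage_probs(xs, mu, sigma, eps, density)


readings = [11.8, 12.1, 12.0, 11.9, 12.2, -12.0]   # last one: a sign error?
mu, p_bad = robust_location(readings, sigma=0.3)
print(f"estimate {mu:.2f}; P(garbage) = {[round(p, 3) for p in p_bad]}")
```

The sign-flipped reading gets a garbage probability near 1 and stops influencing the estimate, without any value being replaced.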

Stephen Richards

March 23, 2011 2:22 am

I agree totally with you, Willis, BUT I do want to see the raw data plotted as well as the processed data. I hate adjustments to data, any data. It ‘feels’, well, just completely wrong.

John Marshall

March 23, 2011 2:23 am

This research relies on the assumption that temperature is a good measure of AGW. It is not.
All the warmist assumptions are based on the idea of equilibrium of energy exchange. The planet does not achieve equilibrium with temperature which is one reason why climates change, probably the main reason.
Atmospheric temperatures will also change far more quickly than those of the oceans which is why ocean temperatures are far more important an indicator of heat exchanges.

joe

March 23, 2011 2:31 am

But will their work be open to the public? Can we have code and data? Can we therefore replace some algorithms?

NikFromNYC

March 23, 2011 2:44 am

What mechanisms might introduce damaging systematic errors instead of relatively benign random errors if outliers are left in, tossed out or massaged in? Or is it more a matter of error bars not being accurate instead of just the trend being off? Knowledge of error types seems important here versus placing too much reliance on blind statistics that conform to some lofty idea of elegance.
Also of great concern to me, since I like to harp on very old single-site records, is how to avoid lazily cutting the global average off before 1880 as GISS does! There are three USA records that are fully continuous back to around 1820, and about a dozen in Europe back to the 1700s. There are many dozens that go back to 1830 instead of just 1880. Extending the slope back another half century would help resolve the hockey stick debate.

Keith Wallis

March 23, 2011 2:51 am

As soon as I saw the third goal in the piece I had the same concern as you, Willis. BEST (or anyone else) can’t just make the assumption that an outlier is an error and ‘homogenize’ it. It may be an error, it may be due to a microclimate, a very localised weather phenomenon (e.g. a fog-prone valley), etc.
I’m afraid this stuff can’t just be waved away, as it’s the way these very differences over small areas have been handled that have caused quite a few of the issues with the existing global temperature sets (Eureka as an example, perhaps). It’s going to take a heck of a lot of hard yards, but that’s what must be done in order to get it ‘right’. Blanket processing algorithms won’t wash – it’ll take human eyes to look at and understand every significant local variation before a more generalised rule can be written to handle them automatically.
Yes, it’s an enormous amount of work for a lot of people, but nobody ever said this was easy. There are no short-cuts.

vukcevic

March 23, 2011 2:51 am

ctm Crosspatch makes a good point.
For our international audience, here is an example that I have looked into recently. Rainfall is one of the principal climate parameters. Oxford and Cambridge, two of the UK’s university cities, are only 65 miles (~105 km) apart, and their geographical features are very similar. It is likely that they have records as accurate as you can find anywhere in the world. Not only is there a considerable difference in the amount of rainfall, but even the trends are different for two not-so-distant places. http://www.vukcevic.talktalk.net/Oxbridge.htm
Reply: I wasn’t disagreeing with his point. He lives in my area and was using examples I thought might just be a bit obscure. ~ ctm

David L

March 23, 2011 2:59 am

Any high school student can tell you that the “average” is only an acceptable measure of central tendency for a normal distribution. If it’s non-normal then the median or mode is appropriate. Or one can do a Box-Cox analysis on the distribution and apply a transform to make it normal. Why do learned scientists keep forgetting this fact?
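Both remedies mentioned here are short to write down. The sketch compares mean and median on a right-skewed toy sample and implements the Box-Cox transform, (x^λ − 1)/λ, or log x when λ = 0. Box-Cox is only defined for positive values, so temperatures in °C would need shifting first (for example to kelvin); λ is chosen by hand here, whereas SciPy's `scipy.stats.boxcox` can estimate it by maximum likelihood.

```python
import math
from statistics import fmean, median
from typing import List


def box_cox(xs: List[float], lam: float) -> List[float]:
    """Box-Cox transform; requires strictly positive data."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox needs positive values")
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


skewed = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 20.0]   # one long right tail
print(f"mean {fmean(skewed):.2f}, median {median(skewed):.2f}")
logged = box_cox(skewed, 0.0)
print(f"after log transform: mean {fmean(logged):.2f}, median {median(logged):.2f}")
```

On the raw sample the single large value drags the mean to more than twice the median.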

Ian W

March 23, 2011 3:06 am

Willis Eschenbach says:
March 23, 2011 at 1:13 am
BigWaveDave says:
March 23, 2011 at 12:53 am
What good is temperature data without corresponding pressure and humidity?
What good is steak and eggs without beer? It’s not a full meal by any means, but it’s better than nothing.
Yes, it would be nice to have pressure and humidity as well … but let’s take them one piece at a time, get that piece right as best we know how, and then move on to the next piece.
Thanks,
w.
Hello Willis,
I think that you missed the point. Temperature is not a measure of energy yet it is an increase in ‘trapped’ energy due to GHG that is being claimed to have changed the Earth’s energy budget.
The amount of energy needed to raise the temperature of dry polar air one degree centigrade is significantly less (eighty times less?) than the amount of energy required to raise humid tropical air one degree centigrade. This is due to the enthalpy of humid air.
see http://www.engineeringtoolbox.com/enthalpy-moist-air-d_683.html
When the polar vortices expand, as they have done recently with the equatorward movements of the jetstreams, the atmospheric humidity balance alters. If temperatures are just averaged without regard to the humidity changes then for the same amount of energy there can be a significant rise in temperature.
It would really be a good idea to decide if the intent is to measure energy budget or not.
Then when you are measuring the correct variable, you can have concerns on whether a cool site in a valley on the northern side of the mountains is ‘homogenized’ because the 3 sites 50 miles away on the southern side of the mountains are so much warmer, and then the outliers in their temperatures removed to make the temperature pattern look nice and Gaussian.
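The point that temperature alone is not energy can be illustrated with the standard psychrometric approximation for the specific enthalpy of moist air, h ≈ 1.006·t + x·(2501 + 1.86·t) kJ per kg of dry air, where t is in °C and x is the humidity ratio (kg water vapour per kg dry air). The constants are commonly quoted values (some sources use slightly different ones), and the two air states are hypothetical.

```python
def moist_air_enthalpy(t_c: float, x: float) -> float:
    """Specific enthalpy of moist air, kJ per kg of dry air.

    t_c: temperature in deg C; x: humidity ratio, kg water vapour per kg dry air.
    Uses commonly quoted psychrometric constants (1.006, 2501, 1.86).
    """
    return 1.006 * t_c + x * (2501.0 + 1.86 * t_c)


before = moist_air_enthalpy(25.0, 0.015)   # warm, humid
after = moist_air_enthalpy(26.0, 0.012)    # 1 deg C warmer, but drier
print(f"{before:.1f} -> {after:.1f} kJ/kg")
```

The second state reads 1 °C warmer yet holds less energy per kilogram, so averaging temperatures alone can move in the opposite direction to the energy content.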

Mike Jonas

Editor

March 23, 2011 3:08 am

I agree with Willis re homogenisation and outliers. Where I live in NSW Australia, a few degrees shift in wind direction is the difference between cold from the Snowy Mountains and stinking hot from the interior. Temperatures can vary wildly over short distances.
I looked up stations near Moss Vale on the Bureau of Meteorology website http://www.bom.gov.au, and there are three of them with some temperatures for January 2011 that are NOT marked “Not quality controlled or uncertain, or precise date unknown”. Their temperatures for Jan 3, 7-10, 13, 16 (that’s all the Jan days they have in common) are as follows:
MOSS VALE (HOSKINS STREET) 26.8, 28.0, 31.5, 32.0, 25.6, 21.0, 23.6
ALBION PARK (WOLLONGONG AIRPORT) (38km away) 21.0, 27.0, 27.0, 27.0, 27.0, 25.0, 28.0
KIAMA BOWLING CLUB (46km away) 28.0, 35.0, 32.0, 24.0, 23.5, 24.0, 22.6
The 3 stations are in a nearly straight line, so Albion Park is very close to Kiama and at similar altitude (Moss Vale is about 2000ft higher).
I would contend that the 3 stations bear little relationship to each other, and would be useless for estimating their neighbours’ temperatures.
For example, from Jan 7-9, the temperature changes at the 3 stations were +3.5/+0.5, 0.0/0.0, -3.0/-8.0 resp. A 15 deg C difference over just 2 days!
These 3 stations are in populated areas in a developed country, yet there are large gaps in the data, which itself looks pretty suspect in places.
I reckon BEST have a difficult job ahead of them, but I would argue strongly in favour of –
1. – not making up any data
2. – not dropping outliers, and
3. – reflecting missing or dodgy-looking data in uncertainty ranges.
#1 and #3 seem pretty straightforward. I think #2 is reasonable, because genuine outliers can occur, so dropping outliers is little different to making data up. If the BEST team think they have some dodgy outliers, surely the best (no pun intended) thing to do is to increase the uncertainty ranges at those points.