Not Whether, but How to Do The Math

Guest Post by Willis Eschenbach

The Berkeley Earth Surface Temperature (BEST) team is making a new global climate temperature record. Hopefully this will give us a better handle on what’s going on with the temperature.

BEST has put out a list of the four goals for their mathematical methods (algorithms). I like three of those goals a lot. One I’m not so fond of. Here are their goals:

1)  Make it possible to exploit relatively short (e.g. a few years) or discontinuous station records. Rather than simply excluding all short records, we prefer to design a system that allows short records to be used with a low – but non‐zero – weighting whenever it is practical to do so.

2)  Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may depend on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.

3)  Place empirical homogenization on an equal footing with other averaging. We distinguish empirical homogenization from evidence‐based homogenization. Evidence‐based adjustments to records occur when secondary data and/or metadata is used to identify problems with a record and propose adjustments. By contrast, empirical homogenization is the process of comparing a record to its neighbors to detect undocumented discontinuities and other changes. This empirical process performs a kind of averaging as local outliers are replaced with the basic behavior of the local group. Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.

4)  Provide uncertainty estimates for the full time series through all steps in the process.
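Goal 2 mentions the change in grid cell area with latitude. As a rough illustration (my own sketch, assuming a spherical Earth of radius 6371 km and a hypothetical 5° × 5° grid), a cell near 60°N covers less than half the area of a cell of the same angular size at the equator:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def cell_area_km2(lat_south_deg, lat_north_deg, width_deg):
    """Area of a latitude-longitude grid cell on a spherical Earth."""
    dlon = math.radians(width_deg)
    band = math.sin(math.radians(lat_north_deg)) - math.sin(math.radians(lat_south_deg))
    return EARTH_RADIUS_KM ** 2 * dlon * band

equator = cell_area_km2(0, 5, 5)     # 5x5 degree cell at 0-5 N
high_lat = cell_area_km2(60, 65, 5)  # same-sized cell at 60-65 N
print(f"0-5N:   {equator:,.0f} km^2")
print(f"60-65N: {high_lat:,.0f} km^2")
print(f"ratio:  {high_lat / equator:.2f}")
```

Any averaging scheme built on such cells has to weight them by area, which is one of the choices BEST hopes to avoid.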

Using short series, avoiding gridding, and uncertainty estimates are all great goals. But the whole question of “empirical homogenization” is fraught with hidden problems and traps for the unwary.

The first of these is that nature is essentially not homogeneous. It is pied and dappled, patched and plotted. It generally doesn’t move smoothly from one state to another, it moves abruptly. It tends to favor Zipf distributions, which are about as non-normal (i.e. non-Gaussian) as a distribution can get.

So I object to the way that the problem is conceptualized. The problem is not that the data requires “homogenization”; that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries, and homogenizing the data is not the answer to that.

This is particularly true since (if I understand what they’re saying) they have already told us how they plan to deal with discontinuities. The plan, which I’ve been pushing for some time now, is to simply break the series apart at the discontinuities and treat it as two separate series. And that’s a good plan. They say:

Data split: Each unique record was broken up into fragments having no gaps longer than 1 year. Each fragment was then treated as a separate record for filtering and merging. Note however that the number of stations is based on the number of unique locations, and not the number of record fragments.
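A minimal sketch of that data-split rule, assuming observation times are stored as month counts and that a “gap longer than 1 year” means more than 12 months between consecutive readings (BEST doesn’t say exactly how they count it). The function name and the record are hypothetical:

```python
def split_at_gaps(month_indices, max_gap_months=12):
    """Split a sorted list of observation times (in months) into fragments
    wherever consecutive observations are more than max_gap_months apart."""
    fragments = []
    current = []
    for t in month_indices:
        if current and t - current[-1] > max_gap_months:
            fragments.append(current)  # close the fragment before the gap
            current = []
        current.append(t)
    if current:
        fragments.append(current)
    return fragments

# Hypothetical record: Jan-Jun 1950, then nothing until Jan 1953.
record = [1950 * 12 + m for m in range(6)] + [1953 * 12 + m for m in range(3)]
print([len(f) for f in split_at_gaps(record)])  # [6, 3]
```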

So why would they deal with “empirical discontinuities” by adjusting them, and deal with other discontinuities in a totally different manner?

Next, I object to the plan that they will “incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.” This will make it very difficult to back it out of the calculations to see what effect it has had. It will also hugely complicate the question of the estimation of error. For any step-wise process, it is crucial to separate the steps so the effect of each single step can be understood and evaluated.

Finally, let’s consider the nature of the “homogenization” process they propose. They describe it as a process whereby:

… local outliers are replaced with the basic behavior of the local group

There are a number of problems with that.

First, temperatures generally follow a Zipf distribution (a distribution with a large excess of extreme values). As a result, what would definitely be “extreme outliers” in a Gaussian distribution are just another day in the life in a Zipf distribution. A very unusual and uncommon temperature in a Gaussian distribution may be a fairly common and mundane temperature in a Zipf distribution. If you pull those so-called outliers out of the dataset, or replace them with a local average, you no longer have temperature data – you have Gaussian data. So you have to be real, real careful before you declare an outlier. I would certainly look at the distributions before and after “homogenization”, to see if the Zipf nature of the distribution has disappeared … and if so, I’d reconsider my algorithm.
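Here is one way to run that before-and-after check. It’s a sketch under stated assumptions: a Laplace (double-exponential) distribution stands in for fat-tailed temperature data (it is not a Zipf distribution), and “replace with the local group” is approximated by swapping anything beyond 3 standard deviations for the sample median. Excess kurtosis, which is zero for a Gaussian, is the yardstick:

```python
import random
import statistics

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / var ** 2 - 3.0

def replace_outliers(xs, k=3.0):
    """Crude stand-in for 'homogenization': values more than k SDs from the
    mean are replaced by the median."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    med = statistics.median(xs)
    return [med if abs(x - m) > k * s else x for x in xs]

random.seed(42)
# Laplace draws: exponential magnitude with a random sign.
raw = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(20000)]
cleaned = replace_outliers(raw)
print(f"excess kurtosis before: {excess_kurtosis(raw):.2f}")
print(f"excess kurtosis after:  {excess_kurtosis(cleaned):.2f}")
```

With these settings the excess kurtosis falls sharply after replacement; the tails have been shaved toward Gaussian even though only about 1.5% of the values were touched.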

Second, while there is a generally high correlation between temperature datasets out to 1200 km or so, that’s all that it is. A correlation. It is not a law. For any given station, there will often be nearby datasets that have very little correlation. In addition, for each of the highly correlated pairs, there will be a number of individual years where the variation in the two datasets is quite large. So despite high correlation, we cannot just assume that any record that disagrees with the “local group” is incorrect, as the BEST folks seem to be proposing.

Third, since nature itself is almost “anti-homogeneous”, full of abrupt changes and frequent odd occurrences and outliers, why would we want to “homogenize” a dataset at all? If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.

Fourth, although the temperature data is highly correlated out for a long distance, the same is not true of the trend. See my post on Alaskan trends regarding this question. Since the trends are not correlated, adjustment based on neighbors may well introduce a spurious trend. If the “basic behavior of the local group” is trending upwards, and the data being homogenized is trending horizontally, both may indeed be correct, and homogenization will destroy that …
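To see how two records can be highly correlated yet have different trends, here is a synthetic example (hypothetical numbers, not real station data): both stations share the same year-to-year weather, but one has no trend and the other warms by 2°C per century:

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

random.seed(1)
years = list(range(100))
weather = [random.gauss(0.0, 1.0) for _ in years]  # shared year-to-year weather
flat = [w + random.gauss(0.0, 0.2) for w in weather]  # no trend
rising = [w + 0.02 * t + random.gauss(0.0, 0.2) for t, w in zip(years, weather)]  # +2 C/century

print(f"correlation of annual values: {pearson(flat, rising):.2f}")
print(f"trend, flat station:   {ols_slope(years, flat) * 100:+.2f} C/century")
print(f"trend, rising station: {ols_slope(years, rising) * 100:+.2f} C/century")
```

The annual values come out strongly correlated, yet the fitted trends differ by about 2°C per century. Adjusting the flat station toward its rising neighbour would import that trend.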

Those are some of the problems with “homogenization” that I see. I’d start by naming it something else. It does not describe what we wish to do to the data. Nature is not homogeneous, and neither should our dataset be homogeneous.

Then I’d use the local group, solely to locate unusual “outliers” or shifts in variance or average temperature.

But there’s no way I’d replace the putative “outliers” or shifts with the behavior of the “local group”. Why should I? If all you are doing is bringing the data in line with the average of the local group, why not just throw it out entirely and use the local average? What’s the advantage?

Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, and break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset, what’s the point of that?
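A minimal sketch of that alternative, with my own assumptions filled in: the “local group” is summarized by the neighbour median, a point is flagged when its difference from that median is more than 5 robust standard deviations (MAD-based) from typical, and flagged points are dropped and the record split there. The threshold, function names and data are all hypothetical:

```python
import statistics

def flag_against_neighbours(target, neighbours, k=5.0):
    """Indices where `target` departs unusually far from the neighbour median.

    'Unusually far' = more than k robust SDs (1.4826 * MAD) away from the
    typical target-minus-neighbour difference. Flags only; changes nothing.
    """
    reference = [statistics.median(vals) for vals in zip(*neighbours)]
    diffs = [t - r for t, r in zip(target, reference)]
    centre = statistics.median(diffs)
    scale = 1.4826 * statistics.median([abs(d - centre) for d in diffs])
    if scale == 0:
        return []
    return [i for i, d in enumerate(diffs) if abs(d - centre) > k * scale]

def drop_and_split(series, bad_indices):
    """Drop flagged points and treat the pieces on either side as separate records."""
    bad = set(bad_indices)
    fragments, current = [], []
    for i, value in enumerate(series):
        if i in bad:
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(value)
    if current:
        fragments.append(current)
    return fragments

base = [10.0, 11.0, 12.0, 13.0, 12.0, 11.0, 10.0]  # hypothetical monthly values
neighbours = [base, [b + 0.2 for b in base], [b - 0.2 for b in base]]
target = [10.3, 10.8, 12.1, 28.0, 11.6, 11.2, 10.0]  # index 3 looks wrong
flagged = flag_against_neighbours(target, neighbours)
print(flagged)                          # [3]
print(drop_and_split(target, flagged))  # [[10.3, 10.8, 12.1], [11.6, 11.2, 10.0]]
```

Note that the neighbours are used only to find the suspect point; nothing from them is written into the target record.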

Lastly, a couple of issues with their quality control procedures. They say:

Local outlier filter: We tested for and flagged values that exceeded a locally determined empirical 99.9% threshold for normal climate variation in each record.


Regional filter: For each record, the 21 nearest neighbors having at least 5 years of record were located. These were used to estimate a normal pattern of seasonal climate variation. After adjusting for changes in latitude and altitude, each record was compared to its local normal pattern and 99.9% outliers were flagged.
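For concreteness, the neighbour selection in that regional filter might look something like the sketch below. The `Station` class, the haversine great-circle distance and the test data are my assumptions; BEST doesn’t say how distance is measured:

```python
import math
from dataclasses import dataclass

@dataclass
class Station:  # hypothetical minimal station record
    name: str
    lat: float
    lon: float
    years_of_record: float

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_neighbours(target, stations, k=21, min_years=5.0):
    """The k closest other stations with at least min_years of record."""
    candidates = [s for s in stations
                  if s is not target and s.years_of_record >= min_years]
    candidates.sort(key=lambda s: haversine_km(target.lat, target.lon, s.lat, s.lon))
    return candidates[:k]

# Hypothetical stations along the equator; odd-numbered ones have short records.
stations = [Station(f"s{i}", 0.0, 0.1 * i, 10.0 if i % 2 == 0 else 2.0) for i in range(60)]
group = nearest_neighbours(stations[0], stations)
print(len(group), group[0].name, group[-1].name)  # 21 s2 s42
```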

Again, I’d be real, real cautious about these procedures. Since the value in both cases is “locally determined”, there will certainly not be a whole lot of data for analysis. Determination of the 99.9% exceedance level, based solely on a small dataset of Zipf-distributed data, will have huge error margins. Overall, what they propose seems like a procedure guaranteed to convert a Zipf dataset into a Gaussian dataset, and at that point all bets are off …
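To put a number on that, here is a small simulation. The assumptions are mine: Laplace-distributed values as a fat-tailed stand-in, a nearest-rank quantile, and 300 values as roughly ten years of daily data for one calendar month. With only 300 values the empirical 99.9% point is simply the sample maximum, and it bounces around far more than it does with 10,000 values:

```python
import math
import random
import statistics

def empirical_quantile(xs, q):
    """Nearest-rank empirical quantile."""
    s = sorted(xs)
    rank = max(1, math.ceil(q * len(s)))
    return s[rank - 1]

def laplace(rng):
    """One draw from a unit Laplace distribution (a fat-tailed stand-in)."""
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0)

rng = random.Random(7)
true_q = math.log(500)  # exact 99.9% point of the unit Laplace distribution
spread = {}
for n in (300, 10_000):
    estimates = [empirical_quantile([laplace(rng) for _ in range(n)], 0.999)
                 for _ in range(100)]
    spread[n] = statistics.stdev(estimates)
    print(f"n={n:>6}: estimates {min(estimates):.2f} to {max(estimates):.2f} "
          f"(true {true_q:.2f}), SD {spread[n]:.2f}")
```

The true 99.9% point here is ln(500) ≈ 6.21; the question is how far a single station’s estimate can wander from it.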

In addition, once the “normal pattern of seasonal climate variation” is established, how is one to determine what is a 99.9% outlier? The exact details of how this is done make a big difference. I’m not sure I see a clear and clean way to do it, particularly when the seasonal data has been “adjusted for changes in latitude and altitude”. That implies that they are not using anomalies but absolute values, and that always makes things stickier. But they don’t say how they plan to do it …
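For comparison, here is a minimal sketch of the anomaly approach: subtract each calendar month’s base-period mean from every value in that month. The record layout, the base period and the two hypothetical stations (one 6.5°C cooler throughout, as a higher station might be) are assumptions. The constant offset vanishes without any explicit latitude or altitude adjustment:

```python
import math
import statistics
from collections import defaultdict

def monthly_anomalies(records, base_start, base_end):
    """records: list of (year, month, temp_c). Returns (year, month, anomaly)
    relative to each calendar month's mean over base_start..base_end."""
    base = defaultdict(list)
    for year, month, temp in records:
        if base_start <= year <= base_end:
            base[month].append(temp)
    climatology = {m: statistics.fmean(v) for m, v in base.items()}
    return [(y, m, t - climatology[m]) for y, m, t in records if m in climatology]

# Hypothetical valley station with a seasonal cycle and slow warming,
# plus a hill station that is 6.5 C colder at all times.
valley = [(y, m, 12.0 - 8.0 * math.cos(2 * math.pi * (m - 1) / 12) + 0.1 * (y - 2000))
          for y in range(2000, 2005) for m in range(1, 13)]
hill = [(y, m, t - 6.5) for y, m, t in valley]

a_valley = monthly_anomalies(valley, 2000, 2004)
a_hill = monthly_anomalies(hill, 2000, 2004)
print(max(abs(a[2] - b[2]) for a, b in zip(a_valley, a_hill)))
```

The printed maximum difference is at floating-point rounding level: in anomaly terms the two stations are identical.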

In closing, I bring all of this up, not to oppose the BEST crew or make them wrong or pick on errors, but to assist them in making their work bulletproof. I am overjoyed that they are doing what they are doing. I bring this up to make their product better by crowd-sourcing ideas and objections to how they plan to analyze the data.

Accordingly, I will ask the assistance of the moderators in politely removing any posts talking about whether BEST will or won’t come up with anything good, or of their motives, or whether the eventual product will be useful, or the preliminary results, or anything extraneous. Just paste in “Snipped – OT” to mark them, if you’d be so kind.

This thread is about how to do the temperature analysis properly, not whether to do it, or the doer’s motives, or whether it is worth doing. Those are all good questions, but not for this thread. Please take all of that to a general thread regarding BEST. This thread is about the mathematical analysis and transformation of the data, and nothing else.




Temperatures are odd things. Climate conditions can vary considerably over only a few miles of geographical distance. The weather in Truckee, California is a lot different than the weather at Reno, Nevada and they are only about 40 miles apart.
The problem is that regions often have several microclimates. I might lump Half Moon Bay with Pacifica, but would lump neither with San Mateo though San Mateo with Redwood City might make perfect sense. The point is that any “homogenization” of temperatures must take regional geography into account and not simply distance on a map.
Reply: crosspatch, your comment is perfectly intelligible to me and others in the Bay Area, but we have quite an international audience. San Mateo and Redwood City just might be a bit obscure to our readers on a few continents. ~ ctm.

Brian H

There’s a tautology problem here. Only by examination of the outliers and their relation to the entire data set can you determine whether they have important information to convey. Discarding them eliminates that possibility.


I hope the raw data and the programming are made available online under an open license/agreement. It is most important that BEST make that a priority. They will not do everything right in detail, simply because honest people do not actually agree on what the right maths is in detail. The key point is that we can see what was done and play with the code/data to see if the results are robust to other assumptions.


Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance.


What good is temperature data without corresponding pressure and humidity?

When I read their descriptions I get a different understanding than they do. For instance, let’s compare Sacramento and South Lake Tahoe. They are ~120 miles apart. If the temperature data for Sacramento is colder than South Lake Tahoe for one reading during the day, then it is flagged and thrown out. Or the reverse is thrown out.
This ties back into Finland having readings that are 20C instead of -20C in the middle of winter because a negative got missed.
Lots of testing will be needed for the algorithms to get it correct, but such correlated behavior is a good way to get rid of bad data. I agree with not inserting an imaginary number as weather will skew things, but throwing it away is the right thing to do.
I like the idea to get rid of the gridding. There is so much empty space without any valid data. Why is that filled in? That is truly meaningless.
One thing that isn’t clear to me is whether they will continue to use the min/max temperatures only. With modern logging that is a foolish limitation. The problem is comparing to old records that only have that resolution, but that is where the amount of error enters. Older records with only min/max would have higher error than modern data that had 24 points per day for the daily average. Correlating the 24-reading days with the min/max days would be interesting to see.
There is more, but this is already getting too long for a comment.
John Kehr
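To illustrate why min/max records carry extra error, here is a hypothetical diurnal cycle (a short afternoon warm peak on a long cool night; the shape is assumed, not measured). The min/max midpoint and the true 24-reading mean disagree by nearly 3°C:

```python
import math

# Hypothetical diurnal cycle (deg C): a long cool night at 12 C with a
# short warm afternoon peak of +10 C centred on 14:00.
temps = [12.0 + 10.0 * math.exp(-((h - 14) / 3.0) ** 2) for h in range(24)]

hourly_mean = sum(temps) / len(temps)        # 24-reading daily mean
minmax_mean = (min(temps) + max(temps)) / 2  # traditional (Tmin + Tmax) / 2
print(f"24-hour mean: {hourly_mean:.2f} C")
print(f"min/max mean: {minmax_mean:.2f} C")
print(f"difference:   {minmax_mean - hourly_mean:.2f} C")
```

A symmetric daily cycle would make the two agree; the gap depends entirely on the shape of the day, which is exactly what min/max records can’t tell you.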

Willis Eschenbach

jorgekafkazar says:
March 23, 2011 at 12:45 am

Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance.

I disagree entirely. The BEST effort is just beginning, how could it be “doomed” to anything this early in the game? The procedures are new, and will certainly be changed at some point. In a battle, the first casualty is always the battle plan … they will soldier on.

Willis Eschenbach

BigWaveDave says:
March 23, 2011 at 12:53 am

What good is temperature data without corresponding pressure and humidity?

What good is steak and eggs without beer? It’s not a full meal by any means, but it’s better than nothing.
Yes, it would be nice to have pressure and humidity as well … but let’s take them one piece at a time, get that piece right as best we know how, and then move on to the next piece.


I can’t understand why climate scientists always seem to want to homogenise data?
They are aware that climate is ultimately driven by deterministic chaos and must also be aware that any homogenisation process will destroy information and be a handicap to intuition.
It is also deplorable that the BEST team seek to hide the adjustments they make, rather than letting the raw data stand and then adding a single adjustment layer for each massaging of the base data. Their current approach will make it much harder to achieve their fifth objective…
“To provide an open platform for further analysis by publishing our complete data and software code as well as tools to aid both professional and amateur exploration of the data.”

Thomas L

At any level of usefulness, real-world statisticians with relevant experience need to be brought in early in the process. I have many years of experience in working with small datasets (insurance actuary), but I could easily have missed this.
It’s critical to be able to start with as much raw data as possible and, as in open-source work, to make that raw data static and available. None of these moving averages for 1934, where by “moving average” I mean that the average for that year (and most years) has taken many different values over time. Less than 5 KB of data per year per station for early years; about 50 KB per station per year if we have hourly data. 10,000 stations and 100 years gives 50 GB for the entire dataset, and 500 MB (less than one CD) for annual updates. The only raw fixes I’d add would be minus signs where the raw data clearly omitted them. One or two CDs for metadata, again static (any adjustments to prior data need date, person, method and reason, and must not modify any previous data; just give suggested/recommended adjustments to data).
As you point out, their definition of homogenization screams to me “Don’t do that!”. If needed, do it late and separately, with full documentation (before, after, reasons, full computer code, including source, compiler/version, hardware, running time, enough to exactly reproduce results).
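The missing-minus-sign fix (like the Finland example mentioned earlier in the thread) can at least be screened for automatically. A minimal sketch, assuming a neighbour mean is available for each reading and a 5°C tolerance (both hypothetical); it only flags candidates for a person to check and doesn’t change anything:

```python
def sign_flip_suspects(values, neighbour_means, tolerance=5.0):
    """Indices of readings that are far from the neighbour mean but would be
    close to it with the sign flipped (e.g. 20.0 recorded for -20.0)."""
    suspects = []
    for i, (v, ref) in enumerate(zip(values, neighbour_means)):
        if abs(v - ref) > tolerance and abs(-v - ref) <= tolerance:
            suspects.append(i)
    return suspects

# Hypothetical mid-winter readings and the mean of nearby stations (deg C).
winter = [-18.5, -21.0, 20.0, -19.5]
neighbours = [-18.0, -20.0, -21.0, -19.0]
print(sign_flip_suspects(winter, neighbours))  # [2]
```

Readings near 0°C can’t be screened this way, since flipping the sign barely moves them; that’s a limitation of the sketch.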

Bob from the UK

I don’t see how a temperature record can be made without using approximations which may be questionable.
I think the key therefore is to identify these approximations/adjustments, make them transparent. Adjustments and approximations which significantly affect the record, upwards or downwards trends need to be examined, eg extrapolation over large areas. I would expect most of them to balance out over time. The most significant problem I’m aware of is the “Urban Heat Island”, however this is not a processing problem, but a measurement problem.


Speaking with my physicist cap on, rather than adjusting data, wouldn’t it be better to increase the error bars instead? I get a bad feeling when I see data being adjusted.

Jack Hughes

“local outliers are replaced with the basic behavior of the local group”
So Einstein’s ideas will be replaced with the basic averaged ideas of the Swiss people. Newton’s Laws will be replaced with the average thinking of people who sit under apple trees.

I think any project aiming to obtain the global mean temperature should first refute Essex et al. (2007), otherwise I see little value in such efforts. BEST is no exception.

I fear you are very much concentrating on the wrong problem. Even if you produce a statistically “perfect” method of analysis, the megasaurus in the dunny is that the raw local temperature experienced by the Met Stations has changed due to several factors including macro urban heating, micro changes to surfaces and land use around the station, and many stations were moved closer to buildings in the 1970s+ in order to facilitate automation.
What we really need is for people to get off their seats in their ivory towers of academia, and go and visit each and every one of these sites. To characterise the site both now and historically in terms of their relationship to buildings and surfaces like tarmac. Only once you understand the micro-urbanisation changes can you really start considering macro changes like overall global temperature otherwise it is a totally meaningless figure.
And as a lot of these changes will not be documented, and many of those who are the primary source of information for the 1970s are either retiring or dying, unless someone gets off their backside, puts together the millions upon millions such a project will actually take, and gets going … this information will be lost for ever and we will never know what actually happened to global temperatures last century.

Saul Jacka

I am astounded at the homogenisation suggestion. Any serious statistical analysis should start by looking at the data closely and querying anything suspicious. However, the way to deal with suspicious data points is either (after long consideration) to treat them as missing data or else to build a statistical model with observation error probabilities in it (i.e. not necessarily the usual Gaussian measurement noise, but different error distributions or even positive probabilities of measurements unrelated to the “true” values).
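As an illustration of the second option, here is a two-component model sketch: with probability 1 - eps a reading is the expected value plus Gaussian noise, otherwise it is a gross error spread uniformly over a wide range. The Gaussian core, eps = 0.01 and the 60°C error range are all assumptions for the example (a fatter-tailed core could be swapped in). The output is a probability that a reading is bad, not a replacement value:

```python
import math

def gross_error_probability(residual, sigma=1.0, eps=0.01, error_range=60.0):
    """Posterior probability that a reading is a gross error, given its
    departure `residual` (deg C) from the expected value.

    Model (assumed): with prob 1 - eps, residual ~ Normal(0, sigma);
    with prob eps, the reading is junk spread uniformly over error_range.
    """
    good = (1 - eps) * math.exp(-0.5 * (residual / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    bad = eps / error_range
    return bad / (good + bad)

for r in (0.0, 2.0, 3.0, 5.0):
    print(f"residual {r:.0f} sigma: P(gross error) = {gross_error_probability(r):.3f}")
```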
Homogenisation is a major source of overconfidence in statistically bland outcomes.

Stephen Richards

I agree totally with you, Willis, BUT I do want the raw data plotted as well as the processed. I hate adjustments to data, any data. It ‘feels’, well, just completely wrong.

John Marshall

This research relies on the assumption that temperature is a good measure of AGW. It is not.
All the warmist assumptions are based on the idea of equilibrium of energy exchange. The planet does not achieve equilibrium with temperature which is one reason why climates change, probably the main reason.
Atmospheric temperatures will also change far more quickly than those of the oceans which is why ocean temperatures are far more important an indicator of heat exchanges.


But will their work be open to the public? Can we have the code and data? Can we therefore replace some algorithms?


What mechanisms might introduce damaging systematic errors, rather than relatively benign random errors, if outliers are left in, tossed out or massaged in? Or is it more a matter of the error bars being inaccurate rather than just the trend being off? Knowledge of error types seems important here, versus placing too much reliance on blind statistics that conform to some lofty idea of elegance.
Also of great concern to me since I like to harp on very old single site records is how to deal with not being so lazy about cutting the global average off prior to 1880 like GISS does! There are three USA records that are fully continuous back to around 1820, and about a dozen in Europe back to the 1700s. There are many dozens that carry back to 1830 instead of just 1880. Extending the slope back another half century would help resolve the hockey stick debate.

Keith Wallis

As soon as I saw the third goal in the piece I had the same concern as you, Willis. BEST (or anyone else) can’t just make the assumption that an outlier is an error and ‘homogenize’ it. It may be an error, it may be due to a microclimate, a very localised weather phenomenon (e.g. a fog-prone valley), etc.
I’m afraid this stuff can’t just be waved away, as it’s the way these very differences over small areas have been handled that have caused quite a few of the issues with the existing global temperature sets (Eureka as an example, perhaps). It’s going to take a heck of a lot of hard yards, but that’s what must be done in order to get it ‘right’. Blanket processing algorithms won’t wash – it’ll take human eyes to look at and understand every significant local variation before a more generalised rule can be written to handle them automatically.
Yes, it’s an enormous amount of work for a lot of people, but nobody ever said this was easy. There are no short-cuts.

ctm Crosspatch makes a good point.
For an international audience, here is an example that I have looked into recently. Rainfall is one of the principal climate parameters. Oxford and Cambridge, two of the UK’s university cities, are only 65 miles (~105 km) apart, and their geographical features are very similar. It is likely that they have records as accurate as you can find anywhere in the world. Not only is there a considerable difference in the amount of rainfall, but even the trends are different for two not-so-distant places.
Reply: I wasn’t disagreeing with his point. He lives in my area and was using examples I thought might just be a bit obscure. ~ ctm

David L

Any high-school student can tell you that the “average” is only an acceptable measure of central tendency for a normal distribution. If it’s non-normal, then the median or mode is appropriate. Or one can do a Box-Cox analysis on the distribution and apply a transform to make it normal. Why do learned scientists keep forgetting this fact?
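A quick illustration of how far the mean and median can part company on skewed data, using a hypothetical ten days of rainfall in millimetres (not real data). One downpour drags the mean well above what a typical day looks like:

```python
import statistics

# Hypothetical ten days of rainfall (mm): mostly dry, one downpour.
rain = [0.0, 0.0, 0.0, 0.2, 0.5, 1.0, 1.2, 2.0, 3.5, 48.0]
mean_mm = statistics.fmean(rain)
median_mm = statistics.median(rain)
print(f"mean {mean_mm:.2f} mm, median {median_mm:.2f} mm")
```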

Ian W

Willis Eschenbach says:
March 23, 2011 at 1:13 am
BigWaveDave says:
March 23, 2011 at 12:53 am
What good is temperature data without corresponding pressure and humidity?
What good is steak and eggs without beer? It’s not a full meal by any means, but it’s better than nothing.
Yes, it would be nice to have pressure and humidity as well … but let’s take them one piece at a time, get that piece right as best we know how, and then move on to the next piece.

Hello Willis,
I think that you missed the point. Temperature is not a measure of energy yet it is an increase in ‘trapped’ energy due to GHG that is being claimed to have changed the Earth’s energy budget.
The amount of energy needed to raise the temperature of dry polar air one degree centigrade is significantly less (eighty times less?) than the amount of energy required to raise humid tropical air one degree centigrade. This is due to the enthalpy of humid air.
When the polar vortices expand, as they have done recently with the equatorward movements of the jetstreams, the atmospheric humidity balance alters. If temperatures are just averaged without regard to the humidity changes then for the same amount of energy there can be a significant rise in temperature.
It would really be a good idea to decide if the intent is to measure energy budget or not.
Then when you are measuring the correct variable, you can have concerns on whether a cool site in a valley on the northern side of the mountains is ‘homogenized’ because the 3 sites 50 miles away on the southern side of the mountains are so much warmer, and then the outliers in their temperatures removed to make the temperature pattern look nice and Gaussian.

I agree with Willis re homogenisation and outliers. Where I live in NSW Australia, a few degrees shift in wind direction is the difference between cold from the Snowy Mountains and stinking hot from the interior. Temperatures can vary wildly over short distances.
I looked up stations near Moss Vale on the Bureau of Meteorology website, and there are three of them with some temperatures for January 2011 that are NOT marked “Not quality controlled or uncertain, or precise date unknown”. Their temperatures for Jan 3, 7-10, 13 and 16 (that’s all the Jan days they have in common) are as follows:
MOSS VALE (HOSKINS STREET) 26.8, 28.0, 31.5, 32.0, 25.6, 21.0, 23.6
ALBION PARK (WOLLONGONG AIRPORT) (38km away) 21.0, 27.0, 27.0, 27.0, 27.0, 25.0, 28.0
KIAMA BOWLING CLUB (46km away) 28.0, 35.0, 32.0, 24.0, 23.5, 24.0, 22.6
The 3 stations are in a nearly straight line, so Albion Park is very close to Kiama and at similar altitude (Moss Vale is about 2000ft higher).
I would contend that the 3 stations bear little relationship to each other, and would be useless for estimating their neighbours’ temperatures.
For example, from Jan 7 to Jan 9, the day-to-day temperature changes at the 3 stations were +3.5/+0.5, 0.0/0.0 and -3.0/-8.0 respectively. Moss Vale rose 4.0 deg C while Kiama fell 11.0 deg C: a 15 deg C difference over just 2 days!
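The Jan 7 to Jan 9 figures can be checked directly from the station values listed above. A small sketch using those values exactly as given, printing each day-to-day change and the net change over the two days:

```python
stations = {
    "Moss Vale": [26.8, 28.0, 31.5, 32.0, 25.6, 21.0, 23.6],
    "Albion Park": [21.0, 27.0, 27.0, 27.0, 27.0, 25.0, 28.0],
    "Kiama": [28.0, 35.0, 32.0, 24.0, 23.5, 24.0, 22.6],
}
# Columns are Jan 3, 7, 8, 9, 10, 13, 16, so Jan 7, 8, 9 are indices 1, 2, 3.
net_change = {}
for name, temps in stations.items():
    jan7, jan8, jan9 = temps[1], temps[2], temps[3]
    net_change[name] = jan9 - jan7
    print(f"{name}: {jan8 - jan7:+.1f} / {jan9 - jan8:+.1f} (net {jan9 - jan7:+.1f})")

spread = max(net_change.values()) - min(net_change.values())
print(f"spread of net changes: {spread:.1f} C")
```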
These 3 stations are in populated areas in a developed country, yet there are large gaps in the data, which itself looks pretty suspect in places.
I reckon BEST have a difficult job ahead of them, but I would argue strongly in favour of –
1. – not making up any data
2. – not dropping outliers, and
3. – reflecting missing or dodgy-looking data in uncertainty ranges.
#1 and #3 seem pretty straightforward. I think #2 is reasonable, because genuine outliers can occur, so dropping outliers is little different to making data up. If the BEST team think they have some dodgy outliers, surely the best (no pun intended) thing to do is to increase the uncertainty ranges at those points.
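A minimal sketch of the “widen the uncertainty instead” idea using inverse-variance weighting, a standard way to combine readings with different error bars. The readings and sigmas are hypothetical:

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, total ** -0.5

readings = [14.2, 14.8, 13.9, 21.5]  # hypothetical; the last value looks odd
as_is = weighted_mean(readings, [0.5, 0.5, 0.5, 0.5])
widened = weighted_mean(readings, [0.5, 0.5, 0.5, 5.0])  # kept, but with a 10x error bar
print(f"equal trust:       {as_is[0]:.2f} +/- {as_is[1]:.2f}")
print(f"widened error bar: {widened[0]:.2f} +/- {widened[1]:.2f}")
```

The odd reading stays in the record but counts for less, and the reported uncertainty is larger than it would be if the point had been replaced with a tidy local value and trusted fully.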

Keith Wallis

Agree with John Kehr regarding hourly temp readings. These should be included where available as a genuine way of reducing the impact of erroneous outliers. Sure, not every station can provide these, but it wouldn’t be difficult to provide separate results for all stations, for max/min-only stations and for hourly stations, to see what difference it makes (if any).

Alexander K

I’m in agreement with Crosspatch, even though I have no knowledge of the areas he is discussing. The geographical area I am most familiar with, the Rodney district in New Zealand, has a temperature profile that looks absolutely bizarre to those unfamiliar with it. It is coastal, has an enormous but shallow harbour with the longest harbour shoreline in the world, lies on a very narrow and often rugged tongue of land and has an unusual number of vastly differing microclimates within a very small geographical area. Attempting to draw any conclusions from temps from adjacent but differing microclimates would seem to be a way to destroy any meaning in the records. As an example, the entire area is in the ‘subtropical’ climate, yet in some microclimates morning frosts sufficiently severe to freeze exposed water pipes are commonplace, while other microclimates ‘just over the hill’ are entirely frost-free.
Perhaps I am missing something due to my own ignorance, but I tend to agree with Willis in regard to homogenising of data and in removing outliers. If those outliers are an accurate record of temperature as it happened, removing them is rendering the data incorrect. While I see the BEST initiative as an invaluable exercise, I would be even happier if the same effort was made to ensure high quality data is taken, free from contamination.
I guess I am still a suspicious country boy at heart, as I tend to distrust extreme cleverness with mathematics and statistics employed to ‘get a result’ from situations and equipment that are influenced by factors other than those we are attempting to measure. I do not understand the rush to get the BEST thing done, as the world and its climate will be here for a while yet, doing its thing, whatever that is.


“If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.”
…and split the series. Allow short series (much shorter than a year) to enable data to be tossed without creating a hole in the series that needs to be filled by imaginary data.
Then the algorithm can be run and rerun with varying criteria on discarding data.


@Willis : “Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, and break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset, what’s the point of that?”
Ahhh yup. Man, that’ll teach me to post mid article…


Actually, if you want to get a real feel for what has happened with real global temps since, let’s say, 1880, look at unadjusted raw RURAL data only, and I mean stations that are STILL rural. I don’t think I have seen ONE anywhere showing any significant warming; has anyone here? In any case it does not matter what BEST come up with in the end. It is the trend over, say, the next 10 years that will count (yes, even with urban stations), because since 2002 it is already FLAT! i.e. no extra warming as predicted.


As one other poster above noted, averages can be deceiving. There is another poster on here who has an excellent analysis of Canadian temperatures. His graphs and analysis show that maximum daily highs (as recorded by land-based instruments) have either not been increasing or have been decreasing over a number of decades, while the daily minimums have been increasing. By averaging the temperatures, it appears that Canada is experiencing a warming trend. But this sort of warming trend (where only the daily minimums increase) is actually very good for many reasons. With no increase in daily maximums, is there any harm being felt?
So what I am asking is whether BEST could show the trends in both daily maximums and daily minimums, rather than just the change in average temperatures.
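What that request amounts to, in a sketch: fit separate linear trends to annual Tmax and Tmin instead of only to their average. The series here are synthetic (flat maxima, minima rising 2°C per century, with a small deterministic wiggle standing in for weather):

```python
import math
import statistics

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

years = list(range(1950, 2010))
tmax = [20.0 + 0.3 * math.sin(y) for y in years]                     # flat
tmin = [5.0 + 0.02 * (y - 1950) + 0.3 * math.cos(y) for y in years]  # +2 C/century
tmean = [(hi + lo) / 2 for hi, lo in zip(tmax, tmin)]

for label, series in (("Tmax", tmax), ("Tmin", tmin), ("Tmean", tmean)):
    print(f"{label}: {ols_slope(years, series) * 100:+.2f} C/century")
```

The average shows a warming of about 1°C per century, which on its own says nothing about whether the days or the nights are doing the warming.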

Steve Keohane

I agree with your perspective Willis. They need to keep the real distribution in the data. One can explain and eliminate an outlier, or leave it alone, but don’t change it to something else, include it, and pretend you have the same representation of the data. I am not confident about the choice of confidence limits, those are contingent on the type of data distribution, and people usually default to Gaussian which may not represent nature, as you point out.

Theo Goodwin

Great analysis, Willis. I am with you all the way. And I go farther than you. Climate is local. That is apparent to lovers of the outdoors. For example, the coastline of Florida is considerably cooler than inland areas and the wind is always blowing on the coastline. But the coastline is a permanent feature. Surely it qualifies as climate not weather. Yet how many times will the coastline show up as an outlier in an inland cell? Seems to me that the cell based approach is seriously flawed. Why use cells? If using cells, why make them larger than one mile by one mile? This is the age of computers, after all. Data management has benefited so much from computers that a finer mesh of data really seems required at this time.

Don K

I’m kind of new to this stuff. Yes, the homogenization thing jumped out at me also. Some questions:
1. Is homogenization a technique for getting the final data point? Or is it merely a screen to minimize the number of data points that need to be manually examined for plausibility?
2. When homogenization takes place, are they working with actual temperatures, or with “anomaly (first difference?) temperatures”?
3. Has any of the input data been OCRed from old records? If so, there really should be a step somewhere to try to detect misreads, e.g. 3s read as 8s, or 5s as 6s (the two are virtually indistinguishable in at least one font; I forget which). Some OCR errors are pretty much undetectable even by humans, but errors in the leading digit can be pretty blatant.
3a. Has handwriting recognition technology been used on any of the input data? It has probably improved some since I worked with it tangentially two decades ago, but it was pretty iffy back then. (Don’t confuse real-time recognition, where stroke-order information is available, with trying to identify the result without stroke-order information. The latter is much harder and would be what is needed here.)
3b. Is there any provision for flagging records with a high detected error rate as being doubtful?
3c. I know that editing the data seems to be traditional in “climate science”, but if that is really the case, might not this be a good time to break with tradition? Is it feasible to pass through all the data and flag the values that screening says are doubtful so that the end analyst can choose to use them, tweak them, or reject them?
4. The output of this effort is what? A single global temperature? A cleaned up temperature set from which “global temperatures” can be computed?
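Questions 3, 3b and 3c can be illustrated together with a minimal "flag, don't edit" sketch in Python. Each reading is compared with the median of its neighbours; if it is far off, the code checks whether swapping the leading digit for a commonly confused one (3/8, 5/6, both assumed here) would bring it back in line. The original value is always kept, and only a flag is attached, so the end analyst can decide what to do. The 5 C threshold, the confusion pairs and the neighbour values are all hypothetical choices, not anything BEST has published.

```python
# A minimal sketch of flagging doubtful readings without changing them.
# Threshold, digit-confusion pairs and data are hypothetical.
from statistics import median

CONFUSABLE = {"3": "8", "8": "3", "5": "6", "6": "5"}  # assumed OCR confusions
THRESHOLD = 5.0  # allowed distance from the neighbour median, C (assumed)

def leading_digit_swap(value):
    """Return value with its leading digit swapped for a confusable one, or None."""
    text = f"{value:.1f}"
    sign = "-" if text.startswith("-") else ""
    digits = text.lstrip("-")
    first = digits[0]
    if first not in CONFUSABLE:
        return None
    return float(sign + CONFUSABLE[first] + digits[1:])

def flag_reading(value, neighbour_values):
    """Return (original value, flag); the value itself is never altered."""
    ref = median(neighbour_values)
    if abs(value - ref) <= THRESHOLD:
        return (value, "ok")
    swapped = leading_digit_swap(value)
    if swapped is not None and abs(swapped - ref) <= THRESHOLD:
        return (value, f"possible misread of {swapped}")
    return (value, "doubtful")

neighbours = [31.5, 32.0, 33.1, 32.4]
for reading in [32.2, 82.1, 12.0]:
    print(flag_reading(reading, neighbours))
```

Counting the "doubtful" and "possible misread" flags per record would also give the per-record error rate that question 3b asks about.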

Darren Parker

I don’t think the temperature sets should be looking at max and min data; they should be looking at diurnal variance, plotted against local cloud cover. Everyone knows anecdotally that with a clear day and a cloudy night, far less heat is lost than with a clear night. If AGW is true, what it should be proving is that the diurnal variation is lessening as night-time heat loss decreases. I have a big problem with the black body theory: part of the world is absorbing heat while part of the world is losing heat all the time, and each part of the world is different in terms of land/water/biomass coverage. The kind of simplistic modelling proposed is still way, way off where we need to be; until we get fractal-based systems, we’re a long way off.

Joel Heinrich

There is another problem in homogenising with “neighbouring” stations, such as Straßburg (France) and Karlsruhe (Germany), which are some 70 km apart. The French compute the daily mean temperature as (Tmax + Tmin)/2, as the US does, while in Germany it was (T0730 + T1430 + 2*T2130)/4 and is now the mean of 24 hourly measurements (with T0730 being the temperature at 7:30 CET). There is a difference of up to 1 K between the French and the German daily mean temperatures. Any homogenisation would introduce a bias.
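A minimal Python sketch shows how the three definitions can disagree on the very same day. The diurnal cycle below is entirely hypothetical (minimum at 03:00, maximum at 15:00, with a small second harmonic to make it asymmetric), and the hour argument is treated as CET. The size of each gap depends on the assumed profile, so the numbers are illustrative only; the point is that the formulas are not interchangeable.

```python
# A minimal sketch comparing three daily-mean definitions on one
# hypothetical temperature curve; hour is treated as CET.
import math

def temp(hour):
    """Hypothetical diurnal cycle (C): minimum at 03:00, maximum at 15:00."""
    phase = 2 * math.pi * (hour - 9) / 24
    return 12 + 4 * math.sin(phase) + 0.5 * math.cos(2 * phase)

def mean_minmax(f):
    """(Tmax + Tmin) / 2, with Tmax and Tmin taken from minute-by-minute samples."""
    samples = [f(m / 60) for m in range(24 * 60)]
    return (max(samples) + min(samples)) / 2

def mean_three_reading(f):
    """(T0730 + T1430 + 2 * T2130) / 4, the older German formula."""
    return (f(7.5) + f(14.5) + 2 * f(21.5)) / 4

def mean_hourly(f):
    """Mean of 24 readings taken on the hour."""
    return sum(f(h) for h in range(24)) / 24

for label, fn in [("(Tmax+Tmin)/2", mean_minmax),
                  ("3-reading", mean_three_reading),
                  ("24-hourly", mean_hourly)]:
    print(f"{label:>14}: {fn(temp):.2f} C")
```

A neighbour-based adjustment that compares a (Tmax + Tmin)/2 station with a 24-hourly station would read this definitional gap as a station problem.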


Agree wholeheartedly with the article. In trying to arrive at my own “scientifically justifiable” approach to analyzing global temperature, I saw “homogenization” (for want of a better word) and “data creation” as the “Achilles heel” of the methodologies already being used.
Honestly, with no attempt to discredit BEST so early in the game, I think “homogenization” represents a “shallowness” of thought or understanding regarding the actual physics, measurement techniques, geographical, and statistical effects bearing on “global temperature change estimation”. What is needed is “deep” thought on this topic.
I had high hopes that BEST would provide a universally “acceptable” product. Hopefully, BEST will reconsider this aspect.


Assuming they will not have the resources or time to dive into each station and its individual history, how about chunking the surface into geographical regions based on the best available inventory of micro-climate and “medium-climate” types? Then, generally treat those regions as independent islands.
Also, I think that they should do separate runs that:
a) leave out micro/medium-climate areas that don’t have station coverage;
b) include those uncovered areas based on some stated, rational method;
c) cover land only;
d) cover ocean only;
e) lastly, cover the whole globe.


I would really like to see a straight “binning” analysis which defines some metric of warming, *at a particular site*, and then allows one to flexibly bin the data. This way we could ask questions like “what percentage of sites at 30-65 deg latitude have warmed?”, and “what percentage of rural stations have warmed vs. urban stations”.
I’d also like to see the definition of “warming” be flexible. For example one could select traditional temp anomaly, or just average temp for the year, or compare monthly average year over year, or use absolute max temp for the year, absolute min temp, average of monthly max temp, average of monthly min temp – etc…
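The binning idea maps naturally onto a small Python sketch: a per-site warming metric that can be swapped out, and a bin predicate that selects sites by latitude, rural/urban status, or anything else. The `Site` fields, the default "later half minus earlier half" metric and the four sites are all hypothetical, chosen only to show the mechanics.

```python
# A minimal sketch of per-site warming metrics with flexible binning.
# Site data and the default metric are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Site:
    name: str
    lat: float
    rural: bool
    annual: Dict[int, float]  # year -> annual mean temperature, C

def half_difference(annual: Dict[int, float]) -> float:
    """Mean of the later half of the years minus mean of the earlier half."""
    years = sorted(annual)
    mid = len(years) // 2
    early = [annual[y] for y in years[:mid]]
    late = [annual[y] for y in years[mid:]]
    return sum(late) / len(late) - sum(early) / len(early)

def percent_warmed(sites: List[Site], in_bin: Callable[[Site], bool],
                   metric: Callable[[Dict[int, float]], float] = half_difference) -> float:
    """Percentage of sites in the bin whose warming metric is positive."""
    chosen = [s for s in sites if in_bin(s)]
    if not chosen:
        return float("nan")
    warmed = sum(1 for s in chosen if metric(s.annual) > 0)
    return 100.0 * warmed / len(chosen)

sites = [
    Site("A", 45.0, True, {2000: 10.0, 2001: 10.2, 2002: 10.3, 2003: 10.5}),
    Site("B", 50.0, False, {2000: 12.0, 2001: 12.3, 2002: 12.6, 2003: 12.9}),
    Site("C", 20.0, True, {2000: 25.0, 2001: 24.9, 2002: 25.1, 2003: 25.0}),
    Site("D", 60.0, True, {2000: 2.0, 2001: 2.1, 2002: 1.8, 2003: 1.9}),
]
print(f"{percent_warmed(sites, lambda s: 30 <= s.lat <= 65):.1f}% of 30-65 deg sites warmed")
print(f"{percent_warmed(sites, lambda s: s.rural):.1f}% of rural sites warmed")
print(f"{percent_warmed(sites, lambda s: not s.rural):.1f}% of urban sites warmed")

# Swapping the definition of "warming": last year minus first year.
last_minus_first = lambda a: a[max(a)] - a[min(a)]
print(f"{percent_warmed(sites, lambda s: s.rural, last_minus_first):.1f}% of rural sites warmed")
```

Because the metric is just a function argument, any of the definitions listed above (annual max, annual min, monthly year-over-year comparisons) could be plugged in without touching the binning code.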


Willis – I agree with you completely.
In particular, I am very much against homogenization.
I know that you are aware of the Australian Bureau of Meteorology “High Quality” datasets, which produce, via very sophisticated statistics, a very sharply rising Australian temperature map.
This has been built on top of raw data that has very little trend, except for some UHI here and there.
Please forward your criticisms to BEST if you haven’t yet done so, as by and large, they seem to be trying to produce an honest series.


If BEST can get the method right then eventually the BOM can be induced to follow their lead.
BOM have for too long followed the siren call of the IPCC.
Time to change ships before it’s too late.

Steve C

Thanks for a useful post, Willis – I admit to (customary) laziness in not checking through BEST’s campaign plan, and share the general unease with “normalisation”.
I think Stephen Richards comes closest to my immediate reaction. If they are proposing to release a “massaged” dataset, then they should also release (a) all the raw data exactly as received, with just an indication of where cuts into data packets have been made, and (b) full information on what changes have been made where, and why.
In short, all the original data and details which the Hockey Team prefer to keep under the carpet, so that all the information is freely available for inspection and discussion by anyone. Not just plots, either – the original figures are the foundation of the entire structure built on them, so should be there “in person” in great chunks of CSV or whatever for complete access.
Like that link above, which ends with:
* The programs use (whatever software, algorithms etc.)
* The zipped archive is (link) here.
– like it oughta.


After reading Anthony, Willis, and most of the comments, it is very hard to see why there is so much excitement about the scientific nature of this project. If valid, unadjusted raw data in various regions has not been identified (and hasn’t some of it been destroyed?), how can any analysis of that data be anything but worthless? And homogenization, come on. A “global climate temperature” is not possible, but tracking overall warming and cooling in the various regions does seem doable. Why sign on to this project?


Homogenization would also seem to obscure the noise inherent in the data. If 10 nearby stations all have small DC offsets (say, up to 0.5 C), you just can’t arbitrarily remove those, find the standard deviation (assuming a Gaussian distribution of errors, which just can’t be right), and then quote some absurdly small uncertainty in the temperatures. And you can’t then average a thousand such areas and quote an even smaller uncertainty.
There has to be some, ahem, “robust” accounting for the fact that there must be significant uncertainty in finding “local” average temperatures, and that uncertainty doesn’t really shrink like 1/sqrt(number of stations), since the errors are a combination of many sources, some non-random and non-Gaussian. The homogenization obscures this.
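The sqrt(N) point can be made quantitative with a standard result. If N station errors each have standard deviation sigma and every pair has correlation rho, the variance of their mean is sigma²(1 + (N − 1)rho)/N, which does not go to zero but levels off at rho·sigma². The formula needs only variances and correlations, not Gaussian errors. Equal pairwise correlation is a simplifying assumption, and the sigma and rho values below are hypothetical.

```python
# Standard error of a mean of n station errors: independent vs equicorrelated.
# sigma and rho are hypothetical; equal pairwise correlation is an assumption.
import math

def se_independent(sigma, n):
    """Standard error of the mean when the n errors are independent."""
    return sigma / math.sqrt(n)

def se_correlated(sigma, n, rho):
    """Standard error of the mean when every pair of errors has correlation rho."""
    return sigma * math.sqrt((1 + (n - 1) * rho) / n)

sigma = 0.5  # per-station error, C (hypothetical)
rho = 0.2    # pairwise error correlation (hypothetical)
print("floor:", round(sigma * math.sqrt(rho), 3))
for n in (10, 100, 1000, 10000):
    print(n, round(se_independent(sigma, n), 3), round(se_correlated(sigma, n, rho), 3))
```

The independent column keeps shrinking as stations are added; the correlated column stalls near sigma·sqrt(rho), so quoting the first when the second applies overstates the precision.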

I worked for many years in acoustics and phonetics research, building electronic gadgets and writing software to perform experiments.
I know one thing: In any honest scientific field, you don’t “homogenize” anything. When one subject shows responses that indicate he can’t do the task, or misunderstands the instructions, you don’t try to “pull” his responses toward other similar subjects. You toss him out. If you have to toss so many subjects that your total data set ends up unusable, you toss out the whole experiment and try something different.


I question how good any homogenization technique will be at detecting errors that creep in over decades, such as microsite contamination from growing plants or urban encroachment.


If the nearby city is big enough, encroaching urbanization could easily affect almost all of the sensors in a given region. In such a case, would homogenization adjust the one or two sensors that aren’t being encroached to better match those that are?
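The scenario can be shown with a deliberately naive neighbour-comparison scheme in Python. It is not BEST's algorithm, which the post does not spell out: each station's trend is compared with the median trend of the other stations, and any station more than a threshold away is replaced by that median. With four encroached urban stations and one clean rural one, the rural station is the one that gets "corrected". Station names, trends (C/decade) and the 0.15 threshold are hypothetical.

```python
# A deliberately naive neighbour-median homogenization, for illustration only.
# Not BEST's method; names, trends and threshold are hypothetical.
from statistics import median

def naive_homogenize(trends, threshold=0.15):
    """Replace any station trend far from the median of the others with that median."""
    adjusted = {}
    for name, trend in trends.items():
        others = [t for other, t in trends.items() if other != name]
        ref = median(others)
        adjusted[name] = ref if abs(trend - ref) > threshold else trend
    return adjusted

trends = {"urban1": 0.30, "urban2": 0.28, "urban3": 0.32, "urban4": 0.31, "rural": 0.02}
print(naive_homogenize(trends))
```

Real pairwise methods are more elaborate, but the same majority effect applies whenever most of the reference neighbours share a common contamination.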

Joe Lalonde

I live near the water on Georgian Bay, Ontario, Canada.
We have a 50′ hill all around this area of shore line.
The other day, when I was driving home, it was 4 degrees C colder at the bottom of the hill than at the top.
Ask me if I trust temperatures on a global scale…NOT!

j ferguson

Is it reasonable to do the homogenization as the last step and then show the with/without delta, so that we might appreciate what its effect is? What if the effect is insignificant? If it is significant, then the “outliers” really must be identified and qualified, mustn’t they?
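One way to report that with/without delta is sketched below in Python: compute the regional average from the unadjusted values and from the homogenized values, publish the difference, and list every station whose value moved and by how much. The function name and the station trends (C/decade) are hypothetical.

```python
# A minimal sketch of reporting the effect of a homogenization step.
# Function name and station values are hypothetical.
from statistics import mean

def homogenization_effect(raw, adjusted):
    """Return (change in the regional mean, non-zero per-station changes)."""
    changes = {name: adjusted[name] - raw[name] for name in raw}
    moved = {name: d for name, d in changes.items() if d != 0}
    return mean(adjusted[name] for name in raw) - mean(raw.values()), moved

raw = {"A": 0.10, "B": 0.12, "C": -0.05}
adjusted = {"A": 0.10, "B": 0.12, "C": 0.11}

delta, moved = homogenization_effect(raw, adjusted)
print(f"Regional mean changed by {delta:+.3f} C/decade")
for name, d in moved.items():
    print(f"  station {name} moved by {d:+.3f}")
```

If the regional delta is negligible, the step matters little; if it is not, the `moved` list is the starting point for identifying and qualifying those outliers.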

Data are. They are readings taken from instruments. Good data are data taken from properly selected, properly calibrated, properly sited, properly installed, properly maintained instruments. All other data are bad data, for whatever reasons. Missing data are simply missing.
Bad data cannot be converted into good “data” by gridding, infilling, adjusting, homogenizing, pasteurizing, folding, bending, spindling or mutilating the bad data. Missing data cannot be magically, mystically “appeared” any more than good data should be “disappeared”.
Data which has been adjusted is no longer data. I don’t know what it is, only what it is not. In this context, however, it is typically referred to as a “global temperature record”.