Not Whether, but How to Do The Math

Guest Post by Willis Eschenbach

The Berkeley Earth Surface Temperature (BEST) team is making a new global climate temperature record. Hopefully this will give us a better handle on what’s going on with the temperature.

BEST has put out a list of the four goals for their mathematical methods (algorithms). I like three of those goals a lot. One I’m not so fond of. Here are their goals:

1)  Make it possible to exploit relatively short (e.g. a few years) or discontinuous station records. Rather than simply excluding all short records, we prefer to design a system that allows short records to be used with a low – but non-zero – weighting whenever it is practical to do so.

2)  Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may depend on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.

3)  Place empirical homogenization on an equal footing with other averaging. We distinguish empirical homogenization from evidence‐based homogenization. Evidence‐based adjustments to records occur when secondary data and/or metadata is used to identify problems with a record and propose adjustments. By contrast, empirical homogenization is the process of comparing a record to its neighbors to detect undocumented discontinuities and other changes. This empirical process performs a kind of averaging as local outliers are replaced with the basic behavior of the local group. Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.

4)  Provide uncertainty estimates for the full time series through all steps in the process.

Using short series, avoiding gridding, and uncertainty estimates are all great goals. But the whole question of “empirical homogenization” is fraught with hidden problems and traps for the unwary.

The first of these is that nature is essentially not homogeneous. It is pied and dappled, patched and plotted. It generally doesn’t move smoothly from one state to another; it moves abruptly. It tends to favor Zipf distributions, which are about as non-normal (i.e. non-Gaussian) as a distribution can get.

So I object to the way that the problem is conceptualized. The problem is not that the data requires “homogenization”; that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries, and homogenizing the data is not the answer to that.

This is particularly true since (if I understand what they’re saying) they have already told us how they plan to deal with discontinuities. The plan, which I’ve been pushing for some time now, is to simply break the series apart at the discontinuities and treat it as two separate series. And that’s a good plan. They say:

Data split: Each unique record was broken up into fragments having no gaps longer than 1 year. Each fragment was then treated as a separate record for filtering and merging. Note however that the number of stations is based on the number of unique locations, and not the number of record fragments.
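The data-split rule quoted above is simple enough to sketch. The following Python is a minimal illustration, not BEST’s code: it assumes a record is a time-sorted list of (date, value) pairs and reads “longer than 1 year” as more than 365 days. The function name split_on_gaps is hypothetical.

```python
from datetime import date

def split_on_gaps(records, max_gap_days=365):
    """Split a time-sorted list of (date, value) pairs into fragments
    wherever two consecutive observations are more than max_gap_days apart.
    Treating '1 year' as 365 days is an assumption for illustration."""
    fragments = []
    current = []
    for obs in records:
        if current and (obs[0] - current[-1][0]).days > max_gap_days:
            fragments.append(current)   # close the fragment before the gap
            current = []
        current.append(obs)
    if current:
        fragments.append(current)
    return fragments

# Hypothetical monthly record with a two-year hole in the middle.
record = [(date(1950, m, 1), 10.0 + m) for m in range(1, 13)] + \
         [(date(1953, m, 1), 11.0 + m) for m in range(1, 13)]
pieces = split_on_gaps(record)
print(len(pieces))  # → 2
```

Each fragment is then just another short record, which is exactly what the first goal (using short records with low weight) is meant to handle.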

So why would they deal with “empirical discontinuities” by adjusting them, and deal with other discontinuities in a totally different manner?

Next, I object to the plan that they will “incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.” This will make it very difficult to back it out of the calculations to see what effect it has had. It will also hugely complicate the question of the estimation of error. For any step-wise process, it is crucial to separate the steps so the effect of each single step can be understood and evaluated.

Finally, let’s consider the nature of the “homogenization” process they propose. They describe it as a process whereby:

… local outliers are replaced with the basic behavior of the local group

There are a number of problems with that.

First, temperatures generally follow a Zipf distribution (a distribution with a large excess of extreme values). As a result, what would definitely be “extreme outliers” in a Gaussian distribution are just another day in the life in a Zipf distribution. A very unusual and uncommon temperature in a Gaussian distribution may be a fairly common and mundane temperature in a Zipf distribution. If you pull those so-called outliers out of the dataset, or replace them with a local average, you no longer have temperature data – you have Gaussian data. So you have to be real, real careful before you declare an outlier. I would certainly look at the distributions before and after “homogenization”, to see if the Zipf nature of the distribution has disappeared … and if so, I’d reconsider my algorithm.
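To put rough numbers on the tail argument, the sketch below compares how often a value lands beyond k standard deviations under a Gaussian and under a heavy-tailed distribution. Zipf is a discrete distribution, so as an assumption this uses a Student-t with 3 degrees of freedom (power-law tails, finite variance) as a continuous stand-in. It illustrates the general point only and is not a fit to any temperature data.

```python
import math

def gaussian_two_sided_tail(k):
    """P(|X| > k standard deviations) for a normal distribution."""
    return math.erfc(k / math.sqrt(2))

def student_t3_two_sided_tail(k):
    """P(|T| > k standard deviations) for a Student-t with 3 degrees of freedom.

    The t(3) standard deviation is sqrt(3), so k standard deviations is
    t = k * sqrt(3); the closed-form t(3) CDF depends only on t / sqrt(3) = k.
    """
    return 1 - (2 / math.pi) * (k / (1 + k * k) + math.atan(k))

for k in (3, 4, 5):
    g = gaussian_two_sided_tail(k)
    h = student_t3_two_sided_tail(k)
    print(f"beyond {k} SD: Gaussian {g:.1e}, heavy-tailed {h:.1e}, "
          f"about {h / g:,.0f}x as often")
```

At 4 standard deviations the heavy-tailed stand-in produces such values roughly a hundred times as often as the Gaussian, so a threshold tuned on Gaussian intuition would flag a great many perfectly real readings.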

Second, while there is a generally high correlation between temperature datasets out to 1200 km or so, that’s all that it is. A correlation. It is not a law. For any given station, there will often be nearby datasets that have very little correlation. In addition, for each of the highly correlated pairs, there will be a number of individual years where the variation in the two datasets is quite large. So despite high correlation, we cannot just assume that any record that disagrees with the “local group” is incorrect, as the BEST folks seem to be proposing.

Third, since nature itself is almost “anti-homogeneous”, full of abrupt changes and frequent odd occurrences and outliers, why would we want to “homogenize” a dataset at all? If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.

Fourth, although the temperature data is highly correlated out to long distances, the same is not true of the trend. See my post on Alaskan trends regarding this question. Since the trends are not correlated, adjustment based on neighbors may well introduce a spurious trend. If the “basic behavior of the local group” is trending upwards, and the data being homogenized is trending horizontally, both may indeed be correct, and homogenization will destroy that …
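A synthetic example shows how two series can be highly correlated and still carry different trends. Everything here is invented for illustration: the two hypothetical stations share the same year-to-year “weather” (seeded random noise), and one of them also carries an extra 0.01 C/year trend.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(t, y):
    """Ordinary least squares slope of y against t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den

rng = random.Random(42)
years = list(range(50))
shared_weather = [rng.gauss(0.0, 1.0) for _ in years]            # common swings
station_a = [w + 0.01 * t for w, t in zip(shared_weather, years)]  # +1 C/century
station_b = list(shared_weather)                                   # no trend

print(f"correlation: {pearson(station_a, station_b):.3f}")
print(f"trend A: {ols_slope(years, station_a) * 100:+.2f} C/century")
print(f"trend B: {ols_slope(years, station_b) * 100:+.2f} C/century")
```

Because the least squares slope is linear in the data, the difference between the two fitted trends is exactly the 0.01 C/year that was added, however high the correlation comes out. Pulling station B toward station A on the strength of that correlation would import A’s trend.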

Those are some of the problems with “homogenization” that I see. I’d start by naming it something else, because it does not describe what we wish to do to the data. Nature is not homogeneous, and neither should our dataset be homogeneous.

Then I’d use the local group solely to locate unusual “outliers” or shifts in variance or average temperature.

But there’s no way I’d replace the putative “outliers” or shifts with the behavior of the “local group”. Why should I? If all you are doing is bringing the data in line with the average of the local group, why not just throw it out entirely and use the local average? What’s the advantage?

Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, and break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset; what’s the point of that?
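The drop-and-split alternative is easy to express. This is a minimal sketch under stated assumptions: the record is a plain list of values, some separate screening step (not shown) has already decided which positions are bad, and drop_and_split is a hypothetical name. A real implementation would carry dates along with the values.

```python
def drop_and_split(values, bad_indices):
    """Remove each flagged value and start a new record at that point,
    so no substitute value is ever written into the data."""
    bad = set(bad_indices)
    records, current = [], []
    for i, v in enumerate(values):
        if i in bad:
            if current:
                records.append(current)   # close the record before the bad point
            current = []
        else:
            current.append(v)
    if current:
        records.append(current)
    return records

series = [12.1, 12.4, 11.9, 31.0, 12.2, 12.6, 12.3]  # hypothetical; index 3 judged bad
print(drop_and_split(series, [3]))
# → [[12.1, 12.4, 11.9], [12.2, 12.6, 12.3]]
```

Nothing is averaged in: the output contains only the original readings, split into two shorter records.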

Lastly, a couple of issues with their quality control procedures. They say:

Local outlier filter: We tested for and flagged values that exceeded a locally determined empirical 99.9% threshold for normal climate variation in each record.

and

Regional filter: For each record, the 21 nearest neighbors having at least 5 years of record were located. These were used to estimate a normal pattern of seasonal climate variation. After adjusting for changes in latitude and altitude, each record was compared to its local normal pattern and 99.9% outliers were flagged.

Again, I’d be real, real cautious about these procedures. Since the value in both cases is “locally determined”, there will certainly not be a whole lot of data for analysis. Determination of the 99.9% exceedance level, based solely on a small dataset of Zipf-distributed data, will have huge error margins. Overall, what they propose seems like a procedure guaranteed to convert a Zipf dataset into a Gaussian dataset, and at that point all bets are off …
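How shaky is an empirical 99.9% threshold from a small local sample? The simulation below draws repeated samples of an assumed size of 365 (one year of daily values) from a Student-t with 3 degrees of freedom, again only a heavy-tailed stand-in for Zipf-like data, and records the 99.9% point each sample would suggest. The sample size, the distribution and the number of trials are illustrative assumptions, not BEST’s settings.

```python
import random

def quantile(data, p):
    """Sample quantile by linear interpolation between order statistics."""
    s = sorted(data)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def student_t3(rng):
    """One draw from a Student-t distribution with 3 degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / (chi2 / 3) ** 0.5

rng = random.Random(0)
N_OBS = 365     # assumed sample size: one year of daily values
TRIALS = 300    # number of simulated "stations"

estimates = [
    quantile([student_t3(rng) for _ in range(N_OBS)], 0.999)
    for _ in range(TRIALS)
]
low, high = quantile(estimates, 0.05), quantile(estimates, 0.95)
print(f"Middle 90% of estimated 99.9% thresholds: {low:.1f} to {high:.1f}")
print("True one-sided 99.9% point of t(3): about 10.2")
```

With fewer than a thousand values, the empirical 99.9% point is set almost entirely by the one or two most extreme observations in the sample, which is why the estimates scatter so widely.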

In addition, once the “normal pattern of seasonal climate variation” is established, how is one to determine what is a 99.9% outlier? The exact details of how this is done make a big difference. I’m not sure I see a clear and clean way to do it, particularly when the seasonal data has been “adjusted for changes in latitude and altitude”. That implies that they are not using anomalies but absolute values, and that always makes things stickier. But they don’t say how they plan to do it …

In closing, I bring all of this up, not to oppose the BEST crew or make them wrong or pick on errors, but to assist them in making their work bulletproof. I am overjoyed that they are doing what they are doing. I bring this up to make their product better by crowd-sourcing ideas and objections to how they plan to analyze the data.

Accordingly, I will ask the assistance of the moderators in politely removing any posts talking about whether BEST will or won’t come up with anything good, or of their motives, or whether the eventual product will be useful, or the preliminary results, or anything extraneous. Just paste in “Snipped – OT” to mark them, if you’d be so kind.

This thread is about how to do the temperature analysis properly, not whether to do it, or the doer’s motives, or whether it is worth doing. Those are all good questions, but not for this thread. Please take all of that to a general thread regarding BEST. This thread is about the mathematical analysis and transformation of the data, and nothing else.

w.


158 thoughts on “Not Whether, but How to Do The Math”

  1. Temperatures are odd things. Climate conditions can vary considerably over only a few miles of geographical distance. The weather in Truckee, California is a lot different than the weather at Reno, Nevada and they are only about 40 miles apart.

    The problem is that regions often have several microclimates. I might lump Half Moon Bay with Pacifica, but would lump neither with San Mateo, though San Mateo with Redwood City might make perfect sense. The point is that any “homogenization” of temperatures must take regional geography into account and not simply distance on a map.

    Reply: crosspatch, your comment is perfectly intelligible to me and others in the Bay Area, but we have quite an international audience. San Mateo and Redwood City just might be a bit obscure to our readers on a few continents. ~ ctm.

  2. There’s a tautology problem here. Only by examination of the outliers and their relation to the entire data set can you determine whether they have important information to convey. Discarding them eliminates that possibility.

  3. I hope the raw data and the programming are made available online under an open license/agreement. It is most important that BEST make that a priority. They will not get every detail right, simply because honest people do not actually agree on what the right maths is in detail. The key point is that we can see what was done and play with the code/data to see if the results are robust to other assumptions.

  4. Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance.

  5. When I read their descriptions I get a different understanding than they do. For instance, let’s compare Sacramento and South Lake Tahoe. They are ~120 miles apart. If the temperature data for Sacramento is colder than South Lake Tahoe for one reading during the day, then it is flagged and thrown out. Or the reverse is thrown out.

    This ties back to Finland having readings of 20C instead of -20C in the middle of winter because a minus sign got missed.

    Lots of testing will be needed for the algorithms to get it correct, but such correlated behavior is a good way to get rid of bad data. I agree with not inserting an imaginary number as weather will skew things, but throwing it away is the right thing to do.
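The missing-minus-sign case lends itself to a specific check that flags rather than rewrites. In this hypothetical sketch, a reading is flagged when it is far from the median of nearby stations but its sign-flipped value is not; the 5 C tolerance and the function name likely_missing_minus are assumptions for illustration.

```python
from statistics import median

def likely_missing_minus(value, neighbour_values, tolerance=5.0):
    """Hypothetical check: flag a reading that disagrees with its neighbours'
    median by more than `tolerance` degrees, but would agree if its sign
    were flipped. The tolerance is an illustrative assumption."""
    m = median(neighbour_values)
    return abs(value - m) > tolerance and abs(-value - m) <= tolerance

neighbours = [-19.0, -21.0, -20.5]   # hypothetical winter readings nearby
print(likely_missing_minus(20.0, neighbours))   # → True
print(likely_missing_minus(-20.0, neighbours))  # → False
```

The function only reports a suspect; whether to drop the value, split the record there, or restore the sign is left to a separate, documented decision.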

    I like the idea to get rid of the gridding. There is so much empty space without any valid data. Why is that filled in? That is truly meaningless.

    One thing that isn’t clear to me is whether they will continue to use the min/max temperatures only. With modern logging that is a foolish limitation. The problem is comparing to old records that only have that resolution, but that is where the amount of error enters. Older records with only min/max would have higher error than modern data that had 24 points per day for the daily average. Correlating the days with 24 readings against the min/max days would be interesting to see.

    There is more, but this is already getting too long for a comment.
    John Kehr
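The gap between a min/max average and a true daily mean depends on the shape of the daily cycle. The hourly values below are invented to show an asymmetric day (long cool night, short warm afternoon); they are not from any station.

```python
# Hypothetical hourly temperatures (deg C) for one day, hours 0..23:
# a long cool night and a short warm afternoon.
hourly = [10, 10, 10, 10, 10, 10, 11, 13, 15, 17, 19, 20,
          20, 19, 17, 15, 13, 12, 11, 11, 10, 10, 10, 10]
assert len(hourly) == 24

minmax_mean = (max(hourly) + min(hourly)) / 2   # what a min/max thermometer gives
hourly_mean = sum(hourly) / len(hourly)         # average of all 24 readings

print(f"(Tmax + Tmin) / 2 = {minmax_mean:.2f} C")   # 15.00 C
print(f"mean of 24 hourly = {hourly_mean:.2f} C")   # 13.04 C
```

For this made-up day the min/max midpoint runs almost 2 C warmer than the 24-reading mean; a symmetric daily cycle would shrink that gap.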

  6. jorgekafkazar says:
    March 23, 2011 at 12:45 am

    Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance.

    I disagree entirely. The BEST effort is just beginning, how could it be “doomed” to anything this early in the game? The procedures are new, and will certainly be changed at some point. In a battle, the first casualty is always the battle plan … they will soldier on.

    w.

  7. BigWaveDave says:
    March 23, 2011 at 12:53 am

    What good is temperature data without corresponding pressure and humidity?

    What good is steak and eggs without beer? It’s not a full meal by any means, but it’s better than nothing.

    Yes, it would be nice to have pressure and humidity as well … but let’s take them one piece at a time, get that piece right as best we know how, and then move on to the next piece.

    Thanks,

    w.

  8. I can’t understand why climate scientists always seem to want to homogenise data.

    They are aware that climate is ultimately driven by deterministic chaos and must also be aware that any homogenisation process will destroy information and be a handicap to intuition.

    It is also deplorable that the BEST team seek to hide the adjustments they make, rather than letting the raw data stand and then adding a separate adjustment layer for each massaging of the base data. Their current approach will make it much harder to achieve their fifth objective…

    “To provide an open platform for further analysis by publishing our complete data and software code as well as tools to aid both professional and amateur exploration of the data.”

  9. At any level of usefulness, real-world statisticians with relevant experience need to be brought in early in the process. I have many years of experience in working with small datasets (insurance actuary), but I could easily have missed this.

    It’s critical to be able to start with as much raw data as possible and, taking a lesson from open source work, to make that raw data static and available. None of these moving averages for 1934, where by “moving average” I mean that the average for that year (and most years) has had many different values over time. Less than 5K of data per year per station for early years. About 50K per station per year if we have hourly data. 10,000 stations, 100 years: 50GB for the entire dataset, 500MB (less than one CD) for annual updates. The only raw fixes I’d add would be minus signs where the raw data clearly omitted them. One or two CDs for metadata, again static (any adjustments to prior data need date, person, method, reason, and do not modify any previous data, just give suggested/recommended adjustments to data).

    As you point out, their definition of homogenization screams to me “Don’t do that!”. If needed, do it late and separately, with full documentation (before, after, reasons, full computer code, including source, compiler/version, hardware, running time, enough to exactly reproduce results).
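The storage figures in comment 9 check out arithmetically. This short calculation simply restates the commenter’s own assumptions (50K bytes per station-year of hourly data, 10,000 stations, 100 years); it adds no new data.

```python
bytes_per_station_year = 50_000   # commenter's estimate for hourly data
stations = 10_000
years = 100

total_bytes = bytes_per_station_year * stations * years
annual_update_bytes = bytes_per_station_year * stations
readings_per_year = 24 * 365

print(total_bytes / 1e9, "GB for the full dataset")          # 50.0
print(annual_update_bytes / 1e6, "MB per annual update")    # 500.0
print(round(bytes_per_station_year / readings_per_year, 1),
      "bytes per hourly reading")                           # 5.7
```

About 5.7 bytes per hourly reading is a plausible budget for a compactly stored value plus a flag, so the estimate hangs together.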

  10. I don’t see how a temperature record can be made without using approximations which may be questionable.
    I think the key therefore is to identify these approximations/adjustments and make them transparent. Adjustments and approximations that significantly affect the record’s upward or downward trends need to be examined, e.g. extrapolation over large areas. I would expect most of them to balance out over time. The most significant problem I’m aware of is the “Urban Heat Island”; however, this is not a processing problem but a measurement problem.

  11. Speaking with my physicist cap on, rather than adjusting data, wouldn’t it be better to increase the error bars instead? I get a bad feeling when I see data being adjusted.

  12. “local outliers are replaced with the basic behavior of the local group”

    So Einstein’s ideas will be replaced with the basic averaged ideas of the Swiss people. Newton’s Laws will be replaced with the average thinking of people who sit under apple trees.

  13. I fear you are very much concentrating on the wrong problem. Even if you produce a statistically “perfect” method of analysis, the megasaurus in the dunny is that the raw local temperature experienced by the Met Stations has changed due to several factors including macro urban heating, micro changes to surfaces and land use around the station, and many stations were moved closer to buildings in the 1970s+ in order to facilitate automation.

    What we really need is for people to get off their seats in their ivory towers of academia, and go and visit each and every one of these sites. To characterise the site both now and historically in terms of their relationship to buildings and surfaces like tarmac. Only once you understand the micro-urbanisation changes can you really start considering macro changes like overall global temperature otherwise it is a totally meaningless figure.

    And as a lot of these changes were not documented, and many of those who are the primary source of information for the 1970s are either retiring or dying, unless someone gets off their backside, puts together the millions upon millions such a project will actually take and gets going … this information will be lost forever and we will never know what actually happened to global temperatures last century.

  14. I am astounded at the homogenisation suggestion. Any serious statistical analysis should start by looking at the data closely and querying anything suspicious. However, the way to deal with suspicious data points is either (after long consideration) to treat them as missing data or else to build a statistical model with observation error probabilities in it (i.e. not necessarily the usual Gaussian measurement noise, but different error distributions or even positive probabilities of measurements unrelated to the “true” values).
    Homogenisation is a major source of overconfidence in statistically bland outcomes.
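One way to build the “observation error probabilities” suggested above into a model is a two-component mixture: most readings come from a Normal distribution around the expected value, and a small fraction are gross errors spread evenly over a wide range. The sketch below, with hypothetical parameters throughout, computes the probability that each reading is a gross error. The reading itself is left unchanged; it is the model that accounts for it.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prob_gross_error(x, mu, sigma, eps, lo, hi):
    """Posterior probability that reading x came from the gross-error component
    of the mixture (1 - eps) * Normal(mu, sigma) + eps * Uniform(lo, hi)."""
    if not lo <= x <= hi:
        raise ValueError("x must lie inside the gross-error range [lo, hi]")
    good = (1 - eps) * normal_pdf(x, mu, sigma)
    bad = eps / (hi - lo)
    return bad / (good + bad)

# Hypothetical parameters: expected value 20 C, spread 1 C,
# 1% of readings are gross errors spread evenly over -50..60 C.
for x in (20.0, 23.0, 35.0):
    p = prob_gross_error(x, mu=20.0, sigma=1.0, eps=0.01, lo=-50.0, hi=60.0)
    print(f"reading {x:5.1f} C: P(gross error) = {p:.4f}")
```

The output is graded rather than yes/no, so a borderline reading simply carries less weight instead of being replaced by a neighbourhood value.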

  15. I agree totally with you, Willis, BUT I do want the raw data plotted as well as the processed. I hate adjustments to data, any data. It ‘feels’, well, just completely wrong.

  16. This research relies on the assumption that temperature is a good measure of AGW. It is not.

    All the warmist assumptions are based on the idea of equilibrium of energy exchange. The planet does not achieve equilibrium with temperature which is one reason why climates change, probably the main reason.

    Atmospheric temperatures will also change far more quickly than those of the oceans which is why ocean temperatures are far more important an indicator of heat exchanges.

  17. But will their work be open to the public? Can we have the code and data? Can we therefore replace some algorithms?

  18. What mechanisms might introduce damaging systematic errors, instead of relatively benign random errors, if outliers are left in, tossed out or massaged in? Or is it more a matter of the error bars being inaccurate rather than the trend being off? Knowledge of error types seems important here, versus placing too much reliance on blind statistics that conform to some lofty idea of elegance.

    Also of great concern to me, since I like to harp on very old single-site records, is how to avoid lazily cutting the global average off before 1880 like GISS does! There are three USA records that are fully continuous back to around 1820, and about a dozen in Europe back to the 1700s. There are many dozens that carry back to 1830 instead of just 1880. Extending the slope back another half century would help resolve the hockey stick debate.

  19. As soon as I saw the third goal in the piece I had the same concern as you, Willis. BEST (or anyone else) can’t just make the assumption that an outlier is an error and ‘homogenize’ it. It may be an error, it may be due to a microclimate, a very localised weather phenomenon (e.g. a fog-prone valley), etc.

    I’m afraid this stuff can’t just be waved away, as it’s the way these very differences over small areas have been handled that have caused quite a few of the issues with the existing global temperature sets (Eureka as an example, perhaps). It’s going to take a heck of a lot of hard yards, but that’s what must be done in order to get it ‘right’. Blanket processing algorithms won’t wash – it’ll take human eyes to look at and understand every significant local variation before a more generalised rule can be written to handle them automatically.

    Yes, it’s an enormous amount of work for a lot of people, but nobody ever said this was easy. There are no short-cuts.

  20. ctm Crosspatch makes a good point.
    For our international audience, here is an example that I have looked into recently. Rainfall is one of the principal climate parameters. Oxford and Cambridge, two UK university cities, are only 65 miles (~105 km) apart, and their geographical features are very similar. It is likely that they have records as accurate as you can find anywhere in the world. Yet not only is there a considerable difference in the amount of rainfall, but even the trends are different for two places not so far apart. http://www.vukcevic.talktalk.net/Oxbridge.htm

    Reply: I wasn’t disagreeing with his point. He lives in my area and was using examples I thought might just be a bit obscure. ~ ctm

  21. Any high school student can tell you that the “average” is only an acceptable measure of central tendency for a normal distribution. If it’s non-normal then the median or mode is appropriate. Or one can do a Box-Cox analysis on the distribution and apply a transform to make it normal. Why do learned scientists keep forgetting this fact?

  22. Willis Eschenbach says:
    March 23, 2011 at 1:13 am
    BigWaveDave says:
    March 23, 2011 at 12:53 am

    What good is temperature data without corresponding pressure and humidity?

    What good is steak and eggs without beer? It’s not a full meal by any means, but it’s better than nothing.

    Yes, it would be nice to have pressure and humidity as well … but let’s take them one piece at a time, get that piece right as best we know how, and then move on to the next piece.

    Thanks,

    w.

    Hello Willis,
    I think that you missed the point. Temperature is not a measure of energy yet it is an increase in ‘trapped’ energy due to GHG that is being claimed to have changed the Earth’s energy budget.
    The amount of energy needed to raise the temperature of dry polar air one degree centigrade is significantly less (eighty times less?) than the amount of energy required to raise humid tropical air one degree centigrade. This is due to the enthalpy of humid air.
    see http://www.engineeringtoolbox.com/enthalpy-moist-air-d_683.html

    When the polar vortices expand, as they have done recently with the equatorward movements of the jetstreams, the atmospheric humidity balance alters. If temperatures are just averaged without regard to the humidity changes then for the same amount of energy there can be a significant rise in temperature.

    It would really be a good idea to decide if the intent is to measure energy budget or not.

    Then when you are measuring the correct variable, you can have concerns on whether a cool site in a valley on the northern side of the mountains is ‘homogenized’ because the 3 sites 50 miles away on the southern side of the mountains are so much warmer, and then the outliers in their temperatures removed to make the temperature pattern look nice and Gaussian.

  23. I agree with Willis re homogenisation and outliers. Where I live in NSW Australia, a few degrees shift in wind direction is the difference between cold from the Snowy Mountains and stinking hot from the interior. Temperatures can vary wildly over short distances.
    I looked up stations near Moss Vale on the Bureau of Meteorology website http://www.bom.gov.au, and there are three of them with some temperatures for January 2011 that are NOT marked “Not quality controlled or uncertain, or precise date unknown”. Their temperatures for Jan 3, 7-10, 13, 16 (that’s all the Jan days they have in common) are as follows:
    MOSS VALE (HOSKINS STREET) 26.8, 28.0, 31.5, 32.0, 25.6, 21.0, 23.6
    ALBION PARK (WOLLONGONG AIRPORT) (38km away) 21.0, 27.0, 27.0, 27.0, 27.0, 25.0, 28.0
    KIAMA BOWLING CLUB (46km away) 28.0, 35.0, 32.0, 24.0, 23.5, 24.0, 22.6
    The 3 stations are in a nearly straight line, so Albion Park is very close to Kiama and at similar altitude (Moss Vale is about 2000ft higher).
    I would contend that the 3 stations bear little relationship to each other, and would be useless for estimating their neighbours’ temperatures.
    For example, from Jan 7-9, the temperature changes at the 3 stations were +3.5/+0.5, 0.0/0.0, -3.0/-8.0 resp. A 15 deg C difference over just 2 days!

    These 3 stations are in populated areas in a developed country, yet there are large gaps in the data, which itself looks pretty suspect in places.

    I reckon BEST have a difficult job ahead of them, but I would argue strongly in favour of –
    1. – not making up any data
    2. – not dropping outliers, and
    3. – reflecting missing or dodgy-looking data in uncertainty ranges.
    #1 and #3 seem pretty straightforward. I think #2 is reasonable, because genuine outliers can occur, so dropping outliers is little different to making data up. If the BEST team think they have some dodgy outliers, surely the best (no pun intended) thing to do is to increase the uncertainty ranges at those points.
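The figures quoted in comment 23 for the three New South Wales stations can be checked directly. This sketch recomputes the Jan 7 to Jan 9 day-to-day changes and the pairwise Pearson correlations from the seven shared days; with so few values, any correlation it prints is very poorly constrained.

```python
# Readings copied from the comment, for Jan 3, 7, 8, 9, 10, 13, 16 (deg C).
stations = {
    "Moss Vale":   [26.8, 28.0, 31.5, 32.0, 25.6, 21.0, 23.6],
    "Albion Park": [21.0, 27.0, 27.0, 27.0, 27.0, 25.0, 28.0],
    "Kiama":       [28.0, 35.0, 32.0, 24.0, 23.5, 24.0, 22.6],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Day-to-day changes Jan 7 -> Jan 8 -> Jan 9 (list positions 1, 2, 3).
changes = {
    name: [round(t[i + 1] - t[i], 1) for i in (1, 2)]
    for name, t in stations.items()
}
print(changes)
# → {'Moss Vale': [3.5, 0.5], 'Albion Park': [0.0, 0.0], 'Kiama': [-3.0, -8.0]}

names = list(stations)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson(stations[a], stations[b]):+.2f}")
```

The changes match the comment: Moss Vale rose 4.0 C over the two days while Kiama fell 11.0 C, the 15 C divergence it describes.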

  24. Agree with John Kehr regarding hourly temp readings. These should be included where available as a genuine way of reducing the impact of erroneous outliers. Sure, not every station can provide these, but it shouldn’t be difficult to provide separate results for all stations, for max/min-only stations and for hourly stations, to see the difference it makes (if any).

  25. I’m in agreement with Crosspatch, even though I have no knowledge of the areas he is discussing. The geographical area I am most familiar with, the Rodney district in New Zealand, has a temperature profile that looks absolutely bizarre to those unfamiliar with it. It is coastal, has an enormous but shallow harbour with the longest harbour shoreline in the world, lies on a very narrow and often rugged tongue of land and has an unusual number of vastly differing microclimates within a very small geographical area. Attempting to draw any conclusions from temps from adjacent but differing microclimates would seem to be a way to destroy any meaning in the records. As an example, the entire area is in the ‘subtropical’ climate, yet in some microclimates, morning frosts sufficiently severe to freeze exposed water pipes are commonplace, while other microclimates ‘just over the hill’ are entirely frost-free.
    Perhaps I am missing something due to my own ignorance, but I tend to agree with Willis in regard to homogenising of data and in removing outliers. If those outliers are an accurate record of temperature as it happened, removing them is rendering the data incorrect. While I see the BEST initiative as an invaluable exercise, I would be even happier if the same effort was made to ensure high quality data is taken, free from contamination.
    I guess I am still a suspicious country boy at heart, as I tend to distrust extreme cleverness with mathematics and statistics employed to ‘get a result’ from situations and equipment that are influenced by factors other than those we are attempting to measure. I do not understand the rush to get the BEST thing done, as the world and its climate will be here for a while yet doing its thing, whatever that is.

  26. “If we find data we know to be bad, throw it out. Don’t just replace it with some imaginary number that you think is somehow more homogeneous.”

    …and split the series. Allow short series (much shorter than a year) to enable data to be tossed without creating a hole in the series that needs to be filled by imaginary data.

    Then the algorithm can be run and rerun with varying criteria on discarding data.

  27. @Willis : “Instead, if I found such an actual anomaly or incorrect data point, I’d just throw out the bad data point, and break the original temperature record in two at that point, and consider it as two different records. Why average it with anything at all? That’s introducing extraneous information into a pristine dataset, what’s the point of that?”

    Ahhh yup. Man, that’ll teach me to post mid article…

  28. Actually, if you want to get a real feel for what has happened with real global temps since, let’s say, 1880, look at unadjusted raw RURAL data only, and I mean from sites that are STILL rural. I don’t think I have seen ONE anywhere showing any significant warming; has anyone here? In any case it does not matter what BEST come up with in the end; it is the trend over, say, the next 10 years that will count (yes, even with urban stations), because since 2002 it is already FLAT! i.e. no extra warming as predicted

  29. As one other poster, above, noted, averages can be deceiving. There is another poster on here who has an excellent analysis of Canadian temperatures. His graphs and analysis show that maximum daily highs (as recorded by land-based instruments) have either not been increasing or have been decreasing over a number of decades, while the daily minimums have been increasing. By averaging the temperatures, it appears that Canada is experiencing a warming trend. But this sort of warming trend (where only the daily minimums increase) is actually very good for many reasons. With no increase in daily maximums, is there any harm being felt?

    So what I am asking is whether BEST could show both the trend in daily maximums and the trend in daily minimums, rather than the change in the average temperatures.
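Reporting maximum and minimum trends separately is straightforward. The series below are synthetic (flat maxima, minima rising 0.2 C/decade) and only illustrate how a rise in the mean can come entirely from the minima.

```python
def ols_slope(t, y):
    """Ordinary least squares slope of y against t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den

years = list(range(1960, 2011))
tmax = [15.0 for _ in years]                        # daily maxima: no trend
tmin = [5.0 + 0.02 * (yr - 1960) for yr in years]   # daily minima: +0.2 C/decade
tmean = [(hi + lo) / 2 for hi, lo in zip(tmax, tmin)]

for label, series in (("Tmax", tmax), ("Tmin", tmin), ("Tmean", tmean)):
    print(f"{label}: {ols_slope(years, series) * 10:+.2f} C/decade")
```

Because the slope is linear in the data, the mean’s trend is always exactly the average of the max and min trends, so publishing only the mean throws away which of the two is moving.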

  30. I agree with your perspective Willis. They need to keep the real distribution in the data. One can explain and eliminate an outlier, or leave it alone, but don’t change it to something else, include it, and pretend you have the same representation of the data. I am not confident about the choice of confidence limits, those are contingent on the type of data distribution, and people usually default to Gaussian which may not represent nature, as you point out.

  31. Great analysis, Willis. I am with you all the way. And I go farther than you. Climate is local. That is apparent to lovers of the outdoors. For example, the coastline of Florida is considerably cooler than inland areas and the wind is always blowing on the coastline. But the coastline is a permanent feature. Surely it qualifies as climate not weather. Yet how many times will the coastline show up as an outlier in an inland cell? Seems to me that the cell based approach is seriously flawed. Why use cells? If using cells, why make them larger than one mile by one mile? This is the age of computers, after all. Data management has benefited so much from computers that a finer mesh of data really seems required at this time.

  32. I’m kind of new to this stuff. Yes, the homogenization thing jumped out at me also. Some questions:

    1. Is homogenization a technique for getting the final data point? Or is it merely a screen to minimize the number of data points that need to be manually examined for plausibility?

    2. When homogenization takes place, are they working with actual temperatures, or “anomaly (first difference?) temperatures”

    3. Has any of the input data been OCRed from old records? If so, there really should be a step somewhere to try to detect misreads, e.g. 3s read as 8s, or 5s as 6s (the two are virtually indistinguishable in at least one font; I forget which). Some OCR errors are pretty much undetectable even by humans, but errors in the leading digit can be pretty blatant.

    3a. Has handwriting recognition technology been used on any of the input data? It has probably improved some since I worked with it tangentially two decades ago, but it was pretty iffy back then (Don’t confuse real time recognition where stroke order information is available with trying to identify the result without stroke order info. The latter is much harder and would be what is needed here).

    3b. Is there any provision for flagging records with a high detected error rate as being doubtful?

    3c. I know that editing the data seems to be traditional in “climate science”, but if that is really the case, might not this be a good time to break with tradition? Is it feasible to pass through all the data and flag the values that screening says are doubtful so that the end analyst can choose to use them, tweak them, or reject them?

    4. The output of this effort is what? A single global temperature? A cleaned up temperature set from which “global temperatures” can be computed?
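
    Items 3 and 3b above can be made concrete with a small screening sketch. It assumes a daily series in which each value is compared with the median of its immediate neighbours, plus a hypothetical table of OCR-confusable leading digits (3/8 and 5/6, as suggested above). In keeping with item 3c, it only flags a value together with a plausible alternative reading; it changes nothing. The confusion table, `window` and `tolerance` are illustrative assumptions, not anything BEST has described.

```python
from statistics import median

# Hypothetical table of leading digits that OCR commonly confuses (assumption).
CONFUSABLE = {"3": "8", "8": "3", "5": "6", "6": "5"}

def leading_digit_variants(value: float) -> list[float]:
    """Return the value(s) obtained by swapping the leading digit for a confusable one."""
    sign = "-" if value < 0 else ""
    text = f"{abs(value):.1f}"
    first = text[0]
    if first not in CONFUSABLE:
        return []
    return [float(sign + CONFUSABLE[first] + text[1:])]

def flag_possible_misreads(series: list[float], window: int = 2,
                           tolerance: float = 3.0) -> list[dict]:
    """Flag (but do not alter) values that a leading-digit swap would bring close to their neighbours."""
    flags = []
    for i, value in enumerate(series):
        neighbours = series[max(0, i - window):i] + series[i + 1:i + 1 + window]
        if not neighbours:
            continue
        local = median(neighbours)
        if abs(value - local) <= tolerance:
            continue  # the value already fits its neighbours
        for candidate in leading_digit_variants(value):
            if abs(candidate - local) <= tolerance:
                flags.append({"index": i, "recorded": value, "plausible_reading": candidate})
    return flags

# Invented summer maxima in deg C; 81.9 looks like a misread 31.9.
daily_max = [31.4, 32.0, 31.7, 81.9, 32.3, 31.8]
print(flag_possible_misreads(daily_max))
```

    A value that fails the neighbour test but has no confusable reading is simply left alone here; it would need a different check, or a human.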

  33. I don’t think the temperature sets should be looking at max and min data; they should be looking at diurnal variance, plotted against local cloud cover. Everyone knows anecdotally that with a clear day and a cloudy night, far less heat is lost than with a clear night. If AGW is true, what it should be proving is that the diurnal variation is lessening as night-time heat loss decreases. I have a big problem with the black body theory: part of the world is absorbing heat while part of the world is losing heat all the time, and each part of the world is different in terms of land/water/biomass coverage. The kind of simplistic modelling proposed is still way, way off where we need to be; until we get fractal-based systems we’re a long way off.

  34. There is another problem in homogenising with “neighbouring” stations such as Straßburg (France) and Karlsruhe (Germany), which are some 70 km apart. The French compute the daily mean temperature as the US does, (Tmax + Tmin)/2, while in Germany it used to be (T0730 + T1430 + 2*T2130)/4 and is now the mean of 24 hourly measurements (T0730 being the temperature at 7:30 CET). There is a difference of up to 1 K between the French and the German mean daily temperatures. Any homogenisation would introduce a bias.
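
    How large the gap is depends on the shape of the diurnal cycle. A minimal sketch, assuming a purely synthetic, asymmetric daily temperature curve (not real Straßburg or Karlsruhe data), applies the three conventions described above to the same day: (Tmax + Tmin)/2 with max and min taken from one-minute samples, the (T0730 + T1430 + 2*T2130)/4 formula, and the mean of 24 hourly readings. The curve and its coefficients are illustrative assumptions.

```python
import math

def synthetic_temp(hour: float) -> float:
    """Hypothetical asymmetric diurnal cycle in deg C (illustrative only, not station data)."""
    x = 2 * math.pi * (hour - 9) / 24
    return 10 + 6 * math.sin(x) + 0.5 * math.cos(2 * x)

def mean_min_max(temp) -> float:
    # US/French convention: (Tmax + Tmin) / 2, with extremes found from one-minute samples
    samples = [temp(m / 60) for m in range(24 * 60)]
    return (max(samples) + min(samples)) / 2

def mean_mannheim(temp) -> float:
    # Older German convention: (T0730 + T1430 + 2 * T2130) / 4
    return (temp(7.5) + temp(14.5) + 2 * temp(21.5)) / 4

def mean_hourly(temp) -> float:
    # Current German convention: mean of 24 hourly readings
    return sum(temp(h) for h in range(24)) / 24

results = {
    "min/max": mean_min_max(synthetic_temp),
    "3-reading": mean_mannheim(synthetic_temp),
    "24-hourly": mean_hourly(synthetic_temp),
}
for name, value in results.items():
    print(f"{name:>10}: {value:.2f} C")
```

    On this invented curve the three "daily means" of the very same temperatures differ by more than a degree, which is the kind of convention-driven offset that neighbour-based homogenisation would mistake for a station problem.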

  35. Agree wholeheartedly with the article. In trying to arrive at my own “scientifically justifiable” approach to analyzing global temperature, I saw “homogenization” (for want of a better word) and “data creation” as the “Achilles heel” of the methodologies already being used.

    Honestly, with no attempt to discredit BEST so early in the game, I think “homogenization” represents a “shallowness” of thought or understanding regarding the actual physics, measurement techniques, geographical, and statistical effects bearing on “global temperature change estimation”. What is needed is “deep” thought on this topic.

    I had high hopes that BEST would provide a universally “acceptable” product. Hopefully, BEST will reconsider this aspect.

  36. Assuming they will not have the resources or time to dive into each station and its individual history, how about chunking the surface into geographical regions based on the best available inventory of micro-climate and “medium-climate” types? Then, generally, treat those regions as independent islands.

    Also, I think that they should do runs that:

    a) leave out micro/medium climate areas that don’t have station coverage;
    b) include those uncovered areas based on some stated, rational method;
    c) cover land only;
    d) cover ocean only;
    e) lastly, cover the globe.

  37. I would really like to see a straight “binning” analysis which defines some metric of warming, *at a particular site*, and then allows one to flexibly bin the data. This way we could ask questions like “what percentage of sites at 30-65 deg latitude have warmed?”, and “what percentage of rural stations have warmed vs. urban stations”.

    I’d also like to see the definition of “warming” be flexible. For example one could select traditional temp anomaly, or just average temp for the year, or compare monthly average year over year, or use absolute max temp for the year, absolute min temp, average of monthly max temp, average of monthly min temp – etc…
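
    A minimal sketch of this binning idea, assuming a hypothetical list of stations that each carry a latitude, a rural/urban label and annual mean temperatures (all values invented). "Warming" is passed in as a function, defaulting here to a positive least-squares trend, so any of the definitions suggested above could be swapped in.

```python
from dataclasses import dataclass

@dataclass
class Station:
    name: str                  # hypothetical identifier
    latitude: float            # degrees north
    rural: bool
    years: list[int]
    annual_mean: list[float]   # deg C, invented values

def ols_slope(xs, ys) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def warmed_by_trend(s: Station) -> bool:
    # One possible per-site definition of "warming": a positive trend in annual means.
    return ols_slope(s.years, s.annual_mean) > 0

def percent_warmed(stations, in_bin, warmed=warmed_by_trend) -> float:
    """Percentage of stations selected by `in_bin` that pass the `warmed` test."""
    selected = [s for s in stations if in_bin(s)]
    if not selected:
        return float("nan")
    return 100.0 * sum(warmed(s) for s in selected) / len(selected)

yrs = [2000, 2001, 2002, 2003]
stations = [
    Station("A", 35.0, True, yrs, [14.1, 14.0, 14.2, 14.3]),
    Station("B", 48.0, True, yrs, [9.5, 9.4, 9.3, 9.4]),
    Station("C", 52.0, False, yrs, [10.2, 10.4, 10.5, 10.7]),
    Station("D", 10.0, False, yrs, [26.0, 26.1, 25.9, 26.2]),
]

print("30-65 deg N:", percent_warmed(stations, lambda s: 30 <= s.latitude <= 65))
print("rural      :", percent_warmed(stations, lambda s: s.rural))
print("urban      :", percent_warmed(stations, lambda s: not s.rural))
```

    Because the per-site metric and the bin are both plain functions, asking "what fraction of rural sites had a rising annual maximum?" only means writing a different `warmed` function.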

  38. Willis – I agree with you completely.
    In particular, I am very much against homogenization.

    I know that you are aware of the Australian Bureau of Meteorology “High Quality” datasets, which produce, via very sophisticated statistics, a very sharply rising Australian temperature map.
    This has been built on top of raw data that has very little trend, except for some UHI here and there.

    Please forward your criticisms to BEST if you haven’t yet done so, as by and large, they seem to be trying to produce an honest series.

  39. If BEST can get the method right then eventually the BOM can be induced to follow their lead.

    BOM have for too long followed the siren call of the IPCC.
    Time to change ships before it’s too late.

  40. Thanks for a useful post, Willis – I admit to (customary) laziness in not checking through BEST’s campaign plan, and share the general unease with “normalisation”.

    I think Stephen Richards comes closest to my immediate reaction. If they are proposing to release a “massaged” dataset, then they should also release (a) all the raw data exactly as received, with just an indication of where cuts into data packets have been made, and (b) full information on what changes have been made where, and why.

    In short, all the original data and details which the Hockey Team prefer to keep under the carpet, so that all the information is freely available for inspection and discussion by anyone. Not just plots, either – the original figures are the foundation of the entire structure built on them, so should be there “in person” in great chunks of CSV or whatever for complete access.

    Like that uoguelph.ca link above, which ends with:
    DATA and CODE:
    * The programs use (whatever software, algorithms etc.)
    * The zipped archive is (link) here.
    – like it oughta.

  41. After reading Anthony, Willis, and most of the comments, it is very hard to see why there is so much excitement about the scientific nature of this project. If valid, unadjusted raw data in various regions has not been identified (and hasn’t some of it been destroyed?), how can any analysis of that data be anything but worthless? And homogenization, come on. “Global climate temperature” is not possible, but measuring overall warming and cooling in the various regions does seem doable. Why sign on to this project?

  42. The homogenization would also seem to obscure the noise inherent in the data. If 10 nearby stations all have small DC offsets (say, up to 0.5C), you just can’t arbitrarily remove those, find the standard deviation (assuming a Gaussian distribution of errors, which just can’t be right), and then quote some absurdly small uncertainty in the temperatures. And you can’t then average a thousand such areas and quote an even smaller uncertainty.

    There has to be some, ahem, “robust” accounting for the fact that you must have significant uncertainty in finding “local” average temps, and that doesn’t really become smaller like sqrt(number of stations), since the errors are a combination of many sources, some non-random and non-Gaussian. The homogenization obscures this.

    Jeff
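
    A minimal Monte Carlo sketch of this point. It assumes each station's error is an independent random part plus a systematic part shared by every station in the area (for example a common change of instruments or observing practice). The scatter between stations only reveals the independent part, so a sqrt(N) rule built from that scatter understates the real error of the average, which levels off at the shared part. The two standard deviations are arbitrary illustrative values, and Gaussian noise is used only for convenience; as the comment says, real errors need not be Gaussian.

```python
import random
import statistics

def rms_error_of_mean(n_stations: int, shared_sd: float, independent_sd: float,
                      trials: int = 2000, seed: int = 1) -> float:
    """RMS error of an n-station average when every station also shares one systematic error."""
    rng = random.Random(seed)
    squared = 0.0
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd)  # same offset for every station in this trial
        mean_error = statistics.fmean(
            shared + rng.gauss(0.0, independent_sd) for _ in range(n_stations))
        squared += mean_error ** 2
    return (squared / trials) ** 0.5

SHARED_SD, INDEPENDENT_SD = 0.3, 0.5  # deg C, illustrative assumptions
for n in (1, 4, 16, 64, 256):
    simulated = rms_error_of_mean(n, SHARED_SD, INDEPENDENT_SD)
    naive = INDEPENDENT_SD / n ** 0.5  # what the between-station scatter alone suggests
    expected = (SHARED_SD ** 2 + INDEPENDENT_SD ** 2 / n) ** 0.5
    print(f"N={n:4d}  simulated {simulated:.3f}  sqrt(N) rule {naive:.3f}  theory {expected:.3f}")
```

    However many stations are added, the simulated error stays near the shared 0.3 C, while the sqrt(N) figure keeps shrinking.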

  43. I worked for many years in acoustics and phonetics research, building electronic gadgets and writing software to perform experiments.

    I know one thing: In any honest scientific field, you don’t “homogenize” anything. When one subject shows responses that indicate he can’t do the task, or misunderstands the instructions, you don’t try to “pull” his responses toward other similar subjects. You toss him out. If you have to toss so many subjects that your total data set ends up unusable, you toss out the whole experiment and try something different.

  44. I question how good any homogenization technique will be at detecting errors that creep in over decades, such as microsite contamination from growing plants or urban encroachment.

  45. If the nearby city is big enough, encroaching urbanization could easily affect almost all of the sensors in a given region. In such a case, would homogenization adjust the one or two sensors that aren’t being encroached to better match those that are?

  46. Willis,

    I live near the water on Georgian Bay, Ontario, Canada.
    We have a 50′ hill all around this area of shore line.
    The other day, when I was driving home, it was 4 degrees C colder at the bottom of the hill than at the top.

    Ask me if I trust temperatures on a global scale…NOT!

  47. Is it reasonable to do the homogenization as the last step and then show the with/without delta, so that we might appreciate its effect? What if the effect is insignificant? And if it is significant, then the “outliers” really must be identified and qualified, mustn’t they?
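
    A minimal sketch of that with/without report. The "homogenization" here is a deliberately simple, hypothetical stand-in (any value more than a threshold away from the neighbour median is replaced by that median); it is not BEST's algorithm. The point is only the reporting step: compute the trend both ways and show the difference, along with which values were touched. All numbers are invented.

```python
from statistics import median

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def naive_homogenize(target, neighbours, threshold=1.0):
    """Hypothetical stand-in: replace target values more than `threshold` from the neighbour median."""
    adjusted, replaced = [], []
    for i, value in enumerate(target):
        local = median([series[i] for series in neighbours])
        if abs(value - local) > threshold:
            adjusted.append(local)
            replaced.append(i)
        else:
            adjusted.append(value)
    return adjusted, replaced

years = list(range(2000, 2006))
target = [10.0, 10.1, 12.0, 10.2, 10.1, 10.3]  # 2002: a real local event, or an error?
neighbours = [
    [9.8, 9.9, 10.0, 10.0, 10.1, 10.2],
    [10.1, 10.0, 10.2, 10.3, 10.2, 10.4],
    [9.9, 10.0, 10.1, 10.1, 10.2, 10.3],
]

adjusted, replaced = naive_homogenize(target, neighbours)
raw_trend, adj_trend = ols_slope(years, target), ols_slope(years, adjusted)
print(f"replaced indices : {replaced}")
print(f"trend, raw       : {raw_trend * 10:+.3f} C/decade")
print(f"trend, adjusted  : {adj_trend * 10:+.3f} C/decade")
print(f"delta            : {(adj_trend - raw_trend) * 10:+.3f} C/decade")
```

    In this toy case replacing one value flips the sign of the short trend, which is exactly why the delta and the list of replaced values are worth publishing.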

  48. Data are. They are readings taken from instruments. Good data are data taken from properly selected, properly calibrated, properly sited, properly installed, properly maintained instruments. All other data are bad data, for whatever reasons. Missing data are simply missing.

    Bad data cannot be converted into good “data” by gridding, infilling, adjusting, homogenizing, pasteurizing, folding, bending, spindling or mutilating the bad data. Missing data cannot be magically, mystically “appeared” any more than good data should be “disappeared”.

    Data which has been adjusted is no longer data. I don’t know what it is, only what it is not. In this context, however, it is typically referred to as a “global temperature record”.

  49. We have a number of “Oregon’s Freezer” areas here, known for huge ranges between the night time low and the day time high. This pattern is not unusual at all, occurring rather frequently in long term records. These outliers are simply part of the climate we experience. Marginal desert plains high in altitude relative to the surrounding country side exhibit typical desert characteristics but in a much more random “outlier” fashion. Other similar landscapes in counties and states in the US experience similar odd lows and highs within a 24 hr period. To treat such occurrences as outliers seems to me an effort to negate climate patterns entirely natural for the area. Why do that?

    I hope these folks are meteorologists or at least have a good in-the-field meteorologist on their staff so they have some understanding of weather pattern variation based on topographical climate sub-zones within the major climate zones long known among weather men and women. If they rely on computer machination to “take care of outliers”, we are back to square one with yet another piece of junk science.

  50. I would prefer that all outliers be flagged rather than removed. Coding is essentially as easy this way, it just takes a little more data storage. Given the small size of the dataset, this is not a burden IMO.

    Flagged outliers (preferably with the reason they are considered outliers) would make it very easy and straightforward for anybody to investigate the nature of the outliers and express opinions (conclusions) about whether the exclusions are optimal or whether the algorithms should be modified. Processing errors begin to stand out. It would also make it far more straightforward for somebody to modify later processing, such as replacing homogenization with an algorithm for creating separate data sets, as AW suggests. Early processing can be accepted, modified, or rejected.

    When young, I helped develop processing for a satellite data set. I (and others) fought to keep all data even in the final data product. Data considered “bad” were simply marked as bad with a bit-coded reason for it. This approach led to quick recognition and correction of a coding error where data that should have been marked as bad weren’t. It also allowed researchers to investigate these points (one person’s outliers are another’s focus of investigation).

    The counter arguments given at that time were “That it was a burden for data users to have to check the flags” and “It might lead to some researchers naively including bad data in their analyses leading to bad results” and “We’re the experts, people ought to trust our judgment of what’s bad and what’s not”. While there’s some validity to these arguments, I think the benefits outweigh any extra costs.
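
    A minimal sketch of the keep-everything, bit-coded approach described above, using Python's `enum.IntFlag`. The three reasons and their thresholds are hypothetical. Every value stays in the dataset; the user decides which flags to exclude, and that choice is explicit and reversible.

```python
from enum import IntFlag

class QC(IntFlag):
    """Hypothetical bit-coded quality-control reasons."""
    OK = 0
    OUT_OF_RANGE = 1   # physically implausible value
    SPIKE = 2          # large jump from the previous reading
    DUPLICATE = 4      # identical to the previous reading

def flag_series(values, low=-90.0, high=60.0, max_jump=15.0):
    """Return one QC flag per value; nothing is removed or altered."""
    flags = []
    for i, v in enumerate(values):
        f = QC.OK
        if not (low <= v <= high):
            f |= QC.OUT_OF_RANGE
        if i > 0 and abs(v - values[i - 1]) > max_jump:
            f |= QC.SPIKE
        if i > 0 and v == values[i - 1]:
            f |= QC.DUPLICATE
        flags.append(f)
    return flags

def select(values, flags, exclude=QC.OUT_OF_RANGE | QC.SPIKE):
    """The user's choice: keep values carrying none of the excluded reasons."""
    return [v for v, f in zip(values, flags) if not (f & exclude)]

values = [12.1, 12.4, 88.0, 12.4, 12.4, 13.0]
flags = flag_series(values)
for v, f in zip(values, flags):
    reasons = [m.name for m in QC if m and m & f] or ["OK"]
    print(f"{v:6.1f}  flags={int(f)}  {', '.join(reasons)}")
print(select(values, flags))
```

    Note that the good reading right after the bad 88.0 is also flagged as a spike. Because nothing was deleted, that side effect is visible and the rule can be revisited, which is the benefit argued for above.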

  51. Willis Eschenbach says:

    2) Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may dependant on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms seek to eliminate explicit gridding entirely.

    The problem of change in grid area with latitude can be avoided with a different geographical reference system. See: http://www.neubert.net II – Platonic Spheres – Octahedron’s tesselation.

    A. N. Ditchfield
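
    The latitude effect mentioned in goal 2 is easy to quantify. On a sphere of radius R, a cell spanning a longitude width Δλ (in radians) between latitudes φ1 and φ2 has area R²·Δλ·(sin φ2 − sin φ1). A minimal sketch, assuming a spherical Earth of mean radius 6371 km and a conventional 5°×5° latitude-longitude grid (the octahedral tessellation on the linked page is not reproduced here):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption

def cell_area_km2(lat_south_deg: float, lat_north_deg: float, dlon_deg: float) -> float:
    """Area of a latitude-longitude cell on a sphere."""
    phi1, phi2 = math.radians(lat_south_deg), math.radians(lat_north_deg)
    return EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg) * (math.sin(phi2) - math.sin(phi1))

equator = cell_area_km2(0, 5, 5)
polar = cell_area_km2(85, 90, 5)
print(f"5x5 cell at the equator : {equator:,.0f} km^2")
print(f"5x5 cell at the pole    : {polar:,.0f} km^2")
print(f"ratio                   : {equator / polar:.1f}")
```

    A polar 5°×5° cell covers roughly one twenty-third of the area of an equatorial one, so an unweighted average of cells over-represents high latitudes; equal-area tessellations avoid this by construction.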

  52. Joel Heinrich>

    I think you’ve put your finger on an important point: homogenisation is not something reasonable to do for temperatures, but it is reasonable for temperature _measurements_.

  53. Thanks, Willis, great post.

    crosspatch is absolutely correct, we can drive 30 km and get a whole different climate, much less a different temperature. No doubt this is true in other parts of the world as well. The idea that 1200 km represents some kind of reasonable number seems purely arbitrary, as well as way beyond what we in fact know to be the case in the real world.

    Homogenization, by definition, will destroy information. By definition, it introduces an a priori, preconceived artifact into the record of what we think the temperature *should* look like. You cannot get away from that and there is absolutely no reason to go down this path, so why do it? Does it make obtaining the average temperature easier? Sure. Because we can sit behind a desk and just do calculations and statistics, rather than getting out in the field and actually looking at the site in question to see if there is a perfectly legitimate reason it is an “outlier”.

    I’m hopeful that BEST will be an improvement over existing approaches, but this whole concept of homogenization is one of the first things that has to go.

  54. What’s really missing is a sensitivity analysis of the points you bring up, not a judgment of whether they are valid or not.
    For example, gridded data is not a problem if used properly, so once you have your data set, perform a sensitivity analysis on grid size in over- and under-sampled areas. What does it do to the final answer? You can determine the maximum allowable grid size and go with that.
    Same for outliers: what happens to the output as the outliers are progressively removed from the input?
    Methods that show a high sensitivity to slight changes should be investigated more carefully. I suspect that if this were done, people would find the earth’s temperature cannot be determined to within 0.1 deg.
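
    A minimal sketch of the outlier half of that sensitivity analysis, assuming one simple rule (drop values more than k standard deviations from the full-sample mean) and invented readings. Only the mean is tracked here; a trend or regional average would be tracked the same way. If the answer moves as k changes, the result depends on the rule as much as on the data.

```python
import statistics

def mean_after_trimming(values, k):
    """Mean (and count) of values within k standard deviations of the full-sample mean; k=None keeps all."""
    if k is None:
        return statistics.fmean(values), len(values)
    centre = statistics.fmean(values)
    spread = statistics.stdev(values)
    kept = [v for v in values if abs(v - centre) <= k * spread]
    if not kept:
        raise ValueError("threshold removed every value")
    return statistics.fmean(kept), len(kept)

readings = [14.2, 13.9, 14.5, 14.1, 17.8, 14.0, 13.6, 14.3, 11.2, 14.4]  # invented
for k in (None, 3.0, 2.5, 2.0, 1.5, 1.0):
    mean, n = mean_after_trimming(readings, k)
    label = "none" if k is None else f"{k:.1f} sd"
    print(f"threshold {label:>7}: kept {n:2d}, mean {mean:.2f}")
```

    Here the mean goes from 14.20 with everything kept, to 13.80 when only the high value is dropped, and back up to about 14.13 when the low value goes too: a 0.4 degree swing from the choice of threshold alone.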

  55. The Surface Stations project shows that most of the data being collected in the US currently is bad data. This demonstrates either a lack of seriousness regarding the data collection process, or a desire to assure continued collection of a “loosey goosey” data set which can be subjected to massive adjustments. Neither of the above explanations for the current situation leaves me with a “warm, fuzzy feeling”.

  56. did ya hear the one about the statistician who was found dead in his kitchen?
    his feet were in the freezer and his head was in the oven. on average he felt fine.
    he was an average person, with one breast and one testicle – and when rendered in a blender, was quite homogenized.

  57. The problem BEST faces is that they are trying to do the analysis using only rules, with no human judgement. It’s the use of human judgement in the current data that causes people to doubt its accuracy.

    However, given they are doing this purely algorithmically, it is easy for them to do it in multiple ways. So they could implement an algorithm which does it the way Willis likes, and see how different the results turn out.

    What should be guarded against is people running the whole thing multiple times with multiple rules, and then selecting the appropriate rule post-facto, because it appeals to their prejudices.

    So I think they should accept submissions from people who have a particular rule in mind, and develop algorithms for these rules before they have any results. Then run their software multiple times, once with each rule.

    My bet, for what it’s worth, is that the statistical effect of a large amount of data will overwhelm any errors in the data, so that whether you use raw data, homogenized data or human-adjusted data won’t make much difference. But we won’t know until it’s been done.

  58. Willis:
    Would not the inclusion of short records require working with absolute temperatures rather than anomalies?
    While I am very much in favour of, and optimistic about, the BEST effort in terms of creating a series with more transparency than the existing series, the new series should come with many of the caveats and cautions alluded to in the comments above.

  59. Empirical homogenization assumes a common and unchanging density of intersite relationships. It’s turbulent, it is, through time and space.
    ============

  60. I can only add my endorsement to those who advocate that “outliers” should be treated with the greatest caution. The term presupposes a sure knowledge of the underlying true distribution, and so a circular argument ensues. The only real excuse for adjusting “outliers” is that a gross clerical error has been introduced. You can perhaps check this by reference to the original data source, but it is a time-consuming process. When I was an industrial chemist the most common error was a reversal of a pair of adjoining digits. Often easy to spot for leading digits, but elusive thereafter. I’ve often argued that smoothing procedures are to be avoided, but I can understand their attraction for the supposed purpose of clarifying a complex situation, so that people such as politicians and journalists don’t have to use their brains. This does not form a pretext for unbridled smoothing at the scientific level, though.

  61. In producing a global temperature average we are fundamentally working with small geographical areas and combining them into larger ones. We cannot escape grids. Because most of the globe is ocean, immune to the microclimates of mountains and plateaus and not easily carpeted with weather stations, larger starting grids and more averaging are necessary. More fundamental questions are: should high-elevation surface temps, with their thin air, be given equal weight, and of course, should SST be weighted equally with land?

  62. What a great question, and what a wonderful project. Data can appear non-representative for so many reasons. A change in wind direction, a sudden storm, an architectural change, moving cloud shadows, different microclimates. Some of the short distances mentioned above should also mention a difference in elevation of up to a mile. So I would suggest that all raw data be kept. If any of it is dropped or adjusted for purposes of this project, then it should be categorized and marked with the reason and the adjustment, so that it is not permanently gone from later discussions. We should be able to bring different sets back into the fold, to try different treatments of the data.

  63. May I ask a naive question: what is temperature? My microwave oven was cold until I turned it on to heat my food up. Does a temperature measurement reveal its origin?

  64. “Accordingly, I will ask the assistance of the moderators in politely removing…..”

    A thousand one-liners about Berkeley shot to heck. :(

    At any rate, I’m glad they’re not gridding; “spatial” considerations are very different from “nearby” considerations.

    Willis, perhaps you know, I certainly don’t, are they spatially weighting at all? It seems to me that they must. Perhaps it is the vernacular used, but I don’t think an average of temp data is particularly all that useful. Do a dozen thermometer temp readings from LA county weigh more than the 2 interior Antarctic temp readings? (pretending that those two are reasonably reliable)

    Like you, I don’t think the simultaneous steps are justified. When doing any kind of heavy mathematical data analysis, I always break it down into a clear sequential process. It’s much easier to see where I went wrong the first time when doing it this way. And how does one accomplish simultaneous steps to begin with? There is a hierarchy of math processing. In other words, regardless of how one formulates an algorithm, we don’t add and divide at the same time.

  65. One of the things they could do is keep a record of all outliers that have been excluded. That way it would be possible to compare what effect the outliers would have had on the overall result.

  66. Before making my comment, I would like to apologize for accusing Mr Eschenbach of fraud in one of my posts. I should not be so fast to accuse someone of dishonesty because I have a strong disagreement with their position. I should recognize that people can have honest differences of opinion and can make honest mistakes.

    I am curious about the statement that Temperature is a Zipf distribution. I have never seen this referred to before. Looking up Zipf distribution, in Wikipedia, I find that it refers to a distribution where the 2nd most commonly occurring value appears 1/2 as often as the first, the third most common appears 1/3 as often as the first etc.. A generalized version is ~1/n^s, where n is the rank and s is a number greater than 1.

    http://en.wikipedia.org/wiki/Zipf%27s_law

    Does anyone have a reference which documents this dependence of temperature distribution on rank? Since temperature is a continuous variable rather than a discrete variable, how is rank defined?
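
    For reference, the normalized form of Zipf's law for N ranked items, as given on the linked page, is shown below; setting s = 1 reproduces the 1/2, 1/3, ... pattern described above. The formula presupposes discrete items that can be ranked by frequency, which is exactly the difficulty raised for a continuous variable such as temperature.

```latex
f(n; s, N) = \frac{1/n^{s}}{\sum_{k=1}^{N} 1/k^{s}},
\qquad
\frac{f(2; 1, N)}{f(1; 1, N)} = \frac{1}{2},
\qquad
\frac{f(3; 1, N)}{f(1; 1, N)} = \frac{1}{3}
```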

    REPLY: Mr. Eadler is now restored to posting status – Anthony

  67. I think there should be an XML schema definition created for an ideal temperature record. This would allow universally agreed definitions, make explicit the assumptions about missing data, and make database storage and transfer of the records much easier.
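
    A minimal sketch of what such a record might look like, built with Python's standard `xml.etree.ElementTree`. The element and attribute names (`station`, `obs`, `status="missing"`) and the station id are hypothetical, not an agreed schema; the point is that a missing value is stated explicitly rather than left blank or encoded as a sentinel number such as -9999.

```python
import xml.etree.ElementTree as ET

def build_record(station_id, readings):
    """readings: list of (date, value_or_None) pairs; None means missing."""
    root = ET.Element("station", id=station_id)
    for date, value in readings:
        obs = ET.SubElement(root, "obs", date=date)
        if value is None:
            obs.set("status", "missing")  # explicit, not a sentinel value
        else:
            obs.set("status", "measured")
            obs.text = f"{value:.1f}"
    return root

def read_record(root):
    """Parse the record back into (date, value_or_None) pairs."""
    return [(o.get("date"), None if o.get("status") == "missing" else float(o.text))
            for o in root.iter("obs")]

record = build_record("XYZ00001", [("2011-03-21", 4.5), ("2011-03-22", None), ("2011-03-23", -1.2)])
text = ET.tostring(record, encoding="unicode")
print(text)
print(read_record(ET.fromstring(text)))
```

    A formal XSD would then pin down the allowed `status` values, units and date format, so every consumer interprets a gap the same way.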

  68. Ian W says:
    March 23, 2011 at 3:06 am

    Hello Willis,
    I think that you missed the point. Temperature is not a measure of energy yet it is an increase in ‘trapped’ energy due to GHG that is being claimed to have changed the Earth’s energy budget. …

    No, I understood the point perfectly. The energy content of the air is dependent on both temperature and humidity. It’s just that until we have a valid temperature record, I see moving on to humidity as premature.

    w.

  69. I find all of this very confusing. Presented with the hypothesis that catastrophic warming will destroy life as we know it, we are called upon to spend billions or trillions of dollars to avoid this horrible fate. If this is such a big deal, why do we have a tiny group of researchers doing this on a shoestring budget? Do we really believe that manufacturing a derived set of numbers from this data will increase our confidence in what is, after all, a set of empirical measurements which could, in theory, have any statistical and numerical properties you care to conceive? And, furthermore, performing this distortion of empirical data automatically?

    Spend a lot less than a billion dollars and hire enough graduate students to check every data point from every station and throw out the garbage. The process will take a few years and have to be managed, but at the end you’ll have raw data that you can put some trust in. The warmists are talking about the end of the world and then going out to the garage with a can of WD-40 and a hammer; I’m sorry, but even BEST is bush league in this context. If you’re going to do the job, do it right.

  70. \\ Rather than regarding empirical homogenization as a separate preprocessing step, we plan to incorporate empirical homogenization as a process that occurs simultaneously with the other averaging steps.//

    There is no doubt we need to clean up the data (1) and find the M20 readings recorded as +20 instead of -20. Some cleanup is absolutely necessary. Nevertheless, one person’s outlier might be either a local heat wave or an errant blast from a jet engine.

    My questions can all be laid to rest with comforting answers to these questions:

    A) will EVERY data value flagged as an outlier or suspicious be logged with the corresponding filter type and statistical criteria? Will that Log be part of the transparency of process?

    B) will every data value flagged be logged with all the filters it fails?

    C) will the data corrections be a separate log entry with queryable justification?

    D) will a homogenization log be available for independent statistical analysis so that each of the filter criteria can be independently analyzed and/or audited for clustering?

    E) What will be the process for correcting some of the corrections?

    F) What about the data that WASN’T flagged, but was close. Will that appear anywhere? In short, are there several levels of flagging? Why should there be one and only one set of corrections?

    (1) BTW: what are the sources of the BEST data set they are starting with? What clean up has already been done with it prior to getting it in their hands?
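
    A minimal sketch of the kind of log these questions ask for, assuming two hypothetical filters. Every filter runs on every value; each failure is logged with the filter name and its criterion (questions A and B), and values that come close to failing are logged at a separate "near_miss" level (question F). One filter looks for the sign slip mentioned above (in METAR-style reports "M" denotes minus, so M20 is -20). Nothing is corrected here; corrections would go in a separate log (question C).

```python
from dataclasses import dataclass
from statistics import median

FAIL, NEAR_MISS = 1.0, 0.8  # hypothetical score thresholds

@dataclass
class LogEntry:
    index: int
    value: float
    filter_name: str
    criterion: str
    level: str  # "fail" or "near_miss"

def departure_filter(value, local, max_dev=5.0):
    """Score = distance from the neighbour median as a fraction of the allowed departure."""
    return abs(value - local) / max_dev, f"|value - neighbour median| > {max_dev}"

def sign_slip_filter(value, local, tol=2.0):
    """Score 1.0 if flipping the sign would bring the value within tol of the neighbour median."""
    slipped = abs(value - local) > tol and abs(-value - local) <= tol
    return (1.0 if slipped else 0.0), f"sign flip brings value within {tol} of neighbour median"

FILTERS = {"departure": departure_filter, "sign_slip": sign_slip_filter}

def audit(target, neighbours):
    """Run every filter on every value; log all failures and near misses, change nothing."""
    log = []
    for i, value in enumerate(target):
        local = median([series[i] for series in neighbours])
        for name, rule in FILTERS.items():
            score, criterion = rule(value, local)
            if score >= FAIL:
                log.append(LogEntry(i, value, name, criterion, "fail"))
            elif score >= NEAR_MISS:
                log.append(LogEntry(i, value, name, criterion, "near_miss"))
    return log

target = [-18.5, -19.0, 20.0, -20.5, -15.0]  # invented; 20.0 looks like a dropped minus sign
neighbours = [
    [-18.0, -19.5, -20.2, -20.0, -19.0],
    [-18.8, -18.7, -19.6, -21.0, -19.5],
    [-19.1, -19.2, -20.4, -20.3, -19.3],
]
for entry in audit(target, neighbours):
    print(entry)
```

    The -15.0 reading is not flagged as a failure, but it does appear in the log as a near miss, so anyone auditing the thresholds can see it.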

  71. As someone who used to develop code for calculating the measurement uncertainty of modern digital test equipment, I fear no evil. But the thought of trying to do this with the homogenized mess being proposed is going to give me nightmares.

  72. Since, in the end, folks will use the data to estimate “temperature changes” with time, it seems that it might be appropriate to evaluate changes directly (rather than via estimated absolute temperatures).

    Then, combine the estimated station-by-station changes in some acceptable way to obtain an “average” global change (over a time period).

    This would seem to solve some problems — while no doubt introducing others. Nonetheless, it may use the raw data with fewer arbitrary corrections and does go straight to the heart of the questions — how much has the temperature changed and when.

  73. John Brookes says:
    March 23, 2011 at 6:56 am

    … My bet, for what its worth, is that the statistical effect of a large amount of data will overwhelm any errors in the data, so that whether you use raw data, homogenized data or human adjusted data, won’t make much difference. But we won’t know until its been done.

    Since that depends entirely on which methods are used, you are correct. However, I am less sanguine than you that the effect of lots of data will overwhelm the use of bad mathematical methods. My experience is that if the method is wrong, it doesn’t matter if I have one data point or a thousand …

    w.

  74. Bernie says:
    March 23, 2011 at 7:06 am

    Willis:
    Would not the inclusion of short records require working with absolute temperatures rather than anomalies?

    I think you could do it either way, using the first differences method with anomalies.

    w.
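
    A minimal sketch of a plain first difference method, assuming annual values per station with gaps marked as None (all data invented, no spatial weighting). Each station contributes a year-to-year difference only where it has both years, so short and broken records still add information; the differences are averaged across stations and then summed cumulatively into a series relative to the first year.

```python
def first_difference_series(stations, n_years):
    """Combine station records via first differences.

    stations: list of per-year value lists (None marks a missing year).
    Returns a cumulative series relative to the first year (0.0).
    """
    combined = [0.0]
    for t in range(1, n_years):
        diffs = [s[t] - s[t - 1] for s in stations
                 if s[t] is not None and s[t - 1] is not None]
        if not diffs:
            raise ValueError(f"no station has both year index {t - 1} and {t}")
        combined.append(combined[-1] + sum(diffs) / len(diffs))
    return combined

years = [2000, 2001, 2002, 2003, 2004]
stations = [
    [10.0, 10.2, 10.1, 10.4, 10.5],  # long record
    [None, None, 15.0, 15.3, 15.3],  # short record starting in 2002
    [5.0, 5.1, None, 5.4, 5.6],      # record with a gap in 2002
]
for year, value in zip(years, first_difference_series(stations, len(years))):
    print(year, f"{value:+.2f}")
```

    Because only differences enter, the stations' very different absolute levels never have to be reconciled. One known weakness is visible in the cumulative sum: an error in a single year's difference carries forward into every later year.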

  75. You can’t do good mathematics with bad numbers. Period. Using statistical methods does not change that. Once data has been adjusted, or includes numbers taken from a series shorter than the main body of data, then the result is nothing more nor less than somebody’s opinion. 2 + 2 = 4. 2 + something that looks as though it may be between 1 and 3 is simply an unknown. Your best estimate may, or may not, be better than my best guess, but unless we can take a measure to it the world will never know.

  76. I have been following this project since it first went public and have given a lot of thought on this subject; how best to go about making the results accurate and transparent, and leave as little doubt in the results as possible.

    My conclusion and advice to the BEST team is this:

    Provide the results as an interactive tool, rather than just a single final analysis. Provide option check-boxes so that anyone who wonders what the results would look like if a particular step had been done some other way can simply check or uncheck a box and see the results of the other method. That would free the BEST team from needing to make hard choices between mutually exclusive methods that may each have merit and risk. The application would not need to do the math over and over. That could be done one time, with the results then shown in an Adobe Flash web application or something similar. It would just be graphics.
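
    A minimal sketch of the compute-once idea, assuming three hypothetical on/off processing options and a placeholder `run_analysis` function that stands in for the real pipeline and returns a made-up trend. All 2³ combinations are computed once and stored, so a front end only has to look up the entry matching the ticked boxes.

```python
from itertools import product

OPTIONS = ("homogenize", "drop_short_records", "uhi_adjustment")  # hypothetical check-boxes

def run_analysis(settings: dict) -> float:
    """Placeholder for the real pipeline; returns an invented trend in C/decade."""
    trend = 0.15
    if settings["homogenize"]:
        trend += 0.03
    if settings["drop_short_records"]:
        trend -= 0.01
    if settings["uhi_adjustment"]:
        trend -= 0.02
    return round(trend, 3)

def precompute_all() -> dict:
    """Run the pipeline once for every combination of options."""
    table = {}
    for combo in product((False, True), repeat=len(OPTIONS)):
        table[combo] = run_analysis(dict(zip(OPTIONS, combo)))
    return table

results = precompute_all()
ticked = {"homogenize": True, "drop_short_records": False, "uhi_adjustment": True}
print(results[tuple(ticked[name] for name in OPTIONS)])
print(f"spread across all settings: {max(results.values()) - min(results.values()):.3f} C/decade")
```

    The table doubles with each added option, which is fine for a handful of check-boxes; the spread across all settings is itself a useful headline number.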

  77. I am struck by the thought that while BEST may give us more precise temperature information, it is somewhat like relying on a car’s odometer for miles driven without taking into account whether the vehicle is on a dynamometer or has been converted with monster truck tires. We may end up with more precise information, but not any more accurate.

    Eliminating (ignoring? waiting for?) humidity and ocean temps is just too much when we base trillion-dollar decisions on this data. We must be clear that while improvements to data are great, that in and of itself does not constitute information, knowledge or wisdom.

  78. How do warm breezes and wind chills figure into records?
    Here in Toronto it’s -5. With the wind chill, it feels like -13.

  79. UHI Adjustments?
    What are the requirements / specifications for the Urban Heat Island factor, if any?

  80. Archives?
    What are the requirements / specifications for retaining raw data, adjusted data, requirements / specifications, operating processes / procedures, code, off-the-shelf applications and any parameter values as a historical change log? (Configuration Management)

  81. Very interesting; assuming that the raw data and all software to process the data is open for review and others to analyze, it will be interesting to see a graph of all the different results showing the range of different numbers. If one analysis showed a value of xxx for a given time period and another showed a value of yyy, |xxx-yyy| would show us the difference between the two. (Assume that xxx is the highest value of all the studies and yyy is the lowest value.) I wonder how much it’s going to be?

    Even if all the different calculations end in results that are very close, we’re still left with the problem of: How much is due to CO2 and how much is a natural change?

    Bob Diaz

  82. Temperature Station Attributes?

    It might be useful to record the station attributes of the measurements: instruments, location, site and NOAA estimated error (http://www.surfacestations.org/). A relational link to the station attributes might do the job; it should be date stamped to provide for changes.

  83. jorgekafkazar says: “Ignoring the issues of correlation not proving causation and “global temperature” being a purely theoretical concept, the statistical obstacle you mention makes this “BEST” effort seem doomed to irrelevance..”

    Willis Eschenbach replies: I disagree entirely. The BEST effort is just beginning, how could it be “doomed” to anything this early in the game? The procedures are new, and will certainly be changed at some point. In a battle, the first casualty is always the battle plan … they will soldier on.

    Despite your military metaphor, you are thinking like the truth-seeking scientist you are. I certainly hope you are correct, but keep in mind, too, these military quotations:

    1. “In war, truth is the first casualty.” Aeschylus
    2. “We have a fifth column inside the city.” – General Mola

  84. Whereas I can see that there is a meaning in having as good a map of temperatures as possible, I see little meaning in having a map of anomalies. It is heat content that is important to tell us if the world is heating or cooling. Once one has a good map of temperatures, one can use a modified black body radiation law and integrate to get the total energy radiated by the earth.

    Anomalies are just red herrings, in my opinion. One gets 15 degree anomalies at the poles with much less energy radiated than the 2 degree anomalies at the tropics. It is the energy that is important.
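The "black body radiation" step can be sketched with the Stefan-Boltzmann law, F = εσT⁴. The base temperatures (243 K for a polar site, 300 K for a tropical one) and the emissivity of 1 are illustrative assumptions, not values from the comment. The sketch shows that emitted flux is not linear in temperature, so the same number of degrees of anomaly corresponds to different changes in flux, and the flux of an averaged temperature is not the average flux.

```python
# Minimal sketch: emitted flux from temperature via the Stefan-Boltzmann law.
# Base temperatures and emissivity are illustrative assumptions.
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def flux(temp_k: float, emissivity: float = 1.0) -> float:
    """Radiated flux (W/m^2) of a grey body at temp_k kelvin."""
    return emissivity * SIGMA * temp_k ** 4


polar_base, polar_anomaly = 243.0, 15.0    # assumed polar site
tropic_base, tropic_anomaly = 300.0, 2.0   # assumed tropical site

polar_flux = flux(polar_base + polar_anomaly)
tropic_flux = flux(tropic_base + tropic_anomaly)

# Change in emitted flux caused by each anomaly
polar_delta = polar_flux - flux(polar_base)
tropic_delta = tropic_flux - flux(tropic_base)

# Averaging temperatures first and then converting understates the mean flux
mean_temp = (polar_base + polar_anomaly + tropic_base + tropic_anomaly) / 2
flux_of_mean_temp = flux(mean_temp)
mean_of_fluxes = (polar_flux + tropic_flux) / 2

print(f"polar:  {polar_flux:6.1f} W/m^2 (change {polar_delta:+.1f})")
print(f"tropic: {tropic_flux:6.1f} W/m^2 (change {tropic_delta:+.1f})")
print(f"flux of mean T {flux_of_mean_temp:.1f} vs mean flux {mean_of_fluxes:.1f}")
```

Because flux goes as T⁴, one degree of anomaly is worth roughly 4σT³ W/m², which depends on the base temperature; averaging anomalies in degrees treats every degree as equal.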

  85. I dropped a line to Elizabeth Muller of the BEST project and was pleasantly surprised that she actually answered. I was making the point that all these new data stations would need to be investigated in the same way as Anthony’s surfacestations project; otherwise we would just be adding lots of data of uncertain quality. I thought the BEST folks were missing a great opportunity to harness the energies of many people globally to investigate a station or two each or just let us know about their individual experiences at one location. Although her response was very pleasant and accommodating, there was no indication that such a thing was planned. She said the station locations would be revealed at the same time as their results a few months hence.

  86. Maybe I missed it in the previous discussion...
    Who is doing the peer review of the product of BEST?

    My real concern is that they’ll publish a report and it’ll immediately be taken at face value by the MSM, embraced in full, and given gospel status.

    It needs to be completely open to review and discussion if it’s going to be considered valid science, and as Willis points out, there are legitimate concerns before it even starts. We’ve been down this road before, where the outcome is predetermined. This could well be just that, where at the end will be the pronouncement that “We were right all along, we did everything right, the earth is warming at an alarming rate, and the word Denier should be spelled with a capital D”.

    With severe cutbacks in educational funding at universities, the push for more self-sufficient professor funding and research, the old adage “No problem, no funding” is stronger than ever.

  87. Thank you Willis for drawing attention to this issue.

    I concur with AllenC that BEST must provide for a different kind of analysis than what we normally see. I believe he refers to the studies done by JR Wakefield, whose work I also admire.

    What is so striking about Wakefield’s research is his refraining from any combining or homogenizing of data across geography. In this way, the uniqueness of each microclimate is respected and understood.

    IMO we need BEST to produce three things:

    1) A validated record of the actual temperature measurements at each site in the database. (This should be the baseline for any kind of analysis, but must be kept separate from any averaging that others may do later in other kinds of studies.)
    2) A representation of the climate pattern over time at each site, to the extent the record allows. This would ideally include not only trends of daily averages, but also daily minimums and maximums, changes in seasons (earlier or later winters, earlier or later springs), and changes in frequencies of extremes (for example >30C, <-20C).
    3) An analysis comparing local climate patterns to discern dominant trends at regional, national and continental levels.

    This kind of research shows what is actually happening in climates in a way that people can relate to their own experiences. Even more, the results will be extremely useful to local and regional authorities in their efforts to adapt to actual climate change, whatever it is.

    Adaptation efforts would also be better informed if precipitation and humidity records and patterns were included, but I take your point about first things first.

  88. Pooh Dixie says:

    “UHI Adjustments? What are the requirements / specifications for the Urban Heat Island factor, if any?”

    BEST’s homogenization process is designed, among other things, to identify the presence and magnitude of urban heat islands.

  89. homogenized data is no longer raw data but a guess … GIGO … if you don’t have a temperature reading over hundreds of miles of surface, and I can’t stress this enough, YOU DON’T HAVE RAW DATA … blending, averaging, homogenizing doesn’t fix that problem, and that problem means you are not doing climate science …

    we don’t have raw global temperature data that is equally distributed over the surface … i.e. we don’t have raw data …

    after that, anything they do is just guessing, and while this project is an interesting statistical exercise, it is not climate science …

  90. Willis, a brief comment. The Berkeley group may not present the “last word” in how to do the surface temperature analysis. I regard this as the first important step, in terms of a comprehensive data set that is well documented and transparent. They have introduced some new methodologies, which are all steps in the right direction, IMO. Their analysis will lay the foundation for others to try other methods and to improve on the analysis.

    Judith Curry

  91. I have a question about annual global averages and the number of sites used for the calculations …
    If year X has 1,000 data points and year Y has 800 data points, can’t they generate an average for year X using the 1,000 data points and for year Y using the 800 data points, or do some or all of the 200 “extra” data points in year X get dropped because they don’t have the same datasets?

    Since they are calculating an annual global average, shouldn’t they only care about contiguous data within a single year?

  92. [“Snipped – OT” - see article body]

    I too do not understand why the first objective is not to identify any trends in the many climate regions, by region, including micro-climates.
    The concept of an overall global trend is kind of meaningless, since it will wash out a lot of information.

    I think the first step should be creating a database by station (with gaps) that includes a quality indicator with an algorithm reference showing how the quality indicator was determined. This would include hourly temperature, humidity, and wind speed and direction where available. Where only min/max were recorded, only two hourly rows would hold values; the remaining 22 would exist but contain a NULL value (NULL does not mean zero, it means No Data). The data should include the altitude relative to mean sea level.
    Then add a few columns for each remote sensing device that has information for the station location, again hourly, with gaps, quality indicators and references.
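The station-by-hour layout described above can be sketched with Python's built-in `sqlite3` module. The table name `station_hour`, its columns, the station ID and the quality labels are all hypothetical. A min/max-only day still gets 24 hourly rows, with NULL (no data, not zero) in the unobserved hours; placing tmin at hour 6 and tmax at hour 15 is purely an illustrative assumption, since a min/max thermometer does not record when the extremes occurred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE station_hour (
        station_id   TEXT    NOT NULL,
        obs_date     TEXT    NOT NULL,   -- ISO date
        hour         INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
        temp_c       REAL,               -- NULL means "no data", never zero
        humidity_pct REAL,
        wind_speed   REAL,
        wind_dir_deg REAL,
        altitude_m   REAL,               -- relative to mean sea level
        quality      INTEGER,            -- quality indicator
        quality_algo TEXT,               -- reference to the algorithm that set it
        PRIMARY KEY (station_id, obs_date, hour)
    )
    """
)

# A min/max-only day: 24 rows, only two carry a temperature.
readings = {6: -3.5, 15: 11.0}  # hours 6 and 15 are an illustrative assumption
rows = [
    ("STN001", "1910-01-15", h, readings.get(h), None, None, None, 12.0, 2, "qc-v1")
    for h in range(24)
]
conn.executemany("INSERT INTO station_hour VALUES (?,?,?,?,?,?,?,?,?,?)", rows)

total, with_temp, avg = conn.execute(
    "SELECT COUNT(*), COUNT(temp_c), AVG(temp_c) FROM station_hour "
    "WHERE station_id = 'STN001'"
).fetchone()
print(total, with_temp, avg)  # COUNT(temp_c) and AVG skip the NULL hours
```

SQL aggregates such as `COUNT(column)` and `AVG` ignore NULLs, which is why storing "no data" as NULL rather than zero matters here.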

    Then the same again for the oceans: grid the oceans in 100 km squares and use any data from any device, with a set of columns per device. For mobile devices, use the data from when the device was in or sensing the grid square; for buoys, include the altitude relative to mean sea level.

    It would be no less of an undertaking than the Human Genome Project, but once completed, we would have a maintainable public database that could be used for many purposes.

    It should become the one and only input source for any and all computer models, a common public frame of reference containing the totality of historical information.

    It will be somewhere in the range of 2,680,927,920,000 rows.
    Large but manageable, on par with the scale of a historical transaction database for a very large stock exchange.

    Raw unadjusted data, all of it, in one place, all public, warts and all.

  93. crosspatch says:
    March 23, 2011 at 12:17 am

    “Temperatures are odd things. Climate conditions can vary considerably over only a few miles of geographical distance.”

    Yes, anyone who lives next to a body of water can attest to that. Regional weather anomalies at the same site might also be difficult to handle, such as a Chinook wind in southern Alberta, where you could go from -20C to +20C in a matter of hours… it might seem impossible, but those who have experienced it will assure you it’s real.

  94. Will,

    they are doomed because they are starting with bad raw data … it’s that simple … you can’t fix that with math …

  95. I think there’s a much deeper problem: BEST is doing the best they can, and we ought to be thanking them because it’s a job someone should have done long ago, but no one should expect to get clean data out of this.

    On the contrary, by the time they are done, their conclusion will, I predict, be that we don’t have valid data. Consider, for example, their decision to remove gridding because it adds imaginary data – that’s certainly true and therefore a good decision, right? Right, but absent gridding there’s no large-area applicability, so every researcher who wants to draw a conclusion applicable beyond the limits of each station is going to be adding some kind of gridding stand-in, and so making up data to cover what he doesn’t have.

    Bottom line: if BEST demonstrates – by meeting about 3.5 of their four goals – that the best data is pretty useless for large area policy and planning purposes they’ll have done something very valuable, but it will be the opposite of what most of the people commenting here seem to expect – taking away a basis for conclusions instead of providing one.

  96. Re: UHI: John Tofflemire says: March 23, 2011 at 10:10 am

    “BEST’s homogenization process is designed, among other things, to identify the presence and magnitude of urban heat islands.”

    With respect, I did not ask about the assertion. I asked about “requirements / specifications”. That would include the derivation for each location.

  97. Willis,

    I have to ask about the use of the Zipf Distribution.

    A quick check of Google shows that you are about the only person who uses this distribution with respect to temperature (yet you seem very confident that it is THE distribution for temperature data). Typically this distribution is used to look at things like how often different words come up in language, how often cities of various sizes occur, how much traffic the 7th most popular web page gets compared to the 8th most popular.

    This distribution is based on rankings of items. From Wikipedia on “Zipf’s Law”:

    For example, in the Brown Corpus “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36,411 occurrences), followed by “and” (28,852).

    A similar statement for temperature would be something like:

    For example, at some weather station “70 F” is the most frequently occurring temperature, and by itself accounts for nearly 7% of all temperature records. True to Zipf’s Law, the second-place temperature “69 F” accounts for slightly over 3.5% of temperature records, followed by “71 F”.

    Of course, the specific percentages would be different, but I don’t really see how a ranking of temperature frequencies will be an effective way to analyze the data. For one thing, it eliminates the sign of the deviation from the most common data (the mode).
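The point about losing the sign can be shown directly. The sketch ranks integer temperature readings by frequency, as Zipf's law does for words; the readings are made-up values for a hypothetical station. Mirroring every reading about the mode (turning cold days into warm days and vice versa) leaves the rank-frequency list unchanged, so a fit to that list cannot tell the two records apart.

```python
from collections import Counter

# Hypothetical daily readings (deg F) at one station
readings = [70] * 7 + [69] * 4 + [71] * 4 + [68] * 2 + [72] * 2 + [60, 85]

ranked = Counter(readings).most_common()  # rank 1 = most common value (the mode)
mode = ranked[0][0]
frequencies = [n for _, n in ranked]      # what a rank-frequency (Zipf) fit sees

# Mirror every reading about the mode: cold days become warm days and vice versa
mirrored = [2 * mode - r for r in readings]
mirrored_frequencies = [n for _, n in Counter(mirrored).most_common()]

for rank, (value, n) in enumerate(ranked, start=1):
    print(f"rank {rank}: {value} F occurs {n} times (deviation {value - mode:+d})")
print(frequencies == mirrored_frequencies)  # the ranking cannot tell them apart
```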

  98. “It is also deplorable that the BEST team seek to hide the adjustments they make, rather than letting the raw data stand”

    BEST might take a page from the lessons learned in IT. The best practice in Data Warehousing is to never overwrite (update) or delete data. The only operation allowed in the warehouse is an insert operation.

    Think of the old manual accounting ledger. You never cross out or erase any entry. Even when the entries are in error. You remove errors by inserting a reversal. You correct errors by inserting an adjustment. Each entry then becomes part of a history, leading from the current state of the data back to the original state of the data, providing a full audit of all changes.

    As soon as you start deleting or updating entries in place, usually under the excuse of “saving space”, or “more efficient”, you have no way to audit the changes and you cannot rely on the results. You can never certify that your data is correct.

    The mistake that CRU made, as revealed by Climategate, was to adjust the data in place, and thus overwrite the original data. This effectively destroyed the audit trail of changes, so that no one can determine what was actually done. As a result, CRU cannot certify their results as accurate.

    Financial transactions have been done this way for years, allowing banks to certify that their financial data is correct. There are now large corporate initiatives underway to apply this same technique to non-financial data, to allow it to be certified as well. The typical buzzword is Master Data Management. The aim is to make non-financial changes auditable by keeping a full history of all changes.

    The beauty of this technique is that by maintaining a full history, you can always correct past errors. If you make an adjustment, and later you find out the method was faulty or the calculation incorrect, you simply insert a reversal today, effective the date of the error and the data is automatically corrected, with a full history still in place. Banks use this method all the time to correct past errors with minimal disruption to existing data.

    BEST should apply the same techniques to control of their data. Otherwise the correctness of the data will always be open to dispute, no matter how good their mathematical approach. Data quality requires much more than good analytics. It requires fundamentally good data management techniques, including a full audit of all changes.
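A minimal sketch of the insert-only idea, assuming a simple in-memory list stands in for the warehouse. The class names, the entry kinds ('raw', 'adjustment', 'reversal') and the station ID are hypothetical. Nothing is updated or deleted: a faulty adjustment is undone by appending its reversal, the current value is the sum of all entries, and the original reading can always be recovered by filtering on entry kind.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)  # frozen: an entry cannot be edited after insertion
class Entry:
    entry_id: int
    station: str
    date: str
    kind: str        # 'raw', 'adjustment' or 'reversal' (hypothetical labels)
    delta_c: float   # contribution to the temperature, deg C
    note: str = ""


class Ledger:
    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def insert(self, station, date, kind, delta_c, note=""):
        entry = Entry(len(self._entries) + 1, station, date, kind, delta_c, note)
        self._entries.append(entry)  # the only write operation
        return entry

    def reverse(self, entry: Entry, note=""):
        # Undo an earlier entry by inserting its negation, never by deleting it
        return self.insert(entry.station, entry.date, "reversal", -entry.delta_c,
                           note or f"reverses #{entry.entry_id}")

    def value(self, station, date, kinds=None):
        return sum(e.delta_c for e in self._entries
                   if e.station == station and e.date == date
                   and (kinds is None or e.kind in kinds))

    def history(self, station, date):
        return [e for e in self._entries if e.station == station and e.date == date]


ledger = Ledger()
ledger.insert("STN001", "1950-01-01", "raw", 12.4, "original reading")
bad = ledger.insert("STN001", "1950-01-01", "adjustment", -0.8, "method A")
ledger.reverse(bad, "method A found faulty")
ledger.insert("STN001", "1950-01-01", "adjustment", -0.3, "method B")

print(ledger.value("STN001", "1950-01-01"))                 # current value
print(ledger.value("STN001", "1950-01-01", kinds={"raw"}))  # original reading
print(len(ledger.history("STN001", "1950-01-01")))          # full audit trail
```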

  99. This is brilliant, Willis. Be sure you share it with the Berkeley team.

    The purpose of the project is to try to determine to what extent the record has been skewed by air conditioners added next to weather stations, increasing use of asphalt, and other aspects of the Urban Heat Island Effect.

    That is a very difficult challenge indeed, and only a truly rigorous procedure has any hope of getting it right.

    If the Berkeley team shares ALL the raw data and calculations, we will have something useful. But if they screw up the math, it will get a reputation akin to Mann’s hockey stick.

  100. Actually, I think this problem should be approached from the other end. It would be better to have a hundred series that you trust than ten thousand that you don’t. So I recommend searching for data series which are well documented, checking changes of measurement techniques over the duration of the record, averaging out or ignoring obvious omissions (months when no data were recorded), and thereby arriving at the best record for that station. Then you apply (or not) UHI corrections for that station. De Bilt in Holland is an example, where they have been doing this for 150 years or so. You can then also communicate with those stations if necessary and obtain copies of the original records when data are uncertain. If you find, say, one hundred stations like this with a reasonable spread over the globe, you can draw useful conclusions about global temperature trends. It is of course a lot of work, but not more, I think, than the BEST approach. It would lead to an understanding of what has been happening around the world (before you apply statistical techniques) and could provide a basis to expand the system to recorded humidity levels, as suggested in earlier comments.

    Evert

  101. Willis, I think you’ve misunderstood empirical homogenization. Further, I’m not at all convinced that temperatures at a given location are universally described by a power law distribution. Third, I do not see the kind of discontinuous behavior that you speak of. Fourth, the 1200 km correlation distance is an integral part of the error calculation due to spatial sampling. There will be, I suspect, some refinement of that, namely to account for the known variations in this figure due to latitude, season, and direction. Temperatures are correlated in space because the atmosphere is a flow, so it’s more than mere correlation.
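For readers unfamiliar with the 1200 km figure: it is best known from the GISS analysis, where, as we understand it, a station's weight at a point falls linearly from 1 at the station to 0 at 1200 km. The sketch below assumes that linear taper and a haversine great-circle distance; it illustrates distance weighting in general, not the BEST method, and the station coordinates, anomalies and function names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0
RADIUS_KM = 1200.0  # influence radius discussed in the comment


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def taper_weight(distance_km):
    """Linear taper: 1 at the station, 0 at RADIUS_KM and beyond (assumed form)."""
    return max(0.0, 1.0 - distance_km / RADIUS_KM)


def weighted_anomaly(target, stations):
    """target: (lat, lon); stations: list of (lat, lon, anomaly_c)."""
    num = den = 0.0
    for lat, lon, anomaly in stations:
        w = taper_weight(great_circle_km(target[0], target[1], lat, lon))
        num += w * anomaly
        den += w
    return num / den if den > 0 else None  # None: no station within range


# Made-up stations around a hypothetical target point
stations = [(40.0, -100.0, 0.5), (45.0, -100.0, 1.0), (60.0, -100.0, 3.0)]
print(weighted_anomaly((40.0, -100.0), stations))
```

The third station lies about 2,200 km away and so gets zero weight; a point with no station inside the radius gets no estimate at all, which is where the spatial-sampling error mentioned above comes in.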

    All that aside, we know this from tests with synthetic data: methods like theirs perform vastly better than the method you have preferred at times in the past, the first differences method. See jeffId’s destruction of that method back on the Air Vent.
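For reference, the first differences method mentioned here works roughly as follows: for each station take the year-to-year change, average the changes over whichever stations report both years, then accumulate the averaged changes into a series. The sketch assumes annual values in a dict keyed by year, uses made-up data, and treats a year pair with no reporting stations as "no change", which is a design assumption. Because the result is a running sum, any error in one averaged difference is carried into every later year, which is the usual criticism of the method.

```python
from statistics import mean


def first_difference_series(stations, years):
    """stations: list of {year: value} dicts (gaps allowed).

    Returns the accumulated series, starting at 0 in the first year.
    """
    series = [0.0]
    for prev, year in zip(years, years[1:]):
        # Only stations that report both years contribute a difference
        diffs = [s[year] - s[prev] for s in stations if prev in s and year in s]
        step = mean(diffs) if diffs else 0.0  # assumption: no data -> no change
        series.append(series[-1] + step)
    return series


years = [2000, 2001, 2002, 2003]
stations = [
    {2000: 10.0, 2001: 10.5, 2002: 11.0, 2003: 11.5},  # warming 0.5 per year
    {2000: 20.0, 2001: 20.5, 2003: 21.5},              # same trend, one gap
    {2002: 5.0, 2003: 5.5},                             # short record
]
print(first_difference_series(stations, years))
```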

  102. Hear! Hear! Dr. Curry! To be right-minded, researchers should be flattered when their efforts are questioned, critiqued, and improved upon. Important research MUST be questioned, duplicated, shown in other ways, and improved upon.

    My thoughts: I believe that the effort to develop temperature data sets is still in its infancy. I am convinced that multiple broad regional sets, built from a running, overlapping three-month average in much the same way oceanic and atmospheric oscillations are presented, would be another viable (and in my mind better) method. The single global monthly average hides way too much important information, be it anthropogenic or natural, or both.
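A running, overlapping three-month mean, of the kind used for indices such as the Oceanic Niño Index, can be sketched as below. The region names and monthly anomalies are made up; centring the window and leaving the result undefined at the record's edges or across a missing month are assumptions.

```python
def running_mean(values, window=3):
    """Centred running mean; None where the window is incomplete or has gaps."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred mean")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = i - half, i + half + 1
        chunk = values[lo:hi] if lo >= 0 and hi <= len(values) else None
        if chunk is None or any(v is None for v in chunk):
            out.append(None)  # edge of the record or a missing month
        else:
            out.append(sum(chunk) / window)
    return out


# Hypothetical monthly anomalies (deg C) for two broad regions
regions = {
    "north": [0.1, 0.4, -0.2, 0.3, 0.6, 0.0],
    "south": [0.2, 0.2, None, 0.1, 0.3, 0.5],
}
smoothed = {name: running_mean(series) for name, series in regions.items()}
for name, series in smoothed.items():
    print(name, [None if v is None else round(v, 3) for v in series])
```

Keeping each region as its own smoothed series, rather than folding them into one global number, is what the comment proposes.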

  103. First time poster. Not a meteorologist, not a scientist, not a physicist, and my degree in Oceanography, earned at a small trade school on the Severn River in Maryland, is almost 40 years old.

    That said, if my admittedly porous memory serves, somewhere I read or heard or was tested on the fact that the oceans are the ‘engines of our weather.’ If that observation is valid, then all this surface temperature analysis is akin to taking your own temperature when you have a cold. That temperature is but a symptom of something else that is forcing the temperature.

    All we learn with surface temperature observations is that, yep, something significant is going on. We still aren’t any closer to understanding what is causing the temperature variations. Seems like a lot of wasted effort that might be better focused on understanding ocean currents and the effects of ocean temperature variation on atmospheric temps.

  104. San Mateo and Redwood City just might be a bit obscure to our readers on a few continents.

    The point being that two towns (any region probably has its own examples) might be close by on a map but lie in completely different climate zones. The purpose of explicitly mentioning the names of the towns was so that people so inclined could look them up on a map and see how close they are. The difference is that there is a mountain range between San Mateo and Half Moon Bay, so the two are in completely different climate zones even though they are within a short drive of each other.

    Same with Truckee and Reno. One is an alpine climate that gets several feet of snow in the winter, the other is desert even though they are only 40 miles apart.

  105. Jeff Carlson says: “If year X has 1,000 data points and year Y has 800 data points can’t they generate an average for year X using the 1,000 data points and for year Y using the 800 data points”

    That won’t work. Having the same set of stations from year to year is crucial. The average max temperature across all weather stations in Australia for the first four available full decades are as follows:

    1860-69 : 28.53 deg C
    1870-79 : 29.07
    1880-89 : 29.94
    1890-99 : 30.94

    Do you think that the average temperature in Australia really rose 2.4 deg C in 30 years? That’s 8 deg C per century (using the IPCC way of thinking).
    [Data from http://www.bom.gov.au]
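The effect behind these numbers can be reproduced with synthetic data. In the sketch, every hypothetical station has a perfectly flat climate, but warmer inland stations join the network in later decades. The plain average of whichever stations report rises anyway. Subtracting each station's own mean before averaging (an anomaly approach, one common remedy, and an assumption here rather than a description of what BEST does) removes the artifact. All station names and values are made up.

```python
from statistics import mean

# Hypothetical stations with perfectly flat climates; warmer ones join later
stations = {
    "coastal":  {1860: 25.0, 1870: 25.0, 1880: 25.0, 1890: 25.0},
    "inland_a": {1870: 31.0, 1880: 31.0, 1890: 31.0},
    "inland_b": {1880: 33.0, 1890: 33.0},
    "inland_c": {1890: 35.0},
}
decades = [1860, 1870, 1880, 1890]

# Plain average of whichever stations report in each decade
raw_avg = [mean(s[d] for s in stations.values() if d in s) for d in decades]

# Anomaly approach: subtract each station's own mean before averaging
baseline = {name: mean(s.values()) for name, s in stations.items()}
anomaly_avg = [
    mean(s[d] - baseline[name] for name, s in stations.items() if d in s)
    for d in decades
]

print(raw_avg)      # climbs 6 deg C although no station warmed
print(anomaly_avg)  # flat at zero
```

With real, trending data the choice of baseline period matters, so this only demonstrates the station-composition effect, not a complete correction.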

  106. Rather than trying to homogenize the data, how about we start over with a detailed, calibrated surface temperature sensor network? Place sensors to measure what needs to be measured, rather than according to other considerations, and calibrate to satellite data.

    At some point the finite, definable cost of that exercise would be less than the continuing efforts to integrate bad data. We could then draw a line and treat all of the old data (proxy and sensors), with its errors, as a historical record (and fight over that forever). The new data could be evaluated over the coming decades to determine whether the “signal” of carbon emissions exists, or whether solar influences, ocean cycles, and land-use changes predominate. The density of the network would determine its accuracy and cost. If this is really important, the effort should be politically feasible.

    Might be a way out for the politicians who have been hoodwinked into being AGW proponents – get better data and study it some more is a time-honored way for politicians to kick the can down the road.

  107. I’d urge an accounting-transaction style approach to the dataset. Explained more fully here. That way, for a given station/day/time, the ‘layers’ of raw data, adjustments etc can be represented as separate transactions, with source and other categorisations – date/time stamps, process and version used, type of adjustment and so on – added to ensure traceability. That way, even if there are kludges and mash-ups, they show up as separate transactions in the station/day/time ‘account’ and can be aggregated across all stations, excluded with a line of SQL magic, or filtered out in a cube approach. Transparency and workability, please!
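A minimal sketch of the "station/day/time account" using `sqlite3` from Python's standard library. The table layout, column names, transaction types and process/version labels are hypothetical. Raw readings and each class of adjustment are separate rows, so the adjusted value is a `SUM` over the account, and one class of adjustment can be excluded with a single extra condition in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE temp_txn (
        txn_id     INTEGER PRIMARY KEY,
        station_id TEXT NOT NULL,
        obs_time   TEXT NOT NULL,   -- station/day/time 'account'
        txn_type   TEXT NOT NULL,   -- 'raw', 'tobs_adj', 'homog_adj', ...
        process    TEXT,            -- process and version that produced the row
        entered_at TEXT NOT NULL,   -- when the row was inserted
        delta_c    REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO temp_txn (station_id, obs_time, txn_type, process, entered_at, delta_c)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("STN001", "1950-01-01T07:00", "raw", "ingest v1", "2011-03-01", 12.4),
        ("STN001", "1950-01-01T07:00", "tobs_adj", "tobs v2", "2011-03-02", -0.3),
        ("STN001", "1950-01-01T07:00", "homog_adj", "homog v1", "2011-03-03", 0.5),
    ],
)

account = ("STN001", "1950-01-01T07:00")
adjusted = conn.execute(
    "SELECT SUM(delta_c) FROM temp_txn WHERE station_id = ? AND obs_time = ?",
    account,
).fetchone()[0]
# Exclude one class of adjustment with one extra condition
without_homog = conn.execute(
    "SELECT SUM(delta_c) FROM temp_txn WHERE station_id = ? AND obs_time = ?"
    " AND txn_type <> 'homog_adj'",
    account,
).fetchone()[0]
print(adjusted, without_homog)
```

The same `GROUP BY txn_type` or `process` query would show how much each layer contributes across all stations.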

  108. Surely these weather stations have kept track of other weather data than just temperature? How about BEST include ALL the data and meta data from all the weather station records? What information has been recorded by these stations? Is it even the right data needed to make sense of the climate?

    “What Does Moist Enthalpy Tell Us?

    In our blog of July 11, we introduced the concept of moist enthalpy (see also Pielke, R.A. Sr., C. Davey, and J. Morgan, 2004: Assessing “global warming” with surface heat content. Eos, 85, No. 21, 210-211. ). This is an important climate change metric, since it illustrates why surface air temperature alone is inadequate to monitor trends of surface heating and cooling. Heat is measured in units of Joules. Degrees Celsius is an incomplete metric of heat.

    Surface air moist enthalpy does capture the proper measure of heat. It is defined as CpT + Lq where Cp is the heat capacity of air at constant pressure, T is air temperature, L is the latent heat of phase change of water vapor, and q is the specific humidity of air. T is what we measure with a thermometer, while q is derived by measuring the wet bulb temperature (or, alternatively, dewpoint temperature).

    To illustrate how important it is to use moist enthalpy, we can refer to the current heat wave in the southwest United States. The temperatures in Yuma, Arizona, for example, have reached 110°F (43.3°C), but with dewpoint temperatures around 32°F (0°C). In terms of moist enthalpy, if the temperature falls to 95°F (35°C) but the dewpoint temperature rises to 48°F, the moist enthalpy is the same. Temperature by itself, of course, is critically important for many applications. However, when we want to quantify heat in the surface air in its proper units in physics, we must use moist enthalpy.

    In terms of assessing trends in globally-averaged surface air temperature as a metric to diagnose the radiative equilibrium of the Earth, the neglect of using moist enthalpy, therefore, necessarily produces an inaccurate metric, since the water vapor content of the surface air will generally have different temporal variability and trends than the air temperature.” – Roger Pielke Sr.

    http://pielkeclimatesci.wordpress.com/2005/07/18/what-does-moist-enthalpy-tell-us/
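The quoted Yuma example can be checked with the definition h = CpT + Lq. The sketch assumes textbook values Cp = 1005 J/kg/K and L = 2.5 × 10⁶ J/kg, a surface pressure of 1013.25 hPa, the Bolton (1980) approximation for vapor pressure at the dewpoint, and q ≈ 0.622e/(p - 0.378e); none of these choices come from the quoted post, and the function names are hypothetical.

```python
import math

CP = 1005.0      # J/(kg K), heat capacity of dry air at constant pressure (assumed)
LV = 2.5e6       # J/kg, latent heat of vaporization (assumed)
P_HPA = 1013.25  # surface pressure in hPa (assumed sea-level value)


def f_to_c(f):
    return (f - 32.0) * 5.0 / 9.0


def vapor_pressure_hpa(dewpoint_c):
    """Bolton (1980) approximation; at the dewpoint this is the actual vapor pressure."""
    return 6.112 * math.exp(17.67 * dewpoint_c / (dewpoint_c + 243.5))


def specific_humidity(dewpoint_c, p_hpa=P_HPA):
    e = vapor_pressure_hpa(dewpoint_c)
    return 0.622 * e / (p_hpa - 0.378 * e)


def moist_enthalpy(temp_c, dewpoint_c):
    """h = Cp*T + L*q in J/kg, with T in kelvin."""
    return CP * (temp_c + 273.15) + LV * specific_humidity(dewpoint_c)


hot_dry = moist_enthalpy(f_to_c(110), f_to_c(32))
cooler_humid = moist_enthalpy(f_to_c(95), f_to_c(48))
print(f"110F / dewpoint 32F: {hot_dry / 1000:.1f} kJ/kg")
print(f" 95F / dewpoint 48F: {cooler_humid / 1000:.1f} kJ/kg")
```

Under these assumptions the two cases differ by well under 1 kJ/kg, even though the 15 °F (about 8 °C) temperature difference alone is worth roughly 8 kJ/kg of sensible heat.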

    Does it even make sense to “average” temperature data into one number for the whole planet? I’ve never seen an explanation of that which makes sense. Doesn’t it make more sense to monitor each station’s local temperatures and look for trends in those? Some will stay the same, some will have upward trends and some will have downward trends. Averaging the “anomaly” data doesn’t make sense either; at least, I’ve never seen a good explanation of why it does.

    In computer science we deal with discrete data sets all the time. Averaging or “homogenizing” data aka “fabricating data” just doesn’t seem wise as that just continues the statistical games that have been played. Somehow a discrete method must be developed that incorporates error ranges and uncertainties intact throughout the computations so that outputs also show the combined error ranges and uncertainties for any computed data. Computed data should be clearly marked as such and should NEVER be presented in a graph in a way to imply that it’s anything but computed data.
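One simple way to keep error ranges intact throughout a computation is interval arithmetic: every value is carried as a lower and upper bound, and each operation returns the widest range consistent with its inputs. The sketch is a minimal, assumed implementation (addition, scaling and a mean only) with made-up readings and a made-up ±0.5 °C instrument uncertainty. It gives worst-case bounds, which are wider than statistical error bars that assume independent errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @classmethod
    def reading(cls, value, uncertainty):
        return cls(value - uncertainty, value + uncertainty)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, k):
        a, b = self.lo * k, self.hi * k
        return Interval(min(a, b), max(a, b))  # a negative k swaps the bounds

    @property
    def mid(self):
        return (self.lo + self.hi) / 2

    @property
    def half_width(self):
        return (self.hi - self.lo) / 2


def interval_mean(values):
    total = values[0]
    for v in values[1:]:
        total = total + v
    return total.scale(1 / len(values))


readings = [Interval.reading(t, 0.5) for t in (14.2, 15.1, 13.8)]
avg = interval_mean(readings)
print(f"{avg.mid:.2f} +/- {avg.half_width:.2f} (computed, worst case)")
```

The printed label marks the result as computed rather than measured, in line with the comment's request.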

    Bottom line: it won’t be the best BEST if it uses temperature data alone; Surface Air Moist Enthalpy should also be computed.

    How can climate science advance when it’s stuck with limited methods of measuring Nature (e.g. Average Surface Temperature) when more accurate methods exist (e.g. Surface Air Moist Enthalpy)?

  109. just checked Romm’s (is he really not Roem after all?) site… they are incensed and worked up about something that Mr Watts might have done. If we could harness their negative energy to power our nations, the problem would be solved. They seem completely beyond the bounds of reason. I feel I should feel sorry for them and try to lock them up somewhere. They obviously cannot handle real life.

    Sorry, that was my first dip into Climate Progress: it is like dealing with a horde of hysterical lunatics.

  110. Like a few of the other posters have mentioned, the question we are trying to answer is that of the energy budget. Temperature, while certainly correlated, is a pretty poor proxy for energy, especially without pressure and humidity levels. All of the effort should be going towards ocean heat content and proxies thereof (SST). In addition to being largely free of urban heat and other contaminating and confounding effects, it is actually the direct measurement of the question that needs answering. The top few meters or so of ocean contains as much heat energy as the entire atmosphere. In addition, the variance in ocean temps is way, way lower than that of atmospheric surface temps, so the underlying data is much more stable and fewer measurements will give far, far better approximations to reality.
    It seems bizarre that we can easily get daily satellite temp data, and yet the OHC data from the Argo floats is available only with great difficulty. Just business as usual in the world of climate ‘science’.
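The "top few meters" figure can be checked with round numbers. The sketch assumes an atmospheric mass of about 5.15 × 10¹⁸ kg, cp of air about 1005 J/kg/K, an ocean surface area of about 3.61 × 10¹⁴ m², seawater density about 1025 kg/m³ and seawater specific heat about 3990 J/kg/K; these are approximate textbook values, not figures from the comment.

```python
ATM_MASS_KG = 5.15e18      # total mass of the atmosphere (approx.)
CP_AIR = 1005.0            # J/(kg K)
OCEAN_AREA_M2 = 3.61e14    # global ocean surface area (approx.)
RHO_SEAWATER = 1025.0      # kg/m^3
CP_SEAWATER = 3990.0       # J/(kg K)

atmosphere_heat_capacity = ATM_MASS_KG * CP_AIR                      # J/K
ocean_per_meter = OCEAN_AREA_M2 * 1.0 * RHO_SEAWATER * CP_SEAWATER  # J/K per meter of depth

equivalent_depth_m = atmosphere_heat_capacity / ocean_per_meter
print(f"Ocean layer with the heat capacity of the whole atmosphere: {equivalent_depth_m:.1f} m")
```

Strictly this compares heat capacities (energy per degree of warming), which is the sense in which the "few meters" statement is usually meant.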

  111. Willis, thanks for a thought provoking post, once again.

    I’m with the “don’t change the data” group (which seems to be almost everyone). If the data is questionable, drop it from the analysis.

    I, like lots of others on this thread, have an anecdote about real weather producing a temperature reading that would be questioned. A few years ago, here in San Antonio, TX, we had a 100 degree F high one day in February. Of course it was an all-time record high for February. Temperatures were fairly normal on either side of that day. Anyone looking at the temperature records would probably question the validity of that reading. Yet it was real.

    Regarding data distributions: I too have always wondered about the general assumption that the data have a Gaussian distribution. I was not familiar with the Zipf distribution. (Thanks to Tim Folkerts, and someone else whom I could not find again in the thread, for explaining what it is.) My favorite expression for distributions is the Weibull distribution, since the expression for a Weibull distribution includes a shape factor that will cause its shape to vary from exponential to normal to log-normal.
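For reference, the Weibull density with shape k and scale λ is f(x) = (k/λ)(x/λ)^(k-1) exp(-(x/λ)^k) for x ≥ 0. The standard-library sketch below shows the shape factor at work: k = 1 gives exactly the exponential distribution, while values near k ≈ 3.6 give a nearly symmetric, bell-like curve whose mean and median almost coincide. The function names are hypothetical, and whether temperature data actually follow a Weibull distribution is not something this sketch addresses.

```python
import math


def weibull_pdf(x, k, lam=1.0):
    """Weibull density with shape k and scale lam (zero for x < 0)."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def weibull_mean(k, lam=1.0):
    return lam * math.gamma(1 + 1 / k)


def weibull_median(k, lam=1.0):
    return lam * math.log(2) ** (1 / k)


for k in (1.0, 2.0, 3.6):
    print(f"k={k}: mean={weibull_mean(k):.3f}, median={weibull_median(k):.3f}")
```

The gap between mean and median is a quick measure of skew: large for k = 1, nearly zero around k = 3.6.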

    And on the subject of distributions, why do we always use the mean of the distribution for the “average?” It would seem to me that the median (50% above, 50% below) would be more descriptive of temperature behavior than the arithmetic mean.
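A small illustration of the difference, using made-up daily highs for a hypothetical February with one unusually warm 100 °F day like the San Antonio example above: the single unusual reading pulls the mean up by nearly 3 °F, while the median does not move.

```python
from statistics import mean, median

# Hypothetical February daily highs (deg F) without and with one record warm day
typical = [62, 64, 65, 66, 66, 67, 68, 69, 70, 71]
with_record = typical[:-1] + [100]

for label, highs in (("typical", typical), ("with 100F day", with_record)):
    print(f"{label:>14}: mean={mean(highs):.1f}  median={median(highs):.1f}")
```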

    One last comment. I hope that Dr. Curry is right. I hope that BEST is just the beginning.

  112. Many have suggested that instead of discarding outliers they simply include this in the error bars. I don’t see how we can even calculate error bars given the nature of local temperatures that Willis points out here and the limitations of the data set. Can someone give me a primer in one paragraph or less on how we can even quantify error given the limitations of the data? If it is possible what are the assumptions included in that error calculation?

    It seems to me that we needed to know our methodology before we began collecting data. Obviously, that is not the case with the historical dataset.
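One common answer to the request for a primer, sketched here under explicit assumptions: treat the stations reporting in a period as a random sample of the region, resample them with replacement many times, and read an interval off the spread of the resampled means (a percentile bootstrap). The key assumption, that stations are independent and representative, is exactly what several comments above question. The station values are made up and the function name is hypothetical.

```python
import random
from statistics import mean


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lo = means[round(tail * n_resamples)]
    hi = means[round((1 - tail) * n_resamples) - 1]
    return lo, hi


station_anomalies = [0.2, -0.1, 0.4, 0.3, 0.0, 0.6, -0.3, 0.1, 0.5, 0.2]
lo, hi = bootstrap_interval(station_anomalies)
print(f"mean {mean(station_anomalies):.2f}, 95% interval {lo:.2f} to {hi:.2f}")
```

If nearby stations share errors (for example, the same siting problems or the same adjustment), resampling them as if independent makes the interval too narrow.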

  113. It’s the trend of change in temperature at each measurement location that matters, not the average temperature over some area, and therefore not the trend of the change in average temperature.

    The methodology should be to compute the trend of temperature changes over time at each measurement site, then multiply the number of risers by the average rise and the number of decliners by the average decline. That would provide a far better synopsis of the trend.

    Consider, for example, a stock market: Which methodology would be a better measure of what the market is doing as a whole? 1) Computing the mean price of all stocks every N units of time, and then plotting the trend of that average price over time? Or 2) Computing the trend of each stock price individually over time, then multiplying the number of rising stocks by the average percentage increase, and the number of declining stocks by the average percentage decrease?

  114. Heat capacity of the immediate environment at each station is a major player. Global average temperature may not be a meaningful concept, as indicated by Essex et al., 2007.

    Christopher Essex, Ross McKitrick & Bjarne Andresen, “Does a Global Temperature Exist?”
    Journal of Non-Equilibrium Thermodynamics, Volume 32, Issue 1, Pages 1–27, February 2007
    ISSN (Print) 0340-0204, DOI: 10.1515/JNETDY.2007.001

    However, heat content of the entire climate system or of its subsystems, being an extensive quantity, is meaningful. Now, the average of temperatures over the subset of stations with a high-heat-capacity environment (basically the ones close to a large body of water) is a proxy for heat content, because the heat capacity of water is the same everywhere and is much larger than that of any other substance in the climate system.

    Therefore it would be an interesting exercise to compare seaside temperature trends to inland trends. As the latter ones lack any obvious physical meaning, if trends of averages differ significantly for the two subsets, that would cast serious doubt on any global trend.

  115. Willis, you are dead right.

    The BEST effort is worthy and should be praised and encouraged. However, unless they are magicians they will not be able to overcome the fundamental problem which is that the source data is just too lousy for the job. So I fear the BEST team may have created a rod for their own backs. If their results show less warming than the current official data their methodology will be roundly criticised by the warmists. If their results confirm existing estimates, the skeptics will be all over them with objections.

    For me the irony of all of this is that it is irrelevant anyway. What people seem to be blind to is that, even using the official world temperature data from NASA or Hadley, THERE IS NO DANGEROUS GLOBAL WARMING ANYWAY.

    Here is a plot of the official HadCRUT3 data, straight from the University of East Anglia of Climategate fame:

    http://www.thetruthaboutclimatechange.org/temps.png

    Perhaps somebody could explain to me why a long-term trend of 0.41 °C per century is an alarming problem? The 30-year period from 1970 to 2000 looks to me likely to be just the upswing of the well-documented ~67-year oceanic AMO temperature cycle, which shows up clearly in the 11-year running mean line (red). During the upswing section of any roughly sinusoidal variation, the slope is going to be 4 or 5 times steeper than the long-term average; hey, that’s the nature of a sinusoidal variation! So it is surely no coincidence that climate alarmism flourished during that 30-year period, when Hansen et al. believed fervently that they were witnessing the clear and alarming signature of man-made global warming. Now that we are moving over the top of the natural cycle and into the downswing, they are having an increasingly tough time maintaining their position.
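How much steeper the upswing looks depends on the amplitude assumed for the cycle as well as on its shape. A minimal sketch, assuming a series made of a linear trend of 0.41 °C per century plus a pure 67-year sinusoid of hypothetical amplitude, with the rising zero-crossing placed at 1985 so that 1970-2000 is the upswing. It compares the least-squares slope over that window with the slope over 1850-2010.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def upswing_ratio(amplitude, trend_per_year=0.0041, period=67.0,
                  first=1850, last=2010, window=(1970, 2000), rising_zero=1985):
    """Slope over `window` divided by slope over the whole record."""
    years = list(range(first, last + 1))
    temps = [trend_per_year * (y - first)
             + amplitude * math.sin(2 * math.pi * (y - rising_zero) / period)
             for y in years]
    win = [(y, t) for y, t in zip(years, temps) if window[0] <= y <= window[1]]
    win_years = [y for y, _ in win]
    win_temps = [t for _, t in win]
    return ols_slope(win_years, win_temps) / ols_slope(years, temps)


# Amplitudes in °C are hypothetical; the ratio scales roughly with amplitude / trend.
for a in (0.0, 0.1, 0.2, 0.3):
    print(a, round(upswing_ratio(a), 2))
```

With zero amplitude the ratio is 1; whether it reaches 4 or 5 depends on how large the cycle is relative to the linear trend.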

    Reworking the world’s temperature data is most unlikely to resolve this issue. But the next 10 to 15 years will surely resolve it one way or the other just by observing whether or not the current apparent downturn is maintained. While we wait patiently for the outcome, perhaps we should take on board Ronald Reagan’s famous maxim: “Don’t just do something, stand there”.

  116. Willis Eschenbach says:
    March 23, 2011 at 1:13 am
    BigWaveDave says:
    March 23, 2011 at 12:53 am
    Ian W says:
    March 23, 2011 at 3:06 am

    . . . regarding pressure and humidity . . .

    I agree with Ian. He stated it one way; I would like to state it in a slightly different way. Temperature is a poor, incomplete surrogate measure of local atmospheric energy content. So what good can come out of determining a worldwide “average” of nonuniformly measured temperature (an incomplete surrogate for energy) without corresponding worldwide detail on the local, time-dependent concentrations of both the major energy-carrying agent (water) and, most importantly, the local vertical concentration distribution of what everyone seems to be clamoring is the pesky AGW causative agent, CO2? I agree that homogenizing temperature data might allow the introduction of a meaningless bias. But without other, more important MEASURED data as well, especially site-specific data (e.g. water and CO2 concentration, vertical distribution, time-dependent changes in local human population, local aircraft ‘concentration’, local changes in the number, proximity and horsepower of engines, building ‘concentration’ and height, surface emissivity, etc.), the CHANGE in temperature cannot in any meaningful way be attributed to a CAUSE.

  117. diogenes says:
    March 23, 2011 at 3:18 pm
    “Sorry, that was my first dip into climate progress: it is like dealing with a horde of hysterical lunatics.”

    Given your name, the worst of the worst should not disappoint you in any way. Those CP people must be “something else.”

  118. steven mosher says:
    March 23, 2011 at 12:15 pm
    “Willis, I think you’ve misunderstood empirical homogenization. Further, I’m not at all convinced that temperatures at a given location are universally described by a power law distribution. Third, I do not see the kind of discontinuous behavior that you speak of. Fourth, the 1200 km correlation distance is an integral part of the error calculation due to spatial sampling.”

    Why do you post this? It’s just a list of “mentions” of points on which you disagree with Willis. I could understand a post in which you explain one of your points. Do you not have time to explain your points? Then why post at all? Anyone who demonstrates the temperament that you demonstrate in this post will inevitably bring upon himself a lot of unnecessary push-back. I hope BEST does not share your temperament. If they do, this entire exercise is wasted.

  119. A “movie” of 525,960 hourly averaged readings from De Bilt, NL (station #260), 1951-2010, shown as 15 four-year periods.
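The figures in that caption are mutually consistent, as a quick calendar check shows. This sketch assumes a complete record with no missing hours; the real KNMI file may have gaps.

```python
import calendar


def hourly_slots(first_year, last_year):
    """Hours in calendar years first_year..last_year inclusive (no gaps assumed)."""
    days = sum(366 if calendar.isleap(y) else 365
               for y in range(first_year, last_year + 1))
    return 24 * days


print(hourly_slots(1951, 2010))  # 525960
print((2010 - 1951 + 1) // 4)    # 15 four-year periods
```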

    Video: http://boels069.nl/klimaat/Temperature.wmv (needs Windows Media Player)
    Data from:

    http://www.knmi.nl/klimatologie/uurgegevens/#no

    The plots show the count of each distinct temperature value, together with Excel’s (Office 2010) NORM.DIST function (a Gaussian, I believe).
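The same kind of comparison can be sketched outside Excel. A minimal Python version, assuming readings already rounded to a fixed resolution and stored as integers (for example, tenths of a degree) so that equal readings count together; the sample values are invented.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def counts_vs_normal(readings, resolution):
    """For each distinct reading, return (value, observed count, expected count).

    The expected count is what a normal distribution with the sample mean and
    standard deviation puts in a bin of width `resolution` centred on the value.
    """
    counts = Counter(readings)
    fitted = NormalDist(mean(readings), stdev(readings))
    n = len(readings)
    rows = []
    for value in sorted(counts):
        p = fitted.cdf(value + resolution / 2) - fitted.cdf(value - resolution / 2)
        rows.append((value, counts[value], n * p))
    return rows


# Hypothetical readings in whole degrees, for illustration only.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
for value, observed, expected in counts_vs_normal(sample, resolution=1):
    print(value, observed, round(expected, 2))
```

Excel's NORM.DIST(x, mean, standard_dev, FALSE) returns the density; for narrow bins, density times n times bin width gives approximately the same expected count.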

    I’m also wondering about the Zipf distribution.

  120. Tim Folkerts says:
    March 23, 2011 at 11:07 am

    Willis,

    I have to ask about the use of the Zipf Distribution.

    A quick check of Google shows that you are about the only person who uses this distribution with respect to temperature (yet you seem very confident that it is THE distribution for temperature data). Typically this distribution is used to look at things like how often different words come up in language, how often cities of various sizes occur, how much traffic the 7th most popular web page gets compared to the 8th most popular.

    This distribution is based on rankings of items. From Wikipedia on “Zipf’s Law”:

    For example, in the Brown Corpus “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36,411 occurrences), followed by “and” (28,852).

    A similar statement for temperature would be something like:

    For example, at some weather station “70 F” is the most frequently occurring temperature, and by itself accounts for nearly 7% of all temperature records. True to Zipf’s Law, the second-place temperature “69 F” accounts for slightly over 3.5% of temperature records, followed by “71 F”.

    Of course, the specific percentages would be different, but I don’t really see how a ranking of temperature frequencies will be an effective way to analyze the data. For one thing, it eliminates the sign of the deviation from the most common data (the mode).
    I agree totally with your point. I might add that language is a qualitative data set, while temperature is quantitative. Also, temperature has a periodic seasonality that will dominate the distribution of the data. Homogenization does not require the assumption that the data are Gaussian in order to eliminate bad data.
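Tim's objection can be made concrete with a rank-frequency table. A minimal sketch on invented °F readings: it ranks distinct readings by frequency and computes the ratios that Zipf's law would predict to be close to the rank.

```python
from collections import Counter


def rank_frequency(readings):
    """(rank, value, count) with the most frequent reading ranked first."""
    counts = Counter(readings)
    return [(rank, value, count)
            for rank, (value, count) in enumerate(counts.most_common(), start=1)]


def zipf_check(readings):
    """Zipf's law predicts count(rank 1) / count(rank r) close to r."""
    ranked = rank_frequency(readings)
    top = ranked[0][2]
    return [(rank, top / count) for rank, _, count in ranked]


# Hypothetical readings in °F, for illustration only.
temps = [70] * 8 + [69] * 4 + [71] * 3 + [68] * 2
print(rank_frequency(temps))  # [(1, 70, 8), (2, 69, 4), (3, 71, 3), (4, 68, 2)]
print(zipf_check(temps))
```

69 and 71 lie on opposite sides of the mode, but the ranking keeps only how often each occurs, so the sign of the deviation is lost.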

    An extensive review of different homogenization techniques was published by researchers from 11 different countries. It covers various mathematical techniques, in addition to labor-intensive techniques that make use of metadata.

    http://docs.google.com/viewer?a=v&q=cache:fIHtfyKKinoJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.122.1131%26rep%3Drep1%26type%3Dpdf+homogenization+temperature+data+rationale&hl=en&gl=us&pid=bl&srcid=ADGEEShduEujlOnUZP5mmxCwyhSmq9kJPCwRXcYLO1wk8-0iS61vMlDsMZ8Fo1-zPI44u0nCgQHk_LOQaypaAY3hHO1VOeqTutMglkMdJ4K7Z3KR6u5nMN3wj9ihRzoz6S7pg-m-4AAG&sig=AHIEtbQI2CgKWKVQS1C43Eoa-Glzsi5A8Q

    I think that, without looking at how these work in some detail, it is cavalier to dismiss them with the remark Eschenbach used:

    “The problem is not that the data requires “homogenization”, that’s a procedure for milk. The problem is that there are undocumented discontinuities or incorrect data entries. But homogenizing the data is not the answer to that.”

    The rhetoric here treats homogenization as if it were some kind of obscenity.

    The definition of inhomogeneities in the data includes equipment failures, changes in station location, changes in environment, measurement practices, operator error, etc., i.e. any change other than a change in climate that affects the data. The purpose of homogenization is to extract the true behavior of the climate and eliminate the extraneous factors in the data. Part of the process of homogenization is to find the discontinuities and errors in the data; the review makes that clear. Also, when multiple stations are used and correlated with one another, the probability of making an error is significantly reduced.

    There is no perfect way to homogenize data. As the review paper points out, the best method to use depends on the state of the data, the distance between stations, etc. The authors point out in their conclusions section that, when significant errors need to be corrected, data sets homogenized by different methods tend to resemble one another more than they resemble the original data. To me, that would confirm that homogenization is a good idea. The review also points out that when large numbers of stations are averaged, the homogenized and original average trends are very similar. Homogenization of data is most useful for regional temperature data.
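To make the neighbour-comparison idea concrete, here is a minimal Python sketch of one textbook approach: form a difference series between a candidate station and the mean of its neighbours, then locate the most likely single step change with the standard normal homogeneity test (SNHT) statistic. It is a generic illustration, not the BEST algorithm or any specific method from the cited review; the data are invented and no significance threshold is applied.

```python
import math
from statistics import mean, stdev


def difference_series(candidate, neighbors):
    """Candidate minus the mean of its neighbours, time step by time step."""
    reference = [mean(values) for values in zip(*neighbors)]
    return [c - r for c, r in zip(candidate, reference)]


def snht_breakpoint(diff):
    """Return (k, T) maximising T(k) = k*z1**2 + (n - k)*z2**2.

    z1 and z2 are the means of the standardised series before and from
    index k; a large T suggests a step change starting at index k.
    """
    m, s = mean(diff), stdev(diff)
    z = [(d - m) / s for d in diff]
    n = len(z)
    best_k, best_t = None, -math.inf
    for k in range(1, n):
        z1 = sum(z[:k]) / k
        z2 = sum(z[k:]) / (n - k)
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t


# Invented data: shared "weather" plus a +1.0 °C step at index 20 in the candidate.
weather = [math.sin(i) for i in range(40)]
neighbors = [weather, [w + 0.3 for w in weather]]
candidate = [w + (1.0 if i >= 20 else 0.0) for i, w in enumerate(weather)]
k, t = snht_breakpoint(difference_series(candidate, neighbors))
print(k)  # 20
```

Differencing removes the weather shared with the neighbours, which is why a step that is hard to see in the raw series stands out. In practice T would be compared with tabulated critical values before any adjustment is made.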

  121. Willis:

    If I recall correctly, Ross McKitrick et al. concluded that the whole concept of a “global average temp.” is as goofy as the narrative gets. Do you deny this? If so, why? If not, what good is the BEST investigation? Is it not a waste of time and resources if the “conclusions” mean nothing?

    I personally agree with RP Sr. who maintains that the ocean temp. is the important metric.

    What say you?

  122. eadler says:
    March 23, 2011 at 6:33 pm

    “The rhetoric here treats homogenization as if it were some kind of obscenity.”

    It is nice to see that you are reading better than you once did and that you are picking up nuance better than you once did.

    The BEST people have an opportunity to connect with sceptics. They should take it. Otherwise, after November 2012 there will be no Democrats in Congress, EPA will be abolished, and the UN will be escorted by security to the new country of its choice.

    If all BEST is going to do is discuss advanced statistical techniques then their effort is DOA, dead on arrival. They have to take up questions that interest sceptics if they want to appeal to sceptics.

  123. When the actual temperature field is as inhomogeneous as careful measurements show it to be, the ex ante imposition of academic notions of homogeneity is scientifically misguided. There is no such thing as a unique “true” temperature at any averaging scale in any region that can be entirely divorced from the particular spot where the thermometer is placed. While the correlation of deviations from the station average can extend over considerable distances, that does not apply to the absolute temperatures themselves. Somehow that point escapes those who favor the statistical massage of scraps of data from a temporally non-uniform set of stations over actual measurements. No less than the vain exercises in homogenization, the abandonment of a uniform set of datum levels (which all too often are corrupted by urbanization) when the inclusion of very short records becomes a primary goal strikes me as a grievous flaw.

  124. Seeing as the error bars on any surface temperature measurement would have to be at least ±1 °C, I find it hard to believe that any conclusion other than “to within the measurement errors, no temperature trend is discernible” could possibly be warranted.
    This project is too silly for words.

    Historical and agricultural records, along with a little archaeology, would seem to give a better idea of climate change over the last few thousand years of the Holocene, and those records integrate all aspects of climate, not just temperature.

  125. The recent Russian heatwave was (finally) put down to a blocking high. It was unusual, it was unexpected, it could be determined to be an outlier but it occurred, therefore it is a valid piece of data.

    Why should it be removed in favour of someone’s arbitrary idea of what is or is not an outlier?

  126. Theo Goodwin says:
    March 23, 2011 at 7:55 pm

    eadler says:
    March 23, 2011 at 6:33 pm

    “The rhetoric here treats homogenization as if it were some kind of obscenity.”

    [eadler] It is nice to see that you are reading better than you once did and that you are picking up nuance better than you once did.

    The BEST people have an opportunity to connect with sceptics. They should take it. Otherwise, after November 2012 there will be no Democrats in Congress, EPA will be abolished, and the UN will be escorted by security to the new country of its choice.

    If all BEST is going to do is discuss advanced statistical techniques then their effort is DOA, dead on arrival. They have to take up questions that interest sceptics if they want to appeal to sceptics.

    – – – – – – –

    Theo Goodwin,

    Thanks, that was simply and well said.

    If it happens, for example with the help of Anthony and the WUWT fellowship, that there is increasingly timely and direct engagement between the ongoing efforts of the BEST team and the skeptics on the blogs, then I can only see that as enhancing the stature of the BEST project.

    John

  127. 134.

    When the actual temperature field is as inhomogeneous as careful measurements show it to be, the ex ante imposition of academic notions of homogeneity is scientifically misguided. There is no such thing as a unique “true” temperature at any averaging scale in any region that can be entirely divorced from the particular spot where the thermometer is placed. While the correlation of deviations from the station average can extend over considerable distances, that does not apply to the absolute temperatures themselves. Somehow that point escapes those who favor the statistical massage of scraps of data from a temporally non-uniform set of stations over actual measurements. No less than the vain exercises in homogenization, the abandonment of a uniform set of datum levels (which all too often are corrupted by urbanization) when the inclusion of very short records becomes a primary goal strikes me as a grievous flaw.
    You have made an excellent rhetorical argument with powerful adjectives (vain, corrupted, etc.), but in essence you are making a straw man argument here. The purpose of homogenization is to determine deviations from the average, i.e. temperature anomalies, rather than absolute temperatures. We don’t really care about the absolute numbers, only the trends.
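The difference between averaging absolute temperatures and averaging anomalies shows up as soon as the set of reporting stations changes. A minimal sketch with two invented stations that have no trend at all, one of which stops reporting halfway through. It assumes each station's baseline is simply its own mean over its whole record; with trending stations covering different periods, the choice of a common baseline period matters.

```python
from statistics import mean


def average_absolute(records, years):
    """Average the absolute values of whichever stations report each year."""
    out = []
    for y in years:
        values = [r[y] for r in records.values() if y in r]
        out.append(mean(values) if values else None)
    return out


def average_anomalies(records, years):
    """Subtract each station's own mean first, then average the anomalies."""
    baselines = {sid: mean(r.values()) for sid, r in records.items()}
    out = []
    for y in years:
        values = [r[y] - baselines[sid] for sid, r in records.items() if y in r]
        out.append(mean(values) if values else None)
    return out


# Invented stations with no trend; the warm one stops reporting after 2004.
years = list(range(2000, 2010))
records = {
    "warm": {y: 20.0 for y in range(2000, 2005)},
    "cold": {y: 5.0 for y in years},
}
print(average_absolute(records, years))   # 12.5 for 2000-2004, then 5.0
print(average_anomalies(records, years))  # 0.0 throughout
```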

  128. David Socrates says: March 23, 2011 at 5:37 pm

    “Perhaps somebody could explain to me why a long-term trend of 0.41 °C per century is an alarming problem? The 30-year period from 1970 to 2000 looks to me likely to be just the upswing of the well-documented ~67-year oceanic AMO temperature cycle, which shows up clearly in the 11-year running mean line (red). During the upswing section of any roughly sinusoidal variation, the slope is going to be 4 or 5 times steeper than the long-term average; hey, that’s the nature of a sinusoidal variation! So it is surely no coincidence that climate alarmism flourished during that 30-year period, when Hansen et al. believed fervently that they were witnessing the clear and alarming signature of man-made global warming. Now that we are moving over the top of the natural cycle and into the downswing, they are having an increasingly tough time maintaining their position.”

    There is another interesting aspect of “the [o]fficial HadCRUT3 data” plot http://www.thetruthaboutclimatechange.org/temps.png
    It starts near a trough (1860) and ends near a peak (~ 2005).

    Nonetheless, the HadCRUT3 trend is remarkably “consistent with” Akasofu’s view of temperature trends:

    Akasofu, Syun-Ichi. 2009. Two Natural Components of the Recent Climate Change. University of Alaska Fairbanks, Fairbanks, Alaska: International Arctic Research Center, April 30. http://people.iarc.uaf.edu/~sakasofu/pdf/Earth_recovering_from_LIA_R.pdf

    A possible cause of global warming.
    (1) The Recovery from the Little Ice Age (A Possible Cause of Global Warming)
    and
    (2) The Multi-decadal Oscillation (The Recent Halting of the Warming)

    Two natural components of the currently progressing climate change are identified. The first one is an almost linear global temperature increase of about 0.5°C/100 years, which seems to have started in 1800–1850, at least one hundred years before 1946 when manmade CO2 in the atmosphere began to increase rapidly. This 150~200-year-long linear warming trend is likely to be a natural change. One possible cause of this linear increase may be the earth’s continuing recovery from the Little Ice Age (1400~1800); the recovery began in 1800~1850. This trend (0.5°C/100 years) should be subtracted from the temperature data during the last 100 years when estimating the manmade contribution to the present global warming trend. As a result, there is a possibility that only a small fraction of the present warming trend is attributable to the greenhouse effect resulting from human activities.
    It is also shown that various cryosphere phenomena, including glaciers in many places in the world and sea ice in the Arctic Ocean that had developed during the Little Ice Age, began to recede after 1800 and are still receding; their recession is thus not a recent phenomenon.
    The second one is oscillatory (positive/negative) changes, which are superposed on the linear change. One of them is the multi-decadal oscillation, which is a natural change. This particular natural change had a positive rate of change of about 0.15°C/10 years from about 1975 (positive from 1910 to 1940, negative from 1940 to 1975), and is thought by the IPCC to be a sure sign of the greenhouse effect of CO2. However, the positive trend from 1975 has stopped after 2000.
    One possibility of the halting is that after reaching a peak in 2000, the multi-decadal oscillation has begun to overwhelm the linear increase, causing the IPCC prediction to fail as early as the first decade of the 21st century.
    There is an urgent need to correctly identify natural changes and remove them from the present global warming/cooling trend, in order to accurately and correctly identify the contribution of the manmade greenhouse effect. Only then can the effects of CO2 be studied quantitatively. Arctic research should be able to contribute greatly to this endeavor.

    Akasofu, Syun-Ichi. 2010. “On the recovery from the Little Ice Age.” Natural Science 2 (11): 1211-1224. doi:10.4236/ns.2010.211149. http://www.scirp.org/journal/PaperInformation.aspx?PaperID=3217&JournalID=69

    A number of published papers and openly available data on sea level changes, glacier retreat, freezing/break-up dates of rivers, sea ice retreat, tree-ring observations, ice cores and changes of the cosmic-ray intensity, from the year 1000 to the present, are studied to examine how the Earth has recovered from the Little Ice Age (LIA). We learn that the recovery from the LIA has proceeded continuously, roughly in a linear manner, from 1800-1850 to the present. The rate of the recovery in terms of temperature is about 0.5°C/100 years and thus it has important implications for understanding the present global warming. It is suggested on the basis of a much longer period covering that the Earth is still in the process of recovery from the LIA; there is no sign to indicate the end of the recovery before 1900. Cosmic-ray intensity data show that solar activity was related to both the LIA and its recovery. The multi-decadal oscillation of a period of 50 to 60 years was superposed on the linear change; it peaked in 1940 and 2000, causing the halting of warming temporarily after 2000. These changes are natural changes, and in order to determine the contribution of the manmade greenhouse effect, there is an urgent need to identify them correctly and accurately and remove them.

  129. curryja says:
    March 23, 2011 at 10:19 am

    Willis, a brief comment. The Berkeley group may not present the “last word” on how to do the surface temperature analysis. I regard this as the first important step, in terms of a comprehensive data set that is well documented and transparent. They have introduced some new methodologies, which are all steps in the right direction, IMO. Their analysis will lay the foundation for others to try other methods and to improve on the analysis.

    Judith Curry

    Always good to hear from you, Judith. And I agree completely. Someone above said the effort was “doomed to failure”, but I see it as the exact opposite, a very important initiative whose outlines are not yet set.

    The most important things to me are accessibility and transparency. If each step is visible, and the data and code are public, then whether their new methods are valuable additions will quickly become apparent.

    One of the first things that struck me when I entered the field was the lack of an agreed-upon temperature dataset. Before we can even begin to discuss such things as the effect of UHI upon the dataset, the data has to be clean and good and quality controlled and agreed upon.

    I am, as I mentioned above, not in agreement with several of their methods. First, the idea that one can reliably determine, using only data from a group of 21 nearby stations, whether a data point is rarer than a one-in-a-thousand event (the 99.9% exceedance level) seems impossible. I believe that they have a statistician on the team, but if so, they haven’t thought this one all the way through.

    Because if you want to reliably determine whether a certain data point is a one-in-a-thousand event, you’ll need a minimum of something like thirty times that much data, or 30,000 data points … and I doubt that we have that for a number of the stations in question. And that’s without even including the complication of the Zipf distribution.
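How shaky a 99.9th-percentile estimate is at small sample sizes can be checked by simulation. A minimal sketch, assuming Gaussian data for simplicity (fat tails would only make matters worse) and a crude order-statistic estimator; it reports how much the estimate varies between repeated samples, in units of the standard deviation. It does not test the specific "thirty times" rule of thumb.

```python
import random
from statistics import stdev


def q999(sample):
    """Simple order-statistic estimate of the 99.9th percentile."""
    s = sorted(sample)
    return s[min(len(s) - 1, int(0.999 * len(s)))]


def spread_of_q999(n, trials=30, seed=1):
    """Std. dev. of the q999 estimate across repeated Gaussian samples of size n."""
    rng = random.Random(seed)
    estimates = [q999([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
    return stdev(estimates)


for n in (1_000, 30_000):
    print(n, round(spread_of_q999(n), 3))
```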

    Part of the problem is that most statisticians don’t even think about Zipf distributions at all, whereas nature thinks about them all the time. And extremes are the stock-in-trade of Zipf distributions. So the error margins on the calculations of which is a valid data point and which is bogus data will be huge.

    Finally, as I indicated above, the whole idea of “homogenization” is anathema to me.

    But that’s just a disagreement about what to do with the data once we have an agreed-upon dataset. With the dataset, people around the world will be able to do their own analyses and come to their own conclusions.

    All the best,

    w.

    PS: Please, please, please, if you have any power over BEST, have them make the data available as a single 2-D block, with rows as time and columns as stations. I’m not interested in 35,000 individual station records à la CRU … indeed, the design of the public availability gateway for this data is crucial. They should take a look at the KNMI website, where I can pull up any subset of the data I’m interested in, filtered in a host of possible ways.
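The "single 2-D block" layout is a straightforward pivot of per-station records. A minimal sketch, assuming the input is a list of hypothetical (station id, date, value) tuples; missing observations are left blank in the CSV output.

```python
import csv
import io


def to_wide(records):
    """Pivot (station_id, date, value) rows into one row per date and one
    column per station; missing observations are left blank."""
    records = list(records)
    stations = sorted({sid for sid, _, _ in records})
    by_date = {}
    for sid, date, value in records:
        by_date.setdefault(date, {})[sid] = value
    header = ["date"] + stations
    rows = [[date] + [by_date[date].get(sid, "") for sid in stations]
            for date in sorted(by_date)]
    return header, rows


def to_csv(header, rows):
    """Serialise the wide table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical long-format records: (station id, year-month, monthly mean °C).
long_records = [
    ("A", "2010-01", 1.5),
    ("B", "2010-01", 2.0),
    ("A", "2010-02", 1.7),
]
header, rows = to_wide(long_records)
print(to_csv(header, rows))
```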

  130. Jeff Carlson says:
    March 23, 2011 at 10:39 am

    Will,

    they are doomed because they are starting with bad raw data … it’s that simple … you can’t fix that with math …

    Jeff, all data is bad data to a greater or lesser degree … the only question is the degree. No measurement is ever exact. We don’t let that stop us in any field of science, it just affects the confidence intervals.

    w.

  131. Tim Folkerts says:
    March 23, 2011 at 11:07 am

    Willis,

    I have to ask about the use of the Zipf Distribution. … (good questions snipped)

    Thanks, Tim. The Zipf is one of a number of closely related power law distributions. They all differ from Gaussian distributions in that they have an excess of extreme events. For this reason they are sometimes called “fat-tailed” distributions.

    My point is not that Zipf is the one and only relevant distribution. It is that we cannot use Gaussian statistics to identify extreme events in natural datasets.

    If I ran the zoo, the first thing that I’d do is to take the highest quality records that I had and determine which particular power-law distribution gives the best fit to the temperature data.

    Then, and only then, would I start talking about the “99.9% exceedance” limits …

    I discuss the Zipf distribution some more in the appendices to this post.
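The "fit a power law first" step might start with a tail-index estimate. A minimal sketch using the Hill estimator on synthetic Pareto data; Pareto is a continuous power law standing in for the Zipf family here, and the data are not real temperatures. Since temperatures are not positive power-law variables, applying this in practice would mean fitting exceedances of anomalies above a high threshold, which is an assumption of this sketch. It also shows how often fat-tailed data exceed a 99.9% threshold computed as if they were Gaussian.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def hill_tail_index(sample, k):
    """Hill estimate of the power-law tail index from the k largest values.

    Uses the (k+1)-th largest value as the threshold; values must be positive.
    """
    s = sorted(sample, reverse=True)
    threshold = s[k]
    return k / sum(math.log(x / threshold) for x in s[:k])


def gaussian_exceedance(sample, p=0.999):
    """Fraction of the sample above the p-quantile of a normal fitted by mean and stdev."""
    cut = NormalDist(mean(sample), stdev(sample)).inv_cdf(p)
    return sum(x > cut for x in sample) / len(sample)


# Synthetic fat-tailed data: Pareto with tail index 3 (a stand-in, not real temperatures).
rng = random.Random(42)
data = [rng.paretovariate(3.0) for _ in range(50_000)]
print(round(hill_tail_index(data, k=2_000), 2))  # estimate of the true index, 3.0
print(gaussian_exceedance(data))                 # compare with the nominal 0.001
```

The Hill estimate depends on the choice of k, so in practice it is usually inspected over a range of k values.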

    w.

  132. eadler says:
    March 24, 2011 at 5:29 am

    “The purpose of homogenization is to determine deviations from the average, i.e. temperature anomalies, rather than absolute temperatures. We don’t really care about the absolute numbers, only the trends.”

    Pray tell, how does one establish reliable “averages” in the first place, without maintaining FIXED datum levels at a UNIFORM (unchanging) set of stations? You seem oblivious to the fact that by employing scraps of record from an EVER-CHANGING set of stations, you get a data sausage with mystery ingredients, rather than a physically meaningful ensemble average of station records. And that’s in the very best case, without any data handling issues or the datum-corrupting influences of UHI and site/instrumentation or land-use changes. You SHOULD care about absolute levels, because that’s the ONLY thing that thermometers measure.

    Spurious offsets from datum level are readily lost from sight in the highly variable, stochastic temperature changes at any station, but even a half-dozen offset years near the beginning or end of the record can have a profound effect upon the regressional “trend.” In fact, such trends are the most inconsistent features of actual station records. Without proper vetting of station data at an ABSOLUTE level, which can only be done with sufficiently long records, trends become ephemeral artifacts. This problem is not solved by MANUFACTURING a time-series via “homogenization” from neighboring unvetted data.
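The effect of a few offset years at the end of a record on a fitted trend is easy to quantify. A minimal, noise-free sketch assuming a hypothetical 60-year station with no real change and a spurious +0.5 °C offset in its last six years.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


years = list(range(1951, 2011))                      # 60 years
flat = [0.0 for _ in years]                          # no real change at all
offset = [0.5 if y >= 2005 else 0.0 for y in years]  # hypothetical +0.5 °C in the last 6 years

print(100 * ols_slope(years, flat))              # 0.0 °C per century
print(round(100 * ols_slope(years, offset), 2))  # 0.45 °C per century
```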

  133. sky says:
    March 24, 2011 at 2:44 pm

    eadler says:
    March 24, 2011 at 5:29 am

    “The purpose of homogenization is to determine deviations from the average, i.e. temperature anomalies, rather than absolute temperatures. We don’t really care about the absolute numbers, only the trends.”

    Pray tell, how does one establish reliable “averages” in the first place, without maintaining FIXED datum levels at a UNIFORM (unchanging) set of stations? You seem oblivious to the fact that by employing scraps of record from an EVER-CHANGING set of stations, you get a data sausage with mystery ingredients, rather than a physically meaningful ensemble average of station records. And that’s in the very best case, without any data handling issues or the datum-corrupting influences of UHI and site/instrumentation or land-use changes. You SHOULD care about absolute levels, because that’s the ONLY thing that thermometers measure.

    Spurious offsets from datum level are readily lost from sight in the highly variable, stochastic temperature changes at any station, but even a half-dozen offset years near the beginning or end of the record can have a profound effect upon the regressional “trend.” In fact, such trends are the most inconsistent features of actual station records. Without proper vetting of station data at an ABSOLUTE level, which can only be done with sufficiently long records, trends become ephemeral artifacts. This problem is not solved by MANUFACTURING a time-series via “homogenization” from neighboring unvetted data.

    You are making me hungry with your talk of data sausages for breakfast. I was hoping they would be something substantial, but you dashed my hopes when you said the trends they create are ephemeral artifacts. If they are ephemeral, I guess they are not substantial enough to satisfy my breakfast hunger.

    In fact you are criticizing a procedure which you don’t understand and haven’t read, because the details haven’t been released yet in a full paper. In any case, the homogenization process is what “vets the data”, so your phrase “without vetting the data” is an unfounded assumption about the procedure you claim to criticize.

  134. eadler says:
    March 24, 2011 at 7:02 pm

    “In fact you are criticizing a procedure which you don’t understand and haven’t read, because the details haven’t been released yet in a full paper. In any case, the homogenization process is what “vets the data”, so your phrase “without vetting the data” is an unfounded assumption about the procedure you claim to criticize.”

    You can presume anything you want about what I ostensibly “don’t understand” about “homogenization,” but any time measured values in a station record are altered or replaced with something inferred statistically from other stations, the observational basis is no longer there. Bona fide vetting doesn’t do that! And I don’t waste my time on plainly attitudinal arguments that lack substantive basis.

  135. I prefer the lognormal distribution for continuous data rather than the Zipf distribution, which is useful for discrete data. My personal experience indicates that natural series of continuous data tend to be lognormal, with widely varying parameters. (I haven’t read all the comments so far; there are too many!) Also, using the average as the reference for an anomalous datum is probably wrong most of the time, because the median is a better measure of central tendency than the average. In any case, the raw data must be preserved!
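The mean-versus-median point is easy to see on skewed data. A minimal sketch on synthetic lognormal values; it illustrates a general statistical property and does not assume that temperature data are lognormal.

```python
import random
from statistics import mean, median

rng = random.Random(0)
# Lognormal with mu = 0, sigma = 1: theoretical median exp(0) = 1, mean exp(0.5) ≈ 1.649.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]
print(round(median(sample), 3), round(mean(sample), 3))

# One wild value shifts the mean noticeably but barely moves the median.
contaminated = sample + [1e6]
print(round(median(contaminated), 3), round(mean(contaminated), 3))
```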

  136. sky says:
    March 24, 2011 at 8:47 pm

    eadler says:
    March 24, 2011 at 7:02 pm

    “In fact you are criticizing a procedure which you don’t understand and haven’t read, because the details haven’t been released yet in a full paper. In any case, the homogenization process is what “vets the data”, so your phrase “without vetting the data” is an unfounded assumption about the procedure you claim to criticize.”

    You can presume anything you want about what I ostensibly “don’t understand” about “homogenization,” but any time measured values in a station record are altered or replaced with something inferred statistically from other stations, the observational basis is no longer there. Bona fide vetting doesn’t do that! And I don’t waste my time on plainly attitudinal arguments that lack substantive basis.

    It is your posts that are full of emotional “attitudinal arguments”. I would wager that if the GISS temperature record showed that global warming was non-existent, you wouldn’t worry about homogenization.

    Your argument is not logical. There are no locations on earth whose station records are free of inhomogeneities during the 160-year period covered by the temperature record. The alternatives are:

    1) Correct the errors to the best of our ability, using statistical methods.
    2) Allow them to remain in the record and influence the result.
    3) Eliminate the entire record.

    There is no perfect solution.
    If you don’t want to have a temperature record, you could do 3): throw out all of the record data that has imperfections.

    If you didn’t care about errors, you could forget about corrections. Some people have objected to the corrections, claiming that without them there would be no evidence of global warming in the temperature record, and offering as proof the records of individual stations where this has been the result.
    In fact, this has been looked at. The resulting trends without the homogenization corrections are said to be similar to those of the corrected data:

    http://tamino.wordpress.com/2011/01/02/hottest-year/#comment-46809

    Wrote a program to read in the GHCN data sets and compute simple-minded “dumb average” global temperature anomalies (smoothed with a moving-average filter).

    The results were noisier than the official GISS/CRU/etc. results (due to lack of ocean coverage and lack of proper geospatial weighting), but overall results were quite consistent with GISS/CRU/etc.

    Overall summary of the results:

    GHCN raw and adjusted data produced nearly identical temperature trends.
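The "dumb average" described in that quote can be sketched in a few lines. This assumes station data already converted to annual anomalies and held as hypothetical year-to-value dicts; GHCN file parsing is omitted, and, as the quote notes, there is no area weighting and no ocean data.

```python
import math
from statistics import mean


def dumb_average(stations, years):
    """Unweighted mean of the anomalies reported in each year (NaN if none)."""
    out = []
    for y in years:
        values = [s[y] for s in stations if y in s]
        out.append(mean(values) if values else math.nan)
    return out


def moving_average(series, window):
    """Centred moving average with an odd window; the ends are dropped."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    return [mean(series[i - half:i + half + 1]) for i in range(half, len(series) - half)]


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Invented "raw" and "adjusted" versions of two stations, for illustration only.
years = list(range(1990, 2000))
raw = [{y: 0.02 * (y - 1990) for y in years}, {y: 0.03 * (y - 1990) for y in years}]
adjusted = [{y: 0.02 * (y - 1990) for y in years}, {y: 0.025 * (y - 1990) for y in years}]
for label, stations in (("raw", raw), ("adjusted", adjusted)):
    smoothed = moving_average(dumb_average(stations, years), 5)
    print(label, round(100 * ols_slope(years[2:-2], smoothed), 2), "°C per century")
```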

    To me, making an effort to correct the data with statistically valid techniques, even if there is no guarantee of 100% accuracy, is better than the other two options. This is especially important for charting regional trends.

  137. John Trigge says:

    “The recent Russian heatwave was (finally) put down to a blocking high. It was unusual, it was unexpected, it could be determined to be an outlier but it occurred, therefore it is a valid piece of data.

    Why should it be removed in favour of someone’s arbitrary idea of what is/is not an outlier?”

    There is no reason for temperature data from the Russian heat wave to be rejected by the BEST project. The heat spell occurred for a relatively long period over a large area. While the data point for, say, July 2010 from a given location would appear to be an outlier compared with other Julys at that location, other locations nearby would show the same pattern of extremely high temperatures during the same period. The BEST homogenization approach would not throw out data from such extreme events.
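    The distinction drawn here, between a value that is extreme for its own station and a value that disagrees with its neighbors, can be sketched in a few lines. This is a minimal illustration, assuming monthly anomalies for a target station and a few nearby stations are already available; the function names, the 3 °C threshold, and the sample numbers are hypothetical, and this is not BEST's published algorithm.

```python
from statistics import mean, median, pstdev

def own_history_zscore(history, value):
    """How unusual `value` is against the same station's earlier values
    for the same calendar month (e.g. previous Julys)."""
    return (value - mean(history)) / pstdev(history)

def departure_from_neighbors(target_anom, neighbor_anoms):
    """Target anomaly minus the median anomaly of nearby stations for the
    same month; the median limits the pull of a single bad neighbor."""
    return target_anom - median(neighbor_anoms)

def flag_suspect(target_anom, neighbor_anoms, threshold=3.0):
    """Flag a value only when it disagrees with its neighbors, not merely
    because it is extreme. `threshold` (deg C) is a hypothetical choice."""
    return abs(departure_from_neighbors(target_anom, neighbor_anoms)) > threshold

# Hypothetical numbers: a July far outside the station's own history...
past_julys = [20.1, 19.8, 20.5, 20.0, 19.6]
print(own_history_zscore(past_julys, 26.0) > 3)   # True
# ...kept when the neighbors saw the same heat wave,
print(flag_suspect(6.0, [5.5, 6.2, 5.8]))         # False
# ...flagged when the neighbors did not (e.g. a faulty sensor).
print(flag_suspect(6.0, [0.3, -0.1, 0.4]))        # True
```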

    Homogenization is designed to identify items such as the UHI, measurement errors, undocumented station movements, and changes in the immediate surroundings that produce changes in the data recorded at the location. The approach may detect errors such as the record high temperatures registered in Baltimore not long ago (reported on this blog by Anthony), which were produced by faulty equipment and were not corrected even though the sensor problem was known to the US Weather Service.
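    One common way to look for an undocumented change such as a station move is to difference the station against its neighbors, so that weather and climate shared by the group (heat waves included) cancel out and a move shows up as a step. Below is a minimal sketch of a single-break least-squares search, assuming at most one break and an already computed difference series; the function name, `min_seg`, and the numbers are hypothetical. Established methods such as the Standard Normal Homogeneity Test add a significance test and deal with multiple breaks, which this sketch does not.

```python
from statistics import mean

def best_single_break(diff, min_seg=3):
    """Find the split index k that best fits `diff` (station minus neighbor
    mean) as two constant segments, by minimizing the summed squared error.
    Returns (k, offset) where offset = mean after k minus mean before k.
    No significance test is applied: some k is always returned."""
    n = len(diff)
    if n < 2 * min_seg:
        raise ValueError("series too short for the requested segment length")
    best_k, best_sse = None, float("inf")
    for k in range(min_seg, n - min_seg + 1):
        left, right = diff[:k], diff[k:]
        ml, mr = mean(left), mean(right)
        sse = sum((x - ml) ** 2 for x in left) + sum((x - mr) ** 2 for x in right)
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, mean(diff[best_k:]) - mean(diff[:best_k])

# Hypothetical: the station tracks its neighbors, then reads 0.8 deg C
# higher after an undocumented move in year 10.
neighbor_mean = [12.0, 12.3, 11.8, 12.5, 12.1, 12.4, 11.9, 12.2, 12.6, 12.0,
                 12.3, 12.1, 11.7, 12.4, 12.2, 12.5, 12.0, 12.3, 11.9, 12.1]
station = [t + (0.8 if i >= 10 else 0.0) for i, t in enumerate(neighbor_mean)]
diff = [s - m for s, m in zip(station, neighbor_mean)]
k, offset = best_single_break(diff)
print(k, round(offset, 2))  # 10 0.8
```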

  138. eadler: “The resulting trends, without the homogenization corrections, are said to be similar to the corrected data.”

    They “are said to be”. If the raw data had been kept, plus a record of all the changes with reasons, the trends wouldn’t be “said to be”, they would “be” (or “not be”).

  139. eadler says:
    March 25, 2011 at 7:41 am
    “It is your posts that are full of emotional “attitudinal arguments”. I would wager that if the GISS temperature record showed that global warming was non-existent, you wouldn’t worry about homogenization.

    Your argument is not logical. There are no locations on earth whose station records are free of inhomogeneities during the 160-year period covered by the temperature record. The alternatives are:

    1) Correct the errors to the best of our ability, using statistical methods.
    2) Allow them to remain in the record and influence the result.
    3) Eliminate the entire record.”

    ==============================================================

    My posts directly address the substantive problems of using scraps of unvetted data and performing uncertain “anomalizations” and ad hoc “homogenizations” in an urban-biased GHCN data base. By mockingly talking about sausages for breakfast and wagering that I wouldn’t worry about homogenization if GISS showed no trend, it is your posts that resort to ad hominem arguments while ignoring the scientific issues.

    No one is calling for elimination of the entire record. Contrary to your claim, there are hundreds of century-long station records unevenly scattered around the globe that are relatively free of data errors or extraneous, non-climatic biasing factors. I’ll not describe here the advanced signal analysis methods used in vetting them. But I will say that those analyses reveal very widespread corruption of records by inexplicable offsets, as well as by gradual UHI intensification and land-use changes. That corruption cannot be effectively removed by homogenization with other similarly flawed records. All unvetted records indeed should be disregarded in serious climatological studies, even if it leaves great holes in the geographic coverage. What is wholly illogical is the notion that “trends” obtained from indiscriminate inclusion of all records, such as in Tamino’s global comparison with GISS, are climatically meaningful.
    BEST’s goal of including all records after performing their own “homogenization” serves no clear scientific purpose.

    I see no point in further discussion with someone who seizes upon colorful adjectives, while paying scant attention to the clear meaning conveyed by nouns and verbs.

  140. sky says:
    March 25, 2011 at 3:44 pm


    No one is calling for elimination of the entire record. Contrary to your claim, there are hundreds of century-long station records unevenly scattered around the globe that are relatively free of data errors or extraneous, non-climatic biasing factors. I’ll not describe here the advanced signal analysis methods used in vetting them. But I will say that those analyses reveal very widespread corruption of records by inexplicable offsets, as well as by gradual UHI intensification and land-use changes. That corruption cannot be effectively removed by homogenization with other similarly flawed records. All unvetted records indeed should be disregarded in serious climatological studies, even if it leaves great holes in the geographic coverage. What is wholly illogical is the notion that “trends” obtained from indiscriminate inclusion of all records, such as in Tamino’s global comparison with GISS, are climatically meaningful.
    BEST’s goal of including all records after performing their own “homogenization” serves no clear scientific purpose.

    I see no point in further discussion with someone who seizes upon colorful adjectives, while paying scant attention to the clear meaning conveyed by nouns and verbs.
    It is not clear what you mean by “relatively free of data errors and nonclimatic biasing factors”. Do you have evidence that there are hundreds of such stations, and, more importantly, that they can be used to track climate change around the globe? Is there a reference that I can consult for this?
    The survey of homogenization techniques written by 13 researchers from 11 different countries says the following:

    http://www.cru.uea.ac.uk/cru/data/temperature/HadCRUT3_accepted.pdf

    Unfortunately, most long-term climatological time series have been affected by a number of non-climatic factors that make these data unrepresentative of the actual climate variation occurring over time. These factors include changes in: instruments, observing practices, station locations, formulae used to calculate means, and station environment (Jones et al., 1985; Karl and Williams, 1987; Gullett et al., 1990; Heino, 1994). Some changes cause sharp discontinuities while other changes, particularly change in the environment around the station, can cause gradual biases in the data. All of these inhomogeneities can bias a time series and lead to misinterpretations of the studied climate. It is important, therefore, to remove the inhomogeneities or at least determine the possible error they may cause.

    Unless you have a convincing reference that shows homogenization is unnecessary and data can be vetted in some way without it, or that the statistical methods you use are not part of the methods reviewed in this paper, I believe I am justified in regarding your claim with a great deal of skepticism.

  141. John Andrews says:
    March 24, 2011 at 10:06 pm

    I prefer the use of the log-normal distribution for continuous data rather than the Zipf distribution, which is useful for discrete data. My personal experience indicates that natural series for continuous data tend to be log-normal with widely varying parameters. (I haven’t read all the comments so far – too many!) In any case, the use of the average as the basis for an anomalous datum is probably wrong most of the time, because the median is a better measure of the central tendency than the average.

    Many thanks, John. For me it depends on the dataset.

    I just downloaded the England daily rainfall dataset. It’s much closer to either a Zipf (power-law) or an exponential distribution than it is to log-normal.

    My point was that with any given dataset, what you need to do first is to figure out what kind of non-normal distribution you’re dealing with. I program in R. It has functions for the fitting of data (“fitdistr” in the package “MASS”) to a variety of probability distributions. Only after that’s done can we talk of whether some particular datapoint is out of the ordinary.
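    R’s `fitdistr` fits distributions by maximum likelihood. A rough Python analogue covering only two of the candidates, assuming strictly positive values (so dry days with zero rainfall would have to be set aside first): fit an exponential and a log-normal by maximum likelihood and compare them by AIC. The helper names and sample values are hypothetical, and Zipf, being discrete, is left out. The last lines also check the mean-versus-median point raised above on a skewed sample.

```python
import math
from statistics import mean, median, pstdev

def exp_fit(xs):
    """Maximum-likelihood exponential fit: rate = 1/mean.
    Returns (rate, log-likelihood)."""
    rate = 1.0 / mean(xs)
    return rate, len(xs) * math.log(rate) - rate * sum(xs)

def lognormal_fit(xs):
    """Maximum-likelihood log-normal fit: mu and sigma are the mean and the
    population standard deviation of log(x). Returns (mu, sigma, loglik)."""
    logs = [math.log(x) for x in xs]
    mu, sigma = mean(logs), pstdev(logs)
    ll = sum(-lx - math.log(sigma) - 0.5 * math.log(2 * math.pi)
             - (lx - mu) ** 2 / (2 * sigma ** 2) for lx in logs)
    return mu, sigma, ll

def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

def better_fit(xs):
    """Pick the candidate with the lower AIC. Requires every x > 0."""
    _, ll_exp = exp_fit(xs)
    _, _, ll_ln = lognormal_fit(xs)
    return "exponential" if aic(ll_exp, 1) < aic(ll_ln, 2) else "lognormal"

# Hypothetical wet-day totals (mm): strongly right-skewed.
rain = [0.2, 0.5, 0.8, 1.1, 1.9, 3.0, 7.5, 22.0]
print(better_fit(rain))
print(median(rain), mean(rain))
```

    On a sample this skewed the median sits well below the mean, which is the case where an average-based anomaly misleads. Keeping `rain` untouched and writing each transformation to a new variable also preserves the raw data.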

    In any case, the raw data must be preserved!

    Indeed. I create subsequent datasets with each transformation. That lets me examine the effect of each step. In R it’s simple to do.

    w.

  142. Eadler, you raise an interesting issue when you quote the HadCRUT folks as saying:

    eadler says:
    March 25, 2011 at 7:08 pm

    …Unfortunately, most long-term climatological time series have been affected by a number of non-climatic factors that make these data unrepresentative of the actual climate variation occurring over time. These factors include changes in: instruments, observing practices, station locations, formulae used to calculate means, and station environment (Jones et al., 1985; Karl and Williams, 1987; Gullett et al., 1990; Heino, 1994). Some changes cause sharp discontinuities while other changes, particularly change in the environment around the station, can cause gradual biases in the data. All of these inhomogeneities can bias a time series and lead to misinterpretations of the studied climate. It is important, therefore, to remove the inhomogeneities or at least determine the possible error they may cause.

    Unless you have a convincing reference that shows homogenization is unnecessary and data can be vetted in some way without it, or that the statistical methods you use are not part of the methods reviewed in this paper, I believe I am justified in regarding your claim with a great deal of skepticism.

    Are there problems with the data? Absolutely. All of the problems the HadCRUT folks mentioned, instruments and all the rest, are mixed into the dataset. And I agree with you that we need to deal with that as best we can.

    However, do we want to “homogenize” the data? I object to the concept. What we want to do is remove any non-climate signals. This is a very different objective than homogenization, and requires different methods and algorithms.

    One problem I discussed above is that while temperatures are well correlated over fairly long distances, the same is not true of trends. For example, a slow, decades-long wind shift may not affect one site much, but may gradually warm another site. The temperature records are still well correlated, but the trends are very different.
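    A small numerical illustration of that point, using made-up numbers: two sites share identical year-to-year weather, and one of them also picks up a slow local warming of 0.03 °C per year, an arbitrary stand-in for the wind-shift case. Correlation and least-squares trend are computed by hand so the sketch needs only the standard library; the names and values are hypothetical.

```python
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def ols_slope(ys):
    """Least-squares slope of ys against 0, 1, 2, ... (deg C per year)."""
    n = len(ys)
    xbar, ybar = (n - 1) / 2, mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    return num / sum((x - xbar) ** 2 for x in range(n))

# Hypothetical year-to-year weather shared by both sites (deg C anomalies).
shared = [0.5, -0.3, 0.8, -0.6, 0.2, -0.9, 0.7, -0.1, 0.4, -0.5,
          0.6, -0.7, 0.3, -0.2, 0.9, -0.4, 0.1, -0.8, 0.5, -0.6]
site_a = shared
# Site B also warms slowly, 0.03 deg C/yr, from a hypothetical local change.
site_b = [s + 0.03 * i for i, s in enumerate(shared)]

r = pearson_r(site_a, site_b)
trend_a, trend_b = ols_slope(site_a), ols_slope(site_b)
print(round(r, 2), round(trend_a, 4), round(trend_b, 4))
```

    With these numbers the correlation comes out at about 0.95, yet the two trends differ by the full 0.03 °C per year and even have opposite signs, so a high correlation between two records says little about whether their trends agree.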

    This renders homogenization as is currently practiced theoretically problematic. I haven’t found an answer to that yet, although you may know of one.

    Thanks for your clarification of the issues,

    w.

  143. eadler says:
    March 25, 2011 at 7:08 pm

    “Unless you have a convincing reference that shows homogenization is unnecessary and data can be vetted in some way without it, or that the statistical methods you use are not part of the methods reviewed in this paper, I believe I am justified in regarding your claim with a great deal of skepticism.”

    In any proper sense of the term, “homogenization” is applied to eliminate the discrepancies between SPATIALLY separated records; it is typically applied even to data-perfect series to account for gradual UHI effects, rather than to cure TEMPORAL nonstationarities (e.g., datum offsets) or other data deficiencies. To be effective, homogenization–which the CRU white paper conflates with the distinct issues of data QC and record repair–requires a pristine ABSOLUTE baseline for reference, which is usually unavailable. Thus in practice, accurate wholesale homogenization proves impossible.

    Vetting is a process that separates the wheat from the chaff RELATIONALLY. Yes, indeed, the signal analysis methods involved are beyond the ken evidenced by the CRU white paper. And those century-long records in the GHCN data base that survive vetting number in the hundreds, albeit concentrated mostly in the USA. The fact that these findings are not summarized in any publicly available reference is an impediment only to academic pursuits and not to serious research.
