Guest essay by Bob Dedekind
Auckland, NZ, June 2014
In a recent comment on Lucia’s blog The Blackboard, Zeke Hausfather had this to say about the NCDC temperature adjustments:
“The reason why station values in the distant past end up getting adjusted is due to a choice by NCDC to assume that current values are the “true” values. Each month, as new station data come in, NCDC runs their pairwise homogenization algorithm which looks for non-climatic breakpoints by comparing each station to its surrounding stations. When these breakpoints are detected, they are removed. If a small step change is detected in a 100-year station record in the year 2006, for example, removing that step change will move all the values for that station prior to 2006 up or down by the amount of the breakpoint removed. As long as new data leads to new breakpoint detection, the past station temperatures will be raised or lowered by the size of the breakpoint.”
In other words, an automatic computer algorithm searches for breakpoints, and then automatically adjusts the whole prior record up or down by the amount of the breakpoint.
This is not something new; it’s been around for ages, but something has always troubled me about it. It’s something that should also bother NCDC, but I suspect confirmation bias has prevented them from even looking for errors.
You see, the automatic adjustment procedure is almost guaranteed to produce spurious, artificial warming, and here’s why.
Sheltering
Sheltering occurs at many weather stations around the world. It happens when something (anything) stops or hinders airflow around a recording site. The most common causes are vegetation growth and human-built obstructions, such as buildings. A prime example of this is the Albert Park site in Auckland, New Zealand. Photographs taken in 1905 show a grassy, bare hilltop surrounded by newly-planted flower beds, and at the very top of the hill lies the weather station.
If you take a wander today through Albert Park, you will encounter a completely different vista. The Park itself is covered in large mature trees, and the city of Auckland towers above it on every side. We know from the scientific literature that the wind run measurements here dropped by 50% between 1915 and 1970 (Hessell, 1980). The station history for Albert Park mentions the sheltering problem from 1930 onwards. The site was closed permanently for temperature measurements in 1989.
So what effect does the sheltering have on temperature? According to McAneney et al. (1990), each 1m of shelter growth increases the maximum air temperature by 0.1°C. So for trees 10m high, we can expect a full 1°C increase in maximum air temperature. See Fig 5 from McAneney reproduced below:
It’s interesting to note that the trees in the McAneney study grow to 10m in only 6 years. For this reason weather stations will periodically have vegetation cleared from around them. An example is Kelburn in Wellington, where cut-backs occurred in 1949, 1959 and 1969. What this means is that some sites (not all) will exhibit a saw-tooth temperature history, where temperatures increase slowly due to shelter growth, then drop suddenly when the vegetation is cleared.
So what happens now when the automatic computer algorithm finds the breakpoints at year 10 and 20? It automatically removes them, shifting the earlier record down by the size of each step, as follows.
So what have we done? We have introduced a warming trend for this station where none existed.
Now, not every station is going to have sheltering problems, but there will be enough of them to introduce a certain amount of warming. The important point is that there is no countering mechanism – there is no process that will produce slow cooling, followed by sudden warming. Therefore the adjustments will always be only one way – towards more warming.
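To make the mechanism concrete, here is a minimal sketch in Python. The numbers are made up for illustration: shelter is assumed to add about 0.1°C per metre of growth (the McAneney figure) and to grow roughly a metre a year for ten years before being cut back, and the “detection” is just a threshold on year-to-year drops rather than NCDC’s actual pairwise algorithm:

```python
import numpy as np

# Illustrative numbers only: a 30-year record with no underlying climate trend.
# Shelter grows for 10 years, adding roughly 0.1 C per year to the recorded
# maximum temperature, then is cut back to zero (a saw-tooth history).
years = np.arange(30)
recorded = 0.1 * (years % 10)            # saw-tooth: 0.0 .. 0.9 C, reset every 10 years

# Naive breakpoint removal: find each sudden drop and shift the entire
# earlier record down by the size of that step, as described above.
adjusted = recorded.copy()
for i, step in enumerate(np.diff(recorded)):
    if step < -0.5:                      # "detect" the cut-back as a breakpoint
        adjusted[:i + 1] += step         # move all prior values down

raw_trend = np.polyfit(years, recorded, 1)[0] * 10
adj_trend = np.polyfit(years, adjusted, 1)[0] * 10
print(f"Trend of raw saw-tooth series:  {raw_trend:+.2f} C/decade")
print(f"Trend after breakpoint removal: {adj_trend:+.2f} C/decade")
```

In this toy setup the raw saw-tooth fits to only about +0.1°C per decade, but once the two cut-backs are “corrected” the adjusted series warms at close to the full drift rate, roughly +0.9°C per decade.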
UHI (Urban Heat Island)
The UHI problem is similar (Zhang et al. 2014). A diagram from Hansen (2001) illustrates this quite well.
In this case the station has moved away from the city centre, out towards a more rural setting. Once again, an automatic algorithm will most likely pick up the breakpoint and perform the adjustment. There is also no countering mechanism that produces a long-term cooling trend. If even relatively few stations (say 10%) are affected in this way, it will be enough to skew the trend.
References
1. Hansen, J., Ruedy, R., Sato, M., Imhoff, M., Lawrence, W., Easterling, D., Peterson, T. and Karl, T. (2001) A closer look at United States and global surface temperature change. Journal of Geophysical Research, 106, 23947–23963.
2. Hessell, J. W. D. (1980) Apparent trends of mean temperature in New Zealand since 1930. New Zealand Journal of Science, 23, 1-9.
3. McAneney, K. J., Salinger, M. J., Porteus, A. S. and Barber, R. F. (1990) Modification of an orchard climate with increasing shelter-belt height. Agricultural and Forest Meteorology, 49, 177-189.
4. Zhang, L., Ren, G.-Y., Ren, Y.-Y., Zhang, A.-Y., Chu, Z.-Y. and Zhou, Y.-Q. (2014) Effect of data homogenization on estimate of temperature trend: a case of Huairou station in Beijing Municipality. Theoretical and Applied Climatology, 115, 365-373.
Victor Venema (@VariabilityBlog) says: June 11, 2014 at 1:21 pm
“Venema et al. 2012 discusses benchmarking results for a range of algorithms: OA at http://www.clim-past.net/8/89/2012/cp-8-89-2012.html”
That’s an impressive author list. Now, how about getting one of those guys to run Albert Park through their algorithms, and see which one produces a correct result?
Conversely, if none of them do, then you have a topic for ‘further research’ grant applications.
Everybody wins.
Bob Dedekind, homogenization can only improve the trend estimates for larger regions. It is known that it cannot improve every single station; it can only do so on average.
Thank you for finding a problem with this one station. I am sure NOAA will be interested in trying to understand what went wrong; that may provide useful information to improve their homogenization method. Just like the problems it has in Iceland. They are also looking into what made the Pairwise Homogenization Algorithm produce trends that are too shallow in the Arctic (as Cowtan and Way just found).
However your claim was: “You see, the automatic adjustment procedure is almost guaranteed to produce spurious, artificial warming, and here’s why.”
That sounded more general. Then one would expect you to be able to back that up with evidence that there is a general problem that significantly changes the global mean temperature. Especially when you write:
“In the new sceptical era they have to demonstrate some transparency, or they simply won’t be taken seriously.”
Or is that rule only for other people making claims? Demonstrate transparently that there is something wrong. Alternatively, send a polite email to NOAA that one of their homogenized stations does not look right. They will be interested.
Victor Venema (@VariabilityBlog) says: June 11, 2014 at 2:51 pm
“However your claim was: “You see, the automatic adjustment procedure is almost guaranteed to produce spurious, artificial warming, and here’s why.”
That sounded more general.”
It is more general. It’s not just one site.
I showed clearly in my post above that there is an inherent problem with automatic breakpoint analysis, in general. To date nobody has shown why my contention is wrong. I produced peer-reviewed references to back up my argument. I showed an example, Albert Park. Why Albert Park? It’s in Auckland, where I happen to live, and I’ve done detailed analysis on its temperature history. We have a lot of data on it. Are there others? Absolutely – Hansen identified the problem back in 2001. One would expect the problem to have been solved in the algorithms by now. It hasn’t been; otherwise GHCN v3 would not have the incorrect adjustments for Albert Park.
And how can you “improve the trend estimates for large regions” by introducing incorrect artificial warming trends for individual stations in those regions?
Our comments crossed due to moderation.
Now, how about getting one of those guys to run Albert Park through their algorithms, and see which one produces a correct result?
We will in the International Surface Temperature Initiative. Here multiple algorithms will be applied to a new global temperature dataset that includes GHCN and thus also Albert Park.
With Zeke and NOAA we are working on a study that especially includes a lot of scenarios that are more difficult than realistic ones, to see if and when the algorithms no longer improve the data. That also includes a test dataset with many saw-tooth inhomogeneities.
Both studies are volunteer efforts and will still take some time. There is unfortunately more funding for global climate models than for studying the quality of our invaluable historical observations. If you are in a hurry and want to be sure that the strong claim you made in the post actually holds, you will have to brush up on your Fortran skills.
The BEST method of finding “break points” to adjust temperature records to improve the record’s accuracy is fundamentally flawed in several areas.
First, as in the saw-toothed example at the head of this post:
From: Rasey Jan 23, 2013 at 11:30 am
What the BEST process does, by slicing and dicing nice long temperature records into separate segment station records, is bake in slow instrument drift and station contamination as climate signal and discard the critical recalibration information. This is madness.
Second: A fundamental assumption of the BEST process is that the potential instrument error does not change over the length of a segment. In reality, if there is any physical reason to split a record into two separate stations, then the beginning of each segment is far more reliable than the end of the segment. Someone had to have set up the station and calibrated it. At the end of the station’s life, they might have recalibrated it, but that is unlikely when it is being temporarily abandoned, permanently shut down, or destroyed.
The third fundamental flaw in the BEST process is that by slicing long records and using the slopes of the segments, they are taking a low-pass temperature signal, turning it into a band-pass signal, and eliminating the lowest frequencies found in the signal. Then they are integrating the slope segments to reconstruct a signal with (so they say) reliable low-frequency content.
I want to be very clear about my meaning here. I am NOT saying I expect to see any particular dominant frequency in temperature data. This is totally an information-content issue.
Low-frequency information is not contained in the high-frequency content. You cannot use high-frequency information to predict the low-frequency stuff. If you filter out the low frequencies with a scalpel, they are gone. Regional homogenization cannot restore them if all the data has been subjected to the same scalpel process.
Fourth: the segment lengths that come out of the BEST automated scalpel are absurdly short to be used in a climate study. Look at them for yourselves. Here is my look at Denver Stapleton Airport, a BEST record that runs from 1873 to 2011 with TEN!! breakpoints, six of them between 1980 and 1996 inclusive. The station record has temperatures from before the airport existed, after the airport closed, and ten breakpoints, some as quick as 2 years apart, in a record officially 130+ years long when the station itself probably existed only from after 1919 to 1995. That seems like an excessive number of breakpoints, especially when they don’t correlate well with documented airport expansion events.
Victor Venema (@VariabilityBlog) says: June 11, 2014 at 3:21 pm
“If you are in a hurry and want to be sure that the strong claim you made in the post actually holds you will have to brush up your Fortran skills.”
No fear, I have no intention of deliberately subjecting myself to Fortran again!
I’m happy to wait for the results of the study, but in the meantime I regard the GHCN v3 adjustments as incorrect, for the reasons I’ve laid out above, and because nobody has shown me any reason to conclude that something as basic as the Hansen problem has been catered for in the algorithms.
I’m not saying, by the way, that the Hansen problem is easy to solve using automatic means. Far from it – what do you use as the references for other trends, the homogenised or unhomogenised stations? Homogenised makes sense on the surface, but if you use the homogenised stations, you are using stations that have already had incorrect Hansen-type warming trends introduced, so your analysis is flawed. For example, if the current homogenised Auckland station is used as a reference, the calculated trend differences will be hopelessly wrong. And if you adjust another station (say Wellington) using Auckland, it becomes wrong too, but is then used as a homogenised reference. And so on.
If on the other hand you use the unhomogenised stations, genuine breakpoints such as 1928 in Kelburn have not yet been dealt with, and the trends will be wrong there too.
You have to remove the non-climatic trends first before you can do the breakpoint checks, but how do you find the non-climatic trends without resorting to trend comparison checks with stations that have themselves been incorrectly altered?
Tricky. But I’m sure you’ll find the solution.
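To make the circularity concrete, here is a rough sketch with purely hypothetical numbers. Real pairwise methods compare many neighbours at once rather than a single chain, but the chain shows how one contaminated reference can hand its spurious trend on:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(60)

# Hypothetical reference station "A" that already carries a spurious
# +0.1 C/decade warming from a mishandled saw-tooth adjustment.
station_a = 0.01 * t

def adjust_to_reference(raw, reference):
    """Remove the trend difference between a raw station and its reference
    (a crude stand-in for relative homogenization, illustration only)."""
    diff_slope = np.polyfit(t, raw - reference, 1)[0]
    return raw - diff_slope * t

# Stations B..E are truly trendless (noise only), but each is adjusted
# against the previously homogenized station and then reused as a reference.
chain = [station_a]
for _ in range(4):
    raw = rng.normal(0.0, 0.1, t.size)   # trendless raw record
    chain.append(adjust_to_reference(raw, chain[-1]))

for name, series in zip("ABCDE", chain):
    print(f"Station {name} homogenized trend: {np.polyfit(t, series, 1)[0] * 10:+.2f} C/decade")
```

Every station in the chain ends up inheriting the reference’s spurious ~0.1°C/decade, even though stations B to E are trendless by construction.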
As a statistician who has reviewed the adjustment data, I can guarantee you that the author’s suspicion about spurious warming adjustments and confirmation bias is correct. My impression is that there, indeed, has been warming, but not as much as the adjusted records indicate. I’d have to do a much more thorough analysis to calculate how much, and I don’t know if anyone has attempted that before. Fair warning – that would take a lot of personal time, time which I certainly don’t have…
Mosh and/or Zeke, Stephen Rasey above and Bob Dedekind in the head post raise several points that I hadn’t considered. Let me summarize them; they can correct me if I’m wrong.
• In any kind of sawtooth-shaped wave of a temperature record subject to periodic or episodic maintenance or change, e.g. painting a Stephenson screen, the most accurate measurements are those immediately following the change. Following that, there is a gradual drift in the temperature until the following maintenance.
• Since the Berkeley Earth “scalpel” method would slice these into separate records at the time of the discontinuities caused by the maintenance, it throws away the trend correction information obtained at the time when the episodic maintenance removes the instrumental drift from the record.
• As a result, the scalpel method “bakes in” the gradual drift that occurs in between the corrections.
Now this makes perfect sense to me. You can see what would happen with a thought experiment. If we have a bunch of trendless sawtooth waves of varying frequencies, and we chop them at their respective discontinuities, average their first differences, and cumulatively sum the averages, we will get a strong positive trend despite the fact that there is absolutely no trend in the sawtooth waves themselves.
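That thought experiment is easy to run. Here is a rough sketch (not Berkeley Earth’s actual scalpel code; the “cut” is approximated by simply discarding the first difference at every downward reset, and all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_stations = 100, 50
t = np.arange(n_years)

# Trendless saw-tooth "stations": slow upward drift, periodic reset to zero.
# Periods and drift rates vary by station; all values are illustrative.
periods = rng.integers(5, 20, n_stations)
drifts = rng.uniform(0.05, 0.15, n_stations)
stations = np.array([d * (t % p) for d, p in zip(drifts, periods)])

# Chop at every discontinuity by discarding the first difference there,
# average the surviving first differences across stations, then cumulate.
diffs = np.diff(stations, axis=1)
diffs[diffs < 0] = np.nan                # the downward resets are the "cuts"
spliced = np.nancumsum(np.nanmean(diffs, axis=0))

raw_trend = np.polyfit(t, stations.mean(axis=0), 1)[0] * 10
spliced_trend = np.polyfit(t[1:], spliced, 1)[0] * 10
print(f"Trend of the raw station average: {raw_trend:+.2f} per decade")
print(f"Trend of the spliced average:     {spliced_trend:+.2f} per decade")
```

The spliced average climbs at roughly the stations’ mean drift rate even though every input series is, by construction, trendless in the long run.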
So I’d like to know if and how the “scalpel” method avoids this problem … because I sure can’t think of a way to avoid it.
In your reply, please consider that I have long thought and written that the scalpel method was the best of a bad lot of methods, all methods have problems but I thought the scalpel method avoided most of them … so don’t thump me on the head, I’m only the messenger here.
w.
Willis Eschenbach says: June 11, 2014 at 6:57 pm
“…we will get a strong positive trend despite the fact that there is absolutely no trend in the sawtooth waves themselves.”
Bingo. Until this issue is solved the “most accurate” trend is the unadjusted one. My reasons are:
1) In a large dataset such as the land stations, it is to be expected that true breakpoints from station moves are randomly distributed, in sign and time. It is fair then (or at least less error-prone) to use the unadjusted series “as is”, at least as far as breakpoints are concerned.
2) The issue of shelter growth and/or urban heat islands means that the overall unadjusted trend is an upper bound. We can at least state with confidence that the true trend is less than the unadjusted trend.
Victor Venema (@VariabilityBlog) says:
June 11, 2014 at 1:21 pm
Thanks for that, Victor. I took a look at your study. It is interesting, but I fear you haven’t dealt with the issue I identified above. Let me repeat that section:
In your study you use both actual inhomogeneous observational data and artificial “homogeneous” data to test the algorithms. But that assumes that you know how homogeneous the natural dataset would be if we had perfect data … and since we don’t (we only have inhomogeneous data), I see no way to tell the real inhomogeneities from the artificial ones.
Let me raise a point here. In your paper you encapsulate the problem when you say:
Now, every “homogenization” method implicitly makes the following assumption:
But since almost every record we have is known to be inhomogeneous … how can we tell the difference? To solve this, many algorithms make a further assumption:
I would question both those assumptions, for the obvious reasons, and I would again point out that we don’t know much about what a long-term homogeneous temperature record might look like … which makes depending on a computer algorithm a very dubious procedure, particularly without manual individual quality control (the lack of which seems to be all the rage).
That one I had read, including this
And at that point, the guy with his hand on the throttle gets to decide where to set the breakpoints … and as a result, the guy with his hand on the throttle gets to set the eventual trend. I wouldn’t mind so much if the full range of possibilities were spread out so we could see which one was chosen … but generally we don’t get that, we get the chosen, anointed result with little or no exploration of what happens with different “judgement calls at various decision points”.
Phil Jones was right about what? Right to refuse to release his data? Right to tell porkies? Victor, I was the poor fool who made the first FOI request to Phil Jones for his data. In response, he told me a number of flat-out fairy tales. Among them was the claim that much of his data was subject to confidentiality agreements. When he was forced by a subsequent FOI request to produce the purported agreements covering much of his data, he came up with … exactly one. And that truth-free claim about confidentiality agreements was just one among many of his bogus excuses for not producing his data for public examination.
You obviously haven’t heard the details of the squalid episode known as Climategate. My own small part in it is detailed here … read’m and weep …
w.
Nick Stokes linked to this site ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/products/stnplots/5/50793436001.gif
It clearly shows that the fit between 1930 and the present would show no warming. Adjusting for the one-off event creates a warming trend for 70+ years. A cooling trend prior to the one-off event becomes a constant temperature. Even a child could spot that the correction is not correcting anything; it’s creating a trend where there was none.
vicgallus says: June 11, 2014 at 8:10 pm
“It clearly shows that the fit between 1930 and the present would show no warming. Adjusting for the one-off event creates a warming trend for 70+ years.”
Vic,
There are times when an adjustment is necessary. I would say that the Kelburn jump in 1928 is reasonable. The reason is that the site of the weather station moved up a hill, resulting in generally colder temperatures.
Have a look at the Wellington section here:
http://www.climateconversation.wordshine.co.nz/docs/Statistical%20Audit%20of%20the%20NIWA%207-Station%20Review%20Aug%202011%20SI.pdf
I am not expecting the algorithms to be fixed.
There was a lot of testing done on these algorithms. They have been designed to produce the desired result. If anything, the adjustment process will have to create another 2.5°C of warming in the next 86 years, so any “fixes” have to be in a certain direction.
@Bob Dedekind – Granted, the pre-1928 data only looks like the trend has been changed, but the post-1930 data does have a change to the trend. It is not merely shifted up uniformly. Why does 1990 need to be shifted up more than 1930?
vicgallus says: June 12, 2014 at 12:14 am
“…the post-1930 data does have a change to the trend.”
Yes, I have no idea what that’s all about. Neither NIWA nor our group identified any problems there. But I’ve noticed that with the GHCN results – they don’t necessarily coincide with any real changes.
Another cross comment. These moderation pauses are getting tedious. Is my opinion so dangerous?
Bob Dedekind says: “It is more general. It’s not just one site.”
Maybe I missed something. A saw-tooth type pattern naturally happens more often, but do you have more examples like Albert Park where such a problem is not solved right?
And how can you “improve the trend estimates for large regions” by introducing incorrect artificial warming trends for individual stations in those regions?
Analogies seem to be dangerous in the climate “debate”, but let me still try one.
Do you or your boss always make the perfect decision? Is your firm still profitable? Would it be more profitable if the CEO would be paralyzed by aiming to make every single decision perfectly?
The stations around Albert Park almost surely also have non-climatic jumps. Think back 100 years: it is nearly impossible to keep the measurements and the surroundings constant. These jumps create wrong trends in these stations and in the regional mean climate. As long as the improvements in the other stations are larger than the problem created in Albert Park, the regional mean trend will become more accurate.
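Here is a rough sketch of what that argument assumes, with made-up numbers. The crucial assumption is that the neighbours’ non-climatic jumps have zero mean (random sign and timing), which is exactly what the head post disputes for sheltering and UHI:

```python
import numpy as np

rng = np.random.default_rng(42)
n_stations, n_years = 100, 100
t = np.arange(n_years)
true_trend = 0.005                        # assume a real +0.05 C/decade regional signal
signal = true_trend * t

# Raw records: the real signal plus one non-climatic step per station.
# Zero-mean, random-sign steps are the key assumption here; give them a
# positive mean to mimic systematic shelter growth or UHI instead.
break_years = rng.integers(10, 90, n_stations)
break_sizes = rng.normal(0.0, 0.5, n_stations)
raw = np.array([signal + s * (t >= y) for s, y in zip(break_sizes, break_years)])

# Idealised homogenization: every step is found and removed perfectly,
# except one saw-tooth station left with a spurious +0.1 C/decade trend.
homogenized = np.tile(signal, (n_stations, 1))
homogenized[0] = signal + 0.01 * t        # the mishandled Albert Park-style station

def regional_trend_error(records):
    """Absolute error in the regional-mean trend, in C per decade."""
    return abs(np.polyfit(t, records.mean(axis=0), 1)[0] - true_trend) * 10

print(f"Regional trend error, raw records:  {regional_trend_error(raw):.3f} C/decade")
print(f"Regional trend error, homogenized:  {regional_trend_error(homogenized):.3f} C/decade")
```

Whether the adjusted regional mean really comes out better depends on the parts this sketch idealises away: that detection removes the jumps cleanly, and that the jumps are not systematically one-way, as the saw-tooth mechanism in the head post would make them.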
Your opinion is not. Your behavior is.
The issue of BEST baking in slow errors from gradually fading paint, by removing the periodic repainting jumps that in fact self-correct those errors over time, means it barely passes the laugh test: it is designed to automatically *remove* sudden station maintenance back to pristine conditions while leaving in the degraded conditions. Their claim that absolute temperatures, perfectly calibrated to the unwavering freezing and boiling points of water since the Fahrenheit scale of the 1700s, can be adjusted downwards in the past requires much more support than they offer with mere cheerleading about the superiority of black-box algorithms.
To test the validity of BEST’s claim that the US is warming at all we can skirt around the urban heating effect issue by simply looking at October when no heating or air conditioning is used around temperature stations or in whole big cities. And in October most states show absolutely no warming going back to the 1920s:
http://themigrantmind.blogspot.com/2009/12/hundred-years-of-october-cooling.html
The United States isn’t some oddball location even though it’s only a small percentage of Earth’s surface, and it’s the best data archive we have by far. October falsifies warming claims, bluntly, and by logical extension thus falsifies the entire global average temperature claim too. Only if the US is climatically isolated and anomalous can you wiggle out of this, acting as a lawyer.
Knowing any jury might acquit carbon dioxide since the canary didn’t die and the dog didn’t bark and the USA didn’t warm when the big machines were shut off, BEST adjusts the USA to create a warming trend all based on adjustments by activist “experts”:
http://oi58.tinypic.com/68wm5u.jpg
But where are the recent stories of heat waves in the states?
“Not guilty!” says the jury.
At June 10, 2014 at 8:36 am Barrybrill says:
“Paul Carter says shelter is less important at Kelburn because it is an exceptionally windy site. On the contrary, this windiness means the data is particularly susceptible to contamination by vegetation growth.”
The Kelburn Stevenson screen is on the top of a hill – see google earth at -41.284499° 174.767920° . From the aerial view it looks like there’s a lot of trees surrounding the site but they’re largely down-hill from the screen, and don’t have the same impact on wind that a flat area surrounded by such trees would have. You can use street-view to verify this – there are street-view photos at the entrance to the car park. Cutting the trees back would make little difference to the amount of stationary air, as testified by the recorded temperatures. You need to appreciate just how windy Wellington is and particularly how exposed that spot is to understand how the trees have relatively little impact on overall temperature at that site.
Well Victor, in reply to your last bit, the financial analogy: I would like to point out that CEOs who make decisions as poorly as the temperature adjustments at GHCN (i.e. changing a level trend to a rising one) usually end up getting sent away for fraud. So it’s not just about making the occasional mistake. I think our commenter from the financial sector would point out that in his realm total transparency is a requirement, as it should be in any endeavor. With total transparency you can go back and look at something and decide if the adjustment was warranted based on facts. This takes time, but at the end of the day, if you’re creating a temperature record for the ages, you have all the time in the world.
v/r,
David Riser
Victor Venema,
“These jumps create wrong trends in these stations and in the regional mean climate.”
The majority of the examples given here show that the jumps are precisely not an alteration of the trend but rather partial corrections, which should not be cancelled:
– The increase in tree height, corrected by cutting back,
– The deterioration of the paint, corrected by repainting,
– Increasing anthropogenic perturbations, corrected by moving to a less disturbed area.
Bob Dedekind says:
June 11, 2014 at 7:13 pm
Mmmm … I’d guess that station moves on average would be from more urban to more rural. As a result they’d average cooler. The data is there in the BEST dataset. So many questions … so little time. Mosh or Zeke might know.
I don’t have any numbers to back this up, but in general the permanent changes from human occupation such as roads, buildings, parking lots, and the like all make a location warmer. As does the replacement of forests with fields. In addition, in many locations we have large amounts of thermal energy being released (airports, near highways, Arctic towns in winter, in all cities, from air-conditioning exhaust, industrial operations, etc.).
So in general, over time we’d expect to see the record increasingly affected by human activities.
w.
Victor Venema (@VariabilityBlog) says:
June 12, 2014 at 3:45 am
The moderators on this site are unpaid volunteers. We need moderators 24/7, so they are spread around the planet. And there aren’t always as many of them available as we might like.
And yes, Victor, sometimes they need to get some sleep, or one of them has other things to do.
So no, Victor … sometimes it’s not all about you and your opinion …
My rule of thumb, which I follow at least somewhat successfully, is:
w.
[As of this morning, only 1,279,591 items have been reviewed and accepted. Thank you for your compliments Willis. 8<) .mod]
The number and amplitude of adjustments show that thermometers are not very reliable, especially for long-term trends. The problem of quantification is not yet hopeless. We may use proxies whose high-frequency correlation with temperature is proven. If several proxies are consistent, we can legitimately think they give a reasonable estimate of the trend. Two examples:
http://img38.imageshack.us/img38/1905/atsas.png
http://imageshack.us/a/img21/1076/polar2.png
Willis Eschenbach says: June 12, 2014 at 10:30 am
“Mmmm … I’d guess that station moves on average would be from more urban to more rural. As a result they’d average cooler.”
Quite possibly, but judging from the NZ record that only happened in the latter years. But regardless of that, think about the process – we have urban sites slowly increasing, often over many decades, with a non-climatic trend. Then there’s a move to a more rural site. So we have a saw-tooth. Can we correct for it automatically? Not easily, and it’s not being done right now.
So until the problem is resolved, we must leave the unadjusted record as is. Adjusting it is known to produce a Hansen error. Not adjusting it is maybe not perfect, but it’s better than the alternative.
Unfortunately, we do have data towards the right hand side of the saw-tooth that is artificially too high, so the trend is skewed up a little, which is why I believe it will be an upper bound on the trend. But that may be debatable – the main point is that the unadjusted is a better model right now until the issue is corrected.
Victor Venema (@VariabilityBlog) says: June 12, 2014 at 3:45 am
“A saw-tooth type pattern naturally happens more often, but do you have more examples like Albert Park where such a problem is not solved right?”
Do I need more? I showed that this is a general problem. I showed how it affected an example. It is very clear from comments made by Zeke, Nick and yourself that there are no built-in checks to prevent this happening; in fact, it almost seems as if nobody even thought of it, judging from the reaction I’m getting.
And then on top of that we can all see that the textbook example of Albert Park/Mangere fails the test, so we know that the software does not solve this problem.
Now Hansen identified the problem in 2001, Williams (2012) specifically states that the pre-1979 negative bias is possibly due to movement of stations in the Hansen manner, and Zhang et al. (2014) deal with this head-on, even quantifying it.
Quote from Zhang:
“Our analysis shows that data homogenization for [temperature] stations moved from downtowns to suburbs can lead to a significant overestimate of rising trends of surface air temperature.”
The only one in denial seems to be your good self.