Guest Post by Willis Eschenbach
In an insightful post at WUWT, Bob Dedekind talked about a problem with temperature adjustments. He pointed out that stations are maintained, by doing things like periodically cutting back encroaching trees or repainting the Stevenson Screen, and he noted that if we try to “homogenize” these stations, we get an erroneous result. This led me to a consideration of the “scalpel method” used by the Berkeley Earth folks to correct discontinuities in the temperature record.
The underlying problem is that most temperature records have discontinuities. There are station moves, instrument changes, routine maintenance, and the like. As a result, the raw data may not reflect the actual temperatures.
There are a variety of ways to deal with that, which are grouped under the rubric of “homogenization”. A temperature dataset is said to be “homogenized” when all effects other than temperature effects have been removed from the data.
The method that I’ve recommended in the past is called the “scalpel method”. To see how it works, suppose there is a station move. The scalpel method cuts the data at the time of the move, and simply considers it as two station records, one at the original location, and one at the new location. What’s not to like? Well, here’s what I posted over at that thread. The Berkeley Earth dataset is homogenized by the scalpel method, and both Zeke Hausfather and Steven Mosher have assisted the Berkeley folks in their work. Both of them had commented on Bob’s post, so I asked them the following.
Mosh and/or Zeke, Stephen Rasey above and Bob Dedekind in the head post raise several points that I hadn’t considered. Let me summarize them, they can correct me if I’m wrong.
• In any kind of sawtooth-shaped temperature record subject to periodic or episodic maintenance or change, e.g. painting a Stevenson screen, the most accurate measurements are those immediately following the change. Following that, there is a gradual drift in the temperature until the following maintenance.
• Since the Berkeley Earth “scalpel” method would slice these into separate records at the time of the discontinuities caused by the maintenance, it throws away the trend correction information obtained at the time when the episodic maintenance removes the instrumental drift from the record.
• As a result, the scalpel method “bakes in” the gradual drift that occurs in between the corrections.
Now this makes perfect sense to me. You can see what would happen with a thought experiment. If we have a bunch of trendless sawtooth waves of varying frequencies, and we chop them at their respective discontinuities, average their first differences, and cumulatively sum the averages, we will get a strong positive trend despite the fact that there is absolutely no trend in the sawtooth waves themselves.
So I’d like to know if and how the “scalpel” method avoids this problem … because I sure can’t think of a way to avoid it.
In your reply, please consider that I have long thought and written that the scalpel method was the best of a bad lot of methods, all methods have problems but I thought the scalpel method avoided most of them … so don’t thump me on the head, I’m only the messenger here.
w.
Unfortunately, it seems that they’d stopped reading the post by that point, as I got no answer. So I’m here to ask it again …
My best to both Zeke and Mosh, who I have no intention of putting on the spot. It’s just that as a long time advocate of the scalpel method myself, I’d like to know the answer before I continue to support it.
Regards to all,
w.
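PS: To make the sawtooth thought experiment above concrete, here is a minimal numerical sketch (my own illustration in Python, not BEST’s code). It builds a set of trendless sawtooth series, cuts each one at its resets the way a scalpel would, averages the surviving first differences across series, and cumulatively sums the averages. The station count, maintenance periods, and drift rates are arbitrary assumptions.

```python
# Minimal sketch of the sawtooth thought experiment (illustration only, not BEST's code).
# Each series drifts upward and is reset to zero by "maintenance", so it has no
# long-term trend. The "scalpel" discards the difference that spans each reset.
import numpy as np

rng = np.random.default_rng(0)
n_months, n_stations = 600, 50              # 50 years of monthly data, 50 stations

def sawtooth(n, period, drift):
    """Trendless sawtooth: linear drift that resets to zero every `period` steps."""
    return drift * (np.arange(n) % period)

series = np.array([sawtooth(n_months,
                            period=rng.integers(24, 120),        # maintenance every 2-10 years
                            drift=0.01 * rng.uniform(0.5, 1.5))  # degrees per month of drift
                   for _ in range(n_stations)])

# First differences, with the step that spans each reset blanked out -- exactly as
# if each segment between resets were treated as a separate station record.
diffs = np.diff(series, axis=1)
diffs[diffs < 0] = np.nan                   # the resets are the only negative steps here

avg_diff = np.nanmean(diffs, axis=0)        # average surviving step across "stations"
reconstructed = np.concatenate([[0.0], np.cumsum(avg_diff)])

months = np.arange(n_months)
true_trend = np.polyfit(months, series.mean(axis=0), 1)[0] * 120   # degrees per decade
fake_trend = np.polyfit(months, reconstructed, 1)[0] * 120
print(f"trend of the raw sawtooth average:            {true_trend:6.3f} deg/decade")
print(f"trend after cutting at resets and re-summing: {fake_trend:6.3f} deg/decade")
```

Under these assumptions the reconstructed series climbs at roughly the average drift rate, on the order of a degree per decade, even though every input sawtooth is trendless over its full length.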
What happens if you keep the scalpel corrections as signed increments to the error bands?
Any data handling method that can produce a positive temperature trend is highly sought after amongst CAGW supporters in these days of no obvious warming. TOBS, homogenisation, UHI, relocation and loss of sites can all be pressed into the service of The Cause in some way or another. There is no consideration of the Scientific Method here; it is now all just politics.
Link to Dedekind’s post does not work.
Do you assume that the drift is always positive: a tree creating a shadow, new asphalt, new buildings, fading paint, …
Is it possible to find a signal that is smaller than the measurement errors? This is the big question of the temperature records.
Same problem as exposed for the GISS pair-wise correction: the method cannot catch gradual drift followed by step-wise correction, so it introduces bias into signals that have none.
However, since the bias tends to introduce warming and warming was expected, the error went undetected by the programmers.
No conspiracy is required. Programmers never look for errors when the data gives the expected result; that is how testing is done most of the time. You only look for problems when you don’t get the answer you expect. So, if you expect warming, you only look for bugs when the results don’t show warming. As a result, bugs that cause warming are not likely to be found – at least not by the folks writing the code.
“the most accurate measurements are those immediately following the change. Following that, there is a gradual drift in the temperature until the following maintenance.”
Could we create a subset containing the first measurement only from each change, if these are the most accurate?
Far fewer measurements, but illuminating *if* the trend differs from that obtained using all of the measurements?
steverichards1984 says: June 28, 2014 at 11:39 pm
subset containing the first measurement only….
Yes, please. I’ve often advocated a look at a large temperature subset composed only of the first 5 years of operation of each new or relocated station. I lack the means, but I promote the idea.
More philosophically, it is interesting how the principle of ‘adjustment’ has grown so much in climate work. It’s rather alien to most other fields that I know. I wonder why climate people put themselves in this position of allowing subjectivity to override what is probably good original data in so many cases.
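For what it’s worth, assembling such a subset is mostly bookkeeping once station-move metadata exist. Here is a minimal sketch, assuming a hypothetical observations file and a hypothetical table of station moves; the file names and column names are illustrative only, not any real archive format.

```python
# Hypothetical sketch: keep only the first 5 years of data after a station is
# established or relocated. File names and columns are assumptions for illustration.
import pandas as pd

obs = pd.read_csv("observations.csv", parse_dates=["date"])           # station_id, date, tavg
moves = pd.read_csv("station_moves.csv", parse_dates=["move_date"])   # station_id, move_date

def first_five_years(group: pd.DataFrame, move_dates: pd.Series) -> pd.DataFrame:
    """Return the rows falling within 5 years of the station's start or of any move."""
    starts = [group["date"].min(), *move_dates]
    keep = pd.Series(False, index=group.index)
    for start in starts:
        keep |= (group["date"] >= start) & (group["date"] < start + pd.DateOffset(years=5))
    return group[keep]

subset = pd.concat(
    first_five_years(g, moves.loc[moves["station_id"] == sid, "move_date"])
    for sid, g in obs.groupby("station_id")
)
print(f"{len(subset)} of {len(obs)} observations fall in a 'first five years' window")
```

One could then fit trends to this subset and to the full record and compare, which is essentially the comparison being asked for above.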
That is a well-known and long-standing problem
http://climateaudit.org/2011/10/31/best-menne-slices/#comment-307953
http://wattsupwiththat.com/2014/01/29/important-study-on-temperature-adjustments-homogenization-can-lead-to-a-significant-overestimate-of-rising-trends-of-surface-air-temperature/
I don’t think there is much enthusiasm for any improvement among those invested in AGW.
Halving temperature trends over land may give a better guess than BEST; that matches better with McKitrick’s paper, with Watts’ draft, and with lower-troposphere satellite trends.
“…a temperature record subject to periodic or episodic maintenance or change…”. Have any tests been done to determine the magnitude of changes such as repainting compared to daily dirt and dust build-up? I have a white car which progressively becomes quite grey until a good rain storm. I would imagine such dirt build-up could have a significant effect on a Stevenson screen between rain storms?
Splitting a record at a breakpoint has the same effect as correcting the breakpoint. If the breakpoint was caused by station maintenance or other phenomena that RESTORES earlier observing conditions after a period of gradually increasing bias, correcting the breakpoint or splitting the record will preserve the biased trend and eliminate a needed correction. If a breakpoint is caused by a change in TOB, the breakpoint needs to be corrected or the record needs to be split to eliminate the discontinuity. If a breakpoint is caused by a station move, we can’t be sure whether we should correct it or leave it alone. If the station was moved because of a gradually increasing [urban?] bias and the station was moved to an observing location similar to the station’s early location, correcting the breakpoint will preserve the period of increasing urban bias. If the station wasn’t moved because the observing site wasn’t degrading, then correction is probably warranted.
Without definitive meta-data, one can’t be sure which course is best. However, only one large shift per station which cools the past can be attributed to a change in TOB, along with any pairs of offsetting large shifts. All other corrections that are undocumented probably should be included in the uncertainty of the observed trend (corrected for documented biases). For example, global warming in the 20th century amounted to 0.6 degC (observed change after correcting for documented artifacts) to 0.8 degC (after correcting all apparent artifacts).
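To make the decision logic above explicit, here is a toy sketch. The cause labels and recommended actions are just a restatement of the comment’s reasoning, not any published homogenization algorithm, and in practice the cause of a breakpoint is usually unknown, which is exactly the difficulty.

```python
# Toy restatement of the reasoning above (illustration only). The categories and
# return strings are my own labels, not any real homogenization code.
from enum import Enum

class Cause(Enum):
    MAINTENANCE = "maintenance restoring earlier observing conditions"
    TOB_CHANGE = "change in time of observation"
    STATION_MOVE = "station move"
    UNDOCUMENTED = "no metadata"

def handle_breakpoint(cause: Cause, moved_away_from_degraded_site: bool = False) -> str:
    if cause is Cause.MAINTENANCE:
        # Splitting or "correcting" here preserves the drift that the maintenance
        # step had just removed -- the bias gets baked into the trend.
        return "do not split or correct; the step IS the correction"
    if cause is Cause.TOB_CHANGE:
        return "split or correct; the discontinuity is purely artificial"
    if cause is Cause.STATION_MOVE:
        if moved_away_from_degraded_site:
            return "do not correct; the step removes accumulated site bias"
        return "correct or split; the step is a site artifact"
    return "leave alone and fold the ambiguity into the trend uncertainty"

print(handle_breakpoint(Cause.MAINTENANCE))
```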
It is clear beyond doubt (see, for example, the recent articles on Steve Goddard’s claim regarding missing data and infilling, and the poor siting issues that the surface stations survey highlighted) that the land-based thermometer record is not fit for purpose. Indeed, it never could be, since it has always been stretched well beyond its original design and purpose. The margins of error far exceed the very small signal that we are seeking to tease out of it.
If Climate Scientists were ‘honest’ they would, long ago, have given up on the land-based thermometer record and accepted that the margins of error are so large that it is useless for the purposes to which they are trying to put it. An honest assessment of that record leads one to conclude that we do not know whether it is today warmer than it was in the 1880s or in the 1930s, but as far as the US is concerned, it was probably warmer in the 1930s than it is today.
The only reliable instrument temperature record is the satellite record, and even that has a few issues, most notably that the data length is presently far too short to allow much confidence in what it reveals.
That said, there is no first-order correlation between the atmospheric level of CO2 and temperature. The proper interpretation of the satellite record is that there is no linear temperature trend, merely a one-off step change in temperature in and around the Super El Nino of 1998.
Since no one suggests that the Super El Nino was caused by the then present level of CO2 in the atmosphere, and since there is no known or understood mechanism whereby CO2 could cause such an El Nino, the take-home conclusion from the satellite data record is that climate sensitivity to CO2 is so small (at current levels, i.e., circa 360 ppm and above) that it cannot be measured using our best and most advanced and sophisticated measuring devices. The signal, if any, from CO2 cannot be separated from the noise of natural variability.
I have always observed that talking about climate sensitivity is futile, at any rate until such time as absolutely everything is known and understood about natural variation: what its constituent forcings are, and what the lower and upper bounds of each of those forcings are.
Since the only reliable observational evidence suggests that sensitivity to CO2 is so small, it is time to completely re-evaluate some of the cornerstones upon which the AGW hypothesis is built. It is at odds with the only reliable observational evidence (albeit that data set is too short to give complete confidence), and that suggests that something fundamental is wrong with the conjecture.
Per Willis
“…As a result, the raw data may not reflect the actual temperatures….”
//////////////////////////
Wrong; the raw data is the actual temperature at the location where the raw data is measured.
What you mean is whether there are some factors at work which mean that the actual temperature measured (i.e., the raw data) should not be regarded as representative of temperatures, because it has been distorted (upwards or downwards) by some extrinsic factor (in which I include changes in the condition of the screen, instrumentation and TOBs, as well as more external factors such as changes in vegetation, nearby buildings, etc.).
Global cooling says:
June 28, 2014 at 11:03 pm
Thanks, fixed.
While in theory the jumps should be equally positive or negative, most human activities tend to raise the local temperature. In particular the growth of the cities has led to UHI. As a result, when many of the weather stations moved to nearby airports after WWII, there would be a sharp cooling of the record.
In addition, if you just leave a met station alone, the aging of the paint and the growth of surrounding vegetation cutting out the wind both tend to warm the results.
However, there are changes that cool the station, so yes, the jumps will go both ways. But that doesn’t fix the problem. The scalpel method is removing the very information we need to keep from going wrong.
Mmmm … in theory, sure. Signal engineers do it every day. But in the temperature records? Who knows.
w.
I don’t understand this constant fiddling with data.
Consider: you perform an experiment, you get some results (i.e., the actual results of the experiment, which are the raw data that you have found). You then interpret these results, and set out your findings and conclusions (which will of necessity discuss the reliability and margins of error of the actual results of the experiment). But you never replace the actual results of the experiment with your own interpreted results, and claim that your own interpreted results are the actual results of the experiment conducted.
When someone seeks to replicate the experiment, they are seeking to establish whether the same raw data is obtained. When you seek to review a previously performed experiment, two distinct issues arise:
1. Does the replicated experiment produce the same raw data?
2. Does the interpretation which the previous experimenter gave to the findings withstand scientific scrutiny, or is there a different (possibly better) interpretation of the raw data?
These should never be confused.
The raw data should always remain collated and archived so that those coming after can consider what they think the raw data shows. Given advances in technology and understanding, later generations may well have a very different take on what the raw data is telling us. Unfortunately it appears that much of the original unadjusted raw data, on a global basis, is no longer available.
If we cannot detect the effects of UHI in the land-based thermometer record, given that UHI is a huge signal and we know that over the years urbanisation has crept and that station drop-outs have emphasised urban stations over truly rural stations, there is no prospect of seeing the far weaker signal of CO2.
Willis Eschenbach says:
June 29, 2014 at 1:03 am
//////////////////
Common sense would suggest that the majority of recent adjustments should be to cool recent measurements.
Given the effects of UHI and urban crawl, the switch to airports, etc. (with more aircraft traffic and more powerful jet engines compared to props) these past 40 or so years, the expectation would be that adjustments to the ‘record’ from 2014 back to, say, 1970 should lower these measurements (since the raw data would be artificially too high due to warming pollution from UHI etc.).
Yet despite this being the commonsense position, it appears that the reverse is happening. Why???
On Nov 1, 2011, Steve McIntyre writes:
That was a while ago. Has BEST published such a histogram of breakpoint trend offsets?
With the majority of BEST stations and their breakpoints that I have investigated, I am appalled at the shortness of the segments produced by the scalpel. We’ve just seen Lulling, TX. I’ve written about Stapleton Airport, Denver, CO, where the BEST breakpoints do not match airport expansion events, yet BEST misses the opening and closing of the airport!!!
People, I’ll accept a breakpoint in a station if it is a move with a significant elevation change. No airport in the world could have a move breakpoint based on elevation change. I’ll grant you that moving a temperature station at LAX from the east end of the runway to the west end of the runway might warrant a breakpoint. But a climate change within the bounds of an airport is the exception, not the rule. Let us see from BEST how many AIRPORT stations have c(0,1,2,3,4,…) breakpoints in their history. I bet 90% of them make no sense. If there is a station move WITHIN an airport, and it is for microsite conditions, it does not deserve a break. If it is for maintenance, it does not deserve a break. If it is to move it away from an expanding terminal, it does not deserve a break. If it is moved next to the ocean, OK, give it a breakpoint. How often does that happen? According to BEST it happens all the time.
Richard Verney says:
“If we cannot detect the effects of UHI in the land based thermometer record, given that UHI is a huge signal and we know that over the years urbanization has crept and that station drops outs have emphasized urban stations over truly rural stations, there is no prospect of seeing the far weaker signal of CO2.”
Perhaps the answer is to follow the advice of John Daly and use only those ‘rural’ stations with a long record in areas where the response to CO2 (if discernible) will be at its highest and competition with water vapour at its lowest – in the Arctic and Antarctic regions, where the low temperatures give the highest IR output in the region where CO2 has its strongest absorption bands. Given that it has been stated that we only need 50 or so records to give an accurate GTA, he offers 60-plus sites that meet the criterion of being minimally affected by UHI, most of which do not show any long-term warming trend, with many showing cooling.
http://www.john-daly.com/ges/surftmp/surftemp.htm
Peter,
I agree completely!
The temporal UHI (Urban Heat Island) effect is another continuous drift, which is why BEST was unable to identify it. It also goes in the positive direction because, although the population explosion was already over two decades ago (the global population below age 15 is no longer increasing), population density, along with economic activity, keeps increasing in most places, due to increasing life expectancy and ongoing economic development. Which is a good thing, after all.
It is far from negligible; UHI bias alone can be as much as half the trend, or more.
I think a reasonable request of BEST is to produce a graph:
X: Number of Years
Y: Number of station segments whose length >= X
By lines: For Years = (1902, 2012, by=5). That would be 23 curves.
That would let us easily see how many of the 40,000 stations BEST claims have a segment length of, say, >= 40 years as of 2002, or >= 20 years as of 1992. I think most observers would be shocked at how few such segments remain in the record.
We made this request to Zeke toward the end of WUWT Video Zeke Hausfather Explains…, Dec. 18, 2013.
Richard H. and I made some very rough estimates from the raw files. BEST has this data at hand. They could post it tomorrow, if they wanted to.
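For what it’s worth, once the scalpel segments are tabulated, the requested curves are only a few lines of code. A minimal sketch, under one reading of the request, assuming a hypothetical CSV of segments with start and end years; the file name and columns are illustrative only.

```python
# Sketch of the requested tally (illustration only, not BEST's code). Assumes a
# hypothetical file of scalpel segments with columns station_id, start_year, end_year.
import pandas as pd

segments = pd.read_csv("best_segments.csv")          # station_id, start_year, end_year

def curve_for(as_of_year: int, max_len: int = 100) -> pd.Series:
    """Y(X) = number of segments covering `as_of_year` whose length is >= X years."""
    alive = segments[(segments["start_year"] <= as_of_year) &
                     (segments["end_year"] >= as_of_year)]
    lengths = alive["end_year"] - alive["start_year"] + 1
    return pd.Series({x: int((lengths >= x).sum()) for x in range(1, max_len + 1)})

curves = pd.DataFrame({year: curve_for(year) for year in range(1902, 2013, 5)})  # 23 curves
print(curves.loc[[20, 40], [1992, 2002]])   # e.g. segments >= 20 yr in 1992, >= 40 yr in 2002
```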
Richard Verney
I always appreciate your comments. As I commented on the other thread
‘It is absurd that a global policy is being decided by our governments on the basis that they think we know to a considerable degree of accuracy the global temperature of land and ocean over the last 150 years.
Sometimes those producing important data really need to use the words ‘ very approximately’ and ‘roughly’ and ‘there are numerous caveats’ or even ‘we don’t really know.’
Tonyb
Tony Brown -Climate Reason
In my 2:21 post I mentioned that Richard H and I investigated some data on station length from the raw files BEST listed in an index. This is a plot of stations of greater length than X (for the entire life of the dataset). There are three curves based upon the percent of missing data.
The total number of sites was 35,000,
but only 10,000 of them had less than 10% missing data
and only 3,000 of them had <10% missing data and were more than 38 years long.
I cannot remember the number of times I have written on blogs that “IT IS TOTALLY SCIENTIFICALLY UNACCEPTABLE TO ALTER PAST DATA” unless you have good scientific and mathematical analysis to allow you to do so without any doubt or favour.
STOP IT!!
The problem is one of experimental error. Trying to fix an experimental error after the experiment using a statistical/processing methodology seems superficial. Fixing the flaw in the experimental design and rerunning the experiment is the way to go. It would seem better to remove stations that have moved or have had significant changes in land use. This may not be an option in most parts of the world, but the US record may act as a benchmark for how one should assess global records.
Stephen Richards says:
June 29, 2014 at 2:50 am
“IT IS TOTALLY SCIENTIFICALLY UNACCEPTABLE TO ALTER PAST DATA”
Absolutely right. If there are perceived issues with historical data, then do analysis on the raw data to show things, e.g. that measured rural temperature rises more slowly than airport or city data (if it does).
Do they even still have the historical raw data? As I understand it, lots of stations are no longer included in the analysis, and there are myriad ways of deselecting stations to create any sort of trend you may wish. At least with raw data you are measuring what we experience, and it is what we experience that determines whether “things are getting worse or better”. Signs of any “Thermogeddon” would appear in raw data with more certainty than in treated data.