Problems With The Scalpel Method

Guest Post by Willis Eschenbach

In an insightful post at WUWT, Bob Dedekind discussed a problem with temperature adjustments. He pointed out that stations are maintained, by doing things like periodically cutting back encroaching trees or repainting the Stevenson screen. He noted that if we try to “homogenize” these stations, we get an erroneous result. This led me to a consideration of the “scalpel method” used by the Berkeley Earth folks to correct discontinuities in the temperature record.

The underlying problem is that most temperature records contain discontinuities. There are station moves, instrument changes, routine maintenance, and the like. As a result, the raw data may not reflect the actual temperatures.

There are a variety of ways to deal with that, which are grouped under the rubric of “homogenization”. A temperature dataset is said to be “homogenized” when all effects other than actual temperature effects have been removed from the data.

The method that I’ve recommended in the past is called the “scalpel method”. To see how it works, suppose there is a station move. The scalpel method cuts the data at the time of the move, and simply considers it as two station records, one at the original location, and one at the new location. What’s not to like? Well, here’s what I posted over at that thread. The Berkeley Earth dataset is homogenized by the scalpel method, and both Zeke Hausfather and Steven Mosher have assisted the Berkeley folks in their work. Both of them had commented on Bob’s post, so I asked them the following.

Mosh and/or Zeke, Stephen Rasey above and Bob Dedekind in the head post raise several points that I hadn’t considered. Let me summarize them; they can correct me if I’m wrong.

• In any kind of sawtooth-shaped temperature record subject to periodic or episodic maintenance or change, e.g. painting a Stevenson screen, the most accurate measurements are those immediately following the change. After that, there is a gradual drift in the temperature until the next maintenance.

• Since the Berkeley Earth “scalpel” method slices these into separate records at the discontinuities caused by the maintenance, it throws away the trend-correction information obtained when the episodic maintenance removes the instrumental drift from the record.

• As a result, the scalpel method “bakes in” the gradual drift that occurs in between the corrections.

Now this makes perfect sense to me. You can see what would happen with a thought experiment. If we have a bunch of trendless sawtooth waves of varying frequencies, and we chop them at their respective discontinuities, average their first differences, and cumulatively sum the averages, we will get a strong positive trend despite the fact that there is absolutely no trend in the sawtooth waves themselves.

So I’d like to know if and how the “scalpel” method avoids this problem … because I sure can’t think of a way to avoid it.

In your reply, please consider that I have long thought and written that the scalpel method was the best of a bad lot. All methods have problems, but I thought the scalpel method avoided most of them … so don’t thump me on the head, I’m only the messenger here.

w.

Unfortunately, it seems that they’d stopped reading the post by that point, as I got no answer. So I’m here to ask it again …

My best to both Zeke and Mosh, who I have no intention of putting on the spot. It’s just that as a long time advocate of the scalpel method myself, I’d like to know the answer before I continue to support it.
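The thought experiment is easy to check numerically. Here is a minimal sketch, using synthetic trendless sawtooths and my own crude first-difference recombination; it is an illustration of the worry, not the actual Berkeley Earth code.

```python
import numpy as np

n = 600  # 50 years of monthly data

def sawtooth(n_months, period):
    """Trendless sawtooth: drifts up by 0.5 deg over each maintenance
    cycle, then drops back to zero when maintenance is done."""
    t = np.arange(n_months)
    return 0.5 * (t % period) / period

periods = (60, 84, 120, 180)  # maintenance cycles of varying length
series = [sawtooth(n, p) for p in periods]

# "Scalpel" recombination: cut each record at its discontinuities by
# discarding the first difference that spans the jump, then average
# the surviving first differences and cumulatively sum them.
diffs = []
for s, p in zip(series, periods):
    d = np.diff(s)
    cut = (np.arange(1, n) % p) == 0  # the maintenance reset months
    d[cut] = np.nan                   # scalpel: the jump is thrown away
    diffs.append(d)

mean_diff = np.nanmean(np.vstack(diffs), axis=0)
reconstructed = np.concatenate([[0.0], np.cumsum(mean_diff)])

print("every input sawtooth is trendless (0.0 deg net)")
print(f"reconstructed record drift: {reconstructed[-1]:+.2f} deg")
# The reconstruction drifts strongly positive (about +3 deg here)
# even though every input series has no trend at all.
```

The drift appears because the upward creep between maintenance events is kept while the compensating downward jumps are cut away at the scalpel points.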

Regards to all,

w.

181 Comments
Evan Jones
Editor
June 29, 2014 3:17 pm

Microsite is the new UHI.

June 29, 2014 4:00 pm

evanmjones says: “And he has made me think more deeply beyond the stats to the mechanism in play.”
That is good. That seems to be the main thing that would make the paper a lot more convincing. I have been thinking about this since our conversation, but I am unable to think of a mechanism that could explain the statistical results you found, especially something that would cause artificial trends due to micrositing in the 1990s but not since the US Climate Reference Network was installed in 2004. Puzzling to me.
Could you simply call me Victor Venema? I can’t help it that I had to get a PhD to be allowed to do research. That is the way the system works.
REPLY: You don’t need a PhD to be able to do research and publish papers. I’ve done three now, and I’m working on #4, and as many people like to point out, including yourself, I don’t have a PhD, and according to many I am too stupid to be in the same ranks as you. Yet I do research and publish anyway. If the school of science hadn’t had a foreign language requirement, and I hadn’t had a horrible, unsolvable hearing problem, and the Dean of the School of Science hadn’t been a prick at the time, and the ADA had been in place, that might have been different. TV/radio, where I only had to speak, was my salvation; I had a one-time chance and I took it. But I know that in the eyes of many in your position, that career path makes me some sort of lowbrow victim of phrenology.
The explanation to the problem you pose is based in the physics of heat sinks, but you’ll just have to wait for the paper. Though, ahead of time to help readers understand, I may post an article and/or experiment to show how the issue manifests itself.
Bear in mind I don’t wish to start a dialog with you at the moment, mainly because you called my view of religion into question without actually knowing what my view is, and I find that shameful and just as bad as the things you accuse me of. I’m only pointing out that research is not exclusive to PhD holders. No need to reply.
Also, while I can’t prevent it, even though I hold copyright on my own words, I ask that you not turn my comment into another taunt at your blog. It would be a good gesture if in fact you believe what you write about what I should be doing. – Anthony

June 29, 2014 4:10 pm

@Zeke Hausfather at 12:01 pm
It’s also worth mentioning that Berkeley has a second type of homogenization that would catch spuriously inflated trends, at least if they were isolated. The kriging process downweights stations with trends divergent vis-à-vis surrounding stations when creating the regional temperature field, after all stations have been homogenized.
Posit:
Well-sited Class 1 and 2 stations are the minority.
There are studies that suggest that Class 1/2 stations have lower trends than others.
BEST’s “second type of homogenization” would either:
A) catch spuriously deflated trends and downweight them, in which case the homogenization will have a tendency to downweight well-sited stations, a hypothesis consistent with findings in the Watts et al. 2012 draft paper; or
B) treat inflated trends differently than deflated trends.
Either A or B appears to be problematic.
The problem is that we should be upweighting Class 1 and Class 2 stations compared to Class 3, 4, and 5. There is a case to be made that Class 3, 4, and 5 stations should be downweighted to disappear.
Speaking of downweighting … do you upweight or downweight stations based upon the length of segments? Longer segments deserve greater weight, of course.
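The worry in A) can be illustrated with a toy calculation. The trends below are invented, and the weighting rule is a generic “penalise divergence from the regional field” scheme, not the actual BEST kriging weights.

```python
import numpy as np

# Hypothetical century-scale trends (deg C/century). The minority are
# well-sited Class 1/2 stations with lower trends; the majority are
# poorly sited stations with higher trends. All numbers are invented.
trends = np.array([0.15, 0.20,                    # Class 1/2 (well sited)
                   0.55, 0.60, 0.65, 0.70, 0.58]) # Class 3/4/5

# Generic divergence penalty: weight falls off with distance from the
# network median trend -- a stand-in for any scheme that downweights
# stations diverging from the regional field, NOT the BEST algorithm.
dev = np.abs(trends - np.median(trends))
weights = 1.0 / (1.0 + dev / dev.mean())

# The well-sited minority diverges most from the median, so it gets
# the lowest weights, and the weighted mean is pulled toward the
# majority of poorly sited stations.
print(f"unweighted mean: {trends.mean():.2f}")
print(f"weighted mean:   {np.average(trends, weights=weights):.2f}")
```

With these numbers the weighted mean comes out above the unweighted mean: the divergence penalty has downweighted exactly the stations we would want to trust most.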

JeffC
June 29, 2014 4:17 pm

I’m sorry Will E, but your claim about a change from F to C is a bullsh*t strawman and you know it … 100 meters = x feet … reporting in feet or meters does not change the measurement … it’s called a correction, not an edit …

Bob Dedekind
June 29, 2014 4:21 pm

Zeke:

“What specific stations in the Auckland area show sawtooth-type patterns being incorrectly adjusted to inflate the warming trend? Here is a list of Auckland-area stations: http://berkeleyearth.lbl.gov/station-list/location/36.17S-175.03E”

It’s difficult to work out what’s happening with your data, since it doesn’t make much sense.
For example, if you look at Albert Park in Auckland, the data runs to the present (I presume – the X-axis isn’t graduated particularly well) yet the station closed in 1989.
Where did you get your raw data from? If Albert Park has had another station spliced to its end, which station is that? How was it spliced?

June 29, 2014 4:37 pm

Bob,
You can see alternative names on the right side of the station page: http://berkeleyearth.lbl.gov/stations/157062
Stations tend to get merged if they have overlapping identical temperature measurements under different names.

June 29, 2014 4:43 pm

Bob,
There is a sawtooth-type signal in the difference series for Wellington – Kelburn around 1970-1980. You can see how both the gradual trend bias and the abrupt reversion to the mean are caught: http://berkeleyearth.lbl.gov/stations/18625

Bob Dedekind
June 29, 2014 5:17 pm

Zeke:

“You can see alternative names on the right side of the station page: http://berkeleyearth.lbl.gov/stations/157062”

I’m sorry but I don’t get that. The list is:
ALBERT PARK
AUCKLAND
AUCKLAND AERODROME
AUCKLAND AIRP
AUCKLAND AIRPORT
AUCKLAND CITY
AUCKLAND, ALBERT PAR
As far as I can tell from this, there are two sites, Albert Park and Auckland Airport, but it certainly isn’t clear, because the chart has three red diamonds (Station moves) shown. How do you know when the station move happened? Do you look at metadata? Is the “station move” the same as a splice point?
The elevation is given as 27m. Albert Park is 49m, Auckland Airport is 5m or less. Perhaps it’s the average?

“Stations tend to get merged if they have overlapping identical temperature measurements under different names.”

It is extremely unlikely that Albert Park and Auckland Airport had identical temperatures, simply because there is a well-documented 0.66°C difference between the two. Unless you’re talking about correlations, or anomalies.

June 29, 2014 5:24 pm

Bob Dedekind,
Not sure, I’d have to look into the reasons for the merge in that particular case.

catweazle666
June 29, 2014 5:39 pm

Well, you have to hide the decline somehow, Willis.
That’s what climate science is all about, isn’t it?
What a SNAFU!
It will be years before the scientific profession recovers from the efforts of these shysters.

Bob Dedekind
June 29, 2014 5:46 pm

Zeke:

“There is a sawtooth-type signal in the difference series for Wellington – Kelburn around 1970-1980. You can see how both the gradual trend bias and the abrupt reversion to the mean are caught: http://berkeleyearth.lbl.gov/stations/18625”

Well, sort of. I’m battling to see the gradual trend reduction there, but it may be because only the breakpoint graph is shown. Is there a gradual-change graph somewhere as well?
The problem at Kelburn is the sheltering from the surrounding Botanical Gardens, which grew over the decades. The shelter clearances affected only trees close to the site, not the wider area. Hessell identified this in 1980.
The last close shelter clearance was 1969, apparently, so I’m not sure what caused the 1970-1980 excursion. A building was put up close by in 1968, and the maximum temperature thermometer was replaced in 1969.
If the BEST process reduces the trends, then this is a step in the right direction. I see no such adjustment in the NCDC approach.
However, reducing trends is tricky. Do you check against raw data from other stations regionally, or adjusted data?

June 29, 2014 5:55 pm

Bob Dedekind,
I believe that the difference series in question are calculated prior to adjustments by comparing each station to the raw station records of surrounding stations.
Also, NCDC’s method should be able to pick out similar sawtooth patterns; see the M4 model in Menne and Williams 2009: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/papers/menne-williams2009.pdf
As I mentioned earlier, this could all be tested better using synthetic data, something that is planned as part of the new International Surface Temperature Initiative: http://www.geosci-instrum-method-data-syst-discuss.net/4/235/2014/gid-4-235-2014.html

Bob Dedekind
June 29, 2014 6:08 pm

Zeke:

“Also, NCDC’s method should be able to pick out similar sawtooth patterns; see the M4 model in Menne and Williams 2009: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/papers/menne-williams2009.pdf”

You’re right, it should be able to pick out these patterns, but doesn’t.
I have looked carefully through all the NCDC New Zealand stations adjustments. Not one shows any gradual trend reduction adjustment at all. If you can find one please point it out.
Auckland is an excellent test case. It’s a long-running site (since 1853) that contains well-documented gradual UHI/shelter problems (quantified by both NIWA and the NZCSC). It also has a splice to Mangere with a 0.6°C difference – in other words a perfect Hansen-type situation.
If an algorithm gets Auckland right, it will most likely work everywhere, at least for saw-tooth analysis. But getting Auckland wrong proves the algorithm needs work.

June 29, 2014 6:21 pm

@Zeke Hausfather at 12:01 pm
“Here are a few examples of sawtooth and gradual trend inhomogeneities that seem to be correctly adjusted:”
Like Willis, I ask: what is the evidence that ANY of the breaks is a correct adjustment? Much less ALL of them?
Notes:
#Moves, #Other Breaks, (3 longest segments since 1960 inclusive) in years, Difference from Regional
http://berkeleyearth.lbl.gov/stations/169993
SAVANNAH/MUNICIPAL, GA.
2 moves, 8 other breaks, (18, 17, 10) year, -0.5 deg C
http://berkeleyearth.lbl.gov/stations/30748
JONESBORO 2 NE (?Arkansas?)
6 moves (all since 1974), 14 Others, (16, 11, 8) years, -2.0 deg C
http://berkeleyearth.lbl.gov/stations/156164
TOKYO
2 moves (1 in 2006), 5 Others, (40, 15, 8), +1.9 deg C
http://berkeleyearth.lbl.gov/stations/161705
LAS VEGAS MCCARRAN INTL AP (1936-current)
2 moves (1996, 2008), 7 Others, ( 34, 13, 6 ) years, +2.5 deg C over regional
http://berkeleyearth.lbl.gov/stations/33493
FOLSOM DAM, (near San Fran, CA) (1893 to 1993)
?1 move 1957, 11 Others, (18, 15, NA) years, -1.0 deg C
http://berkeleyearth.lbl.gov/stations/34034
COLFAX (near Sacramento, CA) 1891-current
7 moves (6 since 1972), 6 Other breaks, (18, 16, 5) years, -0.1 deg C
This one bears a revisit.
It is a flat regional trend difference with a few years of -1.0.
The raw anomaly looks dead flat, yet BEST says it is 0.43 deg/century.
After breakpoints are applied it is 0.71 deg/century; regional is 0.79 deg/century.

Bob Dedekind
June 29, 2014 6:34 pm

Looking carefully at the BEST chart for Auckland, I’d guess that it merged Albert Park with Auckland Aero in 1962 (when Aero opened) and then joined Aero to Aero AWS in 2010. All well and good.
But what happened around 1930? Perhaps Riverhead Forest (opened 1928) was spliced in between Albert Park and Aero, but it isn’t on the list.
A mystery.

Bob Dedekind
June 29, 2014 6:46 pm

More worrying is the lack in BEST of the six stations specifically identified by Hessell (1980) as good rural New Zealand sites “not known to be significantly affected in any of these ways [sheltering/urbanisation/screen changes]”.
These sites are:
-Te Aroha
-Appleby
-Waihopai
-Lake Coleridge
-Fairlie
-Ophir
Why were these good sites excluded, when poor sites like Albert Park were included?
Should we be worried that Te Aroha’s trend is 0.23°C/century, Appleby’s is 0.52°C/century and Fairlie’s is 0.45°C/century (I haven’t calculated the others yet)?
All these are somewhat less than the average.

Roger Dewhurst
June 29, 2014 7:08 pm

A partial solution would be to run two identical stations very close to each other and offset the maintenance by, say, two years. The two sets of data can then be plotted together, and any offset between them can be attributed to the maintenance.
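This paired-station idea can be sketched numerically. The drift rate and maintenance schedule below are hypothetical: two identical stations drift at the same rate, their maintenance resets are offset by half a cycle, and the difference series turns out to be a square wave whose amplitude recovers the drift.

```python
import numpy as np

months = np.arange(240)  # 20 years of monthly data
r = 0.01                 # hypothetical spurious drift, deg/month
period = 48              # maintenance every four years

# Two identical stations; station B's maintenance is offset by two years.
station_a = r * (months % period)
station_b = r * ((months + period // 2) % period)

# The difference series is a square wave; its peak-to-peak amplitude is
# the full drift accumulated over one maintenance cycle.
diff = station_a - station_b
drift_per_month = (diff.max() - diff.min()) / period

print(f"true drift:      {r:.4f} deg/month")
print(f"recovered drift: {drift_per_month:.4f} deg/month")
```

Because the two records share the real climate signal, differencing them cancels it, leaving only the maintenance-related drift to be measured and removed.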

thingadonta
June 29, 2014 7:14 pm

In my experience there is no statistical solution to this conundrum. There are only fudge factors.
Problems with combining different datasets which have actual sampling-methodology differences cannot be resolved unless you go back and re-sample and re-submit for analysis. If the difference exists in the processing of the sample only, then you need to re-submit the sample, unless there is degradation of the sample over time. (Every case is different.) And since weather/climate data is time-dependent, unless you have a time machine, in my humble opinion you can’t fix this problem, because you can’t re-sample or re-submit for re-analysis.
What is important is that the data is archived and that the sampling and methodology are stated clearly. What you can’t do is throw out the original raw data, or combine different datasets without noting the inherent limitations. If you could, the laws of physics would have to be changed. Sorry, it can’t be fixed.

Konrad
June 29, 2014 7:15 pm

Problems with the scalpel method?
Well, there is the number one problem. It’s still using surface station data.
This data is not fit for the purpose BEST are trying to claim for it. No amount of extra time in the blender is going to unscramble the egg.
Repeated attempts to use surface station data to “identify” climate changes in fractions of a degree speaks to motive.

Bill Illis
June 29, 2014 7:18 pm

Just start with a histogram of the breakpoint impacts over time. The next step is to figure out why.
We are just arguing about nebulous suppositions, but no one is starting at the first point: what the data actually shows.

June 29, 2014 8:37 pm

Sorry Zeke, but that’s a crock.
“Here are a few examples of sawtooth and gradual trend inhomogeneities that seem to be correctly adjusted: http://berkeleyearth.lbl.gov/stations/169993”
I just looked at the first one in the list, and the breakpoint algorithm found lots of issues in modern times and not a single issue from 1870 through 1930, when we were using primitive measurements which I’d fully expect to vary wildly with the measurement devices themselves. We were riding around on horses, for crying out loud.

NikFromNYC
June 29, 2014 9:12 pm

What happens to the overall trend if you halve the threshold of your automatic knife?
Is it highly sensitive to that particular knob? Or is it robust to less or more frantic chopping?
Why does it chop regularly instead of rarely?
Are there multiple arcane parameters involved besides a simple threshold value?
Is there a sudden last decade reason why your system busts out into the climate model stratosphere?
What on Berkeley Earth are you *really* doing?
How many knobs are there and what are their ranges of adjustment?
Is this just another pretty alarmist merry-go-round?
But if we’re all going to die why won’t you tell us why?

June 29, 2014 9:22 pm

Victor Venema writes: “If the mean of this difference is not constant, there is something happening at one station, that does not happen at the other.”
That makes the assumption that one station is better than another. The reality is almost certainly that there are some issues at all of the stations over the years, and TOBS is certainly one that comes to mind.
It also makes the assumption that there can be no legitimate regional trends over the years, which also seems wrong; altered large-scale irrigation comes to mind for that.

June 29, 2014 9:23 pm

More on my 6:21 pm reply to Zeke Hausfather at 12:01 pm
Let’s revisit that TOKYO case:
http://berkeleyearth.lbl.gov/stations/156164
2 moves (1 in 2006), 5 Others, (40, 15, 8), +1.9 deg C
As BEST stations go, this one is in fewer pieces, only eight, from 1876 through 2013.
The raw temperature record shows a +4.0 deg rise; BEST says 2.59 deg C/century.
The difference from regional shows about a +1.9 deg rise.
So the regional profile that the scalpel takes its orders from must show about 2.1 deg C of warming, and the table says 0.93 ± 0.10 deg C/century.
Here is the deal. That raw temperature rise for Tokyo has a large UHI component. We know from studies of Cherry Tree Festival records (http://wattsupwiththat.com/2014/04/26/picking-cherry-blossoms/#comment-1622330) that the cities’ urban centers have warmed significantly enough to accelerate the cherry blossoms as much as a week ahead of the countryside.
Ok, so BEST measures a spurious increase in trend against the region and adjusts Tokyo down to the regional trend. Oh, happy days, we’ve eliminated the UHI from the record. Rejoice! … Except we know from the cherry blossom records that all cities are experiencing acceleration in blooming. The regional record has a significant UHI component that BEST has just baked into the official adjusted “clean climate” record. And it will keep baking it in to every other city until the UHI is fully homogenized across all the stations.
While we are on the subject of the TOKYO station record and its relatively few breakpoints … it doesn’t have a breakpoint I expected. March 1945 should have generated one heckofa breakpoint and a probable station move. BEST doesn’t show one. BEST can tease 20 station moves and breakpoints out of the data for Luling, TX, yet somehow feels no breakpoint is warranted on a day a quarter million people died in a city-wide firestorm.
I’m not a supporter of the BEST process. Never was. Never will be — I’ve seen enough.