BEST practices step uncertainty levels in their climate data

Note the step change. At about 1960, the uncertainty levels plummet, meaning BEST is claiming we became more than twice as certain of our temperature estimates practically overnight.

Brandon Shollenberger writes in with this little gem:

I thought you might be interested in a couple posts I wrote discussing some odd problems with the BEST temperature record.  You can find them here:

http://www.hi-izuru.org/wp_blog/2015/01/how-best-overestimates-its-certainty-part-2/

http://www.hi-izuru.org/wp_blog/2015/01/how-best-overestimates-its-certainty-part-1/

But I’ll give an overview.  BEST calculated its uncertainty levels by removing 1/8th of its data and rerunning its averaging calculations (then examining the variance in the results).  I’ve highlighted two problems I haven’t seen people discuss before.  If you’re familiar with the Marcott et al reconstruction’s inappropriate confidence intervals, some of this may sound familiar.
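For readers unfamiliar with the approach, here is a minimal sketch of that delete-a-group resampling idea, using synthetic data and a plain unweighted mean as a stand-in for BEST's actual averaging code (an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 80 stations x 120 months of synthetic anomalies (illustration only)
data = rng.normal(loc=0.0, scale=1.0, size=(80, 120))

def global_average(stations):
    """Stand-in for BEST's averaging step: a plain unweighted mean over stations."""
    return stations.mean(axis=0)

# Delete-a-group jackknife: drop 1/8 of the stations, re-average, repeat for each group
groups = np.array_split(np.arange(data.shape[0]), 8)
replicates = []
for g in groups:
    keep = np.setdiff1d(np.arange(data.shape[0]), g)
    replicates.append(global_average(data[keep]))
replicates = np.array(replicates)          # shape (8, 120)

# The spread among the eight replicate series is what feeds the uncertainty estimate
spread = replicates.std(axis=0, ddof=1)
print(spread.mean())
```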

First, BEST only reruns its averaging calculations to determine its uncertainty.  It does not rerun its breakpoint calculations.  As you may know, BEST breaks data from temperature stations into segments when it finds what it believes to be a “breakpoint.”  The primary way it looks for these breakpoints is by comparing stations to other stations located nearby.  If a station seems too different from its neighbors, it will be broken into segments which can then be realigned.  This is a form of homogenization, a process whereby stations in the dataset are made to be more similar to one another.

This process is not repeated when BEST does its uncertainty calculations.  The full data set is homogenized, and subsets of that homogenized data set are compared to determine how much variance there is.  This is inappropriate.  The amount of variance BEST finds within a homogenized data set does not tell us how much variance there is in BEST’s data.  It only tells us how much variance there is once BEST is finished homogenizing the data.
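The effect is easy to demonstrate with a toy model. The "homogenizer" below is nothing like BEST's breakpoint algorithm (it just nudges every station toward the network mean), but it shows the general pattern: homogenizing the full network once and then resampling yields a visibly smaller spread than redoing the homogenization inside each subsample:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sta, n_t = 80, 120
data = np.linspace(0, 1, n_t) + rng.normal(0, 0.8, size=(n_sta, n_t))

def homogenize(stations):
    """Crude stand-in for homogenization: nudge each station halfway toward
    the network mean computed from the stations it is given."""
    return 0.5 * stations + 0.5 * stations.mean(axis=0)

def jackknife_spread(stations, refit):
    """Average spread among 8 delete-a-group replicates of the network mean."""
    idx = np.arange(stations.shape[0])
    base = stations if refit else homogenize(stations)   # homogenize once up front
    reps = []
    for g in np.array_split(idx, 8):
        sub = base[np.setdiff1d(idx, g)]
        if refit:
            sub = homogenize(sub)                         # redo it inside each subsample
        reps.append(sub.mean(axis=0))
    return np.array(reps).std(axis=0, ddof=1).mean()

print("homogenize once, then resample  :", jackknife_spread(data, refit=False))
print("re-homogenize in each subsample :", jackknife_spread(data, refit=True))
```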

Second, to determine how much variance there is in its (homogenized) data set, BEST reruns its calculations with 1/8th the data removed, eight times.  This produces eight different series.  When comparing these different series, BEST realigns them so they all share the same baseline.  The baseline period BEST uses for its alignment is 1960-2010.

This is a problem.  By aligning the eight series on the 1960-2010 period, BEST artificially deflates the variance between those series in the 1960-2010 period (and artificially inflates the variance elsewhere).  That makes it appear there is more certainty in the recent portion of the BEST record than there actually is.  The result is there is an artificial step change in BEST uncertainty levels at ~1960.  This is the same problem demonstrated for the Marcott et al temperature record (see here).
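A small simulation makes the mechanism visible. The random-walk noise below is only a stand-in for the slowly varying differences between the eight replicates (an assumption made for illustration), but any noise with a low-frequency component behaves the same way: forcing the replicates to agree on average over 1960-2010 squeezes their spread inside that window and pushes it into the earlier part of the record.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1850, 2011)
in_baseline = (years >= 1960) & (years <= 2010)

# Eight toy "replicate" series: a shared trend plus slowly wandering differences
# (a random walk stands in for low-frequency disagreement between replicates)
trend = 0.005 * (years - years[0])
wander = np.cumsum(rng.normal(0, 0.02, size=(8, years.size)), axis=1)
series = trend + wander

# Align each replicate so its 1960-2010 mean is zero, as described above
aligned = series - series[:, in_baseline].mean(axis=1, keepdims=True)

spread = aligned.std(axis=0, ddof=1)
print("mean spread inside 1960-2010 :", round(spread[in_baseline].mean(), 3))
print("mean spread outside 1960-2010:", round(spread[~in_baseline].mean(), 3))
```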

All told, BEST’s uncertainty levels are a complete mess.  They are impossible to interpret in any meaningful way, and they certainly cannot be used to try to determine which years may or may not have been the hottest.

AntonyIndia
January 29, 2015 9:37 pm

“Prior to 1960 there is no data at all in one of the world’s continents, Antarctica “. What about this:
Ice cores record significant 1940s Antarctic warmth related to tropical climate variability David P. Schneider* † and Eric J. Steig ‡ http://www.environmentportal.in/files/PNAS-Aug08-CC-IMP-Ice-Antarct-jour_0.pdf

AntonyIndia
Reply to  AntonyIndia
January 29, 2015 9:47 pm

Please note how most of these Antarctic ice core data were supplied through personal communications to the authors. Climate “science” at work. No wonder others can’t find anything.

Reply to  AntonyIndia
January 29, 2015 9:53 pm

Berkeley Earth is a compilation of surface temperature records via thermometers, not inferred temperatures from ice cores or other proxies.

richardscourtney
January 29, 2015 11:39 pm

Brandon Shollenberger
Thank you for your fine analysis and the informative discussion it has engendered.
There is an underlying issue which I think to be very important, and Zeke Hausfather explicitly states it when he writes

Berkeley Earth is a compilation of surface temperature records via thermometers, not inferred temperatures from ice cores or other proxies.

Yes, there cannot be an “average temperature” of anything because temperature is an intensive property and, therefore, cannot provide a valid average.
Every determination of ‘average global temperature’ or ‘mean global temperature’ or etc. from GISS, HadCRU, UAH, etc. is merely “a compilation of surface temperature records” and is a function of the chosen compilation method.
Hence,
1. Each of the global temperature time series has no known physical meaning.
2. Each of the global temperature time series is a function of a unique compilation method.
3. Each of the global temperature time series frequently alters its compilation method.
So, whatever ‘global temperature’ is, it is not a scientific indication of any stated physical parameter.
I assume you have seen Appendix B of this which considers these matters. It concludes

MGT time series are often used to address the question,
“Is the average temperature of the Earth’s surface increasing or decreasing, and at what rate?”
If MGT is considered to be a physical parameter that is measured then these data sets cannot give a valid answer to this question, because they contain errors of unknown magnitude that are generated by the imperfect compensation models.

and

To treat the MGT as an indicative statistic has serious implications. The different teams each provide a data set termed mean global temperature, MGT. But if the teams are each monitoring different climate effects then each should provide a unique title for their data set that is indicative of what is being monitored. Also, each team should state explicitly what its data set of MGT purports to be monitoring. The data sets of MGT cannot address the question “Is the average temperature of the Earth’s surface increasing or decreasing, and at what rate?” until the climate effects they are monitoring are explicitly stated and understood. Finally, the application of any of these data sets in attribution studies needs to be revised in the light of knowledge of what each data set is monitoring.

Richard

richard verney
Reply to  richardscourtney
January 30, 2015 1:43 am

“Yes, there cannot be an “average temperature” of anything because temperature is an intensive property and, therefore, cannot provide a valid average.”
/////////////////////
Quite so.
The land-based thermometer record provides no meaningful insight into anything and should have been ditched a long time ago.
Given the small area of land and the low thermal capacity of the atmosphere, the only data relevant to global warming is OHC.
The problem is that (presently) there is no worthwhile data on OHC. Nothing pre-ARGO is robust, and ARGO is of insufficient duration, lacks spatial coverage, and no attempt has been made to assess what, if any, bias is inherent in the system (caused both by the free-floating nature of the buoys, which ride currents that possess a distinct temperature profile differing from adjacent waters, and by the lack of spatial coverage itself).

DD More
Reply to  richard verney
January 30, 2015 12:08 pm

I would like to point out that temperature does not always equate to heat. A quick quiz on temperatures: knowing that solar intensity is never greater the further north you go above the Tropic of Cancer, which of the following states (Alabama, Florida, Minnesota, Montana, North Dakota and South Dakota) hold the two highest extreme high temperatures on record, and which hold the two lowest?
Did you get Alabama & Florida and North & South Dakota? For a bonus, did you get that Alabama & Florida were the lowest and the Dakotas were the highest?
The difference is in humidity. Dry air gets to a higher temperature than wet air with the same energy.
Alabama 112 F Sept. 5, 1925 Centerville
Florida 109 F June 29, 1931 Monticello
Minnesota 115 F July 29, 1917 Beardsley
Montana 117 F July 5, 1937 Medicine Lake
North Dakota 121 F July 6, 1936 Steele
South Dakota 120 F July 5, 1936 Gann Valley
http://ggweather.com/climate/extremes_us.htm
R.V., your point is made.
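A rough moist-enthalpy calculation backs this up (illustrative constants and humidities, not values taken from the station records above):

```python
# Back-of-envelope moist enthalpy (energy content) comparison, h ~ cp*T + Lv*q,
# measured relative to dry air at 0 C. Parcel values are illustrative only.
cp = 1005.0       # J/(kg K), specific heat of dry air at constant pressure
Lv = 2.5e6        # J/kg, latent heat of vaporization of water

def moist_enthalpy(t_celsius, q):
    """q is specific humidity in kg of water vapour per kg of air."""
    return cp * t_celsius + Lv * q

h_dry = moist_enthalpy(45.0, 0.004)     # hot, dry northern-plains-style air
h_wet = moist_enthalpy(35.0, 0.018)     # cooler, humid Gulf-coast-style air

print(f"45 C dry   air: {h_dry / 1000:.1f} kJ/kg")
print(f"35 C humid air: {h_wet / 1000:.1f} kJ/kg")   # more energy despite being 10 C cooler
```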

Reply to  richardscourtney
January 30, 2015 10:13 pm

Probably why ISO doesn’t define MGT.
http://www.iso.org/iso/home/standards.htm

Phil
January 30, 2015 1:30 am

I read the Booker article and took a look at the reference here. The article looks at three stations in Paraguay: Puerto Casados, Mariscal Estigarribia and San Juan Bautista Misiones.
Mariscal Estigarribia and San Juan Bautista Misiones appear to be airports. The weather station of Mariscal appears to be located at -22.045186, -60.627443 and that of San Juan may be at -26.636024, -57.103306 (in Google Maps). I found a list of weather stations for Paraguay here. I found another list of weather stations for Paraguay (code: PY) as part of a new dataset at the Met Office here. The new dataset is described in HadISD: A quality controlled global synoptic report database for selected variables at long-term stations from 1973-2011 (Dunn et al. 2012). Actually the database now extends to 2013 and individual station files can be found here. The code for Mariscal is 860680 and that for San Juan is 862600.
I used 7-Zip to decompress the downloaded files (ext. .qz). The resulting files are netCDF files which can be read with various utilities. I used Panoply for Windows 7. After decompressing, it runs directly from the unzipped folder (click on Panoply.exe) under Java Run Time Environment. If you don’t have it, search for Java SE Run Time Environment (not the browser plugin) and install it first. I used Java SE 7u76.
Although the Met Office website does not mention it, the netCDF files contain both the raw temperatures and change points calculated (if I understood it correctly) according to Menne and Williams 2009. Dunn et al 2012 does not even mention change points, so finding the documentation was difficult. I believe Pairwise homogeneity assessment of HadISD (Dunn et al. 2014) describes what they did.
Interestingly, they have apparently NOT done any adjustments to the data. Both Mariscal and San Juan appear to be airports, so there is quasi-hourly data in the Met Office files. The files have been “quality controlled,” but apparently not change point adjusted, if I understood it correctly.
With Panoply, it is possible to quickly plot graphs of the raw temps and the change points (with and without interpolation). You can export the temp data from Panoply as “labeled text” and then import it into a spreadsheet. The time is in hours since midnight on 1 Jan 1973, so extracting the time and date is a bit of a bother. I used the @slope function in Quattro Pro X5 to calculate the slope of a simple linear regression on the data without adjustments. I did it with and without simple linear interpolation for missing data. Mariscal had a lot of gaps, but San Juan had few. Hopefully, I didn’t goof up too badly.
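For anyone who prefers a script to the spreadsheet route, a minimal sketch of the same "no interpolation" slope calculation is below; the filename and netCDF variable names are guesses on my part and need to be checked against the actual file headers (e.g. with ncdump -h or Panoply) before use:

```python
import numpy as np
from netCDF4 import Dataset   # pip install netCDF4

# File and variable names here are assumptions; inspect the real HadISD file
# first and substitute whatever it actually uses.
nc = Dataset("862600_hadisd_station.nc")                  # e.g. San Juan Bautista
time = np.asarray(nc.variables["time"][:], dtype=float)   # hours since 1973-01-01
temp = nc.variables["temperatures"][:]                    # deg C, masked where missing

# Least-squares slope over the valid points only (the "no interpolation" case),
# equivalent to exporting from Panoply and running @slope in a spreadsheet.
ok = ~np.ma.getmaskarray(temp)
slope_per_hour = np.polyfit(time[ok], np.asarray(temp[ok]), 1)[0]
print(slope_per_hour, "deg C per hour")
print(slope_per_hour * 24 * 365.25, "deg C per year")
```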
The slope for San Juan was:
-0.000008465716398 (no interpolation) and
-0.000009930963906 (interpolating).
The slope for Mariscal was:
-0.000008738121812 (no interpolation) and
-0.000015694874005 (interpolating).
Compare that with the GISS slopes shown by Paul Homewood.
Plots for San Juan:
Change Points Image:
http://i60.tinypic.com/vncpp4.png
Temps image:
http://i62.tinypic.com/2dshh1x.png
Plots for Mariscal:
Change Points Image:
http://i58.tinypic.com/iny05l.png
Temps Image:
http://i60.tinypic.com/2ni8vpf.png
I am having a hard time understanding the validity of the change points algorithm. Both San Juan and Mariscal appear to show negligible climate change over about 40 years, beginning with cooling in the 1970s, even though they are airports. Being airports, there is quasi-hourly data and some reliance on good maintenance and calibration of the instruments, since the data was needed for aviation. Panoply and this new data set may permit a visual comparison of the change point algorithm with raw data. It would be interesting to see what “adjustments” the change point algorithm would make, but unfortunately that does not yet seem to be available.

Sven
January 30, 2015 1:39 am

Quite typical of any climate debate. Zeke (and then Mosher) gave a reason for the step change (and prior to that just arrogantly, without any real argument, stated plainly that Brandon is wrong) that Brandon quite convincingly refutes (no data prior to 1960. Or was it 1955? Oh wait, maybe 1950? Or maybe it just wasn’t used by BEST?) and then they just disappear. In a real and honest debate where the aim would be to get better knowledge, mistakes would be acknowledged (as, conditionally (“if it turns out it was a mistake”), Brandon did) and taken into account for moving forward…

Reply to  Sven
January 30, 2015 1:57 am

Sven, it is remarkable what responses I got from them. Three different BEST members have publicly stated data doesn’t exist when it takes no more than two minutes to find that data. I was hoping we’d be able to move past that as this topic has resulted in progress. BEST has now acknowledged:

The comments are correct though that the use of a baseline (any baseline) may artificially reduce the size of the variance over the baseline period and increase the variance elsewhere. In our estimation, this effect represents about a +/- 10% perturbation to the apparent statistical uncertainties on the global land average.

Something it has never done before even though, according to BEST, they’ve known about this issue all along. They may have known about this issue all this time, but I can’t find any indication they’ve ever told anyone about it. I think that’s incredible, but I also think it’s good that now they have. Now we know more about how to interpret the things BEST publishes. For instance, BEST published a report discussing whether or not 2014 is the hottest year on record which presented their results with uncertainty levels they say are accurate to the thousandth of a degree. We now know that’s not true, and BEST has known that’s not true all along.
You have to wonder how many other issues there are with BEST’s methodology they know about but simply don’t disclose. I can think of at least two more. And yes, I know I could try to figure out what effects all the issues I know about have. The problem is I’d have to buy a new computer and let it run code for weeks to do so. I have no intention of spending $400+ just because the people at BEST have decided not to be open with and transparent about their work like they claim to be.

In a real and honest debate where the aim would be to get better knowledge, mistakes would be acknowledged (as, conditionally (“if it turns out it was a mistake”), Brandon did) and taken into account for moving forward…

Yes, but in a real and honest debate, I wouldn’t have to play dumb to trick BEST members into publicly admitting problems they’ve known about for years.

rooter
Reply to  Sven
January 30, 2015 1:58 am

Typical, yes. Schollenberger was wrong, but he cannot admit that, so he just repeats being wrong.
No reason for Zeke and Mosher to continue from there.

richardscourtney
Reply to  rooter
January 30, 2015 2:58 am

rooter
As always, you display failure to understand what you are talking about.
Schollenberger is right and you cite nothing to support your assertion that he “was wrong”.
Richard

basicstats
January 30, 2015 2:34 am

Congratulations are in order to anyone who fully understands BEST’s methods. Their methods paper is not exactly transparent.
It does seem relevant however to stress BEST is not a simple interpolation. As far as I can make out, it is a nonlinear (the kriging part) regression fit of a simple ‘climate’ model. The latter consists of a stationary ‘climate’ term and a global temperature term. Local ‘weather’ is then added to this fitted model by adding the appropriate part of the function being minimized to get the regression fit. If this is right, the (squared) ‘weather field’ is actually being minimized in the BEST fit?
Whatever, the point is that there is no reason to think it will correspond closely to observed temperatures at any particular location. Studying strange goings on at particular locations (often cited in comments) is probably not that relevant to whether BEST works or not. Maybe, BEST needs to explain more clearly its somewhat abstract version of temperature. Although I can see that might cause problems too.
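If that reading is right, the core fit resembles the bare-bones decomposition sketched below (synthetic data, no kriging and no weighting; a caricature of the structure, not BEST's code): the "weather field" is whatever is left over once the station climates and the shared global term have been fitted.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sta, n_t = 40, 240

# Synthetic stations: a fixed per-station "climate" offset, a shared global
# trend, and local "weather" noise.
climate = rng.normal(10.0, 5.0, size=(n_sta, 1))
global_term = 0.002 * np.arange(n_t)
weather = rng.normal(0.0, 1.0, size=(n_sta, n_t))
T = climate + global_term + weather

# Two-way fit of T(x, t) ~ C(x) + G(t): alternate between solving for the
# station terms and the time terms until they settle down.
C = T.mean(axis=1, keepdims=True)
for _ in range(20):
    G = (T - C).mean(axis=0, keepdims=True)
    C = (T - G).mean(axis=1, keepdims=True)

residual = T - C - G                      # this is the fitted "weather field"
print("recovered trend per time step:", np.polyfit(np.arange(n_t), G.ravel(), 1)[0])
print("residual (weather) std:", residual.std())
```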

Reply to  basicstats
January 30, 2015 2:53 am

basicstats, you say:

Whatever, the point is that there is no reason to think it will correspond closely to observed temperatures at any particular location.

This is an interesting issue because it makes BEST’s actions incredibly strange. It’s easy to find BEST representatives explaining their work won’t get local details right, yet at the same time, they want to present their temperature results on an incredibly fine scale. Right now they publish results at a 1º x 1º scale. They’ve talked about plans to do it on a 1/4º x 1/4º scale. If BEST doesn’t want people to interpret their results on a local level, why do they encourage people to look at their results on a local level?
You can pull up results on the BEST website for individual cities. Why would BEST do that if they want people to understand BEST shouldn’t be used on local levels? It’s baffling. One side of their mouth tells people to look at small local details while the other side tells people they don’t have resolution finer than (at least) hundreds of kilometers.

Studying strange goings on at particular locations (often cited in comments) is probably not that relevant to whether BEST works or not. Maybe, BEST needs to explain more clearly its somewhat abstract version of temperature. Although I can see that might cause problems too.

I’m not sure how this came up since I haven’t seen anyone doing this, but I’d agree if we were talking only about small, local areas. We’re not though. We’re not even just talking about entire states. We’re talking about BEST changing cooling trends in areas (at least) half the size of Australia into warming trends. That’s 1/3rd the size of Europe.
I don’t think you can publish your results on a 1º x 1º scale and then expect people to know that the resolution of your results is “continental scale.”

Shub Niggurath
Reply to  basicstats
January 30, 2015 4:29 am

basicstats, Brandon, you both have it exactly right. What records they produce for local stations literally have no meaning. They are entirely synthetic. I’ve said this before, not in the context of BEST, but it applies to them thoroughly. Regional climate must be inferred from weather measurements from a conglomerate of stations in the area, and not the other way around.

Reply to  Shub Niggurath
January 30, 2015 9:56 pm

+10

Sven
January 30, 2015 2:43 am

Brandon, Anthony, I think there should be an update to the post

Reply to  Sven
January 30, 2015 2:54 am

Sven, I agree. I e-mailed Anthony a little while ago, before I responded to you. I expect he’ll add an update because of it. It just may take a little time due to things like sleeping 😛

Joseph Murphy
January 30, 2015 4:39 am

Good thread, must re-read.

Glenn999
January 30, 2015 7:09 am

Why are we interested in global temperatures? Wouldn’t it be more practical to discuss local temps for each state (in the US) and discuss why those temps are moving up and down? For instance, where I live in Florida, the climate is different from both the Keys and the Panhandle. No averaging or homogenization could tell us anything here about the three zones.
Just my 2 cents or maybe it’s a shiny penny.
Thanks in advance to anyone who responds.

A C Osborn
Reply to  Glenn999
January 30, 2015 11:54 am

This is a point that seems to completely escape the likes of Steve Mosher. He stated on Climate Etc., and I quote, that
“you tell me the altitude and latitude of a location and I will tell you the temperature. And I’ll be damn close as 93% of the variance is explained by these two factors.”
When I asked him for the temperature at 51.6N, elevation 90-100ft, within 1 degree, the only response was “The error for a given month is around 1.6C”
Now there was no mention of it being monthly or average data; he just said temperature.
So I have repeated the request for January.
You see, I have already shown that BEST cannot handle island temperatures, so I was intrigued by his claim.
Yesterday I could see a difference in temperature of 2 degrees C, from +2 to +4, across the UK at that latitude/elevation.
But if I then look across the Atlantic at St Anthony, it was at -8 degrees C, and on the other side of Canada it was anywhere from +4 to +10.
So at the same latitude/elevation we have a variation of 18 degrees C.
If you look at what controls the climate in those areas you will understand why.
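For what it is worth, here is a toy illustration (entirely synthetic numbers, not Mosher's actual fit) of how a regression on latitude and elevation can "explain" most of the variance globally while still leaving single-site errors of several degrees:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# Synthetic "stations": temperature falls with latitude and elevation, plus a
# maritime/continental term the two predictors cannot see.
lat = rng.uniform(-60.0, 70.0, n)          # degrees
elev = rng.uniform(0.0, 3000.0, n)         # metres
local = rng.normal(0.0, 4.0, n)            # unexplained local/oceanic effect
T = 28.0 - 0.4 * np.abs(lat) - 0.0065 * elev + local

# Ordinary least squares on [1, |lat|, elevation]
X = np.column_stack([np.ones(n), np.abs(lat), elev])
beta, *_ = np.linalg.lstsq(X, T, rcond=None)
resid = T - X @ beta

print(f"R^2 from latitude + elevation : {1 - resid.var() / T.var():.2f}")
print(f"residual std at a single site : {resid.std():.1f} C")   # still several degrees
```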

mebbe
Reply to  A C Osborn
January 30, 2015 1:38 pm

He meant to say: “Give me the name of the place and I’ll look up the stats.”
Vancouver and Saguenay same latitude and elevation.
January mean temperature Vancouver +4 Saguenay -14
The day you looked reflected the monthly average exactly!

Reply to  Glenn999
January 30, 2015 10:03 pm

Climate is local. From the OED: “Condition (of a region or country) in relation to prevailing atmospheric phenomena, as temperature, dryness or humidity, wind, clearness or dullness of sky, etc., esp. as these affect human, animal, or vegetable life.” My crops respond to local variables, not global.
Current temperature inside my greenhouse at 4:30 pm (summertime) in Southern Tasmania 15.7°C. Temperature Castle Forbes Bay 11°C. Expected temperature for this time of year ~23°C. Estimated “Global Warming” minus 12°C.
Global Warming Fatigue = Lassitude or weariness resulting from repeated claims it’s the “hottest year evah”…

John Bills
January 30, 2015 11:37 am
Reply to  John Bills
January 30, 2015 2:49 pm

John Bills, that’s a nice find. I remember being shocked when a station I looked up had over 20 “empirical breakpoints” in small segments. Now it looks tame in comparison.

John Bills
Reply to  Brandon Shollenberger
January 31, 2015 2:14 am

January 30, 2015 12:34 pm

@basicstats 2:34 am
Trying to understand what they write about is tough, given how poorly they write about it.
But the real challenge is to understand what they don’t write about.
The head post by Brandon rightfully criticizes BEST for uncertainty analysis on only one element of their processing. Uncertainty on the scalpel is not done. But the scalpel is BEST’s blunder.
Denver Stapleton Airport has 10 breakpoints, some only 4 years apart. Luling, TX has 20. As mentioned by Rud above, the single most expensive weather station, at Amundsen-Scott at the South Pole, has been shamefully broken without justification. Yet their TOKYO is missing a breakpoint it must have if there is any validity to their methods:

While we are on the subject of the TOKYO station record and its relatively few breakpoints… It doesn’t have a breakpoint I expected. March 1945 should have generated one heckofa breakpoint and probable station move. BEST doesn’t show one. BEST can tease out of the data 20 station moves and breakpoints for Luling, TX. But BEST somehow feels no breakpoint is warranted on a day when roughly 100,000 people died in a city-wide firestorm.

Bill Illis
January 30, 2015 4:44 pm

Nobody believes the BEST temperature series anymore.
It started out with them saying they were going to give us a true representation of temperature, but then Robert Rohde’s algorithm was designed to take out ALL the cooling breakpoints (even including the 90% false positive ones) and leave in all the warming breakpoints (including the 90% which were true positive warming breakpoints), and what we have been left with is a raw temperature series adjusted up by +1.5C. Even worse than the NCDC.
