Temperature is such a simple finite thing. It is amazing how complex people can make it.
- commenter and friend of WUWT, ossqss at Judith Curry’s blog
Sometimes, you can believe you are entirely right while simultaneously believing that you’ve done due diligence. That’s what confirmation bias is all about. In this case, a whole bunch of people, including me, got a severe case of it.
I’m talking about the claim made by Steve Goddard that 40% of the USHCN data is “fabricated”. which I and few other people thought was clearly wrong.
Dr. Judith Curry and I have been conversing a lot via email over the past two days, and she has written an illuminating essay that explores the issue raised by Goddard and the sociology going on. See her essay:
Steve Goddard aka Tony Heller deserves the credit for the initial finding, Paul Homewood deserves the credit for taking the finding and establishing it in a more comprehensible
way that opened closed eyes, including mine, in this post entitled Massive Temperature Adjustments At Luling, Texas. Along with that is his latest followup, showing the problem isn’t limited to Texas, but also in Kansas. And there’s more about this below.
Goddard early on (June 2) gave me his source code that made his graph, but I
couldn’t get it to compile and run. That’s probably more my fault than his, as I’m not an expert in C++ computer language. Had I been able to, things might have gone differently. Then there was the fact that the problem Goddard noted doesn’t show up in GHCN data and I didn’t see the problem in any of the data we had for our USHCN surface stations analysis.
But, the thing that really put up a wall for me was this moment on June 1st, shortly after getting Goddard’s first email with his finding, which I pointed out in On ‘denying’ Hockey Sticks, USHCN data, and all that – part 1.
Goddard initially claimed 40% of the STATIONS were missing, which I said right away was not possible. It raised my hackles, and prompted my “you need to do better” statement. Then he switched the text in his post from stations to data while I was away for a couple of hours at my daughter’s music recital. When I returned, I noted the change, with no note of the change on his post, and that is what really put up the wall for me. He probably looked at it like he was just fixing a typo, I looked at it like it was sweeping an important distinction under the rug.
Then there was my personal bias over previous episodes where Goddard had made what I considered grievous errors, and refused to admit to them. There was the claim of CO2 freezing out of the air in Antarctica episode, later shown to be impossible by an experiment and the GISStimating 1998 episode, and the comment where when the old data is checked and it is clear Goddard/Heller’s claim doesn’t hold up.
And then just over a month ago there was Goddard’s first hockey stick shape in the USHCN data set, which turned out to be nothing but an artifact.
All of that added up to a big heap of confirmation bias, I was so used to Goddard being wrong, I expected it again, but this time Steve Goddard was right and my confirmation bias prevented me from seeing that there was in fact a real issue in the data and that NCDC has dead stations that are reporting data that isn’t real: mea culpa.
But, that’s the same problem many climate scientists have, they are used to some skeptics being wrong on some issues, so they put up a wall. That is why the careful and exacting analyses we see from Steve McIntyre should be a model for us all. We have to “do better” to make sure that claims we make are credible, documented, phrased in non-inflammatory language, understandable, and most importantly, right.
Otherwise, walls go up, confirmation bias sets in.
Now that the wall is down, NCDC won’t be able to ignore this, even John Nielsen-Gammon, who was critical of Goddard along with me in the Polifact story now says there is a real problem. So does Zeke, and we have all sent or forwarded email to NCDC advising them of it.
I’ve also been on the phone Friday with the assistant director of NCDC and chief scientist (Tom Peterson), and also with the person in charge of USHCN (Matt Menne). Both were quality, professional conversations, and both thanked me for bringing it to their attention. There is lots of email flying back and forth too.
They are taking this seriously, they have to, as final data as currently presented for USHCN is clearly wrong. John Neilsen-Gammon sent me a cursory analysis for Texas USHCN stations, noting he found a number of stations that had “estimated” data in place of actual good data that NCDC has in hand, and appears in the RAW USHCN data file on their FTP site
Sent: Friday, June 27, 2014 9:27 AM
Subject: Re: USHCN station at Luling Texas
I just did a check of all Texas USHCN stations. Thirteen had estimates in place of apparently good data.
410174 Estimated May 2008 thru June 2009
410498 Estimated since Oct 2011
410639 Estimated since July 2012 (exc Feb-Mar 2012, Nov 2012, Mar 2013, and May 2013)
410902 Estimated since Aug 2013
411048 Estimated July 2012 thru Feb 2014
412906 Estimated since Jan 2013
413240 Estimated since March 2013
413280 Estimated since Oct 2012
415018 Estimated since April 2010, defunct since Dec 2012
415429 Estimated since May 2013
416276 Estimated since Nov 2012
417945 Estimated since May 2013
418201Estimated since April 2013 (exc Dec 2013).
What is going on is that the USHCN code is that while the RAW data file has the actual measurements, for some reason the final data they publish doesn’t get the memo that good data is actually present for these stations, so it “infills” it with estimated data using data from surrounding stations. It’s a bug, a big one. And as Zeke did a cursory analysis Thursday night, he discovered it was systemic to the entire record, and up to 10% of stations have “estimated” data spanning over a century:
And here is the real kicker, “Zombie weather stations” exist in the USHCN final data set that are still generating data, even though they have been closed.
Remember Marysville, CA, the poster child for bad station siting? It was the station that gave me my “light bulb moment” on the issue of station siting. Here is a photo I took in May 2007:
It was closed just a couple of months after I introduced it to the world as the prime example of “How not to measure temperature”. The MMTS sensor was in a parking lot, with hot air from a/c units from the nearby electronics sheds for the cell phone tower:
Guess what? Like Luling, TX, which is still open, but getting estimated data in place of the actual data in the final USHCN data file, even though it was marked closed in 2007 by NOAA’s own metadata, Marysville is still producing estimated monthly data, marked with an “E” flag:
USH00045385 2006 1034E 1156h 1036g 1501h 2166i 2601E 2905E 2494E 2314E 1741E 1298E 848i 0 USH00045385 2007 797c 1151E 1575i 1701E 2159E 2418E 2628E 2620E 2197E 1711E 1408E 846E 0 USH00045385 2008 836E 1064E 1386E 1610E 2146E 2508E 2686E 2658E 2383E 1906E 1427E 750E 0 USH00045385 2009 969E 1092E 1316E 1641E 2238E 2354E 2685E 2583E 2519E 1739E 1272E 809E 0 USH00045385 2010 951E 1190E 1302E 1379E 1746E 2401E 2617E 2427E 2340E 1904E 1255E 1073E 0 USH00045385 2011 831E 991E 1228E 1565E 1792E 2223E 2558E 2536E 2511E 1853E 1161E 867E 0 USH00045385 2012 978E 1161E 1229E 1646E 2147E 2387E 2597E 2660E 2454E 1931E 1383E 928E 0 USH00045385 2013 820E 1062E 1494E 1864E 2199E 2480E 2759E 2568E 2286E 1807E 1396E 844E 0 USH00045385 2014 1188E 1247E 1553E 1777E 2245E 2526E -9999 -9999 -9999 -9999 -9999 -9999
In the USHCN V2.5 folder, the readme file describes the “E” flag as:
E = a monthly value could not be computed from daily data. The value is estimated using values from surrounding stations
There are quite a few “zombie weather stations” in the USHCN final dataset, possibly up to 25% out of the 1218 that is the total number of stations. In my conversations with NCDC on Friday, I’m told these were kept in and “reporting” as a policy decision to provide a “continuity” of data for scientific purposes. While there “might” be some justification for that sort of thinking, few people know about it there’s no disclaimer or caveat in the USHCN FTP folder at NCDC or in the readme file that describes this, they “hint” at it saying:
The composition of the network remains unchanged at 1218 stations
But that really isn’t true, as some USHCN stations out of the 1218 have been closed and are no longer reporting real data, but instead are reporting estimated data.
NCDC really should make this clear, and while it “might” be OK to produce a datafile that has estimated data in it, not everyone is going to understand what that means, and that the stations that have been long dead are producing estimated data. NCDC has failed in notifying the public, and even their colleagues of this. Even the Texas State Climatologist John Nielsen-Gammon didn’t know about these “zombie” stations until I showed him. If he had known, his opinion might have been different on the Goddard issue. When even professional people in your sphere of influence don’t know you are doing dead weather station data infills like this, you can be sure that your primary mission to provide useful data is FUBAR.
NCDC needs to step up and fix this along with other problems that have been identified.
And they are, I expect some sort of a statement, and possibly a correction next week. In the meantime, let’s let them do their work and go through their methodology. It will not be helpful to ANYONE if we start beating up the people at NCDC ahead of such a statement and/or correction.
I will be among the first, if not the first to know what they are doing to fix the issues, and as soon as I know, so will all of you. Patience and restraint is what we need at the moment. I believe they are making a good faith effort, but as you all know the government moves slowly, they have to get policy wonks to review documents and all that. So, we’ll likely hear something early next week.
These lapses in quality control and thinking that infilling estimated data for long dead weather stations is the sort of thing happens when the only people that you interact with are inside your sphere of influence. The “yeah that seems like a good idea” approval mumble probably resonated in that NCDC meeting, but it was a case of groupthink. Imagine The Wall Street Journal providing “estimated” stock values for long dead companies to provide “continuity” of their stock quotes page. Such a thing would boggle the mind and the SEC would have a cow, not to mention readers. Scams would erupt trying to sell stocks for these long dead companies; “It’s real, see its reporting value in the WSJ!”.
It often takes people outside of climate science to point out the problems they don’t see, and skeptics have been doing it for years. Today, we are doing it again.
For absolute clarity, I should point out that the RAW USHCN monthly datafile is NOT being infilled with estimated data, only the FINAL USHCN monthly datafile. But that is the one that many other metrics use, including NASA GISS, and it goes into the mix for things like the NCDC monthly State of the Climate Report.
While we won’t know until all of the data is corrected and new numbers run, this may affect some of the absolute temperature claims made on SOTC reports such as “warmest month ever” and 3rd warmest, etc. The magnitude of such shifts, if any, is unknown at this point. Long term trend will probably not be affected.
It may also affect our comparisons between raw and final adjusted USHCN data we have been doing for our paper, such as this one from our draft paper:
The exception is BEST, which starts with the raw daily data, but they might be getting tripped up into creating some “zombie stations” of their own by the NCDC metadata and resolution improvements to lat/lon. The USHCN station at Luling Texas is listed as having 7 station moves by BEST (note the red diamonds):
But there really has only been two, and the station has been just like this since 1995, when it was converted to MMTS from a Stevenson Screen. Here is our survey image from 2009:
NCDC’s metadata only lists two station moves:
As you can see below, some improvements in lat/lon accuracy can look like a station move:
Thanks to Paul Homewood for the two images and links above. I’m sure Mr. Mosher will let us know if this issue affects BEST or not.
And there is yet another issue: The recent change of something called “climate divisions” to calculate the national and state temperatures.
Certified Consulting Meteorologist and Fellow of the AMS Joe D’Aleo writes in with this:
I had downloaded the Maine annual temperature plot from NCDC Climate at a Glance in 2013 for a talk. There was no statistically significant trend since 1895. Note the spike in 1913 following super blocking from Novarupta in Alaska (similar to the high latitude volcanoes in late 2000s which helped with the blocking and maritime influence that spiked 2010 as snow was gone by March with a steady northeast maritime Atlantic flow). 1913 was close to 46F. and the long term mean just over 41F.
Seemingly in a panic change late this frigid winter to NCDC, big changes occurred. I wanted to update the Maine plot for another talk and got this from NCDC CAAG.
Note that 1913 was cooled nearly 5 degrees F and does not stand out. There is a warming of at least 3 degrees F since 1895 (they list 0.23/decade) and the new mean is close to 40F.
Does anybody know what the REAL temperature of Maine is/was/is supposed to be? I sure as hell don’t. I don’t think NCDC really does either.
Besides moving toward a more accurate temperature record, the best thing about all this hoopla over the USHCN data set is the Polifact story where we have all these experts lined up (including me as the token skeptic) that stated without a doubt that Goddard was wrong and rated the claim “pants of fire”.
They’ll all be eating some crow, as will I, but now that I have Gavin for dinner company, I don’t really mind at all.
When the scientific method is at work, eventually, everybody eats crow. The trick is to be able to eat it and tell people that you are honestly enjoying it, because crow is so popular, it is on the science menu daily.