Temperature is such a simple finite thing. It is amazing how complex people can make it.
– commenter and friend of WUWT, ossqss at Judith Curry’s blog
Sometimes, you can believe you are entirely right while simultaneously believing that you’ve done due diligence. That’s what confirmation bias is all about. In this case, a whole bunch of people, including me, got a severe case of it.
I’m talking about the claim made by Steve Goddard that 40% of the USHCN data is “fabricated”, which I and a few other people thought was clearly wrong.
Dr. Judith Curry and I have been conversing a lot via email over the past two days, and she has written an illuminating essay that explores the issue raised by Goddard and the sociology going on. See her essay:
http://judithcurry.com/2014/06/28/skeptical-of-skeptics-is-steve-goddard-right/
Steve Goddard, aka Tony Heller, deserves the credit for the initial finding. Paul Homewood deserves the credit for taking the finding and establishing it in a more comprehensible way that opened closed eyes, including mine, in this post entitled Massive Temperature Adjustments At Luling, Texas. Along with that is his latest followup, showing the problem isn’t limited to Texas but also appears in Kansas. And there’s more about this below.
Goddard early on (June 2) gave me the source code that made his graph, but I couldn’t get it to compile and run. That’s probably more my fault than his, as I’m not an expert in the C++ language. Had I been able to, things might have gone differently. Then there was the fact that the problem Goddard noted doesn’t show up in GHCN data, and I didn’t see the problem in any of the data we had for our USHCN surface stations analysis.
But, the thing that really put up a wall for me was this moment on June 1st, shortly after getting Goddard’s first email with his finding, which I pointed out in On ‘denying’ Hockey Sticks, USHCN data, and all that – part 1.
Goddard initially claimed 40% of the STATIONS were missing, which I said right away was not possible. It raised my hackles, and prompted my “you need to do better” statement. Then he switched the text in his post from stations to data while I was away for a couple of hours at my daughter’s music recital. When I returned, I noted the change, with no note of the change on his post, and that is what really put up the wall for me. He probably looked at it as just fixing a typo; I looked at it as sweeping an important distinction under the rug.
Then there was my personal bias over previous episodes where Goddard had made what I considered grievous errors and refused to admit to them. There was the claim of CO2 freezing out of the air in Antarctica, later shown to be impossible by an experiment; the GISStimating 1998 episode; and the comment where, when the old data is checked, it is clear Goddard/Heller’s claim doesn’t hold up.
And then just over a month ago there was Goddard’s first hockey stick shape in the USHCN data set, which turned out to be nothing but an artifact.
All of that added up to a big heap of confirmation bias. I was so used to Goddard being wrong that I expected it again, but this time Steve Goddard was right, and my confirmation bias prevented me from seeing that there was in fact a real issue in the data and that NCDC has dead stations that are reporting data that isn’t real: mea culpa.
But that’s the same problem many climate scientists have: they are used to some skeptics being wrong on some issues, so they put up a wall. That is why the careful and exacting analyses we see from Steve McIntyre should be a model for us all. We have to “do better” to make sure that claims we make are credible, documented, phrased in non-inflammatory language, understandable, and most importantly, right.
Otherwise, walls go up, confirmation bias sets in.
Now that the wall is down, NCDC won’t be able to ignore this. Even John Nielsen-Gammon, who was critical of Goddard along with me in the PolitiFact story, now says there is a real problem. So does Zeke, and we have all sent or forwarded email to NCDC advising them of it.
I’ve also been on the phone Friday with the assistant director of NCDC and chief scientist (Tom Peterson), and also with the person in charge of USHCN (Matt Menne). Both were quality, professional conversations, and both thanked me for bringing it to their attention. There is lots of email flying back and forth too.
They are taking this seriously; they have to, as the final data as currently presented for USHCN is clearly wrong. John Nielsen-Gammon sent me a cursory analysis for Texas USHCN stations, noting he found a number of stations that had “estimated” data in place of actual good data that NCDC has in hand and that appears in the RAW USHCN data file on their FTP site:
From: John Nielsen-Gammon
Sent: Friday, June 27, 2014 9:27 AM
To: Anthony
Subject: Re: USHCN station at Luling Texas
Anthony –
I just did a check of all Texas USHCN stations. Thirteen had estimates in place of apparently good data.
410174 Estimated May 2008 thru June 2009
410498 Estimated since Oct 2011
410639 Estimated since July 2012 (exc Feb-Mar 2012, Nov 2012, Mar 2013, and May 2013)
410902 Estimated since Aug 2013
411048 Estimated July 2012 thru Feb 2014
412906 Estimated since Jan 2013
413240 Estimated since March 2013
413280 Estimated since Oct 2012
415018 Estimated since April 2010, defunct since Dec 2012
415429 Estimated since May 2013
416276 Estimated since Nov 2012
417945 Estimated since May 2013
418201 Estimated since April 2013 (exc Dec 2013).
What is going on in the USHCN code is that, while the RAW data file has the actual measurements, for some reason the final data they publish doesn’t get the memo that good data is actually present for these stations, so it “infills” it with estimated data using data from surrounding stations. It’s a bug, a big one. And when Zeke did a cursory analysis Thursday night, he discovered it was systemic to the entire record, and up to 10% of stations have “estimated” data spanning over a century:

And here is the real kicker, “Zombie weather stations” exist in the USHCN final data set that are still generating data, even though they have been closed.
Remember Marysville, CA, the poster child for bad station siting? It was the station that gave me my “light bulb moment” on the issue of station siting. Here is a photo I took in May 2007:
It was closed just a couple of months after I introduced it to the world as the prime example of “How not to measure temperature”. The MMTS sensor was in a parking lot, with hot air from a/c units from the nearby electronics sheds for the cell phone tower:
Guess what? Like Luling, TX, which is still open but getting estimated data in place of the actual data in the final USHCN data file, Marysville is still producing estimated monthly data, marked with an “E” flag, even though it was marked closed in 2007 by NOAA’s own metadata:
USH00045385 2006 1034E 1156h 1036g 1501h 2166i 2601E 2905E 2494E 2314E 1741E 1298E 848i 0
USH00045385 2007 797c 1151E 1575i 1701E 2159E 2418E 2628E 2620E 2197E 1711E 1408E 846E 0
USH00045385 2008 836E 1064E 1386E 1610E 2146E 2508E 2686E 2658E 2383E 1906E 1427E 750E 0
USH00045385 2009 969E 1092E 1316E 1641E 2238E 2354E 2685E 2583E 2519E 1739E 1272E 809E 0
USH00045385 2010 951E 1190E 1302E 1379E 1746E 2401E 2617E 2427E 2340E 1904E 1255E 1073E 0
USH00045385 2011 831E 991E 1228E 1565E 1792E 2223E 2558E 2536E 2511E 1853E 1161E 867E 0
USH00045385 2012 978E 1161E 1229E 1646E 2147E 2387E 2597E 2660E 2454E 1931E 1383E 928E 0
USH00045385 2013 820E 1062E 1494E 1864E 2199E 2480E 2759E 2568E 2286E 1807E 1396E 844E 0
USH00045385 2014 1188E 1247E 1553E 1777E 2245E 2526E -9999 -9999 -9999 -9999 -9999 -9999
Source: USHCN Final : ushcn.tavg.latest.FLs.52i.tar.gz
Compare to USHCN Raw : ushcn.tavg.latest.raw.tar.gz
In the USHCN V2.5 folder, the readme file describes the “E” flag as:
E = a monthly value could not be computed from daily data. The value is estimated using values from surrounding stations
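For readers who want to count the flags themselves, here is a minimal Python sketch. It assumes the whitespace-separated layout of the Marysville records exactly as printed above (station ID, year, then twelve value-plus-flag tokens, with -9999 meaning missing); the files on the FTP site may use a different, fixed-width layout, so treat `parse_record` and `count_flags` as hypothetical helpers for illustration rather than a parser for the real files.

```python
import re
from collections import Counter

MISSING = -9999  # missing-value marker as it appears in the records above

# One token is an integer value optionally followed by a single flag letter,
# e.g. "1151E", "797c" or "-9999".
TOKEN = re.compile(r"^(-?\d+)([A-Za-z]?)$")


def parse_record(line):
    """Split one record (as printed in the post) into station, year, months.

    months is a list of twelve (value, flag) tuples; flag is "" when absent.
    """
    parts = line.split()
    station, year = parts[0], int(parts[1])
    months = []
    for tok in parts[2:14]:  # the twelve monthly tokens; any trailing field is ignored
        m = TOKEN.match(tok)
        if m is None:
            raise ValueError(f"unrecognised token: {tok!r}")
        months.append((int(m.group(1)), m.group(2)))
    return station, year, months


def count_flags(months):
    """Count flag letters over the non-missing months of one record."""
    return Counter(flag for value, flag in months if value != MISSING)


line = ("USH00045385 2007 797c 1151E 1575i 1701E 2159E 2418E "
        "2628E 2620E 2197E 1711E 1408E 846E 0")
station, year, months = parse_record(line)
flags = count_flags(months)
print(station, year, dict(flags))
```

Run on the 2007 record, this counts ten of the twelve months as carrying the “E” flag.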
There are quite a few “zombie weather stations” in the USHCN final dataset, possibly up to 25% of the 1218 total stations. In my conversations with NCDC on Friday, I’m told these were kept in and “reporting” as a policy decision to provide a “continuity” of data for scientific purposes. While there “might” be some justification for that sort of thinking, few people know about it. There’s no disclaimer or caveat in the USHCN FTP folder at NCDC or in the readme file that describes this; they only “hint” at it, saying:
The composition of the network remains unchanged at 1218 stations
But that really isn’t true, as some USHCN stations out of the 1218 have been closed and are no longer reporting real data, but instead are reporting estimated data.
NCDC really should make this clear, and while it “might” be OK to produce a datafile that has estimated data in it, not everyone is going to understand what that means, and that the stations that have been long dead are producing estimated data. NCDC has failed in notifying the public, and even their colleagues of this. Even the Texas State Climatologist John Nielsen-Gammon didn’t know about these “zombie” stations until I showed him. If he had known, his opinion might have been different on the Goddard issue. When even professional people in your sphere of influence don’t know you are doing dead weather station data infills like this, you can be sure that your primary mission to provide useful data is FUBAR.
NCDC needs to step up and fix this along with other problems that have been identified.
And they are. I expect some sort of a statement, and possibly a correction, next week. In the meantime, let’s let them do their work and go through their methodology. It will not be helpful to ANYONE if we start beating up the people at NCDC ahead of such a statement and/or correction.
I will be among the first, if not the first to know what they are doing to fix the issues, and as soon as I know, so will all of you. Patience and restraint is what we need at the moment. I believe they are making a good faith effort, but as you all know the government moves slowly, they have to get policy wonks to review documents and all that. So, we’ll likely hear something early next week.
These lapses in quality control, and the thinking that infilling estimated data for long dead weather stations is acceptable, are the sort of thing that happens when the only people that you interact with are inside your sphere of influence. The “yeah that seems like a good idea” approval mumble probably resonated in that NCDC meeting, but it was a case of groupthink. Imagine The Wall Street Journal providing “estimated” stock values for long dead companies to provide “continuity” of their stock quotes page. Such a thing would boggle the mind and the SEC would have a cow, not to mention readers. Scams would erupt trying to sell stocks for these long dead companies: “It’s real, see, it’s reporting value in the WSJ!”.
It often takes people outside of climate science to point out the problems they don’t see, and skeptics have been doing it for years. Today, we are doing it again.
For absolute clarity, I should point out that the RAW USHCN monthly datafile is NOT being infilled with estimated data, only the FINAL USHCN monthly datafile. But that is the one that many other metrics use, including NASA GISS, and it goes into the mix for things like the NCDC monthly State of the Climate Report.
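The raw-versus-final distinction can be checked mechanically. The sketch below is only an illustration under stated assumptions: the monthly values are made-up numbers for a hypothetical station, the dictionaries stand in for whatever a reader parses out of the raw and final files, and `classify_estimates` is a hypothetical helper, not anything NCDC provides. It separates the two cases described in this post: an “E” value in the final file where the raw file holds a real measurement (the Luling case), and an “E” value where the raw file has nothing (the closed-station case).

```python
MISSING = -9999  # missing-value marker used in the records shown in the post


def classify_estimates(raw, final):
    """Sort the "E"-flagged months of the final data into two groups.

    raw   maps (year, month) -> value
    final maps (year, month) -> (value, flag)
    Returns (replaced, infilled): months where an estimate stands in for an
    existing raw value, and months where an estimate fills a raw gap.
    """
    replaced, infilled = [], []
    for key, (value, flag) in sorted(final.items()):
        if flag != "E" or value == MISSING:
            continue  # not an estimated value
        if raw.get(key, MISSING) != MISSING:
            replaced.append(key)   # raw measurement exists, final is estimated anyway
        else:
            infilled.append(key)   # no raw measurement, final is infilled
    return replaced, infilled


# Made-up numbers for a hypothetical station, for illustration only.
raw = {(2013, 1): 850, (2013, 2): 1010, (2013, 3): MISSING}
final = {(2013, 1): (850, ""), (2013, 2): (1075, "E"), (2013, 3): (1400, "E")}

replaced, infilled = classify_estimates(raw, final)
print("estimate replaces a raw value:", replaced)
print("estimate fills a gap:", infilled)
```

A station whose recent months all land in the second list, with nothing at all in the raw file, would be a candidate “zombie”; a station with many months in the first list would be a candidate for the bug described above.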
While we won’t know until all of the data is corrected and new numbers run, this may affect some of the absolute temperature claims made on SOTC reports such as “warmest month ever” and 3rd warmest, etc. The magnitude of such shifts, if any, is unknown at this point. Long term trend will probably not be affected.
It may also affect our comparisons between raw and final adjusted USHCN data we have been doing for our paper, such as this one from our draft paper:
The exception is BEST, which starts with the raw daily data, but they might be getting tripped up into creating some “zombie stations” of their own by the NCDC metadata and resolution improvements to lat/lon. The USHCN station at Luling Texas is listed as having 7 station moves by BEST (note the red diamonds):
But there really have only been two, and the station has been just like this since 1995, when it was converted to MMTS from a Stevenson Screen. Here is our survey image from 2009:
Photo by surfacestations volunteer John Warren Slayton.
NCDC’s metadata only lists two station moves:
As you can see below, some improvements in lat/lon accuracy can look like a station move:
http://www.ncdc.noaa.gov/homr/#ncdcstnid=20024457&tab=LOCATIONS
http://www.ncdc.noaa.gov/homr/#ncdcstnid=20024457&tab=MISC
Thanks to Paul Homewood for the two images and links above. I’m sure Mr. Mosher will let us know if this issue affects BEST or not.
And there is yet another issue: The recent change of something called “climate divisions” to calculate the national and state temperatures.
Certified Consulting Meteorologist and Fellow of the AMS Joe D’Aleo writes in with this:
I had downloaded the Maine annual temperature plot from NCDC Climate at a Glance in 2013 for a talk. There was no statistically significant trend since 1895. Note the spike in 1913 following super blocking from Novarupta in Alaska (similar to the high latitude volcanoes in late 2000s which helped with the blocking and maritime influence that spiked 2010 as snow was gone by March with a steady northeast maritime Atlantic flow). 1913 was close to 46F. and the long term mean just over 41F.
Seemingly in a panic change late this frigid winter to NCDC, big changes occurred. I wanted to update the Maine plot for another talk and got this from NCDC CAAG.
Note that 1913 was cooled nearly 5 degrees F and does not stand out. There is a warming of at least 3 degrees F since 1895 (they list 0.23/decade) and the new mean is close to 40F.
Does anybody know what the REAL temperature of Maine is/was/is supposed to be? I sure as hell don’t. I don’t think NCDC really does either.
In closing…
Besides moving toward a more accurate temperature record, the best thing about all this hoopla over the USHCN data set is the PolitiFact story where we have all these experts lined up (including me as the token skeptic) who stated without a doubt that Goddard was wrong, and the claim was rated “pants on fire”.
They’ll all be eating some crow, as will I, but now that I have Gavin for dinner company, I don’t really mind at all.
When the scientific method is at work, eventually, everybody eats crow. The trick is to be able to eat it and tell people that you are honestly enjoying it, because crow is so popular, it is on the science menu daily.
![marysville_badsiting[1]](http://wattsupwiththat.files.wordpress.com/2014/06/marysville_badsiting1.jpg?resize=480%2C360&quality=83)






Wow.
Anthony there are more metadata sources for station moves than the one you point to.
So,
First off thanks for pointing out that we use RAW data and not adjusted data or zombie data
On station moves, we use ALL the data on station locations. not just one source
Here is the kicker. A station move will merely split the station record. NOT adjust it.
if you split a station where there is no actual move it has no effect.
Apparently you missed the internal NCDC memos, otherwise you’d understand why they provide zero statements to the actual data used…
Did I read somewhere that IPCC has abandoned land based temperatures in favor of sea surface temperatures?
Very nice to see this.
Good save.
Why does this bother me?
“Long term trend will probably not be affected.”
Does it translate to:
Yes, we’ve been caught fudging the figures, but we will claim warming, and that it’s AGW.
It may also affect our comparisons between raw and final adjusted USHCN data we have been doing for our paper, such as this one from our draft paper:
Please, do NOT delay the publication of your paper. Just make it version 1 that compares your clean Class 1&2 stations with what NCDC claimed was the truth for the past score of years. It becomes a first and best estimate for how much NCDC has been getting it wrong.
Just add a Post Script to the paper that NCDC now admits a big bug in their process. Your paper strongly indicated something wasn’t right. You now have confirmation.
I’m happy to see that you have seen the “light”, Anthony. Your episode with Steven G was wholly unacceptable. As I pointed out, your conversation with Steve Mc, and your subsequent reporting of it, indicated that you had “guided” SteveM to the wrong conclusion, which reinforced your belief in what you were doing. SteveM creating an artifact and the other axes you had to grind with him are not a good excuse for a REAL scientist. I have found SteveG’s work somewhat distasteful, but you have to set aside your feelings and baggage in order to see through the fog.
I have been a skeptic since the ’60s. I read all the works and books I could find on weather and climate and came to the conclusion that they were lying. It wasn’t until I got my degrees and entered a research establishment that I saw exactly what was going on.
I really appreciate your untiring work, your dedication to your blog and your team. Please do not sully your reputation or any other skeptic’s reputation in this way again.
Thanks Anthony
Stephen Richards Engineer, Physicist.
Does it translate to:
Yes, we’ve been caught fudging the figures, but we will claim warming, and that it’s AGW.
No, it says don’t worry, we will change our algorithms to ensure AGW remains in the record.
Just look at their UHI adjustments. Absolutely ridiculous if not incompetent.
“All of that added up to a big heap of confirmation bias, I was so used to Goddard being wrong, I expected it again, but this time Steve Goddard was right and my confirmation bias prevented me from seeing that there was in fact a real issue in the data and that NCDC has dead stations that are reporting data that isn’t real: mea culpa.”
==========================================
Kudos to you, Sir.
And the best of British luck to you trying to sort this mess out.
In my Humble opinion, same thing is going on with sea level.
” Along with that is his latest followup, showing the problem isn’t limited to Texas”
But what was the problem in Texas? I did a post on Luling here. When you look at the local anomaly plots there is a very obvious inhomogeneity. The NOAA software detected this, and quarantined the data, exactly as it should. It then turned out, via comments of mesoman who had worked on this very site, that there was a faulty cable causing readings to be transmitted low, and this was fixed in Jan 2014.
So, you might say, good for the computer, it got it right, and Paul H was wrong. A bit of introspection from Paul Homewood and co re how they had been claiming malfeasance etc? But no, no analysis at all – instead they are on to the next “problem” in Kansas. And so the story goes – first we had problems in Texas, now in Kansas.
REPLY:
Despite what you think, you can’t “estimate” the characteristics of temperature from the effects of a faulty cable. In Luling’s case, just throw out the data; don’t imagine you are smart enough to be able to predict the resistance changes that occur from rain, heat, humidity, dust, etc. as they affect it, or as the next lawnmower bangs into it. As you’ll note, “mesoman” did say the temperatures were fluctuating when he did his test to determine what was wrong. He said the data was unstable.
Can you predict what the temperature will be in a thermistor that has a faulty connection at any given moment? Can you predict what the low and high temperatures it will produce will be on any given day when compared to the vagaries of weather it experiences?
It is patently absurd to try to salvage data from a faulty instrument, especially when you have one nearby also recording the temperature.
THROW OUT THE DATA – DON’T TRY TO FIX IT.
Imagine forensic science trying to get away with this stuff. I’m reminded of the famous line from The Shawshank Redemption: “how can you be so obtuse?”. -Anthony
As if by magic the corrected bug free data will show climate change is worse than we thought.
Nicely done Mr. Goddard!
v/r,
David Riser
Well played Anthony for admitting that at first he was wrong about the revelations made by “Steven Goddard”. Also fair play giving Paul Homewood credit. However, we should all be giving Steve Goddard credit. He has been pointing out this for ages on his blog and has taken a huge amount of stick from Anthony, from Zeke Hausfather, Nick Stokes, posters at Judy Curry’s blog, alarmists of various hues, and many others along the way. Yet it appears people only agreed he had a point when a small part of his work was confirmed by Paul Homewood.
Nick Stokes please find a station with a faulty cable causing readings to be transmitted high
Thank You for manning up and telling it how it is.
I know Steve’s overall attitude gets to some people, but he does do some great data mining work.
Best’s data “summaries” output is also completely biased, I have data for the UK that shows this.
It’s data collection, Anthony, but not as we know it.
What?! Steven Goddard’s post didn’t deal with the issue covered in this post. That there happened to be some problem in the data doesn’t mean Goddard’s post was accurate or correct.
REPLY: That’s true and false. I said to him there was nothing wrong with the USHCN data, but in fact there is.
He said trends calculated from RAW data using absolutes show a cooling since the 30’s, but I didn’t cover that here. My position is that the raw data is so badly corrupted by siting and other issues as a whole, you can’t really use it as a whole for anything of value, and the way he went about it created a false trend.
That’s why I said in part 2 that we still need to do spatial gridding at the very least, but more importantly, we need to get rid of this massive load of dead, dying, and compromised stations, and stop trying to fix them with statistical nuances, and just focus on the good ones, use the good data, toss the rest. – Anthony
let me put a sharper point on this.
The NCDC metadata is not raw metadata. Blindly believing in its accuracy is not something a skeptic will do.
What to do instead?
What we do is consider all sources of metadata. That’s NCDC as well as any other data source that has this station. From that we collect all station moves.
we use all the data to inform the best estimate. we dont just blindly trust the NCDC data.
after all… look what the post is about
in short be skeptical of everything. one cannot say Im skeptical of the temperature data, but I trust the metadata.
Finally, Anthony will recall when the NCDC cut off access to the metadata system. When they did that I FOIA them. The mails released indicated that they had little faith in their metadata.
so we look at everything, knowing that mathematically if we slice a station where there is NO discontinuity in the time series the answer will be the same as if we didnt slice it. In other words the method is insensitive to slicing where there is no discontinuity.
Nick Stokes says: June 28, 2014 at 1:49 pm
Nick, have you read my responses to mesoman, that is not the only station with problems and the data provided by Zeke proves it.
LOL, C++ == Syrup of Ipecac Syntax.
Brandon Shollenberger says: June 28, 2014 at 1:55 pm
B***Sh*t, it is exactly what is in his Posts, note plural.
Thanks for this report. I’ve been reading Steven Goddard’s posts on this issue and the comments. A simple acquaintance with the material and data sets hasn’t been enough to follow all of it and a few of the comments have been more caustic than clarifying.
I haven’t read Judith Curry’s latest but will get there later today.
The current post here is quite clear for me but a complete novice would need a lot of background just to decipher the acronyms. If there are new readers I hope they will take some time doing this.
Good for all of you, especially Steven G., for sticking with this.
I have eaten some crow recently in my line of work. I thought there was a problem. There was. But it wasn’t hardware or software. It was a manufacturing defect.