The scientific method is at work on the USHCN temperature data set

Temperature is such a simple finite thing. It is amazing how complex people can make it.

– commenter and friend of WUWT, ossqss, at Judith Curry’s blog

Sometimes, you can believe you are entirely right while simultaneously believing that you’ve done due diligence. That’s what confirmation bias is all about. In this case, a whole bunch of people, including me, got a severe case of it.

I’m talking about the claim made by Steve Goddard that 40% of the USHCN data is “fabricated”, which I and a few other people thought was clearly wrong.

Dr. Judith Curry and I have been conversing a lot via email over the past two days, and she has written an illuminating essay that explores the issue raised by Goddard and the sociology going on. See her essay:

http://judithcurry.com/2014/06/28/skeptical-of-skeptics-is-steve-goddard-right/

Steve Goddard aka Tony Heller deserves the credit for the initial finding, and Paul Homewood deserves the credit for taking that finding and establishing it in a more comprehensible way that opened closed eyes, including mine, in this post entitled Massive Temperature Adjustments At Luling, Texas. Along with that is his latest followup, showing the problem isn’t limited to Texas, but exists in Kansas as well. And there’s more about this below.

Goddard early on (June 2) gave me his source code that made his graph, but I couldn’t get it to compile and run. That’s probably more my fault than his, as I’m not an expert in the C++ computer language. Had I been able to, things might have gone differently. Then there was the fact that the problem Goddard noted doesn’t show up in GHCN data, and I didn’t see the problem in any of the data we had for our USHCN surface stations analysis.

But, the thing that really put up a wall for me was this moment on June 1st, shortly after getting Goddard’s first email with his finding, which I pointed out in On ‘denying’ Hockey Sticks, USHCN data, and all that – part 1.

Goddard initially claimed 40% of the STATIONS were missing, which I said right away was not possible. It raised my hackles, and prompted my “you need to do better” statement. Then he switched the text in his post from stations to data while I was away for a couple of hours at my daughter’s music recital. When I returned, I noted the change, with no note of the change on his post, and that is what really put up the wall for me. He probably looked at it like he was just fixing a typo; I looked at it like it was sweeping an important distinction under the rug.

Then there was my personal bias over previous episodes where Goddard had made what I considered grievous errors and refused to admit to them: the claim of CO2 freezing out of the air in Antarctica, later shown by experiment to be impossible; the GISStimating 1998 episode; and the comment thread where, when the old data is checked, it is clear Goddard/Heller’s claim doesn’t hold up.

And then just over a month ago there was Goddard’s first hockey stick shape in the USHCN data set, which turned out to be nothing but an artifact.

All of that added up to a big heap of confirmation bias. I was so used to Goddard being wrong that I expected it again. But this time Steve Goddard was right, and my confirmation bias prevented me from seeing that there was in fact a real issue in the data, and that NCDC has dead stations that are reporting data that isn’t real: mea culpa.

But that’s the same problem many climate scientists have: they are used to some skeptics being wrong on some issues, so they put up a wall. That is why the careful and exacting analyses we see from Steve McIntyre should be a model for us all. We have to “do better” to make sure that the claims we make are credible, documented, phrased in non-inflammatory language, understandable, and most importantly, right.

Otherwise, walls go up, confirmation bias sets in.

Now that the wall is down, NCDC won’t be able to ignore this. Even John Nielsen-Gammon, who was critical of Goddard along with me in the PolitiFact story, now says there is a real problem. So does Zeke, and we have all sent or forwarded email to NCDC advising them of it.

I’ve also been on the phone Friday with NCDC’s assistant director and chief scientist (Tom Peterson), and with the person in charge of USHCN (Matt Menne). Both were quality, professional conversations, and both thanked me for bringing this to their attention. There is lots of email flying back and forth too.

They are taking this seriously. They have to, because the final data as currently presented for USHCN is clearly wrong. John Nielsen-Gammon sent me a cursory analysis of Texas USHCN stations, noting he found a number of stations that had “estimated” data in place of actual good data that NCDC has in hand, data that appears in the RAW USHCN data file on their FTP site:

From: John Nielsen-Gammon

Sent: Friday, June 27, 2014 9:27 AM

To: Anthony

Subject: Re: USHCN station at Luling Texas

 Anthony –
   I just did a check of all Texas USHCN stations.  Thirteen had estimates in place of apparently good data.
410174 Estimated May 2008 thru June 2009
410498 Estimated since Oct 2011
410639 Estimated since July 2012 (exc Feb-Mar 2012, Nov 2012, Mar 2013, and May 2013)
410902 Estimated since Aug 2013
411048 Estimated July 2012 thru Feb 2014
412906 Estimated since Jan 2013
413240 Estimated since March 2013
413280 Estimated since Oct 2012
415018 Estimated since April 2010, defunct since Dec 2012
415429 Estimated since May 2013
416276 Estimated since Nov 2012
417945 Estimated since May 2013
418201 Estimated since April 2013 (exc Dec 2013).

What is going on is that while the RAW data file has the actual measurements, for some reason the final data they publish doesn’t get the memo that good data is actually present for these stations, so it “infills” them with estimated data using data from surrounding stations. It’s a bug, a big one. And when Zeke did a cursory analysis Thursday night, he discovered it was systemic to the entire record: up to 10% of stations have “estimated” data, spanning over a century:
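For readers who want to verify this for themselves, here is a minimal sketch of that cross-check in Python. It assumes the monthly tavg records from the raw and final tarballs (named later in this post) have been concatenated into raw.txt and final.txt (hypothetical file names), and that the layout matches the Marysville records shown below:

# check_infill.py -- a sketch: count station-months where the FINAL file
# carries an "E" estimate even though the RAW file has a real measurement.
# raw.txt and final.txt are hypothetical names for the concatenated monthly
# tavg records, laid out like the Marysville lines shown later in this post.

def read_monthly(path):
    data = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            station, year = parts[0], parts[1]
            # 12 monthly fields follow the station id and year,
            # each a value glued to an optional flag letter, e.g. "1034E"
            for month, field in enumerate(parts[2:14], start=1):
                value = int(''.join(c for c in field if c in '-0123456789'))
                flag = ''.join(c for c in field if c.isalpha())
                data[(station, year, month)] = (value, flag)
    return data

raw = read_monthly('raw.txt')
final = read_monthly('final.txt')

# an "E" in the final file despite a real raw value (-9999 means missing)
overridden = sum(
    1
    for key, (value, flag) in final.items()
    if 'E' in flag and raw.get(key, (-9999, ''))[0] != -9999
)
print(overridden, 'station-months estimated despite real raw data')

If the bug is what it appears to be, that count will be large; on a corrected dataset it should fall to near zero.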

Analysis by Zeke Hausfather

And here is the real kicker: “zombie weather stations” exist in the USHCN final data set that are still generating data, even though they have been closed.

Remember Marysville, CA, the poster child for bad station siting? It was the station that gave me my “light bulb moment” on the issue of station siting. Here is a photo I took in May 2007:

[Image: the Marysville, CA USHCN station, May 2007]

It was closed just a couple of months after I introduced it to the world as the prime example of “How not to measure temperature”. The MMTS sensor was in a parking lot, with hot air from the a/c units of the nearby electronics sheds for the cell phone tower:

[Image: the Marysville MMTS sensor in the parking lot, next to the a/c units]

Guess what? Like Luling, TX (which is still open, but getting estimated data in place of the actual data in the final USHCN data file), Marysville is still producing estimated monthly data, marked with an “E” flag, even though it was marked closed in 2007 by NOAA’s own metadata:

USH00045385 2006  1034E  1156h  1036g  1501h  2166i  2601E  2905E  2494E  2314E  1741E  1298E   848i  0
USH00045385 2007   797c  1151E  1575i  1701E  2159E  2418E  2628E  2620E  2197E  1711E  1408E   846E  0
USH00045385 2008   836E  1064E  1386E  1610E  2146E  2508E  2686E  2658E  2383E  1906E  1427E   750E  0
USH00045385 2009   969E  1092E  1316E  1641E  2238E  2354E  2685E  2583E  2519E  1739E  1272E   809E  0
USH00045385 2010   951E  1190E  1302E  1379E  1746E  2401E  2617E  2427E  2340E  1904E  1255E  1073E  0
USH00045385 2011   831E   991E  1228E  1565E  1792E  2223E  2558E  2536E  2511E  1853E  1161E   867E  0
USH00045385 2012   978E  1161E  1229E  1646E  2147E  2387E  2597E  2660E  2454E  1931E  1383E   928E  0
USH00045385 2013   820E  1062E  1494E  1864E  2199E  2480E  2759E  2568E  2286E  1807E  1396E   844E  0
USH00045385 2014  1188E  1247E  1553E  1777E  2245E  2526E  -9999  -9999  -9999  -9999  -9999  -9999

Source:  USHCN Final : ushcn.tavg.latest.FLs.52i.tar.gz

Compare to USHCN Raw : ushcn.tavg.latest.raw.tar.gz

In the USHCN V2.5 folder, the readme file describes the “E” flag as:

E = a monthly value could not be computed from daily data. The value is estimated using values from surrounding stations
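A table like John Nielsen-Gammon’s “Estimated since…” list above can be built mechanically from that flag. Here is a minimal sketch, again in Python against the same hypothetical final.txt, that reports the last month each station produced a real (non-estimated, non-missing) value; a zombie station shows up as a last real value years in the past:

# zombie_scan.py -- a sketch: for each station in the final file, report the
# most recent month with a real (non-"E", non-missing) value. Stations whose
# last real value is years old are the "zombies" discussed in this post.
# Assumes the same hypothetical final.txt layout as the sketch above.

import re
from collections import defaultdict

def last_real_month(lines):
    latest = None
    for line in lines:
        parts = line.split()
        year = int(parts[1])
        for month, field in enumerate(parts[2:14], start=1):
            m = re.match(r'(-?\d+)([A-Za-z]*)', field)
            value, flag = int(m.group(1)), m.group(2)
            if value != -9999 and 'E' not in flag:
                if latest is None or (year, month) > latest:
                    latest = (year, month)
    return latest  # None means no real value at all

by_station = defaultdict(list)
with open('final.txt') as f:
    for line in f:
        by_station[line.split()[0]].append(line)

for station, lines in sorted(by_station.items()):
    print(station, 'last real value:', last_real_month(lines))

Run against the Marysville rows above, it would report March 2007 (the 1575i value) as the last real month; everything after that is an “E”.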

There are quite a few “zombie weather stations” in the USHCN final dataset, possibly up to 25% of the 1218 stations in the network. In my conversations with NCDC on Friday, I was told these were kept in and “reporting” as a policy decision, to provide “continuity” of data for scientific purposes. While there “might” be some justification for that sort of thinking, few people know about it: there’s no disclaimer or caveat in the USHCN FTP folder at NCDC or in the readme file that describes this. They only “hint” at it, saying:

The composition of the network remains unchanged at 1218 stations

But that really isn’t true, as some USHCN stations out of the 1218 have been closed and are no longer reporting real data, but instead are reporting estimated data.

NCDC really should make this clear. While it “might” be OK to produce a datafile that has estimated data in it, not everyone is going to understand what that means, or that stations long dead are producing estimated data. NCDC has failed to notify the public, and even their colleagues, of this. Even the Texas State Climatologist, John Nielsen-Gammon, didn’t know about these “zombie” stations until I showed him. If he had known, his opinion might have been different on the Goddard issue. When even professional people in your sphere of influence don’t know you are doing dead-weather-station data infills like this, you can be sure that your primary mission to provide useful data is FUBAR.

NCDC needs to step up and fix this along with other problems that have been identified.

And they are. I expect some sort of a statement, and possibly a correction, next week. In the meantime, let’s let them do their work and go through their methodology. It will not be helpful to ANYONE if we start beating up the people at NCDC ahead of such a statement and/or correction.

I will be among the first, if not the first, to know what they are doing to fix the issues, and as soon as I know, so will all of you. Patience and restraint are what we need at the moment. I believe they are making a good faith effort, but as you all know, the government moves slowly; they have to get policy wonks to review documents and all that. So we’ll likely hear something early next week.

These lapses in quality control, and the thinking that infilling estimated data for long-dead weather stations is acceptable, are the sort of thing that happens when the only people you interact with are inside your sphere of influence. The “yeah, that seems like a good idea” approval mumble probably resonated in that NCDC meeting, but it was a case of groupthink. Imagine The Wall Street Journal providing “estimated” stock values for long-dead companies to provide “continuity” of their stock quotes page. Such a thing would boggle the mind, and the SEC would have a cow, not to mention readers. Scams would erupt trying to sell stock in these long-dead companies: “It’s real, see, it’s reporting value in the WSJ!”

It often takes people outside of climate science to point out the problems they don’t see, and skeptics have been doing it for years. Today, we are doing it again.

For absolute clarity, I should point out that the RAW USHCN monthly datafile is NOT being infilled with estimated data; only the FINAL USHCN monthly datafile is. But that is the one that many other metrics use, including NASA GISS, and it goes into the mix for things like the NCDC monthly State of the Climate Report.

While we won’t know until all of the data is corrected and the new numbers are run, this may affect some of the absolute temperature claims made in SOTC reports, such as “warmest month ever,” 3rd warmest, etc. The magnitude of such shifts, if any, is unknown at this point. The long-term trend will probably not be affected.

It may also affect our comparisons between raw and final adjusted USHCN data we have been doing for our paper, such as this one from our draft paper:

[Figure 20 from the Watts et al. 2012 draft: CONUS compliant vs. non-compliant vs. NOAA]

The exception is BEST, which starts with the raw daily data, but they might be getting tripped up into creating some “zombie stations” of their own by the NCDC metadata and resolution improvements to lat/lon. The USHCN station at Luling Texas is listed as having 7 station moves by BEST (note the red diamonds):

[Image: BEST record for Luling, TX, with seven station moves marked by red diamonds]

But there really have only been two, and the station has been just like this since 1995, when it was converted from a Stevenson Screen to MMTS. Here is our survey image from 2009:

[Image: the Luling, TX station, looking north]

Photo by surfacestations volunteer John Warren Slayton.

NCDC’s metadata only lists two station moves:

[Image: NCDC metadata listing two station moves for Luling]

As you can see below, some improvements in lat/lon accuracy can look like a station move:

[Image: HOMR location history for Luling, TX]

http://www.ncdc.noaa.gov/homr/#ncdcstnid=20024457&tab=LOCATIONS

[Image: HOMR miscellaneous metadata for Luling, TX]

http://www.ncdc.noaa.gov/homr/#ncdcstnid=20024457&tab=MISC

Thanks to Paul Homewood for the two images and links above. I’m sure Mr. Mosher will let us know if this issue affects BEST or not.
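One way to separate a genuine relocation from a coordinate-precision change is to look at the distance each metadata jump implies. Here is a minimal sketch in Python using the haversine formula; the coordinate pair is hypothetical, chosen only to mimic a two-decimal rounding of the same spot:

# move_or_rounding.py -- a sketch: flag metadata "moves" small enough to be
# explained by a lat/lon precision change rather than a physical relocation.
# The 1 km threshold is an assumption: coordinates rounded to 0.01 degree
# can be off by several hundred meters.

import math

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the haversine formula."""
    R = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp/2)**2 + math.cos(p1)*math.cos(p2)*math.sin(dl/2)**2
    return 2 * R * math.asin(math.sqrt(a))

# successive (lat, lon) entries from a station's metadata history
history = [(29.68, -97.65), (29.6775, -97.6536)]  # hypothetical Luling-like pair

for (a, b), (c, d) in zip(history, history[1:]):
    dist = km_between(a, b, c, d)
    verdict = 'rounding artifact?' if dist < 1.0 else 'possible real move'
    print(f'{dist:.2f} km -> {verdict}')

A jump of a few hundred meters that appears exactly when the published precision changes is far more plausibly a rounding artifact than a physical move.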

And there is yet another issue: the recent change to something called “climate divisions” used to calculate the national and state temperatures.

Certified Consulting Meteorologist and Fellow of the AMS Joe D’Aleo writes in with this:

I had downloaded the Maine annual temperature plot from NCDC Climate at a Glance in 2013 for a talk. There was no statistically significant trend since 1895. Note the spike in 1913, following super blocking from Novarupta in Alaska (similar to the high-latitude volcanoes in the late 2000s, which helped with the blocking and the maritime influence that spiked 2010, as snow was gone by March with a steady northeast maritime Atlantic flow). 1913 was close to 46F, and the long-term mean just over 41F.

[Image: NCDC Climate at a Glance, Maine annual temperature, as downloaded in 2013]

Then, seemingly in a panicked change at NCDC late this frigid winter, big changes occurred. I wanted to update the Maine plot for another talk and got this from NCDC CAAG:

[Image: NCDC Climate at a Glance, Maine annual temperature, current version]

Note that 1913 was cooled nearly 5 degrees F and no longer stands out. There is a warming of at least 3 degrees F since 1895 (they list 0.23/decade), and the new mean is close to 40F.

Does anybody know what the REAL temperature of Maine is/was/is supposed to be? I sure as hell don’t. I don’t think NCDC really does either.

In closing…

Besides moving toward a more accurate temperature record, the best thing about all this hoopla over the USHCN data set is the PolitiFact story where we have all these experts lined up (including me as the token skeptic) who stated without a doubt that Goddard was wrong, and rated the claim “pants on fire”.

They’ll all be eating some crow, as will I, but now that I have Gavin for dinner company, I don’t really mind at all.

When the scientific method is at work, eventually, everybody eats crow. The trick is to be able to eat it and tell people that you are honestly enjoying it, because crow is so popular, it is on the science menu daily.

MattN

Wow.

Anthony, there are more metadata sources for station moves than the one you point to.
So:
First off, thanks for pointing out that we use RAW data and not adjusted data or zombie data.
On station moves, we use ALL the data on station locations, not just one source.
Here is the kicker: a station move will merely split the station record, NOT adjust it.
If you split a station where there is no actual move, it has no effect.

Chewer

Apparently you missed the internal NCDC memos, otherwise you’d understand why they provide zero statements about the actual data used…

Did I read somewhere that IPCC has abandoned land based temperatures in favor of sea surface temperatures?

kbray in california

Very nice to see this.
Good save.

Why does this bother me?
“Long term trend will probably not be affected.”
Does it translate to:
Yes, we’ve been caught fudging the figures, but we will claim warming, and that it’s AGW.

It may also affect our comparisons between raw and final adjusted USHCN data we have been doing for our paper, such as this one from our draft paper:
Please, do NOT delay the publication of your paper. Just make it version 1, comparing your clean Class 1 & 2 stations with what NCDC claimed was the truth for the past score of years. It becomes a first and best estimate of how much NCDC has been getting it wrong.
Just add a postscript to the paper noting that NCDC now admits a big bug in their process. Your paper strongly indicated something wasn’t right. You now have confirmation.

Stephen Richards

I’m happy to see that you have seen the “light”, Anthony. Your episode with Steven G was wholly unacceptable. As I pointed out, your conversation with Steve Mc, and subsequent reporting of it, indicated that you had “guided” SteveM to the wrong conclusion, which reinforced your belief in what you were doing. SteveG creating an artifact, and the other axes you had to grind with him, are not a good excuse for a REAL scientist. I have found SteveG’s work somewhat distasteful, but you have to set aside your feelings and baggage in order to see through the fog.
I have been a skeptic since the ’60s. I read all the works and books I could find on weather and climate and came to the conclusion that they were lying. It wasn’t until I got my degrees and entered a research establishment that I saw exactly what was going on.
I really appreciate your untiring work, your dedication to your blog, and your team. Please do not sully your reputation, or any other skeptic’s reputation, in this way again.
Thanks Anthony
Stephen Richards, Engineer, Physicist.

Stephen Richards

Does it translate to:
“Yes, we’ve been caught fudging the figures, but we will claim warming, and that it’s AGW.”
No, it says: don’t worry, we will change our algorithms to ensure AGW remains in the record.
Just look at their UHI adjustments. Absolutely ridiculous, if not incompetent.

Anything is possible

“All of that added up to a big heap of confirmation bias. I was so used to Goddard being wrong that I expected it again. But this time Steve Goddard was right, and my confirmation bias prevented me from seeing that there was in fact a real issue in the data, and that NCDC has dead stations that are reporting data that isn’t real: mea culpa.”
==========================================
Kudos to you, Sir.
And the best of British luck to you trying to sort this mess out.

Steve Case

In my humble opinion, the same thing is going on with sea level.

” Along with that is his latest followup, showing the problem isn’t limited to Texas”
But what was the problem in Texas? I did a post on Luling here. When you look at the local anomaly plots, there is a very obvious inhomogeneity. The NOAA software detected this and quarantined the data, exactly as it should. It then turned out, via comments from mesoman, who had worked on this very site, that there was a faulty cable causing readings to be transmitted low, and this was fixed in Jan 2014.
So, you might say, good for the computer, it got it right, and Paul H was wrong. A bit of introspection from Paul Homewood and co about how they had been claiming malfeasance etc.? But no, no analysis at all – instead they are on to the next “problem” in Kansas. And so the story goes – first we had problems in Texas, now in Kansas.
REPLY:
Despite what you think, you can’t “estimate” the characteristics of temperature from the effects of a faulty cable. In Luling’s case, just throw out the data; don’t imagine you are smart enough to predict the resistance changes that occur from rain, heat, humidity, dust, etc., as they affect it, or when the next lawnmower bangs into it. As you’ll note, when “mesoman” ran his test to determine what was wrong, he said the temperatures were fluctuating; the data was unstable.
Can you predict what the temperature will be in a thermistor that has a faulty connection at any given moment? Can you predict what low and high temperatures it will produce on any given day, compared to the vagaries of weather it experiences?
It is patently absurd to try to salvage data from a faulty instrument, especially when you have one nearby also recording the temperature.
THROW OUT THE DATA – DON’T TRY TO FIX IT.
Imagine forensic science trying to get away with this stuff. I’m reminded of the famous line from The Shawshank Redemption: “how can you be so obtuse?”
-Anthony

As if by magic, the corrected bug-free data will show climate change is worse than we thought.

David Riser

Nicely done Mr. Goddard!
v/r,
David Riser

Keith

Well played, Anthony, for admitting that at first you were wrong about the revelations made by “Steven Goddard”. Also, fair play giving Paul Homewood credit. However, we should all be giving Steve Goddard credit. He has been pointing this out for ages on his blog and has taken a huge amount of stick from Anthony, from Zeke Hausfather, Nick Stokes, posters at Judy Curry’s blog, alarmists of various hues, and many others along the way. Yet it appears people only agreed he had a point when a small part of his work was confirmed by Paul Homewood.

Nick Stokes, please find a station with a faulty cable causing readings to be transmitted high.

A C Osborn

Thank you for manning up and telling it how it is.
I know Steve’s overall attitude gets to some people, but he does do some great data mining work.
BEST’s data “summaries” output is also completely biased; I have data for the UK that shows this.

It’s data collection, Anthony, but not as we know it.

What?! Steven Goddard’s post didn’t deal with the issue covered in this post. That there happened to be some problem in the data doesn’t mean Goddard’s post was accurate or correct.

REPLY:
That’s true and false. I said to him there was nothing wrong with the USHCN data, but in fact there is.
He said trends calculated from RAW data using absolutes show a cooling since the 30’s, but I didn’t cover that here. My position is that the raw data is so badly corrupted by siting and other issues that you can’t really use it as a whole for anything of value, and the way he went about it created a false trend.
That’s why I said in part 2 that we still need to do spatial gridding at the very least, but more importantly, we need to get rid of this massive load of dead, dying, and compromised stations, stop trying to fix them with statistical nuances, and just focus on the good ones: use the good data, toss the rest. – Anthony

Let me put a sharper point on this.
The NCDC metadata is not raw metadata. Blindly believing in its accuracy is not something a skeptic will do.
What to do instead? What we do is consider all sources of metadata. That’s NCDC as well as any other data source that has this station. From that we collect all station moves. We use all the data to inform the best estimate; we don’t just blindly trust the NCDC data. After all, look what the post is about.
In short: be skeptical of everything. One cannot say “I’m skeptical of the temperature data, but I trust the metadata.”
Finally, Anthony will recall when the NCDC cut off access to the metadata system. When they did that, I FOIA’d them. The mails released indicated that they had little faith in their own metadata.
So we look at everything, knowing that mathematically, if we slice a station where there is NO discontinuity in the time series, the answer will be the same as if we didn’t slice it. In other words, the method is insensitive to slicing where there is no discontinuity.
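Mosher’s slicing claim is easy to check numerically. Here is a minimal sketch, assuming Python/NumPy and synthetic data: fit one shared slope with a separate intercept for each segment, and compare it to the unsliced fit. With no discontinuity, the two slopes agree to within noise:

# slice_demo.py -- a sketch of the slicing claim above: cutting a series
# where there is no discontinuity should not change the estimated trend.
# All data here is synthetic, chosen only to illustrate the point.

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)                          # 120 months
y = 0.02 * t + rng.normal(0, 0.3, t.size)   # true trend 0.02/month, no break

# unsliced: ordinary least-squares slope
slope_full = np.polyfit(t, y, 1)[0]

# sliced at month 60: one shared slope, a separate intercept per segment
seg = (t >= 60).astype(float)
X = np.column_stack([t, np.ones(t.size), seg])
slope_sliced = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(f'full: {slope_full:.4f}  sliced: {slope_sliced:.4f}')  # nearly identical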

A C Osborn

Nick Stokes says: June 28, 2014 at 1:49 pm
Nick, have you read my responses to mesoman? That is not the only station with problems, and the data provided by Zeke proves it.

dccowboy

LOL, C++ == Syrup of Ipecac Syntax.

A C Osborn

Brandon Shollenberger says: June 28, 2014 at 1:55 pm
B***Sh*t, it is exactly what is in his posts, note the plural.

John F. Hultquist

Thanks for this report. I’ve been reading Steven Goddard’s posts on this issue and the comments. A simple acquaintance with the material and data sets hasn’t been enough to follow all of it, and a few of the comments have been more caustic than clarifying.
I haven’t read Judith Curry’s latest but will get there later today.
The current post here is quite clear for me, but a complete novice would need a lot of background just to decipher the acronyms. If there are new readers, I hope they will take the time to do so.
Good for all of you, especially Steven G., for sticking with this.

I have eaten some crow recently in my line of work. I thought there was a problem. There was. But it wasn’t hardware or software. It was a manufacturing defect.

Mike Singleton

Anthony,
Kudos over the public crow mastication. The feathers are usually the hardest to get down.

“Despite what you think, you can’t “estimate” the characteristics of temperature from the effects of a faulty cable. In Luling’s case, just throw out the data; don’t imagine you are smart enough to predict the resistance changes that occur from rain, heat, humidity, dust, etc., as they affect it, or when the next lawnmower bangs into it.”
Throw out the data? That’s exactly what they did. They replaced it with an estimate based on neighboring stations, not on trying to repair the Luling data. In the NCDC method, which uses absolute temperatures, you have to have an estimate for each station; otherwise you get into the Goddard spike issues.
I notice that John N-G said there were 13 stations in Texas that have had to replace measured data in recent years, for various periods. I believe Texas has 188 stations in total.
REPLY: Great, you should be a legal adviser in court.
Judge: “The blood sample’s tainted!” You: “OK, throw it out and replace it with some blood from… THAT GUY, OVER THERE! No, wait, let’s get blood from the nearest five guys that look like him and mix it together. Yeah, that’s a reasonable estimate.”
You can’t ever assume your estimates will model reality.
Again, how can you be so obtuse? – Anthony

One last one, so that people can understand the various data products.
First, some definitions:
Raw data: raw data is data that presents itself as unadjusted. That is, there is no evidence to suggest it has been changed by any processing step. Typically there will be an associated ADJUSTED file.
Adjusted data: adjusted station data, in every case I have looked at, is MONTHLY data. To adjust data, the people in charge of it do a DISCRETE station-by-station adjustment. They may adjust for a station move by applying a lapse-rate adjustment. This has error. They then may adjust it for TOBS. This has error. They then may adjust it for instrument changes. This has error. They then may adjust it for station moves in lat/lon. This has errors.
So, what do we do differently?
1. We use all sources of data, BUT we start by using raw daily where that is available. The big sources are GHCN Daily (30K+ stations) and Global Summary of the Day (GSOD). This is the vast, vast majority of all data.
2. Where a station doesn’t have raw daily, we use raw monthly. These are typically older stations, prior to the 1830s.
Next we produce 3 datasets:
A) RAW: this is a compilation of all raw sources for the site.
B) “Expected”: this is our best estimate of what a site WOULD HAVE RECORDED if it did not move, did not have instrument changes, TOBS changes, etc. These “corrections” are not calculated discretely. Rather, a station and all its neighbors are considered. A surface is generated that minimizes the error. Now, that error may be due to a station move, a faulty instrument, an air conditioner, an instrument switch. These are not calculated from the BOTTOM UP; rather, they are estimated from the TOP DOWN.
C) Regional expectation: this is dependent on the gridding one selects.
From the readme. READ carefully.
You want raw data? Go ahead, use it.
You want to know what the EXPECTED values are for any station, given ALL the information? Use that.
You want to know what a regional expectation is? Use that one.
Each of these datasets has a different purpose. What do you want to do?
From the readme, which you’ll skip:
“% The “raw” values reflect the observations as originally ingested by
% the Berkeley Earth system from one or more originating archive(s).
% These “raw” values may reflect the merger of more than one temperature
% time series if multiple archives reported values for this location.
% Alongside the raw data we have also provided a flag indicating which
% values failed initial quality control checks. A further column
% indicates dates at which the raw data may be subject to continuity “breaks”
% due to documented station moves (denoted “1”), prolonged measurement
% gaps (denoted “2”), documented time of observation changes (denoted “3”)
% and other empirically determined inhomogeneities (denoted “4”).
%
% In many cases, raw temperature data contains a number of artifacts,
% caused by issues such as typographical errors, instrumentation changes,
% station moves, and urban or agricultural development near the station.
% The Berkeley Earth analysis process attempts to identify and estimate
% the impact of various kinds of data quality problems by comparing each
% time series to neighboring series. At the end of the analysis process,
% the “adjusted” data is created as an estimate of what the weather at
% this location might have looked like after removing apparent biases.
% This “adjusted” data will generally be free from quality control
% issues and be regionally homogeneous. Some users may find this
% “adjusted” data that attempts to remove apparent biases more
% suitable for their needs, while other users may prefer to work
% with raw values.
%
% Lastly, we have provided a “regional expectation” time series, based
% on the Berkeley Earth expected temperatures in the neighborhood of the
% station. This incorporates information from as many weather stations as
% are available for the local region surrounding this location. Note
% that the regional expectation may be systematically a bit warmer or
% colder than the weather stations by a few degrees due to differences
% in mean elevation and other local characteristics.
%
% For each temperature time series, we have also included an “anomaly”
% time series that removes both the seasonality and the long-term mean.
% These anomalies may provide an easier way of seeing changes through
% time.

For those of us who have been reading Steven Goddard’s blog for some time now, we have seen case after case after case of blatant data tampering. But the real “tell” is that the government data sets always lower the past temps and warm the present. ALWAYS.
There is no way to honestly explain that fact. No honest way. I see no way to ever trust the government data sets, and I don’t really believe that the past records (the original raw data) are really available anymore.

The exception is BEST, which starts with the raw daily data, but they might be getting tripped up into creating some “zombie stations” of their own by the NCDC metadata and resolution improvements to lat/lon. The USHCN station at Luling Texas is listed as having 7 station moves by BEST (note the red diamonds):
BEST and its supporters suffer from their own confirmation bias in several ways.
Who can look at Luling, TX and not conclude that the scalpel is being wielded by Jack the Ripper? Either the scalpel process is wrong, or the station is so corrupted that it should be eliminated; it cannot be saved by any surgeon.
There is some theoretical justification for using a scalpel to split temperature station records at known moves and replacements of equipment. I accept that. But I and others have argued that instrument drift is a significant part of the measured record. You can only split the record if you measured the drift in the first instrument at the time you took it off line. This is a necessary recalibration event and important information that BEST discards. You may know that an MMTS reads 0.5 degrees warmer than a pristine Stevenson Screen, but in what condition is the Stevenson Screen at the time of replacement? We know that Stevenson Screens studied have a tendency to warm with age. Unless you attempt to measure the drift of the instrument at the time of replacement, you should not split the station.
While there are theoretical justifications for splitting temperature records, there are more theoretical justifications for NOT splitting. The primary justification, in my opinion, is the loss of low-frequency information content from shortening segments and ignoring absolute values in preference for slope. This applies a band-pass filter to all temperature records when a LOW-pass filter is what should be applied.
Yes, watch the confirmation bias.
Has BEST ever justified their process because it agrees with NCDC results?
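The drift argument above can be made concrete with a deliberately artificial example. A minimal sketch, assuming Python/NumPy: an instrument drifts warm by 0.005 per month and is reset to truth at each replacement, while the true climate is flat. The absolute record trends near zero, but the average slope of the scalpel’d segments is the full drift rate:

# drift_demo.py -- a sketch of the drift argument above: segments between
# instrument replacements each trend warm, even though the truth is flat.
# All numbers here are artificial, chosen only to illustrate the effect.

import numpy as np

t = np.arange(1000)                 # months
drift = 0.005 * (t % 100)           # warm drift, reset at each replacement
measured = drift                    # true climate is flat (all zeros)

# trend of the absolute record: the resets largely cancel the drift
print('full-record trend:  %.5f per month' % np.polyfit(t, measured, 1)[0])

# scalpel approach: split at the known resets, average the segment slopes
slopes = [np.polyfit(t[s:s + 100], measured[s:s + 100], 1)[0]
          for s in range(0, 1000, 100)]
print('mean segment slope: %.5f per month' % np.mean(slopes))  # ~0.005

Whether real stations behave like this sawtooth is the open question; the sketch only shows that segment slopes alone cannot see the low-frequency information the resets carry.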

Thanks Anthony.
Regards.

@Mark Stoval (@MarkStoval) at 2:17 pm

For those of us who have been reading Steven Goddard’s blog for some time now, we have seen case, after case, after case of blatant data tampering.

Data tampering by whom? It reads as though Steve Goddard is the tamperer.
But I think you mean that Goddard has exposed data tampering by others.

omnologos says: June 28, 2014 at 1:54 pm
> Nick Stokes please find a station with a faulty cable causing readings to be transmitted high
Why just Nick? There are lots of other people with cables here. And why just high? Low is wrong too!
In fact, the outdoor thermometer in my kitchen has a bad seal, and rain water gets into the connection with the thermistor. Rain water is somewhat conductive, and that appears to reduce the resistance of the thermistor, which causes a low reading.
And why just temperature? I could go on and on about audio, video, Ethernet, SCSI, USB, and many other cable types that have earned my scorn and repair (or destruction). The hot electrical outlet due to a loose wire was interesting too.

temp

The statement that the NCDC releases had better include a complete list of all the “peer-reviewed” “science” that is not junk.
This event will require the rewriting of a huge number of papers, and they need to demand that it be done… and they need to follow up with all the so-called “science” journals so they all know the papers based on this data are wrong and need to at the very least include a huge disclaimer about the issue.
If they don’t, they are all but admitting this was planned, and are only backtracking because they got caught and have moved into coverup mode.

temp

PS: someone should FOIA the emails for this data change; bet the dog will eat them real quick.

John Slayton

In August of 09 I attempted to locate the site of the USHCN station 353095 in Fremont, Oregon. MMS had reported it closed on 19 April 1996. By dumb luck I happened on COOP station 353029 a few miles away in Fort Rock, closed in 1918, re-opened on 19 April 1996. The B-91s from both stations were signed by the same J. Wagner.
I assumed that Fort Rock was the continuation of Fremont. However I have never found indication that Fort Rock was ever included in USHCN. As I noted in an earlier thread this week, the USHCN station list generally shows when a record’s source has been changed from one station to another, as, for example when Caldwell, ID, is dropped and Nampa Sugar Station replaces it. I never found such a change noted for Fremont / Fort Rock.
So, prodded by the present discussion, I just looked at the station data for Fremont (353095), and I find to my astonishment that it continues to be reported as station 353095, with an E flag up to the end of 2013.

MattN

So this would be the second SIGNIFICANT issue with the US record found in the last decade. And WE’RE supposed to have the best record and know what we’re doing.

Joseph Bastardi

We have to stop circular firing squads. For instance, Goddard posts so much, so many times, that he may have things in error sometimes. But turf wars among us are like a bunch of theologians arguing over how many angels you can stick on the head of a needle: it really has nothing to do with the search for the truth… in that case something of a higher authority; in the case of CO2, something that I think is, as Bill Gray put it many years ago, a scam (or was it hoax? will have to ask him). To our credit, we are so obsessed with right and wrong that we do argue over small things, and yes, they do matter. But I have found things I did not believe before that I now do. And vice versa. One thing that keeps getting clearer to me: the amount of time and treasure wasted on 1/100th of the GHGs, .04% of the atmosphere, which has 1/1000th the heat capacity of the ocean and, next to the effects of the sun, oceans, and stochastic events, probably cannot be measured outside the noise, is a giant red herring, meant to distract from a bigger agenda, which has nothing to do with our obsessions.
In the end, Thoreau may sum all this up, if I remember it correctly: the sum of all our fictions adds up to a joint reality.

Anthony, I get that this is relevant to what you said to Steven Goddard, but I’ve read his post, and he didn’t say anything about what you describe here. There isn’t any indication that what you describe in this post caused the difference Goddard highlighted.
REPLY: The trends issue came in a subsequent post, related to the first. But the difference between raw and estimated data in his graph is the issue I’m addressing; not sure why you can’t see that. There are in fact places where no raw data exists, but estimated data does. Marysville, case in point – Anthony

famous line from The Green Mile: “how can you be so obtuse?”
It’s from The Shawshank Redemption, but they are both from Stephen King.

Sean P Chatterton

Being a total novice at this, I have to ask: knowing that the figures have been fudged, how do “we” know the raw data has not been tampered with?

A C Osborn

Steven Mosher says: June 28, 2014 at 2:13 pm
You keep boasting about how good BEST is, but their summaries are just as biased as the problem exposed here, and I have the proof, for the UK at least.

JustAnotherPoster

A challenge to Nick Stokes et al.:
Please find a single station that has been adjusted down in the last 10 years of weather history.
The process that bumped one station up should also cool some down.
That’s the challenge.

onlyme

Ric Werme says: June 28, 2014 at 2:23 pm
In all the instrumentation and controls work I did, cable and sensor errors were designed to fail low or open. Safety issue.
Failures were not randomly high or low, averaging to 0.
Perhaps that’s what omnologos is pointing to?

harry

“How can you be so obtuse” was from The Shawshank Redemption.

Justthinkin

Now I know for sure why I keep coming to this site. Mr. Watts, you have shown a level of integrity, honesty, and curiosity that is very rare in this day and age. I would be honoured to eat crow with you; however, seeing as I am in Northern Alberta, I’ll gladly share the duck I am having tonight.

“Wait, let’s get blood from the nearest five guys that look like him and mix it together. Yeah, that’s a reasonable estimate.”
They are computing a spatial average, based on stations. Infilling with neighboring data doesn’t change anything; it just, in the final sum, changes the weighting: the neighboring stations get a bit more weight to cover the area near Luling.
As I showed in the shaded plots, there is plenty of data in the region. It doesn’t depend on Luling. Using a neighbour-based estimate is just the way of getting the arithmetic to work properly. With anomalies you could just leave Luling out completely. With absolute values, you have to do something extra, so that the climatology of the omitted Luling doesn’t create Goddard-spike-type distortions. Estimating from neighbor values is the simplest way to do it properly.
REPLY: Oh Nick, puhlease. When 80% of your network is compromised by bad siting, what makes you think those neighboring stations have any data that’s worth a damn? You are adjusting bad data with… drum roll… more bad data. And that’s why homogenization fails here. It’s perfectly mathematically legitimate, but it’s useless when the data you are using to adjust with is equally crappy, or crappier, than the data you want to “fix”.
The problem with climate science is they really have no handle on just how bad the surface network is. I do. Evan does. John N-G does. Even Mosher has a bit of a clue.
You can’t make a clean estimated signal out of a bunch of muddied signals, ever.
Now it’s well past your bedtime in Australia. Maybe that is why you aren’t thinking clearly. -Anthony
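For what it’s worth, the narrow arithmetic point Stokes makes is checkable in a few lines (Python, hypothetical numbers): infilling one absolute-value station with the mean of its neighbors gives exactly the same regional average as dropping it and upweighting the neighbors. The dispute above is not about that arithmetic, but about whether the neighbors’ data deserves the extra weight:

# infill_weighting.py -- a numeric sketch of the infilling point above:
# replacing one station with the mean of its neighbors equals dropping it
# and reweighting the neighbors. All values here are hypothetical.

neighbors = [22.4, 23.1, 21.8, 22.9]          # hypothetical absolute temps
infilled = sum(neighbors) / len(neighbors)    # estimate for the missing station

# 5-station average with the infilled value...
avg_infilled = (sum(neighbors) + infilled) / 5
# ...equals the 4-station average with each neighbor upweighted to 1/4
avg_reweighted = sum(0.25 * v for v in neighbors)

print(avg_infilled, avg_reweighted)   # identical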

Jeff D.

Anthony, humble pie helps cover the taste of crow, in my personal experience. Your humility has been displayed for all; well done, sir.
Steve G, thank you for living by my personal motto: “Question Freaking Everything”. 🙂

Anthony wrote:

Then there was my personal bias over previous episodes where Goddard had made what I considered grievous errors and refused to admit to them: the claim of CO2 freezing out of the air in Antarctica, later shown by experiment to be impossible; and

As one of the principals in the CO2 “frost” brouhaha, that affair still leaves a bad taste in my mouth. I’m glad I have good company.
The ICCC in Las Vegas will be interesting. Maybe I’ll take a vow of silence.

At Nick… I keep asking: if the station read warm, not cold, would any of the code have picked this up?
Personally, I think the code looks only for odd ‘cold’ readings and moves them up, i.e. the opposite of adjusting for UHI.
If you can’t get the maths to work, you can’t get the maths to work. Admit that. Estimating leaves you wide open…