Independent Review Discovers that NCDC Fumbles Data Handling in GHCN Climate Data

Guest essay by Bob Koss

Being an old retired guy with time on my hands, this summer I decided to find out just how well GHCN-Monthly follows their own methodology in regard to data collection. What I discovered is, they don’t. My remarks below relate strictly to the GHCN monthly unadjusted dataset on which their final adjusted dataset is based. At the end of this article are links to some verifications of what I discuss.

For those unfamiliar with the organizations involved, a few terms are defined.

The Global Historical Climatology Network (GHCN), a part of the National Climatic Data Center (NCDC), is the repository other global temperature data analysts turn to for many of their data sources. Monthly Climatic Data for the World (MCDW) is also a part of NCDC and separately compiles a less extensive set of monthly data than GHCN. The US Historical Climatology Network (USHCN) is a network of stations entirely within the continental US and is also part of NCDC. The Met Office is a UK data source whose stations in many cases overlap with NCDC sources.

GHCN created a table of data sources ranked by priority (quality). The highest-priority data is supposed to be used whenever multiple sources are available for the same station. This rule might as well not exist, since they don’t follow it. Evidently it is a rule for PR purposes only, not one they feel obliged to follow.

Here is their description of that rule from the methodology paper linked near the end of this post.

[56] The data integration phase begins by assembling and merging the various source level data sets. Although a single datum may be provided by more than one source, only one value is added to version 3 for any particular month. The datum is selected based on availability and a hierarchical process involving priority levels based on the reliability and quality of the source. Data from sources considered to be of higher quality and reliability are used preferentially over other sources. Table 3 lists the sources, and their order of assemblage (highest priority listed first). For example, if a non-missing datum is present for the same date/location from data source M (MCDW) and data source P (CLIMAT bulletin), the datum from data source M will be placed in the data set. The source from which each datum originated is indicated in the version 3 data set by a source flag as shown in the table. Daily reconstruction of the data set using this method ensures that any changes made in the source data sets get incorporated into GHCN-M while also allowing for the reproduction of the version 3 data set by other institutions or entities.

Table 3 mentioned in the above quote.

Table 3. Source Data Sets From Which GHCN-M Version 3 is Constructed and Maintained

Priority   Source Data Set                                   Source Flag
1          Datzilla (Manual/Expert Assessment)               Z
2          USHCN-M Version 2                                 U
3          World Weather Records                             W
4          KNMI Netherlands (DeBilt only)                    N
5          Colonial Era Archive                              J
6          MCDW (DSI 3500)                                   M
7          MCDW quality controlled but not yet published     C
8          UK Met Office CLIMAT                              K
9          CLIMAT bulletin                                   P
10         GHCN-M Version 2                                  G (a)

(a) For any station incorporated from GHCN-M version 2 that had multiple time series (“duplicates”) for mean temperature, the ‘G’ flag is replaced by a number from 0 to 9 that corresponds to the particular duplicate in version 2 from which it originated. This number is the 12th digit in the version 2 station identifier.
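The selection rule in the quoted paragraph and the priority order in Table 3 can be sketched in a few lines of Python. This is only an illustration of the rule as the paper describes it, not GHCN's actual code. The function names, the dictionary input, and the -9999 missing-value sentinel are assumptions made for the sketch.

```python
# Priority order copied from Table 3, highest priority first.
SOURCE_PRIORITY = ["Z", "U", "W", "N", "J", "M", "C", "K", "P", "G"]
MISSING = -9999  # assumed sentinel for a missing monthly value


def priority_rank(flag):
    """Return 0 for the highest-priority source, larger numbers for lower ones."""
    # Per the table footnote, digits 0-9 stand in for 'G' (version 2 duplicates).
    if flag.isdigit():
        flag = "G"
    return SOURCE_PRIORITY.index(flag)


def select_datum(candidates):
    """Pick one value for a station/month from {source_flag: value}.

    Returns (flag, value) for the highest-priority non-missing source,
    or None when every source is missing.
    """
    available = {f: v for f, v in candidates.items() if v != MISSING}
    if not available:
        return None
    best = min(available, key=priority_rank)
    return best, available[best]
```

With this rule, a month that has both an MCDW value (M) and a Met Office value (K) should always carry the M flag. A K flag should only appear when MCDW has nothing for that month.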

Around June 6th, 2014, GHCN rolled back a higher-quality source to a lower one by changing 2013 data from MCDW to Met Office data (16,000+ months of data). This resulted in numerous value changes and an increase in the amount of missing data. Those changes remained for over a month, until I noticed them while comparing my June 3rd file with one from early July. I inquired about the changes, and the next day, July 10th, the higher-quality source was re-inserted. A couple of days later I was told by one of the head GHCN team members that it was “an unintentional processing problem that occurred with one of our ingest streams”. They did update their status.txt file, unsurprisingly in about as low-key a way as possible.

I find their reason unpersuasive. Why are they even touching 2013 data, unless to overwrite it with a higher-quality source? I wouldn’t expect them to still be streaming 2013 data; I would expect it to be at hand and archived on site. They rebuild their dataset daily. What competent organization would not do a sanity check on their new build by running a simple data comparison to the previous dataset?
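The kind of build-to-build comparison I have in mind can be sketched as follows. It is a minimal illustration, not the procedure I or GHCN actually ran. It assumes each record is a fixed-width line (11-character station ID, 4-character year, 4-character element, then twelve 8-character groups holding a 5-character value and three flag characters, the last being the source flag) and that -9999 marks a missing value. Check those assumptions against the readme in the v3 directory before relying on them. The function names are hypothetical.

```python
MISSING = -9999  # assumed missing-value sentinel


def parse_line(line):
    """Split one fixed-width record into (station, year, element, months).

    Assumed layout: 11-char station ID, 4-char year, 4-char element, then
    12 groups of 8 chars (5-char value followed by three 1-char flags, the
    last being the source flag).
    """
    line = line.rstrip("\n").ljust(115)
    station, year, element = line[0:11], int(line[11:15]), line[15:19]
    months = []
    for m in range(12):
        start = 19 + 8 * m
        months.append((int(line[start:start + 5]), line[start + 7]))
    return station, year, element, months


def load_build(lines):
    """Index a build as {(station, year, element, month): (value, source_flag)}."""
    data = {}
    for line in lines:
        if not line.strip():
            continue
        station, year, element, months = parse_line(line)
        for month, (value, flag) in enumerate(months, start=1):
            data[(station, year, element, month)] = (value, flag)
    return data


def compare_builds(old_lines, new_lines):
    """Count how months present in the old build differ in the new build."""
    old, new = load_build(old_lines), load_build(new_lines)
    report = {"value_changed": 0, "newly_missing": 0,
              "source_changed": 0, "removed": 0}
    for key, (old_value, old_flag) in old.items():
        if key not in new:
            if old_value != MISSING:
                report["removed"] += 1
            continue
        new_value, new_flag = new[key]
        if old_value != MISSING and new_value == MISSING:
            report["newly_missing"] += 1
        elif new_value != old_value:
            report["value_changed"] += 1
        if new_flag != old_flag:
            report["source_changed"] += 1
    return report
```

A daily build that suddenly reports thousands of source changes or newly missing months for a year that closed long ago is the kind of result that ought to stop the release until someone has looked at it.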

My latest query, sent about a week ago, concerns lower-quality data still being used at least as far back as 2001. For Australia between 2003 and 2013, 98% of the data is sourced to the Met Office, even though the higher-quality MCDW has much of that data available. I don’t understand why they aren’t using the higher-priority MCDW data. Since 2001 there are 2000-3000 pieces of Met Office data still in use each year, and less than a third of it relates to Australia. Other countries in the database might also still be listed with inferior data simply because their data hasn’t been properly upgraded. A couple of emails were exchanged, but no reason was given and no changes were made. At this point I think it is questionable whether GHCN will thoroughly investigate and upgrade to higher-quality sources where appropriate. It will be a pleasant surprise if they do.
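A tally of this sort is easy to reproduce once the records are parsed. The sketch below assumes records are already available as (station, year, month, value, source_flag) tuples, that the first three characters of the station ID are the country code, and that -9999 marks a missing value. The function names and the "501" prefix used for Australia in the usage note are assumptions for illustration.

```python
from collections import Counter

MISSING = -9999  # assumed missing-value sentinel


def source_share(records, country_prefix, first_year, last_year):
    """Fraction of non-missing months per source flag for one country."""
    counts = Counter()
    for station, year, month, value, flag in records:
        if (station.startswith(country_prefix)
                and first_year <= year <= last_year
                and value != MISSING):
            counts[flag] += 1
    total = sum(counts.values())
    return {flag: n / total for flag, n in counts.items()} if total else {}


def upgrade_candidates(ghcn_records, mcdw_values, lower_flags=("K", "P")):
    """List (station, year, month) keys where GHCN carries a lower-priority
    source flag although a non-missing MCDW value exists for the same month.

    mcdw_values is a dict {(station, year, month): value}.
    """
    hits = []
    for station, year, month, value, flag in ghcn_records:
        key = (station, year, month)
        if flag in lower_flags and mcdw_values.get(key, MISSING) != MISSING:
            hits.append(key)
    return hits
```

Calling `source_share(records, "501", 2003, 2013)` would give the share by source flag for Australian stations, provided "501" really is the Australian prefix in the station list. `upgrade_candidates` lists the months where the priority rule says MCDW should have been used instead.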

Below is a graphic example of how much difference the data source can make in the monthly temperature record. I’m not saying all stations have differences of such a magnitude, or that this shows the largest/smallest difference, or that all stations go in a similar direction. I haven’t checked, but wouldn’t be surprised if the differences tilted quite a bit in one direction.

Some digging in July revealed that the entire continent of Australia is devoid of data for September, October and November 2011. They did have September and October data in v3.0 when it was superseded by v3.1 in early November 2011. v3.1 discarded October when it launched, leaving only September intact. At some point since then they also discarded September. I emailed them about this on July 31st and a couple of times since. The latest word is that they are trying to get the Met Office to re-transmit the data. MCDW has much of that data, and since GHCN considers it a higher-quality source than the Met Office, I don’t understand why they aren’t using that instead.

Final example for today. On October 2nd this year they deleted all the August data for the rest of the world (ROW), leaving only USHCN data in the database. They even deleted US station data that is not part of USHCN. Amazingly, they still managed to add ROW data for September during the deletion period. The August ROW data was missing until October 8th, when they re-inserted it. I still don’t know why they deleted it. I mentioned it in an email about a week ago, and no reason has been provided. The data deletion did increase the mean value of the remaining August data by 0.9C. Was there some announcement concerning global temperatures for summer or August during that period?
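A gap like this is simple to detect automatically. The sketch below assumes parsed records in the form (station, year, month, value), a three-character country prefix at the start of the station ID, and -9999 as the missing-value sentinel. The function name is hypothetical.

```python
MISSING = -9999  # assumed missing-value sentinel


def months_without_data(records, country_prefix, year):
    """Return the months (1-12) in which no station of the given country
    reports a non-missing value for the given year."""
    months_seen = set()
    for station, rec_year, month, value in records:
        if (station.startswith(country_prefix)
                and rec_year == year
                and value != MISSING):
            months_seen.add(month)
    return [m for m in range(1, 13) if m not in months_seen]
```

Run against each daily build, a check of this kind would have flagged the moment a whole country's months disappeared between one version and the next.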
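The arithmetic behind a shift like that is straightforward: removing a non-random subset of stations changes the mean of what remains. The sketch below uses made-up station names and values purely to show the calculation. It does not reproduce the 0.9C figure, and the function name is hypothetical.

```python
from statistics import mean


def mean_shift_after_deletion(values_by_station, kept_stations):
    """Difference between the mean of the kept stations and the mean of all
    stations, for one month of data given as {station: value}."""
    before = mean(values_by_station.values())
    after = mean(v for s, v in values_by_station.items() if s in kept_stations)
    return after - before


# Hypothetical August values in degrees C; only the two warmer stations are kept.
august = {"cool_station": 10.0, "warm_station_1": 20.0, "warm_station_2": 30.0}
shift = mean_shift_after_deletion(august, {"warm_station_1", "warm_station_2"})
```

Here the mean of all three stations is 20.0 and the mean of the two kept stations is 25.0, so `shift` is 5.0.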

With such erratic data handling, the accuracy of their product is questionable.

This post is already long enough, so I’ll end here.

Reference links:

Free paper on GHCN v3 methodology. pg. 11 explains source priority and processing. http://onlinelibrary.wiley.com/doi/10.1029/2011JD016187/pdf

Daily issued data files along with status.txt, a readme, and other stuff.

ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/

Published MCDW data by station(ends 2011).

ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/monthly/stage2/mcdw/

Published MCDW data by month. Current to Aug 2014.

http://www1.ncdc.noaa.gov/pub/data/mcdw/

A compilation of annual data concerning the 2013 roll-back, October 2014 deletion, and the missing Australian data in 2011.

http://goo.gl/UZ73YF

ROM

November 3, 2014 3:53 pm

On second thoughts, just maybe the surveillance and personal data acquisition agencies have put in place quite strong checks and balances and strong overseer groups to keep a tight rein on the activities and accuracy of the actual personal data collection sections of their organisations.
If so, that would point up the contrast with climate science: the utter incompetence, the errors and the complete lack of credible data collection and processing standards which have been allowed to become the norm in what has become just another branch of hubris-laden, self-promoting, advocacy-driven climate alarmist science.
And on the basis of this error-riddled science the world has expended close to a trillion dollars over the last decade in a totally futile attempt to stop or prevent the chimera of a mankind-created catastrophic warming due to anthropogenic CO2, a CO2-induced warming for which no evidence or proof has been provided that it actually exists in the real-world climate.
That is, except for the increasingly recognised fact that the data behind all the claims of increasing global temperatures relies totally on incomplete, corrupted, constantly changing, in many aspects irrelevant, unchecked and unverified data, processed in ways suspected of being inadvertently or perhaps even deliberately corrupted, by organisations run by global warming activist scientists.

Andrew Krause

November 3, 2014 4:50 pm

“the answer is always statistically indistinguishable from X.” Just what is X, an imaginary number? Do I need 1 station or 100 or 200 to make an X? If I take all cities, is that the same X as the X from all rural stations? When I have an X, is it the X for the Midwest, the Arctic or the whole world? If I run the same formula on the data tomorrow, will last year’s X be the same as yesterday’s value of last year’s X? If not, can we say X is not accurate but should be getting better all the time? How can we know? Good times…

TheLastDemocrat

Reply to Andrew Krause

November 4, 2014 5:47 am

The point about the X is perfectly legitimate. If there is an overall, predominant trend, you will see it in most any sample you grab.
This is the same concept I used when I posted the Wolfram Alpha strategy for checking the long-term temperature trend at any location you might want. With various specific records going back to 40, 60, 80, 100 years, nearly all sites show flat temp trends.
http://www.wolframalpha.com/
Enter “average temperature Istanbul [or Constantinople] past 80 years.”
Nary a Hockey Stick anywhere.

Centinel2012

November 3, 2014 6:20 pm

Reblogged this on Centinel2012 and commented:
This is a simple case of the Fox guarding the hen house.
The agenda of the politicians is supported by the agencies that they manage. Would anyone in business, or any other place where you were employed, ever turn in a report that was not in support of the manager or owner of that business? I think not!
So honest reporting from a government agency, showing that things are not what the president wants shown, is very, very unlikely!

Pamela Gray

November 3, 2014 7:28 pm

My grandparents said the same thing in the 30’s. They said it was getting hotter. And then it got colder. Only the history challenged take today’s weather and think humans are to blame for this current weather pattern variation. Ground stations were NEVER meant to be exacting. They are ballpark sensors. They can tell us to wear a snow suit, not a bikini. But they can’t tell us that the temperature is .3 degrees colder or warmer than last year. And people who think sensors can do that must not have enough important sh** to do during daylight hours.

Mark Luhman

Reply to Pamela Gray

November 3, 2014 8:57 pm

Well said. Whenever someone comes up with precision that exceeds their instrumentation, claiming accuracy beyond that of the instruments, I always think of the old movie The Music Man. My first question is: what BS are you trying to sell me?
