Guest essay by Bob Koss
Being an old retired guy with time on my hands, this summer I decided to find out just how well GHCN-Monthly follows their own methodology in regard to data collection. What I discovered is, they don’t. My remarks below relate strictly to the GHCN monthly unadjusted dataset on which their final adjusted dataset is based. At the end of this article are links to some verifications of what I discuss.
For those unfamiliar with the organizations involved, a few terms are defined.
The Global Historical Climatology Network (GHCN), a part of the National Climatic Data Center (NCDC), is the repository other global temperature data analysts turn to for many of their data sources. Monthly Climatic Data of the World (MCDW) is also a part of NCDC and separately compiles a less extensive set of monthly data than GHCN. The US Historical Climatology Network (USHCN) is a network of stations completely within the continental US and is also part of NCDC. The Met Office is a UK data source of stations, many of which overlap with NCDC sources.
GHCN created a table of data sources, ranked by priority (quality). The highest priority data is to be used whenever multiple sources are available for the same station. This rule might as well not exist, since they don’t follow it. Evidently it is only a rule for PR purposes and not really necessary to follow.
Here is their description of that rule from the methodology paper linked near the end of this post.
[56]The data integration phase begins by assembling and
merging the various source level data sets. Although a single
datum may be provided by more than one source, only one
value is added to version 3 for any particular month. The
datum is selected based on availability and a hierarchical
process involving priority levels based on the reliability and
quality of the source. Data from sources considered to be of
higher quality and reliability are used preferentially over
other sources. Table 3 lists the sources, and their order of
assemblage (highest priority listed first). For example, if a
non-missing datum is present for the same date/location
from data source M (MCDW) and data source P (CLIMAT
bulletin), the datum from data source M will be placed in the
data set. The source from which each datum originated is
indicated in the version 3 data set by a source flag as shown
in the table. Daily reconstruction of the data set using this
method ensures that any changes made in the source data
sets get incorporated into GHCN-M while also allowing for
the reproduction of the version 3 data set by other institutions
or entities.
Table 3 mentioned in the above quote.
Table 3. Source Data Sets From Which GHCN-M Version 3 is Constructed and Maintained

| Priority | Source Data Set | Source Flag |
| --- | --- | --- |
| 1 | Datzilla (Manual/Expert Assessment) | Z |
| 2 | USHCN-M Version 2 | U |
| 3 | World Weather Records | W |
| 4 | KNMI Netherlands (DeBilt only) | N |
| 5 | Colonial Era Archive | J |
| 6 | MCDW (DSI 3500) | M |
| 7 | MCDW quality controlled but not yet published | C |
| 8 | UK Met Office CLIMAT | K |
| 9 | CLIMAT bulletin | P |
| 10 | GHCN-M Version 2 | G (a) |

(a) For any station incorporated from GHCN-M version 2 that had multiple time series (“duplicates”) for mean temperature, the ‘G’ flag is replaced by a number from 0 to 9 that corresponds to the particular duplicate in version 2 from which it originated. This number is the 12th digit in the version 2 station identifier.
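As a rough illustration of the rule quoted above, here is a minimal Python sketch of priority-based selection. It is not GHCN's actual code. The function names and the dictionary layout are assumptions made for the example; only the flag order comes from Table 3.

```python
# Source flags in Table 3 order, highest priority first.
# "G" stands for GHCN-M Version 2. Per the table footnote it may appear
# as a digit 0-9 in the data, which this sketch maps to the same rank.
PRIORITY = ["Z", "U", "W", "N", "J", "M", "C", "K", "P", "G"]
RANK = {flag: rank for rank, flag in enumerate(PRIORITY)}


def rank_of(flag):
    """Return the priority rank of a source flag (0 is highest priority)."""
    if flag.isdigit():
        return RANK["G"]
    return RANK[flag]


def select_datum(candidates):
    """Pick one value for a station-month from several sources.

    candidates maps a source flag to a value, or to None if missing
    (a hypothetical layout chosen for this sketch).
    Returns (flag, value) for the highest-priority non-missing source,
    or None if every source is missing.
    """
    available = [(rank_of(flag), flag, value)
                 for flag, value in candidates.items()
                 if value is not None]
    if not available:
        return None
    _, flag, value = min(available, key=lambda item: item[0])
    return flag, value
```

Under this reading of the rule, a station-month with both an MCDW value (M) and a Met Office value (K) should always end up carrying the M flag, which is the behaviour the rest of this post checks for.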
Around June 6th, 2014, GHCN rolled back a higher quality source to a lower one by changing 2013 data from MCDW to Met Office data (16000+ months of data). This resulted in numerous value changes and an increase in the amount of missing data. Those changes remained for over a month until I noticed them while comparing my June 3rd file with one from early July. I inquired about the changes. The next day, July 10th, the higher quality source was re-inserted. I was told a couple of days later, by one of the head GHCN team members, that it was “an unintentional processing problem that occurred with one of our ingest streams”. They did update their status.txt file, unsurprisingly in about as low-key a way as possible.
I find their reason unpersuasive. Why are they even touching 2013 data unless to over-write with a higher quality source? I wouldn’t expect them to still be streaming 2013 data, but have it always at hand and archived on site. They rebuild their dataset daily. What competent organization would not do a sanity check on their new build by running a simple data comparison to the previous dataset?
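The kind of sanity check meant here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each build is taken to be already parsed into a dictionary keyed by (station id, year, month) with a (value, source flag) pair as the entry, and the function name `compare_builds` is hypothetical.

```python
def compare_builds(previous, current):
    """Compare two dataset builds and report what changed.

    Each build maps (station_id, year, month) -> (value, source_flag),
    with value set to None when the month is missing. This layout is an
    assumption for the sketch, not the format of the GHCN files.
    """
    changed, went_missing, source_changed = [], [], []
    for key in sorted(previous):
        old_value, old_flag = previous[key]
        new_value, new_flag = current.get(key, (None, None))
        if new_value is None:
            # A month that used to have data and no longer does.
            if old_value is not None:
                went_missing.append(key)
            continue
        if old_value is not None and new_value != old_value:
            changed.append(key)
        if new_flag != old_flag:
            source_changed.append(key)
    return {
        "changed": changed,
        "went_missing": went_missing,
        "source_changed": source_changed,
    }
```

A daily build could run something like this against the previous day's build and flag the run for review if any of the three lists for past years is unexpectedly long.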
My latest query, from about a week ago, concerns the continued use of lower quality data at least as far back as 2001. For Australia between 2003-2013, 98% of their data is sourced to Met Office, but the higher quality MCDW has much of that data available. I don’t understand why they aren’t using the higher priority MCDW data. There are 2000-3000 pieces of Met Office data per year still being used since 2001, and less than 1/3rd of it is related to Australia. Other countries in the database might also still be listed with inferior data simply because their data hasn’t been properly upgraded. A couple of emails were exchanged, but no reason was given and no changes were made. At this point I think it is questionable if GHCN will thoroughly investigate and upgrade to higher quality sources where appropriate. It will be a pleasant surprise if they do.
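A figure like the 98% above is the share of non-missing monthly values carrying a given source flag. The Python sketch below shows one way such a tally could be made. The record layout, the function name `source_share`, and the idea of selecting a country by a station id prefix are all assumptions for the example, and the prefix used in any real run would have to be taken from the GHCN documentation.

```python
from collections import Counter


def source_share(records, id_prefix, first_year, last_year):
    """Return the fraction of non-missing values per source flag.

    records is an iterable of (station_id, year, value, source_flag)
    tuples, one per station-month, with value None when missing
    (a hypothetical layout). Only stations whose id starts with
    id_prefix and years in [first_year, last_year] are counted.
    """
    counts = Counter()
    for station_id, year, value, flag in records:
        if not station_id.startswith(id_prefix):
            continue
        if not first_year <= year <= last_year:
            continue
        if value is None:
            continue
        counts[flag] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {flag: count / total for flag, count in counts.items()}
```

Running the same tally year by year would show when a lower priority flag started to dominate a country's record.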
Below is a graphic example of how much difference the data source can make in the monthly temperature record. I’m not saying all stations have differences of such a magnitude, or that this shows the largest/smallest difference, or that all stations go in a similar direction. I haven’t checked, but wouldn’t be surprised if the differences tilted quite a bit in one direction.
Some digging in July led to finding that the entire continent of Australia is devoid of data for September, October and November 2011. They did have September and October data in v3.0 when it was superseded by v3.1 in early November 2011. v3.1 discarded October when it launched, leaving only September intact. At some point since then they also discarded September. I emailed them about this on July 31st and a couple of times since then. The latest is that they are trying to get Met Office to re-transmit the data. MCDW has much of that data, and since GHCN considers them a higher quality source than Met Office, I don’t understand why they aren’t using that instead.
Final example for today. On October 2nd this year they deleted all the August data for the rest of the world (ROW), leaving only USHCN data in the database. They even deleted US station data not part of USHCN. Amazingly, they still managed to add ROW data for September during the deletion period. The August ROW data was missing until October 8th, when they re-inserted it. I still don’t know why they deleted it. I mentioned it in an email about a week ago. No reason has been provided. The data deletion did increase the mean value of the remaining August data by 0.9C. Was there some announcement concerning global temperatures for summer or August during that period?
With such erratic data handling, the accuracy of their product is questionable.
This post is already long enough, so I’ll end here.
Reference links:
Free paper on GHCN v3 methodology. pg. 11 explains source priority and processing. http://onlinelibrary.wiley.com/doi/10.1029/2011JD016187/pdf
Daily issued data files along with status.txt, a readme, and other stuff.
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/
Published MCDW data by station(ends 2011).
ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/monthly/stage2/mcdw/
Published MCDW data by month. Current to Aug 2014.
http://www1.ncdc.noaa.gov/pub/data/mcdw/
A compilation of annual data concerning the 2013 roll-back, October 2014 deletion,
and the missing Australian data in 2011.
http://goo.gl/UZ73YF
![33y3mvq[1]](https://wattsupwiththat.files.wordpress.com/2014/11/33y3mvq1.gif?resize=605%2C340)
On second thoughts, just maybe the surveillance and personal data acquisition agencies have put in place quite strong checks and balances and strong overseer groups to keep a tight rein on the activities and accuracy of the actual personal data collection sections of their organisations.
If so that would point to the contrasts in climate science of the utter incompetency and error riddled and complete lack of credible data collection and processing standards which have been allowed to become the norm in what has become just another branch of the hubris laden, self promoting advocacy driven climate alarmist science.
And on the entire basis of this error riddled science the world has expended close to a trillion dollars over the last decade in a totally futile attempt to stop or prevent the chimera of a mankind-created catastrophic warming due to anthropogenic CO2, a CO2 induced warming for which no evidence or proof has been provided that it actually exists in the real world climate.
Except for the increasingly recognised fact that the data behind all the claims of increasing global temperatures relies totally on incomplete, corrupted, constantly changing, in many aspects irrelevant, unchecked and unverified data, with processing suspected of being either inadvertently or perhaps even deliberately corrupted, from organisations run by global warming activist scientists.
“the answer is always statistically indistinguishable from X.” Just what is X, an imaginary number? Do I need 1 station or 100 or 200 to make an X? If I take all cities is that the same X as the X from all rural? When I have an X is the X for the Mid-west, the Arctic or the whole world? If I run the same formula on the data tomorrow, will last year’s X be the same as yesterday’s last year’s X? If not, can we say X is not accurate but should be getting better all the time? How can we know? Good times…
The point about the X is perfectly legitimate. If there is an overall, predominant trend, you will see it in most any sample you grab.
This is the same concept I used when I posted the Wolfram Alpha strategy for checking the long-term temperature trend at any location you might want. With various specific records going back to 40, 60, 80, 100 years, nearly all sites show flat temp trends.
http://www.wolframalpha.com/
Enter “average temperature Istanbul [or Constantinople] past 80 years.”
Nary a Hockey Stick anywhere.
Reblogged this on Centinel2012 and commented:
This is a simple case of the Fox guarding the hen house.
The agenda of the politicians is supported by the agencies that they manage. Would anyone in business, or any place else where they were employed, ever turn in a report that was not in support of the manager or owner of that business? I think not!
So honest reporting from an agency of the government, showing that things are not what the president wants shown, is very, very unlikely!
My grandparents said the same thing in the 30’s. They said it was getting hotter. And then it got colder. Only the history challenged take today’s weather and think humans are to blame for this current weather pattern variation. Ground stations were NEVER meant to be exacting. They are ballpark sensors. They can tell us to wear a snow suit, not a bikini. But they can’t tell us that the temperature is .3 degrees colder or warmer than last year. And people who think sensors can do that must not have enough important sh** to do during daylight hours.
Well said. Whenever someone comes up with precision that exceeds their instrumentation, with claims of accuracy beyond that of their instrumentation, I always think of the old movie The Music Man, and my first question is what BS are you trying to sell me.