"Gross" Data Errors in GHCN V2. for Australia

UPDATE: 11/11/10 An erratum has been posted; see the end of this essay – Anthony

Port Hedland, WA BoM office

Guest post by Ed Thurstan of Sydney, Australia

Synopsis

This study shows that the NOAA-maintained GHCN V2 database contains errors in the calculation of a Mean temperature from a Maximum and a Minimum. 144 years of data from 36 Australian stations are affected.

Means are published when the underlying Maximums and/or Minimums have been rejected.

Analysis

The Australian Bureau of Meteorology (BOM) provides NOAA with “entirely raw instrumental data via the Global Telecommunications System”. In the process of comparing BOM Max and Min outputs with NOAA “Raw” inputs, some oddities were noticed.

A database of Australian data (Country 501) was set up for each of GHCN V2.Max, V2.Mean, V2.Min. Each record consists of WMO Station ID, Modifier, Dup, Year, then 12 months of data Jan-Dec.

“Modifier” and “Dup” are codes which allow inclusion of multiple sets of data for the same station, or what appears to be the same station. This data is included rather than losing it in case it may be useful to someone. For this exercise, Modifier=0 and Dup=0 was selected.

Only those stations and years where all 12 months of data are present were selected. This results in about 14,000 station-years of monthly data being compared.

A compound key of Station ID concatenated with year was set up.
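As a rough illustration of the selection steps above, the following Python sketch parses one record, applies the Modifier=0, Dup=0 and all-12-months filter, and builds the compound key. The fixed-width layout (3-character country code, 5-character WMO number, 3-character modifier, 1-character duplicate digit, 4-character year, then twelve 5-character values in tenths of a degree, with -9999 for missing) is an assumption based on the record description above. The names `Record`, `parse_line`, `is_selected` and `compound_key` are illustrative only, and the monthly values in the sample line are made up.

```python
from typing import NamedTuple, Tuple

MISSING = -9999  # value GHCN V2 uses for a missing or rejected month


class Record(NamedTuple):
    country: str             # 3-character country code, "501" for Australia
    wmo: str                 # 5-character WMO station number
    modifier: str            # 3-character modifier
    dup: str                 # 1-character duplicate number
    year: int
    months: Tuple[int, ...]  # Jan-Dec, in tenths of a degree C


def parse_line(line: str) -> Record:
    # Assumed layout: 3 + 5 + 3 + 1 + 4 characters, then 12 fields of width 5.
    months = tuple(int(line[16 + 5 * i:21 + 5 * i]) for i in range(12))
    return Record(line[0:3], line[3:8], line[8:11], line[11:12],
                  int(line[12:16]), months)


def is_selected(rec: Record) -> bool:
    # Australia only, Modifier=0 and Dup=0, and all 12 months present.
    return (rec.country == "501" and int(rec.modifier) == 0
            and int(rec.dup) == 0 and MISSING not in rec.months)


def compound_key(rec: Record) -> str:
    # Station ID concatenated with year.
    return f"{rec.country}{rec.wmo}{rec.year}"


# Made-up monthly values, for illustration only.
values = list(range(300, 312))
line = "50194312" + "000" + "0" + "1996" + "".join(f"{v:5d}" for v in values)
rec = parse_line(line)
print(compound_key(rec), is_selected(rec))
```

Building the sample line from its parts, rather than typing it out, keeps the column widths in the example consistent with the assumed layout.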

From Max and Min, an arithmetic mean was calculated to compare with V2.Mean.
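The comparison can be sketched as follows, assuming the three files have already been loaded into dictionaries keyed by the compound key, each holding a list of monthly values in tenths of a degree. The function name `mean_differences`, the keys and the three-month excerpts are hypothetical and serve only to show the arithmetic.

```python
def mean_differences(vmax, vmin, vmean):
    """Return {key: [reported - calculated, ...]} for keys present in all three tables."""
    diffs = {}
    for key in sorted(vmax.keys() & vmin.keys() & vmean.keys()):
        diffs[key] = [
            reported - (tmax + tmin) / 2
            for tmax, tmin, reported in zip(vmax[key], vmin[key], vmean[key])
        ]
    return diffs


# Made-up three-month excerpts (tenths of a degree C); real records hold 12 months.
vmax = {"A1990": [300, 311, 320], "B1990": [250, 260, 270]}
vmin = {"A1990": [200, 210, 220]}
vmean = {"A1990": [250, 261, 259], "C1990": [100, 100, 100]}
print(mean_differences(vmax, vmin, vmean))  # {'A1990': [0.0, 0.5, -11.0]}
```

Only keys found in all three tables are compared, so a station-year missing from any one file is silently skipped rather than treated as an error.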

Observation 1.

NOAA always rounds up to the nearest tenth of a degree in calculating V2.Mean.

Calculating (Reported V2.Mean – Calculated Mean) mostly gives a result of zero or 0.5, in units of tenths of a degree, as shown in this example:

This appears to be poor practice, when the usual approach to neutralising bias is to round to the nearest odd or even number. However, the bias is small, as units are tenths of a degree.
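To make the size of the rounding bias concrete, here is a small Python sketch comparing round-half-up with round-half-to-even on values stored in tenths of a degree. It uses the standard `decimal` module; the helper names `rounded_mean` and `mean_bias` and the input pairs are made up for illustration, and this is not a reconstruction of NOAA's actual procedure. Note that `ROUND_HALF_UP` rounds halves away from zero, so it matches "rounding up" only for positive values.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN


def rounded_mean(tmax, tmin, mode):
    # Exact mean of two integer readings (tenths of a degree), then rounded
    # to a whole number of tenths using the given rounding mode.
    exact = Decimal(tmax + tmin) / 2
    return int(exact.quantize(Decimal("1"), rounding=mode))


def mean_bias(pairs, mode):
    # Average of (rounded mean - exact mean) over all pairs, in tenths.
    errors = [rounded_mean(a, b, mode) - Decimal(a + b) / 2 for a, b in pairs]
    return sum(errors) / len(errors)


# Made-up pairs whose sums are odd, so every exact mean ends in .5
pairs = [(301, 200), (302, 201), (303, 202), (304, 203)]
print(mean_bias(pairs, ROUND_HALF_UP))    # 0.5
print(mean_bias(pairs, ROUND_HALF_EVEN))  # 0
```

With these inputs every half-up result is 0.5 tenths too high, while the half-to-even results alternate between 0.5 too low and 0.5 too high and cancel out.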

This observation led to the discovery of larger errors.

Observation 2.

The difference between reported V2.mean and the calculated mean can be substantial.

Here is a cluster of (Reported V2.Mean – Calculated Mean):

For example, Station 94312 (Note: Port Hedland, Western Australia – Photo added: AW)

Port Hedland, WA BoM instrument enclosure - Source: http://www.bom.gov.au/wa/port_hedland/
Port Hedland BoM station from the air

In March 1996, the reported GHCN V2.mean figure for this station is 1.15°C lower than the mean calculated from V2.max and V2.min.

There is no obvious pattern in these errors.

As a spot check, the raw data from GHCN V2 for station 94312 in 1996 is as follows:

The arithmetic mean for March should be (377+256)/2 = 316.5

But NOAA has calculated it as 305, an error of 11.5 tenths of a degree (1.15°C).
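The spot check can be reproduced in a few lines of Python. The figures 377, 256 and 305 are the March 1996 values quoted above. The function name `check_mean` and the tolerance of 0.5 tenths, chosen so that pure rounding differences are not flagged, are assumptions made for this sketch.

```python
def check_mean(tmax, tmin, reported, tolerance=0.5):
    # All values are in tenths of a degree C.
    calculated = (tmax + tmin) / 2
    error = reported - calculated
    # Differences within the tolerance are treated as rounding, not errors.
    label = "rounding" if abs(error) <= tolerance else "gross"
    return calculated, error, label


print(check_mean(377, 256, 305))  # (316.5, -11.5, 'gross')
```

Running the same check over every station-month would give one way of listing the affected records.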

WMO Station 50194312 is BOM Station 04032.

Here are the monthly averages calculated from BOM daily data:

With one exception, they are within 0.1°C of the NOAA figures. The exception is 0.2°C.

There are 144 years of data from 36 Australian stations affected.

GISS V2 carries NOAA’s version of V2.Mean, so GISS will be propagating the error.

Full Error List

The full error list of stations is available on request. It comprises 144 years of data from 36 Stations.

Observation 3.

Unless there is a severe problem in transmitting BOM data to NOAA, NOAA’s quality control procedures appear to reject a lot of superficially good BOM data.

When this happens, NOAA replaces the suspect data with “-9999” and writes a QC.failed record.

GHCN V2.mean now contains many instances where a mean is reported, but the underlying V2.max and/or V2.min are flagged -9999. That is, they are not shown.

For example, station 50194312 (BOM 04032) shows:

Spot check. Following is matching raw data from GHCN V2 for checking purposes:

Note that Means are published when corresponding Max and Mins are absent in Jan, Feb and April.
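A check of this kind can be sketched in Python as follows: for one station-year, list the months where a mean is present but the maximum or minimum is -9999. The function name `orphan_means` is illustrative and the values below are made up; they simply mirror the pattern described above, with Jan, Feb and April missing from Max and Min.

```python
MISSING = -9999
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def orphan_means(vmax, vmin, vmean):
    # Months where a mean is published although max and/or min is missing.
    return [
        MONTHS[i]
        for i, (tmax, tmin, mean) in enumerate(zip(vmax, vmin, vmean))
        if mean != MISSING and (tmax == MISSING or tmin == MISSING)
    ]


# Made-up values in tenths of a degree C, for illustration only.
vmax = [MISSING, MISSING, 370, MISSING, 320, 280, 270, 290, 320, 350, 370, 380]
vmin = [MISSING, MISSING, 250, MISSING, 190, 150, 130, 140, 160, 190, 220, 250]
vmean = [320, 320, 310, 290, 255, 215, 200, 215, 240, 270, 295, 315]
print(orphan_means(vmax, vmin, vmean))  # ['Jan', 'Feb', 'Apr']
```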

The corresponding BOM raw daily data for 1991, 1995 and 2005 was checked. It is complete, with the exception of three days of minimums in May 1991. Two of these days have missing results. The third is flagged with a QC doubt. Note that this BOM data comes from the present BOM database, and may not be what went to NOAA in earlier years.

Here is the BOM data corresponding to the NOAA product:

And here are the differences, BOM – GHCN

Here we can see substantial corrections to input data, especially in 2005.

V2.max.failed was checked for data from this station. There is only one entry, for 1951. V2.Mean.failed referred to the same 1951 QC failure. V2.min.failed also has a single entry for October 2004.

Summary

There is a lot of published criticism of the quality of NOAA’s GHCN V2. I now add some more.

In my profession, errors of this sort would cause the whole dataset to be rejected. I am astonished that the much vaunted NOAA quality control procedures did not pick up such gross errors.

The error is compounded in the sense that it propagates via V2 into the GISS database, and other users of GHCN V2.

Appendix – Source Data

The GHCN V2 database, giving Annual and Monthly data, is available at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2. The file create date of the set used in this study was October 15, 2010.

The Australian Bureau of Meteorology (BOM) supplies raw instrument data to NOAA electronically. This data is accessible on the interactive BOM site at:

http://www.bom.gov.au/climate/data/

This is daily max and min data, and should be the data supplied to NOAA.

Ed Thurstan

thurstan@bigpond.net.au

October 20, 2010

=================================================================

UPDATE VIA EMAIL:

Hi Anthony,
I made an error in comparing GHCN data against Aust. BOM data. A Graeme W spotted it, and I have just posted a correction in the comments. I have offered to email anyone a corrected report.
I chose 1991, 1995 and 2005 data to compare GHCN and BOM. 1991 and 1995 comparisons are correct. But I inadvertently compared 2005 GHCN data against 2007 BOM data. (2007 also exhibits the GHCN error at issue in the report.)

ERRATA

In the section where I compare BOM data against GHCN data to highlight corrections made to GHCN input data, I inadvertently compared 2005 GHCN to 2007 BOM data. The offending data for 2005 should read:

BOM MAX   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      39.4   38.2   38.4   38.2   33.3   26.4   27.9   28.6   31.4   33.6   37.2   36.3

BOM MIN   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      26.7   27.3   26.4   23.7   18.5   14.7   14     13.8   15.6   18     20.8   25

BOM MEAN  Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      33.05  32.75  32.4   30.95  25.9   20.55  20.95  21.2   23.5   25.8   29     30.65

DIFFERENCES

MAX       Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      0.0    0.0    0.0    –      0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

MIN       Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      0.0    0.0    0.0    –      0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

MEAN      Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      1.2    0.9    0.9    1.0    0.8    0.7    0.8    0.7    0.7    0.7    0.7    0.8

 

The correction does not diminish my argument in any way. The same type of effect would be apparent if 2007 GHCN were compared against 2007 BOM data.

Apologies to all for the error.

Ed
PJB
November 10, 2010 12:09 pm

Clearly a barometer, errr thermometer errr guestimate of the actual temperatures.
Are we really surprised? Gavin should spend less time on the blog and more doing his job….

Steeptown
November 10, 2010 12:10 pm

“In my profession, errors of this sort would cause the whole dataset to be rejected.”
Mine too.

Peter Miller
November 10, 2010 12:18 pm

Just another example of the distorted data of supposed climate ‘science’.

incervisiaveritas
November 10, 2010 12:19 pm

To paraphrase Robert A Heinlein “Climate science is what they say, weather is what we get”.

James Sexton
November 10, 2010 12:26 pm

Wouldn’t it be nice, if just once, someone in the field would engage in self audits to a point where they can find and admit and correct mistakes?

Jeff L
November 10, 2010 12:27 pm

Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.

November 10, 2010 12:30 pm

James Sexton,
That would be according to the scientific method – something the NOAA avoids.

November 10, 2010 12:38 pm

NOT ACCURATE AT ALL.

Mike
November 10, 2010 12:39 pm

The average value of a continuous function f(t) over an interval [a,b] is the integral from a to b of f(t) divided by (b-a). It is not equal to the average of its maximum value and its minimum value. For example (I just covered this in class today) the average value of sin(t) over [0,pi] is 2/pi which is about 0.6366. The max is one and the min is 0, which average to 0.5. The shape of the function matters.
Here is a simpler discrete example. Suppose f(1) =1, f(2) = 1, f(3) =1 and f(4) = 9. The average is (1+1+1+9)/4 = 3 not (1+9)/2 = 5.
I do not know what method is used by BOT. I’m just exampling the basic math.

Tenuc
November 10, 2010 12:45 pm

Hard to believe that these people can be so incompetent!

R. de Haan
November 10, 2010 12:54 pm

No conspiracy here. It’s just coincidence that similar irregularities have occurred in data sets from countries all over the world. Please, don’t connect the dots but limit yourself to polite discussions. The politicians, bureaucrats and scientists involved did it only to save the planet…. from you.

Kev-in-UK
November 10, 2010 12:59 pm

James Sexton is right – but of course, that would imply that they were actually undertaking work as responsible and reputable scientists – which quite clearly, they aren’t!
and PJB is spot on – Mr Schmidt’s efforts would perhaps be far more worthwhile if he actually concentrated on his job!

Douglas DC
November 10, 2010 1:01 pm

As a former NOAA weather Observer, and a meticulous keeper of hand, written, MkI
eyeball observations, is it just me or had automation caused all the current records,
to be monkeyed with?
I Think I know the answer….

Michael
November 10, 2010 1:04 pm

OT
Chris Christie Skeptical That Global Warming Is Caused By Humans
“Mankind, is it responsible for global warming? Well I’ll tell you something. I have seen evidence on both sides of it. I’m skeptical — I’m skeptical. And you know, I think at the at the end of this, I think we’re going to need more science to prove something one way or the other. But you know – cause I’ve seen arguments on both sides of it that at times – like I’ll watch something about man made global warming, and I go wow, that’s fairly convincing. And then I’ll go out and watch the other side of the argument, and I go huh, that’s fairly convincing too. So, I go to be honest with you, I don’t know. And that’s probably one of the reason’s why I became a lawyer, and not a doctor, or an engineer, or a scientist, because I can’t figure this stuff out. But I would say at this point, that has to be proven, and I’m a little skeptical about it. Thank you.”
http://www.huffingtonpost.com/2010/11/10/chris-christie-global-warming_n_781494.html#comments

NucEngineer
November 10, 2010 1:04 pm

To paraphrase the Mexican bandito confronting Humphrey Bogart in The Treasure of the Sierra Madre, “Data, we don’t need no stinking data.”
These guys have computers, and they call the computer outputs “data”.

RuhRoh
November 10, 2010 1:16 pm

Well, with data from past history, discarding the data set is not an option.
This seems to indicate a need to audit the Data QA ‘processes’ , however they might be implemented.
I’m unaware of any formal disclosure of these processes, which may be partially manual and involve some application of ‘judgement’. This would seem to be the direction to go, in light of this kind of discovery.
RR

Rational Debate
November 10, 2010 1:17 pm

Sooooo, do I understand this correctly? For all 144 years of data from 36 Australian stations, the errors (which obviously shouldn’t exist at least not nearly to this degree) don’t create any significant temp bias in one particular direction (up or down), nor create a significant bias in terms of changing the slope of temperatures over time for this set?

John in NZ
November 10, 2010 1:18 pm

Since the theory is right, the data do not need to be.
Data can always be adjusted later to fit the theory.

Frank
November 10, 2010 1:27 pm

Andy: There have been anecdotal account of dozens of such errors in the processing of temperature data to create a global temperature record. In particular, there has been substantial criticism of schemes to correct for UHI, station changes, and incomplete temperature records by extracting a signal for neighboring stations. A variety of bloggers have re-analyzed the raw temperature data. What has been the result of these efforts? What is the consensus about 20th century temperature rise? Does anyone from the skeptic community believe there is a reasonable adjustment for UHI?

November 10, 2010 1:38 pm

If I ever made mistakes of such magnitude and frequency when marking exam scripts, I would have been summarily dismissed after being damned at a hearing as to my professional fitness. Are there no sanctions for these people when they make such errors?

PJP
November 10, 2010 1:42 pm

Jeff L says:
November 10, 2010 at 12:27 pm
Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.

Government engineers … spend millions on a spacecraft, then find out after it has been launched that half the plans were metric and the other half imperial.
Is that the sort of government engineers you are talking about?

Rhoda R
November 10, 2010 1:52 pm

Rational Debate; if this were the only instance of messed up data bases, perhaps you’d be right. BUT, and this is a big but, this isn’t. Just about every time someone looks at the raw data they find some sort of error or another, or strange adjustments or other ‘quirks’. This instance just adds to the overall impression that the data sets supporting climate theories are without value. Without honest data, how can we come to any conclusions about what is going on, how can we actually do ‘science’ when the data is unreliable?

Crispin in Waterloo
November 10, 2010 1:53 pm

“Are there no sanctions for these people when they make such errors?”
++++++++
No

Mike
November 10, 2010 1:56 pm

Michael says: November 10, 2010 at 1:04 pm “OT Chris Christie Skeptical That Global Warming Is Caused By Humans”
The Governor of NJ:
“…I’ll watch something about man made global warming, and I go wow, that’s fairly convincing. And then I’ll go out and watch the other side of the argument,…”
The Gov of a state is trying to figure this out by watching TV!!!!!!!!
No one can work out a hard science problem by watching TV. Can’t the man read?
“And that’s probably one of the reason’s why I became a lawyer, and not a doctor, or an engineer, or a scientist, because I can’t figure this stuff out.”
“First thing we do, let’s …” –Shakespeare
Friends don’t let friends vote for dumb people.

Steven mosher
November 10, 2010 2:13 pm

Global stats would be a good insight. In your first table you report errors of .05 and zero, but the second table indicates negative errors.
A histogram of the errors and summary stats (in full C) would be helpful.
Then for every station compute trends both ways: using your various approaches.
