"Gross" Data Errors in GHCN V2 for Australia

UPDATE: 11/11/10 An erratum has been posted; see the end of this essay – Anthony

Port Hedland, WA BoM office

Guest post by Ed Thurstan of Sydney, Australia

Synopsis

This study shows that the NOAA-maintained GHCN V2 database contains errors in calculating a Mean temperature from a Maximum and a Minimum. In all, 144 station-years of data from 36 Australian stations are affected.

Means are published when the underlying Maximums and/or Minimums have been rejected.

Analysis

The Australian Bureau of Meteorology (BOM) provides NOAA with “entirely raw instrumental data via the Global Telecommunications System”. In the process of comparing BOM Max and Min outputs with NOAA “Raw” inputs, some oddities were noticed.

A database of Australian data (Country 501) was set up for each of GHCN V2.Max, V2.Mean, V2.Min. Each record consists of WMO Station ID, Modifier, Dup, Year, then 12 months of data Jan-Dec.
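A minimal sketch of reading one V2 record in Python follows. It assumes the fixed-width layout of the V2 files as distributed: a 3-character country code, a 5-character WMO ID, a 3-character modifier, a 1-character duplicate number, a 4-digit year, and then twelve 5-character monthly values in tenths of a degree C, with -9999 for missing. The names `V2Record` and `parse_v2_line` are hypothetical and are not part of any NOAA tool.

```python
from typing import NamedTuple

MISSING = -9999  # GHCN V2 sentinel for a missing or rejected month

class V2Record(NamedTuple):
    country: str   # e.g. "501" for Australia
    wmo_id: str    # e.g. "94312" (Port Hedland)
    modifier: str  # 3-character modifier code
    dup: str       # 1-character duplicate number
    year: int
    months: tuple  # 12 ints, Jan..Dec, tenths of a degree C

def parse_v2_line(line: str) -> V2Record:
    """Parse one fixed-width GHCN V2 line (layout assumed, see above)."""
    # Monthly fields start at column 16 and are 5 characters wide each.
    months = tuple(int(line[16 + 5 * i:21 + 5 * i]) for i in range(12))
    return V2Record(line[0:3], line[3:8], line[8:11], line[11:12],
                    int(line[12:16]), months)
```

Under that assumed layout, the Modifier=0, Dup=0 selection corresponds to `rec.modifier == "000" and rec.dup == "0"`, and Australia to `rec.country == "501"`.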

“Modifier” and “Dup” are codes that allow multiple sets of data to be included for the same station, or for what appears to be the same station. These data are retained rather than discarded, in case they prove useful to someone. For this exercise, Modifier=0 and Dup=0 were selected.

Only those stations and years where all 12 months of data are present were selected. This results in about 14,000 station-years of monthly data being compared.
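The completeness filter can be sketched as follows. The sketch assumes that each station-year's twelve monthly values are already held in a dictionary keyed by station-year, with -9999 marking a missing month. `complete_years` is a hypothetical helper name.

```python
MISSING = -9999  # GHCN V2 sentinel for a missing or rejected month

def complete_years(records):
    """Keep only station-years whose 12 monthly values are all present.

    `records` maps a station-year key to a sequence of 12 ints in tenths
    of a degree C (assumed representation).
    """
    return {key: months for key, months in records.items()
            if len(months) == 12 and MISSING not in months}
```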

A compound key of Station ID concatenated with year was set up.

From Max and Min, an arithmetic mean was calculated to compare with V2.Mean.
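The compound key and the comparison against V2.Mean might look like the sketch below. It assumes that V2.Max, V2.Min and V2.Mean have already been reduced to complete station-years keyed the same way, with values kept in tenths of a degree C. `station_year_key` and `mean_discrepancies` are hypothetical names.

```python
def station_year_key(station_id: str, year: int) -> str:
    """Compound key: station ID concatenated with year."""
    return f"{station_id}{year}"

def mean_discrepancies(v2_max, v2_min, v2_mean):
    """Map each key present in all three inputs to 12 values of
    (reported V2.Mean - (max + min) / 2), in tenths of a degree C."""
    result = {}
    for key in v2_max.keys() & v2_min.keys() & v2_mean.keys():
        result[key] = [mean - (hi + lo) / 2
                       for hi, lo, mean in zip(v2_max[key], v2_min[key], v2_mean[key])]
    return result
```

Keys missing from any one of the three inputs are skipped, so only station-years that appear in all three datasets are compared.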

Observation 1.

In calculating V2.Mean, NOAA always rounds half-tenths up to the next tenth of a degree.

Calculating (Reported V2.Mean – Calculated Mean) mostly gives a result of zero or 0.5 (in tenths of a degree), as shown in this example:

This appears to be poor practice, because the usual way to neutralise this bias is to round halves to the nearest even number (or consistently to the nearest odd one). However, the bias is small, as the units are tenths of a degree.
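The two rounding rules can be compared using Python's `decimal` module. This sketch assumes that NOAA's rule behaves like `ROUND_HALF_UP`, which sends ties away from zero (that is, "up" for the positive temperatures in this data), and compares it with round-half-to-even. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def rounded_mean(tmax: int, tmin: int, mode) -> int:
    """Mean of two values in tenths of a degree, rounded to a whole tenth."""
    exact = (Decimal(tmax) + Decimal(tmin)) / 2
    return int(exact.quantize(Decimal(1), rounding=mode))

def average_bias(mode, pairs) -> float:
    """Average of (rounded mean - exact mean) over the pairs, in tenths."""
    diffs = [rounded_mean(hi, lo, mode) - (hi + lo) / 2 for hi, lo in pairs]
    return sum(diffs) / len(diffs)

# 100 consecutive maxima with a fixed minimum: half of the sums are odd.
pairs = [(hi, 200) for hi in range(300, 400)]
print(average_bias(ROUND_HALF_UP, pairs))    # 0.25
print(average_bias(ROUND_HALF_EVEN, pairs))  # 0.0
```

Round-half-up adds 0.5 tenths every time max + min is odd, which averages to about 0.25 tenths over a run of values. Round-half-to-even rounds ties up about half the time and down the other half, so the bias cancels.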

This observation led to the discovery of larger errors.

Observation 2.

The difference between reported V2.mean and the calculated mean can be substantial.

Here is a cluster of (Reported V2.Mean – Calculated Mean):

Port Hedland, WA BoM instrument enclosure - Source: http://www.bom.gov.au/wa/port_hedland/
Port Hedland BoM station from the air

For example, for Station 94312 (Port Hedland, Western Australia; photos added: AW) in March 1996, the reported GHCN V2.Mean figure is 1.15 °C lower than the mean calculated from V2.Max and V2.Min.

There is no obvious pattern in these errors.

As a spot check, the raw data from GHCN V2 for station 94312 in 1996 is as follows:

The arithmetic mean for March should be (377+256)/2 = 316.5

But NOAA has calculated it as 305, an error of 11.5 tenths of a degree (1.15 °C).
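The March 1996 spot check can be reproduced directly. The helper name below is hypothetical, and all values are in GHCN's units of tenths of a degree C.

```python
def mean_error_tenths(v2_max: int, v2_min: int, v2_mean: int) -> float:
    """Reported V2.Mean minus (max + min) / 2, all in tenths of a degree C."""
    return v2_mean - (v2_max + v2_min) / 2

# Station 94312, March 1996: Max 377, Min 256, reported Mean 305
error = mean_error_tenths(377, 256, 305)
print(f"{error} tenths = {error / 10} degC")  # -11.5 tenths = -1.15 degC
```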

WMO Station 50194312 is BOM Station 04032.

Here are the monthly averages calculated from BOM daily data:

With one exception, they are within 0.1 °C of the NOAA figures; the exception differs by 0.2 °C.
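The text does not say exactly how the monthly averages were derived from BOM daily data, so the sketch below illustrates two plausible methods rather than the one actually used. Values are in degrees C, with None marking a missing reading, and the function names are hypothetical. The two methods agree when every day has both a maximum and a minimum, but they can diverge when a day has only one of the two.

```python
from statistics import mean
from typing import List, Optional, Tuple

# One day's (maximum, minimum) in degrees C; None marks a missing reading.
Day = Tuple[Optional[float], Optional[float]]

def mean_of_monthly_extremes(days: List[Day]) -> float:
    """(mean of available daily maxima + mean of available daily minima) / 2."""
    maxima = [hi for hi, _ in days if hi is not None]
    minima = [lo for _, lo in days if lo is not None]
    return (mean(maxima) + mean(minima)) / 2

def mean_of_daily_means(days: List[Day]) -> float:
    """Mean of (max + min) / 2, using only days that have BOTH readings."""
    return mean((hi + lo) / 2 for hi, lo in days
                if hi is not None and lo is not None)

# Hypothetical three-day month: the hottest day has no minimum recorded.
days = [(30.0, 20.0), (32.0, 22.0), (40.0, None)]
print(mean_of_monthly_extremes(days))  # 27.5
print(mean_of_daily_means(days))       # 26.0
```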

There are 144 years of data from 36 Australian stations affected.

GISS carries NOAA's version of GHCN V2.Mean, so GISS will propagate the error.

Full Error List

The full error list of stations is available on request. It comprises 144 years of data from 36 Stations.

Observation 3.

Unless there is a severe problem in transmitting BOM data to NOAA, NOAA’s quality control procedures appear to reject a lot of superficially good BOM data.

When this happens, NOAA replaces the suspect data with “-9999” and writes a QC.failed record.

GHCN V2.Mean now contains many instances where a mean is reported but the underlying V2.Max and/or V2.Min values are flagged -9999; that is, they are missing.
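The following is a sketch of how to find means published without a supporting Max or Min. It has to run on the unfiltered station-years, because a filter that keeps only complete years would drop exactly these cases. The sketch assumes that V2.Max, V2.Min and V2.Mean are keyed identically by station-year, with -9999 as the missing value. `orphan_means` is a hypothetical name.

```python
MISSING = -9999  # GHCN V2 sentinel for a missing or rejected month
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def orphan_means(v2_max, v2_min, v2_mean):
    """List (key, month) pairs where V2.Mean has a value but V2.Max or
    V2.Min is MISSING. A station-year absent from max or min counts as
    all-missing. Values are 12 ints per key, tenths of a degree C."""
    all_missing = [MISSING] * 12
    found = []
    for key in sorted(v2_mean):
        hi_row = v2_max.get(key, all_missing)
        lo_row = v2_min.get(key, all_missing)
        for month, (hi, lo, m) in enumerate(zip(hi_row, lo_row, v2_mean[key])):
            if m != MISSING and (hi == MISSING or lo == MISSING):
                found.append((key, MONTHS[month]))
    return found
```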

For example, station 50194312 (BOM 04032) shows:

As a spot check, here is the matching raw data from GHCN V2:

Note that Means are published for January, February and April even though the corresponding Max and/or Min values are absent.

The corresponding BOM raw daily data for 1991, 1995 and 2005 was checked. It is complete, except for three days of minimums in May 1991: two of these days have missing results, and the third is flagged with a QC doubt. Note that this BOM data comes from the present BOM database and may not be what was sent to NOAA in earlier years.

Here is the BOM data corresponding to the NOAA product:

And here are the differences (BOM – GHCN):

Here we can see substantial corrections to input data, especially in 2005.

V2.max.failed was checked for data from this station, and it has only one entry, for 1951. V2.mean.failed refers to the same 1951 QC failure. V2.min.failed also has a single entry, for October 2004.

Summary

There is a lot of published criticism of the quality of NOAA’s GHCN V2. I now add some more.

In my profession, errors of this sort would cause the whole dataset to be rejected. I am astonished that the much vaunted NOAA quality control procedures did not pick up such gross errors.

The error is compounded because it propagates via V2 into the GISS database and to other users of GHCN V2.

Appendix – Source Data

The GHCN V2 database, giving Annual and Monthly data, is available at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2. The file create date of the set used in this study was October 15, 2010.

The Australian Bureau of Meteorology (BOM) supplies raw instrument data to NOAA electronically. This data is accessible on the interactive BOM site at:

http://www.bom.gov.au/climate/data/

This is daily max and min data, and should be the data supplied to NOAA.

Ed Thurstan

thurstan@bigpond.net.au

October 20, 2010

=================================================================

UPDATE VIA EMAIL:

Hi Anthony,
I made an error in comparing GHCN data against Aust. BOM data. A Graeme W spotted it, and I have just posted a correction in the comments. I have offered to email anyone a corrected report.
I chose 1991, 1995 and 2005 data to compare GHCN and BOM. 1991 and 1995 comparisons are correct. But I inadvertently compared 2005 GHCN data against 2007 BOM data. (2007 also exhibits the GHCN error at issue in the report.)

ERRATA

In the section where I compare BOM data against GHCN data to highlight corrections made to GHCN input data, I inadvertently compared 2005 GHCN to 2007 BOM data. The offending data for 2005 should read

BOM MAX   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      39.4   38.2   38.4   38.2   33.3   26.4   27.9   28.6   31.4   33.6   37.2   36.3

BOM MIN   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      26.7   27.3   26.4   23.7   18.5   14.7   14.0   13.8   15.6   18.0   20.8   25.0

BOM MEAN  Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      33.05  32.75  32.40  30.95  25.90  20.55  20.95  21.20  23.50  25.80  29.00  30.65

DIFFERENCES (BOM – GHCN)

MAX       Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      0.0    0.0    0.0    –      0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

MIN       Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      0.0    0.0    0.0    –      0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

MEAN      Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      1.2    0.9    0.9    1.0    0.8    0.7    0.8    0.7    0.7    0.7    0.7    0.8
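The corrected BOM MEAN row can be checked against the MAX and MIN rows in a few lines of Python. This checks only the arithmetic within the BOM tables, not the GHCN values. The helper name is hypothetical.

```python
# BOM 2005 monthly values (degrees C), copied from the corrected tables
bom_max = [39.4, 38.2, 38.4, 38.2, 33.3, 26.4, 27.9, 28.6, 31.4, 33.6, 37.2, 36.3]
bom_min = [26.7, 27.3, 26.4, 23.7, 18.5, 14.7, 14.0, 13.8, 15.6, 18.0, 20.8, 25.0]
bom_mean = [33.05, 32.75, 32.40, 30.95, 25.90, 20.55,
            20.95, 21.20, 23.50, 25.80, 29.00, 30.65]

def check_means(tmax, tmin, tmean, tol=1e-6):
    """Return month indices where tmean differs from (tmax + tmin) / 2 by more than tol."""
    return [i for i, (hi, lo, m) in enumerate(zip(tmax, tmin, tmean))
            if abs((hi + lo) / 2 - m) > tol]

print(check_means(bom_max, bom_min, bom_mean))  # []
```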

 

The correction does not diminish my argument in any way. The same type of effect would be apparent if 2007 GHCN were compared against 2007 BOM data.

Apologies to all for the error.

Ed
MikeA
November 10, 2010 2:30 pm

Regarding “Observation 2”. Is the monthly mean supposed to be the average of the daily means or the average of the monthly max/min? I’m trying to do it in my head with rounding and it’s not working for me. Perhaps you should ask NOAA how they do it.

November 10, 2010 2:38 pm

I agree with Mike. You’re not calculating v2.mean as they would have done. The daily mean is the average of the daily max/min, and the monthly is the average of the dailies.
They should be the same? Well, not with missing values. If a max isn’t matched with a min for that day, it won’t go into v2.mean, but it will go into the monthly v2.max.
So in your Mar 1966 example, it’s likely that there were one or two hot days for which a minimum wasn’t recorded. They appear in v2.max but not v2.mean.

November 10, 2010 2:39 pm

Evidently Post Modern Science is more lost than Adam on Mother’s Day 🙂

November 10, 2010 2:41 pm

Well done Ed. Rational Debate: this isn’t saying there’s evidence of warming bias; it’s saying there’s evidence of crap data. The temperature data cannot be relied on.
Ken

Tim
November 10, 2010 2:41 pm

Suggest they use http://www.wolframalpha.com/. Don’t have to think, then.

Binny
November 10, 2010 2:54 pm

Notice how the grass around the site has been slashed exposing lots of bare ground.
An obvious necessity because of the fire hazard: there’s lots of expensive equipment there that wouldn’t survive a grass fire.
But just another one of those things that make you go….Hmmmm.

November 10, 2010 2:55 pm

315, 264, 208, 303, 295, 332, 298, 214, 324.
I have just found some of the data that the Aussies misplaced. It was on my desk all along. I trust that they can put it all back where it belongs in the data set.

Rational Debate
November 10, 2010 3:26 pm

re: posts by: Rhoda R says: November 10, 2010 at 1:52 pm and Ken Stewart says: November 10, 2010 at 2:41 pm
Thanks to both of you for your replies. I get the unreliable problematic data part clearly, believe me – it’s one of the issues that wound up landing me squarely in the ‘skeptics’ camp, all the questionable data & subsequent handling.
But what I was trying to ask is for this particular problem that has been discovered, does it show any particular bias or trend, for this particular set? On a quick read it seems the author is saying no, but it’s not really clear…. soooooo, I’m asking….

David A. Evans
November 10, 2010 3:28 pm

Not read all the comments but I can suggest a reason.
Max mean would be the sum of daily max/days
Min mean would be the sum of daily min/days
Here’s where the problem lies…
If the monthly mean was calculated from the monthly (mean max + mean min)/2, you get a different answer from the sum of daily means/days.
DaveE.

Scott
November 10, 2010 3:37 pm

Ed / Anthony,
I believe you need to correct the last paragraph. It is not Raw data. The BOM site is misleading. It’s ‘adjusted raw data’. I.e. They’ve changed it.

Golf Charley
November 10, 2010 3:39 pm

Aren’t the Australian BOM experts supposed to be checking the New Zealand temperature record?

1DandyTroll
November 10, 2010 3:45 pm

“In my profession, errors of this sort would cause the whole dataset to be rejected.”
We should all keep to our times and with that, let the good times roll so say ‘ello to the GHCN’s Zombie data, brought back to life with statistical artificial means to hunt us all!

Hugo M
November 10, 2010 3:52 pm

Ed,
when I compared the original data of a local weather station here in Germany with “raw” GHCN, I too found differences of up to 2 °C. And I really wondered why GHCN flagged monthly means as missing, when in reality these data have been available, without any exception, for 50 years. Hence the question is whether there might be a pattern behind that mess. Clearly, if a monthly mean is flagged as missing during summer, the yearly mean should get slightly colder, even if averaged anomalies are used instead of plain data: the “Meteorological Annual Mean” is computed based on anomalies of Seasonal Means using a rather complicated procedure. http://data.giss.nasa.gov/gistemp/station_data/seas_ann_means.html

BrianMcL
November 10, 2010 4:01 pm

Are these the guys who are checking the NZ data?
If so maybe they should get an independent review done. I hear the CRU are quite good at that kind of thing. Oh wait, on second thoughts maybe not.
Maybe the UK Met Office, or might they be a bit busy redoing their own figures.
Perhaps NASA could help out, or have they got some QC problems of their own too?
Can anybody help these poor guys out? It really is worse than we thought.

Graeme W
November 10, 2010 4:06 pm

I’ve just gone to the BOM site and used their option to get the mean maximum temperature for Mar 2005. The chart above has the figure as 33.6. The BOM site lists it as 38.4, which is an exact match for what the GHCN data shows.
Sorry, Ed, but you’ve done something screwy with your data. I suspect it’s in your calculation of the monthly means from the daily data, because the monthly mean maximums you’ve given in your table above (just before you do the BOM-GHCN calculation) don’t agree with the BOM website’s data (I only checked 2005, but if one year is wrong, I have to suspect the rest of the calculations, too).
Monthly Mean Maximum Temperatures for 004032 (Port Hedland):
http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=36&p_display_type=dataFile&p_startYear=&p_stn_num=004032

Graeme W
November 10, 2010 4:10 pm

Further to the above, here’s the raw daily figures for 2005 for Port Hedland:
http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=122&p_display_type=dailyDataFile&p_startYear=2005&p_stn_num=004032
As you can see, there is no way the mean maximum temperature for Mar 2005 could be 33.6 as reported in the table above. The raw data shows the MINIMUM maximum temperature for that month as 33.2, and almost every day of the month is well above 33.6

David A. Evans
November 10, 2010 4:43 pm

DOH!
Had a real Homer moment. I put it down to lack of sleep.
Disregard my comment…
David A. Evans says:
November 10, 2010 at 3:28 pm
Sorry. Way past bed-time, been up over 20 hours now 🙁
DaveE.

Don Shaw
November 10, 2010 5:06 pm

Jeff L says:
November 10, 2010 at 12:27 pm
“Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.”
Good point. Has anyone noticed that the once great NASA can no longer handle Hydrogen without significant leaks?
This from the government that thinks they can provide Hydrogen to millions of autos to save our planet from global warming.

jorgekafkazar
November 10, 2010 5:08 pm

Jeff L says: “Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both….”
And what did you think their objective was?

Bob M
November 10, 2010 5:14 pm

It’s a good thing that Carpenters don’t construct houses the same way Climate Scientists build models and assemble data sets. The first Woodpecker would decimate civilization.

November 10, 2010 5:18 pm

Graeme W
I think part of the problem is replicating what was done from the instructions.
My suggestion is this: if people want to analyze GHCN data and compare it to other sources they had better be able to produce the code they used to do it. Full stop.
Without the code we really can’t tell what Ed did. So, in the same way I ask climate scientists for code, we have to ask the people who check them for code.

latitude
November 10, 2010 5:44 pm

Anyone else get the impression that their egos advanced a lot faster than their skills?

Graeme W
November 10, 2010 5:53 pm

Actually, Steven, I wasn’t even trying to replicate what Ed had done. I was simply double-checking his figures that he said were the BOM figures. I quickly found out that they weren’t. That Mar 2005 maximum temperature of 33.6 isn’t what’s listed on the BOM website, and I can’t see any way that it could be derived from the raw daily data for Mar 2005.
If he’s used the wrong data from BOM, then all the BOM to GHCN comparisons he’s listed are meaningless.

thingadonta
November 10, 2010 5:58 pm

The data errors are worse than we thought; pity they can’t be changed unless there is a royal inquiry as to why the previous royal inquiry didn’t actually examine the data.

Ray Boorman
November 10, 2010 5:59 pm

Binny, if you knew the climate in Port Hedland, you would realise that slashing is not required. The place is a desert for at least 10 months every year. Stinking hot, very dry.