UPDATE: 11/11/10 An erratum has been posted; see the end of this essay – Anthony
Guest post by Ed Thurstan of Sydney, Australia
This study shows that the NOAA maintained GHCN V2 database contains errors in calculating a Mean temperature from a Maximum and a Minimum. 144 years of data from 36 Australian stations are affected.
Means are published when the underlying Maximums and/or Minimums have been rejected.
The Australian Bureau of Meteorology (BOM) provides NOAA with “entirely raw instrumental data via the Global Telecommunications System”. In the process of comparing BOM Max and Min outputs with NOAA “Raw” inputs, some oddities were noticed.
A database of Australian data (Country 501) was set up for each of GHCN V2.Max, V2.Mean, V2.Min. Each record consists of WMO Station ID, Modifier, Dup, Year, then 12 months of data Jan-Dec.
“Modifier” and “Dup” are codes which allow inclusion of multiple sets of data for the same station, or what appears to be the same station. This data is included, rather than discarded, in case it may be useful to someone. For this exercise, only records with Modifier=0 and Dup=0 were selected.
Only those stations and years where all 12 months of data are present were selected. This results in about 14,000 station-years of monthly data being compared.
A compound key of Station ID concatenated with year was set up.
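The selection steps above can be sketched in Python. This is only a minimal sketch, not the database actually used for this study. It assumes the fixed-width GHCN V2 record layout (3-digit country code, 5-digit WMO ID, 3-digit modifier, 1-digit duplicate number, 4-digit year, then twelve 5-character values in tenths of a degree, with -9999 for missing). The function names are hypothetical.

```python
MISSING = -9999  # GHCN V2 marker for a missing or rejected value


def parse_v2_line(line):
    """Split one fixed-width record into fields (the layout is an assumption)."""
    return {
        "country": line[0:3],
        "wmo": line[3:8],
        "modifier": line[8:11],
        "dup": line[11:12],
        "year": int(line[12:16]),
        "months": [int(line[16 + 5 * i:21 + 5 * i]) for i in range(12)],
    }


def select_complete_years(lines, country="501"):
    """Keep Modifier=0, Dup=0 records that have all 12 months present.

    Returns a dict keyed by WMO station ID concatenated with year.
    """
    selected = {}
    for line in lines:
        if len(line) < 76:  # too short to hold 12 monthly values
            continue
        rec = parse_v2_line(line)
        if rec["country"] != country:
            continue
        if rec["modifier"] != "000" or rec["dup"] != "0":
            continue
        if MISSING in rec["months"]:
            continue
        key = f'{rec["wmo"]}{rec["year"]}'  # compound key: station + year
        selected[key] = rec["months"]
    return selected
```

Running the same function over V2.Max, V2.Mean and V2.Min gives three dictionaries that can be matched on the compound key.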
From Max and Min, an arithmetic mean was calculated to compare with V2.Mean.
NOAA always rounds up to the nearest tenth of a degree in calculating V2.Mean.
Calculating (Reported V2.Mean – Calculated Mean) mostly gives a result of zero or 0.5 as shown in this example:
This appears to be poor practice, when the usual approach to neutralising bias is to round to the nearest odd or even number. However, the bias is small, as units are tenths of a degree.
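To illustrate the rounding point, here is a small Python sketch comparing "always round the half up" with "round the half to the nearest even value". The function names and the sample readings are invented for illustration and are not taken from GHCN. Values are integers in tenths of a degree, as in the V2 files.

```python
def mean_half_up(tmax, tmin):
    """Mean of two integer readings (tenths of a degree), ties rounded up."""
    return (tmax + tmin + 1) // 2


def mean_half_even(tmax, tmin):
    """Same mean, but ties go to the nearest even value."""
    q, r = divmod(tmax + tmin, 2)
    if r == 0:
        return q  # sum is even, so the mean is exact
    return q if q % 2 == 0 else q + 1


# Illustrative readings in tenths of a degree (not real station data).
pairs = [(301, 200), (303, 200), (305, 200), (307, 200)]
for tmax, tmin in pairs:
    exact = (tmax + tmin) / 2
    print(exact, mean_half_up(tmax, tmin), mean_half_even(tmax, tmin))
```

With rounding up, every tie lands 0.5 above the exact mean. With round-half-to-even, the ties fall alternately above and below, so the errors tend to cancel over many records.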
This observation led to the discovery of larger errors.
The difference between reported V2.mean and the calculated mean can be substantial.
Here is a cluster of (Reported V2.Mean – Calculated Mean):
For example, for Station 94312 (Note: Port Hedland, Western Australia – Photo added: AW), the reported GHCN V2.mean figure for March 1996 is 1.15°C lower than the mean calculated from V2.max and V2.min.
There is no obvious pattern in these errors.
As a spot check, the raw data from GHCN V2 for station 94312 in 1996 is as follows:
The arithmetic mean for March should be (377+256)/2 = 316.5
But NOAA has calculated it as 305, an error of 11.5 tenths of a degree (1.15°C).
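The same spot check can be written as a short Python sketch. The function names are hypothetical, and the 0.5 tolerance is my assumption for the largest gap that rounding alone could explain. All values are in tenths of a degree.

```python
def mean_discrepancy(tmax, tmin, reported_mean):
    """Reported mean minus (max + min) / 2, all in tenths of a degree."""
    return reported_mean - (tmax + tmin) / 2


def is_suspect(tmax, tmin, reported_mean, tolerance=0.5):
    """True when the gap exceeds what rounding alone could explain."""
    return abs(mean_discrepancy(tmax, tmin, reported_mean)) > tolerance


# March 1996, station 94312, using the values quoted in the text above.
diff = mean_discrepancy(377, 256, 305)
print(diff, diff / 10)  # difference in tenths, then in degrees C
print(is_suspect(377, 256, 305))
```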
WMO Station 50194312 is BOM Station 04032.
Here are the monthly averages calculated from BOM daily data:
With one exception, they are within 0.1°C of the NOAA figures. The exception is 0.2°C.
There are 144 years of data from 36 Australian stations affected.
GISS V2 carries NOAA’s version of V2.Mean, so GISS will be propagating the error.
Full Error List
The full error list of stations is available on request. It comprises 144 years of data from 36 Stations.
Unless there is a severe problem in transmitting BOM data to NOAA, then NOAA’s quality control procedures appear to reject a lot of superficially good BOM data.
When this happens, NOAA replaces the suspect data with “-9999”, and writes a QC.failed record.
GHCN V2.mean now contains many instances where a mean is reported, but the underlying V2.max and/or V2.min are flagged -9999. That is, they are not shown.
For example, station 50194312 (BOM 04032) shows:
Spot check. Following is matching raw data from GHCN V2 for checking purposes:
Note that Means are published when corresponding Max and Mins are absent in Jan, Feb and April.
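A check for this condition is easy to sketch in Python. The function name is hypothetical, and the three-month sample is invented purely to show the logic; it is not real GHCN data. The only thing taken from GHCN is the -9999 marker.

```python
MISSING = -9999  # GHCN V2 marker for a rejected or absent value


def orphan_means(tmax, tmin, tmean):
    """Return 0-based month indexes where a mean is published
    but the max and/or min for that month is missing."""
    return [
        month
        for month, (hi, lo, mean) in enumerate(zip(tmax, tmin, tmean))
        if mean != MISSING and (hi == MISSING or lo == MISSING)
    ]


# Invented three-month example, NOT real GHCN values.
tmax = [MISSING, 360, 350]
tmin = [MISSING, 250, MISSING]
tmean = [310, 305, 300]
print(orphan_means(tmax, tmin, tmean))
```

Applied to matched station-years from V2.max, V2.min and V2.mean, any non-empty result marks a year in which a mean was published without both of its inputs.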
The corresponding BOM raw daily data for 1991, 1995 and 2005 was checked. It is complete, with the exception of three days of minimums in May 1991. Two of these days have missing results. The third is flagged with a QC doubt. Note that this BOM data comes from the present BOM database, and may not be what went to NOAA in earlier years.
Here is the BOM data corresponding to the NOAA product:
And here are the differences, BOM – GHCN
Here we can see substantial corrections to input data, especially in 2005.
V2.max.failed was checked for data from this station. There is only one entry, for 1951. V2.Mean.failed referred to the same 1951 QC failure. V2.min.failed also has a single entry for October 2004.
There is a lot of published criticism of the quality of NOAA’s GHCN V2. I now add some more.
In my profession, errors of this sort would cause the whole dataset to be rejected. I am astonished that the much vaunted NOAA quality control procedures did not pick up such gross errors.
The error is compounded in the sense that it propagates via V2 into the GISS database, and other users of GHCN V2.
Appendix – Source Data
The GHCN V2 database, giving Annual and Monthly data, is available at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2. The file create date of the set used in this study was October 15, 2010.
The Australian Bureau of Meteorology (BOM) supplies raw instrument data to NOAA electronically. This data is accessible on the interactive BOM site at:
This is daily max and min data, and should be the data supplied to NOAA.
October 20, 2010
UPDATE VIA EMAIL:
In the section where I compare BOM data against GHCN data to highlight corrections made to GHCN input data, I inadvertently compared 2005 GHCN to 2007 BOM data. The offending data for 2005 should read:
BOM MAX     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005       39.4   38.2   38.4   38.2   33.3   26.4   27.9   28.6   31.4   33.6   37.2   36.3
BOM MIN     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005       26.7   27.3   26.4   23.7   18.5   14.7     14   13.8   15.6     18   20.8     25
BOM MEAN    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      33.05  32.75   32.4  30.95   25.9  20.55  20.95   21.2   23.5   25.8     29  30.65
Differences, BOM – GHCN:
MAX         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005        0.0    0.0    0.0      –    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
MIN         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005        0.0    0.0    0.0      –    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
MEAN        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005        1.2    0.9    0.9    1.0    0.8    0.7    0.8    0.7    0.7    0.7    0.7    0.8
Apologies to all for the error.