UPDATE: 11/11/10 An erratum has been posted; see the end of this essay – Anthony

Guest post by Ed Thurstan of Sydney, Australia
Synopsis
This study shows that the NOAA-maintained GHCN V2 database contains errors in the calculation of a Mean temperature from a Maximum and a Minimum. 144 years of data from 36 Australian stations are affected.
Means are published when the underlying Maximums and/or Minimums have been rejected.
Analysis
The Australian Bureau of Meteorology (BOM) provides NOAA with “entirely raw instrumental data via the Global Telecommunications System”. In the process of comparing BOM Max and Min outputs with NOAA “Raw” inputs, some oddities were noticed.
A database of Australian data (Country 501) was set up for each of GHCN V2.Max, V2.Mean, V2.Min. Each record consists of WMO Station ID, Modifier, Dup, Year, then 12 months of data Jan-Dec.
“Modifier” and “Dup” are codes which allow the inclusion of multiple sets of data for the same station, or what appears to be the same station. This data is included rather than discarded, in case it may be useful to someone. For this exercise, Modifier=0 and Dup=0 was selected.
Only those stations and years where all 12 months of data are present were selected. This results in about 14,000 station-years of monthly data being compared.
A compound key of Station ID concatenated with year was set up.
From Max and Min, an arithmetic mean was calculated to compare with V2.Mean.
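For readers who want to reproduce this, here is a minimal Python sketch of the comparison. The fixed-width column positions, the -9999 missing-value convention, and the file names v2.max, v2.min and v2.mean are assumptions based on the GHCN v2 documentation, not details given in this post; verify them against the actual files before relying on the output.

```python
# Minimal sketch of the comparison described above: parse the three
# GHCN v2 files, keep Modifier=0 / Dup=0 records with all 12 months
# present, key them by station ID + year, and compare the reported
# mean with (max + min) / 2.  Values are in tenths of a degree C.

def read_v2(path):
    """Return {station_id + year: [12 monthly values in tenths]}."""
    records = {}
    with open(path) as f:
        for line in f:
            # Assumed layout: 11-digit station ID (3-digit country,
            # 501 = Australia; 5-digit WMO number; 3-digit modifier),
            # then a 1-digit duplicate code, then a 4-digit year.
            modifier, dup = line[8:11], line[11]
            if modifier != '000' or dup != '0':
                continue
            months = [int(line[16 + 5*i : 21 + 5*i]) for i in range(12)]
            if -9999 in months:            # keep only complete years
                continue
            records[line[0:11] + line[12:16]] = months   # compound key
    return records

vmax  = read_v2('v2.max')
vmin  = read_v2('v2.min')
vmean = read_v2('v2.mean')

# Reported V2.Mean minus calculated mean, for every station-year
# present in all three files.
for key in sorted(vmax.keys() & vmin.keys() & vmean.keys()):
    for m in range(12):
        calc = (vmax[key][m] + vmin[key][m]) / 2
        diff = vmean[key][m] - calc
        if abs(diff) > 0.5:                # larger than any rounding effect
            print(key[:11], key[11:], 'month', m + 1, diff)
```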
Observation 1.
NOAA always rounds up to the nearest tenth of a degree in calculating V2.Mean.
Calculating (Reported V2.Mean – Calculated Mean) mostly gives a result of zero or 0.5 as shown in this example:
This appears to be poor practice: the usual approach to neutralising bias is to round halves to the nearest even (or odd) number. However, the bias is small, as the units are tenths of a degree.
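For illustration, here is a small Python sketch contrasting the two rounding rules. The half-tenth values are hypothetical; Python's built-in round() happens to implement round-half-to-even.

```python
# Means of an integer max and min (in tenths of a degree) land either
# on an integer or exactly on a half.  The halves are where the rule matters.
halves = [252.5, 253.5, 254.5, 255.5]

# Round-half-up, the behaviour observed in V2.Mean: every half moves up.
print([int(x + 0.5) for x in halves])    # [253, 254, 255, 256]

# Round-half-to-even ("banker's rounding"), which cancels the bias on
# average; Python's built-in round() behaves this way.
print([round(x) for x in halves])        # [252, 254, 254, 256]
```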
This observation led to the discovery of larger errors.
Observation 2.
The difference between reported V2.mean and the calculated mean can be substantial.
Here is a cluster of (Reported V2.Mean – Calculated Mean):
For example, Station 94312 (Note: Port Hedland, Western Australia – photo added: AW) in March 1996 shows that the reported GHCN V2.mean figure is 1.15°C lower than the mean calculated from V2.max and V2.min.
There is no obvious pattern in these errors.
As a spot check, the raw data from GHCN V2 for station 94312 in 1996 is as follows:
The arithmetic mean for March should be (377 + 256)/2 = 316.5, but NOAA has calculated it as 305: an error of 11.5 tenths of a degree (1.15°C).
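The arithmetic is easy to check (values in tenths of a degree, as stored in GHCN V2):

```python
vmax, vmin = 377, 256        # March 1996, station 94312, in tenths
calc = (vmax + vmin) / 2     # 316.5 tenths = 31.65 C
reported = 305               # NOAA's published V2.mean value
print(calc - reported)       # 11.5 tenths = 1.15 C
```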
WMO Station 50194312 is BOM Station 04032.
Here are the monthly averages calculated from BOM daily data:
With one exception, they are within 0.1°C of the NOAA figures. The exception is 0.2°C.
144 years of data from 36 Australian stations are affected.
GISS carries NOAA’s version of V2.Mean, so GISS will be propagating the error.
Full Error List
The full error list of stations is available on request. It comprises 144 years of data from 36 Stations.
Observation 3.
Unless there is a severe problem in transmitting BOM data to NOAA, NOAA’s quality control procedures appear to reject a lot of superficially good BOM data.
When this happens, NOAA replaces the suspect data with “-9999” and writes a QC-failed record.
GHCN V2.mean now contains many instances where a mean is reported, but the underlying V2.max and/or V2.min are flagged -9999. That is, they are not shown.
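A check for this condition is straightforward. The sketch below uses the same assumed fixed-width layout as the earlier sketch, but keeps incomplete years this time.

```python
# Sketch: flag months where V2.mean reports a value but the underlying
# V2.max or V2.min is flagged -9999.

def read_v2_all(path):
    records = {}
    with open(path) as f:
        for line in f:
            if line[8:11] != '000' or line[11] != '0':
                continue                       # Modifier=0, Dup=0 only
            key = line[0:11] + line[12:16]     # station ID + year
            records[key] = [int(line[16 + 5*i : 21 + 5*i])
                            for i in range(12)]
    return records

vmax, vmin, vmean = (read_v2_all(p) for p in ('v2.max', 'v2.min', 'v2.mean'))

MISSING = [-9999] * 12
for key, means in sorted(vmean.items()):
    for m, mean in enumerate(means):
        mx = vmax.get(key, MISSING)[m]
        mn = vmin.get(key, MISSING)[m]
        if mean != -9999 and (mx == -9999 or mn == -9999):
            print(f'{key[:11]} {key[11:]} month {m+1}: '
                  f'mean={mean} but max={mx}, min={mn}')
```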
For example, station 50194312 (BOM 04032) shows:
Spot check: following is the matching raw data from GHCN V2, for checking purposes:
Note that Means are published even when the corresponding Max and Min values are absent in Jan, Feb and April.
The corresponding BOM raw daily data for 1991, 1995 and 2005 was checked. It is complete, with the exception of three days of minimums in May 1991: two of those days have missing results, and the third is flagged with a QC doubt. Note that this BOM data comes from the present BOM database, and may not be what went to NOAA in earlier years.
Here is the BOM data corresponding to the NOAA product:
And here are the differences (BOM – GHCN):
Here we can see substantial corrections to input data, especially in 2005.
V2.max.failed was checked for data from this station. It contains only one entry, for 1951. V2.Mean.failed refers to the same 1951 QC failure. V2.min.failed also has a single entry, for October 2004.
Summary
There is a lot of published criticism of the quality of NOAA’s GHCN V2. I now add some more.
In my profession, errors of this sort would cause the whole dataset to be rejected. I am astonished that the much-vaunted NOAA quality control procedures did not pick up such gross errors.
The error is compounded in the sense that it propagates via V2 into the GISS database and on to other users of GHCN V2.
Appendix – Source Data
The GHCN V2 database, giving Annual and Monthly data, is available at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2. The file create date of the set used in this study was October 15, 2010.
The Australian Bureau of Meteorology (BOM) supplies raw instrument data to NOAA electronically. This data is accessible on the interactive BOM site at:
http://www.bom.gov.au/climate/data/
This is daily max and min data, and should be the data supplied to NOAA.
Ed Thurstan
October 20, 2010
=================================================================
UPDATE VIA EMAIL:
In the section where I compare BOM data against GHCN data to highlight corrections made to GHCN input data, I inadvertently compared 2005 GHCN to 2007 BOM data. The offending data for 2005 should read:

BOM MAX  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005     39.4  38.2  38.4  38.2  33.3  26.4  27.9  28.6  31.4  33.6  37.2  36.3
BOM MIN  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005     26.7  27.3  26.4  23.7  18.5  14.7  14.0  13.8  15.6  18.0  20.8  25.0
BOM MEAN Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005     33.05 32.75 32.4  30.95 25.9  20.55 20.95 21.2  23.5  25.8  29.0  30.65

DIFFERENCES (BOM – GHCN)
MAX      Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005     0.0   0.0   0.0   –     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
MIN      Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005     0.0   0.0   0.0   –     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
MEAN     Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005     1.2   0.9   0.9   1.0   0.8   0.7   0.8   0.7   0.7   0.7   0.7   0.8
Apologies to all for the error.
=================================================================
Clearly a barometer, errr thermometer errr guestimate of the actual temperatures.
Are we really surprised? Gavin should spend less time on the blog and more doing his job….
“In my profession, errors of this sort would cause the whole dataset to be rejected.”
Mine too.
Just another example of the distorted data of supposed climate ‘science’.
To paraphrase Robert A. Heinlein: “Climate science is what they say, weather is what we get”.
Wouldn’t it be nice, if just once, someone in the field would engage in self audits to a point where they can find and admit and correct mistakes?
Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.
James Sexton,
That would be according to the scientific method – something the NOAA avoids.
NOT ACCURATE AT ALL.
The average value of a continuous function f(t) over an interval [a,b] is the integral from a to b of f(t), divided by (b-a). It is not equal to the average of its maximum value and its minimum value. For example (I just covered this in class today), the average value of sin(t) over [0,pi] is 2/pi, which is about 0.6366. The max is 1 and the min is 0, which average to 0.5. The shape of the function matters.
Here is a simpler discrete example. Suppose f(1) =1, f(2) = 1, f(3) =1 and f(4) = 9. The average is (1+1+1+9)/4 = 3 not (1+9)/2 = 5.
I do not know what method is used by BOM. I’m just illustrating the basic math.
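In symbols, the point being made is:

\[
\bar{f} = \frac{1}{b-a}\int_a^b f(t)\,dt,
\qquad
\frac{1}{\pi}\int_0^{\pi}\sin t\,dt = \frac{2}{\pi}\approx 0.6366
\;\neq\; \frac{1+0}{2} = 0.5
\]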
Hard to believe that these people can be so incompetent!
No conspiracy here. It’s just coincidence that similar irregularities have occurred in data sets from countries all over the world. Please, don’t connect the dots but limit yourself to polite discussions. The politicians, bureaucrats and scientists involved did it only to save the planet…. from you.
James Sexton is right – but of course, that would imply that they were actually undertaking work as responsible and reputable scientists – which quite clearly, they aren’t!
and PJB is spot on – Mr Schmidt’s efforts would perhaps be far more worthwhile if he actually concentrated on his job!
As a former NOAA weather observer, and a meticulous keeper of hand-written, Mk I eyeball observations, is it just me, or has automation caused all the current records to be monkeyed with?
I Think I know the answer….
OT
Chris Christie Skeptical That Global Warming Is Caused By Humans
“Mankind, is it responsible for global warming? Well I’ll tell you something. I have seen evidence on both sides of it. I’m skeptical — I’m skeptical. And you know, I think at the at the end of this, I think we’re going to need more science to prove something one way or the other. But you know – cause I’ve seen arguments on both sides of it that at times – like I’ll watch something about man made global warming, and I go wow, that’s fairly convincing. And then I’ll go out and watch the other side of the argument, and I go huh, that’s fairly convincing too. So, I go to be honest with you, I don’t know. And that’s probably one of the reason’s why I became a lawyer, and not a doctor, or an engineer, or a scientist, because I can’t figure this stuff out. But I would say at this point, that has to be proven, and I’m a little skeptical about it. Thank you.”
http://www.huffingtonpost.com/2010/11/10/chris-christie-global-warming_n_781494.html#comments
To paraphrase the Mexican bandito confronting Humphrey Bogart in The Treasure of the Sierra Madre: “Data? We don’t need no stinking data.”
These guys have computers, and they call the computer outputs “data”.
Well, with data from past history, discarding the data set is not an option.
This seems to indicate a need to audit the Data QA ‘processes’, however they might be implemented.
I’m unaware of any formal disclosure of these processes, which may be partially manual and involve some application of ‘judgement’. This would seem to be the direction to go, in light of this kind of discovery.
RR
Sooooo, do I understand this correctly? For all 144 years of data from 36 Australian stations, the errors (which obviously shouldn’t exist at least not nearly to this degree) don’t create any significant temp bias in one particular direction (up or down), nor create a significant bias in terms of changing the slope of temperatures over time for this set?
Since the theory is right, the data do not need to be.
Data can always be adjusted later to fit the theory.
Andy: There have been anecdotal accounts of dozens of such errors in the processing of temperature data to create a global temperature record. In particular, there has been substantial criticism of schemes to correct for UHI, station changes, and incomplete temperature records by extracting a signal from neighboring stations. A variety of bloggers have re-analyzed the raw temperature data. What has been the result of these efforts? What is the consensus about 20th century temperature rise? Does anyone from the skeptic community believe there is a reasonable adjustment for UHI?
If I ever made mistakes of such magnitude and frequency when marking exam scripts, I would have been summarily dismissed after being damned at a hearing as to my professional fitness. Are there no sanctions for these people when they make such errors?
Jeff L says:
November 10, 2010 at 12:27 pm
Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.
Government engineers … spend millions on a spacecraft, then find out after it has been launched that half the plans were metric and the other half imperial.
Is that the sort of government engineers you are talking about?
Rational Debate: if this were the only instance of messed-up databases, perhaps you’d be right. BUT, and this is a big but, it isn’t. Just about every time someone looks at the raw data, they find one sort of error or another, or strange adjustments or other ‘quirks’. This instance just adds to the overall impression that the data sets supporting climate theories are without value. Without honest data, how can we come to any conclusions about what is going on? How can we actually do ‘science’ when the data is unreliable?
“Are there no sanctions for these people when they make such errors?”
++++++++
No
@Michael says: November 10, 2010 at 1:04 pm “OT Chris Christie Skeptical That Global Warming Is Caused By Humans”
The Governor of NJ:
“…I’ll watch something about man made global warming, and I go wow, that’s fairly convincing. And then I’ll go out and watch the other side of the argument,…”
The Gov of a state is trying to figure this out by watching TV!!!!!!!!
No one can work out a hard science problem by watching TV. Can’t the man read?
“And that’s probably one of the reason’s why I became a lawyer, and not a doctor, or an engineer, or a scientist, because I can’t figure this stuff out.”
“First thing we do, let’s …” –Shakespeare
Friends don’t let friends vote for dumb people.
Global stats would be a good insight. In your first table you report errors of 0.05 and zero, but the second table indicates negative errors.
A histogram of the errors and summary stats (in full °C) would be helpful.
Then, for every station, compute trends both ways, using your various approaches.