"Gross" Data Errors in GHCN V2. for Australia

UPDATE: 11/11/10 Errata have been posted; see the end of this essay. – Anthony

Port Hedland, WA BoM office

Guest post by Ed Thurstan of Sydney, Australia

Synopsis

This study shows that the NOAA-maintained GHCN V2 database contains errors in calculating a Mean temperature from a Maximum and a Minimum. 144 years of data from 36 Australian stations are affected.

Means are published when the underlying Maximums and/or Minimums have been rejected.

Analysis

The Australian Bureau of Meteorology (BOM) provides NOAA with “entirely raw instrumental data via the Global Telecommunications System”. In the process of comparing BOM Max and Min outputs with NOAA “Raw” inputs, some oddities were noticed.

A database of Australian data (Country 501) was set up for each of GHCN V2.Max, V2.Mean, V2.Min. Each record consists of WMO Station ID, Modifier, Dup, Year, then 12 months of data Jan-Dec.

“Modifier” and “Dup” are codes which allow inclusion of multiple sets of data for the same station, or what appears to be the same station. This data is included rather than discarded, in case it may be useful to someone. For this exercise, Modifier=0 and Dup=0 was selected.

Only those stations and years where all 12 months of data are present were selected. This results in about 14,000 station-years of monthly data being compared.

A compound key of Station ID concatenated with year was set up.

From Max and Min, an arithmetic mean was calculated to compare with V2.Mean.
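For readers who want to reproduce the comparison, a minimal sketch follows. The fixed-width field layout, file names and threshold are assumptions for illustration; this is not Ed's actual code.

```python
# Sketch of the comparison described above (assumed GHCN v2 field layout).
# Reads v2.max, v2.min and v2.mean, keeps Modifier=0/Dup=0 records with all
# 12 months present, and compares (max+min)/2 against the reported mean.

def parse_v2_line(line):
    """Split one v2 record into station id, modifier, dup, year, 12 months."""
    station  = line[0:11]            # country code + WMO id + 3-digit modifier
    modifier = line[8:11]
    dup      = line[11]
    year     = int(line[12:16])
    months   = [int(line[16 + 5*i : 21 + 5*i]) for i in range(12)]  # tenths of deg C
    return station, modifier, dup, year, months

def load(path):
    """Index complete Modifier=0/Dup=0 station-years by (station, year)."""
    table = {}
    with open(path) as f:
        for line in f:
            station, modifier, dup, year, months = parse_v2_line(line)
            if modifier != "000" or dup != "0" or -9999 in months:
                continue
            table[(station, year)] = months   # compound key: station + year
    return table

vmax, vmin, vmean = load("v2.max"), load("v2.min"), load("v2.mean")
for key in sorted(vmax.keys() & vmin.keys() & vmean.keys()):
    for m in range(12):
        calc = (vmax[key][m] + vmin[key][m]) / 2.0
        diff = vmean[key][m] - calc          # in tenths of a degree
        if abs(diff) > 0.5:                  # Observation 1 allows 0 or +0.5
            print(key, m + 1, diff)
```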

Observation 1.

In calculating V2.Mean, NOAA always rounds half-values up to the next tenth of a degree.

Calculating (Reported V2.Mean – Calculated Mean) mostly gives a result of zero or 0.5 (in tenths of a degree), as shown in this example:

This appears to be poor practice: the usual approach to neutralising rounding bias is to round half-values to the nearest even (or odd) digit. However, the bias is small, as units are tenths of a degree.
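The two conventions are easy to contrast. A small sketch follows; only the first value pair is from the article, the others are made up for illustration.

```python
# Round-half-up (what NOAA appears to do) versus round-half-to-even,
# the usual bias-neutral convention. Values are in tenths of a degree.
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

ONE = Decimal("1")
for mx, mn in [(377, 256), (304, 249), (300, 255)]:
    mean = Decimal(mx + mn) / 2
    up   = mean.quantize(ONE, rounding=ROUND_HALF_UP)
    even = mean.quantize(ONE, rounding=ROUND_HALF_EVEN)
    print(f"({mx}+{mn})/2 = {mean}; half-up -> {up}, half-even -> {even}")
```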

This observation led to the discovery of larger errors.

Observation 2.

The difference between reported V2.mean and the calculated mean can be substantial.

Here is a cluster of (Reported V2.Mean – Calculated Mean):

For example, Station 94312 (Note: Port Hedland, Western Australia – Photo added: AW)

Port Hedland, WA BoM instrument enclosure – Source: http://www.bom.gov.au/wa/port_hedland/
Port Hedland BoM station from the air

The March 1996 entry shows that the reported GHCN V2.Mean figure is 1.15 °C lower than the mean calculated from V2.Max and V2.Min.

There is no obvious pattern in these errors.

As a spot check, the raw data from GHCN V2 for station 94312 in 1996 is as follows:

The arithmetic mean for March should be (377+256)/2 = 316.5

But NOAA has calculated it as 305 – an error of 11.5 tenths of a degree (1.15 °C).

WMO Station 50194312 is BOM Station 04032.

Here are the monthly averages calculated from BOM daily data:

With one exception, they are within 0.1 °C of the NOAA figures. The exception is 0.2 °C.

There are 144 years of data from 36 Australian stations affected.

GISS carries NOAA's version of V2.Mean, so GISS will be propagating the error.

Full Error List

The full error list of stations is available on request. It comprises 144 years of data from 36 Stations.

Observation 3.

Unless there is a severe problem in transmitting BOM data to NOAA, then NOAA’s quality control procedures appear to reject a lot of superficially good BOM data.

When this happens, NOAA replaces the suspect data with “-9999” and writes a QC-failed record.

GHCN V2.mean now contains many instances where a mean is reported, but the underlying V2.max and/or V2.min are flagged -9999. That is, they are not shown.

For example, station 50194312 (BOM 04032) shows:

Spot check: the following is the matching raw data from GHCN V2:

Note that Means are published when the corresponding Max and Min values are absent in Jan, Feb and April.
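Checking for such "orphan" means is mechanical. A minimal sketch, assuming the three files have been loaded into (station, year) dictionaries as in the earlier sketch, but without discarding -9999 months:

```python
# Flag station-months where v2.mean reports a value while v2.max or
# v2.min carries the -9999 missing-data sentinel (Observation 3).
MISSING = -9999

def orphan_means(vmean, vmax, vmin):
    """Yield (station, year, month) where a mean exists without max/min."""
    for (station, year), means in vmean.items():
        maxs = vmax.get((station, year), [MISSING] * 12)
        mins = vmin.get((station, year), [MISSING] * 12)
        for m in range(12):
            if means[m] != MISSING and MISSING in (maxs[m], mins[m]):
                yield station, year, m + 1
```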

The corresponding BOM raw daily data for 1991, 1995 and 2005 was checked. It is complete, with the exception of three days of minimums in May 1991. Two of those days have missing results; the third is flagged with a QC doubt. Note that this BOM data comes from the present BOM database, and may not be what went to NOAA in earlier years.

Here is the BOM data corresponding to the NOAA product:

And here are the differences, BOM – GHCN

Here we can see substantial corrections to input data, especially in 2005.

V2.max.failed was checked for data from this station. There is only one entry, for 1951. V2.mean.failed refers to the same 1951 QC failure. V2.min.failed also has a single entry, for October 2004.

Summary

There is a lot of published criticism of the quality of NOAA’s GHCN V2. I now add some more.

In my profession, errors of this sort would cause the whole dataset to be rejected. I am astonished that the much vaunted NOAA quality control procedures did not pick up such gross errors.

The error is compounded in the sense that it propagates via V2 into the GISS database and to other users of GHCN V2.

Appendix – Source Data

The GHCN V2 database, giving Annual and Monthly data, is available at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2. The file create date of the set used in this study was October 15, 2010.

The Australian Bureau of Meteorology (BOM) supplies raw instrument data to NOAA electronically. This data is accessible on the interactive BOM site at:

http://www.bom.gov.au/climate/data/

This is daily max and min data, and should be the data supplied to NOAA.

Ed Thurstan

thurstan@bigpond.net.au

October 20, 2010

=================================================================

UPDATE VIA EMAIL:

Hi Anthony,
I made an error in comparing GHCN data against Aust. BOM data. A Graeme W spotted it, and I have just posted a correction in the comments. I have offered to email anyone a corrected report.
I chose 1991, 1995 and 2005 data to compare GHCN and BOM. 1991 and 1995 comparisons are correct. But I inadvertently compared 2005 GHCN data against 2007 BOM data. (2007 also exhibits the GHCN error at issue in the report.)

ERRATA

In the section where I compare BOM data against GHCN data to highlight corrections made to GHCN input data, I inadvertently compared 2005 GHCN to 2007 BOM data. The offending data for 2005 should read

BOM MAX    Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005       39.4  38.2  38.4  38.2  33.3  26.4  27.9  28.6  31.4  33.6  37.2  36.3

BOM MIN    Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005       26.7  27.3  26.4  23.7  18.5  14.7  14.0  13.8  15.6  18.0  20.8  25.0

BOM MEAN   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005       33.05  32.75  32.40  30.95  25.90  20.55  20.95  21.20  23.50  25.80  29.00  30.65

DIFFERENCES (BOM – GHCN; – marks a month with no GHCN value to compare)

MAX        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005       0.0   0.0   0.0   –     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0

MIN        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005       0.0   0.0   0.0   –     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0

MEAN       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2005       1.2   0.9   0.9   1.0   0.8   0.7   0.8   0.7   0.7   0.7   0.7   0.8

The correction does not diminish my argument in any way. The same type of effect would be apparent if 2007 GHCN were compared against 2007 BOM data.

Apologies to all for the error.

Ed
Comments
PJB
November 10, 2010 12:09 pm

Clearly a barometer, errr thermometer errr guestimate of the actual temperatures.
Are we really surprised? Gavin should spend less time on the blog and more doing his job….

Steeptown
November 10, 2010 12:10 pm

“In my profession, errors of this sort would cause the whole dataset to be rejected.”
Mine too.

Peter Miller
November 10, 2010 12:18 pm

Just another example of the distorted data of supposed climate ‘science’.

incervisiaveritas
November 10, 2010 12:19 pm

To paraphrase Robert A Heinlein “Climate science is what they say, weather is what we get”.

James Sexton
November 10, 2010 12:26 pm

Wouldn’t it be nice, if just once, someone in the field would engage in self audits to a point where they can find and admit and correct mistakes?

Jeff L
November 10, 2010 12:27 pm

Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.

November 10, 2010 12:30 pm

James Sexton,
That would be according to the scientific method – something the NOAA avoids.

Enneagram
November 10, 2010 12:38 pm

NOT ACCURATE AT ALL.

Mike
November 10, 2010 12:39 pm

The average value of a continuous function f(t) over an interval [a,b] is the integral from a to b of f(t) divided by (b-a). It is not equal to the average of its maximum value and its minimum value. For example (I just covered this in class today) the average value of sin(t) over [0,pi] is 2/pi, which is about 0.6366. The max is one and the min is 0, which average to 0.5. The shape of the function matters.
Here is a simpler discrete example. Suppose f(1) = 1, f(2) = 1, f(3) = 1 and f(4) = 9. The average is (1+1+1+9)/4 = 3, not (1+9)/2 = 5.
I do not know what method is used by BOM. I'm just illustrating the basic math.
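Mike's example is easy to verify numerically; a quick check:

```python
# Numerical check of the example above: the true average of sin(t) on
# [0, pi] is 2/pi ~ 0.6366, while the min/max average is only 0.5.
import numpy as np

t = np.linspace(0, np.pi, 100_001)
f = np.sin(t)
print(f.mean())                   # ~0.6366  (approximates the integral average)
print((f.max() + f.min()) / 2)    # 0.5      (min/max average)
```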

Tenuc
November 10, 2010 12:45 pm

Hard to believe that these people can be so incompetent!

R. de Haan
November 10, 2010 12:54 pm

No conspiracy here. It's just coincidence that similar irregularities have occurred in data sets from countries all over the world. Please, don't connect the dots but limit yourself to polite discussions. The politicians, bureaucrats and scientists involved did it only to save the planet…. from you.

Kev-in-UK
November 10, 2010 12:59 pm

James Sexton is right – but of course, that would imply that they were actually undertaking work as responsible and reputable scientists – which quite clearly, they aren't!
and PJB is spot on – Mr Schmidt’s efforts would perhaps be far more worthwhile if he actually concentrated on his job!

Douglas DC
November 10, 2010 1:01 pm

As a former NOAA weather observer, and a meticulous keeper of hand-written, Mk I eyeball observations, is it just me or has automation caused all the current records to be monkeyed with?
I Think I know the answer….

Michael
November 10, 2010 1:04 pm

OT
Chris Christie Skeptical That Global Warming Is Caused By Humans
“Mankind, is it responsible for global warming? Well I’ll tell you something. I have seen evidence on both sides of it. I’m skeptical — I’m skeptical. And you know, I think at the at the end of this, I think we’re going to need more science to prove something one way or the other. But you know – cause I’ve seen arguments on both sides of it that at times – like I’ll watch something about man made global warming, and I go wow, that’s fairly convincing. And then I’ll go out and watch the other side of the argument, and I go huh, that’s fairly convincing too. So, I go to be honest with you, I don’t know. And that’s probably one of the reason’s why I became a lawyer, and not a doctor, or an engineer, or a scientist, because I can’t figure this stuff out. But I would say at this point, that has to be proven, and I’m a little skeptical about it. Thank you.”
http://www.huffingtonpost.com/2010/11/10/chris-christie-global-warming_n_781494.html#comments

NucEngineer
November 10, 2010 1:04 pm

To paraphrase the Mexican bandito confronting Humphrey Bogart in The Treasure of the Sierra Madre, “Data? We don’t need no stinking data.”
These guys have computers, and they call the computer outputs “data”.

RuhRoh
November 10, 2010 1:16 pm

Well, with data from past history, discarding the data set is not an option.
This seems to indicate a need to audit the Data QA ‘processes’, however they might be implemented.
I’m unaware of any formal disclosure of these processes, which may be partially manual and involve some application of ‘judgement’. This would seem to be the direction to go, in light of this kind of discovery.
RR

Rational Debate
November 10, 2010 1:17 pm

Sooooo, do I understand this correctly? For all 144 years of data from 36 Australian stations, the errors (which obviously shouldn’t exist at least not nearly to this degree) don’t create any significant temp bias in one particular direction (up or down), nor create a significant bias in terms of changing the slope of temperatures over time for this set?

John in NZ
November 10, 2010 1:18 pm

Since the theory is right, the data do not need to be.
Data can always be adjusted later to fit the theory.

Frank
November 10, 2010 1:27 pm

Andy: There have been anecdotal accounts of dozens of such errors in the processing of temperature data to create a global temperature record. In particular, there has been substantial criticism of schemes to correct for UHI, station changes, and incomplete temperature records by extracting a signal from neighboring stations. A variety of bloggers have re-analyzed the raw temperature data. What has been the result of these efforts? What is the consensus about 20th century temperature rise? Does anyone from the skeptic community believe there is a reasonable adjustment for UHI?

Alexander K
November 10, 2010 1:38 pm

If I ever made mistakes of such magnitude and frequency when marking exam scripts, I would have been summarily dismissed after being damned at a hearing as to my professional fitness. Are there no sanctions for these people when they make such errors?

PJP
November 10, 2010 1:42 pm

Jeff L says:
November 10, 2010 at 12:27 pm
Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.

Government engineers … spend millions on a spacecraft, then find out after it has been launched that half the plans were metric and the other half imperial.
Is that the sort of government engineers you are talking about?

Rhoda R
November 10, 2010 1:52 pm

Rational Debate; if this were the only instance of messed up data bases, perhaps you’d be right. BUT, and this is a big but, this isn’t. Just about every time someone looks at the raw data they find some sort of error or another, or strange adjustments or other ‘quirks’. This instance just adds to the overall impression that the data sets supporting climate theories are without value. Without honest data, how can we come to any conclusions about what is going on, how can we actually do ‘science’ when the data is unreliable?

Crispin in Waterloo
November 10, 2010 1:53 pm

“Are there no sanctions for these people when they make such errors?”
++++++++
No

Mike
November 10, 2010 1:56 pm

@ Michael says: November 10, 2010 at 1:04 pm “OT Chris Christie Skeptical That Global Warming Is Caused By Humans”
The Governor of NJ:
“…I’ll watch something about man made global warming, and I go wow, that’s fairly convincing. And then I’ll go out and watch the other side of the argument,…”
The Gov of a state is trying to figure this out by watching TV!!!!!!!!
No one can work out a hard science problem by watching TV. Can’t the man read?
“And that’s probably one of the reason’s why I became a lawyer, and not a doctor, or an engineer, or a scientist, because I can’t figure this stuff out.”
“First thing we do, let’s …” –Shakespeare
Friends don’t let friends vote for dumb people.

Steven mosher
November 10, 2010 2:13 pm

Global stats would be a good insight. In your first table you report errors of 0.5 and zero, but the second table indicates negative errors.
A histogram of the errors and summary stats (in full °C) would be helpful.
Then for every station compute trends both ways, using your various approaches.

MikeA
November 10, 2010 2:30 pm

Regarding “Observation 2”. Is the monthly mean supposed to be the average of the daily means or the average of the monthly max/min? I’m trying to do it in my head with rounding and it’s not working for me. Perhaps you should ask NOAA how they do it.

Nick Stokes
November 10, 2010 2:38 pm

I agree with Mike. You’re not calculating v2.mean as they would have done. The daily mean is the average of the daily max/min, and the monthly is the average of the dailies.
They should be the same? Well, not with missing values. If a max isn’t matched with a min for that day, it won’t go into v2.mean, but it will go into the monthly v2.max.
So in your Mar 1996 example, it’s likely that there were one or two hot days for which a minimum wasn’t recorded. They appear in v2.max but not v2.mean.

Enneagram
November 10, 2010 2:39 pm

Evidently Post Modern Science is more lost than Adam on Mother’s Day 🙂

Ken Stewart
November 10, 2010 2:41 pm

Well done Ed. Rational Debate: this isn’t saying there’s evidence of warming bias; it’s saying there’s evidence of crap data. The temperature data cannot be relied on.
Ken

Tim
November 10, 2010 2:41 pm

Suggest they use http://www.wolframalpha.com/. Don’t have to think, then.

Binny
November 10, 2010 2:54 pm

Notice how the grass around the site has been slashed, exposing lots of bare ground.
An obvious necessity because of the fire hazard – there’s lots of expensive equipment there that wouldn’t survive a grass fire.
But just another one of those things that make you go….Hmmmm.

November 10, 2010 2:55 pm

315, 264, 208, 303, 295, 332, 298, 214, 324.
I have just found some of the data that the Aussies misplaced. It was on my desk all along. I trust that they can put it all back where it belongs in the data set.

Rational Debate
November 10, 2010 3:26 pm

re: posts by: Rhoda R says: November 10, 2010 at 1:52 pm and Ken Stewart says: November 10, 2010 at 2:41 pm
Thanks to both of you for your replies. I get the unreliable, problematic data part clearly, believe me – it’s one of the issues that wound up landing me squarely in the ‘skeptics’ camp, all the questionable data & subsequent handling.
But what I was trying to ask is: for this particular problem that has been discovered, does it show any particular bias or trend, for this particular set? On a quick read it seems the author is saying no, but it’s not really clear…. soooooo, I’m asking….

David A. Evans
November 10, 2010 3:28 pm

Not read all the comments but I can suggest a reason.
Max mean would be the sum of daily max/days
Min mean would be the sum of daily min/days
Here’s where the problem lies…
If the monthly mean was calculated from the monthly (mean max + mean min)/2, you get a different answer to sum of daily means/days.
DaveE.

Scott
November 10, 2010 3:37 pm

Ed / Anthony,
I believe you need to correct the last paragraph. It is not Raw data. The BOM site is misleading. It’s ‘adjusted raw data’. I.e. They’ve changed it.

Golf Charley
November 10, 2010 3:39 pm

Aren’t the Australian BOM experts supposed to be checking the New Zealand temperature record?

1DandyTroll
November 10, 2010 3:45 pm

“In my profession, errors of this sort would cause the whole dataset to be rejected.”
We should all keep to our times and with that, let the good times roll so say ‘ello to the GHCN’s Zombie data, brought back to life with statistical artificial means to hunt us all!

Hugo M
November 10, 2010 3:52 pm

Ed,
when I compared the original data of a local weather station here in Germany with “raw” GHCN, I too found differences of up to 2 °C. And I really wondered why GHCN flagged monthly means as missing when in reality these data have been available, without exception, for 50 years. Hence the question is whether there might be a pattern behind that mess. Clearly, if a monthly mean is flagged as missing during summer, the yearly mean should get slightly colder, even if averaged anomalies are used instead of plain data: the “Meteorological Annual Mean” is computed based on anomalies of Seasonal Means using a rather complicated procedure. http://data.giss.nasa.gov/gistemp/station_data/seas_ann_means.html

BrianMcL
November 10, 2010 4:01 pm

Are these the guys who are checking the NZ data?
If so maybe they should get an independent review done. I hear the CRU are quite good at that kind of thing. Oh wait, on second thoughts maybe not.
Maybe the UK Met Office, or might they be a bit busy redoing their own figures.
Perhaps NASA could help out, or have they got some QC problems of their own too?
Can anybody help these poor guys out? It really is worse than we thought.

Graeme W
November 10, 2010 4:06 pm

I’ve just gone to the BOM site and used their option to get the mean maximum temperature for Mar 2005. The chart above has the figure as 33.6. The BOM site lists it as 38.4, which is an exact match for what the GHCN data shows.
Sorry, Ed, but you’ve done something screwy with your data. I suspect it’s in your calculation of the monthly means from the daily data, because the monthly mean maximums you’ve given in your table above (just before you do the BOM-GHCN calculation) don’t agree with the BOM website’s data (I only checked 2005, but if one year is wrong, I have to suspect the rest of the calculations, too).
Monthly Mean Maximum Temperatures for 004032 (Port Hedland):
http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=36&p_display_type=dataFile&p_startYear=&p_stn_num=004032

Graeme W
November 10, 2010 4:10 pm

Further to the above, here’ s the raw daily figures for 2005 for Port Hedland:
http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=122&p_display_type=dailyDataFile&p_startYear=2005&p_stn_num=004032
As you can see, there is no way the mean maximum temperature for Mar 2005 could be 33.6 as reported in the table above. The raw data shows the MINIMUM maximum temperature for that month as 33.2, and almost every day of the month is well above 33.6

David A. Evans
November 10, 2010 4:43 pm

DOH!
Had a real Homer moment. I put it down to lack of sleep.
Disregard my comment…
David A. Evans says:
November 10, 2010 at 3:28 pm
Sorry. Way past bed-time, been up over 20 hours now 🙁
DaveE.

Don Shaw
November 10, 2010 5:06 pm

Jeff L says:
November 10, 2010 at 12:27 pm
“Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both – given the extremely sloppy job they have done handling the data.”
Good point. Has anyone noticed that the once-great NASA can no longer handle hydrogen without significant leaks?
This from the government that thinks it can provide hydrogen to millions of autos to save our planet from global warming.

jorgekafkazar
November 10, 2010 5:08 pm

Jeff L says: “Good thing these guys aren’t engineers – anything they built would probably blow up, fall down or both….”
And what did you think their objective was?

Bob M
November 10, 2010 5:14 pm

It’s a good thing that Carpenters don’t construct houses the same way Climate Scientists build models and assemble data sets. The first Woodpecker would decimate civilization.

Steven Mosher
November 10, 2010 5:18 pm

Graeme W
I think part of the problem is replicating what was done from the instructions.
My suggestion is this: if people want to analyze GHCN data and compare it to other sources, they had better be able to produce the code they used to do it. Full stop.
Without the code we really can’t tell what Ed did. So, in the same way I ask climate scientists for code, we have to ask the people who check them for code.

latitude
November 10, 2010 5:44 pm

Anyone else get the impression that their egos advanced a lot
faster than their skills?

Graeme W
November 10, 2010 5:53 pm

Actually, Steven, I wasn’t even trying to replicate what Ed had done. I was simply double-checking the figures that he said were the BOM figures. I quickly found out that they weren’t. That Mar 2005 maximum temperature of 33.6 isn’t what’s listed on the BOM website, and I can’t see any way that it could be derived from the raw daily data for Mar 2005.
If he’s used the wrong data from BOM, then all the BOM to GHCN comparisons he’s listed are meaningless.

thingadonta
November 10, 2010 5:58 pm

The data errors are worse than we thought. Pity they can’t be changed unless there is a royal inquiry as to why the previous royal inquiry didn’t actually examine the data.

Ray Boorman
November 10, 2010 5:59 pm

Binny, if you knew the climate in Port Hedland, you would realise that slashing is not required. The place is a desert for at least 10 months every year. Stinking hot, very dry.

terrybixler
November 10, 2010 6:01 pm

I have a model of a real bridge that I would like to sell you. No, I do not want to sell the model; it is just there so you can see what the bridge would look like if you could see it. I also need the model for future sales. Of course the model represents real data from the bridge that has been point-by-point quality controlled. Of course, because of time going by, you cannot see the actual data; I have misplaced some of it somewhere. You need not worry about the integrity of the bridge; it has been certified by GISS and Hansen. How much should Australia pay for the bridge? Wait, I have a counter offer from another viewer, the US. Story to be continued after the next election finishes the job.

Cynthia Lauren Thorpe
November 10, 2010 6:04 pm

Ladies and Gentlemen. I hope I’m not going to do damage to anyone’s psyche here.
Oh, heck. Why not.
I’m almost 53. I say that because in 1972, I was staring down the barrel of a HUGE Science Report, I was to construct for first year high school. It was a 6 week project.
I procrastinated even beyond my norm and ‘suddenly’ it was the night before my research paper was due. I vividly remember sitting at my little desk and creating my data by reviewing a few books (yes, it was supposed to be footnoted, and I did that, as well) and making up statistics outta the ‘clear blue’.
Somewhere in my (totally human) reasoning, I somehow understood that one teacher reading 40 research papers, couldn’t possibly check all the books and all the footnotes…soooooooooo……. ‘Up Up and Away’ my little project creatively went, till I actually began enjoying (oh, let’s call it ‘The George Soros-ness’ of being a little god in my own little room, creating my own little data to reach my ‘own little conclusions’.
Assuming you, as well as I, are of a mature age, having been around the block a time or two ~ you can most probably guess what marks I received for my (let’s delightfully call it, my creative genius? or ~ we could call it CHEATING? warm smiles…) Yeah. That’s right, my Scientist friends ~ I got a B+ on my research paper.
Now ~ the reason I’m sharing is not because I began to ‘fleece the system’ in high school (for one can suppose rightly that I became more learned in this process through the next 3 years) but, to simply put a finger upon exactly what humans
are capable of (and were I to be paid for it….gosh, who KNOWS what I’d’ve done…hmmm?)
So, PLEASE. All of you great guys an’ gals of ethical mores ~ GET TOGETHER AND HOLD THESE SO CALLED ‘EXPERTS’ ACCOUNTABLE, for perhaps eventually they will in due time find their own epiphany, of sorts ~ like say…….PUBLIC SCORN AND HUMILIATION…normally does the trick… and perhaps ~ someday, like myself,
they will be able to research without someone double-checking their stats. I suggest this whole debacle being quite cathartic…reminding me of my school days…when one of the only weirdos that graduated one year after me was George Stephanopolis… yeah, admittedly ~ sadly, we didn’t have the highest of standards… alas, it’s now up to you to hold them accountable, and I KNOW you can do it. I’ll agree to cheer lead and pray from the sidelines as you deftly eliminate this ‘smoke on the water’ or the skies that they’re ‘blowin’ in the wind’…
Cynthia Lauren
(ex-cheat and junior trouble-maker at large)

AusieDan
November 10, 2010 6:33 pm

I feel several of you correspondents have got it all wrong when you suggest that Gavin should give up blogging and concentrate on his day job.
I feel it would be better if he concentrated full time on blogging, where his heart obviously lies.
He should give up his day job.
Then someone more qualified could take a strong hand and improve the reliability of what is a most important, if not vital, resource.
Gavin’s blog, with all its susceptibility to human error, is not quite so important.
He can be trusted to keep writing his blog.

George E. Smith
November 10, 2010 7:01 pm

“””” Mike says:
November 10, 2010 at 12:39 pm
The average value of a continuous function f(t) over an interval [a,b] is the integral from a to b of f(t) divided by (b-a). It is not equal to the average of its maximum value and its minimum value. For example (I just covered this in class today) the average value of sin(t) over [0,pi] is 2/pi, which is about 0.6366. The max is one and the min is 0, which average to 0.5. The shape of the function matters.
Here is a simpler discrete example. Suppose f(1) = 1, f(2) = 1, f(3) = 1 and f(4) = 9. The average is (1+1+1+9)/4 = 3, not (1+9)/2 = 5.
I do not know what method is used by BOM. I’m just illustrating the basic math. “””””
Well Mike, what you say is true; and I have raised this issue so many times WRT the daily average temperatures reported for the GISS and other networks.
They DO in fact simply average the daily max and min Temperatures and report that single number for that day and that location.
And if the actual daily Temperature cycle did follow a sinusoidal function, then (max+min)/2 would give the correct average; and it also just conforms to the Nyquist sampling theorem, since the signal would be a band limited signal with a 1/24 hour signal bandwidth, so two equally spaced samples suffices to obtain the average.
However the appearance of any time asymmetry, f(t) not equal to f((T/2)-t), would imply the presence of at least a second harmonic component, raising the band limit to 1/12 hours. In this case Nyquist is violated by a factor of two if you do min/max averaging; and with a factor of two undersampling, the aliasing noise spectrum folds all the way back to zero frequency, so the average is no longer recoverable.
And perish the thought that clouds would result in an even more complex daily temperature cycle.
Incidentally, I am sure that there is no natural physical process that would follow a half-sinusoid cyclic behaviour like a rectified sine wave; so as big a discrepancy as 0.5 to 0.636 (2/pi) probably doesn’t arise in practice.
But climatists seem totally oblivious to the laws for sampled data systems; and the Nyquist theorem. The global Temperature sampling regimen doesn’t follow the rules for either the time variable, or the spatial variable; where Nyquist is violated by orders of magnitude.
So much for knowing the average Temperature of this planet.

Doug Proctor
November 10, 2010 7:06 pm

Steve Mosher & Graeme W.:
Are you saying that Ed’s article is BS? That his “errors” are misinterpretations? What’s the bottom line, fellas?
You being PC or cherrypick complaining?

Truth Seeker
November 10, 2010 7:12 pm

NO Accuracy Applied

Rob Z
November 10, 2010 7:21 pm

In the spot check using 1996 data, it seems that all the data is wrong on the average. This would suggest that there is more data than just the max and the min that go into the average calculation. But this is where the alarmists/function people fall on their face. The stations don’t record more than the max and the min, or shouldn’t, because there would most likely be a bias. This shows up in that the station always gets checked at noon over lunch and might add a warming bias. If a “function” of more sampling than that occurred, it’s logical to assume that some averages would be lower and some would be higher than the average of the max/min due to seasonal changes.
Also, if you look at the data in Obs. 3, the tabulated max/min/average data for 1991 suggests that only the max and min are required to get the 1991 average.

Graeme W
November 10, 2010 8:42 pm

Doug Proctor says:
November 10, 2010 at 7:06 pm
Steve Mosher & Graeme W.:
Are you saying that Ed’s article is BS? That his “errors” are misinterpretations? What’s the bottom line, fellas?
You being PC or cherrypick complaining?

All I’ve done is to try to confirm the values he listed as being the original BOM data. I was surprised to find that the values I looked at didn’t match what’s on the BOM website.
Given what I found, the section of his article that reports “Gross” data errors between GHCN maximum data and BOM maximum data is just plain wrong.
I didn’t check the minimums to see if they agreed or were also wrong. The BOM website doesn’t have a mean temperature, so no direct comparison there is possible – you have to calculate the ‘BOM mean’ yourself, with the issues that others have already raised (do you average the mean max/min, or do you average the daily max/min, and then take the mean of all those daily averages?)
I haven’t checked any of the other sections of his article. I only checked that one because the errors were so large, and I wanted to provide the links to the raw data so everyone could see the error for themselves. As it turns out, I found that the raw data between BOM and GHCN agrees for March 2005, which is a direct contradiction to what is stated in the article above.
I just had the thought to see if the 33.6 figure appears in the BOM data for March of a different year, and it does – 2007. Indeed, all of the data he listed for BOM 2005 maximum temperatures are, in fact, the 2007 figures according to the BOM site.
I believe that the most likely reason for the errors is simply transcription. The BOM website doesn’t provide a contiguous table from 2010, but has heading breaks after 1972 and 1997. I suspect that Ed copied the tables into a spreadsheet, and had a copy error and shifted the later data by two years. He was thus comparing the 2005 GHCN data with the 2007 BOM data… which, not surprisingly, didn’t agree.
The problem is not an average calculation problem. The problem is that he’s accidentally compared the wrong years.

Graeme W
November 10, 2010 9:49 pm

Rob Z says:
November 10, 2010 at 7:21 pm
In the spot check using 1996 data, it seems that all the data is wrong on the average. This would suggest that there is more data than just the max and the min that go into the average calculation. But this is where the alarmists/function people fall on their face. The stations don’t record more than the max and the min, or shouldn’t, because there would most likely be a bias. This shows up in that the station always gets checked at noon over lunch and might add a warming bias. If a “function” of more sampling than that occurred, it’s logical to assume that some averages would be lower and some would be higher than the average of the max/min due to seasonal changes.

My understanding is that the BOM takes automatic half-hourly temperature measurements, not just twice a day.
I believe previously they used thermometers that allowed them to record the maximum/minimum in the previous 24 hours, not just two measurements at particular times. I have a vague recollection of seeing such a thermometer (at least for the maximum). It had something that was pushed up by the mercury as the temperature rose, but that something didn’t fall back if the temperature dropped. That meant it showed the maximum temperature since it was last reset. It was reset on a daily basis (presumably when the readings were taken) so it would show the maximum temperature over the full 24 hours.
That sort of technology means that there’s no significant TOBS bias. The only time there would be the chance of a TOBS bias is if there was a significant temperature change just around the time of observation (e.g. if the temperature is dropping significantly at 8:30am, just as the observation is taking place, which may result in a bias in the minimum temperature being recorded). However, that should be rare, and thus there should not be any significant TOBS bias in the data.

Steven Mosher
November 10, 2010 10:20 pm

Doug Proctor says:
November 10, 2010 at 7:06 pm
Steve Mosher & Graeme W.:
Are you saying that Ed’s article is BS? That his “errors” are misinterpretations? What’s the bottom line, fellas?
You being PC or cherrypick complaining?
###########
1. I am holding everyone to the same standard. I would need to see the code to understand how he did things.
2. Three years ago I did a similar exercise with GHCN because I was concerned about the rounding NOAA was using. I looked at US stations and found nothing like Ed found. That’s just an observation.
3. I spent about 3 months looking at some of the ins and outs of GHCN and made numerous mistakes before I found out that the way they did things didn’t cause the problem I thought there was. So I never rule out auditor error.
I have no opinion on whether he made an error or GHCN did. I know my past work hasn’t found such errors. So I’d have to see the code.

Steven Mosher
November 10, 2010 10:30 pm

Graeme W says:
November 10, 2010 at 5:53 pm
Actually, Steven, I wasn’t even trying to replicate what Ed had done. I was simply doublechecking his figures that he said were the BOM figures. I quickly found out that they weren’t. That Mar 2005 maximum temperature of 33.6 isn’t what’s listed on the BOM website, and I can’t see any way that it could be derived from the raw daily data for Mar 2005.
If he’s used the wrong data from BOM, then all the BOM to GHCN comparison’s he’s listed are meaningless.
############
as I noted above, I had done a similar exercise 3 years ago with US stations and found nothing like what Ed has found. There is also a problem with simply choosing Dup 0, but I didn’t want to get into that.
A bunch of us started a discussion of the duplicate problem a while back, but I’m not going to get into that here. It also kinda looks like spreadsheet work, which involves manual steps that can really screw things up (personal experience).

Steven Mosher
November 10, 2010 10:41 pm

George E. Smith says:
November 10, 2010 at 7:01 pm
“””” Mike says:
November 10, 2010 at 12:39 pm
The average value of a continuous function f(t) over an interval [a,b] is the integral from a to b of f(t) divided by (b-a). It is not equal to the average of its maximum value and its minimum value. For example (I just covered this in class today) the average value of sin(t) over [0,pi] is 2/pi, which is about 0.6366. The max is one and the min is 0, which average to 0.5. The shape of the function matters.
#######
while technically true, it’s really beside the point. The average obtained by sampling min/max is just an estimate of the number you would get by integrating.
It’s not intended to represent the true average, but rather to estimate it.
The question is: does this method give you an unbiased estimator? And does it give you an unbiased estimator over time?
This issue bothered me as well, so I just looked at real data: temperature data that had been taken at short intervals, and compared the answers you get looking at it both ways. Now it’s trivially true that the answers are different. The question is: is there a high or low bias? AND does that bias shift over time? Well, little did I know that I was redoing work that had been done before. It’s not a biased estimator and it doesn’t change over time.
If you like, go download CRN data (5 minute intervals) and see for yourself, or look at Jerry B’s work over on John Daly’s old site: 190 stations, sampled every hour, with the average calculated two different ways (min/max) and integrated. We call the “integrated approach” Tmean and the min/max approach Tave.
Simply, “average” doesn’t mean mean; it’s an unbiased ESTIMATOR of the mean.

Jeff Alberts
November 10, 2010 10:47 pm

No one can work out a hard science problem by watching TV. Can’t the man read?

If he’s from New Jersey, probably not. 😉

Jeff Alberts
November 10, 2010 10:55 pm

He should give up his day job.

I believe for the last year or so blogging is and has been Gavin’s day job. He spends so much time defending the indefensible he can’t possibly have time for actual work.

Chris Gillham
November 10, 2010 10:55 pm

NOAA is still putting 999.9 error codes into the GHCN database, which is then being used by GISS, even though BoM data is available for the relevant month.
The wonderful Jo Nova has allowed me to make some points at:
http://joannenova.com.au/2010/10/bom-giss-have-record-setting-bugs-affecting-a-million-square-miles/
To summarise, I point you to one of the Western Australia locations with BoM data “missing” from GHCN and GISS – the goldfields town of Kalgoorlie-Boulder. Check the GISS database:
http://www.waclimate.net/501946370000.txt
Note the 999.9 error for September 2009 down the bottom of the database. Now check the BoM record for September 2009:
http://www.bom.gov.au/climate/dwo/200909/html/IDCJDW6061.200909.shtml
The mean temp for Sep 2009 was 13.9 C. That means the Spring (S-O-N) 2009 average for Kalgoorlie-Boulder was 19.4 C.
GISS calculates it as 20.6 C. That’s 1.2 C above reality.
In turn, this means the annual average was 19 C, not 19.38 C as calculated by GISS. All 30 days in Sep 2009 were recorded by the BoM for Kalgoorlie-Boulder, so the monthly mean is valid.
Incidentally, I noticed and calculated the Kalgoorlie-Boulder error for September 2009 on Oct 4, 2010. I came back to check my numbers the day after, Oct 5, and found the mean for every month in 2009 had been shifted up overnight by .1 C, so the Spring and annual averages also shifted up .1 C. I don’t know if every month in the entire database back to 1939 was adjusted up by .1 C because I hadn’t paid any attention to them on Oct 4. More than nine months after 2009, it’s difficult to understand why every month last year needed an upward adjustment for this particular recording location.
So BoM has the data; for some reason it isn’t included by NOAA in the GHCN, and for some reason the error code is passed by NOAA to GISS, which then substitutes it with a mystery temperature to overcome the problem, but lifts the seasonal mean by 1.2 C above what should have been received from BoM in the first place.
Does anybody at BoM, NOAA or GISS check their figures? Over the past year, hasn’t anybody noticed or wondered why a month is “missing” for a major country town (shire population near 30,000)?
More at http://www.waclimate.net/giss-adjustments.html
And since I’m talking about Kalgoorlie-Boulder, check my graph of all temperature records for the town back to 1897, comparing historic BoM raw, BoM HQ adjusted, GISS and HadCRUT 3.
http://www.waclimate.net/giss-bom-kalgoorlie.html
Notice how much warmer the raw data is (blue line) compared to the HQ adjusted data (yellow line) in the first half of the 20th century?

JamesS
November 10, 2010 11:04 pm

What caught my eye first off is that, for some reason, the raw temperatures are stored in the database as integers with an implied one-decimal-place scaling: 273 becomes 27.3, 315 becomes 31.5, etc.
I’ve been a database developer for 25 years now, and I’m stumped to think of a reason for those temperatures to be stored in that manner. By using that transformation, someone somewhere has to keep the metadata to let future generations know that’s 31.5 and not 3.15 or even 315 degrees. Were they trying to save space by using integers instead of floating-point reals? /sarcasm/
Looking at how they came up with means where there were missing max or min values makes me wonder if their algorithm was not clearing out the accumulator when a “-9999” value was found. Without seeing the code there’s little chance of figuring out their process, but you can sure tell that something’s wrong somewhere!
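For reference, the scaling JamesS describes is trivial to undo; a sketch (the function name is made up, and -9999 is the missing-value sentinel noted in the article):

```python
# Decode a GHCN v2 integer temperature (tenths of a degree C).
def decode_tenths(raw):
    """315 -> 31.5; the -9999 sentinel decodes to None."""
    return None if raw == -9999 else raw / 10.0

print(decode_tenths(315))     # 31.5
print(decode_tenths(-9999))   # None
```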

Cynthia Lauren Thorpe
November 10, 2010 11:39 pm

I’ve read EVERY COMMENT thus far. I’m TRYING TO understand what you guys
are saying ~ mostly, though, I’m crossing my eyes, getting more coffee, and asking why you can’t simply ASK THEM FOR THE ‘CODE’ that will ~ Am I correct in this? ~ unlock the mysteries of their numbers that are lacking the floating decimals that prohibit anyone from coming back to their work (like future warmist prodigy?) to see watts up? (please forgive the pun)
Okay, okay ~ I’ll jus’ shut up and sip my coffee quietly. You guys are better than my favorite Soap in the 70’s… Dark Shadows… regardless, I believe you will eventually uncover their (door #1, 2, or 3): Error/Momentary (monetary?) Indescretion/oooor ~ Fraud. Regardless, I’m ‘rapt’ as you blokes say. God continue to bless Australia!
C.L. Thorpe

Geoff Sherrington
November 10, 2010 11:49 pm

By way of cross checking, I used a BOM version of temperature data from Port Hedland airport 04032 that was available before March 2007.
I selected Year 1966, daily observations of Tmax and Tmin, and made the Excel spreadsheet shown at
http://www.geoffstuff.com/Monthly%20from%20CD.xls
The conclusion is that the Tmean for 1966 is about 1 degree C higher from the BOM version than is shown above in the Ed Thurstan version. I used his data from his table following the line “As a spot check, the raw data from GHCN V2 for station 94312 in 1996 is as follows:”
This must be about post number 10 in 3 years where I have pointed to difficulties in discovering which Australian versions go from BOM to whom and when, then what more is done and why.
A one deg C difference in a year is enough to hide a decline.

Mark
November 11, 2010 12:37 am

PJP says:
November 10, 2010 at 1:42 pm
Government engineers … spend millions on a spacecraft, then find out after it has been launched that half the plans were metric and the other half imperial.
No doubt they also measure temperature in a mixture of Celsius and Fahrenheit; wind speed in a mixture of knots and feet per second and rainfall in a mixture of mm and inches 🙂

LabMunkey
November 11, 2010 1:05 am

“In my profession, errors of this sort would cause the whole dataset to be rejected. I am astonished that the much vaunted NOAA quality control procedures did not pick up such gross errors.”
Mine too. On top of that, had I been responsible for presenting/collecting it, I’d lose my job.

gnarf
November 11, 2010 1:33 am

In France, Courtillot (the main sceptical climatologist there) said something very true:
“from a thermodynamic point of view, averaging temperatures has no meaning. If you have two rooms, same volume, one is at 20 degrees C, the other is at 10 degrees C, and you open the door between the rooms and let the temperature stabilize, the final temperature will not be 15 degrees C”
Temperatures do not sum or average…but energy does. So if first you switch from temperature to radiated energy assuming the part of the earth from which you measured temp. is like a black body, then you can average these radiated energies, and come back to average temperature.

Lawrie Ayres
November 11, 2010 1:44 am

CL Thorpe,
Like you I need simple answers to very simple questions. E.g. Since the world’s future hangs in the balance and squillions of dollars ride on the outcome why don’t we have a proper, real high quality set of recording stations. Stations that are really compliant to the requirements of the 100 foot rule. Then we wouldn’t have these interminable discussions about the need for and methodology of adjustments.
I’m still having difficulty with a heavier than air gas floating high in the atmosphere rather than close to the surface.

E.M.Smith
Editor
November 11, 2010 2:19 am

FWIW, I think the method used by NOAA is an ‘average of the daily means’, where what’s done here is to compare with the ‘monthly min / monthly max average’, and so the two will diverge. (If I followed the article correctly at this late hour…)
Also, FWIW, you must know the “vintage” of the data you are comparing, at least down to the month. GHCN has Zombie and Lazarus thermometers whose data show up at strange and wondrous times. So you can have large gaps that are then suddenly filled days, weeks, months, and even years later. And sometimes the data never shows up.
So unless you have a specific vintage that is the same in both sets, you may just be measuring the time instability of the data.

Ammonite
November 11, 2010 3:14 am

JamesS says: November 10, 2010 at 11:04 pm
I’ve been a database developer for 25 years now, and I’m stumped to think of a reason for those temperatures to be stored in that manner.
And data is persistent, so as soon as code is written to the format the problem becomes entrenched. I hear the US still uses imperial…

Cynthia Lauren Thorpe
November 11, 2010 3:33 am

Absolutely, Lawrie.
If enough of these ‘anomalies’ are world-wide and so…..so……basically inept ~ and that’s what we’re pinning the ‘hope of the world’ upon……as the story goes……. the ONLY thing that makes sense here, is that ‘they’ really own egos that blind them to the fact that truly knowledgeable people will eventually hold their ‘science’ to account.
Sorry if when I ‘sound off’ I seem to be a cynic. I’m truly not. It’s just that ol’ axiom ~
fool me once, etc… I essentially have great faith in fellow humans ~ but, basically only when their hearts have been humbled in some manner. God did that to me twenty years ago. While the process did hurt a bit, ego-wise… I heartily recommend humility to all. ‘Cause with God’s Wisdom an’ our ‘guts’ to stand as we should… this world is a much more congenial place for everyone ~ regardless the temperature.
Cynthia Lauren

amicus curiae
November 11, 2010 7:42 am

James Sexton says:
November 10, 2010 at 12:26 pm
Wouldn’t it be nice, if just once, someone in the field would engage in self audits to a point where they can find and admit and correct mistakes?
========
now why? do something that sensible and honest??
after all, the TRUTH won’t scare anyone into submission for an ETS Carbon Tax.
As an Aussie I am purely disgusted with BoM, CSIRO and our govt and the ABC.
all in cahoots- and all an epic FAIL!

dixon
November 11, 2010 7:48 am

Thanks steven mosher for explaining about the average temp being an unbiased estimate of the mean. I’d been fretting about that when the daily cycle of hourly data from Perth, WA is so unsymmetrical. But I confess to being too lazy/inept to figure out how such a major flaw could have survived (I like to assume some degree of competence on both sides).

amicus curiae
November 11, 2010 7:57 am

Enneagram says:
November 10, 2010 at 2:39 pm
Evidently Post Modern Science is more lost than Adam on Mother’s Day 🙂
===
now that!! is funny!
the BoM…is NOT!

November 11, 2010 8:22 am

Chris Gillham says:
November 10, 2010 at 10:55 pm
NOAA is still putting 999.9 error codes into the GHCN database, which is then being used by GISS, even though BoM data is available for the relevant month.
########
One thing that would be helpful is to put the problem into perspective for people.
If you write a program, you can count the number of times the BOM has data that NOAA does not pick up. Then you can do a summary and show rather easily, in one number, the impact this has on the total record for Australia.

Steven Mosher
November 11, 2010 1:28 pm

dixon says:
November 11, 2010 at 7:48 am
Thanks steven mosher for explaining about the average temp being an unbiased estimate of the mean. I’d been fretting about that when the daily cycle of hourly data from Perth, WA is so unsymmetrical. But I confess to being too lazy/inept to figure out how such a major flaw could have survived (I like to assume some degree of competence on both sides).
########
when I first started looking at temperatures ( back in 2007 ) these are the things that immediately got my attention and which i set to work on trying to understand by looking at the data and running tests for myself.
1. The accuracy issue. How can we measure something to 1/10ths when the instrument is not good to 1/10th? That was easy: the law of large numbers.
2. The rounding issue. For the US we do this: we take the min in F (round it), we take the max in F (round it), we average the two (round it), we then take a monthly average and report out to many decimal places. By creating series of random numbers where I knew the GROUND TRUTH, I could then simulate this process and show that rounding didn’t bias the answer.
3. The min/max problem. Since I have worked in an industry built on Shannon and Nyquist, the idea that you could get the “average” by sampling the min and the max made no sense whatsoever. But after spending time with the real problem I understood what they were doing. From a historical perspective we have several different forms of temperature collection:
A. min/max for the day.
B. 4-6 measures per day
C. hourly measures.
D. sub-hourly measures (today we have over 7 years of temperature every 5 minutes
from 3 different sensors at the same location – see the CRN network)
IF the problem you want to solve is figuring out whether or not the temperature has gone up, you are stuck with the historical min/max approach. That’s what they did in 1900; we have no time machine. So that is the metric you have. A physicist can look at this and say “if you integrate the function you will get a different answer than if you sample it twice”. Well, that’s not the question. The questions are:
A. If you want to compare today to yesterday, should you use the same method?
B. Does min/max bias high, bias low, or does it give a random error?
C. Does the trend derived from min/max differ from the trend derived from integrating the function? And is that difference biased high or low?
Once you realize that, you can easily find the solution. You could prove it mathematically (I suppose), but I just did it experimentally with real data and with simulated data. Again, you create a temperature series that has the statistical properties of the typical series (down to the second if you like), you then apply a random distribution of trend components. You calculate ground truth. Then you test your sampling approach and only sample min/max. You then compute trends and see whether the trend bias is zero or not. And in the end you find out that it just doesn’t matter. So physically the integrated mean does not equal the average of min and max. But the average of min/max ESTIMATES the mean, and that estimate is unbiased. That’s the best we can do. The perfect should not be the enemy of the good.
4. The station drop problem. Mathematically it does not matter. But again I proved it doesn’t matter by several methods of looking at the data.
There are a bunch of other problems that might matter. But getting people to pay attention to those is hard. Anthony gets these problems. SteveMc gets these problems. People writing papers today (in submission) get these problems, but most skeptics are missing the key issues and focusing on other issues that don’t matter.
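The experiment Mosher describes can be sketched in a few lines. Everything below (diurnal shape, noise levels, trend) is made-up illustrative data, not his actual test:

```python
# Build a synthetic hourly temperature series with a known ("ground truth")
# trend, then compare the trend recovered from the min/max average (Tave)
# against the trend from the integrated daily mean (Tmean).
import numpy as np

rng = np.random.default_rng(42)
days = 365 * 30
trend_per_day = 0.5 / 3650.0                 # ground truth: 0.5 C per decade

hours = np.arange(24)
# An asymmetric (non-sinusoidal) diurnal shape, so Tave and Tmean differ.
diurnal = 5.0 * np.sin(2 * np.pi * (hours - 8) / 24) \
        + 1.5 * np.sin(4 * np.pi * hours / 24)

base  = 15.0 + trend_per_day * np.arange(days) + rng.normal(0, 2.0, days)
temps = base[:, None] + diurnal[None, :] + rng.normal(0, 0.5, (days, 24))

t_ave  = (temps.max(axis=1) + temps.min(axis=1)) / 2    # min/max estimator
t_mean = temps.mean(axis=1)                             # integrated mean

x = np.arange(days)
print("Tave  trend:", np.polyfit(x, t_ave, 1)[0] * 3650, "C/decade")
print("Tmean trend:", np.polyfit(x, t_mean, 1)[0] * 3650, "C/decade")
# The two estimators sit at different LEVELS, but with a stable diurnal
# shape they recover essentially the same trend -- Mosher's point.
```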

Graeme W
November 11, 2010 1:35 pm

JamesS says:
November 10, 2010 at 11:04 pm
What caught my eye first off is that for some reason, the raw temperatures are stored in the database as integers, with what I presume is a one significant digit decimal. 273 becomes 27.3, 315 becomes 31.5, etc.
I’ve been a database developer for 25 years now, and I’m stumped to think of a reason for those temperatures to be stored in that manner….

On the other hand, I’ve also been working with databases for 25 years now, and I can think of a couple of reasons. The first and simplest is disk space. If you go back 25 years, disk space was much more expensive than it is today. Programmers would try to minimise the amount of disk space large datasets would take up, and integers would commonly take up half the disk space of a float. Given that almost all the data in the dataset is numbers, that means storing the number data as integers would have resulted in a significant reduction in disk space required, with a considerable saving in money for the project at the time.
The other reason I could think of was processing time. Again, with lots of number crunching being potentially required, integer arithmetic was much faster than floating point arithmetic. Having the data as integers would have offered a decrease in processing time. Whether that was significant would be dependent on what power machines they had.
Neither reason is particularly valid today, but I remember in my early days in IT of always running out of disk space on the machines we had available to us (we weren’t a large organisation with almost unlimited resources available to us), so a trick like this to reduce the amount of disk space needed would have been seriously considered.

dixon says:
November 11, 2010 at 7:48 am
Thanks steven mosher for explaining about the average temp being an unbiased estimate of the mean. I’d been fretting about that when the daily cycle of hourly data from Perth, WA is so unsymmetrical. But I confess to being too lazy/inept to figure out how such a major flaw could have survived (I like to assume some degree of competence on both sides).

When you only have two figures (max and min), their mean (commonly called the average) is about the best calculation you can get, but, as you point out, it doesn't allow for asymmetrical temperature patterns (especially if they're not consistent). Given that BOM takes, I believe, half-hourly readings, much better approaches are possible.
Personally, I would have thought that the median of the half-hourly readings would be a better judge of the 'average' temperature for a day than a mean. That is, if all the readings were sorted into increasing order, the one in the middle of that sort would be the median. That would eliminate any effects from temporary temperature spikes; see the sketch below.
Alas, we don't have historical half-hourly readings going back far enough, and you can't mix medians and means in any fair comparison, so we'll just have to stick with the mean, and the lowest-common-denominator approach of simply averaging the max and min for the day.
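Here is a minimal sketch of that comparison, with invented half-hourly readings (the historical series doesn't exist, so the numbers are purely illustrative):

import statistics

# 48 invented half-hourly readings for one day, including a brief
# afternoon spike. The spike barely moves the median, nudges the mean,
# and drags (max+min)/2 well above both.
readings = [18.0 + 0.2 * i for i in range(24)]       # morning warming
readings += [24.0, 31.5, 24.2]                       # short spike
readings += [23.0 - 0.3 * i for i in range(21)]      # evening cooling

print(f"median      {statistics.median(readings):.2f}")
print(f"mean        {statistics.mean(readings):.2f}")
print(f"(max+min)/2 {(max(readings) + min(readings)) / 2:.2f}")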

Tim Folkerts
November 11, 2010 1:57 pm

There seem to be legitimate concerns about the data presented here. As far as I am concerned, until the original author comes back to address these issues, it is not even worth worrying about this whole blog entry.
In effect, a peer review has been done and significant questions have been raised. If the author doesn't address these issues, why should the rest of us have more interest in it than the author does?
Unfortunately, the unreviewed version has already been "published", and casual readers will get no more than the '"Gross" Data Errors in GHCN V2. for Australia' headline. If the author cannot address the questions in a short period of time, then a retraction would be in order; perhaps 'Gross Data Errors spotted in "Gross" Data Errors in GHCN V2. for Australia'.

Ed Thurstan
November 11, 2010 3:41 pm

Mike
To equate to the daily temperature cycle, I think your integral should have been over [0, 2π]. But no matter; I agree that an integrated temperature is more satisfying to an engineer than a min/max average. But only AMO stations give this, and that would limit the data supply to maybe 30 years.
I invested in some of Anthony's excellent temperature dataloggers to investigate this. I am doing a very simple rectangular integration on hourly data to compare it with the min/max average (see the sketch below). I don't have summer data yet, but so far I am surprised: plotting integrated temperature against the min/max average, I expected to see the slope of the correlation change between seasons, but I can't see it yet. But I am at 33 South, so I checked hourly data from Canada, from 85 North. Again, not much difference between summer and winter. I plan to do some more work.
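The sketch: rectangular integration of hourly readings is just the arithmetic mean of the samples, regressed against the min/max average. Synthetic hourly data stands in for the logger files here, so treat the output as illustrative only:

import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(24)

# Hypothetical hourly logger readings: one row of 24 samples per day.
hourly = (20.0
          + 6.0 * np.sin(2 * np.pi * (hours - 9) / 24)[None, :]
          + rng.normal(0.0, 0.8, (365, 24)))

integrated = hourly.mean(axis=1)        # rectangular (Riemann) integration
minmax = (hourly.max(axis=1) + hourly.min(axis=1)) / 2

# Regress integrated mean on min/max average; repeating this fit per
# season and comparing the slopes is the seasonal test described above.
slope, intercept = np.polyfit(minmax, integrated, 1)
print(f"slope {slope:.3f}, intercept {intercept:.2f}")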
RuhRoh
I don't agree that "application of 'judgement'" has a place in any science that relies on recorded numerical data. It is little different from science by consensus.
Rational Debate
The one-way rounding of halves does create a tiny bias, but otherwise you are correct. The errors are simply indicative of sloppiness, lack of thought and inadequate quality control of the output of GHCN's principal product, a mean temperature. Would you accept the same sloppiness in a financial institution's calculation of your mortgage repayment? (A sketch of the size of that bias follows.)
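For illustration, here is how always rounding halves up compares with round-half-to-even on means that land exactly on a half; values are in tenths of a degree and purely illustrative:

from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Means, in tenths of a degree, that land exactly on a half.
halves = [Decimal(n) + Decimal("0.5") for n in range(100, 110)]

up = [d.quantize(Decimal("1"), rounding=ROUND_HALF_UP) for d in halves]
even = [d.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) for d in halves]

# Always rounding up adds half a tenth on average; round-half-to-even
# (banker's rounding) cancels out across many values.
print("half-up bias:  ", sum(u - d for u, d in zip(up, halves)) / len(halves))
print("half-even bias:", sum(e - d for e, d in zip(even, halves)) / len(halves))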
MikeA, Nick Stokes, David Evans
You may well be right, but you are surmising. You don't know, and nor do I, and NOAA does not tell us their method. Just how many days are you prepared to miss in a month and still believe the monthly average of the remaining days? How about we reject any suspiciously low temperatures, especially recent ones? Can we apply this short-month principle to annual data, so that we can report an annual temperature when a couple of months of data are missing? To paraphrase a well-known person: NO WE CAN'T!
NOAA simply adjust data they consider suspicious as received from BOM, and delete some. I cannot find out who calculates V2.Mean. NOAA? Or was it supplied by the Aust. BOM? But I am a simple engineer, and when someone puts up two numbers and their mean, then unless it is qualified, I expect that mean to be calculated the way it always has been.
I once worked for a Govt. research establishment. If the objective of an experiment had been to calculate a daily average based on hourly recordings and I missed one, the whole day's work was lost. If I had gone to my section leader and said "Boss, I missed a reading while on a coffee break, but it does not matter; if you want, I can figure out what it should have been from the surrounding data", I would have been booted out of that organisation.
Scott
Data appears on that site daily. The Aust. BOM have no basis on which to adjust it at that frequency. It matches (with rare exceptions) the GHCN raw data, which is purported to be raw as received from the supplying country (at least for Australia). The BOM Daily High Quality data IS adjusted.
Graeme W
I am grateful to you for spotting that error, and then going further to identify what I had done wrong.
2005 and 2007 both exhibited errors. I chose 2005 from GHCN, but unfortunately chose 2007 BOM data, then calculated the differences. The 1991 and 1995 comparisons are correct.
The offending data for 2005 should read:
BOM MAX     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005       39.4   38.2   38.4   38.2   33.3   26.4   27.9   28.6   31.4   33.6   37.2   36.3
BOM MIN     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005       26.7   27.3   26.4   23.7   18.5   14.7   14.0   13.8   15.6   18.0   20.8   25.0
BOM MEAN    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2005      33.05  32.75  32.40  30.95  25.90  20.55  20.95  21.20  23.50  25.80  29.00  30.65
DIFFERENCES
MAX 2005    0.0    0.0    0.0      –    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
MIN 2005    0.0    0.0    0.0      –    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
MEAN 2005   1.2    0.9    0.9    1.0    0.8    0.7    0.8    0.7    0.7    0.7    0.7    0.8
I will give Anthony an errata to be appended to my paper. Anyone who wants a copy of the corrected paper may email me at thurstan@bigpond.net.au.
Ed Thurstan

Kev-in-UK
November 11, 2010 4:09 pm

The various comments about means and calculations are indeed interesting, but the most important calculation is the one that has been used by GHCN (or GISS or CRU, etc.). I have a simple question, which I believe has been partly alluded to in other comments, and that is: what data do they have, and what calculations do they use to arrive at the so-called 'mean'?
As someone who, as a schoolboy, took part in the daily met observations which were supposedly submitted to the Met Office in the '70s, I remember that we had to take the obs at a certain time each day (9am, as I vaguely recall), and I remember the max/min thermometer. From that recollection, and knowing that more modern equipment takes continuous readings, how can anyone reconcile older observations with modern ones? For certain, the old max/min thermometer just gave the 'actual' max/min values, and (ignoring any device errors) these were absolute values. I presume (but don't know) that modern electronic devices simply store/record the max/min the same way the old manual method did; if so, comparison with older records should at least be reasonable. But what if the new electronic devices (or the software that collects the readings) already do some kind of temperature averaging over a 24-hour period (i.e. by totting up all the readings and dividing by the number of readings that day)?
Someone mentioned half-hourly readings, for example. Could those miss a real max/min value? And, more importantly, are we comparing chalk and cheese when we set old data against modern data?

Graeme W
November 11, 2010 6:57 pm

Okay, based on the new data, I can confirm that there is a problem with the calculation of the mean in the GHCN data file.
It doesn't matter whether you take the mean of the daily (max+min)/2 values or take the monthly mean max and mean min and average them: the two calculations give exactly the same number (an exercise for the mathematicians if you're bored; the proof isn't difficult, I just don't know how to do the subscripting required to put it here). Either way, the numbers don't add up.
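For the record, the identity is one line. Writing T_d^max and T_d^min for day d of an N-day month, and assuming the same N days enter every sum, in LaTeX:

\frac{1}{N}\sum_{d=1}^{N}\frac{T_d^{\max}+T_d^{\min}}{2}
  = \frac{1}{2}\left(\frac{1}{N}\sum_{d=1}^{N}T_d^{\max}
    + \frac{1}{N}\sum_{d=1}^{N}T_d^{\min}\right)
  = \frac{\overline{T^{\max}}+\overline{T^{\min}}}{2}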
I took the Jan 2005 figures from the BOM site for Port Hedland (one of the links I gave earlier gives the daily data for 2005), and from that calculated the mean max, mean min and the average of the two. The final answer doesn’t agree with the V2.mean data from GHCN.
My result was 33.06 (to two decimal places) because I didn’t do any rounding until the end, but that’s a long way from the 31.9 that the GHCN data reports. I can’t see how they got their figure (the BOM site doesn’t report mean daily temperatures, only max/min temperatures).
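For anyone repeating the exercise, the calculation is just this (a sketch; jan_max and jan_min are hypothetical names for the 31 daily BOM values, which are not reproduced here):

def monthly_mean(daily_max, daily_min):
    """Mean of the daily (max+min)/2 values. By the identity above, this
    equals the average of the monthly mean max and monthly mean min,
    provided the same days appear in both lists."""
    assert len(daily_max) == len(daily_min)
    mids = [(hi + lo) / 2 for hi, lo in zip(daily_max, daily_min)]
    return sum(mids) / len(mids)

# With the 31 daily Port Hedland values for Jan 2005 this comes out at
# 33.06 (two decimal places), against the 31.9 reported in v2.mean:
# print(round(monthly_mean(jan_max, jan_min), 2))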
Could someone who is more knowledgeable about GHCN answer the question, or email an appropriate person at NCDC to ask? Everywhere I've looked, people have assumed that the v2.mean file is the 'raw' mean temperature, but whatever it is, for Jan 2005 for Port Hedland it's not the 'mean' as per the standard mathematical calculation.
[REPLY – So far as I know, v.2 is not raw, but the “new” adjusted number for GHCN (USHCN went to v.2 about a year back). ~ Evan]

Tim Folkerts
November 11, 2010 9:13 pm

Ed,
I appreciate your efforts to correct the errors. It would be wonderful if all such issues were addressed as rapidly. I haven't had time to look at the specific changes to see exactly how they affect your original post or conclusions, but I'm sure other readers will continue to examine your results.
Tim

Steven Mosher
November 12, 2010 12:07 am

Kev-in-UK
Read the observers' handbook. We've discussed it many times in the past three years.

Steven Mosher
November 12, 2010 2:47 am

If you want to understand why the GHCN numbers don't match, I think you need to understand this:
“The reason why GHCN mean temperature data have duplicates while
mean maximum and minimum temperature data do not ……
“DUPL : Duplicate number. One digit (0-9). The duplicate order is
based on length of data. Maximum and minimum temperature
files have duplicate numbers but only one time series (because
there is only one way to calculate the mean monthly maximum
temperature). The duplicate numbers in max/min refer back to
the mean temperature duplicate time series created by
(Max+Min)/2.

Very simply: v2.min and v2.max are CALCULATED from all the duplicate records. So when Ed picks a duplicate record (Dup=0) from v2.mean, he is picking one record: the longest record.
When you look at a record in V2Max you don't have duplicate records. You have ONE record. It has a duplicate number, but there are no dups; there is only one record.
That one record has the MAX found across all duplicate records.
For V2Mean, NOAA just records all the duplicate records. If a duplicate record has a min and a max, then a mean is calculated from the min and max of that dup record.
For V2Max, NOAA looks at all the duplicate records and picks the highest value for the max (it could come from dup record 1); likewise the lowest value goes into V2Min (it could come from dup 2).
So V2Min and V2Max do NOT get used to create V2Mean.
So: there is a source file that has all the duplicate records (we don't have access to this).
Three files get created:
V2Mean: reduces all duplicates to means
V2Max: looks at all duplicates and picks the max of all duplicates
V2Min: picks the min.
V2Max and V2Min DON'T get used to create V2Mean. They can't, because they have only one record per station, while V2Mean has the duplicates. A sketch of this pipeline follows.
(Duplicates are not duplicates... they differ. That's a whole 'nother story.)
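To restate that pipeline concretely, here is a sketch of the logic as described above; the record layout is a hypothetical stand-in, not NOAA's actual code or file format:

from collections import defaultdict

def build_v2_files(source):
    """source maps (station, year, month) to a list of duplicate records,
    each a dict like {"dup": 0, "max": 377, "min": 256} in tenths."""
    v2_mean = defaultdict(dict)   # one series kept per duplicate number
    v2_max, v2_min = {}, {}       # one composite value per station-month
    for key, dups in source.items():
        for rec in dups:
            if "max" in rec and "min" in rec:
                # v2.mean keeps every duplicate as its own time series
                v2_mean[key][rec["dup"]] = (rec["max"] + rec["min"]) / 2
        highs = [r["max"] for r in dups if "max" in r]
        lows = [r["min"] for r in dups if "min" in r]
        if highs:
            v2_max[key] = max(highs)   # highest max across ALL duplicates
        if lows:
            v2_min[key] = min(lows)    # lowest min across ALL duplicates
    return v2_mean, v2_max, v2_min

On this reading, a mean recomputed from the composite v2.max and v2.min need not match any single v2.mean duplicate, which would explain a mismatch without any arithmetic error.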

Al Cooper
November 12, 2010 11:27 am

The high temp for a day might persist for one hour and the low temp might persist for several hours. Using just these two for a daily "mean/average" would be very inaccurate and misleading.
I would like to see an RMS (root-mean-square) result of temps taken at one-hour or shorter intervals over a one day/week/month/year period.
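For what it's worth, computing that RMS from logger data is a one-liner; a minimal sketch with invented readings:

import math

def rms(values):
    """Root-mean-square of a sequence of readings."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical two-hourly readings for one day.
temps = [14.0, 13.5, 13.2, 15.0, 18.4, 21.7,
         24.9, 26.3, 25.8, 23.1, 20.0, 17.2]

print(f"RMS {rms(temps):.2f}  mean {sum(temps) / len(temps):.2f}  "
      f"(max+min)/2 {(max(temps) + min(temps)) / 2:.2f}")

One caveat: an RMS of temperatures depends on where the zero of the scale sits, so Celsius and kelvin give different answers, which is one reason plain means are preferred for daily summaries.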

Steven Mosher
November 12, 2010 1:43 pm

Al Cooper says:
November 12, 2010 at 11:27 am
The high temp for a day might persist for one hour and the low temp might persist for several hours. Using just these two for a daily "mean/average" would be very inaccurate and misleading.
I would like to see an RMS (root-mean-square) result of temps taken at one-hour or shorter intervals over a one day/week/month/year period.
#######
Read my comments. You don't understand the terminology being employed. If you would like to understand it, go look at the data I pointed people to and calculate your own numbers.
The min/max is an ESTIMATOR of the "average", an unbiased estimator. It is not meant to capture the average that is the integral of the function; it is meant to ESTIMATE that value. Sometimes it's high, sometimes it's low. You can go look at years of data collected every 5 minutes and see for yourself. Nobody will do your work for you.

Al Cooper
November 12, 2010 2:13 pm

Steven Mosher says:
November 12, 2010 at 1:43 pm
"Read my comments. You don't understand the terminology being employed. If you would like to understand it, go look at the data I pointed people to and calculate your own numbers."
Steve, I am not interested in an ESTIMATOR of temps that are costing me $$$.
I want the ACCURATE truth.

Al Cooper
November 12, 2010 2:32 pm

Steven Mosher says:
November 12, 2010 at 1:43 pm
"Read my comments. You don't understand the terminology being employed. If you would like to understand it, go look at the data I pointed people to and calculate your own numbers."
You are the professed expert; I expected no reply that was not cordial.
You assume I do not follow your math.
You say I don't understand the terminology, but you do not know that.
If this is your best answer to my post, you have lost my respect.

Steven Mosher
November 13, 2010 12:47 am

Al Cooper says:
November 12, 2010 at 2:13 pm
Steven Mosher says:
November 12, 2010 at 1:43 pm
"Read my comments. You don't understand the terminology being employed. If you would like to understand it, go look at the data I pointed people to and calculate your own numbers."
Steve, I am not interested in an ESTIMATOR of temps that are costing me $$$.
I want the ACCURATE truth.
#######
There is no such thing. Every measurement is an estimate. I can explain that, but I cannot make you understand.
In 1850 in Anytown, USA, they looked at a thermometer (a min/max thermometer) once a day. They recorded 10C for the low and 20C for the high. I cannot change that history. In 2010, we have a thermometer in the same place. We record a min temp of 11C and a high temp of 20C. We would estimate that it is warmer now; we would not estimate that it is cooler. We might bemoan the fact that we don't have temperature measurements every nanosecond, but we don't have them, so we have to estimate.
We could throw a tantrum, but that's easy. The tough thing is to do the best you can with the data you have and characterize your uncertainty. There is ALWAYS uncertainty.
If you want accurate truth, try math or logic; they come closer.

Steven Mosher
November 13, 2010 12:48 am

Al:
"I would like to see an RMS (root-mean-square) result of temps taken at one-hour or shorter intervals over a one day/week/month/year period."
Well, what's stopping you? I pointed you at data sources. Get to it.