BoM raw temperature data, GHCN and global averages.

In honor of Google’s latest diversity kerfuffle, I continue with my diversity initiative on WUWT with a guest post by Nick Stokes.~ctm

By Nick Stokes,

There is an often-expressed belief at WUWT that temperature data is manipulated or fabricated by the providers. This persists despite the fact that, for example, the 2015 GWPF investigation went nowhere, and the earlier BEST investigation ended up complementing the main data sources. In this post, I would like to walk through the process whereby, in Australia, the raw station data is immediately posted online, then aggregated by month, submitted via CLIMAT forms to WMO, then transferred to the GHCN monthly unadjusted global dataset. This can then be used directly in computing a global anomaly average. The main providers insert a homogenization step, the merits of which I don’t propose to canvass here. The essential point is that you can compute the average without that step, and the results are little different.

The accusations of data corruption got a workout with the recent kerfuffle over a low temperature reading on a very cold morning at Goulburn, NSW in July, so I’ll start with the Bureau of Meteorology online automatic weather station data. I recently counted a total of 712 such stations, for which data is posted online every half hour, within ten minutes of being measured. You can find the data by state – here is NSW. You can find other states from the bar at the top, under “latest observations”. Here is a map of the stations in NSW in this table:

clip_image002

For context, I have marked in green the stations of Goulburn and Thredbo Top, which had temperatures of below -10C flagged on that very cold morning in July. On that BoM table, you can see stations listed like this (switching now to Victoria):

clip_image004

I switched because I am now following a post from Moyhu here, and I want a GHCN station which I could follow through. But it is the same format for all stations. This data is from 4 December 2016, and I have highlighted in green the min/max data that will flow through (unchanged except for possible quality control flagging) to GHCN unadjusted. It shows for Melbourne Airport, the most recent temperature (22.4) at 7pm, various other data, and then the min and max, along with time recorded. The min is incomplete; it showed the latest 7pm temperature, but would no doubt be lower by 9am the next day, which is the cut-off. The max probably wouldn’t change. You can see the headings by linking to the page here.

If you click on the station name, it brings up a full table of the half-hourly readings for the last three days, in this style:

clip_image006

Apologies for jumping forward to now (7 Aug), but I didn’t record this back in December. It shows the headings relevant to the above too; the top line is present (a few minutes ago), going back. Now you can see that this has to be automated; no-one is hovering over this stream of data with an eraser. If you click on the “Recent months”, it brings up the following table (an extract here, and we’re back in Dec 2016):

clip_image008

That was taken at the same time (just after 7pm, 4 Dec), and you’ll see that it shows the minimum attributed to Sunday 4th (before 9am), at 9.1, but not yet the max. If you look below that table you’ll see a list of the last 13 months linked, for which you can bring up the complete table. Here is what that Dec 2016 table now looks like:

clip_image010

The max of 31.7 is there; the min went down to 15.7. The other data hasn’t changed. Further down on that page, as it appears now, are the summary statistics for the month:

clip_image012

At the end of Dec 2016, that was transmitted to the WMO as a CLIMAT form, which you can see summarized at the Ogimet site:

clip_image014

clip_image016

You can see that the min and max are transmitted unchanged. The mean of the two has also been calculated and is marked in brown. If you want further authenticity, that site will show you the code that the met office transmitted.

Finally, the CLIMAT form is transcribed into the GHCN unadjusted file, which you can see here. It’s a big file, and you have to gunzip and untar. You can also get a file for max and min. Then you have a text file, which, if you search for 501948660002016TAVG (which includes the Melb code) you see this line:

clip_image018

There is the 19.5 (multiplied by 100, as GHCN does). The other numbers will appear in the GHCN TMAX and TMIN files.
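
For readers who want to check such a line themselves, here is a minimal Python sketch of reading one record. It assumes the GHCN-M v3 fixed-width layout (an 11-character station ID, a 4-digit year, a 4-character element name, then twelve groups of a 5-character value followed by three flag characters), with values in hundredths of a degree C and -9999 marking a missing month. The sample line is made up for illustration: only the December value of 1950 comes from the example above, and the function name is hypothetical.

```python
MISSING = -9999  # assumed GHCN-M v3 marker for a missing month


def parse_ghcn_line(line):
    """Split one GHCN-M v3 style record into ID, year, element and 12 monthly values (deg C)."""
    station_id = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    values = []
    for month in range(12):
        start = 19 + 8 * month  # each month: 5 value characters + 3 flag characters
        raw = int(line[start:start + 5])
        values.append(None if raw == MISSING else raw / 100.0)
    return station_id, year, element, values


# Illustrative record only: January to November left missing, December set to 1950.
sample = "50194866000" + "2016" + "TAVG" + ("-9999   " * 11) + " 1950   "
station_id, year, element, values = parse_ghcn_line(sample)
print(station_id, year, element, values[11])
```

The same function should work on the TMAX and TMIN files if they follow the same layout, since only the element name differs.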

You can even go through to the adjusted file, and, guess what, it is still unchanged. That is because homogenization rarely modifies recent data, although older data may be modified. GHCN unadjusted does not change, except if the source notifies an error. There are quality controls, which don’t change numbers, but may flag them.

There have been endless articles at WUWT about individual site adjustments, but no-one has tried to calculate the whole picture of the effect of adjustment. With the unadjusted vs adjusted files, it is possible to do that. I have been calculating a global anomaly every month, using the unadjusted GHCN data with ERSST. The June result is here; there is an overview page here, with links to the methods and code. This post compares the result of unadjusted vs adjusted GHCN; the difference is small. Here from it is a plot from 1900 to the start of 2015 showing TempLS (my program) unadjusted (blue) vs adjusted (green) and GISS (brown), 12-month running mean. It’s an active plot, so you can see more details at the linked site.

image
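
As a rough illustration of the two calculations behind a comparison like this, the Python sketch below computes a 12-month running mean and the difference in least-squares trend between an adjusted and an unadjusted series. It is not the TempLS code; the function names and the synthetic series are assumptions made purely for illustration, and no real station data is used.

```python
def running_mean(series, window=12):
    """Running mean over each full window; returns len(series) - window + 1 values."""
    if len(series) < window:
        return []
    total = sum(series[:window])
    out = [total / window]
    for i in range(window, len(series)):
        total += series[i] - series[i - window]  # slide the window by one step
        out.append(total / window)
    return out


def ols_slope(y):
    """Least-squares slope of y against 0, 1, 2, ... (units of y per time step)."""
    n = len(y)
    xbar = (n - 1) / 2.0
    ybar = sum(y) / n
    num = sum((x - xbar) * (v - ybar) for x, v in enumerate(y))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den


# Synthetic monthly anomalies in deg C (not real data): a steady rise of 0.001 C/month,
# plus an "adjusted" copy whose first 120 months are lowered by 0.1 C.
unadjusted = [0.001 * m for m in range(240)]
adjusted = [v - 0.1 if m < 120 else v for m, v in enumerate(unadjusted)]

smooth = running_mean(unadjusted)
trend_change = (ols_slope(adjusted) - ols_slope(unadjusted)) * 120  # deg C per decade
print(len(smooth), round(trend_change, 4))
```

Expressing the effect of adjustment as a change in trend (deg C per decade) is the same kind of summary used when comparing the two GHCN files, though the real calculation also involves area weighting and sea surface data that this sketch leaves out.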

If you want more convenient access to the station data, I have a portal page here. The heading line looks like this:

image

The BoM AWS link takes you to this page, listing all station names with links to their current month data page. BoM also posts the metadata for all their stations, and that link takes you to this page, which lists all stations (not just AWS, and including closed stations) with links to metadata. The GHCN Stations button links to this page, which links to the NOAA summary page for each GHCN station by name, or if you click the radio buttons, to station annual data in various formats.

Summary

 

I have shown, for Australia (BoM) at least, that you can follow the unadjusted temperature data right through from within a few minutes of measurement to its incorporation into the global unadjusted GHCN, which is then homogenized for global averages. Of course, I can only show one example of how it goes through without change, but the path is there, and transparent. Those who are inclined to doubt should try to find cases where it is modified.

324 Comments
August 9, 2017 5:47 am

“I agree that this relates to the very modern period only.”
So, the way we approach assessing climate depends on the time period.
I hope Nick and Mosh can explain this further, as this means the squiggles in the lines mean different things at different times.
Sounds like an inexact science.
Andrew .

A C Osborn
August 9, 2017 7:22 am

You have to admit that Nick is very good at what he does, whether he actually believes what he says I am not sure.
I would like to bring up this point – “As I said in the article:
‘This can then be used directly in computing a global anomaly average.
The main providers insert a homogenization step, the merits of which I don’t propose to canvass here.
The essential point is that you can compute the average without that step, and the results are little different.”
It just so happens that Zeke computed the Average Global Temp (which I assume was really land only) using average “Actual Value”, with no adjustments at all.
Here is the Result, would you say it looks anything like the current “straightened Trend” graphs from GISS or any of the others?
http://rankexploits.com/musings/wp-content/uploads/2014/06/Averaged-Absolutes.png
And then there is this Statement “But data of what? NAS and GISS 1981 are of NH, land only (and very few stations).
UAH is troposphere. Hansen 1987 was based on met stations only, no SST. GISS since 1998 is land/SST.
There is no use comparing such disparate things.”
But NCDC were using the Land & Sea Global Average in 1998 as well, in fact they published the 1997 Global Land/SST temperature here.
https://www.ncdc.noaa.gov/sotc/global/199713
Note the comment about a different Baseline, but also note that what they published was not the Anomaly, but the Computed Actual Temperature, ie Baseline + Anomaly.
It was 62.45F or 16.92C.
The 1999 Report for 1998 stated that 1998 was hotter than 1997, so higher than 62.45F.
The current NCDC value for 1998 is 58.13F or 14.53C.
So in approximately 20 years they have lowered the 1998 temperature by 4.32F or 2.39C.
Perhaps Nick can provide an adequate Justification for such a large change?

August 9, 2017 9:08 am

Nick – You have shown, “for Australia (BoM) at least, that you can follow the unadjusted temperature data right through from within a few minutes of measurement to its incorporation into the global unadjusted GHCN, which is then homogenized for global averages”.
Unfortunately this does not follow through GHCN to the final stage, even for the unadjusted data. You need to examine the integration of the CLIMAT data, which you show, into MCDW by NOAA, which then becomes the final “global unadjusted GHCN”, but, as my example here at https://oneillp.wordpress.com/2017/02/09/ghcn-m-raw-data-from-ireland/ shows, may lead to corruption of data values which up to this point have been correctly reported in GHCN unadjusted based on the CLIMAT data. This is known to GHCN (see the March 11th 2017 entry in the GHCN-M v3 status.txt file), but the “correction” hardly inspires confidence. Failure to spot that your “corrected” unadjusted data still contains six identical rogue mean temperature values for six months of the year at 51.85°N is impressive. Five of these are picked up by quality control as more than 5 SD from the historic monthly mean, but one slips through into the adjusted data. An obvious check, which appears to be absent, would be that any MCDW values grossly different from the previous CLIMAT data are flagged for manual checking. The belief in the status.txt file that problems are confined to “select stations in Ireland” may be overly optimistic.

Reply to  Peter O'Neill
August 10, 2017 2:48 pm

I’ve just noticed that Australia is unique among the major contributors to GHCN-M v3 in _not_ having its data integrated into MCDW. Australia has 30672 records for 586 stations in ghcnm.tavg.v3.3.1.20170809.qcu.dat, with no incorporation in MCDW. The next largest number of records without incorporation in MCDW is 555, for Iraq, with 14 stations.
So there is a reason why Nick would not have considered that step of incorporation into MCDW since it does not happen for Australia. But it does happen for at least some records for every other country in the GHCN-M v3 inventory with more than 555 records, and so it seems unwise for NOAA to simply conclude that this problem affects only “select stations in Ireland” without further explanation.
The inability to correctly implement a correction does not inspire confidence that the nature of the problem has been identified. It does not suggest that data has been manipulated or fabricated, but rather gathered and maintained without due care and attention.

Nick Stokes
Reply to  Peter O'Neill
August 10, 2017 10:27 pm

Peter,
The point of the article was really transparency. It’s not saying that errors can’t be made; it’s saying that if made, they can be found. You were able to do that with the rogue Irish values, by comparing what Ireland recorded to what appeared in GHCN.

August 9, 2017 9:58 am

Nits…mostly tech writing issues:
First, technical writing like this can be especially time consuming, requiring a dozen or more revision cycles for even a short article.
IMO, the most valuable thing in the entire article is that Australia’s bill of, er, bureau of meteorology use (highest+lowest)/2 as the daily central tendency, i.e. aggregate temperature number for each “day”, where “day” doesn’t necessarily begin right at midnight nor sun-set nor sun-rise. Next month: Techniques used/abused for arriving at monthly aggregate temperature, and temperature anomaly figures. Maybe by December we can get to a clear write-up on the mysterious “homogenization” and the various techniques and software used to carry it out.
Use a single weather station and work straight through that one from beginning to end, not even mentioning others. That simply created confusion.
Along the way, keep the hyperlinks synchronized with the words, tables and graphs…do not separate them by even a sentence if at all possible. And no hand-waving allowed. Revision cycles can make this even more difficult. I say this because I am not sure which of numerous links was supposed to lead us to source code…and am wary enough of malware and time priorities are such as not to simply try each one until I have happened on the correct one.
Then, go to show a second station all the way through. Then a third… Choose each station as examples of possibilities: stable readings, changes in environs that probably affected readings up, …down.
What are the frequencies of polling of the automated weather stations? Yes, there is bound to be both variation and drift, and proper and improper handling of missed readings. There was probably even more variation, drift, and missing readings on the old manual systems.
The amount of data is overwhelming. (I remember the local meteorology dept. and weather bureau guys freaking out through several telecomm upgrades. There was a couple decade span when the Felonious State U meteorology dept. had more and better data on the weather on a spot on Mars than we had here on the ground in a 2-3 mile radius around the building.)
When you post a graph, diagram, map, etc., please provide a key (+scale for maps). Some of these look like they might be very interesting, but mean nothing without a key.
At first, I had to pause to try to translate. Is BoM, “bill of materials”? Ah, context, probably “bureau of meteorology”? UK? Ah, a map, so it must be Australia. GHCN? GWPF? Yah, sure, I think I looked up that last one once. NCDC? Is that more like CDCP or BATFE? RSS? CSST? Ah, the SST part is probably “sea surface temperature”. USCRN? USHCN? CRN1? ACORN? Those leftist saboteurs who had to change their name in shame, but then shifted to other gangs to make it more difficult to track them? NMAT? NASA I know; used to work there. TOBS looks like it might be “Times of Observations”…oh, or “Time of Observation Biases” (so, why not ToObs or ToOs or ToOBs? ToO or ToOB for singular?). At least the quote from “Karl’s paper” defined its main acronym.
Please, be kind. My network access device can only sip and display tiny quantities of words…and decent images. This virtual key-board is an abomination. And this is neither the main focus of my profession nor a significant hobby, though I certainly am interested in seeing all this clearly laid out.

Mary Brown
August 9, 2017 12:35 pm

The past has been cooled consistently to exacerbate the warming trend. Administrative adjustments now are a giant factor in the warming.
… These convenient adjustments are highly unlikely from a statistical perspective.
… If the adjustments went the other way, would they have been done?
… Heads it is adjusted, tails, it is ignored.
… Why does NASA always ignore their own satellite data and instead quote GISS?
Temp adjustments to the GISS can be viewed here…
https://postimg.org/image/ctawb7m49/
I suspect this game has run its course, though.

TA
Reply to  Mary Brown
August 9, 2017 6:14 pm

“I suspect this game has run its course, though.”
I think so, too. There is no legitimate defense for changing historic temperature records the way they were done, and that should be obvious from the weak or nonexistent defense offered in response to questions about the historic record in this thread.
One of these days the truth will out to everyone.

Carl
August 9, 2017 1:16 pm

The article you linked to shows:
. Average adjustment is small, 0.0175 C/decade.
. The adjustment for Darwin is upwards in recent years.
If half the adjustments are up and half are down, but the up ones are recent but the down ones are for older measurements, then we can have both these statements true:
. Average adjustment is small.
. The effect of the adjustment is a large increase in the warming trend.
Do you have a graph showing adjustment by year?

Nick Stokes
August 9, 2017 1:29 pm

“then we can have both these statements true”
The statement about small adjustment is quantified by the change to trend, in C/decade. I showed a histogram of that over long-record world stations – Darwin was well out in the high tail.
“Do you have a graph showing adjustment by year?”
adjustment of what?

Carl
August 9, 2017 1:57 pm

Your article from December 21 2009 has a graph titled “GHCN adjustment change to trend, stations > 80 years length”. You said the average adjustment to trend was small, 0.0175 C/decade. I thought you were saying the average adjustment to temperatures was small, but reading it again I see that you were referring to the adjustment to the trend, so what I said in my previous post isn’t relevant.
In the R program, you had:
for(i in 1:(len-1)){
kk=kk+1
# to find matching rows, first check diff between stat nos and years
u=vmean_ann_adj[j,]-vmean_ann[i,]
# If the adjusted counter has got ahead of the unadj, wait
if(u[1]<0){ if(j<jmax) j=j+1; u=vmean_ann_adj[j,]-vmean_ann[i,] }
# If we have a match, add to regression vec vv[]
if(u[1]==0 & u[2]==0 ){
if(!is.na(u[3])){ # don't add to regression if NA
k=k+1 # local adjusted counter
jj[k]=kk # x for regression
vv[k]=u[3] # discrepancies for regression
}
if(j0){
m=m+1 # m is station counter
grad[m]=slope(vv[1:k],jj[1:k]) # compute regression slope
k=0 # zero local counters
kk=0
}
}
I can program but haven't used R. It looks like "i" will be incremented every time around the "for" loop, but j will be incremented only once around the loop even if there are more rows in part of vmean_ann_adj than in vmean_ann. Would that mean some rows will be missed?

Nick Stokes
Reply to  Carl
August 9, 2017 4:07 pm

Carl,
This is the process for extracting station records from the GHCN format. This has one line per year (12 mths) per station. I have two files, one adjusted, one unadjusted. I’m trying to keep them in sync. Not every line in the unadjusted file will have a corresponding one in the adjusted; that is why j tracks i. I don’t think lines get lost. They are basically just being sorted. I was also fairly new to R then; I do it differently now.

August 9, 2017 2:22 pm

If the T constructions that gave rise to the cold scare were bogus the same may be true of those that gave rise to the warm scare. Save the gay baby whales and the Iditarod! –AGF

August 9, 2017 7:18 pm

I still regard as a mystery why August 2009 monthly min and max for at least 32 Western Australia weather stations were listed, seemingly accurate, in the BoM’s CDO from 1 September to 17 November 2009 when they were suddenly changed, the average increase for August 2009 monthlies close to 0.4C. The BoM claims it was a bug in an updated version of its Daily Weather Observations on the web that caused rounding to the nearest degree. The database bug curiosity is detailed at http://www.waclimate.net/bom-bug-temperatures.html
Not directly related but still relevant, more than half of all daily observations recorded across Australia before 1972 Celsius introduction were rounded to the nearest F degree.

Carl
August 10, 2017 2:51 am

Your article from December 21 2009 has a graph titled “GHCN adjustment change to trend, stations > 80 years length”. It shows the average adjustment is quite small. Wouldn’t most adjustments occur for stations with a short period of measurements? Those would be the most problematic. Do you have a graph showing the frequency of each size of adjustment for all stations?
Also, I think leaving out the UHI effect is a major failing of the attempt to get an accurate long term picture. Since there are few places that have depopulated, and so many sites are now at busy airports, the UHI effect is bound to show a spurious warming.

Nick Stokes
Reply to  Carl
August 10, 2017 3:08 am

Carl,
I did some later analyses of GHCN V3. The last is here, and links to the earlier. It shows three groups; stations with >30 yrs data, >45 yrs data, and >60. I don’t think it’s true that most adjustments happen for short period stations, but they are the most affected. It takes a certain interval to compile the evidence that adjustment may be needed, so they are less likely to be near the ends.
Homogenisation adjusts for discrete events, so it can’t deal with gradual UHI. GISS deals with UHI separately.

Carl
August 10, 2017 2:52 pm

Nick:
Thanks for your time providing information re the BOM adjustments. You said “Homogenisation adjusts for discrete events, so it can’t deal with gradual UHI. GISS deals with UHI separately”. What is your opinion of the skill of the GISS adjustments? Given that UHI would show a false warming signal, have GISS compensated for that by reducing the measured warming?

Nick Stokes
Reply to  Carl
August 10, 2017 7:31 pm

Carl,
Sorry, I don’t know very much about the skill of the GISS UHI adjustments.

Carl
Reply to  Nick Stokes
August 10, 2017 10:29 pm

You haven’t checked what GISS does with your data? I meant all their adjustments to BOM data, not just for UHI. I would think that if the BOM has already adjusted the data and homogenised it, there wouldn’t be any more adjustments needed for discontinuities due to site changes. Do you know whether they adjust the Australian data before including? You’ve given thoughtful responses on the BOM data. I’m trying to get your opinion as an informed person of what happens after the data leaves the BOM.
Regarding the BOM adjustments. Unless there is a pattern to the discontinuities, the adjustments should average out to about zero as you said they do. Wouldn’t the important changes then be the non-random ones, in particular UHI? Homogenising the data might be useful for looking at the history of a particular place, but for looking at climate trends the important adjustments would be those that didn’t cancel each other out. The biggest would have to be UHI. I think for climate trends, it would be necessary to eliminate data that was corrupted by UHI over time. That would require studying each site individually. For example, if a site was in the country 30 years ago, but is now surrounded by bitumen, buildings, cars and air conditioning vents then you would eliminate that site from the list that was used for climate trends. It might still be useful for determining whether today was hotter than yesterday, but not for determining whether the climate has warmed by a fraction of a degree over 50 years.