Errors in GHCN metadata inventories show stations off by as much as 300 kilometers
Guest post by Steven Mosher
In the debate over the accuracy of the global temperature record, nothing is more evident than the errors in the location data for stations in the GHCN inventory. That inventory is the primary source for all the temperature series.
One question is: do these mistakes make a difference? If one believes, as I do, that the record is largely correct, then it's obvious that these mistakes cannot make a huge difference. If one believes, as some do, that the record is flawed, then it's obvious that these mistakes could be part of the problem. Up until now, that is where the two sides of the debate stand.
Believers are convinced that the small mistakes cannot make a difference; dis-believers hold that these mistakes could in fact contribute to bias in the record. Before I get to the question of whether these mistakes make a difference, I need to establish the mistakes, show how some of them originate, correct them where I can, and then do some simple evaluations of their impact. This is not a simple process. Throughout it, I think we can say two things that are unassailable:
1. The mistakes are real.
2. We simply don't know if they make a difference. Some believe they cannot (but they haven't demonstrated that), and some believe they will (but they haven't demonstrated that either). Demonstrating either position requires real work, and up to now no one has done this work.
This matters primarily because, to settle the question of UHI, stations must be categorized as urban or rural. That entails collecting some information about the character of each station, say its population or the characteristics of the surrounding land surface. So location matters. Consider Nightlights, which Hansen2010 uses to categorize stations as urban or rural. That determination is made by looking up the value of a pixel in an image. If the pixel is bright, the site is urban. If it's dark (for example, because the station has been mis-located in the ocean), the site is rural.
In the GHCN metadata a station may be reported at location xyz.xyN yzx.yxE while in reality it is many miles from that location. That means the Nightlights lookup, or a lookup into ANY georeferenced data (impervious surfaces, gridded population, land cover), may be wrong. One of my readers alerted me to a project to correct the data. That project can be found here. That resource led to other resources, including a two-year-long project to correct the data for all weather stations. It's a huge repository. That in turn led to the WMO documents, one of the putative sources for GHCN. This source also has errors. Luckily, in 2009 the WMO asked all member nations to report more accurate data. That process has yet to be completed, and when it is done we should have data reported down to the arc second. Until then we are stuck trying to reconcile various sources.
The first problem to solve is the loss of precision. The WMO reports positions down to the arc minute. It's clear that when GHCN takes this data and transforms it into decimal degrees, they round and truncate. On occasion these truncations will move a station. I've documented that by examining the original WMO documents and the GHCN documents. In other cases it is hard to see the exact error in GHCN, but the GHCN positions clearly don't track with WMO. Here are the WMO coordinates for WMO 60355, followed by the GHCN coordinates:
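To make the lookup concrete, here is a minimal Python sketch of how a latitude/longitude pair maps to a pixel in a 30 arc-second (1/120 degree) grid. The grid origin (top edge at 90N, left edge at 180W), the function names, and the brightness cut-off are assumptions for illustration only, not GISS's actual code or threshold.

```python
import math

PIXELS_PER_DEGREE = 120  # 30 arc-second pixels


def pixel_index(lat, lon, top_lat=90.0, left_lon=-180.0):
    """Return the (row, col) of the grid pixel that contains (lat, lon).

    Rows count downward from top_lat and columns count eastward from
    left_lon. The origin is an assumption; real Nightlights products
    have their own extents.
    """
    row = math.floor((top_lat - lat) * PIXELS_PER_DEGREE)
    col = math.floor((lon - left_lon) * PIXELS_PER_DEGREE)
    return row, col


def classify(brightness, dark_threshold=10):
    """Label a pixel rural or urban; dark_threshold is a hypothetical cut-off."""
    return "rural" if brightness <= dark_threshold else "urban"


# SKIKDA (WMO 60355): the WMO and GHCN positions quoted in the post.
wmo_pixel = pixel_index(36 + 53 / 60, 6 + 54 / 60)
ghcn_pixel = pixel_index(36.93, 6.95)
print(wmo_pixel, ghcn_pixel, wmo_pixel != ghcn_pixel)
```

The two SKIKDA positions fall several pixels apart, so a lookup at the GHCN position reads a different part of the image than a lookup at the WMO position.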
WMO: 60355 SKIKDA 36 53N 06 54E [36.8833333, 6.9000]
GHCN: 10160355000 SKIKDA 36.93 6.95
GHCN places the station in the ocean. WMO places it on land as seen above.
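The precision argument is simple arithmetic, sketched here in Python with the SKIKDA values. The helper names are hypothetical. Rounding or truncating the WMO position to two decimals gives 36.88, 6.9, so the GHCN position (36.93, 6.95) is not explained by rounding alone; this is one of the cases that simply doesn't track with WMO.

```python
import math


def dms_to_decimal(degrees, minutes, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def truncate(value, places=2):
    """Drop digits beyond `places` decimals (toward zero), one way precision is lost."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


# SKIKDA, WMO 60355: 36 53N 06 54E in the WMO list, 36.93 6.95 in GHCN.
wmo_lat = dms_to_decimal(36, 53, hemisphere="N")
wmo_lon = dms_to_decimal(6, 54, hemisphere="E")
ghcn_lat, ghcn_lon = 36.93, 6.95

print("WMO, full precision:", wmo_lat, wmo_lon)
print("WMO, rounded:       ", round(wmo_lat, 2), round(wmo_lon, 2))  # 36.88 6.9
print("WMO, truncated:     ", truncate(wmo_lat), truncate(wmo_lon))
print("GHCN:               ", ghcn_lat, ghcn_lon)

# One hundredth of a degree of latitude is roughly 1.1 km on the ground.
offset_km = (ghcn_lat - round(wmo_lat, 2)) * 111.2
print(f"GHCN latitude offset beyond rounding: about {offset_km:.1f} km")
```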
To start correcting these locations I worked through the various sources. In this post I begin by correcting the GHCN inventory using WMO information as the basis, aware, of course, that WMO may have its own issues. The task is complicated by the lack of any GHCN documentation showing how they used the WMO documents. As a first step, I compared the GHCN inventory with the WMO inventory and looked at those records where GHCN and WMO have the same station number and station name. That is difficult in itself because of the way GHCN truncates names to fit a data field. It's also complicated by respellings, multiple names for each site, GHCN Imod flags, and WMO station index sub-numbers.
Here is what we find. If we start with the 7200 stations in the GHCN inventory and use the WMO identifier to look up the same stations in the official WMO inventory, we get roughly 2500 matches. Here are the matching rules I used:
1. The WMO number must be the same.
2. The GHCN name must match the WMO name (or an alternate name must match).
3. The GHCN ID must not have any Imod variants (no multiple GHCN stations per WMO number).
4. The WMO station must not have any sub-index variants (107 WMO numbers have sub-indexes).
That's a bit hard to explain, but in short I try to match the stations that are unique in GHCN with those that are unique in the WMO records. Here is what a sample of the matched records looks like. WMO positions are translated from degrees and minutes to decimal degrees with full precision retained, so you can check them against the GHCN rounding. As we saw in previous posts, slight movements in stations can move them from bright to dark pixels and from dark to bright.
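The four rules can be expressed as a short filter. This is a minimal Python sketch, not the code used for this post: the record classes are hypothetical, it assumes the GHCN v2 11-digit ID layout (3-digit country code, 5-digit WMO number, 3-digit Imod), and the name test (match after stripping punctuation, allowing the GHCN name to be a truncated prefix) is an assumed stand-in for however the names were actually compared.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GhcnStation:
    ghcn_id: str  # assumed layout: country(3) + WMO(5) + Imod(3)
    name: str
    lat: float
    lon: float

    @property
    def wmo(self) -> int:
        return int(self.ghcn_id[3:8])


@dataclass
class WmoStation:
    wmo: int
    sub_index: int  # carried along; duplicates are detected by counting WMO numbers
    name: str
    lat: float
    lon: float
    alt_names: list = field(default_factory=list)


def normalize(name):
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def names_match(ghcn_name, wmo_names):
    """Rule 2 (assumed form): a WMO name starts with the (possibly truncated) GHCN name."""
    g = normalize(ghcn_name)
    return any(bool(g) and w.startswith(g) for w in map(normalize, wmo_names))


def match_stations(ghcn, wmo):
    ghcn_per_wmo = Counter(s.wmo for s in ghcn)   # rule 3: Imod variants share a WMO number
    wmo_per_number = Counter(s.wmo for s in wmo)  # rule 4: sub-index variants share a number
    unique_wmo = {s.wmo: s for s in wmo if wmo_per_number[s.wmo] == 1}
    matches = []
    for g in ghcn:
        w = unique_wmo.get(g.wmo)                 # rule 1: same WMO number
        if w is None or ghcn_per_wmo[g.wmo] > 1:
            continue
        if names_match(g.name, [w.name, *w.alt_names]):
            matches.append((g, w))
    return matches


ghcn = [
    GhcnStation("63401001000", "JAN MAYEN", 70.93, -8.67),
    GhcnStation("63401028000", "BJORNOYA", 74.52, 19.02),
]
wmo = [
    WmoStation(1001, 0, "JAN MAYEN", 70.93333, -8.666667),
    WmoStation(1028, 0, "BJORNOYA", 74.51667, 19.016667),
]
for g, w in match_stations(ghcn, wmo):
    print(g.ghcn_id, g.name, "->", w.wmo, w.name)
```

Adding a second GHCN record with the same WMO number (an Imod variant), or a second WMO entry with a different sub-index, drops that station from the matches, which is the "unique on both sides" idea in the rules.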
GHCN ID | GHCN name | GHCN lat | GHCN lon | WMO | WMO name | WMO lat | WMO lon
63401001000 | JAN MAYEN | 70.93 | -8.67 | 1001 | JAN MAYEN | 70.93333 | -8.666667
63401008000 | SVALBARD LUFT | 78.25 | 15.47 | 1008 | SVALBARD AP | 78.25000 | 15.466667
63401025000 | TROMO/SKATTO | 69.50 | 19.00 | 1025 | TROMSO/LANGNES | 69.68333 | 18.916667
63401028000 | BJORNOYA | 74.52 | 19.02 | 1028 | BJORNOYA | 74.51667 | 19.016667
63401049000 | ALTA LUFTHAVN | 69.98 | 23.37 | 1049 | ALTA LUFTHAVN | 69.98333 | 23.366667
You can also see some of the name-matching difficulties, where the two records have the same WMO number but slightly different names. If we collate all the lat and lon differences across the matching stations, we get the following:
And when we check the worst record, we find the following:
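Collating the differences is simple arithmetic once records are paired. As a minimal Python sketch, here is that step applied to only the five sample rows above, not to the full set of roughly 2500 matches.

```python
# (GHCN ID, GHCN lat, GHCN lon, WMO lat, WMO lon), copied from the sample rows
rows = [
    ("63401001000", 70.93, -8.67, 70.93333, -8.666667),
    ("63401008000", 78.25, 15.47, 78.25000, 15.466667),
    ("63401025000", 69.50, 19.00, 69.68333, 18.916667),
    ("63401028000", 74.52, 19.02, 74.51667, 19.016667),
    ("63401049000", 69.98, 23.37, 69.98333, 23.366667),
]

diffs = [(gid, g_lat - w_lat, g_lon - w_lon) for gid, g_lat, g_lon, w_lat, w_lon in rows]
for gid, dlat, dlon in diffs:
    print(f"{gid}  dlat={dlat:+.4f}  dlon={dlon:+.4f}")

# The "worst" record is the one with the largest offset on either axis.
worst = max(diffs, key=lambda d: max(abs(d[1]), abs(d[2])))
print("largest offset:", worst[0])
```

On these five rows, four pairs differ by at most about 0.0033 degrees (a fifth of an arc minute), the size of error that two-decimal rounding of arc-minute positions produces. The TROMO/SKATTO pair, whose names also differ, is off by about 0.18 degrees in latitude.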
WMO: 60581 HASSI-MESSAOUD 31.66667 6.15
GHCN: 10160581000 HASSI-MESSOUD 31.7 2.9
GHCN has the station at longitude 2.9. According to GHCN, the station is an airport:
The location in the WMO file
WMO is more correct than GHCN, which is off by roughly 300 km.
An old picture of the approach (weather station is to the left)
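The roughly 300 km figure can be checked with the great-circle (haversine) distance between the WMO and GHCN positions. A minimal Python sketch, assuming a spherical Earth of mean radius 6371 km, which is adequate at this scale:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# HASSI-MESSAOUD, WMO 60581: WMO position vs GHCN position
d = haversine_km(31.66667, 6.15, 31.7, 2.9)
print(f"WMO vs GHCN offset: {d:.1f} km")
```

The result is just over 300 km. Nearly all of it comes from the 3.25 degree longitude difference, since a degree of longitude near 31.7N spans roughly 95 km.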

Now, why does this matter? GISS uses the GHCN inventory locations to look up Nightlights. The Nightlights lookup uses the location to determine whether the pixel is dark (rural) or bright (urban).
NASA thinks this site is dark. They think it is pitch dark. Of course, they are looking 300 km away from the real site. Here is the record from the inventory used in H2010:
10160581000 HASSI-MESSOUD 31.70 2.90 398 630R HOT DESERT A 0

Simon says:
November 2, 2010 at 6:33 am
I cannot actually believe they round lat/lon off to 2 decimal places! That’s such an embarrassingly basic error to make, anyone that’s ever worked with plotting points on a google map would know that within 30 seconds
#########
Yep! The other thing I wanted to show was what this does when you do look-ups into a grid with 1/120th-of-a-degree resolution.
While I stopped short of proving it mathematically, it seems that if you take an original location in degrees, minutes, and seconds, transform it to decimal, and round, your data points are highly likely to land on grid BOUNDARIES and on grid corners. Horrible stuff to track down.
Of the 7280 stations, well over half will land on a Nightlights pixel boundary.
All because of rounding.
The boundary pixel problem means that it is next to impossible to verify against GISS.
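The boundary effect can be checked without any station data, under a loudly stated assumption: original positions are whole arc minutes, minute values 0 to 59 are equally likely, and latitude and longitude minutes are independent. A minimal Python sketch using exact fractions, so that floating-point noise right at the edges does not blur the count:

```python
from fractions import Fraction

PIXELS_PER_DEGREE = 120  # 30 arc-second Nightlights-style grid


def on_pixel_edge(value):
    """True if value (in degrees) is an exact multiple of 1/120 degree."""
    return (value * PIXELS_PER_DEGREE).denominator == 1


# Assumption: whole arc minutes, minutes 0..59 equally likely on each axis.
# The whole-degree part is an integer, so it never changes whether a value sits on an edge.
minutes = range(60)
rounded = [round(Fraction(m, 60), 2) for m in minutes]  # 2-decimal rounding
edge_minutes = [m for m, r in zip(minutes, rounded) if on_pixel_edge(r)]

p_axis = Fraction(len(edge_minutes), 60)  # one coordinate on an edge
p_any_edge = 1 - (1 - p_axis) ** 2        # lat or lon (or both) on an edge
p_corner = p_axis ** 2                    # both on an edge: a pixel corner

print("minutes landing on an edge:", edge_minutes)
print("per axis:", p_axis, "any edge:", p_any_edge, f"({float(p_any_edge):.0%})", "corner:", p_corner)
```

Under that assumption a third of the rounded values on each axis (every minute value divisible by 3) sit exactly on a pixel edge, so about 5/9 of stations (56%) would have at least one coordinate on an edge and about 1/9 would sit on a corner, consistent with "well over half". Real minute values need not be uniform, so this is a plausibility check, not a count. A point exactly on an edge belongs to one pixel or its neighbor depending on whether the lookup floors, rounds, or picks up floating-point error, which is why reproducing the GISS values is so hard.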
Wow, yet another example of how poor the data is that climate scientists use to try to measure a 0.? temperature anomaly. The same type of problem came out strongly in the CRU Climategate scandal, thanks to the HARRY_READ_ME file. No wonder Jones was willing to destroy the data rather than allow other scientists to try to reproduce his department's work!
This is not the way science should be conducted, and there are plenty of other examples of 'bending the truth' for political gain. It would appear that the modern scientific methods used by the IPCC climate cabal can be split into two forms: the inductive and the deductive.
INDUCTIVE:
* formulate hypothesis
* apply for grant
* perform experiments or gather data to test hypothesis
* alter data to fit hypothesis
* publish
DEDUCTIVE:
* formulate hypothesis
* apply for grant
* perform experiments or gather data to test hypothesis
* revise hypothesis to fit data
* backdate revised hypothesis
* publish
(Thanks to Tom Weller’s book – SCIENCE MADE STUPID)
Tenuc.
The problems with the location data in GHCN predate the whole Jones issue and are unrelated.
When the data was created there was no thought of using it to geolocate stations. The problem is being worked on by NOAA; it's hard, tedious work.