Metadata fail: 230 GHCN land stations actually in the water

Why is this important? Well, if you are calculating UHI for stations by looking at satellite images of nightlights, as GISS does (see my post on it at CA), you’ll find that there are generally no city lights in the water, leading you to think you’ve got no urbanization around the station. Using only 10 lines of code, Steve Mosher finds 230 errors in NCDC’s Global Historical Climatology Network (GHCN) data that place stations over water when they should be on land. Does this affect the calculation of Earth’s surface temperature? Steve Mosher investigates. – Anthony

Wetbulb Temperature

by Steven Mosher

click to enlarge

This Google Maps display shows just one of 230 GHCN stations that are located in the water. After finding instances of this phenomenon over and over, it seemed an easy thing to find and analyze all such cases in GHCN. The issue matters for two reasons:

  1. In my temperature analysis program I use a land/water mask to isolate land temperatures from sea temperatures and to weight the temperatures by the land area. A station sitting in the ocean would, of course, have a land area of zero.
  2. Hansen 2010 uses nightlights based on station location, and in most cases the lights at a coastal location are brighter than those offshore. That said, I have seen “blooming” even in radiance-calibrated lights, such that “water pixels” do on occasion have lights on them.
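To make the masking step in point 1 concrete, here is a minimal base-R sketch. The toy 1-degree mask, the station coordinates, and the function names are all invented for illustration; the actual analysis below uses a 1 km satellite mask via the raster package.

```r
# Toy 1-degree land/sea mask: 1 = land, 0 = water.
# Rows are latitude bands (north to south), columns are longitude (-180..180).
mask <- matrix(1, nrow = 180, ncol = 360)
mask[, 1:180] <- 0  # pretend the western hemisphere is all water

# Map a lon/lat pair to its cell in the toy mask.
cellFor <- function(lon, lat) {
  row <- floor(90 - lat) + 1   # row 1 at the north pole
  col <- floor(lon + 180) + 1  # col 1 at -180 longitude
  c(row, col)
}

isLand <- function(lon, lat) {
  idx <- cellFor(lon, lat)
  mask[idx[1], idx[2]] == 1
}

# A "wet" station is a land station whose coordinates fall in a water cell.
stations <- data.frame(lon = c(10.5, -50.2), lat = c(45.1, 30.7))
wet <- !mapply(isLand, stations$lon, stations$lat)
```

Any land station flagged `wet` is a candidate metadata error, and also gets a land-area weight of zero in an area-weighted average.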

The process of finding “wet stations” is trivial in the “raster” package of R. All that is needed is a high-resolution land/sea mask. In my previous work I used a ¼ degree base map; ¼ degree is roughly 25 km at the equator. I was able to find a 1 km land mask used by satellites. That data is read in one line of code, and then it is a simple matter to determine which stations are “wet”. Since NCDC is updating the GHCN v3 inventory, I have alerted them to the problem and will, of course, provide the code. I have yet to write NASA GISS. Since H2010 is already in the publishing process, I’m unsure of the correct path forward.

Looking through the 230 cases is not that difficult, just time consuming. We can identify several types of case: atolls, islands, and coastal locations. It’s also possible to put in the correct locations for some stations by referencing either WMO publications or other inventories which have better accuracy than either GHCN or GISS. We can also note that in some cases the “mislocation” may not matter to nightlights. These are cases where you see no lights whatsoever within the ½ degree grid that I show. In the Google Maps images presented below, I’ll show a sampling of all 230. The blue cross shows the GHCN station location, and the contour lines show the contours of the nightlights raster. Pitch-black locations have no contour.
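One way to triage the cases is to bin each wet station by the distance-to-coast value the mask assigns it. The breakpoints and labels below are hypothetical, not taken from the survey above; they just sketch the idea in base R.

```r
# Bin "wet" stations by how far offshore the mask places them.
# Distances in km; the breakpoints are illustrative only.
classifyWet <- function(coastDistKm) {
  cut(coastDistKm,
      breaks = c(0, 2, 10, Inf),
      labels = c("coastal", "nearshore", "offshore"),
      right = TRUE)
}

# A station 1 km out is likely a coastal rounding error; one 40 km out
# is flat-out wrong metadata.
as.character(classifyWet(c(1, 5, 40)))  # "coastal" "nearshore" "offshore"
```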

I will also update this with a newer version of Nightlights. A Google tour is available for folks who want it. The code is trivial and I can cover it if folks find it interesting. With the exception of the graphing, it is as simple as this:

Ghcn <- readV2Inv()                      # read in the inventory
lonLat <- data.frame(Ghcn$Lon, Ghcn$Lat)
Nlight <- raster(hiResNightlights)
extent(Nlight) <- c(-180, 180, -90, 90)  # fix the metadata error in Nightlights
Ghcn <- cbind(Ghcn, Lights = extract(Nlight, lonLat))    # extract the lights at station points
distCoast <- raster(coastDistanceFile, varname = "dst")  # get the special land mask
Ghcn <- cbind(Ghcn, CoastDistance = extract(distCoast, lonLat))
# For this mask, water pixels are coded by their distance from land; all land pixels are 0.
# Make an inventory of just those land stations that appear in the water.
wetBulb <- Ghcn[which(Ghcn$CoastDistance > 0), ]
writeKml(wetBulb, outfile = "wetBulb", tourname = "Wetstations")

Some shots from the gallery. The 1 km land/water mask is very accurate; you might notice one or two stations actually on land. Nightlights is less accurate, something H2010 does not recognize: its pixels can be over 1 km off true position. The small sample below should show the various cases. No attempt is made to ascertain whether this causes an issue for identification of rural/urban categories. As it stands, the inaccuracies in Nightlights and station locations suggest more work is needed before that effort is taken up.
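The positional-error point can be sketched numerically: at 30 arc-second (roughly 1 km) resolution, displacing a coordinate by about one pixel width can put it in a different grid cell, which at a shoreline can mean the difference between a land pixel and a water pixel. The longitude below is hypothetical.

```r
# 30 arc-seconds in degrees: the nominal pixel width of a 1 km global grid.
pixel <- 1 / 120

# Column index of the grid cell containing a given longitude.
cellIndex <- function(lon) floor((lon + 180) / pixel)

lonStation <- 151.21              # hypothetical coastal station longitude
lonShifted <- lonStation + pixel  # displaced by one pixel, ~1 km at the equator

cellIndex(lonStation) != cellIndex(lonShifted)  # TRUE: a different cell
```

This is why registering a station inventory of uncertain accuracy against Nightlights, which itself can be more than a pixel off, compounds the problem.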

Click to enlarge images:

96 Comments
Neal
November 8, 2010 8:23 am

Why are visible lights being used to identify population centers in the first place? Is there some reason actual data on the location of cities isn’t used?

November 8, 2010 8:31 am

redneck says:
November 8, 2010 at 5:38 am
Well it got me thinking about a quote I read years ago:
“Given enough data with statistics you can prove anything.”

I don’t know the source of your quote.
In any case, I recommend a classic on the subject of statistical gamesmanship: “How to Lie with Statistics” by Huff and Geis

Taphonomic
November 8, 2010 8:40 am

Steven,
One question regarding this work: have you verified that the displacement found is not a function of the location data being recorded in NAD27 and Google using NAD83 (or vice versa)? This can cause an apparent shift in location when plotting with Google.

Gordon Ford
November 8, 2010 9:12 am

In my former life, mining property development, QA/QC errors of this magnitude would be sufficient to cast serious doubt on all conclusions based on the data and metadata. The errors documented by Mr. Mosher indicate very sloppy work and a high probability that other errors exist in the data.
The discovery of errors of this sort in a mineral deposit database is a signal to prudent investors to consider bailing.
Betting trillions of dollars on conclusions developed from the current GHCN data without a thorough search for, and correction of, additional errors is the height of folly.

Zeke Hausfather
November 8, 2010 9:24 am

Mike Haseler,
It’s not like they schedule conferences to discuss shortcomings in current temperature records and ways to fix them, right? If you talk to scientists working on the surface temperature records (and there are surprisingly few of them, mostly owing to low budgets to fund ongoing work), you will find that they are quite well aware of various factors that can lead to bias (station moves, sensor changes, land cover changes, urbanization, poor metadata, etc.), and spend most of their day figuring out how to a) improve the quality of the data and b) detect and correct for bias when it is not possible to remove it through obtaining higher-quality data.

Kev-in-UK
November 8, 2010 9:38 am

I wouldn’t be surprised to see these stations suddenly added to the SST anomaly dataset!

jorgekafkazar
November 8, 2010 10:08 am

John Marshall says: “This may all be meaningless because, as some physicists say, temperature can only be taken if the system is at equilibrium. The atmosphere is never at equilibrium so any temperature taken is meaningless….”
We take care of this trivial problem by means of a special non-equilibrium algorithm. We can’t tell you what it is, or share the code, but it’s very robust. Twenty-five hundred scientists believe in it. Or maybe fifty. Whatever. When invoking this algorithm, we also burn sage and dance widdershins around a stripbark pine while chanting ‘hey-ya-ya-ya!’, wearing nothing but a little woad. “All true Scotsmen” have also approved this algorithm, along with Al Gore’s Happy Ending Club, the Tooth Fairy League, the Union of Concerned Unicorns, the American Association of Phrenologists, the British National Academy of Astrologers, The World Conference of Necromancers, the Phlogiston Conservation Society, the Club of Barstow, and numerous professionals who don’t get paid unless they endorse it. We’re talking consensus here, folks! Wake up and smell the Kool-Aid!
/s.o.

Robin Guenier
November 8, 2010 10:20 am

Re Fred Singer’s interesting claims (that (1) the reported 1977–1997 temperature rise is not seen in the proxy records and (2) that satellite data show “essentially” no warming between 1979 and 1997), thanks jimmi, John Peter, Juraj V and Tenuc (especially for the Woodfortrees link).
Interesting however that (re 2) jimmi says Singer is not correct, John Peter (via Roy Spencer) suggests that he is (temperatures were “flat”), Juraj V says that, although the TLT record show Singer is right, the SST record shows warming whereas Tenuc says that it does not – or at least that it’s “not statistically significant” (it was about 0.1 deg. C.) So I’m still unsure about whether or not Singer got that right. Any further thoughts?
Re (1) (Singer’s intriguing proxy record claim), Jura V’s glacier link seems to me inconclusive – is there any better evidence supporting him?

DesertYote
November 8, 2010 11:10 am

#
Martin Brumby says:
November 8, 2010 at 7:46 am
“Instead he suggested getting farmers to plant crops with shinier leaves. ”
#
Like Larry Niven’s “Ringworld” sunflowers! BzzzZAP!

Steven mosher
November 8, 2010 11:19 am

David Jones says:
November 8, 2010 at 1:16 am
Nightlights over water can be fishing boats or aerosols dispersing city light vertically. The instrument is sensitive enough to detect both. Or indeed, they can be calibration and/or registration errors, or blooming/bleeding.
##########
yes, the reasons are many, as you suggest. The real issue isn’t the presence of lights over water; the real issue is stations in dark water and the calculation of the fraction of land the station represents.
For lights over water, I can handle that quite simply with the mask, or with an even more highly detailed vector shoreline dataset, but it’s really not an issue since land stations should be on the land.
The problem of nightlights positional error (>1 km) means that you cannot simply register the station to the nightlights without the possibility of error.
Using pitch-black stations helps, but ONLY in the developed world, as nightlights does not track with population in the undeveloped world in the same fashion (see Dodd and Pachuri).

Bad Andrew
November 8, 2010 11:22 am

(and there are surprisingly few of them, mostly owing to low budgets to fund ongoing work)
Zeke,
Why no money for such important work? If the numbers are inaccurate, you get meaningless squiggly lines.
Maybe it’s not very important.
Andrew

Steven mosher
November 8, 2010 11:23 am

Zeke Hausfather says:
November 8, 2010 at 9:24 am
Mike Haseler,
It’s not like they schedule conferences to discuss shortcomings in current temperature records and ways to fix them, right? If you talk to scientists working on the surface temperature records (and there are surprisingly few of them, mostly owing to low budgets to fund ongoing work), you will find that they are quite well aware of various factors that can lead to bias (station moves, sensor changes, land cover changes, urbanization, poor metadata, etc.), and spend most of their day figuring out how to a) improve the quality of the data and b) detect and correct for bias when it is not possible to remove it through obtaining higher-quality data.
#########
Well, Zeke, in some cases that is correct. In other cases I have seen communication from certain well-funded people who resist making fixes, and who continue to believe that certain data is more accurate than the actual source documents indicate. We will wait and see if they make the changes or not. In some cases they are not even aware that the source documents have been deprecated and that the PI believes the old data should not be used. Just sayin’.

Steven mosher
November 8, 2010 11:29 am

Taphonomic says:
November 8, 2010 at 8:40 am
Steven,
One question regarding this work: have you verified that the displacement found is not a function of the location data being recorded in NAD27 and Google using NAD83 (or vice versa)? This can cause an apparent shift in location when plotting with Google.
# That’s a smart question.
I did some tests with better station location data, correcting the mistaken locations in WMO and GHCN. That put those test stations on the land where they belong.
For some cases, like atolls, it’s going to be very difficult.
WRT my graphics, the Google Earth image is loaded into R and then transformed so that its projection matches the projection of Nightlights. I had assistance from experts on that part of the task to make sure that there wasn’t any issue there.
Also, if you look at the NDVI flags for the stations, you will find that in their prior work they also could see that certain land stations were in the water, owing to the NDVI value they recorded. Basically, the inventory was not DESIGNED to be used in conjunction with a product like nightlights.
On the other hand, the error rate is very small (230 out of 7280 stations) and it WILL NOT change the final numbers in any substantial way. That’s no excuse for not using the best data.

Steven mosher
November 8, 2010 11:31 am

Neal says:
November 8, 2010 at 8:23 am
Why are visible lights being used to identify population centers in the first place? Is there some reason actual data on the location of cities isn’t used?
#####
I asked that question 3 years ago.

Steven mosher
November 8, 2010 11:33 am

j ferguson says:
November 8, 2010 at 8:06 am
Steven,
These 230 stations are part of what number of stations in current use by GISS?
#####
I haven’t run the GISS numbers.
1. It’s entirely possible that NONE of these mistakes carry over into GISS.
2. I believe (looking at their page) that GISS is aware of some of the problems, so they may be fixing it.
3. WMO has a new, improved dataset out that I will use for corrections later this week.

Steven mosher
November 8, 2010 11:35 am

LearDog says:
November 8, 2010 at 7:18 am
Am traveling and don’t have access to my data to address the question: is this merely an issue of precision (x.xx vs x.xxxxxx) in the database or are these systematic / siting problems?
## Both, though it’s less a precision issue with WMO stations. Depending on the source, locations are both imprecise and flat-out wrong.

Steven mosher
November 8, 2010 11:37 am

Gary says:
November 8, 2010 at 7:27 am
And this just audits errors in the current sitings. What about previous site locations which may have had much different temperature measurement issues?
####
the historical dimension is mind numbing. Basically, and I haven’t got to this yet, some of the sources GHCN and GISS use are estimates. No problem there, EXCEPT where you try to register an estimate to a precise dataset like nightlights. Throwing darts.

Ian
November 8, 2010 11:38 am

Robin:
The problem in the proxy record is, in fact, well documented. If you go to Climate Audit, there are extended discussions of the issue – it is a problem which has led the likes of the CRU-gang and other Team members either to truncate their data or use other “tricks” to “hide the decline”. The problem they face is that many of the proxies used (tree rings, in particular), show no “warming” in the post-1970 period. They take a number of dubious approaches, including tacking on the “thermometer record” to the post-1960. That record, carefully “managed” by the various agencies, does show warming in the latter quarter of the 20th century.
Take a look at Climate Audit. There’s a good post entitled “The Trick”, dated 26 November 2009, that explains the problems the warmists have faced with the proxy record. See: http://climateaudit.org/2009/11/26/the-trick/. (Sorry, not sure how to paste a live link).

Steven mosher
November 8, 2010 11:39 am

orkneygal says:
November 8, 2010 at 2:24 am
230 Stations.
Is that a lot?
### I don’t address that. It depends on how, and IF, these stations actually get used.
I’m about 5 steps away from that question. People always want to rush the answer.
That’s how you make small mistakes.
Does it matter?
If you want the best answer, yes, it matters. If you’re happy with a little slop, it probably won’t matter.

ES
November 8, 2010 11:47 am

Is the Port Arthur station still being used for official temperatures? Port Arthur joined with Fort William in 1970 to create the city of Thunder Bay. The Environment Canada temperatures come from the airport, which is not shown. If you follow the black line on the lower part of the picture, the airport is outside the last yellow line in your picture.
That station is probably for ships entering the port.

Garrett Jones
November 8, 2010 12:02 pm

I do not have the math skills to follow up this idea, but I was wondering: if the temps and CO2 values are non-linear data sets, could some facet of chaos theory be used to confirm their statistical validity by straightforward calculation, rather than individual examination of how the data was collected at each site? (Current statistical tools seem not to be up to the job.) Specifically, since these are essentially bounded data sets, i.e., temps will never go to absolute zero and CO2 will never go to 100% (or if either event does occur, we will not care), could the math associated with “strange attractors” be a useful tool? Anyone with a thought?

Kev-in-UK
November 8, 2010 12:10 pm

I have a quick question regarding potential errors.
Steve says the error rate will be small and will not influence the final numbers; I don’t know if this is a valid statement or not. Clearly, if assessment of UHI is based on some factors (e.g. nightlights), and other correction factors, such as station dropouts, are ‘infilled’ using nearby (supposedly similar) station data, could there not be a compounding effect from, say, an adjacent station to one of the 230 having been ‘homogenised’ using the less reliable station data? What I mean is, could there be a slight knock-on effect?

November 8, 2010 12:32 pm

GHCN 83897 Florianopolis, Brazil.
http://maps.google.com/maps?q=-27.58+-48.57

Latimer Alder
November 8, 2010 1:51 pm

@zeke hausfather

It’s not like they schedule conferences to discuss shortcomings in current temperature records and ways to fix them, right? If you talk to scientists working on the surface temperature records (and there are surprisingly few of them, mostly owing to low budgets to fund ongoing work), you will find that they are quite well aware of various factors that can lead to bias (station moves, sensor changes, land cover changes, urbanization, poor metadata, etc.), and spend most of their day figuring out how to a) improve the quality of the data and b) detect and correct for bias when it is not possible to remove it through obtaining higher-quality data.

Now I know that the whole ‘field’ is in deep deep trouble. There is money coming out of people’s ears to do ‘climate change’ research. No matter how wacko a project, as long as it mentions AGW in its application it gets fast tracked to the front of the queue for spondulix.
And yet without accurate raw data that actually means something, there is nothing to study. Without experimental numbers one can bugger about with models until blue in the face or pin in the cheeks and it will all be entirely pointless. Am I the only one capable of grasping this fundamental point?
You say that the problems are widely known but the few people working on this area are ‘underfunded’. Well I ain’t met somebody living off other people’s hard-earned taxes yet who didn’t claim that they hadn’t been hosed down with quite enough luvverly lolly. But to say that the problems with data collection are known and understood but nobody has allocated enough money to fix them is surely criminal.
What of the IPCC – where in the reports does it say ‘actually chaps, we know that this is all complete nonsense as we have no good data, but we’re going to devote all our individual and collective efforts over the next five years to getting a decent data collection mechanism together. Once we’ve done that we might be able to draw some sensible conclusions’?
What of the ‘learned societies’? Where are they standing up and asserting that this is the most important problem in the whole field? Or the Hockey Team, brandishing their Nobel prize (I shudder to write this without retching) and demanding that they cannot produce anything even remotely scientific without firm foundations?
I hear a devastating silence. Perhaps they are all so convinced about the rightness of their models that they have forgotten how to do experiments and data collection. Or have become so detached from the planet that they just don’t care anymore.
Whatever the reason, they bring deserved ridicule and contempt on their field. How can you tell when a ‘climate scientist’ is using robust data? You don’t have to – such a thing does not exist.
Pitiful and contemptible.

Robin Guenier
November 8, 2010 2:43 pm

Ian:
Yes, I’m broadly familiar with the CRU/proxy issue – largely, as you say, associated with tree rings. But I suggest that Singer is making a point that is only slightly relevant to that and arguably directly relevant to this thread. What I think he’s saying is that the 1910-1940 warming can be seen in both the instrumental and proxy records – so that, in effect, each verifies the other. In contrast however, the 1977-1997 warming is seen in the (surface) instrument record but cannot be seen in the proxy records (presumably not confined to tree rings). To say the least, that seems odd. And surely should ring an alarm bell – and especially so if satellite data for the same period shows essentially no warming.
My question was (and is): is Singer correct in claiming that the 1977-1997 warming is not seen in the proxy record?