Why is this important? Well, if you are calculating UHI for stations by looking at satellite images of nightlights, like GISS does (see my post on it at CA), you’ll find that there are generally no city lights in the water, leading you to think you’ve got no urbanization around the station. Using only 10 lines of code, Steve Mosher finds 230 errors in NCDC’s Global Historical Climatology Network (GHCN) data that place a station over water when it should be on land. Does this affect the calculation of Earth’s surface temperature? Steve Mosher investigates. – Anthony
Wetbulb Temperature

This Google Maps display shows just one of 230 GHCN stations that are located in the water. After finding instances of this phenomenon over and over, it seemed an easy thing to find and analyze all such cases in GHCN. The issue matters for two reasons:
- In my temperature analysis program I use a land/water mask to isolate land temperatures from sea temperatures and to weight the temperatures by the land area. That area would be zero in the ocean, of course.
- Hansen2010 uses nightlights based on station location, and in most cases the lights at a coastal location are brighter than those offshore, although I have seen “blooming” even in radiance-calibrated lights, such that “water pixels” do on occasion have lights on them.
The process of finding “wet stations” is trivial in the “raster” package of R. All that is needed is a high-resolution land/sea mask. In my previous work, I used a ¼ degree base map. ¼ degree is roughly 28 km at the equator. I was able to find a 1km land mask used by satellites. That data is read in one line of code, and then it is a simple matter to determine which stations are “wet”. Since NCDC is updating the GHCN V3 inventory, I have alerted them to the problem and will, of course, provide the code. I have yet to write to NASA GISS. Since H2010 is already in the publishing process, I’m unsure of the correct path forward.
Looking through the 230 cases is not that difficult. It’s just time consuming. We can identify several types of case: atolls, islands, and coastal locations. It’s also possible to put the correct locations in for some stations by referencing either WMO publications or other inventories which have better accuracy than either GHCN or GISS. We can also note that in some cases the “mislocation” may not matter to nightlights. These are cases where you see no lights whatsoever within the 1/2 degree grid that I show. In the Google maps presented below, I’ll show a sampling of the 230. The blue cross shows the GHCN station location and the contour lines show the contour of the nightlights raster. Pitch black locations have no contour.
I will also update this with a newer version of Nightlights. A Google tour is available for folks who want it. The code is trivial and I can cover that if folks find it interesting. With the exception of the graphing, it is as simple as this:
library(raster) # provides raster(), extent() and extract()
# readV2Inv() and writeKml() are not part of raster; they are helper functions not shown here
Ghcn <- readV2Inv() # read in the inventory
lonLat <- data.frame(Ghcn$Lon, Ghcn$Lat)
Nlight <- raster(hiResNightlights)
extent(Nlight) <- c(-180, 180, -90, 90) # fix the metadata error in nightlights
Ghcn <- cbind(Ghcn, Lights = extract(Nlight, lonLat)) # extract the lights using "points"
distCoast <- raster(coastDistanceFile, varname = "dst") # get the special land mask
Ghcn <- cbind(Ghcn, CoastDistance = extract(distCoast, lonLat))
# for this mask, water pixels are coded by their distance from land; all land pixels are 0
# make an inventory of just those land stations that appear in the water
wetBulb <- Ghcn[which(Ghcn$CoastDistance > 0), ]
writeKml(wetBulb, outfile = "wetBulb", tourname = "Wetstations")
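For readers without the mask and inventory files, the core test can be tried on invented data. This is only a sketch in base R: the 4×4 coast-distance grid, its extent, and the three stations are made up for illustration and are not GHCN values. It looks up the grid cell under each station and keeps the stations whose cell has a coast distance greater than zero.

```r
# Toy coast-distance grid: 0 = land, >0 = distance of a water cell from land.
# Rows run north to south, columns west to east; all values are invented.
coastDist <- matrix(c(0, 0, 1, 2,
                      0, 0, 1, 2,
                      0, 1, 2, 3,
                      0, 0, 1, 2),
                    nrow = 4, byrow = TRUE)
lonMin <- 0; latMax <- 4; cellSize <- 1   # grid covers lon 0..4, lat 0..4

# Hypothetical stations (not real GHCN entries)
stations <- data.frame(Id  = c("A", "B", "C"),
                       Lon = c(0.5, 2.5, 1.2),
                       Lat = c(3.5, 3.5, 1.7))

# Convert each coordinate pair to the row/column index of its grid cell
ci <- floor((stations$Lon - lonMin) / cellSize) + 1
ri <- floor((latMax - stations$Lat) / cellSize) + 1
stations$CoastDistance <- coastDist[cbind(ri, ci)]

# Stations whose cell is water are the "wet" ones
wet <- stations[stations$CoastDistance > 0, ]
print(wet)
```

With the real files, extract() does the cell lookup, so the index arithmetic above is only there to make the toy version self-contained.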
Some shots from the gallery. The 1km land/water mask is very accurate. You might notice one or two stations actually on land. Nightlights is less accurate, something H2010 does not recognize. Its pixels can be over 1km off true position. The small sample below should show the various cases. No attempt is made to ascertain if this causes an issue for identification of rural/urban categories. As it stands, the inaccuracies in Nightlights and station locations suggest more work is needed before that effort is taken up.
[Image gallery of sample station locations not reproduced here]

Kev-in-UK,
Looking at the whole chain is on my todo list. Only so many hours.
Not only GISS. Nicaragua accidentally invaded Costa Rica last week because the Nicaraguan army is using Google Maps for (inaccurate) navigation.
juanslayton says:
November 8, 2010 at 6:18 am
Now that to me was a WOW! moment. Just correcting the location co-ordinates logged as a station change.
DaveE.
@Steve Mosher
Yeah, I realise it’s a big issue!
Are you doing a ‘project’ whereby others can perhaps help? Many hands make light work and all that….
It could be worthwhile ‘delegating’ some tasks, so long as the helpers are suitably instructed and suitable written records are kept (don’t wanna end up like Jones, eh?) – just a thought. (But I do appreciate that one can really only ‘trust’ one’s own work sometimes.)
Well, Steven, I don’t quite see what all the fuss is about. They are only off by 300 km, and Hansen thinks that the temperature anomalies are good to 1200 km; so it looks like a direct hit bullseye to me; well, maybe it’s some other bull component.
I do hope our nuke missile targeting is a little bit closer than that. Man, would I be PO’d if some fishing buddy gave me the GPS of his favorite fishing hotspot and it turned out to be in a bar in Lodi.
You chaps in the field do run into some crazy situations though.
And just think about it; somebody wants to make cars without drivers; how does that grab you?
BS Footprint says:
November 8, 2010 at 8:31 am
In any case, I recommend a classic on the subject of statistical gamesmanship: “How to Lie with Statistics” by Huff and Geis
============================================================
It is indeed a classic. I read it when I was in high school in the 1950’s. But it discusses very elementary statistics. The statistical discussions here at WUWT often boggle my brain, although I learn a lot.
Early in my career, a statistician said to me, “too many engineers think statistics is a black box into which you can pour bad data, then crank out good answers.” Judging from the present topic, many “climate scientists” think the same thing.
Kev in UK,
The tasks I could hand off will come next.
Basically, after I finish cleaning the metadata I’ll do this:
Publish a list of around (I hope) 500 or so stations with a Google tour.
Then folks could go look at each of those sites and check that my algorithm worked.
@Steve Mosher:
It would seem to me that there is an error in the coordinates of the weather stations, since they would not be really floating in the water. So these stations are on a coast. The amount of mis-location would result in an [error] in night light analysis. This does indeed appear to be an error. It would be interesting to learn what is the source of the location error.
Looking at the bigger picture, assuming the error is random in nature, one would expect some errors in the other direction, where coastal locations are put further inland than they should, and more nightlights would be found surrounding the station, than there are in reality.
So what would the results of these errors do to the UHI corrections?
In some cases, where the stations were mis-located over water, stations might be labeled rural that would ordinarily have been labeled urban. On the other hand, some stations which moved inland might have been labeled rural if located correctly, but ended up labeled urban. In some cases the mislocation may not have made a difference.
You didn’t say how many stations were examined to get the number 230. Is it the 1034 stations with long runs of continuous data, or the total of 7200 GHCN stations.
Even if the rural versus urban classifications have some inaccuracy associated with them, the differences in trends between them are so small, about 005DegC/ century, that the effect of the corrections for UHI can be expected to remain small.
I should have written “result in an error in night light analysis” in the first paragraph in my above post.
eadler says:
November 8, 2010 at 5:11 pm
@Steve Mosher:
It would seem to me that there is an error in the coordinates of the weather stations, since they would not be really floating in the water. So these stations are on a coast. The amount of mis-location would result in an [error] in night light analysis. This does indeed appear to be an error. It would be interesting to learn what is the source of the location error.
################
The location errors result from rounding the location data given by source documents or from errors in source documents or from source documents having limited precision.
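To put rough numbers on the rounding part of that, here is a small sketch in base R. It assumes a spherical Earth of radius 6371 km, and the function name roundingErrorKm is made up for this illustration. It gives the worst-case displacement when a coordinate is kept to a given number of decimal places.

```r
# Kilometres per degree of latitude on a sphere of radius 6371 km (an approximation)
kmPerDeg <- 2 * pi * 6371 / 360

# Worst-case displacement when a coordinate is rounded to 'decimals' decimal places.
# East-west distances shrink with the cosine of the latitude.
roundingErrorKm <- function(decimals, lat = 0) {
  halfStep <- 0.5 * 10^(-decimals)   # largest rounding error, in degrees
  c(latKm = halfStep * kmPerDeg,
    lonKm = halfStep * kmPerDeg * cos(lat * pi / 180))
}

roundingErrorKm(1)            # coordinates kept to 0.1 degree
roundingErrorKm(2)            # coordinates kept to 0.01 degree
roundingErrorKm(2, lat = 60)  # same precision at 60 degrees latitude
```

Under these assumptions a position kept to 0.1 degree can be off by about 5.6 km north-south, which is several pixels of a 1 km mask.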
################
Looking at the bigger picture, assuming the error is random in nature, one would expect some errors in the other direction, where coastal locations are put further inland than they should, and more nightlights would be found surrounding the station, than there are in reality.
######################
I have done some limited “first look” type analysis of the errors. Mean error is around .02 degrees latitude. Longitude error is slightly more, as would be expected. When I use corrected locations you see some adjustments in the rural/urban designations, with some rural becoming urban and some urban becoming rural. WMO is dropping new data on the 10th, so I’ve held off doing any final look at this till I have their fresh data. In general I would look at defining rural not as just one pixel being dark, but rather as a dark region around the site, to account for the positional errors in nightlights. I’m thinking of something like a 5-10km zone, but need some numbers to support such a decision.
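The zone idea can be sketched in base R on an invented grid. Everything here is an assumption made for illustration: the 21×21 grid with 1 km cells, the single bright pixel, the cut-off of 10, and the function name maxLightNear. It compares a single-pixel test with a test on the brightest pixel within 5 km of the station.

```r
# Invented nightlights grid; one cell is taken to be 1 km on a side
lights <- matrix(0, nrow = 21, ncol = 21)
lights[11, 14] <- 40          # a bright pixel 3 km east of the grid centre

# Brightest pixel within 'radiusKm' of cell (r0, c0), using a circular window
maxLightNear <- function(grid, r0, c0, radiusKm, cellKm = 1) {
  d <- sqrt((row(grid) - r0)^2 + (col(grid) - c0)^2) * cellKm
  max(grid[d <= radiusKm])
}

threshold <- 10               # hypothetical cut-off between dark and lit

# Station placed at the grid centre, cell (11, 11)
singlePixelRural <- lights[11, 11] <= threshold
zoneRural5km     <- maxLightNear(lights, 11, 11, radiusKm = 5) <= threshold

cat("single pixel says rural:", singlePixelRural,
    " 5 km zone says rural:", zoneRural5km, "\n")
```

In this toy case the station's own pixel is dark but the 5 km zone is not, so the two rules disagree, which is exactly the kind of case a position error of a few kilometres would create.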
###############
So what would the results of these errors do to the UHI corrections?
In some cases, where the stations were mis-located over water, stations might be labeled rural that would ordinarily have been labeled urban. On the other hand, some stations which moved inland might have been labeled rural if located correctly, but ended up labeled urban. In some cases the mislocation may not have made a difference.
####################
Early in this series I argued that the errors cannot make a substantial difference. But that needs quantifying. What you say is true; I just try to put numbers on the statements.
#############
You didn’t say how many stations were examined to get the number 230. Is it the 1034 stations with long runs of continuous data, or the total of 7200 GHCN stations.
##########################
It’s the total 7280. I’d like to fix the metadata so:
1. People like Nick Stokes and JeffId who use all the stations could benefit.
2. People like Zeke and me who use subsets could benefit.
I actually like Nick’s and JeffId/RomanM’s approach better from a statistical standpoint, but I’m curious about what I will see if I only select a smaller number of stations with long histories and good metadata. Curious.
#################
Even if the rural versus urban classifications have some inaccuracy associated with them, the differences in trends between them are so small, about 005DegC/ century, that the effect of the corrections for UHI can be expected to remain small.
##########
I’ve never expected to see a UHI contribution greater than .3C. I would not be shocked by a bias of less than .1C. I’d do an over/under bet at .15C. I think Jones himself said estimates ranged between 0 and .3C. I see nothing to change that range. Hope that’s clear.
Steven, I couldn’t go by your statement here without commenting. Where exactly do you pull this 0.3 ºC maximum figure for UHI from? My only recollection of an empirical figure from real data, actually some 10,000 stations, is Dr. Spencer’s work, published here months ago. It clearly shows the UHI being many times that, more like 1.0-1.2ºC depending on population, and not only in the largest cities. As his research showed, even small towns can have a rather large UHI (0.4-0.7ºC), and if the GHCN locations used by GISS are off by even a few kilometers, this can definitely give the impression that non-urban areas are showing warming when the warming is only occurring tightly within or very near (as at an airport) the urban area. That is where the thermometers are.
http://www.drroyspencer.com/wp-content/uploads/ISH-UHI-warming-global-by-year.jpg
Do you somehow discredit the work by Dr. Spencer that this data was indicating?
I think your work here is much more important than even you might give it. An error of location could have huge implications of whether this is all we are seeing in this GW fiasco, individual local warming of one or a few hundred square kilometers around urban areas that is, however small, and not global world-wide warming.
Here, I’ll look up Dr. Spencer’s posts here on this matter:
http://wattsupwiththat.com/2010/03/03/spencer-using-hourly-surface-dat-to-gauge-uhi-by-population-density/
http://wattsupwiththat.com/2010/03/04/spencers-uhi-vs-population-project-an-update/
http://wattsupwiththat.com/2010/03/10/spencer-global-urban-heat-island-effect-study-an-update/
Jimmi,
take a look at Wood for Trees:
http://www.woodfortrees.org/plot/uah/from:1979/to:1997/trend/plot/uah/from:1979/to:1997
Yes, it goes up a little, but with the accuracy of the measurements I think this falls into the not significant range!! Over a period of 20 years the trend rises about .06C.
Over 100 years that would be .3C. Jimmi, I can’t believe you are worried about less than .5C over 100 years!!
That is a bit less than the error.
Oh, and Jimmi,
if you do the same graphs for the full satellite period you find that it is about .3C for 30 years!! Still not anywhere close to the IPCC’s 2C+ for 100 years and still pretty small in relation to the error.
Steven Mosher @
November 8, 2010 at 11:39 am
_________________________
Thank you for your kind response.
I am actually interested in the question about the relative importance of the 230 stations.
Perhaps I should have asked the question a bit differently and more precisely.
In any case, your later comment about the 7280 total stations gave me a useful measure and perspective while your work continues.
Again, thank you for your response
Let’s have a look at a Czech station in GHCN V2:
http://www.unur.com/climate/ghcn-v2/611/11464.html
Milesovka:
Coordinates correct (50.554924,13.931301 according to Googlemaps).
However, then there’s this: “Mountainous valley or at least not on the top of a mountain”
The station was specifically built at the very mountain top with the aim of studying storms and lightning!
>>Latimer
>>1. It relies on … the computation of an average temperature based on just the maxima and minima of the daily readings.
Sure. So if we have a 30 °C temperature all day and night, except for a 2-hour drop to 10 °C overnight, we can be sure that on average it was a coolish day.
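The arithmetic for that hypothetical day, worked through in base R (the hourly series is just the example above written out, not station data):

```r
# Hourly temperatures for the example day: 30 C for 22 hours, 10 C for 2 hours
hourly <- c(rep(30, 22), rep(10, 2))

minMaxMean <- (max(hourly) + min(hourly)) / 2   # average of daily maximum and minimum
trueMean   <- mean(hourly)                      # average of all 24 hourly readings

cat("min/max mean:", minMaxMean, " hourly mean:", round(trueMean, 2), "\n")
```

The min/max average comes out at 20 °C while the mean of the hourly values is about 28.3 °C, so in this contrived case the two differ by more than 8 °C.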
Husten’s got a point. There are a lot of different mapping datums out there and it definitely matters which one you use. A 200m error is easy in Australia if one set of coords is in WGS84 and the other is in AGD66, and that’s without any precision/rounding errors or simple human error in transposing digital readouts to paper and then back to digital data, which will compound things further. Maybe I looked in the wrong place, but I couldn’t find a reported datum for the position data in the metadata. That’s a bit like recording a number without specifying the units used.
Robin:
You clearly are aware that there is a divergence problem with the tree-ring proxies. Singer’s article was very high level; my guess is that this is what he was referring to. You might try an email directly to the source: he’s involved with SEPP, so try through that website (http://www.sepp.org/).
Thanks, Ian. Singer seems to be making an important point but I would like to see the evidence. I’ll try the SEPP link.
I was rather disappointed that no one on WUWT seemed able to help.
Very nicely done.
You might also find it interesting to compare those over the ocean with their elevation. It would help sort out the “atoll” station at 1 m elevation from the “mountain off shore” that’s in the ocean but 1000 m up…
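A rough sketch of how that sort could look in base R. The four stations, their elevations, and the 10 m and 100 m breakpoints are all invented for illustration; a real run would use the elevation column of the inventory.

```r
# Hypothetical "wet" stations with inventory elevations in metres (values invented)
wet <- data.frame(Id        = c("W1", "W2", "W3", "W4"),
                  Elevation = c(1, 3, 1000, 250))

# Bin the stations by elevation; the 10 m and 100 m breakpoints are arbitrary choices
wet$Type <- cut(wet$Elevation,
                breaks = c(-Inf, 10, 100, Inf),
                labels = c("atoll-like", "low coast", "elevated"))

print(table(wet$Type))
```

Stations in the "elevated" bin are the ones where a position in the water is most clearly a coordinate problem rather than a genuinely sea-level site.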
Zeke Hausfather says:
November 8, 2010 at 9:24 am
“Mike Haseler,
It’s not like they schedule conferences to discuss shortcomings in current temperature records and ways to fix them, right? If you talk to scientists working on the surface temperature records (and there are surprisingly few of them, mostly owing to low budgets to fund ongoing work), […]”
So you confirm that nearly all the CAGW money is dumped into supercomputer modeling, and nearly nothing into acquiring real world data. One could assume incompetence, or one could assume malice.
For simple defensive reasons, I prefer to assume malice.